# Sentence Embedding with OpenAI API

To get an embedding, send your text string to the embeddings API endpoint along with a choice of embedding model ID (e.g., text-embedding-ada-002).  
source: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings <br>  
Sample response:  
<code>
{
  "data": [
    {
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ...
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
</code>
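The vectors are nested inside `data`, one item per input string. Each item's `index` records which input it belongs to. The sketch below pulls the vectors out of a response shaped like the sample above. It assumes the response has already been decoded into a Python dict. The helper name `extract_embeddings` is hypothetical, and the numbers are placeholders, not real API output.

```python
from typing import Any


def extract_embeddings(response: dict[str, Any]) -> list[list[float]]:
    """Return the embedding vectors ordered by their 'index' field."""
    # Each item in "data" corresponds to one input string; "index" says which one.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# A toy response with the same shape as the sample above (values are made up).
sample = {
    "data": [
        {"embedding": [0.3, 0.4], "index": 1, "object": "embedding"},
        {"embedding": [0.1, 0.2], "index": 0, "object": "embedding"},
    ],
    "model": "text-embedding-ada-002",
    "object": "list",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}

vectors = extract_embeddings(sample)
print(vectors)  # [[0.1, 0.2], [0.3, 0.4]]
```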

#### Load the API key

Use the <b>python-dotenv</b> library and store the key in a <b>.env</b> file.  
Do not hard-code the key in the notebook, or you may end up pushing it to a shared repository.
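A <b>.env</b> file holds one `KEY=value` pair per line, for example `OPENAI_API_KEY=your-key-here`. Adding the file to `.gitignore` keeps it out of version control. After `load_dotenv()` runs, the key is an ordinary environment variable. The sketch below fails early with a clear message when the key is missing. The helper `require_env` is hypothetical and uses only the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Fail before any API call instead of hitting an authentication error later.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value


# Usage, after load_dotenv() has run:
# api_key = require_env("OPENAI_API_KEY")
```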

In [None]:
import os
from dotenv import load_dotenv
import numpy as np

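load_dotenv()  # reads .env and adds its variables to os.environ

# openai < 1.0 used a module-level key instead of a client object:
#import openai
#openai.api_key = os.getenv("OPENAI_API_KEY")
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment by default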

#### Core Function

The <i>get_embedding</i> function sends one string to the embeddings endpoint.  
It returns the <code>embedding</code> field of the first (and only) item in the response's <code>data</code> list.

In [None]:
def get_embedding(text, model="text-embedding-ada-002"):
    # Replace newlines with spaces before sending the text
    text = text.replace("\n", " ")
    # openai < 1.0:
    #return openai.Embedding.create(input=[text], model=model)['data'][0]['embedding']
    return client.embeddings.create(input=[text], model=model).data[0].embedding

embedded_sentence = get_embedding("Embed this sentence please")

print(len(embedded_sentence))  # vector dimension (1536 for text-embedding-ada-002)
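The endpoint also accepts a list of strings in `input`. It returns one item per string in `data`, and each item carries an `index`. The sketch below embeds several texts in a single request, using the openai >= 1.0 client style shown above. The function name `get_embeddings` is hypothetical. The client is passed in as a parameter so the function can be tried with a stand-in object.

```python
from typing import Any, Sequence


def get_embeddings(client: Any, texts: Sequence[str],
                   model: str = "text-embedding-ada-002") -> list[list[float]]:
    """Embed several texts with one embeddings request, preserving input order."""
    # Same preprocessing as get_embedding: newlines become spaces.
    cleaned = [t.replace("\n", " ") for t in texts]
    response = client.embeddings.create(input=cleaned, model=model)
    # Sort by index so that output i always corresponds to texts[i].
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


# Usage with the client created earlier:
# vectors = get_embeddings(client, ["first sentence", "second sentence"])
# print(len(vectors), len(vectors[0]))
```

Batching cuts the number of HTTP round trips when many sentences need embedding, as in the query list below.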

#### Usage Example

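Compare the target sentence with each query in two ways. The first is the Euclidean distance, computed with <i>np.linalg.norm</i> of the difference between the two vectors (smaller means more similar). The second is the cosine similarity (closer to 1 means more similar).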
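The two measures are the Euclidean distance ‖a − b‖ and the cosine similarity a·b / (‖a‖‖b‖). OpenAI's documentation states that its embeddings are normalized to length 1. For unit vectors the two measures are linked by ‖a − b‖² = 2 − 2·cos θ, so they rank candidates in the same order. The check below uses made-up 3-dimensional vectors, not real embeddings. The helper names `euclidean` and `cosine_sim` are illustrative.

```python
import numpy as np


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two vectors."""
    return float(np.linalg.norm(a - b))


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors scaled to unit length, mimicking normalized embeddings.
a = np.array([1.0, 2.0, 2.0])
a = a / np.linalg.norm(a)
b = np.array([2.0, 1.0, 2.0])
b = b / np.linalg.norm(b)

d = euclidean(a, b)
c = cosine_sim(a, b)
print(round(d ** 2, 6), round(2 - 2 * c, 6))  # both ≈ 0.222222
```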



In [None]:
new_query = 'information retrieval is about extracting relevant'\
            ' content from documents'
target_embedded = get_embedding(new_query)
print(new_query)
print(np.array(target_embedded).shape)

In [None]:
queries = ['the car is on the table',
           'to assign a label to a document, you analyze it through text classification',
           'when a user asks something the virtual assistant replies',
           'superman is the hero I consider to be the greatest',
           'are you hungry?',
           'analyze a document and get the content you are looking for. this is the action of retrieving information',
           'neuron',
           'information retrieval is about extracting relevant' +
           ' content from documents']
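Both loops below call <i>get_embedding</i> on every query, so each sentence is sent to the API twice. One way to avoid the repeat requests is to memoize the embedding call with `functools.lru_cache`, which works because the inputs are hashable strings. The wrapper `make_cached` is hypothetical. It returns tuples so the cached vectors cannot be mutated by callers.

```python
from functools import lru_cache
from typing import Callable


def make_cached(embed: Callable[[str], list[float]],
                maxsize: int = 1024) -> Callable[[str], tuple[float, ...]]:
    """Wrap an embedding function so repeated texts are served from memory."""
    @lru_cache(maxsize=maxsize)
    def cached(text: str) -> tuple[float, ...]:
        # Only the first call for a given text reaches `embed`.
        return tuple(embed(text))
    return cached


# Usage in this notebook:
# cached_embedding = make_cached(get_embedding)
# query_embedded = cached_embedding(q)  # repeated q values make no new request
```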

In [None]:
print('Euclidean distance (np.linalg.norm)')
print()
euclidean_distances = []
for q in queries:
    query_embedded = get_embedding(q)
    dist = np.linalg.norm(np.array(target_embedded)
                          - np.array(query_embedded)) # euclidean distance
    euclidean_distances.append(dist)
    print(q)
    print(float(dist))
    print()
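The loop above computes one distance per iteration. If the query vectors are stacked into a 2-D array with one row per query, `np.linalg.norm(..., axis=1)` computes all Euclidean distances in a single call. The sketch below uses made-up 2-D vectors in place of real embeddings. The helper name `euclidean_to_all` is hypothetical.

```python
import numpy as np


def euclidean_to_all(target: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Distance from `target` (shape (d,)) to each row of `candidates` (shape (n, d))."""
    # Broadcasting subtracts target from every row; axis=1 takes one norm per row.
    return np.linalg.norm(candidates - target, axis=1)


target = np.array([0.0, 1.0])
candidates = np.array([[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]])
print(euclidean_to_all(target, candidates))  # ≈ [0.0, 1.414, 0.632]
```

In this notebook that would be `euclidean_to_all(np.array(target_embedded), np.array([get_embedding(q) for q in queries]))`.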

In [None]:
print('cosine similarity')
print()
cosine_similarities = []
for q in queries:
    query_embedded = get_embedding(q)
    cosine_similarity = np.dot(target_embedded, query_embedded) / (
        np.linalg.norm(target_embedded) * np.linalg.norm(query_embedded))
    cosine_similarities.append(cosine_similarity)
    print(q)
    print(float(cosine_similarity))
    # clip guards against rounding pushing the value just above 1.0, where arccos returns nan
    angle = np.degrees(np.arccos(np.clip(cosine_similarity, -1.0, 1.0)))
    print("\u03F4 = {0:0.1f}° ".format(angle))
    print()
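To read the results as a search, sort the queries by cosine similarity, highest first. The sketch below does this with `np.argsort`. The helper `rank_by_similarity` is hypothetical, and the scores in the example are made up. In the notebook you would pass `queries` and `cosine_similarities`. For Euclidean distances, keep the ascending order instead of reversing it.

```python
from typing import Sequence

import numpy as np


def rank_by_similarity(texts: Sequence[str],
                       scores: Sequence[float]) -> list[tuple[str, float]]:
    """Return (text, score) pairs sorted from most to least similar."""
    scores_arr = np.asarray(scores, dtype=float)
    order = np.argsort(scores_arr)[::-1]  # argsort is ascending, so reverse it
    return [(texts[i], float(scores_arr[i])) for i in order]


for text, score in rank_by_similarity(["a", "b", "c"], [0.72, 0.95, 0.80]):
    print(f"{score:.2f}  {text}")
# 0.95  b
# 0.80  c
# 0.72  a
```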