# OpenAI Embeddings

(C) 2024 by [Damir Cavar](http://damir.cavar.me/)

Pulling embeddings for words and phrases from OpenAI's embedding models.

| Model | ~ Pages per dollar | Performance on MTEB eval | Max input (tokens) |
|---|---|---|---|
| text-embedding-3-small | 62,500 | 62.3% | 8191 |
| text-embedding-3-large | 9,615 | 64.6% | 8191 |
| text-embedding-ada-002 | 12,500 | 61.0% | 8191 |

For this code to work you will need the `openai` and `tenacity` Python modules.

In [None]:
!pip install -U openai tenacity

Create the file `secret.py` in the same folder as this notebook, if it does not exist yet, and define in it a string variable `openai_apikey` that holds your API key from OpenAI.
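The file itself only needs a single assignment such as `openai_apikey = "sk-..."`, with your own key in place of the placeholder. As a hedged sketch of how the key could be loaded more defensively, the helper below first tries the module and then falls back to an environment variable. The function `load_api_key` and its parameters are hypothetical names introduced for illustration and are not used elsewhere in this notebook; `OPENAI_API_KEY` is the environment variable that the `openai` module reads by default.

```python
import importlib
import os


def load_api_key(module_name="secret", attribute="openai_apikey",
                 env_var="OPENAI_API_KEY"):
    """Return the API key from `module_name`, else from the environment.

    Hypothetical helper for illustration; the notebook itself imports the
    key directly with `from secret import openai_apikey`.
    """
    try:
        module = importlib.import_module(module_name)
        key = getattr(module, attribute, None)
    except ImportError:
        key = None  # no such module, so try the environment instead
    key = key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found in {module_name}.py or in ${env_var}"
        )
    return key
```

Keeping the key out of the notebook, whether in `secret.py` or in the environment, means the notebook can be shared without exposing it.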

In [1]:
import openai
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_not_exception_type
from secret import openai_apikey

Use either `text-embedding-3-small` or `text-embedding-3-large` for the embeddings. Both accept at most 8191 input tokens and use the `cl100k_base` tokenizer encoding.

In [2]:
EMBEDDING_MODEL = "text-embedding-3-large"
EMBEDDING_CTX_LENGTH = 8191
EMBEDDING_ENCODING = 'cl100k_base'
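
The context length matters only for long inputs, which must be shortened before they are sent to the endpoint. The following is a minimal sketch of such a truncation step, assuming the tokenizer is passed in as a pair of functions. The name `truncate_to_context` and the whitespace tokenizer are hypothetical stand-ins for illustration and do not reproduce the `cl100k_base` encoding.

```python
def truncate_to_context(text, encode, decode, max_tokens=8191):
    """Cut `text` to at most `max_tokens` tokens using the given tokenizer."""
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text  # short enough, return unchanged
    return decode(tokens[:max_tokens])


# Stand-in tokenizer (assumption): splits on whitespace, NOT cl100k_base.
def whitespace_encode(text):
    return text.split()


def whitespace_decode(tokens):
    return " ".join(tokens)


print(truncate_to_context("one two three four",
                          whitespace_encode, whitespace_decode,
                          max_tokens=2))  # prints "one two"
```

With the `tiktoken` module installed, `enc = tiktoken.get_encoding(EMBEDDING_ENCODING)` should provide matching `enc.encode` and `enc.decode` functions that can be passed in place of the stand-ins, together with `max_tokens=EMBEDDING_CTX_LENGTH`.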

Create a client for the OpenAI API endpoint:

In [3]:
client = openai.OpenAI(api_key=openai_apikey)

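The following function fetches the embedding vector for the input passed as `text_or_tokens`. If the endpoint is busy or imposes a delay, it waits for a random, exponentially growing interval of between 1 and 20 seconds and tries again, for up to six attempts. A `BadRequestError` is not retried, because repeating a faulty request would not help: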

In [4]:
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6),
       retry=retry_if_not_exception_type(openai.BadRequestError))
def get_embedding(text_or_tokens, client, model=EMBEDDING_MODEL):
    # Replace newlines in string input; a list of token IDs passes through unchanged.
    if isinstance(text_or_tokens, str):
        text_or_tokens = text_or_tokens.replace("\n", " ")
    return client.embeddings.create(input=text_or_tokens, model=model).data[0].embedding
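
The waiting behaviour that the `@retry` decorator adds can be approximated in plain Python. The sketch below is a simplified stand-in and not the implementation used by `tenacity`: the function `retry_with_backoff`, its parameters, and the injectable `sleep` argument are hypothetical names chosen for illustration, and `ValueError` merely plays the role of the exception that should not be retried.

```python
import random
import time


def retry_with_backoff(func, max_attempts=6, min_wait=1.0, max_wait=20.0,
                       no_retry=(ValueError,), sleep=time.sleep):
    """Call func() until it succeeds, waiting a random, growing delay between tries.

    Hypothetical helper for illustration only; exceptions listed in
    `no_retry` are re-raised immediately instead of being retried.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except no_retry:
            raise  # the request itself is faulty, repeating it would not help
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # The upper bound doubles with every attempt but never exceeds max_wait.
            upper = min(max_wait, max(min_wait, 2.0 ** attempt))
            sleep(random.uniform(min_wait, upper))
```

The random component spreads out repeated requests, so that many clients hitting a busy endpoint do not all retry at the same moment. One difference worth noting: unless `reraise=True` is set, `tenacity` wraps the last failure in a `RetryError` once the attempts are exhausted, whereas this sketch re-raises the original exception.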

Use this word list:

In [5]:
wordlist = list(set("""
cat dog fish bird
car truck bike bus
""".split()))

Loop over the word list and request the embedding vector from the OpenAI API endpoint:

In [None]:
for word in wordlist:
    try:
        embeddings = get_embedding(word, client)
        print(word, embeddings)
    except openai.BadRequestError as e:
        print(e)
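
Once the vectors are available, a common next step is to compare them. The following is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the three-dimensional `toy_vectors` are made up for illustration; real vectors returned by the endpoint have far more dimensions, and the numbers below are not actual model output.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up vectors standing in for real embeddings (assumption, not API output).
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.2, 0.9],
}

print(cosine_similarity(toy_vectors["cat"], toy_vectors["dog"]))
print(cosine_similarity(toy_vectors["cat"], toy_vectors["car"]))
```

To apply this to the word list, the vectors returned by `get_embedding` could be collected in a dictionary keyed by word instead of being printed.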
