<a href="https://colab.research.google.com/github/sudarshan-koirala/youtube-stuffs/blob/main/langchain/OpenAI_new_embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI embeddings with and without LangChain
- [New announcement link](https://openai.com/blog/new-embedding-models-and-api-updates)

## OpenAI New Embeddings
#### [New embeddings doc](https://platform.openai.com/docs/guides/embeddings/)

In [20]:
!pip install openai



In [2]:
# https://platform.openai.com/api-keys

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

··········


In [24]:
from openai import OpenAI
client = OpenAI()

response = client.embeddings.create(
    input="Open AI new Embeddings models is great.",
    model="text-embedding-3-small"
)

print(response.data[0].embedding[:5])

[-0.014905157499015331, -0.027476679533720016, 0.0051691289991140366, 0.0026080890092998743, -0.025682540610432625]


In [22]:
print(len(response.data[0].embedding))

1536


## OpenAI Embeddings with LangChain
#### [New embeddings doc](https://platform.openai.com/docs/guides/embeddings/)

If you're opening this notebook on Colab, you will probably need to install the `langchain-openai` package.

In [1]:
%%capture
!pip install langchain-openai

### The earlier model, `text-embedding-ada-002`, which is still available

In [4]:
text = "Open AI new Embeddings models is great."

In [6]:
# embed query
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

embeddings = embed_model.embed_query(text)

print(len(embeddings))

1536


In [17]:
# embed document
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

embeddings = embed_model.embed_documents([text])

print(embeddings[0][:5])

[-0.014460097826571622, 0.010717565066430538, -0.0005259705295691181, -0.030436506133245455, 0.0034633932689342997]


In [18]:
print(len(embeddings))

1


In [19]:
print(len(embeddings[0]))

1536


## Using OpenAI `text-embedding-3-large` and `text-embedding-3-small`

In [7]:
# embed query
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-3-large")

embeddings = embed_model.embed_query(text)



In [8]:
print(embeddings[:5])

[-0.011500772465203734, 0.024574424013190717, -0.01760469620928205, -0.017763427108804714, 0.02984140165968028]


In [9]:
print(len(embeddings))

3072


In [10]:
# embed documents
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-3-large")

embeddings = embed_model.embed_documents([text])



In [11]:
embeddings[0][:5]

[-0.011500772465203734,
 0.024574424013190717,
 -0.01760469620928205,
 -0.017763427108804714,
 0.02984140165968028]

In [12]:
# embed query
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-3-small")

embeddings = embed_model.embed_query(text)



In [13]:
print(len(embeddings))

1536


# Specify dimensions
- https://openai.com/blog/new-embedding-models-and-api-updates

Note: Make sure you have a recent version of the `openai` client, since older releases may not accept the `dimensions` parameter.

### Trade-off:

- Both new embedding models support a `dimensions` parameter that lets you shorten the embeddings, trading some accuracy for smaller vectors.
- Gain: you can use the embedding model with a data store that supports only a limited number of dimensions (for example, 512 or 1024).
- Cost: you sacrifice some accuracy, because the shortened embedding may not capture all the nuances present in the original higher-dimensional embedding.
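To picture what shortening means, here is a minimal sketch of doing it by hand: keep the first values of a vector and rescale the result to unit length, which is the manual approach the OpenAI embeddings guide describes for cases where you shorten an embedding after it was generated. The helper name `shorten_embedding` and the toy 4-value vector are illustrative assumptions, and this is not claimed to reproduce exactly what the API returns when you pass `dimensions`.

```python
import math


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` values and rescale them to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        # An all-zero vector cannot be rescaled; return it unchanged.
        return list(truncated)
    return [x / norm for x in truncated]


# Toy vector standing in for a real embedding (assumption, not API output).
full = [0.5, 0.5, 0.5, 0.5]
short = shorten_embedding(full, 2)
print(len(short))
```

In practice, passing `dimensions` when creating the embedding is simpler; the manual route is mainly useful when the full-size vectors are already stored.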

In [14]:
# embed query
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1024)

embeddings = embed_model.embed_query(text)

print(len(embeddings))



1024


In [15]:
# embed query
from langchain_openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=512)

embeddings = embed_model.embed_query(text)

print(len(embeddings))



512
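The cells above set `dimensions` through LangChain. As a rough sketch, the same request can also be made with the `openai` client from the first section, because `client.embeddings.create` accepts a `dimensions` argument for the `text-embedding-3` models. The helper name `embed_with_dimensions` is hypothetical, and the real call is left commented out since it needs an API key and network access.

```python
def embed_with_dimensions(client, text, dimensions, model="text-embedding-3-large"):
    """Request a shortened embedding through an OpenAI-style client object."""
    response = client.embeddings.create(
        input=text,
        model=model,
        dimensions=dimensions,  # ask the API for a shorter vector
    )
    return response.data[0].embedding


# Usage with the real client (requires OPENAI_API_KEY to be set):
# from openai import OpenAI
# vector = embed_with_dimensions(OpenAI(), "Open AI new Embeddings models is great.", 1024)
# print(len(vector))
```

Taking the client as an argument keeps the helper easy to try with a stand-in object before spending API calls.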


# Conclusion
The `dimensions` parameter enables very flexible usage. For example, when using a vector data store that only supports embeddings of up to 1024 dimensions, developers can still use OpenAI's best embedding model, `text-embedding-3-large`, and specify a value of 1024 for the `dimensions` API parameter. This shortens the embedding from 3072 dimensions, **trading off some accuracy** in exchange for the smaller vector size.
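To put a rough number on "smaller vector size", the sketch below estimates the raw storage for the three sizes used in this notebook. The one million vectors and 4 bytes per value (float32) are assumptions chosen for illustration; real vector stores add index overhead and may use a different numeric type.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, ignoring any index overhead."""
    return num_vectors * dimensions * bytes_per_value


# Assumed workload: one million embeddings stored as float32.
for dims in (3072, 1024, 512):
    size_mb = index_size_bytes(1_000_000, dims) / 1_000_000
    print(f"{dims} dimensions: {size_mb:,.0f} MB")
```

Under these assumptions the raw vectors take about 12,288 MB at 3072 dimensions, 4,096 MB at 1024, and 2,048 MB at 512.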