# 02 Embeddings

In this lab, we'll explore how we can bring our own data into the models used by Azure OpenAI.

We'll start as usual by initiating a connection to the Azure OpenAI service.

**NOTE**: As with previous labs, we'll use the values from the `.env` file in the root of this repository.

In [1]:
import os
from langchain_openai import AzureChatOpenAI
from dotenv import load_dotenv

# Load environment variables from the .env file
if load_dotenv():
    print(f"Found Azure OpenAI Endpoint: {os.getenv('AZURE_OPENAI_ENDPOINT')}")
else:
    print("No file .env found")

# Create an instance of Azure OpenAI
llm = AzureChatOpenAI(
    azure_deployment = os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME")
)

Found Azure OpenAI Endpoint: https://kingfisher-hack.openai.azure.com/


Let's begin by asking the AI a simple question.

In [2]:
r = llm.invoke("Tell me about the latest Deadpool movie. When was it released? What is it about?")

# Print the response
print(r.content)

As of my last update, "Deadpool 3" is the upcoming installment in the Deadpool film series. It is highly anticipated but has not been released yet, with a planned release date set for May 3, 2024. The plot details have been kept under wraps, but the film is expected to continue the story of Wade Wilson, also known as Deadpool, with Ryan Reynolds reprising his role. The film is particularly exciting for fans because it will integrate the character into the Marvel Cinematic Universe (MCU) following Disney's acquisition of 21st Century Fox. Additionally, Hugh Jackman is set to return as Wolverine, which has generated considerable buzz. For the most up-to-date information, you might want to check the latest announcements or news releases.


What do you notice about the response?

The latest "Deadpool" movie is called "Deadpool & Wolverine". Depending on the model and version you are using, it may tell you that one of the previous movies is the latest, or it may be aware of the new movie but think it hasn't been released yet.

OpenAI models are trained on a large set of data, but training stops at a cutoff date that depends on the model. As a result, many of the models have no information about events that took place in recent months or years.

To help the AI out, we can provide additional information. This is the same process you would follow if you want the AI to work with your own company data. The AI won't know about information that isn't publicly available, so if you want the AI to work with that information, then you'll need to get that information into the model.

The thing is, you can't actually do that. The models are pre-trained, so the only way to get more information in is to retrain the model, which is an expensive and time-consuming process.

However, there *are* ways to get the AI models to work with new data. The most popular of these methods is to use *embeddings*, which we'll explore in the next sections.

## Bring Your Own Data

Langchain provides a number of useful tools for working with external documents, such as the `DirectoryLoader`, which can read multiple files from a directory, and the `UnstructuredMarkdownLoader`, which can process files in Markdown format. In the code below, we take a simpler route: we read each Markdown file in the `data/movies` directory with plain Python and wrap its content in a Langchain `Document` object. The files contain details of movies that were released more recently.

In [4]:
import os
from langchain.schema import Document

data_dir = "data/movies"

documents = []
for file_name in os.listdir(data_dir):
    if file_name.endswith(".md"):
        with open(os.path.join(data_dir, file_name), "r") as file:
            content = file.read()
            documents.append(Document(page_content=content))

print("Manually loaded documents:", documents)

Manually loaded documents: [Document(metadata={}, page_content='# Creed III\n\n## Overview\n\n After dominating the boxing world, Adonis Creed has been thriving in both his career and family life. When a childhood friend and former boxing prodigy, Damian Anderson, resurfaces after serving a long sentence in prison, he is eager to prove that he deserves his shot in the ring. The face-off between former friends is more than just a fight. To settle the score, Adonis must put his future on the line to battle Damian — a fighter who has nothing to lose.\n\n## Details\n\n**Release Date:** 2023-03-01\n\n**Genres:** Drama, Action\n\n**Popularity:** 433.823\n\n**Vote Average:** 7.2\n\n**Keywords:** philadelphia, pennsylvania, husband wife relationship, deaf, sports, sequel, orphan, former best friend, ex-con, childhood friends, juvenile detention center, boxing, prodigy\n\n'), Document(metadata={}, page_content="# Guy Ritchie's The Covenant\n\n## Overview\n\n During the war in Afghanistan, a loc

We now have a `documents` object which contains all of the information from our markdown documents about movies.

We can use the `question_answering` chain to provide the AI with access to our documents and then ask the same question about Deadpool movies again.

In [5]:
# Question answering chain
from langchain.chains.question_answering import load_qa_chain

# Prepare the chain and the query
chain = load_qa_chain(llm)
query = "Tell me about the latest Deadpool movie. When was it released? What is it about?"

result = chain.invoke({'input_documents': documents, 'question': query})

print (result['output_text'])



The latest Deadpool movie, titled "Deadpool & Wolverine," was released on July 24, 2024. In this film, Wade Wilson, also known as Deadpool, finds himself leaving behind his days as a morally flexible mercenary to live a civilian life. However, when an existential threat emerges to his homeworld, he reluctantly suits up again, this time teaming up with Wolverine, who is even more reluctant. The film is known for its humor, action, and the chemistry between Ryan Reynolds (Deadpool) and Hugh Jackman (Wolverine). It's described as a blend of comedy, action, and science fiction, offering a lot of meta-commentary and entertaining fan service.


Great! The model now knows the correct details for the latest Deadpool movie.

However, there's something lurking! Let's take a look at what happened behind the scenes.

We'll do two things here. First, we'll add the `verbose=True` parameter to the chain. Second, we'll wrap the chain execution in a callback, which will allow us to capture the number of tokens consumed.

In [6]:
# Support for callbacks
from langchain.callbacks import get_openai_callback

# Prepare the chain and the query
chain = load_qa_chain(llm, verbose=True)
query = "Tell me about the latest Deadpool movie. When was it released? What is it about?"

# Run the chain, using the callback to capture the number of tokens used
with get_openai_callback() as callback:
    chain.invoke({'input_documents': documents, 'question': query})
    total_tokens = callback.total_tokens

print(f"Total tokens used: {total_tokens}")



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
# Creed III

## Overview

 After dominating the boxing world, Adonis Creed has been thriving in both his career and family life. When a childhood friend and former boxing prodigy, Damian Anderson, resurfaces after serving a long sentence in prison, he is eager to prove that he deserves his shot in the ring. The face-off between former friends is more than just a fight. To settle the score, Adonis must put his future on the line to battle Damian — a fighter who has nothing to lose.

## Details

**Release Date:** 2023-03-01

**Genres:** Drama, Action

**Popularity:** 433.823

**Vote Average:** 7.2

**Keywords:** philadelphia, pennsylvania, husband wife relationship, deaf, s

In the output from the last code section, you should see a lot of information. At the end, you should see a count of the number of tokens used. You might be surprised to see that the query uses anywhere from 2,500 to 6,000 tokens, depending on the model used. That's a lot of tokens!

With the verbose option enabled, the rest of the output shows the prompt that was constructed for the query. If you scroll back through the output, you'll see that the prompt included **all** of the information from our documents, so this is why the query used so many tokens.

As we've discussed previously, AI models have a maximum number of tokens you can use and a charging model based on the number of tokens consumed. In this example, the documents are relatively small and there are only 20 of them, but if we wanted to work with more documents, or larger ones, this method would quickly become expensive and eventually we'd hit the token limit.
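
To get a feel for how quickly a "send everything" prompt grows, the sketch below estimates prompt size from character counts. It is only an approximation: the ratio of 4 characters per token is a rough rule of thumb for English text, not the tokenizer the model actually uses, and `estimate_tokens`, `estimate_prompt_tokens` and the sample documents are illustrative names and values that are not part of this lab.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# This is a rule of thumb, not the tokenizer the model actually uses.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def estimate_prompt_tokens(docs: list[str], question: str) -> int:
    """Estimate the size of a prompt that contains every document plus the question."""
    return sum(estimate_tokens(doc) for doc in docs) + estimate_tokens(question)

# Hypothetical stand-ins for the movie files: 20 documents of 800 characters each.
sample_docs = ["x" * 800 for _ in range(20)]
question = "Tell me about the latest Deadpool movie."

print(estimate_prompt_tokens(sample_docs, question))
```

Doubling the number of documents roughly doubles the estimate, which is why this approach does not scale. For exact figures, rely on the callback shown above rather than on this heuristic.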

## Embeddings

The solution to working with large amounts of external information is to use *embeddings*. OpenAI provides embedding models which analyse human-readable text for meaning and intent. The output from an embedding model is a list of numbers, known as a *vector*, and pieces of text with similar meanings produce vectors that are close together. The vectors are kept in a *vector store*. When you want to ask a question, the same embedding model converts the query text into a vector, which is then used to search the vector store. The stored vectors closest to the query vector are likely to represent text that is relevant to your query.

To prevent overloading a prompt with a large number of tokens, instead of sending all of our documents to the AI, we can perform a vector search first to narrow down to a set of interesting results, and then use that smaller subset of information as part of a prompt.
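
The idea of "similar vectors" can be illustrated with a few lines of plain Python. The sketch below ranks stored items by cosine similarity to a query vector. The three-dimensional vectors and their labels are made up for illustration, cosine similarity is only one common measure (a given vector store may use a different one), and `cosine_similarity` and `search` are hypothetical helpers rather than part of any library used in this lab.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings"; real embedding models return
# vectors with hundreds or thousands of dimensions.
vector_store = {
    "superhero comedy": [0.9, 0.1, 0.0],
    "boxing drama": [0.1, 0.9, 0.1],
    "war thriller": [0.0, 0.2, 0.9],
}

def search(query_vector: list[float], k: int = 1) -> list[str]:
    """Return the k stored items whose vectors are most similar to the query."""
    ranked = sorted(
        vector_store,
        key=lambda name: cosine_similarity(query_vector, vector_store[name]),
        reverse=True,
    )
    return ranked[:k]

# A query vector that points mostly along the first dimension.
query_vector = [0.8, 0.2, 0.1]
print(search(query_vector, k=2))
```

Only the top `k` items would be passed on to the AI model, which is what keeps the prompt small.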

Let's walk through the process of using embeddings to give the AI some details about our movies. We'll start by creating an instance of an embeddings model. You'll notice this is similar to when we initialise one of our model deployments to run a query, but in this case we specify an embedding model. The embedding model typically used has been `text-embedding-ada-002`, but there are newer alternatives available now.

In [7]:
from langchain_openai import AzureOpenAIEmbeddings

embeddings_model = AzureOpenAIEmbeddings(    
    azure_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME"),
    openai_api_version = os.getenv("OPENAI_EMBEDDING_API_VERSION"),
    model= os.getenv("AZURE_OPENAI_EMBEDDING_MODEL")
)

Now that we've initialised a model to create embeddings, let's go ahead and embed some documents.

As we did in the previous example, we'll read the Markdown files from the directory and wrap each one in a `Document` object.

In [9]:
documents = []
for file_name in os.listdir(data_dir):
    if file_name.endswith(".md"):
        with open(os.path.join(data_dir, file_name), "r") as file:
            content = file.read()
            documents.append(Document(page_content=content))

The next step is to use a *splitter*. A splitter enables us to break up larger documents into chunks, so that we don't risk hitting the token limit when submitting our data to the embedding model.

In [10]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
document_chunks = text_splitter.split_documents(documents)

Created a chunk of size 2658, which is longer than the specified 1000
Created a chunk of size 2255, which is longer than the specified 1000
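
The warnings above are expected. By default, `CharacterTextSplitter` splits only on a separator (a blank line, `"\n\n"`) and then merges neighbouring pieces up to `chunk_size`, so a single piece with no blank line inside it stays whole even when it is longer than the limit. The sketch below imitates that behaviour in plain Python. It is a simplification for illustration, not Langchain's actual implementation, and `split_on_separator` is a hypothetical helper.

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Split text on a separator, then greedily merge pieces up to chunk_size characters."""
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep growing the current chunk.
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one.
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

# Two short paragraphs followed by one long paragraph with no blank line inside it.
text = "short one\n\nshort two\n\n" + "x" * 50
for chunk in split_on_separator(text, chunk_size=30):
    print(len(chunk), "over limit" if len(chunk) > 30 else "ok")
```

The long paragraph ends up as a single oversized chunk because there is no separator inside it to split on. The `RecursiveCharacterTextSplitter` used later in this lab tries several separators in turn, which makes oversized chunks less likely.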


The next stage is to convert the chunks of split documents into vectors, which we do by passing the data through an embedding model. The resultant vectors are then stored in a vector database. In this example, we're using the **Qdrant** (pronounced 'quadrant') database. We initialise it using the `location=":memory:"` option, so that the database will be stored in memory rather than persisted to disk.

In [11]:
from langchain.vectorstores import Qdrant

qdrant = Qdrant.from_documents(
    document_chunks,
    embeddings_model,
    location=":memory:",
    collection_name="movies",
)

The above code segment handles the process of initialising the Qdrant database, passing our documents through the embedding model and storing the resulting vectors in the database.

Next, we define a *retriever*. In Langchain, a retriever is an interface that returns results from a vector store. So, we establish a retriever for our Qdrant database.

In [12]:
retriever = qdrant.as_retriever()

Next we define a `RetrievalQA` chain. This handles the process of answering a question by performing the search on the vector store, then taking the results of that search and passing them to our AI model.

In [13]:
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
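
Conceptually, a "stuff" retrieval chain performs two steps: it picks the few most relevant chunks, then pastes ("stuffs") them into a single prompt ahead of the question. The sketch below imitates that flow in plain Python. It is not how `RetrievalQA` is implemented: `retrieve` scores chunks by shared words as a stand-in for vector similarity, the chunks are made-up examples, the `Question:` label is illustrative, and only the system text is taken from the verbose output shown earlier.

```python
# System text copied from the verbose chain output shown earlier in this lab.
SYSTEM_TEXT = (
    "Use the following pieces of context to answer the user's question. \n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "----------------\n"
)

def words(text: str) -> set[str]:
    """Lower-case a text and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question.

    Assumption: word overlap stands in for the vector search a real retriever runs.
    """
    question_words = words(question)
    return sorted(chunks, key=lambda c: len(question_words & words(c)), reverse=True)[:k]

def build_prompt(context_chunks: list[str], question: str) -> str:
    """Stuff the retrieved chunks into a single prompt, followed by the question."""
    # The "Question:" label is illustrative, not the chain's real prompt layout.
    return SYSTEM_TEXT + "\n\n".join(context_chunks) + "\n\nQuestion: " + question

# Hypothetical chunks standing in for the split movie documents.
chunks = [
    "Deadpool teams up with Wolverine in a superhero comedy.",
    "Adonis Creed faces a childhood friend in the boxing ring.",
    "A sergeant relies on his interpreter during the war in Afghanistan.",
]
question = "Which movie features Deadpool and Wolverine?"

top_chunks = retrieve(chunks, question, k=1)
print(build_prompt(top_chunks, question))
```

Because only the retrieved chunks reach the prompt, its size depends on `k` and the chunk size rather than on the total number of documents.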

Now, we'll run our query again. However, we'll make one small change.

You may be thinking that it's not surprising that the AI now knows about the latest Deadpool movie, because we told it about the latest Deadpool movie! So, let's try to show that the AI is actually doing some work here. After all, it is a reasoning engine.

If you're not a fan of these movies, Deadpool originates from Marvel comic books. The collection of movies that originate from Marvel comic books is said to be part of the Marvel Cinematic Universe, sometimes referred to as the MCU. We haven't mentioned Marvel or MCU in the data we've provided, so if we modify the query slightly and ask the AI about the MCU instead of specifically about Deadpool, it should be able to use reasoning to figure out what we mean.

In [14]:
query = "Tell me about the latest MCU movie. When was it released? What is it about?"
result = qa.invoke(query)
print(result['result'])

I don't have information on the exact release date or details of the latest MCU movie as of now. However, based on the reviews provided, a recent MCU movie features Deadpool and Wolverine. It includes a mix of comedy, action, and nostalgic references to Foxfilms. The plot seems to revolve around Deadpool's humorous antics and his dynamic with Wolverine, with a storyline that may appear chaotic but is part of the film's charm. There are also references to comic books and production companies, along with cameos that add to the fun. If you are looking for more specific details, I would recommend checking official sources or recent announcements related to the MCU.


If all went well, the AI should have responded that the latest MCU movie is Deadpool & Wolverine, which was released in July 2024. Results vary depending on which chunks the retriever returns: in the sample output above, the model identified the movie but did not find its release date.

So, we're getting broadly the response we expected, but let's check in on one of the reasons why we've done all of this. Has the number of tokens used been reduced? Let's use the same technique as before and employ a callback to find out.

In [15]:
with get_openai_callback() as callback:
    qa.invoke(query)
    total_tokens = callback.total_tokens

print(f"Total tokens used: {total_tokens}")

Total tokens used: 971


The exact number of tokens used may vary, but it should be clear that this query now uses far fewer tokens than our original query, typically around 2,000 fewer.

AI orchestrators like Langchain and Semantic Kernel can help simplify the process of embedding, vectorisation and search. In the preceding section, we stepped through the process of document splitting, embedding, vectorisation, storing vectors in a database and creating a retriever. In the next section, we load and split our Markdown formatted documents as we did previously, but this time we hand the chunks to a `VectorstoreIndexCreator`. As you can see, it only requires a couple of inputs: the embedding model that we want to use and the source documents. However, behind the scenes, the `VectorstoreIndexCreator` is carrying out the same embedding, storage and retrieval steps we did previously.

In [18]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load documents directly
documents = []
for file_name in os.listdir(data_dir):
    if file_name.endswith(".md"):
        with open(os.path.join(data_dir, file_name), "r") as file:
            content = file.read()
            documents.append(Document(page_content=content))

# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

# Split documents
split_documents = text_splitter.split_documents(documents)

# Create index
index = VectorstoreIndexCreator(
    embedding=embeddings_model
).from_documents(split_documents)

Now, to run a query against our data, we just need to specify the prompt and then call the index we've created above, passing in the model (`llm`) we want to use and the question we want to ask.

In [19]:
query = "Tell me about the latest Deadpool movie. When was it released? What is it about?"
index.query(llm=llm, question=query)

'The latest Deadpool movie, titled "Deadpool & Wolverine," was released on July 24, 2024. The film follows a listless Wade Wilson, who has left his days as the morally flexible mercenary, Deadpool, behind. However, when his homeworld faces an existential threat, Wade must reluctantly suit up again alongside an even more reluctant Wolverine. The movie belongs to the genres of action, comedy, and science fiction, featuring a blend of superhero antics and humorous storytelling.'

You can see this is a really simple way to implement embeddings and vectors as part of an AI application. It's great for getting up and running quickly.

We can use the callback method again to confirm that we're still seeing a reduced number of tokens being consumed.

In [20]:
# Run the chain, using the callback to capture the number of tokens used
with get_openai_callback() as callback:
    index.query(llm=llm, question=query)
    total_tokens = callback.total_tokens

print(f"Total tokens used: {total_tokens}")

Total tokens used: 961


## Next Section

Choose one or more of the following Vector Store and AI Orchestration options:

📣 [Implement Retrieval Augmented Generation with Qdrant as vector store](../03-VectorStore/qdrant.ipynb)

📣 [Implement Retrieval Augmented Generation with Azure CosmosDB as vector store and as semantic cache](../03-VectorStore/mongo.ipynb)

📣 [Implement Retrieval Augmented Generation with Azure AI Search as vector store with semantic ranking](../03-VectorStore/aisearch.ipynb)