# Lab 2: Build a RAG Application with LangChain, Part 1

As in [Lab 1](./1_search.ipynb), you will use the transcripts from the [Boston Azure YouTube channel](https://www.youtube.com/bostonazure) (included in this repo) to create a RAG application with LangChain.

## Learning Objectives

* Load the environment variables needed from the .env file (this assumes you are running this in VS Code)
* Learn how to interact with AzureOpenAI using LangChain (not the AzureOpenAI client like we did in Lab 1)
* Learn the features of LangChain needed to implement a RAG application

## Goals
1. Create functionality to provide a text query
2. Have an in-memory vector store searched for the most similar video transcripts
3. Send a prompt to OpenAI to then get a response to the query

> NOTE:
> 
> There is no web UI to this "application" we are going to build - only this notebook.
>

### Step 1: Load environment variables and set up LangChain to use AzureOpenAI

In [None]:
import os
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI

load_dotenv()

# the deployment name comes from the .env file loaded above
llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_deployment=os.getenv("AZURE_OPENAI_MODEL_DEPLOYMENT_NAME"),
)

The LangChain site has more information on [how to connect to an Azure OpenAI deployment](https://python.langchain.com/docs/integrations/chat/azure_chat_openai/). Some code on that page is a little out of date, so I typically look at the [OpenAI docs](https://python.langchain.com/docs/integrations/chat/openai/) for general usage since they tend to be more up to date.

### Step 2: Use LangChain to interact with the AzureOpenAI chat model

To verify the model is set up correctly, we can make a test call with the `invoke` method.

In [None]:
llm.invoke("What is MIT?")

If you look at the [LangChain ChatOpenAI docs](https://python.langchain.com/docs/integrations/chat/openai/), you'll see we can also follow the more typical chat format, with both a system message and a human message, like the following:

In [None]:
messages = [
    ("system", "You are a helpful assistant that is very brief but polite in your answers. Answer questions in less than 50 words."),
    ("human", "What is MIT?")
]

llm.invoke(messages)

You may have noticed the response from the model is an [AIMessage](https://api.python.langchain.com/en/latest/messages/langchain_core.messages.ai.AIMessage.html#langchain_core.messages.ai.AIMessage). However, we can convert the output to just a string response using the [StrOutputParser](https://python.langchain.com/docs/modules/model_io/concepts/#stroutputparser).

#### Building a chain to format the output from the model

In [None]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

# combine the llm and parser into a chain
chain = llm | parser
chain.invoke("What is MIT?")

To be more specific with the LLM call, we can also pass a system and a human message instead of just the text:

In [None]:
messages = [
    ("system", "You are a helpful assistant that is very brief but polite in your answers. Answer questions in less than 50 words."),
    ("human", "What is MIT?")
]
chain.invoke(messages)

#### Prompt templates

So far we've called the LLM in two ways:
1. Passing a single text message
2. Passing a system and human message

Now let's take a step closer to what we want to do for the RAG pattern later on, which is passing **context** along with our message to the LLM. To do this we'll use the [ChatPromptTemplate](https://python.langchain.com/docs/modules/model_io/prompts/quick_start/#chatprompttemplate).

To start with, let's use just a single message that has placeholders for parameters to be plugged in.

In [None]:
from langchain_core.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt.format(context="Harvard is in Cambridge", question="Where is Harvard?")

As you can see, since we didn't specify how the message should be categorized, it is defaulting to a human message for the string we provided. That should be fine for now.
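
To make the placeholder idea concrete without calling the model, here is a minimal plain-Python sketch. It uses `str.format` rather than LangChain, on the assumption that the named `{context}` and `{question}` fields are filled in much the same way; the `fill_template` helper and the `TOY_TEMPLATE` name are hypothetical and only for illustration.

```python
# Plain-Python illustration of named placeholders; this is not LangChain code.
TOY_TEMPLATE = """
Answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""


def fill_template(template: str, **values: str) -> str:
    """Substitute each {name} placeholder with the matching keyword argument."""
    return template.format(**values)


filled = fill_template(
    TOY_TEMPLATE,
    context="Harvard is in Cambridge",
    question="Where is Harvard?",
)
print(filled)
```

The printed text is what a single message would contain once the two parameters are plugged in.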

Next, we add the prompt as the first item in our chain:

In [None]:
chain = prompt | llm | parser
chain.invoke({
    "context": "Harvard is in Cambridge",
    "question": "Where is Harvard?"
})

What does the chain above do? Sometimes it is easiest to read backwards:
* the parser takes the output from the llm
* the llm takes the output from the prompt
* the invoke passes in the parameters - in this case the parameters needed by the ChatPromptTemplate

The result is the LLM's answer to the question, taking into account the prompt, which instructed the LLM to use the context. This is RAG in its simplest form.
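
If the `|` syntax feels like magic, the following toy sketch may help. It is not how LangChain is implemented; it only shows the general idea of overloading `|` so that each step's output becomes the next step's input. The `Step` class and the `toy_` names are hypothetical stand-ins, and no model is called.

```python
# Toy illustration of "|" composition; this is NOT LangChain's implementation.
from typing import Any, Callable


class Step:
    """Wrap a function so steps can be combined with the | operator."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # The combined step feeds this step's output into the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Hypothetical stand-ins for the prompt, the model, and the parser.
toy_prompt = Step(lambda p: f"Context: {p['context']}\nQuestion: {p['question']}")
toy_llm = Step(lambda text: {"content": f"(model reply to) {text}"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_llm | toy_parser
toy_answer = toy_chain.invoke(
    {"context": "Harvard is in Cambridge", "question": "Where is Harvard?"}
)
print(toy_answer)
```

Reading it backwards matches the bullets above: the toy parser unwraps what the toy model returns, and the toy model receives the text produced by the toy prompt.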

Now let's do the same thing passing a system and human message.

In [None]:

prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", """You are a helpful assistant that is very brief but polite in your answers. Answer questions in less than 50 words.
            Answer the question based on the context below. If you can't 
            answer the question, reply "I don't know".

            Context: {context}
         """),
        ("human", "{question}")
    ],
)
prompt_template.format_messages(context="Harvard is in Cambridge", question="Where is Harvard?")

In [None]:
chain = prompt_template | llm | parser
chain.invoke({
    "context": "Harvard is in Cambridge",
    "question": "Where is Harvard?"
})

Now that we've got an idea of how to make calls to the LLM with LangChain, it's time to look at the transcript file for this lab.

### Step 3: Load the transcript file (and take a look at what is in it)

First set the file we want to use and then use pandas to load it.

In [None]:
DATASET_NAME = "./prep/output/master_transcriptions.json"

import pandas as pd
transcripts_dataset = pd.read_json(DATASET_NAME)

Now take a look at what is in the data frame. Unlike the transcript file we used in Lab 1, this file does not have the embeddings already.

In [None]:
transcripts_dataset

LangChain has many different [document loaders](https://python.langchain.com/docs/integrations/document_loaders/). Since we are already familiar with loading our transcript file using pandas, it makes sense to use the [Pandas DataFrame](https://python.langchain.com/docs/integrations/document_loaders/pandas_dataframe/) loader.

One of the things you can specify when using the DataFrameLoader is the **page_content_column**. In this case we'll use the text column (which is the transcript text). All additional columns will be added as metadata.
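
As a rough picture of that split, here is a plain-Python sketch. It is not the DataFrameLoader source; the `split_record` helper and the example row (including its column names other than `text`) are hypothetical.

```python
# Plain-Python sketch of the content/metadata split; not the DataFrameLoader source.
from typing import Any


def split_record(record: dict[str, Any], page_content_column: str) -> dict[str, Any]:
    """Return the chosen column as page content and every other column as metadata."""
    metadata = {key: value for key, value in record.items() if key != page_content_column}
    return {"page_content": record[page_content_column], "metadata": metadata}


# Hypothetical row; the real transcript file may have different columns.
row = {"title": "Example talk", "speaker": "Example speaker", "text": "Hello Boston Azure"}
toy_doc = split_record(row, page_content_column="text")
print(toy_doc)
```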

You may be wondering why we need a document loader ...

In this case, it is because we are planning to use the [DocArrayInMemorySearch](https://python.langchain.com/docs/integrations/vectorstores/docarray_in_memory/#using-docarrayinmemorysearch) for an in-memory vector store later in this lab.

Load the data frame with DataFrameLoader:

In [None]:
from langchain_community.document_loaders import DataFrameLoader

# load the dataset and specify to use the transcript text column for the page content
loader = DataFrameLoader(transcripts_dataset, page_content_column="text")
transcripts = loader.load()

In [None]:
transcripts

#### Embeddings

In [Lab 1](./1_search.ipynb) we didn't need to get the embeddings because they were already in the transcript file. For this lab we don't have the embeddings **and** we want to see how the vector store can do the work of calling AzureOpenAI to get them for us.

The LangChain documentation on [AzureOpenAIEmbeddings](https://python.langchain.com/docs/integrations/text_embedding/azureopenai/) doesn't quite mention it, but the [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.openai.OpenAIEmbeddings.html#langchain_community.embeddings.openai.OpenAIEmbeddings) API reference does: if you don't pass a deployment parameter, it will use the default 'text-embedding-ada-002', which is what I used.

Set the embeddings variable to use later:

In [None]:
from langchain_openai.embeddings import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings()

Verify the embeddings model is being called by passing it a query:

In [None]:
embedded_query = embeddings.embed_query("Where is MIT?")

# for text-embedding-ada-002 the correct number is 1536
print(f"Embedding length: {len(embedded_query)}")

# verify contents looks normal but shorten to only show 10
print(embedded_query[:10])

Now let's take a look at embeddings and their similarity (this time we use a package rather than calculating it ourselves like we did in Lab 1).

Get the embedding for two sentences:

In [None]:
sentence1 = embeddings.embed_query("MIT is in Cambridge")
sentence2 = embeddings.embed_query("Cambridge is across the river from Boston")

Using `cosine_similarity` from the `sklearn.metrics.pairwise` module, calculate how similar each sentence is to the query we embedded earlier and output the values to see how close in meaning they are:

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

query_sentence1_similarity = cosine_similarity([embedded_query], [sentence1])[0][0]
query_sentence2_similarity = cosine_similarity([embedded_query], [sentence2])[0][0]

query_sentence1_similarity, query_sentence2_similarity
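
For reference, the calculation behind a cosine similarity score can be written out by hand. The sketch below uses tiny made-up vectors rather than real embeddings, and the `cosine` helper is a hypothetical name; it is meant only to show the formula (dot product divided by the product of the vector lengths), not to replace the library call.

```python
# Cosine similarity written out by hand, for comparison with the library call.
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny made-up vectors, not real embeddings.
same_direction = cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 and unrelated (orthogonal) vectors score 0, which is why a higher value suggests closer meaning.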

#### In-Memory Vector Store

As I mentioned at the beginning of this lab, in order to get our RAG pattern implemented we are using an in-memory store here. LangChain offers different options but for this lab we are going to use the [DocArrayInMemorySearch](https://python.langchain.com/docs/integrations/vectorstores/docarray_in_memory/#using-docarrayinmemorysearch).

Before going on with the transcripts example, let's look at a simple example creating a `DocArrayInMemorySearch` from a list of strings. In my example, I've added strings that are all related to the Boston area in some way, but they are not all closely related.

Also, notice that we pass the `embeddings` variable (the `AzureOpenAIEmbeddings` instance we initialized earlier) to the vector store.

Run the following to load the in-memory vector store and have the store get the embeddings for each of the strings:

In [None]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore1 = DocArrayInMemorySearch.from_texts(
    [
        "MIT is in Cambridge",
        "Harvard is in Cambridge",
        "Harvard is a university",
        "Cambridge is across the river from Boston",
        "Beacon Hill is in Boston",
        "Samuel Adams lived in Boston",
    ],
    embedding=embeddings
)

Now let's give it a text query to perform a similarity search on the store for us:

In [None]:
vectorstore1.similarity_search_with_score(query="Where is MIT?", k=4)

The above did a pretty good job at listing the relevant string at the top.
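
To get a feel for what a similarity search over an in-memory store involves, here is a toy sketch. It is not the DocArrayInMemorySearch implementation: `toy_embed` is a hypothetical stand-in that counts words instead of calling an embedding model, and all `toy_` names are made up for illustration.

```python
# Toy in-memory vector search. This is NOT the DocArrayInMemorySearch
# implementation, and toy_embed is a hypothetical stand-in for a real model.
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Fake embedding: counts of lowercase words with '?' removed."""
    return Counter(text.lower().replace("?", "").split())


def toy_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Load" the store: embed each text once and keep the vector next to it.
toy_store = [
    (text, toy_embed(text))
    for text in [
        "MIT is in Cambridge",
        "Harvard is a university",
        "Samuel Adams lived in Boston",
    ]
]


def toy_similarity_search(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Embed the query, score every stored text, and return the top k."""
    query_vector = toy_embed(query)
    scored = [(text, toy_cosine(query_vector, vec)) for text, vec in toy_store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


top_matches = toy_similarity_search("Where is MIT?", k=2)
print(top_matches)
```

Word counts only match on shared words, so this toy has none of the semantic matching that real embeddings provide; it only shows the embed, score, and sort steps.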

#### Using a Retriever

In LangChain, [retrievers](https://python.langchain.com/docs/modules/data_connection/retrievers/) are an abstraction over the actual store of documents, which will help us later. We can get a retriever from the vector store and use a syntax similar to what we did earlier when calling the LLM:

In [None]:
retriever1 = vectorstore1.as_retriever()
retriever1.invoke("Where is MIT?")

At this point we have most of the pieces we need to have a RAG system for this simple vector store of strings, let's go ahead and add the remaining pieces.

The remaining glue is something I honestly am still wrapping my head around and won't try to explain here: `RunnableParallel` and `RunnablePassthrough`. For a more detailed example of both, check out [Formatting inputs & output](https://python.langchain.com/docs/expression_language/primitives/parallel/).

If you're like me, it helps to first see how something is used in order to get an idea of what it does, then figure out the details later.

So, here goes. Run the cell below to create a RunnableParallel that takes two parameters (which will go into that prompt template we created earlier): context and question.

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup = RunnableParallel(context=retriever1, question=RunnablePassthrough())

# ask a question
setup.invoke("Where did Sam Adams live?")

The above looks like everything we need to provide the prompt with what it needs - ***and*** it has the proper values from the similarity comparison from the vector store.
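
The shape of that output (one input, a dictionary of results) can be mimicked in a few lines of plain Python. This is only a toy under stated assumptions, not LangChain's RunnableParallel: the `run_parallel`, `toy_retriever`, and `passthrough` names are hypothetical, and this version simply runs the steps one after another.

```python
# Toy version of "send the same input to several steps and collect a dict".
# This is NOT LangChain's RunnableParallel, just the general shape of it.
from typing import Any, Callable


def run_parallel(**steps: Callable[[Any], Any]) -> Callable[[Any], dict[str, Any]]:
    """Build a function that passes one input to every step and collects a dict."""

    def invoke(value: Any) -> dict[str, Any]:
        return {name: step(value) for name, step in steps.items()}

    return invoke


def toy_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for retriever1: always returns the same snippet.
    return ["Samuel Adams lived in Boston"]


def passthrough(value: Any) -> Any:
    # Plays the role of a pass-through step: hand the input back unchanged.
    return value


toy_setup = run_parallel(context=toy_retriever, question=passthrough)
toy_result = toy_setup("Where did Sam Adams live?")
print(toy_result)
```

The resulting dictionary has exactly the two keys the prompt template expects, which is why it can sit at the front of a chain.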

Next, add it to the chain and see what it does:

In [None]:
chain = setup | prompt | llm | parser
chain.invoke("Where did Sam Adams live?")

Ok, so it responded with a nice answer, but did it just pull it from the context or did the LLM have to reason about what we gave it?

Let's try a more complicated question that isn't in the context:

In [None]:
chain.invoke("Where is Cambridge compared to where Sam Adams lived?")

Ok, that's pretty decent. That's a small RAG example. Now back to the transcript which is a little more complicated and interesting to see working.

Create an in-memory vector store with the transcripts and pass the embeddings variable to have it get the embeddings from AzureOpenAI.

> NOTE: This takes a few seconds (7.5 seconds on my home network)

In [None]:
vectorstore2 = DocArrayInMemorySearch.from_documents(transcripts, embeddings)

Walk through the steps we did earlier just to verify things look good.

First try a similarity search:

In [None]:
vectorstore2.similarity_search_with_score(query="What is LangChain?", k=4)

Now try the retriever:

In [None]:
retriever2 = vectorstore2.as_retriever()
retriever2.invoke("What is LangChain?")

All looks good, so now let's create a chain and try it out.

With this example, let's use the other syntax that allows us to skip the step of creating the RunnableParallel ourselves:

In [None]:
chain = (
    {"context": retriever2, "question": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)

chain.invoke("What is LangChain?")

Depending on the day, you may get a decent answer back ... but you probably won't with the `prompt` template. Try the `prompt_template` instead:

In [None]:
chain = (
    {"context": retriever2, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | parser
)

chain.invoke("What is LangChain?")

That prompt gives me a better answer. Keep that in mind: the prompt is **very important**.

That is the beginning of a RAG application using the YouTube transcripts.

### Reference

This notebook is a modified version of this notebook: [Building a RAG application from scratch](https://github.com/svpino/youtube-rag/blob/main/rag.ipynb). 

There is a YouTube video of the same content: [Building a RAG application from scratch using Python, LangChain, and the OpenAI API](https://www.youtube.com/watch?v=BrsocJb-fAo).