<a href="https://colab.research.google.com/github/agnedil/Portfolio/blob/master/Elevated_RAG_with_LangChain_FourthBrain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Elevated RAG with LangChain - FourthBrain

In this notebook, we'll start with a brief introduction to Retrieval Augmented Generation (RAG) using LangChain! After that, we'll focus on two major areas of possible improvement:

1. Different embedding models and reranking
2. Hybrid Retrieval stacks

We'll also be using the LangChain Expression Language to build our solutions.

LCEL is a production-ready style of building and prototyping chains. With automatic async support and built-in parallelization, LCEL gets you ready for production with very little developer-side lift!

To get started, as always, we have to grab some dependencies and decide on some data!

> NOTE: While we're going to be leveraging OpenAI/Cohere's endpoints for this demonstration - you could use any number of closed-source APIs, or open-source self-hosted models as a substitute.

## Dependencies and API Keys

We'll be leveraging OpenAI's `gpt-4-1106-preview` model today, and Cohere's Embed v3 embeddings (`embed-english-light-v3.0`).

So we'll need to install both dependencies, as well as provide an API key for each!

In [None]:
!pip install langchain openai cohere tiktoken -qU

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

Enter your OpenAI API Key:··········


In [None]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Enter your Cohere API Key:")

Enter your Cohere API Key:··········


## Basic Model Access with LangChain

Now that we have our dependencies, we can begin constructing a basic RAG chain!

### Testing Existing LLM Performance on our Domain

Before we jump into RAG, let's see how our LLM does out of the box to see if we even need to do RAG in the first place!

We'll do this by setting up a simple chain that will let us query our LLM.

The domain I've selected today is World of Warcraft lore - it's a fairly niche topic, and might not be something that OpenAI's `gpt-4-1106-preview` is great at!

Let's set up our simple QA chain.

#### Model

We'll be using GPT-4 Preview as discussed - and we'll be setting a few parameters:

- `model` - this allows us to specify our model
- `temperature` - this will let us control how "creative" we want our model to be. Since we'll be using this as a factual retriever, we'll set this to a low value.
- `model_kwargs` -> `seed` - setting the seed makes our outputs more reproducible across sessions (OpenAI describes seeded sampling as best-effort, so small variations are still possible)!
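For intuition about why a low `temperature` makes the model less "creative", here is a minimal sketch of temperature-scaled softmax. The scores in `toy_logits` are made up for illustration, and real models work over vocabularies with many thousands of tokens, so treat this as a picture of the idea rather than of OpenAI's implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [score / temperature for score in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical next-token scores for three candidate tokens.
toy_logits = [2.0, 1.0, 0.5]

low_temp = softmax_with_temperature(toy_logits, temperature=0.1)
high_temp = softmax_with_temperature(toy_logits, temperature=2.0)

print("low temperature :", [round(p, 3) for p in low_temp])
print("high temperature:", [round(p, 3) for p in high_temp])
```

At a low temperature nearly all of the probability lands on the top-scoring token, while a high temperature spreads it out. A temperature of exactly 0 can't be plugged into this formula; it is generally understood as always picking the top-scoring token.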

In [None]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0, model_kwargs={"seed" : 1337})

#### Prompt Template

Since we need to pass in user-defined questions to our RAG chain, we'll want to set up a simple prompt template.

In [None]:
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("{content}")

#### Output Parser

If we look at our LLM - we'll notice that its outputs are Message objects - we can convert the response into a `str` by chaining a `StrOutputParser` at the end.

In the following cells we'll explore how to look at the input and output schemas, and then set up our string output parser.

In [None]:
from langchain.schema import StrOutputParser

str_output_parser = StrOutputParser()

Now we can inspect the inputs and outputs of each component in our chain to make sure they're compatible!

LLM:

In [None]:
print(llm.input_schema.schema())

{'title': 'ChatOpenAIInput', 'anyOf': [{'type': 'string'}, {'$ref': '#/definitions/StringPromptValue'}, {'$ref': '#/definitions/ChatPromptValueConcrete'}, {'type': 'array', 'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'}, {'$ref': '#/definitions/HumanMessage'}, {'$ref': '#/definitions/ChatMessage'}, {'$ref': '#/definitions/SystemMessage'}, {'$ref': '#/definitions/FunctionMessage'}, {'$ref': '#/definitions/ToolMessage'}]}}], 'definitions': {'StringPromptValue': {'title': 'StringPromptValue', 'description': 'String prompt value.', 'type': 'object', 'properties': {'text': {'title': 'Text', 'type': 'string'}, 'type': {'title': 'Type', 'default': 'StringPromptValue', 'enum': ['StringPromptValue'], 'type': 'string'}}, 'required': ['text']}, 'AIMessage': {'title': 'AIMessage', 'description': 'A Message from an AI.', 'type': 'object', 'properties': {'content': {'title': 'Content', 'anyOf': [{'type': 'string'}, {'type': 'array', 'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}

In [None]:
print(llm.output_schema.schema())

{'title': 'ChatOpenAIOutput', 'anyOf': [{'$ref': '#/definitions/AIMessage'}, {'$ref': '#/definitions/HumanMessage'}, {'$ref': '#/definitions/ChatMessage'}, {'$ref': '#/definitions/SystemMessage'}, {'$ref': '#/definitions/FunctionMessage'}, {'$ref': '#/definitions/ToolMessage'}], 'definitions': {'AIMessage': {'title': 'AIMessage', 'description': 'A Message from an AI.', 'type': 'object', 'properties': {'content': {'title': 'Content', 'anyOf': [{'type': 'string'}, {'type': 'array', 'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]}, 'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'}, 'type': {'title': 'Type', 'default': 'ai', 'enum': ['ai'], 'type': 'string'}, 'example': {'title': 'Example', 'default': False, 'type': 'boolean'}}, 'required': ['content']}, 'HumanMessage': {'title': 'HumanMessage', 'description': 'A Message from a human.', 'type': 'object', 'properties': {'content': {'title': 'Content', 'anyOf': [{'type': 'string'}, {'type': 'array', 'items'

StrOutputParser:

In [None]:
print(str_output_parser.input_schema.schema())

{'title': 'StrOutputParserInput', 'anyOf': [{'type': 'string'}, {'$ref': '#/definitions/AIMessage'}, {'$ref': '#/definitions/HumanMessage'}, {'$ref': '#/definitions/ChatMessage'}, {'$ref': '#/definitions/SystemMessage'}, {'$ref': '#/definitions/FunctionMessage'}, {'$ref': '#/definitions/ToolMessage'}], 'definitions': {'AIMessage': {'title': 'AIMessage', 'description': 'A Message from an AI.', 'type': 'object', 'properties': {'content': {'title': 'Content', 'anyOf': [{'type': 'string'}, {'type': 'array', 'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]}, 'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'}, 'type': {'title': 'Type', 'default': 'ai', 'enum': ['ai'], 'type': 'string'}, 'example': {'title': 'Example', 'default': False, 'type': 'boolean'}}, 'required': ['content']}, 'HumanMessage': {'title': 'HumanMessage', 'description': 'A Message from a human.', 'type': 'object', 'properties': {'content': {'title': 'Content', 'anyOf': [{'type': 'string'}, {

In [None]:
print(str_output_parser.output_schema.schema())

{'title': 'StrOutputParserOutput', 'type': 'string'}


As we can see, all of our inputs and outputs line up well - so we're good to go to construct our chain!

#### Basic Chain

Now that we have our components, and we've checked they're compatible, we can build our chain.

With LCEL - building a chain has never been easier!

In [None]:
chain = prompt | llm | str_output_parser

And that's it!
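If the `|` syntax feels a little magical, the sketch below shows one way such pipe-style composition can be built in plain Python. The `Step` class and the three `fake_*` components are hypothetical stand-ins invented for this example; they are not LangChain's classes, and the real runnables do much more (async, batching, streaming).

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new Step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for a prompt, a model, and an output parser.
fake_prompt = Step(lambda inputs: f"Question: {inputs['content']}")
fake_llm = Step(lambda text: {"content": text.upper()})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_prompt | fake_llm | fake_parser
result = toy_chain.invoke({"content": "who is Fyrakk?"})
print(result)
```

Each stage's output becomes the next stage's input, which is why checking the schemas for compatibility first is worthwhile.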

Let's test our chain and see how it does on some common Warcraft lore questions.

In [None]:
chain.invoke({"content" : "In World of Warcraft - who is Fyrakk?"})

"As of my last update in April 2023, Fyrakk is not a widely recognized character in the lore of World of Warcraft. It's possible that Fyrakk could be a minor character, a new addition to the game that was introduced after my knowledge cutoff, or a character from a specific quest or instance that is not central to the main storyline.\n\nWorld of Warcraft is a constantly evolving game with regular updates, expansions, and patches that introduce new content, characters, and storylines. If Fyrakk is a character that was added after April 2023, I would not have information on them.\n\nTo get the most accurate and up-to-date information about Fyrakk, you should check the latest World of Warcraft patch notes, the official forums, or the WoW community resources such as Wowhead or the WoW Wiki. These sources are frequently updated with new information as it becomes available and can provide details on characters, quests, and lore that have been recently added to the game."

Unfortunately, Fyrakk is the current major threat being faced by the Heroes of Azeroth!

We'll need to add some additional data in order to ensure our application is able to answer even the most current questions!

## Retrieval Augmented Generation with LangChain - Simple Implementation

Now that we see how the base solution underperforms - we'll implement a simple RAG chain to boost the performance and allow our application to have an understanding of even the most current lore!

### Data Collection and Parsing

Before we construct a RAG chain, we'll need to source some data that we wish to perform RAG over.

We could use the webpage dump from WoWPedia to achieve this goal - but there are over 500,000 pages of content - so we'll limit ourselves to a small subset of the lore, in this case: Information about the Primal Incarnates that were introduced in the Dragonflight Expansion.

#### Loading Data from Webpages

We will be using the `UnstructuredURLLoader` - powered by [Unstructured](https://unstructured.io/) - to load our documents from their respective web-pages. Keep in mind that, given an `.xml` sitemap, we could download every page in an automated fashion - but we're limiting the source data for this demo specifically.

In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
!pip install unstructured -qU


In [None]:
from langchain.document_loaders import UnstructuredURLLoader

URL_LIST = [
    "https://wowpedia.fandom.com/wiki/Primal_Incarnates",
    "https://wowpedia.fandom.com/wiki/Raszageth",
    "https://wowpedia.fandom.com/wiki/Iridikron",
    "https://wowpedia.fandom.com/wiki/Vyranoth",
    "https://wowpedia.fandom.com/wiki/Fyrakk"
]

In [None]:
loader = UnstructuredURLLoader(urls=URL_LIST)

In [None]:
data = loader.load()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
len(data)

5

#### Splitting/Chunking

Now that we have our 5 web-pages - we want to split them into smaller bite-sized pieces to be used in our Retrieval Pipeline.

We'll use the naive solution of the `RecursiveCharacterTextSplitter` first, which will simply split our documents recursively by a set of predefined characters.

This is a great strategy for simple text documents and more - but can be easily upgraded to a more performant solution.
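To make "split recursively by a set of predefined characters" concrete, here is a simplified sketch of the idea. It is an assumption-laden illustration and not LangChain's implementation: it measures length in characters (the cell below measures length in tokens via `tiktoken`), it ignores overlap, and the separator list is just a plausible default.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters.

    Tries the coarsest separator first and only falls back to finer
    separators for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    separator, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Fyrakk is a Primal Incarnate.\n\n"
    "He wields fire and Shadowflame.\n\n"
    "Raszageth freed him."
)
sample_chunks = recursive_split(sample, chunk_size=35)
for chunk in sample_chunks:
    print(repr(chunk))
```

Because paragraphs are tried before lines, and lines before words, chunks tend to break at natural boundaries whenever the size limit allows it.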

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size = 100,
    chunk_overlap = 0,
    model_name = "gpt-4-1106-preview"
)

In [None]:
split_documents = text_splitter.split_documents(data)

In [None]:
len(split_documents)

139

Now we've created a set of 139 documents from our original 5 web-pages - and we can use those, in combination with a VectorStore, to retrieve appropriate context for our questions!

#### Embeddings Model

Now that we've chunked our documents, we'll need to vectorize them and move them into a VectorStore - a place that will associate Vectors with Text Chunks.

We'll be using OpenAI's `text-embedding-ada-002` to begin with, which is a fine starting point if you're looking for a solution that works well out of the box at a relatively low cost.

In [None]:
from langchain.embeddings import OpenAIEmbeddings

embeddings_oai = OpenAIEmbeddings()

#### VectorStore - FAISS

We'll be using the simple FAISS VectorStore today - though you could substitute this for any VectorStore or Vector Database that you prefer!

In [None]:
!pip install faiss-cpu -qU

In [None]:
from langchain.vectorstores import FAISS

vector_store = FAISS.from_documents(split_documents, embeddings_oai)

#### Retriever

Now that we have a VectorStore - we'll need to convert it to a retriever. Luckily, this is a straightforward process with LangChain!

In [None]:
retriever = vector_store.as_retriever()
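For a feel of what a VectorStore-backed retriever does under the hood, here is a toy in-memory version. Everything in it is hypothetical: `toy_embed` is a bag-of-words counter rather than a learned embedding model, and `ToyVectorStore` does a brute-force scan where FAISS would use an optimized index. The shape of the idea is the same, though: embed the chunks once, embed the query, and return the nearest chunks.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical embedding: word counts (NOT a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(vec_a[token] * vec_b[token] for token in vec_a)
    norm_a = math.sqrt(sum(value * value for value in vec_a.values()))
    norm_b = math.sqrt(sum(value * value for value in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Associates each text chunk with its vector and supports top-k search."""

    def __init__(self, texts):
        self.entries = [(toy_embed(text), text) for text in texts]

    def search(self, query, k=2):
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


toy_store = ToyVectorStore([
    "fyrakk commands fire and shadowflame",
    "raszageth commands storms",
    "iridikron commands stone",
])
top_hit = toy_store.search("who commands fire", k=1)
print(top_hit)
```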

### Setting Up Prompt Template

Now that we have a few of our base components:

1. LLM
2. Retriever

We need to create a PromptTemplate that will let us instruct our LLM on how to use what we provide to it!

In [None]:
RAG_PROMPT_TEMPLATE = """\
Use the following context to answer the user's questions. If you don't know the answer, please respond with 'I don't know'.

Context:
{context}

Question:
{question}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT_TEMPLATE)
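To see what the LLM will actually receive, here is a plain-Python sketch of filling the two slots. It reuses the template wording from the cell above, but `build_prompt` and `toy_chunks` are hypothetical helpers for illustration. Joining chunk text with blank lines is a simplifying assumption; the chain in this notebook hands the retriever's output to the template as-is.

```python
# Same wording as RAG_PROMPT_TEMPLATE in the cell above.
TOY_TEMPLATE = """\
Use the following context to answer the user's questions. If you don't know the answer, please respond with 'I don't know'.

Context:
{context}

Question:
{question}
"""


def build_prompt(chunks, question):
    """Join retrieved chunk texts and fill both template slots."""
    context = "\n\n".join(chunks)
    return TOY_TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved chunks, for illustration only.
toy_chunks = [
    "Fyrakk is a Primal Incarnate.",
    "Fyrakk wields fire and Shadowflame.",
]
filled_prompt = build_prompt(toy_chunks, "Who is Fyrakk?")
print(filled_prompt)
```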

### Setting up RAG chain

With that, we finally have everything we need!

We have:

1. Our Retriever (FAISS-backed VectorStore Retriever with WoWPedia web-pages)
2. Our Augmentor (PromptTemplate with context and question format options)
3. Our Generator (`gpt-4-1106-preview`)

Let's throw them into a chain!

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

entry_point_and_retriever = RunnableParallel(
    {
        "context" : retriever,
        "question" : RunnablePassthrough()
    }
)

rag_chain = entry_point_and_retriever | rag_prompt | llm | str_output_parser
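The first stage of this chain takes a single question and fans it out into the two keys the prompt expects. A rough plain-Python picture of that step is sketched below. `run_parallel`, `toy_retriever`, and `passthrough` are hypothetical names for this example, and the sketch runs the branches one after another, whereas `RunnableParallel` is designed to run them concurrently.

```python
def run_parallel(branches, value):
    """Feed the same input to every branch and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def toy_retriever(question):
    # Hypothetical retriever: always returns the same single chunk.
    return ["Fyrakk is a Primal Incarnate."]


def passthrough(value):
    # Mirrors the role of RunnablePassthrough: hand the input along unchanged.
    return value


prompt_inputs = run_parallel(
    {"context": toy_retriever, "question": passthrough},
    "In World of Warcraft - who is Fyrakk?",
)
print(prompt_inputs)
```

The resulting dictionary has exactly the `context` and `question` keys that the prompt template needs.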

In [None]:
rag_chain.invoke("In World of Warcraft - who is Fyrakk?")

'Fyrakk is a proto-dragon with the title <The Blazing>. He is male and belongs to the dragonkin race. Fyrakk is a boss-level character with an unknown level, and his resource is mana. He is affiliated with the Primal Incarnates and the Primalists. His location is the Vault of the Incarnates, and according to the lore, he is deceased but still killable in the game. Fyrakk is known for his mastery over fire and Shadowflame, and he was ultimately defeated before he could corrupt the core of Amirdrassil, a World Tree. He is also voiced by Matthew Mercer in the game.'

There we go! Now we've added current data to our application - but we can still do better!

Let's take this RAG application to the next level!

## Elevated RAG with LangChain

Now it's time to apply a few basic patterns that will substantially improve your RAG application.

We'll do 2 major things:

1. Combine our dense vector search retrieval with sparse search to provide a better range of context.
2. Add a Reranker to our retrieved context to ensure we have the most relevant information.

### Hybrid Retrieval

We'll be using a strategy called "hybrid retrieval" to improve our Retrieval Augmented Generation pipeline today.

The basic idea is as follows:

1. Dense Vector Search is very good at retrieving semantically related context.
2. Sparse Search is great at retrieving context based on keywords.

Dense Vector Search can over-index on semantic relatedness due to noise within the user's query - and sparse search can miss obvious connections because of different keywords.

By their powers combined - we can build a better system!

> NOTE: In preparation for the next step, each of our two retrievers will return 5 documents, giving the reranker up to 10 candidates.

In [None]:
!pip install rank_bm25 -qU

In [None]:
from langchain.retrievers import BM25Retriever

In [None]:
bm25_retriever = BM25Retriever.from_documents(split_documents)
bm25_retriever.k = 5
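BM25 is the sparse, keyword-based half of our hybrid setup. The sketch below scores documents with one common textbook variant of the BM25 formula. The whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the exact IDF form are assumptions chosen for this example; the `rank_bm25` package may differ in its details.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 variant."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # a keyword that is absent contributes nothing
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates, and long documents are penalized.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


toy_docs = [
    "fyrakk the blazing wields shadowflame",
    "raszageth the storm-eater commands lightning",
    "iridikron the stonescaled shapes the earth",
]
keyword_scores = bm25_scores("shadowflame", toy_docs)
print(keyword_scores)
```

Only the document containing the exact keyword scores above zero, which is both the strength of sparse search (precise keyword hits) and its weakness (no credit for synonyms).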

In [None]:
from langchain.embeddings import CohereEmbeddings

cohere_embeddings = CohereEmbeddings(model="embed-english-light-v3.0")

faiss_vectorstore = FAISS.from_documents(split_documents, cohere_embeddings)
faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 5})

In [None]:
from langchain.retrievers import EnsembleRetriever

ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever], weights=[0.75, 0.25]
)
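LangChain's documentation describes the `EnsembleRetriever` as combining its retrievers' results with Reciprocal Rank Fusion. The sketch below is a simplified, weighted version of that idea, not the library's code: the function name, the constant `c=60`, and the letter "documents" are assumptions for illustration.

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists with weighted Reciprocal Rank Fusion."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            # Higher-ranked documents (small rank) earn a larger share.
            scores[doc] = scores.get(doc, 0.0) + weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a sparse and a dense retriever.
sparse_ranking = ["a", "b", "c"]
dense_ranking = ["c", "a", "d"]

fused = weighted_rrf([sparse_ranking, dense_ranking], weights=[0.75, 0.25])
print(fused)
```

Documents that appear in both lists accumulate score from each, so agreement between the sparse and dense retrievers pushes a document toward the top, while the weights tilt the outcome toward the retriever we trust more.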

Great! Now we can move on to the next step, which will let us leverage our newly created retriever with a reranker!

### Reranking with LangChain

For the most part, you can think of reranking as follows:

1. Retrieve a large number of potential documents using a computationally efficient method.
2. Rerank the retrieved contexts using a computationally more expensive method with greater effectiveness.
3. Provide the top reranked documents (the number is set by `top_n` on the reranker) as context to the LLM.

Essentially, this lets us grab a large pool of potentially relevant documents and then rerank them with a method that is more accurate but also more expensive. Because the reranker only runs over a small subset of our total documents, we retain the overall speed of our application.
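The retrieve-then-rerank pattern can be sketched in a few lines of plain Python. Both scoring functions below are hypothetical stand-ins: `keyword_overlap` plays the role of the cheap first-stage retriever and `toy_expensive_score` plays the role of a reranking model. The point of the example is the call count, which shows the expensive scorer only ever sees the small candidate pool.

```python
def retrieve_then_rerank(query, documents, cheap_score, expensive_score, pool_size, top_k):
    """Two-stage retrieval: a cheap pass over everything, a costly pass over a few."""
    # Stage 1: score every document with the cheap method, keep the best pool.
    pool = sorted(documents, key=lambda doc: cheap_score(query, doc), reverse=True)[:pool_size]
    # Stage 2: score only the pool with the expensive method.
    reranked = sorted(pool, key=lambda doc: expensive_score(query, doc), reverse=True)
    return reranked[:top_k]


def keyword_overlap(query, doc):
    # Hypothetical cheap scorer: number of words shared with the query.
    return len(set(query.lower().split()) & set(doc.lower().split()))


expensive_calls = []


def toy_expensive_score(query, doc):
    # Hypothetical "expensive" scorer: fraction of the document's words found in the query.
    expensive_calls.append(doc)
    words = doc.lower().split()
    query_words = set(query.lower().split())
    return sum(word in query_words for word in words) / len(words)


toy_corpus = [
    "fyrakk is a primal incarnate of fire",
    "fyrakk",
    "raszageth is a primal incarnate of storms",
    "iridikron shapes stone",
]
top_docs = retrieve_then_rerank(
    "who is fyrakk", toy_corpus, keyword_overlap, toy_expensive_score, pool_size=3, top_k=2
)
print(top_docs)
print("expensive scorer calls:", len(expensive_calls))
```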

Thanks to LangChain and Cohere - this is a simple process to implement!

In [None]:
from langchain.retrievers.document_compressors import CohereRerank

reranker = CohereRerank(top_n=5)

In [None]:
from langchain.retrievers import ContextualCompressionRetriever

rerank_retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=ensemble_retriever
)

Perfect! Now we have a reranker backed by a hybrid retriever!

Let's see how this does in a new chain!

### Elevated RAG Chain

We'll use exactly the same process to build our chain as we did before!

In [None]:
entry_point_and_elevated_retriever = RunnableParallel(
    {
        "context" : rerank_retriever,
        "question" : RunnablePassthrough()
    }
)

elevated_rag_chain = entry_point_and_elevated_retriever | rag_prompt | llm | str_output_parser

In [None]:
elevated_rag_chain.invoke("In World of Warcraft - who is Fyrakk?")

"Fyrakk, also known as Fyrakk the Blazing, is a character in World of Warcraft. He was imprisoned within the Vault of the Incarnates in Thaldraszus until he was released by Raszageth. Fyrakk is a Primal Incarnate who was given a powerful new weapon, the axe Fyr'alath, the Dream Render, which allowed him and his forces to invade the Emerald Dream. He led a siege against the World Tree but faced obstacles such as the Temple that barred his path to the tree's core. Despite his efforts and the use of his mastery over fire and Shadowflame, Fyrakk was ultimately defeated by Azeroth's heroes and leaders at the heart of Amirdrassil, before he could corrupt the World Tree's core. His status is deceased."