# Comprehensive Tutorial on Building a RAG Application Using LangChain

## Introduction

Even though the amount of information today's LLMs have access to keeps growing, vast amounts of private data are still not included in their training. For this reason, one of the most popular applications of LLMs in enterprise settings is retrieval-augmented generation (RAG). In a RAG system, you retrieve information from a private data source and feed it to a language model so that it can generate contextually relevant responses.

In this tutorial, you will learn how to develop RAG applications using LangChain, a massively popular framework. I'll guide you through the process step by step, from setting up your environment to implementing the core components of a RAG system. By the end of this tutorial, you'll have created your own chatbot capable of answering questions using any outside source, opening up a world of possibilities for leveraging private data in your AI applications.

Let's get started!

## What is RAG?

To clarify what RAG is, let's consider a simple example.

A first-year college student, Chandler, is considering skipping a few classes but wants to make sure he isn't violating the university's attendance policy. As with anything these days, he asks ChatGPT.

Of course, ChatGPT can't answer it. The chatbot isn't dumb - it just doesn't have access to Chandler's university documents. So, Chandler finds the policy document himself and discovers that it is a long, technical read he doesn't want to wade through.

Instead, he gives the entire document to ChatGPT and asks the question again. This time, he gets his answer. 

This is an individual case of retrieval-augmented generation: the language model's answer (generation) is augmented (enriched) by context retrieved from a source that was not part of its training data.

A scalable version of a RAG system would be able to answer any student question by searching university documents itself, finding the relevant ones and retrieving chunks of text that most likely contain the answer. 
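Stripped to its essentials, such a system does two things: it retrieves the most relevant chunks and then pastes them into the prompt. Here is a deliberately tiny sketch of that loop. The policy sentences are made up for Chandler's example, and the word-overlap scoring in the hypothetical `retrieve()` function is only a stand-in for the embedding-based search this tutorial builds up to.

```python
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank chunks by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up snippets standing in for Chandler's university policy documents
policy_chunks = [
    "Students may miss up to three lectures per course without penalty.",
    "The library closes at midnight during exam weeks.",
    "Missing more than three lectures requires a signed medical note.",
]

question = "how many lectures can students miss"
prompt = build_prompt(question, retrieve(question, policy_chunks))
print(prompt)  # this prompt would then be sent to the language model (not shown)
```

The library sentence shares no words with the question, so it never reaches the model. That is exactly the filtering a real retriever performs at much larger scale.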

## Components of a RAG Application

Such a system, despite sounding straightforward, has many moving parts. Before building one ourselves, let's review what they are and how they fit together.

### Documents

The first component is a document or a collection of documents. Depending on the type of RAG system we are building, the documents can be text files, PDFs, and web pages (RAG over unstructured data) or graph, SQL, and NoSQL databases (RAG over structured data). They are the raw knowledge that the system ingests and later retrieves from.

### Document loaders

LangChain implements hundreds of classes called _document loaders_ to read data from various document sources such as PDFs, Slack, Notion, Google Drive, and so on. 

Each document loader class is tailored to its source, but they all share the same `.load()` method, which returns a list of `Document` objects. For example, here is how you can load a PDF document and a web page in LangChain:

```python
# pip install langchain-community pypdf beautifulsoup4
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader

pdf_loader = PyPDFLoader("framework_docs.pdf")
web_loader = WebBaseLoader(
    "https://python.langchain.com/v0.2/docs/concepts/#document-loaders"
)

pdf_docs = pdf_loader.load()  # one Document per PDF page
web_docs = web_loader.load()  # one Document for the whole web page
```

The `PyPDFLoader` class parses PDF files with the `pypdf` package under the hood, while `WebBaseLoader` fetches the given web page and extracts its text.

`pdf_docs` contains four document objects, one for each page:

```python
len(pdf_docs)
```

```
4
```

`web_docs`, by contrast, contains only one document holding the text of the whole page. Here is a slice of it:

```python
print(web_docs[0].page_content[125:300].strip())
```

```
You can view the v0.1 docs here.IntegrationsAPI referenceLatestLegacyMorePeopleContributingCookbooks3rd party tutorialsYouTubearXivv0.2v0.2v0.1🦜️🔗LangSmithLangSmith DocsLangCh
```


Later, the text of these document objects is passed to embedding models, which capture its semantic meaning.
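To make the shared `.load()` contract concrete, here is a toy, pure-Python imitation of a loader. The `Doc` dataclass and `PagedTextLoader` class are hypothetical stand-ins, not LangChain code. They only mirror the idea that a loader returns a list of objects carrying `page_content` and `metadata`, one per page, similar to how `pdf_docs` holds one document per PDF page. The sketch assumes pages are separated by form-feed characters.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class PagedTextLoader:
    """Toy loader: returns one Doc per page, assuming pages are split by form feeds."""

    def __init__(self, text: str, source: str):
        self.text = text
        self.source = source

    def load(self) -> list[Doc]:
        pages = self.text.split("\f")
        return [
            Doc(page_content=page, metadata={"source": self.source, "page": i})
            for i, page in enumerate(pages)
        ]


docs = PagedTextLoader("Page one.\fPage two.\fPage three.", source="notes.txt").load()
print(len(docs))  # 3
print(docs[1])
```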

For specifics on other types of document loaders, LangChain offers a [dedicated how-to page](https://python.langchain.com/v0.2/docs/how_to/#document-loaders).

### Text splitters

Once you have loaded your documents, it is crucial to break them down into smaller and more manageable chunks of text. Here are the main reasons:

1. Many embedding models (more on them later) have a maximum token limit.
2. Retrieval is more accurate when you have smaller chunks.
3. The language model receives only the relevant passages instead of whole documents, which keeps prompts short and within its context window.

LangChain offers many types of text splitters under its `langchain_text_splitters` package and they differ based on document type. 

Here is how to use `RecursiveCharacterTextSplitter` to split plain text based on a list of separators and chunk size:

```bash
pip install langchain-text-splitters
```

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Example text
text = """
RAG systems combine the power of large language models with external knowledge sources.
This allows them to provide up-to-date and context-specific information.
The process involves several steps including document loading, text splitting, and embedding.
"""

# Create a text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    length_function=len,
    separators=["\n\n", "\n", " ", ""],
)

# Split the text
chunks = text_splitter.split_text(text)

# Print the chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i + 1}: {chunk}")
```

```
Chunk 1: RAG systems combine the power of large language
Chunk 2: language models with external knowledge sources.
Chunk 3: This allows them to provide up-to-date and
Chunk 4: and context-specific information.
Chunk 5: The process involves several steps including
Chunk 6: including document loading, text splitting, and
Chunk 7: and embedding.
```


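This splitter is versatile and works well for many use cases. It tries the separators in order and recursively falls back to the next one whenever a piece is still longer than `chunk_size`, so each chunk stays within `chunk_size` characters while larger units (paragraphs, lines, words) are kept intact where possible. Consecutive chunks also share up to `chunk_overlap` characters, which is why words such as "language" and "and" appear at the end of one chunk and the start of the next.

In the above example, our splitter first tries to split on paragraph breaks (`"\n\n"`), then on single newlines, then on spaces, and finally between individual characters.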
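The fallback logic can be sketched in plain Python. This is a simplified illustration of the recursive idea under stated assumptions, not LangChain's actual implementation: the hypothetical `recursive_split()` function ignores `chunk_overlap` and merges pieces greedily.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Simplified recursive character splitting (no overlap).

    Splits on the first separator, greedily merges pieces up to chunk_size,
    and recurses with the remaining separators on any piece still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        return [text]  # nothing left to split on
    sep, rest = separators[0], separators[1:]
    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging while the chunk still fits
            continue
        if current:
            chunks.append(current)  # flush the chunk we were building
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        elif rest:
            chunks.extend(recursive_split(piece, chunk_size, rest))  # try finer separators
        else:
            chunks.append(piece)  # no separators left: keep the oversized piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb ccc\n\nddd eee", chunk_size=8, separators=["\n\n", "\n", " ", ""]))
# ['aaa bbb', 'ccc', 'ddd eee']
```

The first paragraph is too long, so it falls through to the space separator. The second paragraph fits and stays whole. The real splitter additionally carries up to `chunk_overlap` characters from one chunk into the next.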

There are many other splitters in the `langchain_text_splitters` package. Here are some:
- `HTMLSectionSplitter`
- `PythonCodeTextSplitter`
- `RecursiveJsonSplitter`

and so on. Some of the splitters create semantically meaningful chunks by using a transformer model under the hood. 

The right text splitter has a significant impact on the performance of a RAG system.

For specifics on how to use text splitters, see the relevant [how-to guides here](https://python.langchain.com/v0.2/docs/how_to/#text-splitters).

### Embedding models

Once documents are split into chunks, the text needs to be encoded into a numeric representation, because models can only perform computations on numbers, not raw text.

In the context of RAG, this encoding is called _embedding_ and is done by _embedding models_. They turn a piece of text into a vector that captures its semantic meaning.

Representing text this way lets you perform mathematical operations on it, such as searching your document database for the chunks most similar in meaning to a user query.
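A common way to measure how close two embeddings are is cosine similarity, the cosine of the angle between the vectors. Here is a minimal sketch with made-up three-dimensional vectors. Real embedding vectors have hundreds or thousands of dimensions, and the texts and numbers here are purely illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration only
chunk_vectors = {
    "RAG retrieves documents": [0.9, 0.1, 0.0],
    "Cats sleep a lot": [0.0, 0.2, 0.9],
    "Retrieval improves LLM answers": [0.8, 0.3, 0.1],
}
query_vector = [0.9, 0.15, 0.0]  # pretend embedding of "What is RAG?"

# Rank chunks from most to least similar to the query
ranked = sorted(
    chunk_vectors,
    key=lambda text: cosine_similarity(query_vector, chunk_vectors[text]),
    reverse=True,
)
print(ranked)
# ['RAG retrieves documents', 'Retrieval improves LLM answers', 'Cats sleep a lot']
```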

LangChain supports all major embedding model providers, such as OpenAI, Cohere, and Hugging Face. Their classes implement the common `Embeddings` interface, which provides two methods: `embed_documents()` for embedding a list of document chunks and `embed_query()` for embedding a single query (prompt).
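To see the shape of that two-method interface without calling a paid API, here is a hypothetical toy class that uses the same method names. It hashes words into a fixed-size bag-of-words vector, so it reflects word overlap rather than real semantic meaning. It is a stand-in for illustration only, not a LangChain class.

```python
import hashlib
import math


class ToyHashEmbeddings:
    """Hypothetical stand-in exposing embed_documents() and embed_query().

    Each word is hashed into one of `dim` buckets and the counts are
    L2-normalized. This captures word overlap, not true semantics.
    """

    def __init__(self, dim: int = 16):
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
        return [v / norm for v in vec]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        """Embed many chunks at once."""
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        """Embed a single user query."""
        return self._embed(text)


toy_embeddings = ToyHashEmbeddings(dim=16)
vectors = toy_embeddings.embed_documents(["RAG systems combine", "external knowledge sources"])
print(len(vectors), len(vectors[0]))  # 2 16
```

Keeping documents and queries behind separate methods lets a provider treat them differently, even though this toy version embeds both the same way.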

Here is example code that embeds the chunks of text we created in the previous section using OpenAI. It requires the `langchain-openai` package and an `OPENAI_API_KEY` environment variable:

```python
from langchain_openai import OpenAIEmbeddings

# Initialize the OpenAI embeddings
embeddings = OpenAIEmbeddings()

# Embed the chunks
embedded_chunks = embeddings.embed_documents(chunks)

# Print the length of the first embedding and a few of its values
print(f"Shape of the first embedded chunk: {len(embedded_chunks[0])}")
print(f"First few values of the first embedded chunk: {embedded_chunks[0][:5]}")
```

```
Shape of the first embedded chunk: 1536
First few values of the first embedded chunk: [-0.020282309502363205, -0.0015041005099192262, 0.004193042870610952, 0.00229285703971982, 0.007068077567964792]
```


The output above shows that the embedding model produces a 1536-dimensional vector for each chunk.

To embed a single query, you can use the `embed_query()` method:

```python
query = "What is RAG?"
query_embedding = embeddings.embed_query(query)
print(f"Shape of the query embedding: {len(query_embedding)}")
print(f"First few values of the query embedding: {query_embedding[:5]}")
```

```
Shape of the query embedding: 1536
First few values of the query embedding: [-0.012426204979419708, -0.016619959846138954, 0.007880032062530518, -0.0170428603887558, 0.011404196731746197]
```


### Vector stores

In large-scale RAG applications, where you may have gigabytes of documents, you will end up with an enormous number of text chunks and, therefore, vectors. They are of little use unless you can store them reliably and search them quickly.

This is why _vector stores or databases_ are all the rage now. Apart from storing your embeddings, vector databases take care of performing vector search for you. These databases are optimized to quickly find the most similar vectors when given a query vector, which is essential for retrieving relevant information in RAG systems.
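The core job of a vector store can be sketched as brute-force search: embed the query, compare it with every stored vector, and return the top `k`. The `ToyVectorStore` class and its keyword-count `keyword_embed()` function are hypothetical and only illustrate the idea. Production vector databases typically rely on approximate nearest-neighbor indexes so that they don't have to scan every vector.

```python
import math


class ToyVectorStore:
    """Hypothetical in-memory store with brute-force cosine search."""

    def __init__(self, embed):
        self.embed = embed  # any function mapping str -> list[float]
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.texts.append(text)
            self.vectors.append(self.embed(text))

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = self.embed(query)
        q_norm = math.sqrt(sum(x * x for x in q))

        def score(v: list[float]) -> float:
            dot = sum(a * b for a, b in zip(q, v))
            denom = q_norm * math.sqrt(sum(b * b for b in v))
            return dot / denom if denom else 0.0  # zero vectors get no similarity

        order = sorted(range(len(self.texts)), key=lambda i: score(self.vectors[i]), reverse=True)
        return [self.texts[i] for i in order[:k]]


VOCAB = ["rag", "index", "retrieval", "cat"]  # hypothetical tiny vocabulary


def keyword_embed(text: str) -> list[float]:
    """Toy embedding: count occurrences of each vocabulary word."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


store = ToyVectorStore(keyword_embed)
store.add_texts([
    "Indexing loads and splits documents for RAG.",
    "Cats like to sleep.",
    "Retrieval finds relevant chunks at query time.",
])
print(store.similarity_search("What is indexing in RAG?", k=1))
# ['Indexing loads and splits documents for RAG.']
```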

Here is a snippet of code that embeds the contents of a web page and stores the vectors into a Chroma vector database (Chroma is an open-source vector database solution that runs entirely on your machine):

```bash
pip install chromadb langchain-chroma
```

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the web page
loader = WebBaseLoader("https://python.langchain.com/v0.2/docs/tutorials/rag/")
docs = loader.load()

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(docs)
```

First, we load the page with `WebBaseLoader` and split it into chunks. Then, we can pass the chunks directly to the `from_documents` method of `Chroma` along with our embedding model of choice:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Embed every chunk with OpenAI and store the vectors in Chroma
db = Chroma.from_documents(chunks, OpenAIEmbeddings())
```

All vector store classes in LangChain expose a `similarity_search` method that accepts a query string and returns the `k` most similar documents (4 by default):

```python
query = "What is indexing in the context of RAG?"
docs = db.similarity_search(query)

# Print the second-best match
print(docs[1].page_content)
```

```
data. If you are interested for RAG over structured data, check out our tutorial on doing question/answering over SQL data.ConceptsA typical RAG application has two main components:Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.The most common full sequence from raw data to answer looks like:IndexingLoad: First we need to load our data. This is done with Document Loaders.Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.Retrieval and
```


The result of `similarity_search` is a list of documents, ordered by similarity, that most likely contain the information the query asks about.

For specifics on how to use vector stores, see the relevant [how-to guides here](https://python.langchain.com/v0.2/docs/how_to/#vector-stores).

### Retrievers


## LangChain Basics

## Step-by-Step Workflow to Building a RAG App in LangChain

## Conclusion

## Code