# Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) combines the capabilities of large language models (LLMs) with external data retrieval.

RAG enhances a language model's performance by giving it access to specific data that the pre-trained model's general knowledge lacks.

RAG adds much less value when the augmented data is already common knowledge that the foundation model learned during training.

Step 1: Load the data. The examples here cover CSV, PDF and HTML files.

In [1]:
from langchain_community.document_loaders.csv_loader import CSVLoader
csv_loader = CSVLoader(file_path = 'data/Questions.csv')
docs1 = csv_loader.load()

In [2]:
from langchain_community.document_loaders import PyPDFLoader
pdf_loader = PyPDFLoader(file_path='data/Resume_Checklist.pdf')
docs2 = pdf_loader.load()

In [3]:
from langchain_community.document_loaders import UnstructuredHTMLLoader
html_loader = UnstructuredHTMLLoader(file_path='data/NationalAIStrategy.html', mode='single', strategy='fast')
docs3 = html_loader.load()

Each loader returns a list of documents. You can index into the list and read a document's content (page_content) and its metadata:

In [4]:
first_document = docs3[0]
first_document.metadata
# first_document.page_content

{'source': 'data/NationalAIStrategy.html'}

Step 2: Split the documents into chunks that can be quickly retrieved and integrated into the model prompt. A chunk needs to be useful to the LLM, and larger is not always better, so choose the chunk size parameter wisely. We also need to make sure that we don't lose information at the boundaries between chunks: set the chunk overlap parameter so that each chunk repeats some of the text from its neighbor.
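To make the two parameters concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. The function name chunk_with_overlap and the sample string are illustrative assumptions, and this is not how the LangChain splitters below are implemented (they split on separators rather than at fixed offsets); it only shows what chunk size and chunk overlap mean.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows; neighbors share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
windows = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
# The last 3 characters of each window are repeated at the start of the next one.
print(windows)
```

With an overlap of 0, a sentence that straddles a boundary would be cut in two and neither chunk would contain it whole; the overlap is what gives the retriever a chance to return it intact.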

Different approaches:

In [5]:
from langchain_text_splitters import CharacterTextSplitter
# Splits on paragraph breaks only, so a long paragraph yields a chunk longer than chunk_size (see the warnings below)
text_splitter = CharacterTextSplitter(separator="\n\n", chunk_size=1000, chunk_overlap=10)
chunks = text_splitter.split_text(first_document.page_content)

Created a chunk of size 4255, which is longer than the specified 1000
Created a chunk of size 1161, which is longer than the specified 1000


In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Tries the separators in order: a piece that is still longer than chunk_size is re-split with the next separator, and so on
text_splitter2 = RecursiveCharacterTextSplitter(separators=["\n\n", "\n", " ", ""], chunk_size=500, chunk_overlap=50)
chunks = text_splitter2.split_text(first_document.page_content)
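The fallback behavior described in the comment above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: the name recursive_split is hypothetical, and the sketch leaves out the merging of small pieces back up to the chunk size and the chunk overlap, both of which the real splitter handles.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the first separator; re-split oversized pieces with the remaining ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No separator left to try: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    head, rest = separators[0], separators[1:]
    pieces = []
    for piece in text.split(head):
        # Pieces that already fit are kept; longer ones go to the next separator.
        pieces.extend(recursive_split(piece, rest, chunk_size))
    return pieces


text = "short para\n\nline one is here\nline two"
pieces = recursive_split(text, ["\n\n", "\n", " ", ""], chunk_size=16)
print(pieces)
```

The first paragraph fits as it is, while the second is too long and is split again on single newlines, which is why this approach avoids the oversized chunks reported by the paragraph-only splitter.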

In [7]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
# An example of splitting Document objects rather than raw text; each chunk keeps its source document's metadata
chunks = splitter.split_documents(docs2)

Step 3: Now that we have split the documents into chunks, we need to embed and store them for retrieval.

Embeddings are representations of the text as vectors in a high-dimensional vector space. Texts with similar meaning end up close together in this space.

Vector stores are databases designed to store and search this high-dimensional vector data.

When we receive a user input, it is itself embedded and used to query the database, which returns the most similar documents.

The example here uses the Chroma database.
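To make the query step concrete, here is a toy sketch of similarity search in plain Python. The three-dimensional vectors, the labels and the function names are made up for illustration; real embeddings have hundreds or thousands of dimensions. Cosine similarity is used here as one common measure, and the metric a given vector store actually applies depends on how it is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the texts of the k stored vectors most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical store: (chunk text, embedding) pairs with hand-picked toy vectors.
toy_store = [
    ("resume tips", [0.9, 0.1, 0.0]),
    ("ai strategy", [0.1, 0.9, 0.1]),
    ("interview questions", [0.7, 0.2, 0.3]),
]
query = [1.0, 0.0, 0.1]  # stands in for the embedded user input
nearest = top_k(query, toy_store, k=2)
print(nearest)
```

A vector store performs the same kind of ranking, but with indexing structures that avoid comparing the query against every stored vector.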

In [8]:
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings(model='text-embedding-3-small')

In [10]:
from langchain_chroma import Chroma
# If we had used the split_text method (plain strings), we would use Chroma.from_texts instead
vector_store = Chroma.from_documents(documents=chunks, embedding=embedding_model)

Pulling this together then, we need to define three components:

A retriever: takes the user input and retrieves the relevant document chunks

A prompt: combines the user input and the document chunks

An LLM: generates the answer from the completed prompt

In [11]:
# The arguments specify what sort of search to perform and how many chunks to retrieve per query
retriever = vector_store.as_retriever(search_type = 'similarity', search_kwargs={'k': 2})

In [12]:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("""
Use the following pieces of context to answer the question at the end.
If you don't know the answer, say that you don't know.
Context: {context}
Question: {question}
""")

In [13]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0.2)

We can now define our RAG chain

RunnablePassthrough simply passes its input through unchanged, so the user's question reaches the prompt exactly as it was asked.

In [14]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

chain = (
    {'context': retriever, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
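The first step of the chain sends the same user input to both branches of the dictionary and feeds the results into the prompt placeholders. The plain-Python sketch below imitates that step so the data flow is visible. Everything in it is a stand-in: the Document dataclass, fake_retriever, passthrough and format_docs are hypothetical names, and the retrieved text is invented. The format_docs helper shows one common refinement, joining only the page_content of the retrieved chunks so that metadata does not end up in the prompt; in the chain above it could be applied by piping the retriever into such a function.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for a loaded document chunk, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_retriever(question: str) -> list[Document]:
    """Pretend retrieval: always returns the same two invented chunks."""
    return [
        Document("Leave off hobbies.", {"page": 1}),
        Document("Check for typos.", {"page": 2}),
    ]


def passthrough(value: str) -> str:
    """Returns its input unchanged, like the passthrough branch of the chain."""
    return value


def format_docs(docs: list[Document]) -> str:
    """Join the chunk texts with blank lines, dropping the metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "If you don't know the answer, say that you don't know.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)

branches = {
    "context": lambda q: format_docs(fake_retriever(q)),
    "question": passthrough,
}
question = "What should I leave off my resume?"
# Every branch receives the same input; the results fill the template by key.
inputs = {key: branch(question) for key, branch in branches.items()}
filled = TEMPLATE.format(**inputs)
print(filled)
```

Printing the filled prompt like this is also a quick way to check what the model would actually see.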

Using the chain:

In [15]:
chain.invoke('What is the advice as to what not to include in your resume?')

'The advice regarding what not to include in your resume is to avoid sections that are not directly relevant to a Data Science position. Specifically, you should leave off sections such as hobbies, volunteer experience, and interests, as they can make your resume lengthy and distract from the important information. Additionally, you should minimize abbreviations, remove redundant phrases, avoid too much technical jargon, and ensure there are no typos.'

A note on debugging: use LangSmith.

I was getting an "I don't know" answer and wanted to see the prompt that went into the LLM.

Looking in LangSmith, every step was running, but no content was being provided as context because the chunk size was too small.
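Alongside a trace, a quick local check of chunk lengths can point to the same problem. The helper below is a hypothetical sketch in plain Python, not part of LangChain or LangSmith: the name summarize_chunks and the 200-character threshold are assumptions chosen for illustration, and it expects the chunks as a list of strings.

```python
def summarize_chunks(chunks: list[str], min_chars: int = 200) -> dict:
    """Report how many chunks there are and how many look too short to be useful."""
    lengths = [len(chunk) for chunk in chunks]
    return {
        "count": len(lengths),
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
        "below_min": sum(1 for n in lengths if n < min_chars),
    }


# Invented example: two heading-sized fragments and one paragraph-sized chunk.
report = summarize_chunks(["Skills", "Education", "A" * 300], min_chars=200)
print(report)
```

If most chunks fall below the threshold, the retriever is likely returning fragments such as headings, and raising the chunk size is worth trying before changing anything else.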