# Chapter 3 - RAG Part II: Chatting with Your Data

## Introducing Retrieval-Augmented Generation (RAG)

### Retrieving Relevant Documents

A RAG system for an AI app typically follows three core stages:
* **Indexing**: This stage involves preprocessing the external data source and storing embeddings that represent the data in a vector store where they can be easily retrieved.
* **Retrieval**: This stage involves retrieving the relevant embeddings and data stored in the vector store based on a user’s query.
* **Generation**: This stage involves synthesizing the original prompt with the retrieved relevant documents as one final prompt sent to the model for a prediction.

Let’s run through an example from scratch again, starting with the indexing stage:

**NOTE**: Do not forget to launch a new pgvector docker container before using this notebook. Execute ```docker compose up -d``` in the terminal.

In [2]:
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_postgres.vectorstores import PGVector
from dotenv import load_dotenv
import os

load_dotenv()

# load the document, split it into chunks
raw_documents = TextLoader("./rime.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(raw_documents)

# define embedding model
hf_embedding = HuggingFaceEmbeddings(
    model="sentence-transformers/all-mpnet-base-v2", # use this model to perform the embedding
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": False},
)

# vector store credentials
connection_credentials = f"postgresql+psycopg://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}@localhost:8888/{os.getenv('POSTGRES_DB')}"

# embed each chunk and insert it into the vector store
db = PGVector.from_documents(documents=documents, embedding=hf_embedding, connection=connection_credentials)
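
The splitter above is configured with `chunk_size=1000` and `chunk_overlap=200`. As a rough illustration of what overlap means, here is a minimal sketch that slides a fixed-size window over a string. The function name `chunk_with_overlap` is hypothetical, and this is not the algorithm `RecursiveCharacterTextSplitter` uses (that class prefers to split on separators such as paragraph and line breaks); it only shows how consecutive chunks can share text.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last four characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk.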


The indexing stage is now complete. In order to execute the retrieval stage, we need to perform similarity search calculations—such as cosine similarity—between the user’s query and our stored embeddings, so relevant chunks of our indexed document are retrieved.
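
For reference, the standard definition of cosine similarity is given below. The symbols are notation introduced here for illustration: $\mathbf{q}$ is the query embedding, $\mathbf{d}$ is a stored chunk embedding, and $n$ is the embedding dimension.

```latex
\operatorname{cos\_sim}(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^{2}} \; \sqrt{\sum_{i=1}^{n} d_i^{2}}}
```

The value lies between -1 and 1, and a higher value means the two vectors point in a more similar direction.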

The retrieval process consists of three steps:
1. Convert the user's query into an embedding.
2. Find the embeddings in the vector store that are most similar to the query embedding.
3. Retrieve those embeddings together with their corresponding text chunks.
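
To make these three steps concrete, here is a library-free sketch. Everything in it is an assumption for illustration: `embed` is a toy word-count embedding over a tiny hand-picked vocabulary (a real system would use a model such as the one configured earlier), and `retrieve` is a hypothetical helper, not a LangChain or pgvector function.

```python
import math
from collections import Counter


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined here as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, index: list[tuple[list[float], str]],
             vocabulary: list[str], k: int = 2) -> list[str]:
    query_vector = embed(query, vocabulary)                 # step 1: embed the query
    scored = [(cosine_similarity(query_vector, vector), chunk)
              for vector, chunk in index]                   # step 2: score every stored embedding
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]               # step 3: return the top-k text chunks


vocabulary = ["ship", "pole", "marinere", "wedding", "guest", "feast"]
chunks = [
    "the ship sailed south toward the pole",
    "the marinere stopped the wedding guest",
    "the guests are met and the feast is set",
]
# "Indexing": embed every chunk once and keep the vector next to its text
index = [(embed(chunk, vocabulary), chunk) for chunk in chunks]

top = retrieve("did the marinere see the wedding feast", index, vocabulary, k=2)
print(top)
```

A vector store performs the same ranking, but over precomputed model embeddings and usually with an index structure that avoids scoring every row.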

We can represent these steps programmatically using LangChain as follows:

In [3]:
# create a retriever
retriever = db.as_retriever(search_kwargs={"k": 2})

# fetch query's relevant documents
docs = retriever.invoke(input="Who's the ancyent marinere?")
docs

[Document(id='6b021b8c-c42e-4cdd-8a3f-b7983e58ab2d', metadata={'source': './rime.txt'}, page_content='THE RIME OF THE ANCYENT MARINERE, IN SEVEN PARTS.\n\nARGUMENT.\n\nHow a Ship having passed the Line was driven by Storms to the cold Country towards the South Pole; and how from thence she made her course to the tropical Latitude of the Great Pacific Ocean; and of the strange things that befell; and in what manner the Ancyent Marinere came back to his own Country.\n\nI.\n\n     It is an ancyent Marinere,\n       And he stoppeth one of three:\n     "By thy long grey beard and thy glittering eye\n       "Now wherefore stoppest me?\n\n     "The Bridegroom\'s doors are open\'d wide\n       "And I am next of kin;\n     "The Guests are met, the Feast is set,--\n       "May\'st hear the merry din.--\n\n     But still he holds the wedding-guest--\n       There was a Ship, quoth he--\n     "Nay, if thou\'st got a laughsome tale,\n       "Marinere! come with me."'),
 Document(id='5d16722a-aafc-4

Note that we are using a vector store method you haven’t seen before: ```as_retriever```. This method abstracts away the logic of embedding the user’s query and running the underlying similarity search in the vector store to retrieve the relevant documents.

The ```search_kwargs``` dictionary also carries an argument ```k```, which determines the number of relevant documents to fetch from the vector store. In this example, ```k``` is set to 2, so the vector store returns the two documents most relevant to the user’s query.

## Generating LLM Predictions Using Relevant Documents

Once we’ve retrieved the relevant documents based on the user’s query, the final step is to add them to the original prompt as context and then invoke the model to generate a final output.
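
As a library-free illustration of what "adding the documents to the prompt as context" means, the sketch below joins the text of each retrieved chunk and substitutes it into a template. `Doc` is a hypothetical stand-in for a retrieved document with `page_content` and `metadata` attributes, and `build_prompt` is an illustrative helper, not part of LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


PROMPT_TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n\n"
    "Question: {question}"
)


def build_prompt(question: str, docs: list[Doc]) -> str:
    # Keep only the text of each chunk, separated by blank lines
    context = "\n\n".join(doc.page_content for doc in docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)


docs = [
    Doc("It is an ancyent Marinere,", {"source": "./rime.txt"}),
    Doc("And he stoppeth one of three:", {"source": "./rime.txt"}),
]
final_prompt = build_prompt("Who's the ancyent marinere?", docs)
print(final_prompt)
```

Joining only `page_content` is one possible design choice: it keeps ids and metadata out of the prompt, at the cost of an extra formatting step.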

Here’s a code example continuing on from our previous example:

In [4]:
from langchain_deepseek import ChatDeepSeek
from langchain_core.prompts import ChatPromptTemplate

load_dotenv()

retriever = db.as_retriever(search_kwargs={"k": 2})

prompt = ChatPromptTemplate.from_template(
    template=
    """
    Answer the question based only on the following context:
    {context}

    Question: {question}
    """
)
llm = ChatDeepSeek(model="deepseek-chat", temperature=0.0)


In [5]:
chain = prompt | llm

# fetch relevant documents
question = """Who's the ancyent marinere?"""
docs = retriever.invoke(input=question)  # get_relevant_documents is deprecated; use invoke instead

# run the workflow
answer = chain.invoke(input={"context": docs, "question": question})
print(f"answer: {answer.content}\n\ndocs: {docs}")

answer: Based solely on the provided context, the "ancyent Marinere" is an old sailor who stops a wedding guest to tell him a story. He is described as having a "long grey beard," a "glittering eye," and a "skinny hand." He compels the wedding guest to listen to his tale against his will.

docs: [Document(id='6b021b8c-c42e-4cdd-8a3f-b7983e58ab2d', metadata={'source': './rime.txt'}, page_content='THE RIME OF THE ANCYENT MARINERE, IN SEVEN PARTS.\n\nARGUMENT.\n\nHow a Ship having passed the Line was driven by Storms to the cold Country towards the South Pole; and how from thence she made her course to the tropical Latitude of the Great Pacific Ocean; and of the strange things that befell; and in what manner the Ancyent Marinere came back to his own Country.\n\nI.\n\n     It is an ancyent Marinere,\n       And he stoppeth one of three:\n     "By thy long grey beard and thy glittering eye\n       "Now wherefore stoppest me?\n\n     "The Bridegroom\'s doors are open\'d wide\n       "And I a

Note the following changes:
* We add dynamic ```context``` and ```question``` variables to our prompt, defined as a ```ChatPromptTemplate``` that is filled in at run time and sent to the model.
* We define a ```ChatDeepSeek``` interface to act as our LLM. ```temperature``` is set to ```0``` to make the model’s output as deterministic as possible, minimizing creative variation.
* We create a chain to compose the prompt and LLM. A reminder: the ```|``` (pipe) operator takes the output of prompt and uses it as the input to llm.
* We invoke the chain passing in the context variable (our retrieved relevant docs) and the user’s question to generate a final output.
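
To illustrate how a pipe operator can compose two steps, here is a toy sketch built on Python's `__or__` method. The `Step` class and the `fill_prompt` and `fake_llm` objects are hypothetical and far simpler than LangChain's runnables; the point is only that `a | b` can produce a new object whose `invoke` runs `a` and then feeds the result to `b`.

```python
class Step:
    """Hypothetical mini-runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Build a new step that runs self first, then passes its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-in for a prompt template: turns a dict of variables into text
fill_prompt = Step(
    lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}"
)
# Stand-in for a model: just upper-cases whatever prompt it receives
fake_llm = Step(lambda prompt_text: prompt_text.upper())

pipeline = fill_prompt | fake_llm
result = pipeline.invoke({"context": "a ship", "question": "what sails?"})
print(result)
```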

We can encapsulate this retrieval and generation logic in a single function:

In [6]:
from typing import Any
from langchain_core.runnables import chain

@chain
def qa(question: str) -> dict[str, Any]:

    # fetch relevant documents
    docs = retriever.invoke(input=question)
    # prepare prompt
    formatted_prompt = prompt.invoke(input={"context": docs, "question": question})
    # generate the llm's answer
    answer = llm.invoke(input=formatted_prompt)

    return {"answer": answer, "docs": docs} # return answer and relevant docs

In [7]:
response = qa.invoke(input="From where to where was the ship sailing?")

print(f"answer: {response['answer'].content}\n\nrelevant docs: {response['docs']}")

answer: Based solely on the provided context, the ship's journey is described in the Argument section:

The ship was sailing **from "the cold Country towards the South Pole" to "the tropical Latitude of the Great Pacific Ocean."**

relevant docs: [Document(id='6b021b8c-c42e-4cdd-8a3f-b7983e58ab2d', metadata={'source': './rime.txt'}, page_content='THE RIME OF THE ANCYENT MARINERE, IN SEVEN PARTS.\n\nARGUMENT.\n\nHow a Ship having passed the Line was driven by Storms to the cold Country towards the South Pole; and how from thence she made her course to the tropical Latitude of the Great Pacific Ocean; and of the strange things that befell; and in what manner the Ancyent Marinere came back to his own Country.\n\nI.\n\n     It is an ancyent Marinere,\n       And he stoppeth one of three:\n     "By thy long grey beard and thy glittering eye\n       "Now wherefore stoppest me?\n\n     "The Bridegroom\'s doors are open\'d wide\n       "And I am next of kin;\n     "The Guests are met, the Feas

Notice how we now have a new runnable, ```qa```, that can be called with just a question: it first fetches the relevant docs for context, formats them into the prompt, and finally generates the answer. The ```@chain``` decorator turns the function into a runnable chain. This notion of encapsulating multiple steps into a single function will be key to building interesting apps with LLMs.
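
As a rough mental model of what such a decorator does, the sketch below replaces a plain function with a wrapper object that exposes `invoke`. The names `FunctionRunnable`, `as_runnable`, and `shout` are hypothetical, and this is not how LangChain's `chain` decorator is implemented; it only shows the general pattern.

```python
from typing import Any, Callable


class FunctionRunnable:
    """Hypothetical wrapper that gives a plain function an `invoke` method."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func
        self.name = func.__name__

    def invoke(self, input: Any) -> Any:
        return self.func(input)


def as_runnable(func: Callable[[Any], Any]) -> FunctionRunnable:
    """Toy decorator: swap the function for a wrapper object."""
    return FunctionRunnable(func)


@as_runnable
def shout(question: str) -> dict[str, Any]:
    # Several steps could live here; callers only ever see `invoke`
    return {"answer": question.upper(), "docs": []}


response = shout.invoke(input="who is the marinere")
print(response)
```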

**NOTE:** Do not forget to remove the pgvector container when done using this notebook. Execute ```docker compose down --volumes``` in the terminal.