# Chapter 3 - RAG Part II: Chatting with Your Data

## Introducing Retrieval-Augmented Generation (RAG)

### Retrieving Relevant Documents

A RAG system for an AI app typically follows three core stages:
* **Indexing**: This stage involves preprocessing the external data source and storing embeddings that represent the data in a vector store where they can be easily retrieved.
* **Retrieval**: This stage involves retrieving the relevant embeddings and data stored in the vector store based on a user’s query.
* **Generation**: This stage involves synthesizing the original prompt with the retrieved relevant documents as one final prompt sent to the model for a prediction.
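
To make the generation stage concrete, here is a minimal sketch of how retrieved chunks and the user's question might be merged into one final prompt. The function name `build_rag_prompt` and the prompt wording are illustrative assumptions, not part of LangChain or of this chapter's code.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved text chunks and the user's question into one prompt.

    Hypothetical helper for illustration only; the template wording is an assumption.
    """
    # Number each chunk so the individual pieces of context stay distinguishable.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_rag_prompt(
    "Who's the ancyent marinere?",
    ["It is an ancyent Marinere, And he stoppeth one of three:"],
)
print(prompt)
```

The resulting string is what would be sent to the model in place of the bare question.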

Let’s run through an example from scratch again, starting with the indexing stage:

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_postgres.vectorstores import PGVector
from dotenv import load_dotenv
import os

load_dotenv()

# load the document, split it into chunks
raw_documents = TextLoader("./rime.txt").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(raw_documents)

# define embedding model
hf_embedding = HuggingFaceEmbeddings(
    model="sentence-transformers/all-mpnet-base-v2", # use this model to perform the embedding
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": False},
)

# vector store credentials
connection_credentials = f"postgresql+psycopg://{os.getenv('POSTGRES_USER')}:{os.getenv('POSTGRES_PASSWORD')}@localhost:8888/{os.getenv('POSTGRES_DB')}"

# embed each chunk and insert it into the vector store
db = PGVector.from_documents(documents=documents, embedding=hf_embedding, connection=connection_credentials)


The indexing stage is now complete. To execute the retrieval stage, we need to run a similarity search, using a measure such as cosine similarity, between the user’s query and our stored embeddings, so that the relevant chunks of our indexed document are retrieved.
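
As a rough illustration of the measure itself, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses plain Python and tiny hand-written vectors (assumptions for illustration). Real embeddings from the model above have hundreds of dimensions, and the vector store performs this calculation for us.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b.

    Assumes both vectors have the same length and neither is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # points the same way, only longer
orthogonal = [0.0, 3.0]      # points in an unrelated direction

print(cosine_similarity(query, same_direction))  # 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

A score near 1 means the two vectors point in almost the same direction. A score near 0 means they are unrelated. Vector length does not affect the result.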

The retrieval process consists of the following steps:
1. Convert the user's query into an embedding.
2. Find the embeddings in the vector store that are most similar to the query embedding.
3. Retrieve the relevant document embeddings and their corresponding text chunks.
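
The three steps above can be sketched in plain Python with a toy word-count "embedding". Everything here (`toy_embed`, `retrieve`, the vocabulary, and the sample chunks) is a hypothetical stand-in. A real system would use a trained embedding model and a vector store's own search, not this code.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: count vocabulary words."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as "not similar" instead of dividing by zero.
    return dot / norm if norm else 0.0


def retrieve(query, index, vocabulary, k=2):
    # Step 1: convert the user's query into an embedding.
    query_vector = toy_embed(query, vocabulary)
    # Step 2: score every stored embedding against the query embedding.
    scored = [(cosine_similarity(query_vector, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 3: return the text chunks behind the k best-scoring embeddings.
    return [chunk for _, chunk in scored[:k]]


vocabulary = ["marinere", "ship", "wedding", "storm"]
chunks = [
    "the ancyent marinere stops a wedding guest",
    "the ship was driven by a storm",
    "the guests are met and the feast is set",
]
# "Indexing": embed each chunk once and keep the vector next to its text.
index = [(chunk, toy_embed(chunk, vocabulary)) for chunk in chunks]

top = retrieve("who is the marinere", index, vocabulary, k=2)
print(top[0])
```

The chunk that mentions the marinere ranks first because it is the only one sharing a vocabulary word with the query.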

We can represent these steps programmatically using LangChain as follows:

In [3]:
# create a retriever
retriever = db.as_retriever(search_kwargs={"k": 2})

# fetch query's relevant documents
docs = retriever.invoke(input="Who's the ancyent marinere?")
docs

[Document(id='2a80a61a-87a4-4795-8aa3-56150da2c80d', metadata={'source': './rime.txt'}, page_content='THE RIME OF THE ANCYENT MARINERE, IN SEVEN PARTS.\n\nARGUMENT.\n\nHow a Ship having passed the Line was driven by Storms to the cold Country towards the South Pole; and how from thence she made her course to the tropical Latitude of the Great Pacific Ocean; and of the strange things that befell; and in what manner the Ancyent Marinere came back to his own Country.\n\nI.\n\n     It is an ancyent Marinere,\n       And he stoppeth one of three:\n     "By thy long grey beard and thy glittering eye\n       "Now wherefore stoppest me?\n\n     "The Bridegroom\'s doors are open\'d wide\n       "And I am next of kin;\n     "The Guests are met, the Feast is set,--\n       "May\'st hear the merry din.--\n\n     But still he holds the wedding-guest--\n       There was a Ship, quoth he--\n     "Nay, if thou\'st got a laughsome tale,\n       "Marinere! come with me."'),
 Document(id='2301595d-ab31-4

Note that we are using a vector store method you haven’t seen before: ```as_retriever```. This method returns a retriever that abstracts away the logic of embedding the user’s query and the underlying similarity search calculations performed by the vector store to retrieve the relevant documents.

The ```search_kwargs``` dictionary also carries an argument ```k```, which determines the number of relevant documents to fetch from the vector store. In this example, ```k``` is set to 2, which tells the vector store to return the two most relevant documents for the user’s query.