# How to build a basic RAG app

This notebook is a step-by-step companion to the article "How to build a basic RAG app", available at https://ruxu.dev. It showcases a minimal RAG pipeline that answers a user's questions about a single document.

For this example, we will ingest the paper “Retrieval-Augmented Generation for Large Language Models: A Survey”. We then query the LLM with information retrieved from this paper, so it can answer the user's questions about its contents.

See also the article at https://dev.to/rogiia/how-to-build-a-basic-rag-app-h9p

First, we install the required dependencies.

In [None]:
!pip install langchain langchain-community pypdf sentence_transformers faiss-cpu langchain-anthropic

## Parsing the document
We start by loading the PDF document and parsing it with LangChain's `PyPDFLoader`, which returns one document per page.

In [None]:
from langchain_community.document_loaders import PyPDFLoader

document_url = "https://arxiv.org/pdf/2312.10997.pdf"
loader = PyPDFLoader(document_url)
pages = loader.load()

In [None]:
print(pages[0].page_content[0:250])

1
Retrieval-Augmented Generation for Large
Language Models: A Survey
Yunfan Gaoa, Yun Xiongb, Xinyu Gaob, Kangxiang Jiab, Jinliu Panb, Yuxi Bic, Yi Daia, Jiawei Suna, Meng
Wangc, and Haofen Wanga,c
aShanghai Research Institute for Intelligent Autonom


Once we have the text from the document, we split it into smaller chunks. LangChain provides several text splitters. Here we use `RecursiveCharacterTextSplitter` with chunks of up to 400 characters and up to 40 characters of overlap between consecutive chunks:

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=40,
    length_function=len,
    is_separator_regex=False,
)
chunks = text_splitter.split_documents(pages)
print(chunks[0])

page_content='1\nRetrieval-Augmented Generation for Large\nLanguage Models: A Survey\nYunfan Gaoa, Yun Xiongb, Xinyu Gaob, Kangxiang Jiab, Jinliu Panb, Yuxi Bic, Yi Daia, Jiawei Suna, Meng\nWangc, and Haofen Wanga,c\naShanghai Research Institute for Intelligent Autonomous Systems, Tongji University\nbShanghai Key Laboratory of Data Science, School of Computer Science, Fudan University' metadata={'source': 'https://arxiv.org/pdf/2312.10997.pdf', 'page': 0}
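To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a plain fixed-window character splitter. `split_with_overlap` is a hypothetical helper, not part of LangChain, and it is simpler than `RecursiveCharacterTextSplitter`. The real splitter also tries to break on separators (by default `"\n\n"`, `"\n"`, `" "`, then `""`) so that chunks end at natural boundaries, which is why its chunks are often shorter than 400 characters.

```python
def split_with_overlap(text: str, chunk_size: int = 400, chunk_overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical, simplified illustration: unlike RecursiveCharacterTextSplitter,
    it ignores paragraph, line, and word boundaries entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Toy input: 1000 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(1000))
parts = split_with_overlap(sample, chunk_size=400, chunk_overlap=40)
print([len(p) for p in parts])
# The last 40 characters of one chunk are repeated at the start of the next.
print(parts[0][-40:] == parts[1][:40])
```

The overlap means a sentence cut at a chunk boundary still appears in full in at least one chunk, as long as it is no longer than the overlap.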


We will use BGE-small (`BAAI/bge-small-en`), an open-source embedding model. We download it from the Hugging Face Hub and run it on all chunks to compute their vector representations.

In [None]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}
bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

chunk_texts = [chunk.page_content for chunk in chunks]
embeddings = bge_embeddings.embed_documents(chunk_texts)
print(embeddings[0])

[-0.05254931002855301, 0.010038575157523155, -0.0062491269782185555, -0.012186411768198013, -0.0038278785068541765, 0.0449368990957737, -0.01909380778670311, 0.01942513883113861, 0.056532278656959534, 0.003400492714717984, -0.006539005786180496, -0.029364701360464096, 0.06734544783830643, 0.03767259418964386, 0.04291080683469772, 0.02694469504058361, 0.0025654013734310865, 0.04653036221861839, -0.0030540241859853268, -0.04369943588972092, 0.04743994399905205, -0.014691959135234356, -0.02880352921783924, -0.022866912186145782, -0.051217809319496155, 0.02544221840798855, -0.01760285533964634, -0.04081522300839424, -0.012394286692142487, -0.2337203025817871, -0.018066903576254845, -0.03340132161974907, 0.08302459120750427, 0.021731827408075333, -0.005400180816650391, 0.0260979812592268, -0.040765970945358276, 0.03784996271133423, -0.019831310957670212, 0.03637311980128288, 0.009351233020424843, 0.012908305041491985, 0.010659882798790932, 0.00425981217995286, -0.015125353820621967, -0.0363
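With `normalize_embeddings=True`, every vector returned by the model has unit length. The following pure-Python sketch shows why this is convenient. The helper names `normalize`, `dot`, `cosine`, and `squared_l2` are ours, and the two vectors are made-up toy values, not BGE output. For unit vectors, the dot product equals the cosine similarity, and the squared Euclidean distance equals `2 - 2 * cosine`. Ranking by either metric therefore gives the same order. This matters here because LangChain's FAISS wrapper uses Euclidean (L2) distance by default.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (what normalize_embeddings=True requests)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy vectors, normalized to unit length.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

print(dot(a, b), cosine(a, b))              # the same value for unit vectors
print(squared_l2(a, b), 2 - 2 * dot(a, b))  # ||a - b||^2 == 2 - 2 * (a . b)
```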

Once we have the vector representations for all chunks, we can store them in an in-memory vector database. For this example, we use FAISS.

In [None]:
from langchain_community.vectorstores import FAISS

text_embedding_pairs = zip(chunk_texts, embeddings)
db = FAISS.from_embeddings(text_embedding_pairs, bge_embeddings)
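Conceptually, the vector store compares a query vector against every stored vector and keeps the closest ones. The sketch below is a hypothetical brute-force store: `TinyVectorStore` is not a LangChain or FAISS class. It performs exact search by squared L2 distance, similar in spirit to a flat FAISS index. The 2-D vectors are invented for illustration. Real FAISS indexes are heavily optimized and also offer approximate indexes for large collections.

```python
class TinyVectorStore:
    """Hypothetical brute-force vector store: exact nearest-neighbour search
    that scores the query against every stored vector."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.texts.append(text)
        self.vectors.append(vector)

    def search(self, query_vector: list[float], k: int = 5) -> list[tuple[str, float]]:
        """Return up to k (text, squared L2 distance) pairs, closest first."""
        scored = [
            (text, sum((q - x) ** 2 for q, x in zip(query_vector, vec)))
            for text, vec in zip(self.texts, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1])  # smallest distance first
        return scored[:k]


# Toy usage with made-up 2-D "embeddings".
store = TinyVectorStore()
for text, vec in [("cats", [1.0, 0.0]), ("dogs", [0.8, 0.6]), ("stocks", [0.0, 1.0])]:
    store.add(text, vec)

results = store.search([0.9, 0.1], k=2)
print([t for t, _ in results])  # ['cats', 'dogs']
```

Brute-force search costs one distance computation per stored chunk per query. That is perfectly adequate for the chunks of a single paper.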

With the database set up, we can now take queries from the user. In this example, the user asks what the drawbacks of Naive RAG are. `similarity_search` encodes the query with the same embedding model used for the chunks (the `bge_embeddings` object passed to FAISS) and returns the 5 most similar chunks.

In [None]:
query = "What are the drawbacks of Naive RAG?"

contexts = db.similarity_search(query, k=5)

print(contexts[0])

page_content='and avoiding contradictions.\nAnswer Relevance requires that the generated answers are\ndirectly pertinent to the posed questions, effectively addressing\nthe core inquiry.\n2) Required Abilities: RAG evaluation also encompasses\nfour abilities indicative of its adaptability and efficiency:\nnoise robustness, negative rejection, information integration,'


After retrieving the relevant context, we build a prompt from it and the user's original query. We use Anthropic's Claude 3 Haiku as the LLM for this example:

> This example calls the model through the Anthropic API. For it to work, set a Colab secret named `ANTHROPIC_API_KEY` containing your own Anthropic API key, or swap in any other chat model of your choice.

In [None]:
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from google.colab import userdata

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert at answering questions based on a context extracted from a document. The context extracted from the document is: {context}"),
        ("human", "{question}"),
    ]
)

api_key = userdata.get('ANTHROPIC_API_KEY')
model = ChatAnthropic(model='claude-3-haiku-20240307', api_key=api_key)

chain = prompt | model

response = chain.invoke({
    "context": '\n\n'.join(c.page_content for c in contexts),
    "question": query
})

In [None]:
response.content

'Based on the context provided, the key drawbacks of Naive RAG are:\n\n1. Retrieval Challenges: The retrieval phase in Naive RAG often struggles with precision and recall, leading to the selection of misaligned or irrelevant chunks of information, and missing crucial information.\n\n2. Generation Difficulties: In the generation phase, Naive RAG models may face the issue of hallucination, where they produce contextually inconsistent or factually incorrect outputs.\n\nThe context highlights that these retrieval and generation challenges are notable weaknesses of the Naive RAG approach, and have motivated the development of more advanced RAG paradigms that aim to address these limitations.'
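The prompt step of the chain can be mimicked in plain Python to show roughly what the model receives. `build_messages` is a hypothetical helper, not a LangChain API. It joins the retrieved chunks with blank lines, as the `chain.invoke` call does, and inserts them into the same system message used by the `ChatPromptTemplate`. The two chunk strings are placeholders.

```python
def build_messages(contexts: list[str], question: str) -> list[tuple[str, str]]:
    """Hypothetical helper mirroring the ChatPromptTemplate above: retrieved
    chunks are joined with blank lines and placed in the system message."""
    context = "\n\n".join(contexts)
    system = (
        "You are an expert at answering questions based on a context extracted "
        "from a document. The context extracted from the document is: " + context
    )
    return [("system", system), ("human", question)]


messages = build_messages(["chunk one", "chunk two"], "What are the drawbacks of Naive RAG?")
for role, content in messages:
    print(f"{role}: {content}")
```

All retrieved context travels in the system message, and the user's question is sent unchanged as the human message.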