# Retrieval-Augmented Generation (RAG) with LangChain and an Open-Source LLM (Flan-Alpaca GPT-4)

This notebook demonstrates the implementation of a **Retrieval-Augmented Generation (RAG)** system using LangChain and an open-source LLM, **Flan-Alpaca GPT-4**. The RAG approach enhances large language models (LLMs) by integrating external knowledge, so that responses are grounded in the supplied documents and are more contextually relevant. This implementation covers three stages:

**1. Data Ingestion from PDF Files:**

  * Reads data from one or more PDF files.

  * Processes the text content for use in downstream tasks.

**2. Vector Store for Knowledge Storage:**

 * Converts the extracted textual data into embeddings (dense numerical representations).

 * Stores these embeddings in a vector store, enabling efficient similarity-based retrieval.


**3. LLM Query and Response:**

 * Uses LangChain to interact with the vector store and retrieve relevant information based on user queries.

 * Combines retrieved information with the LLM to generate a response, ensuring the model is informed by the external knowledge.

 * Also returns references to the source documents behind each response.

### Use a T4 Runtime (Runtime > Change Runtime Type > Hardware Accelerator > T4 > Save)

### Installing Necessary Packages

In [1]:
!pip install -q optimum


In [2]:
!pip install -q auto-gptq transformers


In [3]:
!pip install -q sentence_transformers unstructured
!pip install -q -U langchain-community
!pip install -q einops accelerate langchain bitsandbytes
!pip install pypdf


In [4]:
!pip install -q  chromadb


### Importing Necessary Packages

In [5]:
import os
import requests
from langchain.llms import HuggingFaceHub
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import UnstructuredURLLoader
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain.document_loaders import DirectoryLoader
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.document_loaders import PyPDFDirectoryLoader

### Task 1 - Data Loading

In [25]:
#Create a new folder (-p avoids an error if it already exists)
!mkdir -p pdf_files


In [9]:
#Load pdf files (Upload PDF files in pdf_files)
loader = PyPDFDirectoryLoader("/content/pdf_files")
data = loader.load()

In [None]:
#Load Data from Websites:(Uncomment This)
# url=["Give URL Name here"]
# loader = UnstructuredURLLoader(urls=url)
# dataurl = loader.load()

In [11]:
#Creating Chunks of the given Data
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200)
chunks = text_splitter.split_documents(data)
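
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch in plain Python. It is a simplified stand-in and not how `RecursiveCharacterTextSplitter` works internally (the real splitter tries to break on separators such as paragraphs and lines before falling back to characters). The helper name `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # hypothetical 26-character document
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each window repeats the last four characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.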

### Loading Embedding Function (MiniLM L6)

In [12]:
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")


### Creating a Vector Store for the Given Data with ChromaDB

In [13]:
persist_directory = 'pop'

In [14]:
vectorstore = Chroma.from_documents(chunks, embedding_function,persist_directory=persist_directory)

In [15]:
retriever = vectorstore.as_retriever()
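
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are most similar to it. The sketch below illustrates that idea with a toy word-count "embedding" and cosine similarity. It is an assumption-laden illustration, not Chroma's implementation: the functions `embed`, `cosine`, and `top_k` and the mini-corpus are all hypothetical, and real MiniLM embeddings are dense vectors rather than word counts.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for MiniLM vectors)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical mini-corpus standing in for the stored chunks.
docs = [
    "attention maps a query and key value pairs to an output",
    "the penn treebank is an annotated corpus of english",
    "recurrent layers process a sequence one step at a time",
]
results = top_k("how does attention use a query", docs, k=2)
print(results)
```

The four sources printed under each answer later in this notebook are consistent with the retriever returning four chunks per query.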

### Task 2: Set Up RAG with LangChain

### Loading the LLM (Flan-Alpaca GPT-4)

In [16]:
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline
import torch

model = "declare-lab/flan-alpaca-gpt4-xl"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = pipeline(
    "text2text-generation", #task
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=True,
    device_map="auto",
    max_length=2000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)


Device set to use cuda:0


In [17]:
from langchain import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline = pipeline, model_kwargs = {'temperature':0.5})



In [18]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text2text_generation.Text2TextGenerationPipeline object at 0x7a554bf4fee0>, model_kwargs={'temperature': 0.5})

### Task 3: Build the Chatbot

### Creating the LangChain Question-Answering Chain (Combine the LLM with the Given Data)

In [19]:
qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                  chain_type="stuff",
                                  retriever=retriever,
                                  return_source_documents=True)
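
With `chain_type="stuff"`, all retrieved chunks are placed into a single prompt alongside the question. The sketch below shows the general shape of that step. The function name `build_stuff_prompt`, the prompt wording, and the sample chunks are hypothetical; LangChain's own template uses different wording.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into one context block, then ask the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunks.
retrieved = [
    "An attention function maps a query and key-value pairs to an output.",
    "The output is computed as a weighted sum of the values.",
]
prompt = build_stuff_prompt("How does attention work?", retrieved)
print(prompt)
```

Because every chunk goes into one prompt, the prompt length grows with both the number of retrieved chunks and the chunk size.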

### Function for Printing the LLM Response and Its References (Where the LLM Got Its Information)

In [20]:
def process_llm_response(llm_response):
    print(llm_response['result'])
    print('\n\nSources:')
    for source in llm_response["source_documents"]:
        print(source.metadata['source'])

### Response from the LLM for a PDF (Attention Is All You Need research paper)

In [21]:
query = "What is the text about ?"
llm_response = qa_chain(query)
process_llm_response(llm_response)
llm_response

Token indices sequence length is longer than the specified maximum sequence length for this model (1016 > 512). Running this sequence through the model will result in indexing errors


The text is about generative models that learn dependencies and entailments of sentences through self-attention and other attention mechanisms.


Sources:
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf


{'query': 'What is the text about ?',
 'result': 'The text is about generative models that learn dependencies and entailments of sentences through self-attention and other attention mechanisms.',
 'source_documents': [Document(metadata={'page': 11, 'source': '/content/pdf_files/Attention Is All You Need.pdf'}, page_content='[25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated\ncorpus of english: The penn treebank. Computational linguistics, 19(2):313–330, 1993.\n[26] David McClosky, Eugene Charniak, and Mark Johnson. Effective self-training for parsing. In\nProceedings of the Human Language Technology Conference of the NAACL, Main Conference,\npages 152–159. ACL, June 2006.\n[27] Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention\nmodel. In Empirical Methods in Natural Language Processing, 2016.\n[28] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive\nsummarizat
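
The tokenizer warning in the output above (1016 > 512) indicates that the stuffed prompt was longer than the tokenizer's configured maximum sequence length. One way to stay under such a limit, sketched here as an assumption rather than as what the chain does, is to keep only as many ranked chunks as fit a token budget. The name `fit_chunks_to_budget` is hypothetical, and the default whitespace word count is only a rough proxy for real token counts.

```python
def fit_chunks_to_budget(chunks: list[str], max_tokens: int, count_tokens=None) -> list[str]:
    """Keep chunks in ranked order until adding the next one would exceed max_tokens."""
    if count_tokens is None:
        # Assumption: whitespace-separated words as a rough proxy for tokens.
        count_tokens = lambda text: len(text.split())
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        used += cost
    return kept


ranked_chunks = ["one two three", "four five", "six seven eight nine"]  # hypothetical
selected = fit_chunks_to_budget(ranked_chunks, max_tokens=5)
print(selected)
```

For realistic counts, `count_tokens` could be backed by the notebook's own tokenizer, for example `lambda t: len(tokenizer(t)["input_ids"])`. Reducing `chunk_size` or the number of retrieved chunks are alternative ways to shorten the prompt.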

In [22]:
query = "What is attention mechanisms and how is it better ?"
llm_response = qa_chain(query)
process_llm_response(llm_response)
llm_response

According to the input, an attention mechanism is a method for learning to map and prioritize elements in a sequence or set of sequences, where the focus is on extracting information relevant to the current position on the sequence. The method is based on the idea of weighted summation, where a weight is given for different aspects of the sequence to determine the most relevant element. In the example provided, a self-attention mechanism exhibits behavior associated with the structure of the sentence, where the attention heads exhibit behavior related to the verb 'making'. This mechanism is considered better as it can execute operations faster than a recurrent layer.


Sources:
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf


{'query': 'What is attention mechanisms and how is it better ?',
 'result': "According to the input, an attention mechanism is a method for learning to map and prioritize elements in a sequence or set of sequences, where the focus is on extracting information relevant to the current position on the sequence. The method is based on the idea of weighted summation, where a weight is given for different aspects of the sequence to determine the most relevant element. In the example provided, a self-attention mechanism exhibits behavior associated with the structure of the sentence, where the attention heads exhibit behavior related to the verb 'making'. This mechanism is considered better as it can execute operations faster than a recurrent layer.",
 'source_documents': [Document(metadata={'page': 2, 'source': '/content/pdf_files/Attention Is All You Need.pdf'}, page_content='3.2 Attention\nAn attention function can be described as mapping a query and a set of key-value pairs to an output,\

In [24]:
query = "How does attention mechanisms work? Explain in simple terms"
llm_response = qa_chain(query)
process_llm_response(llm_response)
llm_response

Attention mechanisms work by mapping a query and a set of key-value pairs to an output and then computing a weighted sum between them, where the query, keys, values, and output are all vectors.


Sources:
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf
/content/pdf_files/Attention Is All You Need.pdf


{'query': 'How does attention mechanisms work? Explain in simple terms',
 'result': 'Attention mechanisms work by mapping a query and a set of key-value pairs to an output and then computing a weighted sum between them, where the query, keys, values, and output are all vectors.',
 'source_documents': [Document(metadata={'page': 2, 'source': '/content/pdf_files/Attention Is All You Need.pdf'}, page_content='3.2 Attention\nAn attention function can be described as mapping a query and a set of key-value pairs to an output,\nwhere the query, keys, values, and output are all vectors. The output is computed as a weighted sum\n3'),
  Document(metadata={'page': 12, 'source': '/content/pdf_files/Attention Is All You Need.pdf'}, page_content='Attention Visualizations\nInput-Input Layer5\nIt\nis\nin\nthis\nspirit\nthat\na\nmajority\nof\nAmerican\ngovernments\nhave\npassed\nnew\nlaws\nsince\n2009\nmaking\nthe\nregistration\nor\nvoting\nprocess\nmore\ndifficult\n.\n<EOS>\n<pad>\n<pad>\n<pad>\n<pad>

### LLM Responses for the Wikipedia World Population URL

In [None]:
query = "Which country has contributed the most to the world population?"
llm_response = qa_chain(query)
process_llm_response(llm_response)
llm_response

From the given information, it appears that India and China have contributed the most to the world population. These two countries together make up about 36% of the world's population.


Sources:
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population


{'query': 'Which country has contributed the most to the world population?',
 'result': "From the given information, it appears that India and China have contributed the most to the world population. These two countries together make up about 36% of the world's population.",
 'source_documents': [Document(metadata={'source': 'https://en.wikipedia.org/wiki/World_population'}, page_content="Many countries in the developing world have experienced extremely rapid population growth since the early 20th century, due to economic development and improvements in public health. China's population rose from approximately 430 million in 1850 to 580 million in 1953,[51] and now stands at over 1.3 billion. The population of the Indian subcontinent, which was about 125 million in 1750, increased to 389 million in 1941;[52] today, India, Pakistan and Bangladesh are collectively home to about 1.63 billion people.[53] Java, an island in Indonesia, had about 5 million inhabitants in 1815; it had a popula

In [None]:
query = "Which country has the largest population of youth and what is the value in 2023?"
llm_response = qa_chain(query)
process_llm_response(llm_response)
llm_response

According to the data provided, India has the largest population of youth with an estimated 17.5% in 2023.


Sources:
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population


{'query': 'Which country has the largest population of youth and what is the value in 2023?',
 'result': 'According to the data provided, India has the largest population of youth with an estimated 17.5% in 2023.',
 'source_documents': [Document(metadata={'source': 'https://en.wikipedia.org/wiki/World_population'}, page_content='Largest populations by country\n\nFurther information: List of countries and dependencies by population\n\nTen most populous countries\n\nCountry / Dependency Population % of world Date Source (official or from the United Nations) India 1,425,775,850 17.5% 14 Apr 2023 UN projection [ 89 ] China 1,409,670,000 17.3% 17 Jan 2024 National annual estimate [ 90 ] United States 337,370,476 4.14% 7 Jan 2025 National population clock [ 91 ] Indonesia 278,696,200 3.42% 1 Jul 2023 National annual estimate [ 92 ] Pakistan 229,488,994 2.82% 1 Jul 2022 UN projection [ 93 ] Nigeria 216,746,934 2.66% 1 Jul 2022 UN projection [ 93 ] Brazil 218,640,393 2.68% 7 Jan 2025 National 

In [None]:
query = "When was the oldest recorded world population and how much was it?"
llm_response = qa_chain(query)
process_llm_response(llm_response)
llm_response

The oldest recorded world population was 320 million in 1682.


Sources:
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population


{'query': 'When was the oldest recorded world population and how much was it?',
 'result': 'The oldest recorded world population was 320 million in 1682.',
 'source_documents': [Document(metadata={'source': 'https://en.wikipedia.org/wiki/World_population'}, page_content='The median age of human beings as of 2020[update] is 31 years.[9]\n\nHistory\n\nFurther information: Estimates of historical world population and Human history\n\nEstimates of world population by their nature are an aspect of modernity, possible only since the Age of Discovery. Early estimates for the population of the world[10] date to the 17th century: William Petty, in 1682, estimated the world population at 320 million (current estimates ranging close to twice this number); by the late 18th century, estimates ranged close to one billion (consistent with current estimates).[11] More refined estimates, broken down by continents, were published in the first half of the 19th century, at 600 million to 1 billion in the 

In [None]:
query = "Which country in the world has the lowest population?"
llm_response = qa_chain(query)
process_llm_response(llm_response)
llm_response

The country in the world with the lowest population is Antarctica.


Sources:
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population
https://en.wikipedia.org/wiki/World_population


{'query': 'Which country in the world has the lowest population?',
 'result': 'The country in the world with the lowest population is Antarctica.',
 'source_documents': [Document(metadata={'source': 'https://en.wikipedia.org/wiki/World_population'}, page_content="Population by region\n\nFurther information: Demographics of the world\n\nSix of the Earth's seven continents are permanently inhabited on a large scale. Asia is the most populous continent, with its 4.64 billion inhabitants accounting for 60% of the world population. The world's two most populated countries, India and China, together constitute about 36% of the world's population. Africa is the second most populated continent, with around 1.34 billion people, or 17% of the world's population. Europe's 747 million people make up 10% of the world's population as of 2020,while the Latin American and Caribbean regions are home to around 653 million (8%). Northern America, primarily consisting of the United States and Canada, has 