In [1]:
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)

In [2]:
import os
from dotenv import load_dotenv

# Loading environment variables from .env file
load_dotenv()

# Fetching the API key
gemini_api_key = os.getenv("GEMINI_API_KEY")

# Verifying that the key loaded
print("API Key loaded successfully!" if gemini_api_key else "API Key not found!")

API Key loaded successfully!


In [3]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Initializing Gemini
geminillm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash", 
    google_api_key=gemini_api_key,
    temperature=0.7
)

print("Gemini LLM initialized!")

Gemini LLM initialized!


In [4]:
from langchain_community.document_loaders import PyPDFLoader

pdf_path = "testPaper.pdf" 
loader = PyPDFLoader(pdf_path)

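# Note: load_and_split() also runs a default text splitter, so the count
# printed below can exceed the PDF's page count; loader.load() returns
# one Document per page.
pages = loader.load_and_split()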

print(f"Loaded {len(pages)} pages\n")
print(f"First page content preview:\n{pages[0].page_content[:500]}...")

Loaded 28 pages

First page content preview:
Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
†Facebook AI Research;‡University College London;⋆New York University;
plewis@fb.com
Abstract
Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when ﬁ...


In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Creating a text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,        # chars per chunk
    chunk_overlap=400,      # overlap between chunks
    length_function=len,
)

chunks = text_splitter.split_documents(pages)
print(f"total chunks - {len(chunks)}")

total chunks - 116
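
The effect of chunk_size and chunk_overlap can be illustrated with a simplified sliding window over characters. This is only a sketch: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and newlines, which this version does not attempt, and the function name and sample text are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_window_chunks(sample_text, chunk_size=10, chunk_overlap=4)
for piece in demo_chunks:
    print(piece)
```

Each window repeats the last 4 characters of the previous one, which is the same idea as chunk_overlap=400 above: text near a chunk boundary appears in both neighbouring chunks.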


In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Initializing embeddings model
HuggingFaceembeddingsModel = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=HuggingFaceembeddingsModel,
    persist_directory="./chroma_db"
)

print("embeddings created")

embeddings created


In [7]:
print(f"Number of embeddings created: {vectorstore._collection.count()}")

Number of embeddings created: 116


In [8]:
# Testing embeddings
print(chunks[0].page_content[:200])
sample_embedding = HuggingFaceembeddingsModel.embed_query(chunks[0].page_content)

print(f"\nEmbedding dimension: {len(sample_embedding)}")
print(f"First 5 values: {sample_embedding[:5]}")

Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, W

Embedding dimension: 384
First 5 values: [-0.06688090413808823, -0.03467119485139847, -0.026010597124695778, 0.08180363476276398, 0.01612071320414543]
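
Similarity search works by comparing the query's embedding with each stored chunk embedding. The sketch below uses cosine similarity on made-up 3-dimensional vectors to show the ranking idea. It is an assumption for illustration: the distance metric a Chroma collection actually uses depends on its configuration, and the helper names here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy vectors, not real embeddings
toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
toy_query = [1.0, 0.1, 0.0]
print(top_k(toy_query, toy_docs, k=2))
```

With real 384-dimensional embeddings the procedure is the same, only over longer vectors.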


In [9]:
# Test similarity search
query = "What is this paper about?"

# Retrieve top 5 most relevant chunks
relevant_chunks = vectorstore.similarity_search(query, k=5)

print(f"Query: {query}\n")
print("=" * 80)

for i, chunk in enumerate(relevant_chunks, 1):
    print(f"\nChunk {i}:")
    print(chunk.page_content[:300])
    print("...")

Query: What is this paper about?


Chunk 1:
speciﬁc by a large margin. Table 3 shows typical generations from each model.
Jeopardy questions often contain two separate pieces of information, and RAG-Token may perform
best because it can generate responses that combine content from several documents. Figure 2 shows
an example. When generating 
...

Chunk 2:
RAG-S This 14th century work is divided into 3 sections: "Inferno", "Purgatorio" & "Paradiso"
For 2-way classiﬁcation, we compare against Thorne and Vlachos [57], who train RoBERTa [35]
to classify the claim as true or false given the gold evidence sentence. RAG achieves an accuracy
within 2.7% of t
...

Chunk 3:
Processing Systems 32, pages 3261–3275. Curran Associates, Inc., 2019. URL https://
arxiv.org/abs/1905.00537.
[62] Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang,
Gerry Tesauro, Bowen Zhou, and Jing Jiang. R3: Reinforced ranker-reader for open-domain
question an
...

Chunk 4:
2020. URL h

In [10]:
# Helper function to join documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [11]:
from langchain_classic import hub
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

prompt = hub.pull("rlm/rag-prompt")

retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | geminillm
    | StrOutputParser()
)
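
The data flow through the chain above can be sketched with plain functions: the question goes to the retriever, the retrieved text is joined into a context string, and both are placed into a prompt for the model. Every function below is a hypothetical stand-in (no retriever, hub prompt, or Gemini call is involved), so this shows only the order of the steps, not the real components.

```python
def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in: a real retriever would search the vector store
    return ["Passage one.", "Passage two."]


def join_passages(passages: list[str]) -> str:
    # Same role as format_docs: one context string, passages separated by blank lines
    return "\n\n".join(passages)


def build_prompt(inputs: dict) -> str:
    # Hypothetical template; the hub prompt's actual wording is not reproduced here
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt_text: str) -> str:
    # Hypothetical stand-in for the model call
    return f"[model answer based on {len(prompt_text)} prompt characters]"


def run_chain(question: str) -> str:
    # The dict step: the same question feeds both branches
    inputs = {
        "context": join_passages(fake_retriever(question)),
        "question": question,  # passed through unchanged
    }
    return fake_llm(build_prompt(inputs))


print(run_chain("What is this paper about?"))
```

Because the model only sees what the retriever returns, an answer can be no better than the retrieved context.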

In [12]:
rag_chain.invoke("When was the research paper published?")

'The research paper was published in 2020. Its URL is https://arxiv.org/abs/2004.07159.'

In [13]:
rag_chain.invoke("Who are the authors of this paper?")

'I apologize, but the provided context does not contain information about the authors of this specific paper. It lists authors for several referenced papers, but not for the paper itself.'

In [14]:
rag_chain.invoke("What are the key findings of this paper?")

'RAG-Token performs well on Jeopardy questions by combining content from multiple documents. The paper finds that after the first token of a title is generated, the document posterior flattens, suggesting the generator can complete titles using its parametric knowledge without depending on specific documents. This hypothesis is supported by evidence where a BART-only baseline successfully completes titles from partial decoding.'

In [15]:
rag_chain.invoke("What is this paper about?")

"This paper discusses RAG (Retrieval-Augmented Generation) models, evaluating their performance in tasks like Jeopardy question generation and fact verification. It compares RAG models against BART, highlighting RAG's ability to generate more factual and specific responses. The paper also explores how these models leverage both parametric knowledge and non-parametric memory for completing generations."

In [16]:
rag_chain.invoke("What is the abstract of this paper?")

'I am sorry, but the provided context does not contain the abstract of the paper. It includes discussions about model performance, generation diversity, specific examples, and references, but not a summary of the entire paper.'

In [17]:
# Test what chunks are being retrieved
query = "What is the main contribution of this paper?"

# See what the retriever finds
retrieved_docs = retriever.invoke(query)

print(f"Query: {query}\n")
print("=" * 80)

for i, doc in enumerate(retrieved_docs, 1):
    print(f"\nRetrieved Chunk {i}:")
    print(doc.page_content[:500])
    print("\n" + "-" * 80)

Query: What is the main contribution of this paper?


Retrieved Chunk 1:
speciﬁc by a large margin. Table 3 shows typical generations from each model.
Jeopardy questions often contain two separate pieces of information, and RAG-Token may perform
best because it can generate responses that combine content from several documents. Figure 2 shows
an example. When generating “Sun”, the posterior is high for document 2 which mentions “The
Sun Also Rises”. Similarly, document 1 dominates the posterior when “A Farewell to Arms” is
generated. Intriguingly, after the ﬁrst toke

--------------------------------------------------------------------------------

Retrieved Chunk 2:
2020. URL https://arxiv.org/abs/2004.07159.
[5] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to Answer
Open-Domain Questions. In Proceedings of the 55th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), pages 1870–1879, Vancouver, Canada,
July 2017. Asso

In [18]:
print("First page content:")
print(pages[0].page_content[:1000])

First page content:
Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
†Facebook AI Research;‡University College London;⋆New York University;
plewis@fb.com
Abstract
Large pre-trained language models have been shown to store factual knowledge
in their parameters, and achieve state-of-the-art results when ﬁne-tuned on down-
stream NLP tasks. However, their ability to access and precisely manipulate knowl-
edge is still limited, and hence on knowledge-intensive tasks, their performance
lags behind task-speciﬁc architectures. Additionally, providing provenance for their
decisions and updating their world knowledge remain open research problems. Pre-
trained models with a differentiable access mechanism to explicit non-parametric
memory have so far been only investigated for extra

In [19]:
for i in range(5):
    print(f"\n{'='*80}")
    print(f"Chunk {i}:")
    print(f"Length: {len(chunks[i].page_content)} characters")
    print(f"Content preview:\n{chunks[i].page_content[:300]}")


Chunk 0:
Length: 928 characters
Content preview:
Retrieval-Augmented Generation for
Knowledge-Intensive NLP Tasks
Patrick Lewis†‡, Ethan Perez⋆,
Aleksandra Piktus†, Fabio Petroni†, Vladimir Karpukhin†, Naman Goyal†, Heinrich Küttler†,
Mike Lewis†, Wen-tau Yih†, Tim Rocktäschel†‡, Sebastian Riedel†‡, Douwe Kiela†
†Facebook AI Research;‡University C

Chunk 1:
Length: 955 characters
Content preview:
edge is still limited, and hence on knowledge-intensive tasks, their performance
lags behind task-speciﬁc architectures. Additionally, providing provenance for their
decisions and updating their world knowledge remain open research problems. Pre-
trained models with a differentiable access mechanism

Chunk 2:
Length: 923 characters
Content preview:
ory for language generation. We introduce RAG models where the parametric
memory is a pre-trained seq2seq model and the non-parametric memory is a dense
vector index of Wikipedia, accessed with a pre-trained neural retriever. We com-
pare two RAG f