In [45]:
%pip install -U langchain langchain-community langchain-chroma langchain-text-splitters langchain-ollama chromadb pypdf

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [46]:
%pip install -U langchain-classic

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.





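Since the pip output notes that a kernel restart may be needed, it can help to confirm that the modules imported in the next cell are importable before running it. The following is a minimal sketch using only the standard library; the `REQUIRED_MODULES` list and the `missing_modules` helper are illustrative names, not part of any of the installed packages.

```python
from importlib.util import find_spec

# Top-level module names used by the cells in this notebook
# (assumed mapping from pip package names to import names).
REQUIRED_MODULES = [
    "langchain_community",
    "langchain_text_splitters",
    "langchain_ollama",
    "langchain_chroma",
    "langchain_classic",
    "langchain_core",
    "pypdf",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported as missing, installing the corresponding package with `%pip install` and restarting the kernel is the usual next step.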
In [60]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings, ChatOllama
from langchain_chroma import Chroma
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# -------------------- CONFIG --------------------
PDF_FILE = "C:/Users/jaybr/Downloads/RAG.pdf"
DB_DIR = "./rag_db"
EMBED_MODEL = "nomic-embed-text"
LLM_MODEL = "llama3.2"

# -------------------- LOAD PDF --------------------
loader = PyPDFLoader(PDF_FILE)
documents = loader.load()

# -------------------- SPLIT TEXT --------------------
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=300,
    separators=["\n\n", "\n", ".", " "]
)
chunks = text_splitter.split_documents(documents)

# -------------------- VECTOR STORE --------------------
embeddings = OllamaEmbeddings(model=EMBED_MODEL)
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=DB_DIR
)

# -------------------- RETRIEVER --------------------
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k":15})

# -------------------- LLM --------------------
llm = ChatOllama(model=LLM_MODEL, temperature=0)

# -------------------- PDF-ONLY PROMPT --------------------
prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a PDF question-answering assistant.\n"
     "Answer ONLY using the provided context below.\n"
     "Do NOT use any external knowledge or assumptions.\n"
     "If the answer is not in the context, say exactly:\n"
     "'I could not find the answer in the provided document.'\n\n"
     "Context:\n{context}"
    ),
    ("human", "{input}")
])

# -------------------- RAG CHAIN --------------------
qa_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, qa_chain)

# -------------------- QUERY --------------------
query = "Explain the results section in the PDF"

response = rag_chain.invoke({"input": query})

print("\n🤖 AI Answer (strictly PDF only):")
print(response["answer"])



🤖 AI Answer (strictly PDF only):
The Results section appears to be a table (Table 1) that presents the test scores for various models on two different datasets: Open-Domain QA and TQA-Wiki.

Here's a breakdown of the table:

**Columns:**

* Model: The name of the model being evaluated.
* NQ: The score for the Natural Question (NQ) dataset.
* TQA: The score for the TQA (Task-Oriented Question Answering) dataset.
* WQ: The score for the WikiQuestion (WQ) dataset.
* CT: The score for the Closed-Book Test (CT).

**Rows:**

Each row represents a different model, with its corresponding scores on each of the four datasets.

**Scores:**

The scores are presented as percentages, indicating the percentage of correct answers or performance achieved by each model on each dataset. For example, T5-11B has a score of 34.5% on the NQ dataset and 37.4% on the CT dataset.

**Models:**

Some notable models mentioned in the table include:

* T5-11B: A variant of the T5 model.
* T5-11B+SSM: Another var

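The cell above and the next one both build a Chroma store in the same `./rag_db` directory. If each call stores the chunks under freshly generated IDs (an assumption worth checking for the installed `langchain-chroma` version), every rerun would add the same text again and retrieval could return repeated chunks. One possible way to make ingestion repeatable is to derive a stable ID from each chunk's source, page, and text. The sketch below shows only the ID derivation with the standard library; `stable_chunk_ids` is a hypothetical helper, and the tuples stand in for `doc.metadata["source"]`, `doc.metadata["page"]`, and `doc.page_content`.

```python
import hashlib
from collections import Counter


def stable_chunk_ids(chunks):
    """Derive a deterministic ID per chunk from its source, page, and text.

    `chunks` is any sequence of (source, page, text) tuples. Identical
    tuples get an occurrence counter so IDs stay unique within one batch.
    """
    seen = Counter()
    ids = []
    for source, page, text in chunks:
        key = f"{source}|{page}|{text}"
        occurrence = seen[key]
        seen[key] += 1
        digest = hashlib.sha256(f"{key}|{occurrence}".encode("utf-8")).hexdigest()
        ids.append(digest[:32])
    return ids


# Toy data standing in for real chunks.
example = [
    ("RAG.pdf", 0, "first chunk"),
    ("RAG.pdf", 0, "second chunk"),
    ("RAG.pdf", 0, "first chunk"),  # repeated text, e.g. a running header
]
ids = stable_chunk_ids(example)
print(len(set(ids)) == len(ids))         # IDs are unique within the batch
print(ids == stable_chunk_ids(example))  # IDs are identical across reruns
```

With the notebook's objects, such IDs could be passed through the `ids` argument when building the store, assuming the store overwrites records whose IDs already exist. Chunks produced with a different `chunk_size` would still get different IDs, so a separate directory or collection per configuration is another option.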
In [61]:
# Retrieval only (no LLM)
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

# ---------------- CONFIG ----------------
PDF_FILE = "C:/Users/jaybr/Downloads/RAG.pdf"
DB_DIR = "./rag_db"
EMBED_MODEL = "nomic-embed-text"

# ---------------- LOAD PDF ----------------
loader = PyPDFLoader(PDF_FILE)
documents = loader.load()

# ---------------- SPLIT TEXT ----------------
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=300,
    separators=["\n\n", "\n", ".", " "]
)
chunks = text_splitter.split_documents(documents)

print(f"✅ Split PDF into {len(chunks)} chunks.")

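# ---------------- VECTOR STORE ----------------
# NOTE: DB_DIR is the same directory used by the previous cell, so
# from_documents() adds these chunks on top of the ones already stored;
# rerunning either cell can surface duplicate chunks at query time.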
embeddings = OllamaEmbeddings(model=EMBED_MODEL)

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=DB_DIR
)

# ---------------- RETRIEVE (NO LLM) ----------------
query = "Explain the results described in the PDF"

results = vectorstore.similarity_search(query, k=5)

print("\n📚 Retrieved chunks (NO LLM):\n")
for i, doc in enumerate(results, 1):
    print(f"--- Chunk {i} ---")
    print(doc.page_content)
    print("-" * 80)


✅ Split PDF into 50 chunks.

📚 Retrieved chunks (NO LLM):

--- Chunk 1 ---
Table 2 shows that RAG-Token performs better than RAG-Sequence on Jeopardy question generation,
with both models outperforming BART on Q-BLEU-1. 4 shows human evaluation results, over 452
pairs of generations from BART and RAG-Token. Evaluators indicated that BART was more factual
than RAG in only 7.1% of cases, while RAG was more factual in 42.7% of cases, and both RAG and
BART were factual in a further 17% of cases, clearly demonstrating the effectiveness of RAG on
--------------------------------------------------------------------------------
--- Chunk 2 ---
Table 2 shows that RAG-Token performs better than RAG-Sequence on Jeopardy question generation,
with both models outperforming BART on Q-BLEU-1. 4 shows human evaluation results, over 452
pairs of generations from BART and RAG-Token. Evaluators indicated that BART was more factual
than RAG in only 7.1% of cases, while RAG was more factual in 42.7% o