### RAG with PDFs 📄: Extracting Data to Give Context to an LLM 🧠

In [48]:
%pip install pypdf

Note: you may need to restart the kernel to use updated packages.





In [49]:
from dotenv import load_dotenv
load = load_dotenv('./../.env')


In [50]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    base_url="http://localhost:11434",
    model="qwen2.5:3b",
    temperature=0.5,
    num_predict=250,  # Ollama's name for the maximum number of tokens to generate
)

### 1. Extracting the PDF files

In [51]:
from langchain_community.document_loaders import PyPDFLoader

pdf1 = "./attention.pdf"
pdf2 = "./LLMForgetting.pdf"
pdf3 = "./TestingAndEvaluatingLLM.pdf"

pdfFiles = [pdf1, pdf2, pdf3]

documents = []

for pdf in pdfFiles:
    loader = PyPDFLoader(pdf)
    documents.extend(loader.load())

print(len(documents))

253


In [52]:
print(documents[:2])

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': './attention.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle Brain\nlukas

### 2. Text Splitting into Chunks

In [53]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,
)

all_splits = text_splitter.split_documents(documents)

len(all_splits)

640
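
The `chunk_size` and `chunk_overlap` arguments above control how much text each chunk holds and how much neighbouring chunks share. As a rough illustration only, the sketch below uses a fixed-width sliding window. `RecursiveCharacterTextSplitter` itself prefers to break on separators such as paragraphs and newlines, so its chunks will differ. The helper name `sliding_window_chunks` and the sample string are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Return (start_index, chunk) pairs using a fixed-width window.

    Simplified stand-in for illustration; this is not LangChain's algorithm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for start, chunk in sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(start, chunk)
```

Each chunk begins with the last `chunk_overlap` characters of the previous one, which is what lets a sentence cut at a chunk boundary still appear whole in one of the two chunks. The start offset plays the same role as the `start_index` metadata requested with `add_start_index=True`.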

### 3. Embedding

In [54]:
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama3.2:3b")

vector_1 = embeddings.embed_query(all_splits[0].page_content)
vector_2 = embeddings.embed_query(all_splits[1].page_content)

assert len(vector_1) == len(vector_2)
len(vector_1), len(vector_2)

(3072, 3072)
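
Each chunk is now represented by a vector of 3072 floats, and retrieval works by comparing a query vector against the chunk vectors. One common comparison is cosine similarity. Whether a given vector store uses cosine or another distance depends on its configuration, so treat the following as a sketch of the idea rather than what happens internally. The function name and toy vectors are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real 3072-dimensional embeddings.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

With real embeddings, the same function could be applied to `vector_1` and `vector_2` from the cell above.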

### 4. Vector Stores

In [55]:
#%pip install -qU langchain-chroma

In [56]:
from langchain_chroma import Chroma

vector_store = Chroma.from_documents(
    documents = all_splits,
    embedding=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

### 5. Retrieving from the Persistent Vector Store

In [57]:
from langchain_chroma import Chroma


vector_store = Chroma(persist_directory='./chroma_langchain_db', embedding_function=embeddings)

result = vector_store.similarity_search("What is Bias testing", k=3)

for doc in result:
    print(doc.page_content)

60 CHAPTER 4. LOGICAL REASONING CORRECTNESS
the following challenges: 1) If an LLM concludes correctly, it is unclear
whether the response stems from reasoning or merely relies on simple
heuristics such as memorization or word correlations (e.g., “dry floor”
is more likely to correlate with “playing football”). 2) If an LLM
fails to reason correctly, it is not clear which part of the reasoning
process it failed (i.e., inferring not raining from floor being dry or
inferring playing football from not raining). 3) There is a lack of
a system that can organize such test cases to cover all other formal
reasoning scenarios besides implication, such as logical equivalence
(e.g., If A then B, if B then A; therefore, A if and only if B). 4)
Furthermore, understanding an LLM’s performance on such test cases
provides little guidance on improving the reasoning ability of the
LLM. To better handle these challenges, a well-performing testing
60 CHAPTER 4. LOGICAL REASONING CORRECTNESS
the following 

In [58]:
result = vector_store.similarity_search_with_score("What are the types of LLM Testing")

result[0]

(Document(id='46fe7072-2837-4ea5-94ea-0a3342a41109', metadata={'author': '', 'creationdate': '2025-01-07T01:36:50+00:00', 'creator': 'LaTeX with hyperref', 'keywords': '', 'moddate': '2025-01-07T01:36:50+00:00', 'page': 9, 'page_label': '10', 'producer': 'pdfTeX-1.40.25', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'source': './LLMForgetting.pdf', 'start_index': 0, 'subject': '', 'title': '', 'total_pages': 15, 'trapped': '/False'}, page_content='Under review\nFigure 6: The performance of general knowledge of the BLOOMZ-7.1b and LLAMA-7b\nmodel trained on the instruction data and the mixed data. The dashed lines refers to the\nperformance of BLOOMZ-7.1b and LLAMA-7B and the solid ones refer to those of mixed-\ninstruction trained models.\nincreases to 3b, BLOOMZ-3b suffers less forgetting compared to mT0-3.7B. For example, the\nFG value of BLOOMZ-3b is 11.09 which is 5.64 lower than that of mT0-3.7b. These results\nsugges
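
`similarity_search_with_score` returns `(document, score)` pairs. With the Chroma integration the score is, as far as I know, a distance, so lower values mean closer matches. This is worth confirming against the installed version. Under that assumption, a minimal sketch of discarding weak matches before they reach the LLM could look like this. The helper name, the toy pairs, and the cutoff of `1.0` are all made up for illustration.

```python
def keep_close_matches(scored_results, max_distance):
    """Keep (item, distance) pairs with distance <= max_distance, closest first."""
    kept = [(item, distance) for item, distance in scored_results if distance <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Plain strings stand in for Document objects; the distances are invented.
toy_results = [
    ("chunk about bias testing", 0.42),
    ("chunk about GNN baselines", 1.31),
    ("chunk about translation tests", 0.87),
]

for text, distance in keep_close_matches(toy_results, max_distance=1.0):
    print(f"{distance:.2f}  {text}")
```

A sensible cutoff depends on the embedding model and distance metric, so it is best chosen by inspecting scores for a few queries with known answers.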

### 6. Retrievers in LangChain

In [59]:
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs = {"k": 1}
)

retriever.batch(
    [
        "What is the Bias Measurement",
        "How to test human safety against LLM",
        "How LLM forgets the context"
    ]
)


[[Document(id='79dfb367-7668-4d52-8327-45d2989d0bbf', metadata={'author': '', 'creationdate': '2024-10-22T16:30:56+00:00', 'creator': 'LaTeX with hyperref', 'keywords': '', 'moddate': '2024-10-22T16:30:56+00:00', 'page': 44, 'page_label': '36', 'producer': 'pdfTeX-1.40.26', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024) kpathsea version 6.4.0', 'source': './T2320074_Thesis Report.pdf', 'start_index': 0, 'subject': '', 'title': '', 'total_pages': 78, 'trapped': '/False'}, page_content='suggests two distinct design approaches. This data highlights a clear hierarchy of\nroom importance and reveals how different spaces are prioritized in home designs,\nwith some rooms being optional in many layouts.\nThe final annotated graph data, selected chronologically due to resource constraints,\nrepresents a more focused version of the initial filtered graphs, though some informa-\ntionfromtheinitialdatasetmayhavebeenmissed. Whilethefinaldatasetmaintains\nthe cen
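
The first result above comes from `./T2320074_Thesis Report.pdf`, which is not one of the three PDFs loaded in step 1. One possible explanation is that `./chroma_langchain_db` already held chunks from an earlier run, so the new chunks were added alongside them. A quick way to spot this is to compare each retrieved document's `source` metadata with the files that were actually loaded. The sketch below uses plain dictionaries in place of LangChain `Document` objects (which expose `.metadata` as an attribute instead), and the helper name is hypothetical.

```python
def split_by_expected_source(docs, expected_sources):
    """Separate docs into (known, unexpected) based on metadata['source']."""
    expected = set(expected_sources)
    known, unexpected = [], []
    for doc in docs:
        source = doc.get("metadata", {}).get("source", "Unknown")
        if source in expected:
            known.append(doc)
        else:
            unexpected.append(doc)
    return known, unexpected


loaded_pdfs = ["./attention.pdf", "./LLMForgetting.pdf", "./TestingAndEvaluatingLLM.pdf"]

# Dictionaries standing in for retrieved Document objects.
retrieved = [
    {"metadata": {"source": "./T2320074_Thesis Report.pdf"}, "page_content": "..."},
    {"metadata": {"source": "./LLMForgetting.pdf"}, "page_content": "..."},
]

known, unexpected = split_by_expected_source(retrieved, loaded_pdfs)
print("unexpected sources:", [d["metadata"]["source"] for d in unexpected])
```

If unexpected sources show up, pointing `persist_directory` at a fresh, empty directory before rebuilding the store is one way to rule out leftovers from previous runs.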

### Manual Document Retrieval

In [60]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

query = "What exactly does Testing the Factual Correctness of LLM tells"

# To test if model is answering without context.
#query = "can burgers be made better with llms"

retrieved_docs = retriever.invoke(query)  # get_relevant_documents() is deprecated

context_text = "\n\n".join([doc.page_content for doc in retrieved_docs])

prompt_template = ChatPromptTemplate.from_template(
    """
    You are an AI assistant. Use the following context to answer the question correctly.
    If you don't know the answer, just say "I don't know."

    Also, summarize the response in Markdown format.

    Context: {context}

    Question: {question}

    AI answer:
    """
)

chain = prompt_template | llm | StrOutputParser()

response = chain.invoke({"context": context_text, "question": query})

print(response)

I don't know.

The provided context discusses reproducibility and comparability issues in evaluating Graph Neural Networks (GNN) models across various datasets. It mentions a study by Errica et al. (2020) where they re-evaluated five state-of-the-art GNN models on nine different benchmark data-sets from chemical and social domains. The key points include structure-agnostic baselines that outperform many traditional GNN models on specific types of datasets, such as D & D for chemical domains.

The question "What exactly does Testing the Factual Correctness of LLM tells?" is not addressed in the provided context. Therefore, I don't have enough information to answer this question based on the given text.


### Using LangChain Hub for the Prompt

In [61]:
from langchain_core.output_parsers import StrOutputParser
from langchain import hub

query = "How to test Translation in LLM?"

retrieved_docs = retriever.invoke(query)  # get_relevant_documents() is deprecated

context_text = "\n\n".join([doc.page_content for doc in retrieved_docs])

prompt = hub.pull("rlm/rag-prompt")

chain = prompt | llm | StrOutputParser()

response = chain.invoke({"context": context_text, "question": query})

print(response)

To test translation in an LLM (Large Language Model), you can add a specific instruction at the beginning of the prompt, such as "Answering starts with 'Yes' or 'No'". This helps guide the model on what type of response to produce. You then translate the expected response into English using Google Translate and feed both the translated prompt and the response back into the LLM for evaluation.


### Retrieving data using RetrievalQA

In [62]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, return_source_documents=True)

question = "What exactly does Testing the Factual Correctness of LLM tells"

# To test if model is answering without context.
#question = "can burgers be made better with llms"


response = qa_chain.invoke(question)

sources = set(doc.metadata.get("source", "Unknown") for doc in response["source_documents"])

print(response['result'])
print("\n📕 Sources Used:")
for source in sources:
    print(f"- {source}")

I don't know. The context provided does not contain any information about testing the factual correctness of an Large Language Model (LLM).

📕 Sources Used:
- ./T2320074_Thesis Report.pdf
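
The cell above collects sources with `set(...)`, which removes duplicates but does not keep retrieval order. When several chunks are retrieved (a larger `k`), listing sources in the order they were first seen can make the output easier to relate to relevance ranking. A small sketch, using plain dictionaries as stand-ins for the entries in `response["source_documents"]` and a hypothetical helper name:

```python
def unique_sources_in_order(docs):
    """Return each metadata['source'] once, in first-seen order."""
    sources = (doc.get("metadata", {}).get("source", "Unknown") for doc in docs)
    # dict preserves insertion order, so fromkeys() de-duplicates without reordering.
    return list(dict.fromkeys(sources))


# Dictionaries standing in for retrieved Document objects.
example_docs = [
    {"metadata": {"source": "./TestingAndEvaluatingLLM.pdf"}},
    {"metadata": {"source": "./attention.pdf"}},
    {"metadata": {"source": "./TestingAndEvaluatingLLM.pdf"}},
    {"metadata": {}},
]

for source in unique_sources_in_order(example_docs):
    print(f"- {source}")
```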
