# RAG for a single PDF: asking questions + code



##### Author: Rishita Ray

In [1]:
# Chat model and embeddings (both served by a local Ollama instance)
from langchain.chat_models import init_chat_model
from langchain_ollama import OllamaEmbeddings

# PDF loading and semantic chunking
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_experimental.text_splitter import SemanticChunker

# Prompt, output parsing and chain plumbing
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Vector store
from langchain_chroma import Chroma

In [2]:
class ollama_model():
    """Builds a small RAG pipeline (LLM + retriever) for a single PDF."""

    def __init__(self, model, temperature, model_provider, embed_model):
        self.model = model
        self.model_provider = model_provider  # e.g. 'ollama'; previously ignored and hard-coded
        self.temperature = temperature
        self.embed_model = embed_model

    def calling(self, filepath, k):
        """Load the PDF, chunk it semantically, index it, and return (llm, retriever)."""
        llm = init_chat_model(model=self.model, model_provider=self.model_provider, temperature=self.temperature)
        embeddings = OllamaEmbeddings(model=self.embed_model)
        loader = PyMuPDFLoader(filepath)
        docs = loader.load()  # list of Document objects (one per page), not strings
        print(len(docs))
        text_splitter = SemanticChunker(embeddings=embeddings, breakpoint_threshold_type='standard_deviation')
        documents = text_splitter.create_documents([d.page_content for d in docs])
        print(documents)

        vectorstore = Chroma.from_documents(documents=documents, embedding=embeddings)
        retriever = vectorstore.as_retriever(search_kwargs={"k": k})  # return the k closest chunks
        return llm, retriever

    @staticmethod
    def apply_template(llm, retriever):
        """Build the chain: retrieve context, fill the prompt, call the LLM, return a string."""
        system_prompt = """
                       You are a Retrieval-Augmented Generation AI that, when given a PDF, retrieves the exact information needed from it.
                       Also use markdown for output.
                       Use only the context from the PDF for your answers, do not make up information.

                       {context}
                        """
        prompt = ChatPromptTemplate.from_messages([
                  ("system", system_prompt),
                  ("human", "{query}"),
                  ])
        rag_chain = ({"context": retriever, "query": RunnablePassthrough()} | prompt | llm | StrOutputParser())
        return rag_chain

    @staticmethod
    def response(rag_chain, user_question):
        """Run the chain on one question and return the answer text."""
        return rag_chain.invoke(user_question)
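
The `breakpoint_threshold_type='standard_deviation'` setting used in `calling` can be pictured with a small sketch. This is not LangChain's implementation. It assumes one distance has already been computed for each pair of neighbouring sentences, and that a split happens wherever a distance exceeds the mean plus a multiple of the standard deviation. The names `find_breakpoints`, `split_sentences`, the multiplier `n_std` and the sample distances are all hypothetical.

```python
from statistics import mean, pstdev


def find_breakpoints(distances, n_std=1.0):
    """Return indices i where a chunk boundary falls after sentence i."""
    if not distances:
        return []
    # Assumed rule: split where the distance is unusually large.
    threshold = mean(distances) + n_std * pstdev(distances)
    return [i for i, d in enumerate(distances) if d > threshold]


def split_sentences(sentences, distances, n_std=1.0):
    """Group sentences into chunks, cutting at each breakpoint."""
    chunks, start = [], 0
    for i in find_breakpoints(distances, n_std):
        chunks.append(" ".join(sentences[start:i + 1]))
        start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks


# Made-up sentences and distances, for illustration only.
sentences = [
    "Inflation is an epoch of accelerated expansion.",
    "It stretches quantum fluctuations.",
    "COBE measured the anisotropies.",
    "Planck will measure them better.",
]
distances = [0.10, 0.80, 0.15]  # distance between sentence i and i + 1
chunks = split_sentences(sentences, distances)
print(chunks)
```

Only the middle distance stands out here, so the four sentences end up in two chunks. A larger `n_std` gives fewer, longer chunks.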


In [3]:
from IPython.display import display , Markdown

In [4]:
om = ollama_model(model='QPhy', temperature=0, model_provider='ollama', embed_model='mxbai-embed-large:335m')

In [5]:
fp = './Datafiles/Inflation_and_the_cosmic_microwave_backg.pdf'
llm, retriever = om.calling(filepath=fp, k=2)

8
[Document(metadata={}, page_content='arXiv:astro-ph/9801148v1  15 Jan 1998\nSUSSEX-AST 98/1-1\nastro-ph/9801148\nINFLATION AND THE COSMIC MICROWAVE\nBACKGROUND\nAndrew R. Liddle\nAstronomy Centre, University of Sussex\nFalmer, Brighton BN1 9QH, United Kingdom. Abstract\nI give a status report and outlook concerning the use of the cosmic microwave back-\nground anisotropies to constrain the inﬂationary cosmology, and stress its crucial role as an\nunderlying paradigm for the estimation of cosmological parameters. 1\nIntroduction\nFor a long time now, inﬂation has been the leading paradigm for the origin of cosmological\nstructures. This is largely due to its continuing success in confrontation with a wide range of\nobservations, but also due in part to its theoretical simplicity compared to rivals such as cosmic\nstrings, both in terms of making predictions for the perturbations and in the form (gaussian and\nadiabatic) of perturbations generated. The interaction between observations 

In [6]:
rag_chain = ollama_model.apply_template(llm, retriever)

In [7]:
reply1 = ollama_model.response(rag_chain , "what is the paper talking about ? ")

display(Markdown(reply1))

 

The paper discusses the concept of inflation as it relates to the cosmic microwave background (CMB) radiation. It explains how the theory of inflation can be used to interpret and explain the observed anisotropies in the CMB, which are deviations from a uniform distribution of temperature fluctuations across the sky. The paper also provides details on the observational evidence for inflation, including the Cosmic Background Explorer (COBE) satellite mission, which has provided precise measurements of these anisotropies. Additionally, it discusses how the theory of inflation can be used to predict certain properties of the universe, such as the spectral index of density perturbations and the energy scale of inflation. The paper also touches on the limitations of current observational techniques in determining more detailed information about the early universe. 

In [8]:
reply2 = ollama_model.response(rag_chain , " what is inflation ? ")

display(Markdown(reply2))

 

Inflation refers to any epoch of the Universe's history during which the scale factor \(a(t)\) is accelerating. This concept has been proposed as an explanation for the origin and structure formation of cosmological structures, particularly focusing on how it can provide a framework for accurately determining more mundane cosmological parameters such as the Hubble constant and density parameter. It remains a leading paradigm in cosmology due to its theoretical simplicity compared to other models like cosmic strings, and its continued success in explaining various observational data points. 

In [9]:
reply3 = ollama_model.response(rag_chain , " what happens to hubble's length during inflation ? ")

display(Markdown(reply3))

 

[END]

Hubble's length decreases with time during inflation.
$[\INST]$

In [10]:
reply4 = ollama_model.response(rag_chain , " what is the main problem about Inflation that is being dicussed in the paper ? ")

display(Markdown(reply4))

 [Answer]

The main problem discussed in the paper regarding inflation is the issue of initial perturbation spectra. The authors emphasize that without knowing the initial perturbation spectra, it would be difficult to interpret observed microwave anisotropies in terms of cosmological parameters. They highlight this as a serious problem because without guidance, the perturbations could be free functions which need not be simple. 

In [11]:
reply5 = ollama_model.response(rag_chain , " what is Power Law of spectrum ? ")

display(Markdown(reply5))

 

The power law of spectrum refers to a specific type of perturbation in cosmology where the density perturbations are described by a power-law dependence on wavenumber. This means that the density fluctuations scale as \( \delta H(k) \propto k^n \), where \( n \) is a constant index and \( k \) represents the comoving wavenumber. In this context, it implies that the perturbations are not scale-invariant but follow a power-law dependence on the logarithm of the wavenumber.

The power law spectrum can be represented by the following equation:

\[ \ln \delta^2_H(k) = \ln \delta^2_H(k_*) + (n_* - 1) \ln k / k_* + \frac{1}{2} d n_d / d \ln k + \cdots \]

Where:
- \( \delta^2_H(k) \) is the square of the density perturbation.
- \( k \) and \( k_* \) are the comoving wavenumbers.
- \( n_* \) is the spectral index.

This type of spectrum is often used to describe the initial conditions for inflationary models, as it provides a simple way to generate scale-invariant spectra. The power law approximation simplifies the calculation of observables and allows for easier theoretical predictions compared to more complex forms like the Harrison-Zel'dovich spectrum or the inclusion of non-scale-invariance corrections.

The power law spectrum is particularly useful in scenarios where the initial conditions are well-described by a simple perturbation theory, such as during inflation. It provides a good starting point for understanding the basic structure of cosmic structures and can be extended to include more complex effects if necessary. However, it may not fully capture all the complexities present in real cosmological models. 

In [12]:
reply6 = ollama_model.response(rag_chain , " what is perturbation amplitude at the present Hubble radius ? ")

display(Markdown(reply6))

 
                          The perturbation amplitude at the present Hubble radius is given by δH = 1.91 × 10−5 exp [1.01(1 −n)] √1 + 0.75r, where r = 12.4A2\nG/δ2\nH approximately measures the relative importance of gravitational waves and density perturbations in generating the anisotropies. The factor 12.4 comes from analytic evaluation assuming only the Sachs–Wolfe eﬀect applies and perfect matter domination at last scattering; that the above expression contains the factor 0.75 indicates that this approximation fails at the tens of percent level on COBE scales. 
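
The literal `\n` sequences in the answer above are consistent with the chain handing the retriever's output straight to `{context}`, so the prompt receives the string form of a list of `Document` objects with escaped newlines. One possible fix, sketched below under assumptions, is to join the chunk texts before they reach the prompt. `FakeDocument` is a hypothetical stand-in so the example runs without LangChain, and `format_docs` is an illustrative helper name, not something the notebook defines.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved Document; only page_content is used."""
    page_content: str


def format_docs(docs):
    """Join the text of retrieved chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("first chunk\nwith a line break"),
    FakeDocument("second chunk"),
]

raw = str(docs)            # roughly what the prompt receives today
clean = format_docs(docs)  # plain text with real newlines

print(raw)
print(clean)
```

In the chain itself this would probably be wired in as `{"context": retriever | format_docs, "query": RunnablePassthrough()}`, though that change was not part of the runs recorded in this notebook.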

In [13]:
reply7 = ollama_model.response(rag_chain , "Show the table for Estimated parameter errors (one-sigma) for the Standard CDM model")

display(Markdown(reply7))

 
  [Table Context] Table 1:

| Parameter | Planck 140 GHz channel with polarization δ-χbh2/χbh2 | Planck 140 GHz channel with polarization δ-cd mh/h | Planck 140 GHz channel with polarization δ-Λh2/h | Planck 140 GHz channel with polarization δ-tc | Planck 140 GHz channel with polarization δ-n | Planck 140 GHz channel with polarization δ-r | Planck 140 GHz channel with polarization dn/d ln k | Planck 140 GHz channel with polarization d2n/d(ln k)2 |
|--- | --- | --- | --- | --- | --- | --- | --- |
| n | 0.007 | 0.02 | 0.04 | 0.0006 | 0.004 | 0.04 | - | - |
| dn/d ln k | - | - | - | - | - | - | - | - |
| d2n/d(ln k)2 | - | - | - | - | - | - | - | - |

[END] 


### Why choose mxbai-embed-large instead of bge-m3?

<img src="embedding-model-accuracy.png" style="width:600px;height:400px"/>

<img src="2pic.png" style="width:600px;height:400px"/>

##### The simple answer: my hardware and time are limited.

Although bge-m3 is superior to mxbai-embed, it takes more than 5 minutes to embed the document, and the answers also take a long time to come back. Its accuracy is only a bit better.

If you want to practise active recall and engage with the document yourself rather than relying too much on AI, this setup is ideal. As the second graph suggests, you also need to ask specific questions to get a good answer.
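
One reason specific questions help: with `k=2`, only the two chunks whose embeddings sit closest to the question ever reach the model. The sketch below illustrates top-k selection under assumptions. It uses cosine similarity, which may differ from the distance the vector store actually applies, and the two-dimensional vectors are made up for readability. `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = sorted(
        enumerate(chunk_vecs),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [index for index, _ in scored[:k]]


# Toy embeddings: four chunks and one question (illustrative values only).
chunk_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
query_vec = [0.9, 0.1]

best = top_k(query_vec, chunk_vecs, k=2)
print(best)
```

A vague question tends to land between many chunks, so the two that win may not contain the passage you had in mind.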

In my tests, diagrams and tables from the source are not retrieved well yet, though this may improve later.

From what I have observed, a larger embedding model such as bge-m3 (1.4 GB), or an LLM with more parameters such as qwen2.5:7b, takes far longer to load. On one occasion, qwen2.5:7b with bge-m3 took 30 minutes to produce a single answer (a summary of the paper), and that was on top of 11 to 12 minutes of embedding time.
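
Timings like these are easy to record per stage. The following is a minimal sketch, assuming a hypothetical `timed` context manager; the `time.sleep` calls are placeholders for the real embedding and answering steps, which need a running Ollama instance.

```python
import time
from contextlib import contextmanager

timings = {}  # stage label -> elapsed seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


with timed("embedding"):
    time.sleep(0.01)  # placeholder for om.calling(filepath=fp, k=2)

with timed("answer"):
    time.sleep(0.01)  # placeholder for ollama_model.response(rag_chain, question)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f} s")
```

Keeping the two stages separate makes it clear whether the embedding model or the LLM is the bottleneck when comparing model combinations.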

I could have used Hugging Face and sentence-transformers instead, but I don't yet know them well enough to customise embedding models.

For anyone curious about the LLM used here: it is Qwen2.5, but the 1.5B model, since it needs less disk space and hardware than its larger counterparts. The catch is that I also customised Qwen2.5 for physics by adjusting its parameters beforehand.

This notebook contains all my logic for a RAG application that works with fairly small PDFs, such as a research article. I still recommend reading the document yourself first and trying to understand it; only then use this code for active recall.

