## RAG Prototype for the Life of QUAID E AZAM

This section loads the PDFs and splits them into chunks.

In [1]:
from typing import List, Dict, Optional
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

class SmartPDFProcessor:
    """PDF loading, text cleaning, and chunking with custom metadata injection"""

    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            separators=[" "],
        )

    def process_pdf(
        self, 
        pdf_path: str, 
        custom_metadata: Optional[Dict[str, str]] = None
    ) -> List[Document]:
        """Process PDF with smart chunking and metadata enhancement"""

        # Load PDF
        loader = PyPDFLoader(pdf_path)
        pages = loader.load()

        processed_chunks = []

        for page_num, page in enumerate(pages):
            cleaned_text = self._clean_text(page.page_content)

            # Skip nearly empty pages
            if len(cleaned_text.strip()) < 50:
                continue

            # Step 1: Page details FIRST
            metadata = {
                "page": page_num + 1,
                "total_pages": len(pages),
                "chunk_method": "smart_pdf_processor",
                "char_count": len(cleaned_text),
            }

            # Step 2: Add PDF metadata (if any); shared keys such as "page" override Step 1
            if page.metadata:
                metadata.update(page.metadata)

            # Step 3: Add custom metadata (overrides everything if same key)
            if custom_metadata:
                metadata.update(custom_metadata)

            # Create chunks
            chunks = self.text_splitter.create_documents(
                texts=[cleaned_text],
                metadatas=[metadata]
            )

            processed_chunks.extend(chunks)

        return processed_chunks

    def _clean_text(self, text: str) -> str:
        # Collapse runs of whitespace (including newlines) into single spaces
        text = " ".join(text.split())

        # Fix common PDF extraction issues (ligature glyphs)
        text = text.replace("ﬁ", "fi")
        text = text.replace("ﬂ", "fl")

        return text
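
The two `replace` calls in `_clean_text` cover only the `ﬁ` and `ﬂ` ligatures. A broader alternative, sketched here as an assumption rather than what the notebook does, is Unicode NFKC normalization from the standard library, which also expands `ﬀ`, `ﬃ`, and `ﬄ`. NFKC rewrites other compatibility characters as well (superscript digits, for example), so it may be too aggressive for some texts. The function name `clean_text_nfkc` is hypothetical.

```python
import unicodedata


def clean_text_nfkc(text: str) -> str:
    """Hypothetical variant of _clean_text using NFKC normalization."""
    # NFKC maps compatibility characters such as ligatures to plain letters
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace (including newlines) into single spaces
    return " ".join(text.split())


print(clean_text_nfkc("The ﬁrst  ﬂoor\noﬃce"))
```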

In [2]:
processor = SmartPDFProcessor()
Processed_chunks_1 = processor.process_pdf('/home/hammadali08/Personal/FYP Datasets/My Brother by Fatima Jinnah.pdf')
Processed_chunks_2 = processor.process_pdf('/home/hammadali08/Personal/FYP Datasets/Jinnah of Pakistan.pdf')
Processed_chunks_3 = processor.process_pdf('/home/hammadali08/Personal/FYP Datasets/First phase of struggle.pdf')

In [3]:
print(Processed_chunks_1[234].page_content)
print(Processed_chunks_1[234].metadata)

now firmly in his grip. His exemplary conduct as a Magistrate won him praise from his superiors, and when the period of the temporary appointment that he was holding was over, Sir Charles Ollivant offered him another and better judicial appointment on Rs. 1,500 a month, a princely salary then. "No, thank you, Sir", he replied. "I will soon be able to earn that much in a single day", was his firm retort. As soon as he resigned his post as acting Presidency Magistrate, he was approached by a number of people to act as their lawyer. He gave up his small room in the Apollo Hotel and took a modest apartment in the Apollo Bunder area, got it tastefully decorated and furnished, and opened a new office in a building, where some leading lawyers had their offices. He spared no money within his limited income, in converting his office into an elegant and attractive Chamber, which any lawyer would be proud to own. His feet were now set firmly on the ladder of success, and he sent letters and
{'pag

In [4]:
print(Processed_chunks_2[234].page_content)
print(Processed_chunks_2[234].metadata)

politically frustrating, the future never seemed as promising to both of them as it did that winter at the start of 1919. The Muslim League had appointed Jinnah to lead a deputation to Prime Minister Lloyd George that year to plead for at least one Muslim delegate to the forthcoming Paris Peace Conference. Most Indian Muslims felt, as League President A. K. Fazlul Haq put it, that "Muslim countries are now the prey of the land-grabbing propensities of the Christian nations, in spite of the solemn pledges given by these very nations that the World War was being fought for the protection of the rights of the small and defenseless minorities." 129 Sir Satyendra P. Sinha and the Maharaja of Bikaner (1880- 1943) had been appointed to represent India at the Imperial War Conference in 1917, but since neither was Muslim, the League feared that Islamic interests were being shortchanged or ignored. With the Ali brothers and other popular Khilafat leaders, including Delhi's scholarly devout
{'pag

In [5]:
print(Processed_chunks_3[234].page_content)
print(Processed_chunks_3[234].metadata)

Saya Babaji on 25 March 1896. The accused was defended by Rustom Wadia. After allegedly murdering Saya Babaji with a "prolonged" instrument belonging to a female flower-seller, the accused had surrendered to the police. The police, considering him a lunatic, sent him to the Colaba 283 Ibid. 284 Ibid. 285 BG, 15 Feb. 1901.
{'page': 63, 'total_pages': 225, 'chunk_method': 'smart_pdf_processor', 'char_count': 2715, 'producer': 'Nuance PDF Create 8', 'creator': 'Microsoft Word - New.docx', 'creationdate': '2020-05-20T01:40:17-07:00', 'author': 'Riaz Ahmad', 'dateofpublication': '1985', 'digitizedby': 'M. H. Panhwar Institute of Sindh Studies, Jamshoro.', 'keywords': '', 'language': 'English', 'moddate': '2023-12-18T12:36:54+05:00', 'publishedby': 'Riaz Ahmad, Islamabad.', 'reproducedby': 'Sani Hussain Panhwar', 'scannedby': 'Aziz Khan', 'title': 'Quaid-i-Azam Mohammad Ali Jinnah: First Phase of his Freedom Struggle - The Formative Years (1892-1920)', 'website': 'lib.sindh.org', 'source': '

### Sentence Transformer by Huggingface

In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='sentence-transformers/all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)
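
The printed model ends with a `Normalize()` layer after a 384-dimensional pooling layer, so each embedding comes out as a unit-length vector. For unit vectors, cosine similarity reduces to a plain dot product. The sketch below illustrates that with small hand-written vectors in pure Python; the vectors and helper names are illustrative assumptions, not values produced by the model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length, as a Normalize() layer would."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# After normalization the dot product equals the cosine similarity
print(round(dot(a_unit, b_unit), 6), round(cosine(a, b), 6))
```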

### Storing Vectors in ChromaDB

In [7]:
from langchain_community.vectorstores import Chroma
persist_directory = 'ChromaDB'
vectordb = Chroma.from_documents(documents=Processed_chunks_1, 
                                 embedding=embeddings,
                                 persist_directory=persist_directory)
vectordb.persist()

In [8]:
## Adding other processed Chunks
vectordb.add_documents(documents=Processed_chunks_2)
vectordb.add_documents(documents=Processed_chunks_3)

['cbed18dc-8429-4da1-ab93-5ecd0c6d71b9',
 '13438e96-d60e-4118-b073-13771b68d874',
 '92c3ee22-f1b3-4f64-8712-788c58829a1f',
 '301c428e-9d44-4810-bd77-51b4900fc529',
 '2ffd048f-770f-4be8-bfe7-5fb975d8e13a',
 '5588565a-d040-4e75-908e-d95ffe3d1283',
 'b8830af2-d278-4c8a-bbed-9c7ab8936f76',
 '42366b53-b530-4aa2-9d4e-82a3a5dd648c',
 '3156c9a6-ab1c-4b0a-8dfe-5187cccb4f4e',
 'a67d28c2-01e8-49d8-b42e-9a7cf9938c5a',
 'ee1f6e8c-fe04-4e40-8221-91cbc547781e',
 '0d4ac406-8aec-4c0a-8b17-1423123e0a3b',
 '5ccd1dcb-d4e8-4053-99fa-b4c0831cefc1',
 '8b7aceca-b88c-416e-8abe-5d7916c1b5aa',
 '7065bb89-91c5-4b75-89cb-ddf39f45327e',
 '0ce977e0-4374-4778-b364-b0e9bc87657f',
 'fd64a187-7be7-498a-a3ca-e59a42422e91',
 '4857b49b-798b-4ba1-8d51-0c0e020e86ea',
 '839082a6-7b38-4247-9cfc-4a271f001672',
 'ac10ecd1-91fe-4d23-9744-0fb7be5e49a6',
 'c00c481c-9e40-4ab1-9971-af90d4b6cb89',
 'e1ed7a51-703f-4c75-bf7f-4af38d9243e6',
 'e965d9e2-30d6-4906-9cdd-aabc4c612fa5',
 '67da4b90-911d-4cee-a6da-d5b13a1703c7',
 '464dad96-55c3-
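
`add_documents` returns the randomly generated IDs shown above, so re-running the cell against the persisted `ChromaDB` directory would presumably store every chunk a second time. One possible guard, sketched here under assumptions, is to derive a stable ID from each chunk's source, page, and text and pass the list through the `ids` argument of `add_documents`. The helper `make_chunk_id` is hypothetical, and how a repeated ID is handled depends on the installed Chroma integration.

```python
import hashlib


def make_chunk_id(source: str, page: int, text: str) -> str:
    """Hypothetical helper: stable ID derived from where a chunk came from."""
    key = f"{source}|{page}|{text}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


# Stand-in values; in the notebook they would come from chunk.metadata
# and chunk.page_content
id_first = make_chunk_id("book.pdf", 14, "of gigantic magnitude.")
id_again = make_chunk_id("book.pdf", 14, "of gigantic magnitude.")
id_other = make_chunk_id("book.pdf", 15, "of gigantic magnitude.")

print(id_first == id_again, id_first == id_other)
```

In the notebook, the list of IDs could be built with a comprehension over `Processed_chunks_2`, reading `source` and `page` from each chunk's metadata.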

### Testing with some similarity searches

In [9]:
vectordb.similarity_search('When Quaid e Azam dies?',k=5)

[Document(metadata={'reproducedby': 'Sani Hussain Panhwar', 'publishedby': 'Sharif al Mujahid, Karachi.', 'website': 'lib.sindh.org', 'digitizedby': 'M. H. Panhwar Institute of Sindh Studies, Jamshoro.', 'creator': 'Microsoft Word - Cover.docx', 'title': 'My Brother', 'page': 14, 'creationdate': '2020-05-20T01:40:17-07:00', 'author': 'Fatima Jinnah', 'page_label': '15', 'dateofpublication': '1986', 'moddate': '2023-12-22T09:39:48+05:00', 'chunk_method': 'smart_pdf_processor', 'scannedby': 'Aziz Khan', 'language': 'English', 'char_count': 3127, 'source': '/home/hammadali08/Personal/FYP Datasets/My Brother by Fatima Jinnah.pdf', 'keywords': '', 'producer': 'Nuance PDF Create 8', 'total_pages': 66}, page_content="of gigantic magnitude. As the Head of the State, the task of steering the ship of Pakistan's destiny to a safe harbor fell to his hands that were worn out with work. I watched with sorrow and pain that in his hour of triumph the Quaid-e-Azam was far from being physically fit. He 
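
The metadata above carries `'page': 14` next to `'page_label': '15'`, which suggests that the 1-based `page` set in Step 1 of `process_pdf` was overwritten by the loader's own 0-based `page` key during `metadata.update(page.metadata)`. The sketch below reproduces that merge order with plain dictionaries. The loader metadata shown is a trimmed stand-in, and applying the loader metadata first is only one possible way to keep the 1-based value.

```python
# Trimmed stand-in for the metadata PyPDFLoader attaches to a page
loader_metadata = {"page": 14, "page_label": "15", "title": "My Brother"}
page_num = 14  # 0-based index from enumerate(pages)

# Order used in process_pdf: page details first, loader metadata second
current = {"page": page_num + 1}
current.update(loader_metadata)  # the loader's "page" wins

# Alternative order: loader metadata first, page details second
alternative = dict(loader_metadata)
alternative.update({"page": page_num + 1})  # the 1-based "page" wins

print(current["page"], alternative["page"])
```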

In [10]:
vectordb.similarity_search_with_score('Who was Fatima Jinnah?',k=4)

[(Document(metadata={'digitizedby': 'M. H. Panhwar Institute of Sindh Studies, Jamshoro.', 'publishedby': 'Riaz Ahmad, Islamabad.', 'author': 'Riaz Ahmad', 'source': '/home/hammadali08/Personal/FYP Datasets/First phase of struggle.pdf', 'page': 32, 'website': 'lib.sindh.org', 'char_count': 2860, 'dateofpublication': '1985', 'producer': 'Nuance PDF Create 8', 'scannedby': 'Aziz Khan', 'creator': 'Microsoft Word - New.docx', 'reproducedby': 'Sani Hussain Panhwar', 'language': 'English', 'chunk_method': 'smart_pdf_processor', 'title': 'Quaid-i-Azam Mohammad Ali Jinnah: First Phase of his Freedom Struggle - The Formative Years (1892-1920)', 'moddate': '2023-12-18T12:36:54+05:00', 'creationdate': '2020-05-20T01:40:17-07:00', 'page_label': '33', 'total_pages': 225, 'keywords': ''}, page_content='her sisters and brothers called her Manbai Poofi. In the family she was popular as a great story teller. See Fatima Jinnah, op. cit., pp. 63-64. 145 Rizwan Ahmad, op. cit., p. 65. 146 Fatima Jinnah, 
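
`similarity_search_with_score` returns `(Document, score)` pairs. With the Chroma integration the score is, as far as the LangChain documentation describes it, a distance, so lower values mean closer matches. The sketch below filters and sorts such pairs in pure Python. The pairs are made-up `(text, distance)` stand-ins rather than real results, and `keep_close` and its threshold are hypothetical.

```python
# Made-up (text, distance) pairs standing in for (Document, score) results
results = [
    ("illustrative chunk A", 0.92),
    ("illustrative chunk B", 0.61),
    ("illustrative chunk C", 1.37),
]


def keep_close(pairs, max_distance):
    """Keep pairs at or below max_distance, closest first."""
    kept = [pair for pair in pairs if pair[1] <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


for text, distance in keep_close(results, max_distance=1.0):
    print(f"{distance:.2f}  {text}")
```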

## Adding GROQ LLM

In [11]:
from langchain_groq import ChatGroq
import getpass
import os

# Prompt for the key at runtime so it is never stored in the notebook
os.environ['Groq_api_key'] = getpass.getpass('Groq API key: ')

In [12]:
llm = ChatGroq(model_name="llama-3.1-8b-instant", temperature=0.2, api_key=os.environ['Groq_api_key'])

In [13]:
test_response = llm.invoke('Hello, how are you?').content
test_response

"I'm functioning properly, thank you for asking. I'm a large language model, so I don't have emotions or feelings like humans do, but I'm here to help answer any questions or provide information you might need. How can I assist you today?"

In [14]:
## Initializing the Chat_Model
from langchain.chat_models import init_chat_model
llm = init_chat_model("groq:llama-3.1-8b-instant", temperature=0.2, api_key=os.environ['Groq_api_key'])
print(llm.invoke('Quaid e Azam'))

content='Quaid-e-Azam, also known as Muhammad Ali Jinnah, was a Pakistani politician and statesman who served as the founder of Pakistan and its first Governor-General. He is considered one of the most influential figures in Pakistani history.\n\n**Early Life and Education**\n\nMuhammad Ali Jinnah was born on December 25, 1876, in Karachi, British India, to a Gujarati Muslim family. He studied law at the Inns of Court School of Law in London and was called to the bar in 1896. He returned to India and established a successful law practice in Bombay.\n\n**Politics and Leadership**\n\nJinnah entered politics in the early 20th century and became a key figure in the Indian National Congress. However, he later became disillusioned with the Congress and its leadership, particularly Mahatma Gandhi. In 1937, Jinnah founded the All-India Muslim League, which aimed to protect the rights and interests of Muslims in India.\n\n**Pakistan Movement**\n\nJinnah\'s vision for a separate homeland for Mus

## RAG Chain Using LCEL (LangChain Expression Language)

In [15]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

In [25]:
costum_prompt=ChatPromptTemplate.from_template('''You are an assistant for question answering tasks from the data you gets.
Use the following context to answer the question.
If you do not know the answer just say Sorry! I don't Know.
write the necessary detail according to the given context.
Make sure to give some detail about the book from where the context is been retrieved. Just the book name and the page where this was written.


Context:
{context}

Question:
{question}

Answer:''')

In [26]:
## Format the retrieved documents for the prompt (the job create_stuff_documents_chain does in non-LCEL chains)
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
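
`format_docs` forwards only `page_content`, while the prompt asks the model to name the book and page. That information lives in each chunk's metadata, so the model has to guess it unless the metadata is written into the context. The variant below is a sketch of one way to do that. `format_docs_with_sources` and the `FakeDoc` stand-in are hypothetical, and the `title` and `page_label` keys are assumed from the metadata printed earlier.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Minimal stand-in for a LangChain Document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_sources(docs):
    """Prefix each chunk with its book title and page label, when present."""
    blocks = []
    for doc in docs:
        title = doc.metadata.get("title", "Unknown title")
        # Prefer the printed page label; fall back to the raw page value
        page = doc.metadata.get("page_label", doc.metadata.get("page", "?"))
        blocks.append(f"[Book: {title} | Page: {page}]\n{doc.page_content}")
    return "\n\n".join(blocks)


docs = [
    FakeDoc("of gigantic magnitude.", {"title": "My Brother", "page_label": "15"}),
    FakeDoc("a chunk without metadata"),
]
print(format_docs_with_sources(docs))
```

Because it takes a list of documents and returns a string, a function like this could stand in for `format_docs` in the chain without other changes.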

In [27]:
## Convert the vector store into a retriever
retriever = vectordb.as_retriever(search_kwargs={"k": 10})

In [28]:
## Building chain with LCEL
rag_chain_lcel = (
    {
        'context': retriever | format_docs,
        'question': RunnablePassthrough(),  # the question is supplied at runtime
    }
    | costum_prompt
    | llm
    | StrOutputParser()
)
rag_chain_lcel

{
  context: VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f47708f9010>, search_kwargs={'k': 10})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question answering tasks from the data you gets.\nUse the following context to answer the question.\nIf you do not know the answer just say Sorry! I don't Know.\nwrite the necessary detail according to the given context.\nMake sure to give some detail about the book from where the context is been retrieved. Just the book name and the page where this was written.\n\n\nContext:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"), additional_kwargs={})])
| ChatGr

In [29]:
output = rag_chain_lcel.invoke("What role did Quaid-e-Azam play in the Indian National Congress?")
print(output)

According to the context, the Quaid-e-Azam played a significant role in the Indian National Congress. He was a devout friend of Dadabhoy Naoroji, a great influence on his political individuality, and together they rendered yeoman service to the Indian National Congress in its early years of existence.

The Quaid-e-Azam also pioneered the idea of organizing Indian students into an Association, offering a meeting place and a forum, which he thought would be of immense benefit to the students.

Book: Quaid-i-Azam as Seen by His Contemporaries, 
Page: Not specified


In [30]:
output=rag_chain_lcel.invoke('What was Jinnahs role in the Lucknow Pact of 1916?')
print(output)

Jinnah was the architect of the Lucknow Pact of 1916. He played a key role in bringing about this unity between the Muslims and the Hindus, and his efforts finally resulted in the adoption of the joint agreement known as the Lucknow Pact in December 1916. 

Book: Jinnah - The First Phase
Page: 45


In [31]:
output=rag_chain_lcel.invoke('What was Quaid-e-Azams vision for Muslims in India?')
print(output)

According to the given context, the vision of Quaid-e-Azam for Muslims in India is not explicitly mentioned. However, it can be inferred that he wanted to protect the rights and interests of Muslims in India, as he was a key figure in the Pakistan Movement and played a crucial role in the creation of Pakistan.

But, if we look at the book "Jinnah of Pakistan - Stanley Wolpert" (page 380), it mentions that Quaid-e-Azam's vision for Muslims in India was to create a separate homeland for them, where they could live with dignity and respect.

However, the exact details of his vision are not provided in the given context.


In [32]:
output=rag_chain_lcel.invoke('Was Jinnah married? Who was his wife?')
print(output)

The book from which this context is retrieved is "Jinnah of Pakistan" by Stanley Wolpert.

According to the context, Jinnah was married, but his married life was short-lived. His first wife died within a couple of months after their marriage. 

As for his second wife, the context does not provide a clear answer. However, it mentions that Jinnah's mother had suggested a girl named Emi Bai from an Ismaili Khoja family of Paneli as a potential match for Jinnah. But it does not confirm whether Jinnah married Emi Bai or not.

However, it is mentioned that there were efforts made by Mrs. Drake to push the match of her daughter to Jinnah, but without success.


In [33]:
output=rag_chain_lcel.invoke('How did Jinnahs health affect his leadership?')
print(output)

Jinnah's health had a significant impact on his leadership. As he grew weaker and more conscious of his impending death, he became less patient with inefficiency and ineptitude, and more easily angered by excuses for not getting things done. His relations with his closest colleagues deteriorated rapidly in the final months of his life. He was also less able to tolerate the daily political and administrative responsibilities that came with being prime minister, and preferred to enjoy at least a taste of power in his private capacity.

Additionally, Jinnah's health issues, particularly his tuberculosis, made him aware that he had little time left to live, which added to his sense of urgency and desire to see significant progress in his struggling infant land. His health also made him more irritable and prone to outbursts, as evident from his calling Liaquat "mediocre" in a luncheon conversation.

Despite his physical weakness, Jinnah's spirit remained high, and he continued to work tirel