## PDF Document Loaders
- Load various kinds of documents from the web and local files.
- Apply an LLM to the documents for summarization and question answering.

### Project 1: Question Answering from PDF Document
- We will load documents from local files and apply an LLM to answer questions about them.
- Let's use research papers published on the misuse of health supplements for workouts.

In [None]:

# !pip install pymupdf tiktoken


In [29]:
from dotenv import load_dotenv

load_dotenv('./../.env')

True

In [35]:
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader("D:/ML/HKA_Chatbot/Langchain-and-Ollama-main/Langchain-and-Ollama-main/08_Document_Loaders/Metasurface/s41467-020-15972-9.pdf")

docs = loader.load()

In [36]:
len(docs)

4

In [37]:
# docs[0].metadata
# print(docs[0].page_content)

In [38]:
# Collect the paths of all PDFs under the rag-dataset directory (searched recursively)
import os

pdfs = []
for root, dirs, files in os.walk("rag-dataset"):
    # print(root, dirs, files)
    for file in files:
        if file.endswith(".pdf"):
            pdfs.append(os.path.join(root, file))
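
The directory walk above can also be written with `pathlib`, shown here as a minimal sketch. The helper name `find_pdfs` is hypothetical, and the case-insensitive suffix check (so that files ending in `.PDF` are also found) is an assumption about what is wanted, not something the original loop does.

```python
from pathlib import Path


def find_pdfs(root):
    """Return the sorted paths (as strings) of all PDF files under root."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*")  # rglob walks subdirectories too
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Hypothetical usage with the same directory as above:
# pdfs = find_pdfs("rag-dataset")
```

Sorting the result makes the page order in later cells reproducible across runs and operating systems.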

In [39]:
# Load every PDF; PyMuPDFLoader returns one Document per page
docs = []
for pdf in pdfs:
    loader = PyMuPDFLoader(pdf)
    temp = loader.load()
    docs.extend(temp)

    # print(temp)
    # break

In [40]:
len(docs)

64

In [41]:
def format_docs(docs):
    return "\n\n".join([x.page_content for x in docs])


context = format_docs(docs)
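
Because `format_docs` joins every page into a single string, the link between a page and its PDF is lost. A minimal sketch of keeping the text grouped per file, assuming each document exposes `metadata['source']` and `page_content` as the `Document` objects in this notebook do. The helper name `format_docs_by_source` is hypothetical.

```python
from collections import defaultdict


def format_docs_by_source(docs):
    """Map each source path to the joined text of its pages, keeping page order."""
    pages = defaultdict(list)
    for doc in docs:
        pages[doc.metadata["source"]].append(doc.page_content)
    return {source: "\n\n".join(texts) for source, texts in pages.items()}


# Hypothetical usage:
# contexts = format_docs_by_source(docs)
# for source, text in contexts.items():
#     print(source, len(text))
```

A per-file context like this would let a question or summary target one paper at a time instead of all of them at once.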

In [42]:
docs[0]

Document(metadata={'producer': 'iLovePDF', 'creator': '', 'creationdate': '', 'source': 'rag-dataset\\gym supplements\\1. Analysis of Actual Fitness Supplement.pdf', 'file_path': 'rag-dataset\\gym supplements\\1. Analysis of Actual Fitness Supplement.pdf', 'total_pages': 15, 'format': 'PDF 1.7', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2024-10-21T11:38:50+00:00', 'trapped': '', 'modDate': 'D:20241021113850Z', 'creationDate': '', 'page': 0}, page_content='Citation: Espeño, P.R.; Ong, A.K.S.;\nGerman, J.D.; Gumasing, M.J.J.; Casas,\nE.S. Analysis of Actual Fitness\nSupplement Consumption among\nHealth and Fitness Enthusiasts. Foods\n2024, 13, 1424. https://doi.org/\n10.3390/foods13091424\nAcademic Editors: Ilija Djekic\nand Nada Smigic\nReceived: 30 March 2024\nRevised: 15 April 2024\nAccepted: 18 April 2024\nPublished: 6 May 2024\nCopyright: © 2024 by the authors.\nLicensee MDPI, Basel, Switzerland.\nThis article is an open access article\ndistributed\nunder

In [43]:
# print(context)

In [44]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o-mini")


In [45]:
encoding.encode("congratulations"), encoding.encode("rqsqeft")

([542, 111291, 14571], [81, 31847, 80, 5276])

In [46]:
len(encoding.encode(docs[0].page_content))

968

In [47]:
len(encoding.encode(context))

60268

In [48]:
# Rough estimate: about 969 tokens per page across 64 pages
969*64

62016
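
The combined context is about 60,000 tokens, which may be more than the context window the local model is configured with. That depends on the Ollama settings and is an assumption here, not something measured in this notebook. Note also that counts from the `gpt-4o-mini` encoding are only an approximation for `llama3.2`, which uses its own tokenizer. One possible approach is to split the text into chunks that each fit a token budget. The sketch is tokenizer-agnostic: `split_by_token_budget` is a hypothetical helper that accepts any `encode`/`decode` pair.

```python
def split_by_token_budget(text, encode, decode, max_tokens):
    """Split text into pieces of at most max_tokens tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = encode(text)
    # Slice the token list in fixed-size windows and turn each back into text
    return [
        decode(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


# Hypothetical usage with the tiktoken encoding created in this notebook
# (the budget of 4000 is an arbitrary illustrative value):
# chunks = split_by_token_budget(context, encoding.encode, encoding.decode, 4000)
# print(len(chunks))
```

Cutting on raw token boundaries can split a sentence in the middle, so this is a starting point rather than a finished chunking strategy.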

In [49]:
# Question answering using an LLM
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
)
from langchain_core.output_parsers import StrOutputParser

base_url = "http://localhost:11434"
model = 'llama3.2:3b'

llm = ChatOllama(base_url=base_url, model=model)


In [51]:
system = SystemMessagePromptTemplate.from_template("""You are helpful AI assistant who answer user question based on the provided context. 
                                                    Do not answer in more than {words} words""")

prompt = """Answer user question based on the provided context ONLY! If you do not know the answer, just say "I don't know".
            ### Context:
            {context}

            ### Question:
            {question}

            ### Answer:"""

prompt = HumanMessagePromptTemplate.from_template(prompt)

messages = [system, prompt]
template = ChatPromptTemplate(messages)

# template
# template.invoke({'context': context, 'question': "How to gain muscle mass?", 'words': 50})

qna_chain = template | llm | StrOutputParser()

In [52]:
qna_chain

ChatPromptTemplate(input_variables=['context', 'question', 'words'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['words'], input_types={}, partial_variables={}, template='You are helpful AI assistant who answer user question based on the provided context. \n                                                    Do not answer in more than {words} words'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer user question based on the provided context ONLY! If you do not know the answer, just say "I don\'t know".\n            ### Context:\n            {context}\n\n            ### Question:\n            {question}\n\n            ### Answer:'), additional_kwargs={})])
| ChatOllama(model='llama3.2:3b', base_url='http://localhost:11434')
| StrOutputParser()

In [None]:
response = qna_chain.invoke({'context': context, 'question': "How to gain muscle mass?", 'words': 50})
print(response)

The text doesn't provide an answer to the question "How to gain muscle mass?" It appears to be a passage from a scientific article discussing the potential health risks associated with the use of botanical supplements, such as black cohosh, kava kava, saw palmetto, and others. The passage mentions various adverse effects and potential interactions that have been reported in case studies and clinical cases.

If you're looking for information on how to gain muscle mass, I'd be happy to try and help with that! However, please note that the text doesn't provide any relevant information on this topic. If you'd like, I can suggest some general tips or resources on building muscle mass.


In [21]:
response = qna_chain.invoke({'context': context, 'question': "How to reduce the weight?", 'words': 50})
print(response)

The text does not provide a direct answer to how to reduce weight, but rather discusses the potential toxicities and interactions of botanical supplements. However, based on the information provided, some general advice can be inferred:

1. Be cautious when using herbal supplements, as they can cause adverse effects such as liver damage or interact with other medications.
2. Consult a healthcare professional before taking any new supplement, especially if you have underlying health conditions or are taking prescription medications.
3. Choose reputable sources for herbal supplements and follow the recommended dosages.
4. Monitor your body's response to any new supplement and discontinue use if adverse effects occur.

It is not mentioned in the text how to reduce weight, but rather it can be inferred that a healthy lifestyle, including a balanced diet and regular exercise, is the most effective way to achieve weight loss.

Some of the specific botanical supplements discussed in the text 

In [22]:
response = qna_chain.invoke({'context': context, 'question': "How to do weight loss?", 'words': 50})
print(response)

The text does not provide a direct answer to the question "How to do weight loss?" but instead discusses various topics related to pharmacology and toxicology, including:

1. The risks of unregulated dietary supplements.
2. The mechanisms of drug-induced liver injury (DILI) and its associated effects on mitochondrial function, oxidative stress, and bile acid homeostasis.
3. Case reports of adverse reactions to botanicals, such as black cohosh, kava kava, saw palmetto, Echinacea, valerian, yohimbe, milk thistle, ginseng, garlic, and ginkgo biloba.
4. Potential herb-drug interactions, including the activation of metabolic enzymes like PXR and AhR.

The text does mention weight loss as a potential application for certain botanicals, such as black cohosh, but it does not provide information on how to achieve weight loss or recommend any specific weight loss methods.

If you are looking for advice on weight loss, I would be happy to try and assist you with some general tips and recommendati

In [23]:
response = qna_chain.invoke({'context': context, 'question': "How many planets are there outside of our solar system?", 'words': 50})
print(response)

It seems you've provided a random question and answer, but not related to the text. The text is discussing the risks and adverse effects of botanicals (plant-based supplements), while the question about the number of planets outside of our solar system appears unrelated.

If you'd like to ask a specific question or seek clarification on something from the text, I'll do my best to help!
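
Several of the answers above run past the 50-word limit requested in the system prompt, so the limit behaves as a hint rather than a guarantee. A minimal sketch of checking a response after the fact. `count_words` and `within_word_limit` are hypothetical helper names, and splitting on whitespace is an assumed definition of a word.

```python
def count_words(text):
    """Count whitespace-separated words in text."""
    return len(text.split())


def within_word_limit(text, limit):
    """Return True if text has at most `limit` words."""
    return count_words(text) <= limit


# Hypothetical usage with a chain response:
# if not within_word_limit(response, 50):
#     print(f"Response has {count_words(response)} words, over the limit")
```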


### Project 2: PDF Document Summarization

In [24]:
system = SystemMessagePromptTemplate.from_template("""You are helpful AI assistant who works as document summarizer. 
                                                   You must not hallucinate or provide any false information.""")

prompt = """Summarize the given context in {words}.
            ### Context:
            {context}

            ### Summary:"""

prompt = HumanMessagePromptTemplate.from_template(prompt)

messages = [system, prompt]
template = ChatPromptTemplate(messages)

summary_chain = template | llm | StrOutputParser()

In [25]:
summary_chain

ChatPromptTemplate(input_variables=['context', 'words'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are helpful AI assistant who works as document summarizer. \n                                                   You must not hallucinate or provide any false information.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'words'], input_types={}, partial_variables={}, template='Summarize the given context in {words}.\n            ### Context:\n            {context}\n\n            ### Summary:'), additional_kwargs={})])
| ChatOllama(model='llama3.2:3b', base_url='http://localhost:11434')
| StrOutputParser()

In [26]:
response = summary_chain.invoke({'context': context, 'words': 50})
print(response)

This article discusses the potential health risks associated with the use of botanicals, which are herbal remedies derived from plants. The authors review various studies and case reports to highlight the possible toxicities and interactions of botanicals with other medications.

**Key points:**

1. Botanicals can cause liver damage and induce drug-metabolizing enzymes, leading to potential herb-drug interactions.
2. Case reports have linked certain botanicals to severe health problems, such as liver failure, seizures, bleeding disorders, and cardiovascular events.
3. The mechanisms underlying the toxic effects of botanicals are often complex and involve multiple pathways, including mitochondrial dysfunction, oxidative stress, and alteration of bile acid homeostasis.
4. Some botanicals, such as black cohosh and ginkgo biloba, have been implicated in causing excessive bleeding due to their ability to inhibit platelet aggregation.
5. Herb-drug interactions are a significant concern, part

In [27]:
response = summary_chain.invoke({'context': context, 'words': 500})
print(response)

This article reviews the potential toxicities and interactions of various botanical supplements, including:

1. Black cohosh (Cimicifuga racemosa): associated with jaundice and liver failure in menopausal women, possibly due to oxidative stress.
2. Kava kava: linked to liver toxicity, potentially caused by depletion of glutathione and inhibition of cyclooxygenases.
3. Saw palmetto: associated with cholestatic hepatitis and pancreatitis.
4. Echinacea: may cause acute liver failure without a clear mechanism.
5. Valerian: can induce jaundice that reverses with steroid administration.
6. Yohimbine: can cause seizures, tachycardia, and hypertension due to its sympathomimetic properties.
7. Milk thistle: may exacerbate hemochromatosis in individuals predisposed to iron overload.
8. Ginseng: implicated in a transient ischemic attack without a clear mechanism.
9. Black cohosh: regulates heart rate via activation of serotonin receptors, possibly causing bradycardia.
10. Garlic and ginkgo biloba

### Project 3: Report Generation from PDF Document

Streamlit Tutorial: https://www.youtube.com/watch?v=hff2tHUzxJM&list=PLc2rvfiptPSSpZ99EnJbH5LjTJ_nOoSWW

In [28]:
response = qna_chain.invoke({'context': context, 
                             'question': "Provide a detailed report from the provided context. Write answer in Markdown.", 
                             'words': 2000})
print(response)

**Detailed Report on Adverse Effects of Botanical Supplements**

The use of botanical supplements has been associated with various adverse effects, ranging from mild to severe. The following is a summary of reported cases and potential mechanisms:

**Hepatotoxicity**

*   **Black cohosh (Cimicifuga racemosa)**: Associated with jaundice and liver failure in menopausal women, with pathological oxidative stress observed.
*   **Kava kava**: Liver toxicity, sometimes requiring transplants, attributed to depletion of glutathione and inhibition of cyclooxygenases.
*   **Saw palmetto**: Cholestatic hepatitis, with alterations in bile secretion linked to pancreatitis.
*   **Echinacea**: Acute liver failure, without a specific mechanism hypothesized.

**Non-hepatic Symptoms**

*   **Yohimbe**: Seizure with tachycardia and hypertension in a bodybuilder.
*   **Milk thistle**: Exacerbated hemochromatosis (iron overload) in a genetically predisposed individual, which resolved upon cessation of suppl