# Chatbot Development Notebook

### Part 1: Installing Necessary Libraries

In [None]:
!pip install langchain langchain-community faiss-cpu langchain-groq sentence-transformers pypdf gradio


## Library Descriptions:

### LangChain

Description: LangChain is a framework designed to simplify the development of applications that use language models. It provides tools and abstractions that help developers build chatbots, question-answering systems, and similar applications.

### FAISS (Facebook AI Similarity Search)

Description: FAISS is a library developed by Facebook AI Research that enables efficient similarity search and clustering of dense vectors.

### LangChain-GROQ

Description: This package is the LangChain integration for the Groq API, which offers free-tier access to hosted LLMs. This notebook uses it through the `ChatGroq` class.

### Sentence-Transformers

Description: Sentence-Transformers provides models for generating sentence embeddings. In chatbot development, sentence embeddings can be used to understand user queries better and find the most relevant responses by comparing the similarity of embeddings.

### pypdf

Description: pypdf is a Python library for reading PDF files and extracting their text.

## Import the Necessary Packages

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from pypdf import PdfReader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq
import os

## Steps to Create a Chatbot with LangChain

### Read the PDF File
- Use pypdf to read the content of the PDF file.
- This will allow you to extract the text from the document so the chatbot can understand it.

In [None]:
pdf_path = "/content/Health_policy.pdf" ## Path to Document

reader = PdfReader(pdf_path)   ## Loading the pdf file

text = ""
for i, page in enumerate(reader.pages, start=1):
    pg_text = page.extract_text() or ""  # fall back to "" so the concatenation below never fails
    print(f"\n--- Page {i} ---\n")
    print(pg_text if pg_text else "[No text found on this page]")
    text += pg_text + "\n"


--- Page 1 ---

 

--- Page 2 ---

NATIONAL HEALTH POLICY, 2017 
 
Contents 
1 Introduction:   1 
2 Goal, Principles and Objectives 1 
2.1 Goal 1 
2.2 Key Policy Principles 1 
2.3 Objectives 3 
2.4 Specific Quantitative Goals and Objectives 3 
  3 Policy Thrust 6 
3.1 Ensuring Adequate Investment  6 
3.2 Preventive and Promotive Health 6 
3.3 Organisation of Public Health Care Delivery 7 
3.3.1 Primary Care Services & Continuity of Care 8 
3.3.2 Secondary Care Services 9 
3.3.3 Reorienting Public Hospitals 10 
3.3.4 Closing Infrastructure and Human Resource/Skill Gaps 10 
3.3.5 Urban Health Care  10 
4.1 RMNCH+A services 11 
4.2 Child and Adolescent Health 11 
4.3 Interventions to address malnutrition and micronutrient deficiencies 11 
4.4 Universal Immunisation 12 
4.5 Communicable Diseases  12 
4.6 Non Communicable Diseases 13 
4.7 Mental Health 13 
4.8 Population Stabilisation 13 
5 Women’s Health and Gender Mainstreaming 14 
6 Gender Based Violence 14 
7 Supportive supervision 14 

### Break the Document into Smaller Parts (Chunks)
- If the document is large, divide it into smaller sections (chunks).
- Why? Models have a token limit, meaning they can only process a certain amount of text at a time. Smaller chunks ensure the chatbot can efficiently find the right information without exceeding this limit.

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80)
document = Document(page_content=text)
chunks = text_splitter.split_documents([document])
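
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch in plain Python. It is a simplified fixed-width splitter, not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph and line breaks). The function name `split_with_overlap` and the sample string are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # the window already reached the end
            break
    return pieces


sample = "abcdefghijklmnopqrstuvwxyz"  # hypothetical stand-in for the PDF text
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each piece starts with the last three characters of the previous one, which is the same idea as the 80-character overlap used above: a sentence cut at a chunk boundary still appears whole in at least one chunk.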

### Convert Chunks into Vector Embeddings and Create a Vector Store
- An embedding maps each chunk of text to a fixed-length vector of numbers, so that chunks with similar meaning end up close together.
- This step lets the chatbot "remember" the content of the PDF in a form that is easy to search.
- The vector store acts like a library index: given a question, the chatbot can quickly look up the chunks most relevant to it.
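
As a rough illustration of why vectors make text searchable, the sketch below compares made-up 3-dimensional vectors with cosine similarity. The vectors and texts are hypothetical; a real model such as `all-MiniLM-L6-v2` produces 384-dimensional vectors, and the similarity measure a given vector store uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", chosen by hand for illustration only.
toy_vectors = {
    "health policy goals": [0.9, 0.1, 0.0],
    "objectives of the health policy": [0.8, 0.2, 0.1],
    "cricket match results": [0.0, 0.1, 0.9],
}

query_vec = toy_vectors["health policy goals"]
for text, vec in toy_vectors.items():
    print(f"{cosine_similarity(query_vec, vec):.3f}  {text}")
```

The two policy-related texts score close to 1, while the unrelated text scores near 0. Retrieval relies on this kind of ranking.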

In [None]:
## Defining Embedding Model
model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [None]:
## Creating VectorDB
vectorstore = FAISS.from_documents(chunks, embedding=model)
vectorstore.save_local("VectorDB")

## Query the PDF File
- Once the vector store is ready, you can ask questions about the PDF.
- The chatbot will search the vector store for the most relevant chunks and use them to provide answers.

The `get_chunks` function defined further below searches the vector store and returns the chunks most relevant to the question.

### Defining an LLM

In [None]:
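import getpass

# Never hard-code an API key in a notebook; prompt for it if it is not already set.
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API key: ")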

llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0.2,
    max_retries=2,
    # other params...
)

In [None]:
def get_chunks(ques):
  db = FAISS.load_local("VectorDB", model, allow_dangerous_deserialization=True)
  docs = db.similarity_search(ques, k=5)
  return docs
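
Conceptually, a similarity search with `k=5` ranks every stored vector by its distance to the query vector and keeps the closest five. The sketch below shows that idea as a brute-force search in plain Python over hand-made 2-dimensional vectors. It is an illustration under assumptions, not FAISS's implementation; the `top_k` helper, the vectors, and the use of squared Euclidean distance are all choices made for this example.

```python
import heapq


def top_k(query_vec, indexed, k):
    """Return the k (distance, text) pairs closest to query_vec (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scored = ((sq_dist(query_vec, vec), text) for text, vec in indexed)
    return heapq.nsmallest(k, scored)


# Hypothetical index: (chunk text, toy embedding) pairs.
index = [
    ("Goal of the policy", [1.0, 0.0]),
    ("Urban health care", [0.0, 1.0]),
    ("Policy objectives", [0.9, 0.2]),
    ("Universal immunisation", [0.1, 0.8]),
]

results = top_k([1.0, 0.1], index, k=2)
for dist, text in results:
    print(f"{dist:.2f}  {text}")
```

A dedicated library becomes worthwhile when the index holds many thousands of vectors and a linear scan like this one gets too slow.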

The function below defines the prompt for the LLM and chains the two together. A prompt is the input text or set of instructions given to the model to generate a response.
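
A prompt template is essentially a string with named placeholders that are filled in at query time. The sketch below shows the substitution with Python's built-in `str.format`. It is a simplified stand-in for what `PromptTemplate` does with `{context}` and `{question}`, and the template text and inputs are made up for illustration.

```python
# Hypothetical, shortened template with the same two placeholders.
template = (
    "Answer using only the context below.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)

filled = template.format(
    context="The goal of the policy is universal access to good quality health care.",
    question="What is the goal of the policy?",
)
print(filled)
```

The filled string is what the model actually receives, so anything placed in `{context}` should be readable text rather than raw Python objects.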


In [None]:
def getChain():
    prompt_template = """
    You are a highly specialized question-answering conversational chatbot, trained to provide concise, precise,
    and context-specific answers based exclusively on the text provided to you.
    Instructions:
    - Contextual Relevance: Each question will be accompanied by a provided text that contains relevant information
      from which to formulate your response.
    - Answers should be derived only from the provided context; do not answer using general knowledge.

    Your task is to generate an accurate, specific, and fact-based response while ensuring it aligns with the given
    context.
    Avoid introducing external information or assumptions.

    Response Protocol:
    - If the question pertains to the provided context, answer succinctly and accurately.

    Inputs:
    - Context: {context}
    - Question: {question}
    """

    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    return prompt | llm

This function takes the retrieved chunks and a question as input, forwards them to the LLM, and returns the generated response.

In [None]:
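def get_ans(docs, ques):
    chain = getChain()
    # Pass only the chunk text to the prompt, not the stringified Document objects.
    context = "\n\n".join(doc.page_content for doc in docs)
    response = chain.invoke({"context": context, "question": ques})
    return response.content  # AIMessage has .content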

In [None]:
query = "What is the Goal of the policy?"

# Use a separate name so the split `chunks` list from earlier is not overwritten.
docs = get_chunks(query)
resp = get_ans(docs, query)

print(resp)

The goal of the policy is the attainment of the highest possible level of health and well-being for all at all ages, through a preventive and promotive health care orientation in all developmental policies, and universal access to good quality health care services without anyone having to face financial hardship.


In [None]:
def answer(query):
  chunks = get_chunks(query)
  resp = get_ans(chunks, query)
  return resp

In [None]:
## Gradio Frontend

In [None]:

# If needed (run once)
# !pip install -q gradio

import gradio as gr

# Your existing function:
# def answer(query):
#     chunks = get_chunks(query)
#     resp = get_ans(chunks, query)
#     return resp

def chat_fn(user_message, history):
    """
    Gradio passes (user_message:str, history:list[tuple[user, assistant]])
    Your function doesn't use history, so we ignore it.
    """
    return answer(user_message)

ui = gr.ChatInterface(
    fn=chat_fn,
    title="📄 PDF Chatbot",
    description="Ask questions about your PDF.",
    textbox=gr.Textbox(placeholder="Type your question and press Enter...", lines=2),
)

# In a notebook:
ui.launch(inline=True)   # Use share=True for a public link



Sample questions to try in the chatbot:

1. What is the primary aim of the National Health Policy 2017?
2. What is the main goal of this policy?
3. How does the policy plan to strengthen urban healthcare?
4. How does the policy aim to control tuberculosis and HIV/AIDS?
5. How does the policy address pricing of drugs and medical devices?
