
# AI Fellows Program - Technical Assessment Case Study
Prepared by Kitu Komya, on 10/08/2025

## Introduction
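Welcome to Poshan Saathi (Hindi for "Nutrition Companion"), a prototype chatbot that offers nutrition guidance to pregnant women in India, based on trusted antenatal care (ANC) guidelines.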


## Outline
1. Setting up knowledge bases
2. Creating user onboarding questions
3. Interactive UI to onboard users!
4. Building out LLM on top of RAG and prompt refinement!
5. Interactive Chatbot

# 1. Setting up knowledge bases
We create a knowledge base with an annotated data dictionary describing the 3 trusted documents used for this prototype. All documents come from official sources but differ in scope. Since this prototype is built for local needs, we consult the knowledge base in the following order: Indian governing body --> Indian professional organization --> global organization.
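The precedence above could also be read from the data dictionary instead of being hard-coded. Below is a minimal sketch, assuming the dictionary's `org_display_name` and `doc_reference_order` columns; the three sample rows are trimmed from the data dictionary displayed below, and `load_reference_order` is a hypothetical helper, not part of the notebook. It uses only the standard library `csv` module.

```python
import csv
import io

# A trimmed copy of knowledge_base_dictionary.csv, keeping only the columns needed here
SAMPLE_DICTIONARY = """doc_id,org_display_name,doc_reference_order
1,MoHFW,1
2,FOGSI,2
3,WHO,3
"""

def load_reference_order(csv_text: str) -> list:
    """Hypothetical helper: return source names sorted by doc_reference_order (1 = consulted first)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row["doc_reference_order"]))
    return [row["org_display_name"] for row in rows]

order = load_reference_order(SAMPLE_DICTIONARY)
print(order)  # ['MoHFW', 'FOGSI', 'WHO']
```

The resulting list could be passed as the `order` argument of `query_rag_ordered`, so changing the precedence would only require editing the CSV.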

In [None]:
# install required packages for this notebook; for a web app, I would use Docker to install and manage package versions
!pip install llama-index openai



In [None]:

import pandas as pd

pd.set_option('display.max_colwidth', None) # prevent truncating the cell value
pd.read_csv('knowledge_base/knowledge_base_dictionary.csv', index_col=0) # view the data dictionary

| doc_id | file_name | file_type | doc_title | doc_language | org_geographic_scope | org_official_name | org_display_name | doc_source | doc_year_published | doc_num_pages | doc_reference_order | doc_description | doc_intended_use | doc_remarks |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | anc_guidelines_india_mohfw | pdf | Training Manual on Care During Pregnancy and Child Birth | English | India | Ministry of Health and Family Welfare | MoHFW | https://nhsrcindia.org/sites/default/files/2021-12/Care%20During%20Pregnancy%20and%20Childbirth%20Training%20Manual%20for%20CHO%20at%20AB-HWC.pdf | 2021 | 80 | 1 | An Indian governing body's ANC guidelines | Primary source, given doc is both regional and governing | |
| 2 | anc_guidelines_india_fogsi | pdf | Routine Antenatal Care for the Healthy Pregnant Women | English | India | Federation of Obstetric and Gynaecological Societies of India | FOGSI | https://www.fogsi.org/wp-content/uploads/2024/08/Binder_Routine-Antenatal-Care-for-the-Healthy-Pregnant-Women.pdf | 2024 | 28 | 2 | An Indian professional organization's ANC guidelines | Secondary source, given doc is regional yet professional organization | |
| 3 | anc_guidelines_global_who | pdf | WHO antenatal care recommendations for a positive pregnancy experience | English | Global | World Health Organization | WHO | https://iris.who.int/server/api/core/bitstreams/cb09dd39-1cfc-432c-9baf-feb6a5c40aa4/content | 2021 | 40 | 3 | A global organization's ANC guidelines | Tertiary source, given doc is global organization | |


# 2. Creating user onboarding questions
A quick setup of two classes: `UserProfile` stores a user's details, and `UserOnboarding` collects them through interactive widgets.
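For comparison, the same profile could be written more compactly with Python's standard `dataclasses` module. This is a sketch of an alternative, not the class the notebook uses; `UserProfileData` is a hypothetical name whose fields mirror `UserProfile`, and the example values are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UserProfileData:
    """Hypothetical dataclass mirroring the fields of UserProfile."""
    name: Optional[str] = None
    age: Optional[int] = None
    pregnancy_week: Optional[int] = None
    diet_type: Optional[str] = None
    weight_kg: Optional[float] = None  # metric units, as in UserProfile
    height_cm: Optional[float] = None
    medical_conditions: List[str] = field(default_factory=list)  # a fresh list per instance

# Illustrative values only
profile = UserProfileData(name="Asha", pregnancy_week=20, diet_type="Vegetarian",
                          medical_conditions=["Low iron"])
print(profile)
```

`field(default_factory=list)` gives each profile its own list, which avoids the shared-mutable-default pitfall, and the generated `__init__` and `__repr__` replace the hand-written ones.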

In [None]:
from openai import OpenAI
from google.colab import userdata

client = OpenAI(api_key=userdata.get('poshan-saathi')) # access OpenAI's API (with secret API key :D)

In [None]:
# create a class to define user profile
class UserProfile:
    def __init__(self, name=None, age=None, pregnancy_week=None, diet_type=None, weight_kg=None,
                 height_cm=None, medical_conditions=None):
        self.name = name
        self.age = age
        self.pregnancy_week = pregnancy_week
        self.diet_type = diet_type
        self.weight_kg = weight_kg # India typically uses the metric system
        self.height_cm = height_cm # India typically uses the metric system
        self.medical_conditions = list(medical_conditions) if medical_conditions else [] # store as a list; empty if none selected

    def __str__(self):
        conditions = ", ".join(self.medical_conditions) or "None" # readable list of conditions
        return (f"Name: {self.name}\n"
                f"Age: {self.age}\n"
                f"Pregnancy Week: {self.pregnancy_week}\n"
                f"Diet Type: {self.diet_type}\n"
                f"Weight: {self.weight_kg} kg\n"
                f"Height: {self.height_cm} cm\n"
                f"Medical Conditions: {conditions}\n")

In [None]:
from ipywidgets import widgets
from IPython.display import display

# create a class for interactive user onboarding
class UserOnboarding:
    def __init__(self):
        # basic widgets
        self.name_widget = widgets.Text(description="Name:")
        self.age_widget = widgets.IntText(description="Age:")
        self.week_widget = widgets.IntSlider(description="Week:", min=1, max=45, value=12)
        self.diet_widget = widgets.Dropdown(description="Diet:", options=["🔴 Non-Vegetarian", "🟡 Ovo-Vegetarian", "🟢 Vegetarian"])
        self.weight_kg_widget = widgets.FloatText(description="Weight (kg):")
        self.height_cm_widget = widgets.FloatText(description="Height (cm):")
        self.submit_button = widgets.Button(description="Submit", button_style="success")

        # medical conditions widget
        self.medical_conditions_widget = widgets.SelectMultiple(
            options=["Low iron", "Hypertension", "Diabetes"],
            description="Medical Conditions"
        )

        # when user clicks submit button, run callback function
        self.submit_button.on_click(self.on_submit)

        # display output
        self.output = widgets.Output()

    # display UI widgets
    def display(self):
        display_ui = widgets.VBox([
            self.name_widget,
            self.age_widget,
            self.week_widget,
            self.diet_widget,
            self.weight_kg_widget,
            self.height_cm_widget,
            self.medical_conditions_widget,
            self.submit_button
        ])
        display(display_ui, self.output)

    # define tasks upon clicking submit button
    def on_submit(self, b):
        # collect data (including the selected medical conditions) and create a UserProfile
        user = UserProfile(
            name=self.name_widget.value,
            age=self.age_widget.value,
            pregnancy_week=self.week_widget.value,
            diet_type=self.diet_widget.value,
            weight_kg=self.weight_kg_widget.value,
            height_cm=self.height_cm_widget.value,
            medical_conditions=list(self.medical_conditions_widget.value)
        )

        # print output
        with self.output:
            self.output.clear_output()
            print("\nCollected User Profile:")
            print(user)

# 3. Interactive UI to onboard users!
Here's a simple UI that collects the user's basic demographic and health details. The chatbot uses some of these values (diet, pregnancy week, and medical conditions) to personalize its answers!
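In the notebook, the RAG step reads these values straight from the global widgets (for example `onboarding.diet_widget.value`). A minimal sketch of the same personalization driven by plain arguments instead, so it can be tested without a running UI; `build_personalized_query` and `build_tailoring_rule` are hypothetical helpers whose output mirrors the query suffix and tailoring rule used in section 4.

```python
from typing import Sequence

def build_personalized_query(question: str, diet_type: str) -> str:
    """Hypothetical helper: append the dietary preference in the same format the RAG step uses."""
    return f"{question} [Dietary preference: {diet_type}]"

def build_tailoring_rule(diet_type: str, pregnancy_week: int, medical_conditions: Sequence[str]) -> str:
    """Hypothetical helper: render the prompt rule that tailors advice to the onboarding answers."""
    conditions = ", ".join(medical_conditions) or "none"  # an empty selection becomes "none"
    return (f"- Tailor advice to the user's diet: {diet_type}, week of pregnancy: {pregnancy_week}, "
            f"medical conditions: {conditions}.")

# Illustrative onboarding answers (a SelectMultiple widget's value is a tuple)
print(build_personalized_query("Which foods are rich in iron?", "🟢 Vegetarian"))
print(build_tailoring_rule("🟢 Vegetarian", 20, ("Low iron",)))
```

Passing the collected profile into helpers like these, rather than reading the `onboarding` widgets inside the RAG class, would decouple retrieval from the UI.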

In [None]:
# onboard users using interactive UI!
onboarding = UserOnboarding()
onboarding.display()


# 4. Building out LLM on top of RAG and prompt refinement!
We now load the 3 trusted documents into LlamaIndex vector store indexes (one index per source), attaching citation metadata (source, page, and year) to every page. We then combine ordered RAG retrieval with the user's onboarding details, a refined LLM prompt, safety guardrails, and source citations. Phew!
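The ordered-retrieval control flow can be illustrated without LlamaIndex or OpenAI. The sketch below swaps in a toy bag-of-words vector and a hand-written cosine similarity in place of real embeddings (an assumption made purely for illustration). The ordered sources, the 0.7 threshold, and the fall-through to a fallback mirror `query_rag_ordered`; `retrieve_ordered`, the `source_*` names, and the passages are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_ordered(question: str, corpora: Dict[str, List[str]], order: Tuple[str, ...],
                     threshold: float = 0.7) -> Optional[Tuple[str, List[str]]]:
    """Return (source, passages) from the first source in `order` with a passage scoring >= threshold."""
    query_vec = embed(question)
    for source in order:
        hits = [p for p in corpora[source] if cosine_similarity(query_vec, embed(p)) >= threshold]
        if hits:
            return source, hits  # stop at the highest-precedence source with a good match
    return None  # the caller falls back, like FALLBACK_RESPONSE

# Toy passages invented for this sketch (not quoted from any guideline);
# source_1..source_3 stand in for MoHFW, FOGSI, WHO
corpora = {
    "source_1": ["folic acid daily"],
    "source_2": ["eat iron rich foods", "drink water"],
    "source_3": ["eat iron rich foods daily"],
}
order = ("source_1", "source_2", "source_3")
print(retrieve_ordered("eat iron rich foods", corpora, order))  # ('source_2', ['eat iron rich foods'])
print(retrieve_ordered("best cricket bat", corpora, order))     # None
```

Note that `source_3` also scores above the threshold for the first query, but it is never consulted because `source_2` has higher precedence.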

In [None]:
import os
from llama_index.readers.file import PDFReader
from llama_index.core import VectorStoreIndex
from openai import OpenAI
from typing import TypedDict, List
from google.colab import userdata

# LlamaIndex's default OpenAI embedding model reads the API key from this environment variable
os.environ["OPENAI_API_KEY"] = userdata.get('poshan-saathi')

# define a TypedDict to document the expected PDF metadata (checked by static type checkers such as mypy, not enforced at runtime)
class PDFFile(TypedDict):
    path: str
    source_name: str
    year_published: int

# create a RAG wrapper class
class PregnancyRAG:
    FALLBACK_RESPONSE = {
        "answer": "I’m sorry, I don’t have enough information to answer that question.",
        "sources": []
    }

    GUARDRAIL_RESPONSE = {
        "answer": "I’m here to provide nutritional guidance only. For medical concerns or emergencies, please contact a qualified healthcare provider immediately.",
        "sources": []
    }

    def __init__(self, pdf_files: List[PDFFile], llm_fallback_model="gpt-4.1-nano"):
        self.pdf_files = pdf_files
        self.llm_fallback_model = llm_fallback_model
        self.indexes = {}
        self._build_indexes()
        self.client = OpenAI(api_key=userdata.get('poshan-saathi')) # init OpenAI client

    # load and index all PDFs with metadata
    def _build_indexes(self):

        # load in all PDFs
        reader = PDFReader()
        for file in self.pdf_files:
            docs = reader.load_data(file["path"])
            print(f"Loaded {len(docs)} docs from {file['path']}")

            # add metadata to each document to use as citation source
            for i, doc in enumerate(docs, start=1):  # start=1 for 1-indexed pages
                doc.metadata = file.copy()  # avoid overwriting the original dict
                doc.metadata["page"] = i

            # index all documents in a vector store
            self.indexes[file["source_name"]] = VectorStoreIndex.from_documents(docs)

    # apply a query through an ordered RAG system, trying each source in order of precedence
    def query_rag_ordered(self, question, order=("MoHFW", "FOGSI", "WHO")):

        # set a cosine similarity threshold, set after trial and error
        threshold = 0.7

        # append user's dietary type to user query before going through RAG
        query_with_user_metadata = f"{question} [Dietary preference: {onboarding.diet_widget.value}]"

        for source_name in order:
            # retrieve the top-matching chunks from this source's index; a retriever returns
            # scored nodes without the extra LLM call that as_query_engine().query() would make
            retrieved_nodes = self.indexes[source_name].as_retriever().retrieve(query_with_user_metadata)

            # keep only nodes whose similarity score meets the threshold (a score may be None)
            filtered_nodes = [
                node for node in retrieved_nodes
                if node.score is not None and node.score >= threshold
            ]

            # only proceed if we have high-quality retrieval
            if not filtered_nodes:
                continue  # try next source in order

            # build context string with citation metadata
            context_text = "\n".join([
                f"{n.node.get_text()} (Source: {n.metadata['source_name']}, Page: {n.metadata['page']}, Year: {n.metadata['year_published']})"
                for n in filtered_nodes
            ])

            # format the user's selected medical conditions for the prompt
            medical_conditions = ", ".join(onboarding.medical_conditions_widget.value) or "none"

            # apply prompt refinement
            prompt = f"""
              You are Poshan Saathi, a friendly pregnancy nutrition assistant. Your aim is to provide positive and helpful information about nutrition to
              pregnant women living in India.

              Rules:
              - Never give medical diagnoses or treatment advice. Respond with: {self.GUARDRAIL_RESPONSE['answer']}
              - If a user is experiencing any kind of urgent or sudden symptoms, respond with: {self.GUARDRAIL_RESPONSE['answer']}
              - Tailor advice to the user's diet: {onboarding.diet_widget.value}, week of pregnancy: {onboarding.week_widget.value},
              medical conditions: {medical_conditions}.
              - Share no more than 2-3 clear and relevant sentences.

              Context:
              {context_text}

              Question:
              {query_with_user_metadata}
              """

            # generate answer from LLM
            answer_text = self.query_llm(prompt)

            # return answer and sources
            sources_info = [{
                "source_name": n.metadata["source_name"],
                "page": n.metadata["page"],
                "year_published": n.metadata["year_published"]
            } for n in filtered_nodes]

            return {
                "answer": answer_text,
                "sources": sources_info
            }

        # fallback if no RAG source provided a good answer
        return self.FALLBACK_RESPONSE


    # call an OpenAI model directly
    def query_llm(self, prompt):
        response = self.client.chat.completions.create(
            model=self.llm_fallback_model,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

In [None]:
# list of PDFs with metadata
pdf_files = [
    {"path": "knowledge_base/anc_guidelines_india_mohfw.pdf", "source_name": "MoHFW", "year_published": 2021},
    {"path": "knowledge_base/anc_guidelines_india_fogsi.pdf", "source_name": "FOGSI", "year_published": 2024},
    {"path": "knowledge_base/anc_guidelines_global_who.pdf", "source_name": "WHO", "year_published": 2021},
]

rag = PregnancyRAG(pdf_files)

Loaded 80 docs from knowledge_base/anc_guidelines_india_mohfw.pdf
Loaded 28 docs from knowledge_base/anc_guidelines_india_fogsi.pdf
Loaded 40 docs from knowledge_base/anc_guidelines_global_who.pdf
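`TypedDict` annotations are checked only by static type checkers; at runtime, `pdf_files` is a plain list of dicts. If runtime checking is wanted, a minimal sketch along these lines could validate each entry before `PregnancyRAG` is built. `validate_pdf_file` and `sample_files` are hypothetical names, `PDFFile` is repeated so the sketch runs on its own, and the check only handles plain classes such as `str` and `int` in the annotations.

```python
from typing import List, TypedDict, cast, get_type_hints

# Copy of the notebook's PDFFile so this sketch runs on its own
class PDFFile(TypedDict):
    path: str
    source_name: str
    year_published: int

def validate_pdf_file(entry: dict) -> PDFFile:
    """Hypothetical check: every PDFFile key is present and holds a value of the annotated type."""
    for key, expected_type in get_type_hints(PDFFile).items():
        if key not in entry:
            raise KeyError(f"missing key {key!r} in {entry}")
        # works for plain classes such as str and int, not for generics like List[str]
        if not isinstance(entry[key], expected_type):
            raise TypeError(f"{key!r} should be {expected_type.__name__}, got {type(entry[key]).__name__}")
    return cast(PDFFile, entry)

sample_files: List[dict] = [
    {"path": "knowledge_base/anc_guidelines_india_mohfw.pdf", "source_name": "MoHFW", "year_published": 2021},
    {"path": "knowledge_base/anc_guidelines_global_who.pdf", "source_name": "WHO", "year_published": 2021},
]
checked = [validate_pdf_file(entry) for entry in sample_files]
```

A mistyped entry such as `"year_published": "2021"` would then fail with a `TypeError` before any indexing starts.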


# 5. Interactive Chatbot
Feel free to chat with the friendly chatbot, Poshan Saathi! It answers nutrition questions using passages retrieved from the three trusted sources and cites them. If your question suggests you need medical attention or a diagnosis, it redirects you to a healthcare provider. If none of the sources contains a sufficiently relevant passage, as is typical for off-topic questions, it replies that it doesn't have enough information.
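The chatbot currently prints the raw list of source dictionaries. A small formatter could turn them into readable citations and drop duplicates when several retrieved chunks come from the same page. This is a sketch: `format_sources` is a hypothetical helper that assumes the `source_name`, `page`, and `year_published` keys returned by `query_rag_ordered`, and the page numbers below are illustrative.

```python
from typing import Dict, List

def format_sources(sources: List[Dict[str, object]]) -> str:
    """Render source dicts as 'NAME (YEAR), p. PAGE', keeping first-seen order and dropping duplicates."""
    citations: List[str] = []
    for src in sources:
        citation = f"{src['source_name']} ({src['year_published']}), p. {src['page']}"
        if citation not in citations:
            citations.append(citation)
    return "; ".join(citations)

# Illustrative values only
sources = [
    {"source_name": "MoHFW", "page": 12, "year_published": 2021},
    {"source_name": "MoHFW", "page": 12, "year_published": 2021},
    {"source_name": "MoHFW", "page": 15, "year_published": 2021},
]
print(format_sources(sources))  # MoHFW (2021), p. 12; MoHFW (2021), p. 15
```

Inside `handle_query`, the sources line could then print `format_sources(response['sources'])` instead of the raw list.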

In [None]:

# --- Chatbot Interface ---
class NutritionChatbot:
    def __init__(self, rag_instance):
        self.rag = rag_instance
        self.chat_history = []
        self.input_box = widgets.Text(
            placeholder="Ask about nutrition during pregnancy...",
            description="You:",
            layout=widgets.Layout(width="100%")
        )
        self.output_area = widgets.Output()
        self.send_button = widgets.Button(description="Send", button_style='success')
        self.send_button.on_click(self.handle_query)
        display(self.input_box, self.send_button, self.output_area)

    def handle_query(self, b):
        question = self.input_box.value.strip()
        if not question:
            return

        with self.output_area:
            print(f"\n🧑‍🍼 You: {question}")
            response = self.rag.query_rag_ordered(question)
            print(f"🤖 Poshan Saathi: {response['answer']}")
            if response["sources"]:
                print(f"📘 Source(s): {response['sources']}")
        self.input_box.value = ""


# --- Launch Chatbot ---
chatbot = NutritionChatbot(rag)  # assign to a variable so the notebook doesn't echo the object's repr
