<a href="https://www.kaggle.com/code/drdebabratamondal/ayurvedic-diagnostic-assistant-ada-1-0-final?scriptVersionId=235075108" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 💊 Ayurvedic Diagnostic Assistant (ADA) with Gemini, RAG & Structured Output

## 🔍 Project Overview

This notebook implements an **Ayurvedic Diagnostic Assistant (ADA)** leveraging Google's Gemini model. The assistant aims to:

1.  **Analyze patient symptoms** through the lens of traditional Ayurvedic principles (Tridosha theory: Vata, Pitta, Kapha).
2.  **Retrieve relevant information** from a corpus of Ayurvedic texts using Retrieval-Augmented Generation (RAG) to ground its analysis in domain-specific knowledge.
3.  **Generate structured diagnostic reports** in JSON format, detailing potential imbalances, supporting evidence, and recommending treatments (diet, herbs, therapies, lifestyle).
4.  **Present the information clearly** through a formatted HTML display and an interactive user interface.

**🤖 Key Generative AI Capabilities Demonstrated:**

*   **Retrieval Augmented Generation (RAG):** Enhances the Gemini model's responses by incorporating knowledge retrieved from a specialized Ayurvedic text dataset.
*   **Structured Output (JSON Mode):** Instructs the model to generate responses in a consistent, machine-readable JSON format, enabling reliable data extraction and presentation.
*   **Few-shot Prompting:** Uses illustrative examples within the prompt to guide the model towards the desired format and style of Ayurvedic analysis.
*   **Embeddings & Vector Search:** Employs text embeddings and a vector database (FAISS) to enable semantic search over the Ayurvedic knowledge base for effective RAG.
*   **Document Understanding:** Processes and chunks text from Ayurvedic documents to build the knowledge base for RAG.
*   **Basic GenAI Evaluation:** Includes sample cases with expected outcomes to perform a simple validation of the system's diagnostic accuracy.

## ⚙️ Step 1: Setting Up the Environment

This first code cell imports necessary Python libraries for data manipulation (`pandas`, `numpy`), file system operations (`os`), date/time handling (`datetime`), progress bars (`tqdm`), JSON processing (`json`), regular expressions (`re`), interacting with the Google Generative AI API (`google.generativeai`), displaying rich output in the notebook (`IPython.display`), and suppressing warnings. It also includes logic to list files in the Kaggle input directory, providing visibility into the available datasets.

In [1]:
# Step 1: Setting Up the Environment

import numpy as np # linear algebra
import pandas as pd # data processing
import datetime # Added for current date display
import os
from tqdm.notebook import tqdm # Imported tqdm for notebooks

 
input_dir = '/kaggle/input'
print(f"Checking files in {input_dir}...") 

# Counting the total number of files to set up the progress bar
total_files = 0
try:
    for _, _, filenames in os.walk(input_dir):
        total_files += len(filenames)
except FileNotFoundError:
    print(f"Warning: Input directory '{input_dir}' not found. Skipping file check.")
    total_files = 0

# Iterating again just to update the progress bar
if total_files > 0:
    print(f"Processing {total_files} files (progress bar below):")
    # Use tqdm context manager for automatic closing
    with tqdm(total=total_files, desc="Checking Input Files", unit="file") as pbar:
        for dirname, _, filenames in os.walk(input_dir):
            for filename in filenames:
            
                pbar.update(1) # Increment progress bar for each file
    print("\nFile check complete.") # Confirmation message after the loop
else:
    print("No files found or directory doesn't exist.")


# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Importing other necessary libraries
import json
import re
import warnings
import google.generativeai as genai
from IPython.display import display, Markdown, HTML
from sklearn.metrics.pairwise import cosine_similarity

warnings.filterwarnings('ignore')

print("Setting up the environment (other libraries)...")

Checking files in /kaggle/input...
Processing 2175 files (progress bar below):


Checking Input Files:   0%|          | 0/2175 [00:00<?, ?file/s]


File check complete.
Setting up the environment (other libraries)...


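**This initial step imports the libraries used by the cells that follow:**

- Standard libraries for data handling and processing (`numpy`, `pandas`, `os`, `json`, `re`, `datetime`)

- Google's Generative AI library (`google.generativeai`) for Gemini 2.0 Flash

- `cosine_similarity` from scikit-learn and `tqdm` for progress bars

- Display tools (`IPython.display`) for rendering rich output in Kaggle

The embedding and vector-store libraries are not imported here; they are installed and imported in Step 4. The dataset is read from the attached Kaggle input directory, so no download library is needed.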

## 🔑 Step 2: API Configuration and Model Selection

This cell configures the connection to the Google Generative AI service. It retrieves the necessary API key, ideally from Kaggle Secrets for security, or falls back to an environment variable for local testing. It then initializes the `genai` client library and selects the specific generative model to be used for the diagnostic task – in this case, `gemini-2.0-flash`, chosen for its balance of speed and capability.
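
A minimal sketch of the lookup order described above, using a hypothetical helper named `resolve_api_key` (it is not part of the notebook). It tries Kaggle Secrets first, falls back to an environment variable, and stops early with a clear message when neither source provides a key, instead of passing `None` on to `genai.configure`.

```python
import os

def resolve_api_key(name="GOOGLE_API_KEY"):
    """Return the key from Kaggle Secrets, else from the environment.

    Hypothetical helper for illustration only.
    """
    try:
        # kaggle_secrets is only importable inside a Kaggle session.
        from kaggle_secrets import UserSecretsClient
        key = UserSecretsClient().get_secret(name)
    except Exception:
        # Not on Kaggle, or the secret is not attached: use the environment.
        key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} not found in Kaggle Secrets or the environment")
    return key
```

Failing at this point makes a missing key visible immediately rather than at the first model call.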

In [2]:
# Step 2: Gemini API Configuration and Model Selection
try:
    # For Kaggle environment
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    GOOGLE_API_KEY = user_secrets.get_secret("GOOGLE_API_KEY")
except Exception:
    # Fallback for local testing
    GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# Configuring the Gemini API
genai.configure(api_key=GOOGLE_API_KEY)

# Selecting Gemini 2.0 Flash model
model = genai.GenerativeModel('gemini-2.0-flash')
print("API configured and model selected: gemini-2.0-flash")


API configured and model selected: gemini-2.0-flash


This section:

- Retrieves the Google API key from Kaggle Secrets or environment variables
- Configures the Gemini client with the API key
- Selects the Gemini 2.0 Flash model as specified


## 📚 Step 3: Loading and Processing the Ayurveda Dataset

This step is crucial for the RAG capability. It defines a function `process_ayurveda_texts` to load the knowledge base.

**Note:** This code assumes the **Ayurvedic Texts dataset (`rcratos/ayurveda-texts-english`) has been added to the notebook environment using Kaggle's "+ Add Data" / "+ Add Input" feature.** The specified file path (`/kaggle/input/ayurveda-texts-english`) points to where Kaggle typically mounts added datasets.

The function:
1.  Locates the dataset directory (checking if it exists based on the standard Kaggle input path).
2.  Iterates through `.txt` files within the directory structure.
3.  Reads the content of each text file.
4.  Chunks the text into paragraphs by splitting on blank lines and keeping paragraphs longer than 30 words; if that yields at most one chunk, it falls back to fixed windows of 500 words. The resulting segments are suitable for embedding and retrieval.
5.  Assigns metadata (source text name, category based on subfolder, filename) to each chunk.
6.  Handles potential errors during file processing.
7.  Includes a fallback mechanism: If the dataset cannot be found or processed, it loads a small set of predefined sample Ayurvedic texts to ensure the application can still run, albeit with a limited knowledge base.

The processed text chunks form the foundation of the knowledge base that the RAG system will search.
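
The chunking strategy can be sketched in isolation. The function name `chunk_text` is illustrative; the thresholds (30 words, 500 words) are the ones used in the cell below.

```python
def chunk_text(content, min_words=30, fallback_size=500):
    """Split on blank lines; fall back to fixed-size word windows."""
    # Keep only paragraphs long enough to carry useful context.
    chunks = [p.strip() for p in content.split("\n\n") if len(p.split()) > min_words]
    if len(chunks) <= 1:
        # No usable paragraph structure: slice the text into word windows.
        words = content.split()
        chunks = [
            " ".join(words[i:i + fallback_size])
            for i in range(0, len(words), fallback_size)
        ]
    return chunks

sample = " ".join(["word"] * 1200)  # 1200 words with no paragraph breaks
print([len(c.split()) for c in chunk_text(sample)])  # [500, 500, 200]
```

Paragraph-based chunks keep related sentences together, while the fallback guarantees that a file without blank lines is still split into retrievable pieces.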

In [3]:
# Step 3: Loading and Processing the Ayurveda Dataset

# Process the Ayurveda texts from the attached dataset directory
def process_ayurveda_texts(data_input_dir='/kaggle/input/ayurveda-texts-english'): # Default path
    print(f"Processing Ayurveda texts from {data_input_dir}...")
    texts = []

    # Check if the dataset directory exists
    if not os.path.exists(data_input_dir):
         print(f"Error: Dataset directory not found at {data_input_dir}.")
         print("Please ensure the 'rcratos/ayurveda-texts-english' dataset is added to the notebook via '+ Add Data'.")
         return [] # Return empty list if directory is missing

    # Walk through all files in the data directory
    for root, dirs, files in os.walk(data_input_dir):
        # Skip hidden directories/files if necessary (e.g., .ipynb_checkpoints)
        files = [f for f in files if not f.startswith('.')]
        dirs[:] = [d for d in dirs if not d.startswith('.')]

        for file in files:
            if file.endswith('.txt'):
                file_path = os.path.join(root, file)
                try:
                    relative_path = os.path.relpath(root, data_input_dir)
                    category = os.path.basename(relative_path) if relative_path != '.' else 'root' # Use 'root' or similar if files are directly in the main folder
                    text_name = os.path.splitext(file)[0]

                    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                        content = f.read()

                    # Simple split into paragraphs or large chunks
                    chunks = [para for para in content.split('\n\n') if len(para.split()) > 30] # Example: split by double newline, min 30 words

                    if not chunks or len(chunks) == 1: # Fallback if paragraph split fails or yields one huge chunk
                         words = content.split()
                         chunk_size = 500 # Adjust chunk size as needed
                         chunks = [' '.join(words[i:i+chunk_size]) for i in range(0, len(words), chunk_size)]

                    for i, chunk in enumerate(chunks):
                         if len(chunk.strip()) > 10: # Ensure chunk is not just whitespace
                            texts.append({
                                "id": f"{text_name}-chunk-{i}",
                                "content": chunk.strip(),
                                "metadata": {
                                    "source": text_name,
                                    "category": category,
                                    "file_path": os.path.basename(file_path) # Add original filename for reference
                                }
                            })
                except Exception as e:
                    print(f"Error processing file {file_path}: {e}")

    if not texts:
        print(f"Warning: No text chunks were processed from {data_input_dir}. Check file structure and content.")
        # Fallback will be triggered later if needed

    print(f"Processed {len(texts)} text chunks from the Ayurveda dataset.")
    return texts

# Process the dataset directly
try:
    # Make sure the dataset is attached via Kaggle UI first!
    AYURVEDA_DATASET_PATH = '/kaggle/input/ayurveda-texts-english' # Or the specific path shown in /kaggle/input
    ayurvedic_texts = process_ayurveda_texts(AYURVEDA_DATASET_PATH)

    if not ayurvedic_texts: # Check if processing yielded any texts
         print("Processing completed but no texts were extracted. Falling back to sample texts.")
         raise ValueError("No texts extracted from input directory") # Force fallback

except Exception as e:
    print(f"Error processing dataset from input directory: {e}")
    print("Falling back to sample Ayurvedic texts...")
    # Provide sample texts in case of issues (Keep your fallback list here)
    ayurvedic_texts = [
        {
            "id": "sample-vata-1",
            "content": "Vata dosha governs movement in the body, including blood circulation, breathing, blinking, and heartbeat. Characteristics include dryness, coldness, lightness, and irregularity. Imbalance often manifests as anxiety, insomnia, dry skin, constipation, and joint pain. Balancing Vata involves warmth, routine, grounding activities, and nourishing foods.",
            "metadata": {"source": "Sample Knowledge", "category": "Doshas"}
        },
        {
            "id": "sample-pitta-1",
            "content": "Pitta dosha controls digestion, metabolism, and energy production. Its qualities are hot, sharp, light, and oily. Pitta individuals often have strong digestion and intellect but can be prone to anger, inflammation, heartburn, and skin rashes when imbalanced. Cooling foods, moderation, and avoiding excess heat help balance Pitta.",
            "metadata": {"source": "Sample Knowledge", "category": "Doshas"}
        },
        {
            "id": "sample-kapha-1",
            "content": "Kapha dosha provides structure, lubrication, and stability. It is characterized by heaviness, coldness, slowness, and oiliness. Balanced Kapha brings strength and calmness. Imbalance can lead to lethargy, weight gain, congestion, and possessiveness. Stimulation, exercise, warmth, and light foods are key to balancing Kapha.",
            "metadata": {"source": "Sample Knowledge", "category": "Doshas"}
        },
         {
            "id": "sample-treatment-1",
            "content": "Abhyanga, or self-massage with warm oil, is highly recommended for Vata imbalance. It pacifies dryness, improves circulation, calms the nerves, and promotes better sleep. Sesame oil is traditionally used. Swedana (steam therapy) often follows Abhyanga to help the oil penetrate deeper.",
            "metadata": {"source": "Sample Knowledge", "category": "Treatments"}
        }
    ]
    print(f"Using {len(ayurvedic_texts)} sample text chunks.")

# Ensuring we have texts before proceeding
if not ayurvedic_texts:
     print("CRITICAL ERROR: No Ayurvedic texts available (processing failed and fallback is empty). RAG will not function correctly.")

Processing Ayurveda texts from /kaggle/input/ayurveda-texts-english...
Processed 38978 text chunks from the Ayurveda dataset.


This section:

- Loads the Ayurveda texts from the dataset attached under `/kaggle/input` (nothing is downloaded at runtime)
- Processes the text files by:
    - Reading each `.txt` file in the dataset
    - Splitting each text into paragraph chunks of more than 30 words, or into 500-word windows when no paragraph structure is found
    - Organizing them with metadata about source and category
- Includes a fallback to four sample texts in case the dataset is missing or yields no chunks
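
The metadata attached to each chunk is derived from the file's location: the file name (without extension) becomes the source, and the containing subfolder becomes the category. A small sketch, assuming a hypothetical helper `describe_file` and made-up paths:

```python
import os

def describe_file(file_path, data_root):
    """Derive the chunk metadata fields from a file's location.

    Hypothetical helper; it mirrors the path logic of the cell above.
    """
    folder = os.path.relpath(os.path.dirname(file_path), data_root)
    return {
        "source": os.path.splitext(os.path.basename(file_path))[0],
        # Files stored directly in the dataset root get the category "root".
        "category": os.path.basename(folder) if folder != "." else "root",
        "file_path": os.path.basename(file_path),
    }

# The directory and file names below are invented for the example.
print(describe_file("/data/texts/samhita/charaka.txt", "/data/texts"))
```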


## 🔗 Step 4: Creating the Vector Database for RAG

This cell builds the core component for semantic search in the RAG pipeline: the vector database. It uses libraries from `langchain` and `sentence-transformers`.

**Hardware Acceleration:** Generating embeddings for a large corpus can be computationally intensive. **Using a GPU accelerator (like the T4 x2 available on Kaggle, selected in the notebook settings) significantly speeds up this process,** particularly the embedding generation handled by the `HuggingFaceEmbeddings` library when configured with `device='cuda'`.

The process involves:
1.  **Install Dependencies:** Installs `faiss-cpu` (for the vector store), `sentence-transformers` (for embeddings), and `langchain` components.
2.  **Load Embedding Model:** Initializes a high-quality sentence embedding model (`all-MiniLM-L6-v2`) using `HuggingFaceEmbeddings` from `langchain`. It's explicitly configured to use the GPU (`device='cuda'`) for faster embedding generation, leveraging the selected accelerator.
3.  **Prepare Documents:** Converts the text chunks processed in Step 3 into LangChain `Document` objects, preserving the content and metadata.
4.  **Generate Embeddings & Build Index:** Creates vector embeddings for each document using the loaded model (running efficiently on the GPU). It then uses FAISS (Facebook AI Similarity Search) via `langchain_community.vectorstores.FAISS` to build an efficient index of these embeddings on the CPU. This process is done in batches to handle potentially large datasets without exhausting memory.
5.  **Create Retriever:** Creates a LangChain `retriever` object from the FAISS index. This retriever provides a simple interface (`invoke`, the call used in Step 6; older LangChain code uses `get_relevant_documents`) to perform semantic similarity searches against the indexed Ayurvedic texts. It's configured to return the top 5 most relevant documents (`k=5`).

This vector database and retriever are essential for finding relevant context to augment the LLM's knowledge.
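
Conceptually, a retriever embeds the query, compares it with every stored vector, and returns the closest matches. The sketch below illustrates that idea in plain Python with invented 3-dimensional vectors; real `all-MiniLM-L6-v2` embeddings have 384 dimensions. It uses cosine similarity for readability, and the FAISS index built in this notebook may rank by a different distance metric.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, index, k=5):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy "embeddings", invented for illustration.
index = [
    ("vata-joints", [0.9, 0.1, 0.0]),
    ("pitta-digestion", [0.1, 0.9, 0.1]),
    ("kapha-congestion", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # stands in for an embedded symptom description
print(top_k(query, index, k=2))  # ['vata-joints', 'pitta-digestion']
```

A dedicated index such as FAISS performs the same kind of nearest-neighbour lookup far more efficiently than this linear scan, which matters for a corpus of tens of thousands of chunks.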

In [4]:
# Step 4: Creating an Enhanced Vector Database

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

!pip install -q faiss-cpu sentence-transformers
!pip install -q langchain langchain-community langchain-huggingface
!pip install -q huggingface_hub[hf_xet]

from langchain_community.vectorstores import FAISS 
from langchain_huggingface import HuggingFaceEmbeddings
from tqdm.notebook import tqdm
from langchain.schema import Document
import math

print("Initializing HuggingFace embedding model...")
model_name = 'all-MiniLM-L6-v2'
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': False} 
local_embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
print(f"HuggingFace embedding model '{model_name}' loaded, configured for GPU usage.")

# Assuming 'ayurvedic_texts' is ready from Step 3
if not ayurvedic_texts:
    print("Error: ayurvedic_texts list is empty. Cannot build vector database.")
    vector_db = None
    retriever = None
else:
    # Prepare documents
    langchain_docs = []
    print(f"Preparing {len(ayurvedic_texts)} text chunks for LangChain...")
    for text_chunk in tqdm(ayurvedic_texts, desc="Preparing Docs"):
        content = str(text_chunk.get('content', ''))
        metadata = text_chunk.get('metadata', {})
        if not isinstance(metadata, dict):
             metadata = {'original_metadata': str(metadata)}
        doc = Document(page_content=content, metadata=metadata)
        langchain_docs.append(doc)

    print(f"Generating embeddings (on GPU) and building FAISS index (on CPU) for {len(langchain_docs)} documents in batches...")
    try:
        batch_size = 1000 # Adjust based on GPU memory for embedding
        num_docs = len(langchain_docs)
        num_batches = math.ceil(num_docs / batch_size)
        vector_db = None

        print(f"Processing in {num_batches} batches of size {batch_size}...")
        for i in tqdm(range(0, num_docs, batch_size), desc="Building FAISS Index", total=num_batches):
            batch_docs = langchain_docs[i : min(i + batch_size, num_docs)]
            if not batch_docs: continue

            if vector_db is None:
                # Embedding happens on GPU here via local_embeddings
                vector_db = FAISS.from_documents(batch_docs, local_embeddings)
            else:
                # Embedding happens on GPU here via local_embeddings
                vector_db.add_documents(batch_docs)

        if vector_db is not None:
             print("\nFAISS index built successfully in batches.")
            
             retriever = vector_db.as_retriever(search_kwargs={'k': 5})
             print("Retriever created.")
        else:
             print("\nFAISS index could not be initialized.")
             retriever = None

    except Exception as e:
        print(f"\nError building FAISS index during batch processing: {e}")
        import traceback
        traceback.print_exc()
        print("RAG functionality will be impaired.")
        vector_db = None
        retriever = None

# Note: The 'retriever' object is the one used for searching in Step 6.



HuggingFace embedding model 'all-MiniLM-L6-v2' loaded, configured for GPU usage.
Preparing 38978 text chunks for LangChain...


Preparing Docs:   0%|          | 0/38978 [00:00<?, ?it/s]

Generating embeddings (on GPU) and building FAISS index (on CPU) for 38978 documents in batches...
Processing in 39 batches of size 1000...


Building FAISS Index:   0%|          | 0/39 [00:00<?, ?it/s]


FAISS index built successfully in batches.
Retriever created.


This section:

- Uses the `all-MiniLM-L6-v2` sentence-transformers model, loaded through `HuggingFaceEmbeddings`, to create the embeddings
- Builds a FAISS vector store that:
    - Stores documents and their metadata
    - Holds an embedding for each document
    - Provides semantic search through a retriever that returns the top 5 matches
- Adds all 38,978 processed Ayurvedic text chunks to the index in 39 batches of up to 1,000, with a progress bar
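
The batch arithmetic behind the index build can be checked in isolation. This sketch covers only the slicing (the helper name `iter_batches` is illustrative); in the notebook, the first batch creates the FAISS index and each later batch is appended to it.

```python
import math

def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

docs = list(range(38978))  # stand-in for the 38,978 chunks reported above
batches = list(iter_batches(docs, 1000))
print(len(batches), len(batches[-1]))  # 39 978
```

The count matches `math.ceil(38978 / 1000)`, which is the `total` passed to the progress bar.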


## 🎯 Step 5: Creating Few-Shot Examples for Structured Output

This cell defines `few_shot_examples`, a string containing examples of the desired input-output behavior for the LLM. This is a key part of **few-shot prompting** and **structured output** generation.

*   **Purpose:** These examples demonstrate to the Gemini model exactly how to analyze patient symptoms according to Ayurvedic principles and, critically, how to format the output as a JSON object.
*   **Structure:** Each example consists of:
    *   `Patient Symptoms:` A natural language description.
    *   `Ayurvedic Analysis:` The corresponding desired output, formatted as a JSON string.
*   **Content:** The JSON structure includes fields like `dominant_dosha`, `imbalances`, `diagnosis`, `supporting_evidence` (linking symptoms to Ayurvedic concepts), and detailed `recommended_treatments` categorized appropriately (diet, herbs, medicines, therapies, lifestyle).
*   **Guidance:** By providing these examples, we guide the model to generate responses that adhere to this specific schema and level of detail, making the output predictable and easier to parse programmatically.
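
Because the examples fix a schema, a reply can be checked against it before it is displayed. A minimal sketch, assuming a hypothetical helper `missing_fields`; the key names are taken from the few-shot examples in this step.

```python
import json

REQUIRED_KEYS = {
    "dominant_dosha", "imbalances", "diagnosis",
    "supporting_evidence", "recommended_treatments",
}
TREATMENT_KEYS = {"dietary", "herbs", "ayurvedic_medicines", "therapies", "lifestyle"}

def missing_fields(analysis):
    """Return a sorted list of the expected fields absent from `analysis`."""
    missing = REQUIRED_KEYS - set(analysis)
    treatments = analysis.get("recommended_treatments", {})
    if not isinstance(treatments, dict):
        treatments = {}
    # Report nested treatment categories with a dotted path.
    missing |= {f"recommended_treatments.{k}" for k in TREATMENT_KEYS - set(treatments)}
    return sorted(missing)

reply = json.loads('{"dominant_dosha": "Vata", "imbalances": [], "diagnosis": "..."}')
print(missing_fields(reply))
```

An empty list means the reply has every field the display code is likely to rely on.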

In [5]:
# Step 5: Creating Few-Shot Examples for Structured Output

few_shot_examples = """
Example 1:
Patient Symptoms: "I've been experiencing joint pain that worsens in cold weather, cracking sounds in my knees, constipation, and anxiety. I have trouble sleeping and my skin is very dry."
Ayurvedic Analysis:
{
  "dominant_dosha": "Vata",
  "imbalances": ["Vata excess in joints (Sandhi Vata)", "Vata affecting colon (Kostha Vata)", "Vata affecting nervous system (Majja Dhatu)"],
  "diagnosis": "Sandhigata Vata (Osteoarthritis with Vata predominance) with associated Anidra (Insomnia) and Kostha Baddhata (Constipation)",
  "supporting_evidence": {
    "symptoms_matching_vata": ["joint pain worse in cold", "cracking sounds (Vata in joints)", "constipation", "anxiety", "insomnia", "dry skin"],
    "pulse_indication": "Likely irregular, thready, feeble (Vata pulse)",
    "tongue_indication": "Likely dry, rough, possibly cracked, maybe a brownish coating"
  },
  "recommended_treatments": {
    "dietary": ["Warm, cooked, unctuous foods", "Favor sweet, sour, and salty tastes", "Include ghee and healthy oils", "Avoid cold, dry, light foods", "Warm water/herbal teas"],
    "herbs": ["Ashwagandha (Withania somnifera) - for strength and stress", "Guggulu (Commiphora wightii) - specific for joints", "Shallaki (Boswellia serrata) - for joint inflammation", "Haritaki (Terminalia chebula) - for constipation"],
    "ayurvedic_medicines": ["Yogaraj Guggulu or Mahayogaraj Guggulu - classical formulation for joints", "Mahanarayan Oil - for external massage on joints", "Ashwagandharishta - for nerve strength and stress", "Castor oil (Eranda Taila) - gentle purgative for Vata constipation (use cautiously)"],
    "therapies": ["Abhyanga (regular warm oil massage)", "Swedana (steam therapy, especially Nadi Sweda for joints)", "Basti (medicated enema, particularly Anuvasana or Matra Basti with appropriate oils)"],
    "lifestyle": ["Maintain regular daily routine (Dinacharya)", "Keep warm, avoid cold drafts", "Gentle, grounding yoga and stretching", "Pranayama (e.g., Nadi Shodhana)", "Meditation for anxiety"]
  }
}

Example 2:
Patient Symptoms: "I frequently get heartburn and acid reflux, especially after eating spicy foods. I have a reddish complexion, feel hot often, and get irritated easily. I also have some skin rashes that worsen when I'm stressed."
Ayurvedic Analysis:
{
  "dominant_dosha": "Pitta",
  "imbalances": ["Pitta excess in digestive tract (Annavaha Srotas)", "Pitta affecting skin (Rakta Dhatu, Bhrajaka Pitta)", "Pitta affecting mind (Sadhaka Pitta)"],
  "diagnosis": "Amlapitta (Hyperacidity/GERD) with associated Raktaja Kustha (Pitta-type skin issues)",
  "supporting_evidence": {
    "symptoms_matching_pitta": ["heartburn", "acid reflux (sour/bitter taste)", "reddish complexion", "feeling hot", "irritability/anger", "skin rashes worsened by stress/heat"],
    "pulse_indication": "Likely moderate strength, sharp, jumping (Pitta pulse)",
    "tongue_indication": "Likely reddish tongue body, possibly with a yellowish coating"
  },
  "recommended_treatments": {
    "dietary": ["Cooling foods and drinks", "Favor sweet, bitter, and astringent tastes", "Avoid spicy, sour, salty, fermented foods", "Avoid alcohol, caffeine, excessive fried food", "Regular meal times, avoid skipping meals"],
    "herbs": ["Amalaki (Emblica officinalis) - cooling, Vit C rich, balances Pitta", "Guduchi (Tinospora cordifolia) - immunomodulator, Pitta-shamaka", "Shatavari (Asparagus racemosus) - cooling, soothing for GI tract", "Yashtimadhu (Glycyrrhiza glabra) - demulcent for GI lining (use cautiously if BP issues)", "Neem (Azadirachta indica) - bitter, for skin issues"],
    "ayurvedic_medicines": ["Avipattikar Churna - classical formula for hyperacidity", "Kamadudha Rasa (with Mukta) - cooling antacid formulation", "Chandanasava - cooling formulation, helpful for burning sensations", "Sutshekhar Rasa - often used for Pitta conditions including GI issues"],
    "therapies": ["Virechana (therapeutic purgation) - primary Pitta detoxification (under guidance)", "Cooling oil application (e.g., Chandanadi Taila, Coconut oil)", "Shirodhara with cooling liquids (e.g., milk, buttermilk)"],
    "lifestyle": ["Avoid excessive heat and sun exposure", "Moderate exercise, avoid overheating", "Practice stress-reducing techniques (meditation, calming pranayama like Sheetali)", "Spend time in nature, near water", "Moonlight walks"]
  }
}
"""
print('process completed.....')

process completed.....


This section creates detailed few-shot examples to guide the model in generating structured output:

- Each example shows a complete Ayurvedic diagnosis in JSON format
- The examples demonstrate the expected structure and depth of analysis
- They cover different dosha types (Vata and Pitta) to show variety


## 🔬 Step 6: Implementing the RAG-Enhanced Diagnostic Function

This cell defines the core logic of the Ayurvedic Diagnostic Assistant in the `generate_ayurvedic_diagnosis` function. It orchestrates the RAG process and the call to the Gemini model.

1.  **Retrieve Context (RAG - Retrieval):**
    *   Takes the `patient_symptoms` as input.
    *   Uses the `retriever` (created in Step 4) to search the vector database for text chunks semantically relevant to the input symptoms.
    *   Formats the retrieved document content into a `context` string, clearly marking each document and its source.
    *   Includes error handling in case retrieval fails or the retriever wasn't initialized.
2.  **Construct Prompt:**
    *   Builds a detailed prompt for the Gemini model.
    *   Sets the persona ("expert Ayurvedic physician").
    *   Includes the retrieved `context`, instructing the model to use it *if relevant*.
    *   Inserts the `few_shot_examples` (from Step 5) to guide the output format.
    *   Provides the actual `patient_symptoms` for analysis.
    *   Explicitly asks for the analysis in the specified JSON format, including a ` ```json` hint.
3.  **Generate Response (RAG - Generation):**
    *   Calls the `model.generate_content` method with the constructed prompt.
    *   Sets `temperature` to 0.2 for more deterministic and focused output.
    *   **Crucially, requests JSON output directly** using `response_mime_type="application/json"`.
4.  **Parse Output:**
    *   Attempts to parse the model's response directly as JSON.
    *   Includes robust **fallback parsing** using regular expressions (`re.search`) to extract the JSON block if direct parsing fails (e.g., if the model wraps the JSON in markdown backticks or includes extra text).
    *   Handles potential `JSONDecodeError` and other exceptions during generation or parsing, returning an error dictionary if necessary.

This function integrates the RAG pipeline, few-shot prompting, and structured output request to produce the desired Ayurvedic analysis.
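
The parse-then-fallback idea from item 4 can be sketched on its own. The helper name `parse_json_reply` and the regular expression are illustrative assumptions, not necessarily what the notebook's function uses.

```python
import json
import re

def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating code fences and extra text."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        pass
    # Fallback: take everything from the first "{" to the last "}".
    match = re.search(r"\{.*\}", text or "", re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    return {"error": "Could not parse model output as JSON"}

fence = "`" * 3  # a markdown code fence, built here to keep the example readable
wrapped = f'Here is the analysis:\n{fence}json\n{{"dominant_dosha": "Kapha"}}\n{fence}'
print(parse_json_reply(wrapped))  # {'dominant_dosha': 'Kapha'}
```

Returning an error dictionary instead of raising keeps the calling code simple: it can always treat the result as a dictionary and check for an `error` key.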

In [6]:
# Step 6: Implementing the RAG-Enhanced Diagnostic Function

if 'retriever' not in globals() or retriever is None:
    print("Warning: Retriever is not available. RAG search will be skipped.")

    class DummyRetriever:
        # Expose `invoke`, the method that generate_ayurvedic_diagnosis calls
        def invoke(self, query):
            print("Dummy retriever used: No documents retrieved.")
            return []
    retriever = DummyRetriever()


def generate_ayurvedic_diagnosis(patient_symptoms):

    print(f"Searching for documents relevant to: {patient_symptoms[:100]}...") 
    try:
        #relevant_docs = retriever.get_relevant_documents(patient_symptoms)
        relevant_docs = retriever.invoke(patient_symptoms)
        print(f"Retrieved {len(relevant_docs)} documents.")
    except Exception as e:
        print(f"Error during document retrieval: {e}")
        relevant_docs = []

    # Extract the content from search results
    context_docs = []
    if relevant_docs:
        for i, doc in enumerate(relevant_docs):
            source = doc.metadata.get('source', 'Unknown')
            # Include similarity score if retriever provides it (FAISS retriever usually doesn't directly)
            # We'll omit similarity here as it's not standard from basic FAISS retriever
            context_docs.append(f"--- Relevant Document {i+1} ---\nSource: {source}\nContent: {doc.page_content}\n--- End Document {i+1} ---")
        context = "\n\n".join(context_docs)
    else:
        context = "No specific documents found in the knowledge base for these symptoms."
        print("No relevant documents found, proceeding without specific context.")

    # Create the prompt with few-shot examples and retrieved context
    prompt = f"""You are an expert Ayurvedic physician with decades of experience.
Analyze the patient's symptoms based on Ayurvedic principles (Tridosha theory - Vata, Pitta, Kapha).
Use the following retrieved Ayurvedic knowledge ONLY IF RELEVANT to supplement your analysis. If no specific context is found or it seems irrelevant, rely on your general Ayurvedic expertise.

### Retrieved Context (Use if relevant):
{context}
### End Context

Here are examples of how to structure your analysis in JSON format:

{few_shot_examples}

Now, analyze the following patient symptoms and provide a structured Ayurvedic diagnosis in the **exact same JSON format** as the examples. Focus on identifying the dominant dosha, imbalances, providing a likely diagnosis (Prakriti/Vikriti), and recommending treatments based ONLY on the provided symptoms and general Ayurvedic principles, referencing the context only if truly applicable. **Ensure treatments distinguish between single 'herbs' and classical 'ayurvedic_medicines' as shown in the examples.**

Patient Symptoms: "{patient_symptoms}"

Ayurvedic Analysis (JSON Output Only):
```json
""" # Added ```json hint for the model

    print("Generating diagnosis with Gemini...")
    # Generate the diagnosis using Gemini 2.0 Flash
    try:
        response = model.generate_content(
            prompt,
            generation_config={"temperature": 0.2, "response_mime_type": "application/json"} # Request JSON output
            # Note: gemini-2.0-flash might not fully support response_mime_type yet. Fallback parsing needed.
        )

        # Attempt to parse JSON directly first (if model respects mime type)
        try:
            diagnosis_json = json.loads(response.text)
            print("Successfully parsed JSON response directly.")
            return diagnosis_json
        except (json.JSONDecodeError, TypeError):
             print("Direct JSON parsing failed, attempting regex extraction...")
             # Fallback to regex extraction if direct parsing fails or mime type not supported
             diagnosis_text = response.text
             # Find the JSON block, first inside ```json fences, then anywhere in the text.
             # In a raw string use single backslashes (\s, \{); the greedy .* keeps nested objects intact.
             json_match = re.search(r'```json\s*(\{.*\})\s*```', diagnosis_text, re.DOTALL | re.IGNORECASE)
             if not json_match:
                 json_match = re.search(r'(\{.*\})', diagnosis_text, re.DOTALL) # Broader search

             if json_match:
                 json_str = json_match.group(1)
                 try:
                     diagnosis_json = json.loads(json_str)
                     print("Successfully parsed JSON using regex fallback.")
                     return diagnosis_json
                 except json.JSONDecodeError as json_e:
                     print(f"JSONDecodeError after regex extraction: {json_e}")
                     return {"error": "Could not parse the diagnosis as JSON", "raw_response": diagnosis_text}
             else:
                 print("No JSON block found using regex.")
                 return {"error": "No JSON found in the response", "raw_response": diagnosis_text}

    except Exception as e:
        print(f"Error during Gemini API call: {e}")
        # Extract more details if possible from the exception object
        error_details = str(e)
        # Check for specific Google API error types if needed
        return {"error": f"Failed to generate response from AI model: {error_details}", "raw_response": ""}

print('diagnostic function ready.....')

diagnostic function ready.....


This function implements the RAG-enhanced diagnostic capability:

1. It searches the vector database for relevant Ayurvedic knowledge based on patient symptoms
2. It extracts the content and source of each retrieved document (similarity scores are not included, since the retriever does not return them)
3. It creates a prompt that includes:
    - The retrieved Ayurvedic knowledge as context
    - Few-shot examples to guide the model's response format
    - The patient's symptoms to analyze
4. It generates a response using Gemini 2.0 Flash
5. It extracts and parses the JSON portion of the response
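
Step 2 of the list above, turning retrieved documents into a single context string, can be sketched in isolation. This is an illustration under stated assumptions: `Doc` is a hypothetical stand-in for the retriever's document objects (anything with `page_content` and `metadata`), and `build_context` is not a name used in the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context(docs):
    """Join retrieved documents into one labelled context string for the prompt."""
    if not docs:
        return "No specific documents found in the knowledge base for these symptoms."
    blocks = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "Unknown")
        blocks.append(
            f"--- Relevant Document {i} ---\n"
            f"Source: {source}\n"
            f"Content: {doc.page_content}\n"
            f"--- End Document {i} ---"
        )
    return "\n\n".join(blocks)


# Illustrative documents; the text and file name are made up for the example.
docs = [
    Doc("Pitta governs digestion and heat.", {"source": "sample.txt"}),
    Doc("Vata governs movement."),
]
context = build_context(docs)
print(context)
```

Labelling each block with its source and explicit start and end markers makes it easier for the model to tell retrieved passages apart from the instructions around them.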

## 🎨 Step 7: Creating a Visually Appealing Display Function

To present the structured JSON output from the model in a user-friendly way, this cell defines the `display_diagnosis` function.

*   **Input:** Takes the Python dictionary (parsed from the JSON output) containing the diagnosis.
*   **Functionality:**
    *   Performs input validation to ensure it receives a dictionary.
    *   Checks for and displays any errors returned by the generation function.
    *   Formats the diagnostic details (dominant dosha, imbalances, diagnosis, evidence, treatments) into clean, readable HTML.
    *   Uses distinct sections, colors, icons/emojis (like 🩺, ⚖️, 🔍, 🌱, ✅, ❌, 🍴, 🌿, 💊, 👐, ❤️), and styling (borders, padding, shadows) for better visual organization and appeal.
    *   Separates treatment recommendations into logical categories (Dietary, Herbal Suggestions, Ayurvedic Medicines/Formulations, Therapies, Lifestyle).
    *   Includes the current date and a standard disclaimer about consulting a qualified practitioner.
*   **Output:** Uses `IPython.display.HTML` to render the generated HTML directly within the notebook output area.
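
Because the report interpolates model-generated strings directly into HTML, it can help to escape them first. The sketch below shows one way to render a list of recommendations safely. `render_list` is a hypothetical helper, not part of the notebook, and the escaping step is an addition that the original function does not perform.

```python
from html import escape


def render_list(items, bullet="•"):
    """Render a list (or a single value) as an HTML <ul>, escaping every item."""
    if not isinstance(items, list):
        items = [items]
    rows = "".join(f"<li>{escape(bullet)} {escape(str(item))}</li>" for item in items)
    return f"<ul style='list-style: none;'>{rows}</ul>"


# Illustrative input containing characters that would otherwise be read as markup.
snippet = render_list(["Avoid spicy food", "<b>Cooling</b> herbs & teas"], bullet="🥗")
print(snippet)
```

The same helper accepts a bare string, which mirrors how `display_diagnosis` wraps non-list values in a list before rendering them.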

In [7]:
# Step 7: Creating a Visually Appealing Display Function

import datetime
from IPython.display import display, HTML, Markdown

# Updated Function: display_diagnosis with Emojis and separate Medicines section

def display_diagnosis(diagnosis):
    """
    Displays the Ayurvedic diagnosis results in a visually appealing HTML format
    with integrated emojis, separating Herbs and Ayurvedic Medicines/Formulations.

    Args:
        diagnosis (dict): A dictionary containing the diagnosis results.
    """
    # --- Input Validation ---
    if not isinstance(diagnosis, dict):
        # Render with HTML (not Markdown): Markdown would treat the indented markup as a code block
        display(HTML(f"""
        <div style="border: 2px solid orange; padding: 15px; background-color: #fff8e1; color: #6f4f00;">
            <h2><span style="color:orange;">🤔</span> Input Error</h2>
            <p><strong>Details:</strong> The provided diagnosis input is not a valid dictionary.</p>
            <pre style="white-space: pre-wrap; word-wrap: break-word;">Input Type: {type(diagnosis).__name__}</pre>
        </div>
        """))
        return

    # --- Error Handling ---
    if "error" in diagnosis:
        # Fall back to the error text when no raw response was captured (e.g. an empty string)
        error_message = diagnosis.get('raw_response') or diagnosis['error']
        # Render with HTML (not Markdown): Markdown would treat the indented markup as a code block
        display(HTML(f"""
        <div style="border: 2px solid red; padding: 15px; background-color: #fff0f0; color: #a00;">
            <h2><span style="color:red;">⚠️</span> Error in Diagnosis Generation</h2>
            <p><strong>Details:</strong> {diagnosis['error']}</p>
            <hr>
            <strong>Raw Response (if available):</strong>
            <pre style="white-space: pre-wrap; word-wrap: break-word; max-height: 200px; overflow-y: auto;">{error_message}</pre>
        </div>
        """))
        return

    # --- Date Setup ---
    try:
        current_date = datetime.date.today().strftime("%B %d, %Y")
    except Exception:
        current_date = "N/A"

    # --- HTML Structure Start ---
    html = f"""
    <div style="font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; max-width: 800px; margin: 10px auto; padding: 25px; border: 1px solid #ccc; border-radius: 10px; background-color: #fdfdfd; box-shadow: 0 5px 15px rgba(0,0,0,0.08);">
        <h2 style="color: #3a5a40; border-bottom: 2px solid #588157; padding-bottom: 10px; text-align: center; margin-bottom: 25px;">🩺 Ayurvedic Diagnosis Report 🩺</h2>

        <!-- Primary Assessment Section -->
        <div style="margin-bottom: 25px; background-color: #ffffff; padding: 20px; border-radius: 8px; box-shadow: 0 2px 5px rgba(0,0,0,0.05); border-left: 5px solid #588157;">
            <h3 style="color: #4a6b51; margin-top: 0; margin-bottom: 15px;">⚖️ Primary Assessment</h3>
            <p style="margin-bottom: 8px;"><strong>👤 Dominant Dosha:</strong> <span style="font-size: 1.1em; font-weight: bold; color: #3a5a40;">{diagnosis.get('dominant_dosha', 'N/A')}</span></p>
            <p style="margin-bottom: 15px;"><strong>📋 Provisional Diagnosis:</strong> <span style="font-size: 1.1em;">{diagnosis.get('diagnosis', 'N/A')}</span></p>

            <h4 style="color: #4a6b51; margin-bottom: 10px;">⚠️ Identified Imbalances:</h4>
            <ul style="list-style: none; padding-left: 5px;">
    """ # --- Imbalances ---
    imbalances = diagnosis.get('imbalances', [])
    if imbalances:
        for imbalance in imbalances:

            html += f"<li style='margin-bottom: 6px; padding-left: 25px; position: relative;'><span style='position: absolute; left: 0; color: #e67e22;'>➡</span>{imbalance}</li>"
    else:
        html += "<li style='color: #777; font-style: italic;'>No specific imbalances listed.</li>"
    html += """
            </ul>
        </div>

        <!-- Supporting Evidence Section -->
        <div style="margin-bottom: 25px; background-color: #e8f5e9; padding: 20px; border-radius: 8px; box-shadow: 0 2px 5px rgba(0,0,0,0.05); border-left: 5px solid #a5d6a7;">
            <h3 style="color: #2e7d32; margin-top: 0; margin-bottom: 15px;">🔍 Supporting Evidence</h3>
    """ # --- Evidence ---
    evidence_map = diagnosis.get('supporting_evidence', {})
    displayed_evidence = False
    symptoms_list = []
    # Consolidate all symptom matching keys
    for key in evidence_map.keys():
        if 'symptoms_matching' in key:
            items = evidence_map.get(key, [])
            if isinstance(items, list):
                symptoms_list.extend(items)
            elif items:
                 symptoms_list.append(str(items))

    symptoms_list = sorted(list(set(symptoms_list)))

    if symptoms_list:
         displayed_evidence = True
         # Use f-string for consistency
         html += f"<h4 style='color: #2e7d32; margin-bottom: 10px;'>✅ Symptoms Matching Dosha:</h4>"
         html += "<ul style='list-style-type: disc; padding-left: 25px; margin-bottom: 15px;'>"
         for item in symptoms_list:
             html += f"<li style='margin-bottom: 4px;'>{item}</li>"
         html += "</ul>"

    pulse_indication = evidence_map.get('pulse_indication')
    if pulse_indication:
         displayed_evidence = True
         html += f"<h4 style='color: #2e7d32; margin-bottom: 5px;'>💓 Pulse Indication:</h4>"
         html += f"<p style='margin-left: 10px; margin-bottom: 15px;'>{pulse_indication}</p>" # f-string interpolation works here

    tongue_indication = evidence_map.get('tongue_indication')
    if tongue_indication:
         displayed_evidence = True
         html += f"<h4 style='color: #2e7d32; margin-bottom: 5px;'>👅 Tongue Indication:</h4>"
         html += f"<p style='margin-left: 10px; margin-bottom: 15px;'>{tongue_indication}</p>" # f-string interpolation works here

    processed_keys = [k for k in evidence_map.keys() if 'symptoms_matching' in k] + ['pulse_indication', 'tongue_indication']
    other_keys = sorted([k for k in evidence_map.keys() if k not in processed_keys and evidence_map[k]])

    if other_keys:
        displayed_evidence = True
        html += f"<h4 style='color: #2e7d32; margin-bottom: 10px;'>❓ Other Indicators:</h4>"
        for key in other_keys:
             value = evidence_map[key]
             # Use f-string for consistency
             html += f"<p style='margin-left: 10px; margin-bottom: 10px;'><strong>{key.replace('_', ' ').title()}:</strong> "
             if isinstance(value, list):
                  html += ", ".join(map(str, value))
             else:
                  html += str(value)
             html += "</p>"

    if not displayed_evidence:
         html += "<p style='color: #777; font-style: italic;'>No specific supporting evidence provided.</p>"
    html += """
        </div>

        <!-- Recommended Treatments Section -->
        <div style="margin-bottom: 25px; background-color: #ffffff; padding: 20px; border-radius: 8px; box-shadow: 0 2px 5px rgba(0,0,0,0.05); border-left: 5px solid #81c784;">
            <h3 style="color: #3a5a40; margin-top: 0; margin-bottom: 15px;">🌱 Recommended Treatments</h3>
    """ # --- Treatments ---
    treatments = diagnosis.get('recommended_treatments', {})
    displayed_treatments = False
    treatment_categories = {
        'dietary': {'emoji': '🍴', 'list_style': '🥗', 'title': 'Dietary Recommendations'},
        'herbs': {'emoji': '🌿', 'list_style': '🌿', 'title': 'Herbal Suggestions (Single Herbs)'},
        'ayurvedic_medicines': {'emoji': '💊', 'list_style': '💊', 'title': 'Ayurvedic Medicines/Formulations'},
        'therapies': {'emoji': '👐', 'list_style': '✨', 'title': 'Recommended Therapies'},
        'lifestyle': {'emoji': '❤️', 'list_style': '🚶', 'title': 'Lifestyle Adjustments'}
    }
    known_keys_ordered = ['dietary', 'herbs', 'ayurvedic_medicines', 'therapies', 'lifestyle']

    for key in known_keys_ordered:
        config = treatment_categories.get(key)
        if key in treatments and treatments[key] and config:
            items = treatments[key]
            if not isinstance(items, list):
                items = [items]

            if items:
                displayed_treatments = True
                # Use f-string for consistency
                html += f"<h4 style='color: #4a6b51; margin-bottom: 10px;'>{config['emoji']} {config['title']}:</h4>"
                html += f"<ul style='list-style: none; padding-left: 5px; margin-bottom: 15px;'>"
                list_emoji = config['list_style']
                for rec in items:
                    # Use f-string for consistency
                    html += f"<li style='margin-bottom: 6px; padding-left: 25px; position: relative;'><span style='position: absolute; left: 0; color: #4caf50;'>{list_emoji}</span>{rec}</li>"
                html += "</ul>"

    other_treatment_keys = sorted([k for k in treatments.keys() if k not in treatment_categories and treatments[k]])
    if other_treatment_keys:
        displayed_treatments = True
        html += f"<h4 style='color: #4a6b51; margin-bottom: 10px;'>➕ Other Recommendations:</h4>"
        for key in other_treatment_keys:
             items = treatments[key]
             if not isinstance(items, list):
                 items = [items]

             if items:
                 # Use f-string for consistency
                 html += f"<p style='margin-left: 10px; margin-bottom: 5px;'><strong>{key.replace('_', ' ').title()}:</strong></p>"
                 html += "<ul style='list-style: none; padding-left: 5px; margin-bottom: 15px;'>"
                 for rec in items:
                      # Use f-string for consistency
                      html += f"<li style='margin-bottom: 6px; padding-left: 25px; position: relative;'><span style='position: absolute; left: 0; color: #4caf50;'>✔️</span>{rec}</li>"
                 html += "</ul>"

    if not displayed_treatments:
        html += "<p style='color: #777; font-style: italic;'>No specific treatment recommendations provided.</p>"
    # The footer section is now inside the f-string, so {current_date} will be replaced
    html += f"""
        </div>

        <!-- Footer -->
        <div style="text-align: center; margin-top: 30px; font-size: 0.9em; color: #888; border-top: 1px solid #eee; padding-top: 15px;">
            <p style="margin-bottom: 5px;">🤖 Generated by Ayurvedic Diagnostic Assistant</p>
            <p style="margin-bottom: 5px;">Made with ❤️ in India 🇮🇳</p>
            <p>📅 Date: {current_date}</p>
            <p style="margin-top: 10px; font-size: 0.85em; font-style: italic;">Reminder: This is an AI-generated analysis for informational purposes. Always consult a qualified Ayurvedic practitioner for professional medical advice.</p>
        </div>
    </div>
    """ # --- HTML End ---

    # Display the final HTML
    display(HTML(html))

print('display function updated .....')

display function updated .....


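This function renders the diagnosis as a styled HTML report:

- It uses card-style sections for the primary assessment, supporting evidence, and recommended treatments
- It establishes visual hierarchy through color, spacing, and emoji markers
- It accepts either a list or a single value for evidence and treatment entries
- It falls back to a placeholder message when a section has no data
- It includes the current date and a practitioner-consultation reminder in the footer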


## 🎛️ Step 8: Creating an Interactive User Interface

This cell uses `ipywidgets` to build a simple interactive interface within the notebook, allowing users to easily input symptoms and trigger the analysis.

*   **Components:**
    *   `widgets.Textarea`: A multi-line text box for entering patient symptoms.
    *   `widgets.Button` (Analyze): Triggers the `generate_ayurvedic_diagnosis` function with the input symptoms and displays the results using `display_diagnosis`.
    *   `widgets.Button` (Clear): Clears the symptoms input and the output area.
    *   `widgets.Output`: Areas to display status messages (like "Analyzing...") and the final formatted diagnosis.
    *   `widgets.HTML`: Used for titles and descriptive text.
    *   Layout widgets (`HBox`, `VBox`): Organize the components vertically and horizontally.
*   **Behavior:** Button clicks are linked to Python functions (`on_analyze_button_clicked`, `on_clear_button_clicked`) that handle the application logic (getting input, calling the analysis function, clearing output).
*   **Structure:** The widgets are assembled into a `VBox` container (`ui_container`) for display. (Note: The final display is handled in Step 10).
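
The logic behind the Analyze button can be separated from the widgets, which makes it easy to exercise without a notebook front end. The sketch below is an assumption-laden illustration: `run_analysis` and its callback parameters are hypothetical names, and plain lists stand in for the `Output` widgets.

```python
def run_analysis(symptoms_text, diagnose, show_status, show_result):
    """Validate the input, then call the diagnosis function and show its result.

    Returns True if an analysis was run, False if the input was empty.
    """
    if not symptoms_text or not symptoms_text.strip():
        show_status("Please enter patient symptoms before generating a diagnosis.")
        return False
    show_status("Analyzing symptoms... This may take a moment.")
    show_result(diagnose(symptoms_text.strip()))
    return True


# Simple stand-ins for the status/output areas and the model call.
status_log, results = [], []


def fake_diagnose(text):
    """Hypothetical stub replacing generate_ayurvedic_diagnosis."""
    return {"dominant_dosha": "Vata", "input": text}


ran_empty = run_analysis("   ", fake_diagnose, status_log.append, results.append)
ran_real = run_analysis(" dry skin, anxiety ", fake_diagnose, status_log.append, results.append)
print(ran_empty, ran_real, results)
```

In the notebook, the click handler plays the role of this function, with `status_area` and `output_area` receiving the messages and the rendered report.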

In [8]:
# Create an interactive user interface
from ipywidgets import widgets, Layout

# Create input widgets
symptoms_input = widgets.Textarea(
    value='',
    placeholder='Enter patient symptoms in detail...',
    description='Symptoms:',
    disabled=False,
    layout=Layout(width='100%', height='150px')
)

analyze_button = widgets.Button(
    description='Generate Diagnosis',
    button_style='success',
    tooltip='Click to analyze symptoms',
    icon='stethoscope',
    layout=Layout(width='200px')
)

clear_button = widgets.Button(
    description='Clear',
    button_style='warning',
    tooltip='Clear input and results',
    icon='eraser',
    layout=Layout(width='100px')
)

button_container = widgets.HBox([analyze_button, clear_button], layout=Layout(justify_content='center'))

output_area = widgets.Output()
status_area = widgets.Output()

# Define button click behaviors
def on_analyze_button_clicked(b):
    status_area.clear_output()
    output_area.clear_output()
    
    with status_area:
        if not symptoms_input.value.strip():
            print("Please enter patient symptoms before generating a diagnosis.")
            return
        print("Analyzing symptoms... This may take a moment.")
    
    with output_area:
        diagnosis = generate_ayurvedic_diagnosis(symptoms_input.value)
        status_area.clear_output()
        display_diagnosis(diagnosis)

def on_clear_button_clicked(b):
    symptoms_input.value = ''
    output_area.clear_output()
    status_area.clear_output()

analyze_button.on_click(on_analyze_button_clicked)
clear_button.on_click(on_clear_button_clicked)

# Display the UI with improved styling
header = widgets.HTML(
    value="<h1 style='text-align:center; color:#3a5a40;'>Ayurvedic Diagnostic Assistant</h1>"
           "<p style='text-align:center;'>Enter the patient's symptoms in detail for a comprehensive Ayurvedic analysis</p>"
)

ui_container = widgets.VBox([
    header,
    widgets.HTML(value="<hr>"),
    symptoms_input,
    button_container,
    status_area,
    output_area
], layout=Layout(width='100%', padding='20px'))

#display(ui_container)
print('Go to Step 10 & Run Final UI for the Results.....')


Go to Step 10 & Run Final UI for the Results.....


This section creates an enhanced interactive user interface:

- A text area for entering patient symptoms
- Buttons for generating a diagnosis and clearing the form
- Status and output areas for displaying results
- Improved styling with a clear visual hierarchy


## 🧪 Step 9: Defining Sample Cases for Testing and Validation

To facilitate testing and demonstrate the system's capabilities, this cell defines a list of `sample_cases`. It also creates a function and buttons to run these predefined examples.

*   **Sample Data:** Includes three distinct cases representing Pitta-, Vata-, and Kapha-dominant imbalances (Cases 1, 2, and 3 respectively), each with typical symptoms and the `expected_dosha`.
*   **`display_sample_case` Function:**
    *   Takes a case number as input.
    *   Retrieves the corresponding symptoms and expected dosha from the `sample_cases` list.
    *   Displays the symptoms being analyzed.
    *   Calls the `generate_ayurvedic_diagnosis` function.
    *   Uses `display_diagnosis` to show the formatted results.
    *   **Performs basic validation:** Compares the `dominant_dosha` identified by the model against the `expected_dosha` for the sample case.
    *   Displays a visual indicator (✅ or ❌) and a message indicating whether the dosha identification was correct.
    *   Includes error handling for invalid case numbers or issues during generation/display.
*   **Sample Buttons:** Creates `ipywidgets.Button` for each sample case. Clicking a button triggers the `display_sample_case` function for that specific case via a lambda function.

This step provides a simple mechanism for **GenAI evaluation** by comparing model output against known expected results for specific inputs.
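
The per-case check can also be run as a batch to obtain a single accuracy figure. The following is a minimal sketch under stated assumptions: `evaluate`, `dosha_matches`, and `stub_diagnose` are hypothetical names, the stub replaces the real model call, and prefix matching (so that an answer like "Pitta-Vata" counts for "Pitta") is a design choice that differs from the notebook's exact comparison.

```python
def dosha_matches(expected, actual):
    """Case-insensitive check that the expected dosha leads the model's answer."""
    if not isinstance(actual, str):
        return False
    return actual.strip().lower().startswith(expected.strip().lower())


def evaluate(cases, diagnose):
    """Run every case through `diagnose` and return (accuracy, per-case results)."""
    results = []
    for case in cases:
        diagnosis = diagnose(case["symptoms"])
        actual = diagnosis.get("dominant_dosha") if isinstance(diagnosis, dict) else None
        results.append({
            "expected": case["expected_dosha"],
            "actual": actual,
            "correct": dosha_matches(case["expected_dosha"], actual),
        })
    accuracy = sum(r["correct"] for r in results) / len(results) if results else 0.0
    return accuracy, results


def stub_diagnose(symptoms):
    """Hypothetical stub standing in for the real model call."""
    if "heartburn" in symptoms:
        return {"dominant_dosha": "Pitta-Vata"}
    if "congestion" in symptoms:
        return {"error": "model unavailable"}
    return {"dominant_dosha": "vata "}


cases = [
    {"symptoms": "heartburn and irritability", "expected_dosha": "Pitta"},
    {"symptoms": "dry skin and insomnia", "expected_dosha": "Vata"},
    {"symptoms": "nasal congestion and lethargy", "expected_dosha": "Kapha"},
]
accuracy, results = evaluate(cases, stub_diagnose)
print(f"Accuracy: {accuracy:.2f}")
```

A failed generation (an error dictionary) is counted as incorrect rather than raising, so one bad response does not abort the whole run.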

In [9]:
# Step 9: Example Usage with Sample Cases

# Sample cases: typical symptoms plus the dosha the model is expected to identify
sample_cases = [
    {   # Pitta-dominated case
        "symptoms": "Frequent heartburn after meals, excessive body heat, reddish skin eruptions, irritability, and acidic taste in mouth. I sweat profusely and have intense hunger pangs.",
        "expected_dosha": "Pitta"
    },
    {   # Vata-dominated case
        "symptoms": "Chronic lower back pain, dry skin, irregular digestion, anxiety, and insomnia. Symptoms worsen in cold weather and improve with warmth.",
        "expected_dosha": "Vata"
    },
    {   # Kapha-dominated case
        "symptoms": "Persistent nasal congestion, lethargy after meals, weight gain despite normal diet, and stiff joints in the morning. Thick white tongue coating.",
        "expected_dosha": "Kapha"
    }
]

def display_sample_case(case_number):
    """Display and analyze a sample case from the predefined list"""
    case_index = case_number - 1  # Convert to 0-based index

    # Check if case_index is valid
    if not 0 <= case_index < len(sample_cases):
        with output_area:
             output_area.clear_output()
             display(Markdown(f"<p style='color:red;'>Error: Invalid sample case number {case_number}.</p>"))
        return

    with output_area: # Ensure output goes to the correct area
        output_area.clear_output() # Clear previous output in this area

        # Display sample case header
        display(Markdown(f"## Analyzing Sample Case {case_number}"))

        # Get the sample case details
        case = sample_cases[case_index]
        symptoms = case["symptoms"]
        expected_dosha = case["expected_dosha"]

        # Display symptoms
        display(Markdown(f"**Symptoms:**\n\n```\n{symptoms}\n```")) # Use code block for better formatting

        # Generate and display diagnosis
        display(Markdown("\n**Generating Diagnosis...** *(This might take a moment)*"))
        try:
            # It's good practice to wrap external calls in try-except
            diagnosis = generate_ayurvedic_diagnosis(symptoms)

            # Stop early if generation failed; display_diagnosis renders the error details
            if "error" in diagnosis:
                display(Markdown("\n**Diagnosis Generation Failed:**"))
                display_diagnosis(diagnosis) # Use the display function to show the formatted error
                return # Stop processing this case

            # Display results comparison
            display(Markdown("\n**Diagnosis Results:**"))
            display_diagnosis(diagnosis) # Use the formatted display function

            # Validate expected dosha
            actual_dosha = str(diagnosis.get("dominant_dosha", "Unknown")) # Safe access; coerce non-string values before comparing
            display(Markdown(f"\n---\n**Validation:**\n- Expected Dominant Dosha: `{expected_dosha}`\n- Identified Dominant Dosha: `{actual_dosha}`")) # Use backticks

            # Add visual validation indicator
            if actual_dosha.strip().lower() == expected_dosha.strip().lower(): # Added strip() and lower() for robustness
                display(Markdown("<p style='color:green; font-weight:bold;'>✅ Correct dosha identification</p>"))
            else:
                display(Markdown(f"<p style='color:red; font-weight:bold;'>❌ Dosha mismatch (Expected: {expected_dosha}, Got: {actual_dosha})</p>"))

        except Exception as e:
             # Catch unexpected errors during generation or display
             display(Markdown(f"<p style='color:red;'>An unexpected error occurred: {e}</p>"))


# Create sample case buttons
sample_buttons = []
for i in range(len(sample_cases)):
    btn = widgets.Button(
        description=f'Case {i+1}',
        tooltip=f'Test sample case {i+1} ({sample_cases[i]["expected_dosha"]})', # Add expected dosha to tooltip
        layout=widgets.Layout(width='130px', margin='5px'), # Slightly wider
        button_style='info' # Changed style for visual distinction
    )
    btn.case_number = i+1
    # Use a lambda that captures the current button's case_number
    btn.on_click(lambda b, num=i+1: display_sample_case(num))
    sample_buttons.append(btn)

print('sample cases successfully embedded.....')

sample cases successfully embedded.....


This completes the sample case implementation with:

1. Three comprehensive sample cases covering different dosha imbalances
2. Automatic validation against expected results
3. Visual feedback (green check/red cross) for quick validation
4. Organized button layout with tooltips
5. Clear section separation in the notebook

The sample cases help users:

- Test the system without manual input
- Verify the model's dosha identification accuracy
- Understand different types of Ayurvedic diagnoses
- Exercise the full pipeline (retrieval, prompting, and JSON parsing) end to end

The final step integrates these pieces into the complete user interface.


## 📜 Step 10: Final User Interface Integration and Display

This final code cell assembles all the previously defined UI components (`ipywidgets`) into a cohesive and polished interface for the Ayurvedic Diagnostic Assistant.

*   **Structure:**
    *   Uses `widgets.Tab` to create a tabbed interface, separating the "Symptom Entry" area (input text area, analyze/clear buttons) from the "Diagnosis Review" area (where results and status messages appear).
    *   Includes informative HTML headers, titles, and separators for clarity.
    *   Integrates the sample case buttons (created in Step 9) below the main tabbed interface, allowing easy access for testing.
    *   Adds a final disclaimer note.
*   **Layout:** Employs `VBox`, `HBox`, and `Layout` parameters to control the arrangement, spacing, padding, borders, and overall appearance of the UI elements, aiming for a clean and professional look.
*   **Display:** The `display(final_ui)` command renders the complete interactive interface within the notebook.

Users can now interact with the system: enter symptoms in the first tab, click "Generate Diagnosis", review the formatted results in the second tab, or test the system using the sample case buttons below.

In [10]:
# Step 10: Final User Interface Integration

# Create main UI panels (reusing widgets defined earlier)
input_panel = widgets.VBox([
    widgets.HTML("<h2 style='color:#3a5a40;'>📝 Patient Symptoms Input</h2>"),
    symptoms_input,
    widgets.HBox([analyze_button, clear_button], layout=widgets.Layout(justify_content='center', margin='15px 0 0 0')) # Added margin
], layout=widgets.Layout(width='90%', margin='15px auto')) # Adjusted width/margin

output_panel = widgets.VBox([
    widgets.HTML("<h2 style='color:#3a5a40;'>📜 Diagnosis Results</h2>"),
    status_area, # Status messages appear above the results
    output_area
], layout=widgets.Layout(width='95%', margin='15px auto')) # Adjusted width/margin

# Create tabbed interface
tab_layout = widgets.Layout(padding='15px') # Slightly reduced padding
tabs = widgets.Tab(children=[input_panel, output_panel], layout=tab_layout)
tabs.set_title(0, '⌨️ Symptom Entry')
tabs.set_title(1, '🧐 Diagnosis Review')

# Assemble complete interface
final_ui = widgets.VBox([
    widgets.HTML("<h1 style='text-align:center; color:#3a5a40;'>🕉️ Ayurvedic Diagnostic Assistant ✨</h1>"),
    widgets.HTML("<div style='text-align:center; margin-bottom:20px; font-size: 1.1em;'>"
                 "🌿 Combining Ancient Wisdom with Modern AI 🤖</div>"),
    tabs,
    widgets.HTML("<hr style='margin: 25px auto; width: 80%; border-top: 1px dashed #ccc;'>"), # Added separator
    widgets.HTML("<h3 style='margin-top:10px; text-align:center;'>🧪 Test System with Sample Cases:</h3>"),
    widgets.HBox(sample_buttons, layout=widgets.Layout(justify_content='center', flex_flow='wrap', margin='10px 0')),
    widgets.HTML("<div style='margin-top:25px; color:#666; font-size:0.9em; text-align:center; border-top: 1px solid #eee; padding-top: 15px;'>" # Adjusted styling
                 "ℹ️ <strong>Note:</strong> This is a demonstration system. Always consult a qualified Ayurvedic practitioner for medical advice.</div>")
], layout=widgets.Layout(width='95%', margin='20px auto', border='1px solid #ccc', padding='20px', border_radius='10px'))

print("✨ Final UI Ready! ✨")
display(final_ui)

✨ Final UI Ready! ✨



This completes the implementation of the Ayurvedic Diagnostic Assistant.

**Summary of Features:**

1.  **Core Engine:** Uses Google's Gemini 2.0 Flash model.
2.  **Knowledge Base:** Leverages Ayurvedic texts via a Kaggle Dataset.
3.  **RAG Implementation:** Employs semantic search using SentenceTransformer embeddings and a FAISS vector database to retrieve relevant context.
4.  **Structured Output:** Generates diagnoses in a consistent JSON format, guided by few-shot examples and API parameters.
5.  **User Interface:** Provides an interactive `ipywidgets`-based UI with input areas, buttons, tabs, and formatted HTML output.
6.  **Validation:** Includes sample cases with expected outcomes for basic testing and demonstration.

**How to Use:**

1.  Navigate to the "⌨️ Symptom Entry" tab.
2.  Enter patient symptoms in natural language in the text area.
3.  Click the "Generate Diagnosis" button.
4.  Switch to the "🧐 Diagnosis Review" tab to see the structured diagnosis report.
5.  Alternatively, click the "Case 1", "Case 2", or "Case 3" buttons below the tabs to run predefined examples and see their results and validation.
6.  Use the "Clear" button to reset the symptom input and results area.

This notebook effectively demonstrates the integration of several key GenAI capabilities (RAG, Structured Output, Few-shot Prompting, Embeddings, Vector Search) to create a domain-specific assistant. Remember that this is a demonstration tool and should not replace professional medical advice from a qualified Ayurvedic practitioner.

## ✍️ Authors

This project is developed by:

*   **Dr. Debabrata Mondal**
    *   Kaggle: [@drdebabratamondal](https://www.kaggle.com/drdebabratamondal)
    *   LinkedIn: [Dr. Debabrata Mondal](https://www.linkedin.com/in/drdebabratamondal/)
*   **Sarbojeet Bhowmick**
    *   Kaggle: [@sarbojeetbhowmick](https://www.kaggle.com/sarbojeetbhowmick)
    *   LinkedIn: [Sarbojeet Bhowmick](https://www.linkedin.com/in/sarbojeet-bhowmick-5a8a7287/)
*   **Yadnyesh Dashpute**
    *   Kaggle: [@yadnyeshdashpute](https://www.kaggle.com/yadnyeshdashpute)
    *   LinkedIn: [Yadnyesh Dashpute](https://www.linkedin.com/in/yadnyesh-dashpute-1b5829250/)
*   **Aman Kumar Batra**
    *   Kaggle: [@amankumarbatra3](https://www.kaggle.com/amankumarbatra3)
    *   LinkedIn: [Aman Kumar Batra](https://www.linkedin.com/in/aman-batra/)

---

## 📖🎬 Further Exploration

Dive deeper into the project and its implementation through these resources:

*   📝 **Read the Blog Post:** [Click here!!!](https://drdebabratamondal.com/ayurveda-and-ai/)
*   📝 **Read the Blog Post on Medium:** [Click here!!!](https://drdebabratamondal.medium.com/bridging-ancient-wisdom-and-modern-ai-our-journey-building-an-ayurvedic-diagnostic-assistant-with-f6ab8b76a017)
*   ▶️ **Watch the Video Walkthrough:** [Click here!!!](https://youtu.be/tT2-s4OluoY)

---