# Combined RAG System: Intent Parser + Medical RAG + Nutrition RAG

A unified nutrition recommendation system that combines medical knowledge with food database search.

## System Flow

```
┌────────────────────┐
│    USER QUERY      │  "I have diabetes, want chicken and eggs, allergic to tomatoes"
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 1. INTENT PARSER   │  → Extracts: conditions, ingredients, allergies, style
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 2. MEDICAL RAG     │  → Returns: nutrition constraints from PDF knowledge base
│    (PDFs)          │     {sugar_g: max 10, sodium_mg: max 1500, fiber_g: min 25}
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 3. AUGMENTED QUERY │  → Combines: user request + medical constraints + allergies
│    BUILDER         │     into a rich prompt for the LLM
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 4. NUTRITION RAG   │  → LLM generates recommendations using augmented query
│    (Food Database) │     with full constraint awareness
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 5. SAFETY CHECK    │  → Filters out: allergens, foods exceeding limits,
│                    │     foods on "avoid" list
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 6. FINAL RESULT    │  → LLM recommendation + verified safe ingredients
└────────────────────┘
```

## Components

| Component | Input | Output |
|-----------|-------|--------|
| Intent Parser | Natural language query | Structured intent (conditions, allergies, ingredients) |
| Medical RAG | Medical conditions | Nutrition constraints + foods to avoid |
| Query Builder | Intent + Constraints | Augmented prompt with all context |
| Nutrition RAG | Augmented query | Food recommendations with explanations |
| Safety Check | Candidate foods | Filtered safe foods only |

## 1. Setup and Imports

In [44]:
# Core imports
import json
import re
import os
from pathlib import Path
from typing import List, Dict, Any, Optional
from dataclasses import dataclass, field

# LangChain imports
from langchain_community.document_loaders import PyPDFLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.vectorstores.faiss import DistanceStrategy
from langchain.schema import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_core.output_parsers import JsonOutputParser

# Ollama LLM
from langchain_ollama import OllamaLLM

# Data processing
import pandas as pd
import numpy as np

print("All imports successful!")

All imports successful!


In [45]:
# Configuration paths
PDF_FOLDER = "../data/pdfs/"
NUTRITION_DATA_PATH = "../data/raw/nutrition.xlsx"
MEDICAL_VECTORSTORE_PATH = "../data/processed/medical_pdfs_vectorstore"
NUTRITION_VECTORSTORE_PATH = "../data/processed/nutrition_vectorstore"

# LLM Configuration
LLM_MODEL = "llama3.2"

print(f"PDF folder: {PDF_FOLDER}")
print(f"Nutrition data: {NUTRITION_DATA_PATH}")

PDF folder: ../data/pdfs/
Nutrition data: ../data/raw/nutrition.xlsx


---
## 2. Intent Parser Component

Parses the user query and extracts:
- `medical_conditions`: diabetes, hypertension, parkinsons, etc.
- `dietary_restrictions`: keto, vegan, low_sodium, etc.
- `allergies`: tomatoes, shellfish, gluten, etc.
- `ingredients`: chicken, eggs, salad, etc.
- `cooking_style`: quick, slow_cooker, under_30_min, etc.
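
Even in JSON mode, an LLM may return a bare string, `null`, or inconsistent casing for one of these fields instead of a clean list. The sketch below shows one way such values could be normalized before they reach `UserIntent`; the helper name `as_str_list` is an assumption for illustration and is not part of the notebook.

```python
from typing import Any, List


def as_str_list(value: Any) -> List[str]:
    """Coerce an LLM-produced field into a de-duplicated list of snake_case strings."""
    if value is None:
        return []
    if isinstance(value, str):
        value = [value]  # a single string becomes a one-element list
    if not isinstance(value, (list, tuple)):
        return []  # numbers, dicts, etc. are treated as "not mentioned"
    cleaned: List[str] = []
    for item in value:
        text = str(item).strip().lower().replace(" ", "_")
        if text and text not in cleaned:
            cleaned.append(text)
    return cleaned


print(as_str_list("Low Sodium"))
print(as_str_list(["eggs", "Eggs ", ""]))
```

A helper like this could wrap each `result.get(...)` call in `IntentParser.parse`, so downstream components can rely on always receiving a list.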

In [None]:
@dataclass
class UserIntent:
    """Structured representation of parsed user intent."""
    medical_conditions: List[str] = field(default_factory=list)
    dietary_restrictions: List[str] = field(default_factory=list)
    allergies: List[str] = field(default_factory=list)
    ingredients: List[str] = field(default_factory=list)
    cooking_style: List[str] = field(default_factory=list)
    
    def __repr__(self):
        return (
            f"UserIntent:\n"
            f"  medical_conditions: {self.medical_conditions}\n"
            f"  dietary_restrictions: {self.dietary_restrictions}\n"
            f"  allergies: {self.allergies}\n"
            f"  ingredients: {self.ingredients}\n"
            f"  cooking_style: {self.cooking_style}"
        )


class IntentParser:
    """Parses user queries to extract structured intent using LLM."""
    
    def __init__(self, model_name: str = "llama3.2"):
        self.llm = OllamaLLM(
            model=model_name,
            temperature=0,
            format="json"
        )
        self.parser = JsonOutputParser()
        self.chain = self._build_chain()
    
    def _build_chain(self):
        system_instructions = """
        You are a medical nutrition data extractor. Analyze the user query and extract:
        
        - 'medical': Clinical conditions (e.g., 'diabetes_type2', 'hypertension', 'parkinsons').
        - 'restrictions': Diet types (e.g., 'keto', 'vegan', 'low_sodium', 'low_sugar').
        - 'allergies': Food allergens or intolerances (e.g., 'tomatoes', 'shellfish', 'gluten', 'onions').
        - 'ingredients': Available/desired foods (e.g., 'chicken', 'eggs', 'salad').
        - 'style': Cooking preferences (e.g., 'quick', 'under_30_min', 'slow_cooker', 'breakfast').

        Rules:
        - Return empty list [] if category not mentioned
        - Use snake_case for multi-word values
        - Normalize conditions: 'high blood pressure' -> 'hypertension', 'sugar problem' -> 'diabetes'
        """
        
        prompt = ChatPromptTemplate.from_messages([
            ("system", system_instructions),
            ("user", "Analyze this query: {query}")
        ])
        
        return prompt | self.llm | self.parser
    
    def parse(self, query: str) -> UserIntent:
        """Parse user query and return structured intent."""
        try:
            result = self.chain.invoke({"query": query})
            return UserIntent(
                medical_conditions=result.get("medical", []),
                dietary_restrictions=result.get("restrictions", []),
                allergies=result.get("allergies", []),
                ingredients=result.get("ingredients", []),
                cooking_style=result.get("style", [])
            )
        except Exception as e:
            print(f"Error parsing intent: {e}")
            return UserIntent()


# Initialize Intent Parser
intent_parser = IntentParser(model_name=LLM_MODEL)
print("Intent Parser initialized!")

Intent Parser initialized!


In [47]:
# Test the Intent Parser
test_query = "I have parkinson. I want to make something with chicken, eggs and salad. Quick breakfast. I can't eat tomatoes or onions."

print(f"Query: {test_query}\n")
intent = intent_parser.parse(test_query)
print(intent)

Query: I have parkinson. I want to make something with chicken, eggs and salad. Quick breakfast. I can't eat tomatoes or onions.

UserIntent:
  medical_conditions: ['parkinsons']
  dietary_restrictions: ['keto', 'vegan', 'low_sodium', 'low_sugar']
  allergies: ['tomatoes', 'onions']
  ingredients: ['chicken', 'eggs', 'salad']
  cooking_style: ['quick', 'breakfast']


---
## 3. Medical RAG Component

Extracts nutrition constraints from medical PDFs based on conditions.

**Input:** Medical conditions (diabetes, hypertension, parkinsons)

**Output:** Nutrition parameters:
```json
{
    "sugar_g": {"max": 10},
    "sodium_mg": {"max": 1500},
    "fiber_g": {"min": 5},
    "increase": ["fiber", "vegetables"],
    "avoid": ["processed foods"]
}
```
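
A constraint dictionary in this shape can drive the later safety check directly. The sketch below is an assumption about how that check might look, not the notebook's implementation: `max_violations` is a hypothetical helper, and it deliberately tests only `max` bounds, since the flow diagram describes filtering "foods exceeding limits" while `min` values read more like daily targets than per-food limits.

```python
from typing import Any, Dict, List


def max_violations(food: Dict[str, float], constraints: Dict[str, Any]) -> List[str]:
    """List every nutrient whose value in `food` exceeds its `max` bound."""
    problems: List[str] = []
    for nutrient, bounds in constraints.items():
        # Skip list-valued keys such as "increase" and "avoid", and nutrients the food lacks
        if not isinstance(bounds, dict) or nutrient not in food:
            continue
        upper = bounds.get("max")
        if upper is not None and food[nutrient] > upper:
            problems.append(f"{nutrient} {food[nutrient]} exceeds max {upper}")
    return problems


constraints = {
    "sugar_g": {"max": 10},
    "sodium_mg": {"max": 1500},
    "fiber_g": {"min": 5},
    "increase": ["fiber", "vegetables"],
    "avoid": ["processed foods"],
}
sweet_snack = {"sugar_g": 12.0, "sodium_mg": 300.0, "fiber_g": 2.0}
print(max_violations(sweet_snack, constraints))
```

The keys match the numeric metadata names used for food documents later in the notebook (`sugar_g`, `sodium_mg`, and so on), so the same dictionary can be compared against `doc.metadata`.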

In [None]:
# class MedicalRAG:
#     """RAG system for extracting nutrition constraints from medical documents."""
    
#     def __init__(self, pdf_folder: str, vectorstore_path: str = None):
#         self.pdf_folder = pdf_folder
#         self.vectorstore_path = vectorstore_path
#         self.embeddings = None
#         self.vectorstore = None
#         self.retriever = None
#         self.rag_chain = None
#         self.llm = None
        
#     def initialize(self, load_from_disk: bool = True):
#         """Initialize the Medical RAG system."""
#         print("Initializing Medical RAG...")
        
#         # Initialize embeddings
#         self.embeddings = HuggingFaceEmbeddings(
#             model_name="BAAI/bge-small-en-v1.5",
#             model_kwargs={'device': 'cpu'},
#             encode_kwargs={'normalize_embeddings': True}
#         )
        
#         # Try to load from disk first
#         if load_from_disk and self.vectorstore_path and Path(self.vectorstore_path).exists():
#             print(f"Loading vectorstore from {self.vectorstore_path}")
#             self.vectorstore = FAISS.load_local(
#                 self.vectorstore_path, 
#                 self.embeddings, 
#                 allow_dangerous_deserialization=True
#             )
#             chunks = self._load_and_chunk_pdfs()  # For BM25
#         else:
#             # Load and process PDFs
#             chunks = self._load_and_chunk_pdfs()
#             self.vectorstore = FAISS.from_documents(
#                 documents=chunks,
#                 embedding=self.embeddings,
#                 distance_strategy=DistanceStrategy.COSINE
#             )
        
#         # Create hybrid retriever
#         bm25_retriever = BM25Retriever.from_documents(chunks)
#         bm25_retriever.k = 5
        
#         vector_retriever = self.vectorstore.as_retriever(
#             search_type="mmr",
#             search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7}
#         )
        
#         self.retriever = EnsembleRetriever(
#             retrievers=[bm25_retriever, vector_retriever],
#             weights=[0.4, 0.6]
#         )
        
#         # Initialize LLM with JSON format mode for reliable parsing
#         self.llm = OllamaLLM(model=LLM_MODEL, temperature=0.3, format="json")
#         self._build_chain()
        
#         print(f"Medical RAG initialized with {self.vectorstore.index.ntotal} vectors")
    
#     def _load_and_chunk_pdfs(self) -> List[Document]:
#         """Load PDFs and split into chunks."""
#         pdf_files = list(Path(self.pdf_folder).glob("*.pdf"))
#         print(f"Found {len(pdf_files)} PDF files")
        
#         all_documents = []
#         for pdf_path in pdf_files:
#             try:
#                 loader = PyPDFLoader(str(pdf_path))
#                 docs = loader.load()
#                 for doc in docs:
#                     doc.metadata["source_file"] = pdf_path.name
#                 all_documents.extend(docs)
#             except Exception as e:
#                 print(f"Error loading {pdf_path.name}: {e}")
        
#         # Chunk documents
#         text_splitter = RecursiveCharacterTextSplitter(
#             chunk_size=1000,
#             chunk_overlap=200,
#             separators=["\n\n", "\n", ". ", " ", ""]
#         )
#         chunks = text_splitter.split_documents(all_documents)
#         print(f"Created {len(chunks)} chunks")
#         return chunks
    
#     def _build_chain(self):
#         """Build the RAG chain for extracting nutrition parameters."""
#         system_prompt = """You are a medical nutrition specialist. Extract nutrition parameters for the given medical condition(s).

# OUTPUT FORMAT - Return ONLY valid JSON:
# {{
#     "conditions": ["list of conditions"],
#     "constraints": {{
#         "sugar_g": {{"max": number_or_null}},
#         "sodium_mg": {{"max": number_or_null}},
#         "fiber_g": {{"min": number_or_null}},
#         "protein_g": {{"max": number_or_null, "min": number_or_null}},
#         "calories": {{"max": number_or_null}},
#         "saturated_fat_g": {{"max": number_or_null}},
#         "cholesterol_mg": {{"max": number_or_null}}
#     }},
#     "increase": ["foods/nutrients to eat more"],
#     "limit": ["foods/nutrients to reduce"],
#     "avoid": ["foods to completely avoid"],
#     "notes": ["important dietary notes"]
# }}

# RULES:
# 1. Extract from context only - use null if not specified
# 2. For diabetes: focus on sugar, carbs
# 3. For hypertension: focus on sodium, potassium
# 4. For Parkinson's: consider protein timing with medication
# 5. Return ONLY JSON, no other text

# CONTEXT:
# {context}"""
        
#         prompt = ChatPromptTemplate.from_messages([
#             ("system", system_prompt),
#             ("human", "{input}")
#         ])
        
#         question_answer_chain = create_stuff_documents_chain(self.llm, prompt)
#         self.rag_chain = create_retrieval_chain(self.retriever, question_answer_chain)
    
#     def _extract_json(self, text: str) -> dict:
#         """Extract JSON from text, handling various formats."""
#         cleaned = text.strip()
        
#         # Remove markdown code blocks
#         if cleaned.startswith("```json"):
#             cleaned = cleaned[7:]
#         if cleaned.startswith("```"):
#             cleaned = cleaned[3:]
#         if cleaned.endswith("```"):
#             cleaned = cleaned[:-3]
#         cleaned = cleaned.strip()
        
#         # Try direct parsing first
#         try:
#             return json.loads(cleaned)
#         except json.JSONDecodeError:
#             pass
        
#         # Try to find JSON object using regex
#         json_match = re.search(r'\{[\s\S]*\}', cleaned)
#         if json_match:
#             try:
#                 return json.loads(json_match.group())
#             except json.JSONDecodeError:
#                 pass
        
#         # If all else fails, raise error
#         raise ValueError(f"Could not extract valid JSON from response")
    
#     def get_constraints(self, conditions: List[str]) -> Dict[str, Any]:
#         """Get nutrition constraints for given medical conditions."""
#         if not conditions:
#             return self._default_constraints()
        
#         query = f"""What are the dietary restrictions and nutrition guidelines for patients with: {', '.join(conditions)}?
#         Extract specific limits for sugar, sodium, fiber, protein, calories, saturated fat.
#         List foods to increase, limit, and avoid."""
        
#         response = {}  # defined up front so the except block can always read it
#         try:
#             response = self.rag_chain.invoke({"input": query})
#             raw_answer = response["answer"]
#             return self._extract_json(raw_answer)
#         except Exception as e:
#             print(f"Error getting constraints: {e}")
#             print(f"Raw response: {str(response.get('answer', 'N/A'))[:500]}...")
#             return self._default_constraints()
    
#     def _default_constraints(self) -> Dict[str, Any]:
#         """Return default healthy eating constraints."""
#         return {
#             "conditions": [],
#             "constraints": {
#                 "sugar_g": {"max": 25},
#                 "sodium_mg": {"max": 2300},
#                 "fiber_g": {"min": 5},
#                 "saturated_fat_g": {"max": 20}
#             },
#             "increase": ["vegetables", "fiber", "whole grains"],
#             "limit": ["processed foods", "added sugars"],
#             "avoid": [],
#             "notes": ["General healthy eating guidelines"]
#         }


# # Initialize Medical RAG
# medical_rag = MedicalRAG(
#     pdf_folder=PDF_FOLDER,
#     vectorstore_path=MEDICAL_VECTORSTORE_PATH
# )
# medical_rag.initialize(load_from_disk=True)
# print("\nMedical RAG ready!")

Initializing Medical RAG...
Loading vectorstore from ../data/processed/medical_pdfs_vectorstore
Found 5 PDF files
Created 356 chunks
Medical RAG initialized with 356 vectors

Medical RAG ready!


In [49]:
# Test Medical RAG
test_conditions = ["parkinson", "hypertension"]
print(f"Getting constraints for: {test_conditions}\n")

constraints = medical_rag.get_constraints(test_conditions)
print(json.dumps(constraints, indent=2))

Getting constraints for: ['parkinson', 'hypertension']

{
  "conditions": [
    "Parkinson's disease",
    "Hypertension"
  ],
  "constraints": {
    "sugar_g": {
      "max": 2.3
    },
    "sodium_mg": {
      "max": 2.3
    },
    "fiber_g": {
      "min": 5
    },
    "protein_g": {
      "max": null,
      "min": null
    },
    "calories": {
      "max": null
    },
    "saturated_fat_g": {
      "max": null
    },
    "cholesterol_mg": {
      "max": null
    }
  },
  "increase": [
    "carbohydrates",
    "lean proteins",
    "healthy fats",
    "fruits",
    "vegetables"
  ],
  "limit": [
    "fatty and sugary foods",
    "fluids high in fat, sugar, and salt"
  ],
  "avoid": [
    "biscuits",
    "cakes",
    "crisps",
    "confectionary",
    "regular fizzy drinks",
    "trans-fats"
  ],
  "notes": [
    "Maintain a balanced diet rich in fruits, vegetables, whole grains, and lean protein; minimize saturated fat, added sugars, and sodium"
  ]
}
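
In the output above, `sugar_g` and `sodium_mg` both report a max of 2.3. A sodium ceiling of 2.3 mg is far below the 2300 mg used in `_default_constraints`, so the model may have copied a figure that the source states in grams. The sketch below shows one way to flag such values for manual review; `suspicious_limits` is a hypothetical helper, and the ranges in `PLAUSIBLE_RANGES` are illustrative assumptions, not clinical guidance.

```python
from typing import Any, Dict, List, Tuple

# Assumed plausibility windows for daily limits; placeholders, not medical advice
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "sodium_mg": (500.0, 5000.0),
    "sugar_g": (5.0, 100.0),
    "fiber_g": (5.0, 80.0),
}


def suspicious_limits(constraints: Dict[str, Any]) -> List[str]:
    """Return 'nutrient.bound=value' for every bound outside its assumed window."""
    flagged: List[str] = []
    for nutrient, bounds in constraints.items():
        window = PLAUSIBLE_RANGES.get(nutrient)
        if window is None or not isinstance(bounds, dict):
            continue
        for kind in ("min", "max"):
            value = bounds.get(kind)
            if value is not None and not (window[0] <= value <= window[1]):
                flagged.append(f"{nutrient}.{kind}={value}")
    return flagged


extracted = {
    "sugar_g": {"max": 2.3},
    "sodium_mg": {"max": 2.3},
    "fiber_g": {"min": 5},
    "protein_g": {"max": None, "min": None},
}
print(suspicious_limits(extracted))
```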


## Nil's Medical RAG

In [None]:
class MedicalRAG:
    """RAG system for extracting medical constraints from medical documents."""
    
    def __init__(self, llm, folder_paths: List[str], vectorstore_path: str, chunk_size: int = 300, chunk_overlap: int = 50):
        # Use the LLM passed in; fall back to a default Ollama model when None is given.
        # JSON mode is not enabled here because the prompt below asks for clear-text output.
        self.llm = llm if llm is not None else OllamaLLM(model=LLM_MODEL, temperature=0.3)
        self.folder_paths = folder_paths
        self.vectorstore_path = vectorstore_path
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        
        # Initialize the embedding model (all-mpnet-base-v2 is a general-purpose sentence-embedding model)
        self.embeddings = HuggingFaceEmbeddings(
            model_name='sentence-transformers/all-mpnet-base-v2',
            encode_kwargs={"normalize_embeddings": True}
        )
        
        self.vectorstore = None
        self.retriever = None
        self.rag_chain = None
        
    def initialize(self, force_rebuild: bool = False):
        """Builds or loads the knowledge base and prepares the chain."""
        print("Initializing Medical RAG System...")
        
        # 1. Handle Vector Store (Load or Create)
        if not force_rebuild and Path(self.vectorstore_path).exists():
            print(f"Loading existing index from {self.vectorstore_path}")
            self.vectorstore = FAISS.load_local(
                self.vectorstore_path, 
                self.embeddings, 
                allow_dangerous_deserialization=True
            )
        else:
            print("No index found or force_rebuild=True. Processing documents...")
            docs = self._ingest_documents()
            chunks = self._split_data(docs)
            self.vectorstore = FAISS.from_documents(
                documents=chunks,
                embedding=self.embeddings,
                distance_strategy=DistanceStrategy.COSINE
            )
            self.vectorstore.save_local(self.vectorstore_path)

        # 2. Setup Retriever
        self.retriever = self.vectorstore.as_retriever()

        # 3. Build the LLM Chain
        self._build_chain()
        print("System ready for queries.")

    def _ingest_documents(self) -> List:
        """Loads PDFs from multiple directory paths."""
        documents = []

        for folder_path in self.folder_paths:
            pdf_folder = Path(folder_path)

            # Find all PDF files in the folder
            for pdf_file in pdf_folder.glob("*.pdf"):
                print(f"Loading {pdf_file.name}...")
                loader = PyPDFLoader(file_path=str(pdf_file))
                documents.extend(loader.load())

        return documents

    def _split_data(self, documents: List) -> List:
        """Chunks documents and adds metadata IDs."""
        text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=self.chunk_size,
                chunk_overlap=self.chunk_overlap
            )
        chunks = text_splitter.split_documents(documents=documents)
        
        # Assign each chunk an ID so it can be mapped back to its source later
        for i, chunk in enumerate(chunks):
            chunk.metadata["id"] = f"chunk_{i}"
        
        return chunks
    
    def _build_chain(self):
        """Build the RAG chain for extracting nutrition parameters."""
        system_prompt = """
You are NutriRetrieve, a data extraction and structuring engine for nutritional and medical information from scientific sources.

**Your Task:**
Extract and structure medical/nutritional data from the provided context to support downstream LLM processing.

**Context:**
{context}

**User Query:**
{input}

---

### Extraction Guidelines

1. **Extract Medical & Dietary Data**
   - Medical conditions → dietary goals, contraindications, recommended ingredients
   - Allergies/intolerances → safe substitutes and warnings
   - Source citations for each recommendation

2. **Output Format (Clear Text)**

**Medical Recommendations:**
For each condition identified, provide:
- Condition name
- Dietary goals (what to eat and why)
- Contraindications (what to avoid and why)
- Recommended ingredients
- Source citations

**Allergies & Intolerances:**
For each allergen, list:
- Allergen name
- Safe substitutes
- Warnings and precautions

**Nutritional Targets:**
- Macronutrients: target grams for carbs, protein, fat
- Micronutrients: specific vitamins and minerals with target values
- Hydration: recommended daily water intake in ml

**Recommended Ingredients:**
For each ingredient, provide:
- Ingredient name
- Health benefits
- Preparation methods
- Recommended portion size

3. **Validation Rules**
   - Only include data supported by provided context
   - If information is missing, note it clearly (e.g., "Not specified in provided sources")
   - Use clear section headers and bullet points for readability
   - Include citations and sources for all recommendations
"""
        
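        # The prompt is a single template string, so use from_template;
        # from_messages expects a list of messages rather than a bare string.
        prompt = ChatPromptTemplate.from_template(system_prompt)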
        
        question_answer_chain = create_stuff_documents_chain(self.llm, prompt)
        self.rag_chain = create_retrieval_chain(self.retriever, question_answer_chain)
    
    def ask(self, query: str) -> str:
        """Public method to query the RAG system."""
        if not self.rag_chain:
            return "System not initialized."
        response = self.rag_chain.invoke({"input": query})
        return response["answer"]


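# Initialize Medical RAG
medical_rag = MedicalRAG(
    llm=None,  # None: let the class construct its own OllamaLLM from LLM_MODEL
    folder_paths=[PDF_FOLDER],
    vectorstore_path=MEDICAL_VECTORSTORE_PATH
)
# Pass force_rebuild=True if the saved index was built with a different embedding model
medical_rag.initialize()
print("\nMedical RAG ready!")
medical_rag.ask(query="hello")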

NameError: name 'List' is not defined

In [None]:
# Python method-naming conventions (illustrated by MyClass below):
#
# def func_name():        public method; part of the class's API, called as my_class.func_name()
#
# def _func_name():       single leading underscore; marks the method as internal by convention only,
#                         Python still allows my_class._func_name() from outside the class
#
# def __func_name():      double leading underscore; the name is mangled to _MyClass__func_name,
#                         so my_class.__func_name() raises AttributeError outside the class
#
# def __magic_method__(): double underscores on both sides; this pattern is reserved for special
#                         methods that Python itself invokes, such as __init__ or __str__

class MyClass:
    def func_name(self):
        print("This is a public method.")
    
    def _func_name(self):
        print("This is a protected method, intended for internal use.")
    
    def __func_name(self):
        print("This is a private method, not intended to be accessed outside the class.")
    
    def __magic_method__(self):
        print("This is a magic method, used for special behavior in Python.")



NameError: name '__str__' is not defined
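
The conventions in the cell above can be checked directly. This is a small self-contained sketch; the `Demo` class and its method names are made up for illustration and do not appear elsewhere in the notebook.

```python
class Demo:
    def public(self) -> str:
        return "public"

    def _internal(self) -> str:
        # Single underscore: a convention only, nothing is enforced
        return "internal"

    def __private(self) -> str:
        # Double underscore: stored on the class as _Demo__private
        return "private"

    def call_private(self) -> str:
        # Inside the class body the unmangled spelling works
        return self.__private()

    def __str__(self) -> str:
        # A real special method: Python calls it for str(obj) and print(obj)
        return "Demo instance"


d = Demo()
print(d._internal())               # callable from outside despite the underscore
print(d.call_private())            # reaches __private from inside the class
print(hasattr(d, "__private"))     # the unmangled attribute name does not exist
print(d._Demo__private())          # the mangled name is still reachable
print(str(d))                      # dispatches to __str__
```

Name mangling exists mainly to avoid accidental clashes with subclass attributes; it is not an access-control mechanism.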

---
## 4. Nutrition RAG Component

Searches the nutrition database for ingredients matching the query.

**Input:** Enhanced query with ingredients + constraint keywords

**Output:** List of candidate ingredients with full nutrition data
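
The "enhanced query" can be pictured as the user's ingredients combined with the numeric limits and exclusions from the earlier stages. The sketch below is one possible way to assemble it; `build_augmented_query` and its output layout are assumptions for illustration, not the notebook's query builder.

```python
from typing import Any, Dict, List


def build_augmented_query(
    ingredients: List[str],
    allergies: List[str],
    constraints: Dict[str, Any],
) -> str:
    """Combine ingredients, numeric limits, and exclusions into one query string."""
    parts: List[str] = []
    if ingredients:
        parts.append("Ingredients: " + ", ".join(ingredients))

    # Keep only bounds that carry a value; null bounds are dropped
    limits: List[str] = []
    for nutrient, bounds in constraints.get("constraints", {}).items():
        for kind in ("min", "max"):
            value = bounds.get(kind)
            if value is not None:
                limits.append(f"{nutrient} {kind} {value}")
    if limits:
        parts.append("Limits: " + "; ".join(limits))

    # Medical "avoid" items and user allergies are both exclusions
    excluded = list(constraints.get("avoid", [])) + list(allergies)
    if excluded:
        parts.append("Exclude: " + ", ".join(excluded))
    return "\n".join(parts)


example_query = build_augmented_query(
    ingredients=["chicken", "eggs"],
    allergies=["tomatoes"],
    constraints={
        "constraints": {"sugar_g": {"max": 10}, "protein_g": {"max": None, "min": None}},
        "avoid": ["processed foods"],
    },
)
print(example_query)
```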

In [50]:
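def extract_numeric(val):
    """Extract numeric value from strings like '9.17 g' or '381'."""
    if pd.isna(val):
        return 0.0
    try:
        # Strip unit letters and whitespace, leaving digits, sign, and decimal point
        cleaned = re.sub(r'[a-zA-Z\s]', '', str(val))
        return float(cleaned) if cleaned else 0.0
    except ValueError:
        # Anything float() cannot parse (e.g. '1,200') falls back to 0.0
        return 0.0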


def row_to_document(row) -> Document:
    """Convert DataFrame row to LangChain Document with nutrition metadata."""
    name = row.get('name', 'Unknown food')
    serving = row.get('serving_size', '100 g')
    
    # Create searchable text
    text = f"""Food: {name}
Serving: {serving}
Calories: {row.get('calories', 'N/A')} | Protein: {row.get('protein', 'N/A')} | Carbs: {row.get('carbohydrate', 'N/A')}
Fat: {row.get('total_fat', 'N/A')} | Fiber: {row.get('fiber', 'N/A')} | Sugar: {row.get('sugars', 'N/A')}
Sodium: {row.get('sodium', 'N/A')} | Cholesterol: {row.get('cholesterol', 'N/A')}"""
    
    # Numeric metadata for filtering
    metadata = {
        "name": str(name),
        "calories": extract_numeric(row.get('calories')),
        "protein_g": extract_numeric(row.get('protein')),
        "carbs_g": extract_numeric(row.get('carbohydrate')),
        "fat_g": extract_numeric(row.get('total_fat')),
        "fiber_g": extract_numeric(row.get('fiber')),
        "sugar_g": extract_numeric(row.get('sugars')),
        "sodium_mg": extract_numeric(row.get('sodium')),
        "cholesterol_mg": extract_numeric(row.get('cholesterol')),
        "saturated_fat_g": extract_numeric(row.get('saturated_fat')),
        "potassium_mg": extract_numeric(row.get('potassium')),
        "calcium_mg": extract_numeric(row.get('calcium')),
        "iron_mg": extract_numeric(row.get('irom')),
        "vitamin_c_mg": extract_numeric(row.get('vitamin_c')),
    }
    
    return Document(page_content=text, metadata=metadata)


print("Helper functions defined!")

Helper functions defined!
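
`extract_numeric` returns 0.0 for any value it cannot parse, which includes strings with thousands separators. A slightly more tolerant variant is sketched below under two assumptions: commas are thousands separators rather than decimal commas, and only the first number in the string matters. The name `parse_amount` is hypothetical, and the sketch avoids pandas so it runs on its own.

```python
import math
import re
from typing import Any

# First signed integer or decimal number in a string
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_amount(val: Any) -> float:
    """Return the first number found in `val`, or 0.0 when none is present."""
    if val is None:
        return 0.0
    if isinstance(val, float) and math.isnan(val):
        return 0.0
    text = str(val).replace(",", "")  # assumes commas are thousands separators
    match = _NUMBER.search(text)
    return float(match.group()) if match else 0.0


for sample in ["9.17 g", 381, "1,200 mg", "N/A"]:
    print(repr(sample), "->", parse_amount(sample))
```

Returning 0.0 for missing data mirrors the notebook's helper, but it also makes "unknown" indistinguishable from "zero" when a safety filter compares values against a max limit.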


In [None]:
class NutritionRAG:
    """RAG system for searching nutrition database and answering nutrition questions."""
    
    def __init__(self, data_path: str, vectorstore_path: Optional[str] = None):
        self.data_path = data_path
        self.vectorstore_path = vectorstore_path
        self.documents = []
        self.embeddings = None
        self.vectorstore = None
        self.retriever = None
        self.llm = None
        self.rag_chain = None
        
    def initialize(self, load_from_disk: bool = True):
        """Initialize the Nutrition RAG system."""
        print("Initializing Nutrition RAG...")
        
        # Load nutrition data
        df = pd.read_excel(self.data_path)
        print(f"Loaded {len(df)} food items")
        
        # Convert to documents
        self.documents = [row_to_document(row) for _, row in df.iterrows()]
        
        # Initialize embeddings
        self.embeddings = HuggingFaceEmbeddings(
            model_name="BAAI/bge-small-en-v1.5",
            model_kwargs={'device': 'cpu'},
            encode_kwargs={'normalize_embeddings': True}
        )
        
        # Load or create vectorstore
        if load_from_disk and self.vectorstore_path and Path(self.vectorstore_path).exists():
            print(f"Loading vectorstore from {self.vectorstore_path}")
            self.vectorstore = FAISS.load_local(
                self.vectorstore_path,
                self.embeddings,
                allow_dangerous_deserialization=True
            )
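        else:
            print("Creating new vectorstore...")
            self.vectorstore = FAISS.from_documents(
                documents=self.documents,
                embedding=self.embeddings,
                distance_strategy=DistanceStrategy.COSINE
            )
            # Persist the index so later runs can load it instead of re-embedding
            if self.vectorstore_path:
                self.vectorstore.save_local(self.vectorstore_path)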
        
        # Create hybrid retriever
        bm25_retriever = BM25Retriever.from_documents(self.documents)
        bm25_retriever.k = 15
        
        vector_retriever = self.vectorstore.as_retriever(
            search_type="mmr",
            search_kwargs={"k": 15, "fetch_k": 50, "lambda_mult": 0.7}
        )
        
        self.retriever = EnsembleRetriever(
            retrievers=[bm25_retriever, vector_retriever],
            weights=[0.5, 0.5]
        )
        
        # Initialize LLM and RAG chain
        self.llm = OllamaLLM(model=LLM_MODEL, temperature=0.7)
        self._build_rag_chain()
        
        print(f"Nutrition RAG initialized with {len(self.documents)} food items")
    
    def _build_rag_chain(self):
        """Build the RAG chain with system prompt for nutrition questions."""
        system_prompt = """You are a nutrition calculator assistant for recipe ingredients.

YOUR ROLE:
Help users understand the nutritional content of food ingredients for recipe planning and meal preparation.

RULES:
1. Use ONLY the nutritional data from the CONTEXT below. Do not use external knowledge.
2. For each ingredient mentioned, provide a structured nutritional breakdown including:
   - Serving size (always mention this first)
   - Calories
   - Macros: protein, carbohydrates, fat, fiber
   - Key vitamins and minerals (only list notable amounts, not zeros)
   
3. When multiple ingredients are asked about:
   - List each ingredient's nutrition separately
   - If quantities are provided, calculate scaled values (e.g., "200g chicken" = 2x the 100g values)
   
4. Format responses as clear, scannable lists - easy to use for recipe tracking.

5. Highlight nutritional benefits using tags like:
   - "High protein", "Good source of fiber", "Rich in Vitamin C", etc.

6. If an ingredient is not found in the database, say:
   "I don't have nutritional data for [ingredient]. Try a similar item or check the exact name."

7. For health-related questions, remind users that values are estimates and to consult a professional for dietary advice.

RESPONSE FORMAT EXAMPLE:
Carrots, raw (100g serving):
- Calories: 41 kcal
- Protein: 0.9g | Carbs: 10g | Fat: 0.2g | Fiber: 2.8g
- Notable nutrients: Vitamin A (835mcg), Vitamin K (13mcg), Potassium (320mg)
- Benefits: High vitamin A, good for eye health, low calorie

CONTEXT:
{context}"""
        
        prompt = ChatPromptTemplate.from_messages([
            ("system", system_prompt),
            ("human", "{input}")
        ])
        
        question_answer_chain = create_stuff_documents_chain(self.llm, prompt)
        self.rag_chain = create_retrieval_chain(self.retriever, question_answer_chain)
    
    def search(self, query: str, k: int = 20) -> List[Document]:
        """Search for foods matching the query."""
        return self.retriever.invoke(query)[:k]
    
    def search_by_ingredients(self, ingredients: List[str], k: int = 20) -> List[Document]:
        """Search for foods matching the given ingredients."""
        query = " ".join(ingredients)
        return self.search(query, k)
    
    def ask(self, question: str) -> str:
        """Ask a nutrition question and get a detailed answer."""
        if not self.rag_chain:
            return "RAG chain not initialized. Call initialize() first."
        
        response = self.rag_chain.invoke({"input": question})
        return response["answer"]
    
    def ask_with_context(self, question: str) -> Dict[str, Any]:
        """Ask a question and return both answer and retrieved context."""
        if not self.rag_chain:
            return {"answer": "RAG chain not initialized.", "context": []}
        
        response = self.rag_chain.invoke({"input": question})
        return {
            "answer": response["answer"],
            "context": [
                {
                    "name": doc.metadata.get("name", "Unknown"),
                    "calories": doc.metadata.get("calories", 0),
                    "protein_g": doc.metadata.get("protein_g", 0),
                    "sugar_g": doc.metadata.get("sugar_g", 0),
                }
                for doc in response["context"]
            ]
        }


# Initialize Nutrition RAG
nutrition_rag = NutritionRAG(
    data_path=NUTRITION_DATA_PATH,
    vectorstore_path=NUTRITION_VECTORSTORE_PATH
)
nutrition_rag.initialize(load_from_disk=True)
print("\nNutrition RAG ready!")
nutrition_rag.ask("What are the nutritional values of chicken, eggs, and salad?")

Initializing Nutrition RAG...
Loaded 8789 food items


'(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 25e528ed-229a-4d45-9429-aeaee8942d63)')' thrown while requesting HEAD https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/modules.json
Retrying in 1s [Retry 1/5].


Loading vectorstore from ../data/processed/nutrition_vectorstore
Nutrition RAG initialized with 8789 food items

Nutrition RAG ready!


In [52]:
# Test Nutrition RAG - Search
test_ingredients = ["chicken", "eggs", "salad"]
print(f"Searching for: {test_ingredients}\n")

results = nutrition_rag.search_by_ingredients(test_ingredients, k=5)
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.metadata['name']}")
    print(f"   Calories: {doc.metadata['calories']} | Protein: {doc.metadata['protein_g']}g | Sugar: {doc.metadata['sugar_g']}g")

print("\n" + "="*60)
print("Testing RAG Question Answering:")
print("="*60)

# Test the ask() method
question = "What's the nutritional content of 200g chicken breast and 100g broccoli?"
print(f"\nQuestion: {question}\n")
answer = nutrition_rag.ask(question)
print(f"Answer:\n{answer}")

Searching for: ['chicken', 'eggs', 'salad']

1. Fish, dry (Alaska Native), Pacific, herring eggs
   Calories: 312.0 | Protein: 60.4g | Sugar: 0.0g
2. Potato salad with egg
   Calories: 157.0 | Protein: 1.96g | Sugar: 4.95g
3. Fish, eggs (Alaska Native), whitefish
   Calories: 104.0 | Protein: 14.66g | Sugar: 0.0g
4. Egg rolls, heated, refrigerated, chicken
   Calories: 197.0 | Protein: 10.44g | Sugar: 5.57g
5. Fish, plain (Alaska Native), Pacific, herring eggs
   Calories: 74.0 | Protein: 9.6g | Sugar: 0.0g

Testing RAG Question Answering:

Question: What's the nutritional content of 200g chicken breast and 100g broccoli?

Answer:
To determine the nutritional content of 200g chicken breast and 100g broccoli, I'll calculate the total nutritional content of each food item separately.

**Chicken Breast (200g)**

* Calories: 172 / 2 = 86
* Protein: 20.85g x 2 = 41.7g
* Carbohydrates: 0g
* Total Fat: 9.3g
* Saturated Fat: 2.7g
* Fiber: 0g
* Sugar: 0g
* Sodium: 63 / 2 = 31.5mg
* Cholesterol:
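
Portion arithmetic like the one in the answer above is easier to verify when it is computed deterministically instead of by the LLM. The following is a minimal sketch, assuming the database values are per 100 g (the excerpt does not state the basis). The helper `scale_nutrients` is hypothetical, and the input figures are the ones quoted in the answer.

```python
from typing import Dict


def scale_nutrients(per_100g: Dict[str, float], grams: float) -> Dict[str, float]:
    """Scale per-100 g nutrient values to a portion of `grams` grams."""
    factor = grams / 100.0
    return {key: round(value * factor, 2) for key, value in per_100g.items()}


# Figures quoted in the answer above, ASSUMED to be per 100 g
chicken_per_100g = {"calories": 172.0, "protein_g": 20.85, "sodium_mg": 63.0}

portion = scale_nutrients(chicken_per_100g, grams=200)
print(portion)
```

Under this assumption a 200 g portion doubles each per-100 g value.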

---
## 5. Safety Check

Filters out unsafe foods based on:
1. **Allergies** - Removes foods containing allergens
2. **Nutrition limits** - Removes foods exceeding medical constraints
3. **Avoid list** - Removes foods from the "avoid" list
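
The nutrition-limit rule can be sketched in isolation. This is a minimal standalone illustration, assuming the same per-serving reference values as the `SafetyFilter` cell that follows and the example constraints from the system flow diagram. The names `effective_limit` and `check`, and the sample foods, are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Per-serving reference maxima, copied from the SafetyFilter cell
PER_SERVING_MAX = {"sugar_g": 30, "sodium_mg": 800, "saturated_fat_g": 10, "cholesterol_mg": 100}


def effective_limit(key: str, medical_max: float) -> float:
    """Never filter more strictly than half the per-serving reference value."""
    return max(medical_max, PER_SERVING_MAX[key] * 0.5)


def check(food: Dict[str, float], constraints: Dict[str, Dict[str, float]]) -> Tuple[bool, Optional[str]]:
    """Return (passes, reason) for one food against the MAX limits only."""
    for key, limits in constraints.items():
        if key not in PER_SERVING_MAX or limits.get("max") is None:
            continue  # min-only targets (e.g. fiber) are daily goals, not per-food rules
        limit = effective_limit(key, limits["max"])
        if food.get(key, 0) > limit:
            return False, f"{key}={food[key]} > max {limit}"
    return True, None


constraints = {"sugar_g": {"max": 10}, "sodium_mg": {"max": 1500}, "fiber_g": {"min": 25}}
print(check({"sugar_g": 4.95, "sodium_mg": 300}, constraints))
print(check({"sugar_g": 22.0, "sodium_mg": 300}, constraints))
```

With a medical sugar limit of 10 g, the effective per-serving limit in this sketch is 15 g, because half of the 30 g reference value is the larger of the two.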

In [53]:
class SafetyFilter:
    """
    Filters foods based on safety constraints:
    - Allergies
    - Medical nutrition limits (per-serving MAX limits only)
    - Foods to avoid
    
    NOTE: Minimum constraints (like fiber_g min) are DAILY targets, not per-food requirements.
    We only enforce MAX limits per serving (sugar, sodium, saturated fat, etc.)
    """
    
    def __init__(self, debug: bool = False):
        self.debug = debug
    
    def filter(
        self,
        candidates: List[Document],
        allergies: List[str],
        constraints: Dict[str, Any],
        avoid_foods: List[str] = None
    ) -> List[Document]:
        """Filter candidates by safety constraints."""
        avoid_foods = avoid_foods or []
        filtered = []
        
        if self.debug:
            print(f"\n[DEBUG] Filtering {len(candidates)} candidates")
            print(f"[DEBUG] Allergies: {allergies}")
            print(f"[DEBUG] Avoid foods: {avoid_foods}")
            print(f"[DEBUG] Constraints: {constraints.get('constraints', {})}")
        
        rejection_reasons = {"allergen": 0, "avoid_list": 0, "nutrition": 0}
        
        for doc in candidates:
            name_lower = doc.metadata.get('name', '').lower()
            
            # Check allergies
            if self._contains_allergen(name_lower, allergies):
                rejection_reasons["allergen"] += 1
                if self.debug:
                    print(f"  [REJECT-ALLERGEN] {doc.metadata.get('name')}")
                continue
            
            # Check avoid list (only strict "avoid", not "limit")
            if self._in_avoid_list(name_lower, avoid_foods):
                rejection_reasons["avoid_list"] += 1
                if self.debug:
                    print(f"  [REJECT-AVOID] {doc.metadata.get('name')}")
                continue
            
            # Check nutrition constraints (MAX limits only - per serving)
            passed, reason = self._meets_constraints(doc.metadata, constraints)
            if not passed:
                rejection_reasons["nutrition"] += 1
                if self.debug:
                    print(f"  [REJECT-NUTRITION] {doc.metadata.get('name')} - {reason}")
                continue
            
            filtered.append(doc)
        
        if self.debug:
            print(f"\n[DEBUG] Rejection summary:")
            print(f"  - Allergen matches: {rejection_reasons['allergen']}")
            print(f"  - Avoid list matches: {rejection_reasons['avoid_list']}")
            print(f"  - Nutrition constraint violations: {rejection_reasons['nutrition']}")
            print(f"  - Passed: {len(filtered)}")
        
        return filtered
    
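    def _contains_allergen(self, name: str, allergies: List[str]) -> bool:
        """Check if food name contains any allergen (as given or in a naive singular form)."""
        for allergen in allergies:
            allergen_lower = allergen.lower().replace('_', ' ')
            # Food names are often singular, so 'tomatoes' must also match 'tomato'
            variants = {allergen_lower}
            if allergen_lower.endswith('es'):
                variants.add(allergen_lower[:-2])
            if allergen_lower.endswith('s'):
                variants.add(allergen_lower[:-1])
            if any(v and v in name for v in variants):
                return True
        return False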
    
    def _in_avoid_list(self, name: str, avoid_foods: List[str]) -> bool:
        """Check if food is in avoid list."""
        for avoid in avoid_foods:
            avoid_lower = avoid.lower().replace('_', ' ')
            if avoid_lower in name:
                return True
        return False
    
    def _meets_constraints(self, metadata: Dict, constraints: Dict) -> tuple:
        """
        Check if food meets nutrition constraints.
        
        IMPORTANT: Only enforces MAX limits (per-serving safety limits).
        MIN limits (like fiber) are daily goals, not per-food requirements.
        
        Returns: (passes: bool, reason: str or None)
        """
        constraint_rules = constraints.get('constraints', {})
        
        # Only these constraints make sense as per-serving MAX limits
        # We use reasonable thresholds, not overly strict values
        per_serving_max_limits = {
            'sugar_g': ('sugar_g', 30),       # Reasonable per-serving max (not 2g!)
            'sodium_mg': ('sodium_mg', 800),  # Reasonable per-serving max
            'saturated_fat_g': ('saturated_fat_g', 10),
            'cholesterol_mg': ('cholesterol_mg', 100),
        }
        
        for constraint_key, limits in constraint_rules.items():
            if not isinstance(limits, dict):
                continue
            
            # Only check MAX limits that make sense per-serving
            if constraint_key not in per_serving_max_limits:
                continue
            
            metadata_key, fallback_max = per_serving_max_limits[constraint_key]
            # Treat a missing or null nutrient value as 0 so the comparison below cannot fail
            value = metadata.get(metadata_key) or 0
            
            # Get the max limit from constraints, but use a reasonable floor
            max_limit = limits.get('max')
            if max_limit is not None:
                # Use the larger of: constraint value or reasonable minimum
                # This prevents overly strict filtering (e.g., max 2g sugar)
                effective_max = max(max_limit, fallback_max * 0.5)
                
                if value > effective_max:
                    return False, f"{constraint_key}={value} > max {effective_max}"
        
        return True, None


# Initialize Filter with debug mode
safety_filter = SafetyFilter(debug=True)
print("Safety Filter initialized (debug mode ON)!")

Safety Filter initialized (debug mode ON)!


---
## 6. Combined Pipeline

Orchestrates the full flow:
1. Parse user intent
2. Get medical constraints
3. Build augmented query
4. Get LLM recommendation
5. Safety check on retrieved foods
6. Return final results
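
Step 3 is plain string assembly, so it can be sketched without the LLM or the vector store. This is a minimal sketch, assuming the constraint layout shown in the system flow diagram (`{"sugar_g": {"max": 10}, ...}`). The helper `guideline_lines` is hypothetical and covers only three of the nutrients.

```python
from typing import Any, Dict, List


def guideline_lines(constraint_rules: Dict[str, Dict[str, Any]]) -> List[str]:
    """Turn numeric constraints into natural-language hints for the prompt."""
    templates = [
        ("sugar_g", "max", "prefer foods lower in sugar (daily limit: {}g)"),
        ("sodium_mg", "max", "prefer foods lower in sodium (daily limit: {}mg)"),
        ("fiber_g", "min", "prefer foods with good fiber content (daily goal: {}g)"),
    ]
    lines = []
    for key, bound, template in templates:
        value = (constraint_rules.get(key) or {}).get(bound)
        if value is not None:
            lines.append(template.format(value))
    return lines


rules = {"sugar_g": {"max": 10}, "sodium_mg": {"max": 1500}, "fiber_g": {"min": 25}}
body = "\n".join("- " + line for line in guideline_lines(rules))
prompt_section = "NUTRITION GUIDELINES (daily targets):\n" + body
print(prompt_section)
```

Phrasing the limits as daily targets keeps the LLM from treating them as hard per-food cutoffs; the hard filtering is left to the safety check.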

In [None]:
from dataclasses import dataclass


@dataclass
class PipelineResult:
    """Result from the combined RAG pipeline."""
    intent: UserIntent
    constraints: Dict[str, Any]
    augmented_query: str
    llm_recommendation: str
    candidates_count: int
    filtered_count: int
    safe_foods: List[Dict[str, Any]]

    def display(self):
        """Display results in a formatted way."""
        print("=" * 60)
        print("NUTRITION RECOMMENDATION RESULTS")
        print("=" * 60)

        print("\n[1] PARSED INTENT:")
        print(f"    Medical conditions: {self.intent.medical_conditions}")
        print(f"    Allergies: {self.intent.allergies}")
        print(f"    Desired ingredients: {self.intent.ingredients}")
        print(f"    Cooking style: {self.intent.cooking_style}")

        print("\n[2] MEDICAL CONSTRAINTS:")
        constraints = self.constraints.get('constraints', {})
        for key, value in constraints.items():
            if isinstance(value, dict):
                limits = []
                if value.get('min') is not None:
                    limits.append(f"min={value['min']}")
                if value.get('max') is not None:
                    limits.append(f"max={value['max']}")
                if limits:
                    print(f"    {key}: {', '.join(limits)}")

        if self.constraints.get('avoid'):
            print(f"    Foods to AVOID: {self.constraints.get('avoid', [])}")
        if self.constraints.get('limit'):
            print(f"    Foods to LIMIT: {self.constraints.get('limit', [])}")

        print("\n[3] LLM RECOMMENDATION:")
        print("-" * 60)
        print(self.llm_recommendation)
        print("-" * 60)

        print(f"\n[4] SAFETY CHECK:")
        print(f"    Candidates searched: {self.candidates_count}")
        print(f"    Safe foods after filtering: {self.filtered_count}")

        print(f"\n[5] VERIFIED SAFE INGREDIENTS (Top {len(self.safe_foods)}):")
        print("-" * 60)
        for i, item in enumerate(self.safe_foods, 1):
            print(f"\n  {i}. {item['name']}")
            print(f"     Calories: {item['calories']:.0f} | Protein: {item['protein_g']:.1f}g | Sugar: {item['sugar_g']:.1f}g | Sodium: {item['sodium_mg']:.0f}mg")


class NutritionPipeline:
    """Combined pipeline for nutrition recommendations."""
    
    def __init__(
        self,
        intent_parser: IntentParser,
        medical_rag: MedicalRAG,
        nutrition_rag: NutritionRAG,
        safety_filter: SafetyFilter
    ):
        self.intent_parser = intent_parser
        self.medical_rag = medical_rag
        self.nutrition_rag = nutrition_rag
        self.safety_filter = safety_filter
    
    def process(self, user_query: str, top_k: int = 10):
        """
        Process user query through the full pipeline.
        
        Flow:
        1. Intent Parser → extract structured data
        2. Medical RAG → get nutrition constraints
        3. Build Augmented Query → combine all context
        4. Nutrition RAG → get LLM recommendation
        5. Safety Check → filter unsafe foods
        6. Return results
        """
        print("\n" + "="*60)
        print("PROCESSING QUERY...")
        print("="*60)
        print(f"Query: {user_query}\n")
        
        # Step 1: Intent Parser
        print("[Step 1] Intent Parser - extracting structured data...")
        intent = self.intent_parser.parse(user_query)
        print(f"  → Conditions: {intent.medical_conditions}")
        print(f"  → Ingredients: {intent.ingredients}")
        print(f"  → Allergies: {intent.allergies}")
        
        # Step 2: Medical RAG
        print("\n[Step 2] Medical RAG - getting nutrition constraints...")
        constraints = self.medical_rag.get_constraints(intent.medical_conditions)
        print(f"  → Constraints: {list(constraints.get('constraints', {}).keys())}")
        print(f"  → Avoid: {constraints.get('avoid', [])}")
        print(f"  → Limit (informational): {constraints.get('limit', [])}")
        
        # Step 3: Build Augmented Query
        print("\n[Step 3] Building augmented query...")
        augmented_query = self._build_augmented_query(user_query, intent, constraints)
        print(f"  → Query built ({len(augmented_query)} chars)")
        
        # Step 4: Nutrition RAG - LLM recommendation
        print("\n[Step 4] Nutrition RAG - getting LLM recommendation...")
        llm_recommendation = self.nutrition_rag.ask(augmented_query)
        print(f"  → Recommendation received")
        
        # Step 5: Safety Check - search and filter
        print("\n[Step 5] Safety Check - filtering unsafe foods...")
        search_query = self._build_search_query(intent, constraints)
        candidates = self.nutrition_rag.search(search_query, k=50)
        print(f"  → Searched {len(candidates)} candidates")
        
        # Enforce only the strict "avoid" list, not the "limit" list:
        # "limit" means reduce consumption, not exclude completely
        avoid_foods = constraints.get('avoid', [])
        
        safe_foods = self.safety_filter.filter(
            candidates=candidates,
            allergies=intent.allergies,
            constraints=constraints,
            avoid_foods=avoid_foods
        )
        print(f"  → {len(safe_foods)} safe foods after filtering")
        
        # Step 6: Prepare final results
        results = [
            {
                'name': doc.metadata.get('name'),
                'calories': doc.metadata.get('calories', 0),
                'protein_g': doc.metadata.get('protein_g', 0),
                'carbs_g': doc.metadata.get('carbs_g', 0),
                'fat_g': doc.metadata.get('fat_g', 0),
                'fiber_g': doc.metadata.get('fiber_g', 0),
                'sugar_g': doc.metadata.get('sugar_g', 0),
                'sodium_mg': doc.metadata.get('sodium_mg', 0),
            }
            for doc in safe_foods[:top_k]
        ]
        
        return PipelineResult(
            intent=intent,
            constraints=constraints,
            augmented_query=augmented_query,
            llm_recommendation=llm_recommendation,
            candidates_count=len(candidates),
            filtered_count=len(safe_foods),
            safe_foods=results
        )
    
    def _build_augmented_query(self, original_query: str, intent: UserIntent, constraints: Dict) -> str:
        """Build augmented query with medical constraints for the LLM."""
        constraint_rules = constraints.get('constraints', {})
        constraint_text = []
        
        if constraint_rules.get('sugar_g', {}).get('max'):
            constraint_text.append(f"prefer foods lower in sugar (daily limit: {constraint_rules['sugar_g']['max']}g)")
        if constraint_rules.get('sodium_mg', {}).get('max'):
            constraint_text.append(f"prefer foods lower in sodium (daily limit: {constraint_rules['sodium_mg']['max']}mg)")
        if constraint_rules.get('fiber_g', {}).get('min'):
            constraint_text.append(f"prefer foods with good fiber content (daily goal: {constraint_rules['fiber_g']['min']}g)")
        if constraint_rules.get('protein_g', {}).get('max'):
            constraint_text.append(f"moderate protein intake (daily limit: {constraint_rules['protein_g']['max']}g)")
        
        augmented = f"""USER REQUEST: {original_query}

MEDICAL CONDITIONS: {', '.join(intent.medical_conditions) if intent.medical_conditions else 'None'}

NUTRITION GUIDELINES (daily targets):
{chr(10).join('- ' + c for c in constraint_text) if constraint_text else '- General healthy eating'}

ALLERGIES (MUST AVOID):
{chr(10).join('- ' + a for a in intent.allergies) if intent.allergies else '- None'}

FOODS TO AVOID:
{chr(10).join('- ' + f for f in constraints.get('avoid', [])) if constraints.get('avoid') else '- None'}

FOODS TO LIMIT (reduce but okay in moderation):
{chr(10).join('- ' + f for f in constraints.get('limit', [])) if constraints.get('limit') else '- None'}

DESIRED INGREDIENTS: {', '.join(intent.ingredients) if intent.ingredients else 'Any'}

Based on the above, recommend specific food ingredients that are safe and healthy.
Provide nutritional breakdown for each recommendation."""
        
        return augmented
    
    def _build_search_query(self, intent: UserIntent, constraints: Dict) -> str:
        """Build search query for retrieving candidate foods."""
        parts = []
        if intent.ingredients:
            parts.extend(intent.ingredients)
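        if intent.cooking_style:
            # Accept either a single string or a list; extending with a bare
            # string would add it character by character
            style = intent.cooking_style
            parts.extend([style] if isinstance(style, str) else style)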
        increase = constraints.get('increase', [])
        parts.extend(increase[:3])
        return ' '.join(parts) if parts else 'healthy food'


# Create the pipeline
pipeline = NutritionPipeline(
    intent_parser=intent_parser,
    medical_rag=medical_rag,
    nutrition_rag=nutrition_rag,
    safety_filter=safety_filter
)

print("\nNutrition Pipeline ready!")


Nutrition Pipeline ready!


---
## 7. Demo: Full Pipeline Execution

In [55]:
# Example 1: Diabetic with hypertension
query1 = """
I have type 2 diabetes and high blood pressure. 
I want to make something with chicken, eggs, and salad vegetables.
Quick breakfast. I'm allergic to tomatoes and onions.
"""

result1 = pipeline.process(query1, top_k=10)
result1.display()


PROCESSING QUERY...
Query: 
I have type 2 diabetes and high blood pressure. 
I want to make something with chicken, eggs, and salad vegetables.
Quick breakfast. I'm allergic to tomatoes and onions.


[Step 1] Intent Parser - extracting structured data...
  → Conditions: ['diabetes_type2', 'hypertension']
  → Ingredients: ['chicken', 'eggs', 'salad']
  → Allergies: ['tomatoes', 'onions']

[Step 2] Medical RAG - getting nutrition constraints...
  → Constraints: ['sugar_g', 'sodium_mg', 'fiber_g', 'protein_g', 'calories', 'saturated_fat_g', 'cholesterol_mg']
  → Avoid: ['animal fats like butter, lard, and ghee', 'high-fat foods']
  → Limit (informational): ['refined grains', 'sugary desserts', 'trans-fats', 'fizzy drinks', 'biscuits', 'cakes', 'crisps', 'confectionary']

[Step 3] Building augmented query...
  → Query built (938 chars)

[Step 4] Nutrition RAG - getting LLM recommendation...
  → Recommendation received

[Step 5] Safety Check - filtering unsafe foods...
  → Searched 30 cand

In [58]:
# Example 2: Parkinson's patient
query2 = """
I have Parkinson's disease and need high fiber foods.
Looking for vegetables and whole grains.
No shellfish due to allergy.
"""

result2 = pipeline.process(query2, top_k=10)
result2.display()


PROCESSING QUERY...
Query: 
I have Parkinson's disease and need high fiber foods.
Looking for vegetables and whole grains.
No shellfish due to allergy.


[Step 1] Intent Parser - extracting structured data...
  → Conditions: ['parkinsons', 'high_fiber']
  → Ingredients: ['vegetables', 'whole_grains']
  → Allergies: ['shellfish']

[Step 2] Medical RAG - getting nutrition constraints...
  → Constraints: ['sugar_g', 'sodium_mg', 'fiber_g', 'protein_g', 'calories', 'saturated_fat_g', 'cholesterol_mg']
  → Avoid: ['cakes, biscuits, pastries', 'crust', 'tough meat, hard fruits']
  → Limit (informational): ['sugary foods', 'fatty foods', 'refined grains', 'trans-fats']

[Step 3] Building augmented query...
  → Query built (837 chars)

[Step 4] Nutrition RAG - getting LLM recommendation...
  → Recommendation received

[Step 5] Safety Check - filtering unsafe foods...
  → Searched 30 candidates

[DEBUG] Filtering 30 candidates
[DEBUG] Allergies: ['shellfish']
[DEBUG] Avoid foods: ['cakes, bisc

In [59]:
# Example 3: General healthy eating
query3 = """
I want a healthy lunch with fish or salmon.
Looking for something high in protein and low in calories.
No gluten.
"""

result3 = pipeline.process(query3, top_k=10)
result3.display()


PROCESSING QUERY...
Query: 
I want a healthy lunch with fish or salmon.
Looking for something high in protein and low in calories.
No gluten.


[Step 1] Intent Parser - extracting structured data...
  → Conditions: ['high blood pressure', 'sugar problem']
  → Ingredients: ['fish', 'salmon']
  → Allergies: []

[Step 2] Medical RAG - getting nutrition constraints...
  → Constraints: ['sugar_g', 'sodium_mg', 'fiber_g', 'protein_g', 'calories', 'saturated_fat_g', 'cholesterol_mg']
  → Avoid: ['Animal fats like butter, lard, and ghee', 'Regular fizzy drinks', 'Cordials', 'Sugary free drinks', 'Empty calories from sugary drinks']
  → Limit (informational): ['Refined, low-fibre carbohydrate-rich meals', 'Sugary desserts', 'Plain white rice', 'Pasta', 'Mashed potato', 'Fizzy drinks', 'Carbonated drinks', 'High-osmolarity drinks', 'Red meat', 'Baked beans', 'Cakes', 'Biscuits', 'Pastries', 'Confectionery']

[Step 3] Building augmented query...
  → Query built (915 chars)

[Step 4] Nutrition RA

---
## 8. Interactive Query

In [60]:
# Enter your own query here!
your_query = """
I have diabetes. Looking for breakfast options with eggs.
Can't eat dairy products.
"""

result = pipeline.process(your_query, top_k=10)
result.display()


PROCESSING QUERY...
Query: 
I have diabetes. Looking for breakfast options with eggs.
Can't eat dairy products.


[Step 1] Intent Parser - extracting structured data...
  → Conditions: ['diabetes']
  → Ingredients: ['eggs']
  → Allergies: []

[Step 2] Medical RAG - getting nutrition constraints...
  → Constraints: ['sugar_g', 'sodium_mg', 'fiber_g', 'protein_g', 'calories', 'saturated_fat_g', 'cholesterol_mg']
  → Avoid: ['trans-fats', 'biscuits', 'cakes', 'crisps', 'confectionary', 'regular fizzy drinks']
  → Limit (informational): ['saturated fats', 'added sugars', 'refined grains']

[Step 3] Building augmented query...
  → Query built (765 chars)

[Step 4] Nutrition RAG - getting LLM recommendation...
  → Recommendation received

[Step 5] Safety Check - filtering unsafe foods...
  → Searched 30 candidates

[DEBUG] Filtering 30 candidates
[DEBUG] Allergies: []
[DEBUG] Avoid foods: ['trans-fats', 'biscuits', 'cakes', 'crisps', 'confectionary', 'regular fizzy drinks']
[DEBUG] Constrai

---
## 9. Export Results to JSON

In [61]:
def export_results_to_json(result: PipelineResult, filepath: str):
    """Export pipeline results to JSON file."""
    export_data = {
        "intent": {
            "medical_conditions": result.intent.medical_conditions,
            "dietary_restrictions": result.intent.dietary_restrictions,
            "allergies": result.intent.allergies,
            "ingredients": result.intent.ingredients,
            "cooking_style": result.intent.cooking_style
        },
        "constraints": result.constraints,
        "llm_recommendation": result.llm_recommendation,
        "safety_check": {
            "candidates_searched": result.candidates_count,
            "safe_foods_count": result.filtered_count
        },
        "safe_foods": result.safe_foods
    }
    
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(export_data, f, indent=2, ensure_ascii=False)
    
    print(f"Results exported to: {filepath}")


# Export the last result
export_results_to_json(result, "../data/processed/pipeline_results.json")

Results exported to: ../data/processed/pipeline_results.json
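
The exported file can be read back for downstream checks. This is a minimal sketch, assuming the key layout written by `export_results_to_json` above. The helper `summarize_export` and the sample payload are made up for illustration, and a temporary file stands in for the real path.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def summarize_export(filepath: str) -> Dict[str, Any]:
    """Load an exported result file and return a compact summary."""
    with open(filepath, "r", encoding="utf-8") as f:
        data = json.load(f)
    searched = data["safety_check"]["candidates_searched"]
    return {
        "conditions": data["intent"]["medical_conditions"],
        "safe_food_names": [food["name"] for food in data["safe_foods"]],
        # Guard against division by zero when no candidates were searched
        "pass_rate": data["safety_check"]["safe_foods_count"] / max(searched, 1),
    }


# Illustrative payload only; NOT real pipeline output
sample = {
    "intent": {"medical_conditions": ["diabetes"], "allergies": [], "ingredients": ["eggs"]},
    "safety_check": {"candidates_searched": 30, "safe_foods_count": 12},
    "safe_foods": [{"name": "Egg, whole, boiled", "calories": 155.0}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pipeline_results.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_export(str(path))

print(summary)
```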
