# Patient Case Summary RAG Application

This notebook implements a Retrieval-Augmented Generation (RAG) application that generates patient case summaries. The summary text is assembled from templates rather than by an LLM. The application uses open-source tools to:

1. Extract key details from patient data
2. Retrieve relevant clinical guidelines
3. Generate a comprehensive case summary with recommendations

## Features

- **Patient Data Processing**: Upload FHIR JSON files to extract patient information
- **Medical Guidelines Management**: Add guidelines via text files or PDF files
- **Case Summary Generation**: Generate comprehensive case summaries with recommendations
- **Interactive Interface**: User-friendly interface for easy interaction

## 1. Install Required Packages

First, let's install the necessary packages. You can skip this step if you already have these packages installed.

In [13]:
!pip install langchain langchain-community chromadb pydantic python-dotenv pypdf ipywidgets
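
If you are unsure whether the packages are already present, a quick check like the one below may help. It is a minimal sketch: the `REQUIRED_MODULES` list and the `missing_modules` helper are illustrative names, and the import names are assumed to follow the usual mapping from the pip names above (for example, `python-dotenv` is imported as `dotenv`).

```python
import importlib.util

# Import names differ from some pip names (python-dotenv -> dotenv,
# langchain-community -> langchain_community). This list is an assumption
# derived from the pip command above.
REQUIRED_MODULES = [
    "langchain", "langchain_community", "chromadb",
    "pydantic", "dotenv", "pypdf", "ipywidgets",
]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty result suggests the install step can be skipped.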



## 2. Define Core Components

### 2.1 Patient Data Parser

This component extracts patient information from FHIR bundles.

In [1]:
import json
from typing import List, Optional
from pydantic import BaseModel, Field

class ConditionInfo(BaseModel):
    code: str
    display: str
    clinical_status: str

class EncounterInfo(BaseModel):
    date: str = Field(..., description="Date of the encounter.")
    reason_display: Optional[str] = Field(None, description="Reason for the encounter.")
    type_display: Optional[str] = Field(None, description="Type or class of the encounter.")

class MedicationInfo(BaseModel):
    name: str = Field(..., description="Name of the medication.")
    start_date: Optional[str] = Field(None, description="When the medication was prescribed.")
    instructions: Optional[str] = Field(None, description="Dosage instructions.")

class PatientInfo(BaseModel):
    given_name: str
    family_name: str
    birth_date: str
    gender: str
    conditions: List[ConditionInfo] = Field(default_factory=list)
    recent_encounters: List[EncounterInfo] = Field(default_factory=list, description="A few recent encounters.")
    current_medications: List[MedicationInfo] = Field(default_factory=list, description="Current active medications.")

    @property
    def demographic_str(self) -> str:
        """Get demographics string."""
        return f"""\
Given name: {self.given_name}
Family name: {self.family_name}
Birth date: {self.birth_date}
Gender: {self.gender}"""


def parse_synthea_patient(file_path: str, filter_active: bool = True) -> PatientInfo:
    """
    Parse a Synthea-generated FHIR Bundle to extract patient information.
    
    Args:
        file_path: Path to the JSON file containing the FHIR Bundle
        filter_active: If True, keep only active conditions and active medication requests
        
    Returns:
        PatientInfo object containing extracted patient data
    """
    # Load the Synthea-generated FHIR Bundle
    with open(file_path, "r", encoding="utf-8") as f:
        bundle = json.load(f)

    patient_resource = None
    conditions = []
    encounters = []
    medication_requests = []

    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")

        if resource_type == "Patient":
            patient_resource = resource
        elif resource_type == "Condition":
            conditions.append(resource)
        elif resource_type == "Encounter":
            encounters.append(resource)
        elif resource_type == "MedicationRequest":
            medication_requests.append(resource)

    if not patient_resource:
        raise ValueError("No Patient resource found in the provided file.")

    # Extract patient demographics
    # `or` also covers keys that are present but hold an empty list
    name_entry = (patient_resource.get("name") or [{}])[0]
    given_name = (name_entry.get("given") or [""])[0]
    family_name = name_entry.get("family", "")
    birth_date = patient_resource.get("birthDate", "")
    gender = patient_resource.get("gender", "")

    # Create PatientInfo object
    patient_info = PatientInfo(
        given_name=given_name,
        family_name=family_name,
        birth_date=birth_date,
        gender=gender
    )

    # Extract conditions
    for condition in conditions:
        clinical_status = condition.get("clinicalStatus", {}).get("coding", [{}])[0].get("code", "")
        
        # Skip if not active and filter_active is True
        if filter_active and clinical_status != "active":
            continue
            
        code_entry = condition.get("code", {}).get("coding", [{}])[0]
        code = code_entry.get("code", "")
        display = code_entry.get("display", "")
        
        patient_info.conditions.append(
            ConditionInfo(
                code=code,
                display=display,
                clinical_status=clinical_status
            )
        )

    # Extract recent encounters (sort by date, most recent first)
    sorted_encounters = sorted(
        encounters,
        key=lambda e: e.get("period", {}).get("start", ""),
        reverse=True
    )
    
    for encounter in sorted_encounters[:5]:  # Get 5 most recent encounters
        period = encounter.get("period", {})
        date = period.get("start", "")
        
        reason_display = ""
        if encounter.get("reasonCode"):
            reason_display = encounter.get("reasonCode", [{}])[0].get("coding", [{}])[0].get("display", "")
        
        type_display = ""
        if encounter.get("type"):
            type_display = encounter.get("type", [{}])[0].get("coding", [{}])[0].get("display", "")
        
        patient_info.recent_encounters.append(
            EncounterInfo(
                date=date,
                reason_display=reason_display,
                type_display=type_display
            )
        )

    # Extract current medications
    for med_request in medication_requests:
        status = med_request.get("status", "")
        
        # Skip if not active and filter_active is True
        if filter_active and status != "active":
            continue
            
        # `or [{}]` also covers keys that are present but hold an empty list
        med_code_entry = (med_request.get("medicationCodeableConcept", {}).get("coding") or [{}])[0]
        med_name = med_code_entry.get("display", "")

        dosage_instruction = (med_request.get("dosageInstruction") or [{}])[0]
        text_instruction = dosage_instruction.get("text", "")

        start_date = med_request.get("authoredOn", "")
        
        patient_info.current_medications.append(
            MedicationInfo(
                name=med_name,
                start_date=start_date,
                instructions=text_instruction
            )
        )

    return patient_info
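
The parser repeatedly reads the first element of nested FHIR lists with the pattern `.get(key, [{}])[0]`, which raises `IndexError` when the key is present but holds an empty list. The following is a minimal sketch of a more defensive lookup; `first_or_empty` and `first_coding_display` are hypothetical helper names, not part of the notebook.

```python
from typing import Any, Dict

def first_or_empty(items: Any) -> Dict[str, Any]:
    """Return the first element of a list if it is a dict, else an empty dict."""
    if isinstance(items, list) and items and isinstance(items[0], dict):
        return items[0]
    return {}

def first_coding_display(codeable_concept: Dict[str, Any]) -> str:
    """Return the display text of the first coding, or '' if it is absent."""
    return first_or_empty(codeable_concept.get("coding")).get("display", "")

# An empty "coding" list would raise IndexError with `[...][0]`; here it yields ''.
print(repr(first_coding_display({"coding": []})))
print(first_coding_display({"coding": [{"code": "38341003", "display": "Hypertension"}]}))
```

A helper like this could replace the chained lookups for conditions, encounters, and medications if real-world bundles turn out to be less regular than the sample data.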

### 2.2 PDF Processor

This component extracts text from PDF files.

In [2]:
import os
import pypdf
from langchain.text_splitter import RecursiveCharacterTextSplitter
import tempfile

def extract_text_from_pdf(pdf_path):
    """
    Extract text from a PDF file.
    
    Args:
        pdf_path: Path to the PDF file
        
    Returns:
        Extracted text as a string
    """
    text = ""
    try:
        # Open the PDF file
        with open(pdf_path, 'rb') as file:
            # Create a PDF reader object
            pdf_reader = pypdf.PdfReader(file)
            
            # Extract text from each page
            for page in pdf_reader.pages:
                # Guard against pages that yield no text (e.g. image-only pages)
                text += (page.extract_text() or "") + "\n\n"
                
        return text
    except Exception as e:
        print(f"Error extracting text from PDF: {e}")
        return ""

def process_pdf_to_text_file(pdf_path, output_dir):
    """
    Process a PDF file and save its text content to a text file.
    
    Args:
        pdf_path: Path to the PDF file
        output_dir: Directory to save the text file
        
    Returns:
        Path to the created text file
    """
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Extract text from PDF
    text = extract_text_from_pdf(pdf_path)
    
    if not text:
        return None
    
    # Create output file path
    pdf_filename = os.path.basename(pdf_path)
    txt_filename = os.path.splitext(pdf_filename)[0] + ".txt"
    txt_path = os.path.join(output_dir, txt_filename)
    
    # Save text to file
    with open(txt_path, 'w', encoding='utf-8') as file:
        file.write(text)
    
    return txt_path

def process_pdfs_to_text(pdf_paths, output_dir):
    """
    Process multiple PDF files and save their text content to text files.
    
    Args:
        pdf_paths: List of paths to PDF files
        output_dir: Directory to save the text files
        
    Returns:
        List of paths to the created text files
    """
    txt_paths = []
    
    for pdf_path in pdf_paths:
        txt_path = process_pdf_to_text_file(pdf_path, output_dir)
        if txt_path:
            txt_paths.append(txt_path)
    
    return txt_paths
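
Pages without a text layer (scanned images, for example) typically yield little or no text, and the extractor above still appends separators for them. The sketch below shows one possible way to join per-page text while skipping empty pages. It works on plain strings so it runs without a PDF; `join_page_texts` is a hypothetical helper name.

```python
from typing import Iterable, Optional

def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with blank lines, skipping pages with no extractable text."""
    cleaned = [t.strip() for t in page_texts if t and t.strip()]
    return "\n\n".join(cleaned)

# Simulated per-page results: the second page has no text layer.
pages = ["Diabetes Management Guidelines", None, "  Monitoring: A1C every 3 months  "]
print(join_page_texts(pages))
```

With this approach a PDF that contains no extractable text produces an empty string, so the `if not text` check in `process_pdf_to_text_file` would skip it.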

### 2.3 Guideline Retriever

This component handles the storage and retrieval of medical guidelines using a vector database.
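
The retriever splits guideline text into chunks of up to 1000 characters with a 200-character overlap before indexing. To illustrate what those two parameters mean, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph boundaries), and `chunk_text` is a hypothetical name.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

chunks = chunk_text("a" * 2500)
print([len(c) for c in chunks])
```

The overlap means that a sentence cut at a chunk boundary is still likely to appear intact in the neighboring chunk, at the cost of some duplicated text in the index.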

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
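# These integrations live in the langchain-community package installed above
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_core.documents import Document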
import os
import chromadb
import numpy as np
from typing import List, Dict, Any

class SimpleEmbeddings:
    """
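    A simple embedding class that builds hashed bag-of-words (term-frequency) vectors.
    This is a lightweight alternative that doesn't require external models.
    Note: bucket indices depend on the order in which tokens are first seen, so
    vectors are only comparable when produced by the same instance.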
    """
    
    def __init__(self, dimension=100):
        """Initialize with a fixed embedding dimension."""
        self.dimension = dimension
        self.vocabulary = {}
        self.vocab_size = 0
        
    def _preprocess_text(self, text):
        """Simple text preprocessing."""
        # Convert to lowercase and split by whitespace
        return text.lower().split()
        
    def _update_vocabulary(self, tokens):
        """Update vocabulary with new tokens."""
        for token in tokens:
            if token not in self.vocabulary:
                self.vocabulary[token] = self.vocab_size
                self.vocab_size += 1
    
    def _text_to_vector(self, text):
        """Convert text to a fixed-dimension vector."""
        tokens = self._preprocess_text(text)
        self._update_vocabulary(tokens)
        
        # Create a simple frequency vector
        vec = np.zeros(self.dimension)
        for token in tokens:
            if token in self.vocabulary:
                # Use modulo to ensure we stay within dimension
                idx = self.vocabulary[token] % self.dimension
                vec[idx] += 1
        
        # Normalize the vector
        norm = np.linalg.norm(vec)
        if norm > 0:
            vec = vec / norm
        return vec
    
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents."""
        return [self._text_to_vector(text).tolist() for text in texts]
    
    def embed_query(self, text: str) -> List[float]:
        """Embed a query."""
        return self._text_to_vector(text).tolist()

class GuidelineRetriever:
    """
    A class to handle the storage and retrieval of medical guidelines using ChromaDB.
    """
    
    def __init__(self, persist_directory="./chroma_db"):
        """
        Initialize the retriever with a simple embedding approach.
        
        Args:
            persist_directory: Directory to persist the vector database
        """
        self.persist_directory = persist_directory
        
        # Use our simple embeddings implementation
        self.embeddings = SimpleEmbeddings(dimension=100)
        
        # Create the directory if it doesn't exist
        os.makedirs(persist_directory, exist_ok=True)
        
        # Initialize the vector store
        self.db = None
    
    def add_guidelines(self, guidelines_dir):
        """
        Add medical guidelines from text files in a directory.
        
        Args:
            guidelines_dir: Directory containing guideline text files
        """
        # Load documents from the directory
        loader = DirectoryLoader(guidelines_dir, glob="**/*.txt", loader_cls=TextLoader)
        documents = loader.load()
        
        # Split documents into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        splits = text_splitter.split_documents(documents)
        
        # Create or update the vector store
        self.db = Chroma.from_documents(
            documents=splits,
            embedding=self.embeddings,
            persist_directory=self.persist_directory
        )
        
        # Persist the database
        self.db.persist()
        
        return len(splits)
    
    def add_guideline_text(self, text, metadata=None):
        """
        Add a single guideline text to the vector store.
        
        Args:
            text: The guideline text
            metadata: Optional metadata for the document
        """
        if metadata is None:
            metadata = {}
            
        # Create a document
        doc = Document(page_content=text, metadata=metadata)
        
        # Split the document
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        splits = text_splitter.split_documents([doc])
        
        # Create or update the vector store
        if self.db is None:
            self.db = Chroma.from_documents(
                documents=splits,
                embedding=self.embeddings,
                persist_directory=self.persist_directory
            )
        else:
            self.db.add_documents(splits)
        
        # Persist the database
        self.db.persist()
        
        return len(splits)
    
    def retrieve(self, query, top_k=3):
        """
        Retrieve relevant guideline chunks based on a query.
        
        Args:
            query: The search query
            top_k: Number of results to return
            
        Returns:
            List of retrieved documents
        """
        if self.db is None:
            return []
            
        results = self.db.similarity_search(query, k=top_k)
        return results
    
    def load_if_exists(self):
        """
        Load the vector store if it exists.
        
        Returns:
            True if loaded successfully, False otherwise
        """
        try:
            self.db = Chroma(
                persist_directory=self.persist_directory,
                embedding_function=self.embeddings
            )
            return True
        except Exception as e:
            print(f"Error loading vector store: {e}")
            return False
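
Because `SimpleEmbeddings` assigns bucket indices in the order tokens are first seen, the same text can map to a different vector in a fresh session, for instance after `load_if_exists()` starts with an empty vocabulary. One possible alternative, sketched below, hashes each token directly so that no vocabulary state is needed. This is an assumption about how the problem could be addressed, not the notebook's method; `hashed_bow_vector` and `cosine` are hypothetical names, and only the standard library is used.

```python
import math
import zlib
from typing import List

def hashed_bow_vector(text: str, dimension: int = 100) -> List[float]:
    """Hashed bag-of-words: bucket each token with CRC32, count, then L2-normalize."""
    vec = [0.0] * dimension
    for token in text.lower().split():
        # CRC32 is deterministic across sessions, unlike first-seen indices
        vec[zlib.crc32(token.encode("utf-8")) % dimension] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def cosine(a: List[float], b: List[float]) -> float:
    """Dot product, which equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

query = hashed_bow_vector("medication recommendations for hypertension")
same = hashed_bow_vector("Medication recommendations for hypertension")
other = hashed_bow_vector("annual dilated eye examination")
print(round(cosine(query, same), 3), round(cosine(query, other), 3))
```

Distinct tokens can still collide in the same bucket, so this remains a coarse lexical similarity measure rather than a semantic embedding.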

### 2.4 Query Generator

This component generates queries for retrieving relevant medical guidelines based on patient information.

In [4]:
from typing import List, Optional
from pydantic import BaseModel, Field

class RecommendedQuery(BaseModel):
    """Represents a query to retrieve guideline sections relevant to the patient's conditions."""
    query: str
    rationale: str = Field(..., description="Explanation of why this query is relevant to the patient.")

class QueryGenerator:
    """
    A class to generate queries for retrieving relevant medical guidelines based on patient information.
    """
    
    def __init__(self):
        """Initialize the query generator."""
        pass
    
    def generate_queries(self, patient_info) -> List[RecommendedQuery]:
        """
        Generate queries based on patient information.
        
        Args:
            patient_info: PatientInfo object containing patient data
            
        Returns:
            List of RecommendedQuery objects
        """
        queries = []
        
        # Extract active conditions
        active_conditions = [c for c in patient_info.conditions if c.clinical_status == "active"]
        
        # Generate queries for each active condition
        for condition in active_conditions:
            # Basic query for the condition
            queries.append(
                RecommendedQuery(
                    query=f"Treatment guidelines for {condition.display}",
                    rationale=f"Patient has an active diagnosis of {condition.display}."
                )
            )
            
            # Query for medication recommendations
            queries.append(
                RecommendedQuery(
                    query=f"Medication recommendations for {condition.display}",
                    rationale=f"To evaluate if current medications align with guidelines for {condition.display}."
                )
            )
            
            # Query for monitoring recommendations
            queries.append(
                RecommendedQuery(
                    query=f"Monitoring and follow-up for {condition.display}",
                    rationale=f"To ensure appropriate monitoring for {condition.display}."
                )
            )
        
        # If patient has multiple conditions, add a query for comorbidities
        if len(active_conditions) > 1:
            condition_names = [c.display for c in active_conditions]
            comorbidity_query = f"Management of patients with {' and '.join(condition_names)}"
            queries.append(
                RecommendedQuery(
                    query=comorbidity_query,
                    rationale="Patient has multiple conditions that may require coordinated management."
                )
            )
        
        # Add age-specific queries if available
        if patient_info.birth_date:
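            # Approximate age from the birth year only (month and day are ignored)
            try:
                from datetime import date  # local import keeps this cell self-contained
                birth_year = int(patient_info.birth_date.split("-")[0])
                current_year = date.today().year  # avoid hard-coding the reference year
                age = current_year - birth_year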
                
                if age >= 65:
                    queries.append(
                        RecommendedQuery(
                            query="Elderly patient care guidelines",
                            rationale=f"Patient is {age} years old and may require age-specific considerations."
                        )
                    )
                elif age <= 18:
                    queries.append(
                        RecommendedQuery(
                            query="Pediatric patient care guidelines",
                            rationale=f"Patient is {age} years old and may require age-specific considerations."
                        )
                    )
            except ValueError:
                # If birth date parsing fails, skip age-specific queries
                pass
        
        return queries
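
The age estimate in `generate_queries` uses only the birth year, so it can be off by one for patients whose birthday has not yet occurred in the current year. A minimal sketch of an exact calculation follows, assuming the birth date is a full ISO `YYYY-MM-DD` string; `age_in_years` is a hypothetical helper, and partial dates such as a bare year are treated as unparseable here.

```python
from datetime import date
from typing import Optional

def age_in_years(birth_date: str, today: Optional[date] = None) -> Optional[int]:
    """Return completed years of age for an ISO 'YYYY-MM-DD' date, or None if unparseable."""
    try:
        born = date.fromisoformat(birth_date)
    except ValueError:
        return None
    today = today or date.today()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)

print(age_in_years("1970-05-15", today=date(2025, 1, 1)))
```

Passing `today` explicitly keeps the function deterministic, which makes the age thresholds easier to test.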

### 2.5 RAG Workflow

This component ties everything together to process patient data, retrieve relevant guidelines, and generate a case summary.

In [5]:
from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field
import json
import os

class CaseSummary(BaseModel):
    """Represents a patient case summary with recommendations."""
    patient_summary: str = Field(..., description="Summary of patient information")
    guideline_recommendations: str = Field(..., description="Recommendations based on clinical guidelines")
    care_gaps: str = Field(..., description="Identified gaps in care")
    next_steps: str = Field(..., description="Recommended next steps for the clinician")

class RAGWorkflow:
    """
    A workflow that processes patient data, retrieves relevant guidelines,
    and generates a case summary with recommendations.
    """
    
    def __init__(self, guideline_retriever: GuidelineRetriever):
        """
        Initialize the workflow.
        
        Args:
            guideline_retriever: The retriever for medical guidelines
        """
        self.guideline_retriever = guideline_retriever
        self.query_generator = QueryGenerator()
    
    def process_patient(self, patient_info: PatientInfo) -> CaseSummary:
        """
        Process patient information and generate a case summary.
        
        Args:
            patient_info: PatientInfo object containing patient data
            
        Returns:
            CaseSummary object
        """
        # Generate queries based on patient information
        recommended_queries = self.query_generator.generate_queries(patient_info)
        
        # Retrieve relevant guidelines for each query
        all_guideline_docs = []
        for query in recommended_queries:
            guideline_docs = self.guideline_retriever.retrieve(query.query)
            all_guideline_docs.extend(guideline_docs)
        
        # Deduplicate guidelines
        unique_guideline_texts = set()
        unique_guideline_docs = []
        
        for doc in all_guideline_docs:
            if doc.page_content not in unique_guideline_texts:
                unique_guideline_texts.add(doc.page_content)
                unique_guideline_docs.append(doc)
        
        # Generate case summary using template-based approach
        summary = self._generate_summary_with_template(patient_info, unique_guideline_docs, recommended_queries)
        
        return summary
    
    def _generate_summary_with_template(self, patient_info: PatientInfo, guideline_docs: List, queries: List[RecommendedQuery]) -> CaseSummary:
        """
        Generate a case summary using a template-based approach (no LLM).
        
        Args:
            patient_info: PatientInfo object
            guideline_docs: List of retrieved guideline documents
            queries: List of recommended queries
            
        Returns:
            CaseSummary object
        """
        # Patient summary
        patient_summary = f"""
        Patient {patient_info.given_name} {patient_info.family_name} is a {patient_info.gender} 
        born on {patient_info.birth_date}. 
        
        Active conditions: {', '.join([c.display for c in patient_info.conditions if c.clinical_status == 'active'])}
        
        Current medications: {', '.join([m.name for m in patient_info.current_medications])}
        
        Recent encounters: {', '.join([f"{e.date}: {e.reason_display or e.type_display or 'Unknown'}" for e in patient_info.recent_encounters[:3]])}
        """
        
        # Guideline recommendations
        guideline_recommendations = "Based on retrieved guidelines:\n\n"
        
        if guideline_docs:
            for doc in guideline_docs[:5]:  # Limit to 5 recommendations
                guideline_recommendations += f"- {doc.page_content[:200]}...\n\n"
        else:
            guideline_recommendations += "No specific guidelines were retrieved. Consider consulting standard care protocols.\n"
        
        # Care gaps (simplified without LLM)
        care_gaps = """
        Potential care gaps to consider:
        - Verify that current medications align with latest guideline recommendations
        - Ensure appropriate monitoring and follow-up for all active conditions
        - Check if preventive care measures are up-to-date
        """
        
        # Next steps
        next_steps = """
        Recommended next steps:
        1. Review the patient's current treatment plan against the guidelines
        2. Consider adjustments to medications or monitoring if indicated
        3. Schedule appropriate follow-up based on condition severity
        4. Document any changes to the treatment plan
        """
        
        return CaseSummary(
            patient_summary=patient_summary,
            guideline_recommendations=guideline_recommendations,
            care_gaps=care_gaps,
            next_steps=next_steps
        )
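
Several queries often retrieve the same guideline chunk, which is why `process_patient` deduplicates by `page_content` before building the summary. The same order-preserving deduplication can be sketched more compactly with a dictionary. The `Doc` dataclass is a hypothetical stand-in for the retrieved document objects, and `dedupe_by_content` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only page_content is used."""
    page_content: str

def dedupe_by_content(docs: Iterable[Doc]) -> List[Doc]:
    """Keep the first document for each distinct page_content, preserving order."""
    first_seen: Dict[str, Doc] = {}
    for doc in docs:
        first_seen.setdefault(doc.page_content, doc)
    return list(first_seen.values())

retrieved = [Doc("ACE inhibitors"), Doc("Metformin"), Doc("ACE inhibitors")]
print([d.page_content for d in dedupe_by_content(retrieved)])
```

Keeping the first occurrence preserves the ranking of the earliest query that surfaced each chunk.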

## 3. Create Sample Patient Data

Let's create a sample patient data file for testing.

In [6]:
import json
import os

# Create data directory if it doesn't exist
os.makedirs("data", exist_ok=True)

# Sample patient data in FHIR format
patient_data = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {
            "resource": {
                "resourceType": "Patient",
                "id": "example-patient",
                "name": [
                    {
                        "given": ["John"],
                        "family": "Smith"
                    }
                ],
                "birthDate": "1970-05-15",
                "gender": "male"
            }
        },
        {
            "resource": {
                "resourceType": "Condition",
                "id": "condition-diabetes",
                "subject": {
                    "reference": "Patient/example-patient"
                },
                "code": {
                    "coding": [
                        {
                            "system": "http://snomed.info/sct",
                            "code": "73211009",
                            "display": "Diabetes mellitus type 2"
                        }
                    ]
                },
                "clinicalStatus": {
                    "coding": [
                        {
                            "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
                            "code": "active",
                            "display": "Active"
                        }
                    ]
                }
            }
        },
        {
            "resource": {
                "resourceType": "Condition",
                "id": "condition-hypertension",
                "subject": {
                    "reference": "Patient/example-patient"
                },
                "code": {
                    "coding": [
                        {
                            "system": "http://snomed.info/sct",
                            "code": "38341003",
                            "display": "Hypertension"
                        }
                    ]
                },
                "clinicalStatus": {
                    "coding": [
                        {
                            "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
                            "code": "active",
                            "display": "Active"
                        }
                    ]
                }
            }
        },
        {
            "resource": {
                "resourceType": "MedicationRequest",
                "id": "medication-metformin",
                "subject": {
                    "reference": "Patient/example-patient"
                },
                "medicationCodeableConcept": {
                    "coding": [
                        {
                            "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                            "code": "860975",
                            "display": "Metformin 500 MG"
                        }
                    ]
                },
                "authoredOn": "2023-01-15",
                "status": "active",
                "dosageInstruction": [
                    {
                        "text": "Take 500mg twice daily with meals"
                    }
                ]
            }
        },
        {
            "resource": {
                "resourceType": "MedicationRequest",
                "id": "medication-lisinopril",
                "subject": {
                    "reference": "Patient/example-patient"
                },
                "medicationCodeableConcept": {
                    "coding": [
                        {
                            "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                            "code": "314076",
                            "display": "Lisinopril 10 MG"
                        }
                    ]
                },
                "authoredOn": "2023-02-10",
                "status": "active",
                "dosageInstruction": [
                    {
                        "text": "Take 10mg once daily"
                    }
                ]
            }
        },
        {
            "resource": {
                "resourceType": "Encounter",
                "id": "encounter-1",
                "subject": {
                    "reference": "Patient/example-patient"
                },
                "period": {
                    "start": "2024-01-10"
                },
                "type": [
                    {
                        "coding": [
                            {
                                "system": "http://terminology.hl7.org/CodeSystem/encounter-type",
                                "code": "AMB",
                                "display": "Ambulatory"
                            }
                        ]
                    }
                ],
                "reasonCode": [
                    {
                        "coding": [
                            {
                                "system": "http://snomed.info/sct",
                                "code": "73211009",
                                "display": "Diabetes follow-up"
                            }
                        ]
                    }
                ]
            }
        },
        {
            "resource": {
                "resourceType": "Encounter",
                "id": "encounter-2",
                "subject": {
                    "reference": "Patient/example-patient"
                },
                "period": {
                    "start": "2024-02-15"
                },
                "type": [
                    {
                        "coding": [
                            {
                                "system": "http://terminology.hl7.org/CodeSystem/encounter-type",
                                "code": "AMB",
                                "display": "Ambulatory"
                            }
                        ]
                    }
                ],
                "reasonCode": [
                    {
                        "coding": [
                            {
                                "system": "http://snomed.info/sct",
                                "code": "38341003",
                                "display": "Hypertension follow-up"
                            }
                        ]
                    }
                ]
            }
        }
    ]
}

# Save to file
with open('data/sample_patient.json', 'w') as f:
    json.dump(patient_data, f, indent=2)

print("Sample patient data created at data/sample_patient.json")

Sample patient data created at data/sample_patient.json
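
Before parsing, it can be useful to confirm that a bundle contains the resource types the parser looks for. The sketch below counts entries by `resourceType`; `count_resource_types` is a hypothetical helper, and the small `example_bundle` is a stand-in so the snippet runs on its own. For the sample above, one Patient, two Condition, two MedicationRequest, and two Encounter entries would be expected.

```python
from collections import Counter
from typing import Any, Dict

def count_resource_types(bundle: Dict[str, Any]) -> Counter:
    """Count bundle entries by their resourceType."""
    return Counter(
        entry.get("resource", {}).get("resourceType", "Unknown")
        for entry in bundle.get("entry", [])
    )

# A tiny stand-in bundle; in the notebook you could pass patient_data instead.
example_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient"}},
        {"resource": {"resourceType": "Condition"}},
        {"resource": {"resourceType": "Condition"}},
    ],
}
print(count_resource_types(example_bundle))
```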


## 4. Create Sample Guidelines

Let's create some sample medical guidelines for testing.

In [7]:
import os

# Create guidelines directory if it doesn't exist
os.makedirs("data/guidelines", exist_ok=True)

# Sample diabetes guideline
diabetes_guideline = """
# Diabetes Management Guidelines

## Diagnosis
Diabetes mellitus is diagnosed based on one of the following criteria:
- Fasting plasma glucose ≥ 126 mg/dL (7.0 mmol/L)
- 2-hour plasma glucose ≥ 200 mg/dL (11.1 mmol/L) during OGTT
- A1C ≥ 6.5% (48 mmol/mol)
- Random plasma glucose ≥ 200 mg/dL (11.1 mmol/L) in patients with symptoms of hyperglycemia

## Treatment Goals
- A1C < 7.0% for most adults
- Blood pressure < 140/90 mmHg
- LDL cholesterol < 100 mg/dL

## Medication Recommendations
First-line therapy: Metformin (unless contraindicated)
Second-line options (based on patient factors):
- GLP-1 receptor agonists
- SGLT-2 inhibitors
- DPP-4 inhibitors
- Sulfonylureas
- Thiazolidinediones
- Insulin

## Monitoring
- A1C testing: Every 3 months until target is reached, then at least twice per year
- Annual comprehensive foot examination
- Annual dilated eye examination
- Annual screening for albuminuria
- Lipid profile and kidney function tests annually
"""

# Sample hypertension guideline
hypertension_guideline = """
# Hypertension Management Guidelines

## Diagnosis
Hypertension is defined as:
- Systolic BP ≥ 130 mmHg or
- Diastolic BP ≥ 80 mmHg

## Classification
- Normal: < 120/80 mmHg
- Elevated: 120-129/< 80 mmHg
- Stage 1: 130-139/80-89 mmHg
- Stage 2: ≥ 140/90 mmHg

## Treatment Goals
- General population: < 130/80 mmHg
- Older adults (≥ 65 years): Target based on clinical judgment and patient preference

## Medication Recommendations
First-line agents:
- Thiazide diuretics
- ACE inhibitors
- ARBs
- Calcium channel blockers

## Monitoring
- Home BP monitoring is recommended
- Follow-up every 3-6 months for stable patients
- Annual screening for other cardiovascular risk factors
"""

# Save the guidelines to files
with open('data/guidelines/diabetes.txt', 'w', encoding='utf-8') as f:
    f.write(diabetes_guideline)

with open('data/guidelines/hypertension.txt', 'w', encoding='utf-8') as f:
    f.write(hypertension_guideline)

print("Sample guidelines created in data/guidelines directory")

Sample guidelines created in data/guidelines directory
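
The guideline files above are written as UTF-8 and contain non-ASCII characters such as `≥`. If they are later read back with a different encoding (for example `cp1252`, a common default on Windows), those characters come out garbled. The sketch below shows the effect using only the standard library; the temporary file name is hypothetical and exists only for this demonstration.

```python
import os
import tempfile

sample = "Systolic BP ≥ 130 mmHg"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encoding_demo.txt")  # hypothetical file name

    # Write the text as UTF-8, the same way the guideline files are written.
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Reading with the matching encoding round-trips the text.
    with open(path, "r", encoding="utf-8") as f:
        read_utf8 = f.read()

    # Reading the same bytes as cp1252 garbles the multi-byte character.
    with open(path, "r", encoding="cp1252") as f:
        read_cp1252 = f.read()

print(read_utf8)
print(read_cp1252)
```

Passing an explicit `encoding="utf-8"` wherever the files are read avoids the mismatch.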


## 5. Initialize the RAG Application

Now let's initialize the RAG application components.

In [8]:
# Create a persistent directory for the vector store
os.makedirs("./chroma_db", exist_ok=True)

# Initialize the guideline retriever
guideline_retriever = GuidelineRetriever(persist_directory="./chroma_db")

# Add the guidelines to the vector store
num_chunks = guideline_retriever.add_guidelines("data/guidelines")
print(f"Added {num_chunks} chunks to the vector database.")

# Initialize the RAG workflow
workflow = RAGWorkflow(guideline_retriever=guideline_retriever)

Added 2 chunks to the vector database.
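
To make the retrieval step less opaque, here is a minimal sketch of what a similarity search does conceptually: each text becomes a term-count vector, and chunks are ranked by cosine similarity to the query. This is an illustration in plain Python, not the notebook's `GuidelineRetriever` or Chroma itself; `term_counts`, `cosine_similarity`, `rank_chunks`, and the two sample chunks are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List, Tuple


def term_counts(text: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query: str, chunks: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the top_k best."""
    query_vec = term_counts(query)
    scored = [(cosine_similarity(query_vec, term_counts(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


chunks = [
    "First-line therapy for diabetes is metformin",
    "Home BP monitoring is recommended for hypertension",
]
results = rank_chunks("monitoring for hypertension", chunks, top_k=1)
print(results[0][1])
```

The hypertension chunk shares three tokens with the query and the diabetes chunk only one, so the hypertension chunk ranks first. Purely lexical vectors like these cannot match synonyms, which is the main trade-off of skipping a learned embedding model.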



## 6. Process Patient Data

Let's process the sample patient data and generate a case summary.

In [9]:
# Parse the patient data
patient_info = parse_synthea_patient("data/sample_patient.json")

# Display patient information
print(f"Patient: {patient_info.given_name} {patient_info.family_name}")
print(f"Birth Date: {patient_info.birth_date}")
print(f"Gender: {patient_info.gender}")
print("\nConditions:")
for condition in patient_info.conditions:
    print(f"- {condition.display} (Status: {condition.clinical_status})")
print("\nCurrent Medications:")
for med in patient_info.current_medications:
    print(f"- {med.name}")
    if med.instructions:
        print(f"  Instructions: {med.instructions}")
print("\nRecent Encounters:")
for encounter in patient_info.recent_encounters[:3]:  # Show only 3 most recent
    reason = encounter.reason_display or encounter.type_display or "Unknown"
    print(f"- {encounter.date}: {reason}")

Patient: John Smith
Birth Date: 1970-05-15
Gender: male

Conditions:
- Diabetes mellitus type 2 (Status: active)
- Hypertension (Status: active)

Current Medications:
- Metformin 500 MG
  Instructions: Take 500mg twice daily with meals
- Lisinopril 10 MG
  Instructions: Take 10mg once daily

Recent Encounters:
- 2024-02-15: Hypertension follow-up
- 2024-01-10: Diabetes follow-up
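
The parser reads nested FHIR fields such as `reasonCode[0].coding[0].display`, where any level may be missing, and falls back from the encounter reason to the encounter type. The sketch below makes that access pattern explicit. It is not the notebook's parser: `first_coding_display` and `recent_encounter_labels` are hypothetical helpers, and the sample encounters are made up for the example.

```python
from typing import Any, Dict, List, Optional


def first_coding_display(concept: Optional[Dict[str, Any]]) -> str:
    """Return the display text of the first coding in a CodeableConcept, or ''."""
    if not concept:
        return ""
    codings = concept.get("coding") or []
    if not codings:
        return ""
    return codings[0].get("display", "")


def recent_encounter_labels(encounters: List[Dict[str, Any]], limit: int = 3) -> List[str]:
    """Sort encounters by start date (newest first) and label each one."""
    ordered = sorted(
        encounters,
        key=lambda e: e.get("period", {}).get("start", ""),
        reverse=True,
    )
    labels = []
    for enc in ordered[:limit]:
        reasons = enc.get("reasonCode") or [None]
        types = enc.get("type") or [None]
        # Prefer the reason, then the type, then a fixed placeholder.
        label = first_coding_display(reasons[0]) or first_coding_display(types[0]) or "Unknown"
        labels.append(f"{enc.get('period', {}).get('start', '')}: {label}")
    return labels


encounters = [
    {"period": {"start": "2024-01-10"},
     "reasonCode": [{"coding": [{"display": "Diabetes follow-up"}]}]},
    {"period": {"start": "2024-02-15"},
     "type": [{"coding": [{"display": "Ambulatory"}]}]},
    {"period": {"start": "2023-11-02"}},
]
for line in recent_encounter_labels(encounters):
    print(line)
```

Sorting the raw strings works here because ISO 8601 dates in the same format order correctly as plain text.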


## 7. Generate Queries

Let's generate queries based on the patient information.

In [10]:
# Generate queries
query_generator = QueryGenerator()
recommended_queries = query_generator.generate_queries(patient_info)

# Display the recommended queries
print("Recommended Queries:")
for i, query in enumerate(recommended_queries):
    print(f"\nQuery {i+1}: {query.query}")
    print(f"Rationale: {query.rationale}")

Recommended Queries:

Query 1: Treatment guidelines for Diabetes mellitus type 2
Rationale: Patient has an active diagnosis of Diabetes mellitus type 2.

Query 2: Medication recommendations for Diabetes mellitus type 2
Rationale: To evaluate if current medications align with guidelines for Diabetes mellitus type 2.

Query 3: Monitoring and follow-up for Diabetes mellitus type 2
Rationale: To ensure appropriate monitoring for Diabetes mellitus type 2.

Query 4: Treatment guidelines for Hypertension
Rationale: Patient has an active diagnosis of Hypertension.

Query 5: Medication recommendations for Hypertension
Rationale: To evaluate if current medications align with guidelines for Hypertension.

Query 6: Monitoring and follow-up for Hypertension
Rationale: To ensure appropriate monitoring for Hypertension.

Query 7: Management of patients with Diabetes mellitus type 2 and Hypertension
Rationale: Patient has multiple conditions that may require coordinated management.
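
Query generation can also depend on the patient's age, for example to add elderly or pediatric queries. The sketch below computes age in whole years from an ISO `YYYY-MM-DD` birth date with the standard library. `age_on` is a hypothetical helper, and the reference date is passed in explicitly so the result is reproducible.

```python
from datetime import date
from typing import Optional


def age_on(birth_date: str, reference: date) -> Optional[int]:
    """Return age in whole years on `reference`, or None if the date is unparseable."""
    try:
        born = date.fromisoformat(birth_date)
    except ValueError:
        return None
    # Subtract one year if the birthday has not yet occurred in the reference year.
    before_birthday = (reference.month, reference.day) < (born.month, born.day)
    return reference.year - born.year - int(before_birthday)


print(age_on("1970-05-15", date(2024, 2, 15)))  # birthday not yet reached in 2024
print(age_on("not-a-date", date(2024, 2, 15)))  # unparseable input
```

In an application, `date.today()` would typically be passed as the reference date.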


## 8. Generate Case Summary

Now let's generate a case summary for the patient.

In [11]:
# Process the patient data through the workflow
case_summary = workflow.process_patient(patient_info)

# Display the case summary
print("PATIENT SUMMARY")
print("==============\n")
print(case_summary.patient_summary)

print("\nGUIDELINE RECOMMENDATIONS")
print("========================\n")
print(case_summary.guideline_recommendations)

print("\nCARE GAPS")
print("=========\n")
print(case_summary.care_gaps)

print("\nNEXT STEPS")
print("==========\n")
print(case_summary.next_steps)

PATIENT SUMMARY


        Patient John Smith is a male 
        born on 1970-05-15. 
        
        Active conditions: Diabetes mellitus type 2, Hypertension
        
        Current medications: Metformin 500 MG, Lisinopril 10 MG
        
        Recent encounters: 2024-02-15: Hypertension follow-up, 2024-01-10: Diabetes follow-up
        

GUIDELINE RECOMMENDATIONS

Based on retrieved guidelines:

- # Hypertension Management Guidelines

## Diagnosis
Hypertension is defined as:
- Systolic BP ≥ 130 mmHg or
- Diastolic BP ≥ 80 mmHg

## Classification
- Normal: < 120/80 mmHg
- Elevated: 120-129/<...

- # Diabetes Management Guidelines

## Diagnosis
Diabetes mellitus is diagnosed based on one of the following criteria:
- Fasting plasma glucose ≥ 126 mg/dL (7.0 mmol/L)
- 2-hour plasma glucose ≥ 20...



CARE GAPS


        Potential care gaps to consider:
        - Verify that current medications align with latest guideline recommendations
        - Ensure appropriate monitoring and follow-up for all active conditions
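
Because the workflow runs several queries against the same small collection, the same chunk is often retrieved more than once. The sketch below shows order-preserving deduplication keyed on the chunk text. `Doc` is a stand-in dataclass for illustration, not LangChain's `Document`, and `deduplicate` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Doc:
    """Stand-in for a retrieved document (illustration only)."""
    page_content: str


def deduplicate(docs: Iterable[Doc]) -> List[Doc]:
    """Keep the first occurrence of each distinct page_content, in order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


retrieved = [Doc("hypertension"), Doc("diabetes"), Doc("hypertension"), Doc("diabetes")]
unique = deduplicate(retrieved)
print([d.page_content for d in unique])
```

Keeping the first occurrence preserves the order in which the queries surfaced each chunk, so results from earlier queries stay at the top of the summary.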

## 9. Interactive Interface

Let's create an interactive interface for the RAG application using Streamlit.

In [12]:
import streamlit as st
import json
import os
import tempfile
import numpy as np
from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field
import pypdf
from langchain.text_splitter import RecursiveCharacterTextSplitter
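# Vector store and loaders live in langchain-community (installed in step 1)
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_core.documents import Document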

# Create directories
os.makedirs("data", exist_ok=True)
os.makedirs("data/guidelines", exist_ok=True)
os.makedirs("chroma_db", exist_ok=True)

# Define Pydantic models
class ConditionInfo(BaseModel):
    code: str
    display: str
    clinical_status: str

class EncounterInfo(BaseModel):
    date: str = Field(..., description="Date of the encounter.")
    reason_display: Optional[str] = Field(None, description="Reason for the encounter.")
    type_display: Optional[str] = Field(None, description="Type or class of the encounter.")

class MedicationInfo(BaseModel):
    name: str = Field(..., description="Name of the medication.")
    start_date: Optional[str] = Field(None, description="When the medication was prescribed.")
    instructions: Optional[str] = Field(None, description="Dosage instructions.")

class PatientInfo(BaseModel):
    given_name: str
    family_name: str
    birth_date: str
    gender: str
    conditions: List[ConditionInfo] = Field(default_factory=list)
    recent_encounters: List[EncounterInfo] = Field(default_factory=list, description="A few recent encounters.")
    current_medications: List[MedicationInfo] = Field(default_factory=list, description="Current active medications.")

    @property
    def demographic_str(self) -> str:
        """Get demographics string."""
        return f"""\
Given name: {self.given_name}
Family name: {self.family_name}
Birth date: {self.birth_date}
Gender: {self.gender}"""

class RecommendedQuery(BaseModel):
    """Represents a query to retrieve guideline sections relevant to the patient's conditions."""
    query: str
    rationale: str = Field(..., description="Explanation of why this query is relevant to the patient.")

class CaseSummary(BaseModel):
    """Represents a patient case summary with recommendations."""
    patient_summary: str = Field(..., description="Summary of patient information")
    guideline_recommendations: str = Field(..., description="Recommendations based on clinical guidelines")
    care_gaps: str = Field(..., description="Identified gaps in care")
    next_steps: str = Field(..., description="Recommended next steps for the clinician")

# Define SimpleEmbeddings class
class SimpleEmbeddings:
    """
    A simple embedding class that uses TF-IDF like approach for text embeddings.
    This is a lightweight alternative that doesn't require external models.
    """
    
    def __init__(self, dimension=100):
        """Initialize with a fixed embedding dimension."""
        self.dimension = dimension
        self.vocabulary = {}
        self.vocab_size = 0
        
    def _preprocess_text(self, text):
        """Simple text preprocessing."""
        # Convert to lowercase and split by whitespace
        return text.lower().split()
        
    def _update_vocabulary(self, tokens):
        """Update vocabulary with new tokens."""
        for token in tokens:
            if token not in self.vocabulary:
                self.vocabulary[token] = self.vocab_size
                self.vocab_size += 1
    
    def _text_to_vector(self, text):
        """Convert text to a fixed-dimension vector."""
        tokens = self._preprocess_text(text)
        self._update_vocabulary(tokens)
        
        # Create a simple frequency vector
        vec = np.zeros(self.dimension)
        for token in tokens:
            if token in self.vocabulary:
                # Use modulo to ensure we stay within dimension
                idx = self.vocabulary[token] % self.dimension
                vec[idx] += 1
        
        # Normalize the vector
        norm = np.linalg.norm(vec)
        if norm > 0:
            vec = vec / norm
        return vec
    
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents."""
        return [self._text_to_vector(text).tolist() for text in texts]
    
    def embed_query(self, text: str) -> List[float]:
        """Embed a query."""
        return self._text_to_vector(text).tolist()

# Define GuidelineRetriever class
class GuidelineRetriever:
    """
    A class to handle the storage and retrieval of medical guidelines using ChromaDB.
    """
    
    def __init__(self, persist_directory="./chroma_db"):
        """
        Initialize the retriever with a simple embedding approach.
        
        Args:
            persist_directory: Directory to persist the vector database
        """
        self.persist_directory = persist_directory
        
        # Use our simple embeddings implementation
        self.embeddings = SimpleEmbeddings(dimension=100)
        
        # Create the directory if it doesn't exist
        os.makedirs(persist_directory, exist_ok=True)
        
        # Initialize the vector store
        self.db = None
        self.load_if_exists()
    
    def add_guidelines(self, guidelines_dir):
        """
        Add medical guidelines from text files in a directory.
        
        Args:
            guidelines_dir: Directory containing guideline text files
        """
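        # Load documents from the directory; read files as UTF-8 so that
        # characters such as "≥" are not garbled on non-UTF-8 platforms
        loader = DirectoryLoader(
            guidelines_dir,
            glob="**/*.txt",
            loader_cls=TextLoader,
            loader_kwargs={"encoding": "utf-8"},
        )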
        documents = loader.load()
        
        if not documents:
            return 0
            
        # Split documents into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        splits = text_splitter.split_documents(documents)
        
        # Create or update the vector store
        if self.db is None:
            self.db = Chroma.from_documents(
                documents=splits,
                embedding=self.embeddings,
                persist_directory=self.persist_directory
            )
        else:
            self.db.add_documents(splits)
        
        # Persist the database
        self.db.persist()
        
        return len(splits)
    
    def add_guideline_text(self, text, metadata=None):
        """
        Add a single guideline text to the vector store.
        
        Args:
            text: The guideline text
            metadata: Optional metadata for the document
        """
        if metadata is None:
            metadata = {}
            
        # Create a document
        doc = Document(page_content=text, metadata=metadata)
        
        # Split the document
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        splits = text_splitter.split_documents([doc])
        
        # Create or update the vector store
        if self.db is None:
            self.db = Chroma.from_documents(
                documents=splits,
                embedding=self.embeddings,
                persist_directory=self.persist_directory
            )
        else:
            self.db.add_documents(splits)
        
        # Persist the database
        self.db.persist()
        
        return len(splits)
    
    def retrieve(self, query, top_k=3):
        """
        Retrieve relevant guideline chunks based on a query.
        
        Args:
            query: The search query
            top_k: Number of results to return
            
        Returns:
            List of retrieved documents
        """
        if self.db is None:
            return []
            
        results = self.db.similarity_search(query, k=top_k)
        return results
    
    def load_if_exists(self):
        """
        Load the vector store if it exists.
        
        Returns:
            True if loaded successfully, False otherwise
        """
        try:
            self.db = Chroma(
                persist_directory=self.persist_directory,
                embedding_function=self.embeddings
            )
            return True
        except Exception as e:
            print(f"Error loading vector store: {e}")
            return False

# Define QueryGenerator class
class QueryGenerator:
    """
    A class to generate queries for retrieving relevant medical guidelines based on patient information.
    """
    
    def __init__(self):
        """Initialize the query generator."""
        pass
    
    def generate_queries(self, patient_info) -> List[RecommendedQuery]:
        """
        Generate queries based on patient information.
        
        Args:
            patient_info: PatientInfo object containing patient data
            
        Returns:
            List of RecommendedQuery objects
        """
        queries = []
        
        # Extract active conditions
        active_conditions = [c for c in patient_info.conditions if c.clinical_status == "active"]
        
        # Generate queries for each active condition
        for condition in active_conditions:
            # Basic query for the condition
            queries.append(
                RecommendedQuery(
                    query=f"Treatment guidelines for {condition.display}",
                    rationale=f"Patient has an active diagnosis of {condition.display}."
                )
            )
            
            # Query for medication recommendations
            queries.append(
                RecommendedQuery(
                    query=f"Medication recommendations for {condition.display}",
                    rationale=f"To evaluate if current medications align with guidelines for {condition.display}."
                )
            )
            
            # Query for monitoring recommendations
            queries.append(
                RecommendedQuery(
                    query=f"Monitoring and follow-up for {condition.display}",
                    rationale=f"To ensure appropriate monitoring for {condition.display}."
                )
            )
        
        # If patient has multiple conditions, add a query for comorbidities
        if len(active_conditions) > 1:
            condition_names = [c.display for c in active_conditions]
            comorbidity_query = f"Management of patients with {' and '.join(condition_names)}"
            queries.append(
                RecommendedQuery(
                    query=comorbidity_query,
                    rationale="Patient has multiple conditions that may require coordinated management."
                )
            )
        
        # Add age-specific queries if available
        if patient_info.birth_date:
            # Approximate age from the birth year (ignores month and day)
            try:
                from datetime import date  # local import keeps this block self-contained
                birth_year = int(patient_info.birth_date.split("-")[0])
                current_year = date.today().year  # avoid a hard-coded year
                age = current_year - birth_year
                
                if age >= 65:
                    queries.append(
                        RecommendedQuery(
                            query="Elderly patient care guidelines",
                            rationale=f"Patient is {age} years old and may require age-specific considerations."
                        )
                    )
                elif age <= 18:
                    queries.append(
                        RecommendedQuery(
                            query="Pediatric patient care guidelines",
                            rationale=f"Patient is {age} years old and may require age-specific considerations."
                        )
                    )
            except ValueError:
                # If birth date parsing fails, skip age-specific queries
                pass
        
        return queries

# Define RAGWorkflow class
class RAGWorkflow:
    """
    A workflow that processes patient data, retrieves relevant guidelines,
    and generates a case summary with recommendations.
    """
    
    def __init__(self, guideline_retriever: GuidelineRetriever):
        """
        Initialize the workflow.
        
        Args:
            guideline_retriever: The retriever for medical guidelines
        """
        self.guideline_retriever = guideline_retriever
        self.query_generator = QueryGenerator()
    
    def process_patient(self, patient_info: PatientInfo) -> CaseSummary:
        """
        Process patient information and generate a case summary.
        
        Args:
            patient_info: PatientInfo object containing patient data
            
        Returns:
            CaseSummary object
        """
        # Generate queries based on patient information
        recommended_queries = self.query_generator.generate_queries(patient_info)
        
        # Retrieve relevant guidelines for each query
        all_guideline_docs = []
        for query in recommended_queries:
            guideline_docs = self.guideline_retriever.retrieve(query.query)
            all_guideline_docs.extend(guideline_docs)
        
        # Deduplicate guidelines
        unique_guideline_texts = set()
        unique_guideline_docs = []
        
        for doc in all_guideline_docs:
            if doc.page_content not in unique_guideline_texts:
                unique_guideline_texts.add(doc.page_content)
                unique_guideline_docs.append(doc)
        
        # Generate case summary using template-based approach
        summary = self._generate_summary_with_template(patient_info, unique_guideline_docs, recommended_queries)
        
        return summary
    
    def _generate_summary_with_template(self, patient_info: PatientInfo, guideline_docs: List, queries: List[RecommendedQuery]) -> CaseSummary:
        """
        Generate a case summary using a template-based approach (no LLM).
        
        Args:
            patient_info: PatientInfo object
            guideline_docs: List of retrieved guideline documents
            queries: List of recommended queries
            
        Returns:
            CaseSummary object
        """
        # Patient summary
        patient_summary = f"""
        Patient {patient_info.given_name} {patient_info.family_name} is a {patient_info.gender} 
        born on {patient_info.birth_date}. 
        
        Active conditions: {', '.join([c.display for c in patient_info.conditions if c.clinical_status == 'active'])}
        
        Current medications: {', '.join([m.name for m in patient_info.current_medications])}
        
        Recent encounters: {', '.join([f"{e.date}: {e.reason_display or e.type_display or 'Unknown'}" for e in patient_info.recent_encounters[:3]])}
        """
        
        # Guideline recommendations
        guideline_recommendations = "Based on retrieved guidelines:\n\n"
        
        if guideline_docs:
            for doc in guideline_docs[:5]:  # Limit to 5 recommendations
                guideline_recommendations += f"- {doc.page_content[:200]}...\n\n"
        else:
            guideline_recommendations += "No specific guidelines were retrieved. Consider consulting standard care protocols.\n"
        
        # Care gaps (simplified without LLM)
        care_gaps = """
        Potential care gaps to consider:
        - Verify that current medications align with latest guideline recommendations
        - Ensure appropriate monitoring and follow-up for all active conditions
        - Check if preventive care measures are up-to-date
        """
        
        # Next steps
        next_steps = """
        Recommended next steps:
        1. Review the patient's current treatment plan against the guidelines
        2. Consider adjustments to medications or monitoring if indicated
        3. Schedule appropriate follow-up based on condition severity
        4. Document any changes to the treatment plan
        """
        
        return CaseSummary(
            patient_summary=patient_summary,
            guideline_recommendations=guideline_recommendations,
            care_gaps=care_gaps,
            next_steps=next_steps
        )

# Define functions for patient data parsing
def parse_synthea_patient(file_path: str, filter_active: bool = True) -> PatientInfo:
    """
    Parse a Synthea-generated FHIR Bundle to extract patient information.
    
    Args:
        file_path: Path to the JSON file containing the FHIR Bundle
        filter_active: Whether to filter for only active conditions
        
    Returns:
        PatientInfo object containing extracted patient data
    """
    # Load the Synthea-generated FHIR Bundle
    with open(file_path, "r") as f:
        bundle = json.load(f)

    patient_resource = None
    conditions = []
    encounters = []
    medication_requests = []

    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")

        if resource_type == "Patient":
            patient_resource = resource
        elif resource_type == "Condition":
            conditions.append(resource)
        elif resource_type == "Encounter":
            encounters.append(resource)
        elif resource_type == "MedicationRequest":
            medication_requests.append(resource)

    if not patient_resource:
        raise ValueError("No Patient resource found in the provided file.")

    # Extract patient demographics
    name_entry = patient_resource.get("name", [{}])[0]
    given_name = name_entry.get("given", [""])[0]
    family_name = name_entry.get("family", "")
    birth_date = patient_resource.get("birthDate", "")
    gender = patient_resource.get("gender", "")

    # Create PatientInfo object
    patient_info = PatientInfo(
        given_name=given_name,
        family_name=family_name,
        birth_date=birth_date,
        gender=gender
    )

    # Extract conditions
    for condition in conditions:
        clinical_status = condition.get("clinicalStatus", {}).get("coding", [{}])[0].get("code", "")
        
        # Skip if not active and filter_active is True
        if filter_active and clinical_status != "active":
            continue
            
        code_entry = condition.get("code", {}).get("coding", [{}])[0]
        code = code_entry.get("code", "")
        display = code_entry.get("display", "")
        
        patient_info.conditions.append(
            ConditionInfo(
                code=code,
                display=display,
                clinical_status=clinical_status
            )
        )

    # Extract recent encounters (sort by date, most recent first)
    sorted_encounters = sorted(
        encounters,
        key=lambda e: e.get("period", {}).get("start", ""),
        reverse=True
    )
    
    for encounter in sorted_encounters[:5]:  # Get 5 most recent encounters
        period = encounter.get("period", {})
        date = period.get("start", "")
        
        reason_display = ""
        if encounter.get("reasonCode"):
            reason_display = encounter.get("reasonCode", [{}])[0].get("coding", [{}])[0].get("display", "")
        
        type_display = ""
        if encounter.get("type"):
            type_display = encounter.get("type", [{}])[0].get("coding", [{}])[0].get("display", "")
        
        patient_info.recent_encounters.append(
            EncounterInfo(
                date=date,
                reason_display=reason_display,
                type_display=type_display
            )
        )

    # Extract current medications
    for med_request in medication_requests:
        status = med_request.get("status", "")
        
        # Skip if not active and filter_active is True
        if filter_active and status != "active":
            continue
            
        med_code_entry = med_request.get("medicationCodeableConcept", {}).get("coding", [{}])[0]
        med_name = med_code_entry.get("display", "")
        
        dosage_instruction = med_request.get("dosageInstruction", [{}])[0]
        text_instruction = dosage_instruction.get("text", "")
        
        start_date = ""
        if med_request.get("authoredOn"):
            start_date = med_request.get("authoredOn", "")
        
        patient_info.current_medications.append(
            MedicationInfo(
                name=med_name,
                start_date=start_date,
                instructions=text_instruction
            )
        )

    return patient_info

# Define functions for PDF processing
def extract_text_from_pdf(pdf_file):
    """
    Extract text from a PDF file.
    
    Args:
        pdf_file: Uploaded PDF file
        
    Returns:
        Extracted text as a string
    """
    text = ""
    try:
        # Create a temporary file to save the uploaded PDF
        with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp_file:
            tmp_file.write(pdf_file.getvalue())
            tmp_path = tmp_file.name
        
        # Open the PDF file
        with open(tmp_path, 'rb') as file:
            # Create a PDF reader object
            pdf_reader = pypdf.PdfReader(file)
            
            # Extract text from each page (pages without text contribute nothing)
            for page in pdf_reader.pages:
                text += (page.extract_text() or "") + "\n\n"
        
        # Clean up the temporary file
        os.unlink(tmp_path)
        return text
    except Exception as e:
        st.error(f"Error extracting text from PDF: {e}")
        return ""

# Create sample patient data
def create_sample_patient_data():
    # Sample patient data in FHIR format
    patient_data = {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [
            {
                "resource": {
                    "resourceType": "Patient",
                    "id": "example-patient",
                    "name": [
                        {
                            "given": ["John"],
                            "family": "Smith"
                        }
                    ],
                    "birthDate": "1970-05-15",
                    "gender": "male"
                }
            },
            {
                "resource": {
                    "resourceType": "Condition",
                    "id": "condition-diabetes",
                    "subject": {
                        "reference": "Patient/example-patient"
                    },
                    "code": {
                        "coding": [
                            {
                                "system": "http://snomed.info/sct",
                                "code": "73211009",
                                "display": "Diabetes mellitus type 2"
                            }
                        ]
                    },
                    "clinicalStatus": {
                        "coding": [
                            {
                                "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
                                "code": "active",
                                "display": "Active"
                            }
                        ]
                    }
                }
            },
            {
                "resource": {
                    "resourceType": "Condition",
                    "id": "condition-hypertension",
                    "subject": {
                        "reference": "Patient/example-patient"
                    },
                    "code": {
                        "coding": [
                            {
                                "system": "http://snomed.info/sct",
                                "code": "38341003",
                                "display": "Hypertension"
                            }
                        ]
                    },
                    "clinicalStatus": {
                        "coding": [
                            {
                                "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
                                "code": "active",
                                "display": "Active"
                            }
                        ]
                    }
                }
            },
            {
                "resource": {
                    "resourceType": "MedicationRequest",
                    "id": "medication-metformin",
                    "subject": {
                        "reference": "Patient/example-patient"
                    },
                    "medicationCodeableConcept": {
                        "coding": [
                            {
                                "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                                "code": "860975",
                                "display": "Metformin 500 MG"
                            }
                        ]
                    },
                    "authoredOn": "2023-01-15",
                    "status": "active",
                    "dosageInstruction": [
                        {
                            "text": "Take 500mg twice daily with meals"
                        }
                    ]
                }
            },
            {
                "resource": {
                    "resourceType": "MedicationRequest",
                    "id": "medication-lisinopril",
                    "subject": {
                        "reference": "Patient/example-patient"
                    },
                    "medicationCodeableConcept": {
                        "coding": [
                            {
                                "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                                "code": "314076",
                                "display": "Lisinopril 10 MG"
                            }
                        ]
                    },
                    "authoredOn": "2023-02-10",
                    "status": "active",
                    "dosageInstruction": [
                        {
                            "text": "Take 10mg once daily"
                        }
                    ]
                }
            },
            {
                "resource": {
                    "resourceType": "Encounter",
                    "id": "encounter-1",
                    "subject": {
                        "reference": "Patient/example-patient"
                    },
                    "period": {
                        "start": "2024-01-10"
                    },
                    "type": [
                        {
                            "coding": [
                                {
                                    "system": "http://terminology.hl7.org/CodeSystem/encounter-type",
                                    "code": "AMB",
                                    "display": "Ambulatory"
                                }
                            ]
                        }
                    ],
                    "reasonCode": [
                        {
                            "coding": [
                                {
                                    "system": "http://snomed.info/sct",
                                    "code": "73211009",
                                    "display": "Diabetes follow-up"
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "resource": {
                    "resourceType": "Encounter",
                    "id": "encounter-2",
                    "subject": {
                        "reference": "Patient/example-patient"
                    },
                    "period": {
                        "start": "2024-02-15"
                    },
                    "type": [
                        {
                            "coding": [
                                {
                                    "system": "http://terminology.hl7.org/CodeSystem/encounter-type",
                                    "code": "AMB",
                                    "display": "Ambulatory"
                                }
                            ]
                        }
                    ],
                    "reasonCode": [
                        {
                            "coding": [
                                {
                                    "system": "http://snomed.info/sct",
                                    "code": "38341003",
                                    "display": "Hypertension follow-up"
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }

    # Save to file, creating the data directory first if needed
    os.makedirs('data', exist_ok=True)
    with open('data/sample_patient.json', 'w') as f:
        json.dump(patient_data, f, indent=2)
    
    return 'data/sample_patient.json'

# Create sample guidelines
def create_sample_guidelines():
    # Sample diabetes guideline
    diabetes_guideline = """
    # Diabetes Management Guidelines

    ## Diagnosis
    Diabetes mellitus is diagnosed based on one of the following criteria:
    - Fasting plasma glucose ≥ 126 mg/dL (7.0 mmol/L)
    - 2-hour plasma glucose ≥ 200 mg/dL (11.1 mmol/L) during OGTT
    - A1C ≥ 6.5% (48 mmol/mol)
    - Random plasma glucose ≥ 200 mg/dL (11.1 mmol/L) in patients with symptoms of hyperglycemia

    ## Treatment Goals
    - A1C < 7.0% for most adults
    - Blood pressure < 140/90 mmHg
    - LDL cholesterol < 100 mg/dL

    ## Medication Recommendations
    First-line therapy: Metformin (unless contraindicated)
    Second-line options (based on patient factors):
    - GLP-1 receptor agonists
    - SGLT-2 inhibitors
    - DPP-4 inhibitors
    - Sulfonylureas
    - Thiazolidinediones
    - Insulin

    ## Monitoring
    - A1C testing: Every 3 months until target is reached, then at least twice per year
    - Annual comprehensive foot examination
    - Annual dilated eye examination
    - Annual screening for albuminuria
    - Lipid profile and kidney function tests annually
    """

    # Sample hypertension guideline
    hypertension_guideline = """
    # Hypertension Management Guidelines

    ## Diagnosis
    Hypertension is defined as:
    - Systolic BP ≥ 130 mmHg or
    - Diastolic BP ≥ 80 mmHg

    ## Classification
    - Normal: < 120/80 mmHg
    - Elevated: 120-129/< 80 mmHg
    - Stage 1: 130-139/80-89 mmHg
    - Stage 2: ≥ 140/90 mmHg

    ## Treatment Goals
    - General population: < 130/80 mmHg
    - Older adults (≥ 65 years): Target based on clinical judgment and patient preference

    ## Medication Recommendations
    First-line agents:
    - Thiazide diuretics
    - ACE inhibitors
    - ARBs
    - Calcium channel blockers

    ## Monitoring
    - Home BP monitoring is recommended
    - Follow-up every 3-6 months for stable patients
    - Annual screening for other cardiovascular risk factors
    """

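    # Save the guidelines to files; UTF-8 is explicit because the text contains "≥"
    os.makedirs('data/guidelines', exist_ok=True)
    with open('data/guidelines/diabetes.txt', 'w', encoding='utf-8') as f:
        f.write(diabetes_guideline)

    with open('data/guidelines/hypertension.txt', 'w', encoding='utf-8') as f:
        f.write(hypertension_guideline)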
    
    return ['data/guidelines/diabetes.txt', 'data/guidelines/hypertension.txt']

# Initialize the guideline retriever
guideline_retriever = GuidelineRetriever(persist_directory="./chroma_db")

# Initialize the RAG workflow
workflow = RAGWorkflow(guideline_retriever=guideline_retriever)

# Initialize session state variables
if 'patient_info' not in st.session_state:
    st.session_state.patient_info = None
if 'recommended_queries' not in st.session_state:
    st.session_state.recommended_queries = None
if 'case_summary' not in st.session_state:
    st.session_state.case_summary = None
if 'sample_data_created' not in st.session_state:
    st.session_state.sample_data_created = False
if 'sample_guidelines_created' not in st.session_state:
    st.session_state.sample_guidelines_created = False

# Streamlit app
st.title("Patient Case Summary RAG Application")

# Create tabs
tab1, tab2, tab3 = st.tabs(["Upload Patient Data", "Add Guidelines", "Generate Summary"])

# Tab 1: Upload Patient Data
with tab1:
    st.header("Upload Patient Data")
    st.write("Upload a FHIR JSON file containing patient data.")
    
    # Option to create sample data
    if not st.session_state.sample_data_created:
        if st.button("Create Sample Patient Data"):
            sample_file = create_sample_patient_data()
            st.session_state.sample_data_created = True
            st.success(f"Sample patient data created at {sample_file}")
    
    # File uploader
    uploaded_file = st.file_uploader("Upload FHIR JSON file", type=["json"])
    
    if uploaded_file is not None:
        # Save the uploaded file to a temporary location
        with tempfile.NamedTemporaryFile(delete=False, suffix='.json') as tmp_file:
            tmp_file.write(uploaded_file.getvalue())
            tmp_path = tmp_file.name
        
        try:
            # Parse the patient data
            patient_info = parse_synthea_patient(tmp_path)
            st.session_state.patient_info = patient_info
            
            # Display patient information
            st.subheader("Patient Information")
            st.write(f"**Name:** {patient_info.given_name} {patient_info.family_name}")
            st.write(f"**Birth Date:** {patient_info.birth_date}")
            st.write(f"**Gender:** {patient_info.gender}")
            
            st.subheader("Conditions")
            for condition in patient_info.conditions:
                st.write(f"- {condition.display} (Status: {condition.clinical_status})")
            
            st.subheader("Current Medications")
            for med in patient_info.current_medications:
                st.write(f"- {med.name}")
                if med.instructions:
                    st.write(f"  Instructions: {med.instructions}")
            
            st.subheader("Recent Encounters")
            for encounter in patient_info.recent_encounters[:3]:  # Show only 3 most recent
                reason = encounter.reason_display or encounter.type_display or "Unknown"
                st.write(f"- {encounter.date}: {reason}")
            
            # Generate queries
            query_generator = QueryGenerator()
            recommended_queries = query_generator.generate_queries(patient_info)
            st.session_state.recommended_queries = recommended_queries
            
        except Exception as e:
            st.error(f"Error parsing patient data: {str(e)}")
        finally:
            # Always clean up the temporary file, even if parsing fails
            os.unlink(tmp_path)
    
    # Use sample data
    elif st.session_state.sample_data_created:
        if st.button("Use Sample Patient Data"):
            try:
                # Parse the sample patient data
                patient_info = parse_synthea_patient('data/sample_patient.json')
                st.session_state.patient_info = patient_info
                
                # Display patient information
                st.subheader("Patient Information")
                st.write(f"**Name:** {patient_info.given_name} {patient_info.family_name}")
                st.write(f"**Birth Date:** {patient_info.birth_date}")
                st.write(f"**Gender:** {patient_info.gender}")
                
                st.subheader("Conditions")
                for condition in patient_info.conditions:
                    st.write(f"- {condition.display} (Status: {condition.clinical_status})")
                
                st.subheader("Current Medications")
                for med in patient_info.current_medications:
                    st.write(f"- {med.name}")
                    if med.instructions:
                        st.write(f"  Instructions: {med.instructions}")
                
                st.subheader("Recent Encounters")
                for encounter in patient_info.recent_encounters[:3]:  # Show only 3 most recent
                    reason = encounter.reason_display or encounter.type_display or "Unknown"
                    st.write(f"- {encounter.date}: {reason}")
                
                # Generate queries
                query_generator = QueryGenerator()
                recommended_queries = query_generator.generate_queries(patient_info)
                st.session_state.recommended_queries = recommended_queries
                
            except Exception as e:
                st.error(f"Error parsing sample patient data: {str(e)}")

# Tab 2: Add Guidelines
with tab2:
    st.header("Add Guidelines")
    
    # Option to create sample guidelines
    if not st.session_state.sample_guidelines_created:
        if st.button("Add Sample Guidelines"):
            sample_files = create_sample_guidelines()
            num_chunks = guideline_retriever.add_guidelines('data/guidelines')
            st.session_state.sample_guidelines_created = True
            st.success(f"Sample guidelines created and added to the knowledge base ({num_chunks} chunks).")
    
    # Option 1: Enter guideline text
    st.subheader("Option 1: Enter Guideline Text")
    guideline_title = st.text_input("Guideline Title")
    guideline_text = st.text_area("Guideline Text", height=200)
    
    if st.button("Add Guideline Text"):
        if guideline_title and guideline_text:
            metadata = {"title": guideline_title}
            num_chunks = guideline_retriever.add_guideline_text(guideline_text, metadata)
            st.success(f"Successfully added guideline '{guideline_title}' ({num_chunks} chunks) to the knowledge base.")
        else:
            st.error("Please enter both a title and text for the guideline.")
    
    # Option 2: Upload guideline files
    st.subheader("Option 2: Upload Guideline Files")
    st.write("Upload text files or PDF files containing guidelines.")
    
    uploaded_files = st.file_uploader("Upload Files", type=["txt", "pdf"], accept_multiple_files=True)
    
    if uploaded_files and st.button("Process Files"):
        # Create a temporary directory
        with tempfile.TemporaryDirectory() as tmp_dir:
            guidelines_dir = os.path.join(tmp_dir, 'guidelines')
            os.makedirs(guidelines_dir, exist_ok=True)
            
            # Process each file, counting only those that were saved successfully
            processed_count = 0
            for uploaded_file in uploaded_files:
                if uploaded_file.name.lower().endswith('.pdf'):
                    # Extract text from PDF
                    st.write(f"Processing PDF: {uploaded_file.name}")
                    text = extract_text_from_pdf(uploaded_file)
                    
                    if text:
                        # Save the extracted text to a file
                        txt_filename = os.path.splitext(uploaded_file.name)[0] + ".txt"
                        txt_path = os.path.join(guidelines_dir, txt_filename)
                        
                        with open(txt_path, 'w', encoding='utf-8') as f:
                            f.write(text)
                        
                        processed_count += 1
                        st.write(f"Extracted {len(text.split())} words from {uploaded_file.name}")
                    else:
                        st.error(f"Failed to extract text from {uploaded_file.name}")
                
                elif uploaded_file.name.lower().endswith('.txt'):
                    # Save text file directly
                    txt_path = os.path.join(guidelines_dir, uploaded_file.name)
                    with open(txt_path, 'w', encoding='utf-8') as f:
                        f.write(uploaded_file.getvalue().decode('utf-8'))
                    processed_count += 1
            
            # Add the guidelines to the vector store only if something was saved
            if processed_count:
                num_chunks = guideline_retriever.add_guidelines(guidelines_dir)
                st.success(f"Successfully added {processed_count} guideline files ({num_chunks} chunks) to the knowledge base.")
            else:
                st.error("No guideline text could be extracted from the uploaded files.")

# Tab 3: Generate Summary
with tab3:
    st.header("Generate Case Summary")
    
    if st.session_state.patient_info is not None:
        # Display patient name
        st.write(f"Patient: {st.session_state.patient_info.given_name} {st.session_state.patient_info.family_name}")
        
        # Display recommended queries
        if st.session_state.recommended_queries is not None:
            st.subheader("Recommended Queries")
            for i, query in enumerate(st.session_state.recommended_queries):
                with st.expander(f"Query {i+1}: {query.query}"):
                    st.write(f"**Rationale:** {query.rationale}")
        
        # Generate case summary
        if st.button("Generate Case Summary"):
            with st.spinner("Generating case summary..."):
                case_summary = workflow.process_patient(st.session_state.patient_info)
                st.session_state.case_summary = case_summary
        
        # Display case summary
        if st.session_state.case_summary is not None:
            st.subheader("Patient Summary")
            st.write(st.session_state.case_summary.patient_summary)
            
            st.subheader("Guideline Recommendations")
            st.write(st.session_state.case_summary.guideline_recommendations)
            
            st.subheader("Care Gaps")
            st.write(st.session_state.case_summary.care_gaps)
            
            st.subheader("Next Steps")
            st.write(st.session_state.case_summary.next_steps)
            
            # Option to download the summary. The text is joined from separate
            # parts so no source indentation leaks into the Markdown file
            # (indented lines would otherwise render as a code block).
            summary = st.session_state.case_summary
            summary_text = "\n\n".join([
                "# Patient Case Summary",
                f"## Patient Summary\n\n{summary.patient_summary}",
                f"## Guideline Recommendations\n\n{summary.guideline_recommendations}",
                f"## Care Gaps\n\n{summary.care_gaps}",
                f"## Next Steps\n\n{summary.next_steps}",
            ])
            
            st.download_button(
                label="Download Case Summary",
                data=summary_text,
                file_name="case_summary.md",
                mime="text/markdown"
            )
    else:
        st.info("Please upload patient data in the 'Upload Patient Data' tab first.")



## 10. Conclusion

This notebook provides a complete implementation of a RAG application for generating patient case summaries using open-source tools. The application extracts key details from patient data, retrieves relevant clinical guidelines, and generates comprehensive case summaries with recommendations.

### Key Components

1. **Patient Data Parser**: Extracts patient information from FHIR bundles
2. **PDF Processor**: Extracts text from PDF files
3. **Guideline Retriever**: Handles the storage and retrieval of medical guidelines using a vector database
4. **Query Generator**: Generates queries for retrieving relevant medical guidelines
5. **RAG Workflow**: Ties everything together to process patient data and generate case summaries
6. **Interactive Interface**: Provides a user-friendly interface for interacting with the application
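
To make the first component more concrete, here is a minimal sketch of walking a FHIR bundle shaped like the sample data in this notebook. It is not the notebook's `parse_synthea_patient`; the helper names `group_bundle_resources` and `first_display` and the trimmed-down inline bundle are assumptions made for illustration, and only the standard library is used.

```python
from collections import defaultdict
from typing import Dict, List


def group_bundle_resources(bundle: dict) -> Dict[str, List[dict]]:
    """Group the resources in a FHIR bundle by their resourceType."""
    grouped = defaultdict(list)
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")
        if resource_type:
            grouped[resource_type].append(resource)
    return dict(grouped)


def first_display(codeable_concept: dict) -> str:
    """Return the first coding's display text, or 'Unknown' if there is none."""
    codings = codeable_concept.get("coding", [])
    return codings[0].get("display", "Unknown") if codings else "Unknown"


# Hypothetical, trimmed-down bundle in the same shape as the sample data.
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Condition",
                      "code": {"coding": [{"display": "Hypertension"}]}}},
        {"resource": {"resourceType": "MedicationRequest",
                      "medicationCodeableConcept": {
                          "coding": [{"display": "Lisinopril 10 MG"}]}}},
    ],
}

grouped = group_bundle_resources(bundle)
condition_names = [
    first_display(c.get("code", {})) for c in grouped.get("Condition", [])
]
medication_names = [
    first_display(m.get("medicationCodeableConcept", {}))
    for m in grouped.get("MedicationRequest", [])
]
print(condition_names)
print(medication_names)
```

Grouping by `resourceType` first keeps the per-resource extraction logic small, and the `.get` calls with defaults let incomplete resources fall back to "Unknown" instead of raising an error.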

### Usage

1. Upload patient data in FHIR JSON format
2. Add medical guidelines via text input or file upload (including PDF files)
3. Generate case summaries with recommendations based on the patient data and guidelines
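
Steps 2 and 3 depend on matching guideline text to a query about the patient. The sketch below illustrates that retrieval idea with plain word overlap, which is a deliberately simple stand-in for the embeddings and vector database used by the notebook's `GuidelineRetriever`. The names `split_by_heading`, `tokenize`, and `retrieve`, as well as the sample text, are hypothetical and exist only for this illustration.

```python
import re
from typing import List, Set


def split_by_heading(text: str) -> List[str]:
    """Split markdown-style text into chunks, one per '## ' section."""
    chunks = re.split(r"\n(?=## )", text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Rank chunks by words shared with the query (toy stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical excerpt in the style of the sample diabetes guideline.
guideline = """
## Treatment Goals
- A1C < 7.0% for most adults

## Monitoring
- A1C testing: Every 3 months until target is reached
- Annual dilated eye examination
"""

chunks = split_by_heading(guideline)
top = retrieve("how often is monitoring with eye examination needed", chunks, k=1)
print(top[0].splitlines()[0])
```

A real vector store ranks chunks by embedding similarity rather than shared words, so it can match paraphrases that this toy scorer would miss.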

This implementation demonstrates how to build a working RAG application using only free, open-source tools.