# Privacy-Preserving RAG: Clinical Assistant with Tool Calling

This notebook implements a clinical assistant powered by OpenAI's GPT4.1 model that can answer patient-specific questions using structured data while preserving privacy through local de-identification.

In [1]:
import os
import pandas as pd

def generate_note_auto(table_name, patient_id):
    """
    Reads the generated *_notes.csv file for the given table and returns 
    the first clinical note found for the provided patient_id.

    Parameters:
    - table_name: str (e.g., "observations", "medications")
    - patient_id: str (the patient_id to search for)

    Returns:
    - str: the clinical note or a message if not found
    """
    notes_file = os.path.join("data", f"{table_name}_notes.csv")

    if not os.path.exists(notes_file):
        return f"Note file not found: {notes_file}"

    df = pd.read_csv(notes_file)
    
    # Normalize column names just in case
    df.columns = [col.strip().lower() for col in df.columns]

    if "patient_id" not in df.columns or "clinical_note" not in df.columns:
        return "Expected columns not found in the CSV."

    matching_notes = df[df["patient_id"] == patient_id]

    if matching_notes.empty:
        return f"No note found for patient_id: {patient_id}"

    return matching_notes.iloc[0]["clinical_note"]

In [2]:
# Install essential packages
# Run this cell first to install core dependencies

!pip install openai python-dotenv pandas numpy
!pip install transformers
!pip install peft
!pip install torch


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip

In [3]:
# Import necessary libraries
import os
import json
from typing import Dict, List, Any, Optional, Union
from datetime import datetime, date
import pandas as pd
import numpy as np

# OpenAI imports
from openai import OpenAI

# Environment variables
from dotenv import load_dotenv
load_dotenv()

print("✅ Libraries imported successfully!")
print(f"📅 Current date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

✅ Libraries imported successfully!
📅 Current date: 2025-08-09 16:13:37


In [7]:
# gated model login with Hugging Face CLI
# Make sure you have the Hugging Face CLI installed and authenticated
!pip install huggingface_hub
!pip install llama-cpp-python

from huggingface_hub import login
login(os.getenv("HUGGING_FACE_API_KEY"))

from llama_cpp import Llama

# Download the model to 'models/gemma-3-4b-it-qat-q4_0.gguf'
basic_deidentifier = Llama.from_pretrained(
    repo_id="google/gemma-3-4b-it-qat-q4_0-gguf",
    filename="gemma-3-4b-it-q4_0.gguf",  
    n_ctx=2048,
    n_threads=8
)


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


llama_model_load_from_file_impl: using device Metal (Apple M2) - 10916 MiB free
llama_model_loader: loaded meta data with 39 key-value pairs and 444 tensors from /Users/arunjoshi/.cache/huggingface/hub/models--google--gemma-3-4b-it-qat-q4_0-gguf/snapshots/15f73f5eee9c28f53afefef5723e29680c2fc78a/./gemma-3-4b-it-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3
llama_model_loader: - kv   1:                      gemma3.context_length u32              = 131072
llama_model_loader: - kv   2:                         gemma3.block_count u32              = 34
llama_model_loader: - kv   3:                    gemma3.embedding_length u32              = 2560
llama_model_loader: - kv   4:                 gemma3.feed_forward_length u32              = 10240
llama_model_loader: - kv   5:                gemma3.attention.he

In [8]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel
import torch

BASE_MODEL = "google/gemma-2b"
LORA_MODEL_PATH = "./gemma-deid-lora/checkpoint-60"

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Apply LoRA weights
gemma_finetuned_model = PeftModel.from_pretrained(base_model, LORA_MODEL_PATH)
gemma_finetuned_model.eval()

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GemmaForCausalLM(
      (model): GemmaModel(
        (embed_tokens): Embedding(256000, 2048, padding_idx=0)
        (layers): ModuleList(
          (0-17): 18 x GemmaDecoderLayer(
            (self_attn): GemmaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2048, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2048, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj)

In [20]:
deidentifier_prompt = """You are a local language model responsible for enforcing HIPAA compliance by identifying and 
        removing all Protected Health Information (PHI) from clinical text and structured data before 
        it is passed to an external system. Your task is to remove all 18 identifiers defined under 
        HIPAA's Safe Harbor rule while preserving the clinical meaning of the data.

        Redact all identifiers like names, dates, addresses, SSNs, etc. with placeholder [REDACTED], without summarizing or altering clinical facts."""


def _build_hipaa_prompt(user_data: str) -> str:
    """
    Build the common HIPAA compliance prompt for de-identification.
    
    Args:
        user_data: Clinical text containing PHI
    
    Returns:
        Formatted prompt for de-identification
    """
    return f"""
        {deidentifier_prompt}

        ---
        <data_with_phi>
        {user_data}
        </data_with_phi>
        <data_hipaa_compliant>
    """

def _extract_redacted_content(raw_output: str, prompt: str) -> str:
    """
    Extract the redacted content from model output.
    
    Args:
        raw_output: Raw output from the model
        prompt: Original prompt used
    
    Returns:
        Extracted redacted content
    """
    if "<data_hipaa_compliant>" in raw_output:
        redacted_part = raw_output.split("<data_hipaa_compliant>")[-1]
        redacted_part = redacted_part.split("</data_hipaa_compliant>")[0].strip()
    else:
        # Fallback: remove the prompt from the beginning
        redacted_part = raw_output[len(prompt):].strip()
    
    return redacted_part

In [19]:
def deidentify_with_basic_gemma(user_data: str) -> str:
    """
    De-identify clinical text using the basic Gemma model via llama-cpp-python.
    
    Args:
        user_data: Clinical text containing PHI
    
    Returns:
        De-identified text with PHI redacted
    """
    
    try:
        response = basic_deidentifier.create_chat_completion(
            messages=[
                {"role": "system", "content": deidentifier_prompt},
                {"role": "user", "content": user_data}
            ],
            temperature=0.2
        )
        raw_output = response["choices"][0]["message"]["content"]
        
        # Logging
        print(f"📄 Original note: {user_data}")
        print("🔒 Redaction complete (Basic Gemma). PHI has been removed from the clinical note.")
        print(f"📄 Redacted note: {raw_output}")

        return raw_output

    except Exception as e:
        print(f"❌ Error during basic de-identification: {e}")
        return f"[DEIDENTIFICATION_ERROR: {str(e)}]"

In [11]:
def deidentify_with_finetuned_gemma(user_data: str) -> str:
    """
    De-identify clinical text using the fine-tuned Gemma model with LoRA weights.
    
    Args:
        user_data: Clinical text containing PHI
    
    Returns:
        De-identified text with PHI redacted
    """
    prompt = _build_hipaa_prompt(user_data)
    
    try:
        device = "mps" if torch.mps.is_available() else "cpu"
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

        with torch.no_grad():
            output = gemma_finetuned_model.generate(
                input_ids,
                max_new_tokens=300,
                temperature=0.7,
                do_sample=False,
                eos_token_id=tokenizer.eos_token_id,
                pad_token_id=tokenizer.pad_token_id
            )

        raw_output = tokenizer.decode(output[0], skip_special_tokens=True)
        redacted_content = _extract_redacted_content(raw_output, prompt)
        
        # Logging
        print(f"📄 Original note: {user_data}")
        print("🔒 Redaction complete (Fine-tuned Gemma). PHI has been removed from the clinical note.")
        print(f"📄 Redacted note: {redacted_content}")
        
        return redacted_content
        
    except Exception as e:
        print(f"❌ Error during fine-tuned de-identification: {e}")
        return f"[DEIDENTIFICATION_ERROR: {str(e)}]"

In [12]:
# Self-contained Clinical Assistant with all configurations built-in
class ClinicalAssistant:
    """Self-contained clinical assistant with built-in configurations and tools"""
    
    def __init__(self, deidentify_model: str = "finetuned"):
        """
        Initialize the Clinical Assistant with built-in configurations
        
        Args:
            deidentify_model: "basic" or "finetuned" for de-identification model
        """
        self.deidentify_model = deidentify_model
        
        # Initialize OpenAI configuration
        self.api_key = os.getenv("OPENAI_API_KEY")
        if not self.api_key:
            raise ValueError("OPENAI_API_KEY not found in environment variables")
        
        self.client = OpenAI(api_key=self.api_key)
        self.model = "gpt-4.1"
        self.max_completion_tokens = 5000
        self.temperature = 0.1
        
        # Tool definitions
        self.tool_definitions = [
            {
                "type": "function",
                "function": {
                    "name": "get_patient_observations",
                    "description": "Retrieve laboratory test results and clinical observations for a patient",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_patient_conditions",
                    "description": "Retrieve patient diagnosis information, medical conditions, and medical history",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_patient_medications",
                    "description": "Retrieve current and past medications for a patient",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_patient_careplans",
                    "description": "Retrieve care plans and treatment plans for a patient",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_patient_procedures",
                    "description": "Retrieve medical procedures and interventions performed on a patient",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_patient_imaging_studies",
                    "description": "Retrieve imaging studies and radiology reports for a patient",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_patient_immunizations",
                    "description": "Retrieve vaccination history and immunization records for a patient",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "get_patient_allergies",
                    "description": "Retrieve allergy information and adverse reactions for a patient",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "patient_id": {
                                "type": "string",
                                "description": "Unique patient identifier"
                            }
                        },
                        "required": ["patient_id"]
                    }
                }
            }
        ]
        
        # Function mapping - directly use generate_note_auto with appropriate table names
        self.function_map = {
            "get_patient_observations": lambda patient_id: generate_note_auto("observations", patient_id),
            "get_patient_conditions": lambda patient_id: generate_note_auto("conditions", patient_id),
            "get_patient_medications": lambda patient_id: generate_note_auto("medications", patient_id),
            "get_patient_careplans": lambda patient_id: generate_note_auto("careplans", patient_id),
            "get_patient_procedures": lambda patient_id: generate_note_auto("procedures", patient_id),
            "get_patient_imaging_studies": lambda patient_id: generate_note_auto("imaging_studies", patient_id),
            "get_patient_immunizations": lambda patient_id: generate_note_auto("immunizations", patient_id),
            "get_patient_allergies": lambda patient_id: generate_note_auto("allergies", patient_id)
        }
        
        print(f"✅ Clinical Assistant initialized with {self.deidentify_model} de-identification model!")
    
    def _build_system_prompt(self) -> str:
        """Build the system prompt for the clinical assistant"""
        return f"""You are an AI Clinical Reasoning Assistant with expertise in internal medicine. 
        Analyze clinical data and respond to patient-specific questions using structured reasoning.

        Available Tools:
        - get_patient_observations(): Laboratory test results and clinical observations
        - get_patient_conditions(): Diagnosis history, medical conditions, and medical history
        - get_patient_medications(): Current and past medications
        - get_patient_careplans(): Care plans and treatment plans
        - get_patient_procedures(): Medical procedures and interventions
        - get_patient_imaging_studies(): Imaging studies and radiology reports
        - get_patient_immunizations(): Vaccination history and immunization records
        - get_patient_allergies(): Allergy information and adverse reactions

        Clinical Reasoning Framework:
        1. Analyze the question to determine needed information
        2. Use appropriate tools to gather relevant patient data
        3. Synthesize findings from multiple data sources
        4. Provide clear, evidence-based responses with clinical reasoning

        Rules:
        - Use ONLY data provided by tool outputs
        - Reference relative timeframes when provided (e.g., "Day 0", "Post-op Day 3")
        - Acknowledge limitations if data is insufficient
        - Suggest additional information needed when applicable
        - Consider interactions between medications, conditions, and procedures
        - Always check for allergies before recommending treatments

        Current date: {datetime.now().strftime('%Y-%m-%d')}
        """
    
    def _de_identify_data(self, data: str) -> str:
        """De-identify text data using the specified deidentify model"""
        if self.deidentify_model == "basic":
            return deidentify_with_basic_gemma(data)
        elif self.deidentify_model == "finetuned":
            return deidentify_with_finetuned_gemma(data)        
    
    def _execute_tool_call(self, function_name: str, arguments: Dict[str, Any]) -> str:
        """Execute a tool function call with de-identification"""
        if function_name not in self.function_map:
            return f"Error: Unknown function: {function_name}"
        
        print(f"Executing tool call: {function_name}")
        try:
            func = self.function_map[function_name]
            raw_result = func(**arguments)
            return self._de_identify_data(raw_result)
        except Exception as e:
            return f"Error executing {function_name}: {str(e)}"

    def process_query(self, query: str, patient_id: str = None) -> str:
        """
        Process a clinical query using OpenAI GPT with tool calling
        
        Args:
            query: The clinical question to answer
            patient_id: Optional patient ID if known
            
        Returns:
            Clinical assistant response
        """
        try:
            # Build messages
            messages = [
                {"role": "system", "content": self._build_system_prompt()},
                {"role": "user", "content": f"Clinical query: {query}"}
            ]
            
            if patient_id:
                messages[-1]["content"] += f"\nPatient ID: {patient_id}"
            
            # Get initial response from LLM
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=self.tool_definitions,
                tool_choice="auto",
                max_completion_tokens=self.max_completion_tokens,
                temperature=self.temperature
            )
            
            # Handle tool calls if any
            if response.choices[0].message.tool_calls:
                current_messages = messages.copy()
                current_messages.append({
                    "role": "assistant",
                    "content": response.choices[0].message.content,
                    "tool_calls": response.choices[0].message.tool_calls
                })
                
                # Process each tool call
                for tool_call in response.choices[0].message.tool_calls:
                    function_name = tool_call.function.name
                    function_args = json.loads(tool_call.function.arguments)
                    
                    # Execute tool call with de-identification
                    tool_result = self._execute_tool_call(function_name, function_args)
                    
                    current_messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": tool_result
                    })
                
                # Get final response with tool results
                final_response = self.client.chat.completions.create(
                    model=self.model,
                    messages=current_messages,
                    tools=self.tool_definitions,
                    tool_choice="auto",
                    max_completion_tokens=self.max_completion_tokens,
                    temperature=self.temperature
                )
                return final_response.choices[0].message.content
            else:
                return response.choices[0].message.content
            
        except Exception as e:
            return f"❌ Error processing query: {str(e)}"

In [17]:
# Test Clinical Assistant with Different De-identification Models
print("🧪 Testing Clinical Assistant with Integrated De-identification")
print("=" * 60)

# Sample clinical queries to test the system with all available tools
sample_queries = [
    "Can you summarize this patient's medical conditions and current medications?",
    "Does this patient have any allergies I should be aware of before prescribing?",
    "Is the patient up to date on their vaccinations?",
    "The patient wants to try and get pregnant, is there any relevant medical history or medications that should be considered?",
]

# Enhanced interactive testing function that creates assistant instances
from IPython.display import display, Markdown

def test_query(query_text: str, patient_id: str = "12345678", deidentify_model: str = "finetuned"):
    """
    Test function that creates a Clinical Assistant instance with the specified model
    
    Args:
        query_text: The clinical question to ask
        patient_id: Patient identifier (default: "12345678")
        deidentify_model: "finetuned" or "basic" for de-identification model
    
    Returns:
        Assistant response
    """
    try:
        # Create assistant instance with the specified de-identification model
        assistant = ClinicalAssistant(deidentify_model=deidentify_model)
        
        model_name = "Fine-tuned Gemma" if deidentify_model == "finetuned" else "Basic Gemma"
        header_md = f"### 🩺 Clinical Query: {query_text}\n\n**Patient ID:** {patient_id}\n**De-identification Model:** {model_name}"
        
        display(Markdown(header_md))
        display(Markdown("---"))
        display(Markdown("### 🤖 Assistant Response"))

        response = assistant.process_query(query_text, patient_id)
        display(Markdown(f"```\n{response}\n```"))
        return response
        
    except Exception as e:
        error_msg = f"❌ Error: {str(e)}"
        display(Markdown(error_msg))
        return error_msg

🧪 Testing Clinical Assistant with Integrated De-identification


In [14]:


# Test with both de-identification models separately
test_patient_id = "1329b83e-ea69-d184-b4af-0d2a8e07896e"
query = sample_queries[1]

print("🔍 Testing Fine-tuned Gemma model:")
test_query(query, test_patient_id, "finetuned")

print("\n" + "="*60 + "\n")

print("🔍 Testing Basic Gemma model:")
test_query(query, test_patient_id, "basic")

🔍 Testing Fine-tuned Gemma model:
✅ Clinical Assistant initialized with finetuned de-identification model!


### 🩺 Clinical Query: Does this patient have any allergies I should be aware of before prescribing?

**Patient ID:** 1329b83e-ea69-d184-b4af-0d2a8e07896e
**De-identification Model:** Fine-tuned Gemma

---

### 🤖 Assistant Response

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Executing tool call: get_patient_allergies
📄 Original note: Patient: Ms. Yaeko Ming Kshlerin, SSN: 999-26-7676, born on 1999‑06‑07 in Oakes, North Dakota, presented to TOWNER COUNTY MEDICAL CENTER INC (HWY 281N, CANDO, ND 58324) on 2000‑11‑20 for an encounter for problem (procedure) related to allergic disposition; she reports a lifelong allergy to animal dander with moderate rhinoconjunctivitis and mild skin eruptions, and she is currently under the care of Dr. Shiloh Larson, general practice.  
The visit was classified as ambulatory, with a base encounter cost of $96.45 and a total claim cost of $483.55; payer coverage was $0.00, leaving her responsible for the full cost, while her total healthcare expenses amount to $127,546.31 against a coverage pool of $673,780.87, and her annual income is $63,061.  
Ms. Kshlerin resides at 523 O'Kon Orchard, Cando, ND 58324 (Towner County, FIPS 38095), and is a white, non‑Hispanic female with no recorded marital status.
🔒 Redaction complete (Fine

```
There is no documented allergy information available for this patient in the current records. Therefore, I cannot confirm whether the patient has any allergies you should be aware of before prescribing.

Recommendation:
- If possible, obtain a direct allergy history from the patient or review additional records to ensure safe prescribing.
- If you have specific concerns about certain medications or classes, please specify, and I can assist further.
```



🔍 Testing Basic Gemma model:
✅ Clinical Assistant initialized with basic de-identification model!


### 🩺 Clinical Query: Does this patient have any allergies I should be aware of before prescribing?

**Patient ID:** 1329b83e-ea69-d184-b4af-0d2a8e07896e
**De-identification Model:** Basic Gemma

---

### 🤖 Assistant Response

Executing tool call: get_patient_allergies


llama_perf_context_print:        load time =   22604.91 ms
llama_perf_context_print: prompt eval time =   22602.73 ms /   755 tokens (   29.94 ms per token,    33.40 tokens per second)
llama_perf_context_print:        eval time =   68461.57 ms /   914 runs   (   74.90 ms per token,    13.35 tokens per second)
llama_perf_context_print:       total time =   92091.28 ms /  1669 tokens


📄 Original note: Patient: Ms. Yaeko Ming Kshlerin, SSN: 999-26-7676, born on 1999‑06‑07 in Oakes, North Dakota, presented to TOWNER COUNTY MEDICAL CENTER INC (HWY 281N, CANDO, ND 58324) on 2000‑11‑20 for an encounter for problem (procedure) related to allergic disposition; she reports a lifelong allergy to animal dander with moderate rhinoconjunctivitis and mild skin eruptions, and she is currently under the care of Dr. Shiloh Larson, general practice.  
The visit was classified as ambulatory, with a base encounter cost of $96.45 and a total claim cost of $483.55; payer coverage was $0.00, leaving her responsible for the full cost, while her total healthcare expenses amount to $127,546.31 against a coverage pool of $673,780.87, and her annual income is $63,061.  
Ms. Kshlerin resides at 523 O'Kon Orchard, Cando, ND 58324 (Towner County, FIPS 38095), and is a white, non‑Hispanic female with no recorded marital status.
🔒 Redaction complete (Basic Gemma). PHI has been removed from the cli

```
This patient has a documented lifelong allergy to animal dander, which causes moderate rhinoconjunctivitis and mild skin eruptions. There are no other allergies listed in the available data.

Clinical recommendation:
- Be aware of this allergy when prescribing, especially if considering medications or treatments that may contain animal-derived components (e.g., certain vaccines, antitoxins, or biologics).
- No drug or medication allergies are documented in the current record.

If you are considering a specific medication or treatment, please specify so I can check for any relevant interactions or additional precautions.
```

'This patient has a documented lifelong allergy to animal dander, which causes moderate rhinoconjunctivitis and mild skin eruptions. There are no other allergies listed in the available data.\n\nClinical recommendation:\n- Be aware of this allergy when prescribing, especially if considering medications or treatments that may contain animal-derived components (e.g., certain vaccines, antitoxins, or biologics).\n- No drug or medication allergies are documented in the current record.\n\nIf you are considering a specific medication or treatment, please specify so I can check for any relevant interactions or additional precautions.'

In [22]:

# Test with both de-identification models separately
test_patient_id = "1329b83e-ea69-d184-b4af-0d2a8e07896e"
query = sample_queries[2]

print("🔍 Testing Fine-tuned Gemma model:")
test_query(query, test_patient_id, "finetuned")

print("\n" + "="*60 + "\n")

print("🔍 Testing Basic Gemma model:")
test_query(query, test_patient_id, "basic")

🔍 Testing Fine-tuned Gemma model:
✅ Clinical Assistant initialized with finetuned de-identification model!


### 🩺 Clinical Query: Is the patient up to date on their vaccinations?

**Patient ID:** 1329b83e-ea69-d184-b4af-0d2a8e07896e
**De-identification Model:** Fine-tuned Gemma

---

### 🤖 Assistant Response

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Executing tool call: get_patient_immunizations
📄 Original note: Patient: Ms. Yaeko Ming Kshlerin, born on 06/07/1999 (SSN: 999-26-7676), presented to TOWNER COUNTY MEDICAL CENTER on 08/30/2022 for a general wellness examination. Dr. Sal Lehner documented a routine physical, noting she received the Influenza seasonal injectable preservative free vaccine (code 140) at age 23, and discussed her healthcare expenses of $127,546.31 against coverage of $673,780.87, with a current income of $63,061; she resides at 523 O'Kon Orchard, Cando, North Dakota, 58324. Follow‑up is scheduled as needed; the total claim cost was $593.90, with $475.12 covered by her payer.
🔒 Redaction complete (Fine-tuned Gemma). PHI has been removed from the clinical note.
📄 Redacted note: Patient: [REDACTED], born on [REDACTED] (SSN: [REDACTED]), presented to [REDACTED] on [REDACTED] for a general wellness examination. Dr. [REDACTED] documented a routine physical, noting she received the Influenza seasonal injectable pr

```
The available immunization record only documents that the patient received a seasonal influenza vaccine at a previous visit. There is no information provided about other routine vaccinations (such as Tdap, MMR, varicella, pneumococcal, COVID-19, or others recommended by age and risk factors).

Based on the data provided, I cannot confirm whether the patient is fully up to date on all recommended vaccinations. 

Recommendation:
- Additional immunization records or a detailed vaccination history are needed to accurately determine if the patient is up to date.
- Consider reviewing the patient's full immunization record or contacting their primary care provider for a comprehensive assessment.
```



🔍 Testing Basic Gemma model:
✅ Clinical Assistant initialized with basic de-identification model!


### 🩺 Clinical Query: Is the patient up to date on their vaccinations?

**Patient ID:** 1329b83e-ea69-d184-b4af-0d2a8e07896e
**De-identification Model:** Basic Gemma

---

### 🤖 Assistant Response

Executing tool call: get_patient_immunizations


Llama.generate: 121 prefix-match hit, remaining 205 prompt tokens to eval
llama_perf_context_print:        load time =   22604.91 ms
llama_perf_context_print: prompt eval time =    4309.29 ms /   205 tokens (   21.02 ms per token,    47.57 tokens per second)
llama_perf_context_print:        eval time =   10369.49 ms /   153 runs   (   67.77 ms per token,    14.75 tokens per second)
llama_perf_context_print:       total time =   14769.54 ms /   358 tokens


📄 Original note: Patient: Ms. Yaeko Ming Kshlerin, born on 06/07/1999 (SSN: 999-26-7676), presented to TOWNER COUNTY MEDICAL CENTER on 08/30/2022 for a general wellness examination. Dr. Sal Lehner documented a routine physical, noting she received the Influenza seasonal injectable preservative free vaccine (code 140) at age 23, and discussed her healthcare expenses of $127,546.31 against coverage of $673,780.87, with a current income of $63,061; she resides at 523 O'Kon Orchard, Cando, North Dakota, 58324. Follow‑up is scheduled as needed; the total claim cost was $593.90, with $475.12 covered by her payer.
🔒 Redaction complete (Basic Gemma). PHI has been removed from the clinical note.
📄 Redacted note: Patient: Ms. [REDACTED], born on [REDACTED], presented to TOWNER COUNTY MEDICAL CENTER on [REDACTED] for a general wellness examination. Dr. [REDACTED] documented a routine physical, noting she received the Influenza seasonal injectable preservative free vaccine (code [REDACTED]) at age

```
The available immunization record only documents that the patient received a seasonal influenza vaccine at a previous visit. There is no information provided about other routine adult vaccinations (such as Tdap, pneumococcal, shingles, COVID-19, or others).

Based on the data available, I cannot confirm whether the patient is fully up to date on all recommended vaccinations. Additional immunization records or a more comprehensive vaccination history would be needed to answer this question definitively.

Recommendation:
- Review the patient's full immunization record or obtain further details to assess if all age-appropriate vaccines are current.
```

"The available immunization record only documents that the patient received a seasonal influenza vaccine at a previous visit. There is no information provided about other routine adult vaccinations (such as Tdap, pneumococcal, shingles, COVID-19, or others).\n\nBased on the data available, I cannot confirm whether the patient is fully up to date on all recommended vaccinations. Additional immunization records or a more comprehensive vaccination history would be needed to answer this question definitively.\n\nRecommendation:\n- Review the patient's full immunization record or obtain further details to assess if all age-appropriate vaccines are current."

In [21]:

# Test with both de-identification models separately
test_patient_id = "1329b83e-ea69-d184-b4af-0d2a8e07896e"
query = sample_queries[3]

print("🔍 Testing Fine-tuned Gemma model:")
test_query(query, test_patient_id, "finetuned")

print("\n" + "="*60 + "\n")

print("🔍 Testing Basic Gemma model:")
test_query(query, test_patient_id, "basic")

🔍 Testing Fine-tuned Gemma model:
✅ Clinical Assistant initialized with finetuned de-identification model!


### 🩺 Clinical Query: The patient wants to try and get pregnant, is there any relevant medical history or medications that should be considered?

**Patient ID:** 1329b83e-ea69-d184-b4af-0d2a8e07896e
**De-identification Model:** Fine-tuned Gemma

---

### 🤖 Assistant Response

Executing tool call: get_patient_conditions


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


📄 Original note: Patient: Ms. Yaeko Ming Kshlerin, born on 06/07/1999 in Oakes, North Dakota (SSN: 999‑26‑7676), white, non‑Hispanic, female, presented to TOWNER COUNTY MEDICAL CENTER on 12/14/2015 for a medication review due (situation) at age 16; she resides at 523 O'Kon Orchard, Cando, North Dakota 58324 (Towner County, FIPS 38095, latitude 48.4746068, longitude ‑99.2336557) and holds driver’s license S99963354 and passport X23683209X. Dr. Jarvis Ankunding, a general practitioner, recorded the encounter as an outpatient check‑up (procedure) with a base cost of $96.45, a total claim of $1,257.75 fully covered by the payer, while her cumulative healthcare expenses amount to $127,546.31 against coverage of $673,780.87. The encounter was classified under SNOMED‑CT code 314529007 (Medication review due) and was documented as an outpatient encounter with a provider specialty of general practice. Follow‑up was scheduled for a future date to review medication adherence and any new concerns.

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


📄 Original note: Ms. Yaeko Ming Kshlerin, Ms. born 06/07/1999 (age 23 at the start of her medication), a white, non‑Hispanic female from Oakes, North Dakota, presented for an outpatient check‑up at TOWNER COUNTY MEDICAL CENTER (HWY 281 N, Cando, ND 58324) under the care of Dr. Jarvis Ankunding, general practice. She received Acetaminophen 300 mg / Hydrocodone Bitartrate 5 mg oral tablets (pharmacy code 856987) from 05/09/2023 04:52:06 to 09/05/2023 04:52:06, with three dispenses totalling $97.29 (base cost $32.43, fully covered by payer), and the encounter itself cost $96.45 (claim cost $1,053.14, fully covered). Her home address is 523 O'Kon Orchard, Cando, ND 58324 (lat 48.4746, lon -99.2337), she resides in Towner County (FIPS 38095), and her SSN is 999‑26‑7676, driver’s license S99963354, passport X23683209X.  Overall, her healthcare expenses total $127,546.31 against coverage of $673,780.87, with an annual income of $63,061.
🔒 Redaction complete (Fine-tuned Gemma). PHI has been re

```
Summary of findings:
- Medical conditions: The available data does not specify any chronic medical conditions or diagnoses relevant to pregnancy (such as diabetes, hypertension, thyroid disease, or reproductive disorders).
- Medications: The patient has previously received Acetaminophen 300 mg / Hydrocodone Bitartrate 5 mg oral tablets. This is a combination opioid analgesic, which is not recommended during pregnancy due to potential risks to the fetus (including neonatal opioid withdrawal syndrome and possible teratogenicity).
- Allergies: The patient has a history of moderate rhinoconjunctivitis and mild skin eruptions, suggesting possible allergic tendencies, but no specific drug allergies are documented.

Clinical reasoning:
- If the patient is still taking Acetaminophen/Hydrocodone, this medication should be reviewed and ideally discontinued or substituted with a safer alternative prior to conception.
- Allergic tendencies may be relevant if considering medications or supplements during pregnancy.
- No other significant medical history is documented that would directly impact pregnancy planning.

Recommendations:
1. Review current medication use—if the patient is still on opioid analgesics, discuss tapering and alternative pain management strategies.
2. Consider preconception counseling, including review of immunization status, folic acid supplementation, and screening for other medical conditions not listed here.
3. Monitor for allergic reactions if new medications or prenatal vitamins are started.

Limitations:
- The data does not provide a comprehensive list of all medical conditions or a current medication list. Additional information about current health status, menstrual/reproductive history, and other medications would be helpful.

If you need a more detailed assessment, please provide or request additional information regarding current medications, medical conditions, or recent laboratory results.
```



🔍 Testing Basic Gemma model:
✅ Clinical Assistant initialized with basic de-identification model!


### 🩺 Clinical Query: The patient wants to try and get pregnant, is there any relevant medical history or medications that should be considered?

**Patient ID:** 1329b83e-ea69-d184-b4af-0d2a8e07896e
**De-identification Model:** Basic Gemma

---

### 🤖 Assistant Response

Executing tool call: get_patient_conditions


Llama.generate: 3 prefix-match hit, remaining 450 prompt tokens to eval
llama_perf_context_print:        load time =   22604.91 ms
llama_perf_context_print: prompt eval time =    6942.13 ms /   450 tokens (   15.43 ms per token,    64.82 tokens per second)
llama_perf_context_print:        eval time =   16781.24 ms /   231 runs   (   72.65 ms per token,    13.77 tokens per second)
llama_perf_context_print:       total time =   23865.82 ms /   681 tokens
Llama.generate: 111 prefix-match hit, remaining 413 prompt tokens to eval


📄 Original note: Patient: Ms. Yaeko Ming Kshlerin, born on 06/07/1999 in Oakes, North Dakota (SSN: 999‑26‑7676), white, non‑Hispanic, female, presented to TOWNER COUNTY MEDICAL CENTER on 12/14/2015 for a medication review due (situation) at age 16; she resides at 523 O'Kon Orchard, Cando, North Dakota 58324 (Towner County, FIPS 38095, latitude 48.4746068, longitude ‑99.2336557) and holds driver’s license S99963354 and passport X23683209X. Dr. Jarvis Ankunding, a general practitioner, recorded the encounter as an outpatient check‑up (procedure) with a base cost of $96.45, a total claim of $1,257.75 fully covered by the payer, while her cumulative healthcare expenses amount to $127,546.31 against coverage of $673,780.87. The encounter was classified under SNOMED‑CT code 314529007 (Medication review due) and was documented as an outpatient encounter with a provider specialty of general practice. Follow‑up was scheduled for a future date to review medication adherence and any new concerns.

llama_perf_context_print:        load time =   22604.91 ms
llama_perf_context_print: prompt eval time =    4051.69 ms /   413 tokens (    9.81 ms per token,   101.93 tokens per second)
llama_perf_context_print:        eval time =   23200.34 ms /   330 runs   (   70.30 ms per token,    14.22 tokens per second)
llama_perf_context_print:       total time =   27459.69 ms /   743 tokens
Llama.generate: 111 prefix-match hit, remaining 307 prompt tokens to eval


📄 Original note: Ms. Yaeko Ming Kshlerin, Ms. born 06/07/1999 (age 23 at the start of her medication), a white, non‑Hispanic female from Oakes, North Dakota, presented for an outpatient check‑up at TOWNER COUNTY MEDICAL CENTER (HWY 281 N, Cando, ND 58324) under the care of Dr. Jarvis Ankunding, general practice. She received Acetaminophen 300 mg / Hydrocodone Bitartrate 5 mg oral tablets (pharmacy code 856987) from 05/09/2023 04:52:06 to 09/05/2023 04:52:06, with three dispenses totalling $97.29 (base cost $32.43, fully covered by payer), and the encounter itself cost $96.45 (claim cost $1,053.14, fully covered). Her home address is 523 O'Kon Orchard, Cando, ND 58324 (lat 48.4746, lon -99.2337), she resides in Towner County (FIPS 38095), and her SSN is 999‑26‑7676, driver’s license S99963354, passport X23683209X.  Overall, her healthcare expenses total $127,546.31 against coverage of $673,780.87, with an annual income of $63,061.
🔒 Redaction complete (Basic Gemma). PHI has been removed

llama_perf_context_print:        load time =   22604.91 ms
llama_perf_context_print: prompt eval time =    3321.41 ms /   307 tokens (   10.82 ms per token,    92.43 tokens per second)
llama_perf_context_print:        eval time =   16653.61 ms /   209 runs   (   79.68 ms per token,    12.55 tokens per second)
llama_perf_context_print:       total time =   20102.28 ms /   516 tokens


📄 Original note: Patient: Ms. Yaeko Ming Kshlerin, SSN: 999-26-7676, born on 1999‑06‑07 in Oakes, North Dakota, presented to TOWNER COUNTY MEDICAL CENTER INC (HWY 281N, CANDO, ND 58324) on 2000‑11‑20 for an encounter for problem (procedure) related to allergic disposition; she reports a lifelong allergy to animal dander with moderate rhinoconjunctivitis and mild skin eruptions, and she is currently under the care of Dr. Shiloh Larson, general practice.  
The visit was classified as ambulatory, with a base encounter cost of $96.45 and a total claim cost of $483.55; payer coverage was $0.00, leaving her responsible for the full cost, while her total healthcare expenses amount to $127,546.31 against a coverage pool of $673,780.87, and her annual income is $63,061.  
Ms. Kshlerin resides at 523 O'Kon Orchard, Cando, ND 58324 (Towner County, FIPS 38095), and is a white, non‑Hispanic female with no recorded marital status.
🔒 Redaction complete (Basic Gemma). PHI has been removed from the cli

```
Summary of findings:
- Medical history: No specific chronic medical conditions or diagnoses relevant to pregnancy were identified in the available data.
- Medications: The patient was prescribed Acetaminophen 300 mg / Hydrocodone Bitartrate 5 mg oral tablets from May to September 2023. There is no evidence of ongoing use or other current medications.
- Allergies: The patient has a lifelong allergy to animal dander, causing moderate rhinoconjunctivitis and mild skin eruptions. No medication allergies are documented.

Clinical considerations:
- Hydrocodone/acetaminophen is not recommended during pregnancy due to potential risks of opioid exposure to the fetus. However, the prescription ended in September 2023, so unless the patient is still taking this medication, it should not impact current pregnancy planning.
- No other medications or medical conditions were identified that would directly impact fertility or pregnancy safety.
- No medication allergies are documented, but animal dander allergy is noted (not relevant to pregnancy).

Recommendations:
- Confirm that the patient is not currently taking hydrocodone/acetaminophen or any other potentially teratogenic medications.
- If she is taking any other medications or has additional medical history not captured here, further review is warranted.
- Routine preconception counseling is advised, including folic acid supplementation, review of immunization status, and assessment for any chronic conditions not listed here.

If more detailed or recent medication or condition data is needed, please specify.
```

'Summary of findings:\n- Medical history: No specific chronic medical conditions or diagnoses relevant to pregnancy were identified in the available data.\n- Medications: The patient was prescribed Acetaminophen 300 mg / Hydrocodone Bitartrate 5 mg oral tablets from May to September 2023. There is no evidence of ongoing use or other current medications.\n- Allergies: The patient has a lifelong allergy to animal dander, causing moderate rhinoconjunctivitis and mild skin eruptions. No medication allergies are documented.\n\nClinical considerations:\n- Hydrocodone/acetaminophen is not recommended during pregnancy due to potential risks of opioid exposure to the fetus. However, the prescription ended in September 2023, so unless the patient is still taking this medication, it should not impact current pregnancy planning.\n- No other medications or medical conditions were identified that would directly impact fertility or pregnancy safety.\n- No medication allergies are documented, but anim