**Project name:** *AEGIS*

**Team Name:** *Black Cat* 

**Team Members:** *Sumit Singh - Vibecoder and AI/ML Engineer*


**Problem statement**
*In many rural or under-resourced healthcare settings, doctors lack immediate support for interpreting patient history, potential complications, and diagnostic clues. This can lead to missed or delayed diagnoses and worse patient outcomes. I aim to bridge this gap with an AI-assisted voice companion that helps doctors analyze patient cases in real time, ensuring they don't miss key details, even in resource-limited environments.*

**Overall solution**
*To solve this, I built AEGIS, a multi-agent AI operating system that fits right into a doctor's pocket. Using Google's **Med-Gemma 1.5 (4B)** model, I created "Suvi", a conversational AI Clinical Nurse.* 

*The effective use of this medical-grade model lies in its Agent-to-Agent (A2A) architecture. Instead of typing into clunky EHR systems, doctors can use Voice-to-Text to consult with Suvi hands-free while actively examining a patient. Suvi uses Retrieval-Augmented Generation (RAG) to instantly pull the patient's past medical history from our database and cross-reference it with current symptoms.* 

Beyond just chat, the solution includes:
* **👁️ Aegis-Vision:** A computer vision agent that uses a quick face scan to securely retrieve a patient's medical file.
* **🗣️ Suvi Voice:** The core clinical reasoning agent that acts as a real-time sounding board for the doctor.
* **✍️ Aegis-Scribe:** A background agent that takes the unstructured voice transcript and automatically generates a structured, ready-to-sign Word document report, reducing administrative overhead.

**Technical details**
*Product feasibility and deployment in low-resource environments were my top priorities. Building a massive AI app is useless if a rural clinic can't afford the hardware to run it.* 

*That is why I engineered the backend around the **Med-Gemma 4B** model rather than the heavier 27B version. Loaded with 4-bit quantization, the 4B model has a VRAM footprint of roughly 3.2 GB in this notebook, so it fits on a single low-cost T4 GPU (hosted via Kaggle for this build), which keeps cloud hosting financially viable for under-resourced clinics.* 

**The Tech Stack:**
* **Frontend:** Built in Flutter, ensuring it runs smoothly on standard, low-cost Android tablets or smartphones already present in clinics.
* **Backend:** Python and FastAPI, exposed via Ngrok for seamless mobile-to-cloud communication.
* **Database:** Supabase (PostgreSQL) serves as the vector database for patient records; each RAG lookup is a single similarity-search RPC call.
* **AI Protocol:** Because Med-Gemma is a reasoning model, it naturally outputs its internal "scratchpad" thoughts. To maintain a professional UI, I implemented a strict Regex-based preprocessing layer that traps and scrubs the model's `<think>...</think>` tags on the backend. The doctor only ever sees and hears the clean, final clinical advice.
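
As a minimal illustration of that scrubbing step, here is a simplified sketch (not the full cleaner defined later in the notebook; the name `strip_think_blocks` is hypothetical). A non-greedy, multi-line regex drops everything between the tags:

```python
import re

# Non-greedy match so each <think>...</think> pair is removed separately;
# DOTALL lets "." span newlines inside the reasoning block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL | re.IGNORECASE)

def strip_think_blocks(raw_text: str) -> str:
    """Return the model output with all <think> blocks removed."""
    return THINK_BLOCK.sub("", raw_text).strip()

raw = "<think>\nCheck allergies first.\n</think>\nNo known drug allergies on file."
print(strip_think_blocks(raw))
```

The non-greedy `.*?` matters here: a greedy match would also delete any visible text sitting between two separate reasoning blocks.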

In [1]:
#  CELL 1: ENVIRONMENT SETUP & HARDWARE OPTIMIZATION
# ==============================================================================
import os
import sys
import warnings

warnings.filterwarnings("ignore")
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

print(" INITIALIZING AEGIS BACKEND ENVIRONMENT...")

!pip install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install -q -U transformers accelerate bitsandbytes peft
!pip install -q -U fastapi uvicorn python-multipart nest_asyncio pyngrok
!pip install -q -U sentence-transformers supabase
!pip install -q face_recognition opencv-python-headless numpy

print(" Dependencies Installed.")

import torch
print(f"\n HARDWARE ACCELERATION STATUS:")
if torch.cuda.is_available():
    gpu_count = torch.cuda.device_count()
    print(f"   • CUDA Available: YES")
    print(f"   • GPU Count: {gpu_count}")
    for i in range(gpu_count):
        print(f"   • GPU {i}: {torch.cuda.get_device_name(i)} ({round(torch.cuda.get_device_properties(i).total_memory/1e9, 2)} GB VRAM)")
else:
    print(" CRITICAL ERROR: No GPU detected. Change Accelerator to 'GPU T4 x2'.")

 INITIALIZING AEGIS BACKEND ENVIRONMENT...
 Dependencies Installed.

 HARDWARE ACCELERATION STATUS:
   • CUDA Available: YES
   • GPU Count: 2
   • GPU 0: Tesla T4 (15.64 GB VRAM)
   • GPU 1: Tesla T4 (15.64 GB VRAM)


##  Global Configuration & Database Initialization
This cell establishes connections to Supabase (Vector DB) and Hugging Face.

In [2]:
#  CELL 2: CREDENTIALS, SUPABASE & EMBEDDINGS
# ==============================================================================
from kaggle_secrets import UserSecretsClient
from huggingface_hub import login
from supabase import create_client
from sentence_transformers import SentenceTransformer


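# Default to None so later cells (model loader, ngrok setup) do not hit a
# NameError if the Kaggle secrets cannot be loaded.
HF_TOKEN = None
NGROK_TOKEN = None

try:
    user_secrets = UserSecretsClient()
    HF_TOKEN = user_secrets.get_secret("HF_TOKEN")
    NGROK_TOKEN = user_secrets.get_secret("NGROK_TOKEN")
    login(token=HF_TOKEN)
    print(" Authenticated with Hugging Face.")
except Exception as e:
    print(f" CREDENTIAL WARNING: Could not load secrets.\nError: {e}")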

SUPABASE_URL = "https://ceesnsewtbkouxjnzwqc.supabase.co" 
SUPABASE_KEY = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImNlZXNuc2V3dGJrb3V4am56d3FjIiwicm9sZSI6ImFub24iLCJpYXQiOjE3NzEzNzMyMzgsImV4cCI6MjA4Njk0OTIzOH0.19fxCmQ4iStEJo0_tq5j2PDtxInIVKLLFfTZMcfMq94"

try:
    supabase = create_client(SUPABASE_URL, SUPABASE_KEY)
    embedder = SentenceTransformer('all-mpnet-base-v2') 
    print(" Database & Embedding Engine Online.")
except Exception as e:
    print(f" DATABASE CONNECTION ERROR: {e}")

 Authenticated with Hugging Face.


Loading weights:   0%|          | 0/199 [00:00<?, ?it/s]

MPNetModel LOAD REPORT from: sentence-transformers/all-mpnet-base-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.


 Database & Embedding Engine Online.


##  The LLM Engine (Med-Gemma)
Loading the LLM into VRAM using 4-bit quantization and auto-sharding across GPUs.
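
A rough back-of-the-envelope check shows why 4-bit loading matters on a T4. The sketch below assumes a nominal 4 billion parameters and counts weight storage only (no activations, KV cache, or quantization metadata); the helper name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NOMINAL_PARAMS = 4e9  # assumption: "4B" taken at face value

# Compare common precisions for the same nominal parameter count.
for label, bits in [("fp16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label}: ~{weight_memory_gb(NOMINAL_PARAMS, bits):.1f} GB")
```

The footprint reported by `model.get_memory_footprint()` can exceed the 4-bit estimate, since some tensors (embeddings, for example) generally stay in higher precision and the real parameter count may differ from the nominal figure.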

In [3]:
# CELL 3: MODEL LOADER 
# ==============================================================================
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# ------------------------------------------------------------------------------
# CONFIGURATION: CHOOSE YOUR MODEL
# ------------------------------------------------------------------------------
# FOR JUDGES: To test maximum reasoning capabilities, uncomment the 27B model below. 
# Requires A100 (40GB) VRAM.
# TARGET_MODEL = "google/medgemma-27b-it" 

TARGET_MODEL = "google/medgemma-1.5-4b-it" 

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

def load_medical_brain(model_id):
    try:
        print(f"\n Loading Tokenizer & Weights for {model_id}...")
        tokenizer = AutoTokenizer.from_pretrained(model_id, token=HF_TOKEN)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            quantization_config=bnb_config,
            device_map="auto", 
            token=HF_TOKEN,
            trust_remote_code=True
        )
        return model, tokenizer, True
    except Exception as e:
        print(f"\n ERROR LOADING {model_id}: {e}")
        return None, None, False

model, tokenizer, success = load_medical_brain(TARGET_MODEL)

if success:
    print(f"\n SYSTEM ONLINE. VRAM Footprint: {model.get_memory_footprint() / 1e9:.2f} GB")


 Loading Tokenizer & Weights for google/medgemma-1.5-4b-it...


Loading weights:   0%|          | 0/883 [00:00<?, ?it/s]


 SYSTEM ONLINE. VRAM Footprint: 3.17 GB


##  Specialized Agents & Tools
This defines the distinct tools available to our Orchestrator:
1. RAG Search
2. Vitals Updating
3. Face Identification (Aegis-Vision)
4. Report Generation (Aegis-Scribe)
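
The RAG search tool delegates matching to a Supabase RPC whose SQL body is not shown in this notebook. As a hedged sketch of what such a similarity search is assumed to do (filter by patient, score by cosine similarity, drop weak matches, return the top few), here is a pure-Python stand-in. The function `match_records` and the toy records are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_records(records, query_embedding, patient_id, threshold=0.1, count=3):
    """Return up to `count` record texts for one patient, best match first."""
    scored = [
        (cosine_similarity(r["embedding"], query_embedding), r)
        for r in records
        if r["patient_id"] == patient_id
    ]
    kept = [(score, r) for score, r in scored if score > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r["content"] for _, r in kept[:count]]

# Toy 2-D embeddings; real ones would come from the sentence-transformer.
records = [
    {"patient_id": 1, "content": "Hypertension follow-up", "embedding": [1.0, 0.0]},
    {"patient_id": 1, "content": "Knee X-ray", "embedding": [0.0, 1.0]},
    {"patient_id": 2, "content": "Asthma review", "embedding": [1.0, 0.0]},
]
print(match_records(records, [1.0, 0.05], patient_id=1))
```

Filtering on `patient_id` before scoring keeps one patient's records from ever appearing in another patient's context, which is the role `filter_patient_id` plays in the real call.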

In [4]:
#  CELL 4: SPECIALIZED AGENTS & TOOLS
# ==============================================================================
import face_recognition
import numpy as np
import cv2
import re

# --- TOOL 1: RAG MEMORY RECALL ---
def search_patient_history(patient_id, query):
    query_vector = embedder.encode(query).tolist()
    response = supabase.rpc(
        'match_clinical_records',
        {
            'query_embedding': query_vector, 
            'match_threshold': 0.1, 
            'match_count': 3,
            'filter_patient_id': int(patient_id)
        }
    ).execute()
    
    if not response.data:
        return "No relevant history found."
        
    context_str = ""
    for idx, record in enumerate(response.data):
        text = record.get('content', record.get('content_markdown', ''))
        context_str += f"--- RECORD {idx+1} ---\n{text}\n"
    return context_str

# --- TOOL 2: VITALS UPDATER ---
def update_patient_vitals(patient_id, field, value):
    try:
        valid_fields = ['weight_kg', 'height_cm', 'blood_group', 'allergies']
        if field not in valid_fields:
            return f"Error: Cannot update '{field}'."
        supabase.table('patients').update({field: value}).eq('id', patient_id).execute()
        return f"Success. {field} updated."
    except Exception as e:
        return f"Database Error: {e}"

# --- TOOL 3: AEGIS-VISION (FACE ID) ---
def identify_patient_from_image(image_bytes):
    try:
        # Convert bytes to OpenCV format
        nparr = np.frombuffer(image_bytes, np.uint8)
        img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
        rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        # Extract embeddings
        encodings = face_recognition.face_encodings(rgb_img)
        if len(encodings) == 0:
            return {"status": "failed", "message": "No face detected in image."}
            
        query_embedding = encodings[0].tolist()
        
        # Match in Supabase
        response = supabase.rpc(
            'match_patient_face', 
            {'query_embedding': query_embedding, 'match_threshold': 0.90, 'match_count': 1}
        ).execute()
        
        data = response.data
        if len(data) > 0:
            return {"status": "success", "patient_id": data[0]['id'], "name": data[0]['full_name']}
        else:
            return {"status": "failed", "message": "Patient face not found in database."}
    except Exception as e:
        return {"status": "error", "message": str(e)}

# --- TOOL 4: AEGIS-SCRIBE (REPORT GENERATOR) ---
def generate_clinical_report(transcript):
    prompt = f"""You are Aegis-Scribe, a strict medical reporting AI. 
Extract the clinical findings from the following transcript and format them professionally. 
Do not include conversational filler, greetings, or your internal thoughts.

[TRANSCRIPT BEGIN]
{transcript}
[TRANSCRIPT END]

FORMAT REQUIRED:
Provide a clear, structured summary including: Chief Complaint, Findings, and Recommendations."""
    
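    full_prompt = f"<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"
    inputs = tokenizer(full_prompt, return_tensors="pt").to("cuda")
    
    with torch.no_grad():
        # do_sample=True is required for temperature to take effect
        outputs = model.generate(
            **inputs,
            max_new_tokens=400,
            do_sample=True,
            temperature=0.1,
            pad_token_id=tokenizer.eos_token_id
        )
        
    # Decode only the newly generated tokens rather than splitting on "model\n",
    # which breaks if that string appears inside the report itself
    generated_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()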

# --- UTILITY: TEXT CLEANER ---
def clean_llm_output(raw_text):
    cleaned = raw_text
    
    # 0. Strip out everything inside <think>...</think> tags (spanning multiple lines)
    cleaned = re.sub(r'<think>.*?</think>', '', cleaned, flags=re.DOTALL | re.IGNORECASE)
    
    # 1. Strip out the specific <unused94>thought blocks and EVERYTHING until a double newline
    cleaned = re.sub(r'<[^>]*>thought[\s\S]*?(?=\n\n|\Z)', '', cleaned, flags=re.IGNORECASE)
    
    # 2. Strip out numbered reasoning lists (e.g., "1. Identify the persona...")
    cleaned = re.sub(r'(?i)(?:1\.\s+Identify|Identify the persona|Identify the input)[\s\S]*?(?=\n\n|\Z)', '', cleaned)
    
    # 3. Remove any remaining XML tags
    cleaned = re.sub(r'<[^>]*>', '', cleaned)
    
    # 4. Remove markdown and the word "thought" if it leaked
    cleaned = cleaned.replace('*', '').replace('#', '').replace('`', '')
    if cleaned.lower().startswith("thought\n"):
        cleaned = cleaned[8:]
        
    # 5. Prevent the specific system action loop if it leaked
    cleaned = cleaned.replace("[SYSTEM ACTION] Acknowledged.", "")
        
    return cleaned.strip()

##  SUVI Orchestrator Core
The main logic loop that decides whether to search the database, update records, or just chat.
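
The routing decision can be shown in isolation. The sketch below reuses the update pattern and field mapping from the orchestrator cell but wraps them in a hypothetical `route_intent` helper that only classifies the request, without touching the database or the model.

```python
import re

# Same pattern shape as the orchestrator: verb, field keyword, optional "to", value.
UPDATE_PATTERN = re.compile(
    r"(update|change|set)\s+(weight|height|blood|allergy)\s+(?:to\s+)?(.+)",
    re.IGNORECASE,
)
FIELD_MAP = {
    "weight": "weight_kg",
    "height": "height_cm",
    "blood": "blood_group",
    "allergy": "allergies",
}

def route_intent(user_text: str):
    """Return ("update", column, value) or ("search", None, user_text)."""
    match = UPDATE_PATTERN.search(user_text)
    if match:
        column = FIELD_MAP[match.group(2).lower()]
        return ("update", column, match.group(3).strip())
    return ("search", None, user_text)

print(route_intent("Please update weight to 72 kg"))
print(route_intent("Any history of chest pain?"))
```

One limitation worth knowing: the pattern expects the value right after the field keyword, so a phrase like "set blood group to O+" captures "group to O+" as the value. Anything that does not match falls through to the RAG search path.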

In [5]:

#  CELL 5: SUVI ORCHESTRATOR 
# ==============================================================================

def run_suvi_agent(patient_id, user_text, image_context=""):
    tool_result = ""
    
    # 1. Intent Classification: Update Check
    update_match = re.search(r"(update|change|set)\s+(weight|height|blood|allergy)\s+(?:to\s+)?(.+)", user_text, re.IGNORECASE)
    
    if update_match:
        field_map = {"weight": "weight_kg", "height": "height_cm", "blood": "blood_group", "allergy": "allergies"}
        db_field = field_map.get(update_match.group(2).lower())
        value = update_match.group(3).strip()
        
        action_status = update_patient_vitals(patient_id, db_field, value)
        # Pass the real outcome to the model so it cannot report success after a failed update
        tool_result = f"[SYSTEM: {action_status}]"
        
    else:
        # 2. Intent Classification: RAG Search
        history = search_patient_history(patient_id, user_text)
        tool_result = f"[MEDICAL CONTEXT]:\n{history}"

    # 3. LLM Generation - STRICT PROMPT
    system_prompt = f"""You are Suvi, a thoughtful, intelligent, and kind AI assistant for doctors.
{tool_result}
{image_context}

CRITICAL INSTRUCTION: You are a reasoning model. You MUST wrap all of your internal thoughts, reasoning, and planning inside <think> and </think> tags. 
After the </think> tag, write your final, professional response to the doctor."""

    messages = [
        {"role": "user", "content": f"{system_prompt}\n\nDoctor says: {user_text}"}
    ]
    
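    full_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # The chat template already emits the BOS token, so skip adding special tokens again
    inputs = tokenizer(full_prompt, return_tensors="pt", add_special_tokens=False).to("cuda")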
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs, 
            max_new_tokens=500, 
            do_sample=True, 
            temperature=0.1,          
            repetition_penalty=1.15,  
            pad_token_id=tokenizer.eos_token_id
        )
        
    # Extract ONLY the newly generated tokens
    generated_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    raw_response = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
    
    return clean_llm_output(raw_response)

## FastAPI & Ngrok Server
Exposes our agents to the Flutter frontend via REST API endpoints.
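
For quick testing outside the Flutter app, a client only needs to send a form-encoded POST. The sketch below prepares (but does not send) a request to the `/suvi/generate_report` endpoint using only the standard library. The helper name `build_report_request` and the placeholder host are assumptions; the real base URL is whatever ngrok prints at startup.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_report_request(base_url: str, transcript: str) -> Request:
    """Prepare a form-encoded POST to /suvi/generate_report without sending it."""
    body = urlencode({"transcript": transcript}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/suvi/generate_report",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Placeholder host: substitute the public URL printed by ngrok.
req = build_report_request("https://example.invalid", "Patient reports mild fever.")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then json.loads() on the response body.
```

The form field name must match the endpoint's parameter (`transcript`), since FastAPI binds `Form(...)` values by name.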

In [6]:
# CELL 6: FASTAPI SERVER & ENDPOINTS
# ==============================================================================
import uvicorn
import nest_asyncio
from fastapi import FastAPI, UploadFile, File, Form
from fastapi.middleware.cors import CORSMiddleware
from pyngrok import ngrok
import threading

app = FastAPI(title="AEGIS Medical API")
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_credentials=True, allow_methods=["*"], allow_headers=["*"])

# ENDPOINT 1: SUVI CHAT
# ---------------------------------------------------------
@app.post("/suvi/chat")
async def chat_endpoint(text: str = Form(...), patient_id: int = Form(...), file: UploadFile = File(None)):
    try:
        image_context = ""
        if file:
            image_context = "[SYSTEM: User attached an image for review. Proceed carefully.]"
            
        response = run_suvi_agent(patient_id, text, image_context)
        return {"response": response}
    except Exception as e:
        return {"response": f"System Error: {str(e)}"}

# ENDPOINT 2: AEGIS-VISION (FACE ID)
# ---------------------------------------------------------
@app.post("/suvi/identify_face")
async def identify_face(file: UploadFile = File(...)):
    try:
        image_bytes = await file.read()
        result = identify_patient_from_image(image_bytes)
        return result
    except Exception as e:
        return {"status": "error", "message": str(e)}

# ENDPOINT 3: AEGIS-SCRIBE (REPORTING)
# ---------------------------------------------------------
@app.post("/suvi/generate_report")
async def generate_report(transcript: str = Form(...)):
    try:
        report = generate_clinical_report(transcript)
        return {"status": "success", "report": report}
    except Exception as e:
        return {"status": "error", "message": str(e)}

# START SERVER
# ---------------------------------------------------------
def start_server():
    ngrok.kill()
    if NGROK_TOKEN:
        ngrok.set_auth_token(NGROK_TOKEN)
    
    tunnel = ngrok.connect(8000) 
    print(f" API IS LIVE! UPDATE NGROK URL IN FLUTTER APP:")
    print(f" {tunnel.public_url} ")
    
    config = uvicorn.Config(app, host="127.0.0.1", port=8000, log_level="error")
    server = uvicorn.Server(config)
    nest_asyncio.apply()
    server.run()

thread = threading.Thread(target=start_server)
thread.start()

##  API Health Check & Debugging
Run this cell to verify your database and local functions are working correctly before connecting the Flutter app.

In [7]:
#  CELL 7: DIAGNOSTIC TESTS
# ==============================================================================
import time
time.sleep(3) # Wait for server to boot

print("---  RUNNING DIAGNOSTICS ---")

# 1. Test Supabase Connection
res = supabase.table('patients').select('id, full_name').limit(1).execute()
if res.data:
    print(f" DB Connected. Found Patient: {res.data[0]['full_name']}")
else:
    print(" DB Empty or Connection Failed.")

# 2. Test Orchestrator Logic
test_response = run_suvi_agent(1, "What is my patient's name?")
print(f" SUVI Test Response: {test_response}")

print("---  READY FOR FLUTTER CONNECTION ---")

 API IS LIVE! UPDATE NGROK URL IN FLUTTER APP:
 https://camelia-apocynaceous-pendantly.ngrok-free.dev 
---  RUNNING DIAGNOSTICS ---
 DB Connected. Found Patient: Rajesh Kumar
 SUVI Test Response: Okay, here is the response:

Hello Doctor. Based on the information provided in the records, I understand you would like to know the patient's name. However, the records themselves do not include the patient's name. They refer to the patient generically as "Patient".

Could you please confirm the patient's name when we next discuss them? If there's anything else I can assist you with regarding these records, feel free to ask.
---  READY FOR FLUTTER CONNECTION ---
