
Vikhas/MedsightAI


πŸ₯ MedsightAI β€” Real-Time Clinical Decision Assistant

Gemini Live Agent Challenge Hackathon β€” A multimodal AI assistant that helps doctors analyze patient symptoms during rounds using voice and vision.

Built with Gemini, Google Cloud, and Python



🔍 Problem Statement

During hospital rounds, physicians must rapidly assess patient symptoms, recall drug interactions, reference clinical guidelines, and make critical decisions, often under time pressure and with limited access to reference materials. Current tools require manual lookup and switching between multiple systems.

💡 Solution

MedsightAI is a real-time clinical decision support assistant powered by the Gemini Live API. A doctor opens the web app, and then:

  1. 🗣️ The doctor speaks to the AI assistant naturally
  2. 📸 The doctor shows symptoms (rash, wound, X-ray) through the webcam
  3. 🧠 The AI analyzes both voice and image inputs in real time
  4. 🔊 The AI responds with spoken clinical insights
  5. ⚡ The doctor can interrupt at any time (barge-in support)

The AI provides differential diagnoses, severity assessments, drug interaction checks, and clinical guideline references, all through a natural voice conversation.


πŸ—οΈ Architecture

graph TD
    subgraph Client ["Browser (Web App)"]
        UI["UI (Vanilla JS)"]
        Mic["Mic (PCM Audio)"]
        Cam["Webcam (JPEG)"]
        Insights["Clinical Insights Panel"]
    end

    subgraph Backend ["FastAPI Backend (Cloud Run)"]
        Proxy["WebSocket Proxy"]
        GS["Gemini Live Session"]
        Tools["agent_tools.py"]
        OCR["OCR & Report Engine"]
    end

    subgraph Google ["Google Cloud & Gemini"]
        GLA["Gemini Live API (2.5 Flash Audio)"]
        VAI["Gemini 2.5 Flash (Symptom/OCR)"]
    end

    subgraph Ext ["External Services"]
        FDA["OpenFDA API"]
    end

    Mic -->|Audio Binary| Proxy
    Cam -->|JPEG Frames| Proxy
    Proxy <--> GS
    GS <-->|Bidirectional Stream| GLA
    GS --> Tools
    Tools -->|Differential Diagnosis| VAI
    Tools -->|Drug Interactions| FDA
    GS --> OCR
    OCR -->|Medical Reports| Insights
    Insights -.-> UI
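
The diagram shows microphone audio and webcam frames both reaching the WebSocket proxy as binary messages. One way a proxy could tell the two apart is sketched below. Everything in it is an assumption made for illustration: the names `classify_frame` and `to_realtime_input`, the 16 kHz default sample rate, and the idea of sniffing the JPEG magic bytes are not taken from the repository, whose actual framing protocol is not described here.

```python
from enum import Enum


class FrameKind(Enum):
    AUDIO = "audio"
    IMAGE = "image"


# Every JPEG stream begins with the start-of-image marker FF D8 followed by FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def classify_frame(payload: bytes) -> FrameKind:
    """Guess whether a binary message is a JPEG frame or raw PCM audio (hypothetical helper)."""
    if payload.startswith(JPEG_MAGIC):
        return FrameKind.IMAGE
    return FrameKind.AUDIO


def to_realtime_input(payload: bytes, sample_rate: int = 16000) -> dict:
    """Wrap a payload with the MIME type a downstream live session would need.

    The returned dict shape and the default sample rate are assumptions.
    """
    if classify_frame(payload) is FrameKind.IMAGE:
        mime_type = "image/jpeg"
    else:
        mime_type = f"audio/pcm;rate={sample_rate}"
    return {"mime_type": mime_type, "data": payload}
```

Raw PCM could, by coincidence, start with the same three bytes, so a production protocol would more likely tag each message explicitly (for example with a one-byte type prefix) instead of relying on content sniffing.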

🛠️ Tech Stack

| Layer | Technology |
|---|---|
| AI Model | Gemini 2.5 Flash (Native Audio Dialog) |
| Live API | Gemini Live API (WebSocket, real-time multimodal) |
| SDK | Google GenAI Python SDK (google-genai) |
| Backend | Python 3.11 + FastAPI + Uvicorn |
| Frontend | Vanilla JS + Web Audio API + MediaDevices API |
| Styling | Custom CSS (Dark Medical Theme) |
| Deployment | Google Cloud Run |
| Container | Docker |
| CI/CD | Google Cloud Build |

✨ Features

Multimodal Interaction

  • 🎤 Voice input: speak naturally to the AI
  • 📹 Webcam video: show symptoms, X-rays, wounds
  • 🔊 Voice output: the AI responds with natural speech
  • ⚡ Barge-in: interrupt the AI at any time

🚀 The "Beyond Text" Factor

This project breaks the "text box" paradigm. It acts as a true Live Agent for clinical environments, where clinicians' hands are often sterile or occupied. The interaction is fully natural:

  • The agent "Sees, Hears, and Speaks" via Gemini's multimodal Live API.
  • Interruptions (barge-in) are handled gracefully (e.g., "Wait, they are allergic to penicillin").
  • The agent weaves real-time video observation together with medical fact-checking.
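
Barge-in, as described above, comes down to discarding audio that has been generated but not yet played as soon as the doctor starts talking. The sketch below shows that idea in isolation. The `PlaybackQueue` class, the `handle_server_event` function, and the flat event dictionary with `interrupted` and `audio` keys are hypothetical simplifications; the repository's actual handling in `gemini_live.py` and the frontend is not shown in this README.

```python
from collections import deque
from typing import Optional


class PlaybackQueue:
    """Holds audio chunks that are waiting to be played back (hypothetical helper)."""

    def __init__(self) -> None:
        self._chunks = deque()

    def enqueue(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None when nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def flush(self) -> int:
        """Drop all pending audio and report how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped

    def __len__(self) -> int:
        return len(self._chunks)


def handle_server_event(event: dict, queue: PlaybackQueue) -> None:
    """Apply one simplified server event to the playback queue.

    Assumed event shape: {"interrupted": bool} or {"audio": bytes}.
    """
    if event.get("interrupted"):
        # The doctor spoke over the assistant: stale speech must not be played.
        queue.flush()
        return
    audio = event.get("audio")
    if audio:
        queue.enqueue(audio)
```

Flushing on the interruption signal, not on local microphone activity alone, keeps the decision with the model's own voice-activity detection, so background noise in a ward is less likely to cut the assistant off.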

Clinical Agent Tools

  • 🔬 Symptom Analysis: differential diagnoses from visual observation via Gemini Vision.
  • 💊 Drug Interactions: safety checks leveraging the OpenFDA API.
  • 📋 Clinical Guidelines: evidence-based treatment protocols for 12+ major conditions.
  • ⚠️ Risk Assessment: validated NEWS2 (National Early Warning Score 2) calculation.
  • 📄 Prescription OCR: multimodal insight from handwritten medical slips.
  • 🖨️ Automated PDF Reports: summarized clinical prescriptions generated from live audio transcripts.

Premium UI

  • 🌙 Dark medical theme with glassmorphism.
  • 💬 Concurrent streaming bubbles for Doctor & AI transcripts.
  • 🖨️ Optimized print stylesheets for medical letterhead export.

🎬 Demo Scenario

Scene 1: Visual Symptom Analysis

Doctor opens MedsightAI and points the webcam at a rash.

Doctor: "MedsightAI, what do you think about this rash on the patient's forearm?"

MedsightAI: "I can see what appears to be an erythematous, raised rash on the forearm. Let me run an analysis..."

The Clinical Insights panel populates with:

  • Contact Dermatitis: 75% confidence
  • Cellulitis: 60% confidence
  • Recommended tests: Skin biopsy, CBC, IgE levels

Scene 2: Drug Interaction Check (Interruption)

Doctor interrupts: "Wait, the patient is allergic to penicillin. What antibiotics are safe?"

MedsightAI immediately stops speaking and responds:

MedsightAI: "Given the penicillin allergy, amoxicillin is contraindicated. Safe alternatives include azithromycin, doxycycline, or trimethoprim-sulfamethoxazole..."
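
A safety check like the one in this scene could be grounded in openFDA drug labeling data. The sketch below builds a query against the public openFDA drug label endpoint and pulls the safety-related sections out of a response. The function names `build_label_query` and `extract_safety_sections` are hypothetical, and the README does not say which openFDA endpoint or fields the project's own tool uses, so treat the choice of the label endpoint as an assumption.

```python
from urllib.parse import urlencode

OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"


def build_label_query(generic_name: str, limit: int = 1) -> str:
    """Build a drug label search URL for one generic drug name."""
    params = {
        "search": f'openfda.generic_name:"{generic_name}"',
        "limit": limit,
    }
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def extract_safety_sections(response: dict) -> dict:
    """Collect safety-related label sections from a parsed JSON response.

    Label sections arrive as lists of strings; each list is joined into one string.
    Sections missing from the label are simply left out of the result.
    """
    results = response.get("results") or []
    if not results:
        return {}
    label = results[0]
    wanted = ("contraindications", "drug_interactions", "warnings")
    return {name: " ".join(label[name]) for name in wanted if name in label}
```

The URL can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the body parsed with `json.load` before being passed to `extract_safety_sections`. Label text is free-form prose written for clinicians, so a tool built this way would still need the model, or a human, to interpret it.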


🚀 Getting Started

Prerequisites

1. Clone the repository

git clone https://github.com/YOUR_USERNAME/medsight-ai.git
cd medsight-ai

2. Set up the backend

cd backend

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Set your API key in .env
cp .env.example .env

3. Run locally

python main.py

Navigate to http://localhost:8000 in your browser.


☁️ Deploying to Google Cloud

Prerequisites

Deploy with the script

export GEMINI_API_KEY=your_key_here
chmod +x infrastructure/deploy.sh
./infrastructure/deploy.sh my-project-id us-central1

πŸ“ Project Structure

medsight-ai/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ main.py              # FastAPI server
β”‚   β”œβ”€β”€ gemini_live.py        # Gemini Live API wrapper
β”‚   β”œβ”€β”€ agent_tools.py        # Clinical reasoning tools (NEWS2, OpenFDA)
β”‚   β”œβ”€β”€ requirements.txt      # Python dependencies
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ index.html            # Main web page
β”‚   β”œβ”€β”€ css/styles.css        # Premium dark theme
β”‚   └── js/                   # WebSocket & Media handlers
β”œβ”€β”€ prompts/
β”‚   └── system_prompt.txt     # Clinical system prompt
β”œβ”€β”€ infrastructure/           # Cloud Run & Build configs
β”œβ”€β”€ Dockerfile                # Container definition
└── README.md

⚖️ Disclaimer

MedsightAI is a demonstration project built for the Gemini Live Agent Challenge hackathon. It is not a certified medical device and should not be used for actual clinical decision-making.


Built with ❤️ for the Gemini Live Agent Challenge
Powered by Google Gemini and Google Cloud
