
AIDA: Artificial Intelligence Diagnostic Assistant

AIDA is an AI-driven, multimodal healthcare assistant that combines live audio capture, facial-emotion analysis, retrieval-augmented generation over user EHR-like data, and a chat UI to provide context-aware medical guidance.

Key capabilities

  • Context-aware chat: Uses OpenAI Assistants with a vector store populated from the user’s historical conversations and metadata in MongoDB.
  • Voice-to-text: Periodic local audio capture and transcription via OpenAI Whisper, streamed into the conversation loop.
  • Affective signals: Frontend streams webcam frames via WebSocket; backend uses Hume Face model to compute emotion scores.
  • Session orchestration: Start/stop capture endpoints; “latest” polling for synchronized, near-real-time chat updates.
  • Personalization: User profile and medical history persisted in MongoDB, referenced inside RAG.

Architecture

flowchart LR
  subgraph Client[Next.js Frontend]
    UI[Chat UI / Pages] --> VC[VideoChat component]
    UI --> CH[ChatHistory component]
    UI --> Setup[Setup Form]
  end

  subgraph Backend[FastAPI Backend]
    API[/REST APIs/]:::api
    WS((WebSocket /ws)):::ws
    Poll[poll_transcript loop]:::svc
    RAG[OpenAI Assistants + Vector Store]:::ext
    STT[OpenAI Whisper]:::ext
    Hume[Hume Face Analysis]:::ext
    DB[(MongoDB)]:::db
  end

  VC -- send frames --> WS
  UI -- POST /get_answer --> API
  UI -- GET /latest --> API
  UI -- GET /get_history --> API
  UI -- GET /start-recording --> API
  UI -- GET /stop-recording --> API
  Setup -- localStorage profile --> UI

  API <--> DB
  API --> RAG
  Poll --> STT
  Poll --> RAG
  WS --> Hume

  classDef api fill:#e8f0fe,stroke:#5c6bc0;
  classDef ws fill:#e0f7fa,stroke:#00acc1;
  classDef db fill:#e8f5e9,stroke:#43a047;
  classDef ext fill:#fff3e0,stroke:#fb8c00;
  classDef svc fill:#f3e5f5,stroke:#8e24aa;

High-level data flow (voice → insight)

sequenceDiagram
  autonumber
  participant User
  participant Frontend as Next.js UI
  participant Backend as FastAPI
  participant Whisper as OpenAI Whisper
  participant Hume as Hume Face API
  participant Assist as OpenAI Assistants
  participant Mongo as MongoDB

  User->>Frontend: Start Session
  Frontend->>Backend: GET /start-recording
  Frontend->>Backend: WS /ws (send frame blobs every 2s)
  Backend->>Hume: analyze(face image)
  Hume-->>Backend: emotion scores

  loop every ~5s
    Frontend->>Backend: mic capture (MediaRecorder) [optional]
    Backend->>Whisper: transcribe(output.wav)
    Whisper-->>Backend: transcript
    Backend->>Assist: ask(question=transcript, tools=file_search)
    Assist->>Mongo: RAG over vector store (prior convos)
    Assist-->>Backend: assistant response
    Frontend->>Backend: GET /latest
    Backend-->>Frontend: {latestUser, latestBot}
  end

  User->>Frontend: Types question
  Frontend->>Backend: POST /get_answer
  Backend->>Assist: ask(question=input)
  Assist-->>Backend: answer
  Backend->>Mongo: push messages[], export vector store
  Backend-->>Frontend: {answer}

  Frontend->>Backend: GET /stop-recording
  Backend->>Assist: summarize(last convo → title)
  Backend->>Mongo: update title

Backend

  • Framework: FastAPI
  • File: backend/app.py
  • Responsibilities:
    • CORS, REST endpoints, WebSocket endpoint
    • Conversation creation, Q&A, titling, and RAG updates
    • Background transcript polling loop and “latest” cache
    • Vector store export on updates
    • Emotion analysis via Hume Face API

REST and WebSocket APIs

  • POST /create_conversation/: Start a new conversation. Input: form email, first_name, last_name. Output: unique_id (UUID, used as the initial title).
  • POST /get_answer/: Get an assistant answer for a question and persist it to the latest conversation. Input: form email, first_name, last_name, question. Output: { answer }.
  • POST /update_conversation_title/: Summarize the latest conversation title using the Assistant. Input: form email, first_name, last_name. Output: { status, new_title }.
  • GET /get_history: Fetch the list of past conversation dates and titles. Input: query email, first_name, last_name. Output: [ { date, title } ].
  • GET /start-recording/: Begin the background audio polling + transcript loop. Input: query email, first_name, last_name. Output: 200.
  • GET /stop-recording/: Stop polling and retitle the latest conversation. Input: query email, first_name, last_name. Output: 200.
  • GET /latest/: Poll the latest user/bot messages produced by the audio loop. Input: none. Output: { latestUser, latestBot, newText }.
  • WS /ws: Receive periodic webcam frames for emotion analysis. Input: binary blobs (jpeg/webm). Effect: server-side updates to average sentiments.

Notes:

  • The Hume emotion averages are maintained in-memory (sentiments, sentCount). The /avgs route exists but is commented out in app.py.
  • The WebSocket in videoProcess.py mirrors app.py but is not used when running app.py. Prefer the single FastAPI app in app.py.
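The endpoints above can be exercised from a small stdlib-only Python client. This is a sketch assuming the backend runs at http://localhost:8000 and accepts form-encoded fields as listed; it is not part of the repository.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8000"  # assumed local dev address

def encode_form(fields):
    """URL-encode a dict of form fields into a POST body."""
    return urllib.parse.urlencode(fields).encode("utf-8")

def ask(email, first_name, last_name, question):
    """POST /get_answer/ and return the parsed { answer } payload."""
    body = encode_form({
        "email": email,
        "first_name": first_name,
        "last_name": last_name,
        "question": question,
    })
    req = urllib.request.Request(f"{BASE}/get_answer/", data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def latest():
    """Poll GET /latest/ for the most recent voice-loop messages."""
    with urllib.request.urlopen(f"{BASE}/latest/") as resp:
        return json.loads(resp.read())
```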

Vector store and RAG strategy

  • export_and_upload_to_vector_store() exports all user documents from MongoDB to a JSON file and re-uploads it to a preconfigured vector store (vs_r70jSDRJR1LyHTCChmWyKTGd).
  • Each Q&A flow uses an Assistant configured with the file_search tool and bound to that vector store to ground answers on user-specific data.
  • Retitling (return_title) uses the same store to summarize the most recent convo.
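A minimal sketch of the export-and-upload step described above. The serialization helper is pure; the upload requires the openai package and OPENAI_API_KEY, and the beta vector-store namespace shown matches openai-python 1.x SDKs (newer versions may expose client.vector_stores instead). Function names here are illustrative, not the exact ones in app.py.

```python
import json
import tempfile

def serialize_users(docs):
    """Serialize Mongo user documents to JSON, dropping the
    non-serializable ObjectId stored under _id."""
    clean = [{k: v for k, v in d.items() if k != "_id"} for d in docs]
    return json.dumps(clean, indent=2, default=str)

def export_and_upload(docs, vector_store_id):
    """Write the export to a temp file and attach it to the vector store.
    openai is imported lazily so the pure helper above works without it."""
    from openai import OpenAI  # requires the openai package + API key
    client = OpenAI()
    with tempfile.NamedTemporaryFile("w+b", suffix=".json") as f:
        f.write(serialize_users(docs).encode("utf-8"))
        f.flush()
        f.seek(0)
        # Namespace assumed from openai-python 1.x; verify against your SDK.
        client.beta.vector_stores.files.upload_and_poll(
            vector_store_id=vector_store_id, file=f
        )
```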

Audio transcription loop

  • backend/wav.py captures 5s WAV segments using sounddevice and writes output.wav.
  • backend/whisper.py calls OpenAI Whisper (whisper-1) to get text; short fragments (<10 chars) are discarded.
  • poll_transcript appends transcripts until silence, then calls return_answer(transcript) and publishes to the latest cache for the frontend to poll.
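The loop above can be sketched as follows. The transcribe, answer, and publish callables are hypothetical stand-ins injected for testability; the sample rate, silence heuristic, and sleep interval are assumptions, not the exact values in wav.py or app.py.

```python
import time

MIN_CHARS = 10  # fragments shorter than this are treated as noise

def keep_fragment(text):
    """Mirror the filter described above: discard transcripts under 10 chars."""
    return len(text.strip()) >= MIN_CHARS

def record_segment(path="output.wav", seconds=5, rate=16000):
    """Capture one WAV segment; requires sounddevice + scipy (lazy imports)."""
    import sounddevice as sd
    from scipy.io import wavfile
    frames = sd.rec(int(seconds * rate), samplerate=rate, channels=1, dtype="int16")
    sd.wait()
    wavfile.write(path, rate, frames)

def poll_transcript(transcribe, answer, publish):
    """Accumulate transcripts until a silent segment, then query the
    assistant and publish the pair to the latest cache."""
    buffer = []
    while True:
        record_segment()
        text = transcribe("output.wav")
        if keep_fragment(text):
            buffer.append(text)
        elif buffer:  # silence after speech: flush the accumulated question
            question = " ".join(buffer)
            publish(question, answer(question))
            buffer = []
        time.sleep(0.1)
```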

Data model (MongoDB aida.users)

A single user document contains profile and conversational history. Conversations are appended and the latest is targeted for updates.

{
  "email": "johndoe@gmail.com",
  "first_name": "John",
  "last_name": "Doe",
  "basic_info": { "age": 32, "height": 178, "weight": 75, "sex": "Male" },
  "medical_history": [
    { "disease": "Hypertension", "severity": 2, "probability": 60 }
  ],
  "past_convos": [
    {
      "date": "09/12/25",
      "title": "a6b1-...-uuid-or-summary",
      "messages": [
        { "role": "user", "content": "I feel dizzy..." },
        { "role": "assistant", "content": "Given your history..." }
      ]
    }
  ]
}
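Given this schema, appending a message to the latest conversation takes a dot-path update. A sketch assuming the document shape above and pymongo; the actual update logic in app.py may differ.

```python
def latest_convo_messages_path(user_doc):
    """Dot-path to the messages array of the most recent conversation."""
    idx = len(user_doc["past_convos"]) - 1
    return f"past_convos.{idx}.messages"

def append_message(collection, email, role, content):
    """Push one message onto the latest conversation for a user.
    `collection` is the aida.users pymongo collection."""
    user = collection.find_one({"email": email})
    path = latest_convo_messages_path(user)
    collection.update_one(
        {"email": email},
        {"$push": {path: {"role": role, "content": content}}},
    )
```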

Frontend

  • Framework: Next.js + React, styling with Tailwind CSS.
  • Key pages/components:
    • pages/index.js: Landing with feature navigation cards.
    • pages/chatbot.js: Main chat; posts to /get_answer/, polls /latest/, renders VideoChat and ChatHistory.
    • components/VideoChat.js: Initializes MediaRecorder, opens WebSocket to ws://localhost:8000/ws, sends frame blobs every 2s; start/stop session endpoints.
    • components/ChatHistory.js: Displays chat sessions (static sample; API call present but commented).
    • pages/setup.js + components/setup.js: Local-only user profile capture stored in localStorage.
    • components/results.js: Patient summary view, sentiment chart (static data unless /avgs is enabled).

Frontend data interactions

sequenceDiagram
  participant UI as Chat UI
  participant BE as FastAPI

  UI->>BE: POST /get_answer (question)
  BE-->>UI: { answer }
  Note over UI,BE: UI also polls GET /latest for voice-loop outputs

  UI->>BE: GET /start-recording (begin polling loop)
  UI->>BE: WS /ws (send frames)
  UI->>BE: GET /stop-recording (retitle latest)

Technical innovations

  • Unified RAG over user timeline: Exporting the entire user corpus to a vector store on each update ensures the Assistant’s file_search has consistent context across sessions without bespoke embedding code.
  • Hybrid interaction loop: Text chat and passive voice capture run concurrently; the frontend reconciles both streams into a single chat timeline.
  • Affective context: Real-time emotion inference from facial signals enables future adaptive responses (e.g., escalation on high distress).
  • Lightweight orchestration: Background polling loop with sounddevice avoids heavy streaming infra while still delivering incremental insights.

Local development

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • MongoDB Atlas URI (or local MongoDB)
  • API keys: OpenAI, Hume

Environment

Create a .env at backend/.env with:

OPENAI_API_KEY=sk-...
WHISPER_KEY=sk-... # if using a distinct key variable for Whisper
MONGODB_URI=mongodb+srv://...
HUME_API_KEY=...
VECTOR_STORE_ID=vs_...

Update backend/app.py to read MONGODB_URI, VECTOR_STORE_ID, and the Hume key from the environment. The current code includes hardcoded values; replace these with env lookups for security.
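A minimal sketch of that env-driven configuration, assuming python-dotenv is installed (the commented lines show where it slots in):

```python
import os

def require_env(name):
    """Fail fast with a clear error when a required variable is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# With python-dotenv installed, load backend/.env first:
#   from dotenv import load_dotenv
#   load_dotenv()
# MONGODB_URI = require_env("MONGODB_URI")
# VECTOR_STORE_ID = require_env("VECTOR_STORE_ID")
# HUME_API_KEY = require_env("HUME_API_KEY")
```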

Run backend

cd backend
pip install -r requirements.txt  # create this file if missing; include fastapi, uvicorn, pymongo, python-dotenv, requests, sounddevice, scipy, hume, openai
uvicorn app:app --reload --host 127.0.0.1 --port 8000

Run frontend

cd frontend
npm install
npm run dev
# open http://localhost:3000

Security and privacy

  • Move credentials to environment variables; never commit keys.
  • Restrict CORS to known origins (currently http://localhost:3000).
  • Consider encrypting sensitive user fields at rest and applying field-level validation.
  • Add authentication/authorization; the current code assumes a single demo user.
  • Handle PHI according to compliance needs; add consent and data retention policies.
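Restricting CORS can be made configurable rather than hardcoded. This sketch assumes an ALLOWED_ORIGINS env variable (a name invented here, not in the repo); the helper is pure, and the commented block shows how it would plug into the existing CORSMiddleware setup in app.py.

```python
def parse_origins(raw, default="http://localhost:3000"):
    """Parse a comma-separated ALLOWED_ORIGINS value into a clean list."""
    items = [o.strip() for o in (raw or default).split(",")]
    return [o for o in items if o]

# Applying it in app.py (CORSMiddleware is already used there):
#   import os
#   app.add_middleware(
#       CORSMiddleware,
#       allow_origins=parse_origins(os.environ.get("ALLOWED_ORIGINS")),
#       allow_credentials=True,
#       allow_methods=["*"],
#       allow_headers=["*"],
#   )
```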

Observability and reliability

  • Add structured logging and request IDs to backend.
  • Expose health/readiness probes for the FastAPI app.
  • Persist Hume emotion aggregates per session instead of in-memory if analytics are needed post-session.
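A stdlib-only sketch of structured logging with per-request IDs; in FastAPI you would typically generate the ID in middleware and attach it via the `extra` argument. Names here are illustrative, not existing code.

```python
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying a per-request ID."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
        })

def make_logger():
    """Configure a logger that writes JSON lines to stderr."""
    logger = logging.getLogger("aida")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

def new_request_id():
    """A fresh opaque ID to thread through one request's log lines."""
    return uuid.uuid4().hex

# Usage: make_logger().info("answered", extra={"request_id": new_request_id()})
```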

Roadmap

  • Enable /avgs and integrate real emotion timelines in results.js.
  • Replace polling with server push (Server-Sent Events or WebSocket) for latest updates.
  • Implement real auth + per-user vector stores or namespaces.
  • Add rate limiting and input validation on all endpoints.
  • Migrate to streaming STT to reduce latency; add VAD.

Repository map

backend/
  app.py               # FastAPI app: REST, WS, RAG, polling
  wav.py               # Audio capture + WAV writer
  whisper.py           # Whisper transcription client
  videoProcess.py      # Separate FastAPI+Socket.IO demo (not primary)
frontend/
  pages/*.js           # Next.js pages (index, chatbot, setup, etc.)
  components/*.js      # VideoChat, ChatHistory, Results, etc.

Notes

  • This README reflects the current code: hardcoded URIs and IDs exist; replace with env-driven configuration for production.
  • If you change the vector store ID or Hume key, keep the code and environment configuration consistent with each other.

About

In a world where medical information can often be overwhelming or unreliable, we wanted to create an intelligent assistant that could provide tailored, AI-driven medical assistance at any time, empowering users to make informed health decisions!
