AIDA is an AI-driven, multimodal healthcare assistant that combines live audio capture, facial-emotion analysis, retrieval-augmented generation over user EHR-like data, and a chat UI to provide context-aware medical guidance.
- Context-aware chat: Uses OpenAI Assistants with a vector store populated from the user’s historical conversations and metadata in MongoDB.
- Voice-to-text: Periodic local audio capture and transcription via OpenAI Whisper, streamed into the conversation loop.
- Affective signals: Frontend streams webcam frames via WebSocket; backend uses Hume Face model to compute emotion scores.
- Session orchestration: Start/stop capture endpoints; “latest” polling for synchronized, near-real-time chat updates.
- Personalization: User profile and medical history persisted in MongoDB, referenced inside RAG.
```mermaid
flowchart LR
  subgraph Client[Next.js Frontend]
    UI[Chat UI / Pages] --> VC[VideoChat component]
    UI --> CH[ChatHistory component]
    UI --> Setup[Setup Form]
  end
  subgraph Backend[FastAPI Backend]
    API[/REST APIs/]:::api
    WS((WebSocket /ws)):::ws
    Poll[poll_transcript loop]:::svc
    RAG[OpenAI Assistants + Vector Store]:::ext
    STT[OpenAI Whisper]:::ext
    Hume[Hume Face Analysis]:::ext
    DB[(MongoDB)]:::db
  end
  VC -- send frames --> WS
  UI -- POST /get_answer --> API
  UI -- GET /latest --> API
  UI -- GET /get_history --> API
  UI -- GET /start-recording --> API
  UI -- GET /stop-recording --> API
  Setup -- localStorage profile --> UI
  API <--> DB
  API --> RAG
  Poll --> STT
  Poll --> RAG
  WS --> Hume
  classDef api fill:#e8f0fe,stroke:#5c6bc0;
  classDef ws fill:#e0f7fa,stroke:#00acc1;
  classDef db fill:#e8f5e9,stroke:#43a047;
  classDef ext fill:#fff3e0,stroke:#fb8c00;
  classDef svc fill:#f3e5f5,stroke:#8e24aa;
```
```mermaid
sequenceDiagram
  autonumber
  participant User
  participant Frontend as Next.js UI
  participant Backend as FastAPI
  participant Whisper as OpenAI Whisper
  participant Hume as Hume Face API
  participant Assist as OpenAI Assistants
  participant Mongo as MongoDB
  User->>Frontend: Start Session
  Frontend->>Backend: GET /start-recording
  Frontend->>Backend: WS /ws (send frame blobs every 2s)
  Backend->>Hume: analyze(face image)
  Hume-->>Backend: emotion scores
  loop every ~5s
    Frontend->>Backend: mic capture (MediaRecorder) [optional]
    Backend->>Whisper: transcribe(output.wav)
    Whisper-->>Backend: transcript
    Backend->>Assist: ask(question=transcript, tools=file_search)
    Assist->>Mongo: RAG over vector store (prior convos)
    Assist-->>Backend: assistant response
    Frontend->>Backend: GET /latest
    Backend-->>Frontend: {latestUser, latestBot}
  end
  User->>Frontend: Types question
  Frontend->>Backend: POST /get_answer
  Backend->>Assist: ask(question=input)
  Assist-->>Backend: answer
  Backend->>Mongo: push messages[], export vector store
  Backend-->>Frontend: {answer}
  Frontend->>Backend: GET /stop-recording
  Backend->>Assist: summarize(last convo → title)
  Backend->>Mongo: update title
```
- Framework: `FastAPI`
- File: `backend/app.py`
- Responsibilities:
  - CORS, REST endpoints, WebSocket endpoint
  - Conversation creation, Q&A, titling, and RAG updates
  - Background transcript polling loop and "latest" cache
  - Vector store export on updates
  - Emotion analysis via Hume Face API
| Method | Path | Purpose | Input | Output |
|---|---|---|---|---|
| POST | /create_conversation/ | Start a new conversation | form: email, first_name, last_name | unique_id (UUID, used as initial title) |
| POST | /get_answer/ | Get assistant answer for a question and persist to latest convo | form: email, first_name, last_name, question | { answer } |
| POST | /update_conversation_title/ | Summarize latest convo title using Assistant | form: email, first_name, last_name | { status, new_title } |
| GET | /get_history | Fetch list of past conversation dates and titles | query: email, first_name, last_name | [ { date, title } ] |
| GET | /start-recording/ | Begin background audio polling + transcript loop | query: email, first_name, last_name | 200 |
| GET | /stop-recording/ | Stop polling and retitle latest convo | query: email, first_name, last_name | 200 |
| GET | /latest/ | Poll latest user/bot messages produced by audio loop | none | { latestUser, latestBot, newText } |
| WS | /ws | Receive periodic webcam frames for emotion analysis | binary blobs (jpeg/webm) | server-side updates of average sentiments |
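For a quick smoke test, the REST surface above can be exercised from Python. This is a hypothetical client, assuming the backend is running on `localhost:8000` and the demo user from the schema; requests are only sent when the script is run directly.

```python
# Hypothetical client for the endpoints in the table above.
# Network calls run only under __main__ so the module imports cleanly.
import requests

BASE = "http://localhost:8000"
PROFILE = {"email": "johndoe@gmail.com", "first_name": "John", "last_name": "Doe"}

def ask(question: str) -> str:
    """POST /get_answer/ with the demo profile and return the answer text."""
    resp = requests.post(f"{BASE}/get_answer/", data={**PROFILE, "question": question})
    resp.raise_for_status()
    return resp.json()["answer"]

def poll_latest() -> dict:
    """GET /latest/ to pick up messages produced by the voice loop."""
    return requests.get(f"{BASE}/latest/").json()

if __name__ == "__main__":
    requests.get(f"{BASE}/start-recording/", params=PROFILE)
    print(ask("I have a persistent headache, what should I check?"))
    print(poll_latest())
    requests.get(f"{BASE}/stop-recording/", params=PROFILE)
```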
Notes:
- The Hume emotion averages are maintained in-memory (`sentiments`, `sentCount`). The `/avgs` route exists but is commented out in `app.py`.
- The WebSocket in `videoProcess.py` mirrors `app.py` but is not used when running `app.py`. Prefer the single FastAPI app in `app.py`.
- `export_and_upload_to_vector_store()` exports all user documents from MongoDB to a JSON file and re-uploads it to a preconfigured vector store (`vs_r70jSDRJR1LyHTCChmWyKTGd`).
- Each Q&A flow uses an Assistant configured with the `file_search` tool and bound to that vector store to ground answers in user-specific data.
- Retitling (`return_title`) uses the same store to summarize the most recent convo.
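The in-memory emotion averaging can be sketched as a running mean. This is a simplified sketch following the `sentiments`/`sentCount` globals mentioned above; the exact update in `app.py` may differ.

```python
# Running mean of per-frame emotion scores, held in module-level state
# analogous to the `sentiments` / `sentCount` globals.
sentiments: dict[str, float] = {}   # emotion name -> running average score
sent_count = 0                      # number of frames folded in so far

def update_averages(frame_scores: dict[str, float]) -> dict[str, float]:
    """Fold one frame's emotion scores into the running averages."""
    global sent_count
    sent_count += 1
    for emotion, score in frame_scores.items():
        prev = sentiments.get(emotion, 0.0)
        # Incremental mean: new_avg = old_avg + (x - old_avg) / n
        sentiments[emotion] = prev + (score - prev) / sent_count
    return sentiments

update_averages({"Calmness": 0.8, "Anxiety": 0.2})
update_averages({"Calmness": 0.6, "Anxiety": 0.4})
# averages are now Calmness 0.7, Anxiety 0.3
```

The incremental form avoids storing every frame's scores, which matches keeping only two small pieces of state between WebSocket messages.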
- `backend/wav.py` captures 5 s WAV segments using `sounddevice` and writes `output.wav`.
- `backend/whisper.py` calls OpenAI Whisper (`whisper-1`) to get text; short fragments (under 10 characters) are discarded.
- `poll_transcript` appends transcripts until silence, then calls `return_answer(transcript)` and publishes to the `latest` cache for the frontend to poll.
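The accumulate-until-silence control flow can be sketched with the capture and transcription steps injected as parameters, so it runs without `sounddevice` or an API key. Function and parameter names here are illustrative, not the exact `app.py` internals.

```python
# Sketch of the poll_transcript control flow: accumulate transcribed 5 s
# chunks until a silent segment arrives, then hand the buffer to the
# assistant and publish the result to the `latest` cache.
from typing import Callable, Iterable

def poll_transcript(
    segments: Iterable[str],              # transcribed 5 s chunks ("" = silence)
    answer: Callable[[str], str],         # e.g. return_answer() backed by the Assistant
    latest: dict,                         # cache served by GET /latest/
    min_len: int = 10,                    # fragments shorter than this are discarded
) -> None:
    buffer = ""
    for text in segments:
        if len(text.strip()) >= min_len:
            buffer += " " + text.strip()
        elif buffer:
            # Silence after speech: flush the buffer to the assistant.
            question = buffer.strip()
            latest.update(latestUser=question, latestBot=answer(question), newText=True)
            buffer = ""

latest = {"latestUser": "", "latestBot": "", "newText": False}
poll_transcript(
    ["I feel dizzy when I", "stand up quickly", ""],   # "" simulates silence
    answer=lambda q: f"(stub answer to: {q})",
    latest=latest,
)
```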
A single user document contains profile and conversational history. Conversations are appended and the latest is targeted for updates.
```json
{
  "email": "johndoe@gmail.com",
  "first_name": "John",
  "last_name": "Doe",
  "basic_info": { "age": 32, "height": 178, "weight": 75, "sex": "Male" },
  "medical_history": [
    { "disease": "Hypertension", "severity": 2, "probability": 60 }
  ],
  "past_convos": [
    {
      "date": "09/12/25",
      "title": "a6b1-...-uuid-or-summary",
      "messages": [
        { "role": "user", "content": "I feel dizzy..." },
        { "role": "assistant", "content": "Given your history..." }
      ]
    }
  ]
}
```

- Framework: `Next.js` + `React`, styling with `Tailwind CSS`.
- Key pages/components:
  - `pages/index.js`: Landing with feature navigation cards.
  - `pages/chatbot.js`: Main chat; posts to `/get_answer/`, polls `/latest/`, renders `VideoChat` and `ChatHistory`.
  - `components/VideoChat.js`: Initializes `MediaRecorder`, opens a WebSocket to `ws://localhost:8000/ws`, sends frame blobs every 2s; calls the start/stop session endpoints.
  - `components/ChatHistory.js`: Displays chat sessions (static sample; API call present but commented out).
  - `pages/setup.js` + `components/setup.js`: Local-only user profile capture stored in `localStorage`.
  - `components/results.js`: Patient summary view and sentiment chart (static data unless `/avgs` is enabled).
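Appending one Q&A exchange to the most recent conversation in the document above can be expressed as a `pymongo` update. This is a sketch: the collection name `users` and the prior lookup of the conversation index are assumptions about the actual code, since MongoDB update paths need a concrete array position.

```python
# Build the filter/update documents for appending one Q&A exchange to a
# conversation at a known index in past_convos.
def append_exchange(convo_index: int, question: str, answer: str):
    filter_doc = {"email": "johndoe@gmail.com"}  # demo user from the schema above
    path = f"past_convos.{convo_index}.messages"
    update_doc = {
        "$push": {
            path: {
                "$each": [
                    {"role": "user", "content": question},
                    {"role": "assistant", "content": answer},
                ]
            }
        }
    }
    return filter_doc, update_doc

f, u = append_exchange(0, "I feel dizzy...", "Given your history...")
# With pymongo this would be applied as:
#   users.update_one(f, u)  # `users` is the MongoDB collection handle
```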
sequenceDiagram
participant UI as Chat UI
participant BE as FastAPI
UI->>BE: POST /get_answer (question)
BE-->>UI: { answer }
Note over UI,BE: UI also polls GET /latest for voice-loop outputs
UI->>BE: GET /start-recording (begin polling loop)
UI->>BE: WS /ws (send frames)
UI->>BE: GET /stop-recording (retitle latest)
- Unified RAG over user timeline: Exporting the entire user corpus to a vector store on each update ensures the Assistant’s file_search has consistent context across sessions without bespoke embedding code.
- Hybrid interaction loop: Text chat and passive voice capture run concurrently; the frontend reconciles both streams into a single chat timeline.
- Affective context: Real-time emotion inference from facial signals enables future adaptive responses (e.g., escalation on high distress).
- Lightweight orchestration: A background polling loop with `sounddevice` avoids heavy streaming infrastructure while still delivering incremental insights.
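The "export the whole corpus, re-upload" approach can be sketched as follows. The serialization half runs as-is; the upload half is shown commented out because it needs a live OpenAI client, and the exact vector-store calls vary by `openai` SDK version, so treat those lines as an assumed shape to verify against your installed version.

```python
# Sketch of the export half of export_and_upload_to_vector_store():
# serialize every user document into one JSON file that file_search can index.
import json
import os
import tempfile

def export_users(docs, path):
    """Write all user documents to a single JSON file for the vector store."""
    with open(path, "w") as fh:
        # default=str handles non-JSON types such as Mongo ObjectIds
        json.dump(list(docs), fh, default=str, indent=2)
    return path

demo_docs = [{"email": "johndoe@gmail.com", "past_convos": []}]
path = export_users(demo_docs, os.path.join(tempfile.gettempdir(), "user_corpus.json"))

# Upload/re-attach step (assumed SDK shape; check your openai version):
#   client = openai.OpenAI()
#   f = client.files.create(file=open(path, "rb"), purpose="assistants")
#   client.beta.vector_stores.files.create(
#       vector_store_id=os.environ["VECTOR_STORE_ID"], file_id=f.id)
```

Re-exporting everything on each update is simple and keeps `file_search` consistent, at the cost of redundant uploads as the corpus grows.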
- Python 3.10+
- Node.js 18+
- MongoDB Atlas URI (or local MongoDB)
- API keys: OpenAI, Hume
Create a `.env` at `backend/.env` with:

```env
OPENAI_API_KEY=sk-...
WHISPER_KEY=sk-...        # if using a distinct key variable for Whisper
MONGODB_URI=mongodb+srv://...
HUME_API_KEY=...
VECTOR_STORE_ID=vs_...
```

Update `backend/app.py` to read `MONGODB_URI`, `VECTOR_STORE_ID`, and the Hume key from the environment (recommended). The current code includes hardcoded values; replace these with env lookups for security.
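A minimal env-driven loader might look like this (a sketch: variable names follow the `.env` above, and `python-dotenv` is treated as optional):

```python
# Read configuration from the environment instead of hardcoding it.
import os

try:
    from dotenv import load_dotenv  # optional: pulls backend/.env into os.environ
    load_dotenv()
except ImportError:
    pass  # fine if python-dotenv is absent and vars are set another way

def require(name: str) -> str:
    """Fail fast with a clear message when a required variable is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Usage inside app.py (names match the .env sketch above):
# MONGODB_URI = require("MONGODB_URI")
# VECTOR_STORE_ID = require("VECTOR_STORE_ID")
# HUME_API_KEY = require("HUME_API_KEY")
```

Failing at startup with a named variable beats a cryptic connection error deep inside a request handler.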
```bash
cd backend
pip install -r requirements.txt  # create this file if missing; include fastapi, uvicorn, pymongo, python-dotenv, requests, sounddevice, scipy, hume, openai
uvicorn app:app --reload --host 127.0.0.1 --port 8000
```

```bash
cd frontend
npm install
npm run dev
# open http://localhost:3000
```

- Move credentials to environment variables; never commit keys.
- Restrict CORS to known origins (currently `http://localhost:3000`).
- Consider encrypting sensitive user fields at rest and applying field-level validation.
- Add authentication/authorization; the example addresses a fixed demo user.
- Handle PHI according to compliance needs; add consent and data retention policies.
- Add structured logging and request IDs to backend.
- Expose health/readiness probes for the FastAPI app.
- Persist Hume emotion aggregates per session instead of in-memory if analytics are needed post-session.
- Enable `/avgs` and integrate real emotion timelines in `results.js`.
- Replace polling with server push (Server-Sent Events or WebSocket) for `latest` updates.
- Implement real auth + per-user vector stores or namespaces.
- Add rate limiting and input validation on all endpoints.
- Migrate to streaming STT to reduce latency; add VAD.
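The rate-limiting item can start as a simple per-client token bucket before reaching for middleware. This is an illustrative sketch, not tied to any specific library; the client key could be an email or IP address.

```python
# Per-client token bucket: each client may burst up to `capacity` requests,
# refilled continuously at `rate` tokens per second.
import time

class TokenBucket:
    def __init__(self, capacity=10, rate=1.0):
        self.capacity = capacity
        self.rate = rate
        self.buckets = {}  # client -> (tokens, last_timestamp)

    def allow(self, client, now=None):
        """Return True and spend a token if the client is under its limit."""
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(client, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[client] = (tokens, now)
            return False
        self.buckets[client] = (tokens - 1, now)
        return True

limiter = TokenBucket(capacity=2, rate=1.0)
# Two immediate requests pass; a third in the same instant is throttled
# until the bucket refills.
```

In FastAPI this would typically be called from a dependency that extracts the client identity and returns HTTP 429 when `allow()` is False.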
```
backend/
  app.py          # FastAPI app: REST, WS, RAG, polling
  wav.py          # Audio capture + WAV writer
  whisper.py      # Whisper transcription client
  videoProcess.py # Separate FastAPI+Socket.IO demo (not primary)
frontend/
  pages/*.js      # Next.js pages (index, chatbot, setup, etc.)
  components/*.js # VideoChat, ChatHistory, Results, etc.
```
- This README reflects the current code: hardcoded URIs and IDs exist; replace them with env-driven configuration for production.
- If the vector store ID or Hume key ever changes, update both the code and the environment so the two stay consistent.