# MediVault AI — Offline Clinical Intelligence Platform

A fully offline clinical intelligence platform that captures doctor-patient consultations via browser-based audio recording or direct file upload, transcribes speech using a local Whisper ASR container, performs LLM-driven speaker diarization, and generates structured SOAP notes with ICD-10 and CPT billing codes — all without any cloud dependency. Approved notes are embedded into a persistent ChromaDB vector store, enabling retrieval-augmented clinical Q&A and PDF knowledge base ingestion entirely within a local Docker environment.
MediVault AI demonstrates how modern open-source AI components can be composed into a clinical documentation workflow that operates entirely on-premises. The platform accepts raw consultation audio, produces a structured SOAP note with billing codes, stores approved notes in a semantic vector database, and exposes a conversational clinical Q&A interface backed by retrieval-augmented generation — all without transmitting any data to external services.
This makes MediVault AI suitable for:
- Clinical AI research — reference implementation of a full speech-to-note pipeline
- Air-gapped environments — run fully offline with Ollama and locally hosted models
- Clinical informatics engineering — integrate Whisper ASR, Ollama inference, Flowise chain orchestration, and ChromaDB vector storage
- Healthcare AI prototyping — build and evaluate offline clinical documentation tooling
- The clinician records a consultation in the browser or uploads a WAV or MP3 file.
- The React frontend sends the audio to the FastAPI backend.
- The backend forwards the audio to the Whisper ASR container and receives a timestamped transcript.
- The backend sends the transcript segments to Ollama for LLM-driven speaker diarization — each segment is labelled Doctor or Patient.
- The diarized transcript is sent to the Flowise SOAP Generator, which invokes an LLMChain with a specialty-aware prompt via Ollama and returns a structured SOAP note.
- The clinician reviews and edits the SOAP note, then requests ICD-10 and CPT billing codes from the backend.
- After review, the clinician approves the note — the backend embeds it into ChromaDB using `nomic-embed-text` via the direct Python client.
- Approved notes and uploaded clinical PDFs are immediately available to the Clinical QA system, which retrieves relevant passages and passes them to Ollama for grounded answers.
All inference runs through Ollama on the host machine. No data leaves the local environment.
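A condensed sketch of the transcription and diarization legs of this pipeline, assuming the default service URLs on the Docker network (the `process_consultation` helper and the diarization prompt are illustrative, not the backend's actual code):

```python
import requests

WHISPER = "http://medivault-whisper:9000"     # Whisper ASR container
OLLAMA = "http://host.docker.internal:11434"  # Ollama on the host

def process_consultation(audio_path: str) -> dict:
    # 1. Speech-to-text: the Whisper webservice returns timestamped segments.
    with open(audio_path, "rb") as f:
        transcript = requests.post(
            f"{WHISPER}/asr",
            params={"task": "transcribe", "output": "json"},
            files={"audio_file": f},
        ).json()

    # 2. LLM-driven diarization: ask Ollama to label every segment.
    prompt = (
        "Label each consultation segment as Doctor or Patient:\n"
        + "\n".join(seg["text"] for seg in transcript["segments"])
    )
    labels = requests.post(
        f"{OLLAMA}/api/generate",
        json={"model": "llama3.1:8b", "prompt": prompt, "stream": False},
    ).json()["response"]

    return {"segments": transcript["segments"], "diarization": labels}
```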
The application follows a modular five-service architecture. The React frontend communicates exclusively with the FastAPI backend. The backend delegates speech-to-text to the Whisper container, invokes Flowise LLM chains for SOAP generation, writes and queries embeddings directly to ChromaDB, and calls Ollama for diarization, billing codes, and clinical Q&A. Flowise flows are auto-provisioned at startup so the stack is fully operational on the first `docker compose up`.
```mermaid
graph TB
subgraph Client Layer
UI[React UI<br/>port 3000]
end
subgraph Backend Layer
API[FastAPI<br/>port 5001]
end
subgraph Flowise Orchestration Layer
FW[Flowise<br/>port 3001]
end
subgraph Vector Store
CDB[ChromaDB<br/>port 8100]
end
subgraph Speech Processing
WH[Whisper ASR<br/>port 9000]
end
subgraph LLM Inference
OL[Ollama<br/>host machine]
end
UI -->|HTTP / Axios| API
API -->|Audio upload| WH
API -->|Chain invocation| FW
API -->|Direct Python client| CDB
FW -->|LLM calls| OL
API -->|Embeddings + diarization| OL
```
### Frontend (React + Vite)
- Consultation Recorder — mode toggle between browser recording and file upload, specialty selector, diarized transcript view with Doctor/Patient colour-coded labels
- SOAP Note Editor — human-in-the-loop review with editable sections, billing code generation, and approve-to-knowledge-base action
- Clinical Chat — conversational Q&A with cited source documents
- Knowledge Base — document list with PDF upload and document deletion
- Nginx serves the production build and proxies all `/api/` requests to the backend
### Backend Services
- API Server (`server.py`): FastAPI application with CORS middleware, request validation, and all route handlers
- Whisper Client (`services/whisper_client.py`): Submits audio to the Whisper ASR container and returns timestamped segments
- LLM Client (`services/llm_client.py`): Calls Ollama directly for speaker diarization, billing code generation, and clinical Q&A
- Flowise Client (`services/flowise_client.py`): Invokes Flowise prediction and upsert endpoints
- Flowise Provisioner (`services/flowise_provisioner.py`): Auto-creates the three Flowise flows at API startup if they do not already exist
- Chroma Client (`services/chroma_client.py`): Writes and queries the `clinical_kb` ChromaDB collection using the direct Python client
- PDF Service (`services/pdf_service.py`): Validates and extracts text from uploaded PDF files
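As an illustration of the PDF service's core job, text extraction with `pypdf` can be as small as the sketch below (the size guard mirrors `MAX_FILE_SIZE`; the function name and validation details are assumptions, not the module's actual API):

```python
import os
from pypdf import PdfReader

def extract_pdf_text(path: str, max_bytes: int = 10 * 1024 * 1024) -> str:
    # Reject oversized uploads before parsing, mirroring MAX_FILE_SIZE.
    if os.path.getsize(path) > max_bytes:
        raise ValueError("PDF exceeds the configured MAX_FILE_SIZE limit")
    reader = PdfReader(path)
    # extract_text() returns None for pages without a text layer (e.g. scans).
    return "\n".join((page.extract_text() or "") for page in reader.pages)
```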
### External Integration
- LLM inference: Ollama running natively on the host machine, accessed from containers via `host.docker.internal:11434`
- LLM orchestration: Flowise running as a Docker service, auto-provisioned with three flows at startup
- Vector store: ChromaDB running as a Docker service with a persistent named volume
| Service | Container | Host Port | Description |
|---|---|---|---|
| `medivault-api` | `medivault-api` | 5001 | FastAPI backend — transcription, SOAP generation, RAG, billing codes |
| `medivault-ui` | `medivault-ui` | 3000 | React frontend — served by Nginx, proxies `/api/` to the backend |
| `medivault-flowise` | `medivault-flowise` | 3001 | Flowise — LLM chain orchestration, auto-provisioned flows |
| `medivault-chromadb` | `medivault-chromadb` | 8100 | ChromaDB — persistent vector store for clinical knowledge base |
| `medivault-whisper` | `medivault-whisper` | 9000 | Whisper ASR — speech-to-text with timestamped segment output |
Ollama is intentionally not a Docker service. Running Ollama inside a container typically forfeits GPU acceleration. Ollama must run natively on the host so the backend and Flowise containers can reach it via `host.docker.internal:11434`.
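To confirm this wiring from inside a container, a one-off check against Ollama's model-listing endpoint (`/api/tags`) looks like this (a sketch, not part of the project's code):

```python
import requests

# localhost inside a container is the container itself; Ollama on the host
# is only reachable through the host.docker.internal alias.
resp = requests.get("http://host.docker.internal:11434/api/tags", timeout=5)
models = [m["name"] for m in resp.json().get("models", [])]
print("Ollama reachable; installed models:", models)
assert any(m.startswith("llama3.1") for m in models), "run: ollama pull llama3.1:8b"
```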
- Clinician records or uploads consultation audio in the web UI.
- The backend transcribes the audio via Whisper ASR and receives timestamped segments.
- The backend calls Ollama to classify each segment as Doctor or Patient.
- The diarized transcript is sent to the Flowise SOAP Generator — Flowise invokes the LLMChain via Ollama and returns structured SOAP JSON.
- The clinician reviews the note and requests billing codes — the backend calls Ollama directly for ICD-10 and CPT suggestions.
- The clinician approves the note — the backend embeds it into ChromaDB via the direct Python client.
- The clinician asks a clinical question — the backend queries ChromaDB for relevant passages, passes them to Ollama, and returns a grounded answer with cited sources.
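For the Ollama-direct steps, billing-code suggestion is essentially one structured-output call. A sketch assuming Ollama's JSON mode (`format: "json"`); the prompt wording and response schema are illustrative, not the project's actual prompt:

```python
import json
import requests

def suggest_billing_codes(soap_note: str) -> dict:
    # format="json" constrains the model to emit syntactically valid JSON.
    resp = requests.post(
        "http://host.docker.internal:11434/api/generate",
        json={
            "model": "llama3.1:8b",
            "format": "json",
            "stream": False,
            "prompt": (
                "Suggest billing codes for this SOAP note. Respond as JSON: "
                '{"icd10": ["..."], "cpt": ["..."]}\n\n' + soap_note
            ),
        },
    )
    return json.loads(resp.json()["response"])
```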
Before you begin, ensure you have the following installed and configured:
- Docker and Docker Compose (v2)
- Ollama installed natively on the host machine with the required models:
```bash
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

Verify your setup:

```bash
docker --version
docker compose version
docker ps
ollama list
```

Clone the repository:

```bash
git clone https://github.com/cld2labs/MediVaultAI.git
cd MediVaultAI
```

Create the environment file:

```bash
cp .env.example .env
```

Open `.env` and confirm the Ollama and service URLs match your environment. See Environment Variables for all available settings.
Start the stack:

```bash
# Standard (attached)
docker compose up --build

# Detached (background)
docker compose up -d --build
```

Once containers are running:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:5001
- API Docs (Swagger): http://localhost:5001/docs
- Flowise Canvas: http://localhost:3001
```bash
# Health check
curl http://localhost:5001/health

# View running containers
docker compose ps
```

View logs:

```bash
# All services
docker compose logs -f

# Backend only
docker compose logs -f medivault-api

# Flowise only
docker compose logs -f medivault-flowise
```

Stop the stack:

```bash
docker compose down
```

Run the backend and frontend directly on the host without Docker. Start the required containers first:
```bash
docker compose up medivault-chromadb medivault-whisper medivault-flowise
```

### Backend (Python / FastAPI)
```bash
cd api
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp ../.env.example ../.env  # configure CHROMA_HOST=localhost, WHISPER_ENDPOINT=http://localhost:9000
uvicorn server:app --reload --port 5001
```

### Frontend (Node / Vite)
```bash
cd ui
npm install
npm run dev
```

The Vite dev server proxies `/api/` to `http://localhost:5001`. Open http://localhost:5173.
```
MediVaultAI/
├── api/ # FastAPI backend
│ ├── config.py # All environment-driven settings
│ ├── models.py # Pydantic request/response schemas
│ ├── server.py # FastAPI app, routes, and middleware
│ ├── services/
│ │ ├── chroma_client.py # ChromaDB direct Python client
│ │ ├── flowise_client.py # Flowise prediction and upsert
│ │ ├── flowise_provisioner.py # Auto-provision flows at startup
│ │ ├── llm_client.py # Ollama calls for diarization, billing, QA
│ │ ├── pdf_service.py # PDF validation and text extraction
│ │ └── whisper_client.py # Whisper ASR transcription
│ ├── Dockerfile
│ └── requirements.txt
├── ui/ # React frontend
│ ├── src/
│ │ ├── App.jsx
│ │ ├── components/
│ │ │ ├── ClinicalChat.jsx
│ │ │ ├── ConsultationRecorder.jsx
│ │ │ ├── FlowCanvas.jsx
│ │ │ ├── Header.jsx
│ │ │ ├── KnowledgeBase.jsx
│ │ │ ├── LandingPage.jsx
│ │ │ ├── SoapNoteEditor.jsx
│ │ │ └── StatusBadge.jsx
│ │ └── main.jsx
│ ├── Dockerfile
│ └── nginx.conf
├── docs/
│ └── assets/ # Documentation images
├── docker-compose.yaml # Main orchestration file
├── .env.example # Environment variable reference
├── CONTRIBUTING.md
├── DISCLAIMER.md
├── LICENSE.md
├── README.md
├── SECURITY.md
├── TERMS_AND_CONDITIONS.md
└── TROUBLESHOOTING.md
```
Record or upload a consultation:
- Open the application at http://localhost:3000.
- Click Launch App from the landing page.
- Select a clinical specialty from the dropdown.
- Click Record to capture audio via the browser microphone, or click Upload to submit a WAV or MP3 file.
- Submit the audio to trigger transcription.
Generate a SOAP note:
- After transcription completes, review the diarized transcript — Doctor segments are shown in purple, Patient segments in cyan.
- Click Generate SOAP Note.
- The SOAP note (Subjective, Objective, Assessment, Plan) appears in the right panel with extracted keywords.
- Edit any section in the human-in-the-loop editor before proceeding.
Generate billing codes:
- After the SOAP note is displayed, click Generate Billing Codes.
- Review the suggested ICD-10 diagnosis codes and CPT procedure codes.
- All billing codes require clinician verification before use.
Approve to knowledge base:
- After reviewing the SOAP note and billing codes, enter an optional patient reference.
- Click Approve & Save.
- The note is embedded into the ChromaDB `clinical_kb` collection and becomes immediately available in Clinical QA.
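What this approval step amounts to, sketched from the host against the mapped ports (`localhost:8100` for ChromaDB, `localhost:11434` for Ollama); the ID scheme and metadata fields are illustrative:

```python
import uuid
import requests
import chromadb

chroma = chromadb.HttpClient(host="localhost", port=8100)
kb = chroma.get_or_create_collection("clinical_kb")

def approve_note(note_text: str, patient_ref: str = "") -> str:
    # Embed the note with nomic-embed-text via Ollama's embeddings endpoint.
    emb = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": note_text},
    ).json()["embedding"]
    note_id = str(uuid.uuid4())
    kb.add(
        ids=[note_id],
        embeddings=[emb],
        documents=[note_text],
        metadatas=[{"type": "soap_note", "patient_ref": patient_ref}],
    )
    return note_id
```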
Clinical QA:
- Open the Clinical Chat panel.
- Enter any clinical question.
- The backend retrieves semantically relevant passages from the knowledge base and passes them to Ollama.
- The answer is displayed with cited source documents.
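Behind this chat, retrieval plus grounded generation can be sketched as follows (same hedged assumptions as the approval sketch above; the prompt template is illustrative):

```python
import requests
import chromadb

chroma = chromadb.HttpClient(host="localhost", port=8100)
kb = chroma.get_or_create_collection("clinical_kb")

def clinical_qa(question: str, k: int = 3) -> str:
    # Embed the question, retrieve the k nearest passages, then ground the answer.
    q_emb = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": question},
    ).json()["embedding"]
    hits = kb.query(query_embeddings=[q_emb], n_results=k)
    context = "\n---\n".join(hits["documents"][0])
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1:8b",
            "stream": False,
            "prompt": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        },
    )
    return resp.json()["response"]
```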
Knowledge base management:
- Open the Knowledge Base panel.
- Upload PDF clinical guidelines using the document upload control.
- Remove any document by ID using the delete control.
The table below compares inference performance across providers, deployment modes, and hardware profiles. The workload covers the full MediVault AI consultation pipeline: Whisper transcription, diarization, SOAP generation, and billing codes.
| Provider | Model | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI (Cloud) | gpt-4o-mini + whisper-1 | API (Cloud) | 128K | 558 | 310 | 867 | 9,500 | 171,900 | 0.006 | N/A |
| Intel OPEA EI | meta-llama/Llama-3.1-8B-Instruct + BAAI/bge-base-en-v1.5 | Enterprise (On-Prem) | 128K | 588 | 372 | 960 | 68,463 | 156,099 | 0.0035 | CPU-only (Xeon) |
Notes:
- All metrics use the same MediVault AI workload and identical inputs (~1.9 min of audio). Token counts may vary slightly per run due to non-deterministic model output.
- OpenAI metrics are averaged over 5 zero-shot runs. P95 is elevated because SOAP generation routes through a local Flowise intermediary.
An 8-billion-parameter open-weight instruction-tuned model from Meta (July 2024 release), designed for on-prem and enterprise deployment.
| Attribute | Details |
|---|---|
| Parameters | 8.0B total |
| Architecture | Transformer with Grouped Query Attention (GQA) — 32 layers, 32 Q-heads / 8 KV-heads |
| Context Window | 128,000 tokens (128K) native |
| Reasoning Mode | Standard instruction-following |
| Tool / Function Calling | Supported via structured prompts |
| Structured Output | JSON-structured responses supported |
| Multilingual | English-focused with multilingual capabilities |
| Benchmarks | MMLU: 73.0%, GSM8K: 84.4%, HumanEval: 72.6% |
| Quantization Formats | GGUF (Q4_K_M ~4.9 GB, Q8_0 ~8.5 GB), AWQ (int4), GPTQ (int4) |
| Inference Runtimes | Ollama, vLLM, llama.cpp, LMStudio, TGI |
| Fine-Tuning | Full fine-tuning and adapter-based (LoRA); community adapters available |
| License | Llama 3.1 Community License (permits commercial use with conditions) |
| Deployment | Local, on-prem, air-gapped, cloud — full data sovereignty |
A 109M-parameter English text embedding model from the Beijing Academy of Artificial Intelligence (BAAI), optimised for dense retrieval and semantic similarity tasks.
| Attribute | Details |
|---|---|
| Parameters | 109M total |
| Architecture | BERT-based bi-encoder |
| Embedding Dimension | 768 |
| Max Sequence Length | 512 tokens |
| Task | Dense retrieval / semantic similarity |
| Benchmarks | MTEB (English) avg: 63.55 |
| Quantization Formats | FP32, FP16, INT8 (ONNX) |
| Inference Runtimes | vLLM, Hugging Face Transformers, ONNX Runtime |
| Fine-Tuning | Full fine-tuning and adapter-based (LoRA) |
| License | MIT |
| Deployment | Local, on-prem, air-gapped — full data sovereignty |
OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Reasoning Mode | Standard inference (no explicit chain-of-thought toggle) |
| Tool / Function Calling | Supported; parallel function calling |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Benchmarks | MMLU: ~87%, strong HumanEval and MBPP scores |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
| Knowledge Cutoff | October 2023 |
| Capability | Meta-Llama-3.1-8B-Instruct | GPT-4o-mini |
|---|---|---|
| SOAP note generation | Yes | Yes |
| Billing code extraction (ICD-10 / CPT) | Yes | Yes |
| Speaker diarization classification | Yes | Yes |
| Clinical QA with RAG | Yes | Yes |
| Function / tool calling | Yes | Yes |
| JSON structured output | Yes | Yes |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
| Open weights | Yes (Llama 3.1 Community License) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
| Quantization for edge devices | GGUF / AWQ / GPTQ | N/A |
| Multimodal (image input) | No | Yes |
| Native context window | 128K | 128K |
Both models support SOAP generation, billing codes, and clinical QA with RAG. However, only Meta-Llama-3.1-8B-Instruct offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive clinical environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.
Three Flowise flows are automatically provisioned when the stack starts. No manual flow configuration is required.
- **SOAP Generator**: an LLMChain composed of a ChatPromptTemplate and a ChatOllama node. The prompt is specialty-aware and instructs the model to produce a structured SOAP note from diarized consultation transcript segments.
- **Clinical QA**: a ConversationalRetrievalQAChain composed of a ChromaDB retriever, a BufferMemory node for conversation history, and a ChatOllama node. The chain returns answers with `returnSourceDocuments: true`.
- **Knowledge Base Upsert**: a flow combining PlainText input, OllamaEmbeddings, and a ChromaDB sink for document ingestion operations.
Inspect all live flow topologies by opening the Flowise canvas at http://localhost:3001.
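Programmatic access to a provisioned flow goes through Flowise's prediction endpoint (`POST /api/v1/prediction/{flowId}`). A minimal sketch; the flow ID is a placeholder you would copy from the canvas or the provisioner's logs:

```python
import requests

FLOWISE = "http://localhost:3001"
SOAP_FLOW_ID = "<flow-id-from-canvas>"  # placeholder, not a real ID

resp = requests.post(
    f"{FLOWISE}/api/v1/prediction/{SOAP_FLOW_ID}",
    json={"question": "Doctor: What brings you in today?\nPatient: A dry cough..."},
    timeout=120,  # LLM chains can take a while on CPU
)
print(resp.json())  # the chain's structured SOAP output
```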
All variables are defined in `.env` (copied from `.env.example`). The backend reads them at startup via `python-dotenv`.
| Variable | Description | Default |
|---|---|---|
| `FLOWISE_ENDPOINT` | Internal URL of the Flowise service | `http://medivault-flowise:3001` |
| `FLOWISE_API_KEY` | Flowise API key for authenticated requests | (empty — auth disabled) |
| Variable | Description | Default |
|---|---|---|
| `OLLAMA_BASE_URL` | URL of the Ollama service on the host | `http://host.docker.internal:11434` |
| `OLLAMA_MODEL` | Ollama model used for LLM inference | `llama3.1:8b` |
| `OLLAMA_EMBED_MODEL` | Ollama model used for embeddings | `nomic-embed-text` |
| Variable | Description | Default |
|---|---|---|
| `CHROMA_HOST` | ChromaDB service hostname | `medivault-chromadb` |
| `CHROMA_PORT` | ChromaDB internal port | `8000` |
| Variable | Description | Default |
|---|---|---|
| `WHISPER_ENDPOINT` | Internal URL of the Whisper ASR service | `http://medivault-whisper:9000` |
| `WHISPER_MODEL` | Whisper model size | `small` |
| Variable | Description | Default |
|---|---|---|
| `MAX_AUDIO_SIZE` | Maximum accepted audio file size in bytes | `26214400` (25 MB) |
| `MAX_FILE_SIZE` | Maximum accepted document file size in bytes | `10485760` (10 MB) |
| Variable | Description | Default |
|---|---|---|
| `BACKEND_PORT` | Port the FastAPI server listens on | `5001` |
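A minimal sketch of how `config.py` plausibly surfaces these settings with `python-dotenv` (variable names and defaults match the tables above; the exact module layout is an assumption):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env at startup; real values override these defaults

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://host.docker.internal:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.1:8b")
OLLAMA_EMBED_MODEL = os.getenv("OLLAMA_EMBED_MODEL", "nomic-embed-text")
CHROMA_HOST = os.getenv("CHROMA_HOST", "medivault-chromadb")
CHROMA_PORT = int(os.getenv("CHROMA_PORT", "8000"))
WHISPER_ENDPOINT = os.getenv("WHISPER_ENDPOINT", "http://medivault-whisper:9000")
MAX_AUDIO_SIZE = int(os.getenv("MAX_AUDIO_SIZE", str(25 * 1024 * 1024)))
MAX_FILE_SIZE = int(os.getenv("MAX_FILE_SIZE", str(10 * 1024 * 1024)))
BACKEND_PORT = int(os.getenv("BACKEND_PORT", "5001"))
```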
### Backend

- Framework: FastAPI (Python 3.11+) with Uvicorn ASGI server
- LLM Orchestration: Flowise — auto-provisioned chains for SOAP generation
- LLM Inference: Ollama — runs natively on host for diarization, billing codes, and clinical QA
- Vector DB Client: chromadb-client — direct Python client for all upsert and query operations
- PDF Processing: pypdf for text extraction from uploaded PDF files
- Config Management: python-dotenv for environment variable injection at startup
- Data Validation: Pydantic v2 for request/response schema enforcement
### Frontend

- Framework: React 18 with Vite (fast HMR and production bundler)
- Styling: Tailwind CSS with dark mode
- Icons: Lucide React
- HTTP Client: Axios
- Flow Visualisation: @xyflow/react
- Production Server: Nginx — serves the built assets and proxies `/api/` to the backend container
### Infrastructure

| Component | Technology |
|---|---|
| Containerisation | Docker Compose (5 services) |
| LLM inference | Ollama (host machine) |
| LLM orchestration | Flowise |
| Speech-to-text | Whisper ASR (onerahmet/openai-whisper-asr-webservice, faster_whisper engine) |
| Vector store | ChromaDB (containerised, persistent named volume) |
For common issues and solutions, see TROUBLESHOOTING.md.
Quick diagnostic commands:
```bash
# Health check
curl http://localhost:5001/health

# View logs for all services
docker compose logs -f

# View logs for a specific service
docker compose logs -f medivault-api

# Check container health status
docker compose ps

# Restart a single service
docker compose restart medivault-flowise

# Rebuild and restart the entire stack
docker compose down && docker compose up --build
```

This project is licensed under the MIT License. See LICENSE.md for details.
MediVault AI is provided as-is for demonstration and educational purposes. While we strive for accuracy:
- All AI-generated SOAP notes, ICD-10 codes, CPT codes, and clinical Q&A responses must be reviewed and approved by a licensed clinician before use in any clinical context
- Do not rely solely on AI-generated outputs without independent clinical verification
- Do not use this system with real patient data without implementing full HIPAA, GDPR, and applicable regulatory compliance measures
- The quality of outputs depends on the underlying Ollama model and the content of the ingested knowledge base
For full disclaimer details, see DISCLAIMER.md.
