CareAgent

Multi-Agent Provider Quality Intelligence System

Automates Medicare provider quality scoring, anomaly detection, and network recommendations with a LangGraph multi-agent pipeline over data in the CMS Medicare format (synthetic by default; see Data), in under 0.4 seconds per provider on a warm instance.

Live Demo

Component	URL
🎛️ Dashboard	huggingface.co/spaces/AhmeduddinMohammed/careagent
⚡ API	careagent-api-668260909878.us-central1.run.app
📖 API Docs	careagent-api-668260909878.us-central1.run.app/docs
🏥 Health Check	careagent-api-668260909878.us-central1.run.app/health
💻 GitHub	github.com/MohammedAhmeduddin/careagent

Try these NPIs in the dashboard:

NPI	Provider	Specialty	Expected Result
`1000153386`	Davis, John	Orthopedic Surgery · IN	🟢 INCLUDE — score 94.8, no anomaly
`5133794489`	Anderson, Barbara	Psychiatry · IL	🟡 REVIEW — $10,217 avg charge, anomaly flagged
`6237376063`	Garcia, Michael	Nephrology · FL	🟡 REVIEW — 3.5× cost ratio outlier

The Problem

Insurance companies and health systems manually review thousands of provider performance reports every quarter to decide network inclusion and reimbursement tiers. At scale this takes 3 analysts 2 weeks per quarter — and still produces inconsistent results with no reproducible audit trail.

CareAgent replaces that process end-to-end.

A single API call ingests a provider NPI, routes it through up to five specialized agents, runs Isolation Forest anomaly detection against the loaded provider set, generates a grounded performance narrative, and returns a structured scorecard with a network recommendation, in 0.36 seconds on average.

Key Results

Metric	Value
Pipeline latency	0.32 – 0.36s per provider (warm instance)
Providers loaded	10,000 CMS Medicare records
Procedure rows	54,987 across 15 HCPCS codes
Specialties covered	20
Anomaly detection rate	3.0% — 300 / 9,985 providers flagged
Test coverage	92.42%
Total tests	99 — 90 unit · 9 integration
Deployment	GCP Cloud Run + Neon PostgreSQL + HuggingFace Spaces

Agent Architecture

CMS Medicare Data  ──→  PostgreSQL warehouse (Neon · cloud)
                               │
                    LangGraph Supervisor Agent
                    ┌──────────────────────────┐
                    │  Reads AgentState after  │
                    │  every step. Routes      │
                    │  dynamically. Skips      │
                    │  unneeded agents.        │
                    └──────────────────────────┘
                    ╱    ╱      │      ╲     ╲
                   ↓    ↓      ↓       ↓     ↓
            Cleaner  Scorer  Anomaly  Sum.  Reporter
               │       │       │       │       │
               └───────┴───────┴───────┴───────┘
                           returns to
                          supervisor ↑
                               │
                    AgentState TypedDict
                    (25 typed fields, shared)
                               │
                    FastAPI  /analyze  (GCP Cloud Run)
                    ╱                        ╲
         Streamlit dashboard            MLflow + LangSmith
         (HuggingFace Spaces)           observability

Why agents and not a pipeline?

The supervisor reads AgentState after every step and routes dynamically. A provider with complete data skips the DataCleaner entirely. Two providers entering the same system can take completely different paths. Each agent uses fundamentally different tooling — SQL for retrieval, scikit-learn for scoring, OpenAI for narrative — which cannot be collapsed into a single LLM call at production scale.
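The routing idea described above can be sketched in plain Python without LangGraph. This is a minimal illustration, not the repository's `supervisor.py`: the `AgentState` below is a hypothetical subset of the real 25-field TypedDict, and `next_agent` and `CLEAN_THRESHOLD` are names assumed for this example. Only the 0.85 skip threshold and the agent names come from this README.

```python
from typing import Optional, TypedDict


class AgentState(TypedDict, total=False):
    # Hypothetical subset of the shared state; the real TypedDict has 25 fields.
    npi: str
    data_quality_score: float
    quality_score: Optional[float]
    is_anomaly: Optional[bool]
    agents_executed: list[str]
    agents_skipped: list[str]


CLEAN_THRESHOLD = 0.85  # skip threshold stated in the Data Cleaner section


def next_agent(state: AgentState) -> str:
    """Return the name of the next agent to run, or "END" when finished."""
    done = set(state.get("agents_executed", [])) | set(state.get("agents_skipped", []))
    if "data_cleaner" not in done:
        if state.get("data_quality_score", 0.0) >= CLEAN_THRESHOLD:
            # Complete data: record the skip and fall through to the next agent.
            state.setdefault("agents_skipped", []).append("data_cleaner")
        else:
            return "data_cleaner"
    for name in ("statistical", "anomaly", "summarizer", "reporter"):
        if name not in done:
            return name
    return "END"
```

In a LangGraph `StateGraph`, a function of this shape would typically be wired in as a conditional-edge router, so that two providers with different `data_quality_score` values follow different paths.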

Agent Details

1 · Data Cleaner Agent

Assesses data quality score — fraction of key fields present before imputation
Imputes missing numeric fields using specialty-level percentile_cont(0.5) medians from PostgreSQL
Skipped by supervisor when data_quality_score ≥ 0.85
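The quality check and median imputation listed above might look roughly like the sketch below. It works on plain dicts instead of PostgreSQL; `KEY_FIELDS` is an assumed field list, and `statistics.median` stands in for the SQL `percentile_cont(0.5)` aggregate (both interpolate between the two middle values for even-sized groups).

```python
import math
from statistics import median

# Assumed key fields; the real agent may check a different set.
KEY_FIELDS = ("avg_submitted_charge", "avg_medicare_payment",
              "avg_allowed_amount", "total_services")


def is_missing(value) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def data_quality_score(record: dict) -> float:
    """Fraction of key fields present before any imputation."""
    present = sum(1 for f in KEY_FIELDS if not is_missing(record.get(f)))
    return present / len(KEY_FIELDS)


def impute_with_specialty_median(record: dict, peers: list[dict]) -> tuple[dict, int]:
    """Fill missing key fields with the median over same-specialty peers."""
    cleaned = dict(record)
    imputed = 0
    for field in KEY_FIELDS:
        if is_missing(cleaned.get(field)):
            values = [p[field] for p in peers if not is_missing(p.get(field))]
            if values:  # leave the gap if no peer has the field
                cleaned[field] = median(values)
                imputed += 1
    return cleaned, imputed
```

A record missing one of the four fields scores 0.75, which falls below the 0.85 threshold and would therefore be routed to the cleaner.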

2 · Statistical Agent

Composite quality score (0–100) benchmarked against specialty national averages:

Component	Weight	Metric
Cost efficiency	40%	Provider charge ratio vs specialty average
Volume	30%	Percentile rank of total services within specialty
Payment efficiency	30%	Medicare payment / allowed amount ratio
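As a rough illustration of the weighting in the table above, the composite can be written as a weighted sum. The sketch assumes each component has already been scaled to 0–100; how the real Statistical Agent maps charge ratios and payment ratios onto that scale is not described here, so `composite_quality_score` and its input names are hypothetical.

```python
# Weights taken from the table above.
WEIGHTS = {"cost_efficiency": 0.40, "volume": 0.30, "payment_efficiency": 0.30}


def composite_quality_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    for name in WEIGHTS:
        if not 0.0 <= components[name] <= 100.0:
            raise ValueError(f"{name} must be within 0-100")
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(total, 2)
```

For example, component scores of 90 (cost), 100 (volume), and 80 (payment) combine to 0.4·90 + 0.3·100 + 0.3·80 = 90.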

3 · Anomaly Detection Agent

Isolation Forest — contamination=0.03, n_estimators=100
Features: avg_submitted_charge, avg_medicare_payment, avg_allowed_amount, total_services
StandardScaler normalisation before fitting
Trained on full 9,985-provider dataset per run
300 of 9,985 providers flagged in production (about 3.0%, in line with the contamination setting)
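The configuration above corresponds roughly to the following scikit-learn sketch. The feature matrix is random stand-in data rather than provider records, and `random_state` is added only to make the example repeatable; neither comes from the repository.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Stand-in feature matrix; columns mimic avg_submitted_charge,
# avg_medicare_payment, avg_allowed_amount, total_services.
X = rng.lognormal(mean=4.0, sigma=0.5, size=(1000, 4))

# Normalise before fitting, as described above.
X_scaled = StandardScaler().fit_transform(X)

model = IsolationForest(n_estimators=100, contamination=0.03, random_state=42)
labels = model.fit_predict(X_scaled)      # -1 = anomaly, 1 = normal
scores = model.score_samples(X_scaled)    # lower = more anomalous

is_anomaly = labels == -1
print(f"flagged {is_anomaly.sum()} of {len(X)} rows")
```

Because `contamination` sets the decision threshold at that quantile of the training scores, the flagged share on the training set lands close to 3% by construction.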

4 · Provider Performance Narrative Agent

GPT-4o-mini with function calling for structured JSON output
Temperature 0.1 for factual consistency
Data-grounded prompt that restricts the narrative to the provider's source fields, to limit hallucination
Template fallback when API key unavailable — zero downtime
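The fallback behaviour in the last bullet can be sketched as below. The LLM is passed in as a plain callable so the example runs without an API key; `template_narrative` and `build_narrative` are hypothetical names, and the template wording is illustrative only.

```python
from typing import Callable, Optional


def template_narrative(p: dict) -> str:
    """Deterministic narrative built only from fields in the scorecard."""
    flag = "flagged as an anomaly" if p["is_anomaly"] else "not flagged as an anomaly"
    return (f"{p['provider_name']} is a {p['provider_type']} provider in {p['state']} "
            f"with a quality score of {p['quality_score']:.1f}; the provider is {flag}.")


def build_narrative(p: dict, llm: Optional[Callable[[dict], str]] = None) -> str:
    """Use the LLM when one is configured; otherwise fall back to the template."""
    if llm is None:          # e.g. no API key available
        return template_narrative(p)
    try:
        return llm(p)
    except Exception:        # network error, rate limit, malformed output, ...
        return template_narrative(p)
```

Keeping the fallback deterministic means the Reporter Agent always receives a non-empty narrative, whether or not the LLM call succeeds.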

5 · Reporter Agent

Assembles final scorecard JSON
Network recommendation logic: include (quality ≥ 75 and no anomaly) · review (all other cases) · exclude (planned, not yet implemented)
Writes all outputs to providers table with full audit trail
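The recommendation rule above reduces to a small function. This is a sketch of the stated thresholds only; the function name is assumed, and `exclude` is omitted because it is not yet part of the logic.

```python
INCLUDE_THRESHOLD = 75.0  # quality cutoff stated in the Reporter Agent rules


def network_recommendation(quality_score: float, is_anomaly: bool) -> str:
    """Return "include" for high-quality, non-anomalous providers, else "review"."""
    if quality_score >= INCLUDE_THRESHOLD and not is_anomaly:
        return "include"
    return "review"
```

A provider scoring 94.81 with no anomaly is included, while the same score with an anomaly flag is routed to review.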

Tech Stack

Layer	Technology	Justification
Agent orchestration	LangGraph StateGraph	Purpose-built for stateful multi-agent workflows with cycles
LLM	GPT-4o-mini + function calling	Structured JSON output, hallucination control
Anomaly detection	Isolation Forest (scikit-learn)	Unsupervised — no labelled anomalies needed
Database	PostgreSQL 16 + SQLAlchemy 2	Typed ORM, all agent outputs pre-reserved in schema
Cloud database	Neon.tech	Serverless PostgreSQL, free tier, SSL
API	FastAPI + Pydantic v2	Typed endpoints, NPI format validation
Observability	LangSmith	Step-level agent trace visibility
Experiment tracking	MLflow	Agent run logging, artifact versioning
CI/CD	GitHub Actions	Unit tests + 85% coverage gate on every push
Dashboard	Streamlit	Live agent progress stepper
Deployment	GCP Cloud Run	Serverless, min-instances=1, no cold starts
Containers	Docker + docker-compose	AMD64 build for Cloud Run compatibility

Project Structure

careagent/
├── src/careagent/
│   ├── agents/
│   │   ├── supervisor.py        # LangGraph routing — dynamic agent skipping
│   │   ├── data_cleaner.py      # Specialty median imputation via PostgreSQL
│   │   ├── statistical.py       # Composite quality score vs national benchmarks
│   │   ├── anomaly.py           # Isolation Forest across full provider set
│   │   ├── summarizer.py        # GPT-4o-mini + template fallback
│   │   └── reporter.py          # Scorecard assembly + recommendation
│   ├── graph/
│   │   ├── state.py             # AgentState TypedDict — 25 typed fields
│   │   └── workflow.py          # LangGraph StateGraph definition
│   ├── api/
│   │   ├── main.py              # FastAPI app + root endpoint directory
│   │   ├── schemas.py           # Pydantic v2 request/response contracts
│   │   └── routes/
│   │       ├── analyze.py       # POST /analyze — full pipeline
│   │       └── health.py        # GET /health — live database stats
│   ├── db/
│   │   ├── models.py            # SQLAlchemy ORM — 3 tables, all agent fields
│   │   ├── queries.py           # Typed query functions — one per agent operation
│   │   └── session.py           # Engine + session factory
│   └── config.py                # Pydantic v2 settings from environment
├── scripts/
│   ├── load_cms_data.py         # CMS ingestion — upsert, idempotent, column-mapped
│   └── generate_synthetic_cms.py
├── tests/
│   ├── unit/                    # 90 tests — SQLite in-memory, no PostgreSQL needed
│   └── integration/             # 9 tests — requires live PostgreSQL
├── app.py                       # Streamlit dashboard — live stepper + scorecard
├── Dockerfile                   # FastAPI container (AMD64 for GCP Cloud Run)
├── docker-compose.yml           # PostgreSQL + MLflow local stack
├── pyproject.toml               # Dependencies + pytest configuration
└── .github/workflows/ci.yml     # GitHub Actions CI pipeline

Database Schema

Three tables. Every agent output field was reserved in the schema before any agent was written, which prevents schema drift.

providers                    provider_services           agent_runs
─────────────────────        ─────────────────────       ─────────────────────
npi (PK)                     id (PK)                     id (PK)
entity_type                  npi (FK → providers)        run_id (unique)
last_name_or_org             hcpcs_code                  npi (FK → providers)
provider_type                hcpcs_description           agents_executed
state                        line_service_count          agents_skipped
                             beneficiary_count           total_duration_seconds
── Statistical Agent ──      avg_medicare_payment        llm_tokens_used
quality_score                avg_submitted_charge        final_recommendation
quality_percentile           avg_allowed_amount          status
cost_efficiency_ratio                                    mlflow_run_id
volume_percentile            
── Anomaly Agent ──
is_anomaly
anomaly_score
anomaly_reason
── Data Cleaner Agent ──
data_quality_score
fields_imputed
cleaning_notes
── Summarizer Agent ──
performance_narrative
narrative_faithfulness
narrative_relevancy
── Reporter Agent ──
network_recommendation
scorecard_version
last_scored_at

Quickstart (Local)

Prerequisites: Python 3.11+, Docker Desktop

# 1. Clone
git clone https://github.com/MohammedAhmeduddin/careagent.git
cd careagent

# 2. Virtual environment
python3 -m venv venv && source venv/bin/activate

# 3. Install dependencies
pip install -e ".[dev]"

# 4. Configure environment
cp .env.example .env
# Set POSTGRES_PORT=5433 if port 5432 is in use

# 5. Start infrastructure
docker-compose up -d
sleep 10

# 6. Load provider data
python scripts/generate_synthetic_cms.py
python scripts/load_cms_data.py --file data/cms_provider_2022.csv

# 7. Run tests
PYTHONPATH=src pytest tests/unit/ -v

# 8. Start API
PYTHONPATH=src uvicorn careagent.api.main:app --reload --port 8000

# 9. Start dashboard (new terminal)
streamlit run app.py

API Reference

`POST /analyze`

Runs the full 5-agent pipeline for a provider NPI.

curl -X POST https://careagent-api-668260909878.us-central1.run.app/analyze \
  -H "Content-Type: application/json" \
  -d '{"npi": "1000153386"}'

Response:

{
  "npi": "1000153386",
  "run_id": "814a506e-dea9-4f6f-957e-852248171ce0",
  "provider_name": "Davis, John",
  "provider_type": "Orthopedic Surgery",
  "state": "IN",
  "quality_score": 94.81,
  "quality_percentile": 83.04,
  "cost_efficiency_ratio": 2.695,
  "volume_percentile": 97.27,
  "is_anomaly": false,
  "anomaly_score": -0.521082,
  "anomaly_reason": "Provider cost and volume metrics are within normal range...",
  "data_quality_score": 1.0,
  "fields_imputed": 0,
  "performance_narrative": "Davis is an Orthopedic Surgery provider in IN...",
  "network_recommendation": "include",
  "scorecard_version": "0.1.0",
  "agents_executed": ["data_cleaner", "statistical", "anomaly", "summarizer", "reporter"],
  "agents_skipped": [],
  "pipeline_duration_seconds": 0.363,
  "generated_at": "2026-05-29T03:01:18.410709Z"
}
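The same request can be issued from Python using only the standard library. This is a client-side sketch: the 10-digit check mirrors the NPI format validation mentioned in the Tech Stack table but is not the API's own Pydantic validator, and `build_analyze_request` and `analyze` are hypothetical helper names.

```python
import json
import re
import urllib.request

API_URL = "https://careagent-api-668260909878.us-central1.run.app/analyze"
NPI_PATTERN = re.compile(r"[0-9]{10}")


def build_analyze_request(npi: str) -> urllib.request.Request:
    """Validate the NPI format locally and build the POST request."""
    if not NPI_PATTERN.fullmatch(npi):
        raise ValueError("NPI must be exactly 10 digits")
    body = json.dumps({"npi": npi}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(npi: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON scorecard."""
    with urllib.request.urlopen(build_analyze_request(npi), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `analyze("1000153386")` should return a dict with the fields shown in the response above, provided the demo service is reachable.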

`GET /health`

{
  "status": "healthy",
  "database": "connected",
  "providers_loaded": 10000,
  "providers_scored": 8,
  "anomalies_flagged": 3,
  "version": "0.1.0"
}

Full interactive docs: careagent-api-668260909878.us-central1.run.app/docs

Deployment

Component	Platform	URL
FastAPI	GCP Cloud Run (us-central1)	careagent-api-668260909878.us-central1.run.app
PostgreSQL	Neon.tech (AWS us-east-1)	Managed serverless — 10,000 providers
Dashboard	HuggingFace Spaces	huggingface.co/spaces/AhmeduddinMohammed/careagent

Cloud Run configuration:

--min-instances=1 — no cold starts
--memory=1Gi --cpu=1
AMD64 image built with docker buildx --platform linux/amd64

CI/CD Pipeline

GitHub Actions runs on every push to main:

Push to main
    │
    ├── Job: test
    │   ├── Set up Python 3.11
    │   ├── pip install -e ".[dev]"
    │   ├── pytest tests/unit/ --cov --cov-fail-under=85
    │   ├── Validate all imports
    │   └── Validate API schema contracts
    │
    └── Job: lint
        └── ruff check src/ tests/

Testing Strategy

Unit tests use SQLite in-memory and plain dataclasses, so they need neither PostgreSQL nor OpenAI and run in under 2 seconds.
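A unit test in this style could look like the sketch below. It uses the standard-library `sqlite3` module and a cut-down `providers` table as stand-ins; the repository's own tests go through SQLAlchemy models, so the helper and test names here are illustrative.

```python
import sqlite3


def make_test_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a minimal providers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE providers ("
        "npi TEXT PRIMARY KEY, provider_type TEXT, quality_score REAL)"
    )
    return conn


def test_scored_provider_round_trip():
    conn = make_test_db()
    conn.execute(
        "INSERT INTO providers VALUES (?, ?, ?)",
        ("1000153386", "Orthopedic Surgery", 94.81),
    )
    row = conn.execute(
        "SELECT quality_score FROM providers WHERE npi = ?", ("1000153386",)
    ).fetchone()
    assert row == (94.81,)
    conn.close()
```

Each test gets a fresh database that disappears when the connection closes, which is what keeps the unit suite free of external services.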

tests/unit/
├── test_db_models.py        18 tests  — ORM models + typed query functions
├── test_supervisor.py       15 tests  — all routing paths including skip logic
├── test_agents_week3.py     15 tests  — quality scoring formula, IF feature matrix
├── test_api_schemas.py      15 tests  — Pydantic v2 NPI validation, response shape
├── test_api_routes.py        9 tests  — FastAPI TestClient, mocked pipeline
└── test_agents_coverage.py  18 tests  — imputation, recommendations, fallbacks

tests/integration/
└── test_pipeline.py          9 tests  — full pipeline against live PostgreSQL
    ├── completes without error
    ├── quality score 0–100
    ├── anomaly flag (bool)
    ├── narrative > 20 chars
    ├── valid recommendation
    ├── agents tracked correctly
    ├── results written to database
    ├── completes under 30 seconds
    └── clean provider skips data cleaner

Data

CareAgent uses synthetic data generated with the exact CMS Medicare Provider Utilization and Payment Data column structure (2022 release).

The synthetic generator replicates real CMS data characteristics:

Log-normal payment and charge distributions
Suppressed beneficiary counts (< 11 patients → NaN) matching CMS privacy rules
3% injected cost/quality anomalies for realistic Isolation Forest training
20 specialties · 20 states · 15 HCPCS procedure codes
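The characteristics listed above can be reproduced in miniature as follows. This is not `generate_synthetic_cms.py`: the log-normal parameters, the 5× inflation factor for injected anomalies, and the beneficiary range are assumptions chosen for illustration. Only the 3% injection rate and the fewer-than-11 suppression rule come from this README.

```python
import math
import random


def generate_row(rng: random.Random, anomaly_rate: float = 0.03) -> dict:
    """Generate one synthetic provider-service row."""
    charge = rng.lognormvariate(5.0, 0.6)      # log-normal charge (assumed parameters)
    is_injected = rng.random() < anomaly_rate  # inject roughly 3% anomalies
    if is_injected:
        charge *= 5.0                          # assumed inflation factor
    beneficiaries = rng.randint(1, 200)        # assumed range
    return {
        "avg_submitted_charge": round(charge, 2),
        # Counts below 11 are suppressed, following the CMS privacy rule above.
        "beneficiary_count": float(beneficiaries) if beneficiaries >= 11 else math.nan,
        "injected_anomaly": is_injected,
    }


rng = random.Random(2022)  # fixed seed so the sample is repeatable
rows = [generate_row(rng) for _ in range(10_000)]
```

Suppressed counts arrive as NaN, which gives the Data Cleaner Agent something to impute.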

For production use, download the real dataset from data.cms.gov and run:

python scripts/load_cms_data.py --file data/cms_provider_2022.csv

Limitations and Future Work

Area	Current state	Planned
LLM evaluation	`narrative_faithfulness` and `narrative_relevancy` are `null`	RAGAS eval harness on held-out sample; log scores to MLflow
Async execution	DataCleaner and Anomaly run sequentially	`asyncio` parallel execution → ~2s pipeline
Real CMS data	Synthetic 10K providers	Full 9M-row dataset + OpenAI Batch API (~$12)
LangSmith tracing	Configured, requires API key	Full step-level trace on LangSmith free tier
Scoring formula	Proxy metrics (cost, volume, payment)	Clinical quality measures (readmission rates, HEDIS)
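The planned asynchronous execution could take a shape like the sketch below, using `asyncio.gather` to overlap two steps. Both coroutines are stubs with placeholder return values, not the real agents, and the function names are hypothetical.

```python
import asyncio


async def run_data_cleaner(npi: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for a database round trip
    return {"data_quality_score": 1.0, "fields_imputed": 0}


async def run_anomaly(npi: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for model scoring
    return {"is_anomaly": False}


async def run_parallel(npi: str) -> dict:
    # Run both steps concurrently and merge their partial results.
    cleaned, anomaly = await asyncio.gather(run_data_cleaner(npi), run_anomaly(npi))
    return {"npi": npi, **cleaned, **anomaly}


result = asyncio.run(run_parallel("1000153386"))
```

Running the two steps side by side is only safe if anomaly detection does not depend on values the cleaner imputes; otherwise the steps have to stay ordered.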

Resume Bullets

Architected LangGraph multi-agent system with dynamic supervisor routing across 5
specialized agents, processing CMS Medicare provider analysis in 0.36s avg with
92% test coverage across 99 tests — deployed on GCP Cloud Run

Built Isolation Forest anomaly detection identifying top 3% of outlier providers by
cost-quality ratio across 10,000 CMS Medicare provider records, confirmed 300/9,985
providers flagged at contamination=0.03

Implemented composite quality scoring formula benchmarked against specialty national
averages (cost efficiency 40% + volume percentile 30% + payment efficiency 30%) with
real-time PostgreSQL benchmark queries

Deployed multi-agent system on GCP Cloud Run with FastAPI serving scorecards in
0.36s, Neon PostgreSQL cloud database, and HuggingFace Spaces dashboard

Built GitHub Actions CI/CD pipeline with 90 unit tests + 85% coverage gate on every
push, using SQLite in-memory fixtures — zero external dependencies in CI

Acknowledgements

Provider data structure modelled on CMS Medicare Provider Utilization and Payment Data — free, public, used by real health systems.

Author

Ahmeduddin Mohammed

GitHub: @MohammedAhmeduddin
LinkedIn: linkedin.com/in/mohammed-ahmeduddin
Portfolio: SessionScout · LoanLens · CareAgent

Built with LangGraph · FastAPI · PostgreSQL · scikit-learn · GPT-4o-mini · GCP Cloud Run
