AINPI

Experimental explorer for the CMS National Provider Directory (NPD) public use files.

Work in progress. AINPI is a research and educational project. Data may be incomplete, stale, or incorrect. Verify every number against primary sources before making any business or clinical decision. See the /insights page for a full provenance analysis.

What it does

CMS released the National Provider Directory as FHIR R4 NDJSON public use files on directory.cms.gov: 27.2M records across 6 resource types (Practitioner, PractitionerRole, Organization, OrganizationAffiliation, Location, Endpoint). AINPI:

- Ingests the full 40.7 GB dataset into Google BigQuery (2.8 GB zstd-compressed on disk; 27,200,569 rows loaded; 99.985% completeness against the CMS manifest)
- Serves interactive exploration through a Next.js 14 app on Vercel, backed by Supabase Postgres for pre-aggregated metrics
- Analyzes data provenance: which fields come from NPPES, PECOS, or CEHRT vendor submissions, and where the self-attestation gaps are (CAQH is not in the NPD pipeline)
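
As a rough illustration of the completeness figure above, the sketch below counts NDJSON records by resourceType and compares a loaded row count against a manifest count. It is not the project's ingest script; the function names, the sample records, and the idea that the manifest reduces to a single expected row count are all assumptions made for the example.

```python
import json
from collections import Counter
from typing import Iterable


def count_resource_types(ndjson_lines: Iterable[str]) -> Counter:
    """Count FHIR resources by resourceType, one JSON object per line."""
    counts: Counter = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        resource = json.loads(line)
        counts[resource.get("resourceType", "Unknown")] += 1
    return counts


def completeness(loaded_rows: int, manifest_rows: int) -> float:
    """Percentage of the manifest's rows that were actually loaded."""
    if manifest_rows <= 0:
        raise ValueError("manifest_rows must be positive")
    return 100.0 * loaded_rows / manifest_rows


# Tiny made-up sample, not real NPD data.
sample = [
    '{"resourceType": "Practitioner", "id": "p1"}',
    '{"resourceType": "Organization", "id": "o1"}',
    "",
    '{"resourceType": "Practitioner", "id": "p2"}',
]
counts = count_resource_types(sample)
loaded = sum(counts.values())
# Hypothetical manifest that expected 4 rows.
sample_completeness = completeness(loaded, manifest_rows=4)
print(counts, sample_completeness)
```

In a real run the lines would come from the decompressed .zst files rather than an in-memory list.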

Pages

| Path | What it is |
| --- | --- |
| `/methodology` | Versioned audit methodology: DAMA DMBOK mapping, L0–L7 scoring, reproducibility commands |
| `/findings` | Pre-registered findings (H1–H22). Each states its null hypothesis and denominator before the numbers are published |
| `/npd` | Public search by NPI, name, organization, state, city |
| `/data-quality` | D3 dashboard: choropleth, sankey, knowledge graph, drill-down, validation |
| `/insights` | Provenance + variance analysis (NPD vs published org numbers) |
| `/provider-search` | Real-time search against live payer FHIR directories |
| `/magic-scanner` | AI-augmented provider discovery |
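
For readers unfamiliar with FHIR directory search, the following sketch builds a standard FHIR token search for a Practitioner by NPI. How `/provider-search` actually queries payer directories is not described here, so treat this as an assumption about the general approach; the base URL is a placeholder and `practitioner_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Identifier system URI that FHIR uses for US National Provider Identifiers.
NPI_SYSTEM = "http://hl7.org/fhir/sid/us-npi"


def practitioner_search_url(base_url: str, npi: str) -> str:
    """Build a FHIR search URL of the form [base]/Practitioner?identifier=system|value."""
    query = urlencode({"identifier": f"{NPI_SYSTEM}|{npi}"})
    return f"{base_url.rstrip('/')}/Practitioner?{query}"


# Placeholder base URL, not a real payer endpoint.
url = practitioner_search_url("https://fhir.example.org/r4/", "1234567893")
print(url)
```

A conformant server would answer such a request with a searchset Bundle; real payer endpoints may require different paths or authentication.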

Public URL contract

Static JSON, CDN-cached, safe to depend on across releases:

- /api/v1/stats.json: site-wide counters, methodology version, commit SHA
- /api/v1/findings/<slug>.json: one per finding (types)

Breaking changes bump the path (for example, to /api/v2/) rather than changing the response shape in place.
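
A client of this contract might look roughly like the sketch below. The host name is a placeholder (the deployed base URL is not stated here), the `h1` slug format is a guess, and no field names inside the JSON are assumed; only the two documented paths are taken from the list above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://ainpi.example"  # hypothetical host, replace with the real site
API_VERSION = "v1"  # pinned so a future /api/v2/ cannot change results silently


def stats_url(base_url: str = BASE_URL) -> str:
    """URL of the site-wide stats document."""
    return f"{base_url.rstrip('/')}/api/{API_VERSION}/stats.json"


def finding_url(slug: str, base_url: str = BASE_URL) -> str:
    """URL of a single finding; the slug is percent-encoded defensively."""
    return f"{base_url.rstrip('/')}/api/{API_VERSION}/findings/{quote(slug, safe='')}.json"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode one static JSON document."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(stats_url())
print(finding_url("h1"))  # "h1" is an assumed slug format
```

Calling `fetch_json(stats_url())` would perform the actual request once `BASE_URL` points at a real deployment.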

Sibling repositories

| Repo | Scope |
| --- | --- |
| `FHIR-IQ/ainpi-probe` | FHIR endpoint liveness crawler (L0–L7). Runs separately from the site so operators can audit the code that hits their endpoints. |
| `FHIR-IQ/ainpi-examples` | Python + DuckDB usage examples for the `/api/v1/*` contract. |

What's in this repo

frontend/          Next.js 14 app — routes, API, charts, tests
pipeline/          DuckDB-over-Parquet validation pipeline (shard, edges, NPI Luhn, temporal)
docs/methodology/  Versioned methodology doc, rendered at /methodology
.github/           CI, CodeQL, dependabot, issue + PR templates, release workflow
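
The pipeline entry above mentions an NPI Luhn check. A minimal sketch of that check follows, using the publicly documented rule that a 10-digit NPI is validated with the Luhn algorithm after prepending the 80840 prefix. This is an illustration in Python, not the pipeline's own implementation.

```python
NPI_PREFIX = "80840"  # card-issuer prefix prepended before the Luhn check


def luhn_is_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit counting from the right."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as summing the two digits of the product
        total += value
    return total % 10 == 0


def is_valid_npi(npi: str) -> bool:
    """True when npi is exactly 10 ASCII digits and passes Luhn with the prefix."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    return luhn_is_valid(NPI_PREFIX + npi)


print(is_valid_npi("1234567893"))  # commonly cited example NPI
print(is_valid_npi("1234567890"))  # same digits with a wrong check digit
```

A passing check only shows the number is well formed; it does not show the NPI has been issued to anyone.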

Architecture

       ┌────────────────────────────────┐
       │ directory.cms.gov              │
       │ 6 NDJSON.zst files, 2.8 GB     │
       └──────────────┬─────────────────┘
                      │ scripts/ingest-cms-npd.ts
                      ▼
       ┌────────────────────────────────┐
       │ BigQuery (cms_npd dataset)     │
       │ resource:JSON + _* flat fields │
       │ 27.2M rows + 5 analytics views │
       └──────┬─────────────────────┬───┘
              │                     │
  live query  │                     │ scripts/sync-bq-to-supabase.ts
              │                     │  (nightly aggregation)
              ▼                     ▼
       ┌──────────────┐     ┌──────────────────┐
       │ Next.js API  │     │ Supabase Postgres│
       │ routes       │◄────┤ Prisma ORM       │
       │ on Vercel    │     │ pre-agg metrics  │
       └──────┬───────┘     │ user auth        │
              │             └──────────────────┘
              ▼
       ┌────────────────────────────────┐
       │ React + D3 dashboard           │
       │ FilterContext cross-filtering  │
       └────────────────────────────────┘

Why this split: BigQuery costs under $1/month to store roughly 40 GB of FHIR JSON and keeps analytics free-tier-friendly. Supabase holds the app's hot-path queries and auth data. Pre-aggregations are synced nightly so the dashboard doesn't hit BigQuery on every page load.
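
To make the pre-aggregation idea concrete, here is a small in-memory sketch: many warehouse rows collapse into a few metric rows that the dashboard can read cheaply. The row fields (`resource_type`, `state`), the `MetricRow` shape, and the `UNKNOWN` bucket are all hypothetical; the real sync script and schema may differ.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class MetricRow(NamedTuple):
    """One pre-aggregated metric, the kind of row a dashboard reads directly."""
    resource_type: str
    state: str
    row_count: int


def pre_aggregate(rows: Iterable[dict]) -> List[MetricRow]:
    """Collapse raw rows into counts per (resource_type, state)."""
    counts = Counter(
        (row["resource_type"], row.get("state") or "UNKNOWN") for row in rows
    )
    return [MetricRow(rtype, state, n) for (rtype, state), n in sorted(counts.items())]


# Made-up warehouse rows standing in for a BigQuery result set.
warehouse_rows = [
    {"resource_type": "Location", "state": "CA"},
    {"resource_type": "Location", "state": "CA"},
    {"resource_type": "Location", "state": None},
    {"resource_type": "Practitioner", "state": "NY"},
]
metrics = pre_aggregate(warehouse_rows)
for metric in metrics:
    print(metric)
```

The page load then reads a handful of metric rows instead of scanning the warehouse, which is the trade-off described above.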

Quickstart

cd frontend
npm install
cp .env.example .env.local   # fill in Supabase + GCP values
npm run db:push              # push Prisma schema to Supabase
npm run dev                  # http://localhost:3000

To reload the NPD warehouse (only needed when CMS publishes a new release):

npm run bq:setup     # Create dataset + tables + views (idempotent)
npm run bq:ingest    # Download from directory.cms.gov, stream into BigQuery
npm run bq:sync      # Aggregate BigQuery → Supabase metrics

Testing

npm run test         # Vitest — 62 unit tests
npm run test:e2e     # Playwright — 15 E2E specs

The tests cover FHIR reference extraction, API parameter parsing, the validation contract, the filter context hierarchy, NPI/URL regexes, the BigQuery schema, dashboard dropdown interactions, and search.
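
As an example of what FHIR reference extraction involves, the sketch below walks a resource, collects every `reference` string, and splits each into a resource type and id. It is written in Python for illustration and is not the tested frontend code; the helper names and the sample PractitionerRole are made up.

```python
from typing import Any, Iterator, Optional, Tuple


def iter_references(node: Any) -> Iterator[str]:
    """Yield every string stored under a "reference" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def split_reference(reference: str) -> Optional[Tuple[str, str]]:
    """Split "Type/id" or "https://host/base/Type/id" into (type, id)."""
    if reference.startswith("#") or reference.startswith("urn:"):
        return None  # contained or UUID references have no Type/id form
    parts = reference.rstrip("/").split("/")
    if "_history" in parts:
        parts = parts[: parts.index("_history")]  # drop the version suffix
    if len(parts) < 2:
        return None
    resource_type, resource_id = parts[-2], parts[-1]
    if not (resource_type.isalpha() and resource_type[0].isupper()):
        return None
    return resource_type, resource_id


# Made-up PractitionerRole, not taken from the NPD files.
role = {
    "resourceType": "PractitionerRole",
    "id": "r1",
    "practitioner": {"reference": "Practitioner/p1"},
    "organization": {"reference": "Organization/o1"},
    "location": [{"reference": "Location/l1"}],
    "endpoint": [{"reference": "https://fhir.example.org/r4/Endpoint/e1"}],
}
refs = list(iter_references(role))
edges = [split_reference(ref) for ref in refs]
print(edges)
```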

Documentation

- CLAUDE.md: Architecture + developer reference
- DATABASE_SETUP.md: Supabase + Prisma + BigQuery setup walkthrough
