
DataSheriff — AI Data Incident Investigator

Describe a broken dashboard in plain English. DataSheriff traces your full data lineage, checks quality tests at every upstream node, identifies the root cause, and auto-tags the failing asset in OpenMetadata — all in under 30 seconds.

Built for the WeMakeDevs × OpenMetadata Hackathon 2026


🚀 Try It Live — No Setup Required

  • Live App: https://datasheriff.riturajdey01.workers.dev/
  • Demo Video: https://youtu.be/FJjLcWuaQ0M
  • GitHub: https://github.com/Rituraj-13/DataSheriff

Bring your own Anthropic API key — go to ⚙️ Settings, paste your sk-ant-... key, type any query, and watch the investigation unfold in real time.


🎯 The Problem — $2.5B Lost to Broken Data Pipelines Every Year

Every data team has the same nightmare: a dashboard goes red at 2 AM, and no one knows why.

The typical debugging workflow looks like this:

  1. Slack fire — stakeholders ping the data team
  2. Manual triage — engineers check Airflow DAGs, query upstream tables, search OpenMetadata for owners
  3. 3–6 hours later — root cause found, fix deployed, damage done

DataSheriff compresses that entire workflow to 30 seconds.

It does exactly what a senior data engineer would do — but automatically, in parallel, with every finding traceable back to a real API call.


✨ What Makes It Stand Out

| Capability | How DataSheriff does it |
| --- | --- |
| Natural Language Input | Plain English query, no JSON, no SQL |
| Live Streaming Investigation | Every tool call streamed via SSE — watch the agent think |
| Real Lineage Traversal | Walks the actual OpenMetadata lineage graph, not hardcoded paths |
| Evidence-Backed Reports | Every claim in the report is quoted directly from API output |
| Auto-Governance Tagging | Writes DataQuality.Failing back to OpenMetadata on root cause |
| Bring Your Own Key | Server never touches API keys — zero credit risk for operators |
| MTTR Tracking | History dashboard measures your team's mean time to resolution |
| Honesty Rules | Agent refuses to invent data — returns null if nothing is found |

🧠 How It Works

DataSheriff uses a Claude Sonnet 4.6 AI agent equipped with 6 custom MCP tools, each wrapping a real OpenMetadata REST API. Here's a real investigation trace:

User: "The orders dashboard is showing wrong data"
        ↓
[Tool 1] search_assets("orders dashboard")
         → Found: sample_looker.orders (Dashboard)

[Tool 2] get_lineage("sample_looker.orders", "dashboard", depth=3)
         → 10-node upstream graph: fact_orders ← dim_address, dim_customer, raw_order, ...

[Tool 3] get_quality_tests("sample_data.ecommerce_db.shopify.fact_orders")
         → 9 tests: 6 passed, 3 FAILED

[Tool 4] get_quality_tests("sample_data.ecommerce_db.shopify.dim_address")
         → FAILED: diff_columns (4 different columns vs production)
         → FAILED: diff_with_production (3 differing rows found)
         → FAILED: column_values_to_be_between (min=1001, expected ≥1000)

[Tool 5] get_asset_owner("sample_data.ecommerce_db.shopify.dim_address")
         → Owner: data-platform-team

[Tool 6] tag_asset_failing("sample_data.ecommerce_db.shopify.dim_address")
         → ✅ Tagged DataQuality.Failing in OpenMetadata

Output: Incident report — root cause: dim_address, 3 failing tests, owner notified

Total time: 12 seconds.


🗺️ Architecture

User (plain English query + Anthropic API key)
        │
        ▼
React Frontend (Cloudflare Workers)
        │  POST /investigate
        │  Header: X-Anthropic-Key: sk-ant-xxxx
        ▼
FastAPI Backend (DigitalOcean, port 8000)
        │
        ▼
Claude Sonnet 4.6 Agent
        │  @mcp.tool() functions called directly (async Python)
        ▼
OpenMetadata REST APIs (v1.12.5)
   ├── GET   /search/query                     ← find assets by name
   ├── GET   /lineage/{type}/{id}              ← traverse lineage (upstreamDepth=3)
   ├── GET   /dataQuality/testCases            ← check test results per table
   ├── GET   /pipelines/name/{fqn}             ← resolve pipeline ID
   │    └── GET /pipelines/{id}/status         ← fetch run history
   ├── GET   /{entity_type}s/name/{fqn}        ← find asset owner (tables/dashboards/pipelines)
   │    └── GET /{entity_type}s/name/{fqn}?fields=tags  ← read existing tags before tagging
   └── PATCH /{entity_type}s/{id}              ← write DataQuality.Failing tag
        │
        ▼
Server-Sent Events stream → React UI updates in real time
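On the wire, each investigation update is a standard SSE frame: an optional `event:` name, a `data:` payload, and a blank line between frames. A minimal Python parser for such a stream — the event names and JSON fields below are illustrative, not the exact DataSheriff schema:

```python
import json

def parse_sse(stream_text: str):
    """Split raw SSE text into (event, payload) pairs."""
    events = []
    for frame in stream_text.strip().split("\n\n"):
        event, data = "message", ""
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data += line[len("data:"):].strip()
        events.append((event, json.loads(data) if data else None))
    return events

raw = (
    "event: tool_call\n"
    'data: {"tool": "search_assets", "input": {"query": "orders dashboard"}}\n'
    "\n"
    "event: report\n"
    'data: {"root_cause": "dim_address", "severity": "high"}\n'
)
for name, payload in parse_sse(raw):
    print(name, payload)
```

In the browser, `EventSource` (or a streaming `fetch`) does this parsing for you; the sketch just shows what the React UI is consuming frame by frame.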

🔗 OpenMetadata Integration — In Depth

DataSheriff is built on 6 OpenMetadata API surfaces used in a coordinated investigation flow:

| MCP Tool | OpenMetadata API | What it unlocks |
| --- | --- | --- |
| search_assets(query) | GET /api/v1/search/query | Full-text asset search across all entity types |
| get_lineage(entity_id, type) | GET /api/v1/lineage/{entityType}/{id}?upstreamDepth=3 | Multi-hop upstream lineage graph traversal |
| get_quality_tests(table_fqn) | GET /api/v1/dataQuality/testCases | Per-table test results with failure evidence |
| get_pipeline_runs(pipeline_fqn) | GET /api/v1/pipelines/name/{fqn}, then GET /api/v1/pipelines/{id}/status | 2-step: resolve ID, then fetch Airflow run history |
| get_asset_owner(entity_fqn, entity_type) | GET /api/v1/{entity_type}s/name/{fqn} | Ownership resolution for tables, dashboards, pipelines |
| tag_asset_failing(entity_fqn, entity_type) | GET + PATCH /api/v1/{entity_type}s/{id} | Idempotent governance tagging — checks before writing |

The agent dynamically determines which tools to call and in what order based on what it finds. It doesn't follow a script — it reasons from evidence.
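The idempotent tagging flow is worth spelling out: tag_asset_failing first reads the entity's existing tags, then issues a JSON-Patch `add` only if DataQuality.Failing is absent. A sketch of the patch construction — the tag-label fields follow OpenMetadata's schema as we understand it, so verify them against your OpenMetadata version:

```python
def build_tag_patch(existing_tags: list[dict], tag_fqn: str = "DataQuality.Failing"):
    """Return a JSON-Patch op list for PATCH /{entity_type}s/{id}, or [] if already tagged."""
    if any(t.get("tagFQN") == tag_fqn for t in existing_tags):
        return []  # idempotent: nothing to write
    return [{
        "op": "add",
        "path": f"/tags/{len(existing_tags)}",  # append at the end of the tags array
        "value": {
            "tagFQN": tag_fqn,
            "labelType": "Automated",      # assumed label metadata —
            "source": "Classification",    # check against your OM schema
            "state": "Confirmed",
        },
    }]

print(build_tag_patch([{"tagFQN": "PII.Sensitive"}]))
print(build_tag_patch([{"tagFQN": "DataQuality.Failing"}]))  # → []
```

The read-before-write check is what keeps repeated investigations from stacking duplicate tags on the same asset.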


🔑 Key Features

  • Live Investigation Timeline — every tool call streamed in real time with a confidence score bar climbing 0 → 100%
  • Interactive Lineage Graph — React Flow visualization; failing nodes highlighted in red with directional arrows
  • Structured Incident Report — root cause, severity badge, evidence quoted directly from API output, recommended action
  • Auto-Governance Tagging — writes DataQuality.Failing tag back to OpenMetadata at investigation close
  • MTTR History Dashboard — tracks past investigations, computes mean time to resolution vs 4.2h industry average
  • Severity Trend Chart — Recharts visualization of incident severity distribution over time
  • Bring Your Own Key — Anthropic key stored in browser localStorage only, sent as X-Anthropic-Key header, never persisted server-side
  • Honesty Rules — agent refuses to invent lineage or test results; every claim cites a specific tool call
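The MTTR figure is computed in the frontend (useHistory.js), but the arithmetic is simple enough to sketch in Python — durations come from past investigations, and 4.2 h is the industry-average baseline the dashboard compares against:

```python
from datetime import timedelta

def mean_time_to_resolution(durations: list[timedelta]) -> timedelta:
    """Average resolution time across past investigations."""
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)

history = [timedelta(seconds=12), timedelta(seconds=30), timedelta(seconds=18)]
mttr = mean_time_to_resolution(history)
print(mttr.total_seconds())   # → 20.0

baseline = timedelta(hours=4.2)       # industry average from the dashboard
print(round(baseline / mttr))         # → 756  (speedup factor vs baseline)
```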

📋 Prerequisites

  • Docker with Docker Compose (runs the OpenMetadata + Airflow stack)
  • Python 3.11+ (backend)
  • Node.js and npm (frontend)
  • An Anthropic API key (sk-ant-...) for the agent

⏱️ Total Setup Time: ~20 minutes (mostly waiting for containers to start)


⚡ Quick Start (Recommended)

git clone https://github.com/Rituraj-13/DataSheriff.git
cd DataSheriff
chmod +x seed_and_run.sh
./seed_and_run.sh

The script will:

  1. Verify Docker is running
  2. Start OpenMetadata + Airflow containers
  3. Wait for OpenMetadata to be ready (2–5 min)
  4. Pause and guide you through the manual Airflow + token steps
  5. Create the Python venv and install dependencies
  6. Run setup_demo.py to wire up demo lineage
  7. Start the FastAPI backend

Before starting the frontend, open frontend/src/App.jsx and update line 18 to point at your local backend:

// Change this (production URL):
const API_BASE = 'https://backend.riturajdey.dev'

// To this (your local backend):
const API_BASE = 'http://localhost:8000'

Then in a second terminal:

cd frontend && npm install && npm run dev

Open http://localhost:5173, go to ⚙️ Settings → enter your Anthropic API key → investigate.


📋 Manual Setup (Step by Step) — click to expand

💡 Why some steps are manual: Airflow startup time varies greatly across machines. Manual triggering lets you verify each component is ready before proceeding.

Step 1 — Clone the repository

git clone https://github.com/Rituraj-13/DataSheriff.git
cd DataSheriff

Step 2 — Start OpenMetadata + Airflow

docker compose -f openmetadata-docker/docker-compose-postgres.yml up --detach

This starts 5 containers:

| Container | Purpose | Port |
| --- | --- | --- |
| openmetadata_server | Main OpenMetadata app | 8585 |
| openmetadata_ingestion | Airflow for sample data | 8080 |
| openmetadata_postgresql | Metadata database | — |
| openmetadata_elasticsearch | Search index | — |
| execute_migrate_all | DB migrations (exits after completion) | — |

Wait 3–5 minutes, then verify:

curl http://localhost:8585/api/v1/system/version
# {"version":"1.12.5", ...}

Step 3 — Load sample data via Airflow

  1. Open http://localhost:8080

  2. Log in: admin / admin

  3. Trigger these 4 DAGs in order, waiting for each green ✅ before the next:

    | # | DAG Name | What it loads | Time |
    | --- | --- | --- | --- |
    | 1 | sample_data | Tables: fact_orders, dim_address, raw_order, dim_customer... | ~2–3 min |
    | 2 | sample_lineage | Lineage connections between tables | ~1–2 min |
    | 3 | sample_usage | Usage statistics | ~1 min |
    | 4 | airflow_metadata_extraction | Syncs Airflow metadata into OpenMetadata | ~1–2 min |

If a DAG fails (red ❌), click it to see logs and re-trigger.

Step 4 — Get your OpenMetadata JWT token

  1. Open http://localhost:8585
  2. Log in: admin@open-metadata.org / admin
  3. Settings (gear icon) → Bots → click ingestion-bot
  4. Copy the Token value

If you get 401 errors later, the token expired — repeat this step and update backend/.env.

Step 5 — Configure the backend

cd backend
python3 -m venv venv
source venv/bin/activate      # Mac/Linux
# .\venv\Scripts\activate     # Windows

pip install -r requirements.txt
cp .env.example .env

Edit backend/.env:

OPENMETADATA_HOST=http://localhost:8585
OPENMETADATA_TOKEN=paste-your-token-here

Note: ANTHROPIC_API_KEY is not set here. Users bring their own key via the UI — your server never sees it.

Windows + Ollama users: A conflicting ANTHROPIC_BASE_URL system variable will break the connection. Fix it:

Remove-Item Env:ANTHROPIC_BASE_URL -ErrorAction SilentlyContinue

Step 6 — Seed demo lineage

cd backend
python3 setup_demo.py

This adds two missing lineage edges to make the full investigation chain work:

  • dim_address → fact_orders
  • fact_orders → orders dashboard

Expected output:

DataSheriff — Demo Setup
==================================================
Step 1: Verifying OpenMetadata connection
  OK  OpenMetadata 1.12.5 is running at http://localhost:8585

Step 2: Locating assets in sample data
  OK  Orders dashboard  : sample_looker.orders
  OK  fact_orders table : sample_data.ecommerce_db.shopify.fact_orders
  OK  fact_orders has 9 upstream nodes already in lineage
  OK  Quality tests     : 9 total, 3 already failing

Step 3: Adding lineage — fact_orders -> orders dashboard
  OK  Linked: sample_data.ecommerce_db.shopify.fact_orders -> sample_looker.orders
  OK  Linked: dim_address -> fact_orders

Step 4: Verifying final state
  OK  fact_orders upstream nodes : 10
  OK  fact_orders downstream     : 3 connection(s) to dashboard
  OK  Failing quality tests      : 3
       - diff_columns: Tables have 4 different columns...
       - diff_with_production: Found 3 different rows...
       - column_values_to_be_between_with_sample_rows: Found min=1001...
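Under the hood, setup_demo.py creates these edges through OpenMetadata's lineage API (PUT /api/v1/lineage), whose body names the two entities by id and type. A sketch of the request payload it would send — the entity ids here are placeholders, and you should verify the exact schema against your OpenMetadata version:

```python
def build_lineage_edge(from_id: str, from_type: str,
                       to_id: str, to_type: str) -> dict:
    """Body for PUT /api/v1/lineage connecting two entities."""
    return {
        "edge": {
            "fromEntity": {"id": from_id, "type": from_type},
            "toEntity": {"id": to_id, "type": to_type},
        }
    }

# Placeholder UUIDs — the real script resolves them from the entities' FQNs first.
edge = build_lineage_edge(
    "1111-aaaa", "table",      # dim_address
    "2222-bbbb", "table",      # fact_orders
)
print(edge["edge"]["fromEntity"]["type"])  # → table
```

The script sends one such PUT per missing edge, which is why the verification step afterward shows fact_orders with one more upstream node.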

Step 7 — Start the backend

cd backend
uvicorn main:app --host 0.0.0.0 --port 8000

Verify:

curl http://localhost:8000/health
# {"status":"ok","service":"DataSheriff"}

Step 8 — Point the frontend at your local backend

Open frontend/src/App.jsx and update line 18:

// Change this (production URL):
const API_BASE = 'https://backend.riturajdey.dev'

// To this (your local backend):
const API_BASE = 'http://localhost:8000'

Step 9 — Start the frontend

cd frontend
npm install
npm run dev

Open http://localhost:5173

Step 10 — Add your Anthropic API key

Click ⚙️ Settings, enter your sk-ant-... key, and click Save Key. It's stored in your browser's localStorage only — it never leaves your device except as a request header.

You're ready to investigate.


✅ Verification Checklist

| Component | How to check | Expected result |
| --- | --- | --- |
| OpenMetadata | curl http://localhost:8585/api/v1/system/version | {"version":"1.12.5",...} |
| Airflow | Open http://localhost:8080 | 4 DAGs green ✅ |
| Backend | curl http://localhost:8000/health | {"status":"ok","service":"DataSheriff"} |
| Frontend | Open http://localhost:5173 | DataSheriff UI loads |
| Sample Data | Search fact_orders in OpenMetadata | Table with lineage visible |
| Quality Tests | View dim_address in OpenMetadata | 3 failing tests visible |

🎮 Using DataSheriff

Recommended demo queries

| Query | What the agent finds |
| --- | --- |
| The orders dashboard is showing wrong data | Full 10-node lineage chain + dim_address root cause (3 failing tests) |
| dim_address table is failing data quality checks | 3 failing tests directly on dim_address |
| raw_customer table has incorrect data | Traces lineage upstream → finds dim_address failures |
| The fact_orders table has missing data | Traverses fact_orders lineage → dim_address |
| The payments dashboard is showing wrong numbers | Honest null report — asset not found in OpenMetadata |

The 8-step investigation workflow

  1. Search — finds the asset in OpenMetadata by name
  2. Traverse lineage — walks the upstream graph, up to 10+ nodes
  3. Check quality tests — runs get_quality_tests on every table in the chain
  4. Check pipeline runs — inspects Airflow execution history for pipeline nodes
  5. Identify root cause — first node where quality tests show Failed
  6. Find the owner — resolves the data owner to contact
  7. Tag the asset — writes DataQuality.Failing governance tag in OpenMetadata
  8. Generate report — structured incident report, every claim traceable to a tool call

What you'll see in the UI

  • Confidence bar — climbs 0% → 100% as investigation progresses
  • Live timeline — each tool call appears as it happens, with inputs shown
  • Lineage graph — interactive React Flow diagram; failing node highlighted in red
  • Incident report — severity badge, root cause, evidence, lineage path, recommended action
  • History dashboard — MTTR chart, severity trend, replay any past investigation

📁 Project Structure

DataSheriff/
├── seed_and_run.sh                    ← one-command setup
├── openmetadata-docker/
│   └── docker-compose-postgres.yml   ← OpenMetadata + Airflow stack
├── backend/
│   ├── agent.py                       ← Claude AI investigation agent
│   ├── main.py                        ← FastAPI server + SSE streaming
│   ├── mcp_server.py                  ← 6 MCP tools → OpenMetadata APIs
│   ├── setup_demo.py                  ← one-time lineage seeding script
│   ├── requirements.txt
│   ├── .env.example
│   └── .env                           ← OM token only
└── frontend/
    └── src/
        ├── App.jsx                    ← main app, SSE handler, tab routing
        ├── hooks/
        │   ├── useHistory.js          ← investigation history + MTTR stats
        │   └── useApiKey.js           ← localStorage API key management
        └── components/
            ├── InvestigationTimeline.jsx  ← live stream + confidence bar
            ├── LineageGraph.jsx           ← React Flow lineage visualization
            ├── ReportCard.jsx             ← incident report display
            ├── HistoryDashboard.jsx       ← MTTR history + severity charts
            └── SettingsPage.jsx           ← API key management UI

🔐 API Key Security Architecture

DataSheriff is zero-trust by design — the server operator never needs to manage Anthropic API keys:

User's browser
  └── localStorage: sk-ant-xxxx    ← only copy, never sent to server storage

  POST /investigate
  Header: X-Anthropic-Key: sk-ant-xxxx   ← sent per-request in header
        ↓
  FastAPI extracts header → passes to Claude agent
        ↓
  Anthropic API called with user's key
        ↓
  SSE stream → browser

The server only needs OPENMETADATA_TOKEN in .env. Each user pays for their own Claude usage. No shared keys, no credit risk, no liability.


🧩 MCP Tools Reference

All 6 tools are implemented as @mcp.tool() async Python functions in backend/mcp_server.py, invoked directly by the agent loop in backend/agent.py.

| Tool | HTTP calls made | Purpose |
| --- | --- | --- |
| search_assets(query) | GET /api/v1/search/query | Full-text search across tables, dashboards, pipelines, topics |
| get_lineage(entity_id, entity_type) | GET /api/v1/lineage/{type}/{id}?upstreamDepth=3&downstreamDepth=1 | Walk upstream lineage graph up to 3 hops |
| get_quality_tests(table_fqn) | GET /api/v1/dataQuality/testCases?entityLink=<#E::table::fqn> | Fetch latest test results (pass/fail/aborted) per table |
| get_pipeline_runs(pipeline_fqn) | GET /api/v1/pipelines/name/{fqn}, then GET /api/v1/pipelines/{id}/status | Resolve pipeline by FQN, then fetch last 10 run statuses |
| get_asset_owner(entity_fqn, entity_type) | GET /api/v1/{entity_type}s/name/{fqn}?fields=owner | Resolve owner (user or team) for any entity type |
| tag_asset_failing(entity_fqn, entity_type) | GET /api/v1/{entity_type}s/name/{fqn}?fields=tags, then PATCH /api/v1/{entity_type}s/{id} | Idempotent: reads existing tags, skips if already tagged, otherwise writes DataQuality.Failing via JSON-Patch |

All tools are async, fault-tolerant, and return structured JSON that the agent reasons over.
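One detail that trips people up: the entityLink value passed to /dataQuality/testCases must be percent-encoded, since it contains `<`, `#`, and `:` characters. A sketch of building it — the helper name is our own, not from the DataSheriff codebase:

```python
from urllib.parse import quote, unquote

def entity_link(table_fqn: str) -> str:
    """Build the percent-encoded entityLink query value for a table."""
    return quote(f"<#E::table::{table_fqn}>", safe="")

link = entity_link("sample_data.ecommerce_db.shopify.dim_address")
print(link)           # %3C%23E%3A%3Atable%3A%3A... (dots survive encoding)
print(unquote(link))  # → <#E::table::sample_data.ecommerce_db.shopify.dim_address>
```

Passing the raw `<#E::...>` string unencoded makes the `#` terminate the URL early, so the filter silently matches nothing.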


🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| AI Agent | Claude Sonnet 4.6 (Anthropic) |
| Agent Protocol | MCP (Model Context Protocol) via FastMCP |
| Backend | Python 3.11, FastAPI, uvicorn |
| HTTP Client | httpx (async) |
| Streaming | Server-Sent Events (SSE) |
| Frontend | React 18, Vite, Tailwind CSS |
| Graph Visualization | React Flow |
| Charts | Recharts |
| Data Catalog | OpenMetadata 1.12.5 |
| Pipeline Orchestration | Apache Airflow 2.x |
| Database | PostgreSQL |
| Search | Elasticsearch |
| Frontend Hosting | Cloudflare Workers |
| Backend Hosting | DigitalOcean Droplet |


📄 License

MIT
