Production incident investigation tool — multi-cloud log aggregation, signal extraction, and AI-driven root cause analysis.
Queries Alibaba Cloud SLS, Tencent Cloud CLS, Volcengine TLS, webhook-based workflow engines, and databases (MongoDB, SQL), then feeds normalized results into AI-driven root cause analysis.
pnpm install
cp .env.example .env
# Edit .env with your cloud credentials, webhook URLs, and database connections.
cp config/projects.example.json config/projects.json
# Edit config/projects.json to describe your services, log stores, and topology.

Key fields per project:

- `vendor` / `queryBackend`: Which cloud log service to use (`sls`, `cls`, `tls`, `webhook`)
- `envs.<env>.sources`: Log stores / topics to query, with architectural layer and purpose
- `downstream`: Which other projects this one calls (used for automated chain traversal)
- `keywords`: Words that identify this project in cross-project log mentions
- `taskPatterns`: `{type, regex}` pairs for taskId format recognition
- `multiEnvs`: If set, a single `--env` query expands across multiple env configs (e.g. multi-region prod)
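To make the fields above concrete, here is a minimal sketch of how one project entry might be typed and used. The field names come from the list; the exact types, the shape of `multiEnvs` (assumed here to map an env name to a list of env configs), and the helpers `expandEnvs` and `detectTaskType` are hypothetical. See config/projects.example.json for the real schema.

```typescript
// Assumed shape of one project entry; types are illustrative, not the real schema.
interface LogSource {
  name: string;    // log store / topic name (assumed field name)
  layer: string;   // architectural layer, e.g. "api"
  purpose: string; // what this source is used for
}

interface TaskPattern {
  type: string;  // label for the taskId format
  regex: string; // pattern source, compiled at match time
}

interface ProjectConfig {
  vendor: "sls" | "cls" | "tls" | "webhook";
  envs: Record<string, { sources: LogSource[] }>;
  downstream?: string[];
  keywords?: string[];
  taskPatterns?: TaskPattern[];
  multiEnvs?: Record<string, string[]>; // assumption: env name -> env configs to query
}

// Expand a single --env value into the env configs that should be queried.
function expandEnvs(project: ProjectConfig, env: string): string[] {
  return project.multiEnvs?.[env] ?? [env];
}

// Return the type of the first taskPattern whose regex matches the identifier.
function detectTaskType(project: ProjectConfig, id: string): string | undefined {
  for (const { type, regex } of project.taskPatterns ?? []) {
    if (new RegExp(regex).test(id)) return type;
  }
  return undefined;
}

// Hypothetical project entry used only for illustration.
const exampleProject: ProjectConfig = {
  vendor: "sls",
  envs: {
    "prod-cn": { sources: [{ name: "app-log", layer: "api", purpose: "request logs" }] },
    "prod-us": { sources: [{ name: "app-log", layer: "api", purpose: "request logs" }] },
  },
  downstream: ["billing-service"],
  keywords: ["my-service"],
  taskPatterns: [{ type: "render", regex: "^rt-[0-9]+$" }],
  multiEnvs: { prod: ["prod-cn", "prod-us"] },
};
```

With this entry, `expandEnvs(exampleProject, "prod")` yields both regional env configs, while an env that is not listed in `multiEnvs` is queried on its own.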
Edit config/signal-patterns.json to customize which error patterns are classified as hard failures (e.g. timeout, network errors) and info failures (e.g. processing failed, callback failed). The file is pre-filled with sensible defaults — only edit if your services produce different error patterns.
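As an illustration of what the two pattern classes mean, the sketch below classifies a log message as a hard failure, an info failure, or neither. The key names `hardFailures` and `infoFailures`, the sample patterns, and the `classify` function are assumptions made for this example; the actual layout of config/signal-patterns.json may differ.

```typescript
type Severity = "hard" | "info";

// Assumed layout: two lists of regex sources. Not the real file format.
interface SignalPatterns {
  hardFailures: string[];
  infoFailures: string[];
}

const samplePatterns: SignalPatterns = {
  hardFailures: ["timeout", "ECONNRESET", "network error"],
  infoFailures: ["processing failed", "callback failed"],
};

// Hard failures are checked first, so a message matching both lists is "hard".
function classify(message: string, patterns: SignalPatterns): Severity | null {
  const matchesAny = (sources: string[]): boolean =>
    sources.some((source) => new RegExp(source, "i").test(message));
  if (matchesAny(patterns.hardFailures)) return "hard";
  if (matchesAny(patterns.infoFailures)) return "info";
  return null;
}
```

Checking hard failures first is a design choice in this sketch: a line such as "callback failed: timeout" is reported once, under the more severe class.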
Edit references/call-graph.md to describe your service topology, routing rules, and escalation paths.
# Query cloud logs
pnpm fetch-logs -- --project my-service --env prod --query "someTaskId AND ERROR" --hours 24
# Query with raw log data included (default JSON output strips raw to save tokens)
pnpm fetch-logs -- --project my-service --env prod --query "someTaskId AND ERROR" --json --include-raw
# Query webhook-based workflow engine
pnpm fetch-webhook -- --taskId xxx --json
# Look up record from MongoDB
pnpm fetch-mongo -- --query 12345 --collection myCollection --lookup-field userNo --json
# Look up record from MongoDB (test environment)
pnpm fetch-mongo -- --query 12345 --env test --json
# Look up record from SQL database
pnpm fetch-sql -- --query someValue --json

User report (ID + symptoms)
│
▼
┌─────────────────────────────────────────────┐
│ Script Layer │
│ fetch-logs / fetch-webhook / fetch-mongo │
│ ├─ Query log sources in parallel │
│ ├─ Normalize to unified schema │
│ ├─ Extract signals (hard failures, errors) │
│ ├─ Cluster & deduplicate logs │
│ └─ Generate analysis hints │
└─────────────────────────────────────────────┘
│ JSON output
▼
┌─────────────────────────────────────────────┐
│ Analysis Layer (AI) │
│ SKILL.md workflow │
│ ├─ Classify problem type │
│ ├─ Trace identifiers across services │
│ ├─ Iterate downstream until root cause │
│ └─ Generate customer-facing response │
└─────────────────────────────────────────────┘
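The "Cluster & deduplicate logs" step in the script layer can be pictured roughly as follows. This is a sketch under assumptions, not the tool's implementation: the `NormalizedLog` fields, the templating rule (digits and long hex strings replaced by placeholders), and the function names are all hypothetical.

```typescript
// Assumed unified schema for a normalized log entry.
interface NormalizedLog {
  timestamp: number; // epoch milliseconds
  source: string;
  level: string;
  message: string;
}

interface LogCluster {
  template: string;
  count: number;
  firstSeen: number;
  lastSeen: number;
  sample: string; // first raw message seen for this template
}

// Replace variable parts of a message so similar lines share one template.
function toTemplate(message: string): string {
  return message
    .replace(/\b[0-9a-f]{8,}\b/gi, "<hex>")
    .replace(/\d+/g, "<n>");
}

// Group logs by template and return clusters, most frequent first.
function clusterLogs(logs: NormalizedLog[]): LogCluster[] {
  const clusters = new Map<string, LogCluster>();
  for (const log of logs) {
    const template = toTemplate(log.message);
    const existing = clusters.get(template);
    if (existing) {
      existing.count += 1;
      existing.firstSeen = Math.min(existing.firstSeen, log.timestamp);
      existing.lastSeen = Math.max(existing.lastSeen, log.timestamp);
    } else {
      clusters.set(template, {
        template,
        count: 1,
        firstSeen: log.timestamp,
        lastSeen: log.timestamp,
        sample: log.message,
      });
    }
  }
  return Array.from(clusters.values()).sort((a, b) => b.count - a.count);
}
```

Collapsing repeated lines into counted clusters keeps the JSON handed to the analysis layer small, which fits the token-saving intent of stripping raw log data by default.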
| Variable | Purpose | Required by |
|---|---|---|
| `SLS_ACCESS_KEY_ID` / `SLS_ACCESS_KEY_SECRET` | Alibaba Cloud SLS | fetch-logs (SLS vendor) |
| `CLS_SECRET_ID` / `CLS_SECRET_KEY` | Tencent Cloud CLS | fetch-logs (CLS vendor) |
| `TLS_ACCESS_KEY_ID` / `TLS_ACCESS_KEY_SECRET` | Volcengine TLS | fetch-logs (TLS vendor) |
| `TLS_SESSION_TOKEN` | Volcengine TLS temp token | fetch-logs (optional) |
| `TLS_HOST` | Volcengine TLS endpoint | fetch-logs (TLS vendor) |
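A pre-flight check for the cloud credentials in the table above might look like the sketch below. The variable names are taken from the table; the `REQUIRED_ENV` map and the `missingEnv` helper are hypothetical and not part of the tool.

```typescript
// Required variables per vendor, as listed in the table. TLS_SESSION_TOKEN is optional.
const REQUIRED_ENV: Record<string, string[]> = {
  sls: ["SLS_ACCESS_KEY_ID", "SLS_ACCESS_KEY_SECRET"],
  cls: ["CLS_SECRET_ID", "CLS_SECRET_KEY"],
  tls: ["TLS_ACCESS_KEY_ID", "TLS_ACCESS_KEY_SECRET", "TLS_HOST"],
};

// Return the names of required variables that are unset or empty for a vendor.
// Pass process.env (or any plain object) as the second argument.
function missingEnv(
  vendor: string,
  env: Record<string, string | undefined>,
): string[] {
  return (REQUIRED_ENV[vendor] ?? []).filter((name) => !env[name]);
}
```

Running a check like this before querying would report every missing variable at once rather than failing on the first rejected request.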
fetch-webhook reads its API endpoints and auth token from env vars.

| Variable | Purpose | Required by |
|---|---|---|
| `WEBHOOK_API_URL` | Workflow query API endpoint | fetch-webhook |
| `WEBHOOK_ERROR_API_URL` | Workflow error detail endpoint | fetch-webhook (optional) |
| `WEBHOOK_TOKEN` | Auth token for webhook APIs | fetch-webhook |
fetch-mongo connects via env vars; collection, lookup field, and return fields are passed per query via CLI flags.
| Variable | Purpose |
|---|---|
| `MONGO_URI` | Production MongoDB connection string |
| `MONGO_DB` | Production database name |
| `TEST_MONGO_URI` | Test environment MongoDB connection string |
| `TEST_MONGO_DB` | Test environment database name |
fetch-sql connects via env vars; table, lookup field, and return fields are passed per query via CLI flags.
| Variable | Purpose |
|---|---|
| `SQL_HOST` / `TEST_SQL_HOST` | Database host |
| `SQL_PORT` / `TEST_SQL_PORT` | Database port |
| `SQL_USER` / `TEST_SQL_USER` | Database user |
| `SQL_PASSWORD` / `TEST_SQL_PASSWORD` | Database password |
| `SQL_DATABASE` / `TEST_SQL_DATABASE` | Database name |
| `SQL_DIALECT` / `TEST_SQL_DIALECT` | Database dialect (e.g. mysql, postgres) |
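The paired production and `TEST_` variables suggest a simple prefix rule for picking connection settings. The sketch below shows one way that could work; it assumes the test environment is selected the same way as in the fetch-mongo example (`--env test`), and the `SqlSettings` type and `resolveSqlSettings` function are hypothetical.

```typescript
interface SqlSettings {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
  dialect: string;
}

// Read SQL_* for production or TEST_SQL_* for the test environment.
// Throws if any variable in the selected set is unset or empty.
function resolveSqlSettings(
  envName: "prod" | "test",
  env: Record<string, string | undefined>,
): SqlSettings {
  const prefix = envName === "test" ? "TEST_" : "";
  const read = (key: string): string => {
    const name = `${prefix}SQL_${key}`;
    const value = env[name];
    if (!value) throw new Error(`Missing env var ${name}`);
    return value;
  };
  return {
    host: read("HOST"),
    port: Number(read("PORT")),
    user: read("USER"),
    password: read("PASSWORD"),
    database: read("DATABASE"),
    dialect: read("DIALECT"),
  };
}
```

Keeping the two sets fully separate means a test query cannot silently fall back to production credentials when a `TEST_` variable is missing.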
config/projects.json defines each project's log sources, cloud vendor, environments, downstream services, and identifier patterns. See config/projects.example.json for the full schema.
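Because each project lists its downstream services, the chain traversal can be thought of as a walk over that graph. The following is a minimal breadth-first sketch with cycle protection; the `Topology` type, the `traversalOrder` function, and the service names are hypothetical and do not reflect how the tool itself orders its queries.

```typescript
// Project name -> names of the projects it calls (the downstream field).
type Topology = Record<string, string[]>;

// Visit the starting project first, then its downstream projects level by level.
// The visited set prevents infinite loops when services call each other.
function traversalOrder(start: string, topology: Topology): string[] {
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  const order: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    order.push(current);
    for (const next of topology[current] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return order;
}

// Hypothetical topology with a cycle (billing calls back into gateway).
const sampleTopology: Topology = {
  gateway: ["orders", "auth"],
  orders: ["billing"],
  billing: ["gateway"],
  auth: [],
};
```

Starting from `gateway`, this visits direct dependencies before deeper ones, which matches the idea of iterating downstream until a root cause is found.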