Add index health monitor workflow #50

Merged
jpr5 merged 2 commits into main from index-health-monitor
Apr 27, 2026

@jpr5 jpr5 commented Apr 27, 2026

Summary

Adds a GitHub Actions cron workflow that probes both Pathfinder production instances every 4 hours for indexing health. Single new file, zero changes to application code.

  • Liveness check via /health endpoint on mcp.copilotkit.ai and mcp.pathfinder.copilotkit.dev
  • Per-source error detection for mapped sources (docs, code, ag-ui-docs, ag-ui-code, pathfinder-docs)
  • Commit drift detection: compares indexed SHA against GitHub HEAD with 6h staleness threshold
  • Chunk floor validation: 1000 for copilotkit-docs, 50 for pathfinder-docs
  • Notification state machine: transitions between green and red always notify, green-to-green is silent, and red-to-red is rate-limited to 2 notifications per 24h
  • State persistence via Actions cache (immutable save with run_id, prefix-match restore)
  • Slack alerts via SLACK_WEBHOOK secret with instance details and run URL
  • workflow_dispatch inputs for force_notify and dry_run
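
For reviewers, the notification rules above can be sketched roughly as follows. This is an illustrative Python sketch, not the workflow's actual code: `MonitorState`, `should_notify`, and `next_state` are hypothetical names, the persisted state schema is not shown in this PR, and two details are assumptions: that `dry_run` takes precedence over `force_notify`, and that the 2x/24h limit counts only red-to-red repeats in a rolling 24h window.

```python
from dataclasses import dataclass, field
from typing import List

DAY_SECONDS = 24 * 60 * 60
MAX_REPEAT_ALERTS = 2  # "2x/24h" from the summary


@dataclass
class MonitorState:
    """Hypothetical persisted state; the real schema is not shown in the PR."""
    status: str = "green"  # last observed status: "green" or "red"
    alert_times: List[float] = field(default_factory=list)  # epoch seconds of red-to-red alerts


def should_notify(prev: MonitorState, status: str, now: float,
                  force_notify: bool = False, dry_run: bool = False) -> bool:
    """Decide whether this run should post to Slack."""
    if dry_run:            # assumption: dry_run wins over force_notify
        return False
    if force_notify:
        return True
    if status != prev.status:
        return True        # green-to-red or red-to-green transition
    if status == "green":
        return False       # green-to-green stays silent
    # red-to-red: allow at most MAX_REPEAT_ALERTS in the last 24h
    recent = [t for t in prev.alert_times if now - t < DAY_SECONDS]
    return len(recent) < MAX_REPEAT_ALERTS


def next_state(prev: MonitorState, status: str, now: float,
               notified: bool) -> MonitorState:
    """Compute the state to persist for the next run."""
    if status == "green":
        return MonitorState("green", [])
    times = [t for t in prev.alert_times if now - t < DAY_SECONDS]
    if notified and prev.status == "red":
        times.append(now)  # only repeats count toward the rate limit
    return MonitorState("red", times)
```

With the 4-hour cron, this sketch would alert on the first red run, repeat on the next two red runs, and then stay quiet until an earlier repeat ages out of the 24h window or the instance recovers.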

Why

We had no automated way to detect when Pathfinder indexes go stale, lose data (chunk count drop), or enter an error state. This workflow closes that gap with minimal infrastructure: just a GitHub Action and a Slack webhook.

Test plan

  • Trigger via workflow_dispatch with dry_run=true, verify that probes run and no notifications are sent
  • Trigger via workflow_dispatch with force_notify=true, verify that a Slack message is received
  • Verify cron schedule fires every 4 hours
  • Confirm SLACK_WEBHOOK secret is configured in repo settings

jpr5 added 2 commits April 27, 2026 15:49
GitHub Actions cron workflow (every 4h) that probes both Pathfinder
production instances for indexing health:

- Liveness check via /health endpoint
- Per-source error detection (only checks mapped sources)
- Commit drift detection (indexed SHA vs GitHub HEAD, 6h staleness)
- Chunk floor validation (1000 copilotkit-docs, 50 pathfinder-docs)
- Notification state machine (green/red transitions, 2x/24h rate limit)
- State persistence via Actions cache (save with run_id, prefix restore)
- Slack alerts with instance details and run URL
- workflow_dispatch inputs: force_notify and dry_run

Instances: mcp.copilotkit.ai (4 sources), mcp.pathfinder.copilotkit.dev
(1 source). Source-to-repo mapping: docs+code to CopilotKit/CopilotKit,
ag-ui-docs+ag-ui-code to ag-ui-protocol/ag-ui, pathfinder-docs to
CopilotKit/pathfinder.
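
As a rough illustration of the per-source checks described in this commit message, the drift and chunk-floor rules might look like the Python sketch below. The mapping and thresholds are copied from the message; everything else is an assumption: `check_drift` and `check_chunk_floor` are hypothetical names, the 6h threshold is read here as "HEAD has been unindexed for at least 6 hours", and the floor keys are kept exactly as written even though how `copilotkit-docs` relates to the mapped source names is not shown.

```python
from typing import Optional

STALENESS_SECONDS = 6 * 60 * 60  # 6h staleness threshold from the commit message

# Source-to-repo mapping as listed in the commit message.
SOURCE_REPOS = {
    "docs": "CopilotKit/CopilotKit",
    "code": "CopilotKit/CopilotKit",
    "ag-ui-docs": "ag-ui-protocol/ag-ui",
    "ag-ui-code": "ag-ui-protocol/ag-ui",
    "pathfinder-docs": "CopilotKit/pathfinder",
}

# Chunk floors, keyed exactly as the PR writes them.
CHUNK_FLOORS = {"copilotkit-docs": 1000, "pathfinder-docs": 50}


def check_drift(source: str, indexed_sha: str, head_sha: str,
                head_committed_at: float, now: float) -> Optional[str]:
    """Return a problem description, or None if the source looks healthy."""
    if source not in SOURCE_REPOS:
        return None  # only mapped sources are checked
    if indexed_sha == head_sha:
        return None  # index is at HEAD
    age = now - head_committed_at
    if age < STALENESS_SECONDS:
        return None  # behind HEAD, but still inside the 6h window (assumed reading)
    return (f"{source}: indexed {indexed_sha[:7]} but {SOURCE_REPOS[source]} "
            f"HEAD is {head_sha[:7]} ({age / 3600:.1f}h old)")


def check_chunk_floor(name: str, chunk_count: int) -> Optional[str]:
    """Return a problem description if the chunk count is below its floor."""
    floor = CHUNK_FLOORS.get(name)
    if floor is None or chunk_count >= floor:
        return None
    return f"{name}: {chunk_count} chunks is below the floor of {floor}"
```

Each function returns `None` for a healthy source and a short message otherwise, so a caller could collect the messages for the Slack alert.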
@jpr5 jpr5 merged commit 41ba94c into main Apr 27, 2026
5 checks passed
@jpr5 jpr5 deleted the index-health-monitor branch April 27, 2026 22:54