Harden real-time re-indexing pipeline #65

Merged: jpr5 merged 4 commits into main from blitz/webhook-hardening/integration on May 13, 2026

jpr5 (Contributor) commented May 13, 2026

Summary

  • Fixed silent 413 rejections on large GitHub webhook payloads — express.raw() defaulted to 100KB, now 25mb
  • Raised health monitor threshold from 6h to 25h to match daily reindex cadence
  • Added queue-level job dedup — prevents redundant reindex jobs for the same repo/source
  • Added concurrent drain — independent repos now reindex in parallel (max 3 concurrent, per-repo serialization)
  • Added persistent webhook delivery tracking — new webhook_deliveries DB table logs every webhook receipt with decision/reason; stats exposed on /health
  • Extended monitoring to aimock-docs — third Pathfinder instance now in health monitor workflow
  • Hardened setup-webhooks.sh: --config flag for multi-instance use, PATCH includes events[]=push, REPOS array initialized
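
The delivery tracking item in the list above could look roughly like the sketch below. It is illustrative only: the field list, the three decision values, the fire-and-forget write, and the 30-day retention come from this PR's text, while `DeliveryStore`, `trackDelivery`, the camelCase field names, and the in-memory array standing in for the webhook_deliveries table are assumptions.

```typescript
// Hypothetical shapes; only the field list and decision values come from the PR text.
type Decision = "queued" | "ignored" | "error";

interface WebhookDelivery {
  source: string;      // e.g. "github", "slack", "discord"
  eventType: string;
  decision: Decision;
  reason: string;
  payloadSize: number; // bytes
  receivedAt: number;  // epoch milliseconds
}

// In-memory stand-in for the webhook_deliveries table (not a real database).
class DeliveryStore {
  private rows: WebhookDelivery[] = [];

  async insert(row: WebhookDelivery): Promise<void> {
    this.rows.push(row);
  }

  // Counts per decision, the kind of summary a /health endpoint could expose.
  stats(): Record<Decision, number> {
    const counts: Record<Decision, number> = { queued: 0, ignored: 0, error: 0 };
    for (const row of this.rows) counts[row.decision] += 1;
    return counts;
  }

  // Drop rows older than the retention window; returns how many were removed.
  cleanup(now: number, retentionDays: number = 30): number {
    const cutoff = now - retentionDays * 24 * 60 * 60 * 1000;
    const before = this.rows.length;
    this.rows = this.rows.filter((row) => row.receivedAt >= cutoff);
    return before - this.rows.length;
  }
}

// Fire-and-forget: start the write without awaiting it and swallow failures,
// so a tracking error can never block or fail webhook handling.
function trackDelivery(store: DeliveryStore, row: WebhookDelivery): void {
  void store.insert(row).catch((err) => {
    console.error("webhook delivery tracking failed:", err);
  });
}
```

A handler would call `trackDelivery` at each decision point and return its HTTP response without waiting on the write; the nightly job would call `cleanup(Date.now())`.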

Why

The root cause of spurious "commit drift" health alerts was express.raw() silently dropping GitHub push payloads larger than 100KB with a 413 — no logs on the Pathfinder side. Webhook-triggered re-indexing was already built but not firing for large pushes. The delivery tracking table ensures this class of silent failure is visible going forward.
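
To make the failure mode concrete, here is a small sketch of the size check a body parser performs. It assumes the 1024-based units that Express's body parsers use when reading strings such as "100kb" and "25mb"; `statusForPayload` and the 300KB payload size are hypothetical, chosen only for illustration.

```typescript
// Limits in bytes, using 1024-based units.
const OLD_LIMIT_BYTES = 100 * 1024;       // previous default: 100KB
const NEW_LIMIT_BYTES = 25 * 1024 * 1024; // raised limit: 25mb

// Hypothetical helper mirroring the parser's decision:
// a body strictly larger than the limit is rejected with 413.
function statusForPayload(payloadBytes: number, limitBytes: number): 200 | 413 {
  return payloadBytes > limitBytes ? 413 : 200;
}

// Illustrative size for a large push payload, not a measured value.
const largePushBytes = 300 * 1024;

statusForPayload(largePushBytes, OLD_LIMIT_BYTES); // 413: dropped before the handler runs
statusForPayload(largePushBytes, NEW_LIMIT_BYTES); // 200: reaches the webhook handler
```

Because the rejection happens in middleware, the route handler never runs, which is consistent with the absence of logs described above.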

Test plan

  • tsc --noEmit clean
  • 1739 tests pass (4 pre-existing failures in cli.test.ts/faq-txt-endpoint.test.ts)
  • 12 new orchestrator dedup/parallel tests
  • 18 new webhook delivery tracking tests
  • CR converged: Round 1 (5 bucket-a findings fixed) → Confirmation round (0 bucket-a)

jpr5 force-pushed the blitz/webhook-hardening/integration branch from a2dd616 to 7e87abf on May 13, 2026 19:24
jpr5 added 4 commits on May 13, 2026 12:29

express.raw() defaulted to 100KB, silently rejecting large GitHub push
payloads with 413. Raised to 25mb on all 3 webhook routes.

Health monitor commit-drift threshold raised from 6h to 25h to match
daily reindex cadence. Added aimock-docs as third monitored instance.
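
The threshold change can be illustrated with a simple age check. What the health monitor actually compares is not shown in this PR, so the sketch assumes it alerts when the last successful index is older than the threshold; `isDrifted` and the timestamps are hypothetical.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical check: alert when the last index is older than the threshold.
function isDrifted(lastIndexedAt: number, now: number, thresholdHours: number): boolean {
  return now - lastIndexedAt > thresholdHours * HOUR_MS;
}

// With a daily reindex, the index is routinely close to 24 hours old.
const now = Date.UTC(2026, 4, 13, 18, 0, 0);
const lastIndexedAt = now - 20 * HOUR_MS; // indexed 20 hours ago, within the daily cycle

isDrifted(lastIndexedAt, now, 6);  // true: a 6h threshold alerts on a healthy instance
isDrifted(lastIndexedAt, now, 25); // false: a 25h threshold tolerates the daily cadence
```

The extra hour above 24 leaves slack for a nightly run that starts or finishes slightly late.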

Added webhook delivery stats to /health endpoint for observability.
Queue-level dedup prevents redundant jobs for the same repo/source.
Concurrent drain replaces serial processing — independent repos now
reindex in parallel (configurable max, default 3). Per-repo mutex
serializes same-repo jobs. do-while re-check after final wait
prevents jobs queued during drain from being missed.
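
The dedup and concurrent drain described above might be structured along these lines. This is a sketch under stated assumptions, not the orchestrator's actual code: `ReindexQueue`, `Job`, `JobHandler`, and the `repo::source` key format are hypothetical, and a set of busy repos stands in for the per-repo mutex.

```typescript
// Hypothetical job shape; the PR only says dedup is keyed on repo/source.
interface Job { repo: string; source: string; }
type JobHandler = (job: Job) => Promise<void>;

class ReindexQueue {
  private handler: JobHandler;
  private maxConcurrent: number;
  private pending: Job[] = [];
  private pendingKeys = new Set<string>();
  private busyRepos = new Set<string>();
  private running = new Set<Promise<void>>();
  private draining = false;

  constructor(handler: JobHandler, maxConcurrent: number = 3) {
    this.handler = handler;
    this.maxConcurrent = maxConcurrent;
  }

  // Queue-level dedup: reject a job if the same repo/source is already waiting.
  enqueue(job: Job): boolean {
    const key = `${job.repo}::${job.source}`;
    if (this.pendingKeys.has(key)) return false;
    this.pendingKeys.add(key);
    this.pending.push(job);
    return true;
  }

  // Take the first waiting job whose repo is not currently being processed.
  private takeRunnable(): Job | undefined {
    const job = this.pending.find((j) => !this.busyRepos.has(j.repo));
    if (!job) return undefined;
    this.pending.splice(this.pending.indexOf(job), 1);
    this.pendingKeys.delete(`${job.repo}::${job.source}`);
    return job;
  }

  private start(job: Job): void {
    this.busyRepos.add(job.repo); // per-repo serialization
    const p: Promise<void> = this.handler(job)
      .catch((err) => console.error(`reindex failed for ${job.repo}:`, err))
      .finally(() => {
        this.busyRepos.delete(job.repo);
        this.running.delete(p);
      });
    this.running.add(p);
  }

  async drain(): Promise<void> {
    if (this.draining) return; // an active drain will pick up new jobs
    this.draining = true;
    try {
      do {
        // Fill free slots with jobs for repos that are not already running.
        while (this.running.size < this.maxConcurrent) {
          const job = this.takeRunnable();
          if (!job) break;
          this.start(job);
        }
        // Wait for any one job to finish before re-checking the queue.
        if (this.running.size > 0) await Promise.race(Array.from(this.running));
        // Re-check after the wait so jobs queued during the drain are not missed.
      } while (this.pending.length > 0 || this.running.size > 0);
    } finally {
      this.draining = false;
    }
  }
}
```

In this sketch dedup covers only waiting jobs: a job that arrives while the same repo/source is already running is accepted, since it may reflect a newer push, and it runs after the in-flight job for that repo completes.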

Wired webhook_deliveries cleanup into nightly reindex cycle (30-day
retention, mirrors query_log pattern).

New webhook_deliveries table records every webhook receipt with source,
event type, decision (queued/ignored/error), reason, and payload size.
All 3 handlers (GitHub, Slack, Discord) log at every decision point.
Fire-and-forget pattern ensures tracking never blocks webhook processing.
30-day retention cleanup wired into nightly reindex cycle.

Added --config flag to specify which deploy YAML to read (defaults to
mcp-docs.yaml for backward compat). Fixed PATCH to include events[]=push
so existing webhooks get correct event subscriptions. Initialized REPOS
array before conditional to prevent unbound variable under set -u.

jpr5 force-pushed the blitz/webhook-hardening/integration branch from 7e87abf to 2ff2d50 on May 13, 2026 19:29
jpr5 merged commit 08d0c24 into main on May 13, 2026
4 checks passed
jpr5 deleted the blitz/webhook-hardening/integration branch on May 13, 2026 19:32