Open
Description
Summary
Add the ability for code-graph to stay in sync with a repository by automatically updating the graph on each commit to a tracked branch (e.g. main). Instead of re-indexing the entire codebase on every change, the system should compute a diff-based incremental update — only processing files that were added, modified, or deleted in the commit.
Motivation
Currently code-graph requires a full re-index to reflect codebase changes. For large repositories this is slow and wasteful. Continuous incremental updates would make code-graph viable as a live knowledge source for AI-assisted development tools (e.g. Claude Code via MCP), CI pipelines, and developer dashboards — where the graph must reflect the latest state of main at all times.
Proposed Behavior
- Trigger: On each push/merge to the tracked branch, the system receives a notification (Git webhook, polling, or filesystem watch).
- Diff extraction: Determine which files were added, modified, or deleted in the commit(s) since the last indexed commit SHA.
- Incremental graph update:
  - Deleted files: remove all nodes and edges originating from those files.
  - Modified files: remove existing nodes/edges for the file, re-parse, and re-insert.
  - Added files: parse and insert new nodes and edges.
  - Cross-file edges: recompute edges (calls, imports, inheritance) that involve any touched file, and prune stale edges whose targets no longer exist.
- Bookmark: Persist the last successfully indexed commit SHA so the system can resume correctly after restarts or failures.
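The diff extraction step above could be sketched as follows. This is a minimal sketch, not code-graph's implementation: `FileChanges`, `parse_name_status`, and `changed_files` are hypothetical names, and it assumes the tracked repository is available as a local clone. It relies on `git diff --name-status <old> <new>`, which prints one tab-separated line per changed file, and treats a rename as a delete plus an add.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class FileChanges:
    """Hypothetical container for the three change sets named in the proposal."""
    added: set = field(default_factory=set)
    modified: set = field(default_factory=set)
    deleted: set = field(default_factory=set)


def parse_name_status(output: str) -> FileChanges:
    """Parse the output of `git diff --name-status <old> <new>`."""
    changes = FileChanges()
    for line in output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status = parts[0]
        if status.startswith("A"):
            changes.added.add(parts[1])
        elif status.startswith(("M", "T")):
            # T is a type change (e.g. file to symlink); treated as a modification here.
            changes.modified.add(parts[1])
        elif status.startswith("D"):
            changes.deleted.add(parts[1])
        elif status.startswith("R"):
            # Rename lines look like "R100<TAB>old<TAB>new": drop the old path, index the new one.
            changes.deleted.add(parts[1])
            changes.added.add(parts[2])
        elif status.startswith("C"):
            # Copy lines look like "C75<TAB>src<TAB>dst": only the destination is new.
            changes.added.add(parts[2])
    return changes


def changed_files(repo_dir: str, old_sha: str, new_sha: str) -> FileChanges:
    """Diff two commits in a local clone (assumes `git` is on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-status", old_sha, new_sha],
        check=True, capture_output=True, text=True,
    )
    return parse_name_status(result.stdout)
```

Because the diff is taken between the last indexed SHA and the new head, the same call yields a cumulative change set when several commits arrive at once. Paths containing unusual characters are quoted by git in this output; a more robust version would likely pass `-z` and split on NUL bytes instead.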
Design Considerations
- Atomicity — Graph updates for a single commit should be applied as a transaction so queries never see a half-updated state. Consider wrapping the delete + re-insert cycle in a FalkorDB transaction or using a shadow-graph swap approach for larger changesets.
- Batch commits — If the watcher falls behind (e.g. service was down), it should be able to squash multiple commits into a single cumulative diff rather than replaying one-by-one.
- Trigger modes: Support at least two modes:
  - Webhook: HTTP endpoint that receives a GitHub/GitLab push event payload.
  - Poll: Periodically check the remote branch HEAD and update if it has advanced.
  - (Optional) Filesystem watch: for local-only setups using inotify/fswatch on a bare repo.
- Concurrency — Graph reads (MCP queries, API requests) should not be blocked during an update. Consider read/write isolation or short lock windows.
- Idempotency — Re-processing the same commit SHA should be a no-op.
- Logging & observability — Each update cycle should log: trigger commit SHA, files affected, nodes/edges added/removed, duration, and any parse errors (with the update continuing past unparseable files).
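To illustrate how the atomicity, concurrency, and idempotency points might fit together, here is a minimal in-memory sketch of the shadow-graph swap idea. `GraphStore` is a hypothetical stand-in, not a FalkorDB API: the graph is a plain dict, and the idempotency check only compares against the most recent bookmark. A real implementation would need the database's own transaction or graph-copy facilities.

```python
import copy
import threading


class GraphStore:
    """Hypothetical in-memory stand-in for the graph plus its bookmark."""

    def __init__(self):
        self._write_lock = threading.Lock()  # serializes writers only
        self._graph = {"nodes": {}, "edges": set()}
        self.last_indexed_sha = None

    def snapshot(self):
        # Readers take the current reference without locking. Published
        # graphs are never mutated in place, so a snapshot stays consistent.
        return self._graph

    def apply(self, sha, mutate):
        """Run `mutate` on a private copy, then publish copy and bookmark together.

        Returns False when `sha` is already the indexed commit (no-op).
        """
        with self._write_lock:
            if sha == self.last_indexed_sha:
                return False
            shadow = copy.deepcopy(self._graph)
            # If mutate raises, the published graph and bookmark are untouched.
            mutate(shadow)
            self._graph = shadow
            self.last_indexed_sha = sha
            return True
```

Copying the whole graph per update is only reasonable for a sketch; the point is that readers never observe a half-applied commit and that a failed update leaves the bookmark where it was.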
Suggested Implementation Phases
Phase 1 — Core incremental update engine
- Given a before/after commit SHA pair, compute the file diff, update the graph accordingly, and persist the new bookmark.
- Unit-testable in isolation (no webhook needed, just call with two SHAs).
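A rough sketch of what the Phase 1 engine's core step could look like, using a plain dict as a stand-in for the graph. Everything here is an assumption for illustration: the graph shape (`nodes` maps a node id to its file, `edges` is a set of `(source, target)` pairs), the function name `apply_file_changes`, and the `parse_file` callback, which stands in for code-graph's real analyzers.

```python
def apply_file_changes(graph, added, modified, deleted, parse_file):
    """Update `graph` in place for one set of file changes.

    graph:      {"nodes": {node_id: file_path}, "edges": {(src_id, dst_id), ...}}
    parse_file: hypothetical callback, path -> (node_ids, edges)
    Returns a small summary dict suitable for logging.
    """
    # 1. Drop nodes defined in deleted or modified files, plus the edges
    #    that originate from those nodes.
    stale_files = set(modified) | set(deleted)
    stale_nodes = {n for n, f in graph["nodes"].items() if f in stale_files}
    for node_id in stale_nodes:
        del graph["nodes"][node_id]
    graph["edges"] = {(s, d) for s, d in graph["edges"] if s not in stale_nodes}

    # 2. (Re-)parse added and modified files; keep going past parse failures.
    parse_errors = []
    for path in sorted(set(added) | set(modified)):
        try:
            node_ids, edges = parse_file(path)
        except Exception as exc:  # broad on purpose: one bad file must not stop the update
            parse_errors.append((path, str(exc)))
            continue
        for node_id in node_ids:
            graph["nodes"][node_id] = path
        graph["edges"].update(edges)

    # 3. Prune edges whose source or target no longer exists. Edges from
    #    untouched files into a re-parsed file survive if the target id is unchanged.
    graph["edges"] = {
        (s, d) for s, d in graph["edges"]
        if s in graph["nodes"] and d in graph["nodes"]
    }
    return {"removed_nodes": len(stale_nodes), "parse_errors": parse_errors}
```

With this shape the engine can be tested by passing two file lists and a fake `parse_file`, with no git or webhook involved. Note that this sketch drops edges to anything outside the graph (for example third-party libraries), which may or may not match how code-graph models external symbols.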
Phase 2 — Trigger integration
- Add the webhook HTTP endpoint (GitHub/GitLab push event format).
- Add the poll-based watcher as an alternative.
- Configuration: tracked branch name, poll interval, webhook secret.
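For the webhook path, the handler would need to authenticate the delivery and pull the SHA pair out of the payload. The sketch below assumes GitHub's push event, where the `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body, and the JSON body has `ref`, `before`, and `after` fields. The function names are hypothetical, and the HTTP framework wiring is left out. GitLab authenticates differently (a shared token in the `X-Gitlab-Token` header), so it would need its own check.

```python
import hashlib
import hmac
import json


def verify_github_signature(secret: bytes, body: bytes, header) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; a missing header simply fails the check.
    return hmac.compare_digest(expected.encode(), (header or "").encode())


def extract_push(body: bytes, tracked_branch: str):
    """Return (before_sha, after_sha) for a push to the tracked branch, else None."""
    payload = json.loads(body)
    if payload.get("ref") != f"refs/heads/{tracked_branch}":
        return None  # push to some other branch or a tag: ignore
    return payload["before"], payload["after"]
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization. The returned pair can be handed to the Phase 1 engine, although the persisted bookmark is probably a safer "before" than the payload's value if deliveries can be missed.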
Phase 3 — Robustness & observability
- Batch catch-up for missed commits.
- Metrics endpoint or structured logs (commits processed, lag, errors).
- Graceful handling of force-pushes / rebases (detect non-fast-forward and trigger a full re-index as fallback).
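One way to detect a non-fast-forward update is `git merge-base --is-ancestor <old> <new>`, which exits with status 0 when the old commit is an ancestor of the new one and 1 when it is not. The sketch below assumes a local clone and uses hypothetical function names; the `run` parameter exists only so the logic can be tested without a repository.

```python
import subprocess


def is_fast_forward(repo_dir, old_sha, new_sha, run=subprocess.run):
    """True if old_sha is an ancestor of new_sha in the local clone."""
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", old_sha, new_sha]
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other status means git itself failed (e.g. unknown commit).
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")


def choose_update_mode(repo_dir, last_indexed_sha, new_sha, run=subprocess.run):
    """Decide between 'noop', 'incremental', and 'full' for a new branch head."""
    if last_indexed_sha is None:
        return "full"  # nothing indexed yet
    if last_indexed_sha == new_sha:
        return "noop"  # already up to date
    if is_fast_forward(repo_dir, last_indexed_sha, new_sha, run=run):
        return "incremental"
    return "full"  # history was rewritten: fall back to a full re-index
```

After a force-push the old commit may no longer exist in the clone at all, in which case git reports an error rather than status 1. A caller would likely want to catch the `RuntimeError` and treat it as another reason to fall back to a full re-index.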
Acceptance Criteria
- Pushing a commit to main that adds a new function results in the corresponding node and call-edges appearing in the graph within a configurable time window (default < 30s for webhook mode).
- Renaming/moving a function removes the old node and creates a new one with correct edges.
- Deleting a file removes all its nodes and any dangling edges.
- The system recovers cleanly after a restart, picking up from the last indexed SHA.
- A full re-index can still be triggered manually as a fallback.