Continuous graph updates via Git webhook / branch watcher #614

@gkorland

Summary

Add the ability for code-graph to stay in sync with a repository by automatically updating the graph on each commit to a tracked branch (e.g. main). Instead of re-indexing the entire codebase on every change, the system should compute a diff-based incremental update — only processing files that were added, modified, or deleted in the commit.

Motivation

Currently code-graph requires a full re-index to reflect codebase changes. For large repositories this is slow and wasteful. Continuous incremental updates would make code-graph viable as a live knowledge source for AI-assisted development tools (e.g. Claude Code via MCP), CI pipelines, and developer dashboards — where the graph must reflect the latest state of main at all times.

Proposed Behavior

  1. Trigger: On each push/merge to the tracked branch, the system receives a notification (Git webhook, polling, or filesystem watch).
  2. Diff extraction: Determine which files were added, modified, or deleted in the commit(s) since the last indexed commit SHA.
  3. Incremental graph update:
    • Deleted files — remove all nodes and edges originating from those files.
    • Modified files — remove existing nodes/edges for the file, re-parse, and re-insert.
    • Added files — parse and insert new nodes and edges.
    • Cross-file edges — recompute edges (calls, imports, inheritance) that involve any touched file, and prune stale edges whose targets no longer exist.
  4. Bookmark: Persist the last successfully indexed commit SHA so the system can resume correctly after restarts or failures.
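As a rough illustration of step 2, the sketch below classifies the output of `git diff --name-status <old_sha> <new_sha>` into added, modified, and deleted paths. The names `ChangeSet` and `classify_changes` are hypothetical and not part of code-graph. It assumes the default tab-separated output format; paths with unusual characters are quoted by Git in that format, so a real implementation would probably prefer the `-z` variant.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Paths touched between two commits, grouped by what the indexer must do."""
    added: set[str] = field(default_factory=set)
    modified: set[str] = field(default_factory=set)
    deleted: set[str] = field(default_factory=set)


def classify_changes(name_status: str) -> ChangeSet:
    """Parse the text produced by `git diff --name-status <old> <new>`."""
    changes = ChangeSet()
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        kind = status[0]  # rename/copy statuses carry a score, e.g. "R100"
        if kind == "A":
            changes.added.add(paths[0])
        elif kind in ("M", "T"):  # T = file type change, treated as a modification
            changes.modified.add(paths[0])
        elif kind == "D":
            changes.deleted.add(paths[0])
        elif kind == "R":  # rename: old path disappears, new path appears
            changes.deleted.add(paths[0])
            changes.added.add(paths[1])
        elif kind == "C":  # copy: the source is untouched, the destination is new
            changes.added.add(paths[1])
    return changes
```

Treating a rename as delete plus add keeps the graph update logic to three cases, at the cost of re-parsing a file whose content may not have changed.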

Design Considerations

  • Atomicity — Graph updates for a single commit should be applied as a transaction so queries never see a half-updated state. Consider wrapping the delete + re-insert cycle in a FalkorDB transaction or using a shadow-graph swap approach for larger changesets.
  • Batch commits — If the watcher falls behind (e.g. service was down), it should be able to squash multiple commits into a single cumulative diff rather than replaying one-by-one.
  • Trigger modes — Support at least two modes:
    • Webhook — HTTP endpoint that receives a GitHub/GitLab push event payload.
    • Poll — Periodically check the remote branch HEAD and update if it has advanced.
    • (Optional) Filesystem watch — for local-only setups using inotify/fswatch on a bare repo.
  • Concurrency — Graph reads (MCP queries, API requests) should not be blocked during an update. Consider read/write isolation or short lock windows.
  • Idempotency — Re-processing the same commit SHA should be a no-op.
  • Logging & observability — Each update cycle should log: trigger commit SHA, files affected, nodes/edges added/removed, duration, and any parse errors (with the update continuing past unparseable files).
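One possible shape for the logging requirement is sketched below: files are parsed one at a time, failures are recorded instead of aborting the cycle, and the result is emitted as a single JSON log line. The helper names (`parse_all`, `update_log_line`) and the log field names are assumptions made for illustration; the issue does not prescribe a log schema.

```python
import json
from typing import Any, Callable, Iterable


def parse_all(
    paths: Iterable[str], parse: Callable[[str], Any]
) -> tuple[dict[str, Any], dict[str, str]]:
    """Parse every path, collecting errors so one bad file cannot stop the update."""
    parsed: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for path in paths:
        try:
            parsed[path] = parse(path)
        except Exception as exc:  # broad on purpose: any parser failure is logged
            errors[path] = f"{type(exc).__name__}: {exc}"
    return parsed, errors


def update_log_line(
    commit: str,
    files: list[str],
    nodes_added: int,
    nodes_removed: int,
    edges_added: int,
    edges_removed: int,
    duration_s: float,
    parse_errors: dict[str, str],
) -> str:
    """Render one update cycle as a single structured (JSON) log line."""
    return json.dumps(
        {
            "commit": commit,
            "files": sorted(files),
            "nodes_added": nodes_added,
            "nodes_removed": nodes_removed,
            "edges_added": edges_added,
            "edges_removed": edges_removed,
            "duration_s": round(duration_s, 3),
            "parse_errors": parse_errors,
        },
        sort_keys=True,
    )
```

Keeping the whole cycle in one line makes it straightforward to derive counters such as commits processed or error rate from the logs alone.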

Suggested Implementation Phases

Phase 1 — Core incremental update engine

  • Given a before/after commit SHA pair, compute the file diff, update the graph accordingly, and persist the new bookmark.
  • Unit-testable in isolation: no webhook is needed, only a call with two SHAs.
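A minimal in-memory sketch of such an engine follows. It is not the FalkorDB-backed implementation: `Graph`, `apply_commit`, and the `parse` callback are hypothetical, node IDs are assumed to be stable strings such as `file::symbol`, and the function builds a new graph object rather than mutating the old one, which loosely mirrors the shadow-graph swap idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional

Edge = tuple[str, str]  # (source node id, target node id)
ParseFn = Callable[[str], tuple[list[str], list[Edge]]]  # path -> (nodes, edges)


@dataclass
class Graph:
    node_file: dict[str, str] = field(default_factory=dict)  # node id -> defining file
    edges: set[Edge] = field(default_factory=set)
    bookmark: Optional[str] = None  # last successfully indexed commit SHA


def apply_commit(
    graph: Graph,
    new_sha: str,
    deleted: Iterable[str],
    changed: Iterable[str],  # added and modified files are handled the same way
    parse: ParseFn,
) -> Graph:
    """Return an updated copy of `graph`; the original is left untouched."""
    if graph.bookmark == new_sha:
        return graph  # re-processing the same commit is a no-op

    changed = list(changed)
    touched = set(deleted) | set(changed)

    # Drop nodes defined in touched files and edges that originate from them.
    nodes = {n: f for n, f in graph.node_file.items() if f not in touched}
    edges = {(s, t) for s, t in graph.edges if graph.node_file.get(s) not in touched}

    # Re-parse added and modified files and insert their nodes and edges.
    for path in changed:
        new_nodes, new_edges = parse(path)
        for node in new_nodes:
            nodes[node] = path
        edges.update(new_edges)

    # Prune stale edges whose source or target no longer exists.
    edges = {(s, t) for s, t in edges if s in nodes and t in nodes}
    return Graph(nodes, edges, new_sha)
```

Two limitations of this sketch are worth noting: edges to symbols outside the graph (for example third-party libraries) are pruned, and an untouched file that calls a symbol introduced by the commit does not gain a new edge unless that file is re-resolved as well.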

Phase 2 — Trigger integration

  • Add the webhook HTTP endpoint (GitHub/GitLab push event format).
  • Add the poll-based watcher as an alternative.
  • Configuration: tracked branch name, poll interval, webhook secret.
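For the GitHub case, the webhook handler could validate and filter events roughly as sketched below. GitHub signs the raw request body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`, and its push payload carries `ref`, `before`, and `after` fields. The function names are hypothetical, and GitLab would need a different check because it sends the shared secret in an `X-Gitlab-Token` header instead of a signature.

```python
import hashlib
import hmac
import json
from typing import Optional


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; both sides are encoded so any header value is accepted.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


def extract_push(body: bytes, tracked_branch: str) -> Optional[tuple[str, str]]:
    """Return (before_sha, after_sha) for pushes to the tracked branch, else None."""
    payload = json.loads(body)
    if payload.get("ref") != f"refs/heads/{tracked_branch}":
        return None  # push to another branch or a tag: ignore
    return payload["before"], payload["after"]
```

The returned SHA pair maps directly onto the before/after input of the Phase 1 engine, although using the persisted bookmark instead of `before` would be safer if a delivery was missed.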

Phase 3 — Robustness & observability

  • Batch catch-up for missed commits.
  • Metrics endpoint or structured logs (commits processed, lag, errors).
  • Graceful handling of force-pushes and rebases (detect a non-fast-forward update of the tracked branch and fall back to a full re-index).
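One way to detect a non-fast-forward update is to ask Git whether the bookmarked commit is still an ancestor of the new branch head. The sketch below uses `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second and 1 when it is not. The function names and the strategy labels (`"full"`, `"noop"`, `"incremental"`) are assumptions for illustration.

```python
import subprocess
from typing import Callable, Optional


def git_is_ancestor(repo: str, old_sha: str, new_sha: str) -> bool:
    """True if old_sha is reachable from new_sha in the repository at `repo`."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", old_sha, new_sha],
        capture_output=True,
    )
    if result.returncode in (0, 1):
        return result.returncode == 0
    # Any other exit status is a real error, e.g. an unknown commit.
    raise RuntimeError(result.stderr.decode(errors="replace"))


def choose_strategy(
    last_indexed: Optional[str],
    head: str,
    is_ancestor: Callable[[str, str], bool],
) -> str:
    """Decide how to bring the graph from the bookmark to the branch head."""
    if last_indexed is None:
        return "full"  # nothing indexed yet
    if last_indexed == head:
        return "noop"  # already up to date
    if is_ancestor(last_indexed, head):
        return "incremental"  # fast-forward: one cumulative diff covers all missed commits
    return "full"  # history was rewritten (force-push or rebase)
```

Passing the ancestry check in as a callable keeps `choose_strategy` testable without a repository; in production it could be bound with `functools.partial(git_is_ancestor, repo_path)`.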

Acceptance Criteria

  • Pushing a commit to main that adds a new function results in the corresponding node and call-edges appearing in the graph within a configurable time window (default < 30s for webhook mode).
  • Renaming/moving a function removes the old node and creates a new one with correct edges.
  • Deleting a file removes all its nodes and any dangling edges.
  • The system recovers cleanly after a restart, picking up from the last indexed SHA.
  • A full re-index can still be triggered manually as a fallback.
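The restart criterion depends on the bookmark surviving a crash in a consistent state. As one illustration, assuming the bookmark is kept in a small file next to the service (it could equally be stored in the graph itself), the sketch below writes it atomically so a reader sees either the old SHA or the new one, never a partial write. `save_bookmark` and `load_bookmark` are hypothetical names.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_bookmark(path: Path, sha: str) -> None:
    """Atomically replace the bookmark file with the new commit SHA."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(sha + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes are on disk first
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_bookmark(path: Path) -> Optional[str]:
    """Return the last indexed SHA, or None if nothing has been indexed yet."""
    try:
        return path.read_text().strip() or None
    except FileNotFoundError:
        return None
```

The bookmark should only be saved after the graph update for that commit has been committed; otherwise a crash in between would skip the commit on restart.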
