Reduce update churn and stabilize community IDs #822

Closed
FatahChan wants to merge 2 commits into Graphify-Labs:v7 from FatahChan:fix/update-idempotency-churn

Conversation

FatahChan (Contributor) commented May 11, 2026

Summary

  • add graphify update --no-cluster and thread it through the watch rebuild path
  • make graphify update idempotent by skipping graph.json / GRAPH_REPORT.md rewrites when graph/report content is unchanged (ignoring commit-hash-only report drift)
  • stabilize clustering output with deterministic partition input ordering, seeded Leiden when supported, and overlap-based remapping of new communities to prior IDs
  • add regression tests for remapping stability, idempotent rebuild under cluster-ID flapping, and update --no-cluster behavior
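
The overlap-based remapping in the third bullet can be illustrated with a small sketch. This is not the PR's code: `remap_community_ids` and the dict-of-sets layout are hypothetical, and integer community IDs are assumed. Each new community greedily takes the prior ID it shares the most members with, and unmatched communities get fresh IDs above the previous maximum.

```python
from itertools import count


def remap_community_ids(prev, new):
    """Relabel `new` communities with the prior ID they overlap most.

    Hypothetical helper: `prev` and `new` map integer community IDs
    to sets of node names.
    """
    # Score every (new, prev) pair that shares at least one member.
    pairs = []
    for new_id, new_members in new.items():
        for prev_id, prev_members in prev.items():
            overlap = len(new_members & prev_members)
            if overlap:
                pairs.append((-overlap, prev_id, new_id))
    pairs.sort()  # largest overlap first; ties broken by ID, so order is stable

    mapping = {}
    used_prev = set()
    for _neg_overlap, prev_id, new_id in pairs:
        if new_id in mapping or prev_id in used_prev:
            continue  # each ID is matched at most once
        mapping[new_id] = prev_id
        used_prev.add(prev_id)

    # Communities with no prior counterpart get IDs above every prior ID.
    fresh = count(max(prev, default=-1) + 1)
    for new_id in sorted(new):
        if new_id not in mapping:
            mapping[new_id] = next(fresh)

    return {mapping[new_id]: members for new_id, members in new.items()}
```

With this scheme, a clustering run that merely swaps the labels of two otherwise unchanged communities maps back to the previous IDs, so the serialized graph does not change.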

Closes #741

Test plan

  • uv run --with pytest pytest tests/test_cluster.py tests/test_watch.py tests/test_cli_export.py
  • uv run python -m graphify update . && uv run python -m graphify update .
  • uv run --with pytest pytest (pre-existing baseline failures unrelated to this change remain)
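
The second step of the test plan runs `update` twice to check that the second run rewrites nothing. A minimal sketch of that kind of guarded write follows. It is an illustration under assumptions, not the PR's implementation: `write_if_changed` and `strip_commit_hash` are hypothetical names, and the `Commit: <hash>` line format is assumed rather than taken from the real GRAPH_REPORT.md.

```python
import re
from pathlib import Path

# Assumption: the report records the commit on a line such as "Commit: abc1234".
_COMMIT_LINE = re.compile(r"^Commit: [0-9a-f]+$", re.MULTILINE)


def strip_commit_hash(text):
    """Mask the commit hash so hash-only drift compares as equal."""
    return _COMMIT_LINE.sub("Commit: <hash>", text)


def write_if_changed(path, new_text, normalize=lambda text: text):
    """Write `new_text` only if it differs from the file after normalizing.

    Returns True when the file was written, False when the write was skipped.
    """
    path = Path(path)
    if path.exists():
        old_text = path.read_text(encoding="utf-8")
        if normalize(old_text) == normalize(new_text):
            return False  # nothing meaningful changed; leave the file alone
    path.write_text(new_text, encoding="utf-8")
    return True
```

Passing `normalize=strip_commit_hash` for the report and the default identity function for the graph file would skip rewrites whose only difference is the recorded commit hash.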

Ahmad Fathallah and others added 2 commits May 12, 2026 02:23
Make `graphify update` idempotent by skipping output rewrites when graph/report content is unchanged, add `update --no-cluster`, and preserve community IDs across runs via overlap-based remapping with deterministic partition inputs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use safe JSON serialization fallbacks for deterministic sort keys in clustering and graph canonicalization, and skip invalid community IDs with a stderr warning instead of raising during update rebuilds.

Co-authored-by: Cursor <cursoragent@cursor.com>
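
The second commit message describes two defensive measures: sort keys that fall back safely when a value cannot be serialized as JSON, and invalid community IDs that are skipped with a warning on stderr. A rough sketch of both, with hypothetical names (`stable_sort_key`, `coerce_community_ids`) that are not taken from the PR:

```python
import json
import sys


def stable_sort_key(value):
    """Deterministic sort key: canonical JSON, falling back to repr()."""
    try:
        return json.dumps(value, sort_keys=True, default=str)
    except (TypeError, ValueError):
        # For example, dicts with tuple keys or circular references.
        return repr(value)


def coerce_community_ids(raw):
    """Keep node -> int community ID pairs; warn on and skip the rest."""
    clean = {}
    for node, cid in raw.items():
        try:
            clean[node] = int(cid)
        except (TypeError, ValueError):
            print(
                f"warning: skipping invalid community id {cid!r} for {node!r}",
                file=sys.stderr,
            )
    return clean
```

Sorting partition inputs with a key like this keeps their order independent of dict insertion order, and a malformed ID degrades to a warning rather than aborting the rebuild.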
FatahChan closed this May 12, 2026
bgmbgm94 added a commit to bgmbgm94/graphify that referenced this pull request May 26, 2026
cluster-only re-runs Leiden clustering and then re-applies the existing
.graphify_labels.json by raw cid index, which causes labels to attach to
clusters whose members are unrelated to the label's original meaning
whenever the graph has changed between labeling and re-clustering.

Mirror the safety net already present in watch.py:_rebuild_code added in
Graphify-Labs#822 for the watch/update paths.

Adds a regression test that fails without the fix (label cids become
orphaned from graph.json community attributes after re-clustering).

Refs: Graphify-Labs#1027
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
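
The failure mode in this commit message, labels re-applied by raw cid after re-clustering, can be made concrete. The sketch below is one possible guard and is not claimed to be what the referenced fix or `watch.py:_rebuild_code` does: `carry_labels` is a hypothetical helper that moves each label to the new community sharing the most members with the community it originally described, and drops labels with no overlap.

```python
def carry_labels(labels, old, new):
    """Re-attach labels by member overlap instead of raw community index.

    Hypothetical helper: `labels` maps old community ID -> label text;
    `old` and `new` map community IDs to sets of node names.
    """
    carried = {}
    for old_id, label in sorted(labels.items()):
        old_members = old.get(old_id, set())
        best_id, best_overlap = None, 0
        for new_id in sorted(new):
            if new_id in carried:
                continue  # already claimed by an earlier label
            overlap = len(old_members & new[new_id])
            if overlap > best_overlap:
                best_id, best_overlap = new_id, overlap
        if best_id is not None:
            carried[best_id] = label  # labels with no overlap are dropped
    return carried
```

If re-clustering swaps which members sit under cid 0 and cid 1, applying labels by raw index would attach each label to the wrong cluster, while the overlap-based version follows the members.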
safishamsi pushed a commit that referenced this pull request May 26, 2026
…1028)

cluster-only re-runs Leiden clustering and then re-applies the existing
.graphify_labels.json by raw cid index, which causes labels to attach to
clusters whose members are unrelated to the label's original meaning
whenever the graph has changed between labeling and re-clustering.

Mirror the safety net already present in watch.py:_rebuild_code added in
#822 for the watch/update paths.

Adds a regression test that fails without the fix (label cids become
orphaned from graph.json community attributes after re-clustering).

Refs: #1027

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Question — graphify update non-determinism produces ~11k-line diff per run on unchanged source: intended or addressable?