
fix(build_merge): replace re-extracted files instead of accumulating stale edges #1344

Merged
safishamsi merged 1 commit into Graphify-Labs:v8 from RelywOo:fix/build-merge-replace-changed-file-edges
Jun 17, 2026

@RelywOo RelywOo commented Jun 16, 2026

Contributor

Problem

build_merge merges the loaded graph (base) with new_chunks and collapses only exact-duplicate edges via dedupe_edges. When a file changes, build() merges its old and new contributions for the same source_file, so any node or edge that disappeared from the file's new version survives indefinitely, and stale edges accumulate with every incremental extract update. Only fully deleted files are cleaned (via prune_sources); changed files are never pruned.

Repro: build a graph where changed.md yields A,B + edge A->B, then build_merge with a re-extraction of changed.md yielding A,C + edge A->C. The old A->B edge and orphan B survive alongside the fresh A->C.
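
The repro can be modeled with plain dictionaries. This is a minimal sketch, not graphify's code: `naive_merge`, `node`, `edge`, and the dict layout are hypothetical stand-ins that only mimic "union plus exact-duplicate edge collapse".

```python
# Simplified model of the pre-fix behavior. All names here are hypothetical
# and are not graphify's real API or storage format.

def naive_merge(base, new_chunks):
    """Union base with new chunks, collapsing only exact-duplicate edges."""
    nodes = {n["id"]: n for n in base["nodes"]}
    edges = list(base["edges"])
    for chunk in new_chunks:
        for n in chunk["nodes"]:
            nodes[n["id"]] = n
        edges.extend(chunk["edges"])
    seen, deduped = set(), []
    for e in edges:
        key = (e["source"], e["target"], e["source_file"])
        if key not in seen:
            seen.add(key)
            deduped.append(e)
    return {"nodes": list(nodes.values()), "edges": deduped}


def node(node_id, source_file):
    return {"id": node_id, "source_file": source_file}


def edge(source, target, source_file):
    return {"source": source, "target": target, "source_file": source_file}


# Old version of changed.md: A, B and A->B.
base = {
    "nodes": [node("A", "changed.md"), node("B", "changed.md")],
    "edges": [edge("A", "B", "changed.md")],
}
# Re-extraction of changed.md: A, C and A->C.
reextracted = [{
    "nodes": [node("A", "changed.md"), node("C", "changed.md")],
    "edges": [edge("A", "C", "changed.md")],
}]

merged = naive_merge(base, reextracted)
node_ids = sorted(n["id"] for n in merged["nodes"])
edge_pairs = sorted((e["source"], e["target"]) for e in merged["edges"])
print(node_ids)    # B is still present although changed.md no longer yields it
print(edge_pairs)  # A->B survives next to A->C
```

Because A->B and A->C are different edges, an exact-duplicate check has nothing to collapse, which is why the stale edge persists.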

Fix

Before merging, drop from the loaded base every node/edge whose source_file is present in new_chunks, so a re-extracted file replaces its prior contribution.

Docstring updated: build_merge now documents replace-on-re-extract semantics instead of "only grows".
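
The replace-on-re-extract idea can be sketched as follows. This assumes the same simplified dict model as a stand-in for the graph; `drop_reextracted` and `merge_replace` are hypothetical names, not functions in build.py.

```python
# Sketch of replace-on-re-extract under a simplified dict model.
# Function names and data layout are assumptions for illustration only.

def drop_reextracted(base, new_chunks):
    """Remove from base everything contributed by a re-extracted file."""
    touched = {
        item["source_file"]
        for chunk in new_chunks
        for item in chunk["nodes"] + chunk["edges"]
    }
    return {
        "nodes": [n for n in base["nodes"] if n["source_file"] not in touched],
        "edges": [e for e in base["edges"] if e["source_file"] not in touched],
    }


def merge_replace(base, new_chunks):
    """Drop prior contributions of re-extracted files, then merge."""
    kept = drop_reextracted(base, new_chunks)
    nodes = {n["id"]: n for n in kept["nodes"]}
    edges = list(kept["edges"])
    for chunk in new_chunks:
        for n in chunk["nodes"]:
            nodes[n["id"]] = n
        edges.extend(chunk["edges"])
    return {"nodes": list(nodes.values()), "edges": edges}


base = {
    "nodes": [
        {"id": "A", "source_file": "changed.md"},
        {"id": "B", "source_file": "changed.md"},
        {"id": "X", "source_file": "other.md"},
        {"id": "Y", "source_file": "other.md"},
    ],
    "edges": [
        {"source": "A", "target": "B", "source_file": "changed.md"},
        {"source": "X", "target": "Y", "source_file": "other.md"},
    ],
}
new_chunks = [{
    "nodes": [
        {"id": "A", "source_file": "changed.md"},
        {"id": "C", "source_file": "changed.md"},
    ],
    "edges": [{"source": "A", "target": "C", "source_file": "changed.md"}],
}]

merged = merge_replace(base, new_chunks)
node_ids = sorted(n["id"] for n in merged["nodes"])
edge_pairs = sorted((e["source"], e["target"]) for e in merged["edges"])
```

In this model a brand-new file simply has nothing to drop from base, and a file absent from `new_chunks` (here other.md) is left untouched.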

Test

Adds test_build_merge_replaces_changed_file_stale_edges:

  • stale node/edge from the old version dropped,
  • fresh node/edge present,
  • unchanged file untouched,
  • win32 absolute new-chunk path matches relative stored path.

Full tests/test_build.py green (33 passed).

…stale edges

build_merge merged the loaded graph (base) with new_chunks and only collapsed
exact-duplicate edges via dedupe_edges. When a file CHANGED, build() merged its
old and new contributions for the same source_file, so any node/edge that
disappeared from the file's new version survived forever — stale edges
accumulated across every incremental `extract` update. Only fully DELETED files
were cleaned (via prune_sources); changed files were never pruned.

Fix: before merging, drop from the loaded base every node/edge whose
source_file is present in new_chunks, so a re-extracted file REPLACES its prior
contribution. Brand-new files aren't in base (no-op); deleted files are still
handled by prune_sources. Matched in both raw and _norm_source_file form so an
absolute win32 path in new_chunks still matches the relative posix source_file
stored in the graph (mirrors the prune path, Graphify-Labs#1007).
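
To illustrate why matching in both raw and normalized form matters, here is a rough sketch. `norm_source_file` below is a hypothetical stand-in written for this example; the real `_norm_source_file` may behave differently.

```python
# Hypothetical stand-in for path normalization; not graphify's implementation.

def norm_source_file(path, root=None):
    """Use forward slashes and strip a leading root prefix when present."""
    p = path.replace("\\", "/")
    if root:
        r = root.replace("\\", "/").rstrip("/")
        # Compare case-insensitively, as Windows paths usually are.
        if p.lower().startswith(r.lower() + "/"):
            p = p[len(r) + 1:]
    return p


def match_keys(path, root=None):
    """Both forms a stored source_file is compared against."""
    return {path, norm_source_file(path, root)}


stored = "docs/changed.md"                # relative posix path kept in the graph
incoming = r"C:\repo\docs\changed.md"     # absolute win32 path in new_chunks

raw_only_match = stored == incoming
both_forms_match = stored in match_keys(incoming, root=r"C:\repo")
```

With only the raw comparison the re-extracted file would look like a different source and its old contribution would not be dropped.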

Adds a regression test covering: stale node/edge dropped, fresh node/edge
present, unchanged file untouched, and win32 absolute new-chunk path matching.
@safishamsi safishamsi merged commit fd463de into Graphify-Labs:v8 Jun 17, 2026
safishamsi pushed a commit that referenced this pull request Jun 18, 2026
…eleted-only

Completes the source_file convention fix begun in #1344 (build_merge
replace-on-re-extract) and #1361 (pass root= to build_merge in the --update
runbook). Two gaps still let the full build and incremental --update emit
different source_file bases for the same file, so the source_file-keyed replace
missed and duplicates accumulated:

1. extraction-spec(.md/-compact.md): the subagent's source_file slot was an
   unpinned "relative/path", so it invented a base per run (and the node id,
   derived from the same path, drifted too). Pin it to the verbatim FILE_LIST
   path so _norm_source_file(root) canonicalizes every run identically.

2. core.md: the full build called build_from_json WITHOUT root=, so #1361's
   update-side root= had no matching base on the full-build side. Pass
   root='INPUT_PATH' at both sites (Step 4 export, Step 5 report) so the full
   build and --update relativize to the same base.

update.md prune_sources = deleted only. Changed files are replaced by build_merge
(#1344); once root= aligns the bases, leaving `changed` in prune_sources would
delete the freshly re-extracted nodes.

Engine (build.py) unchanged. Regenerated all skill artifacts via
tools/skillgen/gen.py. Adds test_build_merge_root_collapses_convention_drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
safishamsi added a commit that referenced this pull request Jun 18, 2026
…tops duplicating

After #1361 added root= to the --update build_merge call, build_merge's prune
(which runs after the merge) began matching freshly re-extracted changed-file
nodes and deleting them — a regression in 0.8.41 where --update on a changed
file wiped its nodes. This drops `changed` from prune_sources (replace-on-
re-extract from #1344 already reconciles changed files), passes root= to the
full build's build_from_json so its node-key base matches the update side, and
pins the extraction-spec source_file to the verbatim path so the two runs never
drift. Re-blessed skillgen expected/ snapshots.
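
The interaction described above can be sketched in a few lines. This is an illustrative model only: `prune` and the dict layout are hypothetical, and the sketch assumes, as the message states, that pruning runs after the merge.

```python
# Illustrative model of pruning after the merge; names are hypothetical.

def prune(graph, prune_sources):
    """Drop every node and edge whose source_file is listed for pruning."""
    drop = set(prune_sources)
    return {
        "nodes": [n for n in graph["nodes"] if n["source_file"] not in drop],
        "edges": [e for e in graph["edges"] if e["source_file"] not in drop],
    }


# State after a replace-on-re-extract merge: changed.md already holds only
# its fresh nodes, deleted.md still has a leftover node, kept.md is unchanged.
merged = {
    "nodes": [
        {"id": "A", "source_file": "changed.md"},
        {"id": "C", "source_file": "changed.md"},
        {"id": "D", "source_file": "deleted.md"},
        {"id": "K", "source_file": "kept.md"},
    ],
    "edges": [{"source": "A", "target": "C", "source_file": "changed.md"}],
}

# Listing the changed file wipes the nodes that were just re-extracted.
with_changed = prune(merged, ["changed.md", "deleted.md"])
# Listing only deleted files leaves the fresh nodes in place.
deleted_only = prune(merged, ["deleted.md"])

ids_with_changed = sorted(n["id"] for n in with_changed["nodes"])
ids_deleted_only = sorted(n["id"] for n in deleted_only["nodes"])
```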

Co-Authored-By: RelywOo <RelywOo@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
galshubeli added a commit to galshubeli/graphify that referenced this pull request Jun 18, 2026
Resolve conflicts after upstream advanced (Java records, Swift relationships,
build_merge replace-on-re-extract Graphify-Labs#1344, dedup fixes, HTML XSS hardening, etc.):

- build.py build_merge: adopt v8's replace-on-re-extract (Graphify-Labs#1344/Graphify-Labs#1007) — drop a
  changed file's prior source_file contribution before merging so stale
  nodes/edges don't accumulate — but keep loading the base from the FalkorDB
  store (not graph.json), and keep graph_name/uri binding.
- skillgen update fragment: keep the FalkorDB _store/open_store binding for the
  --update build_merge call; fold in v8's root= prune-relativization note (Graphify-Labs#1361).
  Regenerated skill artifacts + blessed goldens.
- test_build.py: keep the store-based tests; v8's new graph.json/build_merge(path)
  tests don't apply to the store-based build_merge.

Verified: package imports, zero runtime networkx, 118 merge-relevant tests pass
(1 env-only tree-sitter fail), skillgen --check clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
antonioscarinci added a commit to antonioscarinci/graphify that referenced this pull request Jun 27, 2026
Combines v8's features (uv/pipx detection, graphify-out/ convention,
FalkorDB, Step 4.5 health check, directed-graph support, shrink-guard,
build_merge --update) with windows-skill-fix's approach (PowerShell
throughout, external windows-scripts/*.py, fail-fast, incremental
--update fix, Windows encoding fixes).

Key changes:
- windows-scripts/: all 34 scripts ported to graphify-out/ paths and
  v8 API (root=, cache_root=, directed=, build_from_json, etc.)
- detect_incremental.py: propagates all_files so manifest covers full
  corpus on --update, not just changed files
- check_graph_health.py: new script wrapping v8's diagnostics step
- skill-windows.md: rewritten for PowerShell with $GRAPHIFY_SCRIPTS
  loaded from file each step; --update uses inline build_merge() so
  edge direction is preserved (Graphify-Labs#801, Graphify-Labs#1344, Graphify-Labs#1361)
- export.py: replace None attribute values before nx.write_graphml to
  avoid AttributeError on None-valued node/edge properties
- __main__.py: add --answer-file to graphify save-result to bypass
  Windows command-line length limit for long answers
- .gitattributes: add line-ending normalization rules for .py/.sh (LF)
  and .bat/.cmd (CRLF)
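
As a rough illustration of the export.py item above, None values can be replaced in attribute dictionaries before they are handed to a GraphML writer. `scrub_none` and the empty-string placeholder are assumptions made for this sketch, not the code from that commit.

```python
# Hypothetical helper: swap None attribute values for a placeholder so a
# serializer that cannot type None does not fail on them.

def scrub_none(attrs, placeholder=""):
    """Return a copy of attrs with None values replaced by placeholder."""
    return {k: (placeholder if v is None else v) for k, v in attrs.items()}


node_attrs = {"label": "A", "community": None, "weight": 0}
clean = scrub_none(node_attrs)
```

Only values that are exactly None are replaced, so falsy values such as 0 are preserved.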

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>