fix(loader): preserve blueprint_name and tags on file rename#219
Conversation
When a file is renamed in Dropbox (same MD5, new path/basename), the incremental fixtures loader was leaving blueprint_name + search_text stale and re-deprecating the row in the same load — which then stripped the freshly re-inserted tags and images via _handle_deprecation. The rename branch in _handle_addition now also resyncs blueprint_name and search_text to the new file basename and clears the deprecated flag. The blueprint id is recorded so the subsequent deprecation pass in the same load skips it instead of tombstoning the live row. update_blueprint gains search_text in its allowed-fields list so the loader can push the recomputed value through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/gemini review |
devonjones
left a comment
There was a problem hiding this comment.
Code Review (Claude)
I've reviewed the changes in this PR. Here's my analysis:
Critical Issues
None. The fix is correct and addresses the documented corruption pattern.
Medium Issues
1. Latent bug: _handle_modification does not update search_text when tags change (openforge/db/fixtures/incremental.py:1029-1069)
This PR fixes the rename path, but the same class of bug exists in _handle_modification. The modify path is reached when MD5 and full_name are unchanged but tags, images, modified-time, size, or config differ (_has_significant_changes, lines 620-718). Tags feed into search_text via words in _blueprint_search_text. When tags change in modify, _munge_blueprint produces a bp_data dict with no search_text key, so update_blueprint's allow-list silently drops it (since the new search_text field is allow-listed but not present in the data) and the row's search_text stays stale.
This isn't introduced by this PR — it's pre-existing — but adding search_text to update_blueprint's allow-list now makes it trivially fixable here. Consider mirroring the rename branch's approach in _handle_modification:
# In _handle_modification, after _munge_blueprint:
words = self._get_words(modified_item)
bp_data["search_text"] = blueprint_sql._blueprint_search_text(bp_data, words)If you'd rather scope this PR tightly to the rename bug, please file a follow-up issue — search consistency on tag edits is the same shape of corruption for users (search misses post-edit), just less visible than the rename case because the row isn't deprecated.
2. No test coverage for the rename path (tests/test_incremental_fixtures.py, tests/test_incremental.py)
I checked both incremental test files — neither has a test that exercises the same-MD5-rename branch in _handle_addition (no matches for rename or _renamed_blueprint_ids). Given the bug here corrupted ~38 production rows and was discovered by accident, this branch is worth pinning down. A unit test that:
- Inserts a blueprint with full_name
tiles/foo/atzlan-1.stland some tags/images. - Runs
apply_incremental_changeswith a fixture containing the same MD5 but full_nametiles/foo/aztlan-1.stl. - Asserts: one live row (not two),
blueprint_name = "aztlan-1.stl",deprecated = false, tags + images present,search_textcontainsaztlan.
would lock in both the rename-resync fix and the deprecation-skip behavior. The PR description also lists this as an unchecked test-plan item — the fix is small enough that I'd push for the test to land alongside it.
3. Reaching across module boundary into blueprint_sql._blueprint_search_text (openforge/db/fixtures/incremental.py:924)
Calling a leading-underscore private function from another module is a convention break (the project's own CLAUDE.md documents _function_name as private). Two cleaner options:
- Promote
_blueprint_search_texttoblueprint_search_text(public). It's a pure function with no side effects, and it's now used by two modules. - Better: wrap the rename update inside a new helper in
blueprint_sql(e.g.,rename_blueprint(curs, blueprint_id, new_full_name, words)) that handles the search_text recomputation internally. This keepsincremental.pyfrom having to know about the search_text computation shape and prevents the call sites from drifting (cf. issue #1 above — if both modify and rename used the same helper, the latent bug would be fixed in one place).
I'd lean toward option 2 since it co-locates "what does it mean to update a blueprint's name" in one module.
Minor Issues
1. _renamed_blueprint_ids reset location is asymmetric with current_fixture_files (openforge/db/fixtures/incremental.py:398, 760)
current_fixture_files is reset in compare_fixture_data (the comparison phase). _renamed_blueprint_ids is reset in _apply_changes_with_cursor (the apply phase). Both are correct in the current call patterns, but the asymmetry means a future caller who runs compare → compare → apply would get the right behavior for one (current_fixture_files reflects the second compare) and the wrong behavior for the other (renamed_ids would be reset before the second compare's renamed rows are processed... actually the set is populated in _handle_addition during apply, not compare, so this is fine). Worth a one-line comment near the __init__ set to document that this state is per-apply and gets reset at the top of _apply_changes_with_cursor.
2. Comment style nit (openforge/db/fixtures/incremental.py:110-113, 808-810)
Inline em-dashes (—) are used in two new comments. Consistent with some existing comments in the file but not all. Not worth changing, just flagging in case the project has a style preference.
Positive Observations
- Root-cause fix, not symptom suppression. The PR description traces the corruption through three interacting code paths (rescue-branch leaving stale fields,
_find_missing_blueprintsflagging the old name,_handle_deprecationstripping tags/images). The fix addresses the actual cause in each path rather than papering over it. - Per-load reset of
_renamed_blueprint_ids. Resetting at the top of_apply_changes_with_cursor(line 760) correctly prevents leakage across fixture files when a singleIncrementalFixturesLoaderinstance processes a directory inload_fixtures(openforge/db/fixtures/__init__.py:127-145). Good defensive instinct. - Search-text input shape consistency.
_blueprint_search_textis called with{"blueprint_name": new_file_name}— matching howinsert_blueprintcalls it after_munge_blueprintsetsbp["blueprint_name"] = data["file_metadata"]["file"](the basename). Consistent indexing semantics between insert and rename — an easy thing to get wrong. - Allow-list extension is minimal. Just adding
search_texttoupdate_blueprint'sfieldslist is the right surface area; no behavior change for existing callers since they don't pass it. - Dry-run safety.
apply_incremental_changesreturns early ondry_run=True(line 734-737) before_apply_changes_with_cursor, so the rename code path is never exercised in dry-run —_renamed_blueprint_idsstays empty, and no DB writes happen. Skipping deprecation is moot since deprecation also doesn't run. Correct by construction. - Comment explaining the deprecation skip. The inline comment at lines 808-817 explains why the skip exists, not just what it does. Future maintainers will need this — it's the kind of bug that's invisible without context.
Recommendations Summary
Must fix before merge:
None — the fix is correct as-is.
Should fix:
- Add a test for the same-MD5-rename branch (item Medium #2). The PR description even has it as an unchecked test-plan box; this is the regression test for the bug being fixed.
- Decide on
_blueprint_search_textaccess (item Medium #3) — either promote to public or wrap in ablueprint_sql.rename_blueprinthelper. The latter also opens the door to fixing item Medium #1 in one place.
Nice to have:
- File a follow-up (or fold into this PR) for the
_handle_modificationsearch_text staleness on tag changes (item Medium #1). - One-line comment in
__init__documenting that_renamed_blueprint_idsis per-apply state (item Minor #1).
Overall Assessment
Solid bug fix that correctly addresses a real production data-corruption issue. The diagnosis in the PR description is excellent and the fix matches it. My main asks are a regression test for the rename branch and a small refactor to avoid the cross-module private-function call — neither blocks correctness. The latent _handle_modification search_text staleness deserves a follow-up regardless.
Addresses review feedback on PR #219: - Promote _blueprint_search_text -> blueprint_search_text (public) to avoid cross-module access to a leading-underscore private. - _handle_modification now recomputes search_text alongside other column updates, so tag edits keep search_text consistent (latent bug, same shape as the rename one). - Add three regression tests for the same-MD5 rename branch: blueprint_name+search_text+deprecated resync, deprecation skip for renamed ids, and per-load reset of _renamed_blueprint_ids. - Document _renamed_blueprint_ids as per-apply state. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Response to Claude ReviewThanks for the thorough review. Pushed Medium #1 (latent: Medium #2 (no test coverage for the rename path): Added three regression tests in
Medium #3 (cross-module access to Minor #1 ( Minor #2 (em-dash style): Won't fix — matches existing comment style in the file. |
- _munge_blueprint reads file_metadata['file'], so add it to the fixture item used in the rename test. - _apply_changes_with_cursor runs _link_deprecated_to_successors at the end which calls cursor.fetchall(); stub it. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
/gemini review |
devonjones
left a comment
There was a problem hiding this comment.
Code Review (Claude, Round 2)
I've reviewed the changes addressing round-1 feedback. Here's my analysis.
Round 1 Resolution
-
Latent
_handle_modificationsearch_text bug — RESOLVED.incremental.py:1056-1060now recomputessearch_textfrombp_data(the_munge_blueprintoutput) before callingupdate_blueprint._munge_blueprintalways populatesblueprint_name(fromdata["name"]for config blueprints, or fromfile_metadata["file"]for file-based blueprints —utils.py:43, 52-53), soblueprint_search_textworks correctly for both types. The inline comment explaining why the recompute is necessary is a nice touch. -
Test coverage for the rename branch — RESOLVED. Three new tests in
tests/test_incremental_fixtures.py:524-635:test_rename_resyncs_blueprint_name_and_clears_deprecated— asserts the four corruption-prone fields (full_name,file_name,blueprint_name,deprecated) plussearch_textare all in the update payload, and thatsearch_textcontains the new tokens (aztlan,floor) and not the old (atzlan).test_renamed_id_skips_deprecation— assertsmark_blueprint_deprecated,delete_all_blueprint_tags, anddelete_images_for_blueprintare NOT called when the id is in the renamed set.test_apply_changes_resets_renamed_ids_per_load— asserts the set is empty after an apply pass, even when seeded with stale state.
These test observable behavior (DB calls, post-state) rather than internal mechanics. The first test directly pins the original bug shape (staleatzlannot appearing insearch_text).
-
_blueprint_search_text→blueprint_search_text— RESOLVED. Promoted inblueprints.py:97, used internally byinsert_blueprintat line 107, and externally byincremental.py:927, 1057. Confirmed via grep: no remaining references to the old_blueprint_search_textprivate name anywhere in the codebase, and no otherblueprint_sql._*private access from outside the module.
New Issues
None blocking. One minor note:
- Test brittleness is acceptable but worth flagging.
test_rename_resyncs_blueprint_name_and_clears_deprecatedpatches six SQL boundary functions. This is unavoidable without a real DB (the conftest'sclean_tablesautouse fixture forces Postgres for unit tests), and the assertions still target observable behavior (the contents of the update payload). If the rename branch is later refactored to call a single helper that does the update + tag/image refresh, several patch targets will need updating — but the assertion shape would survive. Acceptable trade-off.
Other Observations
-
.beads/.gitignoreaddition is correct. The two added entries (.jsonl.lock,.migration-hint-ts) match the actual untracked files visible in the workspace (.beads/.jsonl.lock,.beads/.migration-hint-ts) and are runtime/lock files that shouldn't be tracked. -
Audit of all
update_blueprintcallers inincremental.pyis consistent:- Line 937 (rename branch): sets
search_textexplicitly. Correct. - Line 991 (consolidated_paths only): doesn't change name/tags, no
search_textneeded. Correct. - Line 1060 (modification): now sets
search_text. Correct.
No other path touches name/tags without refreshing
search_text. - Line 937 (rename branch): sets
-
_renamed_blueprint_idslifecycle is well-documented. The class-level docstring (lines 110-116) and the reset comment at line 762 make the per-apply scoping unambiguous, which matters because the loader instance can be reused across fixtures in a directory walk. -
Config-blueprint
search_textrecompute is safe._munge_blueprintsetsblueprint_name = data.get("name")for config blueprints, and_compare_single_itemalready skips items without a name (line 494-500), soblueprint_search_textwill never be handed aNoneblueprint_namein practice.
Overall Assessment
Ready to merge. All three round-1 items are properly resolved (not just superficially), no new issues introduced, and the regression tests pin down the original bug shape concretely enough that a future regression would fail loudly.
Response to Claude Round 2Thanks for the audit. All three round-1 items confirmed resolved with no new issues introduced. Will let one more reviewer round run to catch any final feedback before asking about merge. |
|
/gemini review |
Summary
Fixes a long-standing bug in the incremental fixtures loader where renaming a file in Dropbox (same MD5, new path/basename) would leave the database row in a corrupt state: stale
blueprint_name, stalesearch_text, no tags, no images, anddeprecated = true.This was discovered while loading the new aztlan series. A typo'd filename (
atzlan→aztlan) was caught and renamed in Dropbox, then the next fixture load corrupted all 27 affected floor rows. Investigation found the same pattern across other historical renames (5 dungeon_stone → aztlan column rows, 1 aztlan corner floor renamed 4x4 → 2x2, 7 building_facades gable rows, 25 rough_stone_ruined rows — 38 rows total).What was happening
In
_handle_addition's rename-within-same-fixture branch (incremental.py), when an MD5 conflict was rescued and detected as a rename:full_nameandfile_namewere updated —blueprint_nameandsearch_textwere left pointing at the old file._handle_deprecation(because_find_missing_blueprintssaw the oldfull_namewas no longer in the fixture), which setdeprecated = trueand deleted the freshly re-inserted tags and images.Compounding this,
update_blueprintdidn't includesearch_textin its allow-list, so even fixing the rename branch wouldn't have been enough on its own.Changes
_handle_additionrename branch now also updatesblueprint_name, recomputessearch_textfrom the new basename + tag words, and clearsdeprecated._renamed_blueprint_idsset;_handle_deprecationskips ids in this set so renamed rows aren't tombstoned in the same load.update_blueprint's field list gainssearch_text.Test plan
pytest tests/test_incremental_fixtures.pypassesaztlan.jsonagainst a non-prod env and verify a renamed file ends up with:blueprint_name = file_name,deprecated = false, tags + images present,search_textcontaining the new tokensNote: prod data was already SQL-patched out of band (38 rows: synced
blueprint_nametofile_name, cleareddeprecated, fixedsearch_textper category). The aztlan rows that lost tags/images still need a fixture re-upload; this PR is the prerequisite so that re-upload doesn't re-corrupt them.🤖 Generated with Claude Code