Importer perf: derive comicbox delete_keys from USED set #633
Merged
ajslater merged 1 commit into v1.11-performance on Apr 28, 2026
The codex importer already passes a hand-curated delete_keys set to comicbox so the worker skips parse work for top-level fields the importer doesn't consume. The hand-curation had two issues:

1. cover_image was never added to delete_keys even though codex never reads it, so comicbox parsed it on every comic only for it to be dropped downstream.
2. If a future comicbox release adds a new top-level schema field, that field would silently be parsed and then discarded until someone noticed and updated delete_keys.

Move USED_COMICBOX_FIELDS to settings/__init__.py as the single source of truth for the codex<->comicbox contract, and derive _COMICBOX_DELETE_KEYS programmatically as ComicboxYamlSubSchema._declared_fields - USED_COMICBOX_FIELDS. New comicbox fields now default to deleted (safe: the codex maintainer adds them to USED if needed), and cover_image is now correctly included in the set.

aggregate_path.py imports the constant from settings instead of maintaining its own copy. The runtime _transform_metadata filter still uses the same set, so the codex-side post-extract filter remains the second-line backstop.

Net delete_keys: 12 -> 13 entries (adds cover_image). No other behavior change; this is the field-projection cleanup half of tasks/importer-perf/06-comicbox-side.md (Improvement A in spirit; the existing delete_keys mechanism already implements the runtime side of the plan).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
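A minimal sketch of the derivation described above, assuming `ComicboxYamlSubSchema` behaves like a marshmallow schema class whose `_declared_fields` maps top-level field names to field objects. `FakeComicboxSchema`, `transform_metadata`, and all field names here are hypothetical stand-ins for illustration, not the real codex or comicbox code:

```python
from collections.abc import Mapping
from typing import Any


class FakeComicboxSchema:
    # Hypothetical stand-in for ComicboxYamlSubSchema. A marshmallow-style
    # schema class exposes its declared fields as a name -> field mapping.
    _declared_fields: Mapping[str, Any] = {
        "title": object(),
        "series": object(),
        "issue": object(),
        "pages": object(),
        "cover_image": object(),
        "notes": object(),
    }


# Single source of truth for the top-level fields the importer consumes
# (illustrative names only).
USED_COMICBOX_FIELDS = frozenset({"title", "series", "issue"})

# Everything declared but not used is deleted, so a field added by a future
# schema release defaults to "skip" instead of "parse, then drop".
COMICBOX_DELETE_KEYS = (
    frozenset(FakeComicboxSchema._declared_fields) - USED_COMICBOX_FIELDS
)


def transform_metadata(md: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical second-line backstop: keep only used top-level keys."""
    return {key: value for key, value in md.items() if key in USED_COMICBOX_FIELDS}
```

Because the delete set is computed as a difference, any field a future schema declares lands in `COMICBOX_DELETE_KEYS` automatically; only the used-fields set has to be maintained by hand.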
Summary
Implements the field-projection half of `tasks/importer-perf/06-comicbox-side.md` (planning PR #627). Independent of #632 (filesystem mtime pre-filter, the other half of sub-plan 06).

After investigating the comicbox internals, the plan's "Improvement A: `to_dict(keys=...)`" turned out to be ~95% already implemented via comicbox's existing `delete_keys` config: codex passes `delete_keys` to skip parse work for unwanted top-level fields, and the most expensive computed action (`pages`) is already short-circuited when `pages` is in `delete_keys`. The substantive runtime win for sub-plan 06 was Improvement D (filesystem mtime pre-filter, PR #632).

What was actually missing:
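- `cover_image` was never added to `delete_keys` even though codex never reads it; comicbox parsed it on every comic and dropped it downstream.
- Future comicbox releases that add a new top-level schema field would silently be parsed and then discarded until someone noticed and updated `delete_keys`.

Change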
- Move `USED_COMICBOX_FIELDS` to `settings/__init__.py` as the single source of truth for the codex↔comicbox contract.
- Derive `_COMICBOX_DELETE_KEYS` programmatically from `ComicboxYamlSubSchema._declared_fields - USED_COMICBOX_FIELDS`. `cover_image` is now correctly included.
- `aggregate_path.py` imports the constant from settings instead of maintaining its own copy. The runtime `_transform_metadata` filter still uses the same set, so the codex-side post-extract drop remains the second-line backstop.

Net `delete_keys`: 12 → 13 entries (`cover_image` newly included). No other behavior change; this just keeps the existing mechanism honest going forward.

Test plan
- `make fix` clean
- `make lint-python` clean (0 errors, 0 warnings)
- `pytest tests/importer/ tests/test_search_fts.py`: 7 passed
- … `cover_image`

🤖 Generated with Claude Code
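The test plan checks the `cover_image` change by hand. A guard along the following lines could catch the same kind of drift automatically. It is a hypothetical sketch: `check_field_contract` and the sample field names are illustrative and are not part of codex or comicbox. It flags fields that are parsed but neither used nor deleted, used fields the schema no longer declares, and fields that are both used and deleted:

```python
from collections.abc import Iterable


def check_field_contract(
    declared: Iterable[str], used: Iterable[str], delete: Iterable[str]
) -> list[str]:
    """Return human-readable problems; an empty list means the contract holds."""
    declared_set, used_set, delete_set = set(declared), set(used), set(delete)
    problems = []
    # A used field the schema no longer declares would silently go missing.
    for name in sorted(used_set - declared_set):
        problems.append(f"used but not declared: {name}")
    # Declared fields that are neither used nor deleted get parsed, then dropped.
    for name in sorted(declared_set - used_set - delete_set):
        problems.append(f"parsed but unused: {name}")
    # Deleting a field the importer reads would lose data.
    for name in sorted(used_set & delete_set):
        problems.append(f"used and deleted: {name}")
    return problems


# A hand-curated delete set that forgot cover_image fails the check...
declared = {"title", "series", "pages", "cover_image"}
used = {"title", "series"}
hand_curated = {"pages"}
print(check_field_contract(declared, used, hand_curated))
# ['parsed but unused: cover_image']

# ...while the derived set (declared - used) passes by construction.
print(check_field_contract(declared, used, declared - used))
# []
```

With the derived set, only the "used but not declared" check can still fail, which is the case a maintainer would want to hear about after a schema rename.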