Importer perf: pre-fetch parent FK pk maps in create/update by ajslater · Pull Request #629 · ajslater/codex

ajslater · 2026-04-28T03:44:23Z

Summary

Implements tasks/importer-perf/02-create-fk-batching.md from PR #627. Second of the surgical N+1 wins, after PR #628.

_get_create_update_args previously called field_model.objects.get() once per row to dereference each parent FK reference. For a fresh 600k-comic import, that was ~2M small SELECTs across the imprint/series/volume/credit/identifier/storyarcnumber create and update phases, with the cost dominated by round-trip overhead.

The fix:

New _build_parent_fk_pk_maps(model) walks MODEL_CREATE_ARGS_MAP[model] to find every parent FK model referenced, then pre-fetches a {key_tuple: pk} map per parent — one values_list("pk", *rels) query each.
_get_create_update_args takes the prebuilt map and resolves parent references via dict lookup.
The resolved pk is assigned via <field_name>_id so Django skips the FK-descriptor round-trip on construction. bulk_create / bulk_update write the same column either way; the update_fields / unique_fields tuples don't need to change since Django's ORM accepts both forms.
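For readers who want the shape of the change without the importer code, here is a minimal sketch of the prefetch-then-lookup pattern. It uses the standard-library sqlite3 module rather than the Django ORM so it runs on its own; the tables, the build_pk_map helper, and the sample rows are all hypothetical and are not taken from Codex. In the PR itself the single query per parent is a values_list("pk", *rels) call.

```python
import sqlite3


def build_pk_map(conn, table, key_columns):
    """Return {key_tuple: pk} for every row of `table` using one SELECT.

    Hypothetical helper for illustration; table and column names are
    interpolated directly, so only pass trusted identifiers.
    """
    cols = ", ".join(key_columns)
    rows = conn.execute(f"SELECT id, {cols} FROM {table}")
    return {tuple(row[1:]): row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE imprint (
        id INTEGER PRIMARY KEY,
        name TEXT,
        publisher_id INTEGER NOT NULL REFERENCES publisher(id)
    );
    INSERT INTO publisher (name) VALUES ('Acme'), ('Globex');
    """
)

# One query for the whole parent table, instead of one lookup per child row.
publisher_pks = build_pk_map(conn, "publisher", ["name"])

# Incoming child rows reference their parent by natural key, not by pk.
incoming = [
    ("Acme", "Acme Kids"),
    ("Globex", "Globex Noir"),
    ("Acme", "Acme Max"),
]

# Resolve each parent with a dict lookup and write the raw FK column,
# which is the role <field_name>_id plays on a Django model instance.
imprint_rows = [(name, publisher_pks[(pub,)]) for pub, name in incoming]
conn.executemany(
    "INSERT INTO imprint (name, publisher_id) VALUES (?, ?)", imprint_rows
)
```

The keys are tuples even for a single column so that multi-column parents (the identifier-style keys mentioned under Correctness notes) can use the same map shape.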

Net query count: ~2M → roughly K*2 where K is the number of distinct parent FK models (~12-20 SELECTs for a typical import). Python work unchanged.
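The query-count claim can be checked in miniature. The sketch below is an illustration under stated assumptions, not a measurement of the importer: it uses sqlite3's set_trace_callback to count SELECT statements for a per-row lookup loop versus a single prefetch. The series table, the row counts, and the counter are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO series (name) VALUES ('A'), ('B'), ('C');
    """
)

counter = {"selects": 0}


def count_selects(statement):
    # Called by sqlite3 for each statement it runs; count only SELECTs.
    if statement.lstrip().upper().startswith("SELECT"):
        counter["selects"] += 1


conn.set_trace_callback(count_selects)

# 3000 child rows that reference only 3 distinct parents.
child_refs = ["A", "B", "C"] * 1000

# Old pattern: one SELECT per child row.
counter["selects"] = 0
per_row = [
    conn.execute("SELECT id FROM series WHERE name = ?", (name,)).fetchone()[0]
    for name in child_refs
]
per_row_selects = counter["selects"]

# New pattern: one SELECT for the parent table, then dict lookups.
counter["selects"] = 0
pk_map = {name: pk for pk, name in conn.execute("SELECT id, name FROM series")}
prefetched = [pk_map[name] for name in child_refs]
prefetched_selects = counter["selects"]

print(per_row_selects, prefetched_selects)
```

Both approaches resolve the same pks; only the number of round-trips differs. The per-row count grows with the number of child rows, while the prefetch count depends only on the number of parent models.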

Correctness notes

Group-model parent slicing (values_tuple[:index+1] for Imprint/Series/Volume parents in GROUP_MODEL_COUNT_FIELDS) preserved exactly.
Identifier-style multi-column key tuples preserved exactly.
Nullable parent FKs (identifier): pk_map miss returns None, written as <field>_id=None — same as before.
Required parent FK miss: previously DoesNotExist raised in the loop, now IntegrityError raised at bulk_create time. Both abort the import; only the exception class shifts. In practice, by the time create runs, the parent rows exist (created in a previous step or already in the DB), so this path is theoretical.
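The two miss cases above can be illustrated at the database level. This is a sketch with a hypothetical schema (the volume, identifier, and source tables here are stand-ins, not Codex's models), using sqlite3 directly: a pk-map miss yields None, which is stored as NULL for a nullable FK column and rejected at write time for a NOT NULL one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE series (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE volume (
        id INTEGER PRIMARY KEY,
        name TEXT,
        series_id INTEGER NOT NULL REFERENCES series(id)  -- required parent
    );
    CREATE TABLE identifier (
        id INTEGER PRIMARY KEY,
        code TEXT,
        source_id INTEGER REFERENCES source(id)  -- nullable parent
    );
    INSERT INTO series (name) VALUES ('Known Series');
    """
)

# Prefetched {key_tuple: pk} maps; the source map is empty on purpose.
series_pks = {
    (name,): pk for pk, name in conn.execute("SELECT id, name FROM series")
}
source_pks = {}

# Nullable parent: a map miss returns None and is written as NULL.
conn.execute(
    "INSERT INTO identifier (code, source_id) VALUES (?, ?)",
    ("abc123", source_pks.get(("missing source",))),
)

# Required parent: the miss no longer fails at lookup time. It surfaces
# as an IntegrityError when the row is written.
required_miss_error = None
try:
    conn.execute(
        "INSERT INTO volume (name, series_id) VALUES (?, ?)",
        ("2020", series_pks.get(("Unknown Series",))),
    )
except sqlite3.IntegrityError as exc:
    required_miss_error = exc
```

In the Django importer the corresponding exception would be django.db.IntegrityError raised from bulk_create, as the note above describes.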

Test plan

make fix clean
make lint-python clean (0 errors, 0 warnings)
pytest tests/importer/ tests/test_search_fts.py — 7 passed
Field check on a real fixture: import a 1k-comic library twice (cold + warm) and a 1k-comic library with update_fields changes, assert resulting Publisher/Imprint/Series/Volume/Identifier/Credit/StoryArcNumber rows match the pre-PR baseline
Wall-clock measurement on a real-scale fixture once one is available

🤖 Generated with Claude Code

Previously, _get_create_update_args called field_model.objects.get() once per row to dereference each parent FK reference. For a fresh 600k-comic import that was ~2M small SELECTs across the imprint/series/volume/credit/identifier/storyarcnumber create and update phases, dominated by round-trip overhead.

Add _build_parent_fk_pk_maps(model), which walks MODEL_CREATE_ARGS_MAP[model] to find every parent FK model and pre-fetches a {key_tuple: pk} map per parent, with one values_list("pk", *rels) query each. _get_create_update_args now takes the prebuilt map and resolves parent references via dict lookup, then assigns the pk via <field_name>_id to skip the FK descriptor materialization on the new model instance. bulk_create / bulk_update write the same column either way; the update_fields/unique_fields tuples don't need to change since Django's ORM accepts both forms.

Net: ~2M SELECTs collapse to ~1 SELECT per parent model per phase (roughly K * 2 where K is the number of distinct parent FK models, or ~12-20 SELECTs for a typical import). Python work is unchanged; the win is eliminating per-row DB round-trips inside the create and update loops.

Implements tasks/importer-perf/02-create-fk-batching.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ajslater merged commit 448821c into v1.11-performance Apr 28, 2026
1 check failed

This was referenced Apr 28, 2026

Importer perf: pre-fetch Folder pk map + narrow prune prefetch #630

Merged

Importer perf: phase-level transaction.atomic + scoped SQLite PRAGMAs #631

Merged

Importer perf: chunk per-comic phases to bound peak memory #634

Merged

ajslater deleted the importer-create-fk-batching branch May 2, 2026 22:39

ajslater merged 1 commit into v1.11-performance from importer-create-fk-batching
