Importer perf: pre-fetch parent FK pk maps in create/update #629
Merged
ajslater merged 1 commit into v1.11-performance on Apr 28, 2026
Previously, _get_create_update_args called field_model.objects.get()
once per row to dereference each parent FK reference. For a fresh
600k-comic import that was ~2M small SELECTs across the
imprint/series/volume/credit/identifier/storyarcnumber create and
update phases, dominated by round-trip overhead.
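The N+1 shape is easiest to see in isolation. The sketch below uses the standard-library `sqlite3` module rather than Django's ORM; the `series` table, the `CountingConn` wrapper, and `resolve_per_row` are hypothetical stand-ins, not code from this PR. It mimics one `objects.get()` per child row and counts the statements issued.

```python
import sqlite3


class CountingConn:
    """Hypothetical wrapper that counts every statement sent through it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.queries += 1
        return self.conn.execute(sql, params)


def resolve_per_row(db: CountingConn, child_keys: list[str]) -> list[int]:
    """Old pattern: one SELECT per child row, like objects.get() in a loop."""
    pks = []
    for name in child_keys:
        row = db.execute("SELECT id FROM series WHERE name = ?", (name,)).fetchone()
        pks.append(row[0])
    return pks


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
raw.executemany("INSERT INTO series (name) VALUES (?)", [("A",), ("B",), ("C",)])

db = CountingConn(raw)
pks = resolve_per_row(db, ["A", "B", "A", "C", "B"])
print(pks, db.queries)  # [1, 2, 1, 3, 2] 5
```

Five child rows cost five SELECTs even though only three parents exist: the round-trip count scales with rows, not with parents.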
Add _build_parent_fk_pk_maps(model) which walks
MODEL_CREATE_ARGS_MAP[model] to find every parent FK model and
pre-fetches a {key_tuple: pk} map per parent — one
values_list("pk", *rels) query each. _get_create_update_args now
takes the prebuilt map and resolves parent references via dict
lookup, then assigns the pk via <field_name>_id to skip the FK
descriptor materialization on the new model instance. bulk_create /
bulk_update write the same column either way; the
update_fields/unique_fields tuples don't need to change since
Django's ORM accepts both forms.
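A minimal sketch of the pre-fetch approach under the same assumptions: stdlib `sqlite3` in place of Django, and `build_pk_map`, `create_args`, and the two-column `series` key are hypothetical illustrations, not the PR's actual code. It issues one SELECT for the parent table, resolves each child's key tuple with a dict lookup, and emits the value under a `<field_name>_id` key. With several parent models, `build_pk_map` would simply be called once per parent.

```python
import sqlite3


class CountingConn:
    """Hypothetical wrapper that counts every statement sent through it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.queries += 1
        return self.conn.execute(sql, params)


def build_pk_map(db: CountingConn, table: str, key_cols: list[str]) -> dict:
    """One SELECT per parent table, returning {key_tuple: pk}.

    Rough analog of values_list("pk", *rels). Table and column names are
    interpolated into the SQL, so they must be trusted identifiers.
    """
    cols = ", ".join(key_cols)
    rows = db.execute(f"SELECT id, {cols} FROM {table}").fetchall()
    return {tuple(row[1:]): row[0] for row in rows}


def create_args(child_rows, pk_map: dict, field_name: str) -> list[dict]:
    """Resolve each parent key tuple by dict lookup and emit <field_name>_id."""
    # dict.get() turns a miss into None instead of raising.
    return [
        {f"{field_name}_id": pk_map.get(key), "number": number}
        for key, number in child_rows
    ]


raw = sqlite3.connect(":memory:")
raw.execute(
    "CREATE TABLE series (id INTEGER PRIMARY KEY, publisher TEXT, name TEXT, "
    "UNIQUE (publisher, name))"
)
raw.executemany(
    "INSERT INTO series (publisher, name) VALUES (?, ?)",
    [("Pub A", "Alpha"), ("Pub B", "Beta")],
)

db = CountingConn(raw)
pk_map = build_pk_map(db, "series", ["publisher", "name"])
volumes = [(("Pub B", "Beta"), 1), (("Pub A", "Alpha"), 2), (("Pub B", "Beta"), 3)]
args = create_args(volumes, pk_map, "series")
print(args, db.queries)
# [{'series_id': 2, 'number': 1}, {'series_id': 1, 'number': 2}, {'series_id': 2, 'number': 3}] 1
```

In Django, a dict like this would be unpacked into a model constructor (`Model(**kwargs)`); a `ForeignKey` accepts its `_id` attname as a keyword, so no parent instance has to be fetched or cached.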
Net: ~2M SELECTs collapses to ~1 SELECT per parent model per phase
(roughly K * 2 where K is the number of distinct parent FK models,
or ~12-20 SELECTs for a typical import). Python work is unchanged;
the win is eliminating per-row DB round-trips inside the create and
update loops.
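The query-count claim is simple arithmetic. Reading "phase" as the create and update passes, each issuing one SELECT per distinct parent FK model, the totals work out as below. The function names and the row and FK counts in the toy example are illustrative assumptions, not measurements from the import.

```python
def before_selects(rows_per_phase: list[int], parent_fks_per_row: int) -> int:
    """Old scheme: one objects.get() per parent FK reference per row."""
    return sum(rows * parent_fks_per_row for rows in rows_per_phase)


def after_selects(k_parent_models: int, phases: int = 2) -> int:
    """New scheme: roughly one values_list() SELECT per parent model per phase."""
    return k_parent_models * phases


# K between 6 and 10 reproduces the "~12-20 SELECTs" range quoted above.
print(after_selects(6), after_selects(10))  # 12 20

# Toy example: 1,000 rows in each of two phases, 2 parent FKs per row.
print(before_selects([1_000, 1_000], 2), after_selects(2))  # 4000 4
```

The old count grows with the number of rows; the new count depends only on the schema.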
Implements tasks/importer-perf/02-create-fk-batching.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Implements `tasks/importer-perf/02-create-fk-batching.md` from PR #627. Second of the surgical N+1 wins, after PR #628.

`_get_create_update_args` previously called `field_model.objects.get()` once per row to dereference each parent FK reference. For a fresh 600k-comic import that meant ~2M small SELECTs across the imprint/series/volume/credit/identifier/storyarcnumber create and update phases, dominated by round-trip overhead.

The fix:

- `_build_parent_fk_pk_maps(model)` walks `MODEL_CREATE_ARGS_MAP[model]` to find every parent FK model referenced, then pre-fetches a `{key_tuple: pk}` map per parent: one `values_list("pk", *rels)` query each.
- `_get_create_update_args` takes the prebuilt map and resolves parent references via dict lookup.
- The resolved pk is assigned via `<field_name>_id`, so Django never needs a materialized parent instance for the FK descriptor. `bulk_create`/`bulk_update` write the same column either way; the `update_fields`/`unique_fields` tuples don't need to change since Django's ORM accepts both forms.

Net query count: ~2M → roughly K * 2, where K is the number of distinct parent FK models (~12-20 SELECTs for a typical import). Python work is unchanged.
Correctness notes
- Key-tuple construction (`values_tuple[:index+1]` for Imprint/Series/Volume parents in `GROUP_MODEL_COUNT_FIELDS`) is preserved exactly.
- Optional parent FKs (e.g. `identifier`): a pk-map miss returns `None`, written as `<field>_id=None`, same as before.
- Missing required parents: previously `DoesNotExist` was raised in the loop; now `IntegrityError` is raised at `bulk_create` time. Both abort the import; only the exception class shifts. In practice, by the time create runs, the parent rows exist (created in a previous step or already in the DB), so this path is theoretical.

Test plan
- `make fix` clean
- `make lint-python` clean (0 errors, 0 warnings)
- `pytest tests/importer/ tests/test_search_fts.py`: 7 passed
- … `update_fields` changes, assert resulting Publisher/Imprint/Series/Volume/Identifier/Credit/StoryArcNumber rows match the pre-PR baseline

🤖 Generated with Claude Code