Importer perf: pre-fetch parent FK pk maps in create/update#629

Merged
ajslater merged 1 commit into v1.11-performance from
importer-create-fk-batching
Apr 28, 2026
@ajslater
Owner

Summary

Implements tasks/importer-perf/02-create-fk-batching.md from PR #627. Second of the surgical N+1 wins, after PR #628.

_get_create_update_args previously called field_model.objects.get() once per row to dereference each parent FK reference. For a fresh 600k-comic import, that meant ~2M small SELECTs across the imprint/series/volume/credit/identifier/storyarcnumber create and update phases, with the cost dominated by round-trip overhead.

The fix:

  • New _build_parent_fk_pk_maps(model) walks MODEL_CREATE_ARGS_MAP[model] to find every parent FK model referenced, then pre-fetches a {key_tuple: pk} map per parent — one values_list("pk", *rels) query each.
  • _get_create_update_args takes the prebuilt map and resolves parent references via dict lookup.
  • The resolved pk is assigned via <field_name>_id so Django skips the FK-descriptor round-trip on construction. bulk_create / bulk_update write the same column either way; the update_fields / unique_fields tuples don't need to change since Django's ORM accepts both forms.
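The pre-fetch pattern described above can be sketched without Django. This is a minimal illustration using the standard-library sqlite3 module, not the PR's code: the `publisher` table, the `build_pk_map` helper, and the sample rows are all hypothetical stand-ins for the real models and for `_build_parent_fk_pk_maps`.

```python
import sqlite3


def build_pk_map(conn, table, key_columns):
    """Pre-fetch {key_tuple: pk} for one parent table with a single SELECT.

    Hypothetical helper: table and column names are trusted identifiers here.
    """
    cols = ", ".join(("id", *key_columns))
    rows = conn.execute(f"SELECT {cols} FROM {table}")
    return {tuple(row[1:]): row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO publisher (name) VALUES (?)", [("Marvel",), ("DC",)])

# Record every statement from here on so the query count can be checked.
statements = []
conn.set_trace_callback(statements.append)

# One query per parent model, issued before the per-row loop starts.
publisher_pks = build_pk_map(conn, "publisher", ("name",))

# Per-row work is now a dict lookup; the pk goes straight into the
# "<field_name>_id" slot instead of fetching the parent object.
incoming = [("Marvel", "Icon"), ("DC", "Vertigo"), ("Marvel", "Epic")]
imprint_rows = [
    {"publisher_id": publisher_pks[(publisher_name,)], "name": imprint_name}
    for publisher_name, imprint_name in incoming
]

select_count = sum(1 for sql in statements if sql.startswith("SELECT"))
```

In this sketch `select_count` stays at one regardless of how many rows are in `incoming`, which is the property the PR relies on.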

Net query count: ~2M → roughly K*2, where K is the number of distinct parent FK models and each is fetched once per phase (create and update), so ~12-20 SELECTs for a typical import. Python work is unchanged.

Correctness notes

  • Group-model parent slicing (values_tuple[:index+1] for Imprint/Series/Volume parents in GROUP_MODEL_COUNT_FIELDS) preserved exactly.
  • Identifier-style multi-column key tuples preserved exactly.
  • Nullable parent FKs (identifier): pk_map miss returns None, written as <field>_id=None — same as before.
  • Required parent FK miss: previously, DoesNotExist was raised inside the loop; now IntegrityError is raised at bulk_create time. Both abort the import; only the exception class shifts. In practice, by the time create runs, the parent rows exist (created in a previous step or already in the DB), so this path is theoretical.
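The two miss cases in the notes above can be illustrated with a small sketch. It uses the standard-library sqlite3 module rather than Django, so the exception shown is `sqlite3.IntegrityError`; the assumption is that Django surfaces a comparable IntegrityError from bulk_create. The `series` table, the `resolve` helper, and the key tuples are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE series ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " imprint_id INTEGER NOT NULL,"  # required parent FK
    " identifier_id INTEGER"         # nullable parent FK
    ")"
)

# Hypothetical pre-fetched maps; the identifier map is deliberately empty.
imprint_pks = {("Marvel", "Icon"): 1}
identifier_pks = {}


def resolve(pk_map, key):
    """Dict lookup that returns None on a miss instead of raising."""
    return pk_map.get(key)


insert_sql = "INSERT INTO series (name, imprint_id, identifier_id) VALUES (?, ?, ?)"

# Nullable FK miss: None is written to the column and the insert succeeds.
ok_row = (
    "Powers",
    resolve(imprint_pks, ("Marvel", "Icon")),
    resolve(identifier_pks, ("comicvine", "series", "123")),
)
conn.execute(insert_sql, ok_row)

# Required FK miss: the lookup itself does not raise; the NOT NULL
# constraint rejects the row at write time.
bad_row = ("Orphan", resolve(imprint_pks, ("Nobody", "Nothing")), None)
try:
    conn.execute(insert_sql, bad_row)
    required_miss_raised = False
except sqlite3.IntegrityError:
    required_miss_raised = True

row_count = conn.execute("SELECT COUNT(*) FROM series").fetchone()[0]
```

The failure moves from the lookup to the write, which matches the shift in exception class described in the last bullet.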

Test plan

  • make fix clean
  • make lint-python clean (0 errors, 0 warnings)
  • pytest tests/importer/ tests/test_search_fts.py — 7 passed
  • Field check on a real fixture: import a 1k-comic library twice (cold + warm) and a 1k-comic library with update_fields changes, assert resulting Publisher/Imprint/Series/Volume/Identifier/Credit/StoryArcNumber rows match the pre-PR baseline
  • Wall-clock measurement on a real-scale fixture once one is available

🤖 Generated with Claude Code

Previously, _get_create_update_args called field_model.objects.get()
once per row to dereference each parent FK reference. For a fresh
600k-comic import that was ~2M small SELECTs across the
imprint/series/volume/credit/identifier/storyarcnumber create and
update phases, dominated by round-trip overhead.

Add _build_parent_fk_pk_maps(model) which walks
MODEL_CREATE_ARGS_MAP[model] to find every parent FK model and
pre-fetches a {key_tuple: pk} map per parent — one
values_list("pk", *rels) query each. _get_create_update_args now
takes the prebuilt map and resolves parent references via dict
lookup, then assigns the pk via <field_name>_id to skip the FK
descriptor materialization on the new model instance. bulk_create /
bulk_update write the same column either way; the
update_fields/unique_fields tuples don't need to change since
Django's ORM accepts both forms.

Net: ~2M SELECTs collapse to ~1 SELECT per parent model per phase
(roughly K * 2 where K is the number of distinct parent FK models,
or ~12-20 SELECTs for a typical import). Python work is unchanged;
the win is eliminating per-row DB round-trips inside the create and
update loops.

Implements tasks/importer-perf/02-create-fk-batching.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ajslater ajslater merged commit 448821c into v1.11-performance Apr 28, 2026
1 check failed
@ajslater ajslater deleted the importer-create-fk-batching branch May 2, 2026 22:39