
Importer perf: batch M2M name->pk lookup in link prepare#628

Merged
ajslater merged 1 commit into v1.11-performance from importer-link-batching on Apr 28, 2026
Conversation

@ajslater
Owner

Summary

Implements tasks/importer-perf/01-link-batching.md (planning PR #627), the headline impact-per-LOC item from the importer perf plan.

link_prepare_m2m_links previously fired one SELECT per (comic, M2M field) pair — for a 600k-comic import, ~6.6M small SELECTs in this phase alone, dominated by round-trip overhead. Restructured into three phases:

  1. Collect — walk LINK_M2MS once and group every key tuple by M2M field name.
  2. Resolve — one batched SELECT per field builds a {key_tuple: pk} dict.
    • Single-column models (named M2Ms, folders): IN clauses batched at IMPORTER_LINK_FK_BATCH_SIZE (30000) to stay under SQLite's 32766-variable cap.
    • Multi-column models (Credit, StoryArcNumber, Identifier): Q-OR chains batched at 500 to keep SQLite's planner well-behaved.
  3. Stitch — per-comic loop becomes pure dict lookups, no SQL.
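A minimal pure-Python sketch of the three phases. The function name and the overall shape follow the description above, but the input layout and `_resolve_batch` are stand-ins: the real code walks LINK_M2MS and issues one batched Django queryset per field in phase 2.

```python
def _resolve_batch(field, key_tuples):
    """Hypothetical stand-in for the phase-2 resolver.

    The real importer would run something like one batched
    Model.objects.filter(...).values_list(...) per field here.
    """
    # Fake pks, deterministic within a run, just to make the sketch runnable.
    return {key: hash(key) % 100_000 for key in key_tuples}


def link_prepare_m2m_links(link_m2ms):
    """Sketch: link_m2ms maps comic_pk -> {field_name: [key_tuple, ...]}."""
    # Phase 1 (collect): group every key tuple by M2M field name.
    keys_per_field = {}
    for field_map in link_m2ms.values():
        for field, key_tuples in field_map.items():
            keys_per_field.setdefault(field, set()).update(key_tuples)

    # Phase 2 (resolve): one batched lookup per field -> {key_tuple: pk}.
    pk_maps = {
        field: _resolve_batch(field, keys)
        for field, keys in keys_per_field.items()
    }

    # Phase 3 (stitch): the per-comic loop is pure dict lookups, no SQL.
    return {
        comic_pk: {
            field: [pk_maps[field][key] for key in key_tuples]
            for field, key_tuples in field_map.items()
        }
        for comic_pk, field_map in link_m2ms.items()
    }
```

The key property is that the number of resolver calls scales with the number of distinct fields, not with the number of comics.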

Query count drops from ~11N+1 to ~K+1 (K ≈ number of distinct M2M models, ~12). FTS side-effects preserved exactly — complex/folder fields still flow through _add_complex_link_to_fts, named M2Ms still hit add_links_to_fts with the same value shape.

Drops four now-dead helpers (_get_link_folders_filter, _get_link_complex_model_filter, _link_prepare_complex_m2ms, _link_prepare_named_m2ms).

Test plan

  • make fix clean
  • make lint-python clean (0 errors, 0 warnings)
  • pytest tests/importer/ — 3 passed
  • pytest tests/test_search_fts.py tests/importer/ — 7 passed (FTS side-effects intact)
  • Field check on a real fixture: import a 1k-comic library twice (cold + warm) and assert that the resulting Comic and M2M through-row counts match the pre-PR baseline exactly
  • Wall-clock measurement on a real-scale fixture once one is available

🤖 Generated with Claude Code

Previously, link_prepare_m2m_links fired one SELECT per
(comic, M2M field) pair via _link_prepare_complex_m2ms /
_link_prepare_named_m2ms. For a 600k-comic import that was
~6.6M small SELECTs in this phase alone, dominated by round-trip
overhead.

Restructure into three phases:

1. _collect_m2m_keys_per_field: walk LINK_M2MS once and group every
   key tuple by M2M field name.
2. _build_m2m_pk_maps: one batched SELECT per field builds a
   {key_tuple: pk} dict. Single-column models (named M2Ms,
   folders) use IN clauses batched at IMPORTER_LINK_FK_BATCH_SIZE
   to stay under SQLite's 32766-variable cap. Multi-column models
   (Credit, StoryArcNumber, Identifier) use Q-OR chains batched at
   500 to keep SQLite's planner well-behaved.
3. Per-comic stitch: pure dict lookups, no SQL.
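The batching itself can be sketched with a plain chunking helper. The constant values come from the description above; the helper name and the surrounding asserts are illustrative, not the project's actual code. In the real resolver, each single-column chunk would back one `name__in=...` filter, and each multi-column chunk one OR-chain of Q objects.

```python
from itertools import islice

SQLITE_MAX_VARIABLE_NUMBER = 32766   # SQLite's default bound-variable cap
IMPORTER_LINK_FK_BATCH_SIZE = 30000  # single-column IN-clause batches
Q_OR_BATCH_SIZE = 500                # multi-column Q-OR chain batches


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


# Each single-column batch becomes one IN clause with one bound variable
# per key, so the batch size must stay under SQLite's cap.
assert IMPORTER_LINK_FK_BATCH_SIZE < SQLITE_MAX_VARIABLE_NUMBER
```

Multi-column keys use the much smaller batch size because each key expands into several predicates joined by OR, and long OR chains are what degrade the SQLite query planner.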

Query count drops from ~11N + 1 to roughly K + 1 (where K is the
number of distinct M2M models referenced by the import, ~12). The
Python work is unchanged; the win is eliminating per-iteration DB
round-trips.
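As a quick sanity check on the figures above, using the values stated in this description:

```python
N = 600_000  # comics in the import
FIELDS = 11  # per-(comic, field) SELECTs before the change
K = 12       # distinct M2M models referenced by the import

before = FIELDS * N + 1  # the "~6.6M small SELECTs" figure
after = K + 1            # roughly one batched pass per model
```

Here `before` works out to 6,600,001 queries versus 13 afterward, ignoring the handful of extra queries the per-field batching adds for very large key sets.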

FTS side-effects preserved exactly: complex/folder fields still go
through _add_complex_link_to_fts, named M2Ms still call
add_links_to_fts directly with the same value shape. Drops the
now-dead _get_link_folders_filter, _get_link_complex_model_filter,
_link_prepare_complex_m2ms, and _link_prepare_named_m2ms helpers
(only called from the rewritten link_prepare_m2m_links).

Implements tasks/importer-perf/01-link-batching.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>