Importer perf: batch M2M name->pk lookup in link prepare #628
Merged
ajslater merged 1 commit into v1.11-performance on Apr 28, 2026
Previously, link_prepare_m2m_links fired one SELECT per
(comic, M2M field) pair via _link_prepare_complex_m2ms /
_link_prepare_named_m2ms. For a 600k-comic import that was
~6.6M small SELECTs in this phase alone, dominated by round-trip
overhead.
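For illustration, the per-pair shape described above can be sketched with the standard-library `sqlite3` module. This is a hypothetical miniature, not the importer's code. The `genre` table, the `comics` dict and the single M2M field are assumptions, and the real importer goes through the Django ORM. A trace callback records the statements, showing that SELECTs grow linearly with the number of comics.

```python
import sqlite3

# Hypothetical single-column M2M target table keyed by name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany("INSERT INTO genre (name) VALUES (?)", [("Action",), ("Drama",), ("Horror",)])

# Hypothetical link data for one M2M field: comic path -> genre names.
comics = {
    "a.cbz": {"Action", "Drama"},
    "b.cbz": {"Drama"},
    "c.cbz": {"Horror", "Action"},
}

# Record every SQL statement the connection runs from here on.
traced = []
conn.set_trace_callback(traced.append)

# Per-pair shape: one SELECT per (comic, field) pair, i.e. one round-trip per comic here.
links = {}
for path, names in comics.items():
    placeholders = ",".join("?" * len(names))
    rows = conn.execute(
        f"SELECT id FROM genre WHERE name IN ({placeholders})", sorted(names)
    ).fetchall()
    links[path] = {pk for (pk,) in rows}

conn.set_trace_callback(None)
selects = [sql for sql in traced if sql.lstrip().upper().startswith("SELECT")]
print(len(selects), links)
```

With ~11 M2M fields per comic, the same loop shape yields the ~11 SELECTs per comic behind the 6.6M figure.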
Restructure into three phases:
1. _collect_m2m_keys_per_field: walk LINK_M2MS once and group every
key tuple by M2M field name.
2. _build_m2m_pk_maps: one batched SELECT per field builds a
{key_tuple: pk} dict. Single-column models (named M2Ms,
folders) use IN clauses batched at IMPORTER_LINK_FK_BATCH_SIZE
to stay under SQLite's 32766-variable cap. Multi-column models
(Credit, StoryArcNumber, Identifier) use Q-OR chains batched at
500 to keep SQLite's planner well-behaved.
3. Per-comic stitch: pure dict lookups, no SQL.
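The three phases can be sketched end to end with `sqlite3`. Everything here is an assumption for illustration: the `genre`/`credit` tables, the `link_m2ms` and `MODELS` dicts, and the `chunks` helper. `BATCH_SIZE = 2` is a tiny stand-in for `IMPORTER_LINK_FK_BATCH_SIZE` and the 500-term cap, chosen so that batching is visible. The importer itself builds Django querysets. The multi-column branch below emits the raw-SQL equivalent of a Q-OR chain, `(a = ? AND b = ?) OR ...`.

```python
import sqlite3
from collections import defaultdict
from itertools import islice

BATCH_SIZE = 2  # hypothetical stand-in for IMPORTER_LINK_FK_BATCH_SIZE / the 500 cap


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE credit (id INTEGER PRIMARY KEY, person TEXT, role TEXT, UNIQUE (person, role));
INSERT INTO genre (name) VALUES ('Action'), ('Drama'), ('Horror');
INSERT INTO credit (person, role) VALUES ('Ann', 'Writer'), ('Bo', 'Artist'), ('Ann', 'Artist');
""")

# Hypothetical per-comic link data: comic -> field -> set of key tuples.
link_m2ms = {
    "a.cbz": {"genres": {("Action",), ("Drama",)}, "credits": {("Ann", "Writer")}},
    "b.cbz": {"genres": {("Horror",)}, "credits": {("Bo", "Artist"), ("Ann", "Artist")}},
}
# Hypothetical field -> (table, key columns) mapping.
MODELS = {"genres": ("genre", ("name",)), "credits": ("credit", ("person", "role"))}

# Phase 1: walk the link data once and group every key tuple by field.
keys_per_field = defaultdict(set)
for fields in link_m2ms.values():
    for field, keys in fields.items():
        keys_per_field[field] |= keys

# Phase 2: batched SELECTs build a {key_tuple: pk} dict per field.
pk_maps = {}
for field, keys in keys_per_field.items():
    table, cols = MODELS[field]
    pk_map = {}
    for batch in chunks(sorted(keys), BATCH_SIZE):
        if len(cols) == 1:
            # Single-column model: one IN clause per batch.
            where = f"{cols[0]} IN ({','.join('?' * len(batch))})"
            params = [key[0] for key in batch]
        else:
            # Multi-column model: OR of per-key AND groups per batch.
            group = "(" + " AND ".join(f"{col} = ?" for col in cols) + ")"
            where = " OR ".join([group] * len(batch))
            params = [value for key in batch for value in key]
        sql = f"SELECT id, {', '.join(cols)} FROM {table} WHERE {where}"
        for pk, *key in conn.execute(sql, params):
            pk_map[tuple(key)] = pk
    pk_maps[field] = pk_map

# Phase 3: per-comic stitch with pure dict lookups, no SQL.
links = {
    path: {field: {pk_maps[field][key] for key in keys} for field, keys in fields.items()}
    for path, fields in link_m2ms.items()
}
print(links)
```

In this shape, each field costs ceil(distinct keys / batch size) SELECTs, independent of how many comics reference those keys. Phase 3 touches only in-memory dicts.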
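Query count drops from ~11N + 1 (for N comics) to roughly K + 1
(where K is the number of distinct M2M models referenced by the
import, ~12), plus one extra SELECT per additional batch when a
field's key set exceeds its batch size. The Python work is
unchanged; the win is eliminating per-iteration DB round-trips.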
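The query-count claim can be checked with a little arithmetic. In this sketch, the coefficient 11 and the constant +1 are taken from the estimate above, and the per-field key counts are hypothetical. A field whose distinct keys fit in one batch costs one SELECT, so K such fields give K + 1. A large multi-column field pays one SELECT per 500-key batch.

```python
import math


def old_query_count(n_comics: int, m2m_fields_per_comic: int = 11) -> int:
    # Per-pair shape: one SELECT per (comic, M2M field) pair, plus the constant term.
    return m2m_fields_per_comic * n_comics + 1


def new_query_count(fields: dict[str, tuple[int, int]]) -> int:
    # Batched shape: ceil(distinct_keys / batch_size) SELECTs per non-empty field,
    # plus the constant term. `fields` maps field name -> (distinct_keys, batch_size).
    return sum(math.ceil(keys / size) for keys, size in fields.values() if keys) + 1


print(old_query_count(600_000))  # 6600001
# Hypothetical: 12 fields whose keys each fit in one batch.
small = {f"field{i}": (100, 30_000) for i in range(12)}
print(new_query_count(small))  # 13
# Hypothetical: one multi-column field with 20k distinct keys, batched at 500.
print(new_query_count({**small, "field0": (20_000, 500)}))  # 40 + 11 + 1 = 52
```

Even the pessimistic case stays in the tens of queries, versus millions for the per-pair shape.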
FTS side-effects preserved exactly: complex/folder fields still go
through _add_complex_link_to_fts, named M2Ms still call
add_links_to_fts directly with the same value shape. Drops the
now-dead _get_link_folders_filter, _get_link_complex_model_filter,
_link_prepare_complex_m2ms, and _link_prepare_named_m2ms helpers
(previously called only from link_prepare_m2m_links, which no longer
uses them).
Implements tasks/importer-perf/01-link-batching.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Implements `tasks/importer-perf/01-link-batching.md` (planning PR #627). This is the headline impact-per-LOC item from the importer perf plan. `link_prepare_m2m_links` previously fired one SELECT per (comic, M2M field) pair; for a 600k-comic import, that was ~6.6M small SELECTs in this phase alone, dominated by round-trip overhead. Restructured into three phases:

1. Walk `LINK_M2MS` once and group every key tuple by M2M field name.
2. One batched SELECT per field builds a `{key_tuple: pk}` dict. Single-column models use `IN` clauses batched at `IMPORTER_LINK_FK_BATCH_SIZE` (30000) to stay under SQLite's 32766-variable cap; multi-column models (Credit, StoryArcNumber, Identifier) use Q-OR chains batched at 500.
3. Per-comic stitch: pure dict lookups, no SQL.

Query count drops from ~11N+1 to ~K+1 (K ≈ number of distinct M2M models, ~12), plus extra batches for very large key sets. FTS side-effects are preserved exactly: complex/folder fields still flow through `_add_complex_link_to_fts`, and named M2Ms still hit `add_links_to_fts` with the same value shape.

Drops four now-dead helpers (`_get_link_folders_filter`, `_get_link_complex_model_filter`, `_link_prepare_complex_m2ms`, `_link_prepare_named_m2ms`).

Test plan
- `make fix` clean
- `make lint-python` clean (0 errors, 0 warnings)
- `pytest tests/importer/`: 3 passed
- `pytest tests/test_search_fts.py tests/importer/`: 7 passed (FTS side-effects intact)

🤖 Generated with Claude Code