
Importer perf: batch M2M name->pk lookup in link prepare#628

Merged
ajslater merged 1 commit into v1.11-performance from importer-link-batching on Apr 28, 2026
Conversation

@ajslater
Owner

Summary

Implements tasks/importer-perf/01-link-batching.md (planning PR #627), the headline impact-per-LOC item from the importer perf plan.

link_prepare_m2m_links previously fired one SELECT per (comic, M2M field) pair — for a 600k-comic import, ~6.6M small SELECTs in this phase alone, dominated by round-trip overhead. Restructured into three phases:

  1. Collect — walk LINK_M2MS once and group every key tuple by M2M field name.
  2. Resolve — one batched SELECT per field builds a {key_tuple: pk} dict.
    • Single-column models (named M2Ms, folders): IN clauses batched at IMPORTER_LINK_FK_BATCH_SIZE (30000) to stay under SQLite's 32766-variable cap.
    • Multi-column models (Credit, StoryArcNumber, Identifier): Q-OR chains batched at 500 to keep SQLite's planner well-behaved.
  3. Stitch — per-comic loop becomes pure dict lookups, no SQL.
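A minimal pure-Python sketch of the three phases. The function name and the overall shape follow the description above, but the input layout and `_resolve_batch` are stand-ins: the real code walks LINK_M2MS and issues one batched Django queryset per field in phase 2.

```python
def _resolve_batch(field, key_tuples):
    """Hypothetical stand-in for the phase-2 resolver.

    The real importer would run something like one batched
    Model.objects.filter(...).values_list(...) per field here.
    """
    # Fake pks, deterministic within a run, just to make the sketch runnable.
    return {key: hash(key) % 100_000 for key in key_tuples}


def link_prepare_m2m_links(link_m2ms):
    """Sketch: link_m2ms maps comic_pk -> {field_name: [key_tuple, ...]}."""
    # Phase 1 (collect): group every key tuple by M2M field name.
    keys_per_field = {}
    for field_map in link_m2ms.values():
        for field, key_tuples in field_map.items():
            keys_per_field.setdefault(field, set()).update(key_tuples)

    # Phase 2 (resolve): one batched lookup per field -> {key_tuple: pk}.
    pk_maps = {
        field: _resolve_batch(field, keys)
        for field, keys in keys_per_field.items()
    }

    # Phase 3 (stitch): the per-comic loop is pure dict lookups, no SQL.
    return {
        comic_pk: {
            field: [pk_maps[field][key] for key in key_tuples]
            for field, key_tuples in field_map.items()
        }
        for comic_pk, field_map in link_m2ms.items()
    }
```

The key property is that the number of resolver calls scales with the number of distinct fields, not with the number of comics.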

Query count drops from ~11N+1 to ~K+1 (K ≈ number of distinct M2M models, ~12). FTS side-effects preserved exactly — complex/folder fields still flow through _add_complex_link_to_fts, named M2Ms still hit add_links_to_fts with the same value shape.

Drops four now-dead helpers (_get_link_folders_filter, _get_link_complex_model_filter, _link_prepare_complex_m2ms, _link_prepare_named_m2ms).

Test plan

  • make fix clean
  • make lint-python clean (0 errors, 0 warnings)
  • pytest tests/importer/ — 3 passed
  • pytest tests/test_search_fts.py tests/importer/ — 7 passed (FTS side-effects intact)
  • Field check on a real fixture: import a 1k-comic library twice (cold + warm) and assert that the resulting Comic and M2M through-row counts match the pre-PR baseline exactly
  • Wall-clock measurement on a real-scale fixture once one is available

🤖 Generated with Claude Code

Previously, link_prepare_m2m_links fired one SELECT per
(comic, M2M field) pair via _link_prepare_complex_m2ms /
_link_prepare_named_m2ms. For a 600k-comic import that was
~6.6M small SELECTs in this phase alone, dominated by round-trip
overhead.

Restructure into three phases:

1. _collect_m2m_keys_per_field: walk LINK_M2MS once and group every
   key tuple by M2M field name.
2. _build_m2m_pk_maps: one batched SELECT per field builds a
   {key_tuple: pk} dict. Single-column models (named M2Ms,
   folders) use IN clauses batched at IMPORTER_LINK_FK_BATCH_SIZE
   to stay under SQLite's 32766-variable cap. Multi-column models
   (Credit, StoryArcNumber, Identifier) use Q-OR chains batched at
   500 to keep SQLite's planner well-behaved.
3. Per-comic stitch: pure dict lookups, no SQL.
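The batching itself can be sketched with a plain chunking helper. The constant values come from the description above; the helper name and the surrounding asserts are illustrative, not the project's actual code. In the real resolver, each single-column chunk would back one `name__in=...` filter, and each multi-column chunk one OR-chain of Q objects.

```python
from itertools import islice

SQLITE_MAX_VARIABLE_NUMBER = 32766   # SQLite's default bound-variable cap
IMPORTER_LINK_FK_BATCH_SIZE = 30000  # single-column IN-clause batches
Q_OR_BATCH_SIZE = 500                # multi-column Q-OR chain batches


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


# Each single-column batch becomes one IN clause with one bound variable
# per key, so the batch size must stay under SQLite's cap.
assert IMPORTER_LINK_FK_BATCH_SIZE < SQLITE_MAX_VARIABLE_NUMBER
```

Multi-column keys use the much smaller batch size because each key expands into several predicates joined by OR, and long OR chains are what degrade the SQLite query planner.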

Query count drops from ~11N + 1 to roughly K + 1 (where K is the
number of distinct M2M models referenced by the import, ~12). The
Python work is unchanged; the win is eliminating per-iteration DB
round-trips.
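As a quick sanity check on the figures above, using the values stated in this description:

```python
N = 600_000  # comics in the import
FIELDS = 11  # per-(comic, field) SELECTs before the change
K = 12       # distinct M2M models referenced by the import

before = FIELDS * N + 1  # the "~6.6M small SELECTs" figure
after = K + 1            # roughly one batched pass per model
```

Here `before` works out to 6,600,001 queries versus 13 afterward, ignoring the handful of extra queries the per-field batching adds for very large key sets.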

FTS side-effects preserved exactly: complex/folder fields still go
through _add_complex_link_to_fts, named M2Ms still call
add_links_to_fts directly with the same value shape. Drops the
now-dead _get_link_folders_filter, _get_link_complex_model_filter,
_link_prepare_complex_m2ms, and _link_prepare_named_m2ms helpers
(only called from the rewritten link_prepare_m2m_links).

Implements tasks/importer-perf/01-link-batching.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>