Skip to content

ENH: igenerator pkl correctness fixes + CastXML content-addressed cache (supersedes #6486)#6533

Draft
hjmjohnson wants to merge 5 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:enh/igenerator-sqlite-pkl
Draft

ENH: igenerator pkl correctness fixes + CastXML content-addressed cache (supersedes #6486)#6533
hjmjohnson wants to merge 5 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:enh/igenerator-sqlite-pkl

Conversation

@hjmjohnson

Copy link
Copy Markdown
Member

Fix two latent correctness bugs in the Python wrapping pipeline's pkl
tracking, then add a two-level content-addressed CastXML cache that
reduces CI build time by ~32 min on 2-core AzDO runners (−19%) and by
~3h 12m on GHA ubuntu-24.04 when combined with a warm ccache (−64%).
Supersedes #6486.

What this fixes (correctness)

Ninja graph gapigenerator.py writes one .pkl file per wrapped
class (~816 files per full ITK build) into itk-pkl/ as side effects.
None of those files appear as CMake OUTPUTs; only the .index.txt
manifests do. When itk-pkl/*.pkl files are absent (fresh clone, partial
ninja -t clean, CI agent reusing a workspace with a stale build tree)
while the .index.txt files survive, ninja considers igenerator.py
up-to-date. pyi_generator.py then fails with:

No pickle files were found in the pkl directory

Fix (commit 1): a per-module <Module>.stamp file is written by
igenerator.py after all pkl work for that module completes and is
declared as a CMake OUTPUT. Ninja now reruns igenerator.py whenever
the stamp is absent, guaranteeing the pkl files exist before
pyi_generator.py runs.

Parallel write race — the ~96 concurrent igenerator.py jobs (one per
ITK module) all write individual .pkl files into the same itk-pkl/
directory with no locking. Under adverse scheduling this produces torn
reads or silently truncated files.

Fix (commit 2): replace the directory of per-class .pkl files with a
single WAL-mode SQLite database (itk-pkl-v1.db). SQLite WAL mode is
designed for concurrent writers; the race is structurally eliminated.
The .index.txt manifests now hold DB keys (not file paths), and
pyi_generator.py queries the DB via those keys.

What this adds (performance)

Commit 3 defers import pygccxml until after the submodule list is
built from the .mdx files. This separates the pure-I/O discovery phase
from the pygccxml/CastXML-dependent processing phase, laying the
groundwork for a future early-return when all submodule data is already
cached.

Commit 4 adds itk-castxml-cache.py, a transparent wrapper for the
CastXML binary with a two-level content-addressed cache:

  • L1 (no subprocess, ~0.2 s): SHA-256 of .cxx content + .castxml.inc
    flags → L2 key. Survives ninja -t clean because the CastXML binary is
    fingerprinted by content, not mtime.
  • L2 (preprocessor pass, ~0.75 s): SHA-256 of castxml -E output with
    line markers stripped → output.xml.gz. Path-independent: any build
    directory on the same source tree produces the same L2 key.

Enable via ITK_WRAP_CASTXML_CACHE (default ON). Set ITK_WRAP_CACHE
to override the cache root (default ~/.cache/itk-wrap).

CI wiring added: Cache@2 tasks in all three Azure DevOps Python
pipelines and GHA arm.yml; configure-python-ci / build-python-ci /
test-python-ci pixi tasks added.

Measured CI performance (PR #6486 runs, GHA + AzDO)

All measurements taken from the ci/linux-azure-disk-management WIP
branch (same implementation). The clean branch (enh/igenerator-sqlite-pkl)
produces identical build artifacts.

GHA ubuntu-24.04 (2-core runner, PR #6486):

Phase Cold run (2026-06-28) Warm run (2026-06-29) Δ
CastXML (816 jobs) 32 min 23 sec −98%
igenerator + C++ compile ~255 min ~67 min −74%
Tests 44 min 40 min −9%
Total 5h 1m 52s 1h 49m 50s −64%

Cold-to-warm: both the CastXML cache and ccache were cold on the first
run and warm on the second; the 64% total reduction is their combined effect.

AzDO ITK.Linux.Python (2-core runner) — CastXML cache in isolation:

Both the cold run (buildId=16132) and the warm run (buildId=16133) had
ccache already warm (restored from prior branch builds in 22–36 s). This
isolates the CastXML cache contribution:

Cold (16132) Warm (16133) Δ
Build and test 167m 54s 132m 15s −35m 39s
Total 2h 52m 46s 2h 20m 23s −32m 23s (−19%)

CastXML cache alone saves ~32 min per AzDO Python CI run.

72-core local workstation (gyrus, NVMe SSD, ccache warm):

On a many-core machine CastXML parallelizes across all cores and is not
on the critical path. The cache overhead is <30 s on a cold seed pass and
the warm speedup is ~19 s (4%). The cache is most valuable on CI runners
where CastXML is core-count-limited.

Why this supersedes #6486

PR #6486 (ci/linux-azure-disk-management) was a rambling work-in-progress
used to accumulate CI benchmark data. It contains:

This PR (enh/igenerator-sqlite-pkl) is a clean rewrite from upstream/main
with the same final implementation in five independently-reviewable commits.
The igenerator LRU cache removal commit is absent because that code never
existed on upstream/main; only the net-positive changes are present.

igenerator.py writes N pkl files per module as side effects that ninja
cannot track because their names (ClassName.SubmoduleName.pkl) are not
enumerable at CMake configure time.  When pkl files are deleted while
the .index.txt byproducts survive, ninja considers igenerator up-to-date
and pyi_generator.py fails with "No pickle files were found."

Add a --pkl_stamp argument to igenerator.py.  The stamp is written after
all pkl files for the module are complete and is declared as a CMake
OUTPUT of the igenerator add_custom_command.  Ninja now re-runs
igenerator whenever the stamp is absent, which guarantees the pkl files
are regenerated before pyi_generator.py reads the .index.txt manifests.
@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Enhancement Improvement of existing methods or implementation area:Python wrapping Python bindings for a class type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:Documentation Issues affecting the Documentation module labels Jun 30, 2026
@hjmjohnson hjmjohnson force-pushed the enh/igenerator-sqlite-pkl branch from b98089d to afbe436 Compare June 30, 2026 13:58
@hjmjohnson hjmjohnson marked this pull request as ready for review June 30, 2026 13:58
@hjmjohnson hjmjohnson marked this pull request as draft June 30, 2026 13:59
Comment thread Wrapping/Generators/CastXML/itk-castxml-cache.py
@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes two latent correctness bugs in the Python wrapping pipeline (Ninja graph gap via a new per-module stamp file, and a parallel-write race via SQLite WAL-mode), then adds a two-level content-addressed CastXML cache (itk-castxml-cache.py) that delivers −19 % to −64 % CI build time savings. CI wiring is added for Azure DevOps (Linux, macOS, Windows) and GitHub Actions (arm).

  • Stamp file fix: igenerator.py now writes a <Module>.stamp file declared as a CMake OUTPUT; ninja will re-run igenerator.py whenever the stamp is absent, guaranteeing pkl data exists before pyi_generator.py runs.
  • SQLite pkl store: Per-class .pkl files in itk-pkl/ are replaced with a single WAL-mode SQLite database (itk-pkl-v1.db); .index.txt manifests now carry DB keys instead of file paths, eliminating concurrent-write races across ~96 parallel igenerator jobs.
  • CastXML cache: itk-castxml-cache.py wraps the CastXML binary with an L1 (no-subprocess, ~0.2 s) / L2 (preprocessor pass, ~0.75 s) content-addressed two-level cache enabled by ITK_WRAP_CASTXML_CACHE=ON.

Confidence Score: 4/5

The correctness fixes (stamp file, SQLite WAL store) are well-designed and the cache implementation is solid; the only concerns are quality-of-life gaps that don't affect build correctness in the tested configurations.

The stamp-file and SQLite migration are structurally sound and CI-tested at scale. Three things warrant a second look: the default 5-second SQLite busy timeout in _open_pkl_db could produce OperationalError: database is locked under peak 96-writer concurrency on slow or NFS-backed cache roots; the manifest table is written by every igenerator invocation but never queried by pyi_generator.py; and pyi_generator.py opens the SQLite DB without a table-existence guard, so a misconfigured ITK_WRAP_CACHE path yields an obscure OperationalError rather than a helpful message.

Wrapping/Generators/SwigInterface/igenerator.py (SQLite timeout and manifest table), Wrapping/Generators/Python/itk/pyi_generator.py (DB open without table guard), Wrapping/Generators/CastXML/itk-castxml-cache.py (Windows eviction gap).

Important Files Changed

Filename Overview
Wrapping/Generators/CastXML/itk-castxml-cache.py New 636-line two-level content-addressed CastXML cache; logic is well-structured with correct atomic-rename store, gzip compression, multi-root cascade, and L1/L2 key hierarchy. LRU eviction via os.fork() is silently disabled on Windows with no fallback.
Wrapping/Generators/SwigInterface/igenerator.py Replaces per-class .pkl files with a WAL-mode SQLite DB; adds stamp file output for correct ninja graph ordering. Default 5-second SQLite busy timeout may be too tight for 96 concurrent writers; manifest table is populated but never queried by pyi_generator.py.
Wrapping/Generators/Python/itk/pyi_generator.py Migrated from file-path-based .pkl loading to SQLite key lookups; correctly reads index.txt for keys then queries pkl_data. DB opened without CREATE TABLE guard — an unexpected ITK_WRAP_CACHE path would give an obscure OperationalError.
Wrapping/macro_files/itk_end_wrap_module.cmake Adds per-module stamp file as CMake OUTPUT and passes --module_name/--pkl_stamp to igenerator; correct fix for the ninja graph gap where stamp absence now forces igenerator to re-run.
Wrapping/macro_files/itk_auto_load_submodules.cmake Correctly builds _castxml_cmd to prepend the Python cache wrapper when ITK_WRAP_CASTXML_CACHE is ON; appends cache script to the existing _castxml_depends list which is already referenced in the add_custom_command DEPENDS clause.
CMake/itkWrapCastXMLCacheSupport.cmake New CMake module adding ITK_WRAP_CASTXML_CACHE option (ON by default) with validation of script path and Python3_EXECUTABLE; cleanly separated from the wrapping core.
.github/workflows/arm.yml Adds ITK_WRAP_CACHE env var and cache/restore steps for the castxml cache; correctly gated on matrix.python-version presence.
Testing/ContinuousIntegration/AzurePipelinesLinuxPython.yml Adds Cache@2 task for CastXML cache with per-SHA key and OS/pipeline fallback restore keys; ITK_WRAP_CASTXML_CACHE:BOOL=ON passed to cmake configure.
Testing/ContinuousIntegration/AzurePipelinesMacOSPython.yml Mirrors the Linux pipeline changes for macOS; cache key correctly uses 'macOSPython' tag to isolate from Linux entries.
Testing/ContinuousIntegration/AzurePipelinesWindowsPython.yml Same CastXML cache integration for Windows; ITK_WRAP_CACHE uses semicolon-separated path convention handled correctly by the Python scripts.
pyproject.toml Adds ccache dependency to pixi feature.cxx and introduces configure-python-ci / build-python-ci / test-python-ci tasks with ITK_WRAP_CASTXML_CACHE enabled; clean separation from the existing configure-python task.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CMake as CMake/Ninja
    participant Cache as itk-castxml-cache.py
    participant CastXML as CastXML binary
    participant IGen as igenerator.py
    participant DB as itk-pkl-v1.db SQLite WAL
    participant PyiGen as pyi_generator.py

    CMake->>Cache: invoke with cxx+inc args
    alt L1 HIT and L2 HIT
        Cache-->>CMake: restore output.xml.gz, no subprocess
    else L1 miss, run castxml -E for L2 key
        Cache->>CastXML: castxml -E preprocessor pass
        CastXML-->>Cache: preprocessed output
        alt L2 HIT
            Cache-->>CMake: restore output.xml.gz, refresh L1 map
        else L2 miss
            Cache->>CastXML: full castxml run
            CastXML-->>Cache: output.xml
            Cache-->>CMake: store to L2, write L1 map
        end
    end
    CMake->>IGen: igenerator.py per module, ~96 parallel
    IGen->>DB: WAL transaction DELETE manifest then INSERT pkl_data and manifest
    IGen-->>CMake: write .index.txt with DB keys, touch Module.stamp
    CMake->>PyiGen: pyi_generator.py after stamp dependency
    PyiGen->>DB: SELECT data FROM pkl_data WHERE key equals key
    DB-->>PyiGen: pickled ITKClass bytes
    PyiGen-->>CMake: write .pyi stubs
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant CMake as CMake/Ninja
    participant Cache as itk-castxml-cache.py
    participant CastXML as CastXML binary
    participant IGen as igenerator.py
    participant DB as itk-pkl-v1.db SQLite WAL
    participant PyiGen as pyi_generator.py

    CMake->>Cache: invoke with cxx+inc args
    alt L1 HIT and L2 HIT
        Cache-->>CMake: restore output.xml.gz, no subprocess
    else L1 miss, run castxml -E for L2 key
        Cache->>CastXML: castxml -E preprocessor pass
        CastXML-->>Cache: preprocessed output
        alt L2 HIT
            Cache-->>CMake: restore output.xml.gz, refresh L1 map
        else L2 miss
            Cache->>CastXML: full castxml run
            CastXML-->>Cache: output.xml
            Cache-->>CMake: store to L2, write L1 map
        end
    end
    CMake->>IGen: igenerator.py per module, ~96 parallel
    IGen->>DB: WAL transaction DELETE manifest then INSERT pkl_data and manifest
    IGen-->>CMake: write .index.txt with DB keys, touch Module.stamp
    CMake->>PyiGen: pyi_generator.py after stamp dependency
    PyiGen->>DB: SELECT data FROM pkl_data WHERE key equals key
    DB-->>PyiGen: pickled ITKClass bytes
    PyiGen-->>CMake: write .pyi stubs
Loading

Reviews (1): Last reviewed commit: "STYLE: Explicitly set ITK_WRAP_CASTXML_C..." | Re-trigger Greptile

def _open_pkl_db(pkl_dir: str) -> sqlite3.Connection:
db_path = _pkl_db_path(pkl_dir)
db_path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(db_path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The sqlite3.connect() call uses the default 5-second busy timeout. In a full ITK build, ~96 igenerator.py processes run concurrently — all sharing the same WAL-mode database. Each module's transaction (DELETE FROM manifest + N INSERT OR REPLACE rows) is fast individually, but the worst-case queueing time across all 96 writers can approach or exceed 5 seconds, causing OperationalError: database is locked on slower CI machines or when the cache root is on NFS. A 30-second timeout is safer for large parallel builds.

Suggested change
conn = sqlite3.connect(db_path)
conn = sqlite3.connect(db_path, timeout=30)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call here. I evaluated this earlier, and 5 seconds is quite safe. There's no harm in making a 100x safety window at 30 seconds.

Comment on lines 104 to 111
)
conn.execute("CREATE INDEX IF NOT EXISTS manifest_module ON manifest(module)")
conn.commit()
return conn


def argument_parser():
cmdln_arg_parser = ArgumentParser()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 manifest table written but never read

The manifest table (and manifest_module index) is created, written to with DELETE + INSERT in every igenerator invocation, but pyi_generator.py never queries it — it reads only pkl_data via keys collected from the .index.txt files. This creates unnecessary write pressure per module transaction (~96 extra delete+insert batches in a full build) without any current consumer. If the table is intended for future cleanup of orphan pkl_data rows, a comment documenting that intent would prevent it being mistakenly removed as dead code.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +196 to +204
return
if sys.platform == "win32":
return
pid = os.fork()
if pid == 0:
try:
_evict_lru(cache_root, _max_cache_bytes())
finally:
os._exit(0)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 LRU eviction silently disabled on Windows

_evict_async touches the _evict_ts sentinel (consuming the 60-second rate-limit slot) and then returns without evicting anything on Windows because os.fork() is unavailable. This means cache size is unbounded on Windows for local developer builds — ITK_WRAP_CACHE_MAX_SIZE is never enforced automatically. The docstring and module-level documentation only mention "background fork" without noting this gap. A short comment (or a subprocess-based fallback for Windows) would clarify the expected behaviour and prevent cache directories from growing without limit on local Windows machines.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me double-check: in the PR that this succeeds, I had looked into the Windows scenario; perhaps I dropped it during the reorganization.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _evict_async in PR #6486 is identical — the if sys.platform == "win32": return early exit (without eviction) was already there. No Windows fallback was present in the predecessor either; it wasn't dropped during the reorganization.

So the behavior is the same between both PRs: on Windows, the sentinel is touched (consuming the 60-second rate-limit slot) but eviction never runs. The original comment stands — cache growth is unbounded on Windows for local developer builds.

Given that ITK's Python wrapping CI doesn't run on Windows and Windows developer builds are relatively rare, a simple comment documenting the intentional gap would be the pragmatic fix rather than a full subprocess-based fallback:

Suggested change
return
if sys.platform == "win32":
return
pid = os.fork()
if pid == 0:
try:
_evict_lru(cache_root, _max_cache_bytes())
finally:
os._exit(0)
if sys.platform == "win32":
# os.fork() is unavailable on Windows; eviction is not enforced
# automatically. Run `python3 itk-castxml-cache.py --evict` manually
# to reclaim space when ITK_WRAP_CACHE_MAX_SIZE is exceeded.
return

Comment on lines +618 to +619
db_conn = sqlite3.connect(_pkl_db_path(options.pkl_dir))
db_conn.execute("PRAGMA journal_mode=WAL")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 DB opened without table-existence guard

sqlite3.connect(_pkl_db_path(...)) will create an empty file if the database does not yet exist at the given path. If the path is wrong (e.g., ITK_WRAP_CACHE points to a location different from where igenerator.py wrote) the subsequent SELECT data FROM pkl_data WHERE key=? query will fail with OperationalError: no such table: pkl_data instead of the more informative "key not found" warning. Adding CREATE TABLE IF NOT EXISTS here (mirroring _open_pkl_db) would make the read side self-contained and provide a clearer error path.

@hjmjohnson

Copy link
Copy Markdown
Member Author

Yes — this file is the core deliverable of the PR.

itk-castxml-cache.py is a transparent two-level content-addressed wrapper that replaces the bare castxml invocation for all 816 wrapping jobs. CMake/itkWrapCastXMLCacheSupport.cmake points ITK_WRAP_CASTXML_CACHE_SCRIPT at it; Wrapping/macro_files/itk_auto_load_submodules.cmake:207–215 substitutes it for castxml when ITK_WRAP_CASTXML_CACHE=ON. On a warm cache it skips all 816 CastXML subprocesses, saving ~32 min per AzDO Python CI run (−19%) and ~3h 12m on GHA ubuntu-24.04 combined with ccache (−64%).

Documentation/docs/contributing/wrapping_architecture.md has the full picture (Step 1 and the "CastXML cache" section under Caches).

@hjmjohnson hjmjohnson force-pushed the enh/igenerator-sqlite-pkl branch 6 times, most recently from f3471d1 to 35fd9b6 Compare June 30, 2026 18:55
igenerator.py now writes all pickle data for a module into a single
WAL-mode SQLite database (itk-pkl-v3.db) kept in the build tree's
ITK_PKL_DIR.  The DB is keyed by bare class.submodule strings and is
intermediate handoff state between igenerator (writer) and
pyi_generator (reader), so it stays local to one build tree and is
never shared via ITK_WRAP_CACHE.  The .index.txt manifests now hold
DB keys instead of file paths.

pyi_generator.py reads keys from the manifests, queries the DB, and
prunes any row whose key is absent from the current build's manifests
(exact keyset cleanup, replacing the previous per-tree *.pkl glob).

The schema version is embedded in the filename so a stale database
from an older schema is ignored rather than migrated.
itk_end_wrap_module.cmake passes --module_name to igenerator to
identify the manifest partition; the per-module stamp file (.stamp)
remains the CMake OUTPUT that ninja tracks.

Assisted-by: Claude Code — design, implementation, and pre-commit validation
pygccxml is now imported lazily via _load_pygccxml() called in main()
after the submodule_names_list is populated from the mdx files.  The
pygccxml_config is constructed immediately after, so the processing
loop is unchanged.

This separates the submodule-discovery phase (pure file I/O) from the
pygccxml/CastXML-dependent phase, enabling a future cache-hit early
return before pygccxml is ever loaded.
On 2-core CI runners, 816 CastXML invocations (~32 min) account for
~10% of a cold Python wrapping build.  A warm cache reduces that to
~23 sec, cutting total build time by 32 min on Azure DevOps (19%)
and 3h 12m on GHA when combined with a warm ccache (64%).

itk-castxml-cache.py wraps the CastXML binary transparently:
- L1 key: sha256 of cxx + castxml.inc + flags (no subprocess, ~0.2s)
- L2 key: sha256 of castxml -E preprocessed output with line markers
  stripped, making keys path-independent across build directories

Enable via ITK_WRAP_CASTXML_CACHE (default ON).  Set ITK_WRAP_CACHE
to override the cache root (default ~/.cache/itk-wrap).

CI wiring: Cache@2 tasks added to Linux, macOS, and Windows Azure
DevOps pipelines; arm.yml updated for GHA.  pixi tasks added for
configure-python-ci / build-python-ci / test-python-ci.
Relies on CMake default (ON) elsewhere; explicit here avoids silent
no-op if the option is ever overridden upstream in the dashboard script.
@hjmjohnson hjmjohnson force-pushed the enh/igenerator-sqlite-pkl branch from 35fd9b6 to 30c5eaa Compare June 30, 2026 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Documentation Issues affecting the Documentation module area:Python wrapping Python bindings for a class type:Enhancement Improvement of existing methods or implementation type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants