Add PersistentProgramCache (sqlite + filestream backends) #1912
Open
cpcloud wants to merge 17 commits into NVIDIA:main from
Conversation
Move the existing re-exports into a package __init__.py so that upcoming persistent program cache classes (issue NVIDIA#178) can live alongside without mixing concerns in a single file. The public import surface (cuda.core.utils.StridedMemoryView, cuda.core.utils.args_viewable_as_strided_memory) is unchanged.
Introduce the abstract base class that concrete program caches implement, aligned with the signature from issue NVIDIA#179 and extended for persistence (__contains__, __delitem__, __len__, clear, close, context manager). Implementations will follow: SQLite for single-process use and a file-stream cache for multi-process safety. Part of issue NVIDIA#178.
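The interface described above can be sketched roughly as follows. This is an illustration only, not the PR's actual code; the class name `ProgramCacheResource` comes from the PR summary, but the method bodies and defaults here are assumptions.

```python
from abc import ABC, abstractmethod


class ProgramCacheResource(ABC):
    """Sketch of a dict-like cache ABC with the methods listed above."""

    @abstractmethod
    def __getitem__(self, key): ...

    @abstractmethod
    def __setitem__(self, key, value): ...

    @abstractmethod
    def __delitem__(self, key): ...

    @abstractmethod
    def __contains__(self, key): ...

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def clear(self): ...

    def close(self):
        """No-op by default; persistent backends release handles here."""

    # Context-manager support delegates to close().
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False
```

A backend then only needs to fill in the abstract dict-style methods; `with SomeCache(...) as cache:` works for free via the `close()` delegation.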
Hashes compile inputs (source, code_type, options as canonical bytes, target_type, sorted name_expressions) together with driver/nvrtc/cuda-core version identifiers into a 32-byte blake2b digest. A schema version prefix is included so future changes to the keying scheme can invalidate existing caches deliberately rather than by accident. Part of issue NVIDIA#178.
Persistent program cache backed by a single sqlite3 file in WAL mode. Keys are arbitrary bytes (str keys are UTF-8 encoded); values are ObjectCode instances serialised with pickle. Corrupt entries are treated as cache misses and pruned on read. A max_size_bytes cap, when supplied, triggers LRU eviction on writes -- the tracking infrastructure is in place even though dedicated eviction tests land in a follow-up commit. Part of issue NVIDIA#178.
Exercises the size-cap eviction path, confirms reads update accessed_at (so a recently read entry is preserved when a newer write forces eviction), and asserts that omitting the cap keeps the cache unbounded. Part of issue NVIDIA#178.
Persistent program cache backed by a directory of entry files, one per key hash. Writes stage into a tmp/ subdirectory and promote via os.replace so concurrent readers never observe a torn file. Corrupt entries are treated as cache misses and pruned. The max_size_bytes cap is enforced opportunistically on writes by oldest mtime; this is deliberately best-effort for multi-process use. Use SQLiteProgramCache for strict LRU semantics within a single process. Part of issue NVIDIA#178.
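The stage-then-promote pattern described above can be illustrated like this. Function names and the on-disk layout are assumptions; only the `tmp/` staging directory and the `os.replace` promotion come from the description.

```python
import contextlib
import os
import pickle
import tempfile
from pathlib import Path


def atomic_write_entry(cache_dir, key_hash, value):
    """Stage into tmp/ and promote atomically: concurrent readers see
    the old entry or the new one, never a partially written file."""
    cache_dir = Path(cache_dir)
    tmp_dir = cache_dir / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(pickle.dumps(value))
    os.replace(tmp_path, cache_dir / key_hash)  # atomic within one filesystem


def read_entry(cache_dir, key_hash):
    path = Path(cache_dir) / key_hash
    try:
        return pickle.loads(path.read_bytes())
    except FileNotFoundError:
        raise KeyError(key_hash) from None
    except Exception:
        # Corrupt entry: prune it and report a miss.
        with contextlib.suppress(OSError):
            path.unlink()
        raise KeyError(key_hash) from None
```

Staging inside the cache directory (rather than the system temp dir) matters: `os.replace` is only atomic when source and target live on the same filesystem.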
Exercises the opportunistic mtime-based eviction sweep that runs on each write and confirms that omitting the cap keeps the cache unbounded. Part of issue NVIDIA#178.
Spawns multiple processes to hammer the cache: writers on a shared key prove last-write-wins without corruption, writers on distinct keys prove nothing is lost under contention, and a reader racing against a writer confirms torn files are never observed because os.replace is atomic. Part of issue NVIDIA#178.
Add the new cache ABC, the two concrete backends, and the key helper to the cuda.core.utils API reference. Part of issue NVIDIA#178.
cuda.core's ProgramOptions.as_bytes learned a target_type parameter so that NVVM compilations can inject -gen-lto for ltoir targets. Forward target_type from make_program_cache_key so the canonical option bytes reflect this when available, falling back to the one-argument signature on older builds. Part of issue NVIDIA#178.
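One way to sketch that fallback (illustrative only; the helper name and the dummy option classes below are not from the PR):

```python
def options_as_bytes(options, target_type):
    """Pass target_type to as_bytes when the build supports it, else fall back."""
    try:
        return options.as_bytes(target_type)
    except TypeError:
        # Older builds: as_bytes() takes no target_type. (A real implementation
        # might inspect the signature instead, to avoid masking unrelated
        # TypeErrors raised inside as_bytes.)
        return options.as_bytes()
```

The try/except keeps one code path working against both old and new cuda.core builds without a version check.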
make_program_cache_key() only hashed options.as_bytes() which omits NVVM-specific fields (extra_sources, use_libdevice). Two NVVM compilations with different extra_sources could produce the same cache key, returning wrong cached object code. Also adds docstring warnings about pickle trust model (cache dirs should be treated as trusted build artifacts) and path-backed ObjectCode instability (normalize to bytes before caching). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
extra_sources is a sequence of (name, source) tuples but the prior fix hashed each item as a flat str/bytes, never reaching the actual source content. Now unpacks both name and source from each tuple. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
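The corrected hashing can be sketched as follows, assuming a running blake2b hasher `h` and the length-prefix framing used for other fields (the helper name is illustrative):

```python
import hashlib


def hash_extra_sources(h, extra_sources):
    """extra_sources is a sequence of (name, source) pairs; hash both halves.

    The bug described above hashed each pair as a flat str/bytes, so the
    source text never reached the digest. Unpacking the tuple fixes that.
    """
    for name, source in extra_sources or ():
        for part in (name, source):
            data = part if isinstance(part, bytes) else str(part).encode()
            h.update(len(data).to_bytes(8, "little"))  # length-prefix each field
            h.update(data)
```

With this in place, two NVVM compilations that differ only in an extra source body produce distinct cache keys.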
sqlite3 was imported at module level, which fails on CI containers that don't have libsqlite3.so.0 installed (e.g., CUDA 12.9.1 test images). Move the import into SQLiteProgramCache.__init__ so the module is loadable even without sqlite3 — only constructing a SQLiteProgramCache requires it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
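The deferred-import pattern looks like this (a sketch; the real class does more in its constructor):

```python
class SQLiteProgramCacheSketch:
    """Defer the sqlite3 import so the enclosing module stays importable
    even when libsqlite3.so.0 is absent from the container."""

    def __init__(self, path):
        import sqlite3  # ImportError surfaces here, not at module import time

        self._db = sqlite3.connect(path)

    def close(self):
        self._db.close()
```

Users who never construct the SQLite backend (e.g. those using the filestream cache) are unaffected by a missing libsqlite3.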
- SPDX: LicenseRef-NVIDIA-SOFTWARE-LICENSE -> Apache-2.0 in tests
- B027: noqa for intentional no-op close()
- S301: noqa for pickle.loads (fundamental to cache design)
- B904: add 'from None' to re-raised KeyErrors
- SIM105: replace try-except-pass with contextlib.suppress
- ruff format applied

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace private Cython attrs (_module/_name/_code_type) with public properties (code/name/code_type) — cdef attrs not accessible from Python
- Add @needs_sqlite3 skip decorator for sqlite tests when libsqlite3 is missing from CI containers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.replace raises PermissionError when another process has the target file open. Add exponential backoff retry (5 attempts) for the concurrent-writers case. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.replace raises PermissionError when another process holds the target open. Instead of retrying, silently drop the write — correct semantics for a cache (a missed write isn't corruption). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
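That policy reduces to a few lines; this sketch uses an illustrative helper name:

```python
import contextlib
import os


def promote_entry(tmp_path, final_path):
    """Publish a staged cache entry. On Windows, os.replace raises
    PermissionError when another process holds final_path open; silently
    dropping the write is safe for a cache (a miss, not corruption)."""
    with contextlib.suppress(PermissionError):
        os.replace(tmp_path, final_path)
```

Compared with the earlier retry loop, this avoids blocking a compile on backoff sleeps for an outcome that is acceptable anyway.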
Summary
- Convert `cuda.core.utils` from a module to a package
- `ProgramCacheResource`: ABC with dict-like interface for compiled-program caches
- `make_program_cache_key()`: blake2b digest incorporating schema version, cuda-core/driver/nvrtc versions, code, options, extra_sources, and use_libdevice
- `SQLiteProgramCache`: LRU eviction, single-process, `max_size_bytes` cap
- `FileStreamProgramCache`: `os.replace` atomic writes, mtime-based eviction, multi-process safe
- Document the new API in `api.rst`
- Split design (two classes, not unified): different concurrency and eviction semantics make a single class with a mode flag misleading.
`Program.compile(cache=...)` integration is out of scope (tracked by #176/#179).

Test plan
- `Program` compilation (requires GPU)

Closes #178
🤖 Generated with Claude Code