Add PersistentProgramCache (sqlite + filestream backends) #1912

Open
cpcloud wants to merge 17 commits into NVIDIA:main from cpcloud:persistent-program-cache-178

cpcloud commented Apr 14, 2026

Summary

  • Converts cuda.core.utils from a module to a package
  • Adds ProgramCacheResource ABC with dict-like interface for compiled-program caches
  • Adds make_program_cache_key(): blake2b digest incorporating schema version, cuda-core/driver/nvrtc versions, code, options, extra_sources, and use_libdevice
  • Adds SQLiteProgramCache: LRU eviction, single-process, max_size_bytes cap
  • Adds FileStreamProgramCache: os.replace atomic writes, mtime-based eviction, multi-process safe
  • ~40 unit tests + 3 multiprocess stress tests
  • API docs added to api.rst

Split design (two classes, not unified): different concurrency and eviction semantics make a single class with a mode flag misleading.

Program.compile(cache=...) integration is out of scope (tracked by #176/#179).

Test plan

  • ~40 unit tests covering ABC contract, key generation, CRUD, eviction, corruption recovery
  • 3 multiprocess tests (concurrent writers same key, distinct keys, reader vs writer)
  • CI: end-to-end with real Program compilation (requires GPU)

Closes #178

🤖 Generated with Claude Code

cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026
cpcloud added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels Apr 14, 2026
cpcloud self-assigned this Apr 14, 2026
cpcloud and others added 12 commits April 14, 2026 18:15
Move the existing re-exports into a package __init__.py so that upcoming
persistent program cache classes (issue NVIDIA#178) can live alongside without
mixing concerns in a single file. The public import surface
(cuda.core.utils.StridedMemoryView, cuda.core.utils.args_viewable_as_strided_memory)
is unchanged.

Introduce the abstract base class that concrete program caches implement,
aligned with the signature from issue NVIDIA#179 and extended for persistence
(__contains__, __delitem__, __len__, clear, close, context manager).
Implementations will follow: SQLite for single-process use and a file-
stream cache for multi-process safety.

Part of issue NVIDIA#178.
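
For readers skimming the commit list, the interface described in this commit might look roughly like the sketch below. The class names `ProgramCacheSketch` and `DictCache` are hypothetical, and which methods are abstract is an assumption; only the method set (`__contains__`, `__delitem__`, `__len__`, `clear`, `close`, context manager) comes from the commit message.

```python
from abc import ABC, abstractmethod


class ProgramCacheSketch(ABC):
    """Hypothetical stand-in for the ABC; method set taken from the commit message."""

    @abstractmethod
    def __getitem__(self, key): ...

    @abstractmethod
    def __setitem__(self, key, value): ...

    @abstractmethod
    def __contains__(self, key): ...

    @abstractmethod
    def __delitem__(self, key): ...

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def clear(self): ...

    def close(self):
        """No-op by default; backends holding files or connections override it."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


class DictCache(ProgramCacheSketch):
    """Toy in-memory backend, only here to exercise the interface."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __contains__(self, key):
        return key in self._data

    def __delitem__(self, key):
        del self._data[key]

    def __len__(self):
        return len(self._data)

    def clear(self):
        self._data.clear()
```

Used as `with DictCache() as cache: cache[b"k"] = b"v"`, the context manager calls `close()` on exit, which is where a persistent backend would release its file handles.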
Hashes compile inputs (source, code_type, options as canonical bytes,
target_type, sorted name_expressions) together with driver/nvrtc/cuda-core
version identifiers into a 32-byte blake2b digest. A schema version prefix
is included so future changes to the keying scheme can invalidate existing
caches deliberately rather than by accident.

Part of issue NVIDIA#178.
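
A minimal sketch of the keying idea, assuming a simple length-prefixed encoding: the function name `sketch_cache_key`, the `SCHEMA_VERSION` constant, the field labels, and the `versions` argument are all hypothetical and do not reflect the real `make_program_cache_key()` signature. Only the 32-byte blake2b digest, the schema prefix, and the sorted `name_expressions` come from the commit message.

```python
import hashlib

SCHEMA_VERSION = 1  # hypothetical value; bumping it invalidates old keys on purpose


def _feed(h, label, payload):
    # Length-prefix every field so adjacent fields cannot blur into each other.
    for part in (label, payload):
        h.update(len(part).to_bytes(8, "little"))
        h.update(part)


def sketch_cache_key(code, code_type, option_bytes, target_type,
                     name_expressions=(), versions=()):
    h = hashlib.blake2b(digest_size=32)
    _feed(h, b"schema", str(SCHEMA_VERSION).encode())
    for version in versions:  # e.g. driver, nvrtc, cuda-core identifiers
        _feed(h, b"version", version.encode())
    _feed(h, b"code", code.encode() if isinstance(code, str) else code)
    _feed(h, b"code_type", code_type.encode())
    _feed(h, b"options", option_bytes)
    _feed(h, b"target_type", target_type.encode())
    for name in sorted(name_expressions):  # sorted: caller order must not matter
        _feed(h, b"name", name.encode())
    return h.digest()
```

Sorting `name_expressions` before hashing means two calls that list the same names in a different order map to the same cache entry.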
Persistent program cache backed by a single sqlite3 file in WAL mode. Keys
are arbitrary bytes (str keys are UTF-8 encoded); values are ObjectCode
instances serialised with pickle. Corrupt entries are treated as cache
misses and pruned on read. A max_size_bytes cap, when supplied, triggers
LRU eviction on writes -- the tracking infrastructure is in place even
though dedicated eviction tests land in a follow-up commit.

Part of issue NVIDIA#178.
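
The size-capped LRU behaviour can be illustrated with a toy version. This is a sketch under stated assumptions, not the PR's implementation: `TinySqliteLru` and its table layout are hypothetical, values are raw bytes rather than pickled ObjectCode, and `accessed_at` is a logical counter here (the real column may well hold timestamps).

```python
import sqlite3


class TinySqliteLru:
    """Toy LRU cache over sqlite3; values are raw bytes, not pickled ObjectCode."""

    def __init__(self, path, max_size_bytes=None):
        self._db = sqlite3.connect(path)
        self._db.execute("PRAGMA journal_mode=WAL")
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "key BLOB PRIMARY KEY, value BLOB NOT NULL, "
            "size INTEGER NOT NULL, accessed_at INTEGER NOT NULL)"
        )
        self._max = max_size_bytes

    def _tick(self):
        # Logical clock: one more than the newest access recorded so far.
        row = self._db.execute(
            "SELECT COALESCE(MAX(accessed_at), 0) + 1 FROM entries"
        ).fetchone()
        return row[0]

    def _total(self):
        row = self._db.execute("SELECT COALESCE(SUM(size), 0) FROM entries").fetchone()
        return row[0]

    def __setitem__(self, key, value):
        with self._db:  # one transaction for the write plus any eviction
            self._db.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
                (key, value, len(value), self._tick()),
            )
            while self._max is not None and self._total() > self._max:
                # Drop the least recently used entry until the cap is met.
                self._db.execute(
                    "DELETE FROM entries WHERE key = "
                    "(SELECT key FROM entries ORDER BY accessed_at LIMIT 1)"
                )

    def __getitem__(self, key):
        row = self._db.execute(
            "SELECT value FROM entries WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        with self._db:  # a read refreshes the entry's position in LRU order
            self._db.execute(
                "UPDATE entries SET accessed_at = ? WHERE key = ?",
                (self._tick(), key),
            )
        return row[0]

    def close(self):
        self._db.close()
```

Because reads bump `accessed_at`, an entry that was just read survives when a later write pushes the cache over its cap, which is the property the follow-up eviction tests check.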
Exercises the size-cap eviction path, confirms reads update accessed_at
(so a recently read entry is preserved when a newer write forces eviction),
and asserts that omitting the cap keeps the cache unbounded.

Part of issue NVIDIA#178.
Persistent program cache backed by a directory of entry files, one per
key hash. Writes stage into a tmp/ subdirectory and promote via os.replace
so concurrent readers never observe a torn file. Corrupt entries are
treated as cache misses and pruned. The max_size_bytes cap is enforced
opportunistically on writes by oldest mtime; this is deliberately
best-effort for multi-process use. Use SQLiteProgramCache for strict
LRU semantics within a single process.

Part of issue NVIDIA#178.
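
The stage-then-promote write path could be sketched as follows. `TinyFileCache`, its `put`/`get` methods, and the file-naming scheme are hypothetical; the `tmp/` staging directory, one file per key hash, and promotion via `os.replace` are taken from the commit message.

```python
import contextlib
import hashlib
import os
import tempfile
from pathlib import Path


class TinyFileCache:
    """Toy directory-backed cache: one file per key hash, atomic promotion."""

    def __init__(self, root):
        self._root = Path(root)
        self._tmp = self._root / "tmp"
        self._tmp.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hypothetical naming scheme: hex digest of the key as the file name.
        return self._root / hashlib.blake2b(key, digest_size=16).hexdigest()

    def put(self, key, payload):
        # Stage the full payload first so readers never see a partial file.
        fd, staged = tempfile.mkstemp(dir=self._tmp)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())
            # Same filesystem, so the rename swaps old for new in one step.
            os.replace(staged, self._path(key))
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(staged)
            raise

    def get(self, key):
        try:
            return self._path(key).read_bytes()
        except FileNotFoundError:
            raise KeyError(key) from None
```

Staging inside the cache root keeps the temporary file on the same filesystem as its destination, which is what lets `os.replace` act as a rename rather than a copy.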
Exercises the opportunistic mtime-based eviction sweep that runs on each
write and confirms that omitting the cap keeps the cache unbounded.

Part of issue NVIDIA#178.
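
A rough sketch of what an opportunistic mtime-based sweep might look like. The function `evict_oldest` and the `demo` helper are hypothetical; the only details taken from the commits are that eviction picks the oldest mtime first and is best-effort when several processes share the directory.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def evict_oldest(root, max_size_bytes):
    """Delete oldest-mtime files under root until the total size fits the cap."""
    entries = []
    for path in Path(root).iterdir():
        if not path.is_file():
            continue  # skips the tmp/ staging directory and vanished files
        try:
            st = path.stat()
        except FileNotFoundError:
            continue  # another process removed it between listing and stat
        entries.append((st.st_mtime, st.st_size, path))

    total = sum(size for _, size, _ in entries)
    removed = []
    for _, size, path in sorted(entries, key=lambda entry: entry[0]):
        if total <= max_size_bytes:
            break
        with contextlib.suppress(FileNotFoundError):
            path.unlink()  # losing a race with another evictor is fine
        total -= size
        removed.append(path.name)
    return removed


def demo():
    with tempfile.TemporaryDirectory() as root:
        for age, name in enumerate(["old", "mid", "new"]):
            path = Path(root) / name
            path.write_bytes(b"x" * 4)
            os.utime(path, (1000 + age, 1000 + age))  # explicit, distinct mtimes
        removed = evict_oldest(root, max_size_bytes=8)
        kept = sorted(p.name for p in Path(root).iterdir())
    return removed, kept
```

The sweep tolerates files disappearing underneath it, so two processes evicting at once may each do some of the work, but neither fails.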
Spawns multiple processes to hammer the cache: writers on a shared key
prove last-write-wins without corruption, writers on distinct keys prove
nothing is lost under contention, and a reader racing against a writer
confirms torn files are never observed because os.replace is atomic.

Part of issue NVIDIA#178.
Add the new cache ABC, the two concrete backends, and the key helper to
the cuda.core.utils API reference.

Part of issue NVIDIA#178.
…ytes

cuda.core's ProgramOptions.as_bytes learned a target_type parameter so
that NVVM compilations can inject -gen-lto for ltoir targets. Forward
target_type from make_program_cache_key so the canonical option bytes
reflect this when available, falling back to the one-argument signature
on older builds.
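
One way to express the fallback is to inspect the method signature before calling it. This is only a sketch: `canonical_option_bytes`, `OldOptions`, and `NewOptions` are hypothetical stand-ins, the flag values are made up, and the real `ProgramOptions.as_bytes` may not be introspectable with `inspect.signature` if it is compiled, in which case a different detection strategy would be needed.

```python
import inspect


def canonical_option_bytes(options, backend, target_type):
    """Pass target_type only when this build's as_bytes accepts it."""
    params = inspect.signature(options.as_bytes).parameters
    if "target_type" in params:
        return options.as_bytes(backend, target_type)
    return options.as_bytes(backend)


class OldOptions:
    """Stand-in for an older build: as_bytes takes the backend only."""

    def as_bytes(self, backend):
        return [b"-opt=3"]


class NewOptions:
    """Stand-in for a newer build: as_bytes also takes target_type."""

    def as_bytes(self, backend, target_type=None):
        flags = [b"-opt=3"]
        if backend == "nvvm" and target_type == "ltoir":
            flags.append(b"-gen-lto")
        return flags
```

With the newer stand-in, an `ltoir` target yields different option bytes than a `ptx` target, so the two compilations hash to different cache keys.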

Part of issue NVIDIA#178.
make_program_cache_key() only hashed options.as_bytes() which omits
NVVM-specific fields (extra_sources, use_libdevice). Two NVVM
compilations with different extra_sources could produce the same
cache key, returning wrong cached object code.

Also adds docstring warnings about pickle trust model (cache dirs
should be treated as trusted build artifacts) and path-backed
ObjectCode instability (normalize to bytes before caching).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
extra_sources is a sequence of (name, source) tuples but the prior
fix hashed each item as a flat str/bytes, never reaching the actual
source content. Now unpacks both name and source from each tuple.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
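
The fix described in the last two commits amounts to hashing both halves of each `(name, source)` tuple. A minimal sketch, assuming a length-prefixed encoding; `nvvm_fields_digest` is a hypothetical helper, not the function in the PR:

```python
import hashlib


def _update(h, data):
    if isinstance(data, str):
        data = data.encode()
    # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
    h.update(len(data).to_bytes(8, "little"))
    h.update(data)


def nvvm_fields_digest(extra_sources, use_libdevice):
    h = hashlib.blake2b(digest_size=32)
    for name, source in extra_sources or ():
        _update(h, name)    # the module name
        _update(h, source)  # the actual source content
    h.update(b"\x01" if use_libdevice else b"\x00")
    return h.digest()
```

Two compilations that differ only in the body of an extra source, or only in `use_libdevice`, now produce different digests.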
cpcloud force-pushed the persistent-program-cache-178 branch from de57bd8 to ac38a68 Compare April 14, 2026 22:15
cpcloud and others added 5 commits April 14, 2026 18:39
sqlite3 was imported at module level, which fails on CI containers
that don't have libsqlite3.so.0 installed (e.g., CUDA 12.9.1 test
images). Move the import into SQLiteProgramCache.__init__ so the
module is loadable even without sqlite3 — only constructing a
SQLiteProgramCache requires it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
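
The deferred import pattern looks roughly like this. `LazySqliteCache` and `sqlite3_available` are hypothetical names for illustration; a helper like the latter is one way a test suite could decide whether to skip sqlite-backed tests.

```python
def sqlite3_available():
    """Return True when the sqlite3 extension module can be imported here."""
    try:
        import sqlite3  # noqa: F401
    except ImportError:
        return False
    return True


class LazySqliteCache:
    """Toy class: importing this module never touches sqlite3."""

    def __init__(self, path=":memory:"):
        # Deferred import: only constructing the cache needs libsqlite3.
        import sqlite3

        self._db = sqlite3.connect(path)

    def close(self):
        self._db.close()
```

On a container without libsqlite3, the module still imports and the other cache classes stay usable; only `LazySqliteCache()` raises ImportError.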
- SPDX: LicenseRef-NVIDIA-SOFTWARE-LICENSE -> Apache-2.0 in tests
- B027: noqa for intentional no-op close()
- S301: noqa for pickle.loads (fundamental to cache design)
- B904: add 'from None' to re-raised KeyErrors
- SIM105: replace try-except-pass with contextlib.suppress
- ruff format applied

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
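
For context, the B904 and SIM105 fixes correspond to two small idioms. The functions `lookup` and `remove_quietly` below are hypothetical examples, not code from the PR:

```python
import contextlib
import os


def lookup(table, key):
    try:
        return table[key.lower()]
    except KeyError:
        # B904: "from None" hides the internal lookup from the traceback,
        # so callers see a KeyError for the key they actually passed in.
        raise KeyError(key) from None


def remove_quietly(path):
    # SIM105: contextlib.suppress replaces a try/except/pass block.
    with contextlib.suppress(FileNotFoundError):
        os.unlink(path)
```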
- Replace private Cython attrs (_module/_name/_code_type) with public
  properties (code/name/code_type) — cdef attrs not accessible from Python
- Add @needs_sqlite3 skip decorator for sqlite tests when libsqlite3
  is missing from CI containers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.replace raises PermissionError when another process
has the target file open. Add exponential backoff retry (5 attempts)
for the concurrent-writers case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.replace raises PermissionError when another process
holds the target open. Instead of retrying, silently drop the write —
correct semantics for a cache (a missed write isn't corruption).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
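
The drop-on-contention behaviour can be sketched like this. `promote_or_drop` is a hypothetical helper, and its injectable `replace` argument exists only so the Windows failure can be simulated on any platform.

```python
import contextlib
import os
import tempfile


def promote_or_drop(staged, final, replace=os.replace):
    """Promote staged to final; on PermissionError, discard the write instead."""
    try:
        replace(staged, final)
    except PermissionError:
        # Another process holds the target open. The existing entry is still
        # valid, so skip this write and clean up the staged file.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(staged)
        return False
    return True


def demo():
    def busy_replace(src, dst):
        raise PermissionError("simulated: target is open in another process")

    with tempfile.TemporaryDirectory() as root:
        final = os.path.join(root, "entry")
        staged = os.path.join(root, "staged")
        with open(final, "wb") as f:
            f.write(b"old")
        with open(staged, "wb") as f:
            f.write(b"new")
        dropped = promote_or_drop(staged, final, replace=busy_replace)
        with open(final, "rb") as f:
            survived = f.read()
        staged_gone = not os.path.exists(staged)

        with open(staged, "wb") as f:
            f.write(b"new")
        promoted = promote_or_drop(staged, final)
        with open(final, "rb") as f:
            updated = f.read()
    return dropped, survived, staged_gone, promoted, updated
```

A dropped write costs at most one recompilation later, whereas retrying with backoff adds latency to every contended write.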