Add PersistentProgramCache (sqlite + filestream backends) #1912
Open
cpcloud wants to merge 17 commits into NVIDIA:main from
Conversation
Move the existing re-exports into a package __init__.py so that upcoming persistent program cache classes (issue NVIDIA#178) can live alongside without mixing concerns in a single file. The public import surface (cuda.core.utils.StridedMemoryView, cuda.core.utils.args_viewable_as_strided_memory) is unchanged.
Introduce the abstract base class that concrete program caches implement, aligned with the signature from issue NVIDIA#179 and extended for persistence (__contains__, __delitem__, __len__, clear, close, context manager). Implementations will follow: SQLite for single-process use and a file-stream cache for multi-process safety. Part of issue NVIDIA#178.
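The interface described above can be sketched roughly as follows. This is an illustration only, not the PR's actual code; the class name `ProgramCacheResource` comes from the PR summary, but the method bodies and defaults here are assumptions.

```python
from abc import ABC, abstractmethod


class ProgramCacheResource(ABC):
    """Sketch of a dict-like cache ABC with the methods listed above."""

    @abstractmethod
    def __getitem__(self, key): ...

    @abstractmethod
    def __setitem__(self, key, value): ...

    @abstractmethod
    def __delitem__(self, key): ...

    @abstractmethod
    def __contains__(self, key): ...

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def clear(self): ...

    def close(self):
        """No-op by default; persistent backends release handles here."""

    # Context-manager support delegates to close().
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False
```

A backend then only needs to fill in the abstract dict-style methods; `with SomeCache(...) as cache:` works for free via the `close()` delegation.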
Hashes compile inputs (source, code_type, options as canonical bytes, target_type, sorted name_expressions) together with driver/nvrtc/cuda-core version identifiers into a 32-byte blake2b digest. A schema version prefix is included so future changes to the keying scheme can invalidate existing caches deliberately rather than by accident. Part of issue NVIDIA#178.
Persistent program cache backed by a single sqlite3 file in WAL mode. Keys are arbitrary bytes (str keys are UTF-8 encoded); values are ObjectCode instances serialised with pickle. Corrupt entries are treated as cache misses and pruned on read. A max_size_bytes cap, when supplied, triggers LRU eviction on writes -- the tracking infrastructure is in place even though dedicated eviction tests land in a follow-up commit. Part of issue NVIDIA#178.
Exercises the size-cap eviction path, confirms reads update accessed_at (so a recently read entry is preserved when a newer write forces eviction), and asserts that omitting the cap keeps the cache unbounded. Part of issue NVIDIA#178.
Persistent program cache backed by a directory of entry files, one per key hash. Writes stage into a tmp/ subdirectory and promote via os.replace so concurrent readers never observe a torn file. Corrupt entries are treated as cache misses and pruned. The max_size_bytes cap is enforced opportunistically on writes by oldest mtime; this is deliberately best-effort for multi-process use. Use SQLiteProgramCache for strict LRU semantics within a single process. Part of issue NVIDIA#178.
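The stage-then-promote pattern described above can be illustrated like this. Function names and the on-disk layout are assumptions; only the `tmp/` staging directory and the `os.replace` promotion come from the description.

```python
import contextlib
import os
import pickle
import tempfile
from pathlib import Path


def atomic_write_entry(cache_dir, key_hash, value):
    """Stage into tmp/ and promote atomically: concurrent readers see
    the old entry or the new one, never a partially written file."""
    cache_dir = Path(cache_dir)
    tmp_dir = cache_dir / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(pickle.dumps(value))
    os.replace(tmp_path, cache_dir / key_hash)  # atomic within one filesystem


def read_entry(cache_dir, key_hash):
    path = Path(cache_dir) / key_hash
    try:
        return pickle.loads(path.read_bytes())
    except FileNotFoundError:
        raise KeyError(key_hash) from None
    except Exception:
        # Corrupt entry: prune it and report a miss.
        with contextlib.suppress(OSError):
            path.unlink()
        raise KeyError(key_hash) from None
```

Staging inside the cache directory (rather than the system temp dir) matters: `os.replace` is only atomic when source and target live on the same filesystem.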
Exercises the opportunistic mtime-based eviction sweep that runs on each write and confirms that omitting the cap keeps the cache unbounded. Part of issue NVIDIA#178.
Spawns multiple processes to hammer the cache: writers on a shared key prove last-write-wins without corruption, writers on distinct keys prove nothing is lost under contention, and a reader racing against a writer confirms torn files are never observed because os.replace is atomic. Part of issue NVIDIA#178.
Add the new cache ABC, the two concrete backends, and the key helper to the cuda.core.utils API reference. Part of issue NVIDIA#178.
cuda.core's ProgramOptions.as_bytes learned a target_type parameter so that NVVM compilations can inject -gen-lto for ltoir targets. Forward target_type from make_program_cache_key so the canonical option bytes reflect this when available, falling back to the one-argument signature on older builds. Part of issue NVIDIA#178.
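One way to sketch that fallback (illustrative only; the helper name and the dummy option classes below are not from the PR):

```python
def options_as_bytes(options, target_type):
    """Pass target_type to as_bytes when the build supports it, else fall back."""
    try:
        return options.as_bytes(target_type)
    except TypeError:
        # Older builds: as_bytes() takes no target_type. (A real implementation
        # might inspect the signature instead, to avoid masking unrelated
        # TypeErrors raised inside as_bytes.)
        return options.as_bytes()
```

The try/except keeps one code path working against both old and new cuda.core builds without a version check.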
make_program_cache_key() only hashed options.as_bytes() which omits NVVM-specific fields (extra_sources, use_libdevice). Two NVVM compilations with different extra_sources could produce the same cache key, returning wrong cached object code. Also adds docstring warnings about pickle trust model (cache dirs should be treated as trusted build artifacts) and path-backed ObjectCode instability (normalize to bytes before caching). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
extra_sources is a sequence of (name, source) tuples but the prior fix hashed each item as a flat str/bytes, never reaching the actual source content. Now unpacks both name and source from each tuple. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
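The corrected hashing can be sketched as follows, assuming a running blake2b hasher `h` and the length-prefix framing used for other fields (the helper name is illustrative):

```python
import hashlib


def hash_extra_sources(h, extra_sources):
    """extra_sources is a sequence of (name, source) pairs; hash both halves.

    The bug described above hashed each pair as a flat str/bytes, so the
    source text never reached the digest. Unpacking the tuple fixes that.
    """
    for name, source in extra_sources or ():
        for part in (name, source):
            data = part if isinstance(part, bytes) else str(part).encode()
            h.update(len(data).to_bytes(8, "little"))  # length-prefix each field
            h.update(data)
```

With this in place, two NVVM compilations that differ only in an extra source body produce distinct cache keys.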
sqlite3 was imported at module level, which fails on CI containers that don't have libsqlite3.so.0 installed (e.g., CUDA 12.9.1 test images). Move the import into SQLiteProgramCache.__init__ so the module is loadable even without sqlite3 — only constructing a SQLiteProgramCache requires it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
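The deferred-import pattern looks like this (a sketch; the real class does more in its constructor):

```python
class SQLiteProgramCacheSketch:
    """Defer the sqlite3 import so the enclosing module stays importable
    even when libsqlite3.so.0 is absent from the container."""

    def __init__(self, path):
        import sqlite3  # ImportError surfaces here, not at module import time

        self._db = sqlite3.connect(path)

    def close(self):
        self._db.close()
```

Users who never construct the SQLite backend (e.g. those using the filestream cache) are unaffected by a missing libsqlite3.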
- SPDX: LicenseRef-NVIDIA-SOFTWARE-LICENSE -> Apache-2.0 in tests
- B027: noqa for intentional no-op close()
- S301: noqa for pickle.loads (fundamental to cache design)
- B904: add 'from None' to re-raised KeyErrors
- SIM105: replace try-except-pass with contextlib.suppress
- ruff format applied

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace private Cython attrs (_module/_name/_code_type) with public properties (code/name/code_type) — cdef attrs not accessible from Python
- Add @needs_sqlite3 skip decorator for sqlite tests when libsqlite3 is missing from CI containers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.replace raises PermissionError when another process has the target file open. Add exponential backoff retry (5 attempts) for the concurrent-writers case. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On Windows, os.replace raises PermissionError when another process holds the target open. Instead of retrying, silently drop the write — correct semantics for a cache (a missed write isn't corruption). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
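That policy reduces to a few lines; this sketch uses an illustrative helper name:

```python
import contextlib
import os


def promote_entry(tmp_path, final_path):
    """Publish a staged cache entry. On Windows, os.replace raises
    PermissionError when another process holds final_path open; silently
    dropping the write is safe for a cache (a miss, not corruption)."""
    with contextlib.suppress(PermissionError):
        os.replace(tmp_path, final_path)
```

Compared with the earlier retry loop, this avoids blocking a compile on backoff sleeps for an outcome that is acceptable anyway.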
Summary
- Convert `cuda.core.utils` from a module to a package
- `ProgramCacheResource`: ABC with dict-like interface for compiled-program caches
- `make_program_cache_key()`: blake2b digest incorporating schema version, cuda-core/driver/nvrtc versions, code, options, extra_sources, and use_libdevice
- `SQLiteProgramCache`: LRU eviction, single-process, `max_size_bytes` cap
- `FileStreamProgramCache`: `os.replace` atomic writes, mtime-based eviction, multi-process safe
- Document the new API in `api.rst`
- Split design (two classes, not unified): different concurrency and eviction semantics make a single class with a mode flag misleading.
`Program.compile(cache=...)` integration is out of scope (tracked by #176/#179).

Test plan
- `Program` compilation (requires GPU)

Closes #178
🤖 Generated with Claude Code