(feat) rvs: implement SOT artifact caching pipeline and HTTP cache server #1813
Open
maxo-nv wants to merge 1 commit into
This patch adds the artifact pre-caching pipeline described in NVIDIA#416.
Before each validation cycle, RVS now fetches the SOT JSON, resolves
artifact URLs (direct URIs and JSONPath-based `sotpath` expressions),
and downloads them concurrently into a local cache directory. An HTTP
file server then serves the cache to nodes during validation. The
server starts once before the main loop so it stays alive across
cycles; new files written by each download pass become visible
immediately without a restart.

Downloads are bounded by a configurable semaphore, respect a per-file
timeout, and verify integrity against the SHA-256 checksum advertised
by Artifactory in the `x-checksum-sha256` response header. Files
already on disk are skipped on subsequent cycles.

Artifact URL resolution lives in a new `scenario/resolver.rs` module
and is pure (no I/O), making it straightforward to unit test. JSONPath
evaluation handles `sotpath` expressions like
`$.BoardSKUs[?@.Name == '...'].Components.Software[?@.Component == '...'].Locations[?@.Name == '...'].Location`.

The SOT is fetched from NICC via `list_rack_firmware` and matched by
the `Name` field against the scenario's `sot_release`. A file-based
override on `RvsCtx` keeps the full pipeline exercisable without a
live NICC connection. Multi-SOT support (scenarios targeting different
releases in the same cycle) is left as a follow-up TODO.

The crate is restructured to have a lib target so the artifact,
scenario, and context types are shareable across binaries. A new
`test-artifact-cache` binary wires up the complete pipeline against a
local SOT file and serves the resulting cache, which is useful for
manual verification and showing colleagues how the pieces fit together.

Testing
-------

The `test-artifact-cache` binary was run against a real SOT JSON file
and a hand-crafted scenario TOML targeting release 1.2.2. The scenario
exercises three artifact kinds: an OS image placeholder, a direct-URI
artifact, and a `sotpath`-resolved artifact. A fourth large artifact
(~1.9 GB NVOS binary) was included to validate concurrent streaming
and checksum verification.

The SOT and scenario files are not committed (kept alongside the
upstream SOT JSON used for development). To reproduce, supply any SOT
JSON and a matching scenario TOML:

    target/debug/test-artifact-cache \
      --sot <path/to/sot.json> \
      --scenario <path/to/scenario.toml> \
      --cache-dir /tmp/rvs-test-cache \
      > /tmp/rvs-test.log 2>&1 &
    tail -f /tmp/rvs-test.log

After downloads completed the cache directory contained:

    total 3808848
    -rw-r--r-- 1 user root  31K nmx-m-nmx-c.proto
    -rw-r--r-- 1 user root 1.8G nvos.bin
    -rw-r--r-- 1 user root 1.7M nvos_openapi.json
    -rw-r--r-- 1 user root 1.7M os

All checksums passed. The server correctly served all files from
`http://localhost:8080/gb200nvl/1.2.2/<filename>`.

Signed-off-by: Max Olender <molender@nvidia.com>
Description
This patch adds the artifact pre-caching pipeline described in #416.
Before each validation cycle, RVS now fetches the SOT JSON, resolves
artifact URLs (direct URIs and JSONPath-based `sotpath` expressions), and
downloads them concurrently into a local cache directory. An HTTP
file server then serves the cache to nodes during validation. The
server starts once before the main loop so it stays alive across
cycles; new files written by each download pass become visible
immediately without a restart.
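
For reviewers who want to see why no restart is needed, here is a minimal sketch of the "start once, read from disk per request" idea using only the standard library. It is not the server in this PR: `spawn_cache_server`, `handle`, and `http_get` are hypothetical names, requests are handled one at a time, and each file is read fully into memory, which a real server for multi-gigabyte artifacts would replace with streaming.

```rust
use std::fs;
use std::io::{BufRead, BufReader, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::path::{Component, Path, PathBuf};
use std::thread;

/// Serve `cache_dir` over HTTP from a background thread and return the bound
/// address. Files are read from disk per request, so anything a later
/// download pass writes is visible without restarting the server.
fn spawn_cache_server(cache_dir: PathBuf) -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: OS picks a free port
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        for stream in listener.incoming() {
            if let Ok(stream) = stream {
                let _ = handle(stream, &cache_dir); // one request at a time
            }
        }
    });
    Ok(addr)
}

fn handle(mut stream: TcpStream, root: &Path) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?; // e.g. "GET /gb200nvl/1.2.2/os HTTP/1.1"
    let mut header = String::new();
    while reader.read_line(&mut header)? > 2 {
        header.clear(); // skip headers up to the blank line
    }
    let target = request_line.split_whitespace().nth(1).unwrap_or("/");
    let rel = Path::new(target.trim_start_matches('/'));
    // Only plain relative components are allowed: no `..`, no absolute paths.
    let safe = rel.components().all(|c| matches!(c, Component::Normal(_)));
    let body = if safe { fs::read(root.join(rel)).ok() } else { None };
    match body {
        Some(bytes) => {
            write!(
                stream,
                "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
                bytes.len()
            )?;
            stream.write_all(&bytes)
        }
        None => stream.write_all(
            b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n",
        ),
    }
}

/// Tiny client used to exercise the sketch; returns the raw response text.
fn http_get(addr: SocketAddr, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    write!(stream, "GET {} HTTP/1.0\r\n\r\n", path)?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling `http_get` for a path before and after writing the file into the cache directory should go from a 404 to a 200 against the same running server.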
Downloads are bounded by a configurable semaphore, respect a per-file
timeout, and verify integrity against the SHA-256 checksum advertised
by Artifactory in the `x-checksum-sha256` response header. Files
already on disk are skipped on subsequent cycles.
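
The bounded-concurrency and skip-if-cached behaviour can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: `Semaphore`, `Permit`, and `download_all` are hypothetical names, threads stand in for whatever runtime the PR uses, and the caller-supplied `fetch` closure replaces the real HTTP download. The per-file timeout and SHA-256 verification are deliberately left out.

```rust
use std::path::{Path, PathBuf};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore; the standard library does not ship one.
struct Semaphore {
    permits: Mutex<usize>,
    freed: Condvar,
}

/// RAII permit: returns its slot on drop, even if the download panics.
struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), freed: Condvar::new() }
    }

    fn acquire(&self) -> Permit<'_> {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.freed.wait(free).unwrap();
        }
        *free -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.freed.notify_one();
    }
}

/// Run `fetch(url, dest)` for every job whose destination is missing, with at
/// most `max_parallel` fetches in flight. Returns how many fetches succeeded.
fn download_all<F>(jobs: Vec<(String, PathBuf)>, max_parallel: usize, fetch: F) -> usize
where
    F: Fn(&str, &Path) -> std::io::Result<()> + Send + Sync + 'static,
{
    // Clamp to 1 so a zero limit cannot block every worker forever.
    let sem = Arc::new(Semaphore::new(max_parallel.max(1)));
    let fetch = Arc::new(fetch);
    let workers: Vec<_> = jobs
        .into_iter()
        .filter(|(_, dest)| !dest.exists()) // already cached on an earlier cycle
        .map(|(url, dest)| {
            let (sem, fetch) = (Arc::clone(&sem), Arc::clone(&fetch));
            thread::spawn(move || {
                let _permit = sem.acquire(); // blocks while the limit is reached
                (*fetch)(url.as_str(), dest.as_path()).is_ok()
            })
        })
        .collect();
    workers
        .into_iter()
        .filter_map(|worker| worker.join().ok())
        .filter(|ok| *ok)
        .count()
}
```

Holding the permit in an RAII guard means a failed or panicking fetch still frees its slot, so one bad artifact cannot stall the rest of the pass.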
Artifact URL resolution lives in a new `scenario/resolver.rs` module
and is pure (no I/O), making it straightforward to unit test. JSONPath
evaluation handles `sotpath` expressions like
`$.BoardSKUs[?@.Name == '...'].Components.Software[?@.Component == '...'].Locations[?@.Name == '...'].Location`.
The SOT is fetched from NICC via `list_rack_firmware` and matched by
the `Name` field against the scenario's `sot_release`. A file-based
override on `RvsCtx` keeps the full pipeline exercisable without a
live NICC connection. Multi-SOT support (scenarios targeting different
releases in the same cycle) is left as a follow-up TODO.
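
To make the resolution step concrete, the sketch below expresses the same three-filter walk as the `sotpath` example, but over hypothetical typed structs instead of JSONPath over raw JSON. The struct and function names (`Sot`, `BoardSku`, `ArtifactRef`, `select_sot`, `resolve`, `sample_sot`) and all sample values are assumptions for illustration and are not taken from `scenario/resolver.rs`. Like a JSONPath query, `resolve` returns every match, so the result is a list.

```rust
/// Hypothetical typed view of the SOT fields the `sotpath` expression walks.
struct Location { name: String, location: String }
struct Software { component: String, locations: Vec<Location> }
struct BoardSku { name: String, software: Vec<Software> }
struct Sot { name: String, board_skus: Vec<BoardSku> }

/// The two ways a scenario can point at an artifact.
enum ArtifactRef {
    /// A direct URI, used as-is.
    Uri(String),
    /// The three filter values from the `sotpath` expression.
    SotPath { board_sku: String, component: String, location: String },
}

/// Pick the SOT whose `Name` equals the scenario's `sot_release`.
fn select_sot<'a>(sots: &'a [Sot], sot_release: &str) -> Option<&'a Sot> {
    sots.iter().find(|sot| sot.name == sot_release)
}

/// Pure resolution: no I/O, just a walk over in-memory data.
fn resolve(sot: &Sot, artifact: &ArtifactRef) -> Vec<String> {
    match artifact {
        ArtifactRef::Uri(uri) => vec![uri.clone()],
        ArtifactRef::SotPath { board_sku, component, location } => sot
            .board_skus
            .iter()
            .filter(|b| &b.name == board_sku) // $.BoardSKUs[?@.Name == ...]
            .flat_map(|b| &b.software) // .Components.Software
            .filter(|s| &s.component == component) // [?@.Component == ...]
            .flat_map(|s| &s.locations) // .Locations
            .filter(|l| &l.name == location) // [?@.Name == ...]
            .map(|l| l.location.clone()) // .Location
            .collect(),
    }
}

/// Made-up sample data; none of these values come from a real SOT.
fn sample_sot() -> Sot {
    Sot {
        name: "1.2.2".to_string(),
        board_skus: vec![BoardSku {
            name: "board-a".to_string(),
            software: vec![Software {
                component: "nvos".to_string(),
                locations: vec![Location {
                    name: "artifactory".to_string(),
                    location: "https://example.invalid/nvos.bin".to_string(),
                }],
            }],
        }],
    }
}
```

Because both functions take plain references and return owned values, unit tests can build a `Sot` in memory and assert on the resolved URLs without touching NICC or the network.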
The crate is restructured to have a lib target so the artifact,
scenario, and context types are shareable across binaries. A new
`test-artifact-cache` binary wires up the complete pipeline against a
local SOT file and serves the resulting cache, which is useful for manual
verification and showing colleagues how the pieces fit together.
Testing
The `test-artifact-cache` binary was run against a real SOT JSON file
and a hand-crafted scenario TOML targeting release 1.2.2. The scenario
exercises three artifact kinds: an OS image placeholder, a direct-URI
artifact, and a `sotpath`-resolved artifact. A fourth large artifact
(~1.9 GB NVOS binary) was included to validate concurrent streaming
and checksum verification.
The SOT and scenario files are not committed (kept alongside the
upstream SOT JSON used for development). To reproduce, supply any SOT
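JSON and a matching scenario TOML:

    target/debug/test-artifact-cache \
      --sot <path/to/sot.json> \
      --scenario <path/to/scenario.toml> \
      --cache-dir /tmp/rvs-test-cache \
      > /tmp/rvs-test.log 2>&1 &
    tail -f /tmp/rvs-test.log

After downloads completed the cache directory contained:

    total 3808848
    -rw-r--r-- 1 user root  31K nmx-m-nmx-c.proto
    -rw-r--r-- 1 user root 1.8G nvos.bin
    -rw-r--r-- 1 user root 1.7M nvos_openapi.json
    -rw-r--r-- 1 user root 1.7M os
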
All checksums passed. The server correctly served all files from
`http://localhost:8080/gb200nvl/1.2.2/<filename>`.
Type of Change
Related Issues (Optional)
#1653
Breaking Changes
Testing
Additional Notes