
feat(nico): add tenant-transition data sanitization validations (SEC21-04/05/06)#458

Merged
abegnoche merged 3 commits into main from cursor/sec21-tenant-sanitization-b3cc
Jun 11, 2026

@abegnoche abegnoche commented Jun 9, 2026

Member

Summary

Adds three provider-agnostic validations for the "Data Sanitization" requirement (SEC21/SEC22), wired for NICo via the bare_metal suite. Each asserts the same provider-neutral invariant: a host that has served a tenant must pass through a dedicated sanitizing lifecycle stage before it becomes allocatable to a new tenant again.

  • MemorySanitizationCheck (SEC21-04): host (RAM) memory is sanitized between tenants.
  • GpuMemorySanitizationCheck (SEC21-05): SRAM/GPU memory is sanitized between tenants (scoped to GPU-equipped hosts).
  • FirmwareResetCheck (SEC21-06 / SEC22): TPM is cleared and BIOS/UEFI is recommitted during tenant transitions or hardware replacement, plus report-only firmware identity (vendor / product / BIOS version).

The checks fail a host that went from in_use back to available without an intervening sanitizing stage, or that is offered to new tenants while still bound to a prior tenant.
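
The two failure conditions above can be sketched as a small predicate over a host's lifecycle tokens. This is an illustrative sketch only, not the code in sanitization.py: the function names and the argument shapes are assumptions, and only the token names (in_use, sanitizing, available) come from this description.

```python
IN_USE, SANITIZING, AVAILABLE = "in_use", "sanitizing", "available"


def skipped_sanitization(transitions: list[str]) -> bool:
    """Return True if the host went in_use -> available with no sanitizing stage between."""
    pending_tenant = False  # True after in_use until a sanitizing stage is seen
    for token in transitions:
        if token == IN_USE:
            pending_tenant = True
        elif token == SANITIZING:
            pending_tenant = False
        elif token == AVAILABLE and pending_tenant:
            return True
        # Any other token (intermediate lifecycle stages) is ignored.
    return False


def host_violations(transitions: list[str], available: bool, stale_tenant_binding: bool) -> list[str]:
    """Collect human-readable reasons a host would fail (hypothetical helper)."""
    reasons = []
    if skipped_sanitization(transitions):
        reasons.append("returned to available without a sanitizing stage")
    if available and stale_tenant_binding:
        reasons.append("offered to new tenants while still bound to a prior tenant")
    return reasons
```

A host whose history is in_use, sanitizing, available yields no reasons, while in_use, available yields the first one.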

How it maps to NICo

NICo runs its cleanup/sanitization workflow between tenants (host/RAM cleanup + UEFI MemoryOverwriteRequestControl, NVMe/HDD secure erase, InfiniBand cleanup, TPM clear, BIOS/UEFI recommit). At the REST level this whole workflow is the machine Reset status — a released host moves InUse -> Reset -> ... -> Ready and must not return to Ready until cleanup completes (per infra-controller docs/operations/tenant-lifecycle-cleanup.md and the managed-host state machine).

query_sanitization.py reads each machine's status + statusHistory and maps the NICo lifecycle into a provider-neutral token sequence (Reset -> sanitizing, InUse -> in_use, Ready -> available), computing served_tenant, sanitized, available, and stale_tenant_binding, plus has_gpu and firmware identity. The mapping is robust to truncated history: a violation is flagged only on positive evidence of an in_use -> available transition without an intervening sanitizing stage.
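
As a rough illustration of that mapping, the sketch below orders history entries by timestamp, maps the three NICo statuses to neutral tokens, and derives served_tenant and sanitized. It is not the script's actual code: the helper names, the assumption that each statusHistory entry carries "status" and "created" keys, and the choice to drop unmapped statuses are all assumptions made for the example.

```python
from datetime import datetime

# Mapping stated in the description; other NICo statuses are skipped in this sketch.
TOKENS = {"Reset": "sanitizing", "InUse": "in_use", "Ready": "available"}


def ordered_tokens(history: list[dict]) -> list[str]:
    """Sort history entries oldest-first and map known statuses to neutral tokens."""
    def when(entry: dict) -> datetime:
        # Normalize a trailing "Z" so fromisoformat also works on older Pythons.
        return datetime.fromisoformat(entry["created"].replace("Z", "+00:00"))

    return [TOKENS[e["status"]] for e in sorted(history, key=when) if e["status"] in TOKENS]


def evaluate(tokens: list[str]) -> tuple[bool, bool]:
    """Return (served_tenant, sanitized) for one machine."""
    served_tenant = "in_use" in tokens
    dirty = False      # set by in_use, cleared by sanitizing
    sanitized = True   # only flipped on positive evidence of a skipped stage
    for token in tokens:
        if token == "in_use":
            dirty = True
        elif token == "sanitizing":
            dirty = False
        elif token == "available" and dirty:
            sanitized = False
    return served_tenant, sanitized
```

With a truncated history that contains only Ready, this sketch reports (False, True): there is no evidence the host served a tenant, so nothing is flagged.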

Changes

  • isvtest/src/isvtest/validations/sanitization.py: new validations (shared _TenantSanitizationCheck base + the three checks).
  • isvtest/src/isvtest/validations/__init__.py: register/export the three checks.
  • isvctl/configs/providers/nico/scripts/sanitization/query_sanitization.py: NICo step that emits the neutral per-machine contract.
  • isvctl/configs/suites/bare_metal.yaml: add memory_sanitization, gpu_memory_sanitization, firmware_reset validation groups (only run for providers that implement the query_sanitization step).
  • isvctl/configs/providers/nico/config/bare_metal.yaml: wire the query_sanitization step.
  • isvctl/configs/suites/README.md: document the step's key JSON fields.
  • isvtest/tests/test_sanitization.py + isvctl/tests/test_nico_provider.py: unit tests for the validations and the NICo script (token mapping, history ordering, gate logic, record building, and end-to-end script → validation contract).
  • Ships unreleased; isvtest/src/isvtest/released_tests.json intentionally untouched (new checks land in a separate release commit; exercise with ISVTEST_INCLUDE_UNRELEASED=1).

Validation

Exercised end-to-end through the isvctl orchestrator with an echo-based step (no cloud credentials needed): a clean fleet passes all three checks and a host that went in_use -> available (skipping sanitizing) fails all three with an actionable message.
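
For readers who want to reproduce a similar dry run, the sketch below builds the kind of JSON an echo-based step might print: one clean host and one that skipped sanitizing. The flag names (served_tenant, sanitized, available, stale_tenant_binding, has_gpu) and the top-level success and machines fields come from this PR; the "id", "transitions", and "firmware" key names and all values are placeholders assumed for illustration, so check isvctl/configs/suites/README.md for the real contract.

```python
import json


def machine(machine_id, transitions, *, sanitized, stale_tenant_binding=False, has_gpu=True):
    """Build one per-machine record (key names partly assumed, see lead-in)."""
    return {
        "id": machine_id,  # assumed key name
        "transitions": transitions,  # assumed key name
        "served_tenant": "in_use" in transitions,
        "sanitized": sanitized,
        "available": transitions[-1] == "available",
        "stale_tenant_binding": stale_tenant_binding,
        "has_gpu": has_gpu,
        # Placeholder firmware identity (vendor / product / BIOS version).
        "firmware": {"vendor": "ExampleVendor", "product": "ExampleBoard", "bios_version": "0.0.0"},
    }


payload = {
    "success": True,
    "machines": [
        machine("host-clean", ["in_use", "sanitizing", "available"], sanitized=True),
        machine("host-skipped", ["in_use", "available"], sanitized=False),
    ],
}

print(json.dumps(payload, indent=2))
```

Under these assumptions the first record would pass and the second would be the one expected to fail the checks.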

sec21_sanitization_e2e.log

Programmatic checks: make test (964 + 68 passed), make lint (clean), make demo-test (clean), uvx pre-commit run -a (clean).

Issues

Closes #312
Closes #313
Closes #314

Summary by CodeRabbit

  • New Features

    • Added tenant-transition sanitization checks for bare-metal: MemorySanitization, GpuMemorySanitization, and FirmwareReset (SEC21/SEC22).
    • New CLI sanitization audit emits per-host JSON summaries (sanitized/in_use/available, GPU presence, firmware identity).
  • Tests

    • Extensive unit and integration tests validating sanitization logic, script output, and validation behavior across clean/failed/skipped scenarios.
  • Documentation

    • Test-suite documentation updated to reference the new sanitization coverage.

@copy-pr-bot

copy-pr-bot Bot commented Jun 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: be9ed06c-4f5b-4d9a-ad95-c9047bfa26a4

📥 Commits

Reviewing files that changed from the base of the PR and between 62579f5 and 02c5639.

📒 Files selected for processing (1)
  • isvctl/tests/test_nico_provider.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • isvctl/tests/test_nico_provider.py

Walkthrough

Adds a NICo sanitization CLI that derives neutral lifecycle tokens from machine status/history and emits per-machine sanitization records; wires a new query_sanitization step into bare-metal suites; adds Memory/GPU/Firmware sanitization validators; and includes provider and validation tests covering helpers, outputs, and checks.

Changes

Tenant-transition sanitization validation (SEC21-04/05/06)

| Layer / File(s) | Summary |
| --- | --- |
| NICo sanitization query script (isvctl/configs/providers/nico/scripts/sanitization/query_sanitization.py) | Implements lifecycle token constants and mapping, ordered history extraction, transition evaluation (served_tenant, sanitized), GPU detection, machine_record assembly with firmware identity and truncated transitions, and CLI main() with auth and machine fetch. |
| Configuration wiring for sanitization step (isvctl/configs/providers/nico/config/bare_metal.yaml, isvctl/configs/suites/bare_metal.yaml, isvctl/configs/suites/README.md) | Provider bare-metal config adds query_sanitization step (org/site/api-base, 120s). Suites add memory_sanitization, gpu_memory_sanitization, and firmware_reset validation groups referencing the new step; README documents the script and expected JSON fields. |
| NICo provider script testing (isvctl/tests/test_nico_provider.py) | Adds _load_sanitization_script() loader, unit tests for status→token, ordered history, and evaluate_transitions; integration tests running script against mocked NICo API asserting per-machine fields; updates parametrized tests to include query_sanitization. |
| Sanitization validation checks and exports (isvtest/src/isvtest/validations/sanitization.py) | New validations module with IN_USE/SANITIZING/AVAILABLE tokens, evaluate_sanitization(), _TenantSanitizationCheck (filtering, per-machine subtests, failure aggregation), and concrete MemorySanitizationCheck, GpuMemorySanitizationCheck, FirmwareResetCheck. |
| Sanitization validation check testing (isvtest/tests/test_sanitization.py) | Provider-neutral test helpers (_machine, _output) and tests covering all three checks: sanitized fleet pass, unsanitized failures, stale tenant binding, GPU scoping and absent-GPU failure, firmware identity reporting and step-failure handling. |

Sequence Diagram(s)

sequenceDiagram
  participant NicoAPI as NICo API
  participant QueryScript as query_sanitization.py
  participant ValidationCheck as MemorySanitizationCheck
  participant Report as Test Report

  QueryScript->>NicoAPI: GET machines (org, site)
  NicoAPI-->>QueryScript: machine records (status, statusHistory, capabilities, identity)
  QueryScript->>QueryScript: ordered_history_statuses() -> status_token()
  QueryScript->>QueryScript: evaluate_transitions() -> (served_tenant, sanitized)
  QueryScript->>QueryScript: machine_record() (flags, transitions, firmware)
  QueryScript-->>ValidationCheck: emit JSON step_output with machines array
  ValidationCheck->>ValidationCheck: validate step_output.success and machines list
  ValidationCheck->>ValidationCheck: per-machine evaluate_sanitization()
  ValidationCheck->>Report: emit per-machine subtests and aggregated pass/fail summary

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers:

  • mresvanis
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: adding tenant-transition data sanitization validations for SEC21-04/05/06 in the NICo provider. |
| Linked Issues check | ✅ Passed | The PR fully implements all three linked issue requirements: MemorySanitizationCheck (SEC21-04) [#312], GpuMemorySanitizationCheck (SEC21-05) [#313], and FirmwareResetCheck (SEC21-06) [#314] with validation logic and tests. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: sanitization validations, NICo query script, suite/config updates, tests, and documentation directly support the SEC21-04/05/06 requirements. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@abegnoche

Member Author

/ok to test 3ca8e7c

@github-actions

github-actions Bot commented Jun 9, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-06-09 20:28:10 UTC | Commit: 3ca8e7c

@abegnoche abegnoche marked this pull request as ready for review June 10, 2026 20:16
@abegnoche abegnoche requested a review from a team as a code owner June 10, 2026 20:16
@abegnoche

Member Author

/ok to test 2b005a1

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@isvctl/tests/test_nico_provider.py`:
- Around line 902-903: The timestamp formatting in the list comprehension that
constructs entries from history_statuses uses f"2026-01-01T00:0{i}:00Z" which
breaks for i >= 10; update the "created" format in that comprehension (the
expression producing {"status": s, "message": "", "created": ...}) to zero-pad
the minute value (e.g., use a two-digit integer format like {i:02d} or
equivalent datetime formatting) so timestamps remain valid ISO 8601 for i >= 10.
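
A minimal sketch of the suggested fix follows. The helper name history_entries and the use of enumerate are assumptions, since the surrounding test code is not shown here; only the entry shape and the timestamp template come from the comment above.

```python
from datetime import datetime


def history_entries(history_statuses):
    """Build history entries with a zero-padded minute so timestamps stay valid for i >= 10."""
    return [
        # {i:02d} renders 3 as "03" and 10 as "10"; the unpadded "0{i}" form
        # would render 10 as "010", which is not a valid minute field.
        {"status": s, "message": "", "created": f"2026-01-01T00:{i:02d}:00Z"}
        for i, s in enumerate(history_statuses)
    ]


entries = history_entries(["InUse"] * 12)

# Every generated timestamp parses with a strict format string.
for entry in entries:
    datetime.strptime(entry["created"], "%Y-%m-%dT%H:%M:%SZ")
```

This still assumes fewer than 60 history entries; longer fixtures would need real datetime arithmetic (for example, adding a timedelta per entry).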

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ad35a962-e9f7-4b09-bfdf-059147804aa9

📥 Commits

Reviewing files that changed from the base of the PR and between 9680e30 and 2b005a1.

📒 Files selected for processing (8)
  • isvctl/configs/providers/nico/config/bare_metal.yaml
  • isvctl/configs/providers/nico/scripts/sanitization/query_sanitization.py
  • isvctl/configs/suites/README.md
  • isvctl/configs/suites/bare_metal.yaml
  • isvctl/tests/test_nico_provider.py
  • isvtest/src/isvtest/validations/__init__.py
  • isvtest/src/isvtest/validations/sanitization.py
  • isvtest/tests/test_sanitization.py

Comment thread isvctl/tests/test_nico_provider.py
cursoragent and others added 2 commits June 11, 2026 13:36
…1-04/05/06)

Add three provider-agnostic checks that assert a host which served a tenant is
not returned to the allocatable pool until it passes through a sanitizing
lifecycle stage:

- MemorySanitizationCheck (SEC21-04): host (RAM) memory sanitization.
- GpuMemorySanitizationCheck (SEC21-05): SRAM/GPU memory sanitization, scoped
  to GPU-equipped hosts.
- FirmwareResetCheck (SEC21-06/SEC22): TPM clear and BIOS/UEFI recommit on
  transition, plus report-only firmware identity.

Wire a NICo query_sanitization.py step that maps machine status + statusHistory
(Reset -> sanitizing, InUse -> in_use, Ready -> available) into the neutral
contract, into the bare_metal suite and NICo provider config. Ships unreleased;
released_tests.json intentionally untouched.

Closes #312, #313, #314

Signed-off-by: Cursor Agent <cursoragent@cursor.com>

Co-authored-by: Alexandre Begnoche <abegnoche@users.noreply.github.com>
The tenant-transition sanitization checks concatenated every failing
machine's full message (incl. its entire state-transition history) into
the top-level failure summary, which pytest and the orchestrator then
echoed several times -- a wall of text for large fleets.

Summarize with a count plus a few offending machine IDs instead; the full
per-machine reason (including transitions) is still emitted per subtest
and in the JUnit XML, so no diagnostic detail is lost. Update the tests
to assert the concise summary plus the detail on the subtest.

Signed-off-by: Alexandre Begnoche <abegnoche@nvidia.com>
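
The summarization described in this commit message could look roughly like the sketch below. It is an illustration under assumptions, not the committed code: the function name, the message wording, and the default of three displayed IDs are all hypothetical.

```python
def summarize_failures(failures: dict[str, str], max_ids: int = 3) -> str:
    """Return a short top-level summary: a count plus a few offending machine IDs.

    `failures` maps machine ID -> full per-machine reason. The full reason is
    deliberately left out here; it belongs on the per-machine subtest.
    """
    if not failures:
        return "all machines passed"
    ids = sorted(failures)
    shown = ", ".join(ids[:max_ids])
    remaining = len(ids) - max_ids
    suffix = f" (+{remaining} more)" if remaining > 0 else ""
    return f"{len(ids)} machine(s) failed sanitization: {shown}{suffix}"
```

Keeping the long reasons out of the aggregate message means the summary grows with the number of displayed IDs rather than with the size of each machine's transition history.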
@cursor cursor Bot force-pushed the cursor/sec21-tenant-sanitization-b3cc branch from 2b005a1 to 62579f5 Compare June 11, 2026 13:37

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
isvctl/tests/test_nico_provider.py (1)

293-302: ⚡ Quick win

Keep API-base contract tests aligned with all configured NICo scripts.

Line 293 and Line 316 add query_sanitization but still omit query_ib_tenant_isolation and
query_ib_keys, which are configured test steps in bare_metal.yaml and also consume --api-base.
Adding them to both parametrizations prevents quiet coverage drift.

♻️ Proposed test update
 @pytest.mark.parametrize(
     "step_name",
     [
         "verify_ingestion",
         "check_dpu_health",
         "query_governance_metrics",
         "query_host_health",
         "query_health_aggregation",
+        "query_ib_tenant_isolation",
+        "query_ib_keys",
         "query_sanitization",
     ],
 )
@@
 @pytest.mark.parametrize(
     ("script_name", "load_script"),
     [
         ("verify_ingestion.py", _load_ingestion_script),
         ("check_dpu_health.py", _load_dpu_health_script),
         ("query_metrics.py", _load_governance_metrics_script),
         ("query_host_health.py", _load_host_health_script),
         ("query_health_aggregation.py", _load_health_aggregation_script),
+        ("query_ib_tenant_isolation.py", _load_ib_tenant_isolation_script),
+        ("query_ib_keys.py", _load_ib_keys_script),
         ("query_sanitization.py", _load_sanitization_script),
     ],
 )

Also applies to: 316-325

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@isvctl/tests/test_nico_provider.py` around lines 293 - 302, The parametrized
test lists for "step_name" (the pytest.mark.parametrize blocks in
isvctl/tests/test_nico_provider.py that include values like
"query_sanitization") are missing two configured NICo script steps; update both
parametrizations that define "step_name" to include "query_ib_tenant_isolation"
and "query_ib_keys" alongside the existing entries (e.g., "verify_ingestion",
"check_dpu_health", etc.) so the API-base contract tests remain aligned with
bare_metal.yaml.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c8213171-8cd8-45d6-ab7c-673d41042b49

📥 Commits

Reviewing files that changed from the base of the PR and between 2b005a1 and 62579f5.

📒 Files selected for processing (8)
  • isvctl/configs/providers/nico/config/bare_metal.yaml
  • isvctl/configs/providers/nico/scripts/sanitization/query_sanitization.py
  • isvctl/configs/suites/README.md
  • isvctl/configs/suites/bare_metal.yaml
  • isvctl/tests/test_nico_provider.py
  • isvtest/src/isvtest/validations/__init__.py
  • isvtest/src/isvtest/validations/sanitization.py
  • isvtest/tests/test_sanitization.py
✅ Files skipped from review due to trivial changes (1)
  • isvctl/configs/suites/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • isvtest/src/isvtest/validations/__init__.py
  • isvctl/configs/suites/bare_metal.yaml
  • isvtest/src/isvtest/validations/sanitization.py
  • isvtest/tests/test_sanitization.py

@abegnoche

Member Author

/ok to test 62579f5

Signed-off-by: Cursor Agent <cursoragent@cursor.com>

Co-authored-by: Alexandre Begnoche <abegnoche@users.noreply.github.com>
@abegnoche

Member Author

/ok to test 02c5639

@abegnoche abegnoche merged commit 43cecdb into main Jun 11, 2026
7 checks passed
@abegnoche abegnoche deleted the cursor/sec21-tenant-sanitization-b3cc branch June 11, 2026 14:27

3 participants