diff --git a/docs/security/security-model.md b/docs/security/security-model.md
index b07e6c05..fc5ad847 100644
--- a/docs/security/security-model.md
+++ b/docs/security/security-model.md
@@ -112,6 +112,28 @@ Use this checklist to verify a workload running in a dstack CVM.
 - [ ] key-provider matches expected KMS identity
 - [ ] KMS attestation is valid
 
+## Verification Design Notes
+
+This section explains two deliberate scoping decisions in how dstack verifies a quote. Both are intentional; the rationale is recorded here so the behavior is not mistaken for an oversight.
+
+### Only RTMR3 is verified via event-log replay
+
+dstack replays an event log only for RTMR3. RTMR0-2 (and MRTD) are not replayed from an event log; they are taken directly from the hardware-signed quote and compared against expected values computed offline from the OS source (e.g. `dstack-mr`).
+
+This is also reflected at the source: the event log shipped alongside an attestation is stripped down to RTMR3 entries before it is embedded. `VersionedAttestation::into_stripped()` keeps only events with `imr == 3` (see `dstack-attest/src/attestation.rs`), and verification only ever replays those runtime events against `rt_mr3` (`verify_tdx_quote_with_events` / `decode_mr_tdx_from_quote`).
+
+The reason boot-time event log entries (RTMR0-2) are dropped is that **nothing downstream consumes them**. Verification takes the OS-layer measurements directly from the signed `rt_mr0/1/2` values and compares them to independently reproduced expected measurements, so the corresponding boot event log would be redundant. Keeping it would only bloat the RA-TLS certificate and expose extra detail without adding any verification capability. RTMR3, by contrast, is runtime-extended (compose-hash, key-provider, instance-id, and application-emitted events), so its event log is the only one with a real consumer: the replay that proves what was extended into RTMR3.
+
+### TCB status is surfaced, not gated, during verification
+
+dstack's `validate_tcb` does not reject a quote based on its TCB status string (`UpToDate`, `OutOfDate`, `ConfigurationNeeded`, `SWHardeningNeeded`, ...). It only enforces hard invariants: debug mode must be off, and the SEAM/service-TD measurements must be well-formed. The verified report carries the `status` field through to the caller.
+
+This is deliberate: whether a non-current TCB (e.g. `OutOfDate`) is acceptable is a **policy decision that belongs downstream**, not in the verification primitive. Different deployments have different risk tolerances, so the verifier surfaces the status and lets the consuming policy decide. The "TCB status is up-to-date" item in the verification checklist above is exactly such a downstream policy check.
+
+The one case dstack does not leave to downstream is a genuinely invalid TCB: `dcap-qvl` rejects `Revoked` outright (its `is_valid()` returns false only for `Revoked`), so a revoked TCB never reaches the policy layer in the first place.
+
+> **Future work:** this will be refactored toward a grace-period model, where an out-of-date TCB is accepted for a bounded window after a new TCB level is published rather than being a binary downstream decision.
+
 ## Limitations
 
 ### Attestation proves identity, not correctness
diff --git a/dstack-attest/src/lib.rs b/dstack-attest/src/lib.rs
index 1f7bc814..63ca978f 100644
--- a/dstack-attest/src/lib.rs
+++ b/dstack-attest/src/lib.rs
@@ -2,6 +2,8 @@
 //
 // SPDX-License-Identifier: Apache-2.0
 
+use std::sync::{LazyLock, Mutex};
+
 use anyhow::Context;
 use cc_eventlog::RuntimeEvent;
 
@@ -13,12 +15,26 @@ use crate::attestation::AttestationMode;
 pub mod attestation;
 mod v1;
 
+/// Serializes runtime event emission within this process.
+///
+/// Appending to the event log and extending RTMR3 must happen atomically as a
+/// unit: the order of log entries has to match the order of RTMR extensions,
+/// otherwise the RTMR replay performed during quote verification will not
+/// reproduce the measured value. Concurrent callers (e.g. multiple
+/// `emit_event` RPCs hitting the guest-agent at once) would otherwise be able
+/// to interleave their log writes and `extend_rtmr` calls.
+static EMIT_LOCK: LazyLock<Mutex<()>> = LazyLock::new(|| Mutex::new(()));
+
 /// Emit a runtime event that extends RTMR3 and logs the event.
 pub fn emit_runtime_event(event: &str, payload: &[u8]) -> anyhow::Result<()> {
     let event = RuntimeEvent::new(event.to_string(), payload.to_vec());
     let mode = AttestationMode::detect()?;
 
+    // Hold the lock across both the log append and the register extension so
+    // that the on-disk log order always matches the RTMR extension order.
+    let _guard = EMIT_LOCK.lock().unwrap_or_else(|e| e.into_inner());
+
     event.emit().context("Failed to emit runtime event")?;
 
     if mode.has_tdx() {