22 changes: 22 additions & 0 deletions docs/security/security-model.md
@@ -112,6 +112,28 @@ Use this checklist to verify a workload running in a dstack CVM.
- [ ] key-provider matches expected KMS identity
- [ ] KMS attestation is valid

## Verification Design Notes

This section explains two deliberate scoping decisions in how dstack verifies a quote. Both are intentional; the rationale is recorded here so the behavior is not mistaken for an oversight.

### Only RTMR3 is verified via event-log replay

dstack replays an event log only for RTMR3. RTMR0-2 (and MRTD) are not replayed from an event log — they are taken directly from the hardware-signed quote and compared against expected values computed offline from the OS source (e.g. `dstack-mr`).

This is also reflected at the source: the event log shipped alongside an attestation is stripped down to RTMR3 entries before it is embedded. `VersionedAttestation::into_stripped()` keeps only events with `imr == 3` (see `dstack-attest/src/attestation.rs`), and verification only ever replays those runtime events against `rt_mr3` (`verify_tdx_quote_with_events` / `decode_mr_tdx_from_quote`).

The boot-time event log entries (RTMR0-2) are dropped because **nothing downstream consumes them**. Verification reads the OS-layer measurements directly from the signed `rt_mr0/1/2` values in the quote and compares them to independently reproduced expected measurements, so the corresponding boot event log would be redundant. Keeping it would only bloat the RA-TLS certificate and expose extra detail without adding any verification capability. RTMR3, by contrast, is runtime-extended (compose-hash, key-provider, instance-id, and application-emitted events), so its event log is the only one with a real consumer: the replay that proves what was extended into RTMR3.
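
The strip-then-replay flow described above can be sketched as follows. This is a minimal illustration, not dstack's implementation: the `Event` struct, `toy_extend`, and `replay_rtmr3` are hypothetical names, and `toy_extend` is a deliberately simple, non-cryptographic stand-in for the real extend step (a SHA-384 hash chain over the previous register value and the event digest). It keeps only the property that matters here, namely that the result depends on the order of events.

```rust
/// One event-log entry (hypothetical shape). `imr` is the register index,
/// `digest` is the value that was extended into that register.
#[derive(Clone)]
struct Event {
    imr: u32,
    digest: [u8; 48],
}

/// Toy stand-in for the real extend step. NOT cryptographic: it only
/// preserves the fact that extending is order-sensitive.
fn toy_extend(old: &[u8; 48], digest: &[u8; 48]) -> [u8; 48] {
    let mut out = [0u8; 48];
    for i in 0..48 {
        out[i] = old[i]
            .wrapping_mul(31)
            .wrapping_add(digest[i])
            .wrapping_add(1);
    }
    out
}

/// Replay only the RTMR3 entries, starting from an all-zero register.
/// Entries for other registers are ignored, mirroring the stripped log.
fn replay_rtmr3(events: &[Event]) -> [u8; 48] {
    events
        .iter()
        .filter(|e| e.imr == 3)
        .fold([0u8; 48], |acc, e| toy_extend(&acc, &e.digest))
}
```

A verifier would compare the replayed value with the `rt_mr3` field taken from the signed quote and reject the log on mismatch. Because each step folds the previous value into the next, a log with the same entries in a different order replays to a different value.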

### TCB status is surfaced, not gated, during verification

dstack's `validate_tcb` does not reject a quote based on its TCB status string (`UpToDate`, `OutOfDate`, `ConfigurationNeeded`, `SWHardeningNeeded`, ...). It only enforces hard invariants: debug mode must be off, and the SEAM/service-TD measurements must be well-formed. The verified report carries the `status` field through to the caller.

This is deliberate: whether a non-current TCB (e.g. `OutOfDate`) is acceptable is a **policy decision that belongs downstream**, not in the verification primitive. Different deployments have different risk tolerances, so the verifier surfaces the status and lets the consuming policy decide. The "TCB status is up-to-date" item in the verification checklist above is exactly such a downstream policy check.

The one case dstack does not leave to downstream is a genuinely invalid TCB: `dcap-qvl` rejects `Revoked` outright (its `is_valid()` returns false only for `Revoked`), so a revoked TCB never reaches the policy layer in the first place.
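
As an illustration of what a downstream policy check might look like, here is a minimal sketch. The `TcbPolicy` enum and `tcb_acceptable` function are hypothetical and not part of dstack or `dcap-qvl`; the sketch also assumes the status reaches the caller as one of the strings listed above.

```rust
/// Hypothetical deployment policy for the surfaced TCB status.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TcbPolicy {
    /// Accept only `UpToDate`.
    Strict,
    /// Also accept advisory statuses that do not mean the TCB is stale.
    AllowAdvisories,
    /// Accept any status that the verifier let through.
    Permissive,
}

/// Decide whether a verified report's TCB status is acceptable.
fn tcb_acceptable(status: &str, policy: TcbPolicy) -> bool {
    // Defence in depth: a revoked TCB should already have been rejected
    // before the status reaches this layer.
    if status == "Revoked" {
        return false;
    }
    match policy {
        TcbPolicy::Strict => status == "UpToDate",
        TcbPolicy::AllowAdvisories => matches!(
            status,
            "UpToDate" | "SWHardeningNeeded" | "ConfigurationNeeded"
        ),
        TcbPolicy::Permissive => true,
    }
}
```

Keeping this decision in the caller lets two deployments share the same verifier while one runs `Strict` and another tolerates `OutOfDate` during a rollout.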

> **Future work:** this will be refactored toward a grace-period model, where an out-of-date TCB is accepted for a bounded window after a new TCB level is published rather than being a binary downstream decision.

## Limitations

### Attestation proves identity, not correctness
16 changes: 16 additions & 0 deletions dstack-attest/src/lib.rs
@@ -2,6 +2,8 @@
//
// SPDX-License-Identifier: Apache-2.0

use std::sync::{LazyLock, Mutex};

use anyhow::Context;
use cc_eventlog::RuntimeEvent;

@@ -13,12 +15,26 @@ use crate::attestation::AttestationMode;
pub mod attestation;
mod v1;

/// Serializes runtime event emission within this process.
///
/// Appending to the event log and extending RTMR3 must happen atomically as a
/// unit: the order of log entries has to match the order of RTMR extensions,
/// otherwise the RTMR replay performed during quote verification will not
/// reproduce the measured value. Concurrent callers (e.g. multiple
/// `emit_event` RPCs hitting the guest-agent at once) would otherwise be able
/// to interleave their log writes and `extend_rtmr` calls.
static EMIT_LOCK: LazyLock<Mutex<()>> = LazyLock::new(|| Mutex::new(()));

/// Emit a runtime event that extends RTMR3 and logs the event.
pub fn emit_runtime_event(event: &str, payload: &[u8]) -> anyhow::Result<()> {
    let event = RuntimeEvent::new(event.to_string(), payload.to_vec());

    let mode = AttestationMode::detect()?;

    // Hold the lock across both the log append and the register extension so
    // that the on-disk log order always matches the RTMR extension order.
    let _guard = EMIT_LOCK.lock().unwrap_or_else(|e| e.into_inner());

    event.emit().context("Failed to emit runtime event")?;

    if mode.has_tdx() {
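
The invariant this change protects can be shown with a small standalone sketch. Everything here is hypothetical and simplified: `State`, `toy_extend`, `replay`, `emit`, and `run_concurrent` are illustrative names, the `Vec` stands in for the on-disk event log, the `u64` stands in for RTMR3, and the toy extend function is not cryptographic. Unlike `EMIT_LOCK`, which guards `()`, the sketch keeps both pieces of state inside the mutex, but the point is the same: one guard spans the append and the extend.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy state: `log` stands in for the event log, `register` for RTMR3.
struct State {
    log: Vec<u8>,
    register: u64,
}

/// Toy, non-cryptographic extend; order-sensitive like a real hash chain.
fn toy_extend(old: u64, value: u8) -> u64 {
    old.wrapping_mul(31).wrapping_add(value as u64 + 1)
}

/// Recompute the register from the log, as a verifier would.
fn replay(log: &[u8]) -> u64 {
    log.iter().fold(0, |acc, &v| toy_extend(acc, v))
}

/// Append and extend under a single guard so no other thread can run
/// between the two steps. A poisoned lock is recovered, not propagated.
fn emit(state: &Mutex<State>, value: u8) {
    let mut s = state.lock().unwrap_or_else(|e| e.into_inner());
    s.log.push(value);
    s.register = toy_extend(s.register, value);
}

/// Emit from several threads at once and return the final log and register.
fn run_concurrent(threads: u8, per_thread: u8) -> (Vec<u8>, u64) {
    let state = Arc::new(Mutex::new(State { log: Vec::new(), register: 0 }));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                for i in 0..per_thread {
                    emit(&state, t.wrapping_mul(per_thread).wrapping_add(i));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let s = state.lock().unwrap_or_else(|e| e.into_inner());
    (s.log.clone(), s.register)
}
```

Whatever order the threads happen to run in, replaying the returned log reproduces the returned register value. If the append and the extend were locked separately, two threads could append in one order and extend in the other, and the replay would no longer match.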