2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,3 +18,5 @@ All notable changes to this project will be documented here.
- Added sandbox runtime probes to `nullstate sandbox status`.
- Decoupled static/offline IaC mode from model usage; configured model endpoints are now used unless `--mock-agents` is passed.
- Added role-specific red/blue model endpoint configuration and per-role vLLM metrics snapshots.
- Added live Terraform apply/re-apply support for Terraform-backed LocalStack scenarios.
- Added MI300X model-serving scripts for Qwen3.5 on SGLang and Gemma 4 on vLLM ROCm.
1 change: 1 addition & 0 deletions README.md
@@ -213,6 +213,7 @@ Each run writes:
- [Threat model](docs/threat-model.md)
- [CI/CD](docs/ci-cd.md)
- [Runbook](docs/runbook.md)
- [Model serving runbook](docs/model-serving.md)
- [AMD compute strategy](docs/compute-strategy.md)
- [Failure modes](docs/failure-modes.md)
- [Cost report](docs/cost-report.md)
97 changes: 97 additions & 0 deletions docs/model-serving.md
@@ -0,0 +1,97 @@
# Model Serving Runbook

This project uses OpenAI-compatible local endpoints so the CLI does not care whether the model is served by vLLM, SGLang, or a managed fallback.
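
Because every backend exposes the same OpenAI-compatible surface, a quick sanity check looks identical whichever stack is serving. As an illustration, assuming the red endpoint described below is already listening on port 8001:

```bash
# List the models the endpoint advertises; works the same for vLLM, SGLang, or a managed fallback.
curl -sS http://127.0.0.1:8001/v1/models

# Minimal chat completion against the served model name registered by the launch script.
curl -sS http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nullstate-qwen35-9b", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
```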

## Recommended MI300X Split

Use two containers when testing red and blue roles independently:

| Role | Serving stack | First model | Reason |
|---|---|---|---|
| Red | SGLang ROCm | `Qwen/Qwen3.5-9B` or `Qwen/Qwen3.5-35B-A3B` | Qwen3.5 has current SGLang AMD recipes and strong agent/coding behavior. |
| Blue | vLLM ROCm | `google/gemma-4-E4B-it` first, then `google/gemma-4-26B-A4B-it` or `google/gemma-4-31B-it` | Gemma 4 support requires a current vLLM ROCm image; start small, then scale. |

Do not start with Nemotron 3 Super on the single-GPU (1x MI300X) droplet. The public model cards list hardware requirements for the BF16/FP8 releases well beyond what one MI300X provides. Keep it in the case study as a future multi-GPU target or managed-endpoint comparison.

## Start Red Qwen3.5 Endpoint

Run on the droplet:

```bash
cd /opt/nullstate

MODEL_ID=Qwen/Qwen3.5-9B \
SERVED_MODEL_NAME=nullstate-qwen35-9b \
HOST_PORT=8001 \
bash /path/to/nullstate-cli/scripts/droplet/serve-qwen35-sglang-rocm.sh
```

For a stronger red model after the first smoke test:

```bash
MODEL_ID=Qwen/Qwen3.5-35B-A3B \
SERVED_MODEL_NAME=nullstate-qwen35-35b \
HOST_PORT=8001 \
bash /path/to/nullstate-cli/scripts/droplet/serve-qwen35-sglang-rocm.sh
```

## Start Blue Gemma 4 Endpoint

Run on the droplet:

```bash
MODEL_ID=google/gemma-4-E4B-it \
SERVED_MODEL_NAME=nullstate-gemma4-e4b \
HOST_PORT=8002 \
bash /path/to/nullstate-cli/scripts/droplet/serve-gemma4-vllm-rocm.sh
```

For larger blue-team analysis after the E4B endpoint is stable:

```bash
MODEL_ID=google/gemma-4-26B-A4B-it \
SERVED_MODEL_NAME=nullstate-gemma4-26b-a4b \
HOST_PORT=8002 \
MAX_MODEL_LEN=32768 \
bash /path/to/nullstate-cli/scripts/droplet/serve-gemma4-vllm-rocm.sh
```

## Tunnel To Local Windows

From Windows PowerShell:

```powershell
ssh -i "$env:USERPROFILE\Documents\AMDhackkey" -N `
-L 8001:127.0.0.1:8001 `
-L 8002:127.0.0.1:8002 `
root@<droplet-ip>
```

Then configure nullstate locally:

```powershell
$env:NULLSTATE_RED_LLM_BASE_URL = "http://127.0.0.1:8001"
$env:NULLSTATE_BLUE_LLM_BASE_URL = "http://127.0.0.1:8002"
python -m nullstate run examples/aws-public-s3 --target localstack-aws --scenario aws-public-s3 --red-model nullstate-qwen35-9b --blue-model nullstate-gemma4-e4b
```

Use `--offline` only when you want static IaC parsing without Terraform apply. Use `--mock-agents` only when you want no model calls.
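
For reference, the two fallback modes against the same scenario look like this (illustrative commands, reusing the flags from the run above):

```powershell
# Static IaC parsing only: no Terraform apply against the sandbox, but model endpoints are still used.
python -m nullstate run examples/aws-public-s3 --target localstack-aws --scenario aws-public-s3 --offline

# No model calls at all: deterministic mock agents, useful for wiring checks.
python -m nullstate run examples/aws-public-s3 --target localstack-aws --scenario aws-public-s3 --offline --mock-agents
```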

## Evidence Collection

On the droplet:

```bash
bash /path/to/nullstate-cli/scripts/droplet/collect-endpoint-evidence.sh http://127.0.0.1:8001 nullstate-qwen35-9b
bash /path/to/nullstate-cli/scripts/droplet/collect-endpoint-evidence.sh http://127.0.0.1:8002 nullstate-gemma4-e4b
```

Save the generated evidence directory plus the nullstate run artifacts:

- `models.json`
- `metrics.prom`
- `chat-completion.json`
- `host-snapshot.txt`
- `runs/<id>/metrics.json`
- `runs/<id>/vllm-metrics-red-*.prom`
- `runs/<id>/vllm-metrics-blue-*.prom`
9 changes: 9 additions & 0 deletions docs/runbook.md
@@ -42,6 +42,13 @@ Then:
python -m nullstate sandbox up localstack-azure
```

If the token is stored in a local env file, keep the file untracked and pass it explicitly:

```powershell
python -m nullstate sandbox up localstack-azure --env-file .env.local
python -m nullstate sandbox up localstack-aws --env-file .env.local
```

Docker Compose alternative:

```powershell
@@ -105,6 +112,8 @@ python -m nullstate run examples/azure-public-blob --offline --mock-agents

Use [AMD Compute Strategy](compute-strategy.md) as the deployment checklist. Build the non-GPU DigitalOcean baseline first, then attach the MI300X-backed model endpoint when access is available.

Use [Model Serving Runbook](model-serving.md) for the current two-container Qwen3.5/Gemma 4 setup.

## Fireworks fallback

If AMD GPU access is delayed, point `NULLSTATE_LLM_BASE_URL` at the managed endpoint and keep the same nullstate run flow. Label the evidence as managed inference, not private GPU-hosted inference.
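
For example, from the same local PowerShell session (the base URL below is a placeholder, not a real endpoint):

```powershell
# Placeholder: substitute the managed provider's OpenAI-compatible base URL.
$env:NULLSTATE_LLM_BASE_URL = "https://<managed-endpoint>"
# Then reuse the same `python -m nullstate run ...` command used with the droplet endpoints.
```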
5 changes: 3 additions & 2 deletions examples/aws-public-s3/main.tf
@@ -11,17 +11,18 @@ provider "aws" {
   region = "us-east-1"
   access_key = "test"
   secret_key = "test"
+  s3_use_path_style = true
   skip_credentials_validation = true
   skip_metadata_api_check = true
   skip_requesting_account_id = true

   endpoints {
-    s3 = "http://localhost.localstack.cloud:4566"
+    s3 = "http://s3.localhost.localstack.cloud:4566"
   }
 }

 resource "aws_s3_bucket" "public_logs" {
-  bucket = "nullstate-public-logs"
+  bucket_prefix = "nullstate-public-logs-"
 }

 resource "aws_s3_bucket_public_access_block" "public_logs" {
41 changes: 41 additions & 0 deletions scripts/droplet/collect-endpoint-evidence.sh
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
set -euo pipefail

BASE_URL="${1:?Usage: collect-endpoint-evidence.sh <base-url> <model-name> [output-dir]}"
MODEL_NAME="${2:?Usage: collect-endpoint-evidence.sh <base-url> <model-name> [output-dir]}"
OUTPUT_DIR="${3:-/opt/nullstate/evidence}"
STAMP="$(date -u +%Y%m%d-%H%M%S)"
RUN_DIR="${OUTPUT_DIR}/endpoint-${MODEL_NAME}-${STAMP}"

mkdir -p "${RUN_DIR}"

curl -sS "${BASE_URL%/}/v1/models" | tee "${RUN_DIR}/models.json"
curl -sS "${BASE_URL%/}/metrics" | tee "${RUN_DIR}/metrics.prom" >/dev/null || true

curl -sS "${BASE_URL%/}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"${MODEL_NAME}\",
\"messages\": [
{\"role\": \"system\", \"content\": \"You are a concise cloud security assistant.\"},
{\"role\": \"user\", \"content\": \"Explain why public object storage is risky in one sentence.\"}
],
\"max_tokens\": 80,
\"temperature\": 0.2
}" | tee "${RUN_DIR}/chat-completion.json"

{
echo "=== date ==="
date -Is
echo
echo "=== docker containers ==="
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"
echo
echo "=== amd-smi ==="
amd-smi static --asic --vram --driver 2>/dev/null || amd-smi 2>/dev/null || true
echo
echo "=== rocm-smi ==="
rocm-smi 2>/dev/null || true
} | tee "${RUN_DIR}/host-snapshot.txt"

echo "Wrote ${RUN_DIR}"
46 changes: 46 additions & 0 deletions scripts/droplet/serve-gemma4-vllm-rocm.sh
@@ -0,0 +1,46 @@
#!/usr/bin/env bash
set -euo pipefail

CONTAINER_NAME="${CONTAINER_NAME:-nullstate-blue-gemma4}"
IMAGE="${IMAGE:-vllm/vllm-openai-rocm:latest}"
MODEL_ID="${MODEL_ID:-google/gemma-4-E4B-it}"
SERVED_MODEL_NAME="${SERVED_MODEL_NAME:-nullstate-gemma4-e4b}"
HOST_PORT="${HOST_PORT:-8002}"
CONTAINER_PORT="${CONTAINER_PORT:-8000}"
MAX_MODEL_LEN="${MAX_MODEL_LEN:-32768}"
GPU_MEMORY_UTILIZATION="${GPU_MEMORY_UTILIZATION:-0.90}"

mkdir -p "${HOME}/.cache/huggingface"

docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true
docker pull "${IMAGE}"

docker run -d \
  --name "${CONTAINER_NAME}" \
  --ipc=host \
  --privileged \
  --cap-add=CAP_SYS_ADMIN \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add=video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --shm-size 16G \
  -p "127.0.0.1:${HOST_PORT}:${CONTAINER_PORT}" \
  -e "HF_TOKEN=${HF_TOKEN:-}" \
  -e VLLM_ROCM_USE_AITER=1 \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  "${IMAGE}" \
  --model "${MODEL_ID}" \
  --served-model-name "${SERVED_MODEL_NAME}" \
  --host 0.0.0.0 \
  --port "${CONTAINER_PORT}" \
  --dtype bfloat16 \
  --max-model-len "${MAX_MODEL_LEN}" \
  --gpu-memory-utilization "${GPU_MEMORY_UTILIZATION}" \
  --enable-force-include-usage \
  --enable-prompt-tokens-details \
  --limit-mm-per-prompt '{"image": 0, "audio": 0}'

echo "Started ${CONTAINER_NAME} on 127.0.0.1:${HOST_PORT}"
echo "Model: ${SERVED_MODEL_NAME} (${MODEL_ID})"
44 changes: 44 additions & 0 deletions scripts/droplet/serve-qwen35-sglang-rocm.sh
@@ -0,0 +1,44 @@
#!/usr/bin/env bash
set -euo pipefail

CONTAINER_NAME="${CONTAINER_NAME:-nullstate-red-qwen35}"
IMAGE="${IMAGE:-lmsysorg/sglang:v0.5.9-rocm720-mi30x}"
MODEL_ID="${MODEL_ID:-Qwen/Qwen3.5-9B}"
SERVED_MODEL_NAME="${SERVED_MODEL_NAME:-nullstate-qwen35-9b}"
HOST_PORT="${HOST_PORT:-8001}"
CONTAINER_PORT="${CONTAINER_PORT:-30000}"
MEM_FRACTION_STATIC="${MEM_FRACTION_STATIC:-0.8}"

mkdir -p "${HOME}/.cache/huggingface"

docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true
docker pull "${IMAGE}"

docker run -d \
  --name "${CONTAINER_NAME}" \
  --ipc=host \
  --privileged \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add=video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --shm-size 16G \
  -p "127.0.0.1:${HOST_PORT}:${CONTAINER_PORT}" \
  -e "HF_TOKEN=${HF_TOKEN:-}" \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  "${IMAGE}" \
  python3 -m sglang.launch_server \
    --model-path "${MODEL_ID}" \
    --served-model-name "${SERVED_MODEL_NAME}" \
    --host 0.0.0.0 \
    --port "${CONTAINER_PORT}" \
    --tp-size 1 \
    --attention-backend triton \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mem-fraction-static "${MEM_FRACTION_STATIC}" \
    --trust-remote-code

echo "Started ${CONTAINER_NAME} on 127.0.0.1:${HOST_PORT}"
echo "Model: ${SERVED_MODEL_NAME} (${MODEL_ID})"
18 changes: 13 additions & 5 deletions src/nullstate/agents.py
@@ -72,9 +72,17 @@ def complete(self, system_prompt: str, user_prompt: str, offline: bool) -> Agent
         )

     def _offline_response(self, user_prompt: str) -> str:
+        if "AWS_S3_PUBLIC_ACCESS_BLOCK_DISABLED" in user_prompt:
+            if self.role == "red":
+                return "Offline red team selected an anonymous S3 read hypothesis for the public access block exposure."
+            return "Offline blue team recommended enabling all S3 public access block controls."
+        if "AZURE_STORAGE_PUBLIC_BLOB" in user_prompt:
+            if self.role == "red":
+                return "Offline red team selected the anonymous Azure Blob read exploit for the detected public container."
+            return (
+                "Offline blue team confirmed the exposure and recommended setting container_access_type to private "
+                "and allow_nested_items_to_be_public to false."
+            )
         if self.role == "red":
-            return "Offline red team selected the anonymous Azure Blob read exploit for the detected public container."
-        return (
-            "Offline blue team confirmed the exposure and recommended setting container_access_type to private "
-            "and allow_nested_items_to_be_public to false."
-        )
+            return "Offline red team selected an exploit hypothesis for the detected exposure."
+        return "Offline blue team confirmed the exposure and recommended the deterministic remediation."
24 changes: 20 additions & 4 deletions src/nullstate/cli.py
@@ -21,7 +21,7 @@
 from .sandbox import get_backend, list_backends, probe_backend, render_commands, run_commands
 from .scenario_detection import infer_scenario
 from .scenarios import get_scenario, list_scenarios
-from .terraform import load_plan_json
+from .terraform import apply_saved_plan, load_plan_json


 app = typer.Typer(no_args_is_help=True, help="Autonomous purple-teaming CLI for infrastructure-as-code sandboxes.")
@@ -92,7 +92,7 @@ def run(
     """Run detection, attack, remediation, and validation."""
     scenario_spec = _resolve_scenario(terraform_dir, scenario)
     backend = _resolve_backend(target, scenario_spec.backend)
-    if scenario_spec.name != "azure-public-blob" and not offline:
+    if not offline and not _scenario_supports_live_terraform(scenario_spec.name):
         raise typer.BadParameter(
             f"Scenario {scenario_spec.name!r} supports offline demo execution only for now. "
             "Use --offline until its live sandbox adapter is implemented."
@@ -141,6 +141,9 @@ def run(
     plan, commands = load_plan_json(workspace_dir, offline=offline)
     for result in commands:
         events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)
+    if not offline:
+        for result in apply_saved_plan(workspace_dir):
+            events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)

     findings = find_scenario_findings(scenario_spec.name, workspace_dir, plan)
     events.write("analysis", "IaC input analyzed", finding_count=len(findings))
@@ -244,7 +247,15 @@
     )
     events.write("blue-team", "IaC remediation generated", changed=patch_result.changed, agent=blue_result)

-    remediated_plan, _ = load_plan_json(workspace_dir, offline=True)
+    if offline:
+        remediated_plan, remediation_commands = load_plan_json(workspace_dir, offline=True)
+    else:
+        remediated_plan, remediation_commands = load_plan_json(workspace_dir, offline=False)
+        for result in remediation_commands:
+            events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)
+    if not offline:
+        for result in apply_saved_plan(workspace_dir):
+            events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)
     remaining_findings = find_scenario_findings(scenario_spec.name, workspace_dir, remediated_plan)
     after_attack = simulate_attack(remaining_findings, "after")
     events.write("validation", "Attack attempted after remediation", result=after_attack, remaining_findings=len(remaining_findings))
@@ -339,14 +350,15 @@ def sandbox_status(name: str = typer.Argument("plan-only", help="Sandbox backend
 def sandbox_up(
     name: str = typer.Argument("localstack-azure", help="Sandbox backend name."),
     dry_run: bool = typer.Option(False, "--dry-run", help="Print commands without running them."),
+    env_file: Path | None = typer.Option(None, "--env-file", help="Docker env file for sandbox secrets such as LOCALSTACK_AUTH_TOKEN."),
 ) -> None:
     """Start a sandbox backend."""
     try:
         backend = get_backend(name)
     except KeyError as error:
         raise typer.BadParameter(str(error)) from error

-    commands = backend.up_commands()
+    commands = backend.up_commands(env_file=env_file)
     if dry_run or not commands:
         console.print(render_commands(commands))
         return
@@ -433,6 +445,10 @@ def _resolve_backend(target: str, scenario_backend: str):
         raise typer.BadParameter(str(error)) from error


+def _scenario_supports_live_terraform(scenario_name: str) -> bool:
+    return scenario_name in {"azure-public-blob", "aws-public-s3"}
+
+
 def _resolve_agent_base_url(role: str, explicit: str | None) -> str | None:
     if explicit:
         return explicit