2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,3 +18,5 @@ All notable changes to this project will be documented here.
- Added sandbox runtime probes to `nullstate sandbox status`.
- Decoupled static/offline IaC mode from model usage; configured model endpoints are now used unless `--mock-agents` is passed.
- Added role-specific red/blue model endpoint configuration and per-role vLLM metrics snapshots.
- Added live Terraform apply/re-apply support for Terraform-backed LocalStack scenarios.
- Added MI300X model-serving scripts for Qwen3.5 on SGLang and Gemma 4 on vLLM ROCm.
1 change: 1 addition & 0 deletions README.md
@@ -213,6 +213,7 @@ Each run writes:
- [Threat model](docs/threat-model.md)
- [CI/CD](docs/ci-cd.md)
- [Runbook](docs/runbook.md)
- [Model serving runbook](docs/model-serving.md)
- [AMD compute strategy](docs/compute-strategy.md)
- [Failure modes](docs/failure-modes.md)
- [Cost report](docs/cost-report.md)
97 changes: 97 additions & 0 deletions docs/model-serving.md
@@ -0,0 +1,97 @@
# Model Serving Runbook

This project uses OpenAI-compatible local endpoints so the CLI does not care whether the model is served by vLLM, SGLang, or a managed fallback.
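
Because every backend exposes the same OpenAI-compatible surface, a quick sanity check looks identical whichever stack is serving. As an illustration, assuming the red endpoint described below is already listening on port 8001:

```bash
# List the models the endpoint advertises; works the same for vLLM, SGLang, or a managed fallback.
curl -sS http://127.0.0.1:8001/v1/models

# Minimal chat completion against the served model name registered by the launch script.
curl -sS http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nullstate-qwen35-9b", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
```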

## Recommended MI300X Split

Use two containers when testing red and blue roles independently:

| Role | Serving stack | First model | Reason |
|---|---|---|---|
| Red | SGLang ROCm | `Qwen/Qwen3.5-9B` or `Qwen/Qwen3.5-35B-A3B` | Qwen3.5 has current SGLang AMD recipes and strong agent/coding behavior. |
| Blue | vLLM ROCm | `google/gemma-4-E4B-it` first, then `google/gemma-4-26B-A4B-it` or `google/gemma-4-31B-it` | Gemma 4 support requires a current vLLM ROCm image; start small, then scale. |

Do not start with Nemotron 3 Super on the single-GPU (1x MI300X) droplet. The public model cards list hardware requirements for the BF16/FP8 releases well beyond what one MI300X provides. Keep it in the case study as a future multi-GPU target or managed-endpoint comparison.

## Start Red Qwen3.5 Endpoint

Run on the droplet:

```bash
cd /opt/nullstate

MODEL_ID=Qwen/Qwen3.5-9B \
SERVED_MODEL_NAME=nullstate-qwen35-9b \
HOST_PORT=8001 \
bash /path/to/nullstate-cli/scripts/droplet/serve-qwen35-sglang-rocm.sh
```

For a stronger red model after the first smoke test:

```bash
MODEL_ID=Qwen/Qwen3.5-35B-A3B \
SERVED_MODEL_NAME=nullstate-qwen35-35b \
HOST_PORT=8001 \
bash /path/to/nullstate-cli/scripts/droplet/serve-qwen35-sglang-rocm.sh
```

## Start Blue Gemma 4 Endpoint

Run on the droplet:

```bash
MODEL_ID=google/gemma-4-E4B-it \
SERVED_MODEL_NAME=nullstate-gemma4-e4b \
HOST_PORT=8002 \
bash /path/to/nullstate-cli/scripts/droplet/serve-gemma4-vllm-rocm.sh
```

For larger blue-team analysis after the E4B endpoint is stable:

```bash
MODEL_ID=google/gemma-4-26B-A4B-it \
SERVED_MODEL_NAME=nullstate-gemma4-26b-a4b \
HOST_PORT=8002 \
MAX_MODEL_LEN=32768 \
bash /path/to/nullstate-cli/scripts/droplet/serve-gemma4-vllm-rocm.sh
```

## Tunnel To Local Windows

From Windows PowerShell:

```powershell
ssh -i "$env:USERPROFILE\Documents\AMDhackkey" -N `
-L 8001:127.0.0.1:8001 `
-L 8002:127.0.0.1:8002 `
root@<droplet-ip>
```

Then configure nullstate locally:

```powershell
$env:NULLSTATE_RED_LLM_BASE_URL = "http://127.0.0.1:8001"
$env:NULLSTATE_BLUE_LLM_BASE_URL = "http://127.0.0.1:8002"
python -m nullstate run examples/aws-public-s3 --target localstack-aws --scenario aws-public-s3 --red-model nullstate-qwen35-9b --blue-model nullstate-gemma4-e4b
```

Use `--offline` only when you want static IaC parsing without Terraform apply. Use `--mock-agents` only when you want no model calls.
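
For reference, the two fallback modes against the same scenario look like this (illustrative commands, reusing the flags from the run above):

```powershell
# Static IaC parsing only: no Terraform apply against the sandbox, but model endpoints are still used.
python -m nullstate run examples/aws-public-s3 --target localstack-aws --scenario aws-public-s3 --offline

# No model calls at all: deterministic mock agents, useful for wiring checks.
python -m nullstate run examples/aws-public-s3 --target localstack-aws --scenario aws-public-s3 --offline --mock-agents
```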

## Evidence Collection

On the droplet:

```bash
bash /path/to/nullstate-cli/scripts/droplet/collect-endpoint-evidence.sh http://127.0.0.1:8001 nullstate-qwen35-9b
bash /path/to/nullstate-cli/scripts/droplet/collect-endpoint-evidence.sh http://127.0.0.1:8002 nullstate-gemma4-e4b
```

Save the generated evidence directory plus the nullstate run artifacts:

- `models.json`
- `metrics.prom`
- `chat-completion.json`
- `host-snapshot.txt`
- `runs/<id>/metrics.json`
- `runs/<id>/vllm-metrics-red-*.prom`
- `runs/<id>/vllm-metrics-blue-*.prom`
9 changes: 9 additions & 0 deletions docs/runbook.md
@@ -42,6 +42,13 @@ Then:
python -m nullstate sandbox up localstack-azure
```

If the token is stored in a local env file, keep the file untracked and pass it explicitly:

```powershell
python -m nullstate sandbox up localstack-azure --env-file .env.local
python -m nullstate sandbox up localstack-aws --env-file .env.local
```

Docker Compose alternative:

```powershell
@@ -105,6 +112,8 @@ python -m nullstate run examples/azure-public-blob --offline --mock-agents

Use [AMD Compute Strategy](compute-strategy.md) as the deployment checklist. Build the non-GPU DigitalOcean baseline first, then attach the MI300X-backed model endpoint when access is available.

Use [Model Serving Runbook](model-serving.md) for the current two-container Qwen3.5/Gemma 4 setup.

## Fireworks fallback

If AMD GPU access is delayed, point `NULLSTATE_LLM_BASE_URL` at the managed endpoint and keep the same nullstate run flow. Label the evidence as managed inference, not private GPU-hosted inference.
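
For example, from the same local PowerShell session (the base URL below is a placeholder, not a real endpoint):

```powershell
# Placeholder: substitute the managed provider's OpenAI-compatible base URL.
$env:NULLSTATE_LLM_BASE_URL = "https://<managed-endpoint>"
# Then reuse the same `python -m nullstate run ...` command used with the droplet endpoints.
```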
5 changes: 3 additions & 2 deletions examples/aws-public-s3/main.tf
@@ -11,17 +11,18 @@ provider "aws" {
   region = "us-east-1"
   access_key = "test"
   secret_key = "test"
+  s3_use_path_style = true
   skip_credentials_validation = true
   skip_metadata_api_check = true
   skip_requesting_account_id = true

   endpoints {
-    s3 = "http://localhost.localstack.cloud:4566"
+    s3 = "http://s3.localhost.localstack.cloud:4566"
   }
 }

 resource "aws_s3_bucket" "public_logs" {
-  bucket = "nullstate-public-logs"
+  bucket_prefix = "nullstate-public-logs-"
 }

 resource "aws_s3_bucket_public_access_block" "public_logs" {
41 changes: 41 additions & 0 deletions scripts/droplet/collect-endpoint-evidence.sh
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
set -euo pipefail

BASE_URL="${1:?Usage: collect-endpoint-evidence.sh <base-url> <model-name> [output-dir]}"
MODEL_NAME="${2:?Usage: collect-endpoint-evidence.sh <base-url> <model-name> [output-dir]}"
OUTPUT_DIR="${3:-/opt/nullstate/evidence}"
STAMP="$(date -u +%Y%m%d-%H%M%S)"
RUN_DIR="${OUTPUT_DIR}/endpoint-${MODEL_NAME}-${STAMP}"

mkdir -p "${RUN_DIR}"

curl -sS "${BASE_URL%/}/v1/models" | tee "${RUN_DIR}/models.json"
curl -sS "${BASE_URL%/}/metrics" | tee "${RUN_DIR}/metrics.prom" >/dev/null || true

curl -sS "${BASE_URL%/}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"${MODEL_NAME}\",
\"messages\": [
{\"role\": \"system\", \"content\": \"You are a concise cloud security assistant.\"},
{\"role\": \"user\", \"content\": \"Explain why public object storage is risky in one sentence.\"}
],
\"max_tokens\": 80,
\"temperature\": 0.2
}" | tee "${RUN_DIR}/chat-completion.json"

{
echo "=== date ==="
date -Is
echo
echo "=== docker containers ==="
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}"
echo
echo "=== amd-smi ==="
amd-smi static --asic --vram --driver 2>/dev/null || amd-smi 2>/dev/null || true
echo
echo "=== rocm-smi ==="
rocm-smi 2>/dev/null || true
} | tee "${RUN_DIR}/host-snapshot.txt"

echo "Wrote ${RUN_DIR}"
46 changes: 46 additions & 0 deletions scripts/droplet/serve-gemma4-vllm-rocm.sh
@@ -0,0 +1,46 @@
#!/usr/bin/env bash
set -euo pipefail

CONTAINER_NAME="${CONTAINER_NAME:-nullstate-blue-gemma4}"
IMAGE="${IMAGE:-vllm/vllm-openai-rocm:latest}"
MODEL_ID="${MODEL_ID:-google/gemma-4-E4B-it}"
SERVED_MODEL_NAME="${SERVED_MODEL_NAME:-nullstate-gemma4-e4b}"
HOST_PORT="${HOST_PORT:-8002}"
CONTAINER_PORT="${CONTAINER_PORT:-8000}"
MAX_MODEL_LEN="${MAX_MODEL_LEN:-32768}"
GPU_MEMORY_UTILIZATION="${GPU_MEMORY_UTILIZATION:-0.90}"

mkdir -p "${HOME}/.cache/huggingface"

docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true
docker pull "${IMAGE}"

docker run -d \
  --name "${CONTAINER_NAME}" \
  --ipc=host \
  --privileged \
  --cap-add=CAP_SYS_ADMIN \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add=video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --shm-size 16G \
  -p "127.0.0.1:${HOST_PORT}:${CONTAINER_PORT}" \
  -e "HF_TOKEN=${HF_TOKEN:-}" \
  -e VLLM_ROCM_USE_AITER=1 \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  "${IMAGE}" \
  --model "${MODEL_ID}" \
  --served-model-name "${SERVED_MODEL_NAME}" \
  --host 0.0.0.0 \
  --port "${CONTAINER_PORT}" \
  --dtype bfloat16 \
  --max-model-len "${MAX_MODEL_LEN}" \
  --gpu-memory-utilization "${GPU_MEMORY_UTILIZATION}" \
  --enable-force-include-usage \
  --enable-prompt-tokens-details \
  --limit-mm-per-prompt '{"image": 0, "audio": 0}'

echo "Started ${CONTAINER_NAME} on 127.0.0.1:${HOST_PORT}"
echo "Model: ${SERVED_MODEL_NAME} (${MODEL_ID})"
44 changes: 44 additions & 0 deletions scripts/droplet/serve-qwen35-sglang-rocm.sh
@@ -0,0 +1,44 @@
#!/usr/bin/env bash
set -euo pipefail

CONTAINER_NAME="${CONTAINER_NAME:-nullstate-red-qwen35}"
IMAGE="${IMAGE:-lmsysorg/sglang:v0.5.9-rocm720-mi30x}"
MODEL_ID="${MODEL_ID:-Qwen/Qwen3.5-9B}"
SERVED_MODEL_NAME="${SERVED_MODEL_NAME:-nullstate-qwen35-9b}"
HOST_PORT="${HOST_PORT:-8001}"
CONTAINER_PORT="${CONTAINER_PORT:-30000}"
MEM_FRACTION_STATIC="${MEM_FRACTION_STATIC:-0.8}"

mkdir -p "${HOME}/.cache/huggingface"

docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true
docker pull "${IMAGE}"

docker run -d \
  --name "${CONTAINER_NAME}" \
  --ipc=host \
  --privileged \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add=video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --shm-size 16G \
  -p "127.0.0.1:${HOST_PORT}:${CONTAINER_PORT}" \
  -e "HF_TOKEN=${HF_TOKEN:-}" \
  -v "${HOME}/.cache/huggingface:/root/.cache/huggingface" \
  "${IMAGE}" \
  python3 -m sglang.launch_server \
    --model-path "${MODEL_ID}" \
    --served-model-name "${SERVED_MODEL_NAME}" \
    --host 0.0.0.0 \
    --port "${CONTAINER_PORT}" \
    --tp-size 1 \
    --attention-backend triton \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --mem-fraction-static "${MEM_FRACTION_STATIC}" \
    --trust-remote-code

echo "Started ${CONTAINER_NAME} on 127.0.0.1:${HOST_PORT}"
echo "Model: ${SERVED_MODEL_NAME} (${MODEL_ID})"
18 changes: 13 additions & 5 deletions src/nullstate/agents.py
@@ -72,9 +72,17 @@ def complete(self, system_prompt: str, user_prompt: str, offline: bool) -> Agent
         )

     def _offline_response(self, user_prompt: str) -> str:
+        if "AWS_S3_PUBLIC_ACCESS_BLOCK_DISABLED" in user_prompt:
+            if self.role == "red":
+                return "Offline red team selected an anonymous S3 read hypothesis for the public access block exposure."
+            return "Offline blue team recommended enabling all S3 public access block controls."
+        if "AZURE_STORAGE_PUBLIC_BLOB" in user_prompt:
+            if self.role == "red":
+                return "Offline red team selected the anonymous Azure Blob read exploit for the detected public container."
+            return (
+                "Offline blue team confirmed the exposure and recommended setting container_access_type to private "
+                "and allow_nested_items_to_be_public to false."
+            )
         if self.role == "red":
-            return "Offline red team selected the anonymous Azure Blob read exploit for the detected public container."
-        return (
-            "Offline blue team confirmed the exposure and recommended setting container_access_type to private "
-            "and allow_nested_items_to_be_public to false."
-        )
+            return "Offline red team selected an exploit hypothesis for the detected exposure."
+        return "Offline blue team confirmed the exposure and recommended the deterministic remediation."
24 changes: 20 additions & 4 deletions src/nullstate/cli.py
@@ -21,7 +21,7 @@
 from .sandbox import get_backend, list_backends, probe_backend, render_commands, run_commands
 from .scenario_detection import infer_scenario
 from .scenarios import get_scenario, list_scenarios
-from .terraform import load_plan_json
+from .terraform import apply_saved_plan, load_plan_json


 app = typer.Typer(no_args_is_help=True, help="Autonomous purple-teaming CLI for infrastructure-as-code sandboxes.")
@@ -92,7 +92,7 @@ def run(
     """Run detection, attack, remediation, and validation."""
     scenario_spec = _resolve_scenario(terraform_dir, scenario)
     backend = _resolve_backend(target, scenario_spec.backend)
-    if scenario_spec.name != "azure-public-blob" and not offline:
+    if not offline and not _scenario_supports_live_terraform(scenario_spec.name):
         raise typer.BadParameter(
             f"Scenario {scenario_spec.name!r} supports offline demo execution only for now. "
             "Use --offline until its live sandbox adapter is implemented."
@@ -141,6 +141,9 @@ def run(
     plan, commands = load_plan_json(workspace_dir, offline=offline)
     for result in commands:
         events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)
+    if not offline:
+        for result in apply_saved_plan(workspace_dir):
+            events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)

     findings = find_scenario_findings(scenario_spec.name, workspace_dir, plan)
     events.write("analysis", "IaC input analyzed", finding_count=len(findings))
@@ -244,7 +247,15 @@
     )
     events.write("blue-team", "IaC remediation generated", changed=patch_result.changed, agent=blue_result)

-    remediated_plan, _ = load_plan_json(workspace_dir, offline=True)
+    if offline:
+        remediated_plan, remediation_commands = load_plan_json(workspace_dir, offline=True)
+    else:
+        remediated_plan, remediation_commands = load_plan_json(workspace_dir, offline=False)
+        for result in remediation_commands:
+            events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)
+    if not offline:
+        for result in apply_saved_plan(workspace_dir):
+            events.write("terraform", "Command completed", command=result.command, returncode=result.returncode)
     remaining_findings = find_scenario_findings(scenario_spec.name, workspace_dir, remediated_plan)
     after_attack = simulate_attack(remaining_findings, "after")
     events.write("validation", "Attack attempted after remediation", result=after_attack, remaining_findings=len(remaining_findings))
@@ -339,14 +350,15 @@ def sandbox_status(name: str = typer.Argument("plan-only", help="Sandbox backend
 def sandbox_up(
     name: str = typer.Argument("localstack-azure", help="Sandbox backend name."),
     dry_run: bool = typer.Option(False, "--dry-run", help="Print commands without running them."),
+    env_file: Path | None = typer.Option(None, "--env-file", help="Docker env file for sandbox secrets such as LOCALSTACK_AUTH_TOKEN."),
 ) -> None:
     """Start a sandbox backend."""
     try:
         backend = get_backend(name)
     except KeyError as error:
         raise typer.BadParameter(str(error)) from error

-    commands = backend.up_commands()
+    commands = backend.up_commands(env_file=env_file)
     if dry_run or not commands:
         console.print(render_commands(commands))
         return
@@ -433,6 +445,10 @@ def _resolve_backend(target: str, scenario_backend: str):
         raise typer.BadParameter(str(error)) from error


+def _scenario_supports_live_terraform(scenario_name: str) -> bool:
+    return scenario_name in {"azure-public-blob", "aws-public-s3"}
+
+
 def _resolve_agent_base_url(role: str, explicit: str | None) -> str | None:
     if explicit:
         return explicit