From e057bbdd20e50549f6237069ea65fd5447104d11 Mon Sep 17 00:00:00 2001
From: Kristofer Jussmann
Date: Fri, 8 May 2026 17:28:05 +0300
Subject: [PATCH] docs: add AMD compute strategy

---
 .gitattributes           |  5 +++
 CHANGELOG.md             |  1 +
 README.md                |  1 +
 docs/case-study.md       |  3 ++
 docs/compute-strategy.md | 91 ++++++++++++++++++++++++++++++++++++++++
 docs/cost-report.md      |  1 +
 docs/runbook.md          |  8 ++++
 7 files changed, 110 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 docs/compute-strategy.md

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..4cfda8f
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,5 @@
+* text=auto eol=lf
+
+*.ps1 text eol=crlf
+*.bat text eol=crlf
+*.cmd text eol=crlf
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3e068d0..fc45a6d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,3 +8,4 @@ All notable changes to this project will be documented here.
 - Added sandbox backend registry and `nullstate sandbox` commands.
 - Added model metrics artifact support.
 - Added DevSecOps repository documentation and GitHub workflow templates.
+- Documented AMD Developer Cloud / DigitalOcean primary compute path with Fireworks fallback.
diff --git a/README.md b/README.md
index 5b2f1f1..6de6a0c 100644
--- a/README.md
+++ b/README.md
@@ -123,6 +123,7 @@ Each run writes:
 - [Threat model](docs/threat-model.md)
 - [CI/CD](docs/ci-cd.md)
 - [Runbook](docs/runbook.md)
+- [AMD compute strategy](docs/compute-strategy.md)
 - [Failure modes](docs/failure-modes.md)
 - [Cost report](docs/cost-report.md)
 
diff --git a/docs/case-study.md b/docs/case-study.md
index 3fb435a..1ad64c9 100644
--- a/docs/case-study.md
+++ b/docs/case-study.md
@@ -15,6 +15,7 @@ Cloud and platform teams can ship IaC faster than security teams can manually va
 - No production cloud targets by default.
 - LocalStack Azure requires Docker and `LOCALSTACK_AUTH_TOKEN`.
 - The demo must still work if Docker, Terraform, or the model endpoint is unavailable.
+- AMD Developer Cloud / DigitalOcean GPU access may be delayed, so Fireworks-compatible managed inference is kept as a contingency.
 
 ## 4. Requirements
 
@@ -80,6 +81,8 @@ See [Runbook](runbook.md).
 
 See [Cost Report](cost-report.md). V1 is designed to run locally, with AMD Developer Cloud used only for model-serving evidence.
 
+See [AMD Compute Strategy](compute-strategy.md) for the primary DigitalOcean/AMD path and Fireworks fallback.
+
 ## 10. Results
 
 - Offline CLI demo runs end to end.
diff --git a/docs/compute-strategy.md b/docs/compute-strategy.md
new file mode 100644
index 0000000..9b59db5
--- /dev/null
+++ b/docs/compute-strategy.md
@@ -0,0 +1,91 @@
+# AMD Compute Strategy
+
+## Decision
+
+Use the harder AMD Developer Cloud / DigitalOcean route as the primary path for the case study. Keep Fireworks AI as a contingency endpoint if GPU access is delayed.
+
+## Why the primary path is DigitalOcean / AMD Developer Cloud
+
+- It produces stronger evidence for the hackathon: model serving, ROCm, vLLM/SGLang, GPU observability, and operational setup.
+- It gives a better personal learning outcome because the work includes real DevOps, cloud access, security boundaries, and inference operations.
+- It supports the project thesis: private or self-controlled model inference for sensitive IaC and security evidence.
+- It creates better case-study material than calling a hosted API only.
+
+## Why Fireworks stays as the fallback
+
+- It can unblock the demo if AMD Developer Cloud access is delayed.
+- It keeps the red/blue agent loop working through an OpenAI-compatible endpoint.
+- It is still relevant to the AMD ecosystem, but it should be positioned as managed inference rather than private local/owned serving.
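Because both paths speak the same OpenAI-compatible API, switching between them is configuration rather than code. A minimal sketch of how a run could choose its endpoint and label the resulting evidence; the helper name, the default URL, and the hostname check are illustrative assumptions, not part of the actual nullstate CLI:

```python
# Hypothetical helper: pick the model endpoint for a run and label the
# evidence honestly. The default URL below is illustrative, not a real host.
PRIVATE_DEFAULT_URL = "http://gpu-droplet.internal:8000/v1"

def resolve_endpoint(env):
    """Return (base_url, endpoint_type) for a run.

    endpoint_type is recorded with the run artifacts so a managed-inference
    run is never presented as private GPU-hosted serving.
    """
    base_url = env.get("NULLSTATE_LLM_BASE_URL", PRIVATE_DEFAULT_URL)
    if "fireworks.ai" in base_url:
        return base_url, "managed-inference"
    return base_url, "private-gpu"
```

Calling `resolve_endpoint(os.environ)` at the start of a run would keep the endpoint label in step with the evidence checklist below.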
+
+## Execution plan
+
+### Track A - DigitalOcean baseline without GPU
+
+Set up the non-GPU platform first:
+
+- project/repo secrets
+- hardened control droplet or container host
+- Docker and compose baseline
+- LocalStack Azure sandbox
+- nullstate CLI installation
+- GitHub Actions environment configuration
+- run artifact storage layout
+- basic monitoring and logs
+
+This work is useful even before the GPU is available.
+
+### Track B - AMD GPU inference
+
+When MI300X access is available:
+
+- provision an AMD Developer Cloud / DigitalOcean GPU instance
+- install the ROCm stack or use the provider image
+- serve the model with vLLM or SGLang using an OpenAI-compatible API
+- expose only the required API path to the nullstate operator environment
+- record model name, context length, ROCm version, GPU model, memory, throughput, and latency
+- save vLLM `/metrics` snapshots and `amd-smi` or `rocm-smi` output into the case-study evidence folder
+
+### Track C - Fireworks contingency
+
+If GPU access blocks the submission:
+
+- configure `NULLSTATE_LLM_BASE_URL` for the Fireworks-compatible endpoint
+- run the same nullstate demo
+- document this as the managed-inference fallback
+- keep the DigitalOcean/AMD setup as the next milestone rather than hiding the blocker
+
+## Demo positioning
+
+Preferred story:
+
+```text
+nullstate runs local IaC security validation and can use a private AMD MI300X-hosted model endpoint for red/blue reasoning over security evidence.
+```
+
+Fallback story:
+
+```text
+nullstate is model-provider portable through OpenAI-compatible endpoints. The same CLI can run against managed inference while the private AMD GPU endpoint is being provisioned.
+``` + +## Evidence checklist + +- `runs//report.md` +- `runs//metrics.json` +- vLLM `/metrics` before/after snapshots +- `amd-smi` or `rocm-smi` output +- ROCm version +- model server launch command +- sanitized network diagram +- screenshots of GPU utilization and CLI run +- note whether the run used DigitalOcean/AMD GPU, local mock mode, or Fireworks fallback + +## Risks + +| Risk | Impact | Mitigation | +|---|---|---| +| AMD Developer Cloud access delayed | High | Build DO baseline and use Fireworks fallback | +| GPU image/ROCm mismatch | Medium | Prefer provider image; document exact versions | +| Model too large or slow | Medium | Start with one model for both red and blue roles | +| Endpoint exposed publicly | High | restrict ingress, use token auth, document network boundary | +| Case study overclaims private inference | High | label each run by actual endpoint type | diff --git a/docs/cost-report.md b/docs/cost-report.md index e7bb84f..19fcdb7 100644 --- a/docs/cost-report.md +++ b/docs/cost-report.md @@ -12,6 +12,7 @@ V1 is designed to keep cloud spend near zero by default. Offline mode runs local | Offline demo | 0 | no cloud or model endpoint | | LocalStack Azure | depends on LocalStack access | requires auth token | | AMD Developer Cloud | hackathon credits | used for MI300X model endpoint | +| Fireworks fallback | provider dependent | contingency if GPU access is delayed | | GitHub Actions | low/free tier dependent | tests are lightweight | ## Controls diff --git a/docs/runbook.md b/docs/runbook.md index 84a9728..900a2b5 100644 --- a/docs/runbook.md +++ b/docs/runbook.md @@ -41,6 +41,14 @@ $env:NULLSTATE_LLM_API_KEY = "" Then run without `--offline`. +## AMD Developer Cloud / DigitalOcean path + +Use [AMD Compute Strategy](compute-strategy.md) as the deployment checklist. Build the non-GPU DigitalOcean baseline first, then attach the MI300X-backed model endpoint when access is available. 
+
+## Fireworks fallback
+
+If AMD GPU access is delayed, point `NULLSTATE_LLM_BASE_URL` at the managed endpoint and keep the same nullstate run flow. Label the evidence as managed inference, not private GPU-hosted inference.
+
 
 ## Artifact review before publishing
 
 Check: