From e057bbdd20e50549f6237069ea65fd5447104d11 Mon Sep 17 00:00:00 2001
From: Kristofer Jussmann
Date: Fri, 8 May 2026 17:28:05 +0300
Subject: [PATCH] docs: add AMD compute strategy

---
 .gitattributes           |  5 +++
 CHANGELOG.md             |  1 +
 README.md                |  1 +
 docs/case-study.md       |  3 ++
 docs/compute-strategy.md | 91 ++++++++++++++++++++++++++++++++++++++++
 docs/cost-report.md      |  1 +
 docs/runbook.md          |  8 ++++
 7 files changed, 110 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 docs/compute-strategy.md

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..4cfda8f
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,5 @@
+* text=auto eol=lf
+
+*.ps1 text eol=crlf
+*.bat text eol=crlf
+*.cmd text eol=crlf
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3e068d0..fc45a6d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,3 +8,4 @@ All notable changes to this project will be documented here.
 - Added sandbox backend registry and `nullstate sandbox` commands.
 - Added model metrics artifact support.
 - Added DevSecOps repository documentation and GitHub workflow templates.
+- Documented AMD Developer Cloud / DigitalOcean primary compute path with Fireworks fallback.
diff --git a/README.md b/README.md
index 5b2f1f1..6de6a0c 100644
--- a/README.md
+++ b/README.md
@@ -123,6 +123,7 @@ Each run writes:
 - [Threat model](docs/threat-model.md)
 - [CI/CD](docs/ci-cd.md)
 - [Runbook](docs/runbook.md)
+- [AMD compute strategy](docs/compute-strategy.md)
 - [Failure modes](docs/failure-modes.md)
 - [Cost report](docs/cost-report.md)
 
diff --git a/docs/case-study.md b/docs/case-study.md
index 3fb435a..1ad64c9 100644
--- a/docs/case-study.md
+++ b/docs/case-study.md
@@ -15,6 +15,7 @@ Cloud and platform teams can ship IaC faster than security teams can manually va
 - No production cloud targets by default.
 - LocalStack Azure requires Docker and `LOCALSTACK_AUTH_TOKEN`.
 - The demo must still work if Docker, Terraform, or the model endpoint is unavailable.
+- AMD Developer Cloud / DigitalOcean GPU access may be delayed, so Fireworks-compatible managed inference is kept as a contingency.
 
 ## 4. Requirements
 
@@ -80,6 +81,8 @@ See [Runbook](runbook.md).
 
 See [Cost Report](cost-report.md). V1 is designed to run locally, with AMD Developer Cloud used only for model-serving evidence.
 
+See [AMD Compute Strategy](compute-strategy.md) for the primary DigitalOcean/AMD path and Fireworks fallback.
+
 ## 10. Results
 
 - Offline CLI demo runs end to end.
diff --git a/docs/compute-strategy.md b/docs/compute-strategy.md
new file mode 100644
index 0000000..9b59db5
--- /dev/null
+++ b/docs/compute-strategy.md
@@ -0,0 +1,91 @@
+# AMD Compute Strategy
+
+## Decision
+
+Use the harder AMD Developer Cloud / DigitalOcean route as the primary path for the case study. Keep Fireworks AI as a contingency endpoint if GPU access is delayed.
+
+## Why the primary path is DigitalOcean / AMD Developer Cloud
+
+- It produces stronger evidence for the hackathon: model serving, ROCm, vLLM/SGLang, GPU observability, and operational setup.
+- It gives a better personal learning outcome because the work includes real DevOps, cloud access, security boundaries, and inference operations.
+- It supports the project thesis: private or self-controlled model inference for sensitive IaC and security evidence.
+- It creates better case-study material than calling a hosted API only.
+
+## Why Fireworks stays as the fallback
+
+- It can unblock the demo if AMD Developer Cloud access is delayed.
+- It keeps the red/blue agent loop working through an OpenAI-compatible endpoint.
+- It is still relevant to the AMD ecosystem, but it should be positioned as managed inference rather than private local/owned serving.
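Because both paths speak the same OpenAI-compatible API, switching between them is configuration rather than code. A minimal sketch of how a run could choose its endpoint and label the resulting evidence; the helper name, the default URL, and the hostname check are illustrative assumptions, not part of the actual nullstate CLI:

```python
# Hypothetical helper: pick the model endpoint for a run and label the
# evidence honestly. The default URL below is illustrative, not a real host.
PRIVATE_DEFAULT_URL = "http://gpu-droplet.internal:8000/v1"

def resolve_endpoint(env):
    """Return (base_url, endpoint_type) for a run.

    endpoint_type is recorded with the run artifacts so a managed-inference
    run is never presented as private GPU-hosted serving.
    """
    base_url = env.get("NULLSTATE_LLM_BASE_URL", PRIVATE_DEFAULT_URL)
    if "fireworks.ai" in base_url:
        return base_url, "managed-inference"
    return base_url, "private-gpu"
```

Calling `resolve_endpoint(os.environ)` at the start of a run would keep the endpoint label in step with the evidence checklist below.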
+
+## Execution plan
+
+### Track A - DigitalOcean baseline without GPU
+
+Set up the non-GPU platform first:
+
+- project/repo secrets
+- hardened control droplet or container host
+- Docker and compose baseline
+- LocalStack Azure sandbox
+- nullstate CLI installation
+- GitHub Actions environment configuration
+- run artifact storage layout
+- basic monitoring and logs
+
+This work is useful even before the GPU is available.
+
+### Track B - AMD GPU inference
+
+When MI300X access is available:
+
+- provision an AMD Developer Cloud / DigitalOcean GPU instance
+- install the ROCm stack or use the provider image
+- serve the model with vLLM or SGLang using an OpenAI-compatible API
+- expose only the required API path to the nullstate operator environment
+- record model name, context length, ROCm version, GPU model, memory, throughput, and latency
+- save vLLM `/metrics` snapshots and `amd-smi` or `rocm-smi` output into the case-study evidence folder
+
+### Track C - Fireworks contingency
+
+If GPU access blocks the submission:
+
+- configure `NULLSTATE_LLM_BASE_URL` for the Fireworks-compatible endpoint
+- run the same nullstate demo
+- document this as the managed-inference fallback
+- keep the DigitalOcean/AMD setup as the next milestone rather than hiding the blocker
+
+## Demo positioning
+
+Preferred story:
+
+```text
+nullstate runs local IaC security validation and can use a private AMD MI300X-hosted model endpoint for red/blue reasoning over security evidence.
+```
+
+Fallback story:
+
+```text
+nullstate is model-provider portable through OpenAI-compatible endpoints. The same CLI can run against managed inference while the private AMD GPU endpoint is being provisioned.
+``` + +## Evidence checklist + +- `runs//report.md` +- `runs//metrics.json` +- vLLM `/metrics` before/after snapshots +- `amd-smi` or `rocm-smi` output +- ROCm version +- model server launch command +- sanitized network diagram +- screenshots of GPU utilization and CLI run +- note whether the run used DigitalOcean/AMD GPU, local mock mode, or Fireworks fallback + +## Risks + +| Risk | Impact | Mitigation | +|---|---|---| +| AMD Developer Cloud access delayed | High | Build DO baseline and use Fireworks fallback | +| GPU image/ROCm mismatch | Medium | Prefer provider image; document exact versions | +| Model too large or slow | Medium | Start with one model for both red and blue roles | +| Endpoint exposed publicly | High | restrict ingress, use token auth, document network boundary | +| Case study overclaims private inference | High | label each run by actual endpoint type | diff --git a/docs/cost-report.md b/docs/cost-report.md index e7bb84f..19fcdb7 100644 --- a/docs/cost-report.md +++ b/docs/cost-report.md @@ -12,6 +12,7 @@ V1 is designed to keep cloud spend near zero by default. Offline mode runs local | Offline demo | 0 | no cloud or model endpoint | | LocalStack Azure | depends on LocalStack access | requires auth token | | AMD Developer Cloud | hackathon credits | used for MI300X model endpoint | +| Fireworks fallback | provider dependent | contingency if GPU access is delayed | | GitHub Actions | low/free tier dependent | tests are lightweight | ## Controls diff --git a/docs/runbook.md b/docs/runbook.md index 84a9728..900a2b5 100644 --- a/docs/runbook.md +++ b/docs/runbook.md @@ -41,6 +41,14 @@ $env:NULLSTATE_LLM_API_KEY = "" Then run without `--offline`. +## AMD Developer Cloud / DigitalOcean path + +Use [AMD Compute Strategy](compute-strategy.md) as the deployment checklist. Build the non-GPU DigitalOcean baseline first, then attach the MI300X-backed model endpoint when access is available. 
+
+## Fireworks fallback
+
+If AMD GPU access is delayed, point `NULLSTATE_LLM_BASE_URL` at the managed endpoint and keep the same nullstate run flow. Label the evidence as managed inference, not private GPU-hosted inference.
+
 
 ## Artifact review before publishing
 
 Check: