Tools, guides, and workflows for running AI training and inference workloads on the AMD Instinct MI60 GPU.
- **Hardware Setup** — Driver installation, system requirements, troubleshooting, and cooling guides for single- and dual-GPU setups. Includes a dual-duct cooling STL.
- **Inference (vLLM)** — Production inference using vLLM with tensor parallelism. Covers why vLLM, ROCm compatibility, AWQ quantization, and the big-chat configuration example.
- **Stable Diffusion** — Image generation with ROCm acceleration. Container setup, model installation, and troubleshooting.
- **LoRA Training** — Docker-based workflow for training LoRA adapters, merging them with base models, and converting the result to GGUF format.
- **GPU State Service** — Dynamic switching between GPU configurations (big-chat, coder, etc.) via HTTP API. Includes the state machine and API reference.
- **Monitoring** — Prometheus metrics, temperature alerts, and Grafana dashboard setup for GPU health monitoring (see the sketch below).
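
To illustrate the kind of exporter the monitoring guide wires up, here is a minimal sketch that polls `rocm-smi` for edge temperatures and republishes them as a Prometheus gauge. It assumes the `prometheus_client` package and `rocm-smi --showtemp --json` output; the metric name, port, and JSON field names are illustrative (field names vary across ROCm releases), so treat the parsing as a starting point rather than the repo's actual exporter.

```python
import json
import subprocess
import time

from prometheus_client import Gauge, start_http_server

# One gauge, labeled per card; Prometheus scrapes it from :9400/metrics.
# Both the metric name and the port are illustrative choices.
GPU_TEMP = Gauge("mi60_edge_temp_celsius", "GPU edge temperature", ["card"])

def poll_temps() -> None:
    # rocm-smi --showtemp --json prints a dict keyed by card, e.g. "card0".
    out = subprocess.run(
        ["rocm-smi", "--showtemp", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    for card, fields in json.loads(out).items():
        for key, value in fields.items():
            # Temperature field names differ across ROCm releases; match loosely.
            if "edge" in key.lower():
                GPU_TEMP.labels(card=card).set(float(value))

if __name__ == "__main__":
    start_http_server(9400)  # expose /metrics for Prometheus to scrape
    while True:
        poll_temps()
        time.sleep(5)
```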
| Spec | Value |
|---|---|
| Memory | 32GB HBM2 per GPU (64GB total in a dual-GPU setup) |
| FP64 | 7.4 TFLOPS |
| FP32 | 14.7 TFLOPS |
| Interface | PCIe Gen4 |
| Architecture | gfx906 (Vega 20) |
- Set up hardware per Hardware Setup
- Install ROCm 6.x and verify with `rocm-smi`
- Start the gpu-state-service: `python3 gpu-state-service.py`
- Switch to a configuration:

  ```bash
  curl -X POST -H "Content-Type: application/json" \
    -d '{"config":"big-chat"}' http://localhost:9100/switch
  ```

- Query the model at `http://localhost:8000` (OpenAI-compatible API)
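
For scripted use, the same two calls work from Python. This is a minimal sketch: the `/switch` endpoint and ports come from the steps above, the `/v1/chat/completions` route is the standard OpenAI-compatible path vLLM serves, and the `model` value is a placeholder for whatever model id the active configuration loads.

```python
import requests

# Ask gpu-state-service to bring up the big-chat configuration.
r = requests.post(
    "http://localhost:9100/switch",
    json={"config": "big-chat"},
    timeout=300,  # a switch reloads models, so allow plenty of time
)
r.raise_for_status()

# Query the vLLM server through its OpenAI-compatible API.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "big-chat",  # placeholder; use the model id the config serves
        "messages": [{"role": "user", "content": "Hello from the MI60!"}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```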
- Linux (Ubuntu 22.04/24.04 recommended)
- ROCm 6.x
- Power delivery for 300W per GPU
- Adequate cooling (see hardware-setup)
- 32GB+ system RAM (64GB+ for dual-GPU)
- containerd with nerdctl
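
Once ROCm 6.x is installed, a quick way to confirm the GPUs are visible from Python is the snippet below. It assumes a ROCm build of PyTorch, which exposes HIP devices through the `torch.cuda` API; `rocm-smi` remains the authoritative check.

```python
import torch

# ROCm builds of PyTorch report HIP devices through the torch.cuda namespace.
print("ROCm available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    # Device name as reported by the ROCm runtime (wording varies by release).
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```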
MIT License - see LICENSE file.