Parent: #386 (Project Bespoke epic)
Goal
Validate that structured pruning of Gemma 4 31B is feasible and prepare all tooling for the MI300X pruning session.
Tasks
Research & Code Review
Architecture Analysis
MI300X Preparation
Sliding/Full Attention Constraint (Carmack flag)
VRAM Budget Concern
31B params in bf16 = 62GB. With Adam (bf16 moments): params (62GB) + grads (62GB) + optimizer state m+v (124GB) = 248GB, over the 192GB of a single MI300X.
Options:
- Muon optimizer (keeps a single momentum buffer but drops the second-moment state for matrix params, roughly halving optimizer state) → 62 + 62 + 62 = 186GB, tight fit
- Gradient checkpointing + activation offload → cuts activation memory, but does not reduce the 248GB of persistent state
- Mixed precision with DeepSpeed ZeRO: bf16 params + fp32 optimizer state, partitioned or offloaded → standard approach, but partitioning only helps with multiple devices or CPU offload
- 8-bit Adam (bitsandbytes) → 62 + 62 + 62 = 186GB, fits
Resolve before Phase 1.
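The budget arithmetic above can be sketched as a quick calculator. Byte counts per optimizer state are assumptions (bf16 Adam moments at 2 bytes each, 8-bit Adam states at 1 byte each, a single bf16 momentum buffer for Muon); activations are deliberately excluded since they depend on batch size and checkpointing.

```python
# Back-of-envelope VRAM budget for full-parameter training of a 31B model.
# Persistent state only: params + grads + optimizer. Activations excluded.

N_PARAMS = 31e9
GB = 1e9  # decimal GB, matching the figures in the issue


def budget_gb(param_bytes=2, grad_bytes=2, opt_bytes=4):
    """Persistent training state in GB for the given bytes-per-param."""
    return N_PARAMS * (param_bytes + grad_bytes + opt_bytes) / GB


# Adam with bf16 moments: m + v at 2 bytes each -> 4 bytes/param of state
adam_bf16 = budget_gb(opt_bytes=4)
# 8-bit Adam (bitsandbytes): m + v at 1 byte each -> 2 bytes/param
adam_8bit = budget_gb(opt_bytes=2)
# Muon: one bf16 momentum buffer -> 2 bytes/param (assumption)
muon = budget_gb(opt_bytes=2)

print(f"Adam (bf16 states): {adam_bf16:.0f} GB")  # 248 GB > 192 GB MI300X
print(f"8-bit Adam:         {adam_8bit:.0f} GB")  # 186 GB, fits
print(f"Muon:               {muon:.0f} GB")       # 186 GB, fits
```

Swapping `opt_bytes=8` models fp32 Adam moments (372GB total), which is why the bf16/8-bit variants are the only single-GPU candidates here.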
Definition of Done
- Sheared LLaMA adaptation code written and tested on a small model (Gemma 4 E2B as proxy)
- Gemma 4 31B layer importance profile completed
- Target architectures defined for each pruning level
- MI300X scripts ready to run
- VRAM budget resolved