The Silver Path
Pre-release
Release: The Silver Path (v0.9.0)
Overview
The Silver Path release focuses the ML Container Creator on a curated, fully tested set of models and instances: the "golden path" models that support both serverless fine-tuning and automated inference recommendations, served on NVIDIA A10G (g5) instances. Every model in this release has been validated end-to-end: generate → build → deploy → test → tune → deploy adapter → clean. It is called the "silver path" release because it still has many paper cuts and known issues that need to be ironed out in subsequent patches.
What's New Since v0.6.0
Managed Model Customization (do/tune)
Serverless fine-tuning with zero infrastructure management. Provide a dataset and technique — SageMaker handles the rest.
./do/tune --technique sft --dataset s3://my-bucket/train.jsonl
./do/tune --technique dpo --dataset hf://my-org/preference-data
./do/adapter add tuned-sft --from-tune
- Supported techniques: SFT, DPO, RLAIF, RLVR
- Supported models: Qwen 2.5/3, Llama 3.1/3.2/3.3, DeepSeek R1 Distill, GPT-OSS
- Output types: LoRA adapter weights or full merged model
- Integration: Output feeds directly into do/adapter add --from-tune or do/add-ic --from-tune
- Idempotency: Re-running resumes or reports existing jobs
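The two dataset arguments shown above use different URI schemes (s3:// and hf://). As a rough illustration of how such a reference could be split into a source and a location, here is a minimal Python sketch. `DatasetRef` and `parse_dataset_uri` are hypothetical names invented for this example; how do/tune actually parses its arguments is not described in these notes and may differ.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class DatasetRef(NamedTuple):
    source: str    # "s3" or "hf"
    location: str  # bucket/key for S3, org/name for a Hugging Face dataset


def parse_dataset_uri(uri: str) -> DatasetRef:
    """Split a dataset URI into scheme and location (illustrative only)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "hf"):
        raise ValueError(f"unsupported dataset scheme: {parsed.scheme!r}")
    location = f"{parsed.netloc}{parsed.path}"
    if not location:
        raise ValueError("dataset URI has no location")
    return DatasetRef(parsed.scheme, location)


print(parse_dataset_uri("s3://my-bucket/train.jsonl"))
print(parse_dataset_uri("hf://my-org/preference-data"))
```

Rejecting unknown schemes up front means a typo in the dataset argument fails before any job is submitted.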
BYOC Training (do/train)
Custom training jobs with your own scripts, containers, and hyperparameters.
# Configure in do/training/config.yaml, then:
./do/train
./do/train --status
./do/train --dry-run
- YAML configuration: Instance type, script path, dataset, hyperparameters, spot training, checkpoints
- Managed spot training: Reduce costs with automatic checkpoint resumption
- Multi-instance: Distributed training across multiple GPUs
- Feedback loop: Detects output type (adapter vs full model) and suggests deployment commands
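The feedback loop distinguishes adapter output from full-model output. One plausible way to make that distinction is to look for the files each kind of output contains. The sketch below assumes adapters follow the Hugging Face PEFT layout (an adapter_config.json next to the weights) and that full models ship a config.json. The function names are hypothetical, and the detection logic do/train really uses is not documented here.

```python
from pathlib import Path


def detect_output_type(model_dir: str) -> str:
    """Classify a training output directory as "adapter" or "full-model".

    Assumption: adapters use the Hugging Face PEFT layout, which writes
    adapter_config.json; full models carry a top-level config.json.
    """
    root = Path(model_dir)
    if (root / "adapter_config.json").is_file():
        return "adapter"
    if (root / "config.json").is_file():
        return "full-model"
    raise ValueError(f"no recognizable model files in {model_dir}")


def suggest_next_step(output_type: str) -> str:
    """Map the detected type to a follow-up hint (wording is illustrative)."""
    hints = {
        "adapter": "attach the weights to a running endpoint with do/adapter add",
        "full-model": "deploy the merged model with do/deploy",
    }
    return hints[output_type]
```

The adapter check runs first because a PEFT output directory can also contain other JSON files, while adapter_config.json is specific to adapters.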
Notebook Export (do/export --notebook)
Generate a self-contained Jupyter notebook for deploying from SageMaker Studio.
./do/export --notebook
- 10 sections: Setup → Config → Build → Model → Endpoint → IC → Test → LoRA Adapter → Fine-Tune → Cleanup
- Deployment targets: Realtime, async, batch (conditional sections)
- LoRA adapter section: Attach adapters to running endpoints (conditional on --enable-lora)
- Fine-tune section: Submit managed customization jobs from the notebook (conditional on TUNE_SUPPORTED)
- Python generator: No bash heredoc nightmare; clean programmatic JSON construction
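To make "programmatic JSON construction" concrete, here is a minimal sketch that builds a Jupyter notebook (nbformat 4) as a plain dictionary using only the standard library. The section names come from the list above, but the function names, the placeholder cell contents, and the way the two conditional sections are toggled are assumptions for illustration, not the generator's actual code.

```python
import json


def markdown_cell(text: str) -> dict:
    return {"cell_type": "markdown", "metadata": {},
            "source": text.splitlines(keepends=True)}


def code_cell(code: str) -> dict:
    return {"cell_type": "code", "metadata": {}, "execution_count": None,
            "outputs": [], "source": code.splitlines(keepends=True)}


def build_notebook(enable_lora: bool, tune_supported: bool) -> dict:
    """Assemble section cells, skipping the conditional sections when disabled."""
    sections = ["Setup", "Config", "Build", "Model", "Endpoint", "IC", "Test"]
    if enable_lora:
        sections.append("LoRA Adapter")
    if tune_supported:
        sections.append("Fine-Tune")
    sections.append("Cleanup")

    cells = []
    for title in sections:
        cells.append(markdown_cell(f"## {title}"))
        cells.append(code_cell(f"# placeholder for the {title} step"))
    # nbformat_minor 4 is used so that per-cell ids are not required.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook(enable_lora=True, tune_supported=False)
print(json.dumps(notebook, indent=1)[:200])
```

Building the structure as data and serializing it once with `json.dumps` avoids the quoting and escaping problems that come with emitting notebook JSON from shell heredocs.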
AWS Marketplace Model Packages
Deploy vendor models without building containers.
ml-container-creator --deployment-config marketplace --model-name "marketplace://arn:aws:sagemaker:..."
- No Dockerfile, no build, no push: just deploy/test/benchmark/clean
- Subscription picker MCP server — discovers active Marketplace subscriptions
- Validation: ARN format, subscription status, instance type compatibility
- Same lifecycle UX: do/deploy, do/test, do/benchmark, do/clean
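As an illustration of the ARN format check mentioned above, the following sketch strips the marketplace:// prefix and matches the remainder against a pattern for SageMaker model package ARNs. The regular expression is an approximation written for this example, and the ARN in the usage line is a placeholder with a dummy account ID. The tool's real validation, including the subscription and instance type checks, is not shown in these notes.

```python
import re

MARKETPLACE_PREFIX = "marketplace://"

# Assumed shape:
#   arn:<partition>:sagemaker:<region>:<12-digit account>:model-package/<name>
MODEL_PACKAGE_ARN = re.compile(
    r"arn:(?:aws|aws-cn|aws-us-gov):sagemaker:"
    r"[a-z]{2}(?:-[a-z]+)+-\d:"                      # region, e.g. us-east-1
    r"\d{12}:"                                       # account ID
    r"model-package/[A-Za-z0-9](?:-*[A-Za-z0-9])*"   # package name
)


def parse_marketplace_model(model_name: str) -> str:
    """Return the ARN from a marketplace:// model name, or raise ValueError."""
    if not model_name.startswith(MARKETPLACE_PREFIX):
        raise ValueError("expected a model name starting with marketplace://")
    arn = model_name[len(MARKETPLACE_PREFIX):]
    if not MODEL_PACKAGE_ARN.fullmatch(arn):
        raise ValueError(f"not a SageMaker model package ARN: {arn!r}")
    return arn


# Placeholder ARN for demonstration only.
print(parse_marketplace_model(
    "marketplace://arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model"
))
```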
Golden-Path Catalog
Focused catalog for reliable, tested deployments:
| Category | Scope |
|---|---|
| Instances | g5 family (8 types: xlarge → 48xlarge) |
| Models | 22 tune-supported models across 5 families |
| Servers | vLLM, SGLang, TensorRT-LLM, LMI/DJL, Flask |
E2E Validation Runner
Automated end-to-end testing across configurations.
node scripts/e2e-runner.js --tier ci
- Tiered testing: CI (4 configs, ~30 min), nightly (+12), weekly (+7)
- Lifecycle execution: build → push → deploy → test → clean (extensible)
- Bounded parallelism: Configurable concurrency limit
- Results: JSON + markdown summary, S3 upload, SNS on failure
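Bounded parallelism means that no more than a fixed number of configurations run at the same time, however many are queued. The runner itself is a Node script. The sketch below shows the same idea in Python with a fixed-size thread pool. `run_config` is a stand-in for one lifecycle run, and all names here are hypothetical.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_config(name: str) -> dict:
    """Stand-in for one lifecycle run (build → push → deploy → test → clean)."""
    time.sleep(0.01)  # simulate work
    return {"config": name, "status": "passed"}


def run_all(configs: list[str], concurrency: int,
            runner: Callable[[str], dict] = run_config) -> list[dict]:
    """Run every config, with at most `concurrency` running at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in input order, not completion order.
        return list(pool.map(runner, configs))


results = run_all(["cfg-a", "cfg-b", "cfg-c", "cfg-d"], concurrency=2)
print(json.dumps(results, indent=2))
```

Capping the pool size keeps the number of simultaneously deployed endpoints, and therefore cost and quota usage, predictable.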
Infrastructure Improvements
- MCP server portability: Relative paths in config (no more machine-specific absolute paths)
- Bootstrap stack: Tune IAM permissions, S3 buckets, MLflow integration
- LoRA adapter lifecycle: do/adapter add/list/remove/update for multi-adapter serving
- Multi-IC endpoints: Instance pools, priority-based fallback, heterogeneous instance types
- SageMaker AI Benchmarking: do/benchmark with AIPerf integration
- Secrets Manager: Zero-knowledge operation for HF tokens and NGC API keys
- Schema-driven validation: Every AWS API payload validated against service model before submission
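Priority-based fallback across instance pools can be pictured as trying pools in priority order until one accepts the placement. The sketch below is a toy model of that idea. `InstancePool`, `place_with_fallback`, and the convention that a lower number means higher priority are all assumptions for illustration, and the capacity check is simulated rather than calling any AWS API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class InstancePool:
    instance_type: str
    priority: int  # assumption: lower number is tried first


def place_with_fallback(
    pools: Iterable[InstancePool],
    try_place: Callable[[InstancePool], bool],
) -> Optional[InstancePool]:
    """Return the first pool, in priority order, that accepts the placement."""
    for pool in sorted(pools, key=lambda p: p.priority):
        if try_place(pool):
            return pool
    return None  # every pool refused


pools = [
    InstancePool("ml.g5.48xlarge", priority=2),
    InstancePool("ml.g5.12xlarge", priority=1),
]
# Simulate the preferred pool being out of capacity.
chosen = place_with_fallback(pools, lambda p: p.instance_type != "ml.g5.12xlarge")
print(chosen)
```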
Breaking Changes
- JumpStart model source removed: Use HuggingFace model IDs directly (e.g., meta-llama/Llama-3.1-8B-Instruct instead of jumpstart://...)
- Instance catalog trimmed: Only g5 family instances available. Other families (g4dn, g6e, p-series, inf2) will return in future releases as they're validated.
- Diffusors catalog emptied: Diffusion models (Stable Diffusion, FLUX) are temporarily removed from the golden path. The architecture still supports them.
Supported Models (Golden Path)
| Family | Models | Techniques |
|---|---|---|
| Qwen 2.5 | 7B, 14B, 32B, 72B | SFT, DPO, RLVR (72B) |
| Qwen 3 | 0.6B, 1.7B, 4B, 8B, 14B, 32B | SFT, DPO, RLVR (32B) |
| Llama 3 | 3.2-1B, 3.2-3B, 3.1-8B, 3.3-70B | SFT, DPO, RLVR, RLAIF (70B) |
| DeepSeek R1 | Distill-Qwen 1.5B/7B/14B/32B, Distill-Llama 8B/70B | SFT |
| GPT-OSS | 20B, 120B | SFT, DPO |
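As a quick cross-check, the table rows can be written out as data and counted. The sizes below are copied from the table; the dictionary layout is just a convenient representation for this example, not the catalog's actual format.

```python
GOLDEN_PATH = {
    "Qwen 2.5": ["7B", "14B", "32B", "72B"],
    "Qwen 3": ["0.6B", "1.7B", "4B", "8B", "14B", "32B"],
    "Llama 3": ["3.2-1B", "3.2-3B", "3.1-8B", "3.3-70B"],
    "DeepSeek R1": [
        "Distill-Qwen 1.5B", "Distill-Qwen 7B", "Distill-Qwen 14B",
        "Distill-Qwen 32B", "Distill-Llama 8B", "Distill-Llama 70B",
    ],
    "GPT-OSS": ["20B", "120B"],
}

total = sum(len(models) for models in GOLDEN_PATH.values())
print(f"{len(GOLDEN_PATH)} families, {total} models")  # → 5 families, 22 models
```

The count matches the "22 tune-supported models across 5 families" figure in the Golden-Path Catalog table.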
Instance Recommendations
| Model Size | Recommended Instance | GPUs | VRAM |
|---|---|---|---|
| ≤8B (bf16) | ml.g5.xlarge | 1 | 24 GB |
| 8B-14B (bf16) | ml.g5.2xlarge | 1 | 24 GB |
| 14B-32B (bf16) | ml.g5.12xlarge | 4 | 96 GB |
| 32B-72B (bf16) | ml.g5.48xlarge | 8 | 192 GB |
| 70B+ (AWQ 4-bit) | ml.g5.12xlarge | 4 | 96 GB |
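The table can also be read as a simple lookup from parameter count to instance type. The sketch below encodes the rows as written and does no memory estimation of its own. Where the size ranges share a boundary (8B, 14B, 32B), it assumes the smaller instance applies, which is an interpretation rather than something the table states. `recommend_instance` is a hypothetical helper.

```python
def recommend_instance(params_b: float, awq_4bit: bool = False) -> str:
    """Look up the recommended g5 instance for a model size in billions.

    Assumption: a size on a shared boundary maps to the smaller instance.
    """
    if awq_4bit:
        if params_b >= 70:
            return "ml.g5.12xlarge"
        raise ValueError("the table lists AWQ 4-bit only for 70B+ models")
    if params_b <= 8:
        return "ml.g5.xlarge"
    if params_b <= 14:
        return "ml.g5.2xlarge"
    if params_b <= 32:
        return "ml.g5.12xlarge"
    if params_b <= 72:
        return "ml.g5.48xlarge"
    raise ValueError("no bf16 recommendation listed above 72B")


for size in (1.7, 14, 32, 70):
    print(size, recommend_instance(size))
print(70, "AWQ", recommend_instance(70, awq_4bit=True))
```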
What's Next
- g6 instance family: Newer NVIDIA L4 GPUs with better price/performance
- Diffusion models: Returning support for diffusion models
- HyperPod notebook: Kubernetes-based deployment notebook
- Hyperparameter tuning: SageMaker HPO integration
- Full e2e CI: Automated nightly/weekly validation across all tiers