The Silver Path

Pre-release

@dferguson992 dferguson992 released this 21 May 20:39
· 91 commits to main since this release
ee9d8c3

Release: The Silver Path (v0.9.0)

Overview

The Silver Path release focuses the ML Container Creator on a curated, fully tested set of models and instances: the "golden path" models that support both serverless fine-tuning and automated inference recommendations, served on NVIDIA A10G (g5) instances. Every model in this release has been validated end-to-end: generate → build → deploy → test → tune → deploy adapter → clean. It is called the "silver path" release because it still has many paper cuts and known issues that need to be ironed out in subsequent patches.

What's New Since v0.6.0

Managed Model Customization (do/tune)

Serverless fine-tuning with zero infrastructure management. Provide a dataset and technique — SageMaker handles the rest.

./do/tune --technique sft --dataset s3://my-bucket/train.jsonl
./do/tune --technique dpo --dataset hf://my-org/preference-data
./do/adapter add tuned-sft --from-tune
  • Supported techniques: SFT, DPO, RLAIF, RLVR
  • Supported models: Qwen 2.5/3, Llama 3.1/3.2/3.3, DeepSeek R1 Distill, GPT-OSS
  • Output types: LoRA adapter weights or full merged model
  • Integration: Output feeds directly into do/adapter add --from-tune or do/add-ic --from-tune
  • Idempotency: Re-running resumes or reports existing jobs
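
The two commands above pass datasets as s3:// and hf:// URIs. As a rough illustration of how such a URI could be split into its parts, here is a minimal Python sketch. The function name parse_dataset_uri and the returned dictionary keys are hypothetical and are not taken from the tool itself.

```python
from urllib.parse import urlsplit


def parse_dataset_uri(uri: str) -> dict:
    """Split a dataset URI into its source and location parts (illustrative only)."""
    parts = urlsplit(uri)
    if parts.scheme == "s3":
        # s3://bucket/key -> the bucket is the netloc, the key is the path without "/"
        return {"source": "s3", "bucket": parts.netloc, "key": parts.path.lstrip("/")}
    if parts.scheme == "hf":
        # hf://org/name -> the Hub repository id is "org/name"
        return {"source": "huggingface", "repo_id": f"{parts.netloc}{parts.path}"}
    raise ValueError(f"unsupported dataset URI scheme: {parts.scheme!r}")


s3_dataset = parse_dataset_uri("s3://my-bucket/train.jsonl")
hf_dataset = parse_dataset_uri("hf://my-org/preference-data")
```

Rejecting unknown schemes up front means a mistyped dataset location fails before any job is submitted.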

BYOC Training (do/train)

Custom training jobs with your own scripts, containers, and hyperparameters.

# Configure in do/training/config.yaml, then:
./do/train
./do/train --status
./do/train --dry-run
  • YAML configuration: Instance type, script path, dataset, hyperparameters, spot training, checkpoints
  • Managed spot training: Reduce costs with automatic checkpoint resumption
  • Multi-instance: Distributed training across multiple instances
  • Feedback loop: Detects output type (adapter vs full model) and suggests deployment commands
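
One plausible way to tell an adapter from a full model is to look at the files a training job wrote. The sketch below assumes the job saved its output in the Hugging Face PEFT / Transformers layout, where a LoRA adapter directory contains adapter_config.json and a full model contains config.json. Whether do/train uses this exact check is not stated in these notes; detect_output_type is a hypothetical name.

```python
def detect_output_type(filenames):
    """Classify training output as 'adapter', 'full_model', or 'unknown'.

    Assumes the PEFT / Transformers file layout; this is an illustration,
    not the tool's actual detection logic.
    """
    # Compare on base names so S3 keys and local paths behave the same way.
    names = {name.rsplit("/", 1)[-1] for name in filenames}
    if "adapter_config.json" in names:
        return "adapter"
    if "config.json" in names:
        return "full_model"
    return "unknown"


adapter_kind = detect_output_type(
    ["output/adapter_config.json", "output/adapter_model.safetensors"]
)
full_kind = detect_output_type(["output/config.json", "output/model.safetensors"])
```

The adapter check runs first because an adapter directory may sit next to a copy of the base model's config.json.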

Notebook Export (do/export --notebook)

Generate a self-contained Jupyter notebook for deploying from SageMaker Studio.

./do/export --notebook
  • 10 sections: Setup → Config → Build → Model → Endpoint → IC → Test → LoRA Adapter → Fine-Tune → Cleanup
  • Deployment targets: Realtime, async, batch (conditional sections)
  • LoRA adapter section: Attach adapters to running endpoints (conditional on --enable-lora)
  • Fine-tune section: Submit managed customization jobs from the notebook (conditional on TUNE_SUPPORTED)
  • Python generator: No bash heredoc nightmare — clean programmatic JSON construction
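
To illustrate what programmatic notebook construction with conditional sections can look like, here is a minimal sketch using only the standard library. It follows the public nbformat 4 JSON layout. The section titles come from the list above, but the flag names, helper names, and placeholder cell contents are assumptions and do not reflect the generator's real code.

```python
import json

# (section title, flag that must be true to include it, or None if always included)
SECTIONS = [
    ("Setup", None), ("Config", None), ("Build", None), ("Model", None),
    ("Endpoint", None), ("IC", None), ("Test", None),
    ("LoRA Adapter", "enable_lora"),
    ("Fine-Tune", "tune_supported"),
    ("Cleanup", None),
]


def markdown_cell(text):
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}


def code_cell(code):
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": [code],
    }


def build_notebook(flags):
    """Build an nbformat 4 notebook dict, skipping sections whose flag is off."""
    cells = []
    for title, flag in SECTIONS:
        if flag is not None and not flags.get(flag, False):
            continue
        cells.append(markdown_cell(f"## {title}"))
        cells.append(code_cell(f"# placeholder for the {title} step"))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook({"enable_lora": True, "tune_supported": False})
notebook_json = json.dumps(notebook, indent=1)
```

Building a dictionary and serializing it once with json.dumps avoids the quoting and escaping problems that come with assembling notebook JSON from shell heredocs.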

AWS Marketplace Model Packages

Deploy vendor models without building containers.

ml-container-creator --deployment-config marketplace --model-name "marketplace://arn:aws:sagemaker:..."
  • No Dockerfile, no build, no push — just deploy/test/benchmark/clean
  • Subscription picker MCP server — discovers active Marketplace subscriptions
  • Validation: ARN format, subscription status, instance type compatibility
  • Same lifecycle UX: do/deploy, do/test, do/benchmark, do/clean
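
The ARN format check mentioned above could be approximated as follows. This sketch assumes the marketplace:// prefix wraps a SageMaker model package ARN of the form arn:<partition>:sagemaker:<region>:<account>:model-package/<name>. The function name, the regular expression, and the example ARN are illustrative; the tool's real validation also covers subscription status and instance compatibility, which are not modeled here.

```python
import re

PREFIX = "marketplace://"

# arn:<partition>:sagemaker:<region>:<12-digit account>:model-package/<name>[/<version>]
ARN_PATTERN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:model-package/[A-Za-z0-9-]+(/\d+)?$"
)


def parse_marketplace_model(model_name):
    """Return the model package ARN, or raise ValueError if the format is wrong."""
    if not model_name.startswith(PREFIX):
        raise ValueError("expected a marketplace:// model name")
    arn = model_name[len(PREFIX):]
    if not ARN_PATTERN.match(arn):
        raise ValueError(f"not a SageMaker model package ARN: {arn}")
    return arn


# Made-up ARN used purely for illustration.
example_arn = parse_marketplace_model(
    "marketplace://arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model"
)
```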

Golden-Path Catalog

Focused catalog for reliable, tested deployments:

Category | Scope
Instances | g5 family (8 types: xlarge → 48xlarge)
Models | 22 tune-supported models across 5 families
Servers | vLLM, SGLang, TensorRT-LLM, LMI/DJL, Flask

E2E Validation Runner

Automated end-to-end testing across configurations.

node scripts/e2e-runner.js --tier ci
  • Tiered testing: CI (4 configs, ~30 min), nightly (+12), weekly (+7)
  • Lifecycle execution: build → push → deploy → test → clean (extensible)
  • Bounded parallelism: Configurable concurrency limit
  • Results: JSON + markdown summary, S3 upload, SNS on failure
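
Bounded parallelism means that no more than a fixed number of configurations run at once, however many are queued. The runner itself is a Node script; the sketch below only illustrates the idea in Python with a thread pool. run_all, fake_lifecycle, and the configuration names are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_all(configs, run_one, limit):
    """Run run_one(config) for every config, with at most `limit` running at once."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        # pool.map returns results in the same order as the input configs.
        return list(pool.map(run_one, configs))


# Track how many fake lifecycles overlap, to show the limit is respected.
stats = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_lifecycle(config):
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # stands in for build -> push -> deploy -> test -> clean
    with lock:
        stats["active"] -= 1
    return (config, "passed")


results = run_all(["cfg-a", "cfg-b", "cfg-c", "cfg-d"], fake_lifecycle, limit=2)
```

A small limit keeps the number of simultaneously deployed endpoints, and therefore cost and quota usage, under control.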

Infrastructure Improvements

  • MCP server portability: Relative paths in config (no more machine-specific absolute paths)
  • Bootstrap stack: Tune IAM permissions, S3 buckets, MLflow integration
  • LoRA adapter lifecycle: do/adapter add/list/remove/update for multi-adapter serving
  • Multi-IC endpoints: Instance pools, priority-based fallback, heterogeneous instance types
  • SageMaker AI Benchmarking: do/benchmark with AIPerf integration
  • Secrets Manager: Zero-knowledge operation for HF tokens and NGC API keys
  • Schema-driven validation: Every AWS API payload validated against service model before submission

Breaking Changes

  • JumpStart model source removed — Use HuggingFace model IDs directly (e.g., meta-llama/Llama-3.1-8B-Instruct instead of jumpstart://...)
  • Instance catalog trimmed — Only g5 family instances available. Other families (g4dn, g6e, p-series, inf2) will return in future releases as they're validated.
  • Diffusors catalog emptied — Diffusion models (Stable Diffusion, FLUX) are temporarily removed from the golden path. The architecture still supports them.

Supported Models (Golden Path)

Family | Models | Techniques
Qwen 2.5 | 7B, 14B, 32B, 72B | SFT, DPO, RLVR (72B)
Qwen 3 | 0.6B, 1.7B, 4B, 8B, 14B, 32B | SFT, DPO, RLVR (32B)
Llama 3 | 3.2-1B, 3.2-3B, 3.1-8B, 3.3-70B | SFT, DPO, RLVR, RLAIF (70B)
DeepSeek R1 | Distill-Qwen 1.5B/7B/14B/32B, Distill-Llama 8B/70B | SFT
GPT-OSS | 20B, 120B | SFT, DPO

Instance Recommendations

Model Size | Recommended Instance | GPUs | VRAM
≤8B (bf16) | ml.g5.xlarge | 1 | 24 GB
8B-14B (bf16) | ml.g5.2xlarge | 1 | 24 GB
14B-32B (bf16) | ml.g5.12xlarge | 4 | 96 GB
32B-72B (bf16) | ml.g5.48xlarge | 8 | 192 GB
70B+ (AWQ 4-bit) | ml.g5.12xlarge | 4 | 96 GB
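
A quick way to sanity-check a row in this table is to estimate the memory needed for the weights alone: parameter count times bits per parameter, divided by 8. The sketch below does only that arithmetic. It ignores the KV cache, activations, and runtime overhead, so real requirements are higher; the function names are illustrative.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate memory for model weights only, in GB (10**9 bytes)."""
    return params_billions * bits_per_param / 8


def weights_fit(params_billions, bits_per_param, vram_gb):
    """True if the weights alone fit in the given total VRAM."""
    return weight_memory_gb(params_billions, bits_per_param) <= vram_gb


small_bf16 = weight_memory_gb(8, 16)    # 8B in bf16
large_bf16 = weight_memory_gb(70, 16)   # 70B in bf16
large_awq = weight_memory_gb(70, 4)     # 70B in 4-bit AWQ
```

By this estimate a 70B model needs about 140 GB in bf16 but about 35 GB at 4 bits, which is consistent with the table placing quantized 70B+ models on the 96 GB ml.g5.12xlarge.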

What's Next

  • g6 instance family: Newer NVIDIA L4 GPUs with better price/performance
  • Diffusion models: Restoring the diffusion model support removed from this release's golden path
  • HyperPod notebook: Kubernetes-based deployment notebook
  • Hyperparameter tuning: SageMaker HPO integration
  • Full e2e CI: Automated nightly/weekly validation across all tiers