feat: Recipe Job Init Experience (hyp init hyp-recipe-job) #409
Merged
mollyheamazon merged 23 commits into aws:main on Apr 16, 2026
* model customization init/find model
* Adding direct create exp
* Model customization Init/Create/Find
* Latest model cust changes
* init migration done with template validation
* Init full experience migrated, CRUDL simple addition in hyp_cli.py, unit tests added, pending nova forge happy case for integ test
* remove argcomplete since it is not supported yet
* add reset command for dynamic template
* fix integ test error for init flow
* remove recipe finder and discovery changes
Co-authored-by: Amarjeet LNU <jamjee@amazon.com>
…efactor code for modularization, unit test added (aws#292)
…l, remove direct create support (aws#297)
* bug fix for matching instance type for override params and delete command
* add pre-training-job and evaluation-job, set instance-type to optional, remove direct create support
* update checkpointless flag to framework to support more modes
…e hub support, remove dynamic template
…eg test and example notebook, pending recipe update
nargokul
approved these changes
Apr 16, 2026
jam-jee
approved these changes
Apr 16, 2026
What's changing and why?
This PR introduces hyp-recipe-job as a new template for hyp init. It lets customers initialize, configure, and submit fine-tuning and evaluation jobs backed by SageMaker JumpStart Hub recipes without manually managing recipe files, GitHub submodules, or S3 URIs.
Previously, HyperPod CLI V3 had no support for recipe-based jobs. This gap is now addressed by fetching recipes directly from JumpStart Hub APIs at init time.
User Experience Flow
1. Initialize a recipe job
This fetches the matching recipe from JumpStart Hub, downloads the override params and k8s template, and generates:
* config.yaml: grouped, annotated parameter file (Job Identity → Data → Output → Hyperparameters → MLflow → Compute → Model)
* k8s.jinja: Kubernetes job manifest template
* .override_spec.json: local schema for validation and configure
If --instance-type is omitted, an interactive cluster selection is launched: it lists your HyperPod clusters filtered to those with instance types supported by the recipe, prompts for a selection, and automatically updates kubeconfig.
2a. Edit config.yaml
Users fill in required fields (data paths, output paths, job name) and optionally tune hyperparameters. The file is grouped with section headers and inline type/constraint comments.
2b. Configure individual fields
3. Validate config.yaml
Validates the current directory's config.yaml against the recipe's parameter schema (.override_spec.json). Checks required fields are present, types are correct, and values satisfy constraints (min/max/enum). Run this after editing to catch errors before submission.
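The checks described above (required fields, types, min/max/enum constraints) can be sketched roughly as follows. This is a minimal illustration, not the CLI's implementation: the layout of .override_spec.json is not shown in this PR, so the `SPEC` structure, its keys (`type`, `required`, `min`, `max`, `enum`), the parameter names, and the `validate_config` helper are all assumptions.

```python
from typing import Any

# Hypothetical spec layout; the real .override_spec.json format may differ.
SPEC: dict[str, dict[str, Any]] = {
    "job_name": {"type": "string", "required": True},
    "learning_rate": {"type": "float", "min": 0.0, "max": 1.0},
    "peft_type": {"type": "string", "enum": ["lora", "full"]},
}

# Maps the assumed spec type names to Python types.
_TYPES = {"string": str, "integer": int, "float": (int, float), "boolean": bool}


def validate_config(config: dict[str, Any], spec: dict[str, dict[str, Any]]) -> list[str]:
    """Return a list of human-readable errors; an empty list means the config is valid."""
    errors: list[str] = []
    for name, rule in spec.items():
        value = config.get(name)
        if value is None:
            if rule.get("required"):
                errors.append(f"{name}: required field is missing")
            continue

        # bool is a subclass of int, so it is only accepted for boolean fields.
        is_bool = isinstance(value, bool)
        if rule["type"] == "boolean":
            type_ok = is_bool
        else:
            type_ok = isinstance(value, _TYPES[rule["type"]]) and not is_bool
        if not type_ok:
            errors.append(f"{name}: expected {rule['type']}, got {type(value).__name__}")
            continue

        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: {value} is below the minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name}: {value} is above the maximum {rule['max']}")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: {value!r} is not one of {rule['enum']}")
    return errors
```

Collecting all errors in one pass, rather than stopping at the first, lets a user fix every problem in config.yaml before re-running validation.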
4. Submit the job
Validates config, warns if the instance type isn't present in the current cluster, renders the k8s manifest, and submits to Kubernetes. A timestamped snapshot is saved under run/<timestamp>/.
Debugging & Job Management
Once a recipe job is submitted, the full set of job management commands is available under hyp-recipe-job:
Features
Model ID formats supported
* meta-textgeneration-llama-3-1-8b-instruct
* meta-llama/Llama-3.1-8B-Instruct (resolved via list_hub_contents + @recipe: keyword filter, reference doc)
* arn:aws:sagemaker:...:hub-content/MyHub/Model/my-model/1.0.0 (private hub support for internal team development)
Techniques supported
Grouped config.yaml rendering
Reference doc
Parameters are ordered by user priority (not alphabetically): Job Identity → Data → Output → Core Hyperparameters → Advanced Hyperparameters → MLflow → Compute → Model. Unknown params from future recipes fall into an Other section automatically.
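As a rough illustration of this ordering, the sketch below groups a flat parameter dict into the sections listed above and routes anything unrecognized to Other. The section names come from this description; the `PARAM_SECTIONS` mapping, the parameter names, and both helper functions are hypothetical, and `render` only handles simple scalar values rather than producing general YAML.

```python
from typing import Any

# Section order taken from the description above; "Other" catches unknown params.
SECTION_ORDER = [
    "Job Identity", "Data", "Output", "Core Hyperparameters",
    "Advanced Hyperparameters", "MLflow", "Compute", "Model",
]

# Hypothetical mapping from parameter name to section.
PARAM_SECTIONS = {
    "name": "Job Identity",
    "data_path": "Data",
    "output_path": "Output",
    "learning_rate": "Core Hyperparameters",
    "instance_type": "Compute",
}


def group_params(params: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Bucket params by section, preserving SECTION_ORDER and dropping empty sections."""
    grouped: dict[str, dict[str, Any]] = {s: {} for s in SECTION_ORDER + ["Other"]}
    for key, value in params.items():
        grouped[PARAM_SECTIONS.get(key, "Other")][key] = value
    return {section: values for section, values in grouped.items() if values}


def render(grouped: dict[str, dict[str, Any]]) -> str:
    """Emit section header comments followed by key: value lines (scalars only)."""
    lines: list[str] = []
    for section, values in grouped.items():
        lines.append(f"# --- {section} ---")
        lines.extend(f"{key}: {value}" for key, value in values.items())
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Because unknown keys default to Other, a recipe that introduces a new parameter would still render without requiring a change to the mapping.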
Instance type warning at submit time
If the instance_type in config.yaml doesn't match any node in the current cluster, a warning is shown before submission (non-blocking).
Edge Cases
Considerations
Integration test: currently pending a downstream recipe fix for the end-to-end happy case.
FSx dependency: Recipe k8s templates reference a fsx-claim PVC. This is a cluster infrastructure prerequisite — not something the CLI can provision. Documented in the getting started guide.
HuggingFace ID resolution: Uses list_hub_contents + @recipe: keyword filter dynamically. Works for 7/9 open-weight models; 2 edge cases (DeepSeek, Qwen3-0.6B) use a small static fallback table. A long-term fix would be asking JumpStart to add @hf_model_id: as a searchable keyword.
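The HuggingFace ID resolution with a static fallback could look something like the sketch below. It is only an illustration under several assumptions: the matching heuristic (normalizing the ID and comparing it to the end of the hub content name) is invented here, the fallback entry is a placeholder rather than one of the real edge cases, and `summaries` is assumed to be a list of dicts shaped like the entries returned by list_hub_contents (with `HubContentName` and `HubContentSearchKeywords` keys).

```python
from typing import Any, Iterable, Optional

# Placeholder entry only; the real fallback table is not shown in this PR.
STATIC_FALLBACK = {
    "example-org/Edge-Case-Model": "example-textgeneration-edge-case",
}


def normalize(hf_id: str) -> str:
    """'meta-llama/Llama-3.1-8B-Instruct' -> 'llama-3-1-8b-instruct' (assumed heuristic)."""
    return hf_id.split("/")[-1].lower().replace(".", "-")


def resolve_hf_id(
    hf_id: str,
    summaries: Iterable[dict[str, Any]],
    fallback: dict[str, str] = STATIC_FALLBACK,
) -> Optional[str]:
    """Return a hub content name for hf_id, or None if nothing matches."""
    target = normalize(hf_id)
    for item in summaries:
        keywords = item.get("HubContentSearchKeywords", [])
        # Only consider hub contents that advertise a recipe keyword.
        has_recipe = any(k.startswith("@recipe:") for k in keywords)
        if has_recipe and item["HubContentName"].endswith(target):
            return item["HubContentName"]
    # Dynamic lookup failed; fall back to the small static table.
    return fallback.get(hf_id)
```

Trying the dynamic lookup first keeps the static table limited to the edge cases, so it can shrink or disappear if a searchable HuggingFace ID keyword is added upstream.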
Reviewer Guidelines
One of the following must be true: