feat: Add v1.2 schema support for custom and jumpstart inference endpoints #417

Merged
mollyheamazon merged 9 commits into aws:main from chad119:main
May 7, 2026

@chad119 (Collaborator) commented May 6, 2026

What's changing and why?

Adds v1.2 schema version for custom and jumpstart inference endpoints, introducing new fields supported by the latest inference operator CRD. Also includes bug fixes for template rendering, apiVersion, and schema validation.

New v1.2 features

  • Custom endpoint: replicas, initialReplicaCount, maxDeployTimeInSeconds, instanceTypes (list), tags, nodeAffinity, kubernetes (initContainers, volumes, schedulerName, serviceAccountName), dataCapture, loadBalancer,
    worker.probes, worker.requestLimits, worker.args, worker.command, worker.workingDir, metrics.metricsScrapeIntervalSeconds, metrics.modelMetrics, customCertificateConfig, dnsConfig
  • Jumpstart endpoint: Same new fields where applicable (metrics, loadBalancer, intelligentRouting, kvCache, dnsConfig, dataCapture, customCertificate, env)
  • CLI flags: All new fields exposed as individual CLI flags + JSON passthrough for complex objects (--auto-scaling-spec, --probes, --kubernetes, --node-affinity, --data-capture, --tags)
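
The JSON passthrough flags take a JSON string on the command line. Below is a minimal sketch of how such a value could be parsed and type-checked before it reaches the schema model. `parse_json_flag` is a hypothetical helper written for illustration, not the function this PR uses.

```python
import json
from typing import Any


def parse_json_flag(flag_name: str, raw: str, expected_type: type = dict) -> Any:
    """Parse the string value of a JSON passthrough flag (hypothetical helper)."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Surface the flag name so the user knows which argument was malformed.
        raise ValueError(f"{flag_name} must be valid JSON: {exc.msg}") from exc
    if not isinstance(value, expected_type):
        raise ValueError(
            f"{flag_name} must decode to a {expected_type.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


# Values taken from the CLI example in this PR description.
tags = parse_json_flag("--tags", '{"team": "ml", "env": "prod"}')
kubernetes = parse_json_flag("--kubernetes", '{"serviceAccountName": "my-sa"}')
```

Rejecting non-object JSON early gives a clearer error than letting a list or string fail later during schema validation.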

Bug fixes (template/schema)

  1. Fixed apiVersion in v1.2 template (hyperpod.sagemaker.aws/v1 → inference.sagemaker.aws.amazon.com/v1)
  2. Added all missing fields to the v1.2 Jinja template (were silently dropped from rendered k8s.yaml)
  3. Added missing pattern validations to schema.json for certificate/DNS fields
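
To illustrate what a pattern validation on a DNS field buys, here is a rough sketch using the Kubernetes-style DNS subdomain rule (lowercase alphanumeric labels separated by dots, at most 253 characters). The patterns actually added to schema.json are not shown in this description and may differ. `validate_dns_name` is a hypothetical name.

```python
import re

# Kubernetes-style DNS subdomain pattern. Assumption: this stands in for
# whatever pattern schema.json really uses for its DNS fields.
DNS_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def validate_dns_name(value: str) -> str:
    """Return the value unchanged if it looks like a DNS subdomain, else raise."""
    if len(value) > 253 or not DNS_SUBDOMAIN.fullmatch(value):
        raise ValueError(f"invalid DNS name: {value!r}")
    return value


validated = validate_dns_name("api.example.com")
```

Without such a pattern, a value like `Bad_Host` would pass client-side validation and only fail later on the cluster.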

Other changes

  • Helm chart: updated inference operator CRD YAML, chart versions, manager config
  • Space: minor fix in space_access.py
  • README: updated with v1.2 CLI/SDK documentation
  • Removed stale space tests

Before/After UX

Before: v1.2 schema didn't exist. Users were limited to v1.1 fields.

After: Users can use --version 1.2 (or init experience) to access all new operator features via CLI flags or SDK.

# Example: custom endpoint with new v1.2 fields
hyp create hyp-custom-endpoint \
    --version 1.2 \
    --endpoint-name my-endpoint \
    --model-name my-model \
    --model-source-type s3 \
    --image-uri my-image:latest \
    --container-port 8080 \
    --model-volume-mount-name model-weights \
    --s3-bucket-name my-bucket \
    --s3-region us-east-1 \
    --instance-type ml.g5.8xlarge \
    --replicas 2 \
    --max-deploy-time-in-seconds 7200 \
    --worker-command 'python,serve.py' \
    --worker-args '--max-model-len,4096' \
    --load-balancer-health-check-path /ping \
    --load-balancer-routing-algorithm least_outstanding_requests \
    --tags '{"team": "ml", "env": "prod"}' \
    --kubernetes '{"serviceAccountName": "my-sa"}' \
    --probes '{"livenessProbe": {"httpGet": {"path": "/health", "port": 8080}}}'
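
Hand-quoting nested JSON in a shell is easy to get wrong. As a sketch, the same invocation can be assembled from Python, with `json.dumps` producing the flag values and `shlex.join` handling the quoting. Only a subset of the flags above is included for brevity, and `build_create_command` is a hypothetical helper; the snippet prints the command and does not run it.

```python
import json
import shlex


def build_create_command(tags: dict, kubernetes: dict, probes: dict) -> list[str]:
    """Assemble argv for the CLI example above (subset of flags, illustrative)."""
    return [
        "hyp", "create", "hyp-custom-endpoint",
        "--version", "1.2",
        "--endpoint-name", "my-endpoint",
        # JSON passthrough flags: serialize dicts instead of typing JSON by hand.
        "--tags", json.dumps(tags),
        "--kubernetes", json.dumps(kubernetes),
        "--probes", json.dumps(probes),
    ]


argv = build_create_command(
    tags={"team": "ml", "env": "prod"},
    kubernetes={"serviceAccountName": "my-sa"},
    probes={"livenessProbe": {"httpGet": {"path": "/health", "port": 8080}}},
)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` would avoid shell quoting entirely.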

How was this change tested?

1. Manual end-to-end testing on live cluster (account 249127818294, cluster sagemaker-chadchc-test-3b7345ba-eks) — all CLI flags for both custom and jumpstart endpoints verified against operator CRD
2. CRD field alignment verified against AWSCrescendoInferenceOperator mainline
3. Unit tests: 174 pass

Are unit tests added?

Yes — test/unit_tests/cli/test_inference.py (expanded), test/unit_tests/inference/test_hp_endpoint.py (new), test/unit_tests/inference/test_hp_jumpstart_endpoint.py (expanded)

Are integration tests added?

Yes:

- test/integration_tests/inference/cli/test_cli_custom_v12_fields.py
- test/integration_tests/inference/cli/test_cli_jumpstart_v12_fields.py
- test/integration_tests/inference/sdk/test_sdk_custom_v12_fields.py
- test/integration_tests/inference/sdk/test_sdk_jumpstart_v12_fields.py

Reviewer Guidelines

‼️ Merge Requirements: PRs with failing integration tests cannot be merged without justification.

One of the following must be true:

- [x] All automated PR checks pass
- [ ] Failed tests include local run results/screenshots proving they work
- [ ] Changes are documentation-only

@chad119 requested a review from a team as a code owner May 6, 2026 20:41
@chad119 temporarily deployed to manual-approval with GitHub Actions (Inactive) May 6, 2026 20:41
Comment thread src/sagemaker/hyperpod/inference/config/hp_endpoint_config.py
"--gated-model-download-role", "arn:aws:iam::249127818294:role/gated-download",
"--model-hub-name", "SageMakerPublicHub",
])
assert result.exit_code == 0, result.output
Collaborator

I think we need a test that ensures full creation, not just the exit code.

Collaborator Author

Added verification that the CR is actually created on the cluster after the CLI call: HPJumpStartEndpoint.get() / HPEndpoint.get() now confirms the resource exists in K8s. The exit code assert is kept for diagnostic output if creation fails. Full field-by-field validation is in the subsequent test_verify_v12_fields_via_sdk test.
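
The "exit code plus existence check" pattern described above can be sketched as a small polling helper. The signatures of HPEndpoint.get() and HPJumpStartEndpoint.get() are not shown in this thread, so `get_resource` below is a stand-in that returns `None` when the resource is missing; the real calls may raise instead. `wait_until_exists` and the fake cluster are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_until_exists(
    get_resource: Callable[[str], Optional[dict]],
    name: str,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> dict:
    """Poll until the named resource is returned, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        resource = get_resource(name)
        if resource is not None:
            return resource
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} not found after {timeout_s} seconds")
        time.sleep(interval_s)


# Stand-in for the cluster: a dict lookup plays the role of the SDK's get().
fake_cluster = {"my-endpoint": {"metadata": {"name": "my-endpoint"}}}
found = wait_until_exists(fake_cluster.get, "my-endpoint", timeout_s=1.0, interval_s=0.1)
```

In a test, the exit code assertion would run first, followed by a call like this to confirm the resource really exists.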

@chad119 temporarily deployed to manual-approval with GitHub Actions (Inactive) May 7, 2026 00:20
@mollyheamazon merged commit 3c8a7fb into aws:main May 7, 2026
6 checks passed

6 participants