[Inference Operator] Update sample CRD files for v3.1 features #413

Open
zicanl-amazon wants to merge 1 commit into aws:main from zicanl-amazon:feature/update-inference-operator-samples
zicanl-amazon commented Apr 22, 2026

What's changing and why?

Replace 6 stale v1alpha1-era sample files with 20 curated v1 samples from the internal operator repo.

The existing samples did not reflect any of the new CRD features added in v3.1 (PR #412). The new samples demonstrate all customer-facing v3.1 features:

  • HuggingFace model source (TGI, vLLM, SGLang runtimes, prefetch on/off)
  • Kubernetes volume model source
  • ServiceAccount (IRSA) support
  • NVMe + S3 prefetch patterns (on/off, with/without fallback)
  • BYO TLS certificate
  • Health probes (liveness/readiness)
  • Multi-node inference
  • Node affinity scheduling
  • Intelligent routing with KV cache (L1, L1+L2)

Internal-only samples (DPD, test fixtures, benchmarks) are excluded.
All account IDs, personal bucket names, and internal hostnames are replaced with <PLACEHOLDER> values.

Before/After UX

Before:
6 outdated samples from the v1alpha1 era, with no coverage of huggingface, kubernetesVolume, serviceAccount, dataCapture, or health probes.

After:
20 samples covering all customer-facing v3.1 features with clear comments, prerequisites, and placeholder values for easy customization.

How was this change tested?

Sample files are static YAML documentation with no runtime behavior. The following was validated:

  • All 20 YAML files parse correctly
  • All samples use apiVersion: inference.sagemaker.aws.amazon.com/v1
  • All internal content sanitized (account IDs, bucket names, node IDs)
  • No DPD-related samples included
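
Two of the checks above (the `apiVersion` check and the account ID sanitization check) could be automated with something like the sketch below. This is only an illustration, not the script used for this PR: the function names `check_sample` and `check_directory` are hypothetical, the "12 consecutive digits" heuristic for account IDs is an assumption, and the parse check is left out because it needs a YAML parser, which the standard library does not provide.

```python
import re
from pathlib import Path

# API group and version quoted in the validation list above.
API_GROUP = "inference.sagemaker.aws.amazon.com"
EXPECTED_API_VERSION = API_GROUP + "/v1"

# Top-level (unindented) apiVersion lines; a trailing comment is ignored.
API_VERSION_RE = re.compile(r"^apiVersion:[ \t]*([^\s#]+)", re.MULTILINE)
# Assumption: a standalone run of exactly 12 digits may be an AWS account ID.
ACCOUNT_ID_RE = re.compile(r"(?<!\d)\d{12}(?!\d)")


def check_sample(text):
    """Return a list of human-readable problems found in one sample file."""
    problems = []
    versions = [v.strip("\"'") for v in API_VERSION_RE.findall(text)]
    # Only judge resources in the operator's API group; other kinds are skipped.
    in_group = [v for v in versions if v.startswith(API_GROUP + "/")]
    if not in_group:
        problems.append(f"no {API_GROUP} apiVersion found")
    for version in in_group:
        if version != EXPECTED_API_VERSION:
            problems.append(f"unexpected apiVersion: {version}")
    for match in ACCOUNT_ID_RE.finditer(text):
        problems.append(f"possible account ID: {match.group(0)}")
    return problems


def check_directory(root):
    """Map each *.yaml file under root to its problems; clean files are omitted."""
    report = {}
    for path in sorted(Path(root).rglob("*.yaml")):
        problems = check_sample(path.read_text(encoding="utf-8"))
        if problems:
            report[str(path)] = problems
    return report
```

Calling `check_directory` on the samples directory would return an empty dict when every file passes. The digit heuristic can produce false positives (for example, long numeric timestamps), so any hit still needs a manual look.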

Are unit tests added?

N/A — sample files are documentation/examples, not executable code.

Are integration tests added?

N/A — sample files are documentation/examples, not executable code.

Reviewer Guidelines

‼️ Merge Requirements: PRs with failing integration tests cannot be merged without justification.

One of the following must be true:

  • All automated PR checks pass
  • Failed tests include local run results/screenshots proving they work
  • Changes are documentation-only

Replace 6 stale v1alpha1-era samples with 20 curated v1 samples covering:
- HuggingFace model source (TGI, vLLM, SGLang runtimes, prefetch on/off)
- Kubernetes volume model source
- ServiceAccount (IRSA) support
- NVMe + S3 prefetch patterns (on/off, with/without fallback)
- BYO TLS certificate
- Health probes (liveness/readiness)
- Multi-node inference
- Node affinity scheduling
- Intelligent routing with KV cache (L1, L1+L2)

Internal-only samples excluded (DPD, test fixtures, benchmarks).
All account IDs, personal bucket names, and internal hostnames sanitized.

Signed-off-by: Zican Li <zicanl@amazon.com>

zicanl-amazon requested a review from a team as a code owner April 22, 2026 22:44