@amandladev amandladev commented Sep 24, 2025

feat(sagemaker): add containerStartupHealthCheckTimeout support for EndpointConfig

Implements container startup health check timeout configuration for SageMaker endpoint production variants. The setting is available in CloudFormation but missing from the CDK constructs.

Issue # 35566

  • Add containerStartupHealthCheckTimeout property to InstanceProductionVariantProps interface
  • Add comprehensive validation for timeout range (60-3600 seconds)
  • Add CloudFormation template generation for ContainerStartupHealthCheckTimeoutInSeconds property
  • Include test coverage for validation scenarios and edge cases
  • Update README documentation with usage examples and constraints

Reason for this change

AWS SageMaker EndpointConfig supports ContainerStartupHealthCheckTimeoutInSeconds in CloudFormation to configure health check timeout for inference containers, but this property is not exposed in the CDK SageMaker L2 constructs. Users with models that require longer initialization time cannot configure appropriate health check timeouts, leading to premature health check failures.

Description of changes

Implements AWS SageMaker container startup health check timeout support in CDK SageMaker L2 constructs, enabling users to configure appropriate health check timeouts for inference containers:

  • New containerStartupHealthCheckTimeout property in InstanceProductionVariantProps interface with AWS-compliant validation:
    • Range: 60-3600 seconds (1 minute to 1 hour)
    • Type: cdk.Duration for intuitive time specification
    • Optional property maintaining backward compatibility
  • Enhanced addInstanceProductionVariant() method with comprehensive input validation
  • Automatic conversion from cdk.Duration to seconds for CloudFormation compatibility
  • Synthesis-time validation with clear, actionable error messages
  • CloudFormation integration mapping to ContainerStartupHealthCheckTimeoutInSeconds property

Usage Example:

import * as cdk from 'aws-cdk-lib';
import * as sagemaker from '@aws-cdk/aws-sagemaker-alpha';

declare const model: sagemaker.IModel;

// Create endpoint configuration with health check timeout
const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
  instanceProductionVariants: [{
    variantName: 'my-variant',
    model: model,
    containerStartupHealthCheckTimeout: cdk.Duration.minutes(5), // 5 minutes timeout
  }],
});

Describe any new or updated permissions being added

N/A - No new IAM permissions required. Leverages existing SageMaker endpoint configuration permissions.

Description of how you validated changes

Unit tests: Added 5 container startup health check timeout tests covering the following scenarios:

  • Property inclusion in CloudFormation template when provided
  • Property absence in CloudFormation template when not provided
  • Range validation for minimum value (60 seconds)
  • Range validation for maximum value (3600 seconds)
  • Acceptance of valid timeout values at boundaries
  • Duration to seconds conversion verification
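The first two scenarios (property present when provided, absent when not) can be illustrated without the CDK. The sketch below uses simplified, hypothetical shapes (`VariantSketch`, `RenderedVariantSketch`, `renderVariantSketch`) that stand in for the real construct props and CloudFormation resource, which have many more fields. Only `VariantName` and `ContainerStartupHealthCheckTimeoutInSeconds` are actual CloudFormation property names.

```typescript
// Hypothetical, simplified input shape; not the real InstanceProductionVariantProps.
interface VariantSketch {
  variantName: string;
  containerStartupHealthCheckTimeoutSeconds?: number;
}

// Hypothetical, simplified output shape for one production variant.
interface RenderedVariantSketch {
  VariantName: string;
  ContainerStartupHealthCheckTimeoutInSeconds?: number;
}

// Maps the input to the rendered shape, adding the timeout key only when a
// value was supplied so that the default case leaves the template unchanged.
function renderVariantSketch(variant: VariantSketch): RenderedVariantSketch {
  const rendered: RenderedVariantSketch = { VariantName: variant.variantName };
  if (variant.containerStartupHealthCheckTimeoutSeconds !== undefined) {
    rendered.ContainerStartupHealthCheckTimeoutInSeconds =
      variant.containerStartupHealthCheckTimeoutSeconds;
  }
  return rendered;
}

const withTimeout = renderVariantSketch({
  variantName: 'my-variant',
  containerStartupHealthCheckTimeoutSeconds: 300,
});
const withoutTimeout = renderVariantSketch({ variantName: 'my-variant' });

console.log('ContainerStartupHealthCheckTimeoutInSeconds' in withTimeout);    // true
console.log('ContainerStartupHealthCheckTimeoutInSeconds' in withoutTimeout); // false
```

Omitting the key entirely, rather than emitting it with an undefined value, is what keeps existing templates byte-for-byte identical when the property is not used.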

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added the beginning-contributor ([Pilot] contributed between 0-2 PRs to the CDK) and p2 labels Sep 24, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team September 24, 2025 23:04

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter fails with the following errors:

❌ Features must contain a change to an integration test file and the resulting snapshot.

If you believe this pull request should receive an exemption, please comment and provide a justification. A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed, add Clarification Request to a comment.

✅ An exemption request has been submitted. Please wait for a maintainer's review.

@amandladev amandladev changed the title from "feat(sagemaker): Adding containerStartupHealthCheckTimeoutInSeconds support" to "feat(sagemaker): adding containerStartupHealthCheckTimeoutInSeconds support" Sep 25, 2025
@amandladev (Contributor Author) commented

Exemption Request

Integration Test Justification

Why no integration test changes are included:

This PR adds an optional property (containerStartupHealthCheckTimeout) that doesn't require integration test modifications for the following reasons:

  1. Optional Property: This is a non-breaking, optional configuration that defaults to AWS's standard behavior when omitted
  2. Comprehensive Unit Test Coverage: The feature includes 5 comprehensive unit tests covering:
    • Property inclusion/omission in CloudFormation templates
    • Input validation (range 60-3600 seconds)
    • Duration-to-seconds conversion
    • Error handling for invalid values
  3. Simple Mapping: The property performs straightforward mapping from CDK Duration to CloudFormation integer with validation - no complex logic requiring integration testing
  4. Backward Compatibility: Existing integration tests continue to pass unchanged, demonstrating no regressions
  5. CloudFormation Template Verification: Unit tests already verify correct CloudFormation template generation using Template.fromStack()

The existing integration test focuses on core EndpointConfig functionality (multiple variants, VPC configuration, model associations) rather than optional timeout parameters. Adding this property to the integration test would not provide additional value beyond what the unit tests already cover.

Unit tests provide sufficient coverage for this straightforward property addition without requiring integration test modifications.

@aws-cdk-automation aws-cdk-automation added the pr-linter/exemption-requested label (The contributor has requested an exemption to the PR Linter feedback.) Sep 25, 2025
@amandladev amandladev closed this Sep 25, 2025

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2025