Merged
Conversation
Add _wait_for_model_active() to poll get_custom_model until the model reaches Active status before calling create_custom_model_deployment. This fixes ValidationException when the custom model is not yet ready for deployment after create_custom_model returns.
Extract _is_nova_model() helper to eliminate duplicated Nova detection logic across deploy() and _get_s3_artifacts(). Uses getattr with safe defaults instead of fragile hasattr chains. Add input validation to deploy() and create_deployment(): - Raise ValueError when model_package is not set - Raise ValueError when custom_model_name or role_arn missing for Nova deployments - Raise ValueError when model_arn is empty in create_deployment Move json and urlparse imports to module level (were previously imported inside _get_checkpoint_uri_from_manifest). Replace f-string logging with lazy %s formatting throughout. Initialize status=None before the polling loop in _wait_for_model_active to avoid UnboundLocalError if the loop body never executes. Rewrite unit tests (43 tests) with full coverage: - _is_nova_model: recipe_name, hub_content_name, case insensitivity, missing base_model, None fields - __init__: None model, TrainingJob, ModelPackage - Client singletons: caching, injection - _fetch_model_package: ModelPackage, TrainingJob, ModelTrainer, unknown type - _get_s3_artifacts: None package, non-Nova, Nova delegation, Nova fallback - _get_checkpoint_uri_from_manifest: success, missing key, NoSuchKey, not TrainingJob, no artifacts, invalid JSON - _wait_for_model_active: immediate, polling, Failed, timeout - create_deployment: polling chain, extra kwargs, empty/None ARN - deploy: non-Nova, Nova full chain, hub_content_name detection, default deployment name, tags, missing params, None stripping Add integration tests for Nova E2E deployment: - Training job existence and status verification - Builder creation and Nova detection via _is_nova_model - S3 artifacts checkpoint validation - Full deploy-with-polling flow (marked @pytest.mark.slow) - Timeout behavior on bogus ARN - Validation error paths (no model_package, empty model_arn) - Resource cleanup fixture for deployments and custom models
…eployment Previously create_deployment() only polled for the custom model to reach Active status before calling CreateCustomModelDeployment, but did not wait for the deployment itself to become Active. This caused callers to receive a deployment ARN that was still in Creating state, requiring manual polling in user code. Add _wait_for_deployment_active() that polls get_custom_model_deployment until status reaches Active, raises RuntimeError on Failed, and times out after max_wait seconds (default 3600s, poll interval 30s). Wire it into create_deployment() so the full flow is now: 1. _wait_for_model_active (poll model creation) 2. create_custom_model_deployment (API call) 3. _wait_for_deployment_active (poll deployment creation) Gracefully skips deployment polling if the API response does not contain a customModelDeploymentArn. Unit tests (48 passing): - _wait_for_deployment_active: immediate Active, polling, Failed status, timeout - create_deployment: full model+deployment polling chain, skip polling when no ARN in response - deploy Nova chain: updated to verify deployment polling
The TestModelCustomizationDeployment integ tests were failing with DescribeTrainingJob 'Requested resource not found' because the SageMaker SDK caches the first session's region internally. The session-scoped cleanup_e2e_endpoints fixture (autouse) was creating a session in us-east-1 (default) before the class fixtures could set us-west-2, causing all subsequent TrainingJob.get calls to hit the wrong region. Fix by setting AWS_DEFAULT_REGION=us-west-2 in the cleanup_e2e_endpoints fixture before any SageMaker session is created. Add tests/integ/conftest.py with a session-scoped nova_training_job_name fixture that implements get-or-create: - Checks if sdk-integ-nova-micro-sft exists and is Completed - If InProgress, waits for completion - If not found, uploads minimal training data to S3 and launches a Nova Micro SFT training job via SFTTrainer - Reused across test_bedrock_nova_e2e.py and TestBedrockNovaDeployment in test_model_customization_deployment Update both Nova test files to use the shared fixture instead of hardcoded training job names.
The SageMaker SDK's SageMakerClient reads SAGEMAKER_REGION env var at init time and caches the region for all subsequent API calls. The cleanup_e2e_endpoints session fixture was the first to create a SageMakerClient (in the default region), which then poisoned all subsequent TrainingJob.get calls regardless of the region parameter. Fix by setting SAGEMAKER_REGION=us-west-2 in cleanup_e2e_endpoints before any SDK session is created, since all resources in this test file live in us-west-2. The env var is restored after cleanup. In CodeBuild (us-west-2) this is a no-op since the default region already matches. The other test files (triton, tei, tgi) are not affected since they have their own fixtures and don't import from this file.
Remove us-east-1 Nova integration tests that cannot run in the us-west-2 CodeBuild environment: - Delete tests/integ/test_bedrock_nova_e2e.py - Delete tests/integ/conftest.py (Nova get-or-create fixture) - Remove TestBedrockNovaDeployment class from test_model_customization_deployment.py Add example_notebooks/bedrock_nova_deployment.ipynb covering the full Nova workflow: SFTTrainer fine-tuning, BedrockModelBuilder deploy with model+deployment polling, inference, and cleanup. The BedrockModelBuilder source code and unit tests (48 passing) are unchanged. The us-west-2 integ tests for non-Nova Bedrock deployment (TestModelCustomizationDeployment) remain.
Add notebooks demonstrating Bedrock deployment workflows: - bedrock-modelbuilder-deployment-nova.ipynb: Nova model deployment via BedrockModelBuilder with SFTTrainer fine-tuning - boto3_deployment_notebook.ipynb: Direct boto3 Bedrock deployment - model_builder_deployment_notebook(1).ipynb: ModelBuilder deployment - 07-ml-model-development(1).ipynb: ML model development workflow - sagemaker-serve/example_notebooks/bedrock_nova_deployment.ipynb: Clean Nova deployment example with polling, inference, and cleanup
rsareddy0329
previously approved these changes
Mar 18, 2026
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _is_nova_model(container) -> bool: |
Contributor
There was a problem hiding this comment.
not a blocker, but I think this can be moved to common utils as model builder also has this method
Add notebooks demonstrating Bedrock deployment workflows: - bedrock-modelbuilder-deployment-nova.ipynb: Nova model deployment via BedrockModelBuilder with SFTTrainer fine-tuning - boto3_deployment_notebook.ipynb: Direct boto3 Bedrock deployment - model_builder_deployment_notebook(1).ipynb: ModelBuilder deployment - 07-ml-model-development(1).ipynb: ML model development workflow - sagemaker-serve/example_notebooks/bedrock_nova_deployment.ipynb: Clean Nova deployment example with polling, inference, and cleanup
…flow
Simplify notebook to use existing completed training job with
BedrockModelBuilder deploy flow. Fix Nova inference content format
to use array of {text: ...} objects. Remove broken SFTTrainer cells
that fail due to botocore service model mismatch.
rsareddy0329
approved these changes
Mar 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Source (bedrock_model_builder.py):
Extract _is_nova_model() module-level helper using safe getattr with defaults, replacing all duplicated hasattr chains
Add ValueError guards in deploy() for missing model_package, custom_model_name, and role_arn
Add ValueError guard in create_deployment() for empty/None model_arn
Initialize status = None before the polling loop to prevent UnboundLocalError
Move json and urlparse imports to module level
Replace f-string logging with lazy %s formatting throughout