chore(stepfunctions-tasks): fix 5 failing integration tests #37154

Open
aemada-aws wants to merge 2 commits into aws:main from aemada-aws:fix/stepfunctions-tasks-integ-tests

@aemada-aws
Collaborator

Issue # (if applicable)

N/A — Integration test remediation.

Reason for this change

Five of the seven integration tests in aws-stepfunctions-tasks were failing, for the following reasons:

  1. Bedrock guardrail trace (integ.invoke-model-guardrail-trace): Amazon Titan Text G1 Express reached End-of-Life on August 15, 2025 and is no longer available in any region.
  2. Cross-region Lambda (integ.call-aws-service-cross-region-lambda): The test creates a Lambda function in the stack's region but invokes it through the us-east-1 Lambda endpoint. The Lambda Invoke API is regional, so calling us-east-1 with an ARN from another region returns ResourceNotFoundException.
  3. SageMaker training job (integ.create-training-job-image): The hardcoded ECR account 811284229777 for BlazingText is valid only in us-east-1. The hardcoded training job name causes NAME_COLLISION on re-runs. The S3 bucket had no cleanup policy.
  4. EMR tests (integ.emr-create-cluster-with-auto-deletion-policy-idle-timeout, integ.emr-create-cluster-with-ebs): EMR clusters create ENIs in VPC subnets that linger after cluster termination, which prevents VPC/subnet deletion during stack teardown.

Description of changes

integ.invoke-model-guardrail-trace.ts:

  • Replaced EOL AMAZON_TITAN_TEXT_G1_EXPRESS_V1 with AMAZON_NOVA_MICRO_V1_0
  • Updated request body from Titan format (inputText/textGenerationConfig) to Nova format (messages/inferenceConfig)
  • Removed resultSelector and resultPath: the guardrail's word filter blocks the "test attack" input, and the blocked response body doesn't contain the model output structure
  • Added regions constraint for Nova Micro + Guardrails supported regions
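
The request-body change can be sketched in plain TypeScript, without the CDK. The top-level field names (inputText, textGenerationConfig, messages, inferenceConfig) come from the description above; the nested message layout, the temperature field, and the titanToNovaBody helper are assumptions made for illustration and are not taken from the test file.

```typescript
// Titan-style body: a flat prompt plus a generation config.
interface TitanTextBody {
  inputText: string;
  textGenerationConfig?: { temperature?: number };
}

// Nova-style body: a list of role-tagged messages plus an inference config.
// The nested content layout is an assumption for this sketch.
interface NovaBody {
  messages: { role: 'user' | 'assistant'; content: { text: string }[] }[];
  inferenceConfig?: { temperature?: number };
}

// Hypothetical helper: wraps the Titan prompt in a single user message.
function titanToNovaBody(titan: TitanTextBody): NovaBody {
  const nova: NovaBody = {
    messages: [{ role: 'user', content: [{ text: titan.inputText }] }],
  };
  const temperature = titan.textGenerationConfig?.temperature;
  if (temperature !== undefined) {
    nova.inferenceConfig = { temperature };
  }
  return nova;
}

// "test attack" is the input the guardrail's word filter is expected to block.
const novaBody = titanToNovaBody({
  inputText: 'test attack',
  textGenerationConfig: { temperature: 0 },
});
console.log(JSON.stringify(novaBody, null, 2));
```

In the integration test itself the body is passed to the Bedrock task rather than built by a converter; the helper only makes the structural difference between the two formats visible.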

integ.call-aws-service-cross-region-lambda.ts:

  • Changed region: 'us-east-1' to region: this.region, so the call goes to the stack's own region, where the Lambda is reachable
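
The failure mode can be illustrated with a small standalone check, assuming the standard ARN layout arn:partition:service:region:account-id:resource. The regionOfArn and isReachableFrom helpers, the account ID, and the function name are hypothetical and exist only for this sketch.

```typescript
// Extracts the region segment (index 3) from an ARN.
function regionOfArn(arn: string): string {
  const parts = arn.split(':');
  const arnRegion = parts[3];
  if (parts[0] !== 'arn' || parts.length < 6 || arnRegion === undefined) {
    throw new Error(`Not an ARN: ${arn}`);
  }
  return arnRegion;
}

// A regional endpoint only resolves functions that live in its own region.
function isReachableFrom(endpointRegion: string, functionArn: string): boolean {
  return regionOfArn(functionArn) === endpointRegion;
}

// Placeholder account ID and function name, for illustration only.
const exampleArn = 'arn:aws:lambda:eu-west-1:123456789012:function:my-function';

console.log(isReachableFrom('us-east-1', exampleArn)); // false: the old hardcoded region
console.log(isReachableFrom('eu-west-1', exampleArn)); // true: the stack's own region
```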

integ.create-training-job-image.ts:

  • Added regions: ['us-east-1'] constraint to IntegTest (BlazingText ECR account is us-east-1 specific)
  • Replaced hardcoded trainingJobName with JsonPath.format('BlazingText-{}', JsonPath.executionName) for unique names
  • Added removalPolicy: DESTROY and autoDeleteObjects: true to S3 bucket
  • Increased assertion timeout from 10 to 30 minutes
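
Why the execution name avoids collisions can be shown with a simplified stand-in for the placeholder substitution. This is a sketch only: formatTemplate is a hypothetical helper that mimics the '{}' replacement and is not the CDK or Step Functions implementation, and the execution names are made up.

```typescript
// Replaces each '{}' in the template, left to right, with the next argument.
// Escaping rules are deliberately not handled in this sketch.
function formatTemplate(template: string, ...args: string[]): string {
  let next = 0;
  return template.replace(/\{\}/g, () => {
    const value = args[next++];
    if (value === undefined) {
      throw new Error('Not enough arguments for template');
    }
    return value;
  });
}

// Hypothetical execution names; Step Functions supplies the real one at run time.
const firstRunName = formatTemplate('BlazingText-{}', 'run-0001');
const secondRunName = formatTemplate('BlazingText-{}', 'run-0002');

console.log(firstRunName);  // BlazingText-run-0001
console.log(secondRunName); // BlazingText-run-0002
```

Because each execution has a distinct name, each re-run of the test yields a distinct training job name, whereas the previous hardcoded string was identical on every run.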

integ.emr-create-cluster-with-auto-deletion-policy-idle-timeout.ts:

  • Added cdkCommandOptions: { destroy: { expectError: true } } for known EMR VPC teardown failures

integ.emr-create-cluster-with-ebs.ts:

  • Added cdkCommandOptions: { destroy: { expectError: true } } for known EMR VPC teardown failures

Describe any new or updated permissions being added

  • Bedrock guardrail trace: IAM policy updated from bedrock:InvokeModel on amazon.titan-text-express-v1 to amazon.nova-micro-v1:0
  • SageMaker training job: Added S3 auto-delete objects custom resource role (standard CDK auto-delete pattern)

Description of how you validated changes

All 7 tests were validated via integ-runner deployment:

# Bedrock guardrail trace (us-east-1 for Nova Micro + Guardrails)
yarn integ test/aws-stepfunctions-tasks/test/bedrock/integ.invoke-model-guardrail-trace.js \
  --disable-update-workflow --update-on-failed --force --parallel-regions us-east-1

# SageMaker training job (us-east-1 for BlazingText ECR account)
yarn integ test/aws-stepfunctions-tasks/test/sagemaker/integ.create-training-job-image.js \
  --disable-update-workflow --update-on-failed --force --parallel-regions us-east-1

# Cross-region Lambda, ECS (any region)
yarn integ \
  test/aws-stepfunctions-tasks/test/lambda/integ.call-aws-service-cross-region-lambda.js \
  test/aws-stepfunctions-tasks/test/ecs/integ.ec2-run-task-ref-definition.js \
  test/aws-stepfunctions-tasks/test/ecs/integ.fargate-run-task.js \
  --disable-update-workflow --update-on-failed --force \
  --parallel-regions us-east-1 --parallel-regions us-west-2 --parallel-regions eu-west-1

# EMR tests (any region with VPC headroom)
yarn integ \
  test/aws-stepfunctions-tasks/test/emr/integ.emr-create-cluster-with-auto-deletion-policy-idle-timeout.js \
  test/aws-stepfunctions-tasks/test/emr/integ.emr-create-cluster-with-ebs.js \
  --disable-update-workflow --update-on-failed --force \
  --parallel-regions eu-west-1 --parallel-regions ap-southeast-2

Note: integ-runner's --parallel-regions option does NOT respect the IntegTest regions property, so region-constrained tests (bedrock, sagemaker) must be run with only their supported regions.
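
One way to guard against running a constrained test in the wrong region is to intersect the requested regions with the test's supported regions before building the command line. The sketch below is a hypothetical pre-filter, not part of integ-runner; the runnableRegions helper and its "no constraint means any region" rule are assumptions.

```typescript
// Keeps only the requested regions that a test supports.
// An undefined or empty constraint is treated as "any region".
function runnableRegions(requested: string[], supported?: string[]): string[] {
  if (!supported || supported.length === 0) {
    return [...requested];
  }
  return requested.filter((r) => supported.indexOf(r) !== -1);
}

const requestedRegions = ['us-east-1', 'us-west-2', 'eu-west-1'];

// Region-constrained test (e.g. constrained to us-east-1).
const constrained = runnableRegions(requestedRegions, ['us-east-1']);
// Unconstrained test: every requested region is kept.
const unconstrained = runnableRegions(requestedRegions);

// Turn the surviving regions into repeated --parallel-regions flags.
const flags = constrained.map((r) => `--parallel-regions ${r}`).join(' ');
console.log(flags); // --parallel-regions us-east-1
console.log(unconstrained.length); // 3
```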

Destructive change: sfn-sm-training-job-image stack — TrainSetAwsCliLayer57B94C48 will be replaced (asset hash change from adding auto-delete objects).

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added the p2 label Mar 3, 2026
@aws-cdk-automation aws-cdk-automation requested a review from a team March 3, 2026 23:21
@mergify mergify bot added the contribution/core label (This is a PR that came from AWS.) Mar 3, 2026
@mergify mergify bot temporarily deployed to automation March 3, 2026 23:21 Inactive
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

⚠️ Experimental Feature: This security report is currently in experimental phase. Results may include false positives and the rules are being actively refined.
Please try merging from main to avoid findings unrelated to the PR.


| | Tests | Passed ☑️ | Skipped | Failed ❌️ |
|---|---|---|---|---|
| Security Guardian Results | 240 ran | 238 passed | | 2 failed |

| Test | Result |
|---|---|
| Security Guardian Results | |
| packages/@aws-cdk-testing/framework-integ/test/aws-stepfunctions-tasks/test/ecs/integ.ec2-run-task-ref-definition.js.snapshot/aws-sfn-tasks-ecs-run-task-ref-task-definition.template.json | |
| iam-no-overly-permissive-passrole.guard | ❌ failure |
| packages/@aws-cdk-testing/framework-integ/test/aws-stepfunctions-tasks/test/ecs/integ.fargate-run-task.js.snapshot/aws-sfn-tasks-ecs-fargate-run-task.template.json | |
| iam-no-overly-permissive-passrole.guard | ❌ failure |

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

⚠️ Experimental Feature: This security report is currently in experimental phase. Results may include false positives and the rules are being actively refined.
Please try merging from main to avoid findings unrelated to the PR.


| | Tests | Passed ✅ | Skipped | Failed |
|---|---|---|---|---|
| Security Guardian Results with resolved templates | 240 ran | 240 passed | | |

| Test | Result |
|---|---|

No test annotations available

- integ.invoke-model-guardrail-trace: replace EOL Titan Text G1 Express
  with Nova Micro, update request format, remove resultSelector/resultPath
  (guardrail blocks input), add region constraint
- integ.call-aws-service-cross-region-lambda: use stack region instead of
  hardcoded us-east-1 so Lambda is reachable
- integ.create-training-job-image: add us-east-1 region constraint for
  BlazingText ECR account, use execution name for unique training job name,
  add bucket cleanup policy, increase assertion timeout
- integ.emr-create-cluster-with-auto-deletion-policy-idle-timeout: add
  expectError on destroy for EMR VPC teardown failures
- integ.emr-create-cluster-with-ebs: add expectError on destroy for EMR
  VPC teardown failures
@aemada-aws aemada-aws force-pushed the fix/stepfunctions-tasks-integ-tests branch from 7751f5c to 975c7e4 Compare March 3, 2026 23:49
- Remove incorrect issue link (aws#19275 is ECS, not EMR) from EMR expectError comments
- Replace with accurate description of EMR ENI teardown race condition
- Remove stale manual-cleanup comment on SageMaker bucket (now auto-deleted)
- Add region justification comments for Bedrock and SageMaker tests
@aemada-aws aemada-aws changed the title from "fix(stepfunctions-tasks): fix 5 failing integration tests" to "chore(stepfunctions-tasks): fix 5 failing integration tests" Mar 12, 2026
@aemada-aws aemada-aws marked this pull request as ready for review March 12, 2026 09:49
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review label (This PR needs a review from a Core Team Member) Mar 12, 2026
@aemada-aws aemada-aws added the pr/needs-integration-tests-deployment label (Requires the PR to deploy the integration test snapshots.) and removed the pr/needs-maintainer-review label (This PR needs a review from a Core Team Member) Mar 12, 2026
@aemada-aws aemada-aws had a problem deploying to deployment-integ-test March 12, 2026 09:55 — with GitHub Actions Error

Labels

  • contribution/core: This is a PR that came from AWS.
  • p2
  • pr/needs-integration-tests-deployment: Requires the PR to deploy the integration test snapshots.
  • pr/needs-maintainer-review: This PR needs a review from a Core Team Member
