fix: resolve e2e import test concurrency races #1067

Merged
tejaskash merged 2 commits into aws:main from Hweinstock:fix/e2e-import-concurrency-races
Apr 30, 2026

@Hweinstock
Contributor

@Hweinstock Hweinstock commented Apr 30, 2026

Description

Two issues occur when running the e2e tests in parallel:

naming conflict

botocore.errorfactory.ConflictException: An error occurred (ConflictException) when calling the CreateAgentRuntime operation: An agent with the specified name already exists. Use a different name and try again.

Resource names use the current time (in seconds) as a suffix, so two resources created in the same second get the same name and collide. Switch the resource name suffix to a UUID or to millisecond precision.
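
A minimal sketch of the suffix idea, not the PR's actual code. It assumes the behaviour described elsewhere in this conversation: prefer the `RESOURCE_SUFFIX` environment variable when it is set, otherwise fall back to a per-process random suffix taken from `uuid.uuid4().hex[:12]`. The helper names `make_name_suffix` and `resource_name` are hypothetical and exist only for illustration.

```python
import os
import uuid


def make_name_suffix(env=None):
    """Return a suffix for resource names (hypothetical helper).

    Prefers RESOURCE_SUFFIX when it is set and non-empty; otherwise
    returns 12 hex characters (48 bits) from a random UUID.
    """
    env = os.environ if env is None else env
    override = env.get("RESOURCE_SUFFIX")
    if override:
        return override
    return uuid.uuid4().hex[:12]


# Computed once per process so every resource created by one run shares it.
NAME_SUFFIX = make_name_suffix()


def resource_name(prefix):
    """Build a resource name from a prefix and the shared suffix (hypothetical)."""
    return f"{prefix}_{NAME_SUFFIX}"
```

Unlike `int(time.time())`, the random fallback does not depend on when the process starts, so two shards launched in the same second still get different suffixes with overwhelming probability.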

list then get race condition

[error] Online evaluation configuration not found

The code checks that the given evaluator isn't already in an eval config, because if it is, the import will fail.
When iterating through all online eval configs in the account, we call list and then get, but it's possible for an eval config to be deleted between the two calls. The get then rejects with a ResourceNotFoundError. A config that no longer exists cannot contain the evaluator, so these errors can safely be ignored.

Related Issue

Closes #1066

Documentation PR

N/A

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

@github-actions github-actions Bot added size/s PR size: S agentcore-harness-reviewing AgentCore Harness review in progress labels Apr 30, 2026
@Hweinstock Hweinstock force-pushed the fix/e2e-import-concurrency-races branch from 7ad4f06 to 83ae514 Compare April 30, 2026 20:21
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S agentcore-harness-reviewing AgentCore Harness review in progress labels Apr 30, 2026
@Hweinstock Hweinstock force-pushed the fix/e2e-import-concurrency-races branch from 83ae514 to 75b490d Compare April 30, 2026 20:43
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S labels Apr 30, 2026
@Hweinstock Hweinstock force-pushed the fix/e2e-import-concurrency-races branch from 75b490d to 975c5ec Compare April 30, 2026 20:48
@github-actions github-actions Bot added size/m PR size: M and removed size/s PR size: S labels Apr 30, 2026
@aws aws deleted a comment from agentcore-cli-automation Apr 30, 2026
Fix two independent concurrency issues causing flaky e2e import tests:

1. TOCTOU race in evaluator import (import-evaluator.ts):
   The beforeConfigWrite hook lists all online eval configs then fetches
   details for each with Promise.all. If a config is deleted between the
   list and get calls, the API throws 'Online evaluation configuration
   not found' and the entire import fails. Fixed by using
   Promise.allSettled and filtering out disappeared configs.

2. Resource name collisions across parallel CI shards (setup_*.py):
   Python setup scripts generated resource names using int(time.time())
   (second-level precision). Parallel CI shards starting in the same
   second would collide with ConflictException. The test already passes
   a unique RESOURCE_SUFFIX env var but scripts ignored it for naming.
   Added NAME_SUFFIX to common.py that prefers RESOURCE_SUFFIX when set,
   and updated all setup scripts to use it.
@Hweinstock Hweinstock force-pushed the fix/e2e-import-concurrency-races branch from 975c5ec to f46d632 Compare April 30, 2026 20:51
@Hweinstock Hweinstock closed this Apr 30, 2026
@Hweinstock Hweinstock reopened this Apr 30, 2026
@github-actions github-actions Bot added size/s PR size: S agentcore-harness-reviewing AgentCore Harness review in progress and removed size/m PR size: M size/s PR size: S labels Apr 30, 2026
@agentcore-cli-automation

Reviewed the changes — looks good to merge.

The two fixes are well-scoped and correct:

  1. Name collision fix: Switching from int(time.time()) to a per-process UUID (uuid.uuid4().hex[:12]) correctly addresses the ConflictException when parallel shards call CreateAgentRuntime within the same second. The fallback to RESOURCE_SUFFIX when set preserves the existing override mechanism, and 48 bits of randomness is plenty for CI collision avoidance. Each shard runs on its own GitHub Actions runner, so there's no concern about bugbash-resources.json being shared across shards (only the AWS account is shared, which was the actual source of the collision).

  2. TOCTOU race fix: Swallowing ResourceNotFoundException between listAllOnlineEvaluationConfigs and getOnlineEvaluationConfig is the right call — a config that was deleted mid-iteration can't be holding a lock on the evaluator being imported. The comment in the code explains the rationale clearly. Using instanceof ResourceNotFoundException is a bit stylistically different from the rest of the codebase, which tends to use err.name === 'ResourceNotFoundException', but it's equivalent and arguably more robust.

Very minor nit (not blocking): import time is now unused in setup_evaluator.py, setup_gateway.py, setup_memory_full.py, and setup_runtime_basic.py. Feel free to clean those up if you touch these files again.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Apr 30, 2026
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S labels Apr 30, 2026
@Hweinstock
Contributor Author

Removed the unused time import. Good call, review agent!

@Hweinstock Hweinstock marked this pull request as ready for review April 30, 2026 21:03
@Hweinstock Hweinstock requested a review from a team April 30, 2026 21:03
Contributor

@tejaskash tejaskash left a comment

LGTM

@tejaskash tejaskash merged commit bd6f841 into aws:main Apr 30, 2026
23 checks passed
Contributor

@jesseturner21 jesseturner21 left a comment

lgtm

Labels

size/s PR size: S

4 participants