Feat: Evaluation preview features - Batch evaluation and config bundles by padmak30 · Pull Request #446 · aws/bedrock-agentcore-sdk-python

padmak30 · 2026-04-30T22:04:24Z

Description of changes:

Batch Evaluation

Run batch evaluation on dataset
Ground truth support
Simulated dataset support

ConfigBundles

Config bundle support - read config bundle values from baggage
BaggageSpanProcessor and routing experiment context helpers

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Re-adding workflow that was lost in a force push to main. Original PR: aws/bedrock-agentcore-sdk-python-private#60 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add config bundle support to runtime
  Introduces a new config_bundle module that allows agent handlers to read configuration from BedrockAgentCore configuration bundles, delivered via W3C baggage headers on each invocation.
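To illustrate how configuration values could travel in a W3C baggage header, here is a minimal sketch using only the standard library. It is not the SDK's config_bundle implementation: the `config.` key prefix and the function names `parse_baggage` and `read_config_values` are hypothetical, chosen only for this example.

```python
from urllib.parse import unquote

# Hypothetical key prefix; the baggage key names used by the SDK are not
# stated in this PR.
CONFIG_PREFIX = "config."


def parse_baggage(header: str) -> dict[str, str]:
    """Parse a W3C baggage header into a key/value dict, dropping properties."""
    entries: dict[str, str] = {}
    for member in header.split(","):
        # Each list member is "key=value", optionally followed by ";property".
        pair = member.split(";", 1)[0].strip()
        if "=" not in pair:
            continue  # skip empty or malformed members
        key, value = pair.split("=", 1)
        # Baggage values are percent-encoded on the wire.
        entries[key.strip()] = unquote(value.strip())
    return entries


def read_config_values(header: str) -> dict[str, str]:
    """Return only the entries under the (assumed) config prefix, prefix removed."""
    baggage = parse_baggage(header)
    return {
        key[len(CONFIG_PREFIX):]: value
        for key, value in baggage.items()
        if key.startswith(CONFIG_PREFIX)
    }
```

A handler could call `read_config_values(request_headers.get("baggage", ""))` once per invocation, so each request sees the values that were attached to it rather than process-wide state.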

* feat: add batch evaluation
  Introduces BatchEvaluationRunner, which orchestrates end-to-end batch evaluation against the AgentCore Evaluation Service.

GITHUB_TOKEN lacks the workflows permission, so syncing .github/workflows/ from the public repo causes push failures. After merging, restore our workflow files from HEAD before committing in both the clean and conflict paths.

Bumps [cryptography](https://github.com/pyca/cryptography) from 46.0.5 to 46.0.7.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@46.0.5...46.0.7)
---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.7
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [pillow](https://github.com/python-pillow/Pillow) from 12.1.1 to 12.2.0.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](python-pillow/Pillow@12.1.1...12.2.0)
---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.2.0
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.22 to 0.0.26.
- [Release notes](https://github.com/Kludex/python-multipart/releases)
- [Changelog](https://github.com/Kludex/python-multipart/blob/master/CHANGELOG.md)
- [Commits](Kludex/python-multipart@0.0.22...0.0.26)
---
updated-dependencies:
- dependency-name: python-multipart
  dependency-version: 0.0.26
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(runtime): stamp OTel spans with routing experiment baggage

FAILED, STOPPED, and DELETING were raising RuntimeError; COMPLETED_WITH_ERRORS was hitting the unknown-status RuntimeError. All terminal states now return the response so callers can inspect result.status and result.error_details.
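The behavior described above (return on every terminal status instead of raising) can be sketched roughly as follows. This is an illustration, not the SDK's code: `poll_until_terminal` is a hypothetical name, responses are plain dicts rather than the SDK's result objects, and including `COMPLETED` in the terminal set is an assumption.

```python
import time
from typing import Any, Callable

# FAILED, STOPPED, DELETING and COMPLETED_WITH_ERRORS come from the commit
# message; COMPLETED is assumed to be the success status.
TERMINAL_STATUSES = {
    "COMPLETED",
    "COMPLETED_WITH_ERRORS",
    "FAILED",
    "STOPPED",
    "DELETING",
}


def poll_until_terminal(
    get_status: Callable[[], dict[str, Any]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reaches any terminal status, then return the response.

    Failure statuses are returned rather than raised, so the caller decides
    how to react. Only a timeout raises.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = get_status()
        if response.get("status") in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(interval_s)
```

With this shape, a caller checks the returned status and error details after the call instead of wrapping it in a `try`/`except RuntimeError`.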

docs: add preview warning docstrings to all new evo methods and classes

github-actions · 2026-04-30T22:04:40Z

✅ No Breaking Changes Detected

No public API breaking changes found in this PR.

notgitika · 2026-04-30T22:15:08Z

Couple things I noticed:

BatchEvaluationResult doesn't have a description field, but run_dataset_evaluation() passes description=response.get("description") when building it. Pydantic v2 will throw a ValidationError for the unexpected field — this would crash on every successful batch eval. Needs description: Optional[str] = None added to the model.
The timeout test (test_poll_for_results_timeout_raises_timeout_error) patches time.time to control the clock, but _poll_for_results actually uses time.monotonic(). The test only passes because the real monotonic clock advances past the timeout between iterations — it's not actually testing the timeout path. The patch target should be time.monotonic instead.
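For the second point, a minimal sketch of a timeout test that patches the clock the polling loop actually reads. `poll_for_results` below is a stand-in written for this example, not the SDK's `_poll_for_results`, and in the real test the patch target would likely be the `time` reference in the module that defines the poller.

```python
import time
from unittest import mock


def poll_for_results(fetch, timeout_s, interval_s=1.0):
    """Stand-in poller: like the code under review, it reads time.monotonic()."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = fetch()
        if result is not None:
            return result
        time.sleep(interval_s)
    raise TimeoutError("timed out waiting for results")


def run_timeout_check():
    """Drive the timeout path deterministically with a fake monotonic clock."""
    # Clock readings: 0.0 sets the deadline (10.0), 5.0 allows one iteration,
    # 11.0 is past the deadline and ends the loop.
    fake_clock = mock.Mock(side_effect=[0.0, 5.0, 11.0])
    fetch = mock.Mock(return_value=None)
    with mock.patch("time.monotonic", fake_clock), \
            mock.patch("time.sleep") as fake_sleep:
        try:
            poll_for_results(fetch, timeout_s=10)
        except TimeoutError:
            timed_out = True
        else:
            timed_out = False
    return timed_out, fetch.call_count, fake_sleep.call_count
```

Because the fake clock is the only time source, the test no longer depends on how much real time elapses between iterations, and `time.sleep` is patched so it does not block.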

notgitika

LGTM

avi-alpert and others added 30 commits March 24, 2026 13:34

chore: add workflow to sync from public repo

3e0bdf4

Re-adding workflow that was lost in a force push to main. Original PR: aws/bedrock-agentcore-sdk-python-private#60 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #71 from aws/aalpert/sync-workflow-v2

42348f8

chore: sync main with public/main

20c47e2

chore: sync main with public/main

ad47b56

chore: sync main with public/main

cddcac4

fix: bump memory integration test timeout to 15 minutes (#77)

7100433

chore: sync main with public/main

27e66fe

chore: sync main with public/main

51d6dca

chore: sync main with public/main

7169857

feat: add config bundle support to runtime (#82)

0f30dd0

* feat: add config bundle support to runtime
  Introduces a new config_bundle module that allows agent handlers to read configuration from BedrockAgentCore configuration bundles, delivered via W3C baggage headers on each invocation.

feat: add batch evaluation (#83)

e1d861d

* feat: add batch evaluation
  Introduces BatchEvaluationRunner, which orchestrates end-to-end batch evaluation against the AgentCore Evaluation Service.

Remove batch collect from agent span collector

86d0343

Update to prod endpoint

a72f159

fix: exclude workflow files from public repo sync (#89)

4e85852

GITHUB_TOKEN lacks the workflows permission, so syncing .github/workflows/ from the public repo causes push failures. After merging, restore our workflow files from HEAD before committing in both the clean and conflict paths.

chore: sync main with public/main

95876b2

Cleanup

db780de

feat: bundle AgentCore service models and restrict ConfigBundleClient…

e65b5d8

… operations

chore: sync main with public/main

12f8c60

chore: sync main with public/main

b80a00f

chore: sync main with public/main

415f3a2

chore: sync main with public/main

522b5a9

chore: sync main with public/main

d9e0a1e

chore: sync main with public/main

3d488be

chore: sync main with public/main

e58bbba

Merge branch 'main' into feat/evo_main

1fee391

chore: sync main with public/main

2881677

feat(runtime): stamp OTel spans with routing experiment baggage (#101)

d0d850d

* feat(runtime): stamp OTel spans with routing experiment baggage

padmak30 and others added 5 commits April 30, 2026 15:47

docs: add preview warning docstrings to all new evo methods and classes

f86ba49

Merge pull request #111 from aws/feat/evo_preview

40a9b0b

docs: add preview warning docstrings to all new evo methods and classes

Fix tests, formatting, uv.lock

66303bb

Fix jinja autoescape error

a2fdff6

padmak30 requested a review from a team April 30, 2026 22:04

chore: restore workflow files to main state

708f88a

padmak30 temporarily deployed to auto-approve April 30, 2026 22:14 — with GitHub Actions Inactive

padmak30 had a problem deploying to auto-approve April 30, 2026 22:14 — with GitHub Actions Failure

padmak30 temporarily deployed to auto-approve April 30, 2026 22:14 — with GitHub Actions Inactive

notgitika previously approved these changes Apr 30, 2026

jariy17 previously approved these changes Apr 30, 2026

Add description to batch eval result

7c2f081

padmak30 dismissed stale reviews from jariy17 and notgitika via 7c2f081 April 30, 2026 22:28

padmak30 temporarily deployed to auto-approve April 30, 2026 22:28 — with GitHub Actions Inactive

notgitika approved these changes Apr 30, 2026

padmak30 merged commit 719907b into main Apr 30, 2026
35 checks passed

padmak30 merged 56 commits into main from feat/evo-main

6 participants
