feat(sagemaker-ai): update skills and bump version to 1.1.0 #144
krokoko merged 2 commits into awslabs:main
Conversation
Semgrep OSS found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Force-pushed 75192d6 to ffddcce
Make sure to also bump the version in https://github.com/awslabs/agent-plugins/blob/main/.claude-plugin/marketplace.json#L164
Silent error swallowing in Nova reward function Lambda handler

File: plugins/sagemaker-ai/skills/model-evaluation/scripts/nova_reward_function_source_template.py

The lambda_handler catches all exceptions with `except Exception as e` but never logs `e`; it silently returns a 0.0 score. Any bug in the user's customized reward_function (wrong field names, type errors, division by zero) will be invisibly converted to zero scores with no diagnostic output. In a SageMaker pipeline, where you can't easily attach a debugger, this would be extremely painful to troubleshoot.

Fix: At minimum, log the exception: `print(f"ERROR processing sample {i}: {type(e).__name__}: {e}", flush=True)`
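A minimal sketch of the suggested fix. Here `reward_function` is a hypothetical stand-in for the user's customized scorer, and the handler shape is assumed from the comment above, not copied from the actual template:

```python
import traceback


def reward_function(sample):
    # Hypothetical stand-in for the user's customized scorer.
    return float(sample["score"])


def lambda_handler(event, context):
    """Score each sample, logging failures instead of silently returning 0.0."""
    batch = event if isinstance(event, list) else []
    scores = []
    for i, sample in enumerate(batch):
        try:
            scores.append(reward_function(sample))
        except Exception as e:
            # Surface the failure in CloudWatch logs before falling back to 0.0,
            # so a broken reward_function is diagnosable without a debugger.
            print(f"ERROR processing sample {i}: {type(e).__name__}: {e}", flush=True)
            traceback.print_exc()
            scores.append(0.0)
    return scores
```

The fallback score stays 0.0, so existing pipeline behavior is unchanged; the only difference is that every failure now leaves a trace in the logs.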
EVALUATE_BASE placeholder is a string, not a boolean

Files: benchmark_evaluator.py, custom_scorer_evaluator.py

Both new evaluator scripts use EVALUATE_BASE = "[EVALUATE_BASE]" (a quoted string), while the existing llmaaj_evaluator.py correctly uses EVALUATE_BASE = [TRUE_OR_FALSE] (unquoted, substituted as a bare boolean). The string "[EVALUATE_BASE]" is truthy and may cause SDK errors or incorrect behavior.

Fix: Change to EVALUATE_BASE = [EVALUATE_BASE] (no quotes) to match the established pattern.
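A two-line illustration of why the quoted pattern is dangerous: any non-empty string is truthy in Python, so even a substituted value like "False" would pass a boolean check:

```python
# Quoted pattern: the value is still a str after substitution, and any
# non-empty string is truthy, including the placeholder itself.
assert bool("[EVALUATE_BASE]") is True
assert bool("False") is True  # even the string "False" counts as truthy

# Unquoted pattern (as in llmaaj_evaluator.py): the agent substitutes a bare
# True/False token that Python parses as a real boolean.
EVALUATE_BASE = True
assert EVALUATE_BASE is True
```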
Inconsistent error handling between reward function templates

Files: nova_reward_function_source_template.py vs reward_function_source_template.py

The two templates follow fundamentally different error strategies, and both approaches have problems. The generic template also has fragile event parsing: event.get('input', event) fails with AttributeError if event is a list.
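One way to harden the generic template's parsing. This is a sketch under the assumption that the handler should accept either a bare list of samples or a dict wrapping them under an 'input' key; the helper name `extract_batch` is hypothetical:

```python
def extract_batch(event):
    """Normalize a Lambda event into a list of samples.

    Handles a bare list, a dict with an 'input' key, and a single-sample
    dict, instead of crashing with AttributeError on lists.
    """
    if isinstance(event, list):
        return event
    if isinstance(event, dict):
        payload = event.get("input", event)
        return payload if isinstance(payload, list) else [payload]
    # Anything else is an unexpected shape; log it rather than failing silently.
    print(f"WARNING: unexpected event type {type(event).__name__}", flush=True)
    return []
```

Checking `isinstance` before calling `.get` is the key point: it is the `.get` call on a list that raises the AttributeError noted above.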
Non-list event silently treated as empty batch

File: nova_reward_function_source_template.py

batch = event if isinstance(event, list) else []

Fix: Add a warning log when the event format is unexpected.
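The fix could be as small as replacing the one-liner with an explicit branch that logs before falling back. A sketch; the function wrapper `parse_batch` is hypothetical and only there to make the snippet self-contained:

```python
def parse_batch(event):
    # Keep the existing behavior for lists, but log when falling back,
    # so malformed invocations are visible in CloudWatch.
    if isinstance(event, list):
        return event
    print(
        f"WARNING: expected a list event, got {type(event).__name__}; "
        "treating as empty batch",
        flush=True,
    )
    return []
```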
Placeholder values never validated in evaluator scripts

Files: benchmark_evaluator.py, custom_scorer_evaluator.py

Placeholder strings like "[MLFLOW_ARN]" are truthy, so the guard if MLFLOW_ARN: always passes. If the agent fails to substitute a placeholder, the SageMaker SDK receives garbage values, producing confusing API errors.
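A fail-fast guard is one option here. This is a sketch assuming placeholders always look like "[UPPER_SNAKE]"; the `validate_substituted` helper is hypothetical, not part of the existing scripts:

```python
import re

# Matches an unsubstituted template token such as "[MLFLOW_ARN]".
_PLACEHOLDER = re.compile(r"^\[[A-Z_]+\]$")


def validate_substituted(name, value):
    """Raise immediately if a template placeholder was never substituted."""
    if isinstance(value, str) and _PLACEHOLDER.match(value):
        raise ValueError(
            f"{name} still contains unsubstituted placeholder {value!r}; "
            "check the agent's template substitution step"
        )
    return value


# Usage at the top of an evaluator script (illustrative):
# MLFLOW_ARN = validate_substituted("MLFLOW_ARN", "[MLFLOW_ARN]")
```

Failing at import time with a message naming the placeholder is much easier to diagnose than a downstream SageMaker API error about a malformed ARN.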
Nosemgrep rule ID fix from PR #121 is being reverted

File: plugins/sagemaker-ai/skills/dataset-evaluation/scripts/format_detector.py

This PR changes the nosemgrep rule IDs from the double-ID format back to the truncated form. PR #121 (b467313) specifically fixed these to the double-ID format because the truncated IDs cause suppressions to be silently ignored by semgrep. This PR undoes that fix. Likely the branch was based on a pre-#121 commit.
Force-pushed 1833d3e to 594515e
This is the established pattern with the scripts: the agent substitutes these placeholders. Will take as a nit unless there are bigger concerns.
Force-pushed 0be5b48 to c9108d3
Fixed
Force-pushed a1de54b to 5a6a51a
Force-pushed 10306cc to 3ec1fb2
Force-pushed 3aa231b to 56ffe49
SageMakerAIAgentSkills (mainline ee2105d):
- finetuning: add continuous customization reference, EULA fixes, RLVR notebook UX improvements
- finetuning-setup: add benchmark-based model selection + 8 benchmark references + model-licenses
- model-evaluation: add custom scorer workflow, reward function creation, MLflow support; rename notebook_cells.py -> llmaaj_evaluator.py; add nova/oss reward function templates
- model-deployment: EULA and deploy script fixes, model-licenses reference
- dataset-transformation: notebook structure/writing guide updates
- planning: scope activation to SageMaker model customization domain

X-AI-Prompt: Sync SageMakerAIAgentSkills mainline into awslabs/agent-plugins sagemaker-ai plugin and bump version to 1.1.0
X-AI-Tool: Kiro CLI
Force-pushed 56ffe49 to 2c6b16d
Done
These we are tackling as P1 in Asana, as they require some consensus. Definitely good callouts.
We added placeholder validations.
Summary
Updates the sagemaker-ai plugin from v1.0.0 to v1.1.0.

SageMakerAIAgentSkills (mainline 13d05c9): rename notebook_cells.py → llmaaj_evaluator.py; add nova/oss reward function templates

Changes

- Bump the version in .claude-plugin/plugin.json, .codex-plugin/plugin.json, and .claude-plugin/marketplace.json
- Sync skills from the SageMakerAIAgentSkills mainline

Acknowledgment
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.