fix: unskip evaluator integ test classes in sm-train by lucasjia-aws · Pull Request #5854 · aws/sagemaker-python-sdk

lucasjia-aws · 2026-05-13T18:47:47Z

Issue

Some of the skipped integration tests now pass. This PR unskips them to improve test coverage.

Description

This PR unskips evaluator integration test classes in sagemaker-train that were temporarily skipped during a file relocation and never re-enabled.

Changes Summary

| # | Test | Module | Action | Commit | How Fixed |
|---|------|--------|--------|--------|-----------|
| 1 | TestLLMAsJudgeEvaluatorIntegration (class) | test_llm_as_judge_evaluator.py | Unskipped | `12779438` | Tests pass as-is |
| 2 | TestBenchmarkEvaluatorIntegration (class) | test_benchmark_evaluator.py | Unskipped | `12779438` | Tests pass as-is |
| 3 | TestCustomScorerEvaluatorIntegration (class) | test_custom_scorer_evaluator.py | Unskipped | `12779438` | Tests pass as-is |
| 4 | test_llm_as_judge_builtin_metrics_only | test_llm_as_judge_evaluator.py | Deleted | `f58d8368` | Redundant with full_flow; replaced by unit tests for None metrics handling |
| 5 | test_llm_as_judge_custom_metrics_only | test_llm_as_judge_evaluator.py | Deleted | `f58d8368` | Redundant with full_flow; replaced by unit tests for None metrics handling |
| 6 | test_custom_scorer_with_builtin_metric | test_custom_scorer_evaluator.py | Unskipped | `107e5fbe` | Test passes as-is |
| 7 | test_custom_scorer_base_model_only | test_custom_scorer_evaluator.py | Unskipped | `107e5fbe` | Passes (test body is `pass`, a placeholder for future implementation) |
| 8 | test_benchmark_evaluation_base_model_only | test_benchmark_evaluator.py | Unskipped + Fixed | `3ce1324c`, `ab553e6c` | Updated config from old account to test account; commented out non-existent mlflow_resource_arn |
| 9 | test_benchmark_evaluation_nova_model | test_benchmark_evaluator.py | Unskipped (still failing) | `107e5fbe` | Requires us-east-1 infra in test account; needs separate infra work |
| 10 | TestCustomScorerEvaluatorIntegration | test_custom_scorer_evaluator.py | Marked serial | `b1c094e0` | Added `@pytest.mark.xdist_group` to prevent pipeline concurrency conflicts |
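For readers unfamiliar with the serial marking in row 10, here is a minimal sketch of how a test class can be pinned to one pytest-xdist worker. The group name and the placeholder test bodies are assumptions for illustration; the PR does not show the actual decorator arguments.

```python
import pytest


# Hypothetical group name; the PR does not state which name it uses.
@pytest.mark.xdist_group(name="custom_scorer_pipeline")
class TestCustomScorerEvaluatorIntegration:
    """Placeholder bodies; the real tests run evaluation pipelines."""

    def test_custom_scorer_evaluation_full_flow(self):
        assert True

    def test_custom_scorer_with_builtin_metric(self):
        assert True
```

With pytest-xdist, the mark only takes effect when the run uses `--dist loadgroup` (for example `pytest -n auto --dist loadgroup`). Tests that share a group name are then sent to the same worker, so they do not run concurrently with each other. Whether the repository's CI passes that flag is not stated in this PR.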

Test Results

Passing: All evaluator tests except test_benchmark_evaluation_nova_model and test_benchmark_evaluation_base_model_only
Flaky: test_custom_scorer_evaluation_full_flow (intermittent pipeline concurrency issue)
Still failing (re-marked as skipped so that this PR can merge):
- test_benchmark_evaluation_nova_model: requires cross-account us-east-1 infrastructure
- test_benchmark_evaluation_base_model_only: pipeline creation returns None (under investigation)
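One way to re-skip the two failing tests while keeping the reason visible in test reports is pytest's `skip` marker. This is only a sketch: the tests are shown as module-level functions with empty bodies, whereas in the repository they presumably live inside a test class, and the reason strings are taken from the list above.

```python
import pytest


@pytest.mark.skip(reason="requires cross-account us-east-1 infrastructure")
def test_benchmark_evaluation_nova_model():
    ...  # body omitted in this sketch


@pytest.mark.skip(reason="pipeline creation returns None (under investigation)")
def test_benchmark_evaluation_base_model_only():
    ...  # body omitted in this sketch
```

Running pytest with `-rs` prints each skipped test together with its reason, which makes it easier to find these tests again in a follow-up PR.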

New Unit Tests Added

test_llm_as_judge_evaluator_builtin_metrics_only_no_custom — verifies evaluator handles custom_metrics=None correctly
test_llm_as_judge_evaluator_custom_metrics_only_no_builtin — verifies evaluator handles builtin_metrics=None correctly
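The shape of such None-handling unit tests can be sketched as follows. The `merge_metrics` helper and the metric names are hypothetical stand-ins, not the SDK's evaluator API; the sketch only shows the pattern of treating `None` as "no metrics of this kind" and asserting on the result.

```python
from typing import List, Optional, Sequence


def merge_metrics(
    builtin_metrics: Optional[Sequence[str]] = None,
    custom_metrics: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: a None argument contributes no metrics."""
    builtin = list(builtin_metrics or [])
    custom = list(custom_metrics or [])
    return builtin + custom


def test_builtin_metrics_only_no_custom():
    # custom_metrics=None must not raise and must not add entries.
    result = merge_metrics(builtin_metrics=["Correctness"], custom_metrics=None)
    assert result == ["Correctness"]


def test_custom_metrics_only_no_builtin():
    # builtin_metrics=None must not raise and must not add entries.
    result = merge_metrics(builtin_metrics=None, custom_metrics=["my_metric"])
    assert result == ["my_metric"]
```

Unit tests of this kind run without AWS resources, which is the stated reason they replace the two deleted integration tests.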

lucasjia-aws · 2026-05-15T20:16:42Z

Test rerun succeeded.

Commits

- `1277943` fix: unskip evaluator integ test classes in sm-train
- `107e5fb` debug: unskip all sm-train integ tests
- `f58d836` test: replace redundant LLM-as-judge integ tests with unit tests for None metrics handling
- `b1c094e` mark TestCustomScorerEvaluatorIntegration tests as serial
- `3ce1324` update test config for test_benchmark_evaluation_base_model_only
- `82a8a1d` unskip test_hp_contract_mpi_script
- `ab553e6` fix: comment out non-existent mlflow_resource_arn in base_model_only benchmark test
- `cdf5ce7` fix: mark three unfixed tests as skipped, to fix them in other pr

lucasjia-aws force-pushed the unskip-integ-tests branch from cc1fdd9 to ab553e6 on May 15, 2026 17:32 (after `ab553e6`, before `cdf5ce7`).

mujtaba1747 approved these changes May 15, 2026

lucasjia-aws merged commit 92f8d42 into aws:master May 15, 2026
14 of 20 checks passed

lucasjia-aws merged 8 commits into aws:master from lucasjia-aws:unskip-integ-tests
