Skip to content

fix: unskip evaluator integ test classes in sm-train#5854

Merged
lucasjia-aws merged 8 commits into
aws:masterfrom
lucasjia-aws:unskip-integ-tests
May 15, 2026
Merged

fix: unskip evaluator integ test classes in sm-train#5854
lucasjia-aws merged 8 commits into
aws:masterfrom
lucasjia-aws:unskip-integ-tests

Conversation

@lucasjia-aws
Copy link
Copy Markdown
Collaborator

@lucasjia-aws lucasjia-aws commented May 13, 2026

Issue

Some of the integration tests were passed, working on unskipping them to improve test cover rate.

Description

This PR unskips evaluator integration test classes in sagemaker-train that were temporarily skipped during a file relocation and never re-enabled.

Changes Summary

# Test Module Action Commit How Fixed
1 TestLLMAsJudgeEvaluatorIntegration (class) test_llm_as_judge_evaluator.py Unskipped 12779438 Tests pass as-is
2 TestBenchmarkEvaluatorIntegration (class) test_benchmark_evaluator.py Unskipped 12779438 Tests pass as-is
3 TestCustomScorerEvaluatorIntegration (class) test_custom_scorer_evaluator.py Unskipped 12779438 tests pass as-is
4 test_llm_as_judge_builtin_metrics_only test_llm_as_judge_evaluator.py Deleted f58d8368 Redundant with full_flow; replaced by unit tests for None metrics handling
5 test_llm_as_judge_custom_metrics_only test_llm_as_judge_evaluator.py Deleted f58d8368 Redundant with full_flow; replaced by unit tests for None metrics handling
6 test_custom_scorer_with_builtin_metric test_custom_scorer_evaluator.py Unskipped 107e5fbe Test passes as-is
7 test_custom_scorer_base_model_only test_custom_scorer_evaluator.py Unskipped 107e5fbe Passes (test body is pass — placeholder for future implementation)
8 test_benchmark_evaluation_base_model_only test_benchmark_evaluator.py Unskipped + Fixed 3ce1324c, ab553e6c Updated config from old account to test account; commented out non-existent mlflow_resource_arn
9 test_benchmark_evaluation_nova_model test_benchmark_evaluator.py Unskipped (still failing) 107e5fbe Requires us-east-1 infra in test account — needs separate infra work
10 TestCustomScorerEvaluatorIntegration test_custom_scorer_evaluator.py Marked serial b1c094e0 Added @pytest.mark.xdist_group to prevent pipeline concurrency conflicts

Test Results

  • Passing: All evaluator tests except test_benchmark_evaluation_nova_model and test_benchmark_evaluation_base_model_only
  • Flaky: test_custom_scorer_evaluation_full_flow (intermittent pipeline concurrency issue)
  • Still failing (mark back as skip to merge this pr):
    • test_benchmark_evaluation_nova_model — requires cross-account us-east-1 infrastructure
    • test_benchmark_evaluation_base_model_only — pipeline creation returns None (under investigation)

New Unit Tests Added

  • test_llm_as_judge_evaluator_builtin_metrics_only_no_custom — verifies evaluator handles custom_metrics=None correctly
  • test_llm_as_judge_evaluator_custom_metrics_only_no_builtin — verifies evaluator handles builtin_metrics=None correctly

@lucasjia-aws lucasjia-aws merged commit 92f8d42 into aws:master May 15, 2026
14 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants