Test infrastructure: parallel execution and dataset filtering#2414
Merged
This was referenced Mar 5, 2026
ballPointPenguin
approved these changes
Mar 8, 2026
Member
ballPointPenguin
left a comment
looks good. tests pass for me when running locally
Until now, a single shared object was reused across all tests for a given dataset, and its cache kept growing.
Use pytest-xdist's loadgroup distribution to run tests for different datasets on separate workers while keeping all tests for the same dataset on one worker (preserving fixture cache efficiency).

Changes:
- Add xdist_group marker to each dataset parameter via pytest.param()
- Switch from --dist=loadscope to --dist=loadgroup in pytest config
- Clean up conftest.py marker handling

Performance: Tests now complete in ~2 minutes with 4 workers (-n4), down from ~9 minutes before deduplication and ~4 minutes sequential after deduplication. The xdist_group marker ensures each dataset's Conversation object is computed only once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
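To make the marker change concrete, here is a minimal sketch of tagging each dataset parameter with an `xdist_group` marker. The test name and the literal dataset list are illustrative assumptions, not code from this PR. With `--dist=loadgroup`, pytest-xdist sends all tests sharing a group name to the same worker.

```python
import pytest

# Illustrative parameter list: each dataset name carries an xdist_group marker
# named after the dataset, so its tests stay together on one worker.
DATASET_PARAMS = [
    pytest.param("biodiversity", marks=pytest.mark.xdist_group(name="biodiversity")),
    pytest.param("vw", marks=pytest.mark.xdist_group(name="vw")),
]


@pytest.mark.parametrize("dataset", DATASET_PARAMS)
def test_dataset_name_is_nonempty(dataset):
    # Placeholder assertion; a real test would load and check the dataset.
    assert dataset
```

Running this with `pytest -n4 --dist=loadgroup` requires pytest-xdist to be installed; without `--dist=loadgroup` the markers have no effect on scheduling.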
Add reusable helpers in conftest.py for creating pytest.param objects with xdist_group markers, enabling efficient parallel test execution across all dataset-parametrized tests.

Changes:
- conftest.py: Add make_dataset_params() and get_available_dataset_params() helper functions for xdist_group marker creation
- Update pytest_generate_tests to use xdist_group markers
- Update test_pca_smoke.py, test_repness_smoke.py, test_conversation_smoke.py, test_legacy_repness_comparison.py, test_golden_data.py, test_pipeline_integrity.py to use the new helpers

Pattern: Tests parametrized by dataset now use xdist_group markers via:
- make_dataset_params(["biodiversity", "vw"]) for hardcoded lists
- get_available_dataset_params() for dynamically discovered datasets

With --dist=loadgroup (default in pyproject.toml), pytest-xdist groups tests by dataset, ensuring fixtures are computed once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
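A helper along these lines could look roughly like the sketch below. Only the name `make_dataset_params` and its list argument come from the commit message; the body, the `id=` choice, and the example test are assumptions about one plausible shape, not the actual conftest.py code.

```python
import pytest


def make_dataset_params(dataset_names):
    """Build one pytest.param per dataset, grouped by dataset for pytest-xdist.

    Assumed behaviour: the group name and the test id both equal the dataset name.
    """
    return [
        pytest.param(name, marks=pytest.mark.xdist_group(name=name), id=name)
        for name in dataset_names
    ]


@pytest.mark.parametrize("dataset_name", make_dataset_params(["biodiversity", "vw"]))
def test_dataset_name_is_a_string(dataset_name):
    # Placeholder assertion standing in for a real dataset-level check.
    assert isinstance(dataset_name, str)
```

Centralizing the marker creation in one helper means a test file cannot forget the marker and accidentally spread one dataset's tests over several workers.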
New pytest option allows filtering tests to run on a subset of datasets:

    pytest --datasets=biodiversity      # single dataset
    pytest --datasets=biodiversity,vw   # multiple datasets

Implementation:
- pytest_generate_tests filters dynamic parametrization
- pytest_collection_modifyitems deselects tests with static parametrization
- Report header shows "Filtered to: ..." when active

Works with both:
- Dynamic parametrization (dataset fixture via pytest_generate_tests)
- Static parametrization (@pytest.mark.parametrize with get_available_dataset_params())

Also updated tests/README.md with documentation for all pytest options including --include-local, --datasets, and parallel execution with -n.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
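The dynamic-parametrization half of this option can be sketched as follows. `pytest_addoption`, `parser.addoption`, `pytest_generate_tests`, and `metafunc.parametrize` are standard pytest hooks and calls; `AVAILABLE_DATASETS` and `filter_datasets` are hypothetical names introduced for illustration, and the real conftest.py may be organized differently.

```python
import pytest

# Hypothetical fixed list; the real suite discovers its datasets at runtime.
AVAILABLE_DATASETS = ["biodiversity", "vw"]


def pytest_addoption(parser):
    """Register --datasets (pytest only calls this from conftest.py or a plugin)."""
    parser.addoption(
        "--datasets",
        action="store",
        default=None,
        help="Comma-separated dataset names to run (default: all).",
    )


def filter_datasets(available, option_value):
    """Return the available datasets named in option_value, in their original order."""
    if option_value is None:
        return list(available)
    requested = set(option_value.split(","))
    return [name for name in available if name in requested]


def pytest_generate_tests(metafunc):
    """Parametrize the `dataset` fixture with only the requested datasets."""
    if "dataset" not in metafunc.fixturenames:
        return
    option_value = metafunc.config.getoption("--datasets")
    metafunc.parametrize("dataset", filter_datasets(AVAILABLE_DATASETS, option_value))
```

Keeping the filtering in a plain function makes it easy to unit-test without starting a pytest session.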
The test file has its own pytest_generate_tests hook that was not respecting the --datasets option. Now imports _get_requested_datasets and make_dataset_params from conftest to properly filter datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add "Parallel Test Execution" section to regression_testing.md explaining:
- How to use -n auto for parallel execution
- How xdist_group markers keep dataset fixtures together
- When NOT to use parallel execution (database tests, debugging)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Suppress RuntimeWarning from nanmean on all-NaN columns
- Harden --datasets parsing: filter empty entries, error on empty set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
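The hardened parsing described above might look something like this sketch. The function name `parse_datasets_option` and the error message are hypothetical, and raising `pytest.UsageError` is an assumption about how the error is surfaced; the commit only states that empty entries are filtered and an empty set is an error.

```python
import pytest


def parse_datasets_option(raw):
    """Parse a --datasets value into a list of names.

    Returns None when the option was not given. Blank entries (for example from
    a trailing comma) are dropped, and a value with no names at all is rejected.
    """
    if raw is None:
        return None
    names = [part.strip() for part in raw.split(",") if part.strip()]
    if not names:
        raise pytest.UsageError("--datasets was given but contains no dataset names")
    return names
```

Rejecting an empty selection up front avoids a run that silently collects zero dataset tests and still reports success.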
…arker

- Remove substring-based _extract_dataset_from_test and pytest_collection_modifyitems filtering
- Remove get_available_dataset_params() (import-time evaluation)
- Add @pytest.mark.use_discovered_datasets marker for dynamic parametrization that respects --datasets and --include-local
- Hardcoded @pytest.mark.parametrize tests are no longer affected by --datasets CLI flag
- Unify parameter name to dataset_name across all test files
- Add test_dataset_selection.py with unit and pytester integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
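One way a marker-driven design like this can work is sketched below: parametrization happens at collection time inside `pytest_generate_tests`, and only for tests that opt in with the marker. The marker name and the `dataset_name` parameter come from the commit message; `discover_datasets` is a hypothetical stand-in for the real discovery logic, which also honours `--datasets` and `--include-local`.

```python
import pytest


def pytest_configure(config):
    """Register the marker so pytest does not warn about an unknown mark."""
    config.addinivalue_line(
        "markers",
        "use_discovered_datasets: parametrize dataset_name over discovered datasets",
    )


def discover_datasets(config):
    """Hypothetical discovery step; returns a fixed list for illustration."""
    return ["biodiversity", "vw"]


def pytest_generate_tests(metafunc):
    """Parametrize dataset_name only for tests carrying the opt-in marker."""
    marker = metafunc.definition.get_closest_marker("use_discovered_datasets")
    if marker is None or "dataset_name" not in metafunc.fixturenames:
        return
    params = [
        pytest.param(name, marks=pytest.mark.xdist_group(name=name), id=name)
        for name in discover_datasets(metafunc.config)
    ]
    metafunc.parametrize("dataset_name", params)
```

A test opts in with `@pytest.mark.use_discovered_datasets` and a `dataset_name` argument. Because discovery runs inside the hook rather than at import time, it can read command-line options, and tests with a hardcoded `@pytest.mark.parametrize` list are left untouched.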
8b38d9b to
3ab141e
Delphi Coverage Report
Collaborator
Author
Thanks @ballPointPenguin!