
Test infrastructure: parallel execution and dataset filtering #2414

Merged
jucor merged 9 commits into edge from jc/pca_test_infra on Mar 10, 2026
Conversation

@jucor (Collaborator) commented Mar 5, 2026

Summary

Stacked on #2413 (PCA NaN fix). Please review and merge #2413 first.
Stack: #2413 (NaN fix) → this PR → #2415 (comparer) → #2416 (sklearn PCA) → #2417 (test cleanup)

  • Isolate test state: each test class gets its own Conversation object with memory cleanup
  • Add `pytest-xdist` for parallel test execution by dataset
  • Factor `xdist_group` markers into shared helpers used across the test suite
  • Add `--datasets` CLI option to run tests on specific datasets
  • Document parallel test execution in `docs/regression_testing.md`

Test plan

  • 215 passed, 7 skipped, 2 xfailed

🤖 Generated with Claude Code

@ballPointPenguin (Member) left a comment


looks good. tests pass for me when running locally

Base automatically changed from jc/pca_nan_fix to edge March 10, 2026 09:42
jucor and others added 9 commits March 10, 2026 09:46
Until now, one shared object was reused across all tests for a given dataset,
and its cache kept growing.
Use pytest-xdist's loadgroup distribution to run tests for different
datasets on separate workers while keeping all tests for the same
dataset on one worker (preserving fixture cache efficiency).

Changes:
- Add xdist_group marker to each dataset parameter via pytest.param()
- Switch from --dist=loadscope to --dist=loadgroup in pytest config
- Clean up conftest.py marker handling

Performance: Tests now complete in ~2 minutes with 4 workers (-n4),
down from ~9 minutes before deduplication and ~4 minutes sequential
after deduplication. The xdist_group marker ensures each dataset's
Conversation object is computed only once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
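The `--dist=loadgroup` switch described above lives in the pytest config. A `pyproject.toml` fragment along these lines would enable it (the exact layout of the repo's config is an assumption; the option itself is pytest-xdist's documented setting):

```toml
[tool.pytest.ini_options]
# Distribute tests by their xdist_group marker so every test for a given
# dataset runs on the same worker, and that worker builds the dataset's
# cached Conversation fixture only once.
addopts = "--dist=loadgroup"
```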
Add reusable helpers in conftest.py for creating pytest.param objects
with xdist_group markers, enabling efficient parallel test execution
across all dataset-parametrized tests.

Changes:
- conftest.py: Add make_dataset_params() and get_available_dataset_params()
  helper functions for xdist_group marker creation
- Update pytest_generate_tests to use xdist_group markers
- Update test_pca_smoke.py, test_repness_smoke.py, test_conversation_smoke.py,
  test_legacy_repness_comparison.py, test_golden_data.py, test_pipeline_integrity.py
  to use the new helpers

Pattern: Tests parametrized by dataset now use xdist_group markers via:
- make_dataset_params(["biodiversity", "vw"]) for hardcoded lists
- get_available_dataset_params() for dynamically discovered datasets

With --dist=loadgroup (default in pyproject.toml), pytest-xdist groups
tests by dataset, ensuring fixtures are computed once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
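A minimal sketch of the helper pattern described above, assuming `make_dataset_params()` wraps each dataset name in a `pytest.param` carrying an `xdist_group` mark (the helper name comes from this commit; its exact signature in the repo is assumed):

```python
import pytest


def make_dataset_params(names):
    """Sketch: one pytest.param per dataset, tagged with an xdist_group
    mark so pytest-xdist's loadgroup scheduler keeps all tests for a
    dataset on one worker."""
    return [
        pytest.param(name, marks=pytest.mark.xdist_group(name))
        for name in names
    ]


# Hypothetical usage in a test module, mirroring the pattern above:
@pytest.mark.parametrize("dataset_name", make_dataset_params(["biodiversity", "vw"]))
def test_smoke(dataset_name):
    assert dataset_name  # placeholder for a real dataset-backed check
```

With `--dist=loadgroup` active, both parametrizations of `test_smoke` land on workers chosen by their group name rather than round-robin.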
New pytest option allows filtering tests to run on a subset of datasets:
  pytest --datasets=biodiversity          # single dataset
  pytest --datasets=biodiversity,vw       # multiple datasets

Implementation:
- pytest_generate_tests filters dynamic parametrization
- pytest_collection_modifyitems deselects tests with static parametrization
- Report header shows "Filtered to: ..." when active

Works with both:
- Dynamic parametrization (dataset fixture via pytest_generate_tests)
- Static parametrization (@pytest.mark.parametrize with get_available_dataset_params())

Also updated tests/README.md with documentation for all pytest options
including --include-local, --datasets, and parallel execution with -n.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The test file has its own pytest_generate_tests hook that was not
respecting the --datasets option. Now imports _get_requested_datasets
and make_dataset_params from conftest to properly filter datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add "Parallel Test Execution" section to regression_testing.md explaining:
- How to use -n auto for parallel execution
- How xdist_group markers keep dataset fixtures together
- When NOT to use parallel execution (database tests, debugging)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Suppress RuntimeWarning from nanmean on all-NaN columns
- Harden --datasets parsing: filter empty entries, error on empty set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
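The warning suppression mentioned above can be done with a targeted `catch_warnings` block; `np.nanmean` raises a "Mean of empty slice" `RuntimeWarning` for an all-NaN column and returns NaN for it. Where exactly the repo applies this is an assumption; the numpy behavior is standard:

```python
import warnings

import numpy as np


def column_means_ignore_all_nan(matrix):
    """Column-wise nanmean that silences the RuntimeWarning numpy emits
    for all-NaN columns; such columns simply come back as NaN."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(matrix, axis=0)
```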
…arker

- Remove substring-based _extract_dataset_from_test and
  pytest_collection_modifyitems filtering
- Remove get_available_dataset_params() (import-time evaluation)
- Add @pytest.mark.use_discovered_datasets marker for dynamic
  parametrization that respects --datasets and --include-local
- Hardcoded @pytest.mark.parametrize tests are no longer affected
  by --datasets CLI flag
- Unify parameter name to dataset_name across all test files
- Add test_dataset_selection.py with unit and pytester integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
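A sketch of how the `use_discovered_datasets` marker could gate dynamic parametrization, per the commit above. The marker name, `--datasets`, and `--include-local` come from this PR; the discovery function and its dataset names are purely illustrative:

```python
def discovered_datasets(include_local=False):
    """Hypothetical discovery: shared datasets plus optional local ones."""
    shared = ["biodiversity", "vw"]
    local = ["my_local_export"] if include_local else []
    return shared + local


def pytest_generate_tests(metafunc):
    # Only tests opting in via the marker get discovery-driven params;
    # hardcoded @pytest.mark.parametrize tests are left untouched, so
    # they no longer react to the --datasets CLI flag.
    if metafunc.definition.get_closest_marker("use_discovered_datasets") is None:
        return
    names = discovered_datasets(
        include_local=metafunc.config.getoption("--include-local", default=False)
    )
    requested = metafunc.config.getoption("--datasets", default=None)
    if requested:
        names = [n for n in names if n in requested.split(",")]
    metafunc.parametrize("dataset_name", names)
```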
@jucor jucor force-pushed the jc/pca_test_infra branch from 8b38d9b to 3ab141e on March 10, 2026 09:46
@github-actions

Delphi Coverage Report

| File | Stmts | Miss | Cover |
|---|---:|---:|---:|
| `__init__.py` | 3 | 0 | 100% |
| `__main__.py` | 55 | 55 | 0% |
| benchmarks/bench_repness.py | 81 | 81 | 0% |
| benchmarks/bench_update_votes.py | 38 | 38 | 0% |
| benchmarks/benchmark_utils.py | 34 | 34 | 0% |
| components/`__init__.py` | 2 | 0 | 100% |
| components/config.py | 165 | 133 | 19% |
| components/server.py | 116 | 72 | 38% |
| conversation/`__init__.py` | 2 | 0 | 100% |
| conversation/conversation.py | 1036 | 352 | 66% |
| conversation/manager.py | 131 | 42 | 68% |
| database/`__init__.py` | 1 | 0 | 100% |
| database/dynamodb.py | 387 | 234 | 40% |
| database/postgres.py | 306 | 205 | 33% |
| pca_kmeans_rep/`__init__.py` | 5 | 0 | 100% |
| pca_kmeans_rep/clusters.py | 234 | 7 | 97% |
| pca_kmeans_rep/corr.py | 98 | 17 | 83% |
| pca_kmeans_rep/pca.py | 238 | 69 | 71% |
| pca_kmeans_rep/repness.py | 361 | 44 | 88% |
| pca_kmeans_rep/stats.py | 107 | 22 | 79% |
| poller.py | 224 | 188 | 16% |
| regression/`__init__.py` | 4 | 0 | 100% |
| regression/comparer.py | 466 | 185 | 60% |
| regression/datasets.py | 95 | 21 | 78% |
| regression/recorder.py | 36 | 27 | 25% |
| regression/utils.py | 137 | 38 | 72% |
| run_math_pipeline.py | 260 | 239 | 8% |
| system.py | 85 | 55 | 35% |
| umap_narrative/500_generate_embedding_umap_cluster.py | 210 | 109 | 48% |
| umap_narrative/501_calculate_comment_extremity.py | 112 | 54 | 52% |
| umap_narrative/502_calculate_priorities.py | 135 | 135 | 0% |
| umap_narrative/700_datamapplot_for_layer.py | 502 | 502 | 0% |
| umap_narrative/701_static_datamapplot_for_layer.py | 310 | 310 | 0% |
| umap_narrative/702_consensus_divisive_datamapplot.py | 432 | 432 | 0% |
| umap_narrative/801_narrative_report_batch.py | 787 | 787 | 0% |
| umap_narrative/802_process_batch_results.py | 265 | 265 | 0% |
| umap_narrative/803_check_batch_status.py | 175 | 175 | 0% |
| umap_narrative/llm_factory_constructor/`__init__.py` | 2 | 2 | 0% |
| umap_narrative/llm_factory_constructor/model_provider.py | 157 | 157 | 0% |
| umap_narrative/polismath_commentgraph/`__init__.py` | 1 | 0 | 100% |
| umap_narrative/polismath_commentgraph/cli.py | 270 | 270 | 0% |
| umap_narrative/polismath_commentgraph/core/`__init__.py` | 3 | 3 | 0% |
| umap_narrative/polismath_commentgraph/core/clustering.py | 110 | 110 | 0% |
| umap_narrative/polismath_commentgraph/core/embedding.py | 104 | 104 | 0% |
| umap_narrative/polismath_commentgraph/lambda_handler.py | 219 | 219 | 0% |
| umap_narrative/polismath_commentgraph/schemas/`__init__.py` | 2 | 0 | 100% |
| umap_narrative/polismath_commentgraph/schemas/dynamo_models.py | 160 | 9 | 94% |
| umap_narrative/polismath_commentgraph/tests/conftest.py | 17 | 17 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_clustering.py | 74 | 74 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_embedding.py | 55 | 55 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_storage.py | 87 | 87 | 0% |
| umap_narrative/polismath_commentgraph/utils/`__init__.py` | 3 | 0 | 100% |
| umap_narrative/polismath_commentgraph/utils/converter.py | 283 | 237 | 16% |
| umap_narrative/polismath_commentgraph/utils/group_data.py | 354 | 336 | 5% |
| umap_narrative/polismath_commentgraph/utils/storage.py | 585 | 477 | 18% |
| umap_narrative/reset_conversation.py | 159 | 50 | 69% |
| umap_narrative/run_pipeline.py | 453 | 312 | 31% |
| utils/general.py | 63 | 41 | 35% |
| **Total** | 10796 | 7487 | 31% |

@jucor (Collaborator, Author) commented Mar 10, 2026

Thanks @ballPointPenguin !

@jucor jucor merged commit 28d2640 into edge Mar 10, 2026
4 checks passed
@jucor jucor deleted the jc/pca_test_infra branch March 10, 2026 10:04
2 participants