
Test infrastructure: parallel execution and dataset filtering #2414

Merged
jucor merged 9 commits into edge from jc/pca_test_infra on Mar 10, 2026
Conversation

@jucor (Collaborator) commented Mar 5, 2026

Summary

Stacked on #2413 (PCA NaN fix). Please review and merge #2413 first.
Stack: #2413 (NaN fix) → this PR → #2415 (comparer) → #2416 (sklearn PCA) → #2417 (test cleanup)

  • Isolate test state: each test class gets its own Conversation object with memory cleanup
  • Add `pytest-xdist` for parallel test execution by dataset
  • Factor `xdist_group` markers into shared helpers used across the test suite
  • Add `--datasets` CLI option to run tests on specific datasets
  • Document parallel test execution in `docs/regression_testing.md`

Test plan

  • 215 passed, 7 skipped, 2 xfailed

🤖 Generated with Claude Code

@ballPointPenguin (Member) left a comment


looks good. tests pass for me when running locally

Base automatically changed from jc/pca_nan_fix to edge March 10, 2026 09:42
jucor and others added 9 commits March 10, 2026 09:46
Until now, one shared object was reused across all tests for a given dataset,
and its cache kept growing.
Use pytest-xdist's loadgroup distribution to run tests for different
datasets on separate workers while keeping all tests for the same
dataset on one worker (preserving fixture cache efficiency).

Changes:
- Add xdist_group marker to each dataset parameter via pytest.param()
- Switch from --dist=loadscope to --dist=loadgroup in pytest config
- Clean up conftest.py marker handling

Performance: Tests now complete in ~2 minutes with 4 workers (-n4),
down from ~9 minutes before deduplication and ~4 minutes sequential
after deduplication. The xdist_group marker ensures each dataset's
Conversation object is computed only once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
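The `--dist=loadgroup` switch described above lives in the pytest config. A `pyproject.toml` fragment along these lines would enable it (the exact layout of the repo's config is an assumption; the option itself is pytest-xdist's documented setting):

```toml
[tool.pytest.ini_options]
# Distribute tests by their xdist_group marker so every test for a given
# dataset runs on the same worker, and that worker builds the dataset's
# cached Conversation fixture only once.
addopts = "--dist=loadgroup"
```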
Add reusable helpers in conftest.py for creating pytest.param objects
with xdist_group markers, enabling efficient parallel test execution
across all dataset-parametrized tests.

Changes:
- conftest.py: Add make_dataset_params() and get_available_dataset_params()
  helper functions for xdist_group marker creation
- Update pytest_generate_tests to use xdist_group markers
- Update test_pca_smoke.py, test_repness_smoke.py, test_conversation_smoke.py,
  test_legacy_repness_comparison.py, test_golden_data.py, test_pipeline_integrity.py
  to use the new helpers

Pattern: Tests parametrized by dataset now use xdist_group markers via:
- make_dataset_params(["biodiversity", "vw"]) for hardcoded lists
- get_available_dataset_params() for dynamically discovered datasets

With --dist=loadgroup (default in pyproject.toml), pytest-xdist groups
tests by dataset, ensuring fixtures are computed once per worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
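A minimal sketch of the helper pattern described above, assuming `make_dataset_params()` wraps each dataset name in a `pytest.param` carrying an `xdist_group` mark (the helper name comes from this commit; its exact signature in the repo is assumed):

```python
import pytest


def make_dataset_params(names):
    """Sketch: one pytest.param per dataset, tagged with an xdist_group
    mark so pytest-xdist's loadgroup scheduler keeps all tests for a
    dataset on one worker."""
    return [
        pytest.param(name, marks=pytest.mark.xdist_group(name))
        for name in names
    ]


# Hypothetical usage in a test module, mirroring the pattern above:
@pytest.mark.parametrize("dataset_name", make_dataset_params(["biodiversity", "vw"]))
def test_smoke(dataset_name):
    assert dataset_name  # placeholder for a real dataset-backed check
```

With `--dist=loadgroup` active, both parametrizations of `test_smoke` land on workers chosen by their group name rather than round-robin.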
New pytest option allows filtering tests to run on a subset of datasets:
  pytest --datasets=biodiversity          # single dataset
  pytest --datasets=biodiversity,vw       # multiple datasets

Implementation:
- pytest_generate_tests filters dynamic parametrization
- pytest_collection_modifyitems deselects tests with static parametrization
- Report header shows "Filtered to: ..." when active

Works with both:
- Dynamic parametrization (dataset fixture via pytest_generate_tests)
- Static parametrization (@pytest.mark.parametrize with get_available_dataset_params())

Also updated tests/README.md with documentation for all pytest options
including --include-local, --datasets, and parallel execution with -n.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The test file has its own pytest_generate_tests hook that was not
respecting the --datasets option. Now imports _get_requested_datasets
and make_dataset_params from conftest to properly filter datasets.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add "Parallel Test Execution" section to regression_testing.md explaining:
- How to use -n auto for parallel execution
- How xdist_group markers keep dataset fixtures together
- When NOT to use parallel execution (database tests, debugging)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Suppress RuntimeWarning from nanmean on all-NaN columns
- Harden --datasets parsing: filter empty entries, error on empty set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
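The warning suppression mentioned above can be done with a targeted `catch_warnings` block; `np.nanmean` raises a "Mean of empty slice" `RuntimeWarning` for an all-NaN column and returns NaN for it. Where exactly the repo applies this is an assumption; the numpy behavior is standard:

```python
import warnings

import numpy as np


def column_means_ignore_all_nan(matrix):
    """Column-wise nanmean that silences the RuntimeWarning numpy emits
    for all-NaN columns; such columns simply come back as NaN."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(matrix, axis=0)
```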
…arker

- Remove substring-based _extract_dataset_from_test and
  pytest_collection_modifyitems filtering
- Remove get_available_dataset_params() (import-time evaluation)
- Add @pytest.mark.use_discovered_datasets marker for dynamic
  parametrization that respects --datasets and --include-local
- Hardcoded @pytest.mark.parametrize tests are no longer affected
  by --datasets CLI flag
- Unify parameter name to dataset_name across all test files
- Add test_dataset_selection.py with unit and pytester integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
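A sketch of how the `use_discovered_datasets` marker could gate dynamic parametrization, per the commit above. The marker name, `--datasets`, and `--include-local` come from this PR; the discovery function and its dataset names are purely illustrative:

```python
def discovered_datasets(include_local=False):
    """Hypothetical discovery: shared datasets plus optional local ones."""
    shared = ["biodiversity", "vw"]
    local = ["my_local_export"] if include_local else []
    return shared + local


def pytest_generate_tests(metafunc):
    # Only tests opting in via the marker get discovery-driven params;
    # hardcoded @pytest.mark.parametrize tests are left untouched, so
    # they no longer react to the --datasets CLI flag.
    if metafunc.definition.get_closest_marker("use_discovered_datasets") is None:
        return
    names = discovered_datasets(
        include_local=metafunc.config.getoption("--include-local", default=False)
    )
    requested = metafunc.config.getoption("--datasets", default=None)
    if requested:
        names = [n for n in names if n in requested.split(",")]
    metafunc.parametrize("dataset_name", names)
```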
@jucor jucor force-pushed the jc/pca_test_infra branch from 8b38d9b to 3ab141e on March 10, 2026 09:46
@github-actions

Delphi Coverage Report

| File | Stmts | Miss | Cover |
|---|---:|---:|---:|
| `__init__.py` | 3 | 0 | 100% |
| `__main__.py` | 55 | 55 | 0% |
| benchmarks/bench_repness.py | 81 | 81 | 0% |
| benchmarks/bench_update_votes.py | 38 | 38 | 0% |
| benchmarks/benchmark_utils.py | 34 | 34 | 0% |
| components/`__init__.py` | 2 | 0 | 100% |
| components/config.py | 165 | 133 | 19% |
| components/server.py | 116 | 72 | 38% |
| conversation/`__init__.py` | 2 | 0 | 100% |
| conversation/conversation.py | 1036 | 352 | 66% |
| conversation/manager.py | 131 | 42 | 68% |
| database/`__init__.py` | 1 | 0 | 100% |
| database/dynamodb.py | 387 | 234 | 40% |
| database/postgres.py | 306 | 205 | 33% |
| pca_kmeans_rep/`__init__.py` | 5 | 0 | 100% |
| pca_kmeans_rep/clusters.py | 234 | 7 | 97% |
| pca_kmeans_rep/corr.py | 98 | 17 | 83% |
| pca_kmeans_rep/pca.py | 238 | 69 | 71% |
| pca_kmeans_rep/repness.py | 361 | 44 | 88% |
| pca_kmeans_rep/stats.py | 107 | 22 | 79% |
| poller.py | 224 | 188 | 16% |
| regression/`__init__.py` | 4 | 0 | 100% |
| regression/comparer.py | 466 | 185 | 60% |
| regression/datasets.py | 95 | 21 | 78% |
| regression/recorder.py | 36 | 27 | 25% |
| regression/utils.py | 137 | 38 | 72% |
| run_math_pipeline.py | 260 | 239 | 8% |
| system.py | 85 | 55 | 35% |
| umap_narrative/500_generate_embedding_umap_cluster.py | 210 | 109 | 48% |
| umap_narrative/501_calculate_comment_extremity.py | 112 | 54 | 52% |
| umap_narrative/502_calculate_priorities.py | 135 | 135 | 0% |
| umap_narrative/700_datamapplot_for_layer.py | 502 | 502 | 0% |
| umap_narrative/701_static_datamapplot_for_layer.py | 310 | 310 | 0% |
| umap_narrative/702_consensus_divisive_datamapplot.py | 432 | 432 | 0% |
| umap_narrative/801_narrative_report_batch.py | 787 | 787 | 0% |
| umap_narrative/802_process_batch_results.py | 265 | 265 | 0% |
| umap_narrative/803_check_batch_status.py | 175 | 175 | 0% |
| umap_narrative/llm_factory_constructor/`__init__.py` | 2 | 2 | 0% |
| umap_narrative/llm_factory_constructor/model_provider.py | 157 | 157 | 0% |
| umap_narrative/polismath_commentgraph/`__init__.py` | 1 | 0 | 100% |
| umap_narrative/polismath_commentgraph/cli.py | 270 | 270 | 0% |
| umap_narrative/polismath_commentgraph/core/`__init__.py` | 3 | 3 | 0% |
| umap_narrative/polismath_commentgraph/core/clustering.py | 110 | 110 | 0% |
| umap_narrative/polismath_commentgraph/core/embedding.py | 104 | 104 | 0% |
| umap_narrative/polismath_commentgraph/lambda_handler.py | 219 | 219 | 0% |
| umap_narrative/polismath_commentgraph/schemas/`__init__.py` | 2 | 0 | 100% |
| umap_narrative/polismath_commentgraph/schemas/dynamo_models.py | 160 | 9 | 94% |
| umap_narrative/polismath_commentgraph/tests/conftest.py | 17 | 17 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_clustering.py | 74 | 74 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_embedding.py | 55 | 55 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_storage.py | 87 | 87 | 0% |
| umap_narrative/polismath_commentgraph/utils/`__init__.py` | 3 | 0 | 100% |
| umap_narrative/polismath_commentgraph/utils/converter.py | 283 | 237 | 16% |
| umap_narrative/polismath_commentgraph/utils/group_data.py | 354 | 336 | 5% |
| umap_narrative/polismath_commentgraph/utils/storage.py | 585 | 477 | 18% |
| umap_narrative/reset_conversation.py | 159 | 50 | 69% |
| umap_narrative/run_pipeline.py | 453 | 312 | 31% |
| utils/general.py | 63 | 41 | 35% |
| **Total** | 10796 | 7487 | 31% |

@jucor (Collaborator, Author) commented Mar 10, 2026

Thanks @ballPointPenguin !

@jucor jucor merged commit 28d2640 into edge Mar 10, 2026
4 checks passed
@jucor jucor deleted the jc/pca_test_infra branch March 10, 2026 10:04
2 participants