Regression comparer: sign flip detection and projection metrics #2415
Merged
Conversation
ballPointPenguin
approved these changes
Mar 8, 2026
Member
Approve, with a note: the main regression-testing doc is now stale and gives a command that fails on this branch. It still tells readers to use `--tolerance-abs` / `--tolerance-rel`, but `regression_comparer.py` no longer defines those options.

Otherwise, looks good and the tests pass for me.
Force-pushed from 8b38d9b to 3ab141e
Cluster centers are derived from PCA projections, so they inherit the sign ambiguity. This fix ensures sign flips are detected and corrected for `.center` paths in addition to `.pca.comps` and `.proj.` paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, sign flips were detected per projection vector, which fails when only some components are flipped (e.g., PC1 unchanged, PC2 flipped). Now flips are detected at the component level (`.pca.comps[N]`) and stored, then a per-dimension correction is applied to projections and cluster centers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
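The per-component detection described above can be sketched as follows. This is a minimal illustration, not the actual `regression_comparer.py` API; the function names are hypothetical. The idea is that a component is flipped when its dot product with the golden component is negative, and the resulting ±1 signs broadcast across projection rows and cluster centers alike.

```python
import numpy as np

def detect_component_flips(golden_comps: np.ndarray,
                           current_comps: np.ndarray) -> np.ndarray:
    """Return a +1/-1 sign per principal component.

    A component counts as flipped when its dot product with the
    golden component is negative (same axis, opposite direction).
    """
    dots = np.einsum("ij,ij->i", golden_comps, current_comps)
    return np.where(dots < 0, -1.0, 1.0)

def apply_flips(points: np.ndarray, flips: np.ndarray) -> np.ndarray:
    """Apply the per-dimension correction to projections or cluster
    centers; `flips` broadcasts over the rows of `points`."""
    return points * flips

# Example: PC1 unchanged, PC2 flipped -- the case that defeats
# per-projection-vector detection.
golden_comps = np.array([[1.0, 0.0], [0.0, 1.0]])
current_comps = np.array([[1.0, 0.0], [0.0, -1.0]])
flips = detect_component_flips(golden_comps, current_comps)
corrected = apply_flips(np.array([[2.0, 3.0], [-1.0, 4.0]]), flips)
```

Storing `flips` once and reusing it for both `.proj.` paths and `.center` paths keeps the two corrections consistent.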
… tests

TLDR: it works! The differences were just numerical errors that looked artificially huge on small values. Looking at the whole cloud of projections confirmed that the post-sklearn results match the pre-sklearn results perfectly well.

Problem:
--------
The regression comparer was failing on datasets like FLI and pakistan with thousands of "differences" showing relative errors up to 874%. Investigation revealed these were false positives: the high relative errors occurred on near-zero values (e.g., golden=4.54e-06 vs current=4.42e-05), where even tiny absolute differences (3e-04) produce huge relative errors.

Diagnosis:
----------
Comparing sklearn SVD-based PCA against power-iteration golden snapshots:
- The projection point clouds are visually identical (see scatter plots)
- Important values (Q70-Q100 percentile) match within 0.2%
- Only near-zero values (Q0-Q7 percentile, ~0.0x median) show large relative errors
- These near-zero values represent participants at the origin, who don't affect visualization or clustering

The element-wise (abs_tol, rel_tol) approach fundamentally cannot handle this case: it is either too strict for near-zero values or too loose for large values.

Solution:
---------
Added projection comparison metrics that measure what actually matters:

| Metric                | Threshold | What it measures                    |
|-----------------------|-----------|-------------------------------------|
| Max |error| / range   | < 1%      | Worst displacement as % of axis     |
| Mean |error| / range  | < 0.1%    | Average displacement as % of axis   |
| R² (all coordinates)  | > 0.9999  | Variance explained (99.99%)         |
| R² (per dimension)    | > 0.999   | Per-PC fit quality (99.9%)          |
| Procrustes disparity  | < 1e-4    | Shape similarity after alignment    |

Results for the FLI dataset:
- Max |error| / range: 0.0617% (was flagging 28% rel error on Q1 values)
- R²: 0.9999992
- Procrustes: 8.2e-07

All 7 local datasets now pass regression tests.
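The cloud-level metrics in the table can be sketched roughly as below. This is an illustrative helper, not the code from this PR: the function name is hypothetical, and `scipy.spatial.procrustes` is assumed as the disparity implementation (it translates, scales, and rotates both clouds before comparing). The per-dimension R² assumes no dimension is constant.

```python
import numpy as np
from scipy.spatial import procrustes

def projection_metrics(golden: np.ndarray, current: np.ndarray) -> dict:
    """Compare two projection clouds of shape (n_points, n_dims)."""
    err = np.abs(golden - current)
    data_range = float(golden.max() - golden.min())
    data_range = data_range if data_range > 0 else 1.0  # zero-range guard

    # R-squared over all coordinates: 1 - SS_res / SS_tot
    ss_res = float(((golden - current) ** 2).sum())
    ss_tot = float(((golden - golden.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot

    # Per-dimension R-squared (per-PC fit quality)
    r2_per_dim = [
        1.0 - ((golden[:, d] - current[:, d]) ** 2).sum()
        / ((golden[:, d] - golden[:, d].mean()) ** 2).sum()
        for d in range(golden.shape[1])
    ]

    # Shape similarity after optimal alignment
    _, _, disparity = procrustes(golden, current)

    return {
        "max_err_pct": 100.0 * float(err.max()) / data_range,
        "mean_err_pct": 100.0 * float(err.mean()) / data_range,
        "r2": r2,
        "r2_per_dim": r2_per_dim,
        "procrustes": disparity,
    }
```

Because every metric is normalized by the cloud's scale (axis range, total variance, or Procrustes standardization), near-zero coordinates no longer dominate the verdict the way they do under element-wise relative tolerance.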
Also added:
- Quantile context in error reports (computed from ALL values, not just failures)
- An explanation when element-wise diffs exist but the metrics confirm a match
- Exclude `.pca.center` from PCA sign-flip handling (it is not sign-ambiguous)
- Use AND logic for overall_match (don't let projection metrics override stage failures)
- Guard against division by zero when the projection data_range is 0
- Fix the return type annotation on _log_projection_metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
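The AND-logic and zero-range points above amount to two small invariants, sketched here with hypothetical names (the real functions in `regression_comparer.py` may differ):

```python
import numpy as np

def safe_range(golden: np.ndarray) -> float:
    """Division-by-zero guard: a constant projection cloud has zero
    range, so fall back to 1.0 rather than divide by zero."""
    data_range = float(golden.max() - golden.min())
    return data_range if data_range > 0 else 1.0

def overall_match(stage_ok: dict, projection_metrics_ok: bool) -> bool:
    """AND logic: passing projection metrics can confirm a match, but
    must never override a failure in any comparison stage."""
    return all(stage_ok.values()) and projection_metrics_ok
```

Under this rule a dataset whose PCA stage fails stays failed even if the projection cloud metrics look perfect, which is the behavior the fix list asks for.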
Summary
Improvements to the regression comparison tooling (`comparer.py`, `regression_comparer.py`):
These improvements are needed to properly validate the sklearn PCA transition (next PR in stack) but are independently useful for any PCA implementation change.
Test plan
🤖 Generated with Claude Code