Conversation

stefanradev93 (Contributor) commented on Apr 4, 2025

Adds a classifier two-sample test (C2ST) metric to the bayesflow.diagnostics.metrics module. Example usage:

from bayesflow.diagnostics.metrics import classifier_two_sample_test

samples = ...
reference_samples = ...

# Using default arguments
c2st = classifier_two_sample_test(samples, reference_samples)

# Customizing arguments: return the full classification report and the classifier,
# and use a smaller validation split
c2st_dict = classifier_two_sample_test(
    samples, reference_samples, return_metric_only=False, validation_split=0.2
)
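
For reference, the C2ST statistic is the held-out accuracy of a classifier trained to distinguish the two sample sets: values near 0.5 mean the sets are statistically indistinguishable, while values approaching 1.0 indicate a clear mismatch. A minimal interpretation sketch, assuming the default call returns this accuracy as a float (the 0.55 cutoff below is purely illustrative):

# Accuracy near chance level (0.5) suggests the samples match the reference;
# the 0.55 threshold is arbitrary and only for illustration.
if c2st < 0.55:
    print(f"Sample sets are hard to distinguish (accuracy = {c2st:.2f}).")
else:
    print(f"Classifier separates the sample sets (accuracy = {c2st:.2f}).")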

What we need to discuss is the conceptual separation of metrics: those intended to capture the performance of an ensemble of posteriors (e.g., SBC, RMSE), which depend on the prior, versus those intended to capture the quality of individual posteriors (e.g., MMD, C2ST). Should these two classes of metrics reside in different submodules? @paul-buerkner @vpratz @LarsKue

Note also that I removed the internal cross-validation in favor of a simple train-test split. Since the C2ST will generally be computed on different source-target posterior samples, any epistemic uncertainty will average out. However, I could support K-fold as well with some custom helpers; I mainly wanted to avoid an explicit dependence on sklearn.
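
As a minimal sketch (not the PR's implementation) of what a sklearn-free K-fold helper could look like: it assumes a hypothetical fit_and_score callable that trains the classifier on the training indices and returns its accuracy on the held-out indices; all names are illustrative.

import numpy as np

def kfold_c2st(samples, reference_samples, fit_and_score, k=5, seed=42):
    # Label the two sample sets and pool them into one classification dataset.
    x = np.concatenate([samples, reference_samples], axis=0)
    y = np.concatenate([np.zeros(len(samples)), np.ones(len(reference_samples))])

    # Shuffle once, then split the indices into k roughly equal folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)

    # Average the held-out classification accuracy over the k folds.
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(fit_and_score(x[train_idx], y[train_idx], x[test_idx], y[test_idx]))
    return float(np.mean(scores))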

vpratz (Collaborator) commented on Apr 7, 2025

Thanks for the PR :)

> Should these two classes of metrics reside in different submodules?

I like the idea, but lack good names that we could use for the respective submodules. Do you have an idea on how to distinguish them with two words that could serve as submodule names?

> Note also that I removed the internal cross-validation in favor of a simple train-test split.

As far as I can tell, using cross-validation gives us a more powerful test for the same number of samples, so I would opt against removing this in the default setting. How are other implementations handling this? Is there some default setup we might want to implement?
We use sklearn in expected_calibration_error and mc_confusion_matrix as well. Is the plan to refactor those as well, or are we going to keep the dependency anyway?

paul-buerkner (Contributor) commented

I think the PR looks good. Would you perhaps have time to add some tests as well?

I would keep them in the same module for now, even if their inputs are different. We can clearly document the difference in inputs, and I am not sure how we would name them if we went for two modules.

stefanradev93 (Contributor, Author) commented

@vpratz I removed all dependencies on sklearn across the library. @paul-buerkner I added tests for the sensitivity of the C2ST.
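
A sketch of what such a sensitivity test could look like (hypothetical, not the actual test added in this PR; it assumes the default call returns the classifier accuracy as a float, and the tolerances are illustrative):

import numpy as np
from bayesflow.diagnostics.metrics import classifier_two_sample_test

def test_c2st_sensitivity():
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(500, 2))
    same = rng.normal(size=(500, 2))              # drawn from the same distribution
    shifted = rng.normal(loc=3.0, size=(500, 2))  # clearly shifted distribution

    # Matching distributions: accuracy should stay close to chance level.
    assert abs(classifier_two_sample_test(same, reference) - 0.5) < 0.1

    # Shifted distributions: the classifier should separate them easily.
    assert classifier_two_sample_test(shifted, reference) > 0.8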

LarsKue assigned stefanradev93 and unassigned paul-buerkner and vpratz on Apr 8, 2025
LarsKue requested a review from Copilot on April 8, 2025, 17:58

Copilot AI left a comment

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

bayesflow/diagnostics/metrics/classifier_two_sample_test.py:21

  • There is an inconsistency between the documented default value for mlp_widths (described as (256, 256)) and the actual default in the function signature ((64, 64)). Please update the docstring or the default value to be consistent.
mlp_widths: Sequence = (64, 64),

LarsKue merged commit 387abe8 into dev on Apr 8, 2025 (15 checks passed)
LarsKue deleted the c2st branch on April 8, 2025, 18:25