Add c2st #389
Conversation
Thanks for the PR :)
I like the idea, but I lack good names that we could use for the respective submodules. Do you have an idea for how to distinguish them with two words that could serve as submodule names?
As far as I can tell, using cross-validation gives us a more powerful test for the same number of samples, so I would opt against removing it in the default setting. How do other implementations handle this? Is there a default setup we might want to implement?
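For reference, a common default in other implementations (e.g., following Lopez-Paz & Oquab, 2017) is K-fold cross-validation of a standard classifier's accuracy, often with five folds. A minimal sketch of that default, using sklearn purely for illustration (this PR deliberately avoids it as a dependency), with names that are illustrative rather than BayesFlow API:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier


def c2st_cv(p_samples, q_samples, folds=5):
    """Mean K-fold cross-validated accuracy of classifying the two sample sets.

    An accuracy near 0.5 means the classifier cannot tell the sets apart;
    values near 1.0 indicate a clear mismatch between the distributions.
    """
    x = np.concatenate([p_samples, q_samples])
    y = np.concatenate([np.zeros(len(p_samples)), np.ones(len(q_samples))])
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000)
    return float(cross_val_score(clf, x, y, cv=folds, scoring="accuracy").mean())
```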
I think the PR looks good. Would you perhaps have time to add some tests as well? I would keep the metrics in the same module for now, even if their inputs are different. We clearly document the difference in inputs, and I am not sure how we would name the modules if we went for two.
@vpratz I removed all dependencies on sklearn.
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
bayesflow/diagnostics/metrics/classifier_two_sample_test.py:21
- There is an inconsistency between the documented default value for mlp_widths, described as (256, 256), and the actual default in the function signature, (64, 64). Please update the docstring or the default value so that they are consistent.
mlp_widths: Sequence = (64, 64),
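Assuming (64, 64) is the intended default, one way to resolve this is to align the docstring with the signature. A hypothetical excerpt (the surrounding parameter names are assumptions, since the full signature is not shown here):

```python
from collections.abc import Sequence


def classifier_two_sample_test(estimates, targets, mlp_widths: Sequence = (64, 64)):
    """Hypothetical excerpt: the documented default now matches the signature.

    Parameters
    ----------
    mlp_widths : Sequence, optional, default: (64, 64)
        Widths of the hidden layers of the internal MLP classifier.
    """
    ...
```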
Adds a classifier two-sample test (C2ST) metric to the bayesflow.diagnostics.metrics module. Example usage: see the sketch below.
What we need to discuss is the conceptual separation of metrics, namely those intended to capture the performance of an ensemble of posteriors (e.g., SBC, RMSE), which depend on the prior, and those intended to capture the quality of individual posteriors (e.g., MMD, C2ST). Should these two classes of metrics reside in different submodules? @paul-buerkner @vpratz @LarsKue
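A minimal usage sketch. This is hypothetical: the function name is inferred from the file classifier_two_sample_test.py in this PR, mlp_widths appears in the diff, and everything else (positional arguments, data shapes) is an assumption about the final API:

```python
import numpy as np
from bayesflow.diagnostics.metrics import classifier_two_sample_test

rng = np.random.default_rng(42)
posterior_samples = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
reference_samples = rng.normal(loc=0.1, scale=1.0, size=(1000, 2))

# Values near 0.5 indicate the classifier cannot distinguish the two
# sample sets; values near 1.0 indicate a clear mismatch.
score = classifier_two_sample_test(
    posterior_samples, reference_samples, mlp_widths=(64, 64)
)
print(score)
```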
Note also that I removed the internal cross-validation in favor of a simple train-test split. Since the C2ST will generally be computed on different source-target posterior samples, any epistemic uncertainty will average out. However, I could allow K-fold as well with some custom helpers, mainly because I wanted to avoid the explicit dependence on sklearn.
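To illustrate the approach described above, here is a minimal, dependency-free sketch of a C2ST with a single train-test split. The logistic-regression classifier and all names are illustrative stand-ins for the PR's MLP-based implementation, not the actual code:

```python
import numpy as np


def c2st_train_test(p_samples, q_samples, train_frac=0.8, steps=500, lr=0.1, seed=0):
    """C2ST with a single train-test split instead of K-fold cross-validation.

    Uses a simple logistic regression trained by full-batch gradient descent
    as an illustrative stand-in for the MLP classifier.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([p_samples, q_samples]).astype(float)
    y = np.concatenate([np.zeros(len(p_samples)), np.ones(len(q_samples))])

    # Shuffle once, then split into train and test sets.
    idx = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    train, test = idx[:n_train], idx[n_train:]

    # Logistic regression: sigmoid(x @ w + b), full-batch gradient descent.
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x[train] @ w + b)))
        grad = p - y[train]
        w -= lr * (x[train].T @ grad) / n_train
        b -= lr * grad.mean()

    # Test accuracy is the C2ST statistic: ~0.5 means indistinguishable.
    preds = (x[test] @ w + b) > 0.0
    return float((preds == y[test].astype(bool)).mean())
```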