
Save predictions to sacc #349

Merged
merged 34 commits into master from save_predictions on Jan 30, 2024

Conversation

tilmantroester
Contributor

This PR addresses #346.

It writes the predicted theory vector into a sacc object, replacing the corresponding data points. By default, to avoid inconsistency, the method raises an error if the sacc object contains data points that are not being overwritten by the predictions.
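The strict-replacement behaviour described above can be sketched roughly as follows. This is an illustration, not Firecrown's actual code: the function name, arguments, and error message are all hypothetical, and a plain numpy array stands in for the sacc object's mean vector.

```python
import numpy as np


def replace_data_vector(data, sacc_indices, predictions, strict=True):
    """Overwrite entries of a data vector with predicted theory values.

    In strict mode, refuse to proceed if any original data point would
    be left untouched, to avoid a half-updated, inconsistent sacc object.
    """
    sacc_indices = np.asarray(sacc_indices)
    assert len(sacc_indices) == len(predictions)
    if strict and len(sacc_indices) < len(data):
        raise RuntimeError(
            "sacc object contains data points not overwritten by the predictions"
        )
    new_data = np.asarray(data, dtype=float).copy()
    new_data[sacc_indices] = predictions
    return new_data
```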

A future addition would be to draw a sample from the likelihood instead of using the (zero-noise) theory prediction.

One discussion point is how to store the predictions. At the moment only TwoPoint stores its prediction, in predicted_statistic_. I've kept that name for now for compatibility with existing code, but I wouldn't mind renaming it to theory_vector to match data_vector.

The test doesn't yet check the prediction against the contents of the sacc object.

@tilmantroester
Contributor Author

Apart from the missing test and the mypy errors, I think this can be reviewed. @marcpaterno @vitenti

@vitenti vitenti marked this pull request as ready for review January 13, 2024 16:42
@vitenti
Collaborator

vitenti commented Jan 13, 2024

@tilmantroester, please take a look at the modifications to see if you agree with the current version.


# Adding Gaussian noise defined by the covariance matrix.
assert self.cholesky is not None
predictions += np.dot(self.cholesky, np.random.randn(len(predictions)))
Contributor Author

Adding noise should be optional. A common use case is that we want a noise-free theory prediction as the data vector to test our models and pipelines.
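An optional-noise version might look like the following sketch. The function name, signature, and use of a free function rather than a method are all illustrative assumptions, not Firecrown's actual API; only the Cholesky-based noise draw mirrors the snippet quoted above.

```python
import numpy as np


def make_realization_vector(theory, cov, add_noise=True, rng=None):
    """Return the theory prediction, optionally adding Gaussian noise
    drawn from the covariance matrix via its Cholesky factor."""
    theory = np.asarray(theory, dtype=float)
    if not add_noise:
        return theory.copy()  # noise-free prediction for model/pipeline tests
    rng = np.random.default_rng() if rng is None else rng
    cholesky = np.linalg.cholesky(cov)  # lower-triangular factor of cov
    return theory + cholesky @ rng.standard_normal(len(theory))
```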

@tilmantroester
Contributor Author

Thanks for the help! I think adding noise to the prediction should be factored out of the functionality of saving the prediction. That way it can also be used to save prediction+noise realizations during sampling, which would address #323.

* Updating documentation.
* More tests for Statistics.
* Noise on realization is now optional.
@vitenti
Collaborator

vitenti commented Jan 13, 2024

@tilmantroester, what about making it optional? Take a look at the last commits.

@tilmantroester
Contributor Author

I'd move make_realization to Likelihood or GaussFamily. Other likelihoods should still be able to save theory predictions, even if the sampling from the likelihood isn't implemented yet. We could add an abstract method Likelihood.sample, with the Gaussian and Student-T likelihoods implementing the specifics (I have the math for the latter).
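For reference, the standard way to draw from a multivariate Student-t distribution (the sampling the Student-t subclass would implement) is to rescale a Gaussian draw by an inverse chi-squared mixing factor. This is a generic textbook sketch with illustrative names, not the math or code from this PR:

```python
import numpy as np


def sample_student_t(mu, cholesky, nu, rng=None):
    """Draw one sample from a multivariate Student-t distribution with
    location mu, scale matrix Sigma = L @ L.T given by its lower-triangular
    Cholesky factor, and nu degrees of freedom."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    z = cholesky @ rng.standard_normal(len(mu))  # Gaussian draw with cov Sigma
    u = rng.chisquare(nu)                        # chi-squared mixing variable
    return mu + z * np.sqrt(nu / u)
```

For large nu this reduces to the Gaussian case, which is consistent with the Student-t likelihood approaching the Gaussian one.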

…ta_vector and make_realization which generate a new sacc with or without noise.
@vitenti
Collaborator

vitenti commented Jan 13, 2024

> I'd move make_realization to Likelihood or GaussFamily. Other likelihoods should still be able to save theory predictions, even if the sampling from the likelihood isn't implemented yet. We could add an abstract method Likelihood.sample, with the Gaussian and Student-T likelihoods implementing the specifics (I have the math for the latter).

I think we came up with the same solution; I just implemented your suggestion. Please take a look.

@vitenti
Collaborator

vitenti commented Jan 13, 2024

@tilmantroester I think it is now complete. Please tell me if there is anything else to do here; otherwise we will review and merge it during our next Firecrown meeting.

@tilmantroester
Contributor Author

Looks good to me, thanks!

Collaborator

@marcpaterno marcpaterno left a comment

The Firecrown team will make the changes we have requested.

self.cov = cov
self.cholesky = scipy.linalg.cholesky(self.cov, lower=True)
self.inv_cov = np.linalg.inv(cov)

self.state = State.READY

@final
def get_cov(self) -> npt.NDArray[np.float64]:
"""Gets the current covariance matrix."""
def get_cov(self, statistic: Optional[Statistic] = None) -> npt.NDArray[np.float64]:
Collaborator

We would prefer to pass the numpy array of indices that corresponds to the sub-matrix desired.
This would allow the caller to obtain the sub-matrix for two or more statistics, when that is desired.

Contributor Author

The application I wrote this for was to get error bars when plotting the data vector. The idea was specifically to abstract away the indices and instead use the statistics, since that's what the user interacts with. I see the use of passing a list of statistics though, to get their corresponding sub-matrix.

Collaborator

We will expand the interface to have a single method get_cov that can accept:

  1. a single Statistic
  2. a list of Statistic
  3. a single np.ndarray (the indices)
  4. a list of np.ndarray (a list of indices)

When a stat or a list of stats is passed in, we also need to make sure that each stat is in the likelihood object on which get_cov was called. We will make the code verify this.

We also have to specify the order in which entries appear in the returned matrix. We propose to respect the order of the entries in the list of statistics (or of numpy arrays), so that the user-specified order of the list controls the ordering of the elements in the returned matrix, rather than the order of the entries in the SACC data object.

For example, suppose three statistics occupy these index ranges: stats1 -> 0:9, stats2 -> 10:19, stats3 -> 20:29. Then get_cov([stats1, stats3, stats2]) returns the sub-matrix ordered 0:9 + 20:29 + 10:19 (note that the order of the list passed to get_cov, not the order in the SACC object, determines the result).
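The ordering rule can be sketched with plain index arrays. The standalone get_cov below is an illustration of cases 3 and 4 only (index arrays), not the final Firecrown signature:

```python
import numpy as np


def get_cov(cov, index_blocks):
    """Return the sub-covariance matrix for a list of index arrays,
    ordered as the caller listed the blocks, not as they appear in cov."""
    idx = np.concatenate([np.asarray(block) for block in index_blocks])
    # np.ix_ builds the open mesh selecting rows and columns idx x idx.
    return cov[np.ix_(idx, idx)]
```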


assert len(sacc_indices) == len(new_data_vector)

if strict:
Collaborator

Consider collapsing the nested ifs into a single if with multiple conditions.
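Concretely, something along these lines; the function and variable names are illustrative, not the PR's code:

```python
def check_unreplaced(strict, unreplaced_indices):
    """Single-condition form of the nested 'if strict: if ...' check."""
    if strict and len(unreplaced_indices) > 0:
        raise RuntimeError(
            "sacc object contains data points that are not being overwritten"
        )
```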

Collaborator

We'll deal with this in a later issue.

@@ -15,3 +16,12 @@ def compute_loglike(self, tools: ModelingTools):
"""Compute the log-likelihood."""

return -0.5 * self.compute_chisq(tools)

def make_realization_vector(self) -> np.ndarray:
Collaborator

We should be checking pre- and post-conditions on self.state in every method.
Consider introducing a decorator to solve any pylint complaints about duplicated code.
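One possible shape for such a decorator, shown here as a self-contained sketch; the State values, attribute names, and GaussFamilySketch class are illustrative assumptions, not the PR's actual code:

```python
from enum import Enum, auto
from functools import wraps


class State(Enum):
    INITIALIZED = auto()
    READY = auto()


def enforce_state(required, final=None):
    """Check self.state before calling the method (precondition) and
    optionally set it afterwards (postcondition), so the checks live in
    one place instead of being duplicated in every method."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            if self.state is not required:
                raise RuntimeError(
                    f"{method.__name__} requires state {required}, "
                    f"but the current state is {self.state}"
                )
            result = method(self, *args, **kwargs)
            if final is not None:
                self.state = final
            return result
        return wrapper
    return decorator


class GaussFamilySketch:
    def __init__(self):
        self.state = State.INITIALIZED

    @enforce_state(State.INITIALIZED, final=State.READY)
    def read(self):
        return "data read"
```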

@@ -120,6 +120,8 @@ def __init__(self, parameter_prefix: Optional[str] = None):
super().__init__(parameter_prefix=parameter_prefix)
self.sacc_indices: Optional[npt.NDArray[np.int64]]
self.ready = False
self.computed_theory_vector = False
Collaborator

Consider turning computed_theory_vector into a method, which would test the nullity of theory_vector, to remove the possibility of having an invalid pair of states.
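The suggestion amounts to deriving the flag from the vector itself, for example (a sketch with illustrative names):

```python
from typing import Optional

import numpy as np


class StatisticSketch:
    """Derive the 'computed' flag from theory_vector itself, so the flag
    and the vector can never get out of sync."""

    def __init__(self):
        self.theory_vector: Optional[np.ndarray] = None

    @property
    def computed_theory_vector(self) -> bool:
        return self.theory_vector is not None
```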

@@ -298,7 +289,7 @@ def read(self, sacc_data: sacc.Sacc) -> None:
# I don't think we need these copies, but being safe here.
self._ell_or_theta = _ell_or_theta.copy()
self.data_vector = DataVector.create(_stat)
self.measured_statistic_ = self.data_vector
self.data_vector = self.data_vector
Collaborator

This looks like a needless assignment.

"This class does not implement make_realization_vector."
)

def make_realization(
Collaborator

mypy misses that this method does not return the required sacc.Sacc object.
The method should raise NotImplementedError, so that using a subclass that lacks this method is only an error if the method is actually called.
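The pattern being asked for, in miniature; the class name is hypothetical and the argument stands in for the sacc.Sacc object. A raise statement satisfies any declared return type for mypy, and subclasses without sampling support fail only when the method is called:

```python
class StatisticWithoutSampling:
    def make_realization(self, sacc_data):
        """Raising here is valid under any return annotation; only an
        actual call on a non-sampling subclass produces an error."""
        raise NotImplementedError(
            f"{type(self).__name__} does not implement make_realization."
        )
```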

"""Set self[key] to value; raise TypeError if Value is not Updatable."""
if not isinstance(value, Updatable):
raise TypeError(
"Values inserted into an UpdatableCollection must be Updatable"
"Only updatable items can be appended to an UpdatableCollection"
Collaborator

Consider changing 'updatable' -> 'Updatable'

assert np.all(theory_vector == np.array([1.0, 1.0, 1.0]))


def test_chisquared_compute_vector_not_implemented(
Collaborator

Consider renaming: test_compute_chisquared_works_if_compute_theory_vector_raises_not_implemented_error

likelihood.update(params)
likelihood.compute_chisq(tools_with_vanilla_cosmology)

new_sacc = likelihood.make_realization(sacc_data_for_trivial_stat)
Collaborator

Consider adding pytest-rerunfailures to our list of required Conda packages, and decorating this test with @pytest.mark.flaky(reruns=2). This should reduce the random failure probability from 1 in a million to a more comfortable 1 in a trillion.

This decoration should probably be done to all the tests that are statistical in nature.
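For illustration, the rerun semantics of @pytest.mark.flaky(reruns=2) amount to the following stdlib sketch (not the plugin itself): a test with independent failure probability p then fails overall with probability p**(reruns + 1), which is where 1 in a million becomes roughly 1 in a trillion for reruns=1 worth of extra attempts per failure mode.

```python
import functools


def flaky(reruns=2):
    """Rerun a failing test up to `reruns` extra times before reporting
    failure -- the behaviour pytest-rerunfailures provides via
    @pytest.mark.flaky(reruns=...)."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(reruns + 1):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    if attempt == reruns:
                        raise  # out of retries: report the failure
        return wrapper
    return decorator
```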

)


def test_make_realization_no_noise(
Collaborator

This test should not be decorated for re-running failures; it is not a statistical test.


codecov bot commented Jan 25, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (b978402) 95.22% compared to head (ef3b08e) 95.42%.

Files Patch % Lines
firecrown/likelihood/gauss_family/gauss_family.py 96.66% 2 Missing ⚠️
firecrown/likelihood/likelihood.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   95.22%   95.42%   +0.20%     
==========================================
  Files          36       36              
  Lines        2490     2556      +66     
==========================================
+ Hits         2371     2439      +68     
+ Misses        119      117       -2     




@vitenti vitenti merged commit d39ba1f into master Jan 30, 2024
10 checks passed
@vitenti vitenti deleted the save_predictions branch January 30, 2024 19:10