
Save predictions to sacc #349

Merged
merged 34 commits into master from save_predictions on Jan 30, 2024

Conversation

tilmantroester
Contributor

This PR addresses #346.

It writes the predicted theory vector into a sacc object, replacing the corresponding data points. By default, to avoid inconsistency, the method raises an error if the sacc object contains data points that are not being overwritten by the predictions.
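The strict-replacement behaviour described above can be sketched roughly as follows. This is an illustration, not Firecrown's actual code: the function name, arguments, and error message are all hypothetical, and a plain numpy array stands in for the sacc object's mean vector.

```python
import numpy as np


def replace_data_vector(data, sacc_indices, predictions, strict=True):
    """Overwrite entries of a data vector with predicted theory values.

    In strict mode, refuse to proceed if any original data point would
    be left untouched, to avoid a half-updated, inconsistent sacc object.
    """
    sacc_indices = np.asarray(sacc_indices)
    assert len(sacc_indices) == len(predictions)
    if strict and len(sacc_indices) < len(data):
        raise RuntimeError(
            "sacc object contains data points not overwritten by the predictions"
        )
    new_data = np.asarray(data, dtype=float).copy()
    new_data[sacc_indices] = predictions
    return new_data
```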

A future addition would be to draw a sample from the likelihood instead of using the (zero-noise) theory prediction.

One discussion point is how to store the predictions. At the moment only TwoPoint stores its prediction, in predicted_statistic_. I've kept that name for now for compatibility with existing code, but I wouldn't mind renaming it to theory_vector to match data_vector.

The test doesn't yet check the prediction against the contents of the sacc object.

@tilmantroester
Contributor Author

Apart from the missing test and the mypy errors, I think this can be reviewed. @marcpaterno @vitenti

@vitenti vitenti marked this pull request as ready for review January 13, 2024 16:42
@vitenti
Collaborator

vitenti commented Jan 13, 2024

@tilmantroester, please take a look at the modifications to see if you agree with the current version.


# Adding Gaussian noise defined by the covariance matrix.
assert self.cholesky is not None
predictions += np.dot(self.cholesky, np.random.randn(len(predictions)))
Contributor Author

Adding noise should be optional. A common use case is that we want a noise-free theory prediction as the data vector to test our models and pipelines.
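An optional-noise version might look like the following sketch. The function name, signature, and use of a free function rather than a method are all illustrative assumptions, not Firecrown's actual API; only the Cholesky-based noise draw mirrors the snippet quoted above.

```python
import numpy as np


def make_realization_vector(theory, cov, add_noise=True, rng=None):
    """Return the theory prediction, optionally adding Gaussian noise
    drawn from the covariance matrix via its Cholesky factor."""
    theory = np.asarray(theory, dtype=float)
    if not add_noise:
        return theory.copy()  # noise-free prediction for model/pipeline tests
    rng = np.random.default_rng() if rng is None else rng
    cholesky = np.linalg.cholesky(cov)  # lower-triangular factor of cov
    return theory + cholesky @ rng.standard_normal(len(theory))
```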

@tilmantroester
Contributor Author

Thanks for the help! I think adding noise to the prediction should be factored out of the functionality of saving the prediction. That way it can also be used to save prediction+noise realizations during sampling, which would address #323.

* Updating documentation.
* More tests for Statistics.
* Noise on realization is now optional.
@vitenti
Collaborator

vitenti commented Jan 13, 2024

@tilmantroester, what about making it optional? Take a look at the last commits.

@tilmantroester
Contributor Author

I'd move make_realization to Likelihood or GaussFamily. Other likelihoods should still be able to save theory predictions, even if the sampling from the likelihood isn't implemented yet. We could add an abstract method Likelihood.sample, with the Gaussian and Student-T likelihoods implementing the specifics (I have the math for the latter).
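For reference, the standard way to draw from a multivariate Student-t distribution (the sampling the Student-t subclass would implement) is to rescale a Gaussian draw by an inverse chi-squared mixing factor. This is a generic textbook sketch with illustrative names, not the math or code from this PR:

```python
import numpy as np


def sample_student_t(mu, cholesky, nu, rng=None):
    """Draw one sample from a multivariate Student-t distribution with
    location mu, scale matrix Sigma = L @ L.T given by its lower-triangular
    Cholesky factor, and nu degrees of freedom."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    z = cholesky @ rng.standard_normal(len(mu))  # Gaussian draw with cov Sigma
    u = rng.chisquare(nu)                        # chi-squared mixing variable
    return mu + z * np.sqrt(nu / u)
```

For large nu this reduces to the Gaussian case, which is consistent with the Student-t likelihood approaching the Gaussian one.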

…ta_vector and make_realization which generate a new sacc with or without noise.
@vitenti
Collaborator

vitenti commented Jan 13, 2024

> I'd move make_realization to Likelihood or GaussFamily. Other likelihoods should still be able to save theory predictions, even if the sampling from the likelihood isn't implemented yet. We could add an abstract method Likelihood.sample, with the Gaussian and Student-T likelihoods implementing the specifics (I have the math for the latter).

I think we came up with the same solution; I just implemented your suggestion. Please take a look.

@vitenti
Collaborator

vitenti commented Jan 13, 2024

@tilmantroester I think it is now complete. Please tell me if there is anything else to do here; otherwise we will review and merge it during our next Firecrown meeting.

@tilmantroester
Contributor Author

Looks good to me, thanks!

Collaborator

@marcpaterno marcpaterno left a comment

The Firecrown team will make the changes we have requested.

self.cov = cov
self.cholesky = scipy.linalg.cholesky(self.cov, lower=True)
self.inv_cov = np.linalg.inv(cov)

self.state = State.READY

@final
def get_cov(self) -> npt.NDArray[np.float64]:
"""Gets the current covariance matrix."""
def get_cov(self, statistic: Optional[Statistic] = None) -> npt.NDArray[np.float64]:
Collaborator

We would prefer to pass the numpy array of indices that corresponds to the sub-matrix desired.
This would allow the caller to obtain the sub-matrix for two or more statistics, when that is desired.

Contributor Author

The application I wrote this for was to get error bars when plotting the data vector. The idea was specifically to abstract away the indices and instead use the statistics, since that's what the user interacts with. I see the use of passing a list of statistics though, to get their corresponding sub-matrix.

Collaborator

We will expand the interface to have a single method get_cov that can accept:

  1. a single Statistic
  2. a list of Statistic
  3. a single np.ndarray (the indices)
  4. a list of np.ndarray (a list of indices)

When a stat or a list of stats is passed in, we also need to make sure that each stat is in the likelihood object on which get_cov was called. We will make the code verify this.

We also have to specify the order in which entries appear in the returned matrix. We propose to respect the order of the entries in the list of statistics (or of numpy arrays), so that the user-specified order of the list controls the ordering of the elements in the returned matrix, rather than the order of the entries in the SACC data object.

For example, suppose three statistics occupy these index ranges: stats1 -> 0:9, stats2 -> 10:19, stats3 -> 20:29. Then get_cov([stats1, stats3, stats2]) returns the sub-matrix ordered 0:9 + 20:29 + 10:19 (note that the order of the list passed to get_cov, not the order in the SACC object, determines the result).
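The ordering rule can be sketched with plain index arrays. The standalone get_cov below is an illustration of cases 3 and 4 only (index arrays), not the final Firecrown signature:

```python
import numpy as np


def get_cov(cov, index_blocks):
    """Return the sub-covariance matrix for a list of index arrays,
    ordered as the caller listed the blocks, not as they appear in cov."""
    idx = np.concatenate([np.asarray(block) for block in index_blocks])
    # np.ix_ builds the open mesh selecting rows and columns idx x idx.
    return cov[np.ix_(idx, idx)]
```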


assert len(sacc_indices) == len(new_data_vector)

if strict:
Collaborator

Consider collapsing the nested ifs into a single if with multiple conditions.
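Concretely, something along these lines; the function and variable names are illustrative, not the PR's code:

```python
def check_unreplaced(strict, unreplaced_indices):
    """Single-condition form of the nested 'if strict: if ...' check."""
    if strict and len(unreplaced_indices) > 0:
        raise RuntimeError(
            "sacc object contains data points that are not being overwritten"
        )
```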

Collaborator

We'll deal with this in a later issue.

@@ -15,3 +16,12 @@ def compute_loglike(self, tools: ModelingTools):
"""Compute the log-likelihood."""

return -0.5 * self.compute_chisq(tools)

def make_realization_vector(self) -> np.ndarray:
Collaborator

We should be checking pre- and post-conditions on self.state in every method.
Consider introducing a decorator to solve any pylint complaints about duplicated code.
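One possible shape for such a decorator, shown here as a self-contained sketch; the State values, attribute names, and GaussFamilySketch class are illustrative assumptions, not the PR's actual code:

```python
from enum import Enum, auto
from functools import wraps


class State(Enum):
    INITIALIZED = auto()
    READY = auto()


def enforce_state(required, final=None):
    """Check self.state before calling the method (precondition) and
    optionally set it afterwards (postcondition), so the checks live in
    one place instead of being duplicated in every method."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            if self.state is not required:
                raise RuntimeError(
                    f"{method.__name__} requires state {required}, "
                    f"but the current state is {self.state}"
                )
            result = method(self, *args, **kwargs)
            if final is not None:
                self.state = final
            return result
        return wrapper
    return decorator


class GaussFamilySketch:
    def __init__(self):
        self.state = State.INITIALIZED

    @enforce_state(State.INITIALIZED, final=State.READY)
    def read(self):
        return "data read"
```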

@@ -120,6 +120,8 @@ def __init__(self, parameter_prefix: Optional[str] = None):
super().__init__(parameter_prefix=parameter_prefix)
self.sacc_indices: Optional[npt.NDArray[np.int64]]
self.ready = False
self.computed_theory_vector = False
Collaborator

Consider turning computed_theory_vector into a method, which would test the nullity of theory_vector, to remove the possibility of having an invalid pair of states.
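The suggestion amounts to deriving the flag from the vector itself, for example (a sketch with illustrative names):

```python
from typing import Optional

import numpy as np


class StatisticSketch:
    """Derive the 'computed' flag from theory_vector itself, so the flag
    and the vector can never get out of sync."""

    def __init__(self):
        self.theory_vector: Optional[np.ndarray] = None

    @property
    def computed_theory_vector(self) -> bool:
        return self.theory_vector is not None
```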

@@ -298,7 +289,7 @@ def read(self, sacc_data: sacc.Sacc) -> None:
# I don't think we need these copies, but being safe here.
self._ell_or_theta = _ell_or_theta.copy()
self.data_vector = DataVector.create(_stat)
self.measured_statistic_ = self.data_vector
self.data_vector = self.data_vector
Collaborator

This looks like a needless assignment.

"This class does not implement make_realization_vector."
)

def make_realization(
Collaborator

mypy misses that this method does not return the required sacc.Sacc object.
The method should raise NotImplementedError, so that using a subclass that lacks this method is only an error if the method is actually called.
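The pattern being asked for, in miniature; the class name is hypothetical and the argument stands in for the sacc.Sacc object. A raise statement satisfies any declared return type for mypy, and subclasses without sampling support fail only when the method is called:

```python
class StatisticWithoutSampling:
    def make_realization(self, sacc_data):
        """Raising here is valid under any return annotation; only an
        actual call on a non-sampling subclass produces an error."""
        raise NotImplementedError(
            f"{type(self).__name__} does not implement make_realization."
        )
```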

"""Set self[key] to value; raise TypeError if Value is not Updatable."""
if not isinstance(value, Updatable):
raise TypeError(
"Values inserted into an UpdatableCollection must be Updatable"
"Only updatable items can be appended to an UpdatableCollection"
Collaborator

Consider changing 'updatable' -> 'Updatable'

assert np.all(theory_vector == np.array([1.0, 1.0, 1.0]))


def test_chisquared_compute_vector_not_implemented(
Collaborator

Consider renaming: test_compute_chisquared_works_if_compute_theory_vector_raises_not_implemented_error

likelihood.update(params)
likelihood.compute_chisq(tools_with_vanilla_cosmology)

new_sacc = likelihood.make_realization(sacc_data_for_trivial_stat)
Collaborator

Consider adding pytest-rerunfailures to our list of required Conda packages, and decorating this test with @pytest.mark.flaky(reruns=2). This should reduce the random failure probability from 1 in a million to a more comfortable 1 in a trillion.

This decoration should probably be done to all the tests that are statistical in nature.
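For illustration, the rerun semantics of @pytest.mark.flaky(reruns=2) amount to the following stdlib sketch (not the plugin itself): a test with independent failure probability p then fails overall with probability p**(reruns + 1), which is where 1 in a million becomes roughly 1 in a trillion for reruns=1 worth of extra attempts per failure mode.

```python
import functools


def flaky(reruns=2):
    """Rerun a failing test up to `reruns` extra times before reporting
    failure -- the behaviour pytest-rerunfailures provides via
    @pytest.mark.flaky(reruns=...)."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(reruns + 1):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    if attempt == reruns:
                        raise  # out of retries: report the failure
        return wrapper
    return decorator
```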

)


def test_make_realization_no_noise(
Collaborator

This test should not be decorated for re-running failures; it is not a statistical test.


codecov bot commented Jan 25, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (b978402) 95.22% compared to head (ef3b08e) 95.42%.

Files Patch % Lines
firecrown/likelihood/gauss_family/gauss_family.py 96.66% 2 Missing ⚠️
firecrown/likelihood/likelihood.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   95.22%   95.42%   +0.20%     
==========================================
  Files          36       36              
  Lines        2490     2556      +66     
==========================================
+ Hits         2371     2439      +68     
+ Misses        119      117       -2     




@vitenti vitenti merged commit d39ba1f into master Jan 30, 2024
10 checks passed
@vitenti vitenti deleted the save_predictions branch January 30, 2024 19:10