
Add support for posterior predictive distributions #323

Open
tilmantroester opened this issue Oct 10, 2023 · 9 comments

Comments

@tilmantroester
Contributor

This requires functionality to draw samples of data vectors from the likelihood and to pass them back to the sampling framework.

The ability to draw samples from the likelihood is useful in other contexts as well, such as generating mock data vectors.
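For a Gaussian likelihood, drawing a mock data vector is just a multivariate-normal draw around the theory vector. A minimal sketch (nothing here is existing Firecrown API; the function name and arguments are illustrative):

```python
# Minimal sketch of drawing a mock data vector from a Gaussian
# likelihood; `theory` and `cov` would come from the likelihood
# object. This is illustrative, not existing Firecrown API.
import numpy as np

def draw_mock_data(theory, cov, rng=None):
    """Draw one realization of the data vector from N(theory, cov)."""
    rng = rng or np.random.default_rng()
    return rng.multivariate_normal(mean=theory, cov=cov)
```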

@vitenti
Collaborator

vitenti commented Oct 10, 2023

This is already supported by NumCosmo's connector. Moreover, in Augur you can find code to do this; see for example srd_y1_3x2_like.py, where a data vector is generated from a theory vector and a likelihood is built that can be used by any framework.

@marcpaterno
Collaborator

@tilmantroester is there something that Augur does, or some part of what it does, that you think should be moved from Augur to Firecrown?

@tilmantroester
Contributor Author

There are two reasons why I think this should be in Firecrown:
One reason is that it's easiest to create the PPD draws while sampling, rather than trying to create them after the fact. For a Gaussian likelihood with fixed covariance, doing it in a post-processing step is relatively straightforward if the model predictions get saved during sampling, but for other likelihoods this might require re-evaluating the likelihood at a large number of points, which we want to avoid.
Drawing posterior predictive samples conditioned on parts of the data vector is probably easier to do in Firecrown as well, since the description of how the data vector is structured is readily available there (see the sketch below).
The other reason is that I might want to be able to use Firecrown to generate mock data without the Augur dependency, especially when building experimental pipelines.
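For the conditional case, with a Gaussian likelihood this reduces to the standard conditional-normal formulas. A minimal sketch, where the index arrays idx_a (entries to predict) and idx_b (entries to condition on) are hypothetical inputs that Firecrown's knowledge of the data-vector layout would supply:

```python
# Hedged sketch of a conditional posterior predictive draw for a
# Gaussian likelihood: sample the entries idx_a of the data vector
# conditioned on the observed entries idx_b. Nothing here is existing
# Firecrown API; the index arrays are illustrative.
import numpy as np

def conditional_draw(theory, cov, data, idx_a, idx_b, rng=None):
    """Draw x[idx_a] | x[idx_b] = data[idx_b] from N(theory, cov)."""
    rng = rng or np.random.default_rng()
    mu_a, mu_b = theory[idx_a], theory[idx_b]
    cov_aa = cov[np.ix_(idx_a, idx_a)]
    cov_ab = cov[np.ix_(idx_a, idx_b)]
    cov_bb = cov[np.ix_(idx_b, idx_b)]
    # Standard Gaussian conditioning: mean shift and Schur complement.
    mean = mu_a + cov_ab @ np.linalg.solve(cov_bb, data[idx_b] - mu_b)
    cond_cov = cov_aa - cov_ab @ np.linalg.solve(cov_bb, cov_ab.T)
    return rng.multivariate_normal(mean, cond_cov)
```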

@joezuntz
Collaborator

The ability to return data vectors is also useful for general debugging, and I'd recommend saving the information needed to do this.

However, in CosmoSIS I did find that the one case where this was slow compared to the likelihood evaluation itself was supernovae, so perhaps make it optional?

@vitenti
Collaborator

vitenti commented Oct 16, 2023

The CosmoSIS connector presently includes a section in the DataBlock labeled data_vector. This section contains three elements: firecrown_theory (the theory vector), firecrown_data (the data vector), and firecrown_inverse_covariance (the inverse covariance). To have these components written in the output chains, you can list them in the extra_output option of the CosmoSIS .ini file.
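As a hedged sketch, assuming extra_output lives in the [pipeline] section of the .ini file and the data vector has a known length (222 here is a placeholder; the #N suffix requests a vector output of that length):

```ini
; Hypothetical fragment of a CosmoSIS .ini file. The section and
; element names are those exposed by the Firecrown connector; the
; length 222 is a placeholder that must match your data vector.
[pipeline]
extra_output = data_vector/firecrown_theory#222
```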

This behavior is automatically enabled for GaussFamily likelihoods, but the current implementation is not considered ideal. We are working to refine this process, with the goal of achieving the same outcome using DerivedParameter. The reason for the delay in implementing this change is the inherent difficulty of handling vector-valued derived parameters without resorting to appending _n to the derived parameter name to match the vector index.

Furthermore, as pointed out by @joezuntz, including a lengthy theory vector in the output chains can have a detrimental impact on processing speed. In NumCosmo, any data added to the output chains undergoes further processing, which includes computing statistics such as the mean, variance, autocorrelation, and more. Including an extensive theory vector in the output would therefore not only significantly slow down these processing tasks but also result in exceptionally large output files.

Thus, I think we should make this behavior optional and eventually move to a more general solution using DerivedParameter, so that all frameworks can use it equally. @tilmantroester, would you prefer a more complete solution where random draws are also performed from each theory + covariance?

@tilmantroester
Contributor Author

tilmantroester commented Oct 16, 2023

At this point I'm not too concerned about how this gets piped back to the sampling frameworks. For now I imagine just implementing a sample method in the likelihood class (a sketch follows below).
This could then optionally be put into some data block of the sampling framework.

As you said, treating theory or mock data vectors as derived parameters and dumping them into the chain output is at best cumbersome and at worst breaks the IO.
Dealing with derived data that isn't just a parameter is something the sampling frameworks would have to implement, I think. I don't know whether such functionality exists in CosmoSIS yet @joezuntz
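As a hypothetical sketch of what such a sample method could look like on a GaussFamily-style likelihood (class and method names are illustrative, not Firecrown's actual API):

```python
# Hypothetical sketch of a `sample` method on a Gaussian likelihood;
# names and structure are illustrative, not Firecrown's actual API.
import numpy as np

class MockGaussianLikelihood:
    def __init__(self, data, cov):
        self.data = data
        self.cov = cov

    def compute_theory_vector(self, tools):
        # A real likelihood would evaluate the model here.
        raise NotImplementedError

    def sample(self, tools, rng=None):
        """Draw a mock data vector at the current parameters; a
        connector could optionally place the draw in the sampling
        framework's data block."""
        rng = rng or np.random.default_rng()
        mu = self.compute_theory_vector(tools)
        return rng.multivariate_normal(mu, self.cov)
```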

@joezuntz
Collaborator

Samplers in CosmoSIS (or scripts using it interactively) can fully access the data block containing all the products of a pipeline, including the data vectors, so yes, this is already there. It's used in the Fisher sampler, for example.

@tilmantroester
Contributor Author

Sorry, what I meant was: is there a way to efficiently save parts of the data block while sampling, independently of the default chain output?
For example, saving the theory vector at each chain sample to a file, taking care of the usual IO pitfalls like multiple MPI processes, and without an unwieldy extra_output option that needs an entry for each data point.

@joezuntz
Collaborator

Oh, I see. You can specify a vector output for extra_output, if you know the length in advance, by writing, e.g. for a length-222 data vector, extra_output = data_vector/2pt_theory#222. I know that's a bit annoying. I don't have another approach built in.
