
Add ppe method for predictive elicitation (experimental) #336

Merged (9 commits) on Mar 7, 2024

Conversation

aloctavodia
Contributor

Early draft for a predictive elicitation method. I have only tested it on a couple of simple models, and I already know it will fail for others; this is just a proof of concept.

The main idea is that the user provides a model (currently only PyMC; adding Bambi should be easy) and a "target distribution". This distribution is not any particular dataset, but the "not yet observed data". The author of Understanding Advanced Statistical Methods calls this "DATA", as opposed to "data" (the dataset I want to "fit"). So if my model is about the height of adults in San Luis (from which I got a sample, i.e. my data), I can use my domain knowledge of adult humans (DATA) to elicit the target distribution.
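To make the DATA/data distinction concrete, here is a minimal sketch. The Normal(170, 8) target for adult heights and the use of `scipy.stats` are illustrative assumptions, not part of this PR:

```python
import numpy as np
from scipy import stats

# Domain knowledge about adult heights (in cm): most adults fall
# roughly between 150 and 190 cm. This encodes beliefs about the
# "not yet observed data" (DATA), not any collected sample (data).
target = stats.norm(loc=170, scale=8)

# The elicitation routine only needs draws from this target:
rng = np.random.default_rng(42)
sample = target.rvs(size=500, random_state=rng)
```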

A summary of the algorithm is:
1. Generate a sample from the target distribution.
2. Maximize the model's likelihood for that sample (i.e. find the parameters for a fixed "observation").
3. Generate a new sample from the target and repeat.
4. Collect the optimized values in an array (one per prior parameter in the original model).
5. Use MLE to fit the optimized values to their corresponding families in the original model.
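The steps above can be sketched end-to-end as a toy example. Everything here (a Normal likelihood, Normal and HalfNormal priors, closed-form MLEs for the inner optimization) is an assumption chosen to keep the sketch self-contained; the PR itself works with arbitrary PyMC models:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 1: the elicited target distribution ("DATA"): adult heights in cm.
target = stats.norm(loc=170, scale=8)

# Toy model: y ~ Normal(mu, sigma), with priors mu ~ Normal(?, ?)
# and sigma ~ HalfNormal(?); we want to recover the hyperparameters.
mu_opt, sigma_opt = [], []
for _ in range(200):
    # Steps 1 & 3: draw a synthetic "observation" from the target.
    y = target.rvs(size=100, random_state=rng)
    # Step 2: maximize the likelihood for this fixed observation.
    # For a Normal likelihood the MLE is the sample mean / std.
    mu_opt.append(y.mean())
    sigma_opt.append(y.std())

# Step 4: one array of optimized values per prior parameter.
mu_opt, sigma_opt = np.asarray(mu_opt), np.asarray(sigma_opt)

# Step 5: fit each array back to its prior family by MLE.
mu_loc, mu_scale = stats.norm.fit(mu_opt)
sigma_scale = stats.halfnorm.fit(sigma_opt, floc=0)[1]

print(f"mu ~ Normal({mu_loc:.1f}, {mu_scale:.2f})")
print(f"sigma ~ HalfNormal({sigma_scale:.1f})")
```

With this setup the recovered priors concentrate around the target's location and scale, which is exactly the behavior the method aims for.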

This approach is similar to what we do in Kulprit. One difference is that for Kulprit the "target" is actually the posterior predictive distribution of a reference model, and we are interested in finding submodels (and their posteriors) that induce predictions as close as possible to those of the reference model. Here we don't have a reference model; instead we have a human (or potentially a few humans). Another difference is that for Kulprit the optimized values are an approximation to the posterior we care about, while here we need to fit those values to the priors' families in the original model, because we cannot use samples as priors in a PyMC (or other PPL) model.
A further difference is that here we use a slightly different approach to obtain the likelihood function for the optimization routine. If this can be generalized, we could use it in Kulprit too. I think this approach was not available when we discussed Kulprit's design, and it could potentially make the code easier to maintain and extend.
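One way to picture "obtaining a likelihood function for the optimization routine" is a closure that freezes the observation and exposes only the free parameters to the optimizer. This is a hypothetical illustration with a hand-written Normal likelihood, not the PR's actual mechanism (which derives the likelihood from a PyMC model):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def make_neg_loglik(observed):
    """Return a negative log-likelihood with the observation frozen in,
    so the optimizer only sees the free parameters (mu, log_sigma)."""
    def neg_loglik(params):
        mu, log_sigma = params
        return -stats.norm(mu, np.exp(log_sigma)).logpdf(observed).sum()
    return neg_loglik

# A fixed "observation" drawn from the target, as in the main loop.
y = stats.norm(170, 8).rvs(size=100, random_state=np.random.default_rng(1))
res = minimize(make_neg_loglik(y), x0=[150.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
```

Parameterizing sigma on the log scale keeps the optimization unconstrained, which is a common trick when passing likelihoods to generic optimizers.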

@aloctavodia aloctavodia changed the title [WIP] PPE draft Add ppe method for predictive elicitation (experimental) Mar 7, 2024
@codecov-commenter

Codecov Report

Attention: Patch coverage is 16.08392%, with 120 lines in your changes missing coverage. Please review.

Project coverage is 84.05%. Comparing base (b2732e5) to head (e5829ec).

| Files | Patch % | Lines |
|---|---|---|
| preliz/ppls/pymc_io.py | 12.71% | 103 Missing ⚠️ |
| preliz/internal/optimization.py | 9.09% | 10 Missing ⚠️ |
| preliz/predictive/ppe.py | 41.66% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
- Coverage   86.28%   84.05%   -2.23%     
==========================================
  Files          40       42       +2     
  Lines        4425     4567     +142     
==========================================
+ Hits         3818     3839      +21     
- Misses        607      728     +121     


@aloctavodia aloctavodia merged commit fd9fcb6 into arviz-devs:main Mar 7, 2024
4 checks passed
@aloctavodia aloctavodia deleted the ppe branch March 7, 2024 18:38