
Add a way to plot credible intervals with plot_ppc #1654

Open
lucianopaz opened this issue Apr 8, 2021 · 6 comments

@lucianopaz
Contributor

Tell us about it

plot_ppc is very nice: you can control the kind of plot and also the number of posterior predictive lines to draw with num_pp_samples. I would like to ask if it would be possible to add an option to plot credible intervals instead of single draws from the posterior predictive. Consider the following example:

import numpy as np
import arviz as az
import pymc3 as pm
from matplotlib import pyplot as plt

_x = np.random.uniform(-5, 5, 100)
_m = 1.5
_b = -0.7
_obs = np.random.normal(_x * _m + _b, 1)

with pm.Model():
    x = pm.Data("x", _x)
    m = pm.Normal("m", 0, 5)
    b = pm.Normal("b", 0, 5)
    obs = pm.Normal("obs", x * m + b, 1, observed=_obs)

    idata = pm.sample(return_inferencedata=True)
    ppc = pm.sample_posterior_predictive(idata)
    idata.extend(az.from_pymc3(posterior_predictive=ppc))

az.plot_ppc(idata);

The resulting plot is something like this:

[figure: plot_ppc output, many overlapping posterior predictive kde lines over the observed kde]

All of the individual lines from the posterior predictive samples are quite hard to read, and it's hard to make sense of how likely it is to find a sample in a given interval.

I would like to plot something like this:

from scipy import stats

ax = az.plot_dist(idata.observed_data.to_array(), color="black", label="Observed obs")
grid = np.linspace(*ax.get_xlim(), 1000)

# Get the ppc HDI
lines = []
for line in idata.posterior_predictive.to_array().values.reshape([-1, len(_obs)]):
    lines.append(stats.gaussian_kde(line)(grid))
lines = np.array(lines)
pdf = np.mean(lines, axis=0)
hdi = az.hdi(lines[None, :, :], hdi_prob=0.95)  # add a chain dimension so the hdi is computed per grid point
az.plot_hdi(grid, hdi_data=hdi, color="C0", ax=ax, fill_kwargs={"label": "95% HDI"})
ax.plot(grid, pdf, color="C0", linestyle="--", label="Posterior predictive mean obs")
ax.legend();

[figure: observed kde with the posterior predictive mean kde and a shaded 95% HDI band]

where the filled-in area is the posterior predictive's HDI at a certain level.

@OriolAbril
Member

I think the ideal situation here would be to implement https://arxiv.org/abs/2103.10522 to compare several (or all) of the posterior predictive distributions to the observed one. The hdi of kde lines looks nice, but I don't think there is any guarantee that kde lines of the same distribution will lie completely inside the shaded region with 95% probability.
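
For reference, a minimal sketch of the kind of check that paper builds on: plot the ECDF of posterior predictive PIT values against the uniform CDF. The paper's actual contribution is the simultaneous confidence band; this sketch only draws pointwise binomial bands for illustration, and it assumes idata and _obs from the example above:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

pp = idata.posterior_predictive["obs"].values.reshape(-1, len(_obs))
obs = idata.observed_data["obs"].values

# PIT value of each observation: fraction of predictive draws below it
pit = (pp < obs).mean(axis=0)

grid = np.linspace(0, 1, 100)
ecdf = np.array([(pit <= p).mean() for p in grid])

# pointwise 95% band for the ECDF of n uniform values (binomial counts)
n = len(pit)
lower, upper = stats.binom.interval(0.95, n, grid)
plt.fill_between(grid, lower / n, upper / n, alpha=0.3, label="pointwise 95% band")
plt.plot(grid, ecdf, label="PIT ECDF")
plt.plot(grid, grid, "k--", label="uniform CDF")
plt.legend()
plt.show()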

We can definitely add the option for the kde hdi shaded area, but we have to be careful about how we document it.

@lucianopaz
Contributor Author

I agree that the paper's method is a great posterior predictive check in itself. My proposal was much more mundane.

Regarding the problem of the kde lines not being inside the 95% probability region, I think that should happen only 5% of the time. What I thought could be an issue is that the hdi region could be artificially large due to multimodality. For example, consider this graph:

[figure: kde of a bimodal posterior predictive distribution]

The hdi bounds will cover an interval where the posterior predictive samples will never be seen:

[figure: hdi bounds spanning the low density region between the two modes]
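
To make the concern concrete, here is a small self-contained sketch (with hypothetical bimodal samples, not the model above): a contiguous HDI has to bridge the empty valley between the modes, while az.hdi's multimodal option splits it into disjoint intervals:

import numpy as np
import arviz as az

rng = np.random.default_rng(0)
# two well separated modes with essentially nothing in between
samples = np.concatenate([rng.normal(-3, 0.5, 1000), rng.normal(3, 0.5, 1000)])

print(az.hdi(samples, hdi_prob=0.95))  # single interval spanning the empty valley
print(az.hdi(samples, hdi_prob=0.95, multimodal=True))  # two disjoint intervals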

@OriolAbril
Member

Regarding the problem about the kde lines not being in the region of 95% probability, I think that it should happen only 5% of the time.

I would bet against this, actually. We are calculating the hdi on a per-gridpoint basis, so while we can say that 5% of the lines will be outside the hdi region at any given point, I don't think we can say anything about whole lines staying inside the hdi region at every point of the grid. For us to say that the shaded area is the 95% hdi region, it should be interpreted as kdes lying completely inside the shaded area with 95% probability. I don't know how the per-gridpoint hdi probability translates to probabilities on whole kde lines, but I am quite sure the two are different. I'll try to run an example on that.

@OriolAbril
Member

It looks like the coverage on whole kde lines is indeed far from the "nominal" hdi when generating areas with this method.

import arviz as az
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
u = rng.uniform(0, 1, size=(1, 1000, 200))
grid, _ = az.kde(u[0,0,:], custom_lims=(0, 1))  # setting custom lims to make sure all kdes use the same grid

kdes = np.array([az.kde(u[0, i, :], custom_lims=(0, 1))[1] for i in range(1000)])

# check things look ok
plt.plot(grid, kdes.T, color="C0")
plt.show()

hdi = az.hdi(kdes[None, :, :])  # add a chain dimension so the hdi is computed per grid point (default hdi_prob=0.94)
np.mean([np.all(kdes[i, :] > hdi[:, 0]) & np.all(kdes[i, :] < hdi[:, 1]) for i in range(1000)])
# Output
# 0.534
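
Continuing that simulation, a rough sketch of how far the pointwise probability has to be pushed before whole lines reach something like nominal coverage (the exact numbers will vary from run to run); it mainly shows the two probabilities are not the same thing:

# raise the pointwise hdi_prob and check the whole-line coverage it yields
for prob in (0.94, 0.99, 0.999):
    hdi_p = az.hdi(kdes[None, :, :], hdi_prob=prob)
    coverage = np.mean(
        [np.all((kdes[i] > hdi_p[:, 0]) & (kdes[i] < hdi_p[:, 1])) for i in range(1000)]
    )
    print(prob, coverage)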

@ghost

ghost commented Apr 10, 2021

It looks like the coverage on whole kde lines is indeed far from the "nominal" hdi when generating areas with this method.

I think this code is testing whether the whole graph falls inside the HDI, which isn't an especially useful diagnostic (if the KDE is mostly inside the HDI, it's probably good and the bits of it sticking out are probably random chance). My intuitive understanding of what an HDI for a PPC looks like is that if I have a 94% HDI, 94% of my KDE should fall inside the HDI on average if my model is solid. This seems much easier to interpret than the mess of lines that gets returned at the moment.

@OriolAbril
Member

OriolAbril commented Apr 10, 2021

Yes, I am testing that the whole kde line is inside the hdi shaded area, which is what I believe should be considered, and the only clear and interpretable diagnostic. Again, in my opinion the ideal solution to this is using https://arxiv.org/abs/2103.10522, though, not spaghetti plots nor the hdi proposed here (nor variations on it that try to fix the hdi region to account for whole lines).

I do believe it could be useful to add this, but we have to be careful because it is not clear at all what exactly the hdi shaded area represents, nor how to interpret lines going outside that region.

My intuitive understanding of what an HDI for a PPC looks like is that if I have a 94% HDI, 94% of my KDE should fall inside the HDI on average if my model is solid.

I don't think that's true either, and I am also quite sure about this. Kdes are continuous lines, so the probability of the 2nd point being outside the region given that the 1st one was outside is different than if the 1st one was inside; the values are not independent, yet we are calculating the hdi as if they were. Moreover, even if the hdi had the interpretation you mention, I don't think having users estimate by themselves (and visually) whether 95% or 90% of the kde line is outside the hdi region is a good idea.

As a side note on spaghetti plots, interpreting them with an animation could be useful. Imagine the following situation. You start the animation with a spaghetti plot of 100 kde lines from the posterior predictive, then 10 more lines are added to the plot one by one: 9 come from the posterior predictive too, and one corresponds to the observations. You have to try and guess which is the kde of the observed data. Once the 10 lines have been added, the plot is updated to highlight the observed kde. If you guessed which one it is (and if in general anyone can guess), then your model is not reproducing the generative process correctly; if it's generally not possible to know which is which, your model is probably ok. A static sketch of that game is below.
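
A static sketch of the guessing game (assuming idata and _obs from the first example): draw 9 posterior predictive kdes and the observed kde identically, then reveal the observed one:

import numpy as np
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng()
pp = idata.posterior_predictive["obs"].values.reshape(-1, len(_obs))
picks = rng.choice(pp.shape[0], size=9, replace=False)

# 9 posterior predictive kdes and the observed kde in the same style
for line in pp[picks]:
    grid, pdf = az.kde(line)
    plt.plot(grid, pdf, color="C0", alpha=0.7)
grid, pdf = az.kde(_obs)
plt.plot(grid, pdf, color="C0", alpha=0.7)

# reveal step: highlight the observed kde
plt.plot(grid, pdf, color="black", lw=2, label="observed")
plt.legend()
plt.show()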
