Additional LOO functionality #2059

Open · 4 tasks

sethaxen opened this issue Jun 23, 2022 · 6 comments

Comments

@sethaxen
Member

sethaxen commented Jun 23, 2022

There are several useful functions that can be used for model assessment and comparison that could be added here. In some cases these are already available in loo or soon will be:

  • loo_expectation(data, values, kwargs...): compute the expectation of values with respect to the leave-one-out posteriors using PSIS. E.g., if values are likelihoods, this computes the pointwise ELPD, though less efficiently and stably than loo does. If values are posterior predictions, the result is the leave-one-out posterior predictive means. Exists in loo as E_loo and was previously proposed for ArviZ in Is there a LOO R2? #1931 (comment). A minimal sketch of the computation follows this list.
  • logo(data, groups, var_name, kwargs...): compute leave-one-group-out cross-validation. groups is an array of the same shape as the log-likelihood of the variable specified by var_name, minus sample dimensions. The unique group values define new dimensions, and log-likelihoods that share the same group are summed together. The docstring should warn that this is more likely to produce Pareto shape errors than LOO, and it will probably fail if there are group-specific parameters.
  • loo_predictive_error(data, method, kwargs...): use loo_expectation to compute point estimates of the LOO posterior-predictive means, then use the name or function method to compute pointwise and mean predictive error across the data points. Coming to loo: Additional loo utilities stan-dev/loo#202
  • loo_crps(idata, scale=False, kwargs...): use loo_expectation to compute the continuous ranked probability score (CRPS) or its scaled variant (SCRPS), another strictly proper scoring rule besides the log score (ELPD) to use for model comparison. Requires 2 posterior predictive draws for each posterior draw, so blocked by Supporting multiple posterior predictive draws for each posterior draw #2048. Coming to loo: CRPS and SCRPS stan-dev/loo#203
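For reference, a minimal numpy sketch of the self-normalized computation loo_expectation would perform, built on arviz's existing psislw (the synthetic arrays and the e_loo name are placeholders for illustration, not a proposed API):

import numpy as np
import arviz as az

rng = np.random.default_rng(0)
n_obs, n_samples = 50, 1000
log_lik = rng.normal(size=(n_obs, n_samples))  # pointwise log-likelihood draws
values = rng.normal(size=(n_obs, n_samples))   # quantity whose LOO expectation we want

# Smoothed LOO log-weights; psislw normalizes them per observation,
# so no division by the weight sum is needed afterwards.
log_weights, pareto_k = az.psislw(-log_lik)

# E_loo[values] per observation, combined on the log scale so that tiny
# weights do not underflow before being multiplied by large values.
e_loo = (np.sign(values) * np.exp(np.log(np.abs(values)) + log_weights)).sum(axis=-1)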
@OriolAbril
Member

Generally agree. Not so sure about the logo one. It would basically be

  1. groupby using xarray
  2. call az.loo

I personally think it would be better to document this and have users do the groupby themselves instead of adding a logo function. If we add a function, I expect users will assume that only CV schemes directly compatible with loo or logo are supported, when in fact they can build any combination themselves, and I don't think we can support all possible cases. A draft example of computing logo with loo is https://nbviewer.org/github/OriolAbril/Exploratory-Analysis-of-Bayesian-Models/blob/multi_obs_ic/content/Section_04/Multiple_likelihoods.ipynb#Predicting-team-level-performance (see the sketch below).
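For concreteness, a rough sketch of that two-step recipe on synthetic data (the variable y, dimension obs_id, and coordinate group are hypothetical names; the random log-likelihood values only exercise the mechanics and may trigger khat warnings):

import numpy as np
import xarray as xr
import arviz as az

rng = np.random.default_rng(0)
n_chains, n_draws, n_obs = 4, 500, 30
idata = az.from_dict(
    posterior={"mu": rng.normal(size=(n_chains, n_draws))},
    log_likelihood={"y": rng.normal(size=(n_chains, n_draws, n_obs))},
    dims={"y": ["obs_id"]},
    coords={"obs_id": np.arange(n_obs)},
)
# Hypothetical group labels: 3 groups of 10 observations each.
group = xr.DataArray(np.repeat(["a", "b", "c"], 10), dims="obs_id")
log_lik = idata.log_likelihood["y"].assign_coords(group=group)

# 1. groupby using xarray: sum pointwise log-likelihoods within each group,
#    so each group becomes a single leave-one-group-out "observation".
group_log_lik = log_lik.groupby("group").sum("obs_id")

# 2. call az.loo on the group-level log-likelihood.
idata_logo = az.InferenceData(
    posterior=idata.posterior,
    log_likelihood=group_log_lik.to_dataset(name="y"),
)
print(az.loo(idata_logo))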

@sethaxen
Member Author

Good point about logo. I agree: if we can document it and the code only takes a few lines, then it doesn't make sense to add a dedicated logo implementation. Thanks for the example; it should be straightforward to adapt.

@aadya940
Contributor

Hi @sethaxen, I would like to implement the loo_expectation function in arviz.stats. From what I understand, we could implement it by calculating the weights using az.loo and then using the weights to compute the weighted expectation, something like (values * weights).sum() / weights.sum(). I just wanted to know if I'm on the right track?

@sethaxen
Member Author

That would be one approach. loo computes the log-weights using psislw and then uses them to estimate the ELPD. Technically this function only needs the log-weights, so it would be more efficient to call psislw directly, as loo does.

something like (values * weights).sum()/weights.sum().

Yep, this is the right approach, with a few notes:

  • The log-weights returned by psislw are already normalized, so no need for weights.sum().
  • You probably want to avoid the numerical issues that arise when exp(log_weights[i]) underflows to 0 but exp(log(abs(values[i])) + log_weights[i]) is actually large. Something like (np.sign(values) * np.exp(np.log(np.abs(values)) + log_weights)).sum() might be the right approach.
  • Third, the PSIS paper in Section E describes an approach for diagnosing an expectation computed using importance weights, described below.

Namely, in addition to using psislw(-log_likelihood) to get the LOO log-weights, you would also compute something like

# summands h(theta) * r(theta), handled on the log scale for stability
log_abs_summand = np.log(np.abs(values)) - log_likelihood
max_log_abs_summand = log_abs_summand.max()
# rescale by the maximum before exponentiating so the largest magnitude is 1
summand_scaled = np.sign(values) * np.exp(log_abs_summand - max_log_abs_summand)
tail_right = ...  # select right tail of summand_scaled using approach of https://github.com/arviz-devs/arviz/blob/6b1b2be60cca804d4b0ec43a3303820bfa8785c6/arviz/stats/stats.py#L974-L980
tail_left = ...  # do the same but for `-summand_scaled`
k_hat_right, _ = _gpdfit(tail_right)
k_hat_left, _ = _gpdfit(tail_left) 
k_hat = max(k_hat_right, k_hat_left)

It might be better to refactor _psislw slightly to reduce code duplication. k_hat should at least be used to warn if an expectation will be poorly estimated, but a user may optionally want it returned. Some reworking is probably needed to compute expectations with respect to multiple functions simultaneously. (IIRC, this is effectively what loo_pit does and needs.)
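To make the tail diagnostic concrete, here is a hypothetical helper for a single observation (the function name and signature are made up; _gpdfit is an arviz-private helper whose location may change between versions, imported here purely for illustration; the tail-length rule mirrors psislw assuming reff = 1):

import numpy as np
from arviz.stats.stats import _gpdfit  # private helper; illustration only


def e_loo_khat(values, log_likelihood):
    # values, log_likelihood: 1-D arrays over posterior draws for one observation.
    # Summands h(theta) * r(theta) on the log scale, rescaled for stability.
    log_abs_summand = np.log(np.abs(values)) - log_likelihood
    summand_scaled = np.sign(values) * np.exp(log_abs_summand - log_abs_summand.max())

    # Tail length as in psislw (assuming reff = 1).
    n = len(values)
    tail_len = int(np.ceil(min(n / 5, 3 * np.sqrt(n))))

    khats = []
    for sgn in (1, -1):  # right tail of the summands, then of their negation
        x = np.sort(sgn * summand_scaled)
        cutoff = x[-tail_len - 1]
        # _gpdfit expects sorted exceedances over the cutoff.
        khat, _ = _gpdfit(x[-tail_len:] - cutoff)
        khats.append(khat)
    return max(khats)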

@avehtari does this look right?

FWIW, I would support adding loo_expectation in one PR and then adding the diagnostics in a later PR.

@avehtari
Contributor

Corresponding R functions are in https://github.com/stan-dev/loo/blob/master/R/E_loo.R; see specifically .wmean, .wvar, .wsd, and .wquant. The loo package stores log weights, but for these computations the normalized weights are computed and passed to the .w* functions.
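For reference, rough Python counterparts of those .w* helpers, assuming normalized weights w that sum to 1 (the 1 / (1 - sum(w**2)) variance correction is the standard one for normalized weights, and the quantile version is a simplified weighted empirical CDF; check E_loo.R for loo's exact choices):

import numpy as np


def wmean(x, w):
    # Weighted mean with normalized weights (w.sum() == 1).
    return np.sum(w * x)


def wvar(x, w):
    # Bias-corrected weighted variance for normalized weights.
    return np.sum(w * (x - wmean(x, w)) ** 2) / (1 - np.sum(w ** 2))


def wsd(x, w):
    return np.sqrt(wvar(x, w))


def wquant(x, w, probs):
    # Weighted quantiles by interpolating the weighted empirical CDF.
    order = np.argsort(x)
    x, w = x[order], w[order]
    return np.interp(probs, np.cumsum(w), x)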

For the diagnostics, the loo package now also has a minimum sample size and a sample-size-dependent khat threshold. The new diagnostic summary messages are in a PR (https://github.com/n-kall/posterior/blob/9a59fb57fbd7947a06eab908b3b2209fc60a2146/R/pareto_smooth.R#L611) and will be merged before the next release.

@aadya940
Contributor

@sethaxen Thanks for the quick reply 👍. I'll keep these points in mind and try to submit a draft PR implementing the loo_expectation function; you can review it whenever you have time.
