do not `allow_singular=True` in `multivariate_normal.logpdf`? #187

mathause · 2022-08-10T14:29:24Z

In the function to find the optimal localization radius we calculate the logpdf of a multivariate normal distribution and allow singular matrices:

mesmer/mesmer/calibrate_mesmer/train_lv.py

Lines 386 to 388 in f5f7be1

    
    llh_cv_each_sample = multivariate_normal.logpdf(
        y_cv, mean=mean_0, cov=loc_ecov, allow_singular=True
    )

(Note that this code is currently being moved around in #184.) However, allowing singular matrices is problematic because it can distort the (negative) log-likelihood, as the following example shows:

import numpy as np
from scipy.stats import multivariate_normal

np.random.seed(0)
data = np.random.rand(5, 3)
data_train = data[1::2]
data_cv = data[::2]

# the covariance does not depend on the localizer, so estimate it once
cov = np.cov(data_train, rowvar=False)

out = []
for crosscov in np.arange(0, 1.01, 0.01):

    # localizer with ones on the diagonal and `crosscov` everywhere else
    localizer = np.full((3, 3), fill_value=crosscov)
    localizer[np.diag_indices(3)] = 1

    log_likelihood = multivariate_normal.logpdf(
        data_cv, cov=cov * localizer, allow_singular=True
    ).sum()
    out.append(-log_likelihood)

out = np.array(out)

In this example the last value of `crosscov` leads to a singular matrix, which yields the smallest negative log-likelihood and would therefore be selected.
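For context, here is why a singular covariance changes the scale of the result. This is my reading of the SciPy documentation, which states that the pseudo-determinant and pseudo-inverse are used when `allow_singular=True`; the symbols $k$ (dimension), $r$ (rank of $\Sigma$), $\operatorname{pdet}$ and $\Sigma^{+}$ are notation introduced here, not taken from mesmer:

$$
\log p(x) = -\frac{1}{2}\left[\, r \log(2\pi) + \log \operatorname{pdet}(\Sigma) + (x - \mu)^\top \Sigma^{+} (x - \mu) \,\right]
$$

For $r = k$ this is the usual multivariate normal log-density. For $r < k$ the zero eigenvalues are dropped from the pseudo-determinant and the deviations along those directions are ignored in the quadratic term, so the value is a density on an $r$-dimensional subspace and is not comparable with values obtained from full-rank matrices.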

This problem is mitigated because mesmer aborts the search early, i.e. as soon as the negative log-likelihood starts to increase (hard to see, but this happens at the fourth element in `out`; `np.argmin(out[:-1])` returns 3).
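To illustrate the early abort, here is a minimal sketch of a search that stops at the first increase. It is not mesmer's actual implementation; the function name `first_local_minimum` is hypothetical and the numbers are made up.

```python
def first_local_minimum(values):
    """Return the index of the last value before the sequence first increases."""
    for i in range(1, len(values)):
        if values[i] > values[i - 1]:
            # the sequence started to increase: stop and keep the previous index
            return i - 1
    # the sequence never increased, so the last element is the minimum
    return len(values) - 1


# made-up negative log-likelihoods; the last entry mimics a distorted value
nll = [5.0, 4.0, 3.5, 3.0, 3.2, 1.0]
idx = first_local_minimum(nll)
print(idx)  # 3: the search stops before it reaches the smaller value at index 5
```

A search like this never sees the distorted last entry, but only as long as the curve happens to turn upwards before the singular case is reached.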

Still, I am a bit worried that we may end up selecting the wrong localization radius because of this, and I am reluctant to leave something in that is obviously wrong.

Options

1. Skip the localization radius if a singular matrix is detected for any cross-validation fold.
2. Only skip the localization radius if all cross-validation folds produce singular matrices. Instead of summing all negative log-likelihoods we would need to average them. However, I don't know if this is expected to happen often. (Averaging may also have the advantage of making the values more comparable when using a different number of folds. The disadvantage is that the negative log-likelihood is usually summed, not averaged.)
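Option 1 could be sketched roughly as follows. This assumes that `multivariate_normal.logpdf` raises `numpy.linalg.LinAlgError` for a singular matrix (and `ValueError` for a matrix that is not positive semidefinite) when `allow_singular=False`. The function name, the `folds` layout and the data are hypothetical and not taken from mesmer.

```python
import numpy as np
from scipy.stats import multivariate_normal


def summed_nll_or_none(folds, localizer):
    """Sum the negative log-likelihood over all folds.

    Returns None as soon as one fold has a singular (or invalid) covariance,
    so the caller can skip this localization radius.
    `folds` is an iterable of (data_train, data_cv) pairs (hypothetical layout).
    """
    total = 0.0
    for data_train, data_cv in folds:
        cov = np.cov(data_train, rowvar=False) * localizer
        try:
            # allow_singular=False: raise instead of using the pseudo-inverse
            llh = multivariate_normal.logpdf(data_cv, cov=cov, allow_singular=False)
        except (np.linalg.LinAlgError, ValueError):
            return None
        total -= np.sum(llh)
    return total


# made-up data: two training samples give a rank-1 covariance
data_train = np.array([[0.0, 1.0, 2.0], [1.0, 3.0, 0.0]])
data_cv = np.array([[0.5, 2.0, 1.0], [0.2, 1.5, 1.5], [0.8, 2.5, 0.5]])
folds = [(data_train, data_cv)]

loc_full = np.ones((3, 3))  # no localization: covariance stays singular
loc_half = np.full((3, 3), 0.5)
loc_half[np.diag_indices(3)] = 1  # damped off-diagonals: positive definite

print(summed_nll_or_none(folds, loc_full))  # None
print(summed_nll_or_none(folds, loc_half))  # a finite float
```

Returning `None` rather than `np.inf` keeps "skipped" distinguishable from "very bad fit"; which of the two is more convenient depends on how the search loop is written.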

cc @jschwaab @leabeusch


mathause · 2022-08-10T14:34:54Z

@jschwaab calculated the (positive) log-likelihood for CMIP6 models (i.e. his numbers are more "realistic" than my example). We see some models (e.g. ACCESS) which show an irregularity:

mathause · 2022-08-10T15:11:17Z

Thinking a bit more about this - (2) will not work. Each crossvalidation fold has a certain "order of magnitude" and we need them all for a fair comparison between the localization radii.
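A toy illustration of the problem, with made-up numbers: if a fold is missing for one radius, averaging over the folds that are available mixes different "orders of magnitude".

```python
import numpy as np

# rows: candidate localization radii, columns: cross-validation folds
# NaN marks a skipped fold; all numbers are made up for illustration
nll = np.array(
    [
        [10.0, 100.0],  # radius A: both folds available
        [11.0, np.nan],  # radius B: worse on fold 0, fold 1 skipped
    ]
)

# average over the folds that are available for each radius
mean_available = np.nanmean(nll, axis=1)
print(mean_available)  # [55. 11.]

best = int(np.argmin(mean_available))
print(best)  # 1, i.e. radius B
```

Radius B wins only because its expensive fold is missing, although it is worse than radius A on the one fold they share.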

jschwaab · 2022-08-10T15:19:41Z

Sorry for not responding earlier, @mathause. I think you are making a good point on option 2. Actually, the plots above have already been produced with option (2). I guess that, to keep the values comparable, it would be possible to calculate the average in every iteration only for those folds that are available, but that would of course also mean losing a lot of folds and the information therein, making the log-likelihood curves less stable.
I should also mention that I did keep the option to allow singular matrices. Whenever a fold was not successful in calculating the log-likelihood, this was (I think) because the covariance matrix was not positive semi-definite.
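One way to confirm that suspicion would be to check the eigenvalues of the localized covariance matrix before calling `logpdf`. A minimal sketch follows; the helper name `is_positive_semidefinite` and the tolerance are assumptions, not part of mesmer or SciPy.

```python
import numpy as np


def is_positive_semidefinite(cov, rtol=1e-10):
    """Return True if `cov` is symmetric and has no meaningfully negative eigenvalue."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues
    eigvals = np.linalg.eigvalsh(cov)
    # tolerate round-off that is tiny relative to the largest eigenvalue
    tol = rtol * np.abs(eigvals).max()
    return bool(eigvals.min() >= -tol)


print(is_positive_semidefinite([[1.0, 2.0], [2.0, 1.0]]))  # False (eigenvalues -1 and 3)
print(is_positive_semidefinite([[1.0, 1.0], [1.0, 1.0]]))  # True (singular, but not negative)
```

Note that the second matrix is singular yet positive semi-definite, so this check separates the "not positive semi-definite" failure from the singular case discussed above.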

mathause changed the title ~~do not allow_singular=True in multivariate_normal.logpdf~~ do not allow_singular=True in multivariate_normal.logpdf? Aug 10, 2022

mathause mentioned this issue Aug 10, 2022 in "refactor localized covariance" #184 (merged)

mathause closed this as completed in #184 Aug 12, 2022

mathause added this to the v0.9.0 milestone Dec 13, 2022
