Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not allow_singular=True in multivariate_normal.logpdf? #187

Closed
mathause opened this issue Aug 10, 2022 · 3 comments · Fixed by #184
Closed

do not allow_singular=True in multivariate_normal.logpdf? #187

mathause opened this issue Aug 10, 2022 · 3 comments · Fixed by #184
Milestone

Comments

@mathause
Copy link
Member

mathause commented Aug 10, 2022

In the function to find the optimal localization radius we calculate the logpdf of a multivariate normal distribution and allow singular matrices:

llh_cv_each_sample = multivariate_normal.logpdf(
y_cv, mean=mean_0, cov=loc_ecov, allow_singular=True
)

(note that this code is currently being moved around in #184). However, this is problematic because it can distort the (negative) log-likelihood.

import numpy as np
from scipy.stats import multivariate_normal

np.random.seed(0)
data = np.random.rand(5, 3)
data_train = data[1::2]
data_cv = data[::2]

out = list()
for crosscov in np.arange(0, 1.01, 0.01):

    localizer = np.full((3, 3), fill_value=crosscov)
    localizer[np.diag_indices(3)] = 1


    cov = np.cov(data_train, rowvar=False)

    log_likelihood = multivariate_normal.logpdf(
        data_cv, cov=cov * localizer, allow_singular=True
    ).sum()
    out.append(-log_likelihood)

out = np.array(out)

singular

In the example the last one leads to a singular matrix - which leads to the smallest negative log likelihood, which would then be selected.

This problem is mitigated because, mesmer aborts the search early, i.e. as soon as the negative-log-likelihood starts to increase (hard to see but this would be the fourth element in out [np.argmin(out[:-1]) returns 3]).

Still I am a bit worried that we may end up selecting the wrong localization radius because of this & reluctant to leave something in that is obviously wrong.


Options

  1. Remove the localization radius if a singular matrix is detected for any crossvalidation fold.
  2. Only skip the localization radius if all crossvalidation folds produce singular matrices. Instead of summing all negative-log-likelihoods we would need to average them. However, I don't know if this is something that is expected to often happen. (this may also have the advantage to make them better comparable when using a different number of folds. The disadvantage is that the negative log-likelihood is usually summed and not averaged).

cc @jschwaab @leabeusch

@mathause mathause changed the title do not allow_singular=True in multivariate_normal.logpdf do not allow_singular=True in multivariate_normal.logpdf? Aug 10, 2022
@mathause
Copy link
Member Author

@jschwaab calculated the (positive) log-likelihood for CMIP6 models (i.e. his number are more "realsitic" than my example). We see some models (e.g. ACCESS) which show an irregularity:

loc_radii

@mathause
Copy link
Member Author

Thinking a bit more about this - (2) will not work. Each crossvalidation fold has a certain "order of magnitude" and we need them all for a fair comparison between the localization radii.

@jschwaab
Copy link
Collaborator

Sorry for not responding earlier @mathause . I think you are making a good point on option 2. Actually, the plots above have already been produced with that option (2). I guess to be somehow compatible it would be possible calculate the average in every iteration only for those folds that are available, but that would of course also mean losing a lot of folds and information therein making the log-likelihood curves less stable.
I should also mention that I did keep the option to allow singular matrices. Whenever a fold was not successful in calculating the log-likelihood this was (I think) because of the covariance matrix not being positive semi-definite.

@mathause mathause added this to the v0.9.0 milestone Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants