
[MRG] Fixes numerical issue in GMM.condition #28

Merged · 8 commits · Apr 14, 2021

Conversation

@AlexanderFabisch (Owner) commented Apr 3, 2021

@AlexanderFabisch changed the title from [WIP] Fixes numerical issue in GMM.condition to [MRG] Fixes numerical issue in GMM.condition on Apr 4, 2021

@AlexanderFabisch (Owner, Author) commented:

@mralbu, it would be great if you could take a look at this PR.

@mralbu (Contributor) commented Apr 6, 2021

Will take a look as soon as I can!

@mralbu (Contributor) commented Apr 14, 2021

I tested a code snippet similar to the one in #27 against the proposed fix, first running the following regressor class definition:

import numpy as np

from sklearn.base import BaseEstimator, RegressorMixin, MultiOutputMixin
from sklearn.utils import check_X_y
from sklearn.utils.validation import check_is_fitted, check_array, FLOAT_DTYPES

from gmr import GMM

class GaussianMixtureRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):

    def __init__(self, n_components, priors=None, means=None, covariances=None,
                 verbose=0, random_state=None, R_diff=1e-4, n_iter=500,
                 init_params="random", oracle_approximating_shrinkage=False):
        self.n_components = n_components
        self.priors = priors
        self.means = means
        self.covariances = covariances
        self.verbose = verbose
        self.random_state = random_state
        self.R_diff = R_diff
        self.n_iter = n_iter
        self.init_params = init_params
        self.oracle_approximating_shrinkage = oracle_approximating_shrinkage

    def fit(self, X, y):
        self.gmm_ = GMM(self.n_components, priors=self.priors, means=self.means,
                        covariances=self.covariances, verbose=self.verbose, random_state=self.random_state)

        X, y = check_X_y(X, y, estimator=self.gmm_, dtype=FLOAT_DTYPES, multi_output=True)
        if y.ndim == 1:
            y = np.expand_dims(y, 1)

        self.indices_ = np.arange(X.shape[1])

        self.gmm_.from_samples(
            np.hstack((X, y)), R_diff=self.R_diff, n_iter=self.n_iter,
            init_params=self.init_params,
            oracle_approximating_shrinkage=self.oracle_approximating_shrinkage)
        return self

    def predict(self, X):
        check_is_fitted(self, ["gmm_", "indices_"])
        X = check_array(X, estimator=self.gmm_, dtype=FLOAT_DTYPES)

        return self.gmm_.predict(self.indices_, X)

Then, the code snippet from #27:

import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_validate

X, y = load_boston(return_X_y=True)

np.random.seed(42)

cross_validate(GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False), X, y)
>> {'fit_time': array([0.02808738, 0.00212717, 0.00335884, 0.00367188, 0.00159121]),
>>  'score_time': array([0.06866217, 0.06366682, 0.06335163, 0.06228876, 0.06565619]),
>>  'test_score': array([-1.77181999e+03,  7.71024953e-01,  5.81598835e-01,  7.68927960e-02, -1.41784818e+02])}

np.random.seed(42)

cross_validate(GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True), X, y)
>> {'fit_time': array([0.01825643, 0.0023396 , 0.03778887, 0.012429  , 0.00279951]),
>> 'score_time': array([0.06537795, 0.05936241, 0.05965519, 0.06038404, 0.0568521 ]),
>> 'test_score': array([0.32767343, 0.5339876 , 0.27381149, 0.40139363, 0.38142808])}

def is_pos_def(x):
    # Note: eigenvalues >= 0 actually checks positive *semi*-definiteness.
    return np.all(np.linalg.eigvals(x) >= 0)

np.random.seed(42)

gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False)
gmr.fit(X, y)

print([is_pos_def(sigma) for sigma in gmr.gmm_.covariances])
>> [False, True]

np.random.seed(42)

gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True)
gmr.fit(X, y)

print([is_pos_def(sigma) for sigma in gmr.gmm_.covariances])
>> [True, True]

With the oracle_approximating_shrinkage algorithm, all of the fitted covariances were positive semi-definite, and cross-validation on this dataset produced neither np.nan nor negative scores.

I also noticed that on the folds that converged in the example with oracle_approximating_shrinkage=False, the R² scores were different ([0.771024953, 0.581598835, 0.0768927960] for oracle_approximating_shrinkage=False and [0.5339876, 0.27381149, 0.40139363] for oracle_approximating_shrinkage=True), but maybe this is expected given that the algorithm is stochastic, right?

The same happens in the following snippet.

np.set_printoptions(precision=2)

np.random.seed(2)

scores = []
for _ in range(10):
    gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.62 0.62 0.64 0.64 0.64 0.64 0.59 0.64 0.62 0.64]

np.random.seed(2)

scores = []
for _ in range(10):
    gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [  0.74   0.74   0.75  -3.98   0.85 -14.96   0.69   0.74   0.83   0.74]

It seems that, when it converges, oracle_approximating_shrinkage=False might lead to better R² scores.

@AlexanderFabisch (Owner, Author) commented:

Thank you for your analysis.

> I also noticed that on the folds that converged in the example with oracle_approximating_shrinkage=False, the R² scores were different ([0.771024953, 0.581598835, 0.0768927960] for oracle_approximating_shrinkage=False and [0.5339876, 0.27381149, 0.40139363] for oracle_approximating_shrinkage=True), but maybe this is expected given that the algorithm is stochastic, right?

Yes, of course the initialization of the MVNs is stochastic, but even if it weren't, I would expect similar results. I think shrinkage of the covariance has an effect similar to regularization in linear regression (not the same, although that would be interesting to investigate...).
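
(As an illustration of that analogy, here is a minimal sketch, not part of this PR, using sklearn.covariance.oas: OAS blends the empirical covariance with a scaled identity, which pulls the smallest eigenvalue away from zero, much like ridge regularization conditions a linear regression problem. The data and variable names are illustrative.)

import numpy as np
from sklearn.covariance import empirical_covariance, oas

rng = np.random.RandomState(0)
X_demo = rng.randn(20, 5)  # few samples -> noisy, possibly ill-conditioned covariance

S = empirical_covariance(X_demo)
# OAS shrinks toward a scaled identity: (1 - alpha) * S + alpha * mu * I,
# where mu = trace(S) / n_features and alpha is chosen analytically.
S_shrunk, alpha = oas(X_demo)

print(np.linalg.eigvalsh(S).min())         # can be very close to zero
print(np.linalg.eigvalsh(S_shrunk).min())  # pulled away from zero
print(alpha)                               # the estimated shrinkage coefficient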

@mralbu (Contributor) commented Apr 14, 2021

I did some additional comparisons with an experiment that uses the sklearn.mixture.GaussianMixture machinery (through sklego.mixture.GMMRegressor) to fit the regressor.

from sklego.mixture import GMMRegressor

np.set_printoptions(precision=4)

np.random.seed(2)

scores = []
for _ in range(10):
    gmr = GMMRegressor(n_components=2)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478]

np.random.seed(2)

scores = []
for _ in range(10):
    gmr = GMMRegressor(n_components=2, init_params='random', max_iter=100)
    gmr.fit(X, y)
    scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8157 0.8061 0.8221 0.8152 0.8221 0.8192 0.8479 0.8282 0.8251 0.7792]

Using the internal sklearn.mixture machinery might ease the numerical issues, though it would introduce sklearn as a hard dependency and might be out of scope. On the other hand, it would make it easy to introduce other regressors, such as a BayesianGMMRegressor, with familiar parameters (the same as sklearn.mixture.GaussianMixture's). Do you think exploring the use of sklearn.mixture's inner workings would be interesting for gmr?

@AlexanderFabisch (Owner, Author) commented:

> Do you think exploring the use of sklearn.mixture's inner workings would be interesting for gmr?

Yes, definitely.

I wouldn't like to introduce sklearn as a required dependency, though. You can easily initialize a GMM object from the priors, means, and covariances of a fitted sklearn mixture model, which should make the implementation of a BayesianGMMRegressor easy. There are actually some examples that already use sklearn's Bayesian GMM.
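
(A minimal sketch of that conversion, assuming a sklearn.mixture.BayesianGaussianMixture fitted with covariance_type="full"; the data and variable names are illustrative:)

import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from gmr import GMM

X_demo = np.random.RandomState(0).randn(200, 3)

# Fit with sklearn, then hand the parameters over to gmr.
bgmm = BayesianGaussianMixture(n_components=2, covariance_type="full")
bgmm.fit(X_demo)

gmm = GMM(n_components=2, priors=bgmm.weights_, means=bgmm.means_,
          covariances=bgmm.covariances_)

# gmr's regression machinery is now available, e.g. predicting the last
# feature from the first two:
y_pred = gmm.predict(np.array([0, 1]), X_demo[:, :2])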

@AlexanderFabisch (Owner, Author) commented:

I opened a new issue here: #30
