-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Fixes numerical issue in GMM.condition #28
Conversation
AlexanderFabisch
commented
Apr 3, 2021
•
edited
edited
- fix nan priors
- add test case that fails without this pull request
- another problem: covariance may not always be positive semi-definite, take a look at Oracle Approximating Shrinkage: https://github.com/scikit-learn/scikit-learn/blob/95119c13a/sklearn/covariance/_shrunk_covariance.py#L434
- test case for oracle_approximating_shrinkage in GMM.from_samples
- apidoc for oracle_approximating_shrinkage
- document prior computation in GMM.condition
@mralbu would be great if you could take a look at this PR. |
Will take a look as soon as I can! |
I checked a similar code snippet as in #27 on the proposed fix, running beforehand the following RegressorMixin class definition: from sklearn.base import BaseEstimator, RegressorMixin, MultiOutputMixin
from sklearn.utils import check_X_y
from sklearn.utils.validation import check_is_fitted, check_array, FLOAT_DTYPES
from gmr import GMM
class GaussianMixtureRegressor(MultiOutputMixin, RegressorMixin, BaseEstimator):
def __init__(self, n_components, priors=None, means=None, covariances=None,
verbose=0, random_state=None, R_diff=1e-4, n_iter=500, init_params="random", oracle_approximating_shrinkage=False):
self.n_components = n_components
self.priors = priors
self.means = means
self.covariances = covariances
self.verbose = verbose
self.random_state = random_state
self.R_diff = R_diff
self.n_iter = n_iter
self.init_params = init_params
self.oracle_approximating_shrinkage = oracle_approximating_shrinkage
def fit(self, X, y):
self.gmm_ = GMM(self.n_components, priors=self.priors, means=self.means,
covariances=self.covariances, verbose=self.verbose, random_state=self.random_state)
X, y = check_X_y(X, y, estimator=self.gmm_, dtype=FLOAT_DTYPES, multi_output=True)
if y.ndim == 1:
y = np.expand_dims(y, 1)
self.indices_ = np.arange(X.shape[1])
self.gmm_.from_samples(np.hstack((X, y)),
R_diff=self.R_diff, n_iter=self.n_iter, init_params=self.init_params, oracle_approximating_shrinkage=self.oracle_approximating_shrinkage)
return self
def predict(self, X):
check_is_fitted(self, ["gmm_", "indices_"])
X = check_array(X, estimator=self.gmm_, dtype=FLOAT_DTYPES)
return self.gmm_.predict(self.indices_, X) Code snippet as in #27: import numpy as np
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_validate
X, y = load_boston(return_X_y=True)
np.random.seed(42)
cross_validate(GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False), X, y)
>> {'fit_time': array([0.02808738, 0.00212717, 0.00335884, 0.00367188, 0.00159121]),
>> 'score_time': array([0.06866217, 0.06366682, 0.06335163, 0.06228876, 0.06565619]),
>> 'test_score': array([-1.77181999e+03, 7.71024953e-01, 5.81598835e-01, 7.68927960e-02, -1.41784818e+02])}
np.random.seed(42)
cross_validate(GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True), X, y)
>> {'fit_time': array([0.01825643, 0.0023396 , 0.03778887, 0.012429 , 0.00279951]),
>> 'score_time': array([0.06537795, 0.05936241, 0.05965519, 0.06038404, 0.0568521 ]),
>> 'test_score': array([0.32767343, 0.5339876 , 0.27381149, 0.40139363, 0.38142808])}
def is_pos_def(x):
return np.all(np.linalg.eigvals(x) >= 0)
np.random.seed(42)
gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False)
gmr.fit(X, y)
print([is_pos_def(sigma) for sigma in gmr.gmm_.covariances])
>> [False, True]
np.random.seed(42)
gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True)
gmr.fit(X, y)
print([is_pos_def(sigma) for sigma in gmr.gmm_.covariances])
>> [True, True] With the oracle_approximating_shrinkage algorithm, all the fitted covariances were positive semi-definite. I noticed as well that on the folds that converged in the example with oracle_approximating_shrinkage=False, the R2 metrics were different ([0.771024953, 0.581598835, 0.0768927960] for oracle_approximating_shrinkage=False and [0.5339876 , 0.27381149, 0.40139363] for oracle_approximating_shrinkage=True), but maybe this is expected given that the algorithm is stochastic, right? That happens as well in the following snippet. np.set_printoptions(precision=2)
np.random.seed(2)
scores = []
for _ in range(10):
gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=True)
gmr.fit(X, y)
scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.62 0.62 0.64 0.64 0.64 0.64 0.59 0.64 0.62 0.64]
np.random.seed(2)
scores = []
for _ in range(10):
gmr = GaussianMixtureRegressor(n_components=2, oracle_approximating_shrinkage=False)
gmr.fit(X, y)
scores.append(gmr.score(X, y))
print(np.array(scores))
>> [ 0.74 0.74 0.75 -3.98 0.85 -14.96 0.69 0.74 0.83 0.74] It seems that, when it converges, oracle_approximating_shrinkage=False, might lead to better R2 metrics. |
Thank you for your analysis.
Yes, of course the initialization of MVNs is stochastic, but even if it wasn't I would expect similar results. I think shrinkage of the covariance has an effect similar to regularization in linear regression (not the same, although that would be interesting to investigate...). |
Did some additional comparisons with an experiment using sklearn.mixture.GaussianMixture machinery for fitting the regressor. from sklego.mixture import GMMRegressor
np.set_printoptions(precision=4)
np.random.seed(2)
scores = []
for _ in range(10):
gmr = GMMRegressor(n_components=2)
gmr.fit(X, y)
scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478 0.8478]
np.random.seed(2)
scores = []
for _ in range(10):
gmr = GMMRegressor(n_components=2, init_params='random', max_iter=100)
gmr.fit(X, y)
scores.append(gmr.score(X, y))
print(np.array(scores))
>> [0.8157 0.8061 0.8221 0.8152 0.8221 0.8192 0.8479 0.8282 0.8251 0.7792] Maybe using internal sklearn.mixture machinery might help ease numerical issues, though it would introduce sklearn as a hard dependency and might be out of scope. On the other hand, it would enable the introduction of other regressors such as BayesianGMMRegressor in an easy way, and would have familiar parameters (the same as in sklearn.mixture.GaussianMixture). Do you think exploring the use of sklearn.mixture inner workings would be interesting for gmr? |
Yes, definitely. I wouldn't like to introduce sklearn as a required dependency though. You can easily initialize a GMM object from the priors, means, and covariances of a sklearn class, which should make the implementation of BayesianGMMRegressor easy. There are actually some examples that use sklearn's Bayesian GMM. |
I opened a new issue here: #30 |