# The Gaussian Mixture Class

In this notebook we introduce the `sparseklearn.GaussianMixture` class and demonstrate its basic functionality. `GaussianMixture` is designed to look and feel like scikit-learn's [GaussianMixture](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) class, but under the hood it uses the methods of `sparseklearn.Sparsifier` to perform computations on sparsified data. We recommend working through the `Sparsifier` notebook before this one.

`GaussianMixture` fits a mixture of Gaussians to data using a sparsified analog of the expectation-maximization (EM) algorithm.
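For orientation, here is a minimal sketch of ordinary (dense) EM for a Gaussian mixture, written in plain NumPy. It is not sparseklearn's implementation and does not operate on sparsified data; the function name `em_diag_gmm`, its arguments, and the choice of diagonal covariances are assumptions made purely for illustration.

```python
import numpy as np

def em_diag_gmm(X, n_components, n_iter=50, seed=0, reg=1e-6):
    """Dense EM for a diagonal-covariance GMM (illustrative only)."""
    rng = np.random.RandomState(seed)
    n = X.shape[0]
    # Initialize: means at random samples, global variances, uniform weights.
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log N(x_i | mean_k, diag(var_k)) for every sample/component pair.
        diff = X[:, None, :] - means[None, :, :]
        log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                           + np.sum(np.log(2 * np.pi * variances), axis=1))
        log_joint = log_prob + np.log(weights)
        # Normalize in log space so each row of responsibilities sums to 1.
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # M-step: responsibility-weighted updates of weights, means, variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = resp.T @ X / nk[:, None]
        second_moment = resp.T @ X ** 2 / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 0.0) + reg
    # resp holds the responsibilities from the final E-step.
    return weights, means, variances, resp

# Hypothetical toy data: two blobs in 5 dimensions.
rs = np.random.RandomState(0)
X_demo = np.vstack([rs.normal(0.0, 1.0, (100, 5)),
                    rs.normal(6.0, 1.0, (100, 5))])
weights, means, variances, resp = em_diag_gmm(X_demo, n_components=2)
labels = resp.argmax(axis=1)  # hard assignment: most responsible component
```

The E-step assigns each sample a soft membership in every component, and the M-step re-estimates the parameters from those memberships. Taking the `argmax` of the responsibilities gives a hard cluster label per sample.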

In general, Gaussian mixture models (GMMs) with full covariances become intractable in high dimensions because training requires inverting $P \times P$ covariance matrices at every iteration, where $P=$ `num_feat_full` is the dimension of the original feature space. To circumvent this issue we use only spherical or diagonal covariances. This also keeps the number of parameters per component at $\mathcal{O}(P)$ rather than $\mathcal{O}(P^2)$.
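The following sketch illustrates both points with plain NumPy. The helper names are hypothetical, and the labels `"spherical"`, `"diag"`, and `"full"` are borrowed from scikit-learn's naming as an assumption. It counts the free covariance parameters per component and checks that a diagonal Gaussian log-density, computed with elementwise operations only, agrees with the general formula that needs a $P \times P$ linear solve.

```python
import numpy as np

def n_covariance_params(P, covariance_type):
    """Free covariance parameters for one Gaussian component in P dimensions."""
    return {"spherical": 1, "diag": P, "full": P * (P + 1) // 2}[covariance_type]

def diag_log_density(x, mean, var):
    """log N(x | mean, diag(var)) using only elementwise operations, O(P)."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

def full_log_density(x, mean, cov):
    """Same density for a dense P x P covariance; needs an O(P^3) solve."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + logdet + len(x) * np.log(2 * np.pi))

rs = np.random.RandomState(0)
P = 100
x, mean = rs.rand(P), rs.rand(P)
var = rs.rand(P) + 0.5  # strictly positive variances

a = diag_log_density(x, mean, var)
b = full_log_density(x, mean, np.diag(var))
agree = np.isclose(a, b)  # the two formulas should coincide for a diagonal covariance

counts = {t: n_covariance_params(P, t) for t in ("spherical", "diag", "full")}
print(agree, counts)
```

For $P = 100$, a full covariance has $P(P+1)/2 = 5050$ free parameters per component, compared with 100 for a diagonal covariance and 1 for a spherical one.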

In [1]:
import numpy as np
from sparseklearn import GaussianMixture

## Basic usage

In [3]:
# Synthetic data: 300 samples with 100 features each, drawn uniformly from [0, 1).
rs = np.random.RandomState(78)
num_samp, num_feat_full = 300, 100
X = rs.rand(num_samp, num_feat_full)

In [14]:
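# Build a 3-component mixture; num_feat_comp sets the compressed feature count.
gmm = GaussianMixture(num_samp=num_samp, num_feat_full=num_feat_full,
                      num_feat_comp=10, n_components=3)
gmm.fit(X)
y_pred = gmm.predict(X=X)  # predicted component label for each sample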

## Example application: small cluster recovery