# Matrix Decomposition

A matrix decomposition (or factorization) reduces a matrix to its constituent parts, usually a product of simpler matrices. Some complex matrix operations become easier when they are performed on these factors rather than on the original matrix itself.
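As a concrete illustration of that idea, the sketch below factors a square matrix once with SciPy's LU routines and then reuses the factors to solve linear systems for several right-hand sides. This is our own example, not part of the scikit-learn material that follows. Once the O(n³) factorization is done, each additional solve costs only O(n²).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# A small square matrix, shifted by 4*I so it is comfortably invertible
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)

# Decompose once: lu_factor returns the packed L/U factors and the pivot indices
lu, piv = lu_factor(A)

# Reuse the factors to solve A @ solution = B for three right-hand sides at once,
# without factorizing A again
B = rng.normal(size=(4, 3))
solution = lu_solve((lu, piv), B)

print(np.allclose(A @ solution, B))  # True
```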

## Dictionary Learning

`DictionaryLearning` finds a dictionary (a set of atoms) that performs well at sparsely encoding the data it is fitted on.
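For reference, the scikit-learn user guide states the dictionary learning problem as the following optimization. Rows of $X$ are samples, $U$ holds the sparse codes (one row per sample) and $V$ is the dictionary, whose rows are the atoms and which the estimator exposes as `components_`. $\alpha$ controls sparsity, and $\lVert U \rVert_{1,1}$ is the sum of the absolute values of all entries of $U$:

$$
(U^*, V^*) = \underset{U,\,V}{\operatorname{arg\,min}}\ \frac{1}{2}\lVert X - UV \rVert_{\mathrm{Fro}}^2 + \alpha \lVert U \rVert_{1,1}
\quad \text{subject to } \lVert V_k \rVert_2 \le 1 \ \text{ for all } 0 \le k < n_{\mathrm{components}}
$$

The product $UV$ is what the reconstruction `X_transformed @ dict_learner.components_` computes further down.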

```python
import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.decomposition import DictionaryLearning

# X has shape (n_samples, n_features) = (100, 20). Recent scikit-learn returns
# this orientation by default; the old `data_transposed=False` argument was
# deprecated in 1.3 and later removed, so it is no longer passed here.
X, dictionary, code = make_sparse_coded_signal(
    n_samples=100, n_components=15, n_features=20, n_nonzero_coefs=10,
    random_state=42,
)
dict_learner = DictionaryLearning(
    n_components=15, transform_algorithm='lasso_lars', transform_alpha=0.1,
    random_state=42,
)
X_transformed = dict_learner.fit_transform(X)
```

We can check the level of sparsity of `X_transformed` (the fraction of coefficients that are exactly zero):

```python
np.mean(X_transformed == 0)
```

```
0.41733333333333333
```

We can also measure the reconstruction error: for each sample, the squared Euclidean norm of the error of the sparse-coded reconstruction divided by the squared Euclidean norm of the original signal, averaged over all samples:

```python
X_hat = X_transformed @ dict_learner.components_
np.mean(np.sum((X_hat - X) ** 2, axis=1) / np.sum(X ** 2, axis=1))
```

```
0.07777084613290733
```
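This metric normalizes each sample by its own norm before averaging, which is not the same as dividing the total error by the total signal energy. A minimal sketch with two hypothetical helper functions (the names are ours, not scikit-learn's) on a two-sample toy example makes the difference concrete:

```python
import numpy as np

def relative_reconstruction_error(X, X_hat):
    """Mean over samples of ||x_hat_i - x_i||^2 / ||x_i||^2 (the metric above)."""
    per_sample_err = np.sum((X_hat - X) ** 2, axis=1)   # squared error per row
    per_sample_norm = np.sum(X ** 2, axis=1)             # squared norm per row
    return float(np.mean(per_sample_err / per_sample_norm))

def pooled_reconstruction_error(X, X_hat):
    """Total squared error divided by total squared norm, for comparison."""
    return float(np.sum((X_hat - X) ** 2) / np.sum(X ** 2))

# Toy data: the first sample loses one coordinate, the second is reconstructed exactly
X = np.array([[3.0, 4.0], [1.0, 0.0]])
X_hat = np.array([[3.0, 0.0], [1.0, 0.0]])

per_sample = relative_reconstruction_error(X, X_hat)  # (16/25 + 0/1) / 2 = 0.32
pooled = pooled_reconstruction_error(X, X_hat)        # 16/26, about 0.615
print(per_sample, pooled)
```

The per-sample version gives every sample equal weight regardless of its energy, so a large error on a low-energy sample counts as much as one on a high-energy sample.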

## Factor Analysis

`FactorAnalysis` performs a maximum likelihood estimate of the so-called loading matrix (the transformation from the latent variables to the observed ones) using an SVD-based approach.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import FactorAnalysis

X, _ = load_digits(return_X_y=True)
transformer = FactorAnalysis(n_components=7, random_state=0)
X_transformed = transformer.fit_transform(X)
X_transformed.shape
```

```
(1797, 7)
```
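Factor analysis models each observation as a linear map of low-dimensional Gaussian latent factors plus a mean and Gaussian noise with a *diagonal* covariance $\Psi$. The implied data covariance is therefore the loading matrix term plus $\Psi$. In scikit-learn the fitted loadings are `components_` (shape `(n_components, n_features)`) and the diagonal of $\Psi$ is `noise_variance_`. The sketch below fits synthetic data and checks that `components_.T @ components_ + diag(noise_variance_)` matches `get_covariance()`. The data-generating setup (sizes, noise scale) is our own assumption for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data drawn from a 2-factor model with independent per-feature noise
rng = np.random.default_rng(0)
n_samples, n_features, n_factors = 500, 6, 2
W_true = rng.normal(size=(n_factors, n_features))           # true loadings
H = rng.normal(size=(n_samples, n_factors))                 # latent factors
noise = rng.normal(scale=0.3, size=(n_samples, n_features))
X = H @ W_true + noise

fa = FactorAnalysis(n_components=n_factors, random_state=0).fit(X)
W = fa.components_          # estimated loading matrix, (n_factors, n_features)
psi = fa.noise_variance_    # estimated diagonal of the noise covariance, (n_features,)

# Model covariance: low-rank part from the loadings plus diagonal noise
model_cov = W.T @ W + np.diag(psi)
print(np.allclose(model_cov, fa.get_covariance()))  # True
```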

## FastICA

`FastICA` is a fast algorithm for Independent Component Analysis (ICA).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import FastICA

X, _ = load_digits(return_X_y=True)
transformer = FastICA(n_components=7,
                      random_state=0,
                      whiten='unit-variance')
X_transformed = transformer.fit_transform(X)
X_transformed.shape
```

```
(1797, 7)
```
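ICA assumes the observed signals are linear mixtures of statistically independent, non-Gaussian sources and tries to unmix them. A sketch of the classic blind source separation use case follows. The two signals and the mixing matrix are made up for illustration. ICA recovers sources only up to permutation, sign and scale, so the check compares them through absolute correlations.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent, non-Gaussian source signals (illustrative choice)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)               # sinusoid
s2 = np.sign(np.sin(3 * t))      # square wave
S = np.c_[s1, s2]                # shape (2000, 2)

# Observed signals: each column is a different linear mix of the sources
A = np.array([[1.0, 1.0], [0.5, 2.0]])   # hypothetical mixing matrix
X = S @ A.T

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_est = ica.fit_transform(X)     # estimated sources, shape (2000, 2)

# |correlation| between each true source and each estimate; the best match
# for every true source should be close to 1
corr = np.abs(np.corrcoef(S.T, S_est.T)[:2, 2:])
print(corr.max(axis=1))
```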

## Incremental PCA

`IncrementalPCA` fits the data in mini-batches, so depending on the size of the input it can be much more memory-efficient than `PCA`. Its `fit` method also accepts sparse input.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA
from scipy import sparse

X, _ = load_digits(return_X_y=True)
transformer = IncrementalPCA(n_components=7, batch_size=200)

# Either fit incrementally on smaller batches of data...
transformer.partial_fit(X[:100, :])

# ...or let fit() split the data into batches of `batch_size` rows itself.
# fit()/fit_transform() starts from scratch, discarding the partial fit above.
X_sparse = sparse.csr_matrix(X)
X_transformed = transformer.fit_transform(X_sparse)
X_transformed.shape
```

```
(1797, 7)
```

A second, minimal example on a tiny two-dimensional dataset, fitted in batches of three samples:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]])
ipca = IncrementalPCA(n_components=2, batch_size=3)
ipca.fit(X)
ipca.transform(X)
```

```
array([[-1.38340578, -0.2935787 ],
       [-2.22189802,  0.25133484],
       [-3.6053038 , -0.04224385],
       [ 1.38340578,  0.2935787 ],
       [ 2.22189802, -0.25133484],
       [ 3.6053038 ,  0.04224385]])
```
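To see the mini-batch fitting in action, the sketch below feeds the digits data to `partial_fit` in ten chunks, so only one chunk is processed per call, and compares the variance captured against a full-batch `PCA`. The chunks here are slices of an in-memory array purely for illustration. In a real out-of-core setting they would be read from disk one at a time. Incremental PCA is an approximation, so the explained-variance figures should be close to, but not necessarily identical to, those of `PCA`.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, IncrementalPCA

X, _ = load_digits(return_X_y=True)   # shape (1797, 64)

# Incremental fit: 10 chunks of roughly 180 rows each
ipca = IncrementalPCA(n_components=7)
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)

# Full-batch PCA on the same data, for comparison
pca = PCA(n_components=7).fit(X)

print(ipca.n_samples_seen_)           # 1797
ipca_total = ipca.explained_variance_ratio_.sum()
pca_total = pca.explained_variance_ratio_.sum()
print(ipca_total, pca_total)          # fraction of variance kept by 7 components
```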