# Matrix Factorization in Unsupervised Learning

## 1. Introduction
Matrix factorization (MF) is a family of **unsupervised learning techniques** where a given data matrix is decomposed into a product of lower-dimensional matrices.

- **Input**: A large matrix $$ X \in \mathbb{R}^{m \times n} $$ (e.g., users vs. items, documents vs. words, pixels vs. images).  
- **Goal**: Find matrices $$ W \in \mathbb{R}^{m \times k} $$ and $$ H \in \mathbb{R}^{k \times n} $$ with $$ k \ll \min(m,n) $$ such that:

$$ X \approx W H $$

- **Output**: Latent factors (low-dimensional representations), useful for **clustering, compression, recommendation, denoising, feature extraction**, etc.

It is *unsupervised* because no labels are required: the latent factors are learned from the structure of $$ X $$ alone.

---

## 2. Core Concepts

### 2.1 General Idea
- **Matrix factorization as dimensionality reduction**  
  Each row of $$ X $$ is summarized by $$ k $$ numbers instead of $$ n $$. PCA is one instance; other methods keep the low-rank form and add constraints such as non-negativity or sparsity.  
- The latent dimension $$ k $$ represents "hidden features" that explain the original matrix.

### 2.2 Formal Setup
Given:

$$ X \in \mathbb{R}^{m \times n}, \quad X_{ij} = \text{observed data (e.g., rating of user } i \text{ on item } j\text{)} $$

We approximate:

$$ X \approx W H $$

where:
- $$ W \in \mathbb{R}^{m \times k} $$: **latent representation of rows** (e.g., users, documents).  
- $$ H \in \mathbb{R}^{k \times n} $$: **latent representation of columns** (e.g., items, words).  
- $$ k $$: rank/dimensionality of the latent space.
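
As a quick sanity check on the shapes above, the following sketch builds a synthetic $$ X $$ from randomly drawn factors and compares how many values each representation stores. The sizes `m`, `n`, `k` and the random data are illustrative assumptions, not taken from any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 80, 5          # illustrative sizes with k << min(m, n)

W = rng.normal(size=(m, k))   # one k-dimensional vector per row of X
H = rng.normal(size=(k, n))   # one k-dimensional vector per column of X
X = W @ H                     # has rank k by construction

print("X shape:", X.shape)
print("rank of X:", np.linalg.matrix_rank(X))
print("values stored in X:", m * n)
print("values stored in W and H:", k * (m + n))
```

Real data is rarely exactly low rank, which is why the factorization is written as an approximation rather than an equality.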

---

## 3. Types of Matrix Factorization in Unsupervised Learning

### (a) Singular Value Decomposition (SVD) / PCA
Factorization:

$$ X = U \Sigma V^T $$

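- $$ \Sigma $$ is diagonal and holds the singular values; the columns of $$ U $$ and $$ V $$ are orthonormal latent directions.  
- Keeping only the top $$ k $$ singular values gives the best rank-$$ k $$ approximation in Frobenius norm (Eckart-Young theorem), for example with $$ W = U_k \Sigma_k $$ and $$ H = V_k^T $$. PCA corresponds to the SVD of the column-centered $$ X $$.  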
- **Applications**: Dimensionality reduction, noise removal, latent semantic analysis (LSA in NLP).
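
The sketch below truncates an SVD with plain NumPy and numerically checks that the reconstruction error equals the energy in the discarded singular values. The random matrix and the choice `k = 2` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                 # synthetic data, illustration only

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
W = U[:, :k] * s[:k]                        # scale each kept column of U by its singular value
H = Vt[:k, :]
X_k = W @ H                                 # rank-k approximation of X

err = np.linalg.norm(X - X_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))          # norm of the discarded singular values
print("reconstruction error:", err)
print("discarded singular values (norm):", tail)
```

Splitting $$ \Sigma $$ into `W` as done here is one convention; absorbing it into `H`, or splitting its square root across both factors, gives the same product.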

### (b) Non-negative Matrix Factorization (NMF)
- Constraint: $$ W, H \geq 0 $$ (all entries nonnegative).  
- Leads to **parts-based, interpretable features**.  
- **Applications**: Topic modeling, image decomposition, gene expression analysis.
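
One classical way to fit NMF is with multiplicative updates in the style of Lee and Seung. The function below is a minimal sketch of that idea for the squared-error loss, not how any particular library implements it; the name `nmf_multiplicative` and its parameters are hypothetical.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=500, eps=1e-10, seed=0):
    """Multiplicative updates for min ||X - WH||_F^2 subject to W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps   # strictly positive starting point
    H = rng.random((k, n)) + eps
    errors = [np.linalg.norm(X - W @ H)]
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by a nonnegative ratio,
        # so entries that start nonnegative stay nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

X = np.array([[1, 1, 0, 0],
              [3, 3, 0, 0],
              [0, 0, 4, 4],
              [0, 0, 5, 5]], dtype=float)
W, H, errors = nmf_multiplicative(X, k=2)
print("error before and after:", errors[0], errors[-1])
```

Because the updates only multiply, no projection step is needed to enforce the constraint; the small `eps` is an assumption added here to avoid division by zero.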

### (c) Probabilistic Matrix Factorization (PMF)
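- Treats factorization as a probabilistic model in which each entry is Gaussian around the inner product of row $$ i $$ of $$ W $$ and column $$ j $$ of $$ H $$:

$$ X_{ij} \sim \mathcal{N}(W_{i,:} H_{:,j}, \sigma^2) $$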

- Basis of modern **collaborative filtering** in recommender systems.
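
The link between this Gaussian model and squared-error fitting can be made explicit with a short derivation. It is a sketch that assumes zero-mean Gaussian priors on the factors, as in the usual PMF formulation. The symbols $$ w_i $$ (row $$ i $$ of $$ W $$ written as a column vector), $$ h_j $$ (column $$ j $$ of $$ H $$), $$ \Omega $$ (the set of observed entries), $$ \sigma_W $$ and $$ \sigma_H $$ are notation introduced here.

$$ w_i \sim \mathcal{N}(0, \sigma_W^2 I), \quad h_j \sim \mathcal{N}(0, \sigma_H^2 I) $$

Taking the negative log of the posterior and dropping terms that do not depend on $$ W $$ or $$ H $$:

$$ -\log p(W, H \mid X) = \frac{1}{2\sigma^2} \sum_{(i,j) \in \Omega} (X_{ij} - w_i^T h_j)^2 + \frac{1}{2\sigma_W^2} \sum_i \| w_i \|_2^2 + \frac{1}{2\sigma_H^2} \sum_j \| h_j \|_2^2 + \text{const} $$

Multiplying by $$ 2\sigma^2 $$ gives an equivalent minimization problem:

$$ \min_{W,H} \sum_{(i,j) \in \Omega} (X_{ij} - w_i^T h_j)^2 + \lambda_W \| W \|_F^2 + \lambda_H \| H \|_F^2, \quad \lambda_W = \frac{\sigma^2}{\sigma_W^2}, \; \lambda_H = \frac{\sigma^2}{\sigma_H^2} $$

Under these assumptions, MAP estimation in PMF is regularized least squares restricted to the observed entries.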

### (d) Sparse Matrix Factorization
- Adds sparsity constraints on $$ W $$ or $$ H $$.  
- Useful in high-dimensional settings (text, genomics).
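
To illustrate how a sparsity penalty produces exact zeros, the sketch below performs only the $$ H $$ update for a fixed $$ W $$, using an L1 penalty and proximal gradient descent (ISTA). It is one possible approach, not a full sparse MF algorithm; the helper names, the synthetic data and the value `lam=5.0` are assumptions.

```python
import numpy as np

def soft_threshold(A, t):
    """Proximal operator of t * ||A||_1: shrink every entry toward zero by t."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def sparse_codes(X, W, lam, n_iter=200):
    """ISTA for min_H 0.5 * ||X - W H||_F^2 + lam * ||H||_1 with W held fixed."""
    L = np.linalg.norm(W.T @ W, 2)        # Lipschitz constant of the gradient
    H = np.zeros((W.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = W.T @ (W @ H - X)          # gradient of the smooth term
        H = soft_threshold(H - grad / L, lam / L)
    return H

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))             # synthetic data, illustration only
W = rng.normal(size=(20, 5))              # a fixed (not learned) factor

H_dense = sparse_codes(X, W, lam=0.0)
H_sparse = sparse_codes(X, W, lam=5.0)
print("nonzeros with lam=0:", np.count_nonzero(H_dense))
print("nonzeros with lam=5:", np.count_nonzero(H_sparse))
```

A full method would alternate this step with an update of `W`, typically with a norm constraint on the columns of `W` so the penalty cannot be avoided by rescaling.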

### (e) Tensor Factorization (extension)
- Generalizes MF to multi-dimensional data (e.g., user × item × time).
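
One common form of this generalization is the CP (CANDECOMP/PARAFAC) model, which gives each mode its own factor matrix. The sketch below only builds a tensor from such factors and shows how it relates to ordinary MF; the factor names `A`, `B`, `C` and the sizes are illustrative assumptions, and no fitting is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_times, k = 4, 5, 3, 2   # illustrative sizes

A = rng.normal(size=(n_users, k))   # user factors
B = rng.normal(size=(n_items, k))   # item factors
C = rng.normal(size=(n_times, k))   # time factors

# CP model: T[i, j, t] = sum_r A[i, r] * B[j, r] * C[t, r]
T = np.einsum("ir,jr,tr->ijt", A, B, C)

# Each time slice is an ordinary matrix factorization whose
# components are reweighted by that time step's row of C.
t = 1
slice_as_mf = (A * C[t]) @ B.T
print("slice matches reweighted MF:", np.allclose(T[:, :, t], slice_as_mf))
```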

---

## 4. Optimization Objective

General form:

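$$ \min_{W,H} \; \| X - W H \|_F^2 + \lambda (\| W \|_F^2 + \| H \|_F^2) $$

- First term: Reconstruction error (squared Frobenius norm).  
- Second term: Regularization to prevent overfitting; the squared Frobenius (L2) penalty is shown here, with $$ \lambda \geq 0 $$ setting its strength.  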
- Constraints: Non-negativity, sparsity, orthogonality, etc., depending on method.
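
For the unconstrained case, and assuming the regularizer is the squared Frobenius norm, the objective can be minimized by alternating least squares (ALS): with one factor fixed, the other has a closed-form ridge-regression solution. The function `als` and its defaults are a hypothetical sketch, not a reference implementation.

```python
import numpy as np

def als(X, k, lam=0.1, n_iter=20, seed=0):
    """Alternating least squares for ||X - WH||_F^2 + lam * (||W||_F^2 + ||H||_F^2)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.normal(size=(m, k))
    H = rng.normal(size=(k, n))
    I = np.eye(k)
    losses = []
    for _ in range(n_iter):
        # With W fixed, the objective is a ridge regression in H (closed form).
        H = np.linalg.solve(W.T @ W + lam * I, W.T @ X)
        # With H fixed, the same holds for W (solved in transposed form).
        W = np.linalg.solve(H @ H.T + lam * I, H @ X.T).T
        loss = np.sum((X - W @ H) ** 2) + lam * (np.sum(W ** 2) + np.sum(H ** 2))
        losses.append(loss)
    return W, H, losses

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 20))   # synthetic rank-4 data
W, H, losses = als(X, k=4)
print("first and last loss:", losses[0], losses[-1])
```

Each half-step minimizes the objective exactly over one factor, so the recorded loss cannot increase between iterations. The problem is not jointly convex in $$ W $$ and $$ H $$, so the result can still depend on the initialization.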

---

## 5. Applications

- **Recommendation Systems**: Predict missing entries (Netflix Prize, Amazon).  
- **Topic Modeling in NLP**: Documents-words matrix factorization → topics.  
- **Computer Vision**: Image compression, facial recognition.  
- **Bioinformatics**: Gene expression clustering, protein interactions.  
- **Finance**: Extract latent factors driving stock returns.

---

## 6. Conceptual Intuition

Matrix factorization can be seen as:
- **Compression**: Turn high-dimensional data into a few interpretable latent features.  
- **Discovery**: Extract hidden structures (topics, communities, parts).  
- **Prediction**: Fill in missing data by learning low-dimensional patterns.
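
The prediction view can be sketched by fitting the factors to the observed entries only and reading the missing ones off the product $$ W H $$. The example uses a masked variant of alternating least squares with an L2 penalty; the function `masked_als`, the tiny rating matrix and the settings `k=2`, `lam=0.1` are hypothetical choices for illustration.

```python
import numpy as np

def masked_als(X, mask, k, lam=0.1, n_iter=30, seed=0):
    """ALS that fits only the observed entries (mask == True) of X."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.normal(size=(m, k))
    H = rng.normal(size=(k, n))
    I = np.eye(k)
    for _ in range(n_iter):
        for i in range(m):                       # update each row factor
            Hi = H[:, mask[i]]                   # columns observed in row i
            W[i] = np.linalg.solve(Hi @ Hi.T + lam * I, Hi @ X[i, mask[i]])
        for j in range(n):                       # update each column factor
            Wj = W[mask[:, j]]                   # rows observed in column j
            H[:, j] = np.linalg.solve(Wj.T @ Wj + lam * I, Wj.T @ X[mask[:, j], j])
    return W, H

# Hypothetical 4 users x 3 items rating matrix; np.nan marks a missing rating.
X = np.array([[5.0, 4.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 1.0, 5.0],
              [1.0, 2.0, 4.0]])
mask = ~np.isnan(X)
W, H = masked_als(np.nan_to_num(X), mask, k=2)
X_filled = np.where(mask, X, W @ H)   # keep observed ratings, predict the rest
print(np.round(X_filled, 2))
```

With so few observations the predicted values are not meaningful; the point is only that unobserved entries never enter the loss.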

```python
import numpy as np
from sklearn.decomposition import NMF

# Create a simple document-term matrix
X = np.array([[1, 1, 0, 0],
              [3, 3, 0, 0],
              [0, 0, 4, 4],
              [0, 0, 5, 5]])

# Apply Non-negative Matrix Factorization
nmf = NMF(n_components=2, init='random', random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

print("Original Matrix:\n", X)
print("W (documents in topic space):\n", W)
print("H (topics in word space):\n", H)
print("Reconstructed:\n", np.round(W @ H))
```

```text
Original Matrix:
 [[1 1 0 0]
 [3 3 0 0]
 [0 0 4 4]
 [0 0 5 5]]
W (documents in topic space):
 [[7.39195489e-16 5.61469919e-01]
 [0.00000000e+00 1.68440976e+00]
 [1.45406660e+00 0.00000000e+00]
 [1.81758325e+00 0.00000000e+00]]
H (topics in word space):
 [[0.         0.         2.75090564 2.75090564]
 [1.78103931 1.78103931 0.         0.        ]]
Reconstructed:
 [[1. 1. 0. 0.]
 [3. 3. 0. 0.]
 [0. 0. 4. 4.]
 [0. 0. 5. 5.]]
```
