## Non-negative Matrix Factorization (NMF)

### Overview

Non-negative Matrix Factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra in which a matrix $ V $ is factorized into (usually) two matrices $ W $ and $ H $, with the property that all three matrices have no negative elements. This factorization is particularly useful when the data is inherently non-negative, such as pixel intensities in images or word counts in text.

### Mathematical Foundations

#### 1. **Matrix Decomposition**

Given a non-negative matrix $ V \in \mathbb{R}^{m \times n} $, NMF aims to find two non-negative matrices $ W \in \mathbb{R}^{m \times k} $ and $ H \in \mathbb{R}^{k \times n} $ such that:

$$ V \approx WH $$

where:
- $ V $ is the original matrix.
- $ W $ is the basis matrix.
- $ H $ is the coefficient matrix.
- $ k $ is the number of components (usually $ k \ll \min(m, n) $).

#### 2. **Objective Function**

The factorization is typically achieved by minimizing the reconstruction error between $ V $ and $ WH $. Common objective functions include:

- **Frobenius Norm**:

  $$
  \min_{W, H} \| V - WH \|_F^2 = \sum_{i,j} \left( V_{ij} - (WH)_{ij} \right)^2
  $$

- **Generalized Kullback-Leibler Divergence**:

  $$
  \min_{W, H} D_{KL}(V \| WH) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)
  $$
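
Both objectives can be evaluated directly from their definitions. The following is a minimal NumPy sketch; the function names `frobenius_loss` and `generalized_kl` and the small constant `eps` are illustrative choices, not part of any standard API.

```python
import numpy as np

def frobenius_loss(V, W, H):
    """Squared Frobenius norm ||V - WH||_F^2."""
    residual = V - W @ H
    return float(np.sum(residual ** 2))

def generalized_kl(V, W, H, eps=1e-12):
    """Generalized KL divergence D(V || WH), treating 0 * log(0) as 0."""
    WH = W @ H
    mask = V > 0  # entries with V_ij = 0 contribute nothing to the log term
    log_term = np.sum(V[mask] * np.log(V[mask] / (WH[mask] + eps)))
    return float(log_term - V.sum() + WH.sum())

# Usage: an exact factorization drives both objectives to (nearly) zero.
V = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.eye(2)
H = V.copy()
print(frobenius_loss(V, W, H), generalized_kl(V, W, H))
```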

#### 3. **Optimization Algorithms**

Common algorithms to solve NMF include:

- **Multiplicative Update Rules** (shown here for the Frobenius norm objective):

  $$
  H \leftarrow H \circ \frac{W^T V}{W^T WH}
  $$

  $$
  W \leftarrow W \circ \frac{V H^T}{W H H^T}
  $$

  where $ \circ $ denotes element-wise multiplication and the fraction denotes element-wise division.

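- **Alternating Least Squares (ALS)**: Alternates between fixing $ W $ and solving a least-squares problem for $ H $, and fixing $ H $ and solving for $ W $. Non-negativity is kept either by solving each subproblem as a non-negative least-squares problem or by setting negative entries to zero after each solve.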
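
The multiplicative update rules above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not a reference one: the function name `nmf_multiplicative`, the random initialization, the stopping tolerance `tol`, and the constant `eps` added to the denominators to avoid division by zero are all choices made for this sketch.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=1000, tol=1e-8, eps=1e-10, seed=0):
    """Factorize non-negative V (m x n) into W (m x k) and H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random strictly positive initialization (an illustrative choice).
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise updates; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.sum((V - W @ H) ** 2)))
        # Stop once the squared Frobenius error barely changes.
        if len(errors) > 1 and abs(errors[-2] - errors[-1]) < tol:
            break
    return W, H, errors

V = np.array([[5.0, 3.0, 0.0],
              [3.0, 7.0, 4.0],
              [6.0, 2.0, 5.0]])
W, H, errors = nmf_multiplicative(V, k=2)
print(np.round(W @ H, 2), errors[-1])
```

Because every update multiplies a non-negative entry by a non-negative ratio, $ W $ and $ H $ stay non-negative without any explicit projection step.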

### Example

Consider a non-negative data matrix $ V $:

$$ V = \begin{bmatrix}
5 & 3 & 0 \\
3 & 7 & 4 \\
6 & 2 & 5
\end{bmatrix} $$

1. **Initialization**: Initialize $ W $ and $ H $ with non-negative values.
2. **Multiplicative Updates**: Apply the multiplicative update rules iteratively to update $ W $ and $ H $.
3. **Convergence**: Continue the updates until convergence, i.e., when the change in reconstruction error is below a threshold.

Suppose after convergence, we get:

$$ W = \begin{bmatrix}
0.7 & 0.2 \\
0.5 & 0.8 \\
0.6 & 0.3
\end{bmatrix}, \quad
H = \begin{bmatrix}
6 & 3 & 1 \\
2 & 7 & 5
\end{bmatrix}
$$

Multiplying these factors gives the rank-2 approximation of $ V $:

$$ WH = \begin{bmatrix}
4.6 & 3.5 & 1.7 \\
4.6 & 7.1 & 4.5 \\
4.2 & 3.9 & 2.1
\end{bmatrix} $$
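
A product like this is easy to check numerically. The sketch below takes $ V $, $ W $, and $ H $ from the example and computes the product, the entry-wise residual, and the Frobenius norm of the residual; the variable names are chosen here for illustration.

```python
import numpy as np

V = np.array([[5.0, 3.0, 0.0],
              [3.0, 7.0, 4.0],
              [6.0, 2.0, 5.0]])
W = np.array([[0.7, 0.2],
              [0.5, 0.8],
              [0.6, 0.3]])
H = np.array([[6.0, 3.0, 1.0],
              [2.0, 7.0, 5.0]])

WH = W @ H                    # rank-2 reconstruction of V
residual = V - WH             # entry-wise reconstruction error
frobenius_error = np.linalg.norm(residual, "fro")

print(np.round(WH, 2))
print(np.round(residual, 2))
print(round(float(frobenius_error), 3))
```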

### When to Use NMF

- **Non-negative data**: When the data is non-negative, such as image pixel intensities, term frequencies in text, or magnitude spectrograms of audio signals.
- **Dimensionality reduction**: To reduce the dimensionality of non-negative data while maintaining interpretability.
- **Feature extraction**: To identify parts-based representations, such as topics in text or components in images.

### How to Use NMF

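1. **Prepare the data**: Ensure the input matrix $ V $ is non-negative. Avoid mean-centering or z-score standardization, which introduce negative values; use non-negative features such as counts, or rescale with a non-negative scheme such as min-max scaling.
2. **Choose the number of components $ k $**: Select $ k $ based on the application, for example by comparing the reconstruction error and the interpretability of the factors across several candidate values.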
3. **Initialize $ W $ and $ H $**: Initialize the matrices $ W $ and $ H $ with non-negative values.
4. **Select an optimization algorithm**: Choose an algorithm (e.g., multiplicative updates, ALS).
5. **Iterate until convergence**: Apply the chosen algorithm iteratively until the reconstruction error converges.
6. **Interpret the factors**: Analyze the resulting matrices $ W $ and $ H $ to understand the underlying patterns.
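
As one possible realization of these steps, the sketch below uses scikit-learn's `NMF` estimator, assuming scikit-learn is installed. The specific settings (`init="nndsvda"`, `solver="mu"`, `max_iter=1000`, `tol=1e-6`) are illustrative choices rather than recommendations, and the small matrix is the one from the example above.

```python
import numpy as np
from sklearn.decomposition import NMF

V = np.array([[5.0, 3.0, 0.0],
              [3.0, 7.0, 4.0],
              [6.0, 2.0, 5.0]])
assert (V >= 0).all()          # step 1: input must be non-negative

model = NMF(
    n_components=2,            # step 2: number of components k
    init="nndsvda",            # step 3: SVD-based non-negative initialization
    solver="mu",               # step 4: multiplicative updates
    beta_loss="frobenius",     # objective minimized by the solver
    max_iter=1000,             # step 5: iteration budget
    tol=1e-6,                  # step 5: convergence tolerance
    random_state=0,
)
W = model.fit_transform(V)     # basis matrix, shape (m, k)
H = model.components_          # coefficient matrix, shape (k, n)

# Step 6: inspect the factors and the reconstruction error.
print(W.shape, H.shape, model.reconstruction_err_)
```

Rerunning the fit for several values of `n_components` and comparing `reconstruction_err_` is one simple way to approach the choice of $ k $.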

### Advantages

- **Non-negativity constraint**: Because factors can only add, never cancel each other, the factorized matrices tend to be easier to interpret, especially in applications like image and text processing.
- **Parts-based representation**: Provides a natural parts-based representation of data.
- **Scalability**: The multiplicative updates consist of matrix products only, so they can exploit sparsity in $ V $ and scale to large non-negative datasets.

### Disadvantages

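- **Local minima**: The optimization problem is non-convex in $ W $ and $ H $ jointly (although convex in each one with the other fixed), so algorithms may converge to a local minimum rather than the global one.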
- **Parameter sensitivity**: Results can be sensitive to the choice of the number of components $ k $ and initialization.
- **Non-uniqueness**: Different factorizations can provide similar reconstruction errors, leading to non-unique solutions.
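
The simplest source of non-uniqueness is rescaling: for any diagonal matrix $ D $ with positive entries, $ (WD)(D^{-1}H) = WH $, and both new factors remain non-negative. The sketch below checks this with NumPy; the matrices and the choice of `D` are illustrative.

```python
import numpy as np

W = np.array([[0.7, 0.2],
              [0.5, 0.8],
              [0.6, 0.3]])
H = np.array([[6.0, 3.0, 1.0],
              [2.0, 7.0, 5.0]])

# Rescale the two components by arbitrary positive factors.
D = np.diag([2.0, 0.5])
W2 = W @ D
H2 = np.linalg.inv(D) @ H

same_product = np.allclose(W @ H, W2 @ H2)
still_nonnegative = bool((W2 >= 0).all() and (H2 >= 0).all())
print(same_product, still_nonnegative)  # prints: True True
```

Permuting the columns of $ W $ together with the rows of $ H $ has the same effect, which is why factors are often normalized (for example, to unit column norm in $ W $) before they are compared.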

### Assumptions

- **Non-negativity**: Assumes that the data and the factor matrices $ W $ and $ H $ are non-negative.
- **Linear combination**: Assumes that the observed data can be approximated by a linear combination of the factors.
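
Written out column by column, the linear-combination assumption says that each column of $ V $ is approximated by a non-negative weighted sum of the $ k $ columns of $ W $, with the weights taken from the corresponding column of $ H $ (the index $ a $ is introduced here only for illustration):

$$ V_{:,j} \approx \sum_{a=1}^{k} H_{aj} \, W_{:,a}, \qquad H_{aj} \ge 0 $$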

### Conclusion

Non-negative Matrix Factorization (NMF) is a powerful technique for factorizing non-negative data into interpretable components. By enforcing non-negativity constraints, NMF provides meaningful parts-based representations, making it useful in various applications such as image processing, text mining, and bioinformatics. Despite challenges such as local minima and sensitivity to parameters, NMF's ability to uncover hidden structures in non-negative data makes it a valuable tool in data analysis and dimensionality reduction.