# Matrix Factorization

Matrix factorization decomposes a matrix into two or more matrices whose product approximates, or in some cases exactly reproduces, the original matrix. It is widely used in collaborative filtering for recommendation systems, natural language processing, and dimensionality reduction.

## Why Matrix Factorization Was Invented

Matrix factorization was invented to solve several key problems:

1. **Dimensionality Reduction**: In high-dimensional datasets, matrix factorization helps reduce the number of dimensions while retaining essential information. This is crucial in tasks like image processing and natural language processing.

2. **Collaborative Filtering**: In recommendation systems, matrix factorization is used to predict user preferences for items (e.g., movies, products) by uncovering latent factors representing both users and items.

3. **Noise Reduction**: A low-rank approximation keeps the dominant structure of the matrix and discards small variations, which are often noise.

4. **Data Compression**: An $ m \times n $ matrix represented by two factors with $ k $ latent dimensions needs $ k(m + n) $ numbers instead of $ mn $, which is a large saving when $ k $ is small.

## Problems Solved by Matrix Factorization

1. **Sparse Data**: Many real-world datasets, such as user-item ratings in recommendation systems, are sparse (most entries are missing). Fitting the factors to the observed entries only yields a full reconstructed matrix, whose remaining entries serve as predictions for the missing values.

2. **Pattern Recognition**: The latent factors can expose structure in the data, such as groups of similar users or items, that is not obvious from the raw entries.

3. **Improved Recommendations**: By capturing latent factors, matrix factorization improves the accuracy of recommendations.
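
As a minimal sketch of the sparse-data case, the example below fits two factor matrices (named `W` and `H` here) to the observed entries of a small ratings matrix and then reads predictions for the hidden entries off the product. Everything in it is an illustrative assumption: the matrix `R_true` is constructed to be exactly rank 1 so that the hidden entries are recoverable, and the mask, seed, learning rate, and step count are arbitrary choices. Real rating data is noisy and would not be fit this cleanly.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical ground truth: an exactly rank-1 "ratings" matrix (4 users x 3 items).
R_true = torch.outer(torch.tensor([1.0, 0.8, 0.4, 0.6]),
                     torch.tensor([5.0, 4.0, 2.0]))

# Pretend three ratings were never observed.
observed = torch.ones_like(R_true, dtype=torch.bool)
observed[0, 2] = observed[1, 1] = observed[2, 0] = False

k = 1  # one latent factor, matching the rank of R_true
W = torch.rand(R_true.shape[0], k, requires_grad=True)
H = torch.rand(k, R_true.shape[1], requires_grad=True)
optimizer = torch.optim.Adam([W, H], lr=0.02)

for _ in range(3000):
    optimizer.zero_grad()
    residual = (W @ H - R_true)[observed]  # errors on observed entries only
    loss = (residual ** 2).mean()
    loss.backward()
    optimizer.step()

R_hat = (W @ H).detach()
hidden_error = (R_hat - R_true)[~observed].abs().max().item()
print("Loss on observed entries:", loss.item())
print("Largest error on hidden entries:", hidden_error)
```

The hidden entries never enter the loss, so any agreement between `R_hat` and `R_true` at those positions comes from the low-rank structure alone. With more latent factors than the data supports, the missing entries are no longer determined by the observed ones, which is why practical systems usually add some form of regularization.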

## Mathematical Explanation

Given a matrix $ A $ of size $ m \times n $, matrix factorization seeks two matrices $ W $ (of size $ m \times k $) and $ H $ (of size $ k \times n $), where $ k $ is the number of latent factors and is typically much smaller than $ m $ and $ n $, such that:

$$ A \approx WH $$
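
To make the shapes concrete, the short sketch below builds random factors and compares how many numbers the full matrix and the two factors hold. The sizes `m`, `n`, and `k` are illustrative values chosen only for this example.

```python
import torch

m, n, k = 1000, 500, 10  # illustrative sizes, not taken from any dataset

W = torch.rand(m, k)
H = torch.rand(k, n)
A_approx = W @ H  # (m x k) @ (k x n) -> (m x n)

full_entries = m * n          # numbers held by the full matrix
factor_entries = k * (m + n)  # numbers held by W and H together

print(A_approx.shape)
print(full_entries, factor_entries, full_entries / factor_entries)
```

The product always has the shape of $ A $ regardless of $ k $; only the storage cost and the expressiveness of the factorization change with $ k $.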

### Objective Function

One common approach is to minimize the squared Frobenius norm of the difference between $ A $ and $ WH $:

$$ \min_{W, H} \| A - WH \|_F^2 $$

where $ \| \cdot \|_F $ denotes the Frobenius norm. Its square is the sum of the squared entrywise differences:

$$ \| A - WH \|_F^2 = \sum_{i=1}^m \sum_{j=1}^n (A_{ij} - (WH)_{ij})^2 $$
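
The double sum can be checked numerically. The sketch below, using a small illustrative matrix and random factors (both assumptions for this example), computes the objective three ways: as an explicit sum of squares, via `torch.linalg.norm`, and via the mean squared error rescaled by the number of entries.

```python
import torch

torch.manual_seed(0)
A = torch.tensor([[5.0, 3.0], [3.0, 2.0], [2.0, 1.0], [1.0, 1.0]])
W = torch.rand(4, 2)
H = torch.rand(2, 2)

residual = A - W @ H
sum_of_squares = (residual ** 2).sum()                      # double sum over i and j
frobenius_sq = torch.linalg.norm(residual, ord="fro") ** 2  # built-in Frobenius norm, squared
mse = torch.nn.functional.mse_loss(W @ H, A)                # mean instead of sum

print(sum_of_squares.item(), frobenius_sq.item(), (mse * A.numel()).item())
```

The mean squared error differs from the squared Frobenius norm only by the constant factor $ 1 / (mn) $, so minimizing either one gives the same factors.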

### Optimization

To find $ W $ and $ H $, we typically use optimization techniques like gradient descent. The update rules for gradient descent are:

$$ W \leftarrow W - \eta \frac{\partial}{\partial W} \| A - WH \|_F^2 $$
$$ H \leftarrow H - \eta \frac{\partial}{\partial H} \| A - WH \|_F^2 $$

where $ \eta $ is the learning rate.
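
For the squared Frobenius objective, the gradients in these updates have a closed form: $ \frac{\partial}{\partial W} \| A - WH \|_F^2 = -2 (A - WH) H^\top $ and $ \frac{\partial}{\partial H} \| A - WH \|_F^2 = -2 W^\top (A - WH) $. The sketch below checks these expressions against PyTorch's autograd and then runs a few steps of plain gradient descent with them. The matrix, seed, step size `eta`, and iteration count are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
A = torch.tensor([[5.0, 3.0], [3.0, 2.0], [2.0, 1.0], [1.0, 1.0]])
W = torch.rand(4, 2, requires_grad=True)
H = torch.rand(2, 2, requires_grad=True)

# Gradients from autograd.
loss = ((A - W @ H) ** 2).sum()
loss.backward()

# Closed-form gradients of the squared Frobenius norm.
with torch.no_grad():
    residual = A - W @ H
    grad_W = -2 * residual @ H.T
    grad_H = -2 * W.T @ residual

print(torch.allclose(W.grad, grad_W, atol=1e-5),
      torch.allclose(H.grad, grad_H, atol=1e-5))

# A few steps of plain gradient descent using the closed-form gradients.
eta = 0.01
initial_loss = loss.item()
with torch.no_grad():
    for _ in range(200):
        residual = A - W @ H
        W_new = W + 2 * eta * residual @ H.T  # W - eta * grad_W
        H_new = H + 2 * eta * W.T @ residual  # H - eta * grad_H
        W.copy_(W_new)
        H.copy_(H_new)
    final_loss = ((A - W @ H) ** 2).sum().item()

print(initial_loss, final_loss)
```

Both factors are updated from the same residual before either is overwritten, which keeps the step a simultaneous update as written in the rules above.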

## Toy Example

Suppose we have the following matrix $ A $:

$$ A = \begin{bmatrix} 5 & 3 \\ 3 & 2 \\ 2 & 1 \\ 1 & 1 \end{bmatrix} $$

We want to factorize this into two matrices $ W $ and $ H $ such that $ A \approx WH $.

### Code in PyTorch

Here's a simple implementation of matrix factorization in PyTorch:


```python
import torch
import torch.nn.functional as F
import torch.optim as optim

# Toy example matrix A
A = torch.tensor([[5, 3],
                  [3, 2],
                  [2, 1],
                  [1, 1]], dtype=torch.float32)

# Dimensions
m, n = A.size()
k = 2  # Number of latent factors

# Initialize W and H with random values in [0, 1)
W = torch.rand(m, k, requires_grad=True)
H = torch.rand(k, n, requires_grad=True)

# Define the learning rate and number of iterations
learning_rate = 0.01
num_iterations = 5000

# Define the optimizer
optimizer = optim.Adam([W, H], lr=learning_rate)

# Training loop
for iteration in range(num_iterations):
    optimizer.zero_grad()         # reset gradients from the previous step
    A_pred = torch.matmul(W, H)   # current reconstruction
    loss = F.mse_loss(A_pred, A)  # mean squared reconstruction error
    loss.backward()               # compute gradients w.r.t. W and H
    optimizer.step()              # apply the Adam update

    if (iteration + 1) % 500 == 0:
        print(f"Iteration {iteration+1}, loss: {loss.item()}")

print("Original Matrix A:")
print(A)

print("\nFactorized Matrices W and H:")
print(W)
print(H)

print("\nReconstructed Matrix A from W and H:")
print(torch.matmul(W, H))
```


Running it prints output similar to the following. The exact values of $ W $ and $ H $ depend on the random initialization.

```text
Iteration 500, loss: 6.450703949667513e-05
Iteration 1000, loss: 8.83535022921933e-09
Iteration 1500, loss: 2.0268231537556858e-12
Iteration 2000, loss: 9.698908343125368e-13
Iteration 2500, loss: 5.773159728050814e-13
Iteration 3000, loss: 1.7275070263167436e-13
Iteration 3500, loss: 1.567634910770721e-13
Iteration 4000, loss: 3.019806626980426e-14
Iteration 4500, loss: 2.842170943040401e-14
Iteration 5000, loss: 1.7763568394002505e-15
Original Matrix A:
tensor([[5., 3.],
        [3., 2.],
        [2., 1.],
        [1., 1.]])

Factorized Matrices W and H:
tensor([[ 2.0369,  2.0575],
        [ 1.4867,  0.9807],
        [ 0.5502,  1.0768],
        [ 0.9365, -0.0961]], requires_grad=True)
tensor([[1.1957, 1.1052],
        [1.2464, 0.3640]], requires_grad=True)

Reconstructed Matrix A from W and H:
tensor([[5.0000, 3.0000],
        [3.0000, 2.0000],
        [2.0000, 1.0000],
        [1.0000, 1.0000]], grad_fn=<MmBackward0>)
```


### Explanation of the Code

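1. **Initialization**: We initialize the matrices $ W $ and $ H $ with random values drawn uniformly from $ [0, 1) $ using `torch.rand`.
2. **Optimization**: We use the Adam optimizer, an adaptive variant of gradient descent, to update the values of $ W $ and $ H $.
3. **Training Loop**: In each iteration, we compute the predicted matrix $ A_{pred} $ by multiplying $ W $ and $ H $, then compute the mean squared error between $ A $ and $ A_{pred} $. This loss is the squared Frobenius norm divided by $ mn $, so it has the same minimizers. `loss.backward()` computes the gradients by backpropagation, and `optimizer.step()` applies the update to $ W $ and $ H $.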

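This implementation iteratively minimizes the reconstruction error. Because $ k = 2 $ equals the rank of this $ 4 \times 2 $ matrix, the product $ WH $ reproduces $ A $ exactly up to floating-point error; with $ k $ smaller than the rank, it would only approximate $ A $. The factors themselves are not unique: a different random initialization generally yields a different $ W $ and $ H $ with the same product.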
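
To see what a genuine approximation looks like, the sketch below repeats the fit on the same toy matrix with $ k = 1 $ and compares the squared Frobenius error with the $ k = 2 $ case. The helper `factorize`, its default arguments, and the fixed seed are hypothetical choices made for this illustration, not part of the implementation above.

```python
import torch

def factorize(A, k, steps=3000, lr=0.01, seed=0):
    """Fit A ~ W @ H with Adam; return W, H and the squared Frobenius error."""
    torch.manual_seed(seed)
    m, n = A.shape
    W = torch.rand(m, k, requires_grad=True)
    H = torch.rand(k, n, requires_grad=True)
    optimizer = torch.optim.Adam([W, H], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = ((A - W @ H) ** 2).sum()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        error = ((A - W @ H) ** 2).sum().item()
    return W.detach(), H.detach(), error

A = torch.tensor([[5.0, 3.0], [3.0, 2.0], [2.0, 1.0], [1.0, 1.0]])

_, _, error_k2 = factorize(A, k=2)    # k equals the rank of A
W1, H1, error_k1 = factorize(A, k=1)  # k below the rank of A

print("k=2 squared error:", error_k2)
print("k=1 squared error:", error_k1)
print("k=1 reconstruction:\n", W1 @ H1)
```

With one latent factor every row of the reconstruction is a multiple of the single row of `H1`, so the error cannot reach zero for this matrix, while the two-factor fit can.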