# Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) is a simple yet effective method to decompose a matrix into a product of two non-negative matrices(that is sparse matrices with all non-negative entries). This technique is most commonly used in recommender systems, and was made well known by the Netflix Prize. NMF aims to factor a data matrix $X$ into a product of two matrices:

$$X \approx AS $$

where $X$ is a $n \times m$ matrix, $A$ is a $n \times k$ matrix, and $S$ is a $k \times m$ matrix. $k$ is usually provided by the user, and symbolizes the number of distinct "factors" in the data. For example, if our data was the total productivity of a group of factories per hour for the past week, the number of factors $k$ would be the number of factories. Without prior knowledge the number of factors would be harder to pinpoint, and would have to be chosen using cross validation or something similar

It's important to note that this problem does not have a unique solution, and we could end up with many different combinations of $A$ and $S$ that multiply to get a decent approximation of $X$. Even more, each pair of $A$ and $S$ can be scaled by any real number $\alpha$ and $\frac{1}{\alpha}$ respectively to yield an infinite number of pairs. 

## Alternating Least Squares

The natural question to ask now is how to determine $A$ and $S$ when given $X$ and $k$. One relatively simple method is to use alternating least squares, which is a generalization of the least squares method for simple linear regression. In simple linear regression, the goal is to solve the following equation for $x$:

$$ Ax = b \implies A^TAx = A^Tb \implies x = (A^TA)^{\dagger}A^Tb\$$

This can be generalized for a product of matrices by picking a random $i^{th}$ column of $X$ and $S$, which we will denote $x_{:,i}$ and $s_{:,i}$, fixing $A$, and solving for $s_{:,i}$. Then by our previous equation $X \approx AS$ we have 

$$ x_{:,i} \approx As_{:,i}$$
This yields the following update rule:

$$ s_{:,i} := (A^TA)^{\dagger}A^Tx_{:,i}$$

However, since we also want to solve for $A$ we need to sample a column of $A$ and fix $S$. To get the same linear form we do the following:

$$x_{i,:} \approx a_{i,:}S \implies x_{i,:}^T \approx S^Ta_{i,:}^T$$

We switch to updating the rows of $A$ rather than the columns due to dimensionality, and get the following update rule:

$$ a_{i,:}^T = (SS^T)^{\dagger}Sx_{i,:}^T $$

We then repeat these updates until convergence or the number of iterations is fulfilled. 

In [2]:
import numpy as np
import matplotlib.pyplot as plt

In [3]:
np.random.seed(1)
col1 = np.array([[0, 0, 9, 5, 3, 2, 1, 0, 0, 0, 0, 0]])
col2 = np.array([[0, 0, 0, 0, 0, 3, 2, 1, 1, 0, 0, 0]])
col3 = np.array([[0, 5, 5, 6, 6, 7, 4, 2, 1, 0.5, 0, 0]])

factors = np.vstack((col1, col2, col3)).T
weights = np.random.randint(0, 2, size=(3, 10))

X = np.matmul(factors, weights)
print(X)

[[ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   5.   0.   0.   5.   0.   0.   0.   5.   0. ]
 [ 9.  14.   0.   0.  14.   9.   9.   9.  14.   0. ]
 [ 5.  11.   0.   0.  11.   5.   5.   5.  11.   0. ]
 [ 3.   9.   0.   0.   9.   3.   3.   3.   9.   0. ]
 [ 2.  12.   0.   3.  12.   2.   2.   5.   9.   0. ]
 [ 1.   7.   0.   2.   7.   1.   1.   3.   5.   0. ]
 [ 0.   3.   0.   1.   3.   0.   0.   1.   2.   0. ]
 [ 0.   2.   0.   1.   2.   0.   0.   1.   1.   0. ]
 [ 0.   0.5  0.   0.   0.5  0.   0.   0.   0.5  0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]]


In [15]:
np.random.seed(1)
k = 3
niter = 10000
A = np.random.rand(12, 3)
S = np.random.rand(3, 10)
#print(S)

for i in np.arange(niter):
    rowcol = np.random.randint(k)
    S[:, rowcol] = np.matmul(np.linalg.pinv(A), X[:, rowcol])
    A[rowcol, :] = np.matmul(X[rowcol, :], np.matmul(S.T, np.linalg.inv(np.matmul(S, S.T))))

approx = np.matmul(A, S)
#print(X)
print(np.round(approx, 2))
print("Relative error: ", np.linalg.norm(X - approx) / np.linalg.norm(X))

[[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
 [ 0.8   4.58  0.    0.2   1.22  3.16 -0.06  1.41  2.18 -0.22]
 [ 8.84 14.33  0.    3.94  9.75 14.26  0.28  8.68  7.82  1.03]
 [ 1.3   5.06  0.    0.94  0.61  0.97  0.72  0.82  0.3   0.63]
 [ 0.42  2.84  0.    0.29  0.22  0.76  0.26  0.41  0.45  0.15]
 [ 1.36  4.61  0.    0.95  0.73  1.02  0.66  0.87  0.3   0.62]
 [ 0.85  4.51  0.    0.69  0.22  0.57  0.64  0.5   0.15  0.5 ]
 [ 1.83  5.3   0.    1.25  1.03  1.22  0.82  1.13  0.29  0.82]
 [ 1.32  3.96  0.    0.83  0.89  1.3   0.49  0.96  0.53  0.49]
 [ 0.78  4.67  0.    0.67  0.13  0.51  0.66  0.44  0.13  0.5 ]
 [ 0.96  5.83  0.    0.8   0.2   0.76  0.78  0.59  0.27  0.58]
 [ 1.04  4.07  0.    0.71  0.57  1.    0.52  0.74  0.41  0.44]]
Relative error:  0.7262041351699297


While the factorized $A$ and $S$ don't form a matrix that matches $X$ closely, it did preserve the row and column of zeros that were present in $X$. This is to be expected, as there are many different factorizations of X. One thing to note is we use the relative error to judge the quality of our approximation, which is the Frobenius norm of the difference between our original data matrix $X$ and the approximation $AS$ divided by the Frobenius norm of $X$.

In [7]:
np.random.seed(1)
k = 3
niter = 10000
A = np.random.rand(12, 3)
S = np.random.rand(3, 10)
#print(S)

kacziters = 1
for i in np.arange(niter):
    rowcol = np.random.randint(k)
    for i in arange(kacziters):
        kaczind = np.random.randint(len(X[, rowcol]))
        S[:, rowcol] = S[:, rowcol] + (np.linalg.norm(A[:, kaczind])**2) * (X[kaczind, rowcol] - np.matmul(A[kaczind, :], S[:, rowcol])) * A[kaczind, :]
        A[rowcol, :] = 

approx = np.matmul(A, S)
print(X)
print(np.round(approx, 2))
print(np.linalg.norm(X - approx) / np.linalg.norm(X))

array([[0. , 0. , 0. ],
       [0. , 0. , 5. ],
       [9. , 0. , 5. ],
       [5. , 0. , 6. ],
       [3. , 0. , 6. ],
       [2. , 3. , 7. ],
       [1. , 2. , 4. ],
       [0. , 1. , 2. ],
       [0. , 1. , 1. ],
       [0. , 0. , 0.5],
       [0. , 0. , 0. ],
       [0. , 0. , 0. ]])