# Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) is a simple yet effective method to decompose a matrix into a product of two non-negative matrices(that is sparse matrices with all non-negative entries). This technique is most commonly used in recommender systems, and was made well known by the Netflix Prize. NMF aims to factor a data matrix $X$ into a product of two matrices:

$$X \approx AS $$

where $X$ is a $n \times m$ matrix, $A$ is a $n \times k$ matrix, and $S$ is a $k \times m$ matrix. $k$ is usually provided by the user, and symbolizes the number of distinct "factors" in the data. For example, if our data was the total productivity of a group of factories per hour for the past week, the number of factors $k$ would be the number of factories. Without prior knowledge the number of factors would be harder to pinpoint, and would have to be chosen using cross validation or something similar

It's important to note that this problem does not have a unique solution, and we could end up with many different combinations of $A$ and $S$ that multiply to get a decent approximation of $X$. Even more, each pair of $A$ and $S$ can be scaled by any real number $\alpha$ and $\frac{1}{\alpha}$ respectively to yield an infinite number of pairs. 

## Alternating Least Squares

The natural question to ask now is how to determine $A$ and $S$ when given $X$ and $k$. One relatively simple method is to use alternating least squares, which is a generalization of the least squares method for simple linear regression. In simple linear regression, the goal is to solve the following equation for $x$:

$$ Ax = b \implies A^TAx = A^Tb \implies x = (A)^{\dagger}b\$$

This can be generalized for a product of matrices by picking a random $i^{th}$ column of $X$ and $S$, which we will denote $x_{:,i}$ and $s_{:,i}$, fixing $A$, and solving for $s_{:,i}$. Then by our previous equation $X \approx AS$ we have 

$$ x_{:,i} \approx As_{:,i}$$
This yields the following update rule:

$$ s_{:,i} := (A)^{\dagger}x_{:,i}$$

However, since we also want to solve for $A$ we need to sample a column of $A$ and fix $S$. To get the same linear form we do the following:

$$x_{i,:} \approx a_{i,:}S \implies x_{i,:}^T \approx S^Ta_{i,:}^T$$

We switch to updating the rows of $A$ rather than the columns due to dimensionality, and get the following update rule:

$$ a_{i,:}^T = (SS^T)^{-1}Sx_{i,:}^T $$

We then repeat these updates until convergence or the number of iterations is fulfilled. 

In [2]:
import numpy as np
import matplotlib.pyplot as plt

In [59]:
np.random.seed(1)
col1 = np.array([[0, 0, 9, 5, 3, 2, 1, 0, 0, 0, 0, 0]])
col2 = np.array([[0, 0, 0, 0, 0, 3, 2, 1, 1, 0, 0, 0]])
col3 = np.array([[0, 5, 5, 6, 6, 7, 4, 2, 1, 0.5, 0, 0]])

factors = np.vstack((col1, col2, col3)).T
weights = np.random.randint(0, 2, size=(3, 10))

X = np.matmul(factors, weights)
print(X)

[[ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   5.   0.   0.   5.   0.   0.   0.   5.   0. ]
 [ 9.  14.   0.   0.  14.   9.   9.   9.  14.   0. ]
 [ 5.  11.   0.   0.  11.   5.   5.   5.  11.   0. ]
 [ 3.   9.   0.   0.   9.   3.   3.   3.   9.   0. ]
 [ 2.  12.   0.   3.  12.   2.   2.   5.   9.   0. ]
 [ 1.   7.   0.   2.   7.   1.   1.   3.   5.   0. ]
 [ 0.   3.   0.   1.   3.   0.   0.   1.   2.   0. ]
 [ 0.   2.   0.   1.   2.   0.   0.   1.   1.   0. ]
 [ 0.   0.5  0.   0.   0.5  0.   0.   0.   0.5  0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   0. ]]


In [65]:
def nmfals(data, k, niter, reinit = 5):
    # set to negative one so we can guarantee an update for the first init
    finalerror = -1
    
    # need to compare final error to overall best and store the overall best
    seqerror = np.empty(niter)
    lowesterror = np.empty(1)
    
    # store overall best factor matrices
    lbest = np.random.rand(data.shape[0], k)
    rbest = np.random.rand(k, data.shape[1])
    
    for j in np.arange(reinit):
        # randomly initialize the factor matrices
        lfactor = np.random.rand(data.shape[0], k)
        rfactor = np.random.rand(k, data.shape[1])

        for i in np.arange(niter):
            rowcol = np.random.randint(k)
            rfactor[:, rowcol] = np.matmul(np.linalg.pinv(lfactor), data[:, rowcol])
            lfactor[rowcol, :] = np.matmul(data[rowcol, :], np.matmul(rfactor.T, np.linalg.inv(np.matmul(rfactor, rfactor.T))))
            seqerror[i] = np.linalg.norm(data - np.matmul(lfactor, rfactor)) / np.linalg.norm(data)
        # first init
        if (finalerror == -1):
            lowesterror = seqerror
            lbest = lfactor
            rbest = rfactor
        # if not first, only update if final error is lower than overall best
        elif (finalerror > seqerror[niter - 1]):
            finalerror = seqerror[niter - 1]
            lowesterror = seqerror
            lbest = lfactor
            rbest = rfactor
    return(lbest, rbest, lowesterror)

In [71]:
np.random.seed(1)
A, S, error = nmfals(X, 3, 100, 10)
print(np.round(np.matmul(A,S), 2))
print(error)

[[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
 [-0.27  4.96  0.    1.33  0.66  1.47  2.04  1.07  2.38 -0.05]
 [ 8.92 13.85  0.    6.67  5.13  9.12 11.73  8.94 11.47  2.53]
 [ 2.45  6.63  0.    0.56  0.45  0.9   1.45  0.76  1.08  1.16]
 [ 2.45  8.72  0.    0.43  0.28  0.69  1.31  0.44  0.9   1.33]
 [ 0.72  3.16  0.    0.45  0.28  0.58  0.87  0.46  0.82  0.36]
 [ 0.43  2.91  0.    0.23  0.1   0.27  0.49  0.14  0.45  0.29]
 [ 1.52  6.55  0.    0.77  0.47  1.02  1.59  0.77  1.44  0.79]
 [ 2.09  4.58  0.    0.41  0.38  0.7   1.1   0.65  0.77  0.93]
 [ 0.6   2.38  0.    0.5   0.32  0.64  0.9   0.54  0.89  0.26]
 [ 0.56  2.65  0.    0.33  0.19  0.42  0.66  0.32  0.61  0.3 ]
 [ 1.44  5.17  0.    0.55  0.36  0.76  1.21  0.6   1.03  0.72]]
[0.92047267 0.77038192 0.71185494 0.70897452 0.70752874 0.70630165
 0.70583554 0.70575029 0.70454301 0.7035414  0.7035414  0.70279874
 0.70271886 0.70271856 0.70269855 0.70269855 0.70269855 0.70269855
 0.70203482 0.70199103 0.70199103 0.701991

In [50]:
np.random.seed(1)
k = 3
niter = 10000
#A = np.random.rand(12, 3)
#S = np.random.rand(3, 10)
A = factors + 0.01*np.random.rand(12, 3)
S = weights + 0.01*np.random.rand(3, 10)

#print(S)

for i in np.arange(niter):
    rowcol = np.random.randint(k)
    S[:, rowcol] = np.matmul(np.linalg.pinv(A), X[:, rowcol])
    A[rowcol, :] = np.matmul(X[rowcol, :], np.matmul(S.T, np.linalg.inv(np.matmul(S, S.T))))

approx = np.matmul(A, S)
#print(X)
print(np.round(approx, 2))
print("Relative error: ", np.linalg.norm(X - approx) / np.linalg.norm(X))

[[ 0.000e+00  0.000e+00  0.000e+00  0.000e+00  0.000e+00  0.000e+00
   0.000e+00  0.000e+00  0.000e+00  0.000e+00]
 [-2.000e-02  4.980e+00  0.000e+00  2.000e-02  5.010e+00  1.000e-02
   2.000e-02 -1.000e-02  5.020e+00  3.000e-02]
 [ 8.990e+00  1.398e+01  0.000e+00  3.000e-02  1.402e+01  9.040e+00
   9.010e+00  8.970e+00  1.400e+01  7.000e-02]
 [ 5.020e+00  1.103e+01  0.000e+00  8.000e-02  1.107e+01  5.070e+00
   5.060e+00  5.070e+00  1.102e+01  5.000e-02]
 [ 3.010e+00  9.010e+00  0.000e+00  7.000e-02  9.050e+00  3.050e+00
   3.050e+00  3.060e+00  9.010e+00  5.000e-02]
 [ 2.000e+00  1.199e+01  0.000e+00  3.070e+00  1.204e+01  2.070e+00
   2.070e+00  5.060e+00  9.030e+00  5.000e-02]
 [ 1.000e+00  7.000e+00  0.000e+00  2.040e+00  7.030e+00  1.040e+00
   1.040e+00  3.030e+00  5.020e+00  3.000e-02]
 [ 0.000e+00  3.010e+00  0.000e+00  1.020e+00  3.020e+00  2.000e-02
   3.000e-02  1.020e+00  2.020e+00  1.000e-02]
 [ 0.000e+00  2.010e+00  0.000e+00  1.020e+00  2.020e+00  2.000e-02
   2.000e-02

While the factorized $A$ and $S$ don't form a matrix that matches $X$ closely, it did preserve the row and column of zeros that were present in $X$. This is to be expected, as there are many different factorizations of X. One thing to note is we use the relative error to judge the quality of our approximation, which is the Frobenius norm of the difference between our original data matrix $X$ and the approximation $AS$ divided by the Frobenius norm of $X$.

## Randomized Kaczmarz Method

We saw previously that the algorithm to factorize $X$ has two main parts, one to pick our matrix column and row indices and one to take an iterative step towards the local optimum. The iterative step gives us the freedom to choose our favorite method to solve for $A$ and $S$, and rather than doing a traditional least squares method we can try applying the Randomized Kaczmarz(RK) method instead. This iterative step takes our randomly chosen row/column of either $A$ or $S$ and projects it towards the local optimum(project it towards the corresponding row/column of $X$?). 

Our current system $AS = X$ is reduced to $As_{:,i} = x_{:,i}$ when a column is sampled. Our RK iterative step then samples a row of $A$ and corresponding entry $k$ of $s_{:,i}$ and would then be the following:

$$x_{k,i}^{(j+1)} = x_{k,i}^{(j)} + \frac{x_{k,i} - a_{k,:}^Ts_{k,i}}{\lvert\lvert{a_{k,:}}\rvert\rvert ^2}a_{k,:}$$

Note that $j$ represents the current iteration of our RK method. This value can be explicitly chosen, making it an additional parameter in this algorithm. If we sampled a row of $S$ rather than a column of $A$, each step would instead be the following:

$$ $$

To summarize, we start by randomly sampling a row/column to reduce to a linear system(as usual). We then proceed to take RK steps, with each step sampling a random row and entry and updating. The number of steps before resampling our linear system can be provided as a parameter. 

In [7]:
np.random.seed(1)
k = 3
niter = 10000
A = np.random.rand(12, 3)
S = np.random.rand(3, 10)
#print(S)

kacziters = 1
for i in np.arange(niter):
    rowcol = np.random.randint(k)
    for i in arange(kacziters):
        kaczind = np.random.randint(len(X[, rowcol]))
        S[:, rowcol] = S[:, rowcol] + (np.linalg.norm(A[:, kaczind])**2) * (X[kaczind, rowcol] - np.matmul(A[kaczind, :], S[:, rowcol])) * A[kaczind, :]
        A[rowcol, :] = 

approx = np.matmul(A, S)
print(X)
print(np.round(approx, 2))
print(np.linalg.norm(X - approx) / np.linalg.norm(X))

array([[0. , 0. , 0. ],
       [0. , 0. , 5. ],
       [9. , 0. , 5. ],
       [5. , 0. , 6. ],
       [3. , 0. , 6. ],
       [2. , 3. , 7. ],
       [1. , 2. , 4. ],
       [0. , 1. , 2. ],
       [0. , 1. , 1. ],
       [0. , 0. , 0.5],
       [0. , 0. , 0. ],
       [0. , 0. , 0. ]])