### Task: create recommendations for Netflix watchers using SVD

In [1]:
import pandas as pd
import numpy as np
from tqdm import tqdm
from google.colab import drive
from scipy.sparse import csr_matrix
import sys

In [2]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
netflix = pd.read_csv('drive/MyDrive/netflix0.2_train.txt', header = None)
netflix.columns = ['WatcherID', 'MovieID', 'Rating']
netflix.head()

| | WatcherID | MovieID | Rating |
|---|---|---|---|
| 0 | 1 | 1086807 | 3 |
| 1 | 1 | 525356 | 2 |
| 2 | 1 | 1196100 | 4 |
| 3 | 1 | 1792741 | 2 |
| 4 | 1 | 769643 | 1 |


In [4]:
print(netflix.memory_usage(index=True).sum()/1024**2) # around 2 MB

2.2889442443847656


In [5]:
netflix.shape[0] #100,000 ratings

100000

In [6]:
len(netflix['WatcherID'].unique()) #257 watchers

257

In [7]:
len(netflix['MovieID'].unique()) #76332 movies

76332

$257 \cdot 76332 = 19617324$. This means that only $100000$ of the $19617324$ entries are nonzero, which is about $0.5\%$. We can use a sparse representation to save space.
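The arithmetic above can be checked in a few lines. This is a minimal sketch that hard-codes the counts reported by the cells above; the variable names are illustrative and not part of the notebook.

```python
# Counts taken from the notebook outputs above
n_watchers, n_movies, n_ratings = 257, 76332, 100_000

n_cells = n_watchers * n_movies          # size of the full watcher x movie grid
density = n_ratings / n_cells            # fraction of cells that hold a rating
dense_mb = n_cells * 8 / 1024**2         # float64 storage for the dense values, in MB

print(n_cells)
print(f"{density:.2%}")
print(round(dense_mb, 1))
```

The dense estimate covers the values only; a pandas DataFrame adds a small amount for its index.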

In [8]:
R = netflix.pivot(index='WatcherID', columns='MovieID', values='Rating')
R.fillna(0, inplace = True)
#col = R.columns
#idx = R.index
R.columns = [i for i in range(R.shape[1])]
R.index = [i for i in range(R.shape[0])]
R_np = np.array(R)

In [9]:
print(R.memory_usage(index=True).sum()/1024**2) #around 150MB used in dense representation

149.6702651977539


In [10]:
sp = csr_matrix(R)
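The pivot above first allocates the full dense table and only then converts it. As an alternative, here is a sketch of building the CSR matrix straight from the (watcher, movie, rating) triples, so that the dense table is never created. The toy DataFrame and the names `toy`, `row_codes`, `col_codes` and `toy_sp` are made up for illustration; only the column names follow the notebook.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for the ratings table (values are invented for illustration)
toy = pd.DataFrame({
    "WatcherID": [10, 10, 20, 30],
    "MovieID":   [7, 9, 7, 8],
    "Rating":    [3, 5, 1, 4],
})

# Map raw IDs to consecutive row / column positions (sorted, like pivot does)
row_codes, watcher_ids = pd.factorize(toy["WatcherID"], sort=True)
col_codes, movie_ids = pd.factorize(toy["MovieID"], sort=True)

# Build the sparse matrix from (value, (row, col)) triples
toy_sp = csr_matrix(
    (toy["Rating"].to_numpy(dtype=float), (row_codes, col_codes)),
    shape=(len(watcher_ids), len(movie_ids)),
)
print(toy_sp.toarray())
```

Keeping `watcher_ids` and `movie_ids` also preserves the mapping from matrix positions back to the original IDs.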

In [11]:
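sparse_mem = sys.getsizeof(sp.data) + sys.getsizeof(sp.indices) + sys.getsizeof(sp.indptr)
# Caution: sys.getsizeof skips the buffer of arrays that are views, so this understates the footprint.
# sp.data.nbytes + sp.indices.nbytes + sp.indptr.nbytes should be about 1.1 MB (100,000 float64 values plus int32 indices), still far below the dense 150 MB.
print(sparse_mem/1024) # in kB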

1.3359375
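`sys.getsizeof` does not count the data buffer of a NumPy array that is a view of another array, so summing the `.nbytes` of the three CSR arrays is a more direct way to measure the sparse footprint. A small sketch on synthetic data follows; the matrix size and the names `dense`, `demo` and `buffer_bytes` are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Synthetic, mostly empty rating table: 50 x 2000 with at most 500 ratings
dense = np.zeros((50, 2000))
rows = rng.integers(0, 50, size=500)
cols = rng.integers(0, 2000, size=500)
dense[rows, cols] = rng.integers(1, 6, size=500)  # ratings 1..5

demo = csr_matrix(dense)

# Bytes actually held by the three CSR arrays
buffer_bytes = demo.data.nbytes + demo.indices.nbytes + demo.indptr.nbytes
print(dense.nbytes, buffer_bytes)
```

CSR storage grows with the number of stored ratings plus one row pointer per row, not with the number of cells.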


We can't apply SVD directly, because the matrix has missing entries. Instead, we approximate the factorization: we start with two random matrices and fit them by gradient descent, using the squared error on the observed ratings as the loss function. Given a matrix $R$ of size $N \times M$, the approximation process looks like this:

---
Step 1: Start with two random matrices, $P$ of size $N \times K$ and $Q$ of size $K \times M$, where $K$ is the number of latent factors (the analogue of the number of singular values you want to keep).

Step 2: Compute the reconstruction loss $L$ over the observed (nonzero) entries of $R$
\begin{equation}
L = \sum_{(i,j)\,:\,R_{ij} \neq 0} (R_{ij} - (P \cdot Q)_{ij})^2.
\end{equation}
If it's small enough, we can stop early. Otherwise, for each observed entry $(i, j)$ and each factor $m = 1, \dots, K$, we update the values with this rule
\begin{equation}
P_{im} = P_{im} - \alpha(J_{ij} \cdot Q_{mj} + \beta \cdot P_{im})
\end{equation}
\begin{equation}
Q_{mj} = Q_{mj} - \alpha(J_{ij} \cdot P_{im} + \beta \cdot Q_{mj}),
\end{equation}
where $J_{ij} = \frac{\partial L}{\partial (P \cdot Q)_{ij}} = -2 \cdot (R_{ij} - (P \cdot Q)_{ij})$, $\alpha$ is the learning rate and $\beta$ is the $L_2$ regularization hyperparameter.

Step 3: Repeat step 2 for a given number of epochs.
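The three steps can be tried on a tiny dense matrix before running them on the full data. The sketch below is only illustrative: `sgd_factorize`, the toy ratings and the hyperparameter values are assumptions, and it updates a whole row of $P$ and column of $Q$ at once from the pre-update values, which differs slightly from an element-by-element loop.

```python
import numpy as np

def sgd_factorize(R, K=2, alpha=0.01, beta=0.02, epochs=2000, seed=0):
    """Hypothetical helper: fit R ~ P @ Q on the nonzero entries of a dense array R."""
    rng = np.random.default_rng(seed)
    N, M = R.shape
    P = rng.random((N, K))            # Step 1: random N x K matrix
    Q = rng.random((K, M))            # Step 1: random K x M matrix
    rows, cols = R.nonzero()          # positions of observed ratings
    history = []                      # loss after each epoch
    for _ in range(epochs):           # Step 3: repeat for a number of epochs
        for i, j in zip(rows, cols):  # Step 2: one update per observed rating
            J = -2 * (R[i, j] - P[i, :] @ Q[:, j])
            p_old = P[i, :].copy()    # keep pre-update row for the Q update
            P[i, :] -= alpha * (J * Q[:, j] + beta * P[i, :])
            Q[:, j] -= alpha * (J * p_old + beta * Q[:, j])
        residual = (R - P @ Q)[rows, cols]
        history.append(float(np.sum(residual ** 2)))
    return P, Q, history

# Toy ratings; 0 marks a missing rating
R_toy = np.array([[5, 3, 0, 1],
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [1, 0, 0, 4],
                  [0, 1, 5, 4]], dtype=float)

P_toy, Q_toy, history = sgd_factorize(R_toy)
print(history[0], history[-1])
print(np.round(P_toy @ Q_toy, 2))
```

The zeros in `R_toy` are skipped during training, so the corresponding cells of `P_toy @ Q_toy` are the model's predictions for unrated movies.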

In [12]:
N = len(R_np)
M = len(R_np[0])
K = 5

P1 = np.random.rand(N, K)
Q1 = np.random.rand(K, M)
P2 = np.random.rand(N, K)
Q2 = np.random.rand(K, M)

In [13]:
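def error(R, P, Q):
    """Sum of squared errors over the stored (nonzero) entries of R."""
    err = 0
    rows, cols = R.nonzero()
    for i, j in zip(rows, cols):
        err += (R[i, j] - np.dot(P[i, :], Q[:, j])) ** 2
    return err

def matrix_factorization(R, P, Q, beta, alpha=0.01, epochs=200):
    """Fit P and Q in place by gradient descent over the observed ratings."""
    rows, cols = R.nonzero()  # positions of observed ratings; R never changes, so compute once
    for _ in tqdm(range(epochs)):
        for i, j in zip(rows, cols):
            for m in range(len(P[0])):
                # gradient step for factor m; the residual is recomputed after each factor update
                Jij = -2 * (R[i, j] - np.dot(P[i, :], Q[:, j]))
                P[i, m] = P[i, m] - alpha * (Jij * Q[m, j] + beta * P[i, m])
                # note: this line uses the already-updated P[i, m]
                Q[m, j] = Q[m, j] - alpha * (Jij * P[i, m] + beta * Q[m, j])

        if error(R, P, Q) < 0.06:  # early stopping if the total squared error is sufficiently small
            break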

In [14]:
matrix_factorization(R = sp, P = P1, Q = Q1, beta = 0.02) # with regularization

100%|██████████| 200/200 [54:05<00:00, 16.23s/it]


In [15]:
matrix_factorization(R = sp, P = P2, Q = Q2, beta = 0) # without regularization

100%|██████████| 200/200 [53:16<00:00, 15.98s/it]


In [16]:
e1 = error(sp, P1, Q1)
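# Note: this divides the squared error by the Frobenius norm of R, so the value is a norm-scaled error;
# the RMSE over the observed ratings would be np.sqrt(e1 / sp.nnz)
RMSE1 = np.sqrt(e1/np.linalg.norm(R))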
print(RMSE1) #1.162

1.1624436424202653


In [17]:
e2 = error(sp, P2, Q2)
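# Note: as for the regularized run, this is scaled by the Frobenius norm of R;
# the RMSE over the observed ratings would be np.sqrt(e2 / sp.nnz)
RMSE2 = np.sqrt(e2/np.linalg.norm(R))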
print(RMSE2) #0.678

0.6784141181145448
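A root mean squared error is usually taken as the square root of the summed squared error divided by the number of observed ratings. Here is a minimal sketch of that computation, vectorized over the stored entries instead of indexing the sparse matrix one element at a time. The helper `rmse_observed` and the small demo matrices are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

def rmse_observed(R_sparse, P, Q):
    """Hypothetical helper: RMSE over the stored (observed) ratings only."""
    coo = R_sparse.tocoo()  # gives parallel arrays: row, col, data
    # Prediction for each stored entry: dot product of a row of P and a column of Q
    preds = np.einsum("ik,ki->i", P[coo.row, :], Q[:, coo.col])
    return float(np.sqrt(np.mean((coo.data - preds) ** 2)))

# Three observed ratings; two are predicted exactly, one is off by 1
R_demo = csr_matrix(np.array([[5.0, 0.0, 3.0],
                              [0.0, 4.0, 0.0]]))
P_demo = np.array([[1.0, 2.0],
                   [0.5, 1.0]])
Q_demo = np.array([[1.0, 0.0, 1.0],
                   [2.0, 3.0, 1.0]])

rmse_demo = rmse_observed(R_demo, P_demo, Q_demo)
print(rmse_demo)  # sqrt(1/3), about 0.577
```

Measured on the ratings used for training, this number describes fit rather than predictive quality; a held-out set of ratings would be needed for the latter.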


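The whole process takes a while (roughly 54 minutes per run here), but once it's done you don't have to train again unless you get new data. Another advantage of SVD-based recommendation is that you don't have to keep the entire rating matrix, which might be too big: $P$ and $Q$ together hold only $(N + M) \cdot K = 382945$ numbers for $K = 5$, compared with $19617324$ cells in $R$. If you need the predicted rating of a specific watcher for a specific movie, just multiply the corresponding row of $P$ by the corresponding column of $Q$.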
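Looking up a prediction from the two factor matrices might look like the sketch below. The helpers `predict_rating` and `top_n_unseen` and the demo matrices are hypothetical; the indices are matrix positions (0 to N-1 and 0 to M-1), so mapping back to the original WatcherID and MovieID values assumes the pivot's index and columns were saved.

```python
import numpy as np

def predict_rating(P, Q, watcher_idx, movie_idx):
    """Predicted rating: one row of P times one column of Q."""
    return float(P[watcher_idx, :] @ Q[:, movie_idx])

def top_n_unseen(P, Q, watcher_idx, seen_idx, n=3):
    """Positions of the n highest-scoring movies the watcher has not rated."""
    scores = P[watcher_idx, :] @ Q       # predicted ratings for every movie
    scores[list(seen_idx)] = -np.inf     # exclude movies already rated
    return np.argsort(scores)[::-1][:n]  # best first

# Tiny demo factors: 2 watchers, 2 factors, 4 movies
P_demo = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
Q_demo = np.array([[5.0, 1.0, 3.0, 2.0],
                   [1.0, 4.0, 2.0, 5.0]])

single = predict_rating(P_demo, Q_demo, watcher_idx=0, movie_idx=2)
recommended = top_n_unseen(P_demo, Q_demo, watcher_idx=1, seen_idx={3}, n=2)
print(single, recommended)
```

Only one row of `P` and the columns of `Q` are touched per watcher, so the full rating matrix never has to be rebuilt.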