# Matrix Factorization

In a recommendation system, there is a group of users and a set of items. Given that each users have rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, such that we can make recommendations to the users.

Matrix factorization is one of the mainly used algorithm in recommendation systems. It can be used to discover latent features underlying the interactions between two different kinds of entities. Assume we assign a $k$ dimensional vector $u_i$ to user $i$ and $k$ dimensional vector $v_j$ to item $j$, then user $i$ rates movie $j$ by $\langle u_i, v_j\rangle$.

We can learn all $u_i$ and $v_j$ directly, which is essentially performing SVD on the user-item matrix. We can also try to learn the latent features using multi-layer neural networks. 

In this tutorial, we will work though the steps to implement these ideas in MXNet.

## Prepare Data

We use the [MovieLens](http://grouplens.org/datasets/movielens/) data here, but it can apply to other datasets as well. Each row of this dataset contains a tuple of user id, movie id, rating, and time stamp, we will only use the first three items. We first define the a batch which contains n tuples. It also provides name and shape information to MXNet about the data and label. 

In [1]:
class Batch(object):
    def __init__(self, data_names, data, label_names, label):
        self.data = data
        self.label = label
        self.data_names = data_names
        self.label_names = label_names
        
    @property
    def provide_data(self):
        return [(n, x.shape) for n, x in zip(self.data_names, self.data)]
    
    @property
    def provide_label(self):
        return [(n, x.shape) for n, x in zip(self.label_names, self.label)]


Then we define a data iterator, which returns a batch of tuples each time. 

In [2]:
import mxnet as mx
import random

class Batch(object):
    def __init__(self, data_names, data, label_names, label):
        self.data = data
        self.label = label
        self.data_names = data_names
        self.label_names = label_names

    @property
    def provide_data(self):
        return [(n, x.shape) for n, x in zip(self.data_names, self.data)]

    @property
    def provide_label(self):
        return [(n, x.shape) for n, x in zip(self.label_names, self.label)]

class DataIter(mx.io.DataIter):
    def __init__(self, fname, batch_size):
        super(DataIter, self).__init__()
        self.batch_size = batch_size
        self.data = []
        for line in file(fname):
            tks = line.strip().split('\t')
            if len(tks) != 4:
                continue
            self.data.append((int(tks[0]), int(tks[1]), float(tks[2])))
        self.provide_data = [('user', (batch_size, )), ('item', (batch_size, ))]
        self.provide_label = [('score', (self.batch_size, ))]

    def __iter__(self):
        for k in range(len(self.data) / self.batch_size):
            users = []
            items = []
            scores = []
            for i in range(self.batch_size):
                j = k * self.batch_size + i
                user, item, score = self.data[j]
                users.append(user)
                items.append(item)
                scores.append(score)

            data_all = [mx.nd.array(users), mx.nd.array(items)]
            label_all = [mx.nd.array(scores)]
            data_names = ['user', 'item']
            label_names = ['score']

            data_batch = Batch(data_names, data_all, label_names, label_all)
            yield data_batch

    def reset(self):
        random.shuffle(self.data)

Now we download the data and provide a function to obtain the data iterator:

In [3]:
import os
import urllib
import zipfile
if not os.path.exists('ml-100k.zip'):
    urllib.urlretrieve('http://files.grouplens.org/datasets/movielens/ml-100k.zip', 'ml-100k.zip')
with zipfile.ZipFile("ml-100k.zip","r") as f:
    f.extractall("./")
def get_data(batch_size):
    return (DataIter('./ml-100k/u1.base', batch_size), DataIter('./ml-100k/u1.test', batch_size))

Finally we calculate the numbers of users and items for later use.

In [4]:
def max_id(fname):
    mu = 0
    mi = 0
    for line in file(fname):
        tks = line.strip().split('\t')
        if len(tks) != 4:
            continue
        mu = max(mu, int(tks[0]))
        mi = max(mi, int(tks[1]))
    return mu + 1, mi + 1
max_user, max_item = max_id('./ml-100k/u.data')
(max_user, max_item)

(944, 1683)

## Optimization

We first implement the RMSE (root-mean-square error) measurement, which is commonly used by matrix factorization. 

In [5]:
import math
def RMSE(label, pred):
    ret = 0.0
    n = 0.0
    pred = pred.flatten()
    for i in range(len(label)):
        ret += (label[i] - pred[i]) * (label[i] - pred[i])
        n += 1.0
    return math.sqrt(ret / n)

Then we define a general training module, which is borrowed from the image classification application. 

In [6]:
def train(network, batch_size, num_epoch, learning_rate):
    model = mx.model.FeedForward(
        ctx = mx.gpu(0),  
        symbol = network,
        num_epoch = num_epoch,
        learning_rate = learning_rate,
        wd = 0.0001,
        momentum = 0.9)

    batch_size = 64
    train, test = get_data(batch_size)

    import logging
    head = '%(asctime)-15s %(message)s'
    logging.basicConfig(level=logging.DEBUG)

    model.fit(X = train, 
              eval_data = test,
              eval_metric = RMSE,
              batch_end_callback=mx.callback.Speedometer(batch_size, 20000/batch_size),)

## Networks

Now we try various networks. We first learn the latent vectors directly.

In [7]:
def plain_net(k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user feature lookup
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k) 
    # item feature lookup
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    # predict by the inner product, which is elementwise product and then sum
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred

train(plain_net(64), batch_size=64, num_epoch=10, learning_rate=.05)

INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [312]	Speed: 41342.22 samples/sec	Train-RMSE=3.684994
INFO:root:Epoch[0] Batch [624]	Speed: 42492.43 samples/sec	Train-RMSE=3.707191
INFO:root:Epoch[0] Batch [936]	Speed: 43505.27 samples/sec	Train-RMSE=3.694168
INFO:root:Epoch[0] Batch [1248]	Speed: 43158.00 samples/sec	Train-RMSE=3.708600
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.143
INFO:root:Epoch[0] Validation-RMSE=3.714679
INFO:root:Epoch[1] Batch [312]	Speed: 43131.71 samples/sec	Train-RMSE=3.687011
INFO:root:Epoch[1] Batch [624]	Speed: 43955.37 samples/sec	Train-RMSE=3.629863
INFO:root:Epoch[1] Batch [936]	Speed: 43980.74 samples/sec	Train-RMSE=3.307402
INFO:root:Epoch[1] Batch [1248]	Speed: 44082.50 samples/sec	Train-RMSE=2.602139
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=1.858
INFO:root:Epoch[1] Validation-RMSE=2.475260
INFO:root:Epoch[2] Batch [312]	Speed: 43939.81 samples/sec	Train-RMSE=2.040821
INFO

Next we try to use 2 layers neural network to learn the latent variables, which stack a fully connected layer above the embedding layers: 

In [8]:
def get_one_layer_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    user = mx.symbol.Activation(data = user, act_type="relu")
    user = mx.symbol.FullyConnected(data = user, num_hidden = hidden)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    item = mx.symbol.Activation(data = item, act_type="relu")
    item = mx.symbol.FullyConnected(data = item, num_hidden = hidden)
    # predict by the inner product
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred

train(get_one_layer_mlp(64, 64), batch_size=64, num_epoch=10, learning_rate=.05)

INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [312]	Speed: 30518.20 samples/sec	Train-RMSE=1.336297
INFO:root:Epoch[0] Batch [624]	Speed: 30186.36 samples/sec	Train-RMSE=1.031327
INFO:root:Epoch[0] Batch [936]	Speed: 29917.48 samples/sec	Train-RMSE=1.007261
INFO:root:Epoch[0] Batch [1248]	Speed: 30108.11 samples/sec	Train-RMSE=0.999305
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.684
INFO:root:Epoch[0] Validation-RMSE=0.993410
INFO:root:Epoch[1] Batch [312]	Speed: 30401.63 samples/sec	Train-RMSE=0.964113
INFO:root:Epoch[1] Batch [624]	Speed: 30755.11 samples/sec	Train-RMSE=0.961473
INFO:root:Epoch[1] Batch [936]	Speed: 31495.76 samples/sec	Train-RMSE=0.961091
INFO:root:Epoch[1] Batch [1248]	Speed: 31466.52 samples/sec	Train-RMSE=0.961685
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=2.613
INFO:root:Epoch[1] Validation-RMSE=0.979111
INFO:root:Epoch[2] Batch [312]	Speed: 30188.96 samples/sec	Train-RMSE=0.946188
INFO

Adding dropout layers to relief the over-fitting. 

In [9]:
def get_one_layer_dropout_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    user = mx.symbol.Activation(data = user, act_type="relu")
    user = mx.symbol.FullyConnected(data = user, num_hidden = hidden)
    user = mx.symbol.Dropout(data=user, p=0.5)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    item = mx.symbol.Activation(data = item, act_type="relu")
    item = mx.symbol.FullyConnected(data = item, num_hidden = hidden)
    item = mx.symbol.Dropout(data=item, p=0.5)    
    # predict by the inner product
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred
train(get_one_layer_mlp(256, 512), batch_size=64, num_epoch=10, learning_rate=.05)

INFO:root:Start training with [gpu(0)]
INFO:root:Epoch[0] Batch [312]	Speed: 30877.91 samples/sec	Train-RMSE=1.285091
INFO:root:Epoch[0] Batch [624]	Speed: 31860.31 samples/sec	Train-RMSE=1.008286
INFO:root:Epoch[0] Batch [936]	Speed: 31813.96 samples/sec	Train-RMSE=0.980373
INFO:root:Epoch[0] Batch [1248]	Speed: 31639.54 samples/sec	Train-RMSE=0.975978
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.568
INFO:root:Epoch[0] Validation-RMSE=0.975440
INFO:root:Epoch[1] Batch [312]	Speed: 31823.35 samples/sec	Train-RMSE=0.947374
INFO:root:Epoch[1] Batch [624]	Speed: 31670.55 samples/sec	Train-RMSE=0.956176
INFO:root:Epoch[1] Batch [936]	Speed: 31525.85 samples/sec	Train-RMSE=0.960271
INFO:root:Epoch[1] Batch [1248]	Speed: 31663.57 samples/sec	Train-RMSE=0.949767
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=2.557
INFO:root:Epoch[1] Validation-RMSE=0.987816
INFO:root:Epoch[2] Batch [312]	Speed: 31765.88 samples/sec	Train-RMSE=0.945751
INFO

## Acknowledgement

This tutorial is based on examples from [xlvector/github](https://github.com/xlvector/).