# Matrix Factorization
In a recommendation system, there is a group of users and a set of items. Given that each user has rated some of the items in the system, we would like to predict how users would rate the items they have not yet rated, so that we can make recommendations to them.

Matrix factorization is one of the most widely used algorithms in recommendation systems. It can be used to discover latent features underlying the interactions between two different kinds of entities.

Assume we assign a k-dimensional vector to each user and a k-dimensional vector to each item such that the dot product of these two vectors gives the user's rating of that item. We can learn the user and item vectors directly, which amounts to a low-rank factorization of the user-item matrix, similar in spirit to SVD. We can also try to learn the latent features using multi-layer neural networks.
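Written out, the idea above looks as follows. The symbols are introduced here for illustration only and do not appear in the code: $p_u$ and $q_i$ are the k-dimensional user and item vectors, $r_{ui}$ is an observed rating, and $\mathcal{K}$ is the set of (user, item) pairs that have a rating. The squared-error objective is one common choice and is shown as an assumption, not as the exact loss the library minimizes.

```latex
\hat{r}_{ui} = p_u^\top q_i = \sum_{f=1}^{k} p_{uf}\, q_{if},
\qquad
\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}} \bigl( r_{ui} - p_u^\top q_i \bigr)^2
```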

In this tutorial, we will work through the steps to implement these ideas in MXNet.

## Prepare Data

We use the [MovieLens](https://grouplens.org/datasets/movielens/) data here, but the same approach applies to other datasets as well. Each row of this dataset contains a tuple of user id, movie id, rating, and timestamp; we will only use the first three. We first define a batch that contains n tuples. It also provides MXNet with name and shape information about the data and label.
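As a small illustration of the row format described above, the sketch below parses one tab-separated line into a (user, item, rating) tuple and ignores the timestamp. The helper name `parseRating` and the sample line are made up for this example.

```scala
// Hypothetical helper: parse one tab-separated line into (user, item, rating).
def parseRating(line: String): Option[(Int, Int, Float)] = {
  val fields = line.split("\t").map(_.trim)
  // Expected fields: user id, movie id, rating, timestamp (the timestamp is ignored).
  if (fields.length == 4) Some((fields(0).toInt, fields(1).toInt, fields(2).toFloat))
  else None
}

// A made-up line in the expected format.
val parsed = parseRating("1\t10\t4\t100")
```

Returning an `Option` keeps malformed lines out of the data without throwing, which mirrors the length check used when the file is read.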

## Jupyter Scala kernel
Add the MXNet Scala jar, which is built as part of the MXNet Scala package installation, to the classpath as follows:

**Note**: The process for adding this jar to the classpath can differ depending on the Scala kernel you are using.

This notebook was created with the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala).

```
classpath.addPath(<path_to_jar>)

// e.g.
classpath.addPath("mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar")
```

In [2]:
import ml.dmlc.mxnet._
import scala.util.Random
import scala.io.Source
import scala.collection.immutable.ListMap
import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable
import ml.dmlc.mxnet.optimizer.SGD
import ml.dmlc.mxnet.Callback.Speedometer
import ml.dmlc.mxnet.{DataBatch, DataIter, NDArray, Shape}


Then we define a data iterator, which returns a batch of tuples each time.

In [3]:
class BDataIter(filename: String, batch_size: Int) extends DataIter {

  val data = ArrayBuffer[(Float, Float, Float)]()
  for (line <- Source.fromFile(filename).getLines()) {
    val arr = line.split("\t").map(_.trim)
    if(arr.length == 4){
      data += ((arr(0).toFloat, arr(1).toFloat, arr(2).toFloat))
    }
  }

  val _provideData = ListMap("user" -> Shape(batch_size), "item" -> Shape(batch_size))
  val _provideLabel = ListMap("score" -> Shape(batch_size))

  private var k = 0

  override def next(): DataBatch = {
    if (!hasNext) throw new NoSuchElementException
    val users = ArrayBuffer[Float]()
    val items = ArrayBuffer[Float]()
    val scores = ArrayBuffer[Float]()
    for (i <- 0 until batch_size) {
      val j = k * batch_size + i
      val (user, item, score) = data(j)
      users += user
      items += item
      scores += score
    }
    k += 1
    // Pack the three columns into NDArrays of shape (batch_size).
    val data_all = Array(NDArray.array(users.toArray, shape = Shape(batch_size)),
      NDArray.array(items.toArray, shape = Shape(batch_size)))
    val label_all = Array(NDArray.array(scores.toArray, shape = Shape(batch_size)))

    new DataBatch(data = data_all, label = label_all, index = getIndex(), pad = getPad(),
      providedData = _provideData, providedLabel = _provideLabel)
  }

  /**
    * reset the iterator
    */
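  override def reset(): Unit = {
    // Random.shuffle returns a new collection, so copy the result back into data.
    val shuffled = scala.util.Random.shuffle(data)
    data.clear()
    data ++= shuffled
    k = 0
  }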

  override def hasNext: Boolean = {
    k < (data.length / batch_size)
  }

  override def batchSize: Int = batch_size

  override def getData(): IndexedSeq[NDArray] = IndexedSeq()

  override def getIndex(): IndexedSeq[Long] = IndexedSeq[Long]()

  override def getLabel(): IndexedSeq[NDArray] = IndexedSeq()

  override def getPad(): Int = 0

  override def provideData: ListMap[String, Shape] = _provideData

  override def provideLabel: ListMap[String, Shape] = _provideLabel
}

defined class BDataIter
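The batching arithmetic in `hasNext` and `next` can be sketched on a plain collection. The data and the names `ratings` and `demoBatchSize` below are made up for illustration. Note that only full batches are served, so any trailing tuples that do not fill a batch are skipped.

```scala
// Hypothetical stand-in for the parsed (user, item, rating) tuples.
val ratings = Vector.tabulate(10)(i => (i.toFloat, (i * 2).toFloat, 3.0f))
val demoBatchSize = 4

// Same arithmetic as hasNext: integer division drops the incomplete last batch.
val numBatches = ratings.length / demoBatchSize

// Same indexing as next: batch k covers positions k * size until (k + 1) * size.
val batches = (0 until numBatches).map { k =>
  ratings.slice(k * demoBatchSize, (k + 1) * demoBatchSize)
}
```

With 10 tuples and a batch size of 4, two batches are produced and the last two tuples are left out of the epoch.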

Now we provide a function to obtain the data iterator:

In [4]:
def getDataIter(batchSize: Int) = {
    // u1.base and u1.test are a predefined train/test split of MovieLens 100K.
    val dataTrain = new BDataIter("/Users/roshanin/ml-100k/u1.base", batchSize)
    val dataTest = new BDataIter("/Users/roshanin/ml-100k/u1.test", batchSize)
    (dataTrain, dataTest)
}

defined function getDataIter

Finally, we compute the largest user id and item id, each plus one, for later use as the sizes of the embedding tables.

In [5]:
def maxId(fname: String) = {
    var mu = 0
    var mi = 0
    for (line <- Source.fromFile(fname).getLines()) {
        val arr = line.split("\t").map(_.trim)
        if (arr.length == 4) {
            mu = mu max arr(0).toInt
            mi = mi max arr(1).toInt
        }
    }
    // Ids start at 1, so max id + 1 rows are needed to index an embedding table by id.
    (mu + 1, mi + 1)
}

val (mu, mi) = maxId("/Users/roshanin/ml-100k/u.data")

defined function maxId
mu: Int = 944
mi: Int = 1683

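To see why the sizes are the maximum id plus one, consider a lookup table indexed directly by id. The sketch below uses a plain array as a stand-in for an embedding table; the ids and the names `userIds`, `tableSize`, and `dim` are made up for illustration.

```scala
// Hypothetical ids, 1-based like the user and movie ids in the dataset.
val userIds = Seq(1, 3, 2, 3)

// Rows 0..max are needed so that the largest id is a valid row index; row 0 stays unused.
val tableSize = userIds.max + 1
val dim = 2
val table = Array.fill(tableSize, dim)(0.0f)

// Every id can now be used as a row index without going out of bounds.
val rows = userIds.map(id => table(id))
```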
## Optimization
We first implement the RMSE (root-mean-square error) metric, which is commonly used to evaluate matrix factorization models.

In [6]:
def RMSE(label: NDArray, pred: NDArray): Float = {
    val labelArr = label.toArray
    val predArr = pred.toArray

    // Accumulate the squared errors over the batch.
    var ret: Float = 0.0f
    var n: Float = 0.0f

    for (i <- labelArr.indices) {
        ret += (labelArr(i) - predArr(i)) * (labelArr(i) - predArr(i))
        n += 1.0f
    }
    Math.sqrt(ret / n).toFloat
}

defined function RMSE
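A small worked example may help make the metric concrete. The sketch below repeats the same computation on plain arrays, so it runs without MXNet; `rmseOf` is a hypothetical name and the numbers are made up.

```scala
// Plain-array version of the same computation (rmseOf is a hypothetical name).
def rmseOf(labels: Array[Float], preds: Array[Float]): Float = {
  require(labels.length == preds.length && labels.nonEmpty)
  val sumSq = labels.zip(preds).map { case (y, p) => (y - p) * (y - p) }.sum
  math.sqrt(sumSq / labels.length).toFloat
}

// Errors are -3 and -4, so the mean squared error is (9 + 16) / 2 = 12.5
// and the RMSE is sqrt(12.5), roughly 3.54.
val example = rmseOf(Array(1.0f, 2.0f), Array(4.0f, 6.0f))
```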

Then we define a general training module, which is borrowed from the image classification application.

In [7]:
def train(network: Symbol, batchSize: Int, numEpoch: Int, learningRate: Float): Unit = {
    val (trainIter, testIter) = getDataIter(batchSize)
    val evalMetric = new CustomMetric(RMSE, name = "rmse")
    
//     val model = FeedForward.newBuilder(network)
//       .setContext(Context.cpu(0))
//       .setNumEpoch(numEpoch)
//       .setOptimizer(new SGD(learningRate = learningRate, momentum = 0.9f, wd = 0.0001f))
//       .setTrainData(trainIter)
//       .setEvalMetric(evalMetric)
//       .setEvalData(testIter)
//       .setBatchEndCallback(new Speedometer(batchSize, 20000/batchSize))
//       .build()
    
    val model = new FeedForward(ctx = Context.gpu(0),
                                symbol = network,
                                numEpoch = numEpoch,
                                optimizer = new SGD(learningRate = learningRate, momentum = 0.9f, wd = 0.0001f))
    

      model.fit(trainData = trainIter,
                evalData = testIter,
                evalMetric = evalMetric,
                kvStore = null,
                batchEndCallback = new Speedometer(batchSize, 20000/batchSize),
               epochEndCallback = null) 
}

defined function train
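The optimizer above is configured with a learning rate, a momentum of 0.9, and a weight decay (`wd`) of 0.0001. As a rough guide to what these three knobs do, the sketch below shows a textbook form of one SGD step with momentum and weight decay on a single scalar parameter. It is an assumption for illustration; the library's exact update may differ in details such as gradient rescaling or clipping, and `sgdStep` is a hypothetical name.

```scala
// One textbook SGD-with-momentum step on a scalar parameter (illustrative only).
def sgdStep(weight: Float, grad: Float, velocity: Float,
            lr: Float, momentum: Float, wd: Float): (Float, Float) = {
  // Weight decay adds wd * weight to the gradient; momentum carries over past updates.
  val newVelocity = momentum * velocity - lr * (grad + wd * weight)
  (weight + newVelocity, newVelocity)
}

// Made-up numbers: with zero initial velocity and no weight decay,
// the step is simply -lr * grad.
val (newWeight, newVelocity) =
  sgdStep(weight = 1.0f, grad = 2.0f, velocity = 0.0f, lr = 0.5f, momentum = 0.9f, wd = 0.0f)
```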

## Networks
Now we try various networks. We first learn the latent vectors directly.

In [8]:
def plainNet(k: Int) ={
    // input
    val user = Symbol.Variable("user")
    val item = Symbol.Variable("item")
    val score = Symbol.Variable("score")
    // user feature lookup
    val user1 = Symbol.Embedding()()(Map("data" -> user, "input_dim" -> mu,
                                        "output_dim" -> k))
    // item feature lookup
    val item1 = Symbol.Embedding()()(Map("data" -> item, "input_dim" -> mi,
                                        "output_dim" -> k))
 
    // predict by the inner product, which is elementwise product and then sum
    
    val pred0 = user1 * item1
    
    val pred1 = Symbol.sum_axis()()(Map("data" -> pred0, "axis" -> 1))
    val pred2 = Symbol.Flatten()()(Map("data" -> pred1))
    // loss layer
    val pred = Symbol.LinearRegressionOutput()()(Map("data" -> pred2, "label" -> score))
    
    pred
}

train(plainNet(64), batchSize=64, numEpoch=10, learningRate=.05f)


log4j:WARN No appenders could be found for logger (MXNetJVM).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


defined function plainNet
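The prediction step in `plainNet`, an elementwise product followed by a sum over the feature axis, is a row-wise dot product. The sketch below shows the same arithmetic on plain arrays; the vectors are made up and stand in for the looked-up embeddings of a batch of two (user, item) pairs.

```scala
// Hypothetical batch of two users and two items with k = 3 latent features each.
val userVecs = Array(Array(1.0f, 2.0f, 3.0f), Array(0.5f, 0.0f, -1.0f))
val itemVecs = Array(Array(4.0f, 5.0f, 6.0f), Array(2.0f, 7.0f, 1.0f))

// Elementwise product, then a sum over the feature axis (axis 1): one score per pair.
val preds = userVecs.zip(itemVecs).map { case (u, v) =>
  u.zip(v).map { case (a, b) => a * b }.sum
}
```

The first pair scores 1*4 + 2*5 + 3*6 = 32 and the second scores 1 + 0 - 1 = 0.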

Next, we use a two-layer neural network to learn the latent variables by stacking a fully connected layer on top of each embedding layer:

In [9]:
def getOneLayerMlp(hidden: Int, k: Int) ={
    // input
    val user = Symbol.Variable("user")
    val item = Symbol.Variable("item")
    val score = Symbol.Variable("score")
    // user feature lookup
    val user1 = Symbol.Embedding()()(Map("data" -> user, "input_dim" -> mu,
                                        "output_dim" -> k))
    val user2 = Symbol.Activation()()(Map("data" -> user1, "act_type" -> "relu"))
    val user3 = Symbol.FullyConnected()()(Map("data" -> user2, "num_hidden" -> hidden))
                          
    // item feature lookup
    val item1 = Symbol.Embedding()()(Map("data" -> item, "input_dim" -> mi,
                                        "output_dim" -> k))
    val item2 = Symbol.Activation()()(Map("data" -> item1, "act_type" -> "relu"))
    val item3 = Symbol.FullyConnected()()(Map("data" -> item2, "num_hidden" -> hidden))
 
    // predict by the inner product
    
    val pred0 = user3 * item3
    
    val pred1 = Symbol.sum_axis()()(Map("data" -> pred0, "axis" -> 1))
    val pred2 = Symbol.Flatten()()(Map("data" -> pred1))
                          
    // loss layer
    val pred = Symbol.LinearRegressionOutput()()(Map("data" -> pred2, "label" -> score))
    pred
    
}

train(getOneLayerMlp(64,64), batchSize=64, numEpoch=10, learningRate=.005f)

defined function getOneLayerMlp
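For a single embedding vector, the extra layers in `getOneLayerMlp` apply a ReLU and then an affine map. The sketch below shows that arithmetic on plain arrays, assuming the usual convention that each output unit is the dot product of one weight row with the input plus a bias. The names `relu` and `dense` and all numbers are made up for illustration.

```scala
// ReLU: negative entries are clamped to zero.
def relu(x: Array[Float]): Array[Float] = x.map(v => math.max(v, 0.0f))

// Fully connected layer: one output per weight row, computed as row . x + bias.
def dense(x: Array[Float], w: Array[Array[Float]], b: Array[Float]): Array[Float] =
  w.zip(b).map { case (row, bias) =>
    row.zip(x).map { case (wi, xi) => wi * xi }.sum + bias
  }

// Hypothetical embedding with k = 2 mapped to 3 hidden units.
val embedding = Array(-1.0f, 2.0f)
val weights = Array(Array(1.0f, 1.0f), Array(0.5f, -1.0f), Array(2.0f, 0.0f))
val biases = Array(0.0f, 1.0f, -1.0f)
val hidden = dense(relu(embedding), weights, biases)
```

The user and item branches each produce such a hidden vector, and the prediction is again their dot product.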

Finally, we add dropout layers to reduce overfitting.

In [10]:
def getOneLayerDropoutMlp(hidden: Int, k: Int) ={
    // input
    val user = Symbol.Variable("user")
    val item = Symbol.Variable("item")
    val score = Symbol.Variable("score")
    // user feature lookup
    val user1 = Symbol.Embedding("user")()(Map("data" -> user, "input_dim" -> mu,
                                        "output_dim" -> k))
    val user2 = Symbol.Activation()()(Map("data" -> user1, "act_type" -> "relu"))
    val user3 = Symbol.FullyConnected()()(Map("data" -> user2, "num_hidden" -> hidden))
    val user4 = Symbol.Dropout()()(Map("data" -> user3, "p" -> 0.5f))
                          
    // item feature lookup
    val item1 = Symbol.Embedding("item")()(Map("data" -> item, "input_dim" -> mi,
                                        "output_dim" -> k))
    val item2 = Symbol.Activation()()(Map("data" -> item1, "act_type" -> "relu"))
    val item3 = Symbol.FullyConnected()()(Map("data" -> item2, "num_hidden" -> hidden))
    val item4 = Symbol.Dropout()()(Map("data" -> item3, "p" -> 0.5f))

    // predict by the inner product
    
    val pred0 = user4 * item4
    
    val pred1 = Symbol.sum_axis()()(Map("data" -> pred0, "axis" -> 1))
    val pred2 = Symbol.Flatten()()(Map("data" -> pred1))
                          
    // loss layer
    val pred = Symbol.LinearRegressionOutput()()(Map("data" -> pred2, "label" -> score))
    pred
    
}
                          
train(getOneLayerDropoutMlp(256, 512), batchSize=64, numEpoch=10, learningRate=.005f)

defined function getOneLayerDropoutMlp
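As a rough picture of what a dropout layer with `p = 0.5` does during training, the sketch below implements the common "inverted dropout" scheme on a plain array: each element is zeroed with probability p and the survivors are scaled by 1 / (1 - p) so the expected value is unchanged. This is an illustrative assumption about the technique, not the library's implementation, and `dropout` is a hypothetical name.

```scala
import scala.util.Random

// Inverted dropout on a plain array (training-time behaviour only).
def dropout(x: Array[Float], p: Float, rng: Random): Array[Float] = {
  val keepScale = 1.0f / (1.0f - p)
  // Zero each element with probability p; rescale the ones that are kept.
  x.map(v => if (rng.nextFloat() < p) 0.0f else v * keepScale)
}

// With p = 0.5, every output element is either 0 or twice the input.
val dropped = dropout(Array.fill(8)(1.0f), p = 0.5f, rng = new Random(42))
```

At prediction time dropout is normally disabled, so the layer passes its input through unchanged.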