<h1>Collaborative Filtering</h1>
<h2>
<ol>
<li>Collaborative filtering is commonly used for recommender systems.</li>
<li>The aim is to fill in the missing entries of a user-item association matrix (preferences, scores, ratings, ...).</li>
<li>Users and items (here, movies) are each described by a small set of latent factors, which are used to predict the missing entries.</li>
<li>The alternating least squares (ALS) algorithm is used to learn these latent factors.</li>
</ol>
</h2>
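The ALS idea in the list above can be sketched in plain Python, without Spark. This is a minimal illustration under our own assumptions, not Spark's implementation: ratings are a dict mapping (user, item) to a score, the helper names `solve`, `update`, `als` and `predict` are hypothetical, and `reg` is a fixed λ applied to every factor vector (Spark additionally scales λ by each user's or item's rating count). Each half-step fixes one side's factors and solves a small ridge-regression system for every vector on the other side, which minimizes Σ(r_ui − x_u·y_i)² + λ(Σ‖x_u‖² + Σ‖y_i‖²) over the observed entries only.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update(fixed, observed, rank, reg):
    """One ALS half-step: hold `fixed` factors constant and re-solve every vector on the other side.

    For a row with observed entries {col: r}, solve the ridge normal equations
        (sum_col y_col y_col^T + reg * I) x = sum_col r * y_col
    """
    solved = {}
    for row, entries in observed.items():
        a = [[reg if p == q else 0.0 for q in range(rank)] for p in range(rank)]
        b = [0.0] * rank
        for col, r in entries.items():
            y = fixed[col]
            for p in range(rank):
                b[p] += r * y[p]
                for q in range(rank):
                    a[p][q] += y[p] * y[q]
        solved[row] = solve(a, b)
    return solved


def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """Factorize a sparse {(user, item): rating} dict into user and item factor vectors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    # Random initial item factors; user factors are computed from them in the first sweep
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, reg)  # fix items, solve for users
        items = update(users, by_item, rank, reg)  # fix users, solve for items
    return users, items


def predict(users, items, u, i):
    """Predicted score for (u, i): dot product of the two latent factor vectors."""
    return sum(p * q for p, q in zip(users[u], items[i]))


if __name__ == "__main__":
    # Toy 4 x 4 rating matrix with four missing cells
    toy = {(0, 0): 5, (0, 1): 4, (0, 3): 1,
           (1, 0): 4, (1, 2): 1, (1, 3): 1,
           (2, 1): 1, (2, 2): 5, (2, 3): 4,
           (3, 0): 1, (3, 2): 4, (3, 3): 5}
    users, items = als(toy)
    print(predict(users, items, 0, 2))  # score for the missing cell (user 0, item 2)
```

Because each half-step solves its subproblem exactly, the objective never increases from one sweep to the next; the learned factors then give a score for every cell, including the ones that were never observed.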

<a href="https://grouplens.org/datasets/movielens/1m/">MovieLens 1M Rating Dataset</a>

In [1]:
# The code was removed by DSX for sharing; it must define path_1, the location of the ratings file read in the next cell.

In [2]:
rdd = sc.textFile(path_1)

In [3]:
rdd.take(1)

[u'1::1193::5::978300760']
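Each line of the raw file is one rating in the form UserID::MovieID::Rating::Timestamp, the layout described in the MovieLens 1M README (the timestamp is in seconds since the Unix epoch). The parsing done by the lambdas in the next cell can be checked on this sample line in plain Python; `Rating` and `parse_rating` are hypothetical names used only for this illustration, and the timestamp is converted with `int` because Python 3 has no separate `long` type.

```python
from collections import namedtuple

# Field order follows the ratings file layout: UserID::MovieID::Rating::Timestamp
Rating = namedtuple("Rating", ["userId", "movieId", "rating", "timestamp"])


def parse_rating(line):
    """Split one '::'-delimited line into typed fields, mirroring the ratingsRDD map."""
    user, movie, rating, ts = line.split("::")
    return Rating(int(user), int(movie), float(rating), int(ts))


print(parse_rating(u"1::1193::5::978300760"))
# → Rating(userId=1, movieId=1193, rating=5.0, timestamp=978300760)
```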

In [4]:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row

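# Each line has the form UserID::MovieID::Rating::Timestamp
parts = rdd.map(lambda row: row.split("::"))
# int() instead of long() so the cell also runs on Python 3 (Python 2 promotes large ints to long automatically)
ratingsRDD = parts.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                                     rating=float(p[2]), timestamp=int(p[3])))
ratings = sqlContext.createDataFrame(ratingsRDD)
# Hold out roughly 20% of the ratings for testing; randomSplit also accepts a seed for a reproducible split
(training, test) = ratings.randomSplit([0.8, 0.2])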
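`randomSplit` assigns each row independently, so a user or movie with few ratings can end up only in `test`. ALS learns no factors for such ids, and with Spark's default `coldStartStrategy="nan"` the model's `transform` returns NaN for those rows (Spark 2.2 and later also accept `"drop"`). A plain-Python sketch of how such rows could be identified, using hypothetical (userId, movieId) pairs rather than the real split; `cold_start_rows` is an illustrative helper, not a Spark API:

```python
def cold_start_rows(training_pairs, test_pairs):
    """Return the test (userId, movieId) pairs whose user or movie never occurs in training."""
    seen_users = {u for u, _ in training_pairs}
    seen_movies = {m for _, m in training_pairs}
    return [(u, m) for u, m in test_pairs
            if u not in seen_users or m not in seen_movies]


# Hypothetical ids, not taken from the dataset
train_pairs = [(1, 10), (2, 20)]
test_pairs = [(1, 20), (3, 10), (2, 30)]
print(cold_start_rows(train_pairs, test_pairs))  # → [(3, 10), (2, 30)]
```

User 3 and movie 30 never appear in the training pairs, so those two test rows would have no factors to score them.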

In [5]:
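# Build the recommendation model using ALS on the training data.
# maxIter: number of ALS sweeps; regParam: L2 regularization strength; rank keeps its default (10).
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating")
model = als.fit(training)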
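For explicit ratings such as these, each ALS half-step solves an independent regularized least-squares problem per user (and symmetrically per movie). A sketch of the user step in our own notation: y_i are the current movie factors, I_u is the set of movies user u rated, n_u = |I_u|, and λ is the `regParam` above. The λ n_u scaling follows the Spark MLlib guide's description of how it scales the regularization parameter ("ALS-WR"); treat it as a summary of that description rather than of Spark's source code.

```latex
\min_{x_u}\; \sum_{i \in I_u} \left(r_{ui} - x_u^{\top} y_i\right)^2 + \lambda\, n_u \lVert x_u \rVert^2
\quad\Longrightarrow\quad
\left(\sum_{i \in I_u} y_i y_i^{\top} + \lambda\, n_u I\right) x_u = \sum_{i \in I_u} r_{ui}\, y_i
```

Setting the gradient with respect to x_u to zero gives the linear system on the right; with `maxIter=5`, the user step and the analogous movie step alternate five times.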

In [6]:
import pixiedust
display(training)

movieId,rating,timestamp,userId
1,1.0,973215902,2744
1,1.0,974643004,2534
1,1.0,974771212,1356
1,1.0,974785296,1314
1,2.0,970461364,3026
1,2.0,972057552,2888
1,2.0,972624201,2823
1,2.0,974651863,2226
1,2.0,974704143,1735
1,2.0,974709363,1776


In [10]:
training.limit(2).show()

+-------+------+---------+------+
|movieId|rating|timestamp|userId|
+-------+------+---------+------+
|      1|   1.0|974643004|  2534|
|      1|   1.0|974675906|  2015|
+-------+------+---------+------+



In [7]:
model.transform(test.limit(2)).show()

+-------+------+---------+------+----------+
|movieId|rating|timestamp|userId|prediction|
+-------+------+---------+------+----------+
|      1|   1.0|963416113|  5397| 3.5368996|
|      1|   1.0|966381072|  3942| 2.7876115|
+-------+------+---------+------+----------+
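Cell 4 imports `RegressionEvaluator` but never uses it. In Spark the usual next step would be something like `RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction").evaluate(predictions)` on `predictions = model.transform(test)`, after dropping NaN rows (for example with `predictions.na.drop()`), because a single NaN prediction makes the metric NaN. What that metric computes can be sketched in plain Python; `rmse` is a hypothetical helper, and the two pairs are simply the rows printed above, far too few to judge the model:

```python
import math


def rmse(rows):
    """Root-mean-square error over (rating, prediction) pairs, skipping NaN predictions."""
    errors = [(r - p) ** 2 for r, p in rows if not math.isnan(p)]
    if not errors:
        return float("nan")  # nothing left to score
    return math.sqrt(sum(errors) / len(errors))


# The two (rating, prediction) rows printed by model.transform(test.limit(2)).show()
shown = [(1.0, 3.5368996), (1.0, 2.7876115)]
print(rmse(shown))
```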

