# MovieLens Recommender System Using ALS

## Dataset source:
### MovieLens Latest Datasets: https://grouplens.org/datasets/movielens/latest/

***IMPORTANT NOTE:*** The file containing this notebook is meant to be unzipped/placed in your "C:/BigData/~notebookJupyter/" library. If this is not the case, you will either need to alter the filepath (dirPath) below or move the folder into the correct directory.

## Launch Spark Session on Jupyter

In [1]:
#********PySpark connect code start********#
import findspark
findspark.init()
import pyspark
from pyspark import SparkContext
from pyspark import SQLContext
from pyspark.sql import SparkSession
sc = SparkContext.getOrCreate()
spark = SparkSession.builder.getOrCreate()
print(sc.version)
print(spark.version)
#********PySpark connect code end********#

2.4.8
2.4.8


In [2]:
#import pyspark.pandas as ps
#pyspark.sql.DataFrame.to_pandas_on_spark
import pandas as ps
from pyspark.sql import functions as F
from pyspark.sql.functions import udf,col, when
from pyspark.sql import SQLContext
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.sql.types import IntegerType
import numpy as np
from IPython.display import Image
from IPython.display import display

import matplotlib.pyplot as plt

In [3]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

In [4]:
sc =spark.sparkContext
sqlContext = SQLContext(sc)

## Dataset Preprocessing and Preparation:##

The MovieLens DataSet is contained in 4 CSV files:

1.   ratings.csv - Dataset ratings of movies (represented by a unique movieID) by every user (each represented by a unique userID)
2.   movies.csv - DataSet list of movies by title, associated movieID and the genres of each movie  
3.   tags.csv - Tags attributed to each movie by individual users  
4.   links.csv - Identifiers that can be used to link to other sources of movie data


Here we load the data into dataframes:

In [5]:
#**************************************************************************
# Infere CSV schema
#**************************************************************************

#TODO: NEED TO CHANGE UP THE DIRECTORY SO THAT NOTEBOOK CAN RUN INSTANTANEOUSLY WITHOUT EDITING THE PATH FOR EACH INDIVIDUAL USER
#dirPath = "C:/BigData/~Data/movieLensData/ml-latest-small/ml-latest-small/"
dirPath = "C:/BigData/~notebookJupyter/ece552project_team03/movieLensData/ml-latest-small/ml-latest-small/"


tags_fileName = "tags.csv"
movies_fileName = "movies.csv"
ratings_fileName = "ratings.csv"
links_fileName = "links.csv"



tags_df = spark.read.format("csv"). \
            option("header", "true").option("mode", "DROPMALFORMED").option("delimiter", ","). \
            option("ignoreLeadingWhiteSpace","true").option("ignoreTrailingWhiteSpace","true"). \
            option("inferschema","true"). \
            load(dirPath + tags_fileName)
        
print(tags_df.count())
tags_df.printSchema()


movies_df = spark.read.format("csv"). \
            option("header", "true").option("mode", "DROPMALFORMED").option("delimiter", ","). \
            option("ignoreLeadingWhiteSpace","true").option("ignoreTrailingWhiteSpace","true"). \
            option("inferschema","true"). \
            load(dirPath + movies_fileName)
        
print(movies_df.count())
movies_df.printSchema()


rating_df = spark.read.format("csv"). \
            option("header", "true").option("mode", "DROPMALFORMED").option("delimiter", ","). \
            option("ignoreLeadingWhiteSpace","true").option("ignoreTrailingWhiteSpace","true"). \
            option("inferschema","true"). \
            load(dirPath + ratings_fileName)
        
print(rating_df.count())
rating_df.printSchema()

links_df = spark.read.format("csv"). \
            option("header", "true").option("mode", "DROPMALFORMED").option("delimiter", ","). \
            option("ignoreLeadingWhiteSpace","true").option("ignoreTrailingWhiteSpace","true"). \
            option("inferschema","true"). \
            load(dirPath + links_fileName)
        
print(links_df.count())
links_df.printSchema()

3683
root
 |-- userId: integer (nullable = true)
 |-- movieId: integer (nullable = true)
 |-- tag: string (nullable = true)
 |-- timestamp: integer (nullable = true)

9742
root
 |-- movieId: integer (nullable = true)
 |-- title: string (nullable = true)
 |-- genres: string (nullable = true)

100836
root
 |-- userId: integer (nullable = true)
 |-- movieId: integer (nullable = true)
 |-- rating: double (nullable = true)
 |-- timestamp: integer (nullable = true)

9742
root
 |-- movieId: integer (nullable = true)
 |-- imdbId: integer (nullable = true)
 |-- tmdbId: integer (nullable = true)



In [6]:
%%time
#**************************************************************************
# Write PARQUET files
#**************************************************************************
tags_df.coalesce(10).write.mode('overwrite').parquet(dirPath + "tags_df" + ".parquet")

movies_df.coalesce(10).write.mode('overwrite').parquet(dirPath + "movies_df" + ".parquet")

rating_df.coalesce(10).write.mode('overwrite').parquet(dirPath + "ratings_df" + ".parquet")

links_df.coalesce(10).write.mode('overwrite').parquet(dirPath + "links_df" + ".parquet")

Wall time: 6.18 s


In [7]:
%%time
#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "tags_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

+------+-------+-----------------+----------+
|userId|movieId|              tag| timestamp|
+------+-------+-----------------+----------+
|     2|  60756|            funny|1445714994|
|     2|  60756|  Highly quotable|1445714996|
|     2|  60756|     will ferrell|1445714992|
|     2|  89774|     Boxing story|1445715207|
|     2|  89774|              MMA|1445715200|
|     2|  89774|        Tom Hardy|1445715205|
|     2| 106782|            drugs|1445715054|
|     2| 106782|Leonardo DiCaprio|1445715051|
|     2| 106782|  Martin Scorsese|1445715056|
|     7|  48516|     way too long|1169687325|
+------+-------+-----------------+----------+
only showing top 10 rows

Wall time: 1.1 s


In [8]:
%%time
#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "movies_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

+-------+--------------------+--------------------+
|movieId|               title|              genres|
+-------+--------------------+--------------------+
|      1|    Toy Story (1995)|Adventure|Animati...|
|      2|      Jumanji (1995)|Adventure|Childre...|
|      3|Grumpier Old Men ...|      Comedy|Romance|
|      4|Waiting to Exhale...|Comedy|Drama|Romance|
|      5|Father of the Bri...|              Comedy|
|      6|         Heat (1995)|Action|Crime|Thri...|
|      7|      Sabrina (1995)|      Comedy|Romance|
|      8| Tom and Huck (1995)|  Adventure|Children|
|      9| Sudden Death (1995)|              Action|
|     10|    GoldenEye (1995)|Action|Adventure|...|
+-------+--------------------+--------------------+
only showing top 10 rows

Wall time: 559 ms


In [9]:
%%time
#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "ratings_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

+------+-------+------+---------+
|userId|movieId|rating|timestamp|
+------+-------+------+---------+
|     1|      1|   4.0|964982703|
|     1|      3|   4.0|964981247|
|     1|      6|   4.0|964982224|
|     1|     47|   5.0|964983815|
|     1|     50|   5.0|964982931|
|     1|     70|   3.0|964982400|
|     1|    101|   5.0|964980868|
|     1|    110|   4.0|964982176|
|     1|    151|   5.0|964984041|
|     1|    157|   5.0|964984100|
+------+-------+------+---------+
only showing top 10 rows

Wall time: 575 ms


In [10]:
%%time
#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "links_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

+-------+------+------+
|movieId|imdbId|tmdbId|
+-------+------+------+
|      1|114709|   862|
|      2|113497|  8844|
|      3|113228| 15602|
|      4|114885| 31357|
|      5|113041| 11862|
|      6|113277|   949|
|      7|114319| 11860|
|      8|112302| 45325|
|      9|114576|  9091|
|     10|113189|   710|
+-------+------+------+
only showing top 10 rows

Wall time: 385 ms


In [11]:
rating_df = rating_df.withColumn("userId", rating_df["userId"].cast(IntegerType()))
rating_df = rating_df.withColumn("movieId", rating_df["movieId"].cast(IntegerType()))
rating_df = rating_df.withColumn("rating", rating_df["rating"].cast(IntegerType()))

In [12]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
rating_df.coalesce(10).write.mode('overwrite').parquet(dirPath + "ratings_df" + ".parquet")

Wall time: 836 ms


In [13]:
ratings = rating_df.rdd

numRatings = ratings.count()
numUsers = ratings.map(lambda r: r[0]).distinct().count()
numMovies = ratings.map(lambda r: r[1]).distinct().count()

print("Got %d ratings from %d users on %d movies." % (numRatings, numUsers, numMovies))

Got 100836 ratings from 610 users on 9724 movies.


In [14]:
movies_counts = rating_df.groupBy(col("movieId")).agg(F.count(col("rating")).alias("counts"))
#movies_counts.show()

In [15]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
movies_counts.coalesce(10).write.mode('overwrite').parquet(dirPath + "movies_counts_df" + ".parquet")


#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "movies_counts_df" + ".parquet")
parquetdf.count()
parquetdf.show()

+-------+------+
|movieId|counts|
+-------+------+
|   1580|   165|
|   2366|    25|
|   3175|    75|
|   1088|    42|
|  32460|     4|
|  44022|    23|
|  96488|     4|
|   1238|     9|
|   1342|    11|
|   1591|    26|
|   1645|    51|
|   4519|     9|
|   2142|    10|
|    471|    40|
|   3997|    12|
|    833|     6|
|   3918|     9|
|   7982|     4|
|   1959|    15|
|  68135|    10|
+-------+------+
only showing top 20 rows

Wall time: 3.96 s


## Alternating Least Squares (ALS) Matrix Factorization Model Training

### Train/Test Split

In [16]:
train_df, test_df = rating_df.randomSplit([0.7, 0.3])

In [17]:
iterations = 10
regularization_parameter = 0.1
rank = 4
errors = []
err = 0


In [18]:
als = ALS(maxIter=iterations, regParam=regularization_parameter, rank=4, userCol="userId", itemCol="movieId", ratingCol="rating")
model = als.fit(train_df)
predictions = model.transform(test_df)
new_predictions= predictions.filter(col('prediction') !=np.nan)
evaluator =RegressionEvaluator(metricName = "rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(new_predictions)
print("Root-mean-square error = " + str(rmse))

Root-mean-square error = 0.9283467539309946


In [19]:
als = ALS(maxIter=iterations, regParam=regularization_parameter, rank=5, userCol="userId", itemCol="movieId", ratingCol="rating")
model = als.fit(train_df)
predictions = model.transform(test_df)
new_predictions= predictions.filter(col('prediction') !=np.nan)
evaluator =RegressionEvaluator(metricName = "rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(new_predictions)
print("Root-mean-square error = " + str(rmse))

Root-mean-square error = 0.9356367649538887


## Cross Validation with Grid Search:

In [20]:
for rank in range(4,10):
    als = ALS(maxIter=iterations, regParam=regularization_parameter, rank=rank, userCol="userId", itemCol="movieId", ratingCol="rating")
    model = als.fit(train_df)
    predictions = model.transform(test_df)
    new_predictions= predictions.filter(col('prediction') !=np.nan)
    evaluator =RegressionEvaluator(metricName = "rmse", labelCol="rating", predictionCol="prediction")
    rmse = evaluator.evaluate(new_predictions)
    print("Rank ", rank,": Root-mean-square error= " + str(rmse))

Rank  4 : Root-mean-square error= 0.9283467539309946
Rank  5 : Root-mean-square error= 0.9356367649538886
Rank  6 : Root-mean-square error= 0.9360138989737676
Rank  7 : Root-mean-square error= 0.9362079402234419
Rank  8 : Root-mean-square error= 0.9345221046097092
Rank  9 : Root-mean-square error= 0.9377411855562303


Now we will build a Cross Validator to evaluate over the same range of ranks with different regularization constants. We will use sparks Cross Validator class. The CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator. 

The Estimator we will be using is ALS, same as above. We use a ParamGridBuilder to construct a grid of parameters to search over. With 3 values for als.regParam and 6 values for als.rank, this grid will have 3 x 6 = 18 parameter settings for CrossValidator to choose from.

In [21]:
als = ALS(maxIter=iterations, regParam=regularization_parameter, rank=5, userCol="userId", itemCol="movieId", ratingCol="rating")
paramGrid = ParamGridBuilder() \
     .addGrid(als.regParam, [0.1,0.01,0.18]) \
     .addGrid(als.rank, range(4, 10)) \
     .build()
evaluator = RegressionEvaluator(metricName= "rmse", labelCol ="rating", predictionCol="prediction")
crossval = CrossValidator(estimator = als,
                         estimatorParamMaps = paramGrid,
                         evaluator = evaluator,
                         numFolds= 5)
print(crossval)
cvModel = crossval.fit(train_df)

CrossValidator_f5bf04a4b86b


In [22]:
print ("Number of models tested: ", len(paramGrid))

Number of models tested:  18


In [23]:
print(crossval)

CrossValidator_f5bf04a4b86b


In [24]:
best_model = cvModel.bestModel
best_pred = best_model.transform(test_df)
best_pred = best_pred.filter(col('prediction') != np.nan)

In [25]:
print("**Best Model**")
# Print "Rank"
print("  Rank:", best_model._java_obj.parent().getRank())
# Print "MaxIter"
print("  MaxIter:", best_model._java_obj.parent().getMaxIter())
# Print "RegParam"
print("  RegParam:", best_model._java_obj.parent().getRegParam())

**Best Model**
  Rank: 4
  MaxIter: 10
  RegParam: 0.1


In [26]:
# View the predictions
prediction1 = best_model.transform(test_df)
prediction1 = prediction1.filter(col('prediction') != np.nan)
selected = prediction1.select("userId", "movieId", "prediction")
for row in selected.collect():
    print(row)

Row(userId=597, movieId=471, prediction=3.9220547676086426)
Row(userId=436, movieId=471, prediction=4.000889778137207)
Row(userId=409, movieId=471, prediction=4.1340813636779785)
Row(userId=599, movieId=471, prediction=2.456048011779785)
Row(userId=218, movieId=471, prediction=3.017045497894287)
Row(userId=57, movieId=471, prediction=3.0884809494018555)
Row(userId=610, movieId=471, prediction=3.471186876296997)
Row(userId=555, movieId=471, prediction=3.3653666973114014)
Row(userId=136, movieId=471, prediction=4.5364484786987305)
Row(userId=273, movieId=471, prediction=4.498664855957031)
Row(userId=287, movieId=471, prediction=2.4648799896240234)
Row(userId=32, movieId=471, prediction=4.199872016906738)
Row(userId=414, movieId=471, prediction=3.665874481201172)
Row(userId=608, movieId=471, prediction=1.7106698751449585)
Row(userId=357, movieId=471, prediction=3.5380804538726807)
Row(userId=44, movieId=833, prediction=3.4533333778381348)
Row(userId=599, movieId=833, prediction=1.74677097

Row(userId=6, movieId=588, prediction=3.9480605125427246)
Row(userId=86, movieId=588, prediction=3.6615843772888184)
Row(userId=474, movieId=588, prediction=3.4922561645507812)
Row(userId=142, movieId=588, prediction=3.8725905418395996)
Row(userId=40, movieId=588, prediction=4.2698822021484375)
Row(userId=500, movieId=588, prediction=3.076228141784668)
Row(userId=94, movieId=588, prediction=3.2699010372161865)
Row(userId=402, movieId=588, prediction=4.151436805725098)
Row(userId=57, movieId=588, prediction=3.4900662899017334)
Row(userId=235, movieId=588, prediction=3.954099416732788)
Row(userId=554, movieId=588, prediction=4.310524940490723)
Row(userId=484, movieId=588, prediction=3.8906397819519043)
Row(userId=590, movieId=588, prediction=3.444380044937134)
Row(userId=374, movieId=588, prediction=3.523249626159668)
Row(userId=586, movieId=588, prediction=3.7391929626464844)
Row(userId=538, movieId=588, prediction=4.283202171325684)
Row(userId=583, movieId=588, prediction=3.51533818244

Row(userId=28, movieId=64983, prediction=2.417853832244873)
Row(userId=177, movieId=64983, prediction=2.723506212234497)
Row(userId=220, movieId=64983, prediction=3.3151416778564453)
Row(userId=448, movieId=66297, prediction=3.5736312866210938)
Row(userId=298, movieId=66297, prediction=2.6900389194488525)
Row(userId=249, movieId=73106, prediction=1.576824426651001)
Row(userId=89, movieId=73106, prediction=1.3586711883544922)
Row(userId=103, movieId=140715, prediction=4.0758843421936035)
Row(userId=111, movieId=140715, prediction=3.7329704761505127)
Row(userId=495, movieId=140715, prediction=4.248971939086914)
Row(userId=414, movieId=140715, prediction=4.105321407318115)
Row(userId=111, movieId=148626, prediction=3.365220546722412)
Row(userId=318, movieId=148626, prediction=3.3951661586761475)
Row(userId=462, movieId=148626, prediction=3.6970582008361816)
Row(userId=564, movieId=148626, prediction=2.793900966644287)
Row(userId=50, movieId=148626, prediction=3.026733160018921)
Row(userId

Row(userId=474, movieId=8008, prediction=3.7985763549804688)
Row(userId=605, movieId=8008, prediction=3.9467005729675293)
Row(userId=594, movieId=8132, prediction=4.418618679046631)
Row(userId=129, movieId=8132, prediction=4.541925430297852)
Row(userId=474, movieId=32584, prediction=2.3693737983703613)
Row(userId=387, movieId=41716, prediction=2.888760566711426)
Row(userId=219, movieId=41716, prediction=2.8329660892486572)
Row(userId=414, movieId=41716, prediction=3.6428492069244385)
Row(userId=599, movieId=45440, prediction=2.8352794647216797)
Row(userId=288, movieId=45440, prediction=3.3978230953216553)
Row(userId=480, movieId=45440, prediction=3.50984263420105)
Row(userId=600, movieId=45440, prediction=3.0698299407958984)
Row(userId=387, movieId=54785, prediction=2.4463157653808594)
Row(userId=610, movieId=54785, prediction=3.227867364883423)
Row(userId=307, movieId=54785, prediction=2.4191555976867676)
Row(userId=483, movieId=103655, prediction=2.784223794937134)
Row(userId=305, mo

Row(userId=361, movieId=1210, prediction=3.374403953552246)
Row(userId=82, movieId=1210, prediction=3.7068519592285156)
Row(userId=364, movieId=1210, prediction=4.262847900390625)
Row(userId=408, movieId=1210, prediction=4.468330383300781)
Row(userId=125, movieId=1210, prediction=3.781411647796631)
Row(userId=95, movieId=1210, prediction=4.407008171081543)
Row(userId=226, movieId=1210, prediction=3.9552810192108154)
Row(userId=414, movieId=1210, prediction=3.9610862731933594)
Row(userId=334, movieId=1210, prediction=3.211613893508911)
Row(userId=200, movieId=1210, prediction=4.182580947875977)
Row(userId=254, movieId=1210, prediction=3.8798680305480957)
Row(userId=303, movieId=1210, prediction=4.562747001647949)
Row(userId=33, movieId=1210, prediction=3.661745548248291)
Row(userId=68, movieId=1210, prediction=3.6011576652526855)
Row(userId=71, movieId=1210, prediction=4.033846378326416)
Row(userId=567, movieId=1210, prediction=2.4063665866851807)
Row(userId=149, movieId=1210, predictio

Row(userId=610, movieId=45499, prediction=3.117910385131836)
Row(userId=380, movieId=45499, prediction=3.4379334449768066)
Row(userId=7, movieId=45499, prediction=2.600491762161255)
Row(userId=448, movieId=45499, prediction=2.437244415283203)
Row(userId=356, movieId=45499, prediction=3.18093204498291)
Row(userId=73, movieId=45499, prediction=3.389127254486084)
Row(userId=480, movieId=45499, prediction=2.652419328689575)
Row(userId=608, movieId=45499, prediction=3.411179542541504)
Row(userId=200, movieId=45499, prediction=3.455186605453491)
Row(userId=18, movieId=45499, prediction=3.066396713256836)
Row(userId=232, movieId=49396, prediction=3.0009782314300537)
Row(userId=222, movieId=60950, prediction=3.3062710762023926)
Row(userId=241, movieId=60950, prediction=3.189037799835205)
Row(userId=153, movieId=60950, prediction=1.8914694786071777)
Row(userId=105, movieId=69529, prediction=4.158350944519043)
Row(userId=483, movieId=96448, prediction=3.0110111236572266)
Row(userId=414, movieId=

Row(userId=368, movieId=1944, prediction=3.134256601333618)
Row(userId=465, movieId=1944, prediction=4.3946051597595215)
Row(userId=255, movieId=2064, prediction=3.1235368251800537)
Row(userId=597, movieId=2064, prediction=4.459273815155029)
Row(userId=253, movieId=2064, prediction=4.168330669403076)
Row(userId=599, movieId=2064, prediction=2.991394281387329)
Row(userId=603, movieId=2064, prediction=3.3478450775146484)
Row(userId=182, movieId=2064, prediction=3.866982936859131)
Row(userId=245, movieId=2064, prediction=1.7053887844085693)
Row(userId=477, movieId=2064, prediction=4.181493759155273)
Row(userId=294, movieId=2064, prediction=3.086059808731079)
Row(userId=357, movieId=2064, prediction=3.8810153007507324)
Row(userId=275, movieId=2064, prediction=4.605453014373779)
Row(userId=555, movieId=2549, prediction=2.929687976837158)
Row(userId=182, movieId=3045, prediction=3.0194952487945557)
Row(userId=253, movieId=4642, prediction=4.663304328918457)
Row(userId=307, movieId=4642, pred

Row(userId=413, movieId=54286, prediction=3.9937992095947266)
Row(userId=18, movieId=54286, prediction=3.5642032623291016)
Row(userId=393, movieId=54286, prediction=4.060234069824219)
Row(userId=537, movieId=104218, prediction=3.708853244781494)
Row(userId=365, movieId=111617, prediction=2.7715601921081543)
Row(userId=227, movieId=117533, prediction=3.8282086849212646)
Row(userId=525, movieId=117533, prediction=3.328228235244751)
Row(userId=111, movieId=149352, prediction=3.267089366912842)
Row(userId=249, movieId=149352, prediction=2.227867841720581)
Row(userId=372, movieId=259, prediction=2.5643460750579834)
Row(userId=19, movieId=512, prediction=2.105414390563965)
Row(userId=186, movieId=512, prediction=3.0367112159729004)
Row(userId=587, movieId=2054, prediction=2.961883544921875)
Row(userId=103, movieId=2054, prediction=2.8886401653289795)
Row(userId=330, movieId=2054, prediction=2.3210768699645996)
Row(userId=233, movieId=2054, prediction=2.1297054290771484)
Row(userId=474, movie

Row(userId=387, movieId=1089, prediction=3.771610975265503)
Row(userId=441, movieId=1089, prediction=4.423815727233887)
Row(userId=171, movieId=1089, prediction=5.089029788970947)
Row(userId=239, movieId=1089, prediction=4.232814311981201)
Row(userId=312, movieId=1089, prediction=4.529376983642578)
Row(userId=63, movieId=1089, prediction=4.022348403930664)
Row(userId=261, movieId=1089, prediction=4.199760437011719)
Row(userId=82, movieId=1089, prediction=3.2228569984436035)
Row(userId=62, movieId=1089, prediction=4.423483371734619)
Row(userId=219, movieId=1089, prediction=3.8402066230773926)
Row(userId=95, movieId=1089, prediction=4.576077938079834)
Row(userId=433, movieId=1089, prediction=4.154897689819336)
Row(userId=254, movieId=1089, prediction=4.235363483428955)
Row(userId=561, movieId=1089, prediction=3.791887044906616)
Row(userId=198, movieId=1089, prediction=3.8608248233795166)
Row(userId=317, movieId=1089, prediction=4.158473014831543)
Row(userId=42, movieId=1089, prediction=4

Row(userId=560, movieId=377, prediction=2.9998722076416016)
Row(userId=415, movieId=377, prediction=3.167133092880249)
Row(userId=425, movieId=377, prediction=3.0108413696289062)
Row(userId=347, movieId=377, prediction=3.6430728435516357)
Row(userId=607, movieId=377, prediction=3.507399559020996)
Row(userId=43, movieId=377, prediction=4.777223587036133)
Row(userId=179, movieId=377, prediction=3.833561658859253)
Row(userId=381, movieId=377, prediction=3.170708179473877)
Row(userId=307, movieId=377, prediction=2.044984817504883)
Row(userId=173, movieId=377, prediction=3.261913299560547)
Row(userId=220, movieId=377, prediction=3.7635693550109863)
Row(userId=8, movieId=377, prediction=3.448695659637451)
Row(userId=176, movieId=377, prediction=4.163736820220947)
Row(userId=380, movieId=377, prediction=3.9546165466308594)
Row(userId=446, movieId=377, prediction=2.9916272163391113)
Row(userId=84, movieId=377, prediction=3.2833194732666016)
Row(userId=453, movieId=377, prediction=3.89924597740

Row(userId=80, movieId=4226, prediction=4.510221004486084)
Row(userId=351, movieId=4226, prediction=3.7788279056549072)
Row(userId=480, movieId=4226, prediction=3.782992124557495)
Row(userId=219, movieId=4226, prediction=3.698932409286499)
Row(userId=464, movieId=4226, prediction=4.123528480529785)
Row(userId=490, movieId=4226, prediction=3.4645802974700928)
Row(userId=414, movieId=4226, prediction=4.209451198577881)
Row(userId=105, movieId=4226, prediction=4.220125675201416)
Row(userId=204, movieId=4226, prediction=4.396629810333252)
Row(userId=369, movieId=4226, prediction=3.619316816329956)
Row(userId=123, movieId=4226, prediction=3.8600335121154785)
Row(userId=199, movieId=4226, prediction=3.655200242996216)
Row(userId=135, movieId=4226, prediction=4.496586322784424)
Row(userId=522, movieId=4226, prediction=3.8477835655212402)
Row(userId=215, movieId=4226, prediction=3.7527518272399902)
Row(userId=551, movieId=4226, prediction=3.4204981327056885)
Row(userId=4, movieId=4765, predict

Row(userId=414, movieId=3257, prediction=2.5596325397491455)
Row(userId=68, movieId=3257, prediction=2.7565855979919434)
Row(userId=42, movieId=3257, prediction=2.9366543292999268)
Row(userId=606, movieId=3655, prediction=3.762089252471924)
Row(userId=462, movieId=5735, prediction=1.4843950271606445)
Row(userId=305, movieId=8340, prediction=3.979907274246216)
Row(userId=346, movieId=8507, prediction=2.8217923641204834)
Row(userId=177, movieId=8507, prediction=2.630134105682373)
Row(userId=474, movieId=8507, prediction=2.6348443031311035)
Row(userId=462, movieId=8507, prediction=2.6835882663726807)
Row(userId=600, movieId=8507, prediction=2.4515492916107178)
Row(userId=474, movieId=8614, prediction=2.8196232318878174)
Row(userId=448, movieId=8614, prediction=2.940416097640991)
Row(userId=156, movieId=8614, prediction=2.6342344284057617)
Row(userId=226, movieId=8614, prediction=2.3264856338500977)
Row(userId=28, movieId=53972, prediction=2.60626482963562)
Row(userId=274, movieId=53972, p

Row(userId=521, movieId=43, prediction=3.3766555786132812)
Row(userId=31, movieId=1136, prediction=4.4803385734558105)
Row(userId=137, movieId=1136, prediction=3.8774499893188477)
Row(userId=385, movieId=1136, prediction=4.09946870803833)
Row(userId=76, movieId=1136, prediction=2.7105906009674072)
Row(userId=606, movieId=1136, prediction=4.012940883636475)
Row(userId=128, movieId=1136, prediction=4.234559059143066)
Row(userId=330, movieId=1136, prediction=3.8625290393829346)
Row(userId=372, movieId=1136, prediction=3.954820156097412)
Row(userId=232, movieId=1136, prediction=3.6282875537872314)
Row(userId=367, movieId=1136, prediction=4.361037731170654)
Row(userId=599, movieId=1136, prediction=3.211760997772217)
Row(userId=348, movieId=1136, prediction=4.893227577209473)
Row(userId=328, movieId=1136, prediction=3.9745635986328125)
Row(userId=57, movieId=1136, prediction=4.178310394287109)
Row(userId=560, movieId=1136, prediction=3.939448595046997)
Row(userId=425, movieId=1136, predictio

Row(userId=610, movieId=7022, prediction=4.248342514038086)
Row(userId=380, movieId=7022, prediction=4.137282371520996)
Row(userId=87, movieId=7022, prediction=3.8922767639160156)
Row(userId=63, movieId=7022, prediction=4.2127509117126465)
Row(userId=361, movieId=7022, prediction=2.107816457748413)
Row(userId=249, movieId=7022, prediction=3.9540164470672607)
Row(userId=125, movieId=7022, prediction=3.675691604614258)
Row(userId=105, movieId=7022, prediction=3.645071506500244)
Row(userId=298, movieId=7022, prediction=3.478316307067871)
Row(userId=260, movieId=7022, prediction=3.2378618717193604)
Row(userId=562, movieId=7022, prediction=3.5577006340026855)
Row(userId=187, movieId=7022, prediction=4.268824577331543)
Row(userId=580, movieId=7254, prediction=3.582911252975464)
Row(userId=417, movieId=7254, prediction=4.311038494110107)
Row(userId=599, movieId=7254, prediction=2.7043240070343018)
Row(userId=111, movieId=7254, prediction=3.4727468490600586)
Row(userId=380, movieId=7254, predi

Row(userId=182, movieId=4545, prediction=3.1764214038848877)
Row(userId=448, movieId=4545, prediction=2.6203160285949707)
Row(userId=543, movieId=4545, prediction=5.127169609069824)
Row(userId=414, movieId=4545, prediction=3.536177396774292)
Row(userId=608, movieId=4545, prediction=4.613227367401123)
Row(userId=369, movieId=4545, prediction=3.8849449157714844)
Row(userId=68, movieId=4545, prediction=3.5267505645751953)
Row(userId=561, movieId=4545, prediction=3.1529555320739746)
Row(userId=186, movieId=4545, prediction=4.226110935211182)
Row(userId=474, movieId=4809, prediction=3.3869998455047607)
Row(userId=510, movieId=4809, prediction=3.534306764602661)
Row(userId=182, movieId=5283, prediction=2.947944164276123)
Row(userId=564, movieId=5283, prediction=2.985560178756714)
Row(userId=387, movieId=5283, prediction=2.6158945560455322)
Row(userId=610, movieId=5283, prediction=2.70816969871521)
Row(userId=45, movieId=5283, prediction=3.1607162952423096)
Row(userId=427, movieId=5283, predi

Row(userId=292, movieId=2716, prediction=3.0876879692077637)
Row(userId=96, movieId=2716, prediction=3.9261767864227295)
Row(userId=19, movieId=2716, prediction=3.1781346797943115)
Row(userId=299, movieId=2716, prediction=3.81369948387146)
Row(userId=489, movieId=2716, prediction=2.666221857070923)
Row(userId=202, movieId=2716, prediction=3.9373230934143066)
Row(userId=441, movieId=2716, prediction=4.333385944366455)
Row(userId=307, movieId=2716, prediction=2.937577247619629)
Row(userId=380, movieId=2716, prediction=4.226194381713867)
Row(userId=448, movieId=2716, prediction=3.684845447540283)
Row(userId=288, movieId=2716, prediction=3.4848461151123047)
Row(userId=156, movieId=2716, prediction=3.5714030265808105)
Row(userId=480, movieId=2716, prediction=3.2822773456573486)
Row(userId=21, movieId=2716, prediction=2.752734899520874)
Row(userId=608, movieId=2716, prediction=2.931485652923584)
Row(userId=376, movieId=2716, prediction=3.8019256591796875)
Row(userId=354, movieId=2716, predic

Row(userId=274, movieId=7360, prediction=3.7039730548858643)
Row(userId=573, movieId=7360, prediction=4.777495384216309)
Row(userId=352, movieId=27821, prediction=3.0444576740264893)
Row(userId=232, movieId=51084, prediction=2.923891305923462)
Row(userId=177, movieId=51084, prediction=3.6303067207336426)
Row(userId=76, movieId=73017, prediction=2.3229482173919678)
Row(userId=103, movieId=73017, prediction=3.7810540199279785)
Row(userId=460, movieId=73017, prediction=3.7402846813201904)
Row(userId=578, movieId=73017, prediction=4.145012378692627)
Row(userId=111, movieId=73017, prediction=3.4307498931884766)
Row(userId=489, movieId=73017, prediction=3.0931828022003174)
Row(userId=381, movieId=73017, prediction=3.62001633644104)
Row(userId=610, movieId=73017, prediction=3.7012171745300293)
Row(userId=520, movieId=73017, prediction=3.806358814239502)
Row(userId=434, movieId=73017, prediction=3.34671688079834)
Row(userId=509, movieId=73017, prediction=3.174675703048706)
Row(userId=408, movi

Row(userId=219, movieId=31221, prediction=0.7255971431732178)
Row(userId=414, movieId=31221, prediction=1.2183341979980469)
Row(userId=305, movieId=31420, prediction=2.941960573196411)
Row(userId=177, movieId=33499, prediction=1.7353763580322266)
Row(userId=563, movieId=33499, prediction=1.7243506908416748)
Row(userId=474, movieId=34271, prediction=3.3520400524139404)
Row(userId=484, movieId=34271, prediction=3.2237794399261475)
Row(userId=600, movieId=50658, prediction=2.7508559226989746)
Row(userId=105, movieId=56782, prediction=3.929553508758545)
Row(userId=567, movieId=56782, prediction=2.852696418762207)
Row(userId=74, movieId=56782, prediction=4.232006549835205)
Row(userId=298, movieId=59421, prediction=1.8223521709442139)
Row(userId=68, movieId=59421, prediction=2.816962957382202)
Row(userId=232, movieId=62081, prediction=2.5347297191619873)
Row(userId=610, movieId=62081, prediction=2.3255043029785156)
Row(userId=403, movieId=62081, prediction=2.2789852619171143)
Row(userId=564,

Row(userId=599, movieId=122886, prediction=2.2815182209014893)
Row(userId=305, movieId=122886, prediction=3.341282367706299)
Row(userId=52, movieId=122886, prediction=4.553927898406982)
Row(userId=212, movieId=122886, prediction=3.2395594120025635)
Row(userId=400, movieId=122886, prediction=3.472794532775879)
Row(userId=249, movieId=122886, prediction=3.6423888206481934)
Row(userId=62, movieId=122886, prediction=3.7695059776306152)
Row(userId=534, movieId=122886, prediction=4.013599395751953)
Row(userId=298, movieId=122886, prediction=1.649477243423462)
Row(userId=525, movieId=122886, prediction=3.289080858230591)
Row(userId=385, movieId=194, prediction=3.447387933731079)
Row(userId=603, movieId=194, prediction=4.474382400512695)
Row(userId=182, movieId=194, prediction=3.7800803184509277)
Row(userId=202, movieId=194, prediction=3.8111729621887207)
Row(userId=385, movieId=277, prediction=2.486935615539551)
Row(userId=406, movieId=277, prediction=3.2877578735351562)
Row(userId=372, movie

Row(userId=606, movieId=6947, prediction=3.6229188442230225)
Row(userId=140, movieId=6947, prediction=3.4541547298431396)
Row(userId=177, movieId=6947, prediction=3.4387781620025635)
Row(userId=182, movieId=6947, prediction=3.553837299346924)
Row(userId=474, movieId=6947, prediction=3.328726053237915)
Row(userId=432, movieId=6947, prediction=3.406310796737671)
Row(userId=560, movieId=6947, prediction=3.4847021102905273)
Row(userId=552, movieId=6947, prediction=3.345552444458008)
Row(userId=438, movieId=6947, prediction=3.5243613719940186)
Row(userId=95, movieId=6947, prediction=4.098366737365723)
Row(userId=414, movieId=6947, prediction=3.8055500984191895)
Row(userId=199, movieId=6947, prediction=3.2824862003326416)
Row(userId=28, movieId=7841, prediction=1.3549617528915405)
Row(userId=274, movieId=30810, prediction=3.084399700164795)
Row(userId=560, movieId=30810, prediction=3.2377915382385254)
Row(userId=495, movieId=30810, prediction=4.053461074829102)
Row(userId=63, movieId=30810, 

Row(userId=573, movieId=64839, prediction=3.752819776535034)
Row(userId=332, movieId=86332, prediction=3.0281965732574463)
Row(userId=246, movieId=86332, prediction=4.052913188934326)
Row(userId=139, movieId=86332, prediction=2.070618152618408)
Row(userId=432, movieId=86332, prediction=3.4778804779052734)
Row(userId=339, movieId=86332, prediction=3.522629976272583)
Row(userId=380, movieId=86332, prediction=3.8348546028137207)
Row(userId=509, movieId=86332, prediction=2.9577977657318115)
Row(userId=62, movieId=86332, prediction=3.7294914722442627)
Row(userId=184, movieId=86332, prediction=3.1554908752441406)
Row(userId=21, movieId=143245, prediction=2.5125670433044434)
Row(userId=125, movieId=156387, prediction=4.148497581481934)
Row(userId=305, movieId=163645, prediction=4.112030982971191)
Row(userId=184, movieId=163645, prediction=4.243861675262451)
Row(userId=599, movieId=267, prediction=1.8851584196090698)
Row(userId=217, movieId=267, prediction=2.818960666656494)
Row(userId=543, mo

Row(userId=387, movieId=4228, prediction=1.8952736854553223)
Row(userId=610, movieId=4228, prediction=2.102403163909912)
Row(userId=307, movieId=4228, prediction=1.9379292726516724)
Row(userId=160, movieId=4228, prediction=1.9992095232009888)
Row(userId=414, movieId=4228, prediction=2.071831464767456)
Row(userId=298, movieId=4228, prediction=1.4342930316925049)
Row(userId=438, movieId=5106, prediction=0.7632825374603271)
Row(userId=82, movieId=5106, prediction=0.9726710319519043)
Row(userId=463, movieId=5378, prediction=3.429368734359741)
Row(userId=540, movieId=5378, prediction=3.6572508811950684)
Row(userId=296, movieId=5378, prediction=2.557436466217041)
Row(userId=34, movieId=5378, prediction=2.977263927459717)
Row(userId=103, movieId=5378, prediction=3.005077838897705)
Row(userId=232, movieId=5378, prediction=3.3556740283966064)
Row(userId=305, movieId=5378, prediction=3.66813588142395)
Row(userId=279, movieId=5378, prediction=2.8441920280456543)
Row(userId=489, movieId=5378, pred

Row(userId=380, movieId=3755, prediction=3.620769500732422)
Row(userId=63, movieId=3755, prediction=3.0478453636169434)
Row(userId=216, movieId=3755, prediction=2.3555212020874023)
Row(userId=599, movieId=3973, prediction=1.2493653297424316)
Row(userId=477, movieId=3973, prediction=1.6764628887176514)
Row(userId=610, movieId=5128, prediction=2.7903993129730225)
Row(userId=139, movieId=5957, prediction=1.7251355648040771)
Row(userId=504, movieId=5957, prediction=2.9693374633789062)
Row(userId=594, movieId=5957, prediction=3.271182060241699)
Row(userId=495, movieId=5957, prediction=3.0396132469177246)
Row(userId=380, movieId=5957, prediction=3.2844350337982178)
Row(userId=414, movieId=5957, prediction=2.8535406589508057)
Row(userId=307, movieId=6012, prediction=1.1743111610412598)
Row(userId=288, movieId=6012, prediction=2.0354814529418945)
Row(userId=580, movieId=6287, prediction=3.2830162048339844)
Row(userId=232, movieId=6287, prediction=3.071216344833374)
Row(userId=599, movieId=6287

Row(userId=187, movieId=6377, prediction=2.9656434059143066)
Row(userId=68, movieId=56336, prediction=2.9434921741485596)
Row(userId=241, movieId=81819, prediction=3.418963670730591)
Row(userId=610, movieId=98239, prediction=1.9340676069259644)
Row(userId=388, movieId=103335, prediction=4.443445682525635)
Row(userId=292, movieId=103335, prediction=3.494400978088379)
Row(userId=509, movieId=103335, prediction=2.7791476249694824)
Row(userId=21, movieId=103335, prediction=3.240941286087036)
Row(userId=534, movieId=103335, prediction=3.6613388061523438)
Row(userId=119, movieId=103335, prediction=3.7702009677886963)
Row(userId=73, movieId=128520, prediction=3.8048782348632812)
Row(userId=89, movieId=128520, prediction=3.3596549034118652)
Row(userId=525, movieId=137337, prediction=3.5227584838867188)
Row(userId=321, movieId=21, prediction=3.2826266288757324)
Row(userId=368, movieId=21, prediction=3.0080294609069824)
Row(userId=115, movieId=21, prediction=3.5232458114624023)
Row(userId=91, mo

Row(userId=381, movieId=44199, prediction=4.11142635345459)
Row(userId=610, movieId=44199, prediction=3.5157830715179443)
Row(userId=239, movieId=44199, prediction=4.074182987213135)
Row(userId=219, movieId=44199, prediction=2.6829731464385986)
Row(userId=528, movieId=44199, prediction=2.7629313468933105)
Row(userId=298, movieId=44199, prediction=1.6556777954101562)
Row(userId=561, movieId=44199, prediction=3.181757688522339)
Row(userId=317, movieId=44199, prediction=3.584754228591919)
Row(userId=123, movieId=44199, prediction=3.430386543273926)
Row(userId=18, movieId=44199, prediction=3.6647725105285645)
Row(userId=393, movieId=44199, prediction=4.865931034088135)
Row(userId=509, movieId=52694, prediction=2.9512510299682617)
Row(userId=517, movieId=52694, prediction=2.382404088973999)
Row(userId=232, movieId=58293, prediction=2.3172407150268555)
Row(userId=560, movieId=58293, prediction=2.2422730922698975)
Row(userId=387, movieId=58293, prediction=1.30854332447052)
Row(userId=256, mov

Row(userId=226, movieId=608, prediction=3.822103500366211)
Row(userId=526, movieId=608, prediction=4.322299480438232)
Row(userId=32, movieId=608, prediction=4.317412853240967)
Row(userId=433, movieId=608, prediction=3.8547232151031494)
Row(userId=68, movieId=608, prediction=2.8128976821899414)
Row(userId=198, movieId=608, prediction=3.6433403491973877)
Row(userId=317, movieId=608, prediction=3.933673620223999)
Row(userId=546, movieId=608, prediction=2.8716061115264893)
Row(userId=600, movieId=608, prediction=3.1546430587768555)
Row(userId=424, movieId=608, prediction=3.7297940254211426)
Row(userId=270, movieId=608, prediction=4.306280136108398)
Row(userId=131, movieId=608, prediction=3.418255090713501)
Row(userId=294, movieId=608, prediction=3.6515235900878906)
Row(userId=124, movieId=608, prediction=4.329989433288574)
Row(userId=302, movieId=608, prediction=4.139807224273682)
Row(userId=66, movieId=608, prediction=4.441279888153076)
Row(userId=137, movieId=724, prediction=2.6668248176

Row(userId=414, movieId=2816, prediction=1.582371473312378)
Row(userId=140, movieId=3230, prediction=3.438663959503174)
Row(userId=325, movieId=3230, prediction=4.113766193389893)
Row(userId=474, movieId=3230, prediction=3.4948313236236572)
Row(userId=420, movieId=3559, prediction=3.2590725421905518)
Row(userId=274, movieId=4954, prediction=3.5243735313415527)
Row(userId=238, movieId=4954, prediction=4.283136367797852)
Row(userId=232, movieId=5881, prediction=2.718060255050659)
Row(userId=599, movieId=5881, prediction=2.516066789627075)
Row(userId=489, movieId=5881, prediction=3.1859216690063477)
Row(userId=177, movieId=6566, prediction=2.091139078140259)
Row(userId=599, movieId=6920, prediction=3.155170440673828)
Row(userId=23, movieId=6920, prediction=3.682175397872925)
Row(userId=594, movieId=7149, prediction=2.828004837036133)
Row(userId=100, movieId=7149, prediction=3.086442708969116)
Row(userId=438, movieId=7149, prediction=2.3775079250335693)
Row(userId=82, movieId=7149, predict

Row(userId=334, movieId=260, prediction=3.3653244972229004)
Row(userId=200, movieId=260, prediction=4.251744747161865)
Row(userId=370, movieId=260, prediction=3.5843870639801025)
Row(userId=71, movieId=260, prediction=4.594732761383057)
Row(userId=198, movieId=260, prediction=3.776279926300049)
Row(userId=600, movieId=260, prediction=3.6160552501678467)
Row(userId=344, movieId=260, prediction=4.815807342529297)
Row(userId=199, movieId=260, prediction=3.853285789489746)
Row(userId=79, movieId=260, prediction=4.736043453216553)
Row(userId=201, movieId=260, prediction=5.060208797454834)
Row(userId=131, movieId=260, prediction=3.4922025203704834)
Row(userId=294, movieId=260, prediction=3.327320098876953)
Row(userId=357, movieId=260, prediction=4.0229811668396)
Row(userId=529, movieId=260, prediction=3.930924654006958)
Row(userId=282, movieId=260, prediction=4.329916954040527)
Row(userId=215, movieId=260, prediction=3.8928871154785156)
Row(userId=410, movieId=260, prediction=4.6937489509582

Row(userId=488, movieId=2762, prediction=3.5670957565307617)
Row(userId=354, movieId=2762, prediction=3.8213000297546387)
Row(userId=304, movieId=2762, prediction=4.072023391723633)
Row(userId=344, movieId=2762, prediction=3.998971939086914)
Row(userId=135, movieId=2762, prediction=4.088657855987549)
Row(userId=357, movieId=2762, prediction=3.914717197418213)
Row(userId=313, movieId=2762, prediction=3.7782704830169678)
Row(userId=399, movieId=2762, prediction=3.6196534633636475)
Row(userId=144, movieId=2762, prediction=3.6651244163513184)
Row(userId=91, movieId=2890, prediction=3.3985092639923096)
Row(userId=182, movieId=2890, prediction=3.6482977867126465)
Row(userId=552, movieId=2890, prediction=3.5371367931365967)
Row(userId=244, movieId=2890, prediction=3.907546043395996)
Row(userId=465, movieId=2890, prediction=4.0275139808654785)
Row(userId=129, movieId=2890, prediction=3.384399890899658)
Row(userId=199, movieId=2890, prediction=3.314962148666382)
Row(userId=275, movieId=2890, pr

Row(userId=256, movieId=6502, prediction=3.7593953609466553)
Row(userId=608, movieId=6502, prediction=3.6946792602539062)
Row(userId=573, movieId=6502, prediction=4.0914459228515625)
Row(userId=187, movieId=6502, prediction=3.4930975437164307)
Row(userId=477, movieId=8253, prediction=2.935647487640381)
Row(userId=483, movieId=8253, prediction=2.5983150005340576)
Row(userId=332, movieId=93766, prediction=2.3197460174560547)
Row(userId=103, movieId=106766, prediction=3.670952320098877)
Row(userId=610, movieId=106766, prediction=3.506443738937378)
Row(userId=567, movieId=106766, prediction=2.562347888946533)
Row(userId=495, movieId=117881, prediction=4.761261940002441)
Row(userId=380, movieId=130490, prediction=2.210116386413574)
Row(userId=125, movieId=130490, prediction=0.9371557235717773)
Row(userId=365, movieId=132800, prediction=1.1551462411880493)
Row(userId=111, movieId=135887, prediction=2.5864927768707275)
Row(userId=41, movieId=135887, prediction=3.015903949737549)
Row(userId=24

Row(userId=20, movieId=1012, prediction=3.7684497833251953)
Row(userId=288, movieId=1012, prediction=3.248964309692383)
Row(userId=600, movieId=1012, prediction=2.903765916824341)
Row(userId=571, movieId=2465, prediction=1.1861708164215088)
Row(userId=580, movieId=3176, prediction=2.587341547012329)
Row(userId=606, movieId=3176, prediction=3.3168158531188965)
Row(userId=182, movieId=3176, prediction=3.345270872116089)
Row(userId=280, movieId=3176, prediction=3.4819178581237793)
Row(userId=474, movieId=3176, prediction=3.1008946895599365)
Row(userId=292, movieId=3176, prediction=2.7591092586517334)
Row(userId=19, movieId=3176, prediction=2.718832492828369)
Row(userId=579, movieId=3176, prediction=3.4786176681518555)
Row(userId=202, movieId=3176, prediction=3.573702812194824)
Row(userId=387, movieId=3176, prediction=2.8363051414489746)
Row(userId=449, movieId=3176, prediction=2.5323824882507324)
Row(userId=428, movieId=3176, prediction=2.3501760959625244)
Row(userId=234, movieId=3176, pr

Row(userId=199, movieId=562, prediction=3.565156936645508)
Row(userId=44, movieId=1429, prediction=3.3968920707702637)
Row(userId=372, movieId=1429, prediction=3.3032407760620117)
Row(userId=274, movieId=1429, prediction=3.241176128387451)
Row(userId=594, movieId=1429, prediction=3.170271396636963)
Row(userId=220, movieId=1429, prediction=3.687124490737915)
Row(userId=391, movieId=1429, prediction=3.433461904525757)
Row(userId=587, movieId=1948, prediction=3.355154037475586)
Row(userId=325, movieId=1948, prediction=3.3750672340393066)
Row(userId=603, movieId=1948, prediction=3.2174482345581055)
Row(userId=554, movieId=2282, prediction=1.5294811725616455)
Row(userId=414, movieId=2282, prediction=2.948092222213745)
Row(userId=599, movieId=2381, prediction=1.2205641269683838)
Row(userId=217, movieId=2381, prediction=2.3707754611968994)
Row(userId=428, movieId=2381, prediction=0.8084121346473694)
Row(userId=19, movieId=2900, prediction=2.148468017578125)
Row(userId=28, movieId=3555, predic

Row(userId=357, movieId=1097, prediction=3.871872663497925)
Row(userId=275, movieId=1097, prediction=3.862924337387085)
Row(userId=562, movieId=1097, prediction=4.050858974456787)
Row(userId=483, movieId=1097, prediction=3.2417593002319336)
Row(userId=384, movieId=1204, prediction=3.6354916095733643)
Row(userId=606, movieId=1204, prediction=4.217129230499268)
Row(userId=572, movieId=1204, prediction=4.628659248352051)
Row(userId=57, movieId=1204, prediction=3.991441249847412)
Row(userId=202, movieId=1204, prediction=4.571399688720703)
Row(userId=387, movieId=1204, prediction=3.4021079540252686)
Row(userId=310, movieId=1204, prediction=4.13437557220459)
Row(userId=75, movieId=1204, prediction=3.7871711254119873)
Row(userId=382, movieId=1204, prediction=4.619847774505615)
Row(userId=414, movieId=1204, prediction=4.370211124420166)
Row(userId=354, movieId=1204, prediction=4.088572025299072)
Row(userId=294, movieId=1204, prediction=3.8610875606536865)
Row(userId=186, movieId=1204, predicti

In [27]:
rmse = evaluator.evaluate(prediction1)
print("The RMSE for Optimal Grid Parameters with Cross Validation is: {}".format(rmse))

The RMSE for Optimal Grid Parameters with Cross Validation is: 0.9283467539309946


In [28]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
best_pred.coalesce(10).write.mode('overwrite').parquet(dirPath + "best_pred_by_movieID_df" + ".parquet")

#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "best_pred_by_movieID_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

#predictions.show(n = 10)
#best_pred.show(10)

+------+-------+------+----------+----------+
|userId|movieId|rating| timestamp|prediction|
+------+-------+------+----------+----------+
|   436|    536|     3| 833531946| 2.5542357|
|   314|    536|     4| 834241987| 2.2407923|
|    19|    611|     2| 965704056| 2.2289987|
|   603|    612|     3| 963179060| 2.5626721|
|    20|    783|     3|1054038251| 3.7072778|
|   169|    783|     4|1059427166| 3.7481077|
|   120|    783|     3| 860070141| 3.0107772|
|    64|    783|     2|1161559862| 2.7501926|
|   117|    783|     4| 844163543| 3.0780563|
|   563|    783|     4|1441846374|  2.947924|
+------+-------+------+----------+----------+
only showing top 10 rows

Wall time: 22 s


In [29]:
#predictions.join(movies_df, "movieId").select("userId", "title", "genres", "prediction").show(3)
#best_pred.join(movies_df, "movieId").select("userId", "title", "genres", "prediction").show(10)

best_pred_title_genre = best_pred.join(movies_df, "movieId").select("userId", "title", "genres", "prediction").sort("prediction", ascending=False)

In [30]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
best_pred_title_genre.coalesce(10).write.mode('overwrite').parquet(dirPath + "best_pred_by_title_genre_df" + ".parquet")

#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "best_pred_by_title_genre_df" + ".parquet").sort("prediction", ascending=False)
parquetdf.count()
parquetdf.show(10)

+------+--------------------+--------------------+----------+
|userId|               title|              genres|prediction|
+------+--------------------+--------------------+----------+
|   224|  Dying Young (1991)|       Drama|Romance| 6.6648426|
|    53|Roman Holiday (1953)|Comedy|Drama|Romance|  6.507361|
|   543|Tucker & Dale vs ...|       Comedy|Horror|  6.138631|
|    53|River Runs Throug...|               Drama| 6.0811663|
|   143|     Fired Up (2009)|              Comedy| 5.8134437|
|    90|Beautiful Thing (...|       Drama|Romance| 5.7368402|
|   580|         Dune (2000)|Drama|Fantasy|Sci-Fi| 5.6204157|
|   224|Schindler's List ...|           Drama|War| 5.5456953|
|   594|Walk to Remember,...|       Drama|Romance|  5.508151|
|    25|Three Billboards ...|         Crime|Drama| 5.4913807|
+------+--------------------+--------------------+----------+
only showing top 10 rows

Wall time: 45 s


### Recommendations for One Selected User:

In [31]:
#selected_user_id = 597
#for_one_user = predictions.filter(col("userId") == selected_user_id).join(movies_df, "movieId").join(links_df, "movieId").select("userId", "title", "genres", "prediction")
#for_one_user = best_pred.filter(col("userId") == selected_user_id).join(movies_df, "movieId").join(links_df, "movieId").select("userId", "title", "genres", "prediction")

np.random.seed(42)
user_id = np.random.choice(numUsers)
for_one_user = best_pred.filter(col("userId") == user_id).join(movies_df, "movieId").join(links_df, "movieId").select("userId", "title", "genres", "prediction").sort("prediction", ascending=False)
for_one_user.show(10)

+------+--------------------+--------------------+----------+
|userId|               title|              genres|prediction|
+------+--------------------+--------------------+----------+
|   102|Fugitive, The (1993)|            Thriller| 3.8864586|
|   102|Interview with th...|        Drama|Horror| 3.6017718|
|   102|         Dave (1993)|      Comedy|Romance| 3.3378823|
|   102|Three Musketeers,...|Action|Adventure|...|  3.085153|
|   102|Ace Ventura: Pet ...|              Comedy| 2.9737644|
|   102|  Cliffhanger (1993)|Action|Adventure|...| 2.7643664|
|   102|Beverly Hills Cop...|Action|Comedy|Cri...|  2.548684|
|   102|       Casper (1995)|  Adventure|Children| 2.5242767|
|   102|Under Siege 2: Da...|              Action| 2.3637016|
|   102|City Slickers II:...|Adventure|Comedy|...| 2.2028992|
+------+--------------------+--------------------+----------+



In [32]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
for_one_user.coalesce(10).write.mode('overwrite').parquet(dirPath + "one_user_recs_df" + ".parquet")

#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "one_user_recs_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

+------+--------------------+--------------------+----------+
|userId|               title|              genres|prediction|
+------+--------------------+--------------------+----------+
|   102|City Slickers II:...|Adventure|Comedy|...| 2.2028992|
|   102|Interview with th...|        Drama|Horror| 3.6017718|
|   102|Three Musketeers,...|Action|Adventure|...|  3.085153|
|   102|Beverly Hills Cop...|Action|Comedy|Cri...|  2.548684|
|   102|  Cliffhanger (1993)|Action|Adventure|...| 2.7643664|
|   102|Under Siege 2: Da...|              Action| 2.3637016|
|   102|Ace Ventura: Pet ...|              Comedy| 2.9737644|
|   102|       Casper (1995)|  Adventure|Children| 2.5242767|
|   102|Fugitive, The (1993)|            Thriller| 3.8864586|
|   102|         Dave (1993)|      Comedy|Romance| 3.3378823|
+------+--------------------+--------------------+----------+

Wall time: 8.44 s


### Top Predictions for Each User:

In [33]:
# Top recommendation for each user
#userRecom = model.recommendForAllUsers(10)
userRecom = best_model.recommendForAllUsers(10)

# Top recommendation for each movie
#movieRecom = model.recommendForAllItems(10)
movieRecom = best_model.recommendForAllItems(10)

In [34]:
userRecom.printSchema()

root
 |-- userId: integer (nullable = false)
 |-- recommendations: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- movieId: integer (nullable = true)
 |    |    |-- rating: float (nullable = true)



In [35]:
userRecom.select("userId", "recommendations.movieId").show(10)

+------+--------------------+
|userId|             movieId|
+------+--------------------+
|   471|[136469, 3235, 89...|
|   463|[33649, 26928, 13...|
|   496|[136469, 171495, ...|
|   148|[33649, 3200, 259...|
|   540|[33649, 6732, 269...|
|   392|[8477, 40491, 257...|
|   243|[3200, 25947, 336...|
|    31|[86347, 172547, 4...|
|   516|[86347, 4495, 823...|
|   580|[4821, 183897, 78...|
+------+--------------------+
only showing top 10 rows



In [55]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
userRecom.select("userId", "recommendations.movieId").coalesce(10).write.mode('overwrite').parquet(dirPath + "userRecom_df" + ".parquet")

#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "userRecom_df" + ".parquet").sort("userID", ascending=True)
parquetdf.count()
parquetdf.show(10)

+------+--------------------+
|userId|             movieId|
+------+--------------------+
|     1|[25947, 33649, 32...|
|     2|[4495, 6201, 8235...|
|     3|[4509, 879, 26258...|
|     4|[86347, 461, 4518...|
|     5|[3235, 8235, 4495...|
|     6|[86347, 4495, 620...|
|     7|[26258, 25947, 84...|
|     8|[8477, 25947, 320...|
|     9|[8477, 25947, 257...|
|    10|[86320, 26171, 11...|
+------+--------------------+
only showing top 10 rows

Wall time: 13.8 s


### Top Recommended Users for Each Movie:

In [37]:
movieRecom.printSchema()

root
 |-- movieId: integer (nullable = false)
 |-- recommendations: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- userId: integer (nullable = true)
 |    |    |-- rating: float (nullable = true)



In [38]:
#movieRecom.select("movieId", "recommendations.userId","recommendations.rating").show(10)
movieRecom.select("movieId", "recommendations.userId").show(10)

+-------+--------------------+
|movieId|              userId|
+-------+--------------------+
|   1580|[53, 543, 243, 43...|
|   4900|[53, 43, 1, 360, ...|
|   5300|[461, 236, 40, 10...|
|   6620|[360, 258, 375, 5...|
|   7340|[224, 554, 393, 5...|
|  32460|[53, 43, 236, 40,...|
|  54190|[53, 12, 452, 276...|
|    471|[485, 536, 393, 4...|
|   1591|[53, 224, 236, 38...|
| 140541|[138, 55, 598, 20...|
+-------+--------------------+
only showing top 10 rows



In [39]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
movieRecom.select("movieId", "recommendations.userId").coalesce(10).write.mode('overwrite').parquet(dirPath + "movieRecom_df" + ".parquet")

#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "movieRecom_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

+-------+--------------------+
|movieId|              userId|
+-------+--------------------+
|   4520|[258, 53, 236, 27...|
|  72330|[393, 224, 485, 2...|
|   2841|[53, 543, 12, 276...|
|   5941|[53, 543, 393, 54...|
|  69251|[53, 543, 243, 27...|
|  89761|[236, 224, 259, 2...|
| 128991|[53, 276, 452, 12...|
|   1732|[360, 258, 154, 3...|
|   4662|[53, 360, 154, 24...|
|   8372|[259, 329, 207, 5...|
+-------+--------------------+
only showing top 10 rows

Wall time: 27.8 s


## Recommendations for Movies the User Hasn't Previously Reviewed Yet
We can further refine the recommender to filter out movies which the user has previously seen/reviewed. 

Let's now take a hyptothetical random user that has already rated some movies. We'll take a random user_id out the range of user id's in the dataset. Next, we get all of their ratings and sort to show the top 10 rated movies.

In [40]:
np.random.seed(42)
#user_id = np.random.choice(numUsers)

In [41]:
new_user_ratings = rating_df.filter(rating_df.userId == user_id)
new_user_ratings.sort('rating', ascending=True).take(10) # top rated movies for this user

[Row(userId=102, movieId=540, rating=1, timestamp=835877588),
 Row(userId=102, movieId=204, rating=2, timestamp=835876471),
 Row(userId=102, movieId=173, rating=2, timestamp=835876239),
 Row(userId=102, movieId=329, rating=2, timestamp=835877081),
 Row(userId=102, movieId=163, rating=3, timestamp=835877307),
 Row(userId=102, movieId=23, rating=3, timestamp=835877570),
 Row(userId=102, movieId=21, rating=3, timestamp=835876107),
 Row(userId=102, movieId=172, rating=3, timestamp=835876471),
 Row(userId=102, movieId=39, rating=3, timestamp=835876151),
 Row(userId=102, movieId=165, rating=3, timestamp=835875790)]

In [42]:
new_user_ratings.describe('rating').show()

+-------+------------------+
|summary|            rating|
+-------+------------------+
|  count|                56|
|   mean| 3.357142857142857|
| stddev|0.7960943623504901|
|    min|                 1|
|    max|                 5|
+-------+------------------+



In [43]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
new_user_ratings.coalesce(10).write.mode('overwrite').parquet(dirPath + "new_user_ratings_df" + ".parquet")

Wall time: 603 ms


In [44]:
%%time
#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "new_user_ratings_df" + ".parquet")
parquetdf.count()
parquetdf.show()

+------+-------+------+---------+
|userId|movieId|rating|timestamp|
+------+-------+------+---------+
|   102|      3|     5|840635033|
|   102|      6|     3|835877535|
|   102|     21|     3|835876107|
|   102|     23|     3|835877570|
|   102|     39|     3|835876151|
|   102|     47|     5|835876045|
|   102|    150|     3|835875691|
|   102|    153|     3|835875790|
|   102|    158|     3|835876534|
|   102|    160|     3|835876194|
|   102|    161|     3|835875943|
|   102|    163|     3|835877307|
|   102|    165|     3|835875790|
|   102|    172|     3|835876471|
|   102|    173|     2|835876239|
|   102|    186|     3|835876384|
|   102|    204|     2|835876471|
|   102|    223|     5|835877270|
|   102|    231|     3|835875836|
|   102|    236|     3|835876272|
+------+-------+------+---------+
only showing top 20 rows

Wall time: 231 ms


### Remove Previously Reviewed Movies from Consideration:

The following will get us a list of all movieIds not previously rated by the user. Additionally, unrated movies are required to have more  than 25 ratings in order not to be filtered out. This prevents movies which don't have many reviews from being recommended.

In [46]:
new_user_rated_movieIds = [i.movieId for i in new_user_ratings.select('movieId').distinct().collect()]
movieIds = [i.movieId for i in movies_counts.filter(movies_counts.counts > 25).select('movieId').distinct().collect()]
new_user_unrated_movieIds = list(set(movieIds) - set(new_user_rated_movieIds))
print(new_user_unrated_movieIds)

[1, 122882, 2, 5, 2054, 7, 122886, 10, 6155, 2058, 6157, 11, 122892, 2064, 16, 17, 19, 122900, 22, 57368, 25, 122904, 24, 2076, 30749, 2078, 31, 2080, 2081, 34, 2082, 29, 2085, 122918, 2087, 32, 5299, 36, 6188, 44, 2094, 48, 45, 50, 102445, 52, 4148, 2100, 51255, 2105, 58, 60, 2109, 116797, 88125, 62, 65, 88129, 2115, 70, 30793, 53322, 6218, 88140, 2124, 30810, 30812, 2140, 95, 2144, 2145, 88163, 92259, 2150, 104, 30825, 107, 110, 111, 112, 2161, 2160, 2167, 49272, 139385, 69757, 2174, 4223, 4226, 45186, 135, 4232, 6281, 141, 4239, 6287, 2193, 2194, 145, 4246, 151, 4262, 168, 8360, 8361, 170, 4270, 8368, 180, 2231, 2232, 185, 8376, 84152, 6333, 193, 2243, 196, 198, 106696, 2248, 4299, 208, 4306, 69844, 4308, 112852, 4310, 216, 2268, 6365, 224, 4321, 225, 59615, 2273, 6373, 2278, 6377, 6378, 235, 33004, 2288, 2289, 2291, 2294, 246, 4343, 4344, 2300, 2302, 252, 256, 260, 261, 4361, 265, 39183, 8464, 53519, 4369, 273, 2324, 277, 4370, 2321, 272, 2329, 282, 106782, 2335, 4128, 292, 4388, 3

In [47]:
import time
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
num_ratings = len(new_user_unrated_movieIds)
cols = ('userId', 'movieId', 'timestamp')
timestamps = [int(time.time())] * num_ratings
userIds = [user_id] * num_ratings
# ratings = [0] * num_ratings
new_user_preds = sqlContext.createDataFrame(zip(userIds, new_user_unrated_movieIds, timestamps), cols)

In [48]:
new_user_preds = best_model.transform(new_user_preds).filter(col('prediction') != np.nan)

In [49]:
new_user_preds.sort('prediction', ascending=False).take(10)

[Row(userId=102, movieId=898, timestamp=1652843132, prediction=4.401188850402832),
 Row(userId=102, movieId=750, timestamp=1652843132, prediction=4.3704986572265625),
 Row(userId=102, movieId=720, timestamp=1652843132, prediction=4.321457386016846),
 Row(userId=102, movieId=2019, timestamp=1652843132, prediction=4.2671098709106445),
 Row(userId=102, movieId=1148, timestamp=1652843132, prediction=4.2554497718811035),
 Row(userId=102, movieId=1223, timestamp=1652843132, prediction=4.244061470031738),
 Row(userId=102, movieId=858, timestamp=1652843132, prediction=4.236511707305908),
 Row(userId=102, movieId=2959, timestamp=1652843132, prediction=4.235004425048828),
 Row(userId=102, movieId=1250, timestamp=1652843132, prediction=4.23486852645874),
 Row(userId=102, movieId=1136, timestamp=1652843132, prediction=4.220710277557373)]

### Top  Recommendations for Movies not Previously Reviewed by the User:

Here we see the top recommendations for movies which a particular user hasn't previously reviewed. This lets the user choose movies that are new to them. For simplicity, it is assumed that movies that haven't been reviewed by the user have also not yet been viewed by the user. 

In [50]:
#one_user_unrated_df = new_user_preds.filter(col("userId") == user_id).join(movies_df, "movieId").join(links_df, "movieId").select("userId", "title", "genres", "prediction")
one_user_unrated_df = new_user_preds.filter(col("userId") == user_id).join(movies_df, "movieId").join(links_df, "movieId").select("userId", "title", "genres", "prediction").sort('prediction', ascending=False)
one_user_unrated_df.show(10)

+------+--------------------+--------------------+----------+
|userId|               title|              genres|prediction|
+------+--------------------+--------------------+----------+
|   102|Philadelphia Stor...|Comedy|Drama|Romance|  4.401189|
|   102|Dr. Strangelove o...|          Comedy|War| 4.3704987|
|   102|Wallace & Gromit:...|Adventure|Animati...| 4.3214574|
|   102|Seven Samurai (Sh...|Action|Adventure|...|   4.26711|
|   102|Wallace & Gromit:...|Animation|Childre...|   4.25545|
|   102|Grand Day Out wit...|Adventure|Animati...| 4.2440615|
|   102|Godfather, The (1...|         Crime|Drama| 4.2365117|
|   102|   Fight Club (1999)|Action|Crime|Dram...| 4.2350044|
|   102|Bridge on the Riv...| Adventure|Drama|War| 4.2348685|
|   102|Monty Python and ...|Adventure|Comedy|...| 4.2207103|
+------+--------------------+--------------------+----------+
only showing top 10 rows



In [51]:
#===============================================================
#===============================================================
# PARQUET
#===============================================================
#===============================================================

In [52]:
%%time
#**************************************************************************
# Write PARQUET file
#**************************************************************************
one_user_unrated_df.coalesce(10).write.mode('overwrite').parquet(dirPath + "one_user_unrated_df" + ".parquet")

Wall time: 12.6 s


In [56]:
%%time
#**************************************************************************
# Read PARQUET file
#**************************************************************************
parquetdf = spark.read.parquet(dirPath + "one_user_unrated_df" + ".parquet")
parquetdf.count()
parquetdf.show(10)

+------+--------------------+--------------------+----------+
|userId|               title|              genres|prediction|
+------+--------------------+--------------------+----------+
|   102|Philadelphia Stor...|Comedy|Drama|Romance|  4.401189|
|   102|Dr. Strangelove o...|          Comedy|War| 4.3704987|
|   102|Wallace & Gromit:...|Adventure|Animati...| 4.3214574|
|   102|Seven Samurai (Sh...|Action|Adventure|...|   4.26711|
|   102|Wallace & Gromit:...|Animation|Childre...|   4.25545|
|   102|Grand Day Out wit...|Adventure|Animati...| 4.2440615|
|   102|Godfather, The (1...|         Crime|Drama| 4.2365117|
|   102|   Fight Club (1999)|Action|Crime|Dram...| 4.2350044|
|   102|Bridge on the Riv...| Adventure|Drama|War| 4.2348685|
|   102|Monty Python and ...|Adventure|Comedy|...| 4.2207103|
+------+--------------------+--------------------+----------+
only showing top 10 rows

Wall time: 478 ms
