# Recommendation Consulting Project
<div style="text-align: right"><h4>- Based on ALS using the PySpark MLlib library</h4></div>

Predict customers' movie ratings with Alternating Least Squares (ALS) and calculate the Mean Squared Error (MSE) of the predictions on held-out test data.

#### Import necessary libraries and datasets

In [1]:
import findspark
# Raw string so the backslashes in the Windows path are not read as escape sequences
findspark.init(r'E:\DATA\Apps\hadoop-env\spark-2.4.7-bin-hadoop2.7')

In [2]:
from pyspark import SparkContext
from pyspark.mllib.recommendation import Rating
from pyspark.mllib.recommendation import ALS

In [3]:
sc = SparkContext()

In [4]:
data = sc.textFile('ratings.csv')
data.take(5)

['1,1,4,964982703',
 '1,3,4,964981247',
 '1,6,4,964982224',
 '1,47,5,964983815',
 '1,50,5,964982931']

#### Transform data

In [5]:
ratings = data.map(lambda l: l.split(','))
ratings.take(5)

[['1', '1', '4', '964982703'],
 ['1', '3', '4', '964981247'],
 ['1', '6', '4', '964982224'],
 ['1', '47', '5', '964983815'],
 ['1', '50', '5', '964982931']]

In [6]:
ratings_final = ratings.map(lambda line: Rating(int(line[0]), int(line[1]), float(line[2])))
ratings_final.take(5)

[Rating(user=1, product=1, rating=4.0),
 Rating(user=1, product=3, rating=4.0),
 Rating(user=1, product=6, rating=4.0),
 Rating(user=1, product=47, rating=5.0),
 Rating(user=1, product=50, rating=5.0)]
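
The parsing above assumes every line is a well-formed `user,product,rating,timestamp` record. A header row or a malformed line would make `int(...)` raise inside the Spark job. The following is a minimal sketch of a more defensive parser, written in plain Python so it can be tried without Spark; `parse_rating` is a hypothetical helper, not part of the original notebook.

```python
def parse_rating(line):
    """Parse 'user,product,rating,timestamp' into (user, product, rating).

    Returns None for header rows or malformed lines instead of raising.
    """
    fields = line.strip().split(',')
    if len(fields) < 3:
        return None
    try:
        return int(fields[0]), int(fields[1]), float(fields[2])
    except ValueError:
        return None


# Sample lines: a header, two valid records, and one malformed line.
lines = [
    'userId,movieId,rating,timestamp',
    '1,1,4,964982703',
    '1,47,5,964983815',
    'bad line',
]
parsed = [t for t in map(parse_rating, lines) if t is not None]
print(parsed)  # [(1, 1, 4.0), (1, 47, 5.0)]
```

In the notebook, one plausible way to wire this in would be `data.map(parse_rating).filter(lambda t: t is not None).map(lambda t: Rating(*t))`.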

In [7]:
# 80/20 random split; pass a seed argument to randomSplit for a reproducible split
training_data, test_data = ratings_final.randomSplit([0.8, 0.2])

#### Build the model

In [8]:
model = ALS.train(training_data, rank=10, iterations=10)
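
`ALS.train` factorizes the ratings matrix into one latent-factor vector per user and one per product, each of length `rank`. A predicted rating is the dot product of the two vectors. The sketch below illustrates only that final step in plain Python. The factor values are invented for illustration and use 3 factors rather than 10; they are not taken from the trained model.

```python
def predict_rating(user_factors, product_factors):
    """Dot product of a user's and a product's latent-factor vectors."""
    if len(user_factors) != len(product_factors):
        raise ValueError("factor vectors must have the same length (the rank)")
    return sum(u * p for u, p in zip(user_factors, product_factors))


# Made-up factors with rank 3, purely illustrative.
user_vec = [1.0, 0.5, 2.0]
product_vec = [2.0, 1.0, 0.5]
print(predict_rating(user_vec, product_vec))  # 3.5
```

The learned vectors of the real model can be inspected with `model.userFeatures()` and `model.productFeatures()`.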

In [9]:
# Drop the rating so only (user, product) pairs are passed to the model
test_data_no_rating = test_data.map(lambda p: (p[0], p[1]))

#### Predict on test data

In [10]:
predictions = model.predictAll(test_data_no_rating)

In [11]:
predictions.take(5)

[Rating(user=600, product=1084, rating=3.9748820035019046),
 Rating(user=597, product=1084, rating=4.101415917312405),
 Rating(user=405, product=1084, rating=3.9628211792019634),
 Rating(user=357, product=1084, rating=4.127987429159679),
 Rating(user=74, product=1084, rating=3.5826700027887513)]

In [12]:
# Key the actual test ratings by (user, product) so they can be joined with the predictions
rates = test_data.map(lambda r: ((r[0], r[1]), r[2]))

In [13]:
preds = predictions.map(lambda p: ((p[0], p[1]), p[2]))

In [14]:
rates_and_preds = rates.join(preds)
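
`join` on pair RDDs is an inner join: only `(user, product)` keys present on both sides survive. Any pair for which the model returned no prediction (for example, a user that happens to be absent from the training split) therefore drops out of the error calculation silently. Here is a plain-Python sketch of the same matching, with made-up values and a hypothetical `inner_join` helper:

```python
def inner_join(left, right):
    """Mimic an inner join on dicts: keep keys found in both, pair up values."""
    return {key: (left[key], right[key]) for key in left if key in right}


# Keys are (user, product); all values are invented for illustration.
actual = {(1, 1): 4.0, (1, 3): 4.0, (2, 50): 5.0}
predicted = {(1, 1): 3.5, (2, 50): 4.0}  # no prediction for (1, 3)

joined = inner_join(actual, predicted)
print(joined)  # {(1, 1): (4.0, 3.5), (2, 50): (5.0, 4.0)}
print("dropped:", sorted(set(actual) - set(joined)))  # dropped: [(1, 3)]
```

Comparing `rates_and_preds.count()` with `test_data.count()` would show how many test pairs were left out in the actual run.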

#### Calculate MSE

In [15]:
MSE = rates_and_preds.map(lambda r: (r[1][0] - r[1][1])**2).mean()
print("Mean Squared Error of the model for the test data = {:.2f}".format(MSE))

Mean Squared Error of the model for the test data = 1.25
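
MSE is expressed in squared rating units. Its square root (RMSE) is in the same units as the ratings and is often easier to interpret: for the 1.25 reported above, that is roughly 1.12 rating points. Below is a minimal plain-Python sketch of both metrics over `(actual, predicted)` pairs; the numbers are made up and `mean_squared_error` is a hypothetical helper.

```python
import math


def mean_squared_error(pairs):
    """Average of squared differences over (actual, predicted) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (actual, predicted) pair")
    return sum((a - p) ** 2 for a, p in pairs) / len(pairs)


pairs = [(4.0, 3.5), (5.0, 4.0), (3.0, 4.5)]  # invented values
mse = mean_squared_error(pairs)
rmse = math.sqrt(mse)
print("MSE = {:.2f}, RMSE = {:.2f}".format(mse, rmse))  # MSE = 1.17, RMSE = 1.08
```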
