### Recommender System with ALS (Alternating Least Squares): A Collaborative Filtering Approach

* In this notebook we use a collaborative filtering approach to predict user ratings. To learn latent vector representations of users and items, we use the explicit-feedback ALS method.
* The implementation uses Apache Spark. Refer to the Spark MLlib documentation for Python API details: https://spark.apache.org/docs/2.1.0/api/python/pyspark.mllib.html
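
Explicit ALS factorizes the rating matrix into user vectors $x_u$ and item vectors $y_i$ by minimizing $\sum_{(u,i)} (r_{ui} - x_u^\top y_i)^2 + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)$ over the observed ratings only. With the item vectors held fixed, each user vector is an independent ridge regression with closed-form solution $x_u = (Y_u^\top Y_u + \lambda I)^{-1} Y_u^\top r_u$, and vice versa for items, so the two sets are updated in turn. Below is a minimal NumPy sketch of that idea for small in-memory data. The function `als_explicit` and its parameters are hypothetical names for illustration, not the Spark API, and Spark's implementation differs in details (for instance, the MLlib guide describes scaling λ by each user's or item's number of ratings).

```python
import numpy as np


def als_explicit(ratings, n_users, n_items, rank=2, iterations=5, lam=0.1, seed=0):
    """Hypothetical explicit-feedback ALS sketch (not part of Spark).

    ratings: list of (user_index, item_index, rating) with 0-based indices.
    Returns user factors X (n_users x rank) and item factors Y (n_items x rank).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_users, rank))
    Y = rng.normal(scale=0.1, size=(n_items, rank))

    # Group the observed ratings by user and by item.
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    reg = lam * np.eye(rank)
    for _ in range(iterations):
        # Fix Y and solve one ridge regression per user.
        for u, obs in by_user.items():
            if obs:
                Yu = Y[[i for i, _ in obs]]
                vals = np.array([r for _, r in obs])
                X[u] = np.linalg.solve(Yu.T @ Yu + reg, Yu.T @ vals)
        # Fix X and solve one ridge regression per item.
        for i, obs in by_item.items():
            if obs:
                Xi = X[[u for u, _ in obs]]
                vals = np.array([r for _, r in obs])
                Y[i] = np.linalg.solve(Xi.T @ Xi + reg, Xi.T @ vals)
    return X, Y


# Tiny made-up example: 3 users, 3 items, 6 observed ratings.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
X, Y = als_explicit(toy, n_users=3, n_items=3, rank=2, iterations=10)
print(np.round(X @ Y.T, 2))  # predicted ratings, including the unobserved cells
```

The product `X @ Y.T` fills in every user-item cell, which is how the factor model produces predictions for pairs that were never rated.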

In [1]:
import findspark
findspark.init()

import pandas as pd
import numpy as np
from math import sqrt

from sklearn.model_selection import KFold
from pyspark.mllib.recommendation import ALS, Rating
from pyspark import SparkConf, SparkContext

In [2]:
sc = SparkContext(conf=SparkConf().setAppName("MyApp").setMaster("local"))

In [3]:
# Load the MovieLens 100k ratings (tab-separated: user id, item id, rating, timestamp)
data = sc.textFile('ml-100k/u.data')

In [4]:
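# ml-100k/u.data has no header row, so dropping the first line would discard a real rating.
# Keep only lines whose first field is a numeric user id (this still drops a header if one is present).
data = data.filter(lambda r: r.split('\t')[0].isdigit())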

In [5]:
# Split each line into its tab-separated fields
data = data.map(lambda r: r.split('\t'))

In [6]:
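# Build Rating(user, product, rating) records (the timestamp is dropped) and spread them over 64 partitions
data = data.map(lambda r: Rating(int(r[0]), int(r[1]), float(r[2]))).repartition(64)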
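
The map steps above turn each raw line into a `Rating`. The pure-Python sketch below performs the same parsing without Spark, so it can be checked in isolation. It assumes a namedtuple stand-in with the same fields as `pyspark.mllib.recommendation.Rating` (the real class is also a namedtuple subclass, which is why the training loop can index ratings as `r[0]`, `r[1]`, `r[2]`). `parse_line` is a hypothetical helper and the sample lines are made up.

```python
from collections import namedtuple

# Stand-in with the same fields as pyspark.mllib.recommendation.Rating (assumption for illustration).
Rating = namedtuple("Rating", ["user", "product", "rating"])


def parse_line(line):
    """Hypothetical helper: parse one u.data line (user id, item id, rating, timestamp)."""
    fields = line.split("\t")
    # The timestamp (fields[3]) is not used by ALS and is dropped.
    return Rating(int(fields[0]), int(fields[1]), float(fields[2]))


lines = [
    "user id\titem id\trating\ttimestamp",  # header-like line, skipped by the numeric check
    "1\t10\t4\t0",
    "2\t20\t5\t0",
]
ratings = [parse_line(line) for line in lines if line.split("\t")[0].isdigit()]
print(ratings)
```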

In [7]:
data.take(1)

[Rating(user=268, product=231, rating=4.0)]

In [8]:
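# 3-fold cross-validation: randomly split the ratings into three roughly equal folds
# (fixed seed for reproducibility)
weights = [1, 1, 1]
RDDtr1, RDDtr2, RDDtr3 = data.randomSplit(weights, seed=42)
# Each fold serves as the test set once; the other two folds form the training set
rdds = [(RDDtr1.union(RDDtr2), RDDtr3), (RDDtr1.union(RDDtr3), RDDtr2), (RDDtr2.union(RDDtr3), RDDtr1)]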
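
The split above assigns each rating at random to one of three folds and then pairs each fold, as the test set, with the union of the other two. A plain-Python sketch of the same pattern follows; `random_split` and `kfold_pairs` are hypothetical helpers. `random_split` only roughly mimics `RDD.randomSplit` (each item goes independently to one split with probability proportional to its weight, so fold sizes are approximately, not exactly, equal), and `kfold_pairs` lists the pairs in a different order than `rdds` but produces the same three train/test combinations.

```python
import random


def random_split(items, weights, seed=0):
    """Hypothetical helper: assign each item independently to one split,
    with probability proportional to its weight."""
    rng = random.Random(seed)
    total = sum(weights)
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)  # cumulative upper bound of each split
    splits = [[] for _ in weights]
    for x in items:
        u = rng.random()
        for k, b in enumerate(bounds):
            if u < b:
                splits[k].append(x)
                break
        else:
            splits[-1].append(x)  # guard against rounding in the last bound
    return splits


def kfold_pairs(folds):
    """Hypothetical helper: each fold is the test set exactly once;
    the remaining folds are concatenated into the training set."""
    pairs = []
    for k, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        pairs.append((train, test))
    return pairs


folds = random_split(list(range(12)), [1, 1, 1], seed=42)
for train, test in kfold_pairs(folds):
    print(len(train), len(test))
```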

In [None]:
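# Train one ALS model per fold and evaluate it on the held-out fold
RMSE_scores = []

for rddtrain, rddtest in rdds:
    # rank=2 latent factors, 5 ALS iterations, regularization lambda_=0.1
    model = ALS.train(rddtrain, rank=2, iterations=5, lambda_=0.1)

    # Predict ratings for the (user, product) pairs in the test fold
    predictions = model.predictAll(rddtest.map(lambda r: (r[0], r[1])))

    # Key both RDDs by (user, product) and join them into (predicted, actual) pairs;
    # test pairs without a prediction are dropped by the inner join
    preds_ratings = (predictions.map(lambda r: ((r[0], r[1]), r[2]))
                     .join(rddtest.map(lambda r: ((r[0], r[1]), r[2])))
                     .values())

    # Compute RMSE over the joined pairs
    score = sqrt(preds_ratings.map(lambda r: (r[0] - r[1]) ** 2).mean())

    RMSE_scores.append(score)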
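
The evaluation in the loop above computes $\text{RMSE} = \sqrt{\tfrac{1}{n}\sum (\hat r_{ui} - r_{ui})^2}$ over the test pairs that have a prediction. Because `join` is an inner join, a test pair without a prediction (for example, a user or item that never appears in the training folds) simply drops out of the average. The sketch below reproduces that logic with dictionaries; `rmse_on_join` is a hypothetical helper, and it assumes each (user, item) pair occurs at most once, as in u.data.

```python
from math import sqrt


def rmse_on_join(predictions, test):
    """Hypothetical helper: inner-join predictions and test ratings on (user, item),
    then return (RMSE over the matched pairs, number of matched pairs)."""
    pred = {(u, i): p for u, i, p in predictions}
    pairs = [(pred[(u, i)], r) for u, i, r in test if (u, i) in pred]
    if not pairs:
        raise ValueError("no (user, item) pairs in common")
    mse = sum((p - r) ** 2 for p, r in pairs) / len(pairs)
    return sqrt(mse), len(pairs)


# Made-up example: the pair (9, 9) has no prediction and is ignored.
score, matched = rmse_on_join([(1, 1, 4.0), (1, 2, 3.0)],
                              [(1, 1, 5.0), (1, 2, 3.0), (9, 9, 1.0)])
print(score, matched)
```

Reporting the number of matched pairs alongside the RMSE makes it visible how much of each test fold was actually evaluated.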

In [None]:
RMSE_scores