### Machine Learning for Movie Recommendation

In this notebook, we use MLlib, the machine learning library that sits on top of Spark, to build a movie recommender. MLlib gives us an easy way to run analytics and machine learning on Spark. Many models are available:

- ``KNN``
- ``Linear Regression``
- ``SVM``
- ``Naive Bayes``
- ``Decision Trees``
- ``Principal Component Analysis``
- ``Singular Value Decomposition``
- ``Alternating Least Squares`` --> this lets us build recommendations easily through Spark
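
To make the idea behind Alternating Least Squares concrete, here is a minimal sketch in plain Python (no Spark needed). It fits one factor per user and one per movie by fixing one side and solving for the other in closed form, then swapping. The toy ratings, the rank of 1, the regularization value `lam`, and all function names are assumptions made for illustration; Spark's own implementation works with higher ranks, runs distributed, and differs in its details.

```python
# Toy rank-1 ALS in plain Python. Every name and number here is illustrative.

ratings = [  # (user, movie, rating)
    (0, 0, 5.0), (0, 1, 4.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 5.0), (2, 2, 2.0),
]
lam = 0.1  # regularization strength (assumed value)


def solve_side(ratings, other, key_idx, other_idx, lam):
    """Closed-form regularized least-squares update for one side,
    holding the other side's factors fixed."""
    num, den = {}, {}
    for row in ratings:
        key, oth, value = row[key_idx], row[other_idx], row[2]
        num[key] = num.get(key, 0.0) + value * other[oth]
        den[key] = den.get(key, 0.0) + other[oth] ** 2
    return {key: num[key] / (den[key] + lam) for key in num}


def loss(ratings, users, movies, lam):
    """Squared error on the known ratings plus an L2 penalty on the factors."""
    err = sum((r - users[u] * movies[m]) ** 2 for u, m, r in ratings)
    reg = lam * (sum(x * x for x in users.values())
                 + sum(y * y for y in movies.values()))
    return err + reg


# Start with every factor set to 1.0, then alternate between the two sides.
user_f = {u: 1.0 for u, _, _ in ratings}
movie_f = {m: 1.0 for _, m, _ in ratings}
losses = [loss(ratings, user_f, movie_f, lam)]
for _ in range(6):
    user_f = solve_side(ratings, movie_f, 0, 1, lam)   # movies fixed
    movie_f = solve_side(ratings, user_f, 1, 0, lam)   # users fixed
    losses.append(loss(ratings, user_f, movie_f, lam))

# Predicted rating for a pair that is absent from the data:
# user 0 never rated movie 2.
predicted = user_f[0] * movie_f[2]
print(losses)
print(predicted)
```

Each half-step exactly minimizes the loss for one side, so the recorded losses never increase. The final product of a user factor and a movie factor is the model's predicted rating for that pair.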

MLlib also introduces three additional data types, namely:

- ``vector`` for dense and sparse feature vectors
- ``labeled points`` to attach a label to a feature vector
- ``rating`` to perform recommendation analysis
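
As a rough illustration of these types, the sketch below uses plain-Python stand-ins so it runs without Spark. The `Rating` named tuple mirrors the `user`, `product`, `rating` fields of `pyspark.mllib.recommendation.Rating`; the helpers `parse_rating` and `to_dense` and the sample values are hypothetical and only for illustration.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (assumption: same field order).
Rating = namedtuple("Rating", ["user", "product", "rating"])


def parse_rating(line):
    """Turn one 'user item rating timestamp' line into a Rating."""
    user, product, rating = line.split()[:3]
    return Rating(int(user), int(product), float(rating))


def to_dense(size, indices, values):
    """Expand a sparse (size, indices, values) description into a dense list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


# Illustrative line in the whitespace-separated layout used by ml-100k/u.data.
sample = "196\t242\t3\t881250949"
r = parse_rating(sample)
print(r.user, r.product, r.rating)

# A sparse vector stores only the non-zero entries; a dense one stores all of them.
dense = to_dense(5, [1, 3], [2.0, 7.0])
print(dense)

# A labeled point pairs a label with a feature vector.
labeled = (1.0, dense)
print(labeled)
```

In the notebook itself the real `Rating` class is used, and fields can be read either by name or by position, which is why the code further down indexes ratings with `[0]`, `[1]`, and `[2]`.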



In [1]:
import findspark
findspark.init()

In [2]:
import sys
from pyspark import SparkConf, SparkContext
# ALS and Rating are not available by default; import them explicitly
from pyspark.mllib.recommendation import ALS, Rating

conf = SparkConf().setMaster("local[*]").setAppName("MovieRecommendationsALS")
sc = SparkContext(conf = conf)
sc.setCheckpointDir('checkpoint')

In [5]:

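def loadMovieNames():
    """Return a dict mapping movie ID to movie title."""
    movieNames = {}
    # The file ships as lowercase "u.item"; the exact name matters on
    # case-sensitive file systems. Non-ASCII characters are dropped.
    with open("ml-100k/u.item", encoding='ascii', errors="ignore") as f:
        for line in f:
            # Fields are pipe-separated: movie ID | title | ...
            fields = line.split('|')
            movieNames[int(fields[0])] = fields[1]
    return movieNames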


print("\nLoading movie names...")
nameDict = loadMovieNames()

data = sc.textFile("ml-100k/u.data")

# Convert each line (user ID, movie ID, rating, timestamp) into a Rating object
ratings = data.map(lambda l: l.split()).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2]))).cache()

# Build the recommendation model using Alternating Least Squares
print("\nTraining recommendation model...")
rank = 10
# numIterations is kept low so training also works on lower-end systems;
# raising it increases the training time
numIterations = 6
model = ALS.train(ratings, rank, numIterations)

userID = 0

# First, show the ratings this user has already given
print("\nRatings for user ID " + str(userID) + ":")
userRatings = ratings.filter(lambda l: l[0] == userID)
for rating in userRatings.collect():
    print(nameDict[int(rating[1])] + ": " + str(rating[2]))

# Then ask the model for the top 10 recommendations for this user
print("\nTop 10 recommendations:")
recommendations = model.recommendProducts(userID, 10)
for recommendation in recommendations:
    print(nameDict[int(recommendation[1])] +
          " score " + str(recommendation[2]))


Loading movie names...

Training recommendation model...

Ratings for user ID 0:
Star Wars (1977): 5.0
Empire Strikes Back, The (1980): 5.0
Gone with the Wind (1939): 1.0

Top 10 recommendations:
Stupids, The (1996) score 7.274646363916272
Love Jones (1997) score 7.0868202856518465
In the Line of Duty 2 (1987) score 6.8831234364748966
Secret Agent, The (1996) score 6.634535455305416
unknown score 6.580162800145996
In the Mouth of Madness (1995) score 6.3871337384291476
Maya Lin: A Strong Clear Vision (1994) score 6.3348871818306725
Cronos (1992) score 6.32136812081478
Inspector General, The (1949) score 6.318240177389676
Army of Darkness (1993) score 5.954021252033902
