## Collaborative recommendation using Alternating Least Squares (ALS)

In [1]:
from pyspark.sql import SparkSession 

In [11]:
import sys

In [3]:
from pyspark.mllib.recommendation import ALS,Rating

In [4]:
spark = SparkSession.builder.appName("ALS-Movie-recommendations").getOrCreate()

In [115]:
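# Read the raw ratings file as an RDD of text lines
sc = spark.sparkContext
movies_rdd = sc.textFile("./datasets/ml-100k/u.data")
# A checkpoint directory lets Spark cut the long RDD lineage that iterative ALS training builds up
sc.setCheckpointDir('checkpoint')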

## checkpoint vs cache

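`cache()` keeps an RDD's partitions in memory but retains its full lineage, whereas `checkpoint()` writes the RDD to the checkpoint directory and truncates the lineage. For details, see:

[https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md)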


`Rating` is a data structure provided by `pyspark.mllib.recommendation`; it holds the (user, product, rating) records from which the recommendations are made.

In [30]:
# Rating takes (user: int, product: int, rating: float).
# The RDD is cached because ALS evaluates it many times during training.
movies_ratings = (movies_rdd
                  .map(lambda line: line.split())
                  .map(lambda field: Rating(int(field[0]), int(field[1]), float(field[2])))
                  .cache())
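
To see what the parsing step does to a single record, here is a minimal sketch that runs without Spark. The `Rating` namedtuple below is a stand-in for the class imported from `pyspark.mllib.recommendation`, and `parse_rating_line` is a hypothetical helper name; the sample line only assumes the tab-separated `user item rating timestamp` layout of `u.data`.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (assumption: same field names),
# so that this sketch runs without a Spark installation.
Rating = namedtuple("Rating", ["user", "product", "rating"])


def parse_rating_line(line):
    """Turn one 'user<TAB>item<TAB>rating<TAB>timestamp' line into a Rating."""
    fields = line.split()  # splits on any whitespace, including tabs
    # The fourth field (timestamp) is ignored, as in the notebook cell above.
    return Rating(int(fields[0]), int(fields[1]), float(fields[2]))


sample_line = "196\t242\t3\t881250949"  # illustrative line in the u.data layout
parsed = parse_rating_line(sample_line)
print(parsed)
```

The same function could be passed to `movies_rdd.map(...)` in place of the two chained lambdas.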

### Training the recommendation model

To learn more about the rank and other parameters, please see:

* [ALS](https://stackoverflow.com/a/45838873)

* [Rank](https://stackoverflow.com/a/30732231)


Signature: ALS.train(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None)

Docstring:
Train a matrix factorization model given an RDD of ratings by users
for a subset of products. The ratings matrix is approximated as the
product of two lower-rank matrices of a given rank (number of
features). To solve for these features, ALS is run iteratively with
a configurable level of parallelism.

:param ratings:
  RDD of `Rating` or (userID, productID, rating) tuple.
:param rank:
  Number of features to use (also referred to as the number of latent factors).
:param iterations:
  Number of iterations of ALS.
  (default: 5)
:param lambda_:
  Regularization parameter.
  (default: 0.01)
:param blocks:
  Number of blocks used to parallelize the computation. A value
  of -1 will use an auto-configured number of blocks.
  (default: -1)
:param nonnegative:
  A value of True will solve least-squares with nonnegativity
  constraints.
  (default: False)
:param seed:
  Random seed for initial matrix factorization model. A value
  of None will use system time as the seed.
  (default: None)

.. versionadded:: 0.9.0
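
The "alternating" idea described in the docstring can be illustrated with a deliberately tiny sketch: fix the item factors and solve for the user factors, then fix the user factors and solve for the item factors, and repeat. This is not MLlib's implementation. It assumes rank 1 (so each least-squares solve collapses to a scalar division), plain L2 regularization, and made-up toy ratings; `als_rank1` and `squared_error` are hypothetical names.

```python
def als_rank1(ratings, n_users, n_items, iterations=10, lam=0.01):
    """Rank-1 ALS on a list of (user, item, rating) triples.

    Simplification: one latent factor per user and per item, so every
    regularized least-squares problem has a closed-form scalar solution.
    """
    u = [1.0] * n_users  # user factors
    v = [1.0] * n_items  # item factors
    for _ in range(iterations):
        # Step 1: hold item factors fixed, solve for each user factor.
        num = [0.0] * n_users
        den = [lam] * n_users
        for user, item, r in ratings:
            num[user] += r * v[item]
            den[user] += v[item] ** 2
        u = [n / d for n, d in zip(num, den)]

        # Step 2: hold user factors fixed, solve for each item factor.
        num = [0.0] * n_items
        den = [lam] * n_items
        for user, item, r in ratings:
            num[item] += r * u[user]
            den[item] += u[user] ** 2
        v = [n / d for n, d in zip(num, den)]
    return u, v


def squared_error(ratings, u, v):
    """Sum of squared differences between observed and reconstructed ratings."""
    return sum((r - u[user] * v[item]) ** 2 for user, item, r in ratings)


# Toy data (made up for illustration): 3 users, 2 items
toy = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.0), (2, 1, 2.0)]
error_before = squared_error(toy, [1.0] * 3, [1.0] * 2)
u, v = als_rank1(toy, n_users=3, n_items=2)
error_after = squared_error(toy, u, v)
print(error_before, error_after)
```

With a higher `rank`, each scalar division becomes a small linear system per user or item, which is the part that `blocks` parallelizes in `ALS.train`.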

In [108]:
rank = 20
numIterations = 20

### Let's create a model

In [109]:
model = ALS.train(ratings=movies_ratings,rank=rank,iterations=numIterations)

In [110]:
userid = 0
numberOfRecommendations=10
recommendations = model.recommendProducts(userid,numberOfRecommendations)
recommendations

[Rating(user=0, product=50, rating=5.023538948659228),
 Rating(user=0, product=172, rating=4.954395508936481),
 Rating(user=0, product=181, rating=4.697438278245935),
 Rating(user=0, product=630, rating=4.6269015067165435),
 Rating(user=0, product=174, rating=4.590881185280563),
 Rating(user=0, product=173, rating=4.510254117272762),
 Rating(user=0, product=195, rating=4.356694170662087),
 Rating(user=0, product=184, rating=4.284870788289972),
 Rating(user=0, product=109, rating=4.2698675552721514),
 Rating(user=0, product=686, rating=4.139831430935725)]
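
Each predicted rating in a matrix factorization model is the dot product of the user's latent factor vector with the product's latent factor vector, and the top-N list is simply the N highest scores. The sketch below shows that ranking step in plain Python; the factor values are invented for illustration, and `recommend_top_n` is a hypothetical helper, not part of the PySpark API.

```python
import heapq


def recommend_top_n(user_factors, item_factors, n):
    """Score every item by its dot product with the user's factors; return the top n."""
    scores = (
        (item_id, sum(a * b for a, b in zip(user_factors, factors)))
        for item_id, factors in item_factors.items()
    )
    return heapq.nlargest(n, scores, key=lambda pair: pair[1])


# Hypothetical 2-factor vectors, for illustration only
user = [1.0, 0.5]
items = {50: [4.0, 2.0], 172: [3.0, 1.0], 118: [0.5, 0.5]}
top = recommend_top_n(user, items, 2)
print(top)
```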

### load movie names 

In [111]:
movie_names= spark.sparkContext.textFile("./datasets/ml-100k/u.item")
movie_names=movie_names.map(lambda x : (int(x.split("|")[0]), x.split("|")[1]))
movie_name_dict=movie_names.collectAsMap()

In [112]:
user_rated_movies = movies_ratings.filter(lambda x : x[0] == userid)

In [113]:
for user_rated_movie in user_rated_movies.collect():
    print("Movie: %s  - score: %.1f" % (movie_name_dict[user_rated_movie[1]] , user_rated_movie[2]))

Movie: Star Wars (1977)  - score: 5.0
Movie: Empire Strikes Back, The (1980)  - score: 5.0
Movie: Gone with the Wind (1939)  - score: 1.0


In [114]:
for recommendation in recommendations:
    print("Movie: %s  - score: %.10f" % (movie_name_dict[recommendation[1]] , recommendation[2]))
    

Movie: Star Wars (1977)  - score: 5.0235389487
Movie: Empire Strikes Back, The (1980)  - score: 4.9543955089
Movie: Return of the Jedi (1983)  - score: 4.6974382782
Movie: Great Race, The (1965)  - score: 4.6269015067
Movie: Raiders of the Lost Ark (1981)  - score: 4.5908811853
Movie: Princess Bride, The (1987)  - score: 4.5102541173
Movie: Terminator, The (1984)  - score: 4.3566941707
Movie: Army of Darkness (1993)  - score: 4.2848707883
Movie: Mystery Science Theater 3000: The Movie (1996)  - score: 4.2698675553
Movie: Perfect World, A (1993)  - score: 4.1398314309


## The Conclusion

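The recommendations are not very promising. This could be due to many reasons; two likely candidates are that user 0 has only three ratings and that the hyperparameters (`rank`, `iterations`, `lambda_`) were not tuned.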

![](./images/ALS.png)
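
One way to make "not very promising" measurable is to hold out part of the ratings, predict them with the trained model, and compute the root-mean-square error (RMSE). The sketch below shows only the metric, in plain Python, on invented numbers that are not results from this notebook; in Spark the held-out set could come from `RDD.randomSplit` and the predictions from `model.predictAll`, but that wiring is left out here. The function name `rmse` and the dictionary layout are assumptions.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error over the (user, product) keys present in both dicts."""
    common = actual.keys() & predicted.keys()
    if not common:
        raise ValueError("no overlapping (user, product) pairs")
    total = sum((actual[key] - predicted[key]) ** 2 for key in common)
    return math.sqrt(total / len(common))


# Invented values for illustration: {(user, product): rating}
held_out = {(0, 1): 5.0, (0, 2): 5.0, (0, 3): 1.0}
predictions = {(0, 1): 5.0, (0, 2): 4.0, (0, 3): 3.0}
score = rmse(held_out, predictions)
print(score)
```

Comparing this score across different values of `rank` and `lambda_` gives a more objective basis for choosing them than inspecting a single user's list.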
