<h1>ALS Recommendation System</h1>

Here we work with the <b>MovieLens 100K</b> dataset.

In [23]:
import pandas as pd

<h2>Reading Users' File</h2>

In [24]:
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users =  pd.read_csv('ml-100k/u.user', sep='|', names=u_cols,
 encoding='latin-1')

<h2>Reading Ratings File</h2>

In [25]:
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=r_cols,
 encoding='latin-1')

<h2>Reading Items File</h2>

In [41]:
i_cols = ['movie_id', 'movie_title' ,'release date','video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
 'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
items = pd.read_csv('ml-100k/u.item', sep='|', names=i_cols, encoding='latin-1')

Now let's see what each dataframe looks like.

<h2>Users</h2>

In [42]:
print(users.shape)
users.head()

(943, 5)


Unnamed: 0,user_id,age,sex,occupation,zip_code
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


This shows that we have 943 users, with 5 features each.

<h2>Ratings</h2>

In [43]:
print(ratings.shape)
ratings.head()

(100000, 4)


Unnamed: 0,user_id,movie_id,rating,unix_timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


This shows that we have 100,000 ratings, each recorded with a user, a movie, a score, and a timestamp.

<h2>Items</h2>

In [44]:
print(items.shape)
items.head()

(1682, 24)


Unnamed: 0,movie_id,movie_title,release date,video release date,IMDb URL,unknown,Action,Adventure,Animation,Children's,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,Toy Story (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
1,2,GoldenEye (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?GoldenEye%20(...,0,1,1,0,0,...,0,0,0,0,0,0,0,1,0,0
2,3,Four Rooms (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Four%20Rooms%...,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,4,Get Shorty (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,Copycat (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


This shows that we have details of <b>1682</b> different movies.

<p>First, we <b>merge</b> the three dataframes into a single dataframe.</p>

In [45]:
dataset = pd.merge(pd.merge(items, ratings),users)
dataset.head()

Unnamed: 0,movie_id,movie_title,release date,video release date,IMDb URL,unknown,Action,Adventure,Animation,Children's,...,Thriller,War,Western,user_id,rating,unix_timestamp,age,sex,occupation,zip_code
0,1,Toy Story (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Toy%20Story%2...,0,0,0,1,1,...,0,0,0,308,4,887736532,60,M,retired,95076
1,4,Get Shorty (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Get%20Shorty%...,0,1,0,0,0,...,0,0,0,308,5,887737890,60,M,retired,95076
2,5,Copycat (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Copycat%20(1995),0,0,0,0,0,...,1,0,0,308,4,887739608,60,M,retired,95076
3,7,Twelve Monkeys (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Twelve%20Monk...,0,0,0,0,0,...,0,0,0,308,4,887738847,60,M,retired,95076
4,8,Babe (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Babe%20(1995),0,0,0,0,1,...,0,0,0,308,5,887736696,60,M,retired,95076
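
The nested merge above names no join keys, so pandas joins on whichever columns the two frames have in common. Here is a minimal sketch of that behaviour on toy frames; the names <b>toy_items</b>, <b>toy_ratings</b>, and <b>toy_users</b> and all their values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy stand-ins for the items, ratings, and users dataframes (made-up values).
toy_items = pd.DataFrame({'movie_id': [1, 2], 'movie_title': ['A', 'B']})
toy_ratings = pd.DataFrame({'user_id': [10, 10, 11],
                            'movie_id': [1, 2, 1],
                            'rating': [4, 5, 3]})
toy_users = pd.DataFrame({'user_id': [10, 11], 'age': [24, 53]})

# With no `on` argument, merge joins on the shared column names:
# movie_id for items/ratings, then user_id for that result and users.
toy_dataset = pd.merge(pd.merge(toy_items, toy_ratings), toy_users)

# Spelling the keys out gives the same rows and is easier to read later.
toy_explicit = toy_items.merge(toy_ratings, on='movie_id').merge(toy_users, on='user_id')

print(toy_dataset.shape)  # (3, 5): one row per rating
```

Because both merges are inner joins by default, a rating whose movie or user is missing from the other frames would be dropped silently, so comparing row counts before and after is a cheap sanity check.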


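<p>The merged dataset now holds one row per rating, with the movie and user details attached. Next, we import the libraries needed for the model.</p>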

In [46]:
import sys
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import spsolve
import random

from sklearn.preprocessing import MinMaxScaler

import implicit

<h2>Creating Sparse Matrix</h2>

The version of the <b>implicit</b> library used here expects data as an item-user matrix, so we create two matrices: one for fitting the model <b>(item-user)</b> and one for recommendations <b>(user-item)</b>.

In [47]:
sparse_item_user = sparse.csr_matrix((dataset['rating'].astype(float),(dataset['movie_id'], dataset['user_id'])))

In [48]:
sparse_user_item = sparse.csr_matrix((dataset['rating'].astype(float),(dataset['user_id'], dataset['movie_id'])))
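
To make the constructor call concrete, here is a small sketch of how <b>csr_matrix((data, (row, col)))</b> lays out the values. The arrays below are made-up triples, not rows from the dataset; they only mimic its 1-based ids.

```python
import numpy as np
import scipy.sparse as sparse

# Made-up (movie_id, user_id, rating) triples with 1-based ids.
movie_ids = np.array([1, 1, 2, 3])
user_ids = np.array([1, 2, 2, 3])
toy_ratings = np.array([5.0, 3.0, 4.0, 1.0])

# csr_matrix((data, (row, col))) stores data[k] at position (row[k], col[k]).
item_user = sparse.csr_matrix((toy_ratings, (movie_ids, user_ids)))
user_item = sparse.csr_matrix((toy_ratings, (user_ids, movie_ids)))

# The shape is inferred as (max row index + 1, max col index + 1),
# so with 1-based ids row 0 and column 0 exist but stay empty.
print(item_user.shape)  # (4, 4)

# Swapping the row and column arrays yields the transpose.
same = np.array_equal(item_user.T.toarray(), user_item.toarray())
print(same)
```

The empty row and column at index 0 are harmless here, and they let movie and user ids be used directly as matrix indices.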

<h2>Initialising The ALS model</h2>

In [49]:
model = implicit.als.AlternatingLeastSquares(factors=20,regularization=0.1,iterations=20)

<p>We compute the <b>confidence</b> matrix by multiplying the ratings matrix by our <b>alpha</b> value.</p>

In [50]:
alpha_val = 15
data_conf = (sparse_item_user * alpha_val).astype('double')
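
As a quick illustration of what this scaling does, the sketch below applies the same operation to a 2x2 toy matrix. The matrix <b>toy</b> is made up; only the alpha value of 15 comes from the cell above.

```python
import numpy as np
import scipy.sparse as sparse

# Made-up 2x2 ratings matrix; zeros stand for "no rating".
toy = sparse.csr_matrix(np.array([[0.0, 3.0],
                                  [5.0, 0.0]]))
alpha_toy = 15  # same value as alpha_val above

# Multiplying a sparse matrix by a scalar scales only the stored entries.
conf_toy = (toy * alpha_toy).astype('double')

print(conf_toy.toarray())
print(conf_toy.nnz == toy.nnz)  # the sparsity pattern is unchanged
```

Higher ratings therefore end up with proportionally higher confidence, while unrated pairs remain at zero.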

<h2>Fit the model</h2>

In [51]:
model.fit(data_conf)

100%|██████████| 20.0/20 [00:04<00:00,  4.57it/s]
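
For intuition about what fitting does, here is a minimal NumPy sketch of alternating least squares. It is a deliberately simplified, unweighted variant that treats every cell of a small dense matrix as observed. It is not the implicit library's implementation, which weights entries by confidence. The function name <b>als_sketch</b>, the toy matrix <b>R_toy</b>, and the parameter values are assumptions made for illustration.

```python
import numpy as np

def als_sketch(R, factors=2, regularization=0.1, iterations=20, seed=0):
    """Approximate R (items x users) by X.dot(Y.T) using alternating ridge solves.

    Simplification: every cell of R counts as an observation with equal weight.
    """
    rng = np.random.RandomState(seed)
    n_items, n_users = R.shape
    X = rng.normal(scale=0.1, size=(n_items, factors))  # item factors
    Y = rng.normal(scale=0.1, size=(n_users, factors))  # user factors
    reg = regularization * np.eye(factors)
    for _ in range(iterations):
        # Hold Y fixed and solve (Y^T Y + reg) X^T = Y^T R^T for X.
        X = np.linalg.solve(Y.T.dot(Y) + reg, Y.T.dot(R.T)).T
        # Hold X fixed and solve (X^T X + reg) Y^T = X^T R for Y.
        Y = np.linalg.solve(X.T.dot(X) + reg, X.T.dot(R)).T
    return X, Y

# Made-up rank-1 matrix: 3 items x 2 users.
R_toy = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])
X, Y = als_sketch(R_toy)
error = np.linalg.norm(R_toy - X.dot(Y.T))
print(error)  # small, but not zero because of the regularization
```

Each half-step is an ordinary regularized least-squares problem with a closed-form solution, which is why the method alternates instead of optimizing both factor matrices at once.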


<h1>1. Find Similar Items</h1>

<p>We find the 5 most similar movies to <b>Braveheart</b> (movie_id = <b>22</b>).</p>

In [52]:
item_id = 22
n_similar = 5
similar = model.similar_items(item_id,n_similar)

Print the names of our most similar items.

In [53]:
for item in similar:
    idx,score = item
    print(dataset.movie_title.loc[dataset.movie_id == idx].iloc[0])

Braveheart (1995)
Fugitive, The (1993)
Empire Strikes Back, The (1980)
Terminator 2: Judgment Day (1991)
Raiders of the Lost Ark (1981)


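The first result is Braveheart itself, since a movie is always most similar to itself. The other four are all well-known action and adventure films, so the matches look sensible.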
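
One way to picture the similarity lookup is as a cosine similarity between learned item factor vectors. The sketch below assumes that interpretation and is not the library's code; the helper <b>most_similar</b> and the factor values in <b>toy_factors</b> are hypothetical.

```python
import numpy as np

def most_similar(item_factors, item_id, n=5):
    """Return (index, score) pairs for the n rows closest to row item_id by cosine similarity."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    scores = item_factors.dot(item_factors[item_id]) / (norms * norms[item_id])
    top = np.argsort(-scores)[:n]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in top]

# Made-up 2-dimensional factors for four items.
toy_factors = np.array([[1.0, 0.0],
                        [0.9, 0.1],
                        [0.0, 1.0],
                        [-1.0, 0.0]])

result = most_similar(toy_factors, 0, n=3)
print(result)
```

The queried item always comes back first with a score of 1, which mirrors Braveheart heading its own list above.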

<h1>2. Create User Recommendations</h1>

Let's create recommendations for an arbitrarily chosen user, the one with <b>user_id = 936</b>.

In [54]:
user_id = 936
recommended = model.recommend(user_id,sparse_user_item)

In [57]:
movies = []
scores = []

<p>Get movie names from ids</p>

In [58]:
for item in recommended:
    idx, score = item
    movies.append(dataset.movie_title.loc[dataset.movie_id == idx].iloc[0])
    scores.append(score)

Create dataframe of movie names and scores

In [59]:
recommendations = pd.DataFrame({'movies': movies, 'scores':scores})
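
The title lookup in the loop above filters the whole merged dataframe once per movie. A possible alternative, sketched here on toy data, builds a Series indexed by <b>movie_id</b> once and reuses it. The names <b>title_by_id</b> and <b>toy_recommended</b> and the scores are made up; the three titles are the first three rows of the items table.

```python
import pandas as pd

# First three movies from the items table shown earlier.
toy_items = pd.DataFrame({
    'movie_id': [1, 2, 3],
    'movie_title': ['Toy Story (1995)', 'GoldenEye (1995)', 'Four Rooms (1995)'],
})

# Build the id -> title lookup once.
title_by_id = toy_items.set_index('movie_id')['movie_title']

# Made-up (movie_id, score) pairs in the same shape as `recommended`.
toy_recommended = [(3, 1.2), (1, 0.9)]

toy_table = pd.DataFrame({
    'movies': [title_by_id[idx] for idx, _ in toy_recommended],
    'scores': [score for _, score in toy_recommended],
})
print(toy_table)
```

On a dataset of this size the difference is negligible, but the lookup keeps the loop body short and avoids repeated scans.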

In [60]:
print(recommendations)

                             movies    scores
0              Crucible, The (1996)  1.236327
1                     Kundun (1997)  1.190871
2          Evening Star, The (1996)  1.189424
3  James and the Giant Peach (1996)  1.178872
4           Fierce Creatures (1997)  1.123278
5     Muppet Treasure Island (1996)  1.121075
6               Multiplicity (1996)  1.101344
7       Nutty Professor, The (1996)  1.096204
8        Looking for Richard (1996)  1.094989
9             Close Shave, A (1995)  1.072428


These are the top 10 recommendations for the user with <b>user_id = 936</b>, ordered by score.
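
Conceptually, a recommendation score in a factor model is the dot product of a user's factor vector with each item's factor vector, with already-rated items filtered out. The sketch below illustrates that idea under those assumptions; it is not the library's implementation, and <b>recommend_sketch</b> and all factor values are hypothetical.

```python
import numpy as np

def recommend_sketch(user_factors, item_factors, rated_items, user_id, n=10):
    """Rank items for one user by factor dot product, skipping items already rated."""
    scores = item_factors.dot(user_factors[user_id])
    rated = np.asarray(list(rated_items), dtype=int)
    scores[rated] = -np.inf  # push already-rated items to the bottom
    top = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in top]

# Made-up factors: 2 users, 4 items, 2 latent dimensions.
toy_user_factors = np.array([[1.0, 0.0],
                             [0.0, 1.0]])
toy_item_factors = np.array([[2.0, 0.0],
                             [1.0, 1.0],
                             [0.0, 3.0],
                             [0.5, 0.0]])

# User 0 has already rated item 0, so it is excluded from the ranking.
top_items = recommend_sketch(toy_user_factors, toy_item_factors, {0}, user_id=0, n=2)
print(top_items)  # [(1, 1.0), (3, 0.5)]
```

This is also why the user-item matrix is passed to the recommend call: it tells the model which movies the user has already rated.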