# **Movie Recommendation System**



Recommendation algorithms are at the core of many service providers' strategies. They give consumers
personalized suggestions that reduce the time spent looking for an item or a service. This in turn reduces
the frustration of finding great content to watch on **Netflix**, awesome books on **Amazon**, terrific videos
on **YouTube**, and good recommendations from other service providers.
The importance of a recommender system cannot be stressed enough. The financial benefits are enormous, which is why
large companies employ them.

In this notebook, we build a movie recommendation system. A typical scenario where you would need our system
is a **Friday** night: you have a nice glass of wine and some snacks, and you are sitting comfortably in your lounge
or in bed, all alone or with some company. An almost perfect night, except that you don't have the perfect movie.
Now you wish you had gone out, but you can't, since there is still **Covid-19** and you are **locked down**. You turn to
our movie recommender, recall that you enjoyed a movie called **The heart of Christmas**, and then boom:


![title](../images/reco_movies.jpg)

Now you have the headache of choosing among great movies, a much better and more welcome problem to have. **Enjoy your movie!**

# Load necessary packages

In [1]:
#Data wranglers
import pandas as pd
import numpy as np

#Visualizations
from matplotlib import pyplot as plt 


# Load datasets

In [2]:
train = pd.read_csv('../data/train.csv')
#test = pd.read_csv('../data/test.csv')

In [3]:
train

,userId,movieId,rating,timestamp
0,5163,57669,4.0,1518349992
1,106343,5,4.5,1206238739
2,146790,5459,5.0,1076215539
3,106362,32296,2.0,1423042565
4,9041,366,3.0,833375837
...,...,...,...,...
10000033,136395,99114,5.0,1521235092
10000034,140078,553,3.0,1002580977
10000035,154807,56782,4.0,1227674807
10000036,85805,327,4.0,1479921530


# Collaborative filtering

Let us begin with a small slice of the data set, since the original set is too big. This lets us get a working model before worrying about incorporating
the entire data set. After pivoting, most entries are null, which makes sense since no user can possibly have watched and rated all the movies. We will replace the null values with 0. Movies with fewer than 10 ratings should also be dropped, although the cells below do not apply that filter yet.
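The filtering step described above can be sketched as follows. This is a minimal illustration on made-up ratings, not the notebook's actual cell: the toy DataFrame, the `MIN_RATINGS` name, and the lowered threshold of 2 are all assumptions chosen so the example stays small.

```python
import pandas as pd

# Toy ratings in the same long format as train.csv (values are made up).
ratings = pd.DataFrame({
    "userId":  [1, 2, 3, 1, 2, 1],
    "movieId": [10, 10, 10, 20, 20, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

MIN_RATINGS = 2  # hypothetical threshold for the toy data; the text uses 10

# Count ratings per movie and keep only movies that reach the threshold.
counts = ratings.groupby("movieId")["rating"].count()
popular_ids = counts[counts >= MIN_RATINGS].index
filtered = ratings[ratings["movieId"].isin(popular_ids)]

# Pivot to a movie x user matrix; unrated cells become NaN, then 0.
matrix = filtered.pivot(index="movieId", columns="userId", values="rating").fillna(0)
print(matrix)
```

Filtering before the pivot keeps the resulting matrix smaller, which matters once the full data set is used.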

In [4]:
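#take the first 60,000 ratings as a working sample
smaller_data = train.iloc[:60000]
#rows are movies, columns are users, cells are ratings (NaN where unrated)
user_rating = smaller_data.pivot(index='movieId', columns='userId', values='rating')
user_rating.head()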

movieId \ userId,2,3,4,12,31,38,41,43,50,51,...,162495,162501,162505,162507,162508,162516,162517,162522,162534,162541
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,


In [5]:
user_rating = user_rating.fillna(0)
user_rating.head()

movieId \ userId,2,3,4,12,31,38,41,43,50,51,...,162495,162501,162505,162507,162508,162516,162517,162522,162534,162541
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [6]:
from scipy.sparse import csr_matrix

feature_df = csr_matrix(user_rating)

In [7]:
from sklearn.neighbors import NearestNeighbors

knn = NearestNeighbors(metric = 'cosine', algorithm = 'brute')

knn.fit(feature_df)

NearestNeighbors(algorithm='brute', metric='cosine')
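With `metric='cosine'`, the model ranks neighbours by cosine distance, which is one minus the cosine of the angle between two rating vectors. The sketch below checks this on a small made-up matrix; the helper `cosine_distance` and the array `X` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Three toy "movies" described by the ratings of four users (made-up values).
X = np.array([
    [5.0, 0.0, 3.0, 0.0],
    [4.0, 0.0, 2.5, 0.0],
    [0.0, 4.0, 0.0, 1.0],
])

model = NearestNeighbors(metric="cosine", algorithm="brute").fit(X)

# Query with the first row: the closest match is the row itself,
# followed by the row pointing in almost the same direction.
distances, indices = model.kneighbors(X[[0]], n_neighbors=2)
manual = cosine_distance(X[0], X[1])
print(indices[0], distances[0], manual)
```

Because cosine distance depends only on direction, two movies rated by the same users in similar proportions end up close even if one is rated lower overall.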

In [11]:
movie_index = np.random.choice(feature_df.shape[0])

In [12]:
distances, indices = knn.kneighbors(user_rating.iloc[movie_index,:].values.reshape(1, -1), n_neighbors = 6)

In [13]:
print('Recommendations for {0}:\n'.format(user_rating.index[movie_index]))
#the first neighbour is the query movie itself, so skip it
neighbours = zip(indices.flatten()[1:], distances.flatten()[1:])
for i, (idx, dist) in enumerate(neighbours, start=1):
    print('{0}: {1}, with distance of {2}:'.format(i, user_rating.index[idx], dist))

Recommendations for 47978:

1: 44929, with distance of 0.24074339763470343:
2: 40494, with distance of 0.24074339763470343:
3: 71442, with distance of 0.3492086265440315:
4: 116213, with distance of 0.3492086265440315:
5: 4989, with distance of 0.3682603205240038:
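The lookup steps above could be wrapped in a reusable function so that recommendations are requested by `movieId` instead of a random row. This is a sketch under assumptions: `nearest_movies` is a hypothetical helper, and the toy matrix stands in for `user_rating`.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def nearest_movies(ratings, model, movie_id, k=5):
    """Return (movieId, distance) pairs for the k movies closest to movie_id."""
    row = ratings.index.get_loc(movie_id)
    query = ratings.iloc[[row]].to_numpy()
    # Ask for one extra neighbour because the query movie matches itself.
    distances, indices = model.kneighbors(query, n_neighbors=k + 1)
    pairs = zip(ratings.index[indices.ravel()], distances.ravel())
    return [(int(m), float(d)) for m, d in pairs if m != movie_id][:k]

# Toy movie x user matrix with made-up ratings (0 means "not rated").
toy = pd.DataFrame(
    [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [1, 0, 5, 4]],
    index=pd.Index([10, 20, 30, 40], name="movieId"),
    columns=pd.Index([1, 2, 3, 4], name="userId"),
    dtype=float,
)
toy_knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(toy.to_numpy())

result = nearest_movies(toy, toy_knn, movie_id=10, k=2)
print(result)
```

The function returns raw movie IDs, as the loop above does; mapping them to titles would need a separate movie metadata table that this section does not load.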


In [42]:
#build movie-movie similarity matrix
#DataFrame.corr correlates columns, so transpose to put the movies in the columns
movie_similarity = user_rating.T.corr(method='pearson')
#movie_similarity = pd.DataFrame(movie_similarity,index = user_rating.columns,columns = user_rating.columns)

In [43]:
movie_similarity

movieId,1,2,3,4,5,6,7,9,10,11,...,195399,195921,197203,197691,199223,199237,199470,201024,202575,204542
1,1.000000,-0.001144,-0.000890,-0.000484,-0.000953,-0.001588,-0.000484,-0.000819,-0.001578,-0.001249,...,-0.000484,-0.000484,-0.000484,-0.000484,-0.000484,-0.000484,-0.000484,-0.000484,-0.000484,-0.000484
2,-0.001144,1.000000,-0.000482,-0.000262,-0.000516,-0.000860,-0.000262,-0.000443,-0.000854,-0.000676,...,-0.000262,-0.000262,-0.000262,-0.000262,-0.000262,-0.000262,-0.000262,-0.000262,-0.000262,-0.000262
3,-0.000890,-0.000482,1.000000,-0.000204,-0.000402,-0.000669,-0.000204,-0.000345,-0.000664,-0.000526,...,-0.000204,-0.000204,-0.000204,-0.000204,-0.000204,-0.000204,-0.000204,-0.000204,-0.000204,-0.000204
4,-0.000484,-0.000262,-0.000204,1.000000,-0.000218,-0.000364,-0.000111,-0.000188,-0.000361,-0.000286,...,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111
5,-0.000953,-0.000516,-0.000402,-0.000218,1.000000,-0.000716,-0.000218,-0.000369,-0.000712,-0.000563,...,-0.000218,-0.000218,-0.000218,-0.000218,-0.000218,-0.000218,-0.000218,-0.000218,-0.000218,-0.000218
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
199237,-0.000484,-0.000262,-0.000204,-0.000111,-0.000218,-0.000364,-0.000111,-0.000188,-0.000361,-0.000286,...,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,1.000000,-0.000111,-0.000111,-0.000111,-0.000111
199470,-0.000484,-0.000262,-0.000204,-0.000111,-0.000218,-0.000364,-0.000111,-0.000188,-0.000361,-0.000286,...,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,1.000000,-0.000111,-0.000111,-0.000111
201024,-0.000484,-0.000262,-0.000204,-0.000111,-0.000218,-0.000364,-0.000111,-0.000188,-0.000361,-0.000286,...,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,1.000000,-0.000111,-0.000111
202575,-0.000484,-0.000262,-0.000204,-0.000111,-0.000218,-0.000364,-0.000111,-0.000188,-0.000361,-0.000286,...,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,-0.000111,1.000000,-0.000111
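`DataFrame.corr` computes pairwise correlations between columns, so the axis that holds the movies decides whether the result is a movie-movie or a user-user matrix. A minimal sketch on made-up ratings (the toy frame and variable names are assumptions for illustration):

```python
import pandas as pd

# Toy movie x user matrix: rows are movieId, columns are userId (made-up values).
toy_ratings = pd.DataFrame(
    [[5.0, 4.0, 1.0], [4.0, 5.0, 2.0], [1.0, 2.0, 5.0], [2.0, 1.0, 4.0]],
    index=pd.Index([10, 20, 30, 40], name="movieId"),
    columns=pd.Index([1, 2, 3], name="userId"),
)

# Correlating as-is compares columns, i.e. users with users.
toy_user_similarity = toy_ratings.corr(method="pearson")

# Transposing first puts movies in the columns, giving movie-movie similarity.
toy_movie_similarity = toy_ratings.T.corr(method="pearson")

print(toy_user_similarity.shape, toy_movie_similarity.shape)
```

Movies 10 and 20 are liked by the same users and come out positively correlated, while movies 10 and 30 are rated in opposite ways and come out negatively correlated.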


In [36]:
movie_similarity[1]

movieId
1         1.000000
6        -0.001588
10       -0.001578
21       -0.001826
32       -0.001881
            ...   
70286    -0.001532
79132    -0.001798
89745    -0.001594
99114    -0.001502
109487   -0.001815
Name: 1, Length: 169, dtype: float64

In [46]:
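%%writefile ../mod
#perform recommendations
#note: %%writefile must be the first line of the cell and only saves the cell to disk;
#run the definition in a normal cell as well before calling the function
def similar_movie_recommender(movieId, rating):
    """Score every movie by its similarity to movieId, scaled by the given rating."""
    scores = movie_similarity[movieId]*rating
    scores = scores.sort_values(ascending=False)

    return scores
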

In [47]:
print(similar_movie_recommender(1,5))

movieId
1       5.000000
5345    1.138915
419     0.996248
898     0.578638
5784    0.567223
          ...   
318    -0.012963
593    -0.013264
110    -0.013344
50     -0.013923
7153   -0.014131
Name: 1, Length: 3676, dtype: float64
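The recommender above scores candidates from a single rated movie, and the movie itself tops the list. One possible extension, which is not part of the original notebook, sums the weighted similarities over several rated movies and drops the ones already seen. The function name `recommend_from_history` and the toy similarity values are assumptions.

```python
import pandas as pd

def recommend_from_history(similarity, history, top_n=3):
    """Rank unseen movies by rating-weighted similarity to a non-empty history.

    similarity: square DataFrame indexed and labelled by movieId.
    history: dict mapping movieId -> rating given by the user.
    """
    # Weight each rated movie's similarity column by the user's rating and add up.
    scores = sum(similarity[movie_id] * rating for movie_id, rating in history.items())
    # Movies the user has already rated are not useful recommendations.
    scores = scores.drop(labels=list(history), errors="ignore")
    return scores.sort_values(ascending=False).head(top_n)

# Toy symmetric similarity matrix over four movies (made-up values).
ids = [1, 2, 3, 4]
toy_similarity = pd.DataFrame(
    [[1.0, 0.8, 0.1, -0.2],
     [0.8, 1.0, 0.3, 0.0],
     [0.1, 0.3, 1.0, 0.6],
     [-0.2, 0.0, 0.6, 1.0]],
    index=ids, columns=ids,
)

picks = recommend_from_history(toy_similarity, {1: 5.0, 2: 4.0})
print(picks)
```

Summing raw ratings treats every rating as a positive vote; centring ratings around a midpoint before weighting is a common variation when low ratings should count against similar movies.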


In [15]:
user_rating.columns

Int64Index([     1,      2,      3,      5,      6,      7,      9,     10,
                11,     14,
            ...
            164179, 166461, 166528, 166635, 168248, 168252, 171763, 176371,
            179819, 187593],
           dtype='int64', name='movieId', length=1505)