
Recommender systems are among the most popular applications of data science today. They predict the "rating" or "preference" that a user would give to an item. Almost every major tech company has applied them in some form: Amazon uses them to suggest products to customers, YouTube uses them to decide which video to play next on autoplay, and Facebook uses them to recommend pages to like and people to follow.

Broadly, recommender systems can be classified into 3 types:

    Simple recommenders: offer generalized recommendations to every user, based on movie popularity and/or genre. The basic idea behind this system is that movies that are more popular and critically acclaimed will have a higher probability of being liked by the average audience. The IMDB Top 250 is an example of this system.
    Content-based recommenders: suggest items that are similar to a particular item. This system uses item metadata (for movies: genre, director, description, actors, etc.) to make these recommendations. The general idea is that if a person liked a particular item, they are likely to also like an item that is similar to it.
    Collaborative filtering engines: these systems try to predict the rating or preference that a user would give an item based on the past ratings and preferences of other users. Unlike their content-based counterparts, collaborative filters do not require item metadata.
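
To make the first category concrete, here is a minimal sketch of a simple (popularity-based) recommender. It is an illustration only: the function name `simple_recommender`, the `min_votes` threshold, and the toy ratings are assumptions, not part of this notebook or of any particular site's method. It ranks movies by mean rating after discarding movies with too few votes.

```python
import pandas as pd

# Toy ratings in the same long format as ratings.csv (values are made up).
toy_ratings = pd.DataFrame({
    "movieId": [1, 1, 1, 2, 2, 3],
    "rating":  [4.0, 5.0, 3.0, 5.0, 4.0, 5.0],
})

def simple_recommender(ratings, min_votes=2, top_n=10):
    """Rank movies by mean rating, ignoring movies with too few votes."""
    stats = ratings.groupby("movieId")["rating"].agg(["mean", "count"])
    popular = stats[stats["count"] >= min_votes]
    return popular.sort_values("mean", ascending=False).head(top_n)

top_movies = simple_recommender(toy_ratings)
print(top_movies)
```

Every user gets the same list here, which is what distinguishes this approach from the collaborative filtering model built in the rest of the notebook.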


What follows is a basic collaborative filtering model: it computes movie-to-movie similarity from user ratings.

In [1]:
import numpy as np  # linear algebra
import pandas as pd  # data loading and manipulation
from sklearn.metrics.pairwise import cosine_similarity

# One row per (user, movie) rating
ratings = pd.read_csv('../input/ratings.csv')
ratings.head(15)

,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
5,1,1263,2.0,1260759151
6,1,1287,2.0,1260759187
7,1,1293,2.0,1260759148
8,1,1339,3.5,1260759125
9,1,1343,2.0,1260759131


In [2]:
movies=pd.read_csv('../input/movies.csv')
movies.head(10)

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
6,7,Sabrina (1995),Comedy|Romance
7,8,Tom and Huck (1995),Adventure|Children
8,9,Sudden Death (1995),Action
9,10,GoldenEye (1995),Action|Adventure|Thriller


In [3]:
# Join each rating to its movie title and genres on the shared movieId column
movie_ratings = pd.merge(movies, ratings, on='movieId')
movie_ratings.head(10)

,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,3.0,851866703
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,9,4.0,938629179
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,13,5.0,1331380058
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.0,997938310
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,19,3.0,855190091
5,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,20,3.5,1238729767
6,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,23,3.0,1148729853
7,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,26,5.0,1360087980
8,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,30,4.0,944943070
9,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,37,4.0,981308121


In [4]:
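# Rows are movies, columns are users; reset_index(drop=True) replaces movieId with a 0-based row position
ratings_matrix = ratings.pivot_table(index='movieId', columns='userId', values='rating').reset_index(drop=True)
# Treat missing ratings as 0 so that cosine similarity can be computed
ratings_matrix = ratings_matrix.fillna(0)
ratings_matrix.head(15)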

userId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,...,632,633,634,635,636,637,638,639,640,641,642,643,644,645,646,647,648,649,650,651,652,653,654,655,656,657,658,659,660,661,662,663,664,665,666,667,668,669,670,671
0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,4.0,0.0,0.0,0.0,5.0,0.0,2.0,0.0,0.0,0.0,3.0,3.5,0.0,0.0,3.0,0.0,0.0,5.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,5.0,4.0,0.0,4.0,0.0,0.0,0.0,4.0,5.0,0.0,0.0,0.0,0.0,0.0,2.5,0.0,0.0,4.0,3.5,0.0,0.0,0.0,0.0,0.0,4.0,5.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,3.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.5,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,4.5,4.0,3.0,0.0,0.0,0.0,3.5,5.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,3.0,0.0,3.0,0.0,0.0,4.0,0.0,...,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,4.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,4.0,0.0,5.0,4.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,4.0,0.0,4.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,4.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0


In [5]:
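# Pairwise cosine similarity between movie rows (movie-by-movie matrix)
movie_similarity = cosine_similarity(ratings_matrix)
# Zero the diagonal so that a movie is never reported as similar to itself
np.fill_diagonal(movie_similarity, 0)
movie_similarity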

array([[ 0.        ,  0.39451145,  0.30651588, ...,  0.        ,
         0.        ,  0.05582876],
       [ 0.39451145,  0.        ,  0.21749153, ...,  0.        ,
         0.        ,  0.        ],
       [ 0.30651588,  0.21749153,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       ..., 
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         1.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  1.        ,
         0.        ,  0.        ],
       [ 0.05582876,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]])
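
As a small worked example of what the cell above computes, the sketch below builds cosine similarity by hand with NumPy only: each row is divided by its Euclidean norm, and the dot products of the normalized rows are the similarities. The toy matrix and the helper name `cosine_similarity_rows` are assumptions for illustration; the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import numpy as np

# Toy movie-by-user matrix: 3 movies (rows), 4 users (columns); 0 = not rated.
toy = np.array([
    [5.0, 0.0, 3.0, 0.0],
    [4.0, 0.0, 3.0, 0.0],
    [0.0, 2.0, 0.0, 5.0],
])

def cosine_similarity_rows(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = matrix / norms
    return unit @ unit.T

toy_similarity = cosine_similarity_rows(toy)
np.fill_diagonal(toy_similarity, 0)  # same step as in the cell above
print(toy_similarity.round(3))
```

The first two movies were rated by the same users with similar scores, so their similarity is close to 1; the third movie shares no raters with them, so its similarity to both is 0.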

In [6]:
# Note: this rebinds ratings_matrix; from here on it holds the movie-by-movie similarity scores
ratings_matrix = pd.DataFrame(movie_similarity)
ratings_matrix.head(10)

,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,...,9026,9027,9028,9029,9030,9031,9032,9033,9034,9035,9036,9037,9038,9039,9040,9041,9042,9043,9044,9045,9046,9047,9048,9049,9050,9051,9052,9053,9054,9055,9056,9057,9058,9059,9060,9061,9062,9063,9064,9065
0,0.0,0.394511,0.306516,0.133614,0.245102,0.377086,0.278629,0.063031,0.117499,0.310689,0.328194,0.189994,0.146473,0.190222,0.170197,0.323016,0.349109,0.185447,0.304818,0.069941,0.33046,0.178615,0.126226,0.136525,0.343095,0.124661,0.083653,0.169978,0.258917,0.137337,0.109389,0.489275,0.484232,0.113921,0.358201,0.045398,0.095839,0.378537,0.117677,0.157052,...,0.031902,0.0,0.055829,0.061899,0.069025,0.079755,0.031902,0.10221,0.079755,0.031902,0.08103,0.0,0.031902,0.0,0.010088,0.079755,0.031902,0.031902,0.076528,0.031902,0.031902,0.031902,0.05961,0.079755,0.085111,0.031902,0.031902,0.063804,0.055829,0.055829,0.055829,0.031902,0.079755,0.079755,0.079755,0.079755,0.079755,0.0,0.0,0.055829
1,0.394511,0.0,0.217492,0.164651,0.278476,0.222003,0.207299,0.223524,0.113669,0.418124,0.293312,0.079558,0.219004,0.055918,0.165269,0.237198,0.223396,0.092847,0.42075,0.189806,0.287743,0.25567,0.174576,0.177014,0.203398,0.044192,0.142698,0.106068,0.171252,0.075484,0.218125,0.321346,0.366372,0.102839,0.225403,0.104427,0.07832,0.351724,0.0,0.070863,...,0.055038,0.0,0.0,0.080092,0.036243,0.068797,0.055038,0.12946,0.082557,0.055038,0.088018,0.110076,0.055038,0.137594,0.017404,0.068797,0.055038,0.055038,0.086723,0.055038,0.055038,0.055038,0.102839,0.082557,0.146835,0.055038,0.055038,0.0,0.0,0.0,0.0,0.055038,0.068797,0.082557,0.082557,0.137594,0.068797,0.0,0.0,0.0
2,0.306516,0.217492,0.0,0.177012,0.370732,0.247499,0.435648,0.127574,0.306717,0.191255,0.220983,0.182194,0.111808,0.244043,0.155901,0.192242,0.226681,0.178096,0.208704,0.140288,0.16674,0.204972,0.17921,0.018933,0.254918,0.10369,0.123276,0.035136,0.038067,0.097413,0.14831,0.270095,0.139225,0.057912,0.215727,0.0,0.122513,0.186139,0.0,0.14671,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.116226,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.116226,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.116226,0.116226,0.0,0.0,0.0,0.0,0.0
3,0.133614,0.164651,0.177012,0.0,0.179556,0.072518,0.184626,0.501513,0.25463,0.111447,0.152753,0.0,0.133192,0.082895,0.026531,0.098929,0.095295,0.013764,0.048416,0.225098,0.180602,0.080957,0.021566,0.120697,0.093818,0.0,0.0,0.0,0.118419,0.0,0.015329,0.091049,0.08866,0.171649,0.104829,0.0,0.0,0.191105,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.245102,0.278476,0.370732,0.179556,0.0,0.272645,0.388476,0.194113,0.367941,0.246846,0.293352,0.187843,0.164968,0.172188,0.081467,0.169367,0.252014,0.108283,0.251421,0.141337,0.171056,0.197865,0.171398,0.122501,0.248038,0.15672,0.115785,0.057359,0.0424,0.080847,0.145366,0.226915,0.224845,0.048954,0.283303,0.074564,0.124274,0.260779,0.0,0.112111,...,0.176845,0.117897,0.0,0.042891,0.116453,0.0,0.176845,0.057376,0.117897,0.176845,0.116453,0.0,0.176845,0.0,0.055923,0.0,0.176845,0.176845,0.133089,0.176845,0.176845,0.176845,0.0,0.117897,0.042891,0.176845,0.176845,0.0,0.0,0.0,0.0,0.176845,0.0,0.117897,0.117897,0.0,0.0,0.0,0.0,0.0
5,0.377086,0.222003,0.247499,0.072518,0.272645,0.0,0.278855,0.097561,0.248155,0.307948,0.2892,0.151322,0.034547,0.263074,0.106127,0.430175,0.251111,0.203392,0.278596,0.164817,0.372513,0.354552,0.265531,0.156205,0.504869,0.06608,0.162942,0.155339,0.220952,0.157079,0.242956,0.491099,0.316684,0.082014,0.407919,0.0,0.0,0.265138,0.131548,0.156449,...,0.098758,0.0,0.061724,0.023952,0.065033,0.111103,0.098758,0.06208,0.0,0.098758,0.148646,0.0,0.098758,0.086413,0.03123,0.111103,0.098758,0.098758,0.147485,0.098758,0.098758,0.098758,0.0,0.0,0.023952,0.098758,0.098758,0.0,0.061724,0.061724,0.061724,0.098758,0.111103,0.0,0.0,0.0,0.111103,0.0,0.0,0.061724
6,0.278629,0.207299,0.435648,0.184626,0.388476,0.278855,0.0,0.196091,0.349827,0.177425,0.372882,0.302999,0.20484,0.244025,0.116185,0.124265,0.357849,0.166313,0.156439,0.08019,0.217937,0.180956,0.10756,0.068167,0.320816,0.138129,0.145037,0.156392,0.085283,0.141412,0.190202,0.25132,0.220839,0.069234,0.221785,0.075324,0.0,0.279948,0.03661,0.11626,...,0.0,0.0,0.079399,0.0,0.0,0.0,0.0,0.038641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.079399,0.079399,0.079399,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.079399
7,0.063031,0.223524,0.127574,0.501513,0.194113,0.097561,0.196091,0.0,0.264477,0.042169,0.130968,0.0,0.14399,0.089216,0.0,0.130198,0.044039,0.049378,0.0,0.270385,0.183758,0.0,0.116055,0.14498,0.137435,0.0,0.0,0.024887,0.0,0.0,0.0,0.015637,0.098464,0.0,0.093722,0.0,0.0,0.043607,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.117499,0.113669,0.306717,0.25463,0.367941,0.248155,0.349827,0.264477,0.0,0.130475,0.216373,0.238833,0.228782,0.293766,0.075554,0.127025,0.169407,0.050325,0.18894,0.135666,0.149826,0.156898,0.195618,0.160036,0.182951,0.110552,0.0,0.0,0.079581,0.062945,0.132579,0.142207,0.144598,0.051454,0.191459,0.0,0.0,0.126493,0.0,0.083424,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.310689,0.418124,0.191255,0.111447,0.246846,0.307948,0.177425,0.042169,0.130475,0.0,0.282299,0.154012,0.053756,0.070908,0.167087,0.292487,0.213907,0.110895,0.39515,0.193054,0.364273,0.294831,0.222559,0.09472,0.247546,0.033588,0.117716,0.093999,0.130269,0.133674,0.258345,0.390403,0.356669,0.057427,0.188849,0.0,0.097189,0.380287,0.094473,0.054313,...,0.076835,0.0,0.0,0.018635,0.127695,0.076835,0.076835,0.024928,0.102446,0.076835,0.10842,0.0,0.076835,0.0,0.024297,0.076835,0.076835,0.076835,0.10842,0.076835,0.076835,0.076835,0.0,0.102446,0.018635,0.076835,0.076835,0.102446,0.0,0.0,0.0,0.076835,0.076835,0.102446,0.102446,0.0,0.076835,0.0,0.0,0.0


In [7]:
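#user_inp=input('Enter the reference movie title based on which recommendations are to be made: ')
user_inp = "Jumanji (1995)"
try:
    # Row position of the reference movie in `movies`
    inp = movies[movies['title'] == user_inp].index.tolist()[0]
except IndexError:
    print("Sorry, the movie is not in the database!")
else:
    # ratings_matrix holds the similarity scores at this point (see In [6])
    movies['similarity'] = ratings_matrix.iloc[inp]
    # The diagonal was zeroed in In [5], so the reference movie never ranks first;
    # the [1:10] slice therefore drops the top-ranked row and prints the next nine.
    print("Recommended movies based on your choice of ", user_inp, ": \n",
          movies.sort_values(["similarity"], ascending=False)[1:10])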

Recommended movies based on your choice of  Jumanji (1995) : 
      movieId    ...     similarity
328      364    ...       0.530357
283      317    ...       0.505831
331      367    ...       0.494605
527      595    ...       0.494124
521      588    ...       0.493995
309      344    ...       0.475673
520      587    ...       0.474787
427      480    ...       0.471624
341      377    ...       0.470265

[9 rows x 4 columns]
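
The lookup in In [7] appears to rely on a movie's row position in `movies` matching its row position in the similarity matrix, which holds only as long as every earlier movie in movies.csv has at least one rating. A minimal sketch of an alternative that keys everything by `movieId` follows. The function name `recommend`, its parameters, and the toy data are hypothetical and not part of the original notebook; cosine similarity is computed by hand with NumPy to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (values are made up).
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["A (1995)", "B (1995)", "C (1995)", "D (1995)"],
})
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3],
    "movieId": [1, 2, 1, 2, 2, 4],
    "rating":  [5.0, 4.0, 4.0, 5.0, 1.0, 5.0],
})

def recommend(title, movies, ratings, top_n=5):
    """Return the top_n movies most similar to `title`, keyed by movieId."""
    # Keep movieId as the row index instead of resetting it to positions.
    matrix = ratings.pivot_table(index="movieId", columns="userId",
                                 values="rating").fillna(0)
    match = movies.loc[movies["title"] == title, "movieId"]
    if match.empty or match.iloc[0] not in matrix.index:
        return None  # unknown title, or a movie nobody has rated
    movie_id = match.iloc[0]
    values = matrix.to_numpy()
    norms = np.linalg.norm(values, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero
    pos = matrix.index.get_loc(movie_id)
    # Cosine similarity between the reference movie and every rated movie.
    scores = values @ values[pos] / (norms * norms[pos])
    similarity = pd.Series(scores, index=matrix.index, name="similarity")
    similarity = similarity.drop(movie_id)  # exclude the reference movie
    top = similarity.sort_values(ascending=False).head(top_n)
    return movies.merge(top.reset_index(), on="movieId").sort_values(
        "similarity", ascending=False)

print(recommend("A (1995)", toy_movies, toy_ratings))
```

In the toy data, movie 3 has no ratings, so a purely positional lookup would be off by one for movie 4; joining on `movieId` avoids that. Dropping the reference movie explicitly also removes the need to zero the diagonal and slice from position 1.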

