**MOVIE RECOMMENDATION ENGINE**

> Recommending movies, using the KNN algorithom, on the dataset from Kaggle 



In [1]:
import pandas as pd
import numpy as np
from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer 
from google.colab import files
import csv

Now let's upload the files. 


>**links.csv** contains the **iMDB ID** of the movies

>**movies.csv** contains **movie_name** and **genre** of the movies

>**ratings.csv** contains the **ratings** given by users for the movies

>**tags.csv** contains the **overall review** for the movies by the user




In [2]:
from google.colab import drive
drive.mount("/content/drive/")

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive/


For the moment we will work with **movies.csv** and **ratings.csv** files

>As **links.csv** are only the iMDB ID's, we **don't actually need** those for recommendation

>And **tags.csv** contains user overall review. that might be **too biased**, so for the moment we are **not using that**

Now **construct the dataframes**, from the available data/information

In [3]:
df_links = pd.read_csv("drive/My Drive/Colab Notebooks/links.csv")
df_movies = pd.read_csv('drive/My Drive/Colab Notebooks/movies.csv')
df_ratings = pd.read_csv('drive/My Drive/Colab Notebooks/ratings.csv')
df_tags = pd.read_csv('drive/My Drive/Colab Notebooks/tags.csv')
print("Done")

Done


In [4]:
df_links.head(5)
df_links.shape

(9742, 3)

In [5]:
df_movies.head(5)
df_movies.shape

(9742, 3)

In [6]:
df_ratings.head(5)
df_ratings.shape

(100836, 4)

In [7]:
df_tags.head(5)
df_tags.shape

(3683, 4)

Now let's **combine the DataFrame** form the important dataframes **movies** and **ratings**. 

We will use the **combined DataFrame** for the **rest of the process** of building the recommendation

After combining we need to **remove the unimportant columns** (i.e. timestamps or genres (for now)).

In [8]:
df_combined = pd.merge(df_ratings, df_movies, on = 'movieId')
df_combined.head(5)

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


In [9]:
new_df = df_combined.drop(['timestamp', 'genres'], axis = 1)
new_df.head(5)

Unnamed: 0,userId,movieId,rating,title
0,1,1,4.0,Toy Story (1995)
1,5,1,4.0,Toy Story (1995)
2,7,1,4.5,Toy Story (1995)
3,15,1,2.5,Toy Story (1995)
4,17,1,4.5,Toy Story (1995)


In [10]:
new_df.shape

(100836, 4)

# **Explaination of the approach**

*   First we will make a **pivot table**, with **users being the row** and **movies being the column** and **ratings being the values**, to have better understanding of **which user, have given which movies, what rating** !! We also will have a clear view of **which user has seen which movies**, and what are the movies yet to be watched by each user. 
*   Then we will prapare some lists based on that pivot table. i.e. **list of movies seen by each user, index of those movies** and also **list of the movies yet to be seen by each user, the index of those movies**. Those will help us in further proceedings.
*   As rating in any movie, completely depends on the user, to avoid the bias, we will **not use the ratings as pivot table values**. Rather we should use binary values, where **1 represents the user has seen the movie**, and **0 represents the user hasn't seen the movie**.
*   Now each **column of the pivot table denotes a Vector** (depending on the users, if seen or not), based on that we will find the **cosine similarity of the movies**, with each other (here we will use python (SkLearn) built in cosine_sim and  Nearest_Neighbour functions). In that way we can easily find the **K-most similar movies (with the similarity values)**, for each movie (here we assume K = 10), representing K-Nearest-Neighbours.
*   Now comes the recommendation time. **For each user, the movies he/she has seen, for all of those movies we will find the similar movies. We will list them out, then remove all the movies from that list which has already been watched. Then we will sort them based on the similarity values. And finally recommend the movies which have the maximum similarity values.**


In [11]:
see_whats_happening = new_df.pivot_table(values = 'rating', index = 'userId', columns = 'title')
see_whats_happening.tail(5)

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...All the Marbles (1981),...And Justice for All (1979),00 Schneider - Jagd auf Nihil Baxter (1994),1-900 (06) (1994),10 (1979),10 Cent Pistol (2015),10 Cloverfield Lane (2016),10 Items or Less (2006),10 Things I Hate About You (1999),10 Years (2011),"10,000 BC (2008)",100 Girls (2000),100 Streets (2016),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),101 Dalmatians II: Patch's London Adventure (2003),101 Reykjavik (101 Reykjavík) (2000),102 Dalmatians (2000),10th & Wolf (2006),"10th Kingdom, The (2000)","10th Victim, The (La decima vittima) (1965)","11'09""01 - September 11 (2002)",11:14 (2003),"11th Hour, The (2007)",12 Angry Men (1957),12 Angry Men (1997),12 Chairs (1971),12 Chairs (1976),12 Rounds (2009),12 Years a Slave (2013),...,Zathura (2005),Zatoichi and the Chest of Gold (Zatôichi senryô-kubi) (Zatôichi 6) (1964),Zazie dans le métro (1960),Zebraman (2004),"Zed & Two Noughts, A (1985)",Zeitgeist: Addendum (2008),Zeitgeist: Moving Forward (2011),Zeitgeist: The Movie (2007),Zelary (2003),Zelig (1983),Zero Dark Thirty (2012),Zero Effect (1998),"Zero Theorem, The (2013)",Zero de conduite (Zero for Conduct) (Zéro de conduite: Jeunes diables au collège) (1933),Zeus and Roxanne (1997),Zipper (2015),Zodiac (2007),Zombeavers (2014),Zombie (a.k.a. Zombie 2: The Dead Are Among Us) (Zombi 2) (1979),Zombie Strippers! (2008),Zombieland (2009),Zone 39 (1997),"Zone, The (La Zona) (2007)",Zookeeper (2011),Zoolander (2001),Zoolander 2 (2016),Zoom (2006),Zoom (2015),Zootopia (2016),Zulu (1964),Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
606,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,...,,,,,,,,,,4.0,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,
607,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
608,,,,,,,,,,,,,,,,,,3.5,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,4.5,3.5,,,
609,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
610,4.0,,,,,,,,3.5,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,...,,,,4.0,,,,,,,,,3.5,,,,5.0,3.5,,,3.5,,,,4.0,,,,4.0,,,4.0,3.5,3.0,,,2.0,1.5,,


In [12]:
see_whats_happening.shape

(610, 9719)

**So from the Pivot Table we can see :**
*   We have 610 different users (who has seen at least one movie) and
*   We have 9719 different movies (that has been watched at least by one user)

In [13]:
rows = [x for x in range (0, len(see_whats_happening.columns))]
movies_by_user = {}
movies_not_rated_by_user = {}
movie_indexes = {}
movie_not_rated_indexes = {}
for i, row in see_whats_happening.iterrows():
  if_reviewed = list(zip(row.index, row.values, rows))
  #rated list
  if_rated = [(x,z) for x,y,z in if_reviewed if str(y) != 'nan']
  rated_movies_index = [x[1] for x in if_rated]
  rated_movies = [x[0] for x in if_rated]
  movies_by_user[i] = rated_movies
  movie_indexes[i] = rated_movies_index
  #unrated list
  if_not_rated = [(x,z) for x,y,z in if_reviewed if str(y) == 'nan']
  unrated_movies_index = [x[1] for x in if_not_rated]
  unrated_movies = [x[0] for x in if_not_rated]
  movies_not_rated_by_user[i] = unrated_movies
  movie_not_rated_indexes[i] = unrated_movies_index
print('Done')

Done


In [14]:
print(movies_by_user[20])
print(movie_indexes[20])

# print(movies_not_rated_by_user[20])
# print(movie_not_rated_indexes[20])

['101 Dalmatians (1996)', '101 Dalmatians (One Hundred and One Dalmatians) (1961)', '6th Day, The (2000)', 'A.I. Artificial Intelligence (2001)', 'Adanggaman (2000)', 'Aladdin (1992)', 'Alice in Wonderland (1951)', 'Almost Famous (2000)', 'American Pie 2 (2001)', 'American Psycho (2000)', 'American Tail, An (1986)', 'Anastasia (1997)', 'Angels in the Outfield (1994)', 'Annie (1982)', 'Antz (1998)', 'Atlantis: The Lost Empire (2001)', 'Austin Powers in Goldmember (2002)', 'Babe (1995)', 'Balto (1995)', 'Bambi (1942)', 'Beautiful Mind, A (2001)', 'Beauty and the Beast (1991)', 'Bedknobs and Broomsticks (1971)', 'Billy Elliot (2000)', 'Birds, The (1963)', 'Black Cauldron, The (1985)', 'Black Stallion, The (1979)', 'Blade (1998)', 'Blade II (2002)', 'Borrowers, The (1997)', 'Bourne Identity, The (2002)', 'Bowfinger (1999)', 'Brave Little Toaster, The (1987)', "Bridget Jones's Diary (2001)", 'Bring It On (2000)', "Bug's Life, A (1998)", 'Casper (1995)', 'Cast Away (2000)', 'Catch Me If You 

In [15]:
#filling the NaN with 0's
pivot_table = see_whats_happening.fillna(0)
#making the table a binary one(only 0/1)
pivot_table = pivot_table.apply(np.sign)
#reversing the table, making rows into columns and columns into rows
pivot_table.T.tail(5)
# pivot_table.head(5)

userId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,...,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
eXistenZ (1999),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
xXx (2002),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0
xXx: State of the Union (2005),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
¡Three Amigos! (1986),1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
À nous la liberté (Freedom for Us) (1931),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [16]:
#for us let k be 20, for K-Nearest Neighbours
k = 20
#we will use cosine similarity
cosine_sim = NearestNeighbors(n_neighbors = k, algorithm = 'brute', metric = 'cosine') 
#now let's fit our data
item_sim = cosine_sim.fit(pivot_table.T.values)
#now the neighbours
item_distances, item_indices = item_sim.kneighbors(pivot_table.T.values)
print('Done')

Done


In [17]:
print(item_indices[20])
print(item_distances[20])

[  20 2213 3316 4328 8686 5146 3389  252  889 3551 9134 7526 8896 9063
 4982 6189   21 6096 5622 1949]
[0.         0.54165075 0.57125354 0.59003997 0.60394098 0.61412818
 0.61651751 0.62952071 0.6333206  0.6333206  0.63436379 0.63436379
 0.63619656 0.63619656 0.63619656 0.63619656 0.63619656 0.64599478
 0.64705882 0.64992998]


In [18]:
#now lets list out(dictionary) the nearest neighbours(movies)
sim_movies = {}
for i in range(len(pivot_table.T.index)):
  movie_names = pivot_table.T.index[item_indices[i]].tolist()
  sim_movies[pivot_table.T.index[i]] = movie_names
print('Done')

Done


In [19]:
recommendations = {}
for user,seen_movies in movie_indexes.items():
  neighbour_ind = [j for i in item_indices[seen_movies] for j in i]
  neighbour_dis = [j for i in item_distances[seen_movies] for j in i]
  #now combine the index and the distances
  neighbour_info = list(zip(neighbour_ind, neighbour_dis))
  #now create another dictionary to seperate the movies which the user have not seen
  not_seen_neighbour = {i:d for i,d in neighbour_info if i not in seen_movies}
  #now create a list back from dictionary
  neighbours_not_seen_info = list(zip(not_seen_neighbour.keys(), not_seen_neighbour.values())) 
  #now sort based on distances(nearest)
  nearest_neighbours = sorted(neighbours_not_seen_info, key = lambda x:x[1])
  #converting the indexes into movie names
  nearest_unseen_movies = [pivot_table.columns[i] for i, distance in nearest_neighbours]
  nearest_unseen_movies = nearest_unseen_movies[0 : 25]
  #now add that to the final recommendations dictionary
  recommendations[user] = nearest_unseen_movies
  if user == 20:
    print(recommendations[user])
print('done')

['Pirate Movie, The (1982)', 'Real Women Have Curves (2002)', 'Northfork (2003)', 'Notorious Bettie Page, The (2005)', 'Nicholas Nickleby (2002)', 'Nowhere in Africa (Nirgendwo in Afrika) (2001)', 'Valley Girl (1983)', 'Halloween II (2009)', 'Tremors 3: Back to Perfection (2001)', 'Dog Soldiers (2002)', 'Wallace & Gromit: A Close Shave (1995)', 'Batman: Year One (2011)', 'Texas Chainsaw Massacre: The Next Generation (a.k.a. The Return of the Texas Chainsaw Massacre) (1994)', "Pyromaniac's Love Story, A (1995)", 'Candleshoe (1977)', "Class of Nuke 'Em High (1986)", 'Ugly, The (1997)', 'Twelve Chairs, The (1970)', 'Chairman of the Board (1998)', 'Pest, The (1997)', 'Turbulence (1997)', 'Dark Half, The (1993)', 'Oxygen (1999)', 'Deadtime Stories (1987)', 'That Darn Cat (1997)']
done


Now what we will do is that, we will **save the recommendations** per user in a **.CSV file**.

So that during **prediction**, we will have the **preprocessed output** already in our hand.

In [20]:
output_file = open("drive/My Drive/Colab Notebooks/recommend_movies.csv", "w", newline = "")
write_csv = csv.writer(output_file)
tuple = ("user_ID", "movie 1", "movie 2", "movie 3", "movie 4", "movie 5", "movie 6", "movie 7", "movie 8", "movie 9", "movie 10", "movie 11", "movie 12", "movie 13", "movie 14", "movie 15", "movie 16", "movie 17", "movie 18", "movie 19", "movie 20", "movie 21", "movie 22", "movie 23", "movie 24", "movie 25")
write_csv.writerow(tuple)
for user, movies in recommendations.items():
 movies.insert(0, user)
 tuple = (i for i in movies)
 write_csv.writerow(tuple)
output_file.close()

* What we did previously is that, for each movie we tried to get the similar movies.

* Then based on that, and what one user has already seen, we recommend movies to them.

* Now we want to try another thing is that, for each user we will try to find the similar users, based on what movies they have seen.

* Now for each movie we are trying to get the users whom we can recommend this to.

In [21]:
rows = [x for x in range (0, len(see_whats_happening.T.columns))]
user_ID = {}
not_seen_user_ID = {}
user_indexes = {}
not_seen_user_indexes = {}
for i, row in see_whats_happening.T.iterrows():
  if_reviewed_user = list(zip(row.index, row.values, rows))
  #rated list
  if_rated_user = [(x,z) for x,y,z in if_reviewed_user if str(y) != 'nan']
  rated_user_index = [x[1] for x in if_rated_user]
  rated_users = [x[0] for x in if_rated_user]
  user_ID[i] = rated_users
  user_indexes[i] = rated_user_index
  #unrated list
  if_not_rated_user = [(x,z) for x,y,z in if_reviewed_user if str(y) == 'nan']
  unrated_user_index = [x[1] for x in if_not_rated_user]
  unrated_users = [x[0] for x in if_not_rated_user]
  not_seen_user_ID[i] = unrated_users
  not_seen_user_indexes[i] = unrated_user_index
print('Done')

Done


In [22]:
print(user_ID["Star Wars: Episode VI - Return of the Jedi (1983)"])
print(user_indexes["Star Wars: Episode VI - Return of the Jedi (1983)"])

[1, 7, 11, 15, 17, 18, 19, 21, 27, 28, 30, 33, 39, 42, 44, 45, 52, 57, 59, 62, 63, 64, 66, 68, 69, 70, 71, 72, 77, 79, 82, 84, 86, 91, 95, 96, 97, 101, 103, 112, 114, 120, 122, 124, 125, 129, 132, 135, 137, 140, 141, 149, 160, 164, 165, 166, 167, 171, 172, 177, 182, 183, 186, 187, 195, 198, 199, 200, 201, 202, 208, 210, 211, 212, 217, 219, 220, 224, 226, 232, 234, 239, 246, 247, 248, 249, 254, 255, 256, 261, 263, 264, 266, 267, 268, 274, 275, 276, 279, 283, 288, 292, 294, 298, 303, 304, 305, 307, 312, 313, 318, 328, 330, 332, 334, 337, 344, 350, 354, 357, 361, 364, 368, 370, 372, 376, 380, 381, 382, 385, 387, 391, 399, 400, 408, 414, 425, 428, 430, 432, 434, 437, 438, 448, 452, 453, 462, 464, 465, 469, 474, 475, 477, 479, 480, 483, 492, 494, 513, 514, 517, 522, 524, 525, 527, 534, 540, 549, 551, 554, 555, 557, 559, 560, 561, 567, 570, 572, 573, 577, 580, 586, 590, 591, 593, 594, 596, 597, 599, 600, 603, 605, 606, 607, 608, 610]
[0, 6, 10, 14, 16, 17, 18, 20, 26, 27, 29, 32, 38, 41, 43,

In [23]:
#for us let k be 50, for K-Nearest Neighbours (for users)
k = 50
#we will use cosine similarity
cosine_sim_user = NearestNeighbors(n_neighbors = k, algorithm = 'brute', metric = 'cosine') 
#now let's fit our data
item_sim_user = cosine_sim.fit(pivot_table.values)
#now the neighbours
item_distances_user, item_indices_user = item_sim_user.kneighbors(pivot_table.values)
print('Done')

Done


In [25]:
print(item_indices_user[20])
print(item_distances_user[20])

[ 20 533 248 291 474 560 379  61  67  17 140 176 297 482 447 572 304 465
  62 219]
[0.         0.58538067 0.59748427 0.6085467  0.61456277 0.63423803
 0.64604517 0.64734834 0.65319525 0.66071375 0.66643161 0.66973678
 0.67129887 0.67247429 0.67536361 0.67577645 0.68044777 0.68266552
 0.68829968 0.69288862]


In [26]:
#now lets list out(dictionary) the nearest neighbours(users)
sim_users = {}
for i in range(len(pivot_table.index)):
  user = pivot_table.index[item_indices_user[i]].tolist()
  sim_users[pivot_table.index[i]] = user
print('Done')

Done


In [27]:
recommendations_user = {}
for movies, who_watched in user_indexes.items():
  neighbour_ind_user = [j for i in item_indices_user[who_watched] for j in i]
  neighbour_dis_user = [j for i in item_distances_user[who_watched] for j in i]
  #now combine the index and the distances
  neighbour_info_user = list(zip(neighbour_ind_user, neighbour_dis_user))
  #now create another dictionary to seperate the users who already have seen the movie
  not_seen_neighbour_user = {i:d for i,d in neighbour_info_user if i not in who_watched}
  #now create a list back from dictionary
  neighbours_not_seen_info_user = list(zip(not_seen_neighbour_user.keys(), not_seen_neighbour_user.values())) 
  #now sort based on distances(nearest)
  nearest_neighbours_user = sorted(neighbours_not_seen_info_user, key = lambda x:x[1])
  #converting the indexes into user_ID's
  nearest_unseen_user = [pivot_table.T.columns[i] for i, distance in nearest_neighbours_user]
  nearest_unseen_user = nearest_unseen_user[0 : 10]
  #now add that to the final recommendations dictionary
  recommendations_user[movies] = nearest_unseen_user
  if movies == "Star Wars: Episode VI - Return of the Jedi (1983)":
    print(recommendations_user[movies])
print('done')

[240, 58, 456, 284, 270, 447, 43, 151, 436, 389]
done


In [28]:
output_file = open("drive/My Drive/Colab Notebooks/recommend_user.csv", "w", newline = "")
write_csv = csv.writer(output_file)
tuple = ("movie_Name", "user 1", "user 2", "user 3", "user 4", "user 5", "user 6", "user 7", "user 8", "user 9", "user 10")
write_csv.writerow(tuple)
for movie, users in recommendations_user.items():
  users.insert(0, movie)
  tuple = (i for i in users)
  write_csv.writerow(tuple)
output_file.close()

Now Let's see what we have achieved :)

In [29]:
df_movies_rec = pd.read_csv("drive/My Drive/Colab Notebooks/recommend_movies.csv")
df_users_rec = pd.read_csv("drive/My Drive/Colab Notebooks/recommend_user.csv")
print('Done')

Done


In [30]:
df_movies_rec.head(5)

Unnamed: 0,user_ID,movie 1,movie 2,movie 3,movie 4,movie 5,movie 6,movie 7,movie 8,movie 9,movie 10,movie 11,movie 12,movie 13,movie 14,movie 15,movie 16,movie 17,movie 18,movie 19,movie 20,movie 21,movie 22,movie 23,movie 24,movie 25
0,1,Heart Condition (1990),Liberty Heights (1999),Rare Birds (2001),About Adam (2000),Who Is Cletis Tout? (2001),Jesus' Son (1999),Get Real (1998),Dark Blue World (Tmavomodrý svet) (2001),Deuces Wild (2002),Hardball (2001),Gunga Din (1939),West Beirut (West Beyrouth) (1998),"Golden Bowl, The (2000)",On Her Majesty's Secret Service (1969),Pretty Woman (1990),Terminator 2: Judgment Day (1991),For Your Eyes Only (1981),Action Jackson (1988),Firewalker (1986),Blue Car (2002),Love Liza (2002),"Dancer Upstairs, The (2002)",Impostor (2002),Iron Eagle II (1988),Gangster No. 1 (2000)
1,2,Kill Bill: Vol. 2 (2004),Mandela: Long Walk to Freedom (2013),Louis C.K.: Chewed Up (2008),Louis C.K.: Live at the Beacon Theater (2011),Forrest Gump (1994),Pulp Fiction (1994),John Wick (2014),Pirates of the Caribbean: The Curse of the Bla...,"Silence of the Lambs, The (1991)",Sin City (2005),Gangster Squad (2013),"Incredibles, The (2004)","Usual Suspects, The (1995)",Star Wars: Episode VII - The Force Awakens (2015),V for Vendetta (2006),Role Models (2008),Louis C.K.: Shameless (2007),Schindler's List (1993),Batman Begins (2005),Scott Pilgrim vs. the World (2010),X-Men (2000),Super Troopers (2001),Shrek (2001),Fight Club (1999),Harold and Kumar Go to White Castle (2004)
2,3,"Darkest Hour, The (2011)",City Hunter (Sing si lip yan) (1993),Regarding Henry (1991),Star Trek VI: The Undiscovered Country (1991),"Agony and the Ecstasy, The (1965)",Atragon (Kaitei Gunkan) (1963),"Angry Red Planet, The (1959)",20 Million Miles to Earth (1957),Alice in Wonderland (1933),American Grindhouse (2010),Allegro non troppo (1977),Annie Get Your Gun (1950),Attack of the Puppet People (1958),"10th Victim, The (La decima vittima) (1965)",Attack of the Crab Monsters (1957),Attack of the 50 Foot Woman (1958),As You Like It (2006),Aelita: The Queen of Mars (Aelita) (1924),And Starring Pancho Villa as Himself (2003),Alien from L.A. (1988),Plastic (2014),Florence Foster Jenkins (2016),Shrink (2009),Wizards of the Lost Kingdom II (1989),Tormented (1960)
3,4,Paper Clips (2004),Sarah Silverman: Jesus Is Magic (2005),Dear Frankie (2004),Harry Potter and the Prisoner of Azkaban (2004),Harry Potter and the Goblet of Fire (2005),Walking and Talking (1996),Heartburn (1986),Tex (1982),"This World, Then the Fireworks (1997)",With Six You Get Eggroll (1968),"Party 2, The (Boum 2, La) (1982)","Do You Remember Dolly Bell? (Sjecas li se, Dol...",Leningrad Cowboys Go America (1989),"Piano Teacher, The (La pianiste) (2001)",In July (Im Juli) (2000),Investigation of a Citizen Above Suspicion (In...,"Angel at My Table, An (1990)","Kiss Me, Stupid (1964)",Dear Diary (Caro Diario) (1994),Jamaica Inn (1939),"Best Man, The (Testimone dello sposo, Il) (1998)",War and Peace (1956),Not One Less (Yi ge dou bu neng shao) (1999),Final Analysis (1992),Rosetta (1999)
4,5,"Terminator, The (1984)",Speed (1994),Die Hard: With a Vengeance (1995),Ghost (1990),Cliffhanger (1993),Waterworld (1995),Bambi (1942),Jurassic Park (1993),Crimson Tide (1995),Star Trek: Generations (1994),Home Alone (1990),Outbreak (1995),"Net, The (1995)",Reservoir Dogs (1992),Monty Python and the Holy Grail (1975),My Left Foot (1989),Seven (a.k.a. Se7en) (1995),GoldenEye (1995),Forrest Gump (1994),Goodfellas (1990),Independence Day (a.k.a. ID4) (1996),Aliens (1986),Indiana Jones and the Last Crusade (1989),"American President, The (1995)","Godfather, The (1972)"


In [31]:
df_users_rec.head(5)

Unnamed: 0,movie_Name,user 1,user 2,user 3,user 4,user 5,user 6,user 7,user 8,user 9,user 10
0,'71 (2014),249,274,380,298,305,68,414,560,561,219
1,'Hellboy': The Seeds of Creation (2004),434,330,580,239,63,18,247,64,254,573
2,'Round Midnight (1986),330,239,63,18,247,64,254,573,219,328
3,'Salem's Lot (2004),271,377,440,555,587,288,146,590,603,51
4,'Til There Was You (1997),271,377,156,572,84,199,224,606,409,391
