**In this notebook we apply the content-based filtering algorithm that we saw in the *Recommender systems course*. We take the movie recommendation use case with the *ml-latest-small MovieLens dataset*, which you can download from [here](https://grouplens.org/datasets/movielens/).**

# Content-Based Recommendation System

Submitted By: Snnidhi Bookseller
MSc in Artificial Intelligence System, EPITA.
Year: 2019-2020


In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

pd.options.display.max_columns= None

In [2]:
!tree data

Folder PATH listing
Volume serial number is AC37-2D9C
C:\USERS\SANNIDHI\DOWNLOADS\RECOMENDATION SYSTEM\DATA
└───ml-latest-small


In [3]:
DATA_FOLDER = Path('data/ml-latest-small')
MOVIES_FILEPATH = DATA_FOLDER / 'movies.csv'
RATINGS_FILEPATH = DATA_FOLDER / 'ratings.csv'

# Dataset

In [4]:
movies = pd.read_csv(MOVIES_FILEPATH)
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [5]:
ratings = pd.read_csv(RATINGS_FILEPATH)
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


- To work with a single dataframe, we merge the *movies* and *ratings* dataframes into one.

In [6]:
users_info = movies.merge(ratings, on='movieId', how='inner')
users_info

Unnamed: 0,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,964982703
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,847434962
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,1106635946
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,1510577970
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,1305696483
5,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,18,3.5,1455209816
6,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,19,4.0,965705637
7,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,21,3.5,1407618878
8,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,27,3.0,962685262
9,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,31,5.0,850466616


- We start by applying the algorithm to a single user before generalizing it to all users in the dataset.

In [7]:
target_user_id = 1
target_user = users_info[users_info['userId'] == target_user_id]
target_user.sample(3)

Unnamed: 0,movieId,title,genres,userId,rating,timestamp
38226,2012,Back to the Future Part III (1990),Adventure|Comedy|Sci-Fi|Western,1,4.0,964984176
45014,2571,"Matrix, The (1999)",Action|Sci-Fi|Thriller,1,5.0,964981888
41232,2253,Toys (1992),Comedy|Fantasy,1,2.0,964981775


In [8]:
target_user = target_user.reset_index(drop=True).drop(columns=['userId', 'timestamp'])
target_user.head()

Unnamed: 0,movieId,title,genres,rating
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,4.0
1,3,Grumpier Old Men (1995),Comedy|Romance,4.0
2,6,Heat (1995),Action|Crime|Thriller,4.0
3,47,Seven (a.k.a. Se7en) (1995),Mystery|Thriller,5.0
4,50,"Usual Suspects, The (1995)",Crime|Mystery|Thriller,5.0


- To test our algorithm, we split the user's rated movies into a train set (the first 200 rows) and a test set (the remaining rows).

In [9]:
train = target_user[:200].copy()
test = target_user[200:].copy()
train.shape, test.shape

((200, 4), (32, 4))
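The split above takes the first 200 rows in their stored order. As a hedged alternative, a shuffled split can be sketched with `DataFrame.sample`; the toy frame, the 80/20 ratio and the `random_state` below are assumptions for illustration, not values from the notebook.

```python
import pandas as pd

# Toy stand-in for the single-user frame (hypothetical ratings).
toy_user = pd.DataFrame({
    'movieId': range(10),
    'rating': [4.0, 3.0, 5.0, 2.0, 4.5, 3.5, 1.0, 4.0, 5.0, 2.5],
})

# Draw 80% of the rows at random for training; fix the seed for repeatability.
toy_train = toy_user.sample(frac=0.8, random_state=0)

# Everything not drawn for training becomes the test set.
toy_test = toy_user.drop(toy_train.index)

print(toy_train.shape, toy_test.shape)
```

A random split avoids any ordering effect in the file, at the cost of a train set that changes with the seed.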

# Recommend items depending on the user profile - single user

## Model the movies the user interacted with

- We will start by extracting the movie genres from our dataset

In [10]:
movie_genres = list(set('|'.join(train['genres'].values).split('|')))
movie_genres

['Film-Noir',
 'Musical',
 'Children',
 'Drama',
 'Fantasy',
 'Romance',
 'Adventure',
 'Crime',
 'Horror',
 'Mystery',
 'Thriller',
 'War',
 'Sci-Fi',
 'Western',
 'Animation',
 'Comedy',
 'Action']

- We add the columns for each genre

In [11]:
for movie_genre in movie_genres:
    train[movie_genre] = 0  # start every genre indicator column at 0
train.head()

Unnamed: 0,movieId,title,genres,rating,Film-Noir,Musical,Children,Drama,Fantasy,Romance,Adventure,Crime,Horror,Mystery,Thriller,War,Sci-Fi,Western,Animation,Comedy,Action
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,4.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,3,Grumpier Old Men (1995),Comedy|Romance,4.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,6,Heat (1995),Action|Crime|Thriller,4.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,47,Seven (a.k.a. Se7en) (1995),Mystery|Thriller,5.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,50,"Usual Suspects, The (1995)",Crime|Mystery|Thriller,5.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


- We one-hot encode each movie according to its genres

In [12]:
for movie_genre in movie_genres:
    # Match the genre name literally rather than as a regular expression
    mask = train['genres'].str.contains(movie_genre, regex=False)
    train.loc[mask, movie_genre] = 1

train.sample(3)

Unnamed: 0,movieId,title,genres,rating,Film-Noir,Musical,Children,Drama,Fantasy,Romance,Adventure,Crime,Horror,Mystery,Thriller,War,Sci-Fi,Western,Animation,Comedy,Action
157,2450,Howard the Duck (1986),Adventure|Comedy|Sci-Fi,4.0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0
67,1136,Monty Python and the Holy Grail (1975),Adventure|Comedy|Fantasy,5.0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0
40,733,"Rock, The (1996)",Action|Adventure|Thriller,4.0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1
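The same one-hot encoding can also be sketched in one step with `Series.str.get_dummies`, which splits on a separator and returns one 0/1 column per distinct value. The toy frame below reuses three titles from the tables above; the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the `train` frame.
toy_train = pd.DataFrame({
    'title': ['Toy Story (1995)', 'Grumpier Old Men (1995)', 'Heat (1995)'],
    'genres': ['Adventure|Animation|Children|Comedy|Fantasy',
               'Comedy|Romance',
               'Action|Crime|Thriller'],
})

# Split each genres string on '|' and build one indicator column per genre.
genre_dummies = toy_train['genres'].str.get_dummies(sep='|')

# Attach the indicator columns to the original rows.
toy_encoded = toy_train.join(genre_dummies)
print(toy_encoded.drop(columns='genres'))
```

This avoids the explicit loop over genres, and it matches whole genre names rather than substrings.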


In [13]:
train[movie_genres].values

array([[0, 0, 1, ..., 1, 1, 0],
       [0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 0, 1],
       ...,
       [0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 1, 0],
       [0, 1, 1, ..., 1, 1, 0]], dtype=int64)

## Determine the user preferences using the movies they interacted with

In [14]:
movie_rating_embedding = train[movie_genres].mul(train['rating'], axis=0)
movie_rating_embedding.sample(5)

Unnamed: 0,Film-Noir,Musical,Children,Drama,Fantasy,Romance,Adventure,Crime,Horror,Mystery,Thriller,War,Sci-Fi,Western,Animation,Comedy,Action
14,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0
154,0.0,0.0,0.0,0.0,0.0,4.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,4.0
29,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,4.0,0.0
70,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
7,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,4.0


In [15]:
user_profile = movie_rating_embedding.mean()
user_profile

Film-Noir    0.025
Musical      0.515
Children     0.895
Drama        1.295
Fantasy      0.885
Romance      0.540
Adventure    1.600
Crime        0.895
Horror       0.270
Mystery      0.320
Thriller     0.975
War          0.375
Sci-Fi       0.730
Western      0.125
Animation    0.660
Comedy       1.470
Action       1.655
dtype: float64
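To make the profile computation concrete, here is a minimal sketch on three hypothetical movies: each genre indicator is multiplied by the movie's rating, and the profile is the column-wise mean. The movie labels and ratings are made up for illustration.

```python
import pandas as pd

# Hypothetical one-hot genre matrix for three rated movies.
toy_genres = pd.DataFrame(
    {'Action': [1, 0, 1], 'Comedy': [0, 1, 1], 'Drama': [0, 1, 0]},
    index=['movie_a', 'movie_b', 'movie_c'],
)
# Hypothetical ratings given by the user to those movies.
toy_ratings = pd.Series([5.0, 3.0, 4.0], index=toy_genres.index)

# Weight every genre indicator by the rating of its movie.
toy_weighted = toy_genres.mul(toy_ratings, axis=0)

# The profile is the mean weighted value per genre:
# Action = (5 + 0 + 4) / 3, Comedy = (0 + 3 + 4) / 3, Drama = (0 + 3 + 0) / 3
toy_profile = toy_weighted.mean()
print(toy_profile)
```

Because the mean runs over all rated movies, a genre scores high when the user rated many movies of that genre and rated them well.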

## Compute the similarity between the user preferences and the candidate movies

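With the user's profile in hand, we score each movie by taking the dot product of its genre vector with the profile and dividing by the sum of the profile weights, then keep the twenty highest scores. Note that the cell below scores the rating-weighted rows of `train` (`movie_rating_embedding`), that is, movies the user has already rated, rather than a separate list of candidate movies.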

In [16]:
# Dot each rating-weighted genre row with the profile, then normalise by the total profile weight.
recommendation_table_df = (movie_rating_embedding.dot(user_profile)) / user_profile.sum()
recommendation_table_df.head()

0    1.665911
1    0.607710
2    1.065760
3    0.489418
4    0.827664
dtype: float64

Sort the recommendation table in descending order

In [17]:
# Sort the scores from largest to smallest
recommendation_table_df.sort_values(ascending=False, inplace=True)
recommendation_table_df.head(20)

194    2.541572
118    2.458428
69     2.324263
128    2.020030
181    1.957672
199    1.942555
126    1.942555
192    1.821618
100    1.821618
98     1.785714
36     1.751701
138    1.747921
56     1.721466
38     1.721466
6      1.702570
125    1.687075
137    1.681784
193    1.678005
0      1.665911
96     1.606954
dtype: float64
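As a hedged illustration of the scoring step applied to candidate movies, the sketch below scores binary genre vectors against a profile and ranks them. The profile weights and candidate movies are hypothetical, not values from the notebook.

```python
import pandas as pd

# Hypothetical user profile: one weight per genre.
toy_profile = pd.Series({'Action': 3.0, 'Comedy': 2.0, 'Drama': 1.0})

# Hypothetical candidate movies with binary genre indicators.
toy_candidates = pd.DataFrame(
    {'Action': [1, 0, 1], 'Comedy': [1, 0, 0], 'Drama': [0, 1, 0]},
    index=['movie_x', 'movie_y', 'movie_z'],
)

# Score = sum of the profile weights of the movie's genres,
# divided by the total profile weight so that scores lie in [0, 1].
toy_scores = toy_candidates.dot(toy_profile) / toy_profile.sum()

# Rank the candidates from best to worst match.
toy_ranking = toy_scores.sort_values(ascending=False)
print(toy_ranking)
```

With binary genre vectors the score is bounded by 1, which is reached only by a movie tagged with every genre in the profile.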

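The recommendation table, with movie details and genres for the top 20 scores. Note that the scores are indexed by row position in `train`, not by `movieId`, so the lookup below returns the movies whose `movieId` happens to equal those positions, and empty rows where no such `movieId` exists.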

In [18]:
# Make a copy of the original movies dataframe
copy = movies.copy(deep=True)

# Set its index to movieId
copy = copy.set_index('movieId', drop=True)

# Take the index labels of the 20 highest scores.
# These are row positions in `train`, not movieIds.
top_20_index = recommendation_table_df.index[:20].tolist()

# Look these labels up in the movieId-indexed copy
recommended_movies = copy.loc[top_20_index, :]

# Display the result, ordered by descending score
recommended_movies

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)


movieId,title,genres
194,Smoke (1995),Comedy|Drama
118,If Lucy Fell (1996),Comedy|Romance
69,Friday (1995),Comedy
128,Jupiter's Wife (1994),Documentary
181,Mighty Morphin Power Rangers: The Movie (1995),Action|Children
199,"Umbrellas of Cherbourg, The (Parapluies de Che...",Drama|Musical|Romance
126,"NeverEnding Story III, The (1994)",Adventure|Children|Fantasy
192,,
100,City Hall (1996),Drama|Thriller
98,,
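One way to get from positional score labels back to movie titles is to translate the positions into `movieId` values first. The sketch below does this on toy data; the scores are hypothetical and the frame names are assumptions for illustration.

```python
import pandas as pd

# Toy movie catalogue (titles taken from the tables above).
toy_movies = pd.DataFrame({
    'movieId': [1, 3, 6, 47],
    'title': ['Toy Story (1995)', 'Grumpier Old Men (1995)',
              'Heat (1995)', 'Seven (a.k.a. Se7en) (1995)'],
})

# Movies rated by the user; the row labels 0..2 are positions, not movieIds.
toy_train = pd.DataFrame({'movieId': [1, 6, 47]})

# Hypothetical scores, indexed like toy_train.
toy_scores = pd.Series([0.2, 0.9, 0.5], index=toy_train.index)

# Keep the two best scores, then map their positions to movieIds.
toy_top = toy_scores.sort_values(ascending=False).head(2)
toy_top_ids = toy_train.loc[toy_top.index, 'movieId']

# Look the movieIds up in the catalogue and attach the scores.
toy_result = (toy_movies.set_index('movieId')
              .loc[toy_top_ids.tolist()]
              .assign(score=toy_top.values))
print(toy_result)
```

Translating through the `movieId` column keeps the lookup correct even when row positions and identifiers overlap numerically.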


# Recommend items depending on the user profile - multiple users

In [19]:
# TODO: generalize the algorithm to multiple users
# I tried a slightly different method below
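As a hedged sketch of what a per-user generalization could look like, the block below builds one profile per `userId` with a group-by and scores every movie for every user. The toy ratings and genre matrix are hypothetical, and this is not the method used in the cells that follow.

```python
import pandas as pd

# Hypothetical ratings from two users.
toy_ratings = pd.DataFrame({
    'userId': [1, 1, 2, 2],
    'movieId': [10, 20, 10, 30],
    'rating': [5.0, 3.0, 2.0, 4.0],
})

# Hypothetical one-hot genre matrix, one row per movie.
toy_genres = pd.DataFrame(
    {'Action': [1, 0, 1], 'Comedy': [0, 1, 1]},
    index=pd.Index([10, 20, 30], name='movieId'),
)

# Align one genre row to each rating row, then weight it by the rating.
per_rating = toy_genres.loc[toy_ratings['movieId']].reset_index(drop=True)
weighted = per_rating.mul(toy_ratings['rating'], axis=0)

# One profile per user: the mean rating-weighted genre vector.
user_profiles = weighted.groupby(toy_ratings['userId']).mean()

# Score every movie for every user, normalised by each user's total weight.
scores = user_profiles.dot(toy_genres.T).div(user_profiles.sum(axis=1), axis=0)
print(scores)
```

The result has one row per user and one column per movie, so a per-user top-N list is a row-wise sort.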

In [20]:
# Split the pipe-separated genres string into a list
movies['genres'] = movies.genres.str.split('|')
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]"
1,2,Jumanji (1995),"[Adventure, Children, Fantasy]"
2,3,Grumpier Old Men (1995),"[Comedy, Romance]"
3,4,Waiting to Exhale (1995),"[Comedy, Drama, Romance]"
4,5,Father of the Bride Part II (1995),[Comedy]


In [21]:
# Merge the movies and ratings dataframes on movieId
user_info =  movies.merge(ratings, on='movieId', how='inner')
user_info.head()

Unnamed: 0,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",1,4.0,964982703
1,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",5,4.0,847434962
2,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",7,4.5,1106635946
3,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",15,2.5,1510577970
4,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",17,4.5,1305696483


In [22]:
# Drop the columns we do not use below (without userId, the ratings of all users are pooled)
target_user = user_info.reset_index(drop=True).drop(columns=['userId', 'timestamp'])
target_user.head()

Unnamed: 0,movieId,title,genres,rating
0,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.0
1,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.0
2,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.5
3,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",2.5
4,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.5


###### One Hot Encoding Method

In [23]:
# Start from a copy of the merged ratings frame (one row per rating)
movies_with_genres = target_user.copy(deep=True)

# Iterate over `movies` and set a genre column to 1 for each genre listed.
# Caution: `index` is a row label of `movies`, but it addresses rows of the
# merged frame here, so row i receives the genres of the i-th movie in `movies`.
x = []
for index, row in movies.iterrows():
    x.append(index)
    for genre in row['genres']:
        movies_with_genres.at[index, genre] = 1

# Confirm that every row of `movies` has been visited
print(len(x) == len(movies))

# Fill the remaining NaN values with 0 to mark genres a row was not tagged with
movies_with_genres = movies_with_genres.fillna(0)

# Use movieId as the index (the movieId column is kept as well)
movies_with_genres = movies_with_genres.set_index(movies_with_genres.movieId)

movies_with_genres.head()

True


movieId,movieId,title,genres,rating,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.5,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",2.5,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,Toy Story (1995),"[Adventure, Animation, Children, Comedy, Fantasy]",4.5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
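One way to keep each row's indicator columns aligned with that row's own `genres` list is to explode the lists and pivot them back per row. This is a minimal sketch on toy rows, assuming a unique row index; it is an alternative to the loop above, not a description of it.

```python
import pandas as pd

# Toy merged rows: the same movie can appear once per rating.
toy_rows = pd.DataFrame({
    'title': ['Toy Story (1995)', 'Toy Story (1995)', 'Heat (1995)'],
    'genres': [['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
               ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
               ['Action', 'Crime', 'Thriller']],
    'rating': [4.0, 4.5, 4.0],
})

# One row per (original row, genre) pair; the original row label is repeated.
exploded = toy_rows['genres'].explode()

# Indicator columns per pair, collapsed back to one row per original label.
indicators = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# Attach the indicators to the rows they were derived from.
toy_rows_encoded = toy_rows.join(indicators)
print(toy_rows_encoded.drop(columns='genres'))
```

Because the indicators are derived from each row's own list, two rating rows for the same movie always receive identical genre columns.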


Step 1: Create the user's profile

In [24]:
user_movie_ratings = target_user.copy(deep=True)
 
# Drop the genres column because we do not need it here
user_movie_ratings = user_movie_ratings.drop(columns=['genres'])

user_movie_ratings

Unnamed: 0,movieId,title,rating
0,1,Toy Story (1995),4.0
1,1,Toy Story (1995),4.0
2,1,Toy Story (1995),4.5
3,1,Toy Story (1995),2.5
4,1,Toy Story (1995),4.5
5,1,Toy Story (1995),3.5
6,1,Toy Story (1995),4.0
7,1,Toy Story (1995),3.5
8,1,Toy Story (1995),3.0
9,1,Toy Story (1995),5.0


Step 2: Learn the user's profile

We start by learning the user's preferences: from the dataframe of binary genre columns, select the rows for movies that appear in `user_movie_ratings`.

In [25]:
# Keep the rows whose movieId appears in both user_movie_ratings and movies_with_genres
user_genres_df = movies_with_genres[movies_with_genres.movieId.isin(user_movie_ratings.movieId)]

# We only need the genre columns, so drop movieId, title, genres and rating
# and reset the index.
user_genres_df = user_genres_df.drop(['movieId','title','genres', 'rating'], axis=1)
user_genres_df = user_genres_df.reset_index(drop=True)
user_genres_df

Unnamed: 0,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [32]:
# Find the shapes of our data frames
print('Shape of user_movie_ratings is:', user_movie_ratings.shape)
print('Shape of user_genres_df is:', user_genres_df.shape)


Shape of user_movie_ratings is: (100836, 3)
Shape of user_genres_df is: (100836, 20)


In [27]:
# Dot product of the transposed genre matrix with the rating column: one summed weight per genre
user_profile = user_genres_df.T.dot(user_movie_ratings.rating)
user_profile

Adventure              4486.0
Animation              2170.0
Children               2390.0
Comedy                13192.0
Fantasy                2722.0
Romance                5608.0
Drama                 15460.0
Action                 6488.0
Crime                  4199.0
Thriller               6669.5
Horror                 3406.5
Mystery                2005.5
Sci-Fi                 3438.5
War                    1350.0
Musical                1173.5
Documentary            1548.5
IMAX                    582.5
Western                 572.0
Film-Noir               307.0
(no genres listed)      125.5
dtype: float64
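The transposed dot product is the sum, per genre, of the ratings of the rows tagged with that genre. A small sketch with hypothetical values shows that it matches multiplying row-wise and summing, which differs from the single-user section only in using a sum instead of a mean.

```python
import pandas as pd

# Hypothetical one-hot genre rows and their ratings.
toy_genres = pd.DataFrame({'Action': [1, 0, 1], 'Comedy': [0, 1, 1]})
toy_ratings = pd.Series([5.0, 3.0, 4.0])

# Profile as in the cell above: genres^T . ratings
profile_via_dot = toy_genres.T.dot(toy_ratings)

# Same quantity written as "weight each row by its rating, then sum per genre".
profile_via_sum = toy_genres.mul(toy_ratings, axis=0).sum()

print(profile_via_dot)
```

Since the scores are later divided by the sum of the profile, using a sum or a mean here changes the profile's scale but not the ranking it produces.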

Step 3: Deploying the content-based recommender system
Drop the non-genre columns from the `movies_with_genres` dataframe, which at this point holds one row per rating (100,836 rows) and one column per genre.

In [28]:
# Drop the four non-genre columns.
movies_with_genres.drop(['movieId','title','genres','rating'], axis=1, inplace=True)
movies_with_genres.head()

movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


With the user's profile and the genre matrix in hand, we score every row by taking the dot product of its genre vector with the profile, normalised by the sum of the profile weights, and recommend the twenty highest-scoring movies.

In [29]:
# Dot each genre row with the profile, then normalise by the total profile weight.
recommendation_table_df = (movies_with_genres.dot(user_profile)) / user_profile.sum()
recommendation_table_df.head()

movieId
1    0.320435
1    0.123219
1    0.241354
1    0.439828
1    0.169358
dtype: float64

Sort the recommendation table in descending order

In [30]:
# Sort the scores from largest to smallest
recommendation_table_df.sort_values(ascending=False, inplace=True)
recommendation_table_df.head(20)

movieId
280    0.729010
110    0.690509
110    0.666335
165    0.634388
317    0.629285
161    0.615882
193    0.598043
247    0.594340
208    0.590700
161    0.590655
19     0.590655
1      0.590655
1      0.590655
145    0.590655
111    0.590655
225    0.580712
235    0.580481
329    0.579358
70     0.579358
26     0.579358
dtype: float64

Here is the recommendation table, with movie details and genres for the top 20 scores. Because the score table has one row per rating, the same movie can appear more than once.

In [31]:
# Make a copy of the original movies dataframe
copy = movies.copy(deep=True)

# Set its index to movieId
copy = copy.set_index('movieId', drop=True)

# Take the movieIds of the 20 highest scores
top_20_index = recommendation_table_df.index[:20].tolist()

# Select those movieIds from the copied movies dataframe
recommended_movies = copy.loc[top_20_index, :]

# Display the result, ordered by descending score
recommended_movies

movieId,title,genres
280,Murder in the First (1995),"[Drama, Thriller]"
110,Braveheart (1995),"[Action, Drama, War]"
110,Braveheart (1995),"[Action, Drama, War]"
165,Die Hard: With a Vengeance (1995),"[Action, Crime, Thriller]"
317,"Santa Clause, The (1994)","[Comedy, Drama, Fantasy]"
161,Crimson Tide (1995),"[Drama, Thriller, War]"
193,Showgirls (1995),[Drama]
247,Heavenly Creatures (1994),"[Crime, Drama]"
208,Waterworld (1995),"[Action, Adventure, Sci-Fi]"
161,Crimson Tide (1995),"[Drama, Thriller, War]"
