# FilmPulse Recommender: a Movie Recommender Engine based on content-based filtering 🎥

In this project, we build a movie recommender system that recommends to each user the movies they would most likely rate highly. To do this, we use content-based filtering. Predictions rely on features of the users (their genre ratings) and of the movies (release year, average rating and genres).






# Loading and preprocessing the data

The dataset is derived from the MovieLens ml-latest-small dataset.
The original dataset has about 9,000 movies rated by about 600 users, on a scale of 0.5 to 5 in 0.5-star increments. It has been reduced to movies released since 2000 and to popular genres, keeping a subset of the original users and movies. For each movie, the dataset provides a title, a release date and one or more genres. For example, "Toy Story 3" was released in 2010 and has several genres: "Adventure|Animation|Children|Comedy|Fantasy|IMAX". The dataset contains little information about users other than their ratings.
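The user feature columns hold a rating per genre. One plausible way to derive such features is to average a user's ratings over the movies in each genre and leave 0.0 for genres they never rated. That is consistent with the zeros visible in `user_train.head()` below. The preprocessed CSV files already contain these values, so the function `genre_averages`, the `GENRES` list and the toy ratings are illustrative assumptions. They are not the preprocessing actually used to build the dataset.

```python
# Hypothetical sketch: per-genre average rating for one user.
GENRES = ['Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime',
          'Documentary', 'Drama', 'Fantasy', 'Horror', 'Mystery', 'Romance',
          'Sci-Fi', 'Thriller']


def genre_averages(ratings):
    """Average a user's ratings per genre.

    ratings: list of (rating, genre_string) pairs, e.g. (4.0, 'Comedy|Romance').
    Returns one value per entry of GENRES; unrated genres get 0.0.
    """
    totals = {g: 0.0 for g in GENRES}
    counts = {g: 0 for g in GENRES}
    for rating, genre_string in ratings:
        for genre in genre_string.split('|'):
            if genre in totals:  # genres outside the 14 columns (e.g. IMAX) are skipped
                totals[genre] += rating
                counts[genre] += 1
    return [totals[g] / counts[g] if counts[g] else 0.0 for g in GENRES]


# Toy ratings for a single made-up user
toy_ratings = [(4.0, 'Adventure|Animation|Children|Comedy|Fantasy|IMAX'),
               (3.0, 'Comedy|Romance')]
features = dict(zip(GENRES, genre_averages(toy_ratings)))
print(features['Comedy'], features['Romance'], features['Action'])  # 3.5 3.0 0.0
```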

In [None]:
#importing the libraries we need
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler,StandardScaler
from sklearn.model_selection import train_test_split


In [None]:
#loading the data
# column names for the user and movie feature files (the last 14 are genres)
user_cols=['user id','rating count','rating ave','Action','Adventure','Animation','Children','Comedy','Crime','Documentary','Drama','Fantasy','Horror','Mystery','Romance','Sci-Fi','Thriller']
item_cols=['movie id','year','ave rating','Action','Adventure','Animation','Children','Comedy','Crime','Documentary','Drama','Fantasy','Horror','Mystery','Romance','Sci-Fi','Thriller']
# note: read_csv treats the first line of each file as a header by default; if these
# files have no header row, pass header=None so the first example is not dropped
user_train=pd.DataFrame(np.array(pd.read_csv("/content/drive/MyDrive/FilmPurse/content_user_train.csv")),columns=user_cols)
item_train=pd.DataFrame(np.array(pd.read_csv("/content/drive/MyDrive/FilmPurse/content_item_train.csv")),columns=item_cols)
y_train=np.array(pd.read_csv("/content/drive/MyDrive/FilmPurse/content_y_train.csv"))

In [None]:
#loading the movie list
movie_list=pd.read_csv("/content/drive/MyDrive/FilmPurse/content_movie_list.csv")

In [None]:
movie_list.head()

Unnamed: 0,movieId,title,genres
0,4054,Save the Last Dance (2001),Drama|Romance
1,4069,"Wedding Planner, The (2001)",Comedy|Romance
2,4148,Hannibal (2001),Horror|Thriller
3,4149,Saving Silverman (Evil Woman) (2001),Comedy|Romance
4,4153,Down to Earth (2001),Comedy|Fantasy|Romance


In [None]:
user_train.head()

Unnamed: 0,user id,rating count,rating ave,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Horror,Mystery,Romance,Sci-Fi,Thriller
0,2.0,16.0,4.0625,3.9,5.0,0.0,0.0,4.0,4.2,4.0,4.0,0.0,3.0,4.0,0.0,4.25,3.875
1,2.0,16.0,4.0625,3.9,5.0,0.0,0.0,4.0,4.2,4.0,4.0,0.0,3.0,4.0,0.0,4.25,3.875
2,2.0,16.0,4.0625,3.9,5.0,0.0,0.0,4.0,4.2,4.0,4.0,0.0,3.0,4.0,0.0,4.25,3.875
3,2.0,16.0,4.0625,3.9,5.0,0.0,0.0,4.0,4.2,4.0,4.0,0.0,3.0,4.0,0.0,4.25,3.875
4,2.0,16.0,4.0625,3.9,5.0,0.0,0.0,4.0,4.2,4.0,4.0,0.0,3.0,4.0,0.0,4.25,3.875


In [None]:
item_train.head()

Unnamed: 0,movie id,year,ave rating,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Horror,Mystery,Romance,Sci-Fi,Thriller
0,6874.0,2003.0,3.961832,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,6874.0,2003.0,3.961832,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2,8798.0,2004.0,3.761364,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,8798.0,2004.0,3.761364,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,8798.0,2004.0,3.761364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
#scaling the features (zero mean and unit variance per column)
scaler_user=StandardScaler()
scaler_item=StandardScaler()
scaled_user_train=pd.DataFrame(scaler_user.fit_transform(user_train),columns=user_train.columns)
scaled_item_train=pd.DataFrame(scaler_item.fit_transform(item_train),columns=item_train.columns)
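`StandardScaler` rescales each column on its own to zero mean and unit variance, z = (x - mean) / std. It uses the population standard deviation and leaves a constant column with a scale of 1. The stored mean and scale are what `transform` reuses later on the new user's and movies' features. Because the scalers here are fitted before the train/test split, the test rows also contribute to these statistics. Fitting on the training rows only would keep them out.

Below is a plain-Python sketch of that per-column behaviour. `fit_standard` and `transform_standard` are hypothetical helpers, not scikit-learn code.

```python
import math


def fit_standard(column):
    """Return (mean, scale) for one column, as StandardScaler computes them."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)  # population std (ddof=0)
    scale = std if std > 0 else 1.0  # constant columns are not divided by zero
    return mean, scale


def transform_standard(column, mean, scale):
    """Apply stored statistics, like StandardScaler.transform on new rows."""
    return [(x - mean) / scale for x in column]


train_col = [2.0, 4.0, 6.0]
mean, scale = fit_standard(train_col)
print(transform_standard(train_col, mean, scale))   # approx [-1.2247, 0.0, 1.2247]
print(transform_standard([4.0, 8.0], mean, scale))  # new rows reuse the training mean/scale
```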

In [None]:
#splitting the data into training and test sets
# one call applies the same shuffle to all three arrays, keeping their rows aligned
user_train,user_test,item_train,item_test,y_train,y_test=train_test_split(
    scaled_user_train,scaled_item_train,y_train,test_size=0.2,random_state=42)
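`train_test_split` draws its shuffle from `random_state` and the number of rows. So the same rows land in the test set whichever way you split: one call on all three arrays, or separate calls sharing `random_state=42` on arrays of equal length. Either way, row i of the user, movie and target arrays keeps describing the same rating.

The idea behind this can be sketched in plain Python: one shared permutation applied to every array. `aligned_split` is a hypothetical helper. It mimics the output order and the rounding of the test size, but not scikit-learn's exact random permutation.

```python
import math
import random


def aligned_split(*arrays, test_size=0.2, seed=42):
    """Split equal-length sequences with ONE shared shuffle.

    Returns [a_train, a_test, b_train, b_test, ...], the same output order
    as sklearn's train_test_split (but not its exact random permutation).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError('all inputs must have the same number of rows')
    order = list(range(n))
    random.Random(seed).shuffle(order)
    n_test = math.ceil(test_size * n)  # round the test share up, as sklearn does
    test_idx, train_idx = order[:n_test], order[n_test:]
    out = []
    for a in arrays:
        out.append([a[i] for i in train_idx])
        out.append([a[i] for i in test_idx])
    return out


users = [f'user{i}' for i in range(10)]
movies = [f'movie{i}' for i in range(10)]
ratings = list(range(10))
u_tr, u_te, m_tr, m_te, y_tr, y_te = aligned_split(users, movies, ratings)
print(len(u_tr), len(u_te))  # 8 2
```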

In [None]:
#scaling the targets to [-1, 1], the range of the model's cosine-similarity output
scaler_y=MinMaxScaler(feature_range=(-1,1))
y_train=scaler_y.fit_transform(y_train)
y_test=scaler_y.transform(y_test)
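`MinMaxScaler(feature_range=(-1, 1))` is an affine map from the smallest and largest training rating onto [-1, 1]. `inverse_transform` undoes it, which is how predictions get back to the star scale later. The sketch below writes the map out. It assumes the training ratings span the dataset's full 0.5 to 5 scale; in the notebook, the actual bounds are stored in `scaler_y.data_min_` and `scaler_y.data_max_`. The helper names are hypothetical.

```python
def minmax_scale(y, data_min, data_max, feature_range=(-1.0, 1.0)):
    """Map y from [data_min, data_max] onto feature_range, like MinMaxScaler.transform."""
    lo, hi = feature_range
    return lo + (y - data_min) * (hi - lo) / (data_max - data_min)


def minmax_inverse(y_scaled, data_min, data_max, feature_range=(-1.0, 1.0)):
    """Undo minmax_scale, like MinMaxScaler.inverse_transform."""
    lo, hi = feature_range
    return data_min + (y_scaled - lo) * (data_max - data_min) / (hi - lo)


# assumed rating bounds: the dataset's 0.5 to 5 star scale
for stars in (0.5, 2.75, 5.0):
    print(stars, '->', minmax_scale(stars, 0.5, 5.0))  # -1.0, 0.0, 1.0
```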

# Building a neural network for content-based filtering
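The network below has two towers. One maps the 14 user genre features to a 32-dimensional vector; the other maps the 16 movie features to another 32-dimensional vector. Both vectors are L2-normalized, so the final `Dot` layer returns their cosine similarity, a number in [-1, 1]. That is why the ratings are scaled to that range.

Below is a plain-Python sketch of that last step, with toy 2-dimensional vectors standing in for the learned 32-dimensional ones. `l2_normalize` here imitates the formula documented for `tf.linalg.l2_normalize`; it is not TensorFlow's code.

```python
import math


def l2_normalize(v, epsilon=1e-12):
    """Scale v to unit length: v / sqrt(max(sum(v**2), epsilon))."""
    norm = math.sqrt(max(sum(x * x for x in v), epsilon))
    return [x / norm for x in v]


def predicted_score(user_vec, movie_vec):
    """Dot product of the normalized vectors, i.e. their cosine similarity."""
    u, m = l2_normalize(user_vec), l2_normalize(movie_vec)
    return sum(a * b for a, b in zip(u, m))


print(round(predicted_score([3.0, 4.0], [6.0, 8.0]), 6))   # same direction: 1.0
print(round(predicted_score([1.0, 0.0], [0.0, 2.0]), 6))   # orthogonal: 0.0
print(round(predicted_score([1.0, 0.0], [-5.0, 0.0]), 6))  # opposite: -1.0
```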



In [None]:
tf.random.set_seed(42)
#building the user neural network
user_nn=tf.keras.models.Sequential([
    tf.keras.layers.Dense(256,activation='relu'),
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dense(32,activation='linear'),
])
#building the movie neural network
movie_nn=tf.keras.models.Sequential([
    tf.keras.layers.Dense(256,activation='relu'),
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dense(32,activation='linear'),
])
#creating the user input: the 14 genre-rating columns (user_train[:,3:])
input_user=tf.keras.layers.Input(shape=(14,))
user_vector=user_nn(input_user)
user_vector=tf.linalg.l2_normalize(user_vector,axis=1)
#creating the movie input: year, average rating and the 14 genre flags (item_train[:,1:])
input_movie=tf.keras.layers.Input(shape=(16,))
movie_vector=movie_nn(input_movie)
movie_vector=tf.linalg.l2_normalize(movie_vector,axis=1)
#compute the dot product of the user vector and the movie vector
output=tf.keras.layers.Dot(axes=1)([user_vector,movie_vector])
#setting up the inputs and the outputs for the model
model=tf.keras.models.Model([input_user,input_movie],output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 14)]         0           []                               
                                                                                                  
 input_2 (InputLayer)           [(None, 16)]         0           []                               
                                                                                                  
 sequential (Sequential)        (None, 32)           40864       ['input_1[0][0]']                
                                                                                                  
 sequential_1 (Sequential)      (None, 32)           41376       ['input_2[0][0]']                
                                                                                              

In [None]:
#compile the model
tf.random.set_seed(42)
model.compile(optimizer=tf.keras.optimizers.Adam(),loss=tf.keras.losses.MeanSquaredError())
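The parameter counts in the `model.summary()` output above follow directly from the layer sizes. A `Dense` layer with n inputs and m units has n * m weights plus m biases. Here is a quick check for the two towers; the helper functions are only for illustration.

```python
def dense_params(n_in, n_units):
    """Weights (n_in x n_units) plus one bias per unit."""
    return n_in * n_units + n_units


def tower_params(n_in, widths=(256, 128, 32)):
    """Total parameters of a stack of Dense layers with the given widths."""
    total, prev = 0, n_in
    for width in widths:
        total += dense_params(prev, width)
        prev = width
    return total


print(tower_params(14))  # user tower: 40864
print(tower_params(16))  # movie tower: 41376
```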

In [None]:
user_train=np.array(user_train)
item_train=np.array(item_train)

In [None]:
#fit the model: drop the id, rating count and rating ave user columns and the movie id column
tf.random.set_seed(42)
model.fit([user_train[:,3:],item_train[:,1:]],y_train,epochs=30)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30



In [None]:
#evaluating the model on the test set (the returned loss is the MSE on the scaled targets)
model.evaluate([np.array(user_test)[:,3:],np.array(item_test)[:,1:]],y_test)



0.10197644680738449
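The value returned by `model.evaluate` is the mean squared error on the scaled targets, not on stars. Because the target scaling is affine, an error of e on the [-1, 1] scale corresponds to e * (data_max - data_min) / 2 stars. The MSE therefore scales by the square of that factor.

Assuming the training ratings span 0.5 to 5 (check `scaler_y.data_min_` and `scaler_y.data_max_`), the test MSE above works out to roughly 0.52 squared stars. That is an RMSE of about 0.72 stars. The helper below is a hypothetical sketch of that conversion.

```python
import math


def mse_in_rating_units(mse_scaled, data_min, data_max, feature_range=(-1.0, 1.0)):
    """Convert an MSE measured on MinMax-scaled targets back to rating units."""
    lo, hi = feature_range
    stars_per_unit = (data_max - data_min) / (hi - lo)  # 4.5 / 2 = 2.25 for 0.5..5 stars
    return mse_scaled * stars_per_unit ** 2


test_mse_scaled = 0.10197644680738449  # value printed by model.evaluate above
mse_stars = mse_in_rating_units(test_mse_scaled, 0.5, 5.0)  # assumed rating bounds
rmse_stars = math.sqrt(mse_stars)
print(round(mse_stars, 3), round(rmse_stars, 3))  # 0.516 0.719
```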

# Recommending the movies for a new user

We will use FilmPurse to predict the top 10 movies that Hanane would most likely enjoy, based on her genre preferences.

In [None]:
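# only the 14 genre ratings reach the model (final_user_vec[:,3:] below); the id, rating
# count and rating ave are kept so the 17 columns match what scaler_user was fitted on
new_user_id = 5000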
new_rating_ave = 1.0
new_action = 3.5
new_adventure = 1
new_animation = 1
new_childrens = 1
new_comedy = 3
new_crime = 1
new_documentary = 1
new_drama = 1
new_fantasy = 1
new_horror = 4
new_mystery = 1
new_romance = 5
new_scifi = 1
new_thriller = 3.5
new_rating_count = 3

user_vec = np.array([[new_user_id, new_rating_count, new_rating_ave,
                      new_action, new_adventure, new_animation, new_childrens,
                      new_comedy, new_crime, new_documentary,
                      new_drama, new_fantasy, new_horror, new_mystery,
                      new_romance, new_scifi, new_thriller]])

In [None]:
movie_vec=pd.DataFrame(np.array(pd.read_csv("/content/drive/MyDrive/FilmPurse/content_item_vecs.csv")),columns=['movie id','year','ave rating','Action','Adventure','Animation','Children','Comedy','Crime','Documentary','Drama','Fantasy','Horror','Mystery','Romance','Sci-Fi','Thriller'])
movie_vec.head()

Unnamed: 0,movie id,year,ave rating,Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Horror,Mystery,Romance,Sci-Fi,Thriller
0,4054.0,2001.0,2.84375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1,4069.0,2001.0,2.909091,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,4069.0,2001.0,2.909091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,4148.0,2001.0,2.935897,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4,4148.0,2001.0,2.935897,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


In [None]:
#generating the input user vector: one copy of the new user's row for each movie row
new_user_vec=pd.DataFrame(user_vec,columns=['user id','rating count','rating ave','Action','Adventure','Animation','Children','Comedy','Crime','Documentary','Drama','Fantasy','Horror','Mystery','Romance','Sci-Fi','Thriller'])
final_user_vec=pd.concat([new_user_vec]*len(movie_vec),axis=0,ignore_index=True)

In [None]:
#scale the user and movie data with the scalers fitted on the training data (transform only)
movie_vec=scaler_item.transform(movie_vec)
final_user_vec=scaler_user.transform(final_user_vec)

In [None]:
#generating a list of movie ids to help us present the recommendations nicely
movies=pd.DataFrame(np.array(pd.read_csv("/content/drive/MyDrive/FilmPurse/content_item_vecs.csv")),columns=['movie id','year','ave rating','Action','Adventure','Animation','Children','Comedy','Crime','Documentary','Drama','Fantasy','Horror','Mystery','Romance','Sci-Fi','Thriller'])
movie_ids=movies['movie id']

In [None]:
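def predict_new_user(model,user_data,item_data,movie_list,scaler_y,movie_ids) :
    """
    Predicts the rating a new user would give each movie in item_data and returns
    one row per movie (its best-scoring genre row), sorted by predicted rating.
    """
    # predict on the scaled [-1, 1] range, then map back to the rating scale
    y_p=model.predict([user_data,item_data])
    y_p=scaler_y.inverse_transform(y_p)
    # look up title and genres for each prediction row, preserving row order
    # (DataFrame.append was removed in pandas 2.0; a left merge also keeps a row
    # for ids missing from movie_list, so predictions stay aligned)
    movie_desc=pd.DataFrame({'movieId':np.asarray(movie_ids).astype(int)}).merge(
        movie_list[['movieId','title','genres']],on='movieId',how='left')
    recommendations=pd.concat([pd.DataFrame(y_p,columns=['y_p']),movie_desc],axis=1)
    # the movie file has one row per (movie, genre), so keep each movie's highest prediction
    return recommendations[['movieId','y_p','title','genres']].groupby(by=['movieId']).max().sort_values(by=['y_p'],ascending=False)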
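As `movie_vec.head()` above shows, the movie file holds one row per (movie, genre) pair; for example, 4069 and 4148 each appear twice. The model therefore scores each movie several times, and `predict_new_user` keeps each movie's highest predicted rating before sorting.

Here is the same group-by-max-then-rank logic in plain Python, with made-up scores. `top_k_movies` is a hypothetical helper.

```python
def top_k_movies(scored_rows, k=10):
    """scored_rows: iterable of (movie_id, predicted_rating), possibly with
    several rows per movie. Keeps each movie's best score, returns the top k."""
    best = {}
    for movie_id, score in scored_rows:
        if movie_id not in best or score > best[movie_id]:
            best[movie_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


# toy predictions: one row per (movie, genre), scores invented for illustration
rows = [(4054, 3.0), (4069, 2.9), (4069, 3.6), (4148, 3.1), (4148, 3.4)]
print(top_k_movies(rows, k=2))  # [(4069, 3.6), (4148, 3.4)]
```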

In [None]:
import warnings
warnings.filterwarnings(action='ignore')

In [None]:
#getting the recommendations
recommendations=predict_new_user(model,final_user_vec[:,3:],movie_vec[:,1:],movie_list,scaler_y,movie_ids)



In [None]:
recommendations.head(10)

movieId,y_p,title,genres
4994,4.19504,"Majestic, The (2001)",Comedy|Drama|Romance
4641,4.19504,Ghost World (2001),Comedy|Drama
4816,4.192513,Zoolander (2001),Comedy
5785,4.181002,Jackass: The Movie (2002),Action|Comedy|Documentary
4701,4.160255,Rush Hour 2 (2001),Action|Comedy
7137,4.155022,"Cooler, The (2003)",Comedy|Drama|Romance
6188,4.153722,Old School (2003),Comedy
6753,4.150526,Secondhand Lions (2003),Children|Comedy|Drama
4161,4.150064,"Mexican, The (2001)",Action|Comedy
4299,4.146762,"Knight's Tale, A (2001)",Action|Comedy|Romance


In [None]:
recommendations_1=recommendations.head(5)

In [None]:
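# as written this repeats recommendations_1; the 2015-2017 titles in the output below
# come from a different `recommendations` table whose cell is not shown here
recommendations_2=recommendations.head(5)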

In [None]:
recommendations_final=pd.concat([recommendations_1,recommendations_2],axis=0).sort_values(by=['y_p'],ascending=False)
recommendations_final.head(10)

movieId,y_p,title,genres
4994,4.19504,"Majestic, The (2001)",Comedy|Drama|Romance
4641,4.19504,Ghost World (2001),Comedy|Drama
4816,4.192513,Zoolander (2001),Comedy
5785,4.181002,Jackass: The Movie (2002),Action|Comedy|Documentary
4701,4.160255,Rush Hour 2 (2001),Action|Comedy
142488,3.753299,Spotlight (2015),Thriller
119145,3.745461,Kingsman: The Secret Service (2015),Action|Adventure|Comedy|Crime
168252,3.624371,Logan (2017),Action|Sci-Fi
134130,3.54229,The Martian (2015),Adventure|Drama|Sci-Fi
134393,3.540345,Trainwreck (2015),Comedy|Romance
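When two ranked lists are concatenated, as with `recommendations_1` and `recommendations_2`, a movie present in both would be listed twice. Below is a minimal sketch of a merge that keeps each movie once, at its higher score, before sorting. `merge_rankings` is hypothetical, and the rows reuse values from the tables above.

```python
def merge_rankings(*rankings):
    """Concatenate ranked (movie_id, title, score) lists, drop duplicate
    movie ids (keeping the higher score) and sort by score, highest first."""
    best = {}
    for ranking in rankings:
        for movie_id, title, score in ranking:
            if movie_id not in best or score > best[movie_id][1]:
                best[movie_id] = (title, score)
    merged = [(movie_id, title, score) for movie_id, (title, score) in best.items()]
    return sorted(merged, key=lambda row: row[2], reverse=True)


first = [(4994, 'Majestic, The (2001)', 4.19504), (4816, 'Zoolander (2001)', 4.192513)]
second = [(142488, 'Spotlight (2015)', 3.753299), (4816, 'Zoolander (2001)', 4.192513)]
for row in merge_rankings(first, second):
    print(row)
```

In pandas, `movieId` is the index of these tables. After `pd.concat(...).sort_values(by=['y_p'], ascending=False)`, the same deduplication could be done by keeping `~df.index.duplicated()` rows.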


In [None]:
print('FilmPurse thinks Hanane would like the following movies:')
for index, movie in recommendations_final.iterrows():
    # read fields by label: positional access such as movie[0] on a labelled Series is deprecated in recent pandas
    print('    -'+movie['title']+' and she would give it a rating of '+str(round(movie['y_p'],2))+'. The genres of this movie are : '+movie['genres']+'.')

FilmPurse thinks Hanane would like the following movies:
    -Majestic, The (2001) and she would give it a rating of 4.2. The genres of this movie are : Comedy|Drama|Romance.
    -Ghost World (2001) and she would give it a rating of 4.2. The genres of this movie are : Comedy|Drama.
    -Zoolander (2001) and she would give it a rating of 4.19. The genres of this movie are : Comedy.
    -Jackass: The Movie (2002) and she would give it a rating of 4.18. The genres of this movie are : Action|Comedy|Documentary.
    -Rush Hour 2 (2001) and she would give it a rating of 4.16. The genres of this movie are : Action|Comedy.
    -Spotlight (2015) and she would give it a rating of 3.75. The genres of this movie are : Thriller.
    -Kingsman: The Secret Service (2015) and she would give it a rating of 3.75. The genres of this movie are : Action|Adventure|Comedy|Crime.
    -Logan (2017) and she would give it a rating of 3.62. The genres of this movie are : Action|Sci-Fi.
    -The Martian (2015) an