**A recommender system using a Deep Neural Network**<br>

Recommender systems are one of the most successful and widespread applications of machine learning technologies in business.

You can apply recommender systems in scenarios where many users interact with many items. Large-scale recommender systems are found in retail, video on demand, and music streaming. To develop and maintain such systems, a company typically needs a team of expensive data scientists and engineers. That is why even large corporations such as the BBC have decided to outsource their recommendation services.

Surprisingly, news or video recommendation for media, product recommendation, and personalization in travel and retail can all be handled by similar machine learning algorithms. These algorithms still need to be adjusted to each use case, however.

The three basic data sources for a recommender system are users, items, and the interactions among them. We store these interactions between a set of users U and a set of items I in a rating matrix R as shown below. This matrix has m rows for users and n columns for items. Each entry (i, j) holds the interaction between user i and item j, for example a rating.

*Figure: DNN for recommender*
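To make the rating matrix concrete, here is a minimal sketch with made-up values (the users, items, and ratings are assumptions for illustration, not MovieLens data). Missing interactions are stored as `NaN`, which is also how the pivot matrix in section 2 represents them.

```python
import numpy as np

# Toy interactions as (user index i, item index j, rating); all values are made up.
interactions = [(0, 0, 4.0), (0, 2, 5.0), (1, 1, 3.5), (2, 0, 2.0), (2, 2, 4.5)]

m, n = 3, 3                      # m users (rows), n items (columns)
R = np.full((m, n), np.nan)      # NaN marks "no interaction observed"
for i, j, rating in interactions:
    R[i, j] = rating

observed = ~np.isnan(R)          # boolean mask of known entries
density = observed.sum() / R.size
print(R)
print(f"share of observed entries: {density:.2f}")
```

Real rating matrices are far sparser than this toy one, since most users rate only a small fraction of the items.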

For this lecture we use part of the MovieLens dataset. It is a small dataset with 100,836 ratings and 3,683 tag applications across 9,742 movies by 610 users. Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included.

## **1. Read dataset**

In [0]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


In [4]:
from google.colab import files
uploaded=files.upload()


Saving ratings.csv to ratings (1).csv


In [5]:
# LOAD dataset
ratings = pd.read_csv("ratings.csv",
                      usecols=['userId', 'movieId', 'rating'],
                      header=0,
                      sep=",")

# SPLIT dataset in train and test set
ratings_train, ratings_test = train_test_split(ratings, test_size=0.3, shuffle=True, random_state=77)
ratings.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


In [6]:
from google.colab import files
uploaded2=files.upload()


Saving movies.csv to movies (1).csv


In [7]:
movies = pd.read_csv("movies.csv",usecols=['movieId', 'title', 'genres'],
                      header=0,sep=',')
movies.head(10)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
6,7,Sabrina (1995),Comedy|Romance
7,8,Tom and Huck (1995),Adventure|Children
8,9,Sudden Death (1995),Action
9,10,GoldenEye (1995),Action|Adventure|Thriller


In [8]:
ratings = pd.merge(ratings,movies,on='movieId')
ratings.head()

Unnamed: 0,userId,movieId,rating,title,genres
0,1,1,4.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


## **2. Build pivot matrix**

Build the pivot matrix with the movies in columns and the users in rows.

**Idea:** use the `pivot_table()` function of pandas

In [9]:
moviemat = ratings.pivot_table(index='userId',columns='movieId',values='rating')
moviemat.head()

movieId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,34,36,38,39,40,41,42,43,...,185135,185435,185473,185585,186587,187031,187541,187593,187595,187717,188189,188301,188675,188751,188797,188833,189043,189111,189333,189381,189547,189713,190183,190207,190209,190213,190215,190219,190221,191005,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
1,4.0,,4.0,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.5,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,2.0,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,4.0,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,4.0,4.0,,3.0,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


## **3. A non-personalized recommender system with a deep neural network**

### **3.1. Construct labels vector (y)**

In [9]:
# Ratings stay numeric because we treat this as regression; a classifier would need encoded class labels.
y_train = ratings_train['rating']
y_train

33427    5.0
27617    5.0
44532    3.0
65092    4.5
34807    4.0
        ... 
84203    4.5
59348    4.5
61012    3.0
74335    4.0
47831    3.5
Name: rating, Length: 70585, dtype: float64

In [0]:
y_test=ratings_test['rating']

#### **Encode the feature vector**

**Remember:** Label encoding is straightforward, but the numeric values can be misinterpreted by the algorithm as an order or a magnitude. Suppose, for example, that car body styles were label-encoded as integers from 0 to 4, with wagon as 4. The value 0 is obviously less than the value 4, but does that really correspond to anything in the real data? Does a wagon have "4X" more weight in our calculation than a convertible? In this example, I don't think so. The same applies to `movieId` here: movie 50 is not fifty times movie 1.

A common alternative approach is called one-hot encoding (it also goes by other names, such as dummy or indicator variables). Despite the different names, the basic strategy is to convert each category value into a new column and assign a 1 or 0 (True/False) value to that column. This has the benefit of not weighting a value improperly, but it has the downside of adding more columns to the data set.

Depending on whether you have chosen to build a dataframe or an array, select one of the following methods to encode the selected features:

* Pandas supports this feature using `get_dummies`. This function is named this way because it creates dummy/indicator variables (aka 1 or 0).

* Scikit-learn can also encode categorical integer features as a one-hot numeric array using `OneHotEncoder`.
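As a minimal sketch of the second option, the toy example below fits a `OneHotEncoder` on a made-up training split and reuses it on a test split that contains an unseen movie. The frames `train_toy` and `test_toy` and their values are assumptions for illustration; `handle_unknown="ignore"` encodes unseen categories as an all-zero row instead of raising an error.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy splits (made up): movie 50 only appears in the test split.
train_toy = pd.DataFrame({"movieId": [1, 3, 6, 1]})
test_toy = pd.DataFrame({"movieId": [3, 50]})

# Fit on the training split only, then reuse the same columns for the test split.
encoder = OneHotEncoder(handle_unknown="ignore")
X_train_toy = encoder.fit_transform(train_toy[["movieId"]])  # SciPy sparse matrix
X_test_toy = encoder.transform(test_toy[["movieId"]])

print(encoder.categories_[0])   # categories learned from the training split
print(X_train_toy.toarray())
print(X_test_toy.toarray())     # the row for movie 50 is all zeros
```

Compared with calling `get_dummies` on each split separately, fitting once on the training split guarantees that both matrices have the same columns, and the sparse output keeps memory use low when there are thousands of movies.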

In [12]:
X_train_encoded = pd.get_dummies(ratings_train['movieId'])
X_train_encoded.shape

(70585, 8526)

In [13]:
X_test_encoded = pd.get_dummies(ratings_test['movieId'])
print(X_test_encoded.shape)

(30251, 6105)


In [0]:
X_train,X_test = X_train_encoded.align(X_test_encoded, join='outer', axis=1, fill_value=0)

### **3.2. Network**

Using different modules from the `sklearn` library, build a model that is similar to the introductory figure.

But first, should we use an `MLPClassifier` or an `MLPRegressor`?

**Use only 50 epochs to keep the training time reasonable.**

In [0]:
from sklearn.neural_network import MLPRegressor

In [0]:
''' BUILD your network here '''

model_nonpersonalised = MLPRegressor(hidden_layer_sizes=(2,),
                                       activation='relu',
                                       solver='adam',
                                       learning_rate='constant',
                                       max_iter=50,
                                       learning_rate_init=0.01,
                                       alpha=0.01)

In [0]:
fitter = model_nonpersonalised.fit(X_train,y_train)


In [0]:
prediction_nonpersonalised = model_nonpersonalised.predict(X_test)


In [0]:
from sklearn.metrics import mean_squared_error

In [20]:
mean_squared_error(y_test,prediction_nonpersonalised)

0.9356753006673889

## **4. A personalized recommender system with a deep neural network**<br>
Let's now build a rather naive neural network that takes a (user, movie) pair as input and tries to predict a rating.

In [0]:
X_coupletrain = ratings_train[['userId','movieId']]
X_coupletest = ratings_test[['userId','movieId']]

In [35]:
X_coupletrain_encoded = pd.get_dummies(X_coupletrain,columns=['userId','movieId'])
X_coupletrain_encoded

Unnamed: 0,userId_1,userId_2,userId_3,userId_4,userId_5,userId_6,userId_7,userId_8,userId_9,userId_10,userId_11,userId_12,userId_13,userId_14,userId_15,userId_16,userId_17,userId_18,userId_19,userId_20,userId_21,userId_22,userId_23,userId_24,userId_25,userId_26,userId_27,userId_28,userId_29,userId_30,userId_31,userId_32,userId_33,userId_34,userId_35,userId_36,userId_37,userId_38,userId_39,userId_40,...,movieId_184245,movieId_184253,movieId_184257,movieId_184349,movieId_184471,movieId_184641,movieId_184721,movieId_184791,movieId_184931,movieId_185029,movieId_185033,movieId_185135,movieId_185435,movieId_185585,movieId_186587,movieId_187031,movieId_187541,movieId_187593,movieId_187595,movieId_187717,movieId_188189,movieId_188301,movieId_188751,movieId_188833,movieId_189043,movieId_189333,movieId_189547,movieId_189713,movieId_190183,movieId_190213,movieId_190215,movieId_190219,movieId_190221,movieId_193571,movieId_193573,movieId_193579,movieId_193581,movieId_193583,movieId_193585,movieId_193587
33427,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27617,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44532,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
65092,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
34807,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
84203,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
59348,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
61012,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
74335,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [0]:
X_coupletest_encoded = pd.get_dummies(X_coupletest,columns=['userId','movieId'])

In [31]:
print(X_coupletrain_encoded.shape)
print(X_coupletest_encoded.shape)

(70585, 9136)
(30251, 6715)


In [32]:
X_train2,X_test2 = X_coupletrain_encoded.align(X_coupletest_encoded, join='outer', axis=1, fill_value=0)
print(X_train2.shape)
print(X_test2.shape)

(70585, 10334)
(30251, 10334)


### **4.2. Network**

In [0]:
model_personalised = MLPRegressor(hidden_layer_sizes=(2,),
                                       activation='relu',
                                       solver='adam',
                                       learning_rate='constant',
                                       max_iter=50,
                                       learning_rate_init=0.01,
                                       alpha=0.01)

In [0]:
fitter2 = model_personalised.fit(X_train2,y_train)


In [0]:
prediction_personalised = model_personalised.predict(X_test2)


In [36]:
mean_squared_error(y_test,prediction_personalised)

0.7666388821087163

## **5. A content-based recommender system with a deep neural network (Optional)**

We complete our network by taking into account the profile of each movie. To do so, I propose reusing last week's work on text: count the occurrences of each genre (with `CountVectorizer` or TF-IDF) and append this information to the feature vector.
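The genre counting can be sketched in a few lines. Note that this sketch swaps in pandas' `Series.str.get_dummies` rather than `CountVectorizer`, because the genres are already a `|`-separated string; the frame `movies_toy` and its rows are made up for illustration.

```python
import pandas as pd

# Toy movie table (made up) in the same layout as movies.csv.
movies_toy = pd.DataFrame({
    "movieId": [1, 3, 9],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Comedy|Romance",
        "Action",
    ],
})

# One 0/1 column per genre, split on the "|" separator.
genre_flags = movies_toy["genres"].str.get_dummies(sep="|")

# Movie profile = movieId followed by its genre indicator columns.
movie_profiles_toy = pd.concat([movies_toy[["movieId"]], genre_flags], axis=1)
print(movie_profiles_toy)
```

Since a movie lists each genre at most once, counting occurrences and one-hot flags give the same 0/1 values here.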

### **5.1. Read and process the profile of each movie**

In [11]:
''' Build the profile of each movie '''
import itertools

def movie_preprocessing(movie):
    movie_col = list(movie.columns)
    movie_genre = [doc.split('|') for doc in movie['genres']]
    genre_table = {token: idx for idx, token in enumerate(set(itertools.chain.from_iterable(movie_genre)))}

    index_genre = {v: k for k, v in genre_table.items()}
    
    movie_genre = pd.DataFrame(movie_genre)
    #display("movie_genre", movie_genre.head())
    genre_table = pd.DataFrame(genre_table.items())
    genre_table.columns = ['genres', 'Index']
    #display("genre_table", genre_table.head())

    # use one-hot encoding for movie genres (one 0/1 column per genre)
    genre_dummy = np.zeros([len(movie), len(genre_table)])

    for i in range(len(movie)):
        for j in range(len(genre_table)):
            if genre_table['genres'][j] in list(movie_genre.iloc[i, :]):
                genre_dummy[i, j] = 1

    # combine the genre_dummy one-hot table with the original movie frame
    # (axis is passed by keyword; the positional form is rejected by pandas 2.x)
    movie = pd.concat([movie, pd.DataFrame(genre_dummy)], axis=1)
    movie_col.extend([index_genre[i] for i in range(len(genre_table))])
    movie.columns = movie_col
    movie = movie.drop('genres', axis=1)
    return movie, genre_table['genres'].to_list()

profiles, genres = movie_preprocessing(movies)
profiles.head()

Unnamed: 0,movieId,title,Action,Horror,IMAX,Drama,Comedy,Western,Crime,(no genres listed),Musical,Animation,Documentary,Film-Noir,War,Fantasy,Mystery,Thriller,Sci-Fi,Adventure,Romance,Children
0,1,Toy Story (1995),0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0
1,2,Jumanji (1995),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0
2,3,Grumpier Old Men (1995),0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,4,Waiting to Exhale (1995),0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,5,Father of the Bride Part II (1995),0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### **5.2. Build features vector (X)**


Using the pandas `merge()` function, build the feature vector. The goal is to attach to each rating (i.e. each user-movie pair) the profile of the movie that was rated.
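One point to watch: the labels are matched to the feature rows by position, so the merge has to keep the original row order. The sketch below uses made-up frames (`pairs_toy`, `genre_profiles_toy`) and assumes `how="left"`, which pandas documents as preserving the key order of the left frame.

```python
import pandas as pd

# Toy (user, movie) pairs in their original order, plus toy genre profiles (all made up).
pairs_toy = pd.DataFrame({"userId": [7, 2, 9, 2], "movieId": [50, 1, 50, 3]})
genre_profiles_toy = pd.DataFrame({
    "movieId": [1, 3, 50],
    "Comedy": [1, 1, 0],
    "Crime": [0, 0, 1],
})

# A left join keeps one output row per input pair, in the order of pairs_toy.
merged_toy = pd.merge(pairs_toy, genre_profiles_toy, on="movieId", how="left")

# Sanity checks: same length and same row order as the input pairs.
same_length = len(merged_toy) == len(pairs_toy)
same_order = (merged_toy["userId"].to_numpy() == pairs_toy["userId"].to_numpy()).all()
print(merged_toy)
print(same_length, same_order)
```

Running checks like these after the merge is a cheap way to confirm that the feature rows still correspond to the label vector.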

In [14]:
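# Caution: the output below shows the merged rows grouped by movieId, so they no longer
# line up row by row with y_train (the same applies to X_tripletest and y_test).
X_tripletrain = pd.merge(X_coupletrain,profiles.loc[:, profiles.columns != 'title'],on='movieId')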
X_tripletrain.head()

Unnamed: 0,userId,movieId,Action,Horror,IMAX,Drama,Comedy,Western,Crime,(no genres listed),Musical,Animation,Documentary,Film-Noir,War,Fantasy,Mystery,Thriller,Sci-Fi,Adventure,Romance,Children
0,226,6188,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,282,6188,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,124,6188,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,480,6188,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,573,6188,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [15]:
X_tripletest = pd.merge(X_coupletest,profiles.loc[:, profiles.columns != 'title'],on='movieId')
X_tripletest.head()

Unnamed: 0,userId,movieId,Action,Horror,IMAX,Drama,Comedy,Western,Crime,(no genres listed),Musical,Animation,Documentary,Film-Noir,War,Fantasy,Mystery,Thriller,Sci-Fi,Adventure,Romance,Children
0,578,72407,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0
1,563,72407,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0
2,600,719,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,84,719,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,438,719,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [16]:
X_tripletrain_encoded = pd.get_dummies(X_tripletrain,columns=X_tripletrain.columns)
X_tripletrain_encoded

Unnamed: 0,userId_1,userId_2,userId_3,userId_4,userId_5,userId_6,userId_7,userId_8,userId_9,userId_10,userId_11,userId_12,userId_13,userId_14,userId_15,userId_16,userId_17,userId_18,userId_19,userId_20,userId_21,userId_22,userId_23,userId_24,userId_25,userId_26,userId_27,userId_28,userId_29,userId_30,userId_31,userId_32,userId_33,userId_34,userId_35,userId_36,userId_37,userId_38,userId_39,userId_40,...,Action_0.0,Action_1.0,Horror_0.0,Horror_1.0,IMAX_0.0,IMAX_1.0,Drama_0.0,Drama_1.0,Comedy_0.0,Comedy_1.0,Western_0.0,Western_1.0,Crime_0.0,Crime_1.0,(no genres listed)_0.0,(no genres listed)_1.0,Musical_0.0,Musical_1.0,Animation_0.0,Animation_1.0,Documentary_0.0,Documentary_1.0,Film-Noir_0.0,Film-Noir_1.0,War_0.0,War_1.0,Fantasy_0.0,Fantasy_1.0,Mystery_0.0,Mystery_1.0,Thriller_0.0,Thriller_1.0,Sci-Fi_0.0,Sci-Fi_1.0,Adventure_0.0,Adventure_1.0,Romance_0.0,Romance_1.0,Children_0.0,Children_1.0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70580,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0
70581,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,0,1,0,1,1,0,1,0,1,0
70582,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0
70583,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,1,1,0,1,0,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0


In [0]:
X_tripletest_encoded = pd.get_dummies(X_tripletest,columns=X_tripletest.columns)


In [18]:
print(X_tripletest_encoded.shape)
print(X_tripletrain_encoded.shape)


(30251, 6755)
(70585, 9176)


In [19]:
X_train3,X_test3 = X_tripletrain_encoded.align(X_tripletest_encoded, join='outer', axis=1, fill_value=0)
print(X_train3.shape)
print(X_test3.shape)

(70585, 10374)
(30251, 10374)


### **5.3. Network**

In [0]:
model_contentbased = MLPRegressor(hidden_layer_sizes=(2,),
                                       activation='relu',
                                       solver='adam',
                                       learning_rate='constant',
                                       max_iter=50,
                                       learning_rate_init=0.01,
                                       alpha=0.01)

In [0]:
fitter3 = model_contentbased.fit(X_train3,y_train)


In [0]:
prediction_contentbased = model_contentbased.predict(X_test3)


In [27]:
mean_squared_error(y_test,prediction_contentbased)

1.1394780423588815

In [0]:
from math import sqrt


In [30]:
rms = sqrt(mean_squared_error(y_test,prediction_contentbased))
print(rms)

1.0674633681578405


In [31]:
prediction_contentbased

array([3.5118114 , 3.27174515, 3.60699987, ..., 3.49401737, 3.59317645,
       3.76454045])

## **6. Ranking metrics (Optional)**<br>
Search the Internet for a ranking metric and try to evaluate your results with it.

Be careful: very often you first need to build the list of top-N recommendations that your model predicts for each user, in order to compare it with the items present in the test set for that user.
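As one possible ranking metric, here is a minimal sketch of precision@k computed per user from a top-N list. The rows, the relevance threshold of 4.0 stars, and the helper name `precision_at_k` are all assumptions for illustration, not part of the exercise.

```python
from collections import defaultdict

def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that are in the relevant set."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Toy test rows (made up): (userId, movieId, predicted rating, true rating).
rows = [
    (1, 10, 4.6, 5.0), (1, 11, 4.1, 2.0), (1, 12, 3.0, 4.5), (1, 13, 2.2, 1.0),
    (2, 10, 3.9, 4.0), (2, 13, 4.4, 4.5), (2, 14, 2.5, 3.0),
]
threshold = 4.0  # assumption: a true rating >= 4.0 counts as relevant

by_user = defaultdict(list)
for user, movie, predicted, actual in rows:
    by_user[user].append((movie, predicted, actual))

scores = {}
for user, items in by_user.items():
    # Top-N list: movies sorted by predicted rating, best first.
    ranked = [movie for movie, _, _ in sorted(items, key=lambda t: t[1], reverse=True)]
    relevant = {movie for movie, _, actual in items if actual >= threshold}
    scores[user] = precision_at_k(ranked, relevant, k=2)

mean_precision = sum(scores.values()) / len(scores)
print(scores, mean_precision)
```

With the notebook's data, the same loop could run over the test ratings joined with the model's predictions. Other common choices are recall@k and NDCG.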

## **7. Conclusion**

For the non-personalized, collaborative filtering, and content-based recommenders I used an `MLPRegressor` with one hidden layer of 2 neurons and at most 50 iterations. I then computed the MSE on the test set for each:<br>
Non-personalized (movie ID only): 0.94<br>
Collaborative filtering (user ID and movie ID): 0.77<br>
Content-based filtering (user ID, movie ID, and movie genres): 1.14 (RMSE 1.07)<br>
Lower MSE is better, so adding the user ID improved on the non-personalized model, but the third system is less accurate than both. A likely cause is visible in section 5.2: the `merge()` output is grouped by movieId, so the feature rows no longer line up with `y_train` and `y_test`. After this submission I'll try changing the model parameters and evaluating the model again.