## Movie Recommendations Using a Collaborative Filtering Algorithm

In this notebook, a small machine learning model is trained to make movie recommendations.

Objectives:
- Show how a `Collaborative Filtering algorithm` can be used to train a model that makes product recommendations.
- Train a small model on ratings from a small, hypothetical movie-ratings dataset.
- Make movie recommendations to a user.

In [1]:
import numpy as np
import pandas as pd
import collaborative_filtering_algorithm

### 01- Get Data:
This small dataset contains hypothetical ratings that 10 people have given to 15 famous movies.<br>
The ratings range from 0 to 5, mostly in steps of 0.5 (a few values, such as 4.4, 1.7 and 3.3 for Harry Potter, do not follow that step).

In [17]:
# Load hypothetical dataset:
df_raw = pd.read_excel('movies_ratings.xlsx')
df_raw

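|   | movieId | Movie Name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Taxi Driver (1976) | 4.0 | NaN | NaN | 5.0 | 4.0 | NaN | 4.5 | NaN | NaN | NaN |
| 1 | 2 | Apollo 13 (1995) | NaN | 2.0 | NaN | 2.5 | 4.5 | 4.0 | 3.0 | NaN | 3.0 | 3.5 |
| 2 | 3 | Walking Dead, The (1995) | 4.0 | NaN | NaN | NaN | NaN | 5.0 | NaN | NaN | NaN | NaN |
| 3 | 4 | Beverly Hills Cop III (1994) | 4.5 | 3.5 | 4.0 | 3.0 | 2.5 | 3.0 | 2.5 | 3.0 | 3.0 | 3.5 |
| 4 | 5 | Harry Potter and the Deathly Hallows: Part 1 (... | NaN | 2.0 | 1.5 | 4.4 | NaN | 0.5 | 1.7 | 3.3 | 0.0 | 3.0 |
| 5 | 6 | Rio (2011) | 3.5 | NaN | 2.5 | NaN | 2.5 | NaN | 3.0 | NaN | 1.0 | NaN |
| 6 | 7 | Adventures of Tintin, The (2011) | NaN | 4.5 | NaN | 3.5 | 2.0 | NaN | 3.5 | 2.5 | 1.0 | NaN |
| 7 | 8 | Amazing Spider-Man, The (2012) | 4.0 | 2.5 | 1.0 | 5.0 | NaN | 2.0 | NaN | NaN | 2.5 | 1.5 |
| 8 | 9 | Batman: The Dark Knight Returns, Part 1 (2012) | 2.5 | NaN | 1.0 | NaN | 3.0 | NaN | 1.5 | NaN | 1.5 | 3.5 |
| 9 | 10 | Iron Man 3 (2013) | NaN | 1.0 | 5.0 | NaN | NaN | 3.0 | NaN | 2.0 | NaN | NaN |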


In this dataset, each row is a movie and each numbered column $(1, 2, \dots, 10)$ corresponds to one person who gave ratings. An empty cell (`NaN`) means that person did not rate that movie.
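To make this layout concrete, here is a minimal sketch on a hypothetical 3-by-4 matrix (made-up values, not the notebook's data) in the same orientation: rows are movies, columns are users, and `NaN` marks a missing rating. The indicator matrix `R` is a name introduced here for illustration; it records which entries were actually rated, and collaborative filtering fits its parameters only to those entries.

```python
import numpy as np

# Hypothetical toy ratings: 3 movies (rows) x 4 users (columns), NaN = not rated.
Y_toy = np.array([
    [4.0, np.nan, np.nan, 5.0],
    [np.nan, 2.0, 3.0, 2.5],
    [4.5, 3.5, 4.0, np.nan],
])

# R[i, j] = 1 if user j rated movie i, else 0.
R = (~np.isnan(Y_toy)).astype(int)

n_movies, n_users = Y_toy.shape
n_rated = int(R.sum())
print(R)
print(f"{n_rated} of {n_movies * n_users} entries are rated")
```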

### 02- Train Model:

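The Collaborative Filtering Algorithm uses the ratings that users have given to products to estimate:
- The underlying features of the movies: $X$ (one feature vector $\mathbf{x}^{(i)}$ per movie).
- The underlying features of each user's movie taste: $W$ and $b$ (one weight vector $\mathbf{w}^{(j)}$ and one bias $b^{(j)}$ per user).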

Once these movie and user parameters are known, the model predicts the rating that user $j$ would give to movie $i$ with:
$$
  \hat{y}^{(i,j)} = \mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}
$$
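The prediction formula can be evaluated for all movie/user pairs at once with a matrix product. The sketch below uses hypothetical parameter values and assumes `X` stores one row per movie and `W` one row per user (how `Col_filt` lays out its parameters internally is not shown). Entry `[i, j]` of `X @ W.T + b` is exactly $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$, with zero-based indices in the code.

```python
import numpy as np

# Hypothetical learned parameters for 2 movies, 3 users and 2 features.
X = np.array([[1.0, 0.5],    # x^(1): features of movie 1
              [0.2, 2.0]])   # x^(2): features of movie 2
W = np.array([[2.0, 0.0],    # w^(1): taste of user 1
              [0.5, 1.0],    # w^(2): taste of user 2
              [1.0, 1.0]])   # w^(3): taste of user 3
b = np.array([0.5, 0.0, -1.0])  # b^(j): one bias per user

# Single prediction: user 2 (index 1) rating movie 1 (index 0).
y_hat_12 = W[1] @ X[0] + b[1]

# All predictions at once; b is broadcast across the rows (movies).
Y_hat = X @ W.T + b
print(y_hat_12)
print(Y_hat)
```

Here `Y_hat` has shape `(n_movies, n_users)`, the same orientation as the ratings table.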

#### 02.01- Set Model:
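Before training the collaborative filtering model, we need to set the number of underlying features in each movie vector $\mathbf{x}^{(i)}$ (and, correspondingly, in each user vector $\mathbf{w}^{(j)}$).<br>
For movies, these features can be thought of as capturing things like how much comedy, romance, or drama a movie contains. The algorithm learns them without labels, though, so such interpretations are only approximate.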
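A quick parameter count shows what `nr_features_x` controls. The sketch below assumes $X$ is $15 \times 10$, $W$ is $10 \times 10$ and $b$ has 10 entries (15 movies, 10 users, 10 features); the internal layout of `Col_filt` is not shown, so these shapes are an assumption. With 10 features the model has 260 free parameters, more than the 150 cells of the ratings matrix, many of which are empty. This is one reason the regularization term `lbda` used during training is useful.

```python
import math

# Sizes used in this notebook: 15 movies, 10 users, nr_features_x = 10.
n_movies, n_users, n_features = 15, 10, 10

shapes = {
    "X": (n_movies, n_features),  # one feature vector x^(i) per movie
    "W": (n_users, n_features),   # one weight vector w^(j) per user
    "b": (n_users,),              # one bias b^(j) per user
}
n_params = sum(math.prod(shape) for shape in shapes.values())
n_cells = n_movies * n_users  # upper bound on the number of known ratings

print(f"{n_params} parameters, at most {n_cells} known ratings")
```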

In [3]:
# For this model, let us set 'Number of Features X' = 10

col_filtr_model = collaborative_filtering_algorithm.Col_filt(nr_features_x=10)

#### 02.02- Train Model:


In [26]:
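# Keep only the rating columns (users 1-10): rows are movies, columns are users.
# .copy() makes Y independent of df_raw, so adding a column later does not trigger SettingWithCopyWarning.
Y = df_raw.iloc[:, 2:].copy()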
Y

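|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | NaN | NaN | 5.0 | 4.0 | NaN | 4.5 | NaN | NaN | NaN |
| 1 | NaN | 2.0 | NaN | 2.5 | 4.5 | 4.0 | 3.0 | NaN | 3.0 | 3.5 |
| 2 | 4.0 | NaN | NaN | NaN | NaN | 5.0 | NaN | NaN | NaN | NaN |
| 3 | 4.5 | 3.5 | 4.0 | 3.0 | 2.5 | 3.0 | 2.5 | 3.0 | 3.0 | 3.5 |
| 4 | NaN | 2.0 | 1.5 | 4.4 | NaN | 0.5 | 1.7 | 3.3 | 0.0 | 3.0 |
| 5 | 3.5 | NaN | 2.5 | NaN | 2.5 | NaN | 3.0 | NaN | 1.0 | NaN |
| 6 | NaN | 4.5 | NaN | 3.5 | 2.0 | NaN | 3.5 | 2.5 | 1.0 | NaN |
| 7 | 4.0 | 2.5 | 1.0 | 5.0 | NaN | 2.0 | NaN | NaN | 2.5 | 1.5 |
| 8 | 2.5 | NaN | 1.0 | NaN | 3.0 | NaN | 1.5 | NaN | 1.5 | 3.5 |
| 9 | NaN | 1.0 | 5.0 | NaN | NaN | 3.0 | NaN | 2.0 | NaN | NaN |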


In [5]:
# TRAIN MODEL:


col_filtr_model.train(y=Y.values,          # movie ratings given by the users (NaN = not rated)
                      nr_iterations=5000,  # number of gradient descent iterations used to minimize the cost
                      alpha=1e-4,          # learning rate for gradient descent
                      lbda=1e-1,           # regularization strength (lambda) in the cost function
                      print_cost=False)    # do not print the cost at each iteration
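The internals of `Col_filt.train` are not shown. A common way to implement this kind of training, and a plausible reading of the `alpha` and `lbda` arguments, is batch gradient descent on the regularized squared error over the rated entries only. The sketch below is that standard formulation, not the notebook's own code: `cofi_cost_and_grads` and `train_sketch` are hypothetical helpers, and its default learning rate differs from the `alpha=1e-4` used above.

```python
import numpy as np


def cofi_cost_and_grads(X, W, b, Y, R, lbda):
    """Regularized collaborative-filtering cost and its gradients.

    Y: (n_movies, n_users) ratings with missing entries already set to 0.
    R: (n_movies, n_users) indicator, 1.0 where a rating exists, else 0.0.
    """
    E = (X @ W.T + b - Y) * R  # prediction errors, kept only on rated entries
    cost = 0.5 * np.sum(E ** 2) + 0.5 * lbda * (np.sum(X ** 2) + np.sum(W ** 2))
    grad_X = E @ W + lbda * X
    grad_W = E.T @ X + lbda * W
    grad_b = E.sum(axis=0)
    return cost, grad_X, grad_W, grad_b


def train_sketch(Y_nan, nr_features_x=10, nr_iterations=5000, alpha=1e-2, lbda=1e-1, seed=0):
    """Batch gradient descent on the cost above (hypothetical helper, not Col_filt)."""
    rng = np.random.default_rng(seed)
    R = (~np.isnan(Y_nan)).astype(float)
    Y = np.where(np.isnan(Y_nan), 0.0, Y_nan)  # placeholders; masked out by R anyway
    n_movies, n_users = Y.shape
    X = rng.normal(scale=0.1, size=(n_movies, nr_features_x))
    W = rng.normal(scale=0.1, size=(n_users, nr_features_x))
    b = np.zeros(n_users)
    costs = []
    for _ in range(nr_iterations):
        cost, grad_X, grad_W, grad_b = cofi_cost_and_grads(X, W, b, Y, R, lbda)
        X -= alpha * grad_X
        W -= alpha * grad_W
        b -= alpha * grad_b
        costs.append(cost)
    return X @ W.T + b, costs


# Hypothetical toy ratings: 3 movies x 4 users, NaN = not rated.
Y_toy = np.array([[4.0, np.nan, np.nan, 5.0],
                  [np.nan, 2.0, 3.0, 2.5],
                  [4.5, 3.5, 4.0, np.nan]])
Y_hat_toy, costs = train_sketch(Y_toy, nr_features_x=2, nr_iterations=2000)
print(f"cost: {costs[0]:.2f} -> {costs[-1]:.2f}")
```

Multiplying the errors by `R` is what makes unrated entries contribute nothing to the cost or the gradients.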


### 03- Evaluate the predictions:

In [6]:
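# Show the ratings the model predicts.
# y_mean holds one mean rating per movie; it is added back to the model output
# (one value per row) to bring the predictions back to the original rating scale.
y_hat = np.round(col_filtr_model.y_hat + col_filtr_model.y_mean.reshape(-1, 1), 1)
df_y_hat = pd.DataFrame(data=y_hat, columns=range(1, 11))

# Predicted ratings for every movie (row) and user (column):
df_y_hat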

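|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | 4.4 | 4.4 | 5.0 | 4.0 | 4.4 | 4.3 | 4.4 | 4.4 | 4.4 |
| 1 | 3.2 | 2.0 | 3.2 | 2.5 | 4.5 | 4.0 | 2.8 | 3.2 | 3.0 | 3.5 |
| 2 | 4.0 | 4.5 | 4.5 | 4.5 | 4.5 | 5.0 | 4.5 | 4.5 | 4.5 | 4.5 |
| 3 | 4.5 | 3.5 | 4.0 | 3.0 | 2.5 | 3.0 | 2.8 | 3.0 | 3.0 | 3.5 |
| 4 | 2.0 | 2.0 | 1.5 | 4.4 | 2.0 | 0.5 | 1.8 | 3.3 | 0.0 | 3.0 |
| 5 | 3.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 3.0 | 2.5 | 1.0 | 2.5 |
| 6 | 2.8 | 4.5 | 2.8 | 3.5 | 2.0 | 2.8 | 3.5 | 2.5 | 1.0 | 2.8 |
| 7 | 4.0 | 2.5 | 1.0 | 5.0 | 2.6 | 2.0 | 2.6 | 2.6 | 2.5 | 1.5 |
| 8 | 2.5 | 2.2 | 1.0 | 2.2 | 3.0 | 2.2 | 1.4 | 2.2 | 1.5 | 3.5 |
| 9 | 2.8 | 1.0 | 5.0 | 2.8 | 2.8 | 3.0 | 2.8 | 2.0 | 2.8 | 2.8 |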
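The cell above adds `col_filtr_model.y_mean` back to `col_filtr_model.y_hat`, which suggests that `Col_filt` fits mean-normalized ratings: each movie's average rating is subtracted before training and added back afterwards. The internals of `Col_filt` are not shown, so the sketch below is an assumption about that step, run on hypothetical toy data. If a user's learned parameters contribute nothing (simulated here with an all-zero model output), the prediction falls back to each movie's average rating.

```python
import numpy as np

# Hypothetical toy ratings (movies x users), NaN = not rated.
Y_toy = np.array([[4.0, np.nan, np.nan, 5.0],
                  [np.nan, 2.0, 3.0, 2.5],
                  [4.5, 3.5, 4.0, np.nan]])

# Per-movie mean computed over the rated entries only.
y_mean = np.nanmean(Y_toy, axis=1)        # shape (n_movies,)

# Normalized ratings: subtract each movie's mean from its row (NaN stays NaN).
Y_norm = Y_toy - y_mean.reshape(-1, 1)

# Simulated model output of 0 everywhere (e.g. w = 0 and b = 0 for every user):
# adding the mean back yields each movie's average rating for every user.
y_hat_norm = np.zeros_like(Y_toy)
y_hat = np.round(y_hat_norm + y_mean.reshape(-1, 1), 1)
print(y_hat)
```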


In [16]:
# Show the difference between the predicted rating values and the actual rated values.
Y.values - df_y_hat.values

array([[ 0. ,  nan,  nan,  0. ,  0. ,  nan,  0.2,  nan,  nan,  nan],
       [ nan,  0. ,  nan,  0. ,  0. ,  0. ,  0.2,  nan,  0. ,  0. ],
       [ 0. ,  nan,  nan,  nan,  nan,  0. ,  nan,  nan,  nan,  nan],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. , -0.3,  0. ,  0. ,  0. ],
       [ nan,  0. ,  0. ,  0. ,  nan,  0. , -0.1,  0. ,  0. ,  0. ],
       [ 0. ,  nan,  0. ,  nan,  0. ,  nan,  0. ,  nan,  0. ,  nan],
       [ nan,  0. ,  nan,  0. ,  0. ,  nan,  0. ,  0. ,  0. ,  nan],
       [ 0. ,  0. ,  0. ,  0. ,  nan,  0. ,  nan,  nan,  0. ,  0. ],
       [ 0. ,  nan,  0. ,  nan,  0. ,  nan,  0.1,  nan,  0. ,  0. ],
       [ nan,  0. ,  0. ,  nan,  nan,  0. ,  nan,  0. ,  nan,  nan],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. , -0.3,  0. ,  0. ,  0. ],
       [ 0. ,  nan,  0. ,  0. ,  nan,  nan, -0.1,  0. ,  nan,  0. ],
       [ 0. ,  0. ,  nan,  nan,  0. ,  nan,  0.1,  0. ,  nan,  0. ],
       [ nan,  nan,  0. ,  0. ,  nan,  0. , -0.1,  nan,  0. ,  nan],
       [ 0. ,  nan,  nan,  0. ,  0

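For every rating that exists, the difference is 0 or at most 0.3 in absolute value, so the learned $X$, $W$, and $b$ fit the given ratings closely. For the missing ratings the model fills in estimates, but many of them are close to the movie's average rating: for example, every unrated entry for Taxi Driver is 4.4, and its known ratings (4.0, 5.0, 4.0, 4.5) average 4.375. A small error on the known ratings therefore does not, on its own, show how accurate the predictions for unrated movies are.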
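The differences above can be summarized in a single number, the root-mean-square error over the rated entries only. The helper `rmse_on_rated` below is a hypothetical sketch, shown on toy data; in this notebook it could be called as `rmse_on_rated(Y.values, df_y_hat.values)`. To estimate accuracy on unrated movies, a common approach is to hide some known ratings before training and compute this error on them.

```python
import numpy as np


def rmse_on_rated(Y, Y_hat):
    """Root-mean-square error over the entries that were actually rated (non-NaN in Y)."""
    diff = Y - Y_hat  # NaN wherever Y is NaN
    return float(np.sqrt(np.nanmean(diff ** 2)))


# Hypothetical toy example: 2 movies x 3 users.
Y_toy = np.array([[4.0, np.nan, 3.0],
                  [np.nan, 2.0, 5.0]])
Y_hat_toy = np.array([[4.0, 3.1, 2.0],
                      [2.7, 2.0, 5.0]])
rmse = rmse_on_rated(Y_toy, Y_hat_toy)
print(rmse)
```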

### 04- Making Recommendations to a User

Let us assume a user `P` has watched 2 of the 15 movies and gave these ratings:

| Movie      | Rating |
| ----------- | ----------- |
| Annabelle: Creation (2017)      |  2       |
| Harry Potter and the Deathly Hallows: Part 1   | 4.5        |


Which movies should we suggest that P watch next?



In [25]:

# Ratings given by user P, in the same movie order as the dataset rows (np.nan = not rated).
# np.nan is used instead of np.NaN, which was removed in NumPy 2.0.

ratings_user_p = np.array([
    np.nan,  # Taxi Driver (1976)
    np.nan,  # Apollo 13 (1995)
    np.nan,  # Walking Dead, The (1995)
    np.nan,  # Beverly Hills Cop III (1994)
    4.5,     # Harry Potter and the Deathly Hallows: Part 1
    np.nan,  # Rio (2011)
    np.nan,  # Adventures of Tintin, The (2011)
    np.nan,  # Amazing Spider-Man, The (2012)
    np.nan,  # Batman: The Dark Knight Returns, Part 1 (2012)
    np.nan,  # Iron Man 3 (2013)
    np.nan,  # Monsters University (2013)
    np.nan,  # Gravity (2013)
    np.nan,  # The Hunger Games: Catching Fire (2013)
    2.0,     # Annabelle: Creation (2017)
    np.nan,  # Jurassic World: Fallen Kingdom (2018)
])



# Add user P's ratings to the ratings DataFrame as a new column:

Y[11] = ratings_user_p  # user P is column 11

In [41]:
# TRAIN MODEL
# Now that user P's ratings are part of the ratings matrix,
# retrain the model so it also learns user P's preferences.

col_filtr_model.train(y=Y.values,
                      nr_iterations=5000,
                      alpha=1e-4,
                      lbda=1e-1,
                      print_cost=False)

In [42]:
# Get the ratings predicted by the model (adding each movie's mean back):

y_hat = np.round(col_filtr_model.y_hat +
                 col_filtr_model.y_mean.reshape(-1, 1), 1)
# Put the ratings in a DataFrame (columns 1-11 are the users):
df_y_hat = pd.DataFrame(data=y_hat, columns=range(1, 12))
# Add the movie names:
df_y_hat = pd.concat([df_raw['Movie Name'], df_y_hat], axis=1)
# Predicted ratings for every movie and user:
df_y_hat


|   | Movie Name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Taxi Driver (1976) | 4.0 | 4.4 | 4.4 | 5.0 | 4.0 | 4.4 | 4.5 | 4.4 | 4.4 | 4.4 | 4.4 |
| 1 | Apollo 13 (1995) | 3.2 | 2.0 | 3.2 | 2.5 | 4.5 | 4.0 | 3.0 | 3.2 | 3.0 | 3.5 | 3.2 |
| 2 | Walking Dead, The (1995) | 4.0 | 4.5 | 4.5 | 4.5 | 4.5 | 5.0 | 4.5 | 4.5 | 4.5 | 4.5 | 4.5 |
| 3 | Beverly Hills Cop III (1994) | 4.7 | 3.6 | 3.9 | 3.5 | 2.3 | 3.1 | 2.4 | 3.1 | 3.1 | 3.7 | 3.2 |
| 4 | Harry Potter and the Deathly Hallows: Part 1 (... | 2.3 | 2.0 | 1.5 | 4.3 | 2.3 | 0.5 | 1.7 | 3.3 | -0.0 | 3.0 | 4.5 |
| 5 | Rio (2011) | 3.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 3.0 | 2.5 | 1.0 | 2.5 | 2.5 |
| 6 | Adventures of Tintin, The (2011) | 2.8 | 4.5 | 2.8 | 3.5 | 2.0 | 2.8 | 3.5 | 2.5 | 1.0 | 2.8 | 2.8 |
| 7 | Amazing Spider-Man, The (2012) | 4.0 | 2.5 | 1.0 | 5.0 | 2.6 | 2.0 | 2.6 | 2.6 | 2.5 | 1.5 | 2.6 |
| 8 | Batman: The Dark Knight Returns, Part 1 (2012) | 2.5 | 2.2 | 1.0 | 2.2 | 3.0 | 2.2 | 1.5 | 2.2 | 1.5 | 3.5 | 2.2 |
| 9 | Iron Man 3 (2013) | 2.8 | 1.0 | 5.0 | 2.8 | 2.8 | 3.0 | 2.8 | 2.0 | 2.8 | 2.8 | 2.8 |


In [43]:
# Rank all movies by user P's predicted rating (column 11), highest first:

df_y_hat[['Movie Name', 11]].sort_values(by=11, ascending=False)

|    | Movie Name | 11 |
|---|---|---|
| 2 | Walking Dead, The (1995) | 4.5 |
| 4 | Harry Potter and the Deathly Hallows: Part 1 (... | 4.5 |
| 0 | Taxi Driver (1976) | 4.4 |
| 12 | The Hunger Games: Catching Fire (2013) | 3.3 |
| 1 | Apollo 13 (1995) | 3.2 |
| 3 | Beverly Hills Cop III (1994) | 3.2 |
| 10 | Monsters University (2013) | 3.1 |
| 6 | Adventures of Tintin, The (2011) | 2.8 |
| 9 | Iron Man 3 (2013) | 2.8 |
| 7 | Amazing Spider-Man, The (2012) | 2.6 |


Based on the model's predictions, and leaving out Harry Potter because user P has already rated it, we would suggest that user P watch:
- Walking Dead, The (1995)
- Taxi Driver (1976)
- The Hunger Games: Catching Fire (2013)
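The selection above (drop the movies the user already rated, then take the highest predicted ratings) can be automated. The helper `top_k_unrated` below is a hypothetical sketch shown on toy data; in this notebook, `top_k_unrated(df_y_hat['Movie Name'], df_y_hat[11].values, ratings_user_p)` should apply the same filter to user P.

```python
import numpy as np
import pandas as pd


def top_k_unrated(movie_names, predicted, user_ratings, k=3):
    """Return the k movies with the highest predicted rating that the user has not rated yet."""
    df = pd.DataFrame({"Movie Name": movie_names, "predicted": predicted})
    unrated = df[np.isnan(user_ratings)]  # drop movies the user already rated
    return unrated.sort_values("predicted", ascending=False).head(k)


# Hypothetical toy example with 5 movies.
names = ["A", "B", "C", "D", "E"]
pred = np.array([4.5, 4.5, 4.4, 3.3, 3.2])
rated = np.array([np.nan, 4.5, np.nan, np.nan, 2.0])
recs = top_k_unrated(names, pred, rated, k=3)
print(recs["Movie Name"].tolist())
```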

Author:<br>
**Emerson Goncalves**