# <img align="left" src="./images/film_strip_vertical.png"     style=" width:40px;  " > Practice lab: Deep Learning for Content-Based Filtering

In this exercise, you will implement content-based filtering using a neural network to build a recommender system for movies. 


# Outline
- [ 1 - Packages ](#1)
- [ 2 - Movie ratings dataset ](#2)
- [ 3 - Content-based filtering with a neural network](#3)
  - [ 3.1 Training Data](#3.1)
  - [ 3.2 Preparing the training data](#3.2)
- [ 4 - Neural Network for content-based filtering](#4)
  - [ Exercise 1](#ex01)
- [ 5 - Predictions](#5)
  - [ 5.1 - Predictions for a new user](#5.1)
  - [ 5.2 - Predictions for an existing user.](#5.2)
  - [ 5.3 - Finding Similar Items](#5.3)
    - [ Exercise 2](#ex02)
- [ 6 - Congratulations! ](#6)


_**NOTE:** To prevent errors from the autograder, you are not allowed to edit or delete non-graded cells in this lab. Please also refrain from adding any new cells. 
**Once you have passed this assignment** and want to experiment with any of the non-graded code, you may follow the instructions at the bottom of this notebook._

<a name="1"></a>
## 1 - Packages <img align="left" src="./images/movie_camera.png"     style=" width:40px;  ">
We will use familiar packages, NumPy, TensorFlow and helpful routines from [scikit-learn](https://scikit-learn.org/stable/). We will also use [tabulate](https://pypi.org/project/tabulate/) to neatly print tables and [Pandas](https://pandas.pydata.org/) to organize tabular data.

In [2]:
import numpy as np
import numpy.ma as ma
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
import tabulate
from recsysNN_utils import *
pd.set_option("display.precision", 1)

<a name="2"></a>
## 2 - Movie ratings dataset <img align="left" src="./images/film_rating.png" style=" width:40px;" >
The data set is derived from the [MovieLens ml-latest-small](https://grouplens.org/datasets/movielens/latest/) dataset. 

[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]

The original dataset has roughly 9000 movies rated by 600 users with ratings on a scale of 0.5 to 5 in 0.5 step increments. The dataset has been reduced in size to focus on movies from the years since 2000 and popular genres. The reduced dataset has $n_u = 397$ users, $n_m= 847$ movies and 25521 ratings. For each movie, the dataset provides a movie title, release date, and one or more genres. For example "Toy Story 3" was released in 2010 and has several genres: "Adventure|Animation|Children|Comedy|Fantasy". This dataset contains little information about users other than their ratings. This dataset is used to create training vectors for the neural networks described below. 
Let's learn a bit more about this data set. The table below shows the top 10 movies ranked by the number of ratings. These movies also happen to have high average ratings. How many of these movies have you watched? 

In [3]:
top10_df = pd.read_csv("./data/content_top10_df.csv")
bygenre_df = pd.read_csv("./data/content_bygenre_df.csv")
top10_df

Unnamed: 0,movie id,num ratings,ave rating,title,genres
0,4993,198,4.1,"Lord of the Rings: The Fellowship of the Ring,...",Adventure|Fantasy
1,5952,188,4.0,"Lord of the Rings: The Two Towers, The",Adventure|Fantasy
2,7153,185,4.1,"Lord of the Rings: The Return of the King, The",Action|Adventure|Drama|Fantasy
3,4306,170,3.9,Shrek,Adventure|Animation|Children|Comedy|Fantasy|Ro...
4,58559,149,4.2,"Dark Knight, The",Action|Crime|Drama
5,6539,149,3.8,Pirates of the Caribbean: The Curse of the Bla...,Action|Adventure|Comedy|Fantasy
6,79132,143,4.1,Inception,Action|Crime|Drama|Mystery|Sci-Fi|Thriller
7,6377,141,4.0,Finding Nemo,Adventure|Animation|Children|Comedy
8,4886,132,3.9,"Monsters, Inc.",Adventure|Animation|Children|Comedy|Fantasy
9,7361,131,4.2,Eternal Sunshine of the Spotless Mind,Drama|Romance|Sci-Fi


The next table shows information sorted by genre. The number of ratings per genre vary substantially. Note that a movie may have multiple genre's so the sum of the ratings below is larger than the number of original ratings.

In [4]:
bygenre_df

Unnamed: 0,genre,num movies,ave rating/genre,ratings per genre
0,Action,321,3.4,10377
1,Adventure,234,3.4,8785
2,Animation,76,3.6,2588
3,Children,69,3.4,2472
4,Comedy,326,3.4,8911
5,Crime,139,3.5,4671
6,Documentary,13,3.8,280
7,Drama,342,3.6,10201
8,Fantasy,124,3.4,4468
9,Horror,56,3.2,1345


<a name="3"></a>
## 3 - Content-based filtering with a neural network

In the collaborative filtering lab, you generated two vectors, a user vector and an item/movie vector whose dot product would predict a rating. The vectors were derived solely from the ratings.   

Content-based filtering also generates a user and movie feature vector but recognizes there may be other information available about the user and/or movie that may improve the prediction. The additional information is provided to a neural network which then generates the user and movie vector as shown below.
<figure>
    <center> <img src="./images/RecSysNN.png"   style="width:500px;height:280px;" ></center>
</figure>

<a name="3.1"></a>
### 3.1 Training Data
The movie content provided to the network is a combination of the original data and some 'engineered features'. Recall the feature engineering discussion and lab from Course 1, Week 2, lab 4. The original features are the year the movie was released and the movie's genre's presented as a one-hot vector. There are 14 genres. The engineered feature is an average rating derived from the user ratings. 

The user content is composed of engineered features. A per genre average rating is computed per user. Additionally, a user id, rating count and rating average are available but not included in the training or prediction content. They are carried with the data set because they are useful in interpreting data.

The training set consists of all the ratings made by the users in the data set. Some ratings are repeated to boost the number of training examples of underrepresented genre's. The training set is split into two arrays with the same number of entries, a user array and a movie/item array.  

Below, let's load and display some of the data.

In [5]:
item_train, user_train, y_train, item_features, user_features, item_vecs, movie_dict, user_to_genre = load_data()
user_train.shape #(50884, 17)
user_train[:3]

array([[ 2.  , 22.  ,  4.  ,  3.95,  4.25,  0.  ,  0.  ,  4.  ,  4.12,
         4.  ,  4.04,  0.  ,  3.  ,  4.  ,  0.  ,  3.88,  3.89],
       [ 2.  , 22.  ,  4.  ,  3.95,  4.25,  0.  ,  0.  ,  4.  ,  4.12,
         4.  ,  4.04,  0.  ,  3.  ,  4.  ,  0.  ,  3.88,  3.89],
       [ 2.  , 22.  ,  4.  ,  3.95,  4.25,  0.  ,  0.  ,  4.  ,  4.12,
         4.  ,  4.04,  0.  ,  3.  ,  4.  ,  0.  ,  3.88,  3.89]])

In [6]:
item_train.shape

(50884, 17)

In [7]:
# Load Data, set configuration variables
item_train, user_train, y_train, item_features, user_features, item_vecs, movie_dict, user_to_genre = load_data()

num_user_features = user_train.shape[1] - 3  # remove userid, rating count and ave rating during training
num_item_features = item_train.shape[1] - 1  # remove movie id at train time
uvs = 3  # user genre vector start
ivs = 3  # item genre vector start
u_s = 3  # start of columns to use in training, user
i_s = 1  # start of columns to use in training, items
print(f"Number of training vectors: {len(item_train)}")

Number of training vectors: 50884


In [55]:
num_user_features

14

In [54]:
num_item_features

16

In [21]:
item_vecs

array([[4.05400000e+03, 2.00100000e+03, 2.84375000e+00, ...,
        1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [4.06900000e+03, 2.00100000e+03, 2.90909091e+00, ...,
        1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [4.14800000e+03, 2.00100000e+03, 2.93589744e+00, ...,
        0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
       ...,
       [1.77765000e+05, 2.01700000e+03, 3.53846154e+00, ...,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
       [1.79819000e+05, 2.01700000e+03, 3.12500000e+00, ...,
        0.00000000e+00, 1.00000000e+00, 0.00000000e+00],
       [1.87593000e+05, 2.01800000e+03, 3.87500000e+00, ...,
        0.00000000e+00, 1.00000000e+00, 0.00000000e+00]], shape=(847, 17))

Let's look at the first few entries in the user training array.

In [8]:
pprint_train(user_train, user_features, uvs,  u_s, maxcount=5)

[user id],[rating count],[rating ave],Act ion,Adve nture,Anim ation,Chil dren,Com edy,Crime,Docum entary,Drama,Fan tasy,Hor ror,Mys tery,Rom ance,Sci -Fi,Thri ller
2,22,4.0,4.0,4.2,0.0,0.0,4.0,4.1,4.0,4.0,0.0,3.0,4.0,0.0,3.9,3.9
2,22,4.0,4.0,4.2,0.0,0.0,4.0,4.1,4.0,4.0,0.0,3.0,4.0,0.0,3.9,3.9
2,22,4.0,4.0,4.2,0.0,0.0,4.0,4.1,4.0,4.0,0.0,3.0,4.0,0.0,3.9,3.9
2,22,4.0,4.0,4.2,0.0,0.0,4.0,4.1,4.0,4.0,0.0,3.0,4.0,0.0,3.9,3.9
2,22,4.0,4.0,4.2,0.0,0.0,4.0,4.1,4.0,4.0,0.0,3.0,4.0,0.0,3.9,3.9


Some of the user and item/movie features are not used in training. In the table above, the features in brackets "[]" such as the "user id", "rating count" and "rating ave" are not included when the model is trained and used.
Above you can see the per genre rating average for user 2. Zero entries are genre's which the user had not rated. The user vector is the same for all the movies rated by a user.  
Let's look at the first few entries of the movie/item array.

In [9]:
pprint_train(item_train, item_features, ivs, i_s, maxcount=5, user=False)

[movie id],year,ave rating,Act ion,Adve nture,Anim ation,Chil dren,Com edy,Crime,Docum entary,Drama,Fan tasy,Hor ror,Mys tery,Rom ance,Sci -Fi,Thri ller
6874,2003,4.0,1,0,0,0,0,1,0,0,0,0,0,0,0,1
8798,2004,3.8,1,0,0,0,0,1,0,1,0,0,0,0,0,1
46970,2006,3.2,1,0,0,0,1,0,0,0,0,0,0,0,0,0
48516,2006,4.3,0,0,0,0,0,1,0,1,0,0,0,0,0,1
58559,2008,4.2,1,0,0,0,0,1,0,1,0,0,0,0,0,0


Above, the movie array contains the year the film was released, the average rating and an indicator for each potential genre. The indicator is one for each genre that applies to the movie. The movie id is not used in training but is useful when interpreting the data.

In [10]:
print(f"y_train[:5]: {y_train[:5]}")

y_train[:5]: [4.  3.5 4.  4.  4.5]


The target, y, is the movie rating given by the user. 

Above, we can see that movie 6874 is an Action/Crime/Thriller movie released in 2003. User 2 rates action movies as 3.9 on average. MovieLens users gave the movie an average rating of 4. 'y' is 4 indicating user 2 rated movie 6874 as a 4 as well. A single training example consists of a row from both the user and item arrays and a rating from y_train.

<a name="3.2"></a>
### 3.2 Preparing the training data
Recall in Course 1, Week 2, you explored feature scaling as a means of improving convergence. We'll scale the input features using the [scikit learn StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html). This was used in Course 1, Week 2, Lab 5.  Below, the inverse_transform is also shown to produce the original inputs. We'll scale the target ratings using a Min Max Scaler which scales the target to be between -1 and 1. [scikit learn MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)

In [11]:
# scale training data
item_train_unscaled = item_train
user_train_unscaled = user_train
y_train_unscaled    = y_train

scalerItem = StandardScaler()
item_train = scalerItem.fit_transform(item_train)
'''
StandardScaler(): This is a class from the sklearn.preprocessing module, which is used to standardize or normalize the features of your data.
Standardization means scaling your data such that it has:
A mean of 0.
A standard deviation of 1.
scalerItem: This is an instance of the StandardScaler class. It will later be used to fit to your data and transform it (i.e., standardize it).
fit_transform() is a combined method that:
fit(): Calculates the mean and standard deviation of each feature (column) in the input data (item_train), using the training data.
This is done for each column in the data (i.e., feature-wise).
transform(): Then scales the data by subtracting the mean and dividing by the standard deviation, so that:
Each feature in the dataset has a mean of 0.
Each feature has a standard deviation of 1.
item_train: This is the transformed data where the features have been standardized.
'''

scalerUser = StandardScaler()
user_train = scalerUser.fit_transform(user_train)

scalerTarget = MinMaxScaler((-1, 1))  # Set the range from -1 to 1
y_train = scalerTarget.fit_transform(y_train.reshape(-1, 1))  # Fit and transform in one step
'''
MinMaxScaler: This is a class from sklearn.preprocessing that scales data to a given range. By default, it scales the data to a range of 0 to 1, but here we specify (-1, 1) as the desired range.
(-1, 1): This means that the smallest value in your data will be scaled to -1, and the largest value will be scaled to 1. All the values in between will be scaled proportionally.
scalerTarget: This is an instance of the MinMaxScaler initialized with the range (-1, 1). It's ready to be used to scale your data.
fit_transform(): This is a combination of two methods: fit() and transform() in one step. Here's how they work:
fit(): The fit() method computes the minimum and maximum values of the data. It uses y_train.reshape(-1, 1) to convert y_train into a 2D array (required by the MinMaxScaler).
transform(): After fitting, transform() scales the data based on the min and max values that were calculated during the fit() step. It scales each value in y_train to be between -1 and 1.
reshape(-1, 1): This reshapes the 1D array y_train (which has shape (n,)) into a 2D array with shape (n, 1). This is necessary because MinMaxScaler expects a 2D array (i.e., a matrix) for scaling.
The result: The scaled data is stored back in y_train. After this line, y_train contains values that are scaled between -1 and 1.
'''
#ynorm_test = scalerTarget.transform(y_test.reshape(-1, 1))

print(np.allclose(item_train_unscaled, scalerItem.inverse_transform(item_train))) #is used to check whether the transformation and the inverse transformation of item_train match the original unscaled data (item_train_unscaled).
'''
scalerItem.inverse_transform(item_train):
The inverse_transform() method reverses the scaling operation that was applied earlier by the StandardScaler.
This method restores the original values (or close to it) by applying the inverse of the scaling transformation. Specifically, it applies:
𝑥_original = 𝑥_scaled × std + mean
where std and mean are the values learned by StandardScaler during the fit() step.
np.allclose():
This function checks whether two arrays are element-wise equal within a tolerance. It is used here to check if the restored values from the inverse transformation are close enough to the original unscaled values (item_train_unscaled).
It returns True if the arrays are equal within the specified tolerance, and False if they are not.
Purpose:
The goal of this code is to confirm that the inverse transformation has successfully restored the original data. If np.allclose() returns True, it means that the scaling and inverse scaling process has worked correctly, and the data matches.
If it returns False, there might be a discrepancy between the original and restored data, indicating an issue in the transformation process (e.g., incorrect scaling, or the unscaled data might not match exactly due to floating-point precision).
'''
print(np.allclose(user_train_unscaled, scalerUser.inverse_transform(user_train)))

True
True


To allow us to evaluate the results, we will split the data into training and test sets as was discussed in Course 2, Week 3. Here we will use [sklean train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to split and shuffle the data. Note that setting the initial random state to the same value ensures item, user, and y are shuffled identically.

In [12]:
item_train, item_test = train_test_split(item_train, train_size=0.80, shuffle=True, random_state=1)
'''
Explanation of Arguments:
item_train:
This is the dataset you want to split. It could be a NumPy array, pandas DataFrame, or any other data structure containing your data.
train_size=0.80:
This specifies the proportion of the data to include in the training set. Here, 80% of the data (0.80) will be used for training, and the remaining 20% will be used for testing.
shuffle=True:
This argument ensures that the data is shuffled randomly before splitting. If set to True, the data will be shuffled before splitting into train and test sets.
This is useful to ensure that the data is randomly distributed and that the model doesn't learn any patterns based on the order of the data (which might lead to bias).
random_state=1:
This is the random seed used for reproducibility. It ensures that every time you run the code with the same data, you get the same random split.
Setting random_state=1 guarantees that the shuffle operation will be deterministic, i.e., the split will be the same every time the code is run.
'''
user_train, user_test = train_test_split(user_train, train_size=0.80, shuffle=True, random_state=1)
y_train, y_test       = train_test_split(y_train,    train_size=0.80, shuffle=True, random_state=1)
print(f"movie/item training data shape: {item_train.shape}")
print(f"movie/item test data shape: {item_test.shape}")

print(f"user training data shape: {user_train.shape}")
print(f"user test data shape: {user_test.shape}")

movie/item training data shape: (40707, 17)
movie/item test data shape: (10177, 17)
user training data shape: (40707, 17)
user test data shape: (10177, 17)


The scaled, shuffled data now has a mean of zero.

In [13]:
pprint_train(user_train, user_features, uvs, u_s, maxcount=5)

[user id],[rating count],[rating ave],Act ion,Adve nture,Anim ation,Chil dren,Com edy,Crime,Docum entary,Drama,Fan tasy,Hor ror,Mys tery,Rom ance,Sci -Fi,Thri ller
1,0,-1.0,-0.8,-0.7,0.1,-0.0,-1.2,-0.4,0.6,-0.5,-0.5,-0.1,-0.6,-0.6,-0.7,-0.7
0,1,-0.7,-0.5,-0.7,-0.1,-0.2,-0.6,-0.2,0.7,-0.5,-0.8,0.1,-0.0,-0.6,-0.5,-0.4
-1,-1,-0.2,0.3,-0.4,0.4,0.5,1.0,0.6,-1.2,-0.3,-0.6,-2.3,-0.1,0.0,0.4,-0.0
0,-1,0.6,0.5,0.5,0.2,0.6,-0.1,0.5,-1.2,0.9,1.2,-2.3,-0.1,0.0,0.2,0.3
-1,0,0.7,0.6,0.5,0.3,0.5,0.4,0.6,1.0,0.6,0.3,0.8,0.8,0.4,0.7,0.7


<a name="4"></a>
## 4 - Neural Network for content-based filtering
Now, let's construct a neural network as described in the figure above. It will have two networks that are combined by a dot product. You will construct the two networks. In this example, they will be identical. Note that these networks do not need to be the same. If the user content was substantially larger than the movie content, you might elect to increase the complexity of the user network relative to the movie network. In this case, the content is similar, so the networks are the same.

<a name="ex01"></a>
### Exercise 1

- Use a Keras sequential model
    - The first layer is a dense layer with 256 units and a relu activation.
    - The second layer is a dense layer with 128 units and a relu activation.
    - The third layer is a dense layer with `num_outputs` units and a linear or no activation.   
    
The remainder of the network will be provided. The provided code does not use the Keras sequential model but instead uses the Keras [functional api](https://keras.io/guides/functional_api/). This format allows for more flexibility in how components are interconnected.


In [14]:
# GRADED_CELL
# UNQ_C1
from tensorflow.keras.layers import Lambda

num_outputs = 32
num_user_features = user_train.shape[1] - 3
tf.random.set_seed(1)
user_NN = tf.keras.models.Sequential([
    ### START CODE HERE ###   
    tf.keras.layers.Dense(units = 256, activation='relu'),
    tf.keras.layers.Dense(units = 128, activation='relu'),
    tf.keras.layers.Dense(units = num_outputs, activation='linear'),
    ### END CODE HERE ###  
])

item_NN = tf.keras.models.Sequential([
    ### START CODE HERE ###     
    tf.keras.layers.Dense(units = 256, activation='relu'),
    tf.keras.layers.Dense(units = 128, activation='relu'),
    tf.keras.layers.Dense(units = num_outputs, activation='linear'),  
    ### END CODE HERE ###  
])

# create the user input and point to the base network
input_user = tf.keras.layers.Input(shape=(num_user_features,))
# vu = user_NN(input_user)
# vu = tf.linalg.l2_normalize(vu, axis=1)
# Instead of using tf.linalg.l2_normalize, use L2Normalization layer
# Create user input layer
vu = user_NN(input_user)

# Normalize the output using L2 Normalization (using Lambda layer)
vu = Lambda(lambda x: tf.nn.l2_normalize(x, axis=1))(vu)

# create the item input and point to the base network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
# vm = item_NN(input_item)
# vm = tf.linalg.l2_normalize(vm, axis=1)
# Create user input layer
vm = item_NN(input_item)

# Normalize the output using L2 Normalization (using Lambda layer)
vm = Lambda(lambda x: tf.nn.l2_normalize(x, axis=1))(vm)
# compute the dot product of the two vectors vu and vm
output = tf.keras.layers.Dot(axes=1)([vu, vm]) #The Dot(axes=1) layer computes the dot product of corresponding rows in vu and vm.

# specify the inputs and output of the model
model = tf.keras.Model([input_user, input_item], output)
print(vm.shape)
model.summary()

# Summary of the Model Structure:
# Input Layer: Accepts data with shape (None, 14) and (None, 16) for user and item features respectively.
# Sequential Layers: The user and item features are passed through two fully connected layers with 32 output features each.
# Lambda Layers: Custom transformations (like normalization) are applied to the user and item features.
# Dot Layer: Computes the dot product of the user and item vectors to generate a similarity score or prediction.

(None, 32)


In [16]:
# Public tests
from public_tests import *
# test_tower(user_NN)
# test_tower(item_NN)

<details>
  <summary><font size="3" color="darkgreen"><b>Click for hints</b></font></summary>
    
  You can create a dense layer with a relu activation as shown.
    
```python     
user_NN = tf.keras.models.Sequential([
    ### START CODE HERE ###     
  tf.keras.layers.Dense(256, activation='relu'),

    
    ### END CODE HERE ###  
])

item_NN = tf.keras.models.Sequential([
    ### START CODE HERE ###     
  tf.keras.layers.Dense(256, activation='relu'),

    
    ### END CODE HERE ###  
])
```    
<details>
    <summary><font size="2" color="darkblue"><b> Click for solution</b></font></summary>
    
```python 
user_NN = tf.keras.models.Sequential([
    ### START CODE HERE ###     
  tf.keras.layers.Dense(256, activation='relu'),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(num_outputs),
    ### END CODE HERE ###  
])

item_NN = tf.keras.models.Sequential([
    ### START CODE HERE ###     
  tf.keras.layers.Dense(256, activation='relu'),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(num_outputs),
    ### END CODE HERE ###  
])
```
</details>
</details>

    


We will use a mean squared error loss and an Adam optimizer.

In [17]:
tf.random.set_seed(1)
cost_fn = tf.keras.losses.MeanSquaredError()
opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=opt,
              loss=cost_fn)

In [None]:
tf.random.set_seed(1)
model.fit([user_train[:, u_s:], item_train[:, i_s:]], y_train, epochs=30) #This is the function that trains the model on your data.
'''
Breakdown of model.fit():
    model: The Keras model that you are training. It should already be defined and compiled.
user_train[:, u_s:] and item_train[:, i_s:]:
    user_train and item_train are the training datasets for the user and item features, respectively. These datasets are likely matrices or 2D arrays where each row corresponds to a sample, and each column corresponds to a feature.
    Slicing: The notation [:, u_s:] means "take all rows and from column u_s onwards", i.e., you are selecting a subset of the columns starting from index u_s for user_train, and similarly from index i_s for item_train.
    Why slicing?: It's likely that user_train and item_train contain different sets of features for the user and item, and you're selecting a specific part of those features that corresponds to the features needed for training.
    u_s and i_s: These are the starting column indices for the user and item features, respectively. They are parameters that indicate where to start selecting the columns for training.
y_train: This is the target variable (labels) for the training set. It corresponds to the actual values you are trying to predict.
epochs=30: The number of epochs specifies how many times the entire dataset is passed through the model during training.
    Epoch: One epoch means that every sample in the dataset has been used once to update the model's weights.
    30 epochs means that the model will iterate 30 times over the entire training dataset to learn the patterns.

How the training works in this case:
    The model is trained using the training data provided by user_train and item_train (with columns sliced by u_s and i_s respectively) as inputs and y_train as the target.
    The training runs for 30 epochs, meaning the model will update its parameters after seeing the entire training data 30 times.

The fit() method updates the model's internal weights using backpropagation and gradient descent during each epoch based on the loss calculated from the difference between the model's predictions and y_train.

Summary:
tf.random.set_seed(1) ensures reproducibility.
model.fit() trains the model for 30 epochs using the user and item data (with specific columns selected using slicing), and y_train as the target labels.
'''

Epoch 1/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 1ms/step - loss: 0.1310
Epoch 2/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.1142
Epoch 3/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.1104
Epoch 4/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.1073
Epoch 5/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.1045
Epoch 6/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.1018
Epoch 7/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.0989
Epoch 8/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.0967
Epoch 9/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.0946
Epoch 10/30
[1m1273/1273[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m

<keras.src.callbacks.history.History at 0x3154dda50>

Evaluate the model to determine loss on the test data. 

In [None]:
model.evaluate([user_test[:, u_s:], item_test[:, i_s:]], y_test)
'''
model.evaluate() is used to evaluate the performance of the trained model on the test set. It computes the loss and any additional metrics specified when compiling the model. This is typically done after training is complete, to understand how well the model is performing on unseen data (test data).
What model.evaluate() does:
    The method computes the loss (such as Mean Squared Error, Binary Cross-Entropy, etc.) and any other metrics that were specified during model compilation (like accuracy, precision, etc.) on the test set.
    It returns the loss value and metrics, which help you understand how well the model generalizes to unseen data.
'''

[1m319/319[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 605us/step - loss: 0.0879


0.08472093939781189

loss: 0.0879 is the loss calculated during evaluation on the test set.

0.0847... is the final reported loss value, which is slightly more precise.

It is comparable to the training loss indicating the model has not substantially overfit the training data.

<a name="5"></a>
## 5 - Predictions
Below, you'll use your model to make predictions in a number of circumstances. 
<a name="5.1"></a>
### 5.1 - Predictions for a new user
First, we'll create a new user and have the model suggest movies for that user. After you have tried this on the example user content, feel free to change the user content to match your own preferences and see what the model suggests. Note that ratings are between 0.5 and 5.0, inclusive, in half-step increments.

In [20]:
new_user_id = 5000
new_rating_ave = 0.0
new_action = 0.0
new_adventure = 5.0
new_animation = 0.0
new_childrens = 0.0
new_comedy = 0.0
new_crime = 0.0
new_documentary = 0.0
new_drama = 0.0
new_fantasy = 5.0
new_horror = 0.0
new_mystery = 0.0
new_romance = 0.0
new_scifi = 0.0
new_thriller = 0.0
new_rating_count = 3

user_vec = np.array([[new_user_id, new_rating_count, new_rating_ave,
                      new_action, new_adventure, new_animation, new_childrens,
                      new_comedy, new_crime, new_documentary,
                      new_drama, new_fantasy, new_horror, new_mystery,
                      new_romance, new_scifi, new_thriller]])

The new user enjoys movies from the adventure, fantasy genres. Let's find the top-rated movies for the new user.  
Below, we'll use a set of movie/item vectors, `item_vecs` that have a vector for each movie in the training/test set. This is matched with the new user vector above and the scaled vectors are used to predict ratings for all the movies.

In [None]:
# generate and replicate the user vector to match the number movies in the data set.
user_vecs = gen_user_vecs(user_vec,len(item_vecs))

# scale our user and item vectors
suser_vecs = scalerUser.transform(user_vecs)
sitem_vecs = scalerItem.transform(item_vecs)

# make a prediction
y_p = model.predict([suser_vecs[:, u_s:], sitem_vecs[:, i_s:]])

# unscale y prediction 
y_pu = scalerTarget.inverse_transform(y_p)

# sort the results, highest prediction first
sorted_index = np.argsort(-y_pu,axis=0).reshape(-1).tolist()  #negate to get largest rating first
'''
np.argsort(-y_pu, axis=0):
np.argsort(): This is a NumPy function that returns the indices that would sort an array. In other words, it gives you the order of the elements from smallest to largest (by default).
-y_pu: This negates the values of y_pu. The reason for negating is to change the sorting order. Normally, np.argsort() will return the indices in ascending order, but by negating y_pu, you ensure that the largest values come first. For example:
If y_pu = [3, 1, 5, 4], then -y_pu = [-3, -1, -5, -4], and sorting the negative values will result in the indices [2, 3, 0, 1], which corresponds to the order of largest to smallest values in the original array.
axis=0: This indicates that the sorting will be done along the first axis (typically rows). In this case, if y_pu is a 1D array, this is just a simple sort. If y_pu is a 2D array, it will sort each column individually along the rows.
So, this step is sorting the values in y_pu in descending order, and the function returns the indices of the sorted elements.
.reshape(-1):
reshape(-1): This reshapes the array to a 1D vector. The -1 means "flatten the array," regardless of its original shape. It makes sure that the output of np.argsort() is in a 1D form, which is useful for accessing elements sequentially.
.tolist():
.tolist(): This converts the NumPy array into a standard Python list. This makes the output easier to work with in Python if you need to access or manipulate it using typical Python list operations.
What does this code do?
The goal of this code is to sort the elements in y_pu in descending order and return the indices of the elements in that order.
The sorted indices are then reshaped into a 1D list and returned as a Python list.
'''
sorted_ypu   = y_pu[sorted_index]
sorted_items = item_vecs[sorted_index]  #using unscaled vectors for display

print_pred_movies(sorted_ypu, sorted_items, movie_dict, maxcount = 10)
type(y_p)


[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


y_p,movie id,rating ave,title,genres
4.2,54001,3.9,Harry Potter and the Order of the Phoenix (2007),Adventure|Drama|Fantasy
4.2,8368,3.9,Harry Potter and the Prisoner of Azkaban (2004),Adventure|Fantasy
4.2,6539,3.8,Pirates of the Caribbean: The Curse of the Black Pearl (2003),Action|Adventure|Comedy|Fantasy
4.2,59387,4.0,"Fall, The (2006)",Adventure|Drama|Fantasy
4.1,81834,4.0,Harry Potter and the Deathly Hallows: Part 1 (2010),Action|Adventure|Fantasy
4.1,98809,3.8,"Hobbit: An Unexpected Journey, The (2012)",Adventure|Fantasy
4.1,40815,3.8,Harry Potter and the Goblet of Fire (2005),Adventure|Fantasy|Thriller
4.1,5816,3.6,Harry Potter and the Chamber of Secrets (2002),Adventure|Fantasy
4.1,5952,4.0,"Lord of the Rings: The Two Towers, The (2002)",Adventure|Fantasy
4.1,106489,3.6,"Hobbit: The Desolation of Smaug, The (2013)",Adventure|Fantasy


In [24]:
type(y_p)
print(y_p.shape)

(847, 1)


<a name="5.2"></a>
### 5.2 - Predictions for an existing user.
Let's look at the predictions for "user 2", one of the users in the data set. We can compare the predicted ratings with the model's ratings.

In [27]:
uid = 2 
# form a set of user vectors. This is the same vector, transformed and repeated.
user_vecs, y_vecs = get_user_vecs(uid, user_train_unscaled, item_vecs, user_to_genre)

# scale our user and item vectors
suser_vecs = scalerUser.transform(user_vecs)
sitem_vecs = scalerItem.transform(item_vecs)

# make a prediction
y_p = model.predict([suser_vecs[:, u_s:], sitem_vecs[:, i_s:]])

# unscale y prediction 
y_pu = scalerTarget.inverse_transform(y_p)

# sort the results, highest prediction first
sorted_index = np.argsort(-y_pu,axis=0).reshape(-1).tolist()  #negate to get largest rating first
sorted_ypu   = y_pu[sorted_index]
sorted_items = item_vecs[sorted_index]  #using unscaled vectors for display
sorted_user  = user_vecs[sorted_index]
sorted_y     = y_vecs[sorted_index]

#print sorted predictions for movies rated by the user
print_existing_user(sorted_ypu, sorted_y.reshape(-1,1), sorted_user, sorted_items, ivs, uvs, movie_dict, maxcount = 50)

[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step 


y_p,y,user,user genre ave,movie rating ave,movie id,title,genres
4.5,5.0,2,[4.0],4.3,80906,Inside Job (2010),Documentary
4.2,4.0,2,"[4.0,4.1,4.0,4.0,3.9,3.9]",4.1,79132,Inception (2010),Action|Crime|Drama|Mystery|Sci-Fi|Thriller
4.2,5.0,2,"[4.0,4.1,4.0]",3.9,106782,"Wolf of Wall Street, The (2013)",Comedy|Crime|Drama
4.1,4.5,2,"[4.0,4.1,4.0]",4.2,58559,"Dark Knight, The (2008)",Action|Crime|Drama
4.1,3.0,2,[3.9],4.0,109487,Interstellar (2014),Sci-Fi
4.1,4.0,2,"[4.1,4.0,3.9]",4.3,48516,"Departed, The (2006)",Crime|Drama|Thriller
4.1,4.0,2,"[4.0,4.1,3.9]",4.0,6874,Kill Bill: Vol. 1 (2003),Action|Crime|Thriller
4.0,4.5,2,"[4.0,4.0]",4.1,68157,Inglourious Basterds (2009),Action|Drama
4.0,3.5,2,"[4.0,3.9,3.9]",3.9,115713,Ex Machina (2015),Drama|Sci-Fi|Thriller
4.0,3.5,2,"[4.0,4.0]",3.9,99114,Django Unchained (2012),Action|Drama


In [39]:
sorted_y[sorted_y !=0].shape

(22,)

The model prediction is generally within 1 of the actual rating though it is not a very accurate predictor of how a user rates specific movies. This is especially true if the user rating is significantly different than the user's genre average. You can vary the user id above to try different users. Not all user id's were used in the training set.

<a name="5.3"></a>
### 5.3 - Finding Similar Items
The neural network above produces two feature vectors, a user feature vector $v_u$, and a movie feature vector, $v_m$. These are 32 entry vectors whose values are difficult to interpret. However, similar items will have similar vectors. This information can be used to make recommendations. For example, if a user has rated "Toy Story 3" highly, one could recommend similar movies by selecting movies with similar movie feature vectors.

A similarity measure is the squared distance between the two vectors $ \mathbf{v_m^{(k)}}$ and $\mathbf{v_m^{(i)}}$ :
$$\left\Vert \mathbf{v_m^{(k)}} - \mathbf{v_m^{(i)}}  \right\Vert^2 = \sum_{l=1}^{n}(v_{m_l}^{(k)} - v_{m_l}^{(i)})^2\tag{1}$$

<a name="ex02"></a>
### Exercise 2

Write a function to compute the square distance.

In [48]:
# GRADED_FUNCTION: sq_dist
# UNQ_C2
def sq_dist(a,b):
    """
    Returns the squared distance between two vectors
    Args:
      a (ndarray (n,)): vector with n features
      b (ndarray (n,)): vector with n features
    Returns:
      d (float) : distance
    """
    ### START CODE HERE ###     
    d = np.sum((a-b)**2)
    ### END CODE HERE ###     
    return d

In [49]:
np.sum(a1)

np.float64(6.0)

In [50]:
a1 = np.array([1.0, 2.0, 3.0]); b1 = np.array([1.0, 2.0, 3.0])
a2 = np.array([1.1, 2.1, 3.1]); b2 = np.array([1.0, 2.0, 3.0])
a3 = np.array([0, 1, 0]);       b3 = np.array([1, 0, 0])
print(f"squared distance between a1 and b1: {sq_dist(a1, b1):0.3f}")
print(f"squared distance between a2 and b2: {sq_dist(a2, b2):0.3f}")
print(f"squared distance between a3 and b3: {sq_dist(a3, b3):0.3f}")

squared distance between a1 and b1: 0.000
squared distance between a2 and b2: 0.030
squared distance between a3 and b3: 2.000


**Expected Output**:

squared distance between a1 and b1: 0.000    
squared distance between a2 and b2: 0.030   
squared distance between a3 and b3: 2.000

In [51]:
# Public tests
test_sq_dist(sq_dist)

[92mAll tests passed!


<details>
  <summary><font size="3" color="darkgreen"><b>Click for hints</b></font></summary>
    
  While a summation is often an indication a for loop should be used, here the subtraction can be element-wise in one statement. Further, you can utilized np.square to square, element-wise, the result of the subtraction. np.sum can be used to sum the squared elements.
    
</details>

    


A matrix of distances between movies can be computed once when the model is trained and then reused for new recommendations without retraining. The first step, once a model is trained, is to obtain the movie feature vector, $v_m$, for each of the movies. To do this, we will use the trained `item_NN` and build a small model to allow us to run the movie vectors through it to generate $v_m$.

In [None]:
# create the item input and point to the base network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
# vm = item_NN(input_item)
# vm = tf.linalg.l2_normalize(vm, axis=1)
# Create user input layer
vm = item_NN(input_item)

# Normalize the output using L2 Normalization (using Lambda layer)
vm = Lambda(lambda x: tf.nn.l2_normalize(x, axis=1))(vm)

In [None]:
input_item_m = tf.keras.layers.Input(shape=(num_item_features,))    # input layer
vm_m = item_NN(input_item_m)                                       # use the trained item_NN
vm_m = Lambda(lambda x: tf.nn.l2_normalize(x, axis=1))(vm_m)       # incorporate normalization as was done in the original model

# specify the inputs and output of the model
model_m = tf.keras.Model(input_item_m, vm_m)                                
model_m.summary()

Once you have a movie model, you can create a set of movie feature vectors by using the model to predict using a set of item/movie vectors as input. `item_vecs` is a set of all of the movie vectors. It must be scaled to use with the trained model. The result of the prediction is a 32 entry feature vector for each movie.

In [57]:
item_vecs.shape

(847, 17)

In [58]:
scaled_item_vecs = scalerItem.transform(item_vecs)
vms = model_m.predict(scaled_item_vecs[:,i_s:])
print(f"size of all predicted movie feature vectors: {vms.shape}")

[1m27/27[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step 
size of all predicted movie feature vectors: (847, 32)


Let's now compute a matrix of the squared distance between each movie feature vector and all other movie feature vectors:
<figure>
    <left> <img src="./images/distmatrix.PNG"   style="width:400px;height:225px;" ></center>
</figure>

We can then find the closest movie by finding the minimum along each row. We will make use of [numpy masked arrays](https://numpy.org/doc/1.21/user/tutorial-ma.html) to avoid selecting the same movie. The masked values along the diagonal won't be included in the computation.

In [None]:
# The provided code calculates the similarity (using squared distance) between movies and then creates a table to display the most similar movies for a given set of movies.
count = 50  # number of movies to display
dim = len(vms)  # `dim` is the number of movies you are comparing
dist = np.zeros((dim,dim))  # Create a square matrix to hold distances between movies
'''
count defines how many movies you want to display in the result.
dim is the total number of movies, which is based on the length of vms. This suggests that vms is a matrix where each row represents a movie's features.
dist is an empty square matrix of size (dim, dim), where each element will hold the distance between two movies.
'''

for i in range(dim):
    for j in range(dim):
        dist[i,j] = sq_dist(vms[i, :], vms[j, :])
'''
The nested loops iterate over every pair of movies (i, j), where i and j are movie indices.
sq_dist(vms[i, :], vms[j, :]) calculates the squared distance between the features of the i-th and j-th movies. This function likely measures how similar or dissimilar the two movies are based on their feature vectors (vms[i, :] and vms[j, :]).
The distance is stored in the matrix dist[i, j].
'''        
m_dist = ma.masked_array(dist, mask=np.identity(dist.shape[0]))  # mask the diagonal. This line creates a version of the dist matrix where the diagonal (self-comparisons) is hidden (masked) so that they are ignored in further operations like finding the most similar movies.
'''
m_dist is a masked version of the distance matrix dist. np.identity(dist.shape[0]) creates a matrix where the diagonal elements are 1 and all off-diagonal elements are 0.
This identity matrix is used as a mask, which will mask out (ignore) the diagonal elements in the distance matrix (the self-comparisons, i.e., comparing a movie to itself).
'''
disp = [["movie1", "genres", "movie2", "genres"]] #disp is a list that will hold the table rows. The first row is the header row that defines the columns: "movie1", "genres", "movie2", "genres".
for i in range(count):
    min_idx = np.argmin(m_dist[i])
    movie1_id = int(item_vecs[i,0])
    movie2_id = int(item_vecs[min_idx,0])
    disp.append( [movie_dict[movie1_id]['title'], movie_dict[movie1_id]['genres'],
                  movie_dict[movie2_id]['title'], movie_dict[movie1_id]['genres']]
               )
'''
for i in range(count): Loop over the first count movies (in this case, 50).
np.argmin(m_dist[i]): Find the index of the movie in the i-th row of m_dist that has the smallest distance (i.e., most similar) to movie i.
movie1_id = int(item_vecs[i,0]): Get the movie ID for movie1 (the i-th movie).
movie2_id = int(item_vecs[min_idx,0]): Get the movie ID for the most similar movie (min_idx).
Then, the title and genres of movie1 and movie2 are fetched from movie_dict and added to the display list disp.
'''
table = tabulate.tabulate(disp, tablefmt='html', headers="firstrow")
table

movie1,genres,movie2,genres.1
Save the Last Dance (2001),Drama|Romance,Mona Lisa Smile (2003),Drama|Romance
"Wedding Planner, The (2001)",Comedy|Romance,Mr. Deeds (2002),Comedy|Romance
Hannibal (2001),Horror|Thriller,Final Destination 2 (2003),Horror|Thriller
Saving Silverman (Evil Woman) (2001),Comedy|Romance,"Sweetest Thing, The (2002)",Comedy|Romance
Down to Earth (2001),Comedy|Fantasy|Romance,America's Sweethearts (2001),Comedy|Fantasy|Romance
"Mexican, The (2001)",Action|Comedy,Rush Hour 2 (2001),Action|Comedy
15 Minutes (2001),Thriller,Panic Room (2002),Thriller
Enemy at the Gates (2001),Drama,"Aviator, The (2004)",Drama
Heartbreakers (2001),Comedy|Crime|Romance,"Fast and the Furious: Tokyo Drift, The (Fast and the Furious 3, The) (2006)",Comedy|Crime|Romance
Spy Kids (2001),Action|Adventure|Children|Comedy,Scooby-Doo (2002),Action|Adventure|Children|Comedy


Where:

movie1 is a movie from the dataset.

movie2 is the most similar movie to movie1.

The genres of both movies are displayed as well.

The results show the model will generally suggest a movie with similar genre's.

In [None]:
m_dist = ma.masked_array(dist, mask=np.identity(dist.shape[0])) 
'''
What it does:
    ma.masked_array: This is a function from the numpy.ma (masked array) module, which creates a masked array. A masked array allows you to mark certain elements as "masked", meaning those elements are ignored in computations or operations. This is useful when you want to exclude certain values from processing, such as avoiding division by zero, excluding missing data, etc.
    dist: This is a matrix of distances between movies. It is calculated using the squared distances between movie feature vectors, where each element represents the distance between two movies.
    mask=np.identity(dist.shape[0]): The mask parameter creates a mask that determines which elements of the dist matrix should be masked. Let's break it down further:
        np.identity(dist.shape[0]): This generates an identity matrix of size (n, n) where n is the number of movies (dist.shape[0]). An identity matrix has 1s on the diagonal and 0s off the diagonal.
    For example, if you have a 3x3 distance matrix, the identity matrix will look like this:
    [[1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]]


The identity matrix is used as the mask, meaning the diagonal elements (where the movie is compared to itself) will be marked as "masked", and these values will be excluded from computations.
m_dist = ma.masked_array(dist, mask=np.identity(dist.shape[0])): This line creates a masked version of the dist matrix. The diagonal elements (where np.identity has 1s) are "masked", so they are effectively ignored. The off-diagonal elements (the distances between different movies) remain unmasked.
Why is this done?
Masking the diagonal: The diagonal elements represent the distance of a movie to itself, which will always be zero (since the distance between a movie and itself is zero). These self-distances are not useful when you are trying to find the most similar movies, so masking them prevents them from interfering with similarity calculations.
How it affects further operations: When you perform operations on m_dist later (e.g., finding the most similar movies), the diagonal elements will be ignored because they are "masked." This ensures that the system doesn't mistakenly suggest a movie as its own most similar movie.
'''

Wouldn't it be better to use `m_dist = ma.masked_array(dist, mask=np.zeros(dist.shape[0]))`, since the distance of each movie from itself is zero, not one?

That's a great question! The key difference between using np.identity(dist.shape[0]) and np.zeros(dist.shape[0]) as the mask lies in the behavior of the mask and the role of the identity matrix.

Explanation:

Using np.identity(dist.shape[0]):

The identity matrix has 1s on the diagonal and 0s off the diagonal.

When you use np.identity(dist.shape[0]) as the mask, you're essentially masking the diagonal elements (self-distances) by setting them as "masked" values.

This ensures that you are excluding the self-comparisons, which is what you want to do when you are finding the most similar movies.

Result: You mask the self-comparisons (diagonal), even though the self-distance is zero. This avoids considering the zero distances between a movie and itself when looking for the most similar movies.

Using np.zeros(dist.shape[0]):

np.zeros(dist.shape[0]) will create a row vector (or column vector, depending on the context) of zeros.

The mask will have zeros across all rows, meaning it won't actually mask anything. So all the elements in the dist matrix, including the diagonal (self-distances), will remain unmasked, i.e., no exclusions.

This would not have the desired effect because you still want to mask the diagonal (self-comparisons) and exclude them from calculations.

Why np.identity is preferred:

Using np.identity(dist.shape[0]) creates a mask specifically targeting the diagonal of the distance matrix. The identity matrix represents the self-comparisons and is what you want to mask. By masking the diagonal, you ensure that the algorithm doesn't use the self-distances in further calculations when finding the most similar items.

To summarize:

np.identity(dist.shape[0]): Masks the diagonal (self-distances), which is what you want.

np.zeros(dist.shape[0]): Doesn't mask anything because it doesn't affect the diagonal, leaving it unmasked and still included in calculations, which isn't the desired behavior here.

So, np.identity(dist.shape[0]) is the correct approach.

In [60]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

<a name="6"></a>
## 6 - Congratulations! <img align="left" src="./images/film_award.png" style=" width:40px;">
You have completed a content-based recommender system.    

This structure is the basis of many commercial recommender systems. The user content can be greatly expanded to incorporate more information about the user if it is available.  Items are not limited to movies. This can be used to recommend any item, books, cars or items that are similar to an item in your 'shopping cart'.

<details>
  <summary><font size="2" color="darkgreen"><b>Please click here if you want to experiment with any of the non-graded code.</b></font></summary>
    <p><i><b>Important Note: Please only do this when you've already passed the assignment to avoid problems with the autograder.</b></i>
    <ol>
        <li> On the notebook’s menu, click “View” > “Cell Toolbar” > “Edit Metadata”</li>
        <li> Hit the “Edit Metadata” button next to the code cell which you want to lock/unlock</li>
        <li> Set the attribute value for “editable” to:
            <ul>
                <li> “true” if you want to unlock it </li>
                <li> “false” if you want to lock it </li>
            </ul>
        </li>
        <li> On the notebook’s menu, click “View” > “Cell Toolbar” > “None” </li>
    </ol>
    <p> Here's a short demo of how to do the steps above: 
        <br>
        <img src="https://drive.google.com/uc?export=view&id=14Xy_Mb17CZVgzVAgq7NCjMVBvSae3xO1" align="center" alt="unlock_cells.gif">
</details>