<a href="https://colab.research.google.com/github/andi23/Custlr-AI/blob/Zobaid_branch/Recommendation_System/Google_AI-HUB_Practise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Matrix Factorization

With the rise of internet shopping and online services, there has been strong motivation to create systems that can predict which items a user would like. This allows for a better user experience, and lets a seller present items that a user is more likely to purchase. For example, a company like Amazon wants to show you relevant items that you will consider.

One class of such systems is called "collaborative filtering". These systems analyze data about users and items to uncover previously hidden relationships that can be used to better recommend items to users. Matrix factorization is a specific type of collaborative filtering method that, in a general sense, approximates a large matrix as the product of smaller matrices.

Matrix factorization (MF) grew in popularity in part because of the Netflix Prize (whose dataset we will be using in this article), but also because of the sparsity and scale of the data being collected by companies. A company such as Netflix has millions of users and a very large catalog of movies. No user has viewed and rated every movie; in fact, each user has likely rated only a tiny fraction of the overall set. If we were to simply create a matrix with one row per user and one column per movie, the result would be a gigantic sparse matrix. Many collaborative filtering algorithms have trouble dealing with such sparse matrices, but MF is well equipped to deal with this issue because it only needs to fit the entries that were actually observed. Below we see an objective function detailing how we come up with our smaller matrices.

![](https://storage.googleapis.com/kf-pipeline-contrib-public/release-0.1.1/kfp-components/notebooks/recommender_system/assets/explain_img_2.jpeg)
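
To make the idea of an objective function concrete, here is a minimal NumPy sketch. It assumes a common form of the objective (squared error on the observed entries plus L2 regularization), which may differ in notation from the figure above. The toy matrix, the hyperparameters, and the use of plain gradient descent are all illustrative choices, not the method used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4 x 5 rating matrix; 0 marks a missing rating (hypothetical data).
A = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 0],
    [0, 1, 5, 4, 0],
], dtype=float)
observed = A > 0

k = 2        # number of latent features (assumed)
lam = 0.02   # L2 regularization strength (assumed)
lr = 0.01    # gradient descent step size (assumed)

U = rng.normal(scale=0.1, size=(A.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(A.shape[1], k))  # item factors

def loss(U, V):
    # squared error on observed cells only, plus an L2 penalty on the factors
    err = (A - U @ V.T) * observed
    return np.sum(err ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

initial_loss = loss(U, V)
for _ in range(5000):
    err = (A - U @ V.T) * observed
    # gradients of the loss with respect to each factor matrix
    grad_U = -2 * err @ V + 2 * lam * U
    grad_V = -2 * err.T @ U + 2 * lam * V
    U -= lr * grad_U
    V -= lr * grad_V
final_loss = loss(U, V)

print(f"loss: {initial_loss:.2f} -> {final_loss:.2f}")
print(np.round(U @ V.T, 1))  # also fills in the cells that were missing
```

Because the error is masked by `observed`, the missing cells never contribute to the loss, which is why the sparsity of the original matrix is not an obstacle.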

The basis of matrix factorization goes a little bit like this: say you have a sparse matrix A whose rows represent users and whose columns represent items for purchase. Each element of this matrix is the rating given by a user to a movie. For example, let's say user 1 really liked 'Fast & Furious': in the row for user 1 and the column for 'Fast & Furious' we will see the rating given by user 1 to the movie. Repeat this process for every user and every item that the user has rated.

Now what if we created smaller matrices B and C whose product gives us back the elements of A? In this way, we wouldn't need to keep a gigantic sparse matrix on file, and could rely on updating these smaller matrices. This would mean that B * C ≈ A. B and C are both built from what are known as "latent features". Each user and item has a chosen number of features that describe hidden relationships between users and items.

Going back to the Fast & Furious and user 1 example, let's create some hypothetical latent features. User 1 may really like action movies and comedy movies. Internally, Fast and Furious may have latent features showing that it is an action adventure movie with some comedy.

Below we see an illustrated example of hypothetical latent features for movies. There are 5 movies and 4 users. Each movie has a score for how much comedy or action it contains (from 1 to 5), and each user has a score for whether they like comedy or action (a check mark counts as 1 and an 'X' counts as 0). We can use the smaller matrices to generate the larger matrix. For example, we can take the dot product of the first row of the user matrix (1,0) and the first column of the movie matrix (3,1) to get a rating of 3 for movie 1 by user 1 ($1*3 + 0*1 = 3$).

![](https://storage.googleapis.com/kf-pipeline-contrib-public/release-0.1.1/kfp-components/notebooks/recommender_system/assets/explain_img_1.jpeg)
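
The same dot-product calculation can be sketched in NumPy. Only the first user row (1, 0) and the first movie column (3, 1) are taken from the text; every other value below is made up for illustration and is not read from the figure.

```python
import numpy as np

# Rows are users, columns are (likes comedy, likes action).
# Only the first row (1, 0) comes from the text; the rest are made up.
users = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 0],
])

# Rows are (comedy score, action score), columns are movies.
# Only the first column (3, 1) comes from the text; the rest are made up.
movies = np.array([
    [3, 1, 1, 3, 1],
    [1, 2, 4, 1, 3],
])

# The full 4 x 5 rating matrix is the product of the two small factors.
ratings = users @ movies
print(ratings)
print(ratings[0, 0])  # user 1, movie 1: 1*3 + 0*1 = 3
```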

Matrix factorization can be used in domains that have the following properties [4]:
- Many Users
- Many Items
- Many Ratings
- Users rate multiple items
- For each user of the community, there are other users with common needs or tastes
- Item evaluation requires personal taste
- Items persist
- Taste persists
- Items are homogeneous

As an example of the space-saving power of matrix factorization, consider the case of 4,500 items and 470,000 users (as we will have in this notebook). If we stored the full user-by-item matrix, it would have about 2.1 billion entries. If we use matrix factorization with 10 latent features, we store 4.7M + 45K entries instead. This is a space savings of over 400x! In other words, if we use just the smaller matrices to store information about users and items, we need roughly 400 times less disk space.

![](https://storage.googleapis.com/kf-pipeline-contrib-public/release-0.1.1/kfp-components/notebooks/recommender_system/assets/explain_img_3.jpeg)
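
The arithmetic behind these numbers can be checked directly. This is a small sketch that assumes 10 latent features per user and per item, which is consistent with the 4.7M + 45K figure and with the `mf_dim` value used later in this notebook.

```python
num_users = 470_000
num_items = 4_500
latent_dim = 10  # assumed number of latent features

# one cell per (user, item) pair in the full matrix
dense_entries = num_users * num_items
# one row of latent features per user plus one per item
factor_entries = (num_users + num_items) * latent_dim

print(f"full matrix entries:   {dense_entries:,}")
print(f"factor matrix entries: {factor_entries:,}")
print(f"savings: about {dense_entries / factor_entries:.0f}x")
```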

Matrix factorization is not a silver bullet, however; the method has limitations and problems. One of the biggest is the "cold start" problem. When a new user or item is added to the system, there is not enough information to predict ratings for it, so the initial suggestions are often incorrect: we can't fill in the blank. Also, if certain users have unusual interests (relative to the rest of the users), this can cause errors in inference. [5]

# This Notebook

This notebook will go over:

1. Downloading user and item data
2. Downloading official TensorFlow model dependencies
3. Creating a pandas dataframe
4. Creating a tf.keras model
5. Converting dataframe to loadable data
6. Training model
7. Evaluating model and finding metrics 
8. Visualizing model performance

# The Data: Netflix Prize Dataset

For the Netflix Prize competition, Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form user, movie, date of grade, grade. The user and movie fields are integer IDs, while grades are from 1 to 5 stars. The data used in this article was downloaded from Kaggle, in the form of 4 text files. In this notebook, we will only be using 1 of the 4 data files, consisting of ~4,500 movies and ~470,000 users. The name of each movie is included in a different file. Let's download the dataset and show how the data is structured. We download this dataset from Kaggle: https://www.kaggle.com/netflix-inc/netflix-prize-data. In order to use this notebook, you will need a Kaggle account as well as an API token.

From the Kaggle API README [6]:
>To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile (https://www.kaggle.com/"username"/account) [replacing "username" with your chosen username] and select 'Create API Token'. This will trigger the download of kaggle.json, a file containing your API credentials.

Open up the JSON file in your favorite text editor, and copy over the username and key into this notebook.
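
As an alternative to copying the values by hand, the file can be read programmatically. This is a minimal sketch: the helper name `load_kaggle_credentials` and the file location are hypothetical, and it assumes the downloaded file holds the `username` and `key` fields mentioned above.

```python
import json
from pathlib import Path

def load_kaggle_credentials(path):
    """Read the username and key from a downloaded kaggle.json file."""
    with open(path) as f:
        token = json.load(f)
    return token["username"], token["key"]

# Hypothetical location; point this at wherever you saved the file.
token_path = Path("kaggle.json")
if token_path.exists():
    kaggle_username, kaggle_key = load_kaggle_credentials(token_path)
```

If you load the credentials this way, leave out the two manual assignments in the parameters cell so they are not overwritten.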

In [0]:
# parameters
kaggle_username = "<your username>"
kaggle_key = "<your token>"

epochs = 500
adam_learning_rate = .001
adam_beta_1 = 0.9
adam_beta_2 = 0.999
adam_epsilon = 0.1

In [0]:
import json
import os
%matplotlib inline

token = {"username": kaggle_username, "key": kaggle_key}

!pip3 install --user -q kaggle

!mkdir -p credentials

with open('credentials/kaggle.json', 'w') as file:
    json.dump(token, file)

!chmod 600 credentials/kaggle.json

os.environ['KAGGLE_CONFIG_DIR'] = "./credentials/"

!~/.local/bin/kaggle datasets download netflix-inc/netflix-prize-data -p . --force

!mkdir -p ./netflix-prize-data

!unzip -o netflix-prize-data.zip -d ./netflix-prize-data

!rm netflix-prize-data.zip

In [0]:
!head ./netflix-prize-data/combined_data_1.txt

Below we see the distribution of the ratings within the Netflix Prize dataset. Most of the ratings are 3 or above, which means that users tended to rate the movies in this dataset favorably. Our model will likely mimic this distribution of ratings.

![title](https://storage.googleapis.com/kf-pipeline-contrib-public/release-0.1.1/kfp-components/notebooks/recommender_system/assets/dist.png)

For this notebook we will be using an official model from TensorFlow. We will need to download the repository and then add it to the path.

In [0]:
import sys
import urllib.request

url = 'https://github.com/tensorflow/models/archive/master.zip'
urllib.request.urlretrieve(url, './models-master.zip')

!unzip -o -q models-master.zip
!rm models-master.zip

sys.path.append('./models-master')

# Matrix Factorization Implementation with TensorFlow

Let's import some of the things that we need. 

In [0]:
import numpy as np
import tensorflow as tf
import pandas as pd

We first have to determine the number of users and movies in the dataset, so let's create a function to help us do that. The function returns a pandas dataframe with the following columns: user_id, movie_id, rating. The dataset contains 4 files with customer ratings; for this example we will only be using the first of them. In the function, we first extract all the user IDs and ratings, and then assign a movie ID to each rating.

We also need to remap the customer IDs to contiguous indices (0 to N-1), so that every index has at least one rating associated with it. After we do that, we calculate the number of unique movies and unique users.
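
To see why rows with a missing rating mark the movie boundaries, here is a toy sketch. It assumes the file layout is a header line such as `1:` followed by `customer,rating,date` lines; the IDs and ratings are made up. It also uses a cumulative sum instead of the explicit loop in the function below, which is a different technique that should give the same 0-based movie indices.

```python
import io
import pandas as pd

# A tiny stand-in for combined_data_1.txt (made-up IDs and ratings).
raw = """1:
10,3,2005-09-06
20,5,2005-05-13
2:
10,4,2005-10-19
"""

toy = pd.read_csv(io.StringIO(raw), header=None,
                  names=['user_id', 'rating'], usecols=[0, 1])

# Header lines such as "1:" have no second field, so their rating is NaN.
is_header = toy['rating'].isna()

# Counting the headers seen so far labels every row with a 0-based movie index.
toy['movie_id'] = is_header.cumsum() - 1

# Drop the header rows and convert the remaining IDs to integers.
toy = toy[~is_header].copy()
toy['user_id'] = toy['user_id'].astype(int)
print(toy)
```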

In [0]:
def create_dataframe(file_paths):
    # read each file, keeping only the first two columns (user_id, rating);
    # movie header lines such as "1:" are parsed with a NaN rating
    frames = [pd.read_csv(file, header=None, names=['user_id', 'rating'], usecols=[0, 1])
              for file in file_paths]
    # DataFrame.append was removed in pandas 2.0; concat with ignore_index
    # also keeps the row index continuous when several files are loaded
    df = pd.concat(frames, ignore_index=True)
    # figure out which indices are nan to determine where the movie ids are
    df_nan = pd.DataFrame(pd.isnull(df.rating))
    df_nan = df_nan[df_nan['rating'] == True]
    df_nan = df_nan.reset_index()
    
    movie_np = []
    movie_id = 0

    temp_hold = []
    # go through each pair of consecutive NaN indices and label the rows between them with the current movie_id
    for i, j in zip(df_nan['index'][1:], df_nan['index'][:-1]):

        temp = np.full((i-j-1, 1), movie_id)
        temp_hold.append(temp)
        movie_id += 1

    # add the last record length
    last_record = np.full((len(df) - df_nan.iloc[-1, 0] - 1, 1), movie_id)
    temp_hold.append(last_record)

    movie_np = np.vstack(temp_hold)

    # keep only the rows that have a rating; copy so the column assignments below do not act on a view
    df = df[pd.notnull(df['rating'])].copy()

    # add the movie_id column to the dataframe
    df['movie_id'] = movie_np.astype(int)
    df['user_id'] = df['user_id'].astype(int)

    return df

file_paths = ['./netflix-prize-data/combined_data_1.txt', './netflix-prize-data/combined_data_2.txt',
              './netflix-prize-data/combined_data_3.txt', './netflix-prize-data/combined_data_4.txt']

files = ['./netflix-prize-data/combined_data_1.txt']

df = create_dataframe(files)

user_ids = df['user_id'].values

unique_ids = np.unique(user_ids)

id_dict = {id: counter for counter, id in enumerate(unique_ids)}

df['user_id'] = df['user_id'].apply(lambda x: id_dict[x])

num_movies = np.unique(df['movie_id'].values).size
num_users = unique_ids.size
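
The remapping step can also be sketched with `np.unique` and its `return_inverse` option, which returns the contiguous index of every ID in one call. The sample IDs below are made up; this is an alternative illustration, not the code used in this notebook.

```python
import numpy as np

raw_ids = np.array([1488844, 822109, 1488844, 30878, 822109])  # made-up sample

# unique_ids is sorted; inverse[i] is the position of raw_ids[i] in unique_ids.
unique_ids, inverse = np.unique(raw_ids, return_inverse=True)

print(unique_ids)  # sorted distinct IDs
print(inverse)     # contiguous indices in the range 0..len(unique_ids)-1
```

Because `np.unique` returns sorted values, these indices match the ones produced by enumerating the sorted unique IDs into a dictionary.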

# Create Matrix Factorization Model

For the model we will need to define parameters in a dictionary that we will feed into the model. These parameters include training hyperparameters, layer sizes, regularization, optimization hyperparameters, etc. We can play around with these parameters to influence the model.

In [0]:
params = \
    {
        "train_epochs": epochs,
        "batches_per_step": 1,
        "use_seed": False,
        "batch_size": 10000,
        "eval_batch_size": 1,
        "learning_rate": adam_learning_rate,
        "mf_dim": 10,  # this is number of latent dimensions, basically how we describe each movie & user
        "model_layers": [int(layer) for layer in ["64", "32", "16", "8"]],
        "mf_regularization": 0.0,
        "mlp_reg_layers": [float(reg) for reg in ["0.", "0.", "0.", "0."]],
        "num_neg": 4,
        "num_gpus": 0,
        "use_tpu": False,
        "tpu": None,
        "tpu_zone": None,
        "tpu_gcp_project": None,
        "beta1": adam_beta_1,
        "beta2": adam_beta_2,
        "epsilon": adam_epsilon,
        "match_mlperf": False,
        "use_xla_for_gpu": False,
        "clone_model_in_keras_dist_strat":False,
        "epochs_between_evals": 1,
        "turn_off_distribution_strategy": True,
        "num_users": num_users,
        "num_items": num_movies,
        "loss": 'mse',
        "train_size": 0.95  # split between
    }

After we generate our parameters dictionary, we have to create the model we will use for matrix factorization. The model that we will use is from the TensorFlow official models for recommendation engines. This model and all of its dependencies can be downloaded from: https://github.com/tensorflow/models. We will be using the tf.keras interface for the model.

In [0]:
from official.recommendation import neumf_model

def create_model(params):
    user_input = tf.keras.layers.Input(
        shape=(),
        batch_size=None,
        name='user_id',
        dtype=tf.int32)

    item_input = tf.keras.layers.Input(
        shape=(),
        batch_size=None,
        name='movie_id',
        dtype=tf.int32)

    base_model = neumf_model.construct_model(
        user_input, item_input, params, need_strip=False)

    rating = base_model.output

    keras_model = tf.keras.Model(
        inputs=[user_input, item_input],
        outputs=rating)
    return keras_model

model = create_model(params)

We have to prepare the dataframe for training. We need to create three 1-dimensional arrays that hold the users, the movies, and the corresponding ratings.

In [0]:
from sklearn.model_selection import train_test_split

num_ratings = df.shape[0]

# stack user_id, movie_id, and rating into one (num_ratings, 3) float array
ratings = df[['user_id', 'movie_id', 'rating']].to_numpy(dtype=np.float64)

ratings_tr, ratings_val = train_test_split(
    ratings, train_size=params['train_size'])

# split the columns back out; the IDs are cast to int32 for the model inputs
user_tr = ratings_tr[:, 0].astype(np.int32)
movie_tr = ratings_tr[:, 1].astype(np.int32)
rating_tr = ratings_tr[:, 2]

user_val = ratings_val[:, 0].astype(np.int32)
movie_val = ratings_val[:, 1].astype(np.int32)
rating_val = ratings_val[:, 2]

In the next code block we will define three different metrics for our model: precision, recall, and F1 score. To generate these metrics, we also need to define recommended items and relevant items. An item is relevant when its actual rating is at or above a threshold (for us it will be 3.5). An item is recommended when we predict its rating to be at or above a threshold (again 3.5). Precision, or positive predictive value, is the proportion of recommended items that are relevant. Recall, or sensitivity, is the proportion of relevant items that are recommended. F1 score is the harmonic mean of precision and recall.

In [0]:
threshold = tf.constant(3.5, dtype=tf.float32)
checker = tf.constant(2.0, dtype=tf.float32)

def precision(y_true, y_pred):
    relevant = tf.cast(y_true >= threshold, dtype=tf.float32)
    recommended = tf.cast(y_pred >= threshold, dtype=tf.float32)

    relevant_recommend = tf.reduce_sum(tf.cast(tf.reshape(
        relevant + recommended, (-1, 1)) >= checker, dtype=tf.float32))

    return tf.divide(relevant_recommend, tf.reduce_sum(recommended))

def recall(y_true, y_pred):

    relevant = tf.cast(y_true >= threshold, dtype=tf.float32)
    recommended = tf.cast(y_pred >= threshold, dtype=tf.float32)

    relevant_recommend = tf.reduce_sum(tf.cast(tf.reshape(
        relevant + recommended, (-1, 1)) >= checker, dtype=tf.float32))
    return tf.divide(relevant_recommend, tf.reduce_sum(relevant))

def f1(y_true, y_pred):
    # harmonic mean of precision & recall, reusing the two metrics defined above
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)

    return 2*(p*r)/(p + r)
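
To make these definitions concrete, here is a small NumPy worked example with made-up ratings. The variable names are chosen so they do not clash with the metric functions above; the 3.5 cutoff matches the threshold used in this notebook.

```python
import numpy as np

cutoff = 3.5
actual = np.array([5.0, 4.0, 2.0, 1.0, 4.0, 3.0])      # made-up true ratings
predicted = np.array([4.2, 3.1, 3.9, 1.5, 4.8, 2.0])   # made-up predictions

is_relevant = actual >= cutoff         # true rating at or above the cutoff
is_recommended = predicted >= cutoff   # predicted rating at or above the cutoff
hits = np.sum(is_relevant & is_recommended)  # both relevant and recommended

precision_value = hits / np.sum(is_recommended)
recall_value = hits / np.sum(is_relevant)
f1_value = 2 * precision_value * recall_value / (precision_value + recall_value)

print(precision_value, recall_value, f1_value)
```

Here three items are relevant, three are recommended, and two are both, so precision, recall, and F1 all come out to 2/3.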

# Train Model

Below we will compile and train our model. We have callbacks for model checkpointing and early stopping. These make sure that the model with the best performance on the validation set is saved, and that we stop training when performance on the validation set stagnates. The history of our training will be saved, and then used later to visualize our performance on the training and validation sets.

In [0]:
optimizer = tf.keras.optimizers.Adam(
    learning_rate=params["learning_rate"],  # `lr` is a deprecated alias
    beta_1=params["beta1"],
    beta_2=params["beta2"],
    epsilon=params["epsilon"])

# the deprecated `period=1` argument is dropped; the checkpoint is still considered every epoch by default
callback = tf.keras.callbacks.ModelCheckpoint(
    './model.h5', monitor='val_loss', save_best_only=True, verbose=0, mode='auto')

earlystopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, verbose=0)

model.compile(
    loss=params['loss'],
    metrics=[precision, recall, f1],
    optimizer=optimizer)

history = model.fit(
    x=(user_tr, movie_tr), y=rating_tr,
    epochs=params['train_epochs'],
    verbose=1,
    batch_size=params['batch_size'],
    validation_data=([user_val, movie_val], rating_val),
    callbacks=[callback, earlystopping])
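
The interplay between checkpointing and early stopping can be sketched in plain Python. This is a simplified model of the idea, not the Keras implementation (which has further options such as a minimum improvement); the function name and the loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Epochs are 0-based. Training stops once `patience` consecutive epochs
    pass without a new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # a new best: this is where a checkpoint would be saved
            best_loss, best_epoch = loss, epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    # ran out of epochs before patience was exhausted
    return best_epoch, len(val_losses) - 1

losses = [1.00, 0.90, 0.85, 0.86, 0.87, 0.84, 0.85, 0.86, 0.88]
print(simulate_early_stopping(losses, patience=3))
```

With these losses and a patience of 3, the best epoch is 5 and training stops at epoch 8, so the saved checkpoint is three epochs older than the final weights in memory.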

# Evaluate Model

After we train the model, let's delete the most recent model from memory and load the model that had the best performance on the validation set. We will load the model with our custom metrics. We will also generate some metrics on the validation set using our loaded model.

In [0]:
del model

model = tf.keras.models.load_model('model.h5', custom_objects={
                                   'precision': precision, 'recall': recall, 'f1': f1})

names = model.metrics_names

loss_and_metrics = model.evaluate(
    x=[user_val, movie_val], y=rating_val, batch_size=params['batch_size'], verbose=0)

for name, value in zip(names, loss_and_metrics):
    print('{0:s} : {1:1.4f}'.format(name, value))

Below we will visualize the following:

- Loss curve on the train and validation set. 

- Precision curve on the train and validation set.

- Recall curve on the train and validation set.

- F1 curve on the train and validation set. 

We will see that for the validation set, the precision, recall, and F1 curves are rather noisy. 

In [0]:
import matplotlib.pyplot as plt

hist_keys = ['loss', 'precision', 'recall', 'f1', 'val_loss', 'val_precision',  'val_recall', 'val_f1']

fig = plt.figure(figsize=(10, 10))
for plot_num, train, val in zip(range(1, 5), hist_keys[0:4], hist_keys[4:]):
    plt.subplot(4, 1, plot_num)
    plt.plot(history.history[train])
    plt.plot(history.history[val])
    plt.title(train.capitalize() + ' curve')
    plt.xlabel('Epoch')
    plt.ylabel(train.capitalize())
    plt.legend(['training', 'validation'])
plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0.5, hspace=0.5)

# Interpret Model Performance

As we noted earlier, our model will likely mimic the distribution of the dataset at large. Below we will plot the distribution of predicted ratings in our validation dataset. The true distribution is shown in red, and the predicted distribution is shown in blue. We see that they match up closely, but our model is hesitant to assign extreme ratings (1 or 5).

In [0]:
import seaborn as sns

predicted_ratings = model.predict([user_val, movie_val])
predicted_ratings = np.round(predicted_ratings[(np.round(predicted_ratings) <= 5) & (np.round(predicted_ratings) >= 1)])
fig = plt.figure(figsize=(10, 5))
plt.subplot(2, 1, 1)
sns.countplot(x=rating_val, color = 'r', saturation = 1)  # pass the data as x=; newer seaborn versions require the keyword
plt.xlabel('Rating')
plt.ylabel('Count')
plt.title('Rating Distribution of Actual Ratings')
plt.subplot(2, 1, 2)
sns.countplot(x=predicted_ratings.flatten(), color = 'b', saturation = 1)  # pass the data as x=; newer seaborn versions require the keyword
plt.xlabel('Rating')
plt.ylabel('Count')
plt.title('Rating Distribution of Predicted Ratings')
plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0.5, hspace=0.5)
plt.show()

After creating and fitting the model, we want another way to see how closely it approximates the ratings that users gave to specific movies. We use a pandas dataframe to show some side-by-side comparisons between actual ratings and our predicted ratings. We will only use the users and movies from the validation set.

In [0]:
def print_list_comparison(model, user_val, movie_val, rating_val, num=5):
    # sample random validation examples (indices may repeat)
    inds = np.random.randint(0, high=rating_val.size, size=num)

    rows = []
    for ind in inds:
        usr = user_val[ind].astype(np.int64)
        mov = movie_val[ind].astype(np.int64)
        # predict once and reuse the value for both derived columns
        pred = model.predict([usr.reshape(1,), mov.reshape(1,)])[0][0]
        rows.append({"User ID": usr, "Movie ID": mov,
                     "Actual Rating": rating_val[ind],
                     "Predicted Rating": pred,
                     "Rating Difference": np.abs(pred - rating_val[ind])})

    # build the dataframe in one step; DataFrame.append was removed in pandas 2.0
    df = pd.DataFrame(rows, columns=[
                      "User ID", "Movie ID", "Actual Rating", "Predicted Rating", "Rating Difference"])
    print(df.to_string(index=False))


print_list_comparison(model, user_val, movie_val, rating_val, num=15)

To further assess the model, we will show which movies a user hated and loved (ratings of 1 and 5, respectively), and then show a graphical comparison between the predicted ratings and actual ratings. We use the movies rated by the most users, and users who rated all of those movies, so that our matrix has no empty cells. Dark blue is associated with a rating of 5, and light blue is associated with a rating of 1. In the final matrix, light blue is associated with a small difference between the actual and predicted rating, while dark blue is associated with a large difference. If the difference matrix is all light blue, we can consider the model to have generalized the preferences of the users. Note that the call below passes the full dataframe, so these ratings include examples the model saw during training.

In [0]:
def movie_matrix_comparisons(model, df, movies=10, users=10):
    movie_viewers = df['movie_id'].value_counts()[0:movies]
    movies_ = list(movie_viewers.index)
    num_user = list(movie_viewers.values)

    user_set = []
    for movie in movies_:
        user_set.append(
            set(df.loc[df['movie_id'] == movie].iloc[:, 0].values.flatten().tolist()))

    cross_users = list(set.intersection(*user_set))

    inds = np.random.randint(0, high=len(cross_users), size=users)

    users_ = [cross_users[ind] for ind in inds]

    # rows are users and columns are movies, matching the loops and axis labels below
    true_mat = np.zeros((users, movies))
    pred_mat = np.zeros((users, movies))

    for i, user in enumerate(users_):
        for j, movie in enumerate(movies_):
            # each lookup returns a one-element array; take the scalar explicitly
            true_mat[i, j] = df.loc[(df['movie_id'] == movie) & (
                df['user_id'] == user)]['rating'].values[0]
            pred_mat[i, j] = model.predict(
                [np.array(user).reshape(1,), np.array(movie).reshape(1,)])[0][0]

    fig, (ax0, ax1, ax2) = plt.subplots(3, 1, figsize=(20, 10))

    c = ax0.pcolor(true_mat, cmap='Blues', vmin=1.0, vmax=5.0)
    ax0.set_title('Actual Ratings')
    ax0.set_ylabel('User')
    ax0.set_xlabel('Movies')
    fig.colorbar(c, ax=ax0)

    c = ax1.pcolor(pred_mat, cmap='Blues', vmin=1.0, vmax=5.0)
    ax1.set_title('Predicted Ratings')
    ax1.set_ylabel('User')
    ax1.set_xlabel('Movies')
    fig.colorbar(c, ax=ax1)

    c = ax2.pcolor(np.abs(pred_mat - true_mat),
                   cmap='Blues', vmin=0.0, vmax=5.0)
    ax2.set_title('Difference Between Ratings')
    ax2.set_ylabel('User')
    ax2.set_xlabel('Movies')
    fig.colorbar(c, ax=ax2)

    fig.tight_layout()
    plt.show()

    return true_mat, pred_mat


true_mat, pred_mat = movie_matrix_comparisons(model, df)

Finally, we want to see how our model compares to simply averaging the ratings for each movie. This is a sanity check that the model learns user-specific preferences rather than just each movie's overall popularity. We will numerically compare the mean absolute error of the averaged ratings and of the predicted ratings against the true ratings. We should see that the error in the averaged ratings is larger than in the predicted ratings.

In [0]:
def average_comp(model, df, movies=10, users=10):
    
    num_ratings = movies*users
    
    movie_viewers = df['movie_id'].value_counts()[0:movies]
    movies_ = list(movie_viewers.index)
    num_user = list(movie_viewers.values)

    user_set = []
    for movie in movies_:
        user_set.append(
            set(df.loc[df['movie_id'] == movie].iloc[:, 0].values.flatten().tolist()))

    cross_users = list(set.intersection(*user_set))

    inds = np.random.randint(0, high=len(cross_users), size=users)

    users_ = [cross_users[ind] for ind in inds]

    # rows are users and columns are movies, matching the loops below
    true_mat = np.zeros((users, movies))
    pred_mat = np.zeros((users, movies))
    ave_mat = np.zeros((users, movies))

    for i, user in enumerate(users_):
        for j, movie in enumerate(movies_):
            # each lookup returns a one-element array; take the scalar explicitly
            true_mat[i, j] = df.loc[(df['movie_id'] == movie) & (
                df['user_id'] == user)]['rating'].values[0]
            pred_mat[i, j] = model.predict(
                [np.array(user).reshape(1,), np.array(movie).reshape(1,)])[0][0]
            ave_mat[i, j] = np.mean(df.loc[df['movie_id'] == movie]['rating'].values)

    print('Error between Average and Real: {}'.format(np.sum(np.abs(ave_mat - true_mat))/num_ratings))
    print('Error between Predicted and Real: {}'.format(np.sum(np.abs(pred_mat - true_mat))/num_ratings))

average_comp(model, df, movies = 100, users = 100)
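
The comparison can be illustrated on a toy example. All numbers below are made up, and the "predictions" are hand-written rather than produced by the model; the sketch only shows how the two mean absolute errors are computed and why a per-movie average cannot capture users with opposite tastes.

```python
import numpy as np

# Made-up ratings: rows are users, columns are movies.
true_ratings = np.array([
    [5.0, 1.0, 4.0],
    [1.0, 5.0, 2.0],
    [4.0, 2.0, 5.0],
])
# Made-up personalized predictions for the same cells.
predicted = np.array([
    [4.5, 1.5, 4.0],
    [1.5, 4.5, 2.5],
    [4.0, 2.0, 4.5],
])

# Baseline: every user gets the movie's average rating.
movie_means = true_ratings.mean(axis=0)
baseline = np.tile(movie_means, (true_ratings.shape[0], 1))

# Mean absolute error of each approach against the true ratings.
mae_baseline = np.mean(np.abs(baseline - true_ratings))
mae_model = np.mean(np.abs(predicted - true_ratings))

print(f"MAE of per-movie average: {mae_baseline:.3f}")
print(f"MAE of predictions:       {mae_model:.3f}")
```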

# References 
1. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8).
2. Cichocki, A., & Phan, A.-H. (2009). Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 92(3), 708-721.
3. Févotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9).
4. http://www.pitt.edu/~peterb/AIS2013/CollaborativeFiltering.pdf
5. http://kojinoshiba.com/recsys-cold-start/
6. https://github.com/Kaggle/kaggle-api/