## Introduction
This notebook demonstrates memory-based and model-based collaborative filtering techniques for book recommendation. Recommendations are generated for the user with user_id 1839 using four different methods.

The 4 methods used for book recommendation are as follows:
1. Memory-Based Collaborative filtering
    
    a. User-based with Euclidean distance measure
    
    b. Item-based with Cosine Similarity measure


2. Model-based Collaborative filtering
    
    a. Matrix Factorization
    
    b. SVD++

Original data source can be found [here](https://github.com/zygmuntz/goodbooks-10k).

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Load the datasets
books = pd.read_csv('books.csv') # Book metadata
ratings_data = pd.read_csv('ratings.csv') # User ratings

In [3]:
# Preview the book metadata
books.head()

| | book_id | goodreads_book_id | best_book_id | work_id | books_count | isbn | isbn13 | authors | original_publication_year | original_title | ... | ratings_count | work_ratings_count | work_text_reviews_count | ratings_1 | ratings_2 | ratings_3 | ratings_4 | ratings_5 | image_url | small_image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2767052 | 2767052 | 2792775 | 272 | 439023483 | 9780439000000.0 | Suzanne Collins | 2008.0 | The Hunger Games | ... | 4780653 | 4942365 | 155254 | 66715 | 127936 | 560092 | 1481305 | 2706317 | https://images.gr-assets.com/books/1447303603m... | https://images.gr-assets.com/books/1447303603s... |
| 1 | 2 | 3 | 3 | 4640799 | 491 | 439554934 | 9780440000000.0 | J.K. Rowling, Mary GrandPré | 1997.0 | Harry Potter and the Philosopher's Stone | ... | 4602479 | 4800065 | 75867 | 75504 | 101676 | 455024 | 1156318 | 3011543 | https://images.gr-assets.com/books/1474154022m... | https://images.gr-assets.com/books/1474154022s... |
| 2 | 3 | 41865 | 41865 | 3212258 | 226 | 316015849 | 9780316000000.0 | Stephenie Meyer | 2005.0 | Twilight | ... | 3866839 | 3916824 | 95009 | 456191 | 436802 | 793319 | 875073 | 1355439 | https://images.gr-assets.com/books/1361039443m... | https://images.gr-assets.com/books/1361039443s... |
| 3 | 4 | 2657 | 2657 | 3275794 | 487 | 61120081 | 9780061000000.0 | Harper Lee | 1960.0 | To Kill a Mockingbird | ... | 3198671 | 3340896 | 72586 | 60427 | 117415 | 446835 | 1001952 | 1714267 | https://images.gr-assets.com/books/1361975680m... | https://images.gr-assets.com/books/1361975680s... |
| 4 | 5 | 4671 | 4671 | 245494 | 1356 | 743273567 | 9780743000000.0 | F. Scott Fitzgerald | 1925.0 | The Great Gatsby | ... | 2683664 | 2773745 | 51992 | 86236 | 197621 | 606158 | 936012 | 947718 | https://images.gr-assets.com/books/1490528560m... | https://images.gr-assets.com/books/1490528560s... |


## Preprocessing

The first step is to preprocess the data; in particular, we will reshape the ratings into a user-by-book matrix. We first merge the two files, which eliminates any ratings that do not have book metadata (if any).
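Why the merge drops ratings without metadata can be seen on a toy example. The frames and values below are made up for illustration: `pd.merge` defaults to `how='inner'`, so only `book_id` values present in both frames survive.

```python
import pandas as pd

# Hypothetical toy data: book 3 is rated but has no metadata row.
toy_books = pd.DataFrame({"book_id": [1, 2], "original_title": ["A", "B"]})
toy_ratings = pd.DataFrame({
    "user_id": [10, 11, 12],
    "book_id": [1, 2, 3],
    "rating": [5, 4, 3],
})

# how="inner" is pandas' default; it keeps only book_ids found in both frames,
# so the rating of book 3 is dropped.
toy_merged = pd.merge(toy_books, toy_ratings, on="book_id", how="inner")
print(toy_merged[["user_id", "book_id", "rating", "original_title"]])
```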

In [4]:
# Merge the two datasets
merged_data = pd.merge(books, ratings_data, on='book_id')[['user_id', 'book_id', 'rating', 'original_title']]

In [5]:
# drop rows with null values (e.g. books without an original_title)
merged_data.dropna(inplace=True)

In [6]:
merged_data.head()

| | user_id | book_id | rating | original_title |
|---|---|---|---|---|
| 0 | 2886 | 1 | 5 | The Hunger Games |
| 1 | 6158 | 1 | 5 | The Hunger Games |
| 2 | 3991 | 1 | 4 | The Hunger Games |
| 3 | 5281 | 1 | 5 | The Hunger Games |
| 4 | 5721 | 1 | 5 | The Hunger Games |


Working with the full data set can run into memory issues, so I keep only the users with ID less than or equal to 10000.
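A rough back-of-the-envelope check of the memory issue: the rating matrix and the user-by-user distance matrix computed later are both dense float64 arrays, so each needs rows × columns × 8 bytes. The sketch below assumes about 10,000 users remain after filtering and 10,000 books; these sizes are assumptions for illustration, not values read from the data.

```python
import numpy as np

def dense_matrix_gb(n_rows: int, n_cols: int, dtype=np.float64) -> float:
    """Memory for a dense n_rows x n_cols array of the given dtype, in GB (1e9 bytes)."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 1e9

# Assumed sizes after filtering: about 10,000 users and 10,000 books.
rating_matrix_gb = dense_matrix_gb(10_000, 10_000)   # users x books ratings
user_distance_gb = dense_matrix_gb(10_000, 10_000)   # users x users distances
print(rating_matrix_gb, user_distance_gb)
```

The user-by-user matrix grows quadratically with the number of users: doubling the users to 20,000 would take it from 0.8 GB to 3.2 GB.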

In [7]:
merged_data = merged_data[merged_data.user_id <= 10000]

#### First create the rating matrix. Replace any missing values with 0 afterwards. 

In [8]:
# select only required columns to create rating matrix
merged_rating = merged_data[['user_id','book_id', 'rating']]

In [9]:
# create rating matrix using pivot table
ratings = merged_rating.pivot_table(index = 'user_id', columns = 'book_id')

In [10]:
# drop the outer 'rating' level from the column MultiIndex so the columns are plain book_ids
ratings.columns = ratings.columns.droplevel()

In [11]:
ratings.head()

| user_id / book_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 9991 | 9992 | 9993 | 9994 | 9995 | 9996 | 9997 | 9998 | 9999 | 10000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | 5.0 | | | | | | 4.0 | ... | | | | | | | | | | |
| 2 | | 5.0 | | | 5.0 | | | 4.0 | | 5.0 | ... | | | | | | | | | | |
| 3 | | | | 3.0 | | | | | | | ... | | | | | | | | | | |
| 4 | | 5.0 | | 4.0 | 4.0 | | 4.0 | 4.0 | | 5.0 | ... | | | | | | | | | | |
| 5 | | | | | | 4.0 | | | | | ... | | | | | | | | | | |


In [12]:
# replace missing values with 0
ratings.fillna(0, inplace=True)
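As a compact alternative (a sketch on made-up data, not what the cells above ran), `pivot_table` can build the same kind of matrix in one call: passing `values='rating'` avoids the extra column level that `droplevel` removes, and `fill_value=0` covers the `fillna` step.

```python
import pandas as pd

# Hypothetical long-format ratings, in the same shape as merged_rating.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "book_id": [10, 20, 20, 30],
    "rating": [5, 3, 4, 2],
})

# values= gives single-level columns; fill_value=0 marks unrated books with 0.
# aggfunc defaults to 'mean', as in the pivot_table call above.
toy_matrix = toy.pivot_table(index="user_id", columns="book_id",
                             values="rating", fill_value=0)
print(toy_matrix)
```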

## 1.a. User-Based Collaborative Filtering
The first model is user-based collaborative filtering.

1. We use Euclidean distance to measure the similarity between users (chosen here mainly for exploration).
2. Use 100 neighbors when calculating the predicted scores.
3. Get the top 15 recommendations for the user with user_id 1839, with book titles and predicted ratings.
4. Store the recommendations in a variable so we can compare them with the other models later.
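The predicted score computed in `ubcf_ed` is a similarity-weighted average of the neighbors' ratings. Writing $N_k(u)$ for the $k = 100$ users most similar to user $u$ and $\text{sim}(u, v)$ for the distance-based similarity between users:

```latex
\hat{r}_{ui} = \frac{\sum_{v \in N_k(u)} \text{sim}(u, v)\, r_{vi}}{\sum_{v \in N_k(u)} \text{sim}(u, v)}
```

Because unread books are stored as $r_{vi} = 0$, a neighbor who has not read book $i$ adds nothing to the numerator but still adds its full weight to the denominator, which pulls the predicted scores well below the 1 to 5 rating scale.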

In [13]:
from sklearn.metrics.pairwise import euclidean_distances

In [14]:
# calculate user dissimilarity using euclidean distance
user_dis_sim = euclidean_distances(ratings)
user_dis_sim = pd.DataFrame(user_dis_sim, index = ratings.index, columns = ratings.index)

In [15]:
user_dis_sim.head()

| user_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 9991 | 9992 | 9993 | 9994 | 9995 | 9996 | 9997 | 9998 | 9999 | 10000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 51.137071 | 42.320208 | 52.488094 | 56.062465 | 55.569776 | 61.375891 | 50.774009 | 51.672043 | 51.691392 | ... | 51.146847 | 53.646994 | 51.097945 | 51.526692 | 72.083285 | 48.238988 | 51.971146 | 49.719212 | 52.201533 | 50.783856 |
| 2 | 51.137071 | 0.0 | 39.724048 | 51.32251 | 54.074023 | 54.064776 | 59.615434 | 50.009999 | 50.586559 | 53.544374 | ... | 48.548944 | 53.357286 | 50.695167 | 52.191953 | 68.694978 | 45.365185 | 49.091751 | 48.836462 | 52.668776 | 47.391982 |
| 3 | 42.320208 | 39.724048 | 0.0 | 46.583259 | 44.407207 | 45.044423 | 50.97058 | 39.560081 | 44.034078 | 46.292548 | ... | 38.483763 | 45.066617 | 42.520583 | 43.081318 | 64.319515 | 34.899857 | 44.42972 | 43.714986 | 44.158804 | 42.449971 |
| 4 | 52.488094 | 51.32251 | 46.583259 | 0.0 | 60.365553 | 60.024995 | 61.595454 | 54.083269 | 54.046276 | 56.648036 | ... | 55.560778 | 58.420887 | 55.371473 | 57.166424 | 72.02083 | 51.92302 | 48.207883 | 51.312766 | 57.271284 | 49.879856 |
| 5 | 56.062465 | 54.074023 | 44.407207 | 60.365553 | 0.0 | 58.111961 | 63.324561 | 54.726593 | 57.766772 | 59.489495 | ... | 52.678269 | 57.280014 | 56.780278 | 56.089215 | 72.849159 | 50.497525 | 58.395205 | 57.818682 | 57.428216 | 57.567352 |


We convert the Euclidean distance between users into a similarity with the formula $\text{sim}(u_1, u_2) = \frac{1}{1 + d(u_1, u_2)}$.

This maps every distance to a value in $(0, 1]$, with 1 for identical rating vectors and smaller values for users who are farther apart.
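A minimal numpy sketch of the distance-to-similarity conversion on two made-up rating vectors. Since $1/(1+d)$ decreases as $d$ grows, the users with the smallest distances are exactly the ones with the highest similarity.

```python
import numpy as np

def euclidean_to_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Similarity = 1 / (1 + Euclidean distance) between two rating vectors."""
    d = np.linalg.norm(x - y)
    return 1.0 / (1.0 + d)

# Hypothetical rating vectors over the same four books (0 = not rated).
u1 = np.array([5.0, 0.0, 4.0, 0.0])
u2 = np.array([5.0, 0.0, 4.0, 0.0])
u3 = np.array([1.0, 0.0, 0.0, 0.0])

print(euclidean_to_similarity(u1, u2))  # identical vectors: distance 0, similarity 1.0
print(euclidean_to_similarity(u1, u3))  # distance sqrt(16 + 16) = sqrt(32)
```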

In [16]:
# calculate user similarities from euclidean distance matrix
user_sim = 1/(1+user_dis_sim)

In [17]:
# first 5 records in the similarity matrix
user_sim.head()

| user_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 9991 | 9992 | 9993 | 9994 | 9995 | 9996 | 9997 | 9998 | 9999 | 10000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.0 | 0.01918 | 0.023084 | 0.018696 | 0.017525 | 0.017677 | 0.016032 | 0.019315 | 0.018985 | 0.018978 | ... | 0.019177 | 0.018299 | 0.019195 | 0.019038 | 0.013683 | 0.020309 | 0.018878 | 0.019716 | 0.018796 | 0.019311 |
| 2 | 0.01918 | 1.0 | 0.024556 | 0.019112 | 0.018157 | 0.01816 | 0.016497 | 0.019604 | 0.019385 | 0.018334 | ... | 0.020182 | 0.018397 | 0.019344 | 0.0188 | 0.014348 | 0.021568 | 0.019963 | 0.020066 | 0.018633 | 0.020665 |
| 3 | 0.023084 | 0.024556 | 1.0 | 0.021016 | 0.022023 | 0.021718 | 0.019242 | 0.024655 | 0.022205 | 0.021145 | ... | 0.025327 | 0.021708 | 0.022978 | 0.022685 | 0.015309 | 0.027855 | 0.022012 | 0.022364 | 0.022144 | 0.023015 |
| 4 | 0.018696 | 0.019112 | 0.021016 | 1.0 | 0.016296 | 0.016387 | 0.015976 | 0.018154 | 0.018167 | 0.017347 | ... | 0.01768 | 0.016829 | 0.017739 | 0.017192 | 0.013695 | 0.018895 | 0.020322 | 0.019116 | 0.017161 | 0.019654 |
| 5 | 0.017525 | 0.018157 | 0.022023 | 0.016296 | 1.0 | 0.016917 | 0.015546 | 0.017945 | 0.017016 | 0.016532 | ... | 0.01863 | 0.017159 | 0.017307 | 0.017516 | 0.013541 | 0.019418 | 0.016836 | 0.017001 | 0.017115 | 0.017074 |


#### Get top-n recommendations for a given user using user-based collaborative filtering

In [18]:
def ubcf_ed(user_id, n_neighbors, top_n, user_sim):
    '''
    Description: function to get top_n number of book recommendations for given user_id
    
    Input:
        user_id: The user of interest
        n_neighbors: Number of nearest neighbors used to predict each rating
        top_n: Top n recommendations to return
        user_sim: The user-user similarity matrix (DataFrame indexed by user_id)
    
    Output: 
        A list with the titles of the top n recommended books
    '''
    # Get the nearest neighbors; the top entry is normally the user itself (similarity 1.0), so skip it
    nearest_neighbors = user_sim[user_id].sort_values(ascending = False).iloc[1:(n_neighbors+1)]
    
    # Obtain predicted ratings for unread books
    unread_book_index = ratings.columns[ratings.loc[user_id] == 0]  # books unread by the given user
    missing_ratings = []
    for book_id in unread_book_index:
        neighbors_ratings = ratings.loc[nearest_neighbors.index, book_id]
        # similarity-weighted average of the neighbors' ratings (0 where a neighbor has not read the book)
        missing_ratings.append(sum(nearest_neighbors * neighbors_ratings) / sum(nearest_neighbors))
    
    # Sort the predicted ratings in descending order 
    missing_ratings = pd.Series(missing_ratings, index=unread_book_index).sort_values(ascending = False)
    
    # Extract top n books
    recommend_books = missing_ratings.index[:top_n]

    ubcf_rec_book_title = []
    # Print the recommendations
    for i in range(top_n):
        rec_book = recommend_books[i]
        rec_book_title = merged_data[merged_data['book_id'] == rec_book]['original_title'].values[0]
        rec_book_rating = missing_ratings.iloc[i]
        print("Recommendation", i+1, "is", rec_book_title, 
              ", with a predicted rating of", round(rec_book_rating, 4))
        
        # store results for comparison
        ubcf_rec_book_title.append(rec_book_title)
    return ubcf_rec_book_title


In [19]:
# Call function with required details to get recommendations
ubcf_rec_book_title = ubcf_ed(1839, 100, 15, user_sim)

Recommendation 1 is The Da Vinci Code , with a predicted rating of 1.5266
Recommendation 2 is O Alquimista , with a predicted rating of 1.3822
Recommendation 3 is Harry Potter and the Prisoner of Azkaban , with a predicted rating of 1.164
Recommendation 4 is Harry Potter and the Philosopher's Stone , with a predicted rating of 1.154
Recommendation 5 is Harry Potter and the Order of the Phoenix , with a predicted rating of 1.1414
Recommendation 6 is The Kite Runner  , with a predicted rating of 1.0462
Recommendation 7 is Harry Potter and the Goblet of Fire , with a predicted rating of 1.0238
Recommendation 8 is Harry Potter and the Half-Blood Prince , with a predicted rating of 0.9846
Recommendation 9 is Le Petit Prince , with a predicted rating of 0.9651
Recommendation 10 is Harry Potter and the Chamber of Secrets , with a predicted rating of 0.9648
Recommendation 11 is Angels & Demons  , with a predicted rating of 0.8677
Recommendation 12 is Harry Potter and the Deathly Hallows , with

## 1.b. Item-Based Collaborative Filtering
Next we will use item-based collaborative filtering. 

1. This time we will use cosine similarity to measure the similarity between items.
2. Use 100 neighbors when calculating the predicted scores.
3. Get the top 15 recommendations for user with user_id 1839. Get the book titles and predicted ratings.
4. Also store the recommendations in a variable.
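Cosine similarity compares two books' columns of the rating matrix; since missing ratings are stored as 0, only users who rated both books contribute to the dot product. `ibcf_cs` then predicts $\hat{r}_{ui} = \sum_{j \in N_k(i)} \text{sim}(i, j)\, r_{uj} \,/\, \sum_{j \in N_k(i)} \text{sim}(i, j)$, a weighted average of the user's own ratings of the $k = 100$ books most similar to book $i$. A small numpy sketch of the similarity on made-up vectors (the zero-norm guard is an assumption of this sketch):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two item columns; 0.0 if either column is all zeros."""
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0

# Hypothetical ratings of three books by four users (0 = not rated).
book_a = np.array([5.0, 4.0, 0.0, 0.0])
book_b = np.array([5.0, 4.0, 0.0, 3.0])
book_c = np.array([0.0, 0.0, 2.0, 0.0])

print(cosine_sim(book_a, book_b))  # users 1 and 2 rated both books: 41 / sqrt(41 * 50)
print(cosine_sim(book_a, book_c))  # no shared raters: 0.0
```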

In [20]:
from sklearn.metrics.pairwise import cosine_similarity

In [21]:
# Calculate books similarity
books_sim = cosine_similarity(ratings.T)
books_sim = pd.DataFrame(books_sim, index = ratings.columns, columns = ratings.columns)
books_sim.head(2)

| book_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 9991 | 9992 | 9993 | 9994 | 9995 | 9996 | 9997 | 9998 | 9999 | 10000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.0 | 0.42632 | 0.46969 | 0.378218 | 0.34151 | 0.397019 | 0.294236 | 0.289858 | 0.325832 | 0.325946 | ... | 0.016655 | 0.017877 | 0.02436 | 0.050888 | 0.037132 | 0.012379 | 0.006301 | 0.013394 | 0.057893 | 0.001413 |
| 2 | 0.42632 | 1.0 | 0.486069 | 0.542027 | 0.45641 | 0.188067 | 0.511171 | 0.445516 | 0.44561 | 0.467768 | ... | 0.024669 | 0.0 | 0.01338 | 0.035157 | 0.032404 | 0.009075 | 0.005433 | 0.028263 | 0.022892 | 0.030204 |


#### Get top-n recommendations for a given user using item-based collaborative filtering

In [22]:
def ibcf_cs(user_id, n_neighbors, top_n, books_sim):
    '''
    Description: function to get top_n number of book recommendations for given user_id
    
    Input:
        user_id: The user of interest
        n_neighbors: Number of most similar books used to predict each rating
        top_n: Top n recommendations to return
        books_sim: The book-book similarity matrix (DataFrame indexed by book_id)
    
    Output: 
        A list with the titles of the top n recommended books
    '''
    # Obtain unread book indices for given user_id
    unread_book_index = ratings.columns[ratings.loc[user_id] == 0]
    missing_ratings = []
    
    # Calculate predicted rating for each unread book
    for book_id in unread_book_index:
        # most similar books; the top entry is normally the book itself, so skip it
        nearest_neighbors = books_sim[book_id].sort_values(ascending = False).iloc[1:(n_neighbors+1)]
        neighbors_ratings = ratings.loc[user_id, nearest_neighbors.index]
        # similarity-weighted average of the user's own ratings of those books (0 if unread)
        missing_ratings.append(sum(nearest_neighbors * neighbors_ratings) / sum(nearest_neighbors))
    
    # Sort the predictions
    missing_ratings = pd.Series(missing_ratings, index=unread_book_index).sort_values(ascending = False)
    
    # Extract only the top n books
    recommend_books = missing_ratings.index[:top_n]
    
    ibcf_rec_book_title = []
    
    # Print the recommendations
    for i in range(top_n):
        rec_book = recommend_books[i]
        rec_book_title = merged_data[merged_data['book_id'] == rec_book]['original_title'].values[0]
        rec_book_rating = missing_ratings.iloc[i]
        
        print("recommendation ", i+1, " is ", rec_book_title, 
              ", with a predicted rating of", round(rec_book_rating, 4))

        # store results for comparison
        ibcf_rec_book_title.append(rec_book_title)
    return ibcf_rec_book_title

In [23]:
# Call function with required details to get recommendations
ibcf_rec_book_title = ibcf_cs(1839, 100, 15, books_sim)

recommendation  1  is  Secret Prey , with a predicted rating of 1.1567
recommendation  2  is  Sudden Prey , with a predicted rating of 1.1452
recommendation  3  is  Night Prey , with a predicted rating of 1.1232
recommendation  4  is  Mortal Prey , with a predicted rating of 1.1123
recommendation  5  is  Mind Prey , with a predicted rating of 1.1096
recommendation  6  is  Chosen Prey , with a predicted rating of 1.0448
recommendation  7  is  Heat Lightning , with a predicted rating of 0.9576
recommendation  8  is  Bad Blood , with a predicted rating of 0.8801
recommendation  9  is  Shock Wave , with a predicted rating of 0.7435
recommendation  10  is  The Graveyard Book , with a predicted rating of 0.7414
recommendation  11  is  Abraham Lincoln: Vampire Hunter , with a predicted rating of 0.6799
recommendation  12  is  Gathering Blue , with a predicted rating of 0.663
recommendation  13  is  The Scorch Trials , with a predicted rating of 0.6533
recommendation  14  is  Gathering Prey , 

## 2.a. Matrix Factorization
Now we will turn to model based methods. First we will look at Matrix Factorization. 

1. Use 3 latent factors.
2. Set the learning rate at 0.001 and beta at 0.01. Since it will take a while to run, we will run only 5 iterations.
3. Fit the model (it will take a while to run).
4. Get the top 15 recommendations for the user with user_id 1839. Return both book names and predicted ratings.
5. Store the recommendations in a variable.
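The `matrix_factorization` function below runs stochastic gradient descent over the observed ratings only ($r_{ij} > 0$), with $P$ of size $M \times K$, $Q$ of size $K \times N$, learning rate $\alpha$ (`alpha`) and regularization $\beta$ (`beta`). Its update rules and the error it tracks for the stopping test correspond to:

```latex
\begin{aligned}
e_{ij} &= r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj} \\
p_{ik} &\leftarrow p_{ik} + \alpha \left( 2\, e_{ij}\, q_{kj} - \beta\, p_{ik} \right) \\
q_{kj} &\leftarrow q_{kj} + \alpha \left( 2\, e_{ij}\, p_{ik} - \beta\, q_{kj} \right) \\
E &= \sum_{(i,j):\, r_{ij} > 0} \left[ \Big( r_{ij} - \sum_{k=1}^{K} p_{ik}\, q_{kj} \Big)^2 + \frac{\beta}{2} \sum_{k=1}^{K} \left( p_{ik}^2 + q_{kj}^2 \right) \right]
\end{aligned}
```

Each update is a gradient step on $(r_{ij} - \sum_k p_{ik} q_{kj})^2 + \frac{\beta}{2} \sum_k (p_{ik}^2 + q_{kj}^2)$. As written in the code, the $q_{kj}$ update uses the $p_{ik}$ value that was just updated in the same step.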

In [24]:
def matrix_factorization(R, P, Q, K, steps=5, alpha=0.001, beta=0.01):
    '''
    Inputs:
    R     : The ratings (of dimension M x N), with 0 marking a missing rating
    P     : an initial matrix of dimension M x K (user factors)
    Q     : an initial matrix of dimension K x N (book factors)
    K     : the number of latent features
    steps : the maximum number of steps to perform the optimization
    alpha : the learning rate
    beta  : the regularization parameter

    Outputs:
    the final matrices P and Q
    '''
    # Only observed (non-zero) ratings contribute; iterating over them directly avoids
    # scanning all M x N cells in pure Python. nonzero() returns the indices in
    # row-major order, the same order as a nested loop over rows and columns.
    rows, cols = R.nonzero()

    for step in range(steps):
        for i, j in zip(rows, cols):
            eij = R[i, j] - np.dot(P[i, :], Q[:, j])
            for k in range(K):
                P[i, k] = P[i, k] + alpha * (2 * eij * Q[k, j] - beta * P[i, k])
                Q[k, j] = Q[k, j] + alpha * (2 * eij * P[i, k] - beta * Q[k, j])
        # Regularized squared error over the observed ratings
        e = 0
        for i, j in zip(rows, cols):
            e = e + pow(R[i, j] - np.dot(P[i, :], Q[:, j]), 2)
            for k in range(K):
                e = e + (beta/2) * (pow(P[i, k], 2) + pow(Q[k, j], 2))
        if e < 0.001: # tolerance
            break
    return P, Q

In [25]:
np.random.seed(862)

# Initializations
M = ratings.shape[0] # Number of users
N = ratings.shape[1] # Number of books
K = 3 # Number of latent features

# Initial estimate of P and Q
P = np.random.rand(M,K)
Q = np.random.rand(K,N)
rating_np = np.array(ratings)

In [26]:
# Run the model fitting. 
P, Q = matrix_factorization(rating_np, P, Q, K)

In [27]:
# Get decomposed rating matrix
predicted_rating = np.matmul(P, Q)
predicted_rating = pd.DataFrame(predicted_rating, index = ratings.index, columns = ratings.columns)


In [28]:
# get the top 15 recommendations for user_id = 1839
select_ratings = predicted_rating.loc[1839].sort_values(ascending = False)[:15]
recommend_books = select_ratings.index

mf_rec_book_title=[]

# Print the recommendations
for i in range(15):
    rec_book = recommend_books[i]
    rec_book_title = merged_data[merged_data['book_id'] == rec_book]['original_title'].values[0]
    print("recommendation ", i+1, " is ", rec_book_title, 
          ", with a predicted rating of", round(select_ratings.iloc[i],4))
    mf_rec_book_title.append(rec_book_title)
    

recommendation  1  is  Jesus the Christ: A Study of the Messiah and His Mission according to Holy Scriptures both Ancient and Modern , with a predicted rating of 4.8311
recommendation  2  is  The Essential Calvin and Hobbes: A Calvin and Hobbes Treasury , with a predicted rating of 4.752
recommendation  3  is  Complete Harry Potter Boxed Set , with a predicted rating of 4.7015
recommendation  4  is  The Brothers K , with a predicted rating of 4.6951
recommendation  5  is  The Complete Anne of Green Gables Boxed Set , with a predicted rating of 4.6941
recommendation  6  is  Maus II : And Here My Troubles Began  , with a predicted rating of 4.6793
recommendation  7  is  Being Mortal: Medicine and What Matters in the End , with a predicted rating of 4.6703
recommendation  8  is  Words of Radiance , with a predicted rating of 4.6588
recommendation  9  is  The Complete Maus , with a predicted rating of 4.6473
recommendation  10  is  The Authoritative Calvin and Hobbes , with a predicted rat
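Unlike `ubcf_ed` and `ibcf_cs`, the cell above ranks every book, including those user 1839 has already rated, so some of the listed titles may not be new to the user; the printed list was produced without any such filter. A sketch of masking rated books out first, assuming `predicted_rating` and `ratings` share the same index and columns (they do, since the former is built from the latter's labels). The helper name `top_n_unrated` is hypothetical; with the notebook's objects it would be called as `top_n_unrated(predicted_rating.loc[1839], ratings.loc[1839], 15)`.

```python
import pandas as pd

def top_n_unrated(predicted_row: pd.Series, observed_row: pd.Series, n: int) -> pd.Series:
    """Top-n predictions restricted to items the user has not rated (observed == 0)."""
    unrated = observed_row == 0
    return predicted_row[unrated].sort_values(ascending=False).head(n)

# Hypothetical predicted and observed ratings for one user over four books.
predicted = pd.Series({1: 4.9, 2: 4.7, 3: 4.1, 4: 3.8})
observed = pd.Series({1: 5.0, 2: 0.0, 3: 0.0, 4: 0.0})

print(top_n_unrated(predicted, observed, 2))  # book 1 is excluded: already rated
```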

## 2.b. SVD++

First, we need to install the [surprise](http://surpriselib.com/) library (published on PyPI as `scikit-surprise`).

The method in this section, labelled SVD++ above, is the biased matrix factorization model that the surprise library implements under the name `SVD`; surprise uses the name `SVDpp` for SVD++, an extension of that model that also accounts for implicit feedback (which books a user has rated, regardless of the rating value). Therefore our task here is to use the [SVD](https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD) algorithm from the surprise library.

To use the surprise library, we first need to put the data into its accepted format. [Here](https://surprise.readthedocs.io/en/stable/getting_started.html#load-dom-dataframe-py) is an example of how this works. In general, we follow the steps below:

1. Set up a Reader class
2. Load the dataframe 
3. Build the data set using the build_full_trainset() method (see [here](https://surprise.readthedocs.io/en/stable/trainset.html) or [here](https://stackoverflow.com/questions/49263964/datasetautofolds-object-has-no-attribute-global-mean-on-python-surprise))


In [29]:
# Load the libraries
from surprise import Reader
from surprise import Dataset
from surprise.prediction_algorithms.matrix_factorization import SVD

In [30]:
# Step 1: Set up the reader class
reader = Reader(rating_scale=(1,5))

In [31]:
# Step 2: Load the dataframe. Use the merged data from above (not the pivoted data)
data = Dataset.load_from_df(merged_data[['user_id', 'book_id', 'rating']], reader)

In [32]:
# Step 3: Build the train set
svd_data = data.build_full_trainset()

Now that the data set is prepared, the next task is to build the model. The usage is similar to any sklearn model: first instantiate a model and set any hyperparameters, then fit the model. For this model, we use 5 latent factors, a learning rate of 0.01 for all parameters, a regularization parameter of 0.1 for all parameters, and a random state of 862.

In [33]:
# instantiate SVD algorithm
algo = SVD(n_factors=5, lr_all=0.01, reg_all= 0.1, random_state=862)
algo.fit(svd_data)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x11c5e3d68>

Now that we have fitted the model, we can make predictions. There are several ways to do this:

1. Calculate the individual predicted ratings $\hat{r}_{ui}$ using the equation given [here](https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD)
2. Calculate the overall rating matrix with some matrix multiplications and manipulations
3. Probably the easiest: use the `predict` function (see examples [here](https://surprise.readthedocs.io/en/stable/getting_started.html#predict-ratings2-py) and [here](https://predictivehacks.com/how-to-run-recommender-systems-in-python/)). You may not need to use the str() function.
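Options 1 and 2 rest on the prediction rule documented for surprise's `SVD`, $\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$ (global mean, user bias, item bias, and the dot product of the item and user factors). The sketch below uses made-up parameter values for 2 users, 3 books and 2 factors; computing the whole matrix with broadcasting also yields every individual $\hat{r}_{ui}$. With a fitted model, the corresponding values should be `algo.trainset.global_mean`, `algo.bu`, `algo.bi`, `algo.pu` and `algo.qi`, indexed by surprise's inner ids (convertible from raw ids with `algo.trainset.to_inner_uid` / `algo.trainset.to_inner_iid`).

```python
import numpy as np

# Hypothetical fitted parameters (not taken from the notebook's model).
mu = 3.8                                              # global mean rating
bu = np.array([0.2, -0.1])                            # user biases
bi = np.array([0.5, 0.0, -0.3])                       # item biases
pu = np.array([[0.1, 0.4], [0.3, -0.2]])              # user factors (users x factors)
qi = np.array([[0.2, 0.5], [0.0, 0.1], [-0.4, 0.3]])  # item factors (items x factors)

# Full predicted matrix: r_hat[u, i] = mu + b_u + b_i + q_i . p_u
r_hat = mu + bu[:, None] + bi[None, :] + pu @ qi.T

# predict() clips estimates to the rating scale by default; mirror that with (1, 5)
r_hat_clipped = np.clip(r_hat, 1, 5)
print(np.round(r_hat_clipped, 3))
```

The broadcasting route computes all predictions at once, whereas calling `predict` in a loop, as done below, is simpler to read and handles the id mapping and clipping automatically.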


#### Remove the books that user 1839 has read from the list of candidates

In [34]:
# get the list of the book ids
unique_ids = merged_data['book_id'].unique()

# get the list of the book ids that the userid 1839 has rated
book_ids_1839 = merged_data.loc[merged_data['user_id']==1839, 'book_id']

# remove the read books for the recommendations
books_to_predict = np.setdiff1d(unique_ids,book_ids_1839)

#### Recommendations using SVD

In [35]:
# map each book_id to its title once instead of filtering merged_data for every book
# (drop_duplicates keeps the first row per book, the same title a first-match lookup returns)
title_by_id = merged_data.drop_duplicates('book_id').set_index('book_id')['original_title']

# get book recommendations for user 1839
book_recs = []
for book_id in books_to_predict:
    book_recs.append((book_id, title_by_id[book_id], algo.predict(uid=1839, iid=book_id).est))

user_rec = (pd.DataFrame(book_recs, columns=['iid', 'book_title', 'predictions'])
            .sort_values('predictions', ascending=False)
            .head(15))

# print top 15 book recommendations for user 1839
svd_rec_book_title = []
for i in range(15):
    print("recommendation ", i+1, " is ", user_rec.iloc[i]['book_title'], 
          ", with a predicted rating of", round(user_rec.iloc[i]['predictions'], 4))
    svd_rec_book_title.append(user_rec.iloc[i]['book_title'])
    

recommendation  1  is  The Complete Calvin and Hobbes , with a predicted rating of 4.6455
recommendation  2  is  دیوان‎‎ [Dīvān] , with a predicted rating of 4.617
recommendation  3  is  I Want My Hat Back , with a predicted rating of 4.6018
recommendation  4  is  Jesus the Christ: A Study of the Messiah and His Mission according to Holy Scriptures both Ancient and Modern , with a predicted rating of 4.593
recommendation  5  is  The Sandman: King of Dreams , with a predicted rating of 4.5625
recommendation  6  is  Preach My Gospel (A Guide to Missionary Service) , with a predicted rating of 4.5219
recommendation  7  is   الرحيق المختوم: بحث في السيرة النبوية على صاحبها أفضل الصلاة والسلام  , with a predicted rating of 4.4992
recommendation  8  is  There's Treasure Everywhere: A Calvin and Hobbes Collection , with a predicted rating of 4.4974
recommendation  9  is  It's a Magical World: A Calvin and Hobbes Collection , with a predicted rating of 4.4843
recommendation  10  is  Attack of 

## Comparison

We have produced recommendations for user 1839 using four methods. As a last step, we put the four recommendation lists into a dataframe with one column per method and print it.

In [36]:
# Combine the four title lists column-wise; zip stops at the shortest list (15 titles each here)
rec_comparison = pd.DataFrame(list(zip(ubcf_rec_book_title, ibcf_rec_book_title, mf_rec_book_title, svd_rec_book_title)),
                                  columns=['User Based', 'Item Based', 'Matrix factorization', 'SVD matrix factorization'])



In [37]:
rec_comparison

| | User Based | Item Based | Matrix factorization | SVD matrix factorization |
|---|---|---|---|---|
| 0 | The Da Vinci Code | Secret Prey | Jesus the Christ: A Study of the Messiah and H... | The Complete Calvin and Hobbes |
| 1 | O Alquimista | Sudden Prey | The Essential Calvin and Hobbes: A Calvin and ... | دیوان‎‎ [Dīvān] |
| 2 | Harry Potter and the Prisoner of Azkaban | Night Prey | Complete Harry Potter Boxed Set | I Want My Hat Back |
| 3 | Harry Potter and the Philosopher's Stone | Mortal Prey | The Brothers K | Jesus the Christ: A Study of the Messiah and H... |
| 4 | Harry Potter and the Order of the Phoenix | Mind Prey | The Complete Anne of Green Gables Boxed Set | The Sandman: King of Dreams |
| 5 | The Kite Runner | Chosen Prey | Maus II : And Here My Troubles Began | Preach My Gospel (A Guide to Missionary Service) |
| 6 | Harry Potter and the Goblet of Fire | Heat Lightning | Being Mortal: Medicine and What Matters in the... | الرحيق المختوم: بحث في السيرة النبوية على صاح... |
| 7 | Harry Potter and the Half-Blood Prince | Bad Blood | Words of Radiance | There's Treasure Everywhere: A Calvin and Hobb... |
| 8 | Le Petit Prince | Shock Wave | The Complete Maus | It's a Magical World: A Calvin and Hobbes Coll... |
| 9 | Harry Potter and the Chamber of Secrets | The Graveyard Book | The Authoritative Calvin and Hobbes | Attack of the Deranged Mutant Killer Monster S... |


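Comparing the memory-based models, the user-based and item-based lists share no titles in the rows shown: user-based filtering favors widely read books such as the Harry Potter series, while item-based filtering is dominated by the Prey titles. The two model-based methods agree more closely: both rank *Jesus the Christ* near the top and both favor Calvin and Hobbes collections, although most individual titles still differ. The predicted ratings are also on different scales: the memory-based scores stay below about 1.6 because unread books are stored as 0 and pull the weighted averages down, while the model-based predictions for the top titles lie around 4.5 to 4.8.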
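To put a number on how different or similar two lists are, one option (not used in the analysis above) is the Jaccard overlap between the sets of recommended titles: shared titles divided by distinct titles. A minimal sketch with a hypothetical helper `overlap`; with the notebook's objects it could be applied as `overlap(rec_comparison['Matrix factorization'], rec_comparison['SVD matrix factorization'])`.

```python
def overlap(list_a, list_b) -> float:
    """Jaccard overlap |A & B| / |A | B| between two recommendation lists."""
    a, b = set(list_a), set(list_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical lists for illustration: 2 shared titles out of 4 distinct ones.
print(overlap(["X", "Y", "Z"], ["Y", "Z", "W"]))
```

A value of 0 means the two methods recommend entirely different books, and 1 means the same set of books (ignoring order).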