Functions of a Recommender System

The main functions of a recommender system are:

--It helps users deal with information overload by filtering the products recommended to them.
--It helps businesses generate more profit by selling more products.

In this article, we will build a book recommender system using KNN (k-nearest neighbors).

Collaborative and Content-Based Filtering

Collaborative filtering focuses on the ratings that users give to items. It is based on the “wisdom of the crowd”: it predicts which items to suggest from the taste information of other users, i.e., recommendations come from collaborative user ratings. Collaborative filtering is used by large organizations such as Amazon and Netflix. It suffers from the cold-start problem, sparsity, popularity bias, and the first-rater problem (an item cannot be recommended until someone has rated it).
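
As a rough illustration of this idea (not the method used later in this article), the sketch below predicts a user's rating for an unseen book as a similarity-weighted average of other users' ratings. The toy matrix, the helper cosine(), and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are books, 0 = not rated.
toy_ratings = np.array([
    [5, 4, 0],   # target user, has not rated book 2
    [5, 5, 4],   # neighbour 1
    [1, 2, 5],   # neighbour 2
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

target, book = 0, 2
neighbours = [1, 2]

# Compare users only on the books the target user has already rated.
rated = toy_ratings[target] > 0
sims = np.array([cosine(toy_ratings[target, rated], toy_ratings[u, rated])
                 for u in neighbours])

# Similarity-weighted average of the neighbours' ratings for the unseen book.
neighbour_ratings = toy_ratings[neighbours, book]
predicted = float(sims @ neighbour_ratings / sims.sum())
print(round(predicted, 2))
```

Neighbours whose past ratings point in the same direction as the target user's get more weight in the prediction. Production systems usually mean-center the ratings first, which this sketch leaves out for brevity.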

Content-based filtering recommends items to users based on item descriptions and the user's profile. For example, it can recommend products based on their textual description, movies based on their textual overview, or books based on associated keywords. It struggles when content is not well represented by keywords and when items are indistinguishable (items with the same set of features).
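
A minimal sketch of keyword-based matching follows, assuming each book is described by a small set of keywords. The titles, keywords, and the helpers jaccard() and most_similar() are made up for illustration; Jaccard overlap is just one simple similarity choice.

```python
# Hypothetical keyword sets for three books.
book_keywords = {
    "Book A": {"space", "war", "empire"},
    "Book B": {"space", "empire", "rebellion"},
    "Book C": {"cooking", "baking", "dessert"},
}

def jaccard(a, b):
    """Share of keywords two books have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b)

def most_similar(title):
    """Return the other book whose keyword set overlaps most with `title`."""
    others = [t for t in book_keywords if t != title]
    return max(others,
               key=lambda t: jaccard(book_keywords[title], book_keywords[t]))

print(most_similar("Book A"))
```

Two books with identical keyword sets would score 1.0 against each other and could not be told apart, which is the indistinguishable-items problem mentioned above.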

Here we use collaborative filtering.

Loading the Data
This tutorial uses the Book-Crossing dataset, which you can download from http://www2.informatik.uni-freiburg.de/~cziegler/BX/.

Let’s load the data into a pandas DataFrame:

In [15]:
import pandas as pd
import numpy as np

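# Read the ratings CSV file.
# on_bad_lines='warn' (pandas >= 1.3) skips malformed rows and reports them;
# it replaces error_bad_lines=False, which was removed in pandas 2.0.
ratings = pd.read_csv("data/Ratings.csv", sep=';', encoding='latin-1', on_bad_lines='warn')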

# Show top-5 records
ratings.head()






,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [16]:
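# Read the books CSV file.
# on_bad_lines='warn' (pandas >= 1.3) skips malformed rows and reports them;
# it replaces error_bad_lines=False, which was removed in pandas 2.0.
books = pd.read_csv("data/Books.csv", sep=';', encoding='latin-1', on_bad_lines='warn')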

# Show top-5 records
books.head()



b'Skipping line 6452: expected 8 fields, saw 9\nSkipping line 43667: expected 8 fields, saw 10\nSkipping line 51751: expected 8 fields, saw 9\n'
b'Skipping line 92038: expected 8 fields, saw 9\nSkipping line 104319: expected 8 fields, saw 9\nSkipping line 121768: expected 8 fields, saw 9\n'
b'Skipping line 144058: expected 8 fields, saw 9\nSkipping line 150789: expected 8 fields, saw 9\nSkipping line 157128: expected 8 fields, saw 9\nSkipping line 180189: expected 8 fields, saw 9\nSkipping line 185738: expected 8 fields, saw 9\n'
b'Skipping line 209388: expected 8 fields, saw 9\nSkipping line 220626: expected 8 fields, saw 9\nSkipping line 227933: expected 8 fields, saw 11\nSkipping line 228957: expected 8 fields, saw 10\nSkipping line 245933: expected 8 fields, saw 9\nSkipping line 251296: expected 8 fields, saw 9\nSkipping line 259941: expected 8 fields, saw 9\nSkipping line 261529: expect

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In the two code snippets above, we loaded the Ratings and Books data into pandas DataFrames.

Merge the Data

In this section, we will merge the ratings and books DataFrames on the ISBN column using the merge() function.

In [17]:
# Join the ratings and books DataFrames on ISBN (inner join by default)
rating_books = pd.merge(ratings, books, on="ISBN")

# Shape of the data
rating_books.shape

(1031397, 10)
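
Because merge() defaults to an inner join, ratings whose ISBN has no matching row in the books table are dropped silently. The sketch below shows this on made-up data and uses how="left" with indicator=True to list the unmatched ratings; the toy frames and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; ISBN "999" has no entry in toy_books.
toy_ratings = pd.DataFrame({
    "User-ID": [1, 1, 2],
    "ISBN": ["111", "222", "999"],
    "Book-Rating": [8, 0, 5],
})
toy_books = pd.DataFrame({
    "ISBN": ["111", "222", "333"],
    "Book-Title": ["Title A", "Title B", "Title C"],
})

# Default how="inner": only ISBNs present in both frames survive.
inner = pd.merge(toy_ratings, toy_books, on="ISBN")

# A left join with indicator=True marks ratings that lack book metadata.
left = pd.merge(toy_ratings, toy_books, on="ISBN", how="left", indicator=True)
unmatched = left[left["_merge"] == "left_only"]

print(inner.shape)
print(unmatched["ISBN"].tolist())
```

Running the same check on the real data would show how many ratings are lost in the join, for example because of rows skipped while reading Books.csv.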

Create Item-User Matrix using pivot_table()

In this section, we will create a pivot table with the book title as the index, the user ID as the columns, and the book rating as the values. Before that, we take a 1% sample of the whole dataset, because the merged data has about 1 million records. Without sampling, building the pivot table takes a very long time or may cause a memory error on a laptop with 8 GB of RAM.

In [18]:
# Take a 1% random sample; random_state makes the sample reproducible
rating_books_sample = rating_books.sample(frac=.01, random_state=1)

# Shape of the sample data
rating_books_sample.shape

(10314, 10)
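
The small sketch below, on a made-up 1,000-row frame, illustrates what frac and random_state do: frac=0.01 keeps 1% of the rows, and a fixed random_state returns the same rows on every run. The frame and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with 1,000 rows.
toy = pd.DataFrame({"row": range(1000)})

# Same frac and random_state give the same 10 rows each time.
sample_a = toy.sample(frac=0.01, random_state=1)
sample_b = toy.sample(frac=0.01, random_state=1)

print(sample_a.shape)
print(sample_a.index.equals(sample_b.index))
```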

Let’s create the pivot table:

In [44]:
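# Create the item-user matrix using pivot_table()
rating_books_pivot = rating_books_sample.pivot_table(index='Book-Title', columns='User-ID', values='Book-Rating')

# Fill missing ratings. fillna() with a Series matches the Series index (0, 1, 2, ...)
# against the column labels, so only columns whose User-ID is below the number of
# titles get a random constant from fill_list; this is not a per-cell random fill.
fill_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rating_books_pivot = rating_books_pivot.fillna(pd.Series(np.random.choice(fill_list, size=len(rating_books_pivot.index))))
# Every remaining missing rating is set to 1
rating_books_pivot = rating_books_pivot.fillna(1)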

# Show top-5 records
rating_books_pivot.head()

User-ID,77,243,244,254,289,384,424,446,472,507,...,278144,278188,278209,278418,278449,278554,278582,278767,278807,278843
Book-Title
Garfield Bigger and Better (Garfield (Numbered Paperback)),2.0,10.0,10.0,2.0,10.0,9.0,2.0,1.0,3.0,10.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
"Q-Zone (Star Trek The Next Generation, Book 48)",2.0,10.0,10.0,2.0,10.0,9.0,2.0,1.0,3.0,10.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
"'Isms: a dictionary of words ending in -ism, -ology, and -phobia,: With some similar terms, arranged in subject order",2.0,10.0,10.0,2.0,10.0,9.0,2.0,1.0,3.0,10.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
(Un)arranged Marriage,2.0,10.0,10.0,2.0,10.0,9.0,2.0,1.0,3.0,10.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
0373953194 Christmas Stories 1993,2.0,10.0,10.0,2.0,10.0,9.0,2.0,1.0,3.0,10.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
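
For comparison, a common convention (an alternative to the fill used in the cell above, not what this article's code does) is to treat a missing rating as 0, so that unrated cells contribute nothing to cosine similarity. The sketch below shows pivot_table() and fillna(0) on made-up data; the toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; user 1 rated book "A" twice.
toy = pd.DataFrame({
    "Book-Title": ["A", "A", "B", "C", "A"],
    "User-ID": [1, 2, 1, 2, 1],
    "Book-Rating": [8, 6, 5, 9, 10],
})

# pivot_table() averages duplicate (title, user) pairs by default (aggfunc="mean").
pivot = toy.pivot_table(index="Book-Title", columns="User-ID", values="Book-Rating")

# Treat "not rated" as 0.
filled = pivot.fillna(0)
print(filled)
```

With this fill, two books look similar only when the same users actually rated both of them, whereas filling gaps with non-zero constants makes every pair of books share those artificial values.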


In [20]:
rating_books_sample.to_csv("test.csv", sep=';')

Build Nearest Neighbour Model

It’s time to create a NearestNeighbors model for recommendations using the scikit-learn library.

In [45]:
# Import NearestNeighbors
from sklearn.neighbors import NearestNeighbors

# Build NearestNeighbors Object
model_nn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=7, n_jobs=-1)

# Fit the NearestNeighbor
model_nn.fit(rating_books_pivot)

NearestNeighbors(algorithm='brute', metric='cosine', n_jobs=-1, n_neighbors=7)
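
To see what metric='cosine' measures, the sketch below fits the same kind of model on three made-up rating vectors. Cosine distance compares the direction of two vectors, not their magnitude, so a vector and a scaled copy of it are at distance 0, while vectors with no overlapping ratings are at distance 1. The array X and the variable names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical rating vectors (rows are books, columns are users).
X = np.array([
    [5.0, 0.0, 3.0],
    [10.0, 0.0, 6.0],   # same direction as row 0, twice the magnitude
    [0.0, 4.0, 0.0],    # no users in common with row 0
])

nn = NearestNeighbors(metric="cosine", algorithm="brute").fit(X)

# Distances and row positions of the 3 nearest neighbours of row 0.
dist, idx = nn.kneighbors(X[[0]], n_neighbors=3)
print(np.round(dist[0], 3))
print(idx[0])
```

Note that the query row is itself part of the fitted data, so it comes back as one of its own neighbours at distance 0.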

Generate Recommendations

Let’s query the fitted NearestNeighbors model to generate a list of recommended books.

In [46]:
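# Get the 10 nearest neighbors of the query book
indices = model_nn.kneighbors(rating_books_pivot.loc[['0373953194 Christmas Stories 1993']], n_neighbors=10, return_distance=False)

# Print the recommended books
print("Recommended Books:")
print("==================")
# indices has shape (1, 10); index the titles with its first row, since pandas
# no longer supports multi-dimensional indexing on an Index
for rank, title in enumerate(rating_books_pivot.index[indices[0]], start=1):
    print(rank, ". ", title)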

Recommended Books:
==================
1 .  The Slime That Ate Sweet Valley (Sweet Valley Twins and Friends, No 53)
2 .  0373953194 Christmas Stories 1993
3 .  The Age of Reason (Les Chemins De La Liberte)
4 .  Loving Torment
5 .  Head Over Heels (Harlequin Temptation, No 97)
6 .  Buddenbrooks. Verfall einer Familie. Roman.
7 .  Savage Game  (Executioner #292)
8 .  The Hat
9 .  Casebook on Waiting for Godot
10 .  Meltdown
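
The query title appears in its own recommendation list because it is part of the fitted matrix. One way to handle this, sketched below on made-up data, is to ask for one extra neighbor and drop the query title. The function recommend(), the toy matrix, and the parameter names are hypothetical.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def recommend(pivot, model, title, k=5):
    """Return up to k titles nearest to `title`, excluding `title` itself."""
    # Ask for one extra neighbour, since the query usually matches itself.
    n = min(k + 1, len(pivot))
    query = pivot.loc[[title]].to_numpy()
    idx = model.kneighbors(query, n_neighbors=n, return_distance=False)[0]
    return [t for t in pivot.index[idx] if t != title][:k]

# Hypothetical item-user matrix: rows are titles, columns are user IDs.
toy_pivot = pd.DataFrame(
    [[9, 8, 0], [8, 9, 0], [0, 0, 7], [1, 0, 9]],
    index=["A", "B", "C", "D"], columns=[1, 2, 3], dtype=float,
)
toy_model = NearestNeighbors(metric="cosine", algorithm="brute")
toy_model.fit(toy_pivot.to_numpy())

print(recommend(toy_pivot, toy_model, "A", k=2))
```

Books "A" and "B" were rated highly by the same two users, so "B" is the closest match for "A".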





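The output of the recommendation cell lists the ten titles nearest to the query book. The query book itself appears in the list, because it is one of the rows the model was fitted on.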

Issues with KNN-Based Collaborative Filtering

--Popularity bias: KNN-based collaborative recommender systems are biased towards the books that have the most user ratings.
--Cold-start problem: When a new book is added to the catalog, it has far fewer user interactions and thus will rarely appear as a recommendation.
--Scalability issue: The book-user matrix becomes hard to manage as the number of users and books grows, since 90% of its values will be 0. Storing such a sparse matrix densely wastes space when the database holds millions of users and books.
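
The storage cost in the last point can be illustrated with a compressed sparse row (CSR) matrix from SciPy, which stores only the non-zero entries. The sketch below uses a made-up 1,000 x 1,000 matrix with about 1% of its cells filled; the sizes and variable names are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Hypothetical 1,000 x 1,000 item-user matrix with about 1% of cells rated.
dense = np.zeros((1000, 1000))
rows = rng.integers(0, 1000, size=10_000)
cols = rng.integers(0, 1000, size=10_000)
dense[rows, cols] = rng.integers(1, 11, size=10_000)   # ratings 1 to 10

sparse = csr_matrix(dense)

# Fraction of cells that are zero.
sparsity = 1 - sparse.nnz / (dense.shape[0] * dense.shape[1])

# Memory used by each representation, in bytes.
dense_bytes = dense.nbytes
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print(f"sparsity: {sparsity:.3f}")
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")
```

NearestNeighbors with algorithm='brute' and metric='cosine' accepts such a sparse matrix as input to fit(), so the dense pivot table does not have to be kept in memory when missing ratings are stored as zeros.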