# BOOK RECOMMENDATION SYSTEM

A book recommendation system based on k-nearest neighbors (KNN) suggests books from similarities in reading preferences. KNN places users or items (books) in a multidimensional space built from user-book interaction data, computes the distances between them, and selects the k nearest neighbors. This notebook takes the item-based route: each book is represented by the ratings it received from users, and the books closest to a given title are returned as recommendations.
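
As a rough illustration of the distance idea described above, the sketch below finds the nearest book to a query book in a tiny, made-up rating matrix. The data and the helper name `k_nearest` are assumptions for illustration only; the notebook itself uses scikit-learn further down.

```python
import numpy as np

# Toy rating matrix (hypothetical data): rows are books, columns are users.
ratings = np.array([
    [9.0, 0.0, 8.0, 0.0],   # book 0
    [8.0, 0.0, 9.0, 0.0],   # book 1
    [0.0, 7.0, 0.0, 10.0],  # book 2
])

def k_nearest(matrix, row, k):
    """Return indices of the k rows closest to `row` (Euclidean), excluding itself."""
    distances = np.linalg.norm(matrix - matrix[row], axis=1)
    order = np.argsort(distances)
    return [int(i) for i in order if i != row][:k]

nearest = k_nearest(ratings, row=0, k=1)
print(nearest)
```

Books 0 and 1 were rated similarly by the same users, so they end up close together, while book 2 sits far away.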

 <h3>Unsupervised Learning</h3>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sbn
%matplotlib inline

In [8]:
# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on_bad_lines="skip" replaces it
df = pd.read_csv("data/BX-Books.csv", on_bad_lines="skip", encoding='latin-1')





ISBN (International Standard Book Number) is a unique identifier assigned to books and book-like products. It consists of a numerical code typically represented as a 13-digit or 10-digit number. ISBNs help in identifying and managing books in libraries, bookstores, and online retailers, facilitating efficient cataloging and tracking of publications.

In [9]:
df.head(8)

Unnamed: 0,isbn,book_title,book_author,year_of_publication,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company
5,399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group
6,425176428,What If?: The World's Foremost Military Histor...,Robert Cowley,2000,Berkley Publishing Group
7,671870432,PLEADING GUILTY,Scott Turow,1993,Audioworks
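
Several ISBNs in the preview above are shorter than 10 characters, which suggests that leading zeros were lost somewhere along the way. A minimal sketch of the standard ISBN-10 checksum shows how this could be checked; the helper name `isbn10_is_valid` and the idea of padding with `zfill` are assumptions, not part of the original notebook.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check the ISBN-10 checksum: the weighted sum must be divisible by 11."""
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char.isdigit():
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10  # 'X' is only allowed as the final check character
        else:
            return False
        total += (10 - position) * value  # weights run from 10 down to 1
    return total % 11 == 0

raw = "195153448"          # 9 characters, as displayed in the preview
restored = raw.zfill(10)   # assumption: a single leading zero was dropped
print(isbn10_is_valid(raw), isbn10_is_valid(restored))
```

If the padded value passes the checksum, the short form was most likely a display or parsing artifact rather than a different identifier.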


In [10]:
df.columns

Index(['isbn', 'book_title', 'book_author', 'year_of_publication',
       'publisher'],
      dtype='object')

In [11]:
# on_bad_lines="skip" replaces the removed error_bad_lines=False
users = pd.read_csv("data/BX-Users.csv", on_bad_lines="skip", encoding='latin-1')





In [12]:
users.head(6)

Unnamed: 0,user_id,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",
5,6,"santa monica, california, usa",61.0


In [13]:
users.columns = ['user_id', 'location', 'age']

In [14]:
users.head(6)

Unnamed: 0,user_id,location,age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",
5,6,"santa monica, california, usa",61.0


In [15]:
users.columns

Index(['user_id', 'location', 'age'], dtype='object')

In [16]:
# on_bad_lines="skip" replaces the removed error_bad_lines=False
ratings = pd.read_csv("data/BX-Book-Ratings.csv", on_bad_lines="skip", encoding='latin-1')





In [17]:
ratings.head(6)

Unnamed: 0,user_id,isbn,rating
0,276725,034545104X,0
1,276726,155061224,5
2,276727,446520802,0
3,276729,052165615X,3
4,276729,521795028,6
5,276733,2080674722,0


In [18]:
df.shape

(271379, 5)

In [19]:
users.shape

(278859, 3)

In [20]:
ratings.shape

(1048575, 3)

In [21]:
ratings["user_id"].value_counts()

11676     13602
198711     7550
153662     6109
98391      5891
35859      5850
          ...  
104999        1
105002        1
105008        1
105014        1
123969        1
Name: user_id, Length: 95513, dtype: int64

In [22]:
ratings["user_id"].value_counts().shape

(95513,)

We will consider only the ratings of users who rated more than 200 books

In [23]:
ratings["user_id"].value_counts() > 200

11676      True
198711     True
153662     True
98391      True
35859      True
          ...  
104999    False
105002    False
105008    False
105014    False
123969    False
Name: user_id, Length: 95513, dtype: bool

In [24]:
i = ratings["user_id"].value_counts() > 200

In [25]:
i[i].shape

(815,)

In [26]:
i[i].index

Int64Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352,
            110973, 235105,
            ...
             88793,  33145, 116122,   9856,  73681,  28634,  59727, 188951,
            155916,  44296],
           dtype='int64', length=815)

In [27]:
ratings = ratings[ratings["user_id"].isin(i[i].index)]
ratings.head(10)

Unnamed: 0,user_id,isbn,rating
1456,277427,002542730X,10
1457,277427,26217457,0
1458,277427,003008685X,8
1459,277427,30615321,0
1460,277427,60002050,0
1461,277427,60006641,10
1462,277427,60159685,0
1463,277427,60177721,0
1464,277427,60192704,0
1465,277427,60542128,7


In [28]:
ratings.shape

(482728, 3)
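
The user filter above can be tried on a small, made-up frame to see what each step returns. The data and the threshold of 2 are assumptions that stand in for the notebook's 200.

```python
import pandas as pd

# Hypothetical ratings: user 1 rated three books, user 2 two, user 3 one.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "isbn": ["a", "b", "c", "a", "d", "b"],
    "rating": [5, 0, 8, 7, 0, 10],
})

min_ratings = 2  # stands in for the notebook's threshold of 200
counts = toy["user_id"].value_counts()
active_users = counts[counts > min_ratings].index  # strictly more than the threshold
filtered = toy[toy["user_id"].isin(active_users)]
print(filtered["user_id"].unique())
```

Because the comparison is `>`, a user with exactly `min_ratings` ratings is excluded; `>=` would keep them.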

Merge ratings with books (df)

In [29]:
rate_books = ratings.merge(df, on="isbn")
rate_books.head(10)

Unnamed: 0,user_id,isbn,rating,book_title,book_author,year_of_publication,publisher
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
2,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
3,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
4,13552,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
5,16795,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
6,24194,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
7,25981,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
8,26535,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
9,28204,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc


Books and ratings given by multiple users

In [30]:
ratings.shape

(482728, 3)

In [31]:
rate_books.shape

(446881, 7)

About 36,000 rating rows (482728 - 446881 = 35847) were dropped by the merge because their ISBNs have no match in the books table
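
One way to inspect rows that an inner merge drops is a left merge with `indicator=True`, which adds a `_merge` column to the result. The sketch below uses made-up frames; the names `toy_ratings` and `toy_books` are hypothetical.

```python
import pandas as pd

toy_ratings = pd.DataFrame({
    "user_id": [1, 2, 3],
    "isbn": ["a", "b", "zzz"],   # "zzz" has no entry in the books table
    "rating": [5, 0, 8],
})
toy_books = pd.DataFrame({
    "isbn": ["a", "b"],
    "book_title": ["Book A", "Book B"],
})

# how="left" keeps every rating; _merge says whether a matching book was found.
checked = toy_ratings.merge(toy_books, on="isbn", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(len(unmatched))
```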

In [32]:
no_ratings = rate_books.groupby("book_title")["rating"].count().reset_index()

In [33]:
no_ratings.rename(columns={"rating":"no_of_ratings"}, inplace=True)

In [34]:
no_ratings.head(10)

Unnamed: 0,book_title,no_of_ratings
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1
5,Dark Justice,1
6,Deceived,1
7,Earth Prayers From around the World: 365 Pray...,3
8,Final Fantasy Anthology: Official Strategy Gu...,3
9,Flight of Fancy: American Heiresses (Zebra Ba...,1


Attach no_of_ratings to rate_books by merging on book_title

In [35]:
final_ratings = rate_books.merge(no_ratings, on="book_title")
final_ratings.head()

Unnamed: 0,user_id,isbn,rating,book_title,book_author,year_of_publication,publisher,no_of_ratings
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,74
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,74
2,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,74
3,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,74
4,13552,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,74


In [36]:
final_ratings.shape

(446881, 8)

Each row now carries the total no_of_ratings that its book received

Next, remove books with fewer than 50 ratings, since they give too little signal for recommendation

In [37]:
final_ratings = final_ratings[ final_ratings["no_of_ratings"] >= 50 ]

In [38]:
final_ratings.shape

(50851, 8)
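
The count, merge and filter steps above could also be collapsed into one pass with `groupby(...).transform("count")`, which returns a per-row count aligned with the original frame. This is an alternative sketch on made-up data, not what the notebook ran.

```python
import pandas as pd

toy = pd.DataFrame({
    "book_title": ["A", "A", "A", "B", "C", "C"],
    "rating": [9, 0, 7, 10, 0, 5],
})

min_count = 2  # stands in for the notebook's threshold of 50
# transform("count") repeats each group's size on every row of that group.
per_row_count = toy.groupby("book_title")["rating"].transform("count")
popular = toy[per_row_count >= min_count]
print(sorted(popular["book_title"].unique()))
```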

In [39]:
final_ratings.drop_duplicates(["user_id", "book_title"], inplace=True)

In [40]:
final_ratings.shape

(49070, 8)

Finally, create a pivot table with books as rows, users as columns, and ratings as values

In [41]:
book_pivot = final_ratings.pivot_table(columns="user_id", index="book_title", values="rating")

In [42]:
book_pivot

user_id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,249628,249862,249894,250184,250405,250764,277427,277478,277639,278418
book_title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1984,9.0,,,,,,,,,,...,,,,,,,,,,
1st to Die: A Novel,,,,,,,,,,,...,0.0,,,,,,,,,
2nd Chance,,10.0,,,,,,,,,...,,,,,,,,,0.0,
4 Blondes,,,,,,,,,,0.0,...,,,,,,,,,,
84 Charing Cross Road,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,7.0,,...,,,,,,,,,,
You Belong To Me,,,,,,,,,,,...,,,,0.0,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,,,0.0,...,,,,,,,,,,
Zoya,,,,,,,,,,,...,,,,,,0.0,,,,


In [43]:
book_pivot.shape

(626, 804)

The user filter earlier kept 815 users

books = 626 and no_of_users = 804<br>
815-804 = 11

11 users rated only books that did not reach 50 ratings in total (or whose ISBNs were missing from the books table), so they dropped out together with those books

In [44]:
book_pivot.fillna(0, inplace=True)
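
The pivot and zero-fill steps can also be done in a single call through the `fill_value` argument of `pivot_table`. The sketch below uses made-up data. Note that after filling, a missing rating and a rating of 0 can no longer be told apart.

```python
import pandas as pd

toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "book_title": ["A", "B", "A", "B"],
    "rating": [9, 0, 7, 10],
})

# pivot_table averages duplicate (book, user) pairs by default;
# fill_value=0 replaces the missing combinations.
pivot = toy.pivot_table(index="book_title", columns="user_id",
                        values="rating", fill_value=0)
print(pivot.shape)
```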

In [45]:
book_pivot

user_id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,249628,249862,249894,250184,250405,250764,277427,277478,277639,278418
book_title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
84 Charing Cross Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,7.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [57]:
book_pivot.to_csv("data/book_pivot.csv")

In [2]:
book_pivot = pd.read_csv("data/book_pivot.csv")
book_pivot

Unnamed: 0,book_title,254,2276,2766,2977,3363,3757,4017,4385,6242,...,249628,249862,249894,250184,250405,250764,277427,277478,277639,278418
0,1984,9,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1st to Die: A Novel,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2nd Chance,0,10,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4 Blondes,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,84 Charing Cross Road,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
621,Year of Wonders,0,0,0,7,0,0,0,0,7,...,0,0,0,0,0,0,0,0,0,0
622,You Belong To Me,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
623,Zen and the Art of Motorcycle Maintenance: An ...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
624,Zoya,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [5]:
for i in book_pivot["book_title"] :
    print(i, "\n")

1984 

1st to Die: A Novel 

2nd Chance 

4 Blondes 

84 Charing Cross Road 

A Bend in the Road 

A Case of Need 

A Child Called \It\": One Child's Courage to Survive" 

A Civil Action 

A Day Late and a Dollar Short 

A Fine Balance 

A Heartbreaking Work of Staggering Genius 

A Is for Alibi (Kinsey Millhone Mysteries (Paperback)) 

A Lesson Before Dying (Vintage Contemporaries (Paperback)) 

A Man in Full 

A Map of the World 

A Painted House 

A Patchwork Planet 

A Prayer for Owen Meany 

A Thin Dark Line (Mysteries &amp; Horror) 

A Thousand Acres (Ballantine Reader's Circle) 

A Time to Kill 

A Virtuous Woman (Oprah's Book Club (Paperback)) 

A Walk to Remember 

A Widow for One Year 

A Wrinkle In Time 

A Wrinkle in Time 

ANGELA'S ASHES 

About a Boy 

Absolute Power 

Accident 

Airframe 

All Around the Town 

All I Really Need to Know 

All That Remains (Kay Scarpetta Mysteries (Paperback)) 

All the Pretty Horses (The Border Trilogy, Vol 1) 

Along Came a Spider (Alex

1. The pivot table is mostly zeros, and storing and processing them densely wastes memory and time <br>
2. To avoid this, use a sparse matrix (csr_matrix from scipy.sparse)
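
To put a number on the sparsity: the pivot table has 626 x 804 = 503,304 cells, while at most 49,070 rating rows went into it, so at least about 90% of the cells are zeros. The sketch below shows how density can be measured on a small, made-up matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4 x 5 rating matrix with only three non-zero entries.
dense = np.zeros((4, 5))
dense[0, 1] = 9.0
dense[2, 3] = 7.0
dense[3, 0] = 10.0

sparse = csr_matrix(dense)
# nnz is the number of stored (non-zero) entries.
density = sparse.nnz / (dense.shape[0] * dense.shape[1])
print(sparse.nnz, density)
```

A CSR matrix stores only the non-zero values plus their positions, so its size grows with `nnz` rather than with rows times columns.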

In [76]:
from scipy.sparse import csr_matrix

In [77]:
book_sparse = csr_matrix(book_pivot)

In [78]:
type(book_sparse)

scipy.sparse._csr.csr_matrix

# Model creation

In [79]:
from sklearn.neighbors import NearestNeighbors

In [80]:
model = NearestNeighbors(algorithm="brute")

model fitting

In [81]:
model.fit(book_sparse)

In [82]:
book_pivot.iloc[237, :]

user_id
254       0.0
2276      0.0
2766      0.0
2977      0.0
3363      0.0
         ... 
250764    0.0
277427    0.0
277478    0.0
277639    0.0
278418    0.0
Name: Invasion, Length: 804, dtype: float64

In [55]:
distance, suggestions = model.kneighbors(book_pivot.iloc[237, :].values.reshape(1, -1), n_neighbors=6)

In [84]:
distance

array([[ 0.        , 21.44761059, 24.31049156, 24.8394847 , 25.03996805,
        25.23885893]])

The distance of 0 belongs to the book at index 237 itself; the remaining neighbors follow in order of increasing distance

Next, look at suggestions, which holds the actual index positions of the suggested books

In [85]:
suggestions

array([[237, 448, 577,  80, 331, 108]], dtype=int64)

In [86]:
book_pivot.index[237]

'Invasion'
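
Since the query book always comes back as its own nearest neighbor, it is often convenient to skip the first position. The sketch below shows this on made-up data; the titles and matrix are assumptions. `NearestNeighbors` uses the Minkowski metric with p=2 by default, which is the Euclidean distance.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

titles = ["Book A", "Book B", "Book C", "Book D"]  # hypothetical titles
matrix = csr_matrix(np.array([
    [9.0, 0.0, 8.0],
    [8.0, 0.0, 9.0],
    [0.0, 7.0, 0.0],
    [0.0, 10.0, 1.0],
]))

knn = NearestNeighbors(algorithm="brute").fit(matrix)
distances, indices = knn.kneighbors(matrix[0], n_neighbors=3)

# Position 0 is the query itself (distance 0), so skip it.
neighbours = [titles[i] for i in indices[0][1:]]
print(neighbours)
```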

In [88]:
for i in range(len(suggestions)) :
    print(book_pivot.index[suggestions[i]])

Index(['Invasion', 'The Cradle Will Fall',
       "Tom Clancy's Op-Center (Tom Clancy's Op Center (Paperback))",
       "CAT'S EYE", 'Pleading Guilty', 'Cry Wolf'],
      dtype='object', name='book_title')


check for index number 54

In [90]:
distance, suggestions = model.kneighbors(book_pivot.iloc[54, :].values.reshape(1, -1), n_neighbors=6)

In [91]:
distance

array([[ 0.        , 34.0147027 , 34.79942528, 35.4682957 , 36.12478374,
        36.22154055]])

In [92]:
suggestions

array([[ 54, 616, 401, 134, 274, 393]], dtype=int64)

In [93]:
for i in range(len(suggestions)) :
    print(book_pivot.index[suggestions[i]])

Index(['Bag of Bones', 'Winter Moon', 'Strangers', 'Dragon Tears',
       'Master of the Game', 'Sole Survivor'],
      dtype='object', name='book_title')


Now check index 134 (Dragon Tears), which appears among the neighbors of index 54 above, and look for suggestions in common

In [94]:
distance, suggestions = model.kneighbors(book_pivot.iloc[134, :].values.reshape(1, -1), n_neighbors=6)

In [95]:
distance

array([[ 0.        , 25.35744467, 25.41653005, 26.92582404, 26.94438717,
        27.07397274]])

In [96]:
suggestions

array([[134, 401, 609, 448,  80,  59]], dtype=int64)

In [97]:
for i in range(len(suggestions)) :
    print(book_pivot.index[suggestions[i]])

Index(['Dragon Tears', 'Strangers', 'While My Pretty One Sleeps',
       'The Cradle Will Fall', "CAT'S EYE", 'Before I Say Good-Bye'],
      dtype='object', name='book_title')


1 of the 5 recommendations is shared with the previous list: index 401, Strangers

find index from book name since index is required to find suggestions

In [100]:
np.where(book_pivot.index == "Bag of Bones")[0][0]

54
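
An alternative to `np.where` for this lookup is `Index.get_loc`, which returns the position directly and raises `KeyError` when the label is absent. The small index below is made up for illustration.

```python
import pandas as pd

index = pd.Index(["1984", "Bag of Bones", "Zoya"], name="book_title")

# get_loc returns the integer position of a unique label.
position = index.get_loc("Bag of Bones")

try:
    index.get_loc("No Such Book")
    found = True
except KeyError:
    found = False

print(position, found)
```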

Function to recommend the book

In [46]:
def recommend_books(book_name, n) :
    recommended = []
    try :
        # position of the title in the pivot index
        book_id = np.where(book_pivot.index == book_name)[0][0]
        distance, suggestions = model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1), n_neighbors=n)
        # suggestions has shape (1, n), so row 0 holds every neighbor position
        recommended.append(book_pivot.index[suggestions[0]])
    except Exception :
        print("We could not find any of book you inserted in our system, Hence we cant recommend!, SORRY")
    return recommended

In [48]:
recommended = recommend_books("Dragon Tears", 8)
for i in recommended :
    print(i)

We could not find any of book you inserted in our system, Hence we cant recommend!, SORRY


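A title that is not in the pivot table makes the index lookup fail, so the function wraps the lookup in error handling

Note that the message above was printed for "Dragon Tears", a title that does exist. The catch-all `except` reports every failure as a missing book, so the real cause is hidden; a likely one is that `model` was not defined in this kernel session, since the later version that uses `loaded_model` works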
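
A narrower form of error handling would catch only the missing-title case and let other errors surface. The sketch below is one possible shape on made-up data; `recommend_titles`, `toy_pivot` and `toy_model` are hypothetical names, and unlike the notebook's function it returns plain title strings and leaves out the query book.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

toy_pivot = pd.DataFrame(
    [[9.0, 0.0, 8.0], [8.0, 0.0, 9.0], [0.0, 7.0, 0.0]],
    index=pd.Index(["Book A", "Book B", "Book C"], name="book_title"),
    columns=[1, 2, 3],
)
toy_model = NearestNeighbors(algorithm="brute").fit(toy_pivot.values)

def recommend_titles(book_name, n, pivot, knn):
    """Return up to n titles closest to book_name, or [] if the title is unknown."""
    try:
        position = pivot.index.get_loc(book_name)
    except KeyError:
        return []  # only a missing title is handled; other errors still surface
    # Ask for one extra neighbour because the query book comes back first.
    k = min(n + 1, len(pivot))
    _, indices = knn.kneighbors(pivot.iloc[[position]].values, n_neighbors=k)
    return [pivot.index[i] for i in indices[0] if i != position][:n]

print(recommend_titles("Book A", 1, toy_pivot, toy_model))
print(recommend_titles("No Such Book", 1, toy_pivot, toy_model))
```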

In [2]:
import pickle

In [118]:
with open("models/knn_model_brs.sav", 'wb') as f:
    pickle.dump(model, f)

In [None]:
# load the model from disk

In [3]:
with open("models/knn_model_brs.sav", 'rb') as f:
    loaded_model = pickle.load(f)
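
A quick way to gain confidence in a saved model is to compare neighbors before and after the round trip. The sketch below does this on made-up data with a temporary file; the file name `knn_demo.sav` is an assumption. Pickled scikit-learn models are generally only safe to load with the same library version and from trusted sources.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.neighbors import NearestNeighbors

data = np.array([[9.0, 0.0], [8.0, 1.0], [0.0, 7.0]])
knn = NearestNeighbors(algorithm="brute").fit(data)

path = os.path.join(tempfile.mkdtemp(), "knn_demo.sav")  # hypothetical file name
with open(path, "wb") as fh:
    pickle.dump(knn, fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)

# Index [1] of the kneighbors result is the array of neighbour positions.
same = np.array_equal(
    knn.kneighbors(data, n_neighbors=2)[1],
    restored.kneighbors(data, n_neighbors=2)[1],
)
print(same)
```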

In [50]:
distance, suggestions = loaded_model.kneighbors(book_pivot.iloc[134, :].values.reshape(1, -1), n_neighbors=6)

In [51]:
for i in range(len(suggestions)) :
    print(book_pivot.index[suggestions[i]])

Index(['Dragon Tears', 'Strangers', 'While My Pretty One Sleeps',
       'The Cradle Will Fall', "CAT'S EYE", 'Before I Say Good-Bye'],
      dtype='object', name='book_title')


In [52]:
def recommend_books(book_name, n) :
    recommended = []
    try :
        # position of the title in the pivot index; raises IndexError if absent
        book_id = np.where(book_pivot.index == book_name)[0][0]
        distance, suggestions = loaded_model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1), n_neighbors=n)
        # suggestions has shape (1, n), so row 0 holds every neighbor position
        recommended.append(book_pivot.index[suggestions[0]])
    except IndexError :
        print("We could not find any of book you inserted in our system, Hence we cant recommend!, SORRY")
    return recommended

In [53]:
recommended = recommend_books("Dragon Tears", 8)
for i in recommended :
    print(i)

Index(['Dragon Tears', 'Strangers', 'While My Pretty One Sleeps',
       'The Cradle Will Fall', "CAT'S EYE", 'Before I Say Good-Bye',
       'Pleading Guilty', 'Winter Moon'],
      dtype='object', name='book_title')


save book_pivot dataframe to disk in csv form
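
When the pivot table is read back from CSV, `book_title` comes back as an ordinary column unless `index_col` is given, and lookups against `book_pivot.index` then stop matching titles. The sketch below shows the difference on a made-up frame, using an in-memory buffer in place of a file.

```python
import io

import pandas as pd

toy_pivot = pd.DataFrame(
    [[9.0, 0.0], [0.0, 10.0]],
    index=pd.Index(["1984", "2nd Chance"], name="book_title"),
    columns=pd.Index([254, 2276], name="user_id"),
)

buffer = io.StringIO()   # stands in for a CSV file on disk
toy_pivot.to_csv(buffer)

buffer.seek(0)
plain = pd.read_csv(buffer)                              # book_title is a column
buffer.seek(0)
indexed = pd.read_csv(buffer, index_col="book_title")    # book_title is the index

print(plain.shape, indexed.shape)
```

After the round trip the user-id column labels are strings rather than integers, which matters if columns are selected by label.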

Done 


Thank You!