# Book Recommendation System

In this notebook, we build a book recommendation system using item-based collaborative filtering. We start by downloading the dataset from Kaggle and then proceed with data preprocessing, exploratory data analysis, and model building.

## Step 1: Download the Dataset

Let's start by downloading the dataset from Kaggle, which contains information on books, users, and their ratings.


In [1]:
!kaggle datasets download -d arashnic/book-recommendation-dataset

Dataset URL: https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset
License(s): CC0-1.0
Downloading book-recommendation-dataset.zip to /content
 70% 17.0M/24.3M [00:00<00:00, 56.2MB/s]
100% 24.3M/24.3M [00:00<00:00, 63.8MB/s]


## Step 2: Unzip the Dataset

After downloading the dataset, we need to unzip the files to access the CSV files that contain the book data, user data, and ratings.


In [2]:
!unzip -o book-recommendation-dataset.zip


Archive:  book-recommendation-dataset.zip
  inflating: Books.csv               
  inflating: DeepRec.png             
  inflating: Ratings.csv             
  inflating: Users.csv               
  inflating: classicRec.png          
  inflating: recsys_taxonomy2.png    


## Step 3: Import Libraries and Load the Dataset

Now that we have unzipped the dataset, we can load the data into pandas DataFrames. The dataset includes three main files:
- **Books.csv**: Contains information about the books.
- **Users.csv**: Contains information about the users.
- **Ratings.csv**: Contains user ratings for the books.

We will load these files and take a look at the first few rows to understand the structure of the data.


In [4]:
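import pandas as pd
import numpy as np

# low_memory=False reads Books.csv in a single pass, which avoids the
# mixed-dtype warning pandas raises when it infers column types chunk by chunk
book_data = pd.read_csv('Books.csv', low_memory=False)
User_data = pd.read_csv('Users.csv')
rating_data = pd.read_csv('Ratings.csv')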


In [5]:
book_data.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [6]:
User_data.head()

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [7]:
rating_data.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


## Step 4: EDA and Data Preprocessing

Let's explore the dataset to gain insights into the distribution of books, users, and ratings. This step will help us understand the data better and guide us in building an effective recommendation system.


Before building the recommendation model, we need to preprocess the data. This involves:
- Handling missing values
- Filtering data to remove outliers or irrelevant information
- Encoding categorical variables if necessary

Let's start by examining the data for any necessary preprocessing steps.
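As a rough illustration of the first preprocessing step, the sketch below counts missing values per column. It runs on a small made-up frame (`toy_users` is a hypothetical stand-in for `User_data`, not the real file), so the numbers only show the shape of the result.

```python
import numpy as np
import pandas as pd

# Toy stand-in for Users.csv; the rows are invented for illustration only.
toy_users = pd.DataFrame({
    "User-ID": [1, 2, 3, 4],
    "Location": ["nyc, new york, usa", "stockton, california, usa",
                 None, "porto, v.n.gaia, portugal"],
    "Age": [np.nan, 18.0, np.nan, 17.0],
})

# Count and share of missing values in each column.
missing_summary = pd.DataFrame({
    "missing": toy_users.isna().sum(),
    "share": toy_users.isna().mean(),
})
print(missing_summary)
```

The same two calls (`isna().sum()` and `isna().mean()`) should work unchanged on `book_data`, `User_data`, and `rating_data`.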



In [8]:
book_data = book_data.drop(['Image-URL-S', 'Image-URL-M', 'Image-URL-L'], axis=1 )

In [9]:
book_data.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company


In [10]:
book_data.shape

(271360, 5)

In [11]:
book_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 5 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 271360 non-null  object
 1   Book-Title           271360 non-null  object
 2   Book-Author          271358 non-null  object
 3   Year-Of-Publication  271360 non-null  object
 4   Publisher            271358 non-null  object
dtypes: object(5)
memory usage: 10.4+ MB


In [12]:
book_data.dropna(inplace=True)

In [13]:
book_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 271356 entries, 0 to 271359
Data columns (total 5 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 271356 non-null  object
 1   Book-Title           271356 non-null  object
 2   Book-Author          271356 non-null  object
 3   Year-Of-Publication  271356 non-null  object
 4   Publisher            271356 non-null  object
dtypes: object(5)
memory usage: 12.4+ MB


In [14]:
book_data.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company


In [15]:
book_data.rename(columns={'Book-Title':'title','Book-Author':'author','Year-Of-Publication':'year','Publisher':'publisher'},inplace=True)

In [16]:
book_data.head(2)

Unnamed: 0,ISBN,title,author,year,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada


In [17]:
User_data.head()

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [18]:
User_data.rename(columns={'User-ID':'user id','Location':'location','Age':'age'},inplace=True)

In [19]:
rating_data.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [20]:
rating_data.rename(columns={'User-ID':'user id','Book-Rating':'rating'},inplace=True)

In [21]:
book_data.shape

(271356, 5)

In [22]:
rating_data.shape

(1149780, 3)

In [23]:
User_data.shape

(278858, 3)

In [24]:
User_data.head()

Unnamed: 0,user id,location,age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [25]:
rating_data['user id'].value_counts()


user id,count
11676,13602
198711,7550
153662,6109
98391,5891
35859,5850
...,...
116180,1
116166,1
116154,1
116137,1


In [26]:
rating_data['user id'].value_counts() >200

user id,count
11676,True
198711,True
153662,True
98391,True
35859,True
...,...
116180,False
116166,False
116154,False
116137,False


In [27]:
x=rating_data['user id'].value_counts()>200

In [28]:
x.dtype

dtype('bool')

In [29]:
z=x.index

In [30]:
x[x]

user id,count
11676,True
198711,True
153662,True
98391,True
35859,True
...,...
274808,True
28634,True
59727,True
268622,True


In [31]:
y = x[x].index

In [32]:
rating_data.head()

Unnamed: 0,user id,ISBN,rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [33]:
y

Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352, 110973,
       235105,
       ...
       260183,  73681,  44296, 155916,   9856, 274808,  28634,  59727, 268622,
       188951],
      dtype='int64', name='user id', length=899)

In [34]:
rating_data = rating_data[rating_data['user id'].isin(y)]


In [35]:
rating_data.shape

(526356, 3)

In [36]:
y.shape

(899,)
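The cells above keep only users with more than 200 ratings. The same idea can be sketched in a few lines on toy data; `toy_ratings` and the threshold `min_ratings = 2` are made up for illustration, and only the pattern carries over.

```python
import pandas as pd

# Toy ratings table; the values are invented for illustration.
toy_ratings = pd.DataFrame({
    "user id": [1, 1, 1, 2, 3, 3],
    "ISBN": ["a", "b", "c", "a", "b", "c"],
    "rating": [5, 0, 8, 7, 0, 9],
})

min_ratings = 2  # hypothetical threshold; the notebook uses 200

# Number of ratings per user, then the ids of users above the threshold.
counts = toy_ratings["user id"].value_counts()
active_users = counts[counts > min_ratings].index

# Keep only the rows that belong to those users.
filtered = toy_ratings[toy_ratings["user id"].isin(active_users)]
print(filtered)
```

Here only user 1 has more than two ratings, so `filtered` keeps that user's three rows.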

In [37]:
rating_with_books = rating_data.merge(book_data,on='ISBN')

In [38]:
rating_with_books.head()

Unnamed: 0,user id,ISBN,rating,title,author,year,publisher
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
2,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
3,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc
4,13552,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc


In [39]:
rating_with_books.shape

(487668, 7)
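`merge` performs an inner join by default, so ratings whose ISBN has no match in `book_data` are dropped, which would explain why the row count falls from 526356 to 487668. A small sketch of how such unmatched rows could be inspected, using made-up frames (`toy_ratings`, `toy_books`) and the `indicator` option of `merge`:

```python
import pandas as pd

# Toy frames; ISBN "999" has no matching book and "333" has no rating.
toy_ratings = pd.DataFrame({"ISBN": ["111", "222", "999"], "rating": [8, 0, 5]})
toy_books = pd.DataFrame({"ISBN": ["111", "222", "333"],
                          "title": ["Book A", "Book B", "Book C"]})

# Default inner join: only ISBNs present in both frames survive.
inner = toy_ratings.merge(toy_books, on="ISBN")

# A left join with indicator=True labels each row by where it was found.
checked = toy_ratings.merge(toy_books, on="ISBN", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(inner), "rows kept;", len(unmatched), "rating(s) without a book")
```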

In [40]:
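# Ratings dropped by the inner merge (no matching ISBN in book_data)
rating_data.shape[0] - rating_with_books.shape[0]

38688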

In [41]:
book_data['title'].unique().shape



(242132,)

In [42]:
rating_with_books.groupby('title')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7c5d61767ac0>

In [43]:
number_of_rating = rating_with_books.groupby('title')['rating'].count().reset_index()

In [44]:
number_of_rating.head(2)

Unnamed: 0,title,rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1


In [45]:
number_of_rating.rename(columns={'rating':'number of rating'},inplace=True)

In [46]:
final_rating = rating_with_books.merge(number_of_rating,on='title')

In [47]:
final_rating.shape

(487668, 8)

In [48]:
final_rating.head(2)

Unnamed: 0,user id,ISBN,rating,title,author,year,publisher,number of rating
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,82
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,82


In [49]:
x=final_rating['number of rating']>=50

In [50]:
final_rating=final_rating[x]

In [51]:
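# Reassign instead of using inplace=True: final_rating is a filtered slice,
# and modifying a slice in place triggers a SettingWithCopyWarning
final_rating = final_rating.drop_duplicates(['user id','title'])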

In [52]:
final_rating.shape

(59850, 8)
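The count, merge, and filter steps above can also be written with `groupby(...).transform`, which attaches the per-title count directly to each row. This is an alternative sketch on toy data, not the notebook's own code; `toy` and `min_count = 2` are assumptions for illustration.

```python
import pandas as pd

# Toy ratings; title "B" contains a duplicate (user 1 rated it twice).
toy = pd.DataFrame({
    "user id": [1, 2, 3, 1, 1, 2],
    "title": ["A", "A", "A", "B", "B", "C"],
    "rating": [5, 7, 0, 9, 9, 3],
})

min_count = 2  # hypothetical threshold; the notebook uses 50

# transform("count") returns one value per row: the size of that row's group.
toy["number of rating"] = toy.groupby("title")["rating"].transform("count")

# Keep titles with enough ratings, then one row per (user, title) pair.
popular = toy[toy["number of rating"] >= min_count]
popular = popular.drop_duplicates(["user id", "title"])
print(popular)
```

Title "C" has a single rating and is dropped; the duplicate rating of "B" by user 1 collapses to one row.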

In [53]:
pivot_table_data= final_rating.pivot_table(columns='user id',index='title',values='rating')

In [54]:
pivot_table_data

title \ user id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
1984,9.0,,,,,,,,,,...,,,,,,0.0,,,,
1st to Die: A Novel,,,,,,,,,,,...,,,,,,,,,,
2nd Chance,,10.0,,,,,,,,,...,,,,0.0,,,,,0.0,
4 Blondes,,,,,,,,,,0.0,...,,,,,,,,,,
84 Charing Cross Road,,,,,,,,,,,...,,,,,,10.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,7.0,,...,,,,,,0.0,,,,
You Belong To Me,,,,,,,,,,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,,,0.0,...,,,,,,0.0,,,,
Zoya,,,,,,,,,,,...,,,,,,,,,,


In [55]:
pivot_table_data.fillna(0,inplace=True)

In [56]:
pivot_table_data.shape

(742, 888)

In [57]:
pivot_table_data

title \ user id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
84 Charing Cross Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,7.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [58]:
from scipy.sparse import csr_matrix
book_sparse = csr_matrix(pivot_table_data)

In [59]:
print(book_sparse)

  (0, 0)	9.0
  (0, 16)	8.0
  (0, 37)	9.0
  (0, 42)	8.0
  (0, 59)	7.0
  (0, 135)	9.0
  (0, 156)	10.0
  (0, 170)	9.0
  (0, 187)	8.0
  (0, 220)	10.0
  (0, 221)	10.0
  (0, 245)	10.0
  (0, 277)	10.0
  (0, 359)	10.0
  (0, 370)	9.0
  (0, 389)	10.0
  (0, 477)	9.0
  (0, 491)	7.0
  (0, 507)	9.0
  (0, 554)	10.0
  (0, 604)	9.0
  (0, 611)	8.0
  (0, 695)	9.0
  (0, 734)	9.0
  (0, 762)	8.0
  :	:
  (740, 484)	8.0
  (740, 681)	9.0
  (741, 16)	8.0
  (741, 26)	7.0
  (741, 86)	10.0
  (741, 111)	5.0
  (741, 150)	6.0
  (741, 233)	7.0
  (741, 290)	8.0
  (741, 303)	9.0
  (741, 322)	10.0
  (741, 367)	10.0
  (741, 384)	9.0
  (741, 446)	7.0
  (741, 450)	9.0
  (741, 485)	7.0
  (741, 498)	10.0
  (741, 505)	8.0
  (741, 544)	10.0
  (741, 592)	8.0
  (741, 700)	10.0
  (741, 713)	8.0
  (741, 745)	10.0
  (741, 830)	9.0
  (741, 880)	8.0
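The pivot, fill, and sparse-conversion steps can be reproduced on a toy table to see how few cells actually hold a rating. The frame `toy` below is made up; note that after `fillna(0)` an unrated cell and a rating of 0 become indistinguishable.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy ratings: three titles, three users, four ratings in total.
toy = pd.DataFrame({
    "user id": [1, 2, 1, 3],
    "title": ["A", "A", "B", "C"],
    "rating": [9, 7, 10, 8],
})

# Titles as rows, users as columns; missing combinations become 0.
matrix = toy.pivot_table(index="title", columns="user id", values="rating").fillna(0)

# CSR stores only the non-zero entries of the dense matrix.
sparse = csr_matrix(matrix.values)
density = sparse.nnz / (sparse.shape[0] * sparse.shape[1])
print(matrix.shape, sparse.nnz, round(density, 3))
```

Applied to `pivot_table_data`, the same `nnz` ratio would give the share of the 742 x 888 cells that carry a non-zero rating.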


## Step 5: Building the Recommendation Model

### Collaborative Filtering

Collaborative filtering is a method that makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). We will use this technique to suggest books based on user ratings.
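To make the idea concrete, here is a minimal item-based sketch on a made-up title-by-user matrix: two titles count as similar when the same users rate them in a similar way. The `titles` list, the `ratings` array, and the choice of `metric="cosine"` are assumptions for illustration and are not the settings used in the cells that follow.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows are titles, columns are users; the values are invented ratings.
titles = ["A", "B", "C", "D"]
ratings = np.array([
    [9.0, 8.0, 0.0, 0.0],
    [8.0, 9.0, 0.0, 0.0],
    [0.0, 0.0, 7.0, 10.0],
    [0.0, 1.0, 8.0, 9.0],
])

# Brute-force neighbour search using cosine distance between rating rows.
knn = NearestNeighbors(metric="cosine", algorithm="brute")
knn.fit(ratings)

# Two neighbours for title "A": the title itself and its closest other title.
dist, idx = knn.kneighbors(ratings[[0]], n_neighbors=2)
nearest = titles[idx[0][1]]
print("closest to A:", nearest)
```

Titles "A" and "B" are rated highly by the same two users, so "B" comes out as the nearest neighbour of "A".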


In [60]:
from sklearn.neighbors import NearestNeighbors
model = NearestNeighbors(algorithm='brute')

In [61]:
model.fit(book_sparse)

In [62]:
distances,suggestions = model.kneighbors(pivot_table_data.iloc[0,:].values.reshape(1,-1),n_neighbors=6)

In [63]:
distances

array([[ 0.        , 47.5394573 , 49.06118629, 49.10193479, 49.53786431,
        49.61854492]])
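Because `NearestNeighbors` was created without a `metric` argument, it uses scikit-learn's default (Minkowski with `p=2`, that is, Euclidean distance), so the values above are straight-line distances between rating rows. A tiny check on made-up points:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Three toy points on a line through the origin.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Default settings: Minkowski metric with p=2 (Euclidean).
nn = NearestNeighbors(algorithm="brute").fit(X)
d, i = nn.kneighbors(X[[0]], n_neighbors=3)
print(d, i)
```

The distances from the origin come out as 0, 5, and 10, matching the 3-4-5 triangle. With Euclidean distance on raw ratings, titles with few ratings tend to sit close to each other regardless of who rated them, which is worth keeping in mind when reading the suggestions.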

In [64]:
suggestions

array([[  0, 372,   8, 212, 320,  33]])

In [65]:
suggestions[0][1]

372


In [67]:
pivot_table_data.iloc[237,:]

user id,Harry Potter and the Chamber of Secrets (Book 2)
254,9.0
2276,0.0
2766,0.0
2977,0.0
3363,0.0
...,...
275970,9.0
277427,0.0
277478,0.0
277639,0.0


In [68]:
for i in range(len(suggestions[0])):
    print(pivot_table_data.index[suggestions[0][i]])

1984
No Safe Place
A Civil Action
Foucault's Pendulum
Long After Midnight
Abduction


In [69]:
def recommend_book(book_name):
    # Titles removed by the popularity filter are not in the pivot table
    if book_name not in pivot_table_data.index:
        print("book not found")
        return
    # Row position of the requested title
    book_id = np.where(pivot_table_data.index == book_name)[0][0]
    # Six neighbours: the requested title itself plus five suggestions
    distances, suggestions = model.kneighbors(
        pivot_table_data.iloc[book_id, :].values.reshape(1, -1), n_neighbors=6)
    print('suggestions for ', book_name, " are :")
    for i in suggestions[0]:
        title = pivot_table_data.index[i]
        if title == book_name:
            continue  # skip the requested title
        print(title)

In [70]:
recommend_book('Classical Mythology')

book not found


In [71]:
recommend_book('Pleading Guilty')

suggestions for  Pleading Guilty  are :
No Safe Place
Long After Midnight
Exclusive
Journey
Absolute Power


In [72]:
recommend_book('The Da Vinci Code')

suggestions for  The Da Vinci Code  are :
Touching Evil
The Blue Nowhere : A Novel
Saving Faith
Zoya
Sea Glass: A Novel



In [74]:
from sklearn.metrics.pairwise import cosine_similarity

In [75]:
similarity_score=cosine_similarity(pivot_table_data)

In [76]:
similarity_score[0].shape

(742,)
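Each entry of `similarity_score` is the cosine of the angle between two rating rows: the dot product divided by the product of the row norms. The sketch below checks that formula against `cosine_similarity` on two made-up vectors (`a` and `b` are illustrative, not taken from the dataset).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two toy rating rows over the same three users.
a = np.array([9.0, 0.0, 8.0])
b = np.array([10.0, 0.0, 7.0])

# Cosine similarity by hand: dot product over the product of the norms.
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn returns the full pairwise matrix for the stacked rows.
sk = cosine_similarity(np.vstack([a, b]))
print(manual, sk[0, 1])
```

The diagonal of the result is 1 (every row is perfectly similar to itself), which is why a ranking by similarity normally starts with the queried title.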

In [77]:
import numpy as np

In [78]:
def recommend_book_sec(book_name):
    if book_name not in pivot_table_data.index:
        print("book not found")
        return

    # Row position of the requested title
    book_id = np.where(pivot_table_data.index == book_name)[0][0]
    # Rank every title by its similarity to the requested one; position 0 is
    # normally the title itself, so keep positions 1 to 5
    suggestions = sorted(enumerate(similarity_score[book_id]), key=lambda x: x[1], reverse=True)[1:6]

    for i in suggestions:
        if pivot_table_data.index[i[0]] == book_name:
            continue
        print(pivot_table_data.index[i[0]])


In [79]:
recommend_book_sec('Classical Mythology')

book not found


In [80]:
recommend_book_sec('1984')

Animal Farm
The Handmaid's Tale
The Catcher in the Rye
Lord of the Flies
The Vampire Lestat (Vampire Chronicles, Book II)


In [82]:
book_data.head()

Unnamed: 0,ISBN,title,author,year,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company



## Conclusion

In this notebook, we built a book recommendation system using collaborative filtering. We explored the dataset, preprocessed the data, and built two item-based recommenders: a nearest-neighbours model and a cosine-similarity matrix. The system can be further enhanced with more sophisticated techniques or by incorporating additional data.
