#### **Dataset Overview**

This dataset contains books, users, and user ratings used to train and evaluate the recommendation model. The goal is to 
build a system that recommends books tailored to each user's preferences.

#### **Contents of the Dataset**

- **Books (`Books.csv`):**
  - **ISBN:** International Standard Book Number, the unique identifier for each book.
  - **Book-Title:** The title of the book.
  - **Book-Author:** The book's author.
  - **Year-Of-Publication:** The year the book was published.
  - **Publisher:** The publishing company.
  - **Image-URL-S, Image-URL-M, Image-URL-L:** Links to small, medium, and large cover images.

- **Users (`users.csv`):**
  - **User-ID:** Unique identifier for each user.
  - **Location:** City, region, and country as a single text field.
  - **Age:** The user's age; this value may be missing.

- **Ratings (`Ratings.csv`):**
  - **User-ID:** The user who gave the rating.
  - **ISBN:** The book that was rated.
  - **Book-Rating:** Rating from 0 to 10. In the Book-Crossing data these files follow, 1-10 are explicit ratings and 0 marks an implicit rating (an interaction with no score).

#### **Dataset Source**

This dataset was obtained from Kaggle. The notebook below prepares it for training. It keeps only users with more than 200 ratings and books with at least 50 ratings, and it removes duplicate user-title ratings.

#### **Usage**

- **Training Machine Learning Models:** The dataset is used to train various recommendation algorithms, including collaborative filtering and content-based filtering techniques.
- **Evaluating Model Performance:** The data is split into training and test sets to evaluate the performance and accuracy of the recommendation models.
- **Generating Recommendations:** The trained models utilize this dataset to generate personalized book recommendations for users based on their preferences and reading history.
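
The notebook below does not show the train/test split step itself. The sketch below shows one way it could be done, assuming a random 80/20 hold-out of individual rating rows. The helper `train_test_split_ratings` and the toy data are hypothetical.

```python
import pandas as pd


def train_test_split_ratings(ratings: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    """Hold out a random fraction of rating rows as a test set (hypothetical helper)."""
    test = ratings.sample(frac=test_frac, random_state=seed)  # rows chosen for testing
    train = ratings.drop(test.index)                          # everything else is training data
    return train, test


# Toy ratings using the column names the notebook assigns later (user_id, ISBN, rating)
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "ISBN": ["a", "b", "a", "c", "b", "c", "a", "d", "c", "d"],
    "rating": [8, 0, 7, 5, 9, 0, 6, 4, 10, 3],
})
train, test = train_test_split_ratings(toy)
print(len(train), len(test))  # 8 2
```

A purely random row split can leave some users with no training ratings. Holding out a fraction of each user's ratings instead is a common alternative.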

#### **Access the Dataset**

You can access and explore the dataset on Kaggle using the following link: [Kaggle Dataset Link](#)

#### **Acknowledgements**

We extend our gratitude to the Kaggle community for providing such a valuable dataset, which is instrumental in building an effective Book Recommender System.

### **Book Recommender System using Collaborative Filtering (k-Nearest Neighbors)** 

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 

In [2]:
books = pd.read_csv('data/Books.csv')


In [3]:
books.head(2)

|   | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L |
|---|---|---|---|---|---|---|---|---|
| 0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... |
| 1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... |


In [4]:
books.shape

(271360, 8)

In [5]:
books.columns

Index(['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher',
       'Image-URL-S', 'Image-URL-M', 'Image-URL-L'],
      dtype='object')

In [6]:
books = books[['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher', 'Image-URL-L']]

In [7]:
books.head(2)

|   | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-L |
|---|---|---|---|---|---|---|
| 0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... |
| 1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... |


In [8]:
# The column names are long, so rename them to shorter ones
books.rename(columns={
    "Book-Title": "title",
    "Book-Author": "Author",
    "Year-Of-Publication": "year",
    "Publisher": "publisher",
    "Image-URL-L": "img_url",
}, inplace = True )

In [9]:
# Load the users Dataset 
users = pd.read_csv('data/users.csv')

In [10]:
users.head()

|   | User-ID | Location | Age |
|---|---|---|---|
| 0 | 1 | nyc, new york, usa | NaN |
| 1 | 2 | stockton, california, usa | 18.0 |
| 2 | 3 | moscow, yukon territory, russia | NaN |
| 3 | 4 | porto, v.n.gaia, portugal | 17.0 |
| 4 | 5 | farnborough, hants, united kingdom | NaN |


In [11]:
users.shape

(278858, 3)

In [12]:
# load the Ratings dataset
ratings = pd.read_csv('data/Ratings.csv')

In [13]:
ratings.head()

|   | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


In [14]:
ratings.shape

(1149780, 3)
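
Two of the five rows shown by `ratings.head()` have a `Book-Rating` of 0. In the original Book-Crossing release, which uses the same column names, 1-10 are explicit ratings and 0 marks an implicit rating with no score. The notebook keeps both kinds. After the pivot's missing values are filled with 0 later on, an implicit rating becomes indistinguishable from "not rated". The minimal sketch below separates the two kinds. The helper `split_explicit_implicit` is hypothetical, and the data are the five rows displayed above.

```python
import pandas as pd


def split_explicit_implicit(ratings: pd.DataFrame, col: str = "Book-Rating"):
    """Separate explicit ratings (1-10) from implicit ones (0)."""
    explicit = ratings[ratings[col] > 0]
    implicit = ratings[ratings[col] == 0]
    return explicit, implicit


# The five rows displayed by ratings.head() above
sample = pd.DataFrame({
    "User-ID": [276725, 276726, 276727, 276729, 276729],
    "ISBN": ["034545104X", "0155061224", "0446520802", "052165615X", "0521795028"],
    "Book-Rating": [0, 5, 0, 3, 6],
})
explicit, implicit = split_explicit_implicit(sample)
print(len(explicit), len(implicit))  # 3 2
```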

In [15]:
print(books.shape)
print(users.shape)
print(ratings.shape)


(271360, 6)
(278858, 3)
(1149780, 3)


In [16]:
# let's rename ratings' columns 
ratings.rename(columns={
    "User-ID": "user_id",
    "Book-Rating": "rating"}, inplace = True)

In [17]:
ratings.head()

|   | user_id | ISBN | rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


In [18]:
ratings['user_id'].value_counts()

11676     13602
198711     7550
153662     6109
98391      5891
35859      5850
          ...  
116180        1
116166        1
116154        1
116137        1
276723        1
Name: user_id, Length: 105283, dtype: int64

In [19]:
ratings['user_id'].unique().shape

(105283,)

In [20]:
# Many users have rated only one, two or three books, so exclude them
# and keep only users who have rated more than 200 books

x = ratings['user_id'].value_counts() > 200

In [21]:
x[x].shape

(899,)

Only 899 users have rated more than 200 books.

In [22]:
# Get the IDs of those users (the index of the filtered boolean Series)

y = x[x].index 

In [23]:
y

Int64Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352,
            110973, 235105,
            ...
            260183,  73681,  44296, 155916,   9856, 274808,  28634,  59727,
            268622, 188951],
           dtype='int64', length=899)

In [24]:
# Keep only the ratings given by these active users

ratings = ratings[ratings['user_id'].isin(y)]

In [25]:
ratings.head()

Unnamed: 0,user_id,ISBN,rating
1456,277427,002542730X,10
1457,277427,0026217457,0
1458,277427,003008685X,8
1459,277427,0030615321,0
1460,277427,0060002050,0


In [26]:
ratings.shape

(526356, 3)

Of the 1,149,780 ratings, 526,356 were given by these 899 active users.
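
Cells 20-24 select the active users and then filter their ratings. The sketch below wraps the same steps in one reusable helper. The name `filter_active_users` and the toy data are hypothetical, and the strict `>` mirrors `value_counts() > 200`.

```python
import pandas as pd


def filter_active_users(ratings: pd.DataFrame, min_ratings: int = 200,
                        user_col: str = "user_id") -> pd.DataFrame:
    """Keep only rows from users with more than `min_ratings` ratings."""
    counts = ratings[user_col].value_counts()        # number of ratings per user
    active = counts[counts > min_ratings].index       # IDs of the active users
    return ratings[ratings[user_col].isin(active)]    # their ratings only


toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3],
    "ISBN": ["a", "b", "c", "d", "a", "b"],
    "rating": [5, 0, 7, 8, 0, 3],
})
print(filter_active_users(toy, min_ratings=2)["user_id"].unique().tolist())  # [1]
```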

In [27]:
# now merge the ratings and books datasets 
ratings_with_books = ratings.merge(books, on = "ISBN")

ratings_with_books.head()



|   | user_id | ISBN | rating | title | Author | year | publisher | img_url |
|---|---|---|---|---|---|---|---|---|
| 0 | 277427 | 002542730X | 10 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... |
| 1 | 3363 | 002542730X | 0 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... |
| 2 | 11676 | 002542730X | 6 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... |
| 3 | 12538 | 002542730X | 10 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... |
| 4 | 13552 | 002542730X | 0 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... |


In [28]:
ratings_with_books.shape

(487671, 8)
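
The merge shrinks the table from 526,356 rows to 487,671, a net loss of 38,685 rows. `merge` defaults to an inner join, so the likely cause is ratings whose ISBN has no row in `books`. The sketch below uses hypothetical toy data to show how `indicator=True` can count such rows before they disappear.

```python
import pandas as pd

# Toy stand-ins for the filtered ratings and the books table (hypothetical values)
toy_ratings = pd.DataFrame({"user_id": [1, 1, 2], "ISBN": ["a", "b", "z"], "rating": [5, 0, 7]})
toy_books = pd.DataFrame({"ISBN": ["a", "b"], "title": ["Book A", "Book B"]})

# A left join with indicator=True labels each row 'both' or 'left_only'
audit = toy_ratings.merge(toy_books, on="ISBN", how="left", indicator=True)
unmatched = int((audit["_merge"] == "left_only").sum())
print(unmatched)  # 1  (ISBN 'z' has no entry in the books table)

# The default inner join, as used in the notebook, drops that row
inner = toy_ratings.merge(toy_books, on="ISBN")
print(len(inner))  # 2
```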

### **Group `ratings_with_books` by `title` and count the ratings to see how many ratings each book has received**  

In [29]:
num_rating = ratings_with_books.groupby('title')['rating'].count().reset_index()


In [30]:
num_rating.head()

|   | title | rating |
|---|---|---|
| 0 | A Light in the Storm: The Civil War Diary of ... | 2 |
| 1 | Always Have Popsicles | 1 |
| 2 | Apple Magic (The Collector's series) | 1 |
| 3 | Beyond IBM: Leadership Marketing and Finance ... | 1 |
| 4 | Clifford Visita El Hospital (Clifford El Gran... | 1 |


In [31]:
# Group by 'title' and count the 'rating' occurrences
num_rating = ratings_with_books.groupby('title')['rating'].count().reset_index()

# Verify the structure of the DataFrame
print(num_rating.head())
print(num_rating.columns)


                                               title  rating
0   A Light in the Storm: The Civil War Diary of ...       2
1                              Always Have Popsicles       1
2               Apple Magic (The Collector's series)       1
3   Beyond IBM: Leadership Marketing and Finance ...       1
4   Clifford Visita El Hospital (Clifford El Gran...       1
Index(['title', 'rating'], dtype='object')


In [32]:
# Rename the column
num_rating.rename(columns={'rating': 'num_of_rating'}, inplace=True)

# Verify the changes
print(num_rating.head())
print(num_rating.columns)


                                               title  num_of_rating
0   A Light in the Storm: The Civil War Diary of ...              2
1                              Always Have Popsicles              1
2               Apple Magic (The Collector's series)              1
3   Beyond IBM: Leadership Marketing and Finance ...              1
4   Clifford Visita El Hospital (Clifford El Gran...              1
Index(['title', 'num_of_rating'], dtype='object')


##### **Judge each book by how many ratings it has received: a book with 50 or more ratings is treated as worth reading and kept, and books with fewer ratings are dropped. This measures popularity, not how highly the book is rated.** 


In [33]:
# Attach each title's rating count (num_of_rating) to every rating row
final_rating = ratings_with_books.merge(num_rating, on='title')
final_rating.head()

|   | user_id | ISBN | rating | title | Author | year | publisher | img_url | num_of_rating |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 277427 | 002542730X | 10 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 1 | 3363 | 002542730X | 0 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 2 | 11676 | 002542730X | 6 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 3 | 12538 | 002542730X | 10 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 4 | 13552 | 002542730X | 0 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |


In [34]:
final_rating.shape

(487671, 9)

In [35]:
# Keep only books with at least 50 ratings
final_rating = final_rating[final_rating['num_of_rating'] >= 50]

In [36]:
final_rating.sample(10)

|   | user_id | ISBN | rating | title | Author | year | publisher | img_url | num_of_rating |
|---|---|---|---|---|---|---|---|---|---|
| 115116 | 241980 | 0553582747 | 0 | From the Corner of His Eye | Dean Koontz | 2001 | Bantam Books | http://images.amazon.com/images/P/0553582747.0... | 71 |
| 101000 | 43246 | 0316666009 | 0 | 1st to Die: A Novel | James Patterson | 2001 | Little Brown and Company | http://images.amazon.com/images/P/0316666009.0... | 162 |
| 447 | 135831 | 0060934417 | 0 | Bel Canto: A Novel | Ann Patchett | 2002 | Perennial | http://images.amazon.com/images/P/0060934417.0... | 108 |
| 52708 | 211430 | 0140298479 | 0 | Bridget Jones: The Edge of Reason | Helen Fielding | 2001 | Penguin Books | http://images.amazon.com/images/P/0140298479.0... | 73 |
| 12073 | 128696 | 0679751521 | 0 | Midnight in the Garden of Good and Evil | John Berendt | 1999 | Vintage Books USA | http://images.amazon.com/images/P/0679751521.0... | 62 |
| 64981 | 117539 | 0671510053 | 5 | SHIPPING NEWS | Annie Proulx | 1994 | Scribner | http://images.amazon.com/images/P/0671510053.0... | 113 |
| 15709 | 244349 | 0440234743 | 0 | The Testament | John Grisham | 1999 | Dell | http://images.amazon.com/images/P/0440234743.0... | 182 |
| 6763 | 13552 | 0440176484 | 6 | Secrets | DANIELLE STEEL | 1986 | Dell | http://images.amazon.com/images/P/0440176484.0... | 68 |
| 7120 | 71712 | 0440217466 | 0 | Vanished | Danielle Steel | 1994 | Dell | http://images.amazon.com/images/P/0440217466.0... | 67 |
| 91169 | 96054 | 0446532231 | 0 | Dude, Where's My Country? | Michael Moore | 2003 | Warner Books | http://images.amazon.com/images/P/0446532231.0... | 56 |


In [37]:
final_rating.shape

(61853, 9)
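
The count, rename, merge and filter steps above can also be done in one pass. `groupby(...).transform("count")` returns a count aligned with every row, so no merge is needed. This is a minimal sketch: the helper name `keep_popular_titles` and the toy data are hypothetical.

```python
import pandas as pd


def keep_popular_titles(df: pd.DataFrame, min_ratings: int = 50,
                        title_col: str = "title", rating_col: str = "rating") -> pd.DataFrame:
    """Add a num_of_rating column and keep titles with at least `min_ratings` ratings."""
    counts = df.groupby(title_col)[rating_col].transform("count")  # one count per row
    out = df.assign(num_of_rating=counts)
    return out[out["num_of_rating"] >= min_ratings]


toy = pd.DataFrame({
    "user_id": [1, 2, 3, 1],
    "title": ["Book A", "Book A", "Book A", "Book B"],
    "rating": [0, 8, 5, 9],
})
print(keep_popular_titles(toy, min_ratings=2)["title"].unique().tolist())  # ['Book A']
```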

In [38]:
# Drop duplicate (user_id, title) pairs, e.g. a user who rated several editions of the same title
final_rating = final_rating.drop_duplicates(['user_id', 'title'])


In [39]:
final_rating.shape

(59850, 9)
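
Dropping duplicates removed 61,853 - 59,850 = 2,003 rows. Such duplicates plausibly arise when a user rated more than one edition (ISBN) of the same title, because the pivot is keyed by title rather than by ISBN. The toy sketch below uses hypothetical rows to show what `pivot_table` would otherwise do, and which row `drop_duplicates` keeps.

```python
import pandas as pd

# Hypothetical rows: user 7 rated two editions (different ISBNs) of the same title
toy = pd.DataFrame({
    "user_id": [7, 7, 8],
    "ISBN": ["isbn_1", "isbn_2", "isbn_1"],
    "title": ["Book A", "Book A", "Book A"],
    "rating": [0, 9, 6],
})
print(toy.duplicated(subset=["user_id", "title"]).sum())  # 1

# Without deduplication, pivot_table averages the two ratings (its default aggfunc is "mean")
averaged = toy.pivot_table(index="title", columns="user_id", values="rating")
print(averaged.loc["Book A", 7])  # 4.5

# drop_duplicates keeps the first row per pair, so the surviving edition depends on row order
deduped = toy.drop_duplicates(["user_id", "title"])
print(deduped["rating"].tolist())  # [0, 6]
```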

In [40]:
final_rating

|   | user_id | ISBN | rating | title | Author | year | publisher | img_url | num_of_rating |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 277427 | 002542730X | 10 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 1 | 3363 | 002542730X | 0 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 2 | 11676 | 002542730X | 6 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 3 | 12538 | 002542730X | 10 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| 4 | 13552 | 002542730X | 0 | Politically Correct Bedtime Stories: Modern Ta... | James Finn Garner | 1994 | John Wiley &amp; Sons Inc | http://images.amazon.com/images/P/002542730X.0... | 82 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 236701 | 255489 | 0553579983 | 7 | And Then You Die | Iris Johansen | 1998 | Bantam | http://images.amazon.com/images/P/0553579983.0... | 50 |
| 236702 | 256407 | 0553579983 | 0 | And Then You Die | Iris Johansen | 1998 | Bantam | http://images.amazon.com/images/P/0553579983.0... | 50 |
| 236703 | 257204 | 0553579983 | 0 | And Then You Die | Iris Johansen | 1998 | Bantam | http://images.amazon.com/images/P/0553579983.0... | 50 |
| 236704 | 261829 | 0553579983 | 0 | And Then You Die | Iris Johansen | 1998 | Bantam | http://images.amazon.com/images/P/0553579983.0... | 50 |


In [41]:
# create a pivot table 
book_pivot = final_rating.pivot_table(columns='user_id', index='title', values = "rating")



In [42]:
book_pivot

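| title / user_id | 254 | 2276 | 2766 | 2977 | 3363 | 3757 | 4017 | 4385 | 6242 | 6251 | ... | 274004 | 274061 | 274301 | 274308 | 274808 | 275970 | 277427 | 277478 | 277639 | 278418 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1984 | 9.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN |
| 1st to Die: A Novel | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2nd Chance | NaN | 10.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN |
| 4 Blondes | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 84 Charing Cross Road | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | 10.0 | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Year of Wonders | NaN | NaN | NaN | 7.0 | NaN | NaN | NaN | NaN | 7.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN |
| You Belong To Me | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Zen and the Art of Motorcycle Maintenance: An Inquiry into Values | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | NaN |
| Zoya | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |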


In [43]:
book_pivot.shape

(742, 888)

In [44]:
book_pivot.isnull().sum()

user_id
254       682
2276      711
2766      695
2977      723
3363      656
         ... 
275970    651
277427    645
277478    723
277639    713
278418    595
Length: 888, dtype: int64

In [45]:
# Replace missing ratings (NaN) with 0 so the table can be converted to a sparse matrix

book_pivot.fillna(0, inplace=True)

In [46]:
from scipy.sparse import csr_matrix 

In [47]:
book_sparse = csr_matrix(book_pivot)

In [48]:
book_sparse

<742x888 sparse matrix of type '<class 'numpy.float64'>'
	with 14942 stored elements in Compressed Sparse Row format>
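
A `csr_matrix` built from dense data stores only the non-zero entries. The 14,942 stored elements are therefore the non-zero ratings; implicit 0 ratings and filled-in NaNs are not stored. The arithmetic below uses the numbers from the outputs above and shows how sparse the title × user matrix is.

```python
# Figures taken from the outputs above: book_pivot.shape and book_sparse's stored elements
n_titles, n_users, stored = 742, 888, 14942

cells = n_titles * n_users   # size of the dense title x user matrix
density = stored / cells     # fraction of cells holding a non-zero rating

print(cells)                 # 658896
print(f"{density:.2%}")      # 2.27%
```

About 97.7% of the matrix is zero, which is why the sparse format is used here.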

In [49]:
from sklearn.neighbors import NearestNeighbors 
# Brute-force search; the default metric (minkowski with p=2) is Euclidean distance
model = NearestNeighbors(algorithm='brute')

In [50]:
model.fit(book_sparse)
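
With `algorithm='brute'`, `NearestNeighbors` compares the query against every row. With the default Minkowski metric and p=2, the distance is Euclidean. The plain NumPy sketch below, on hypothetical toy data, mirrors that computation to show what `kneighbors` returns. One difference: scikit-learn returns 2-D arrays of shape (1, k), while this sketch returns 1-D arrays. The helper `knn_euclidean` is hypothetical.

```python
import numpy as np


def knn_euclidean(X: np.ndarray, query: np.ndarray, k: int):
    """Brute force: measure the distance to every row, then keep the k smallest."""
    dists = np.sqrt(((X - query) ** 2).sum(axis=1))  # Euclidean distance to each row
    order = np.argsort(dists, kind="stable")[:k]     # indices of the k closest rows
    return dists[order], order


# Toy title x user matrix with NaNs already filled with 0 (hypothetical ratings)
X = np.array([
    [5.0, 0.0, 3.0],   # title 0: the query
    [10.0, 0.0, 7.0],  # title 1: same readers and pattern, higher ratings
    [0.0, 0.0, 0.0],   # title 2: no non-zero ratings at all
    [0.0, 9.0, 0.0],   # title 3: a different reader
])
d, idx = knn_euclidean(X, X[0], k=3)
print(idx.tolist())             # [0, 2, 1]
print(np.round(d, 3).tolist())  # [0.0, 5.831, 6.403]
```

The query is its own nearest neighbour at distance 0. The all-zero title 2 also ranks ahead of title 1, even though title 1 was rated by the same users in the same pattern. On 0-filled rating vectors, Euclidean distance favours titles with few or low ratings.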

In [51]:
distance, suggestion = model.kneighbors(book_pivot.iloc[237,:].values.reshape(1,-1), n_neighbors=6)

In [52]:
distance

array([[ 0.        , 68.78953409, 69.5413546 , 72.64296249, 76.83098333,
        77.28518616]])

In [53]:
suggestion

array([[237, 240, 238, 241, 184, 536]], dtype=int64)

In [54]:
for i in range(len(suggestion)):
    print(book_pivot.index[suggestion[i]])

Index(['Harry Potter and the Chamber of Secrets (Book 2)',
       'Harry Potter and the Prisoner of Azkaban (Book 3)',
       'Harry Potter and the Goblet of Fire (Book 4)',
       "Harry Potter and the Sorcerer's Stone (Book 1)", 'Exclusive',
       'The Cradle Will Fall'],
      dtype='object', name='title')


In [55]:
book_pivot.index[3]

'4 Blondes'

In [56]:
book_pivot.index

Index(['1984', '1st to Die: A Novel', '2nd Chance', '4 Blondes',
       '84 Charing Cross Road', 'A Bend in the Road', 'A Case of Need',
       'A Child Called \It\": One Child's Courage to Survive"',
       'A Civil Action', 'A Cry In The Night',
       ...
       'Winter Solstice', 'Wish You Well', 'Without Remorse',
       'Wizard and Glass (The Dark Tower, Book 4)', 'Wuthering Heights',
       'Year of Wonders', 'You Belong To Me',
       'Zen and the Art of Motorcycle Maintenance: An Inquiry into Values',
       'Zoya', '\O\" Is for Outlaw"'],
      dtype='object', name='title', length=742)

In [57]:
books_name = book_pivot.index

In [58]:
# Save the model and the objects the recommender needs with the pickle module
import os
import pickle

os.makedirs('artifacts', exist_ok=True)  # create the output folder if it is missing
for name, obj in [('model', model), ('books_name', books_name),
                  ('final_rating', final_rating), ('book_pivot', book_pivot)]:
    with open(f'artifacts/{name}.pkl', 'wb') as f:  # the file is closed after writing
        pickle.dump(obj, f)
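
Whatever serves the recommendations later, for example a separate script, has to read these files back. The minimal sketch below assumes the four file names written above; `load_artifacts` is a hypothetical helper, and the demo uses stand-in objects in a temporary folder. `pickle.load` can execute arbitrary code from a malicious file, so only load pickles you created yourself.

```python
import os
import pickle
import tempfile


def load_artifacts(directory, names=("model", "books_name", "final_rating", "book_pivot")):
    """Load each <name>.pkl from `directory` into a dict keyed by name."""
    loaded = {}
    for name in names:
        with open(os.path.join(directory, f"{name}.pkl"), "rb") as f:
            loaded[name] = pickle.load(f)
    return loaded


# Round trip in a temporary folder with stand-in objects instead of the real model
with tempfile.TemporaryDirectory() as tmp:
    for name, obj in {"model": {"n_neighbors": 6}, "books_name": ["1984", "Zoya"]}.items():
        with open(os.path.join(tmp, f"{name}.pkl"), "wb") as f:
            pickle.dump(obj, f)
    art = load_artifacts(tmp, names=("model", "books_name"))

print(art["books_name"])  # ['1984', 'Zoya']
```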



In [59]:
def recommend_book(book_name):
    try:
        # Row position of the book in the pivot table
        book_id = np.where(book_pivot.index == book_name)[0][0]
        # Find the 6 nearest neighbours; normally the first is the query book itself (distance 0)
        distance, suggestion = model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1), n_neighbors=6)  # type: ignore
        
        # Print the recommended titles
        for i in range(len(suggestion)):
            titles = book_pivot.index[suggestion[i]]
            for title in titles:
                print(title)
    except IndexError:
        print(f"Book '{book_name}' not found in the dataset.")
    except Exception as e:
        print(f"An error occurred: {e}")
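
`recommend_book` prints its results, and the first printed title is the query book itself, as the output of the next cell shows. The sketch below is a variant that returns a list instead and drops the query title. It uses hypothetical names (`recommend_titles`, `toy_pivot`) and toy data, and it takes the fitted model as an argument instead of reading a global variable.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def recommend_titles(book_pivot: pd.DataFrame, model, book_name: str, n: int = 5) -> list:
    """Return up to n titles closest to book_name, excluding book_name itself."""
    if book_name not in book_pivot.index:
        raise KeyError(f"Book '{book_name}' not found in the dataset.")
    row = book_pivot.index.get_loc(book_name)
    # Ask for one extra neighbour, because the query row is its own nearest neighbour
    k = min(n + 1, len(book_pivot))
    _, idx = model.kneighbors(book_pivot.iloc[[row]].to_numpy(), n_neighbors=k)
    titles = [book_pivot.index[i] for i in idx[0] if i != row]
    return titles[:n]


toy_pivot = pd.DataFrame(
    [[5.0, 0.0, 3.0], [10.0, 0.0, 7.0], [0.0, 0.0, 0.0], [0.0, 9.0, 0.0]],
    index=pd.Index(["Book A", "Book B", "Book C", "Book D"], name="title"),
    columns=pd.Index([11, 22, 33], name="user_id"),
)
toy_model = NearestNeighbors(algorithm="brute").fit(toy_pivot.to_numpy())
print(recommend_titles(toy_pivot, toy_model, "Book A", n=2))  # ['Book C', 'Book B']
```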

In [60]:
book_name = 'A Bend in the Road'  # a new name, so books_name (the saved title index) is not overwritten
recommend_book(book_name)

A Bend in the Road
Exclusive
The Cradle Will Fall
No Safe Place
Family Album
Lake Wobegon days
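
'Exclusive' and 'The Cradle Will Fall' appear among the neighbours of both *Harry Potter and the Chamber of Secrets* and *A Bend in the Road*. One plausible explanation is a property of Euclidean distance on 0-filled rating vectors: titles with few non-zero ratings sit close to the origin, and therefore close to many queries. Cosine distance compares the direction of rating vectors rather than their length. Scikit-learn's `NearestNeighbors` accepts `metric='cosine'` together with `algorithm='brute'`, but this is an alternative to the notebook's setup, not what it uses. The NumPy sketch below computes cosine-based neighbours on hypothetical toy data; `knn_cosine` is a hypothetical helper.

```python
import numpy as np


def knn_cosine(X: np.ndarray, query: np.ndarray, k: int):
    """Brute-force k nearest rows by cosine distance (1 - cosine similarity)."""
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(query)
    # Rows with zero norm have no defined direction; give them similarity 0 (distance 1)
    sims = np.divide(X @ query, norms, out=np.zeros(len(X)), where=norms > 0)
    dists = np.clip(1.0 - sims, 0.0, 2.0)          # guard against tiny rounding errors
    order = np.argsort(dists, kind="stable")[:k]
    return dists[order], order


# Hypothetical toy matrix: title 1 shares title 0's readers, title 2 has no ratings
X = np.array([
    [5.0, 0.0, 3.0],
    [10.0, 0.0, 7.0],
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
])
d, idx = knn_cosine(X, X[0], k=3)
print(idx.tolist())  # [0, 1, 2]
```

With cosine distance, the all-zero title no longer outranks title 1. Whether this improves recommendations on the real data would need to be checked.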


In [61]:
print(final_rating.head())


   user_id        ISBN  rating  \
0   277427  002542730X      10   
1     3363  002542730X       0   
2    11676  002542730X       6   
3    12538  002542730X      10   
4    13552  002542730X       0   

                                               title             Author  year  \
0  Politically Correct Bedtime Stories: Modern Ta...  James Finn Garner  1994   
1  Politically Correct Bedtime Stories: Modern Ta...  James Finn Garner  1994   
2  Politically Correct Bedtime Stories: Modern Ta...  James Finn Garner  1994   
3  Politically Correct Bedtime Stories: Modern Ta...  James Finn Garner  1994   
4  Politically Correct Bedtime Stories: Modern Ta...  James Finn Garner  1994   

                   publisher  \
0  John Wiley &amp; Sons Inc   
1  John Wiley &amp; Sons Inc   
2  John Wiley &amp; Sons Inc   
3  John Wiley &amp; Sons Inc   
4  John Wiley &amp; Sons Inc   

                                             img_url  num_of_rating  
0  http://images.amazon.com/images/P/00254273