# **Problem Statement:**

In today's digital age, the number of available books is overwhelming, which makes it hard for readers to discover new books that match their tastes. Traditional sources of book recommendations, such as word of mouth and bestseller lists, are not personalized and often fail to reflect individual preferences. A personalized book recommendation system can improve the user experience by suggesting books that align with each reader's interests and reading history.

# **Objective:**


The objective is to develop a personalized book recommendation system that provides tailored suggestions based on each reader's preferences and rating history. Well-targeted suggestions should increase user engagement, and the system should use the available rating data to produce accurate, adaptive recommendations. This notebook builds two variants: a popularity-based recommender and an item-based collaborative filtering recommender.

# **About the Dataset:**


1.   Books

     *   ISBN
     *   Book-Title
     *   Book-Author
     *   Year-Of-Publication
     *   Publisher
     *   Image-URL-S
     *   Image-URL-M
     *   Image-URL-L


2.   Users

     *   User-ID
     *   Location
     *   Age


3.   Ratings

     *   User-ID
     *   ISBN
     *   Book-Rating
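The three tables are linked by shared keys: Ratings joins Books on `ISBN` and Users on `User-ID`. The sketch below uses a few made-up rows (hypothetical values, not taken from the dataset) to show how those joins line up, including the fact that an inner join silently drops ratings whose ISBN has no matching book record.

```python
import pandas as pd

# Toy stand-ins for the three tables (made-up rows, same column names as the dataset)
books = pd.DataFrame({
    "ISBN": ["0000000001", "0000000002"],
    "Book-Title": ["Book A", "Book B"],
    "Book-Author": ["Author A", "Author B"],
})
users = pd.DataFrame({
    "User-ID": [1, 2],
    "Location": ["city x, country y", "city z, country w"],
    "Age": [30.0, None],
})
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 2],
    "ISBN": ["0000000001", "0000000002", "0000000001", "9999999999"],
    "Book-Rating": [8, 0, 5, 7],
})

# Ratings -> Books via ISBN; the default inner join drops the rating on an unknown ISBN
ratings_books = ratings.merge(books, on="ISBN")
# Ratings -> Users via User-ID
full = ratings_books.merge(users, on="User-ID")

print(full[["User-ID", "Book-Title", "Book-Rating", "Age"]])
```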

# Load Libraries

In [1]:
!pip install gdown -q

In [2]:
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import gdown

# Load the Datasets

In [3]:
url_book = 'https://drive.google.com/uc?id=1d08FYTSJJAeRgaJos6hYuZpzqnLiEGn_'
output = 'books.csv'
gdown.download(url_book, output, quiet=False)

books = pd.read_csv(output)

Downloading...
From: https://drive.google.com/uc?id=1d08FYTSJJAeRgaJos6hYuZpzqnLiEGn_
To: /content/books.csv
100%|██████████| 73.3M/73.3M [00:02<00:00, 30.9MB/s]


In [4]:
url_users = 'https://drive.google.com/uc?id=1Dpr2oyFOJFlY9mtRKj4PMf_OzP9V54R7'
output = 'users.csv'
gdown.download(url_users, output, quiet=False)

users = pd.read_csv(output)

Downloading...
From: https://drive.google.com/uc?id=1Dpr2oyFOJFlY9mtRKj4PMf_OzP9V54R7
To: /content/users.csv
100%|██████████| 11.0M/11.0M [00:00<00:00, 119MB/s]


In [5]:
url_ratings = 'https://drive.google.com/uc?id=1ytdfbZmDzNh2crftALZ3EsqQRzBm6j5j'
output = 'ratings.csv'
gdown.download(url_ratings, output, quiet=False)

ratings = pd.read_csv(output)

Downloading...
From: https://drive.google.com/uc?id=1ytdfbZmDzNh2crftALZ3EsqQRzBm6j5j
To: /content/ratings.csv
100%|██████████| 22.6M/22.6M [00:00<00:00, 43.3MB/s]


# Exploring the Datasets

In [6]:
books.sample(5)

| | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L |
|---|---|---|---|---|---|---|---|---|
| 167511 | 1586601342 | Love Afloat: Drifting Hearts Find Safe Harbor ... | Kimberly Comeaux | 2001 | Barbour Bargain Books | http://images.amazon.com/images/P/1586601342.0... | http://images.amazon.com/images/P/1586601342.0... | http://images.amazon.com/images/P/1586601342.0... |
| 5702 | 312266588 | The Templars : The Dramatic History of the Kni... | Piers Paul Read | 2000 | St. Martin's Press | http://images.amazon.com/images/P/0312266588.0... | http://images.amazon.com/images/P/0312266588.0... | http://images.amazon.com/images/P/0312266588.0... |
| 215599 | 1562829270 | Don't Count Yourself Out: Staying Fit After 35 | Jimmy Connors | 1992 | Hyperion Books | http://images.amazon.com/images/P/1562829270.0... | http://images.amazon.com/images/P/1562829270.0... | http://images.amazon.com/images/P/1562829270.0... |
| 157332 | 375408282 | The Reader | Bernhard Schlink | 1999 | Random House Audio | http://images.amazon.com/images/P/0375408282.0... | http://images.amazon.com/images/P/0375408282.0... | http://images.amazon.com/images/P/0375408282.0... |
| 23207 | 1558500502 | Cover letters that knock 'em dead (Cover Lette... | Martin John Yate | 1992 | Bob Adams | http://images.amazon.com/images/P/1558500502.0... | http://images.amazon.com/images/P/1558500502.0... | http://images.amazon.com/images/P/1558500502.0... |
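In the sample above, some ISBNs appear to have lost their leading zero: row 5702 shows `312266588`, while its image URL contains `0312266588` (row 157332 shows the same pattern). A plausible cause, and an assumption here, is that pandas inferred purely numeric ISBNs as integers while parsing `books.csv`. Such ISBNs would no longer match the zero-padded strings in `ratings.csv` during the later merge on `ISBN`. A minimal sketch on a two-row CSV shows the effect and one fix, reading the column explicitly as strings:

```python
import io
import pandas as pd

# Small CSV whose first ISBN starts with a zero
csv_text = "ISBN,Book-Title\n0312266588,The Templars\n1586601342,Love Afloat\n"

# Default parsing: an all-digit column is inferred as int64, so the leading zero is lost
default = pd.read_csv(io.StringIO(csv_text))
print(default["ISBN"].tolist())   # [312266588, 1586601342]

# Forcing the column to str keeps the identifier exactly as written
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"ISBN": str})
print(as_str["ISBN"].tolist())    # ['0312266588', '1586601342']
```

Passing the same `dtype={'ISBN': str}` when loading `books.csv` and `ratings.csv` would likely change the row counts reported further down, since more ratings could then find their book.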


In [7]:
books['Image-URL-M'][1]

'http://images.amazon.com/images/P/0002005018.01.MZZZZZZZ.jpg'

In [8]:
books.nunique()

| Column | Unique values |
|---|---|
| ISBN | 271360 |
| Book-Title | 242135 |
| Book-Author | 102022 |
| Year-Of-Publication | 202 |
| Publisher | 16807 |
| Image-URL-S | 271044 |
| Image-URL-M | 271044 |
| Image-URL-L | 271041 |


In [9]:
books.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 8 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 271360 non-null  object
 1   Book-Title           271360 non-null  object
 2   Book-Author          271358 non-null  object
 3   Year-Of-Publication  271360 non-null  object
 4   Publisher            271358 non-null  object
 5   Image-URL-S          271360 non-null  object
 6   Image-URL-M          271360 non-null  object
 7   Image-URL-L          271357 non-null  object
dtypes: object(8)
memory usage: 16.6+ MB


In [10]:
num_duplicates = books.duplicated('Book-Title').sum()
num_duplicates

29225

In [11]:
books.drop_duplicates('Book-Title', inplace=True)
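`drop_duplicates('Book-Title')` keeps only the first ISBN for each title, so ratings attached to the other editions' ISBNs disappear in the later `ratings.merge(books, on='ISBN')`. One alternative, sketched below on made-up rows, is to merge ratings against the full ISBN-to-title table first and only then work at title level, so ratings for every edition are kept under one title. This is an assumption about what the analysis wants, not what the notebook does.

```python
import pandas as pd

# Made-up data: two editions (ISBNs) of the same title
books = pd.DataFrame({
    "ISBN": ["A1", "A2", "B1"],
    "Book-Title": ["Title A", "Title A", "Title B"],
})
ratings = pd.DataFrame({
    "User-ID": [1, 2, 3],
    "ISBN": ["A1", "A2", "B1"],
    "Book-Rating": [7, 9, 5],
})

# Dedupe first, then merge: the rating on edition "A2" is lost
deduped = books.drop_duplicates("Book-Title")
lost = ratings.merge(deduped, on="ISBN")

# Merge on the full ISBN -> title mapping, then aggregate by title
kept = ratings.merge(books, on="ISBN")

print(len(lost), len(kept))  # 2 3
print(kept.groupby("Book-Title")["Book-Rating"].count().to_dict())  # {'Title A': 2, 'Title B': 1}
```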

In [12]:
books.info()

<class 'pandas.core.frame.DataFrame'>
Index: 242135 entries, 0 to 271359
Data columns (total 8 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 242135 non-null  object
 1   Book-Title           242135 non-null  object
 2   Book-Author          242133 non-null  object
 3   Year-Of-Publication  242135 non-null  object
 4   Publisher            242134 non-null  object
 5   Image-URL-S          242135 non-null  object
 6   Image-URL-M          242135 non-null  object
 7   Image-URL-L          242132 non-null  object
dtypes: object(8)
memory usage: 16.6+ MB


In [13]:
users.head()

| | User-ID | Location | Age |
|---|---|---|---|
| 0 | 1 | nyc, new york, usa | NaN |
| 1 | 2 | stockton, california, usa | 18.0 |
| 2 | 3 | moscow, yukon territory, russia | NaN |
| 3 | 4 | porto, v.n.gaia, portugal | 17.0 |
| 4 | 5 | farnborough, hants, united kingdom | NaN |


In [14]:
ratings.head()

| | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


In [15]:
print(books.shape)
print(ratings.shape)
print(users.shape)

(242135, 8)
(1149780, 3)
(278858, 3)


In [16]:
books.isnull().sum()

| Column | Missing values |
|---|---|
| ISBN | 0 |
| Book-Title | 0 |
| Book-Author | 2 |
| Year-Of-Publication | 0 |
| Publisher | 1 |
| Image-URL-S | 0 |
| Image-URL-M | 0 |
| Image-URL-L | 3 |


In [17]:
users.isnull().sum()

| Column | Missing values |
|---|---|
| User-ID | 0 |
| Location | 0 |
| Age | 110762 |


In [18]:
ratings.isnull().sum()

| Column | Missing values |
|---|---|
| User-ID | 0 |
| ISBN | 0 |
| Book-Rating | 0 |


In [19]:
books.duplicated().sum()

0

In [20]:
ratings.duplicated().sum()

0

In [21]:
users.duplicated().sum()

0

# Popularity Based Recommender System

In [22]:
ratings_with_name = ratings.merge(books,on='ISBN')

In [23]:
print(ratings_with_name['Book-Rating'].unique())

[ 0  5  9  8  6  7  4 10  3  2  1]


In [24]:
ratings_with_name.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 883079 entries, 0 to 883078
Data columns (total 10 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   User-ID              883079 non-null  int64 
 1   ISBN                 883079 non-null  object
 2   Book-Rating          883079 non-null  int64 
 3   Book-Title           883079 non-null  object
 4   Book-Author          883077 non-null  object
 5   Year-Of-Publication  883079 non-null  object
 6   Publisher            883078 non-null  object
 7   Image-URL-S          883079 non-null  object
 8   Image-URL-M          883079 non-null  object
 9   Image-URL-L          883075 non-null  object
dtypes: int64(2), object(8)
memory usage: 67.4+ MB


In [25]:
num_rating_df = ratings_with_name.groupby('Book-Title')['Book-Rating'].count().reset_index()
num_rating_df.rename(columns={'Book-Rating':'num_ratings'},inplace=True)
num_rating_df.sample(10)

| | Book-Title | num_ratings |
|---|---|---|
| 137381 | Panther in the Sky | 9 |
| 56140 | Dune, tome 1 : Le Messie de Dune | 1 |
| 22634 | Betrayal (The Dhamon Saga, Volume II) | 2 |
| 41869 | Cuckoo's egg | 8 |
| 17631 | Audrey Hepburns Neck | 18 |
| 117597 | Masters of Art: Vermeer (Masters of Art Series) | 1 |
| 191248 | The Heirs of Hammerfell | 13 |
| 234302 | Wholesale by Mail &amp; Online 2002: The Consu... | 3 |
| 92532 | Inner Energy Work Out | 1 |
| 94703 | It's Raining in Mango: Pictures from a Family ... | 1 |


In [26]:
avg_rating_df = ratings_with_name.groupby('Book-Title')['Book-Rating'].mean().reset_index()
avg_rating_df.rename(columns={'Book-Rating': 'avg_rating'}, inplace=True)
avg_rating_df.sample(10)

| | Book-Title | avg_rating |
|---|---|---|
| 1263 | 3rd Degree | 4.0375 |
| 124821 | Murder Mystery | 0.0 |
| 16095 | Arizona Cook Book | 0.0 |
| 109328 | Little Whale | 0.0 |
| 76883 | Grandpa's Mountain | 3.0 |
| 2253 | A Change of Heart: A Memoir | 1.75 |
| 142218 | Portable Jung (Viking Portable Library) | 2.666667 |
| 214974 | The Year's Best Horror Stories: XV | 0.0 |
| 30894 | Can You Tell Me How to Get to Sesame Street? (... | 5.333333 |
| 192609 | The Incredible Shrinking Kid (The Weird Zone ,... | 0.0 |
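The rating values printed earlier run from 0 to 10. In the Book-Crossing dataset, which these files appear to come from (an assumption based on the column names), a 0 marks an implicit interaction rather than an explicit score, and explicit ratings run from 1 to 10. Averaging the zeros in pulls many titles toward 0, as several rows of the table above show. A minimal sketch comparing both averages on made-up ratings:

```python
import pandas as pd

# Made-up ratings; assumed convention: 0 = implicit interaction, 1-10 = explicit score
ratings = pd.DataFrame({
    "Book-Title": ["X", "X", "X", "Y", "Y"],
    "Book-Rating": [0, 8, 10, 0, 0],
})

# Average over all rows, as avg_rating_df does
all_mean = ratings.groupby("Book-Title")["Book-Rating"].mean()

# Average over explicit ratings only; titles with no explicit rating get NaN
explicit = ratings[ratings["Book-Rating"] > 0]
explicit_mean = explicit.groupby("Book-Title")["Book-Rating"].mean().reindex(all_mean.index)

print(pd.DataFrame({"avg_all": all_mean, "avg_explicit": explicit_mean}))
```

Which average is appropriate depends on whether an implicit interaction should count as a low score; the explicit-only mean treats it as "no opinion".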


In [27]:
popular_df = num_rating_df.merge(avg_rating_df,on='Book-Title')
popular_df.sample(10)

| | Book-Title | num_ratings | avg_rating |
|---|---|---|---|
| 92251 | Indiana Jones and the Philosopher's Stone | 2 | 0.0 |
| 76444 | Goodnight! a Novel: A Novel (Penguin Internati... | 1 | 0.0 |
| 77234 | Great Expectations (Longman Literature) | 1 | 8.0 |
| 153365 | Sabriel (adult edition) (The Abhorsen Trilogy) | 2 | 9.0 |
| 176384 | The BEST IS YET TO COME | 1 | 5.0 |
| 71753 | Fun With Words (Teddy Bears) | 1 | 0.0 |
| 169897 | Survival With Style; In Trouble or in Fun ... ... | 1 | 0.0 |
| 88272 | I Am a Bunny (Golden Sturdy Book) | 8 | 4.125 |
| 100827 | L'Echiquier du mal, tome 2 | 2 | 0.0 |
| 31344 | Captive Bride (Harlequin Historical, No. 471) | 4 | 0.0 |


In [28]:
popular_df = popular_df[popular_df['num_ratings']>=250].sort_values('avg_rating',ascending=False)

In [29]:
popular_df = popular_df.merge(books,on='Book-Title').drop_duplicates('Book-Title')[['Book-Title','Book-Author','Image-URL-M','num_ratings','avg_rating']]

## Top-10 Books by Average Rating (at least 250 ratings)

In [30]:
popular_df.head(10)

| | Book-Title | Book-Author | Image-URL-M | num_ratings | avg_rating |
|---|---|---|---|---|---|
| 0 | Harry Potter and the Order of the Phoenix (Boo... | J. K. Rowling | http://images.amazon.com/images/P/043935806X.0... | 334 | 5.571856 |
| 1 | The Hobbit : The Enchanting Prelude to The Lor... | J.R.R. TOLKIEN | http://images.amazon.com/images/P/0345339681.0... | 281 | 5.007117 |
| 2 | To Kill a Mockingbird | Harper Lee | http://images.amazon.com/images/P/0446310786.0... | 389 | 4.920308 |
| 3 | Harry Potter and the Sorcerer's Stone (Harry P... | J. K. Rowling | http://images.amazon.com/images/P/059035342X.0... | 571 | 4.900175 |
| 4 | Harry Potter and the Chamber of Secrets (Book 2) | J. K. Rowling | http://images.amazon.com/images/P/0439064872.0... | 351 | 4.729345 |
| 5 | The Da Vinci Code | Dan Brown | http://images.amazon.com/images/P/0385504209.0... | 883 | 4.652322 |
| 6 | The Catcher in the Rye | J.D. Salinger | http://images.amazon.com/images/P/0316769487.0... | 403 | 4.635236 |
| 7 | The Five People You Meet in Heaven | Mitch Albom | http://images.amazon.com/images/P/0786868716.0... | 427 | 4.543326 |
| 8 | The Fellowship of the Ring (The Lord of the Ri... | J.R.R. TOLKIEN | http://images.amazon.com/images/P/0345339703.0... | 257 | 4.505837 |
| 9 | The Lovely Bones: A Novel | Alice Sebold | http://images.amazon.com/images/P/0316666343.0... | 1295 | 4.468726 |
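Filtering at 250 ratings and then sorting by the raw mean is a hard cutoff: a title with 249 ratings is excluded entirely, however well rated. A common alternative, swapped in here for illustration and not used in this notebook, is a Bayesian average that shrinks each title's mean toward the global mean: WR = v/(v+m)·R + m/(v+m)·C, where R is the title's mean rating, v its number of ratings, C the global mean rating and m a prior weight (set to 250 below purely as an assumption). The helper name `weighted_rating` and the toy rows are hypothetical.

```python
import pandas as pd

def weighted_rating(df, m, count_col="num_ratings", mean_col="avg_rating"):
    """Bayesian average: shrink each title's mean rating toward the global mean C.

    Titles with few ratings (v small relative to m) end up close to C;
    titles with many ratings keep roughly their own mean.
    """
    v = df[count_col]
    R = df[mean_col]
    # Global mean over all individual ratings (each title's mean weighted by its count)
    C = (R * v).sum() / v.sum()
    return v / (v + m) * R + m / (v + m) * C

# Made-up popularity table with the same column names as popular_df
toy = pd.DataFrame({
    "Book-Title": ["Niche", "Loved", "Average"],
    "num_ratings": [5, 600, 400],
    "avg_rating": [10.0, 8.0, 4.0],
})
toy["weighted"] = weighted_rating(toy, m=250)
print(toy.sort_values("weighted", ascending=False))
```

By raw mean the 5-rating "Niche" title ranks first; after shrinkage the heavily rated "Loved" title overtakes it. On the real data this would be applied to the merged `num_rating_df`/`avg_rating_df` table before any count filter.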


# Collaborative Filtering Based Recommender System


In [31]:
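# Users with more than 200 ratings count as "active readers"
user_rating_counts = ratings_with_name.groupby('User-ID')['Book-Rating'].count()
active_readers = user_rating_counts[user_rating_counts > 200].index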

In [32]:
filtered_rating = ratings_with_name[ratings_with_name['User-ID'].isin(active_readers)]

In [33]:
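# Among active readers, keep titles with at least 50 ratings
book_rating_counts = filtered_rating.groupby('Book-Title')['Book-Rating'].count()
famous_books = book_rating_counts[book_rating_counts >= 50].index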

In [34]:
final_ratings = filtered_rating[filtered_rating['Book-Title'].isin(famous_books)]

In [35]:
pt = final_ratings.pivot_table(index='Book-Title',columns='User-ID',values='Book-Rating')

In [36]:
pt.shape

(342, 665)

In [37]:
pt.fillna(0,inplace=True)

In [47]:
pt.sample(10)

| Book-Title / User-ID | 254 | 2276 | 2766 | 2977 | 3363 | 4385 | 6251 | 6543 | 6575 | 7158 | ... | 271705 | 273979 | 274004 | 274061 | 274301 | 274308 | 275970 | 277427 | 277639 | 278418 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nicolae: The Rise of Antichrist (Left Behind No. 3) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Roses Are Red (Alex Cross Novels) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 0.0 | 8.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Back Roads | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| You Belong To Me | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| A Walk to Remember | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Tuesdays with Morrie: An Old Man, a Young Man, and Life's Greatest Lesson | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 0.0 | 6.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| A Painted House | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Divine Secrets of the Ya-Ya Sisterhood: A Novel | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| The Robber Bride | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| The Plains of Passage (Earth's Children (Paperback)) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
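The filtering and pivoting steps (In [31] to In [37]) can be reproduced on made-up data to see the shape of the result. The thresholds below are hypothetical stand-ins for the notebook's "more than 200 ratings per user" and "at least 50 ratings per title"; the comparison operators are kept the same.

```python
import pandas as pd

# Made-up title-level ratings with the same column names as ratings_with_name
ratings_with_name = pd.DataFrame({
    "User-ID":     [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    "Book-Title":  ["A", "B", "C", "A", "B", "D", "A", "D", "A", "C", "B"],
    "Book-Rating": [8, 0, 6, 9, 7, 5, 10, 2, 0, 4, 3],
})
USER_MIN = 2   # hypothetical stand-in for "> 200" ratings per user
BOOK_MIN = 2   # hypothetical stand-in for ">= 50" ratings per title

# Keep users with more than USER_MIN ratings (user 3 is dropped)
user_counts = ratings_with_name.groupby("User-ID")["Book-Rating"].count()
active_readers = user_counts[user_counts > USER_MIN].index
filtered_rating = ratings_with_name[ratings_with_name["User-ID"].isin(active_readers)]

# Among those users, keep titles with at least BOOK_MIN ratings (title D is dropped)
book_counts = filtered_rating.groupby("Book-Title")["Book-Rating"].count()
famous_books = book_counts[book_counts >= BOOK_MIN].index
final_ratings = filtered_rating[filtered_rating["Book-Title"].isin(famous_books)]

# Title x user matrix; unrated cells become 0, the same value as an implicit rating
pt = final_ratings.pivot_table(index="Book-Title", columns="User-ID", values="Book-Rating").fillna(0)
print(pt)
```

Note that after `fillna(0)` a missing rating and an implicit 0 rating are indistinguishable in the matrix.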


In [39]:
similarity_scores = cosine_similarity(pt)

In [40]:
similarity_scores.shape

(342, 342)
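`cosine_similarity(pt)` compares every pair of rows, i.e. every pair of titles, by the cosine of the angle between their user-rating vectors: the dot product divided by the product of the vector norms. A small NumPy check on a made-up 3×4 matrix confirms that normalizing each row and taking dot products gives the same result as the scikit-learn call:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up title x user matrix (rows = titles, columns = users, 0 = no rating)
pt_values = np.array([
    [8.0, 9.0, 0.0, 0.0],
    [0.0, 7.0, 3.0, 0.0],
    [6.0, 0.0, 4.0, 5.0],
])

# Manual cosine similarity: scale each row to unit length, then take pairwise dot products
norms = np.linalg.norm(pt_values, axis=1, keepdims=True)
unit_rows = pt_values / norms
manual = unit_rows @ unit_rows.T

sk = cosine_similarity(pt_values)
print(np.allclose(manual, sk))   # True
print(sk.shape)                  # (3, 3)
```

One caveat: a title whose ratings from the selected users are all 0 has a zero vector. scikit-learn leaves such a row at zero when normalizing, so that title gets similarity 0 with every title, itself included.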

## Function to Recommend Books based on Similarity

In [41]:
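def recommend(book_name, top_n=5):
    # Row position of the requested title in the pivot table
    matches = np.where(pt.index == book_name)[0]
    if len(matches) == 0:
        raise ValueError(f"'{book_name}' is not one of the {len(pt.index)} titles in the model")
    index = matches[0]

    # Rank every other title by cosine similarity (highest first), excluding the query itself
    similar_items = sorted(
        ((i, score) for i, score in enumerate(similarity_scores[index]) if i != index),
        key=lambda x: x[1], reverse=True)[:top_n]

    recommendations = []
    for i, _ in similar_items:
        # Look up author and cover image for each recommended title
        temp_df = books[books['Book-Title'] == pt.index[i]].drop_duplicates('Book-Title')

        book_info = {
            "Book-Title": temp_df['Book-Title'].values[0],
            "Book-Author": temp_df['Book-Author'].values[0],
            "Image": temp_df['Image-URL-M'].values[0]
        }

        recommendations.append(book_info)

    return recommendations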
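The same ranking can be written with `np.argsort` instead of sorting Python tuples. The sketch below is a hypothetical helper, `top_similar`, run on a made-up similarity matrix; it excludes the query row by position, so the query title itself is never returned, even when another title ties with it.

```python
import numpy as np
import pandas as pd

def top_similar(titles, sim, book_name, n=5):
    """Return the n titles most similar to book_name, most similar first (hypothetical helper)."""
    matches = np.flatnonzero(titles == book_name)
    if matches.size == 0:
        raise ValueError(f"{book_name!r} not found")
    idx = matches[0]
    scores = sim[idx].copy()
    scores[idx] = -np.inf                              # never recommend the query itself
    order = np.argsort(-scores, kind="stable")[:n]     # descending; ties keep original order
    return list(titles[order])

# Made-up titles and symmetric similarity matrix
titles = pd.Index(["A", "B", "C", "D"])
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
print(top_similar(titles, sim, "A", n=2))  # ['C', 'D']
```

With the notebook's objects, the equivalent call would be `top_similar(pt.index, similarity_scores, book_name)`.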

In [50]:
book_name = input("Input Book name: ")

print("----------------------------------")
print("The Top-5 Recommended Books are:")
print("----------------------------------")
recommend(book_name)

Input Book name: She's Come Undone (Oprah's Book Club)
----------------------------------
The Top-5 Recommended Books are:
----------------------------------


[{'Book-Title': 'N Is for Noose',
  'Book-Author': 'Sue Grafton',
  'Image': 'http://images.amazon.com/images/P/0449223612.01.MZZZZZZZ.jpg'},
 {'Book-Title': 'We Were the Mulvaneys',
  'Book-Author': 'Joyce Carol Oates',
  'Image': 'http://images.amazon.com/images/P/0452282829.01.MZZZZZZZ.jpg'},
 {'Book-Title': 'C Is for Corpse (Kinsey Millhone Mysteries (Paperback))',
  'Book-Author': 'Sue Grafton',
  'Image': 'http://images.amazon.com/images/P/0553280368.01.MZZZZZZZ.jpg'},
 {'Book-Title': 'The Rescue',
  'Book-Author': 'Nicholas Sparks',
  'Image': 'http://images.amazon.com/images/P/0446610399.01.MZZZZZZZ.jpg'},
 {'Book-Title': 'The Lovely Bones: A Novel',
  'Book-Author': 'Alice Sebold',
  'Image': 'http://images.amazon.com/images/P/0316666343.01.MZZZZZZZ.jpg'}]
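`recommend` needs the exact title string as it appears in `pt.index`, including subtitles such as "(Oprah's Book Club)". A hypothetical helper built on the standard library's `difflib.get_close_matches` could suggest the nearest known titles when the input does not match exactly. The cutoff of 0.6 is difflib's default and has not been tuned for book titles.

```python
import difflib

def resolve_title(query, known_titles, n=3, cutoff=0.6):
    """Return [query] if it is a known title, else up to n close matches (hypothetical helper)."""
    known = list(known_titles)
    if query in known:
        return [query]
    # Compare case-insensitively, then map each match back to its original spelling
    lowered = {t.lower(): t for t in known}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[m] for m in matches]

titles = ["She's Come Undone (Oprah's Book Club)", "The Lovely Bones: A Novel", "The Rescue"]
print(resolve_title("the lovely bones", titles))
```

In the notebook, `resolve_title(book_name, pt.index)` could be called before `recommend` to turn a loosely typed title into one the model knows.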