Book Recommendation Dataset

Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.

Explicit ratings are on a scale from 1 to 10; a rating of 0 marks an implicit interaction rather than a score.
The data consists of three tables: ratings, book information, and user information.

https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset

In [None]:
Goal: Build a book recommender system.

## Import Libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## Read the Data

In [2]:
# Load only the first two columns of Books.csv and name them explicitly.
books = pd.read_csv("Books.csv", header=0, names=["ISBN", "Book-Title"], usecols=range(2))

In [3]:
books.head()

,ISBN,Book-Title
0,195153448,Classical Mythology
1,2005018,Clara Callan
2,60973129,Decision in Normandy
3,374157065,Flu: The Story of the Great Influenza Pandemic...
4,393045218,The Mummies of Urumchi


In [4]:
# Load the three rating columns; the file is Latin-1 (ISO-8859-1) encoded.
Ratings = pd.read_csv("Ratings.csv", header=0, encoding="iso-8859-1",
                      names=["User-ID", "ISBN", "Book-Rating"], usecols=range(3))

In [5]:
Ratings.head()

,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6
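The zeros in the output above matter for everything that follows. A minimal sketch, assuming a `Book-Rating` of 0 denotes an implicit interaction and 1 to 10 an explicit score, shows how the two kinds can be separated and how much the zeros pull the mean down. The frame `toy_ratings` is a hypothetical stand-in that reuses the five rows printed above.

```python
import pandas as pd

# Hypothetical toy frame reusing the five rows shown by Ratings.head().
toy_ratings = pd.DataFrame({
    "User-ID": [276725, 276726, 276727, 276729, 276729],
    "ISBN": ["034545104X", "0155061224", "0446520802", "052165615X", "0521795028"],
    "Book-Rating": [0, 5, 0, 3, 6],
})

# Assumption: 0 marks an implicit interaction, 1-10 an explicit rating.
explicit = toy_ratings[toy_ratings["Book-Rating"] > 0]
implicit = toy_ratings[toy_ratings["Book-Rating"] == 0]

# The mean over all rows treats each implicit 0 as a very low score.
mean_all = toy_ratings["Book-Rating"].mean()
mean_explicit = explicit["Book-Rating"].mean()

print(len(explicit), len(implicit))
print(mean_all, mean_explicit)
```

Whether to drop the implicit rows or treat them as a separate signal is a design choice; the cells below keep them in.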


In [6]:
# Inner join on ISBN, the only column the two frames share.
ratings = pd.merge(books, Ratings, on="ISBN")

In [7]:
ratings.head()

,ISBN,Book-Title,User-ID,Book-Rating
0,195153448,Classical Mythology,2,0
1,2005018,Clara Callan,8,5
2,2005018,Clara Callan,11400,0
3,2005018,Clara Callan,11676,8
4,2005018,Clara Callan,41385,0


In [8]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1031136 entries, 0 to 1031135
Data columns (total 4 columns):
 #   Column       Non-Null Count    Dtype 
---  ------       --------------    ----- 
 0   ISBN         1031136 non-null  object
 1   Book-Title   1031136 non-null  object
 2   User-ID      1031136 non-null  int64 
 3   Book-Rating  1031136 non-null  int64 
dtypes: int64(2), object(2)
memory usage: 39.3+ MB
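The merged frame has 1,031,136 rows, fewer than the 1,149,780 ratings quoted in the dataset description, because an inner join drops ratings whose ISBN has no match in the books table. The sketch below illustrates one way to surface such rows with the `indicator` option of `pd.merge`; the frames `toy_books` and `toy_ratings` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables; the last rating refers to an ISBN missing from toy_books.
toy_books = pd.DataFrame({
    "ISBN": ["0195153448", "0002005018"],
    "Book-Title": ["Classical Mythology", "Clara Callan"],
})
toy_ratings = pd.DataFrame({
    "User-ID": [2, 8, 11400, 276725],
    "ISBN": ["0195153448", "0002005018", "0002005018", "034545104X"],
    "Book-Rating": [0, 5, 0, 0],
})

# Inner join: only ratings with a matching book survive.
merged = pd.merge(toy_books, toy_ratings, on="ISBN", how="inner")

# Left join with indicator=True adds a "_merge" column that flags unmatched ratings.
check = pd.merge(toy_ratings, toy_books, on="ISBN", how="left", indicator=True)
unmatched = check[check["_merge"] == "left_only"]

print(len(toy_ratings), len(merged), len(unmatched))
```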


## Popularity Based Recommender System

In [9]:
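# Per title: number of ratings, sum of ratings, and mean rating.
# String aggregator names avoid the deprecation warning newer pandas raises for np.sum / np.mean.
grouped = ratings.groupby("Book-Title").agg({"Book-Rating": ["size", "sum", "mean"]})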

In [10]:
grouped.head()

Book-Title,Book-Rating size,Book-Rating sum,Book-Rating mean
"A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)",4,9,2.25
Always Have Popsicles,1,0,0.0
Apple Magic (The Collector's series),1,0,0.0
"Ask Lily (Young Women of Faith: Lily Series, Book 5)",1,8,8.0
Beyond IBM: Leadership Marketing and Finance for the 1990s,1,0,0.0


In [11]:
# Sort titles by mean rating, highest first.
populer = grouped.sort_values(("Book-Rating", "mean"), ascending=False)

In [12]:
populer.head()

Book-Title,Book-Rating size,Book-Rating sum,Book-Rating mean
Film Is: The International Free Cinema,1,10,10.0
"More Secrets of Happy Children: Embrace Your Power as a Parent--and Help Your Children be Confident, Positive, Well-Adjusted and Happy",1,10,10.0
Jo's Boys : From the Original Publisher,1,10,10.0
The Vanished Priestess : An Annie Szabo Mystery,1,10,10.0
Game and Hunting,1,10,10.0
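Every title in the output above has exactly one rating, so sorting by the mean alone rewards books that a single user happened to score 10. One common remedy, sketched here as an assumption rather than as part of the original notebook, is to drop implicit zeros and require a minimum number of ratings before ranking. The frame `toy` and the threshold `MIN_RATINGS` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: A has several ratings, B a single 10, C two modest ones.
toy = pd.DataFrame({
    "Book-Title": ["A", "A", "A", "A", "B", "C", "C", "C"],
    "Book-Rating": [8, 0, 9, 7, 10, 6, 0, 5],
})

MIN_RATINGS = 2  # hypothetical threshold; tune it to the size of the real data

# Assumption: 0 is an implicit interaction, so it is excluded from the mean.
explicit = toy[toy["Book-Rating"] > 0]

# Count and mean of explicit ratings per title.
stats = explicit.groupby("Book-Title")["Book-Rating"].agg(["count", "mean"])

# Keep titles with enough ratings, then rank by mean.
ranked = stats[stats["count"] >= MIN_RATINGS].sort_values("mean", ascending=False)
print(ranked)
```

In this toy case B drops out despite its perfect score, because one rating is too little evidence.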


In [13]:
# toplam ("total"): sum of all rating points across every title.
toplam = grouped[("Book-Rating", "sum")].sum()

In [14]:
toplam

2927448

In [15]:
# Each title's share (in percent) of all rating points.
populer["percentage"] = populer[("Book-Rating", "sum")].div(toplam) * 100

In [16]:
populer.head()

Book-Title,Book-Rating size,Book-Rating sum,Book-Rating mean,percentage
Film Is: The International Free Cinema,1,10,10.0,0.000342
"More Secrets of Happy Children: Embrace Your Power as a Parent--and Help Your Children be Confident, Positive, Well-Adjusted and Happy",1,10,10.0,0.000342
Jo's Boys : From the Original Publisher,1,10,10.0,0.000342
The Vanished Priestess : An Annie Szabo Mystery,1,10,10.0,0.000342
Game and Hunting,1,10,10.0,0.000342


In [17]:
# Top 5 recommendations by popularity (share of all rating points).
populer.sort_values("percentage", ascending=False).head()

Book-Title,Book-Rating size,Book-Rating sum,Book-Rating mean,percentage
The Lovely Bones: A Novel,1295,5787,4.468726,0.197681
The Da Vinci Code,898,4169,4.642539,0.142411
The Secret Life of Bees,774,3442,4.447028,0.117577
The Red Tent (Bestselling Backlist),723,3134,4.334716,0.107056
The Nanny Diaries: A Novel,828,2923,3.530193,0.099848
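The multi-level column headers seen in the outputs come from passing a dict of lists to `agg`. As an alternative sketch, pandas named aggregation produces flat column names, which makes the percentage step and the final sort easier to read. The frame `toy` and the column names `n_ratings`, `rating_sum`, and `rating_mean` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged ratings frame.
toy = pd.DataFrame({
    "Book-Title": ["A", "A", "B", "B", "B", "C"],
    "Book-Rating": [10, 0, 5, 5, 10, 10],
})

# Named aggregation: each keyword becomes a flat output column.
flat = toy.groupby("Book-Title").agg(
    n_ratings=("Book-Rating", "count"),
    rating_sum=("Book-Rating", "sum"),
    rating_mean=("Book-Rating", "mean"),
)

# Share of all rating points held by each title, in percent.
flat["percentage"] = flat["rating_sum"] / flat["rating_sum"].sum() * 100

top = flat.sort_values("percentage", ascending=False)
print(top)
```

Because the share is driven by the rating sum, it favors heavily rated titles even when their mean is moderate, which matches the ranking shown above.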
