## **Book Recommendation System**

#### Problem Description
- Goodreads, an online book platform, was losing potential revenue because user activity had dropped by 20%.
- After conducting research, the Goodreads team found that users felt confused and lost when choosing what to read from a catalog of roughly 10,000 books. Users also lost interest in reading because they could no longer find books that matched their preferences.

#### Business Objective
- Improve the user experience and sustain interest in reading on the platform by removing the confusion users face when choosing books.

#### Solution
Build a book recommendation system that helps users choose books easily and removes the difficulty of finding books on the Goodreads platform.

Two recommendation approaches will be used:
1. Non-personalized: popularity-based recommendation
2. Personalized: collaborative filtering

Personalized recommender systems can be distinguished by the type of interaction data they use (implicit or explicit):
1. Implicit data is inferred from indirect user behavior, such as clicks on a book, time spent scrolling through book pages, purchases, or adding books to a reading list.
2. Explicit data comes from direct user feedback, such as book ratings, reviews, or stated opinions about particular books.

In this problem, users interact directly with books through ratings, which is explicit data. The personalized approach will therefore use *collaborative filtering* on the rating data.
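To make the link between explicit ratings and collaborative filtering more concrete, here is a minimal sketch of how rating rows can be arranged into a user-item matrix, which is the usual starting point for that family of methods. The toy values and the names `toy_ratings`, `user_item`, and `sparsity` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy explicit-feedback data with the same columns as ratings.csv
# (the values are made up for illustration only).
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "book_id": [10, 20, 10, 30, 20],
    "rating":  [5, 3, 4, 2, 5],
})

# Pivot into a user-item matrix: rows are users, columns are books,
# and NaN marks a book the user has not rated.
user_item = toy_ratings.pivot(index="user_id", columns="book_id", values="rating")

# Fraction of user-book pairs without a rating (sparsity).
sparsity = user_item.isna().to_numpy().mean()

print(user_item)
print(f"sparsity: {sparsity:.2f}")
```

Collaborative filtering tries to estimate the missing cells of such a matrix from the observed ones. With roughly 6 million ratings over 10,000 books, the real matrix is likely to be very sparse, so a dense pivot like this is only practical on small samples.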

#### Data Description
- The data comes from the [goodbooks-10k dataset](https://github.com/zygmuntz/goodbooks-10k).
- The dataset contains 10,000 books and 5,976,479 ratings.

Two files are used:


**Book rating data**: `ratings.csv`

<center>

|Feature|Description|Data Type|
|:--|:--|:--:|
|`user_id`|User ID|`int`|
|`book_id`|Book ID|`int`|
|`rating`|Rating the user gave to the book, from `1` to `5`|`int`|

**Books data** : `books.csv`

<center>

|Feature|Description|Data Type|
|:--|:--|:--:|
|`book_id`|Book ID|`int`|
|`goodreads_book_id`|Goodreads book ID|`int`|
|`best_book_id`|Goodreads ID of the most popular edition of the work|`int`|
|`work_id`|Work ID|`int`|
|`books_count`|Number of editions of the work|`int`|
|`isbn`|International Standard Book Number|`object`|
|`isbn13`|13-digit ISBN (the newer ISBN format)|`float`|
|`authors`|The authors of the book|`object`|
|`original_publication_year`|Year of original publication|`float`|
|`original_title`|Original title|`object`|
|`title`|Book title|`object`|
|`language_code`|Language code|`object`|
|`average_rating`|Average rating|`float`|
|`ratings_count`|Rating count|`int`|
|`work_ratings_count`|Work ratings count|`int`|
|`work_text_reviews_count`|Work text reviews count|`int`|
|`ratings_1`|Number of 1-star ratings|`int`|
|`ratings_2`|Number of 2-star ratings|`int`|
|`ratings_3`|Number of 3-star ratings|`int`|
|`ratings_4`|Number of 4-star ratings|`int`|
|`ratings_5`|Number of 5-star ratings|`int`|
|`image_url`|Image link|`object`|
|`small_image_url`|Small image link|`object`|

### **Import Data**

In [1]:
import numpy as np
import pandas as pd

In [2]:
rating_path = 'data/ratings.csv'
book_path = 'data/books.csv'

In [3]:
rating_data = pd.read_csv(rating_path, delimiter=',')
book_data = pd.read_csv(book_path, delimiter=',')
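Because the ratings file holds almost 6 million rows, it can help to declare compact data types while loading. The sketch below is one possible way to do this with the `dtype` parameter of `pd.read_csv`. The chosen dtypes and the name `rating_small` are assumptions, and an in-memory buffer stands in for `data/ratings.csv` so that the example runs without the file.

```python
import io
import pandas as pd

# Stand-in for data/ratings.csv so the sketch runs without the file;
# in the notebook, rating_path would be passed instead of this buffer.
sample_csv = io.StringIO("user_id,book_id,rating\n1,258,5\n2,4081,4\n2,260,5\n")

# Assumed dtypes: IDs fit in 32-bit integers and ratings in 8-bit integers.
rating_dtypes = {"user_id": "int32", "book_id": "int32", "rating": "int8"}

rating_small = pd.read_csv(sample_csv, dtype=rating_dtypes)

print(rating_small.dtypes)
print(rating_small.memory_usage(deep=True).sum(), "bytes")
```

Whether the saving matters depends on the available memory. The default `int64` columns used in this notebook work just as well on most machines.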

In [4]:
rating_data.head()

| |user_id|book_id|rating|
|:--|:--|:--|:--|
|0|1|258|5|
|1|2|4081|4|
|2|2|260|5|
|3|2|9296|5|
|4|2|2318|3|


In [5]:
book_data.head()

| |book_id|goodreads_book_id|best_book_id|work_id|books_count|isbn|isbn13|authors|original_publication_year|original_title|...|ratings_count|work_ratings_count|work_text_reviews_count|ratings_1|ratings_2|ratings_3|ratings_4|ratings_5|image_url|small_image_url|
|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|:--|
|0|1|2767052|2767052|2792775|272|439023483|9780439000000.0|Suzanne Collins|2008.0|The Hunger Games|...|4780653|4942365|155254|66715|127936|560092|1481305|2706317|https://images.gr-assets.com/books/1447303603m...|https://images.gr-assets.com/books/1447303603s...|
|1|2|3|3|4640799|491|439554934|9780440000000.0|J.K. Rowling, Mary GrandPré|1997.0|Harry Potter and the Philosopher's Stone|...|4602479|4800065|75867|75504|101676|455024|1156318|3011543|https://images.gr-assets.com/books/1474154022m...|https://images.gr-assets.com/books/1474154022s...|
|2|3|41865|41865|3212258|226|316015849|9780316000000.0|Stephenie Meyer|2005.0|Twilight|...|3866839|3916824|95009|456191|436802|793319|875073|1355439|https://images.gr-assets.com/books/1361039443m...|https://images.gr-assets.com/books/1361039443s...|
|3|4|2657|2657|3275794|487|61120081|9780061000000.0|Harper Lee|1960.0|To Kill a Mockingbird|...|3198671|3340896|72586|60427|117415|446835|1001952|1714267|https://images.gr-assets.com/books/1361975680m...|https://images.gr-assets.com/books/1361975680s...|
|4|5|4671|4671|245494|1356|743273567|9780743000000.0|F. Scott Fitzgerald|1925.0|The Great Gatsby|...|2683664|2773745|51992|86236|197621|606158|936012|947718|https://images.gr-assets.com/books/1490528560m...|https://images.gr-assets.com/books/1490528560s...|


### **Check data and handle duplicates**

In [6]:
rating_data.shape

(5976479, 3)

In [7]:
rating_data.dtypes

user_id    int64
book_id    int64
rating     int64
dtype: object

In [8]:
rating_data.isnull().sum()

user_id    0
book_id    0
rating     0
dtype: int64

In [9]:
rating_data.duplicated(subset=['user_id','book_id']).sum()

0

**rating_data** has the expected features and data types. It contains no null values and no duplicated (`user_id`, `book_id`) pairs.
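The checks above can also be bundled into a small helper that reports everything at once. This is a minimal sketch, not the notebook's own code: the function name `check_ratings` and the `low`/`high` bounds are assumptions and should be set to the rating scale of the data being checked.

```python
import pandas as pd

def check_ratings(df: pd.DataFrame, low: int, high: int) -> dict:
    """Summarise basic quality checks for a ratings table.

    `low` and `high` are the assumed bounds of the rating scale.
    """
    return {
        # Total number of missing cells across all columns
        "null_values": int(df.isnull().sum().sum()),
        # Rows that repeat an earlier (user_id, book_id) pair
        "duplicate_pairs": int(df.duplicated(subset=["user_id", "book_id"]).sum()),
        # Ratings outside [low, high]; missing ratings are counted here too
        "out_of_range": int((~df["rating"].between(low, high)).sum()),
    }

# Toy data containing one duplicated pair and one invalid rating
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 2],
    "book_id": [258, 4081, 4081, 260],
    "rating":  [5, 4, 4, 7],
})
report = check_ratings(toy, low=1, high=5)
print(report)
```

Calling `check_ratings(rating_data, low, high)` with the bounds of the actual scale would give the same information as cells 8 and 9, plus a range check that the notebook does not perform.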

In [10]:
book_data.shape

(10000, 23)

In [11]:
book_data.columns

Index(['book_id', 'goodreads_book_id', 'best_book_id', 'work_id',
       'books_count', 'isbn', 'isbn13', 'authors', 'original_publication_year',
       'original_title', 'title', 'language_code', 'average_rating',
       'ratings_count', 'work_ratings_count', 'work_text_reviews_count',
       'ratings_1', 'ratings_2', 'ratings_3', 'ratings_4', 'ratings_5',
       'image_url', 'small_image_url'],
      dtype='object')

Copy the **book_data** DataFrame and drop the features that are not needed.

In [12]:
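# Work on a copy so the original book_data stays intact
book_copy = book_data.copy()
# Drop the columns that are not needed for the recommendations
# (`columns=` already selects the axis, so `axis=1` is not required)
book_copy = book_copy.drop(columns=['goodreads_book_id','best_book_id','work_id','books_count','isbn',
       'isbn13','title','language_code','average_rating',
       'ratings_count', 'work_ratings_count', 'work_text_reviews_count',
       'ratings_1', 'ratings_2', 'ratings_3', 'ratings_4', 'ratings_5',
       'small_image_url'])
book_copy.head(3)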

| |book_id|authors|original_publication_year|original_title|image_url|
|:--|:--|:--|:--|:--|:--|
|0|1|Suzanne Collins|2008.0|The Hunger Games|https://images.gr-assets.com/books/1447303603m...|
|1|2|J.K. Rowling, Mary GrandPré|1997.0|Harry Potter and the Philosopher's Stone|https://images.gr-assets.com/books/1474154022m...|
|2|3|Stephenie Meyer|2005.0|Twilight|https://images.gr-assets.com/books/1361039443m...|


In [13]:
book_copy.dtypes

book_id                        int64
authors                       object
original_publication_year    float64
original_title                object
image_url                     object
dtype: object

In [14]:
book_copy.isnull().sum()

book_id                        0
authors                        0
original_publication_year     21
original_title               585
image_url                      0
dtype: int64

In [15]:
book_copy = book_copy.dropna(axis=0)
book_copy.isnull().sum()

book_id                      0
authors                      0
original_publication_year    0
original_title               0
image_url                    0
dtype: int64

In [16]:
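# astype returns a new DataFrame, so the dtype change does not depend on
# in-place assignment through .loc, which can keep the float dtype in recent pandas versions
book_copy = book_copy.astype({'original_publication_year': int})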
book_copy.dtypes

book_id                       int64
authors                      object
original_publication_year     int32
original_title               object
image_url                    object
dtype: object

In [17]:
book_copy.head(3)

| |book_id|authors|original_publication_year|original_title|image_url|
|:--|:--|:--|:--|:--|:--|
|0|1|Suzanne Collins|2008|The Hunger Games|https://images.gr-assets.com/books/1447303603m...|
|1|2|J.K. Rowling, Mary GrandPré|1997|Harry Potter and the Philosopher's Stone|https://images.gr-assets.com/books/1474154022m...|
|2|3|Stephenie Meyer|2005|Twilight|https://images.gr-assets.com/books/1361039443m...|


In [18]:
book_copy.duplicated().sum()

0

In [19]:
book_copy.shape

(9409, 5)

**book_copy** now contains only the required features. The data type of `original_publication_year` has been converted to integer, rows with null values have been removed (9,409 of the 10,000 books remain), and there are no duplicated rows.
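Most of the removed rows are missing only `original_title`. As a possible alternative that this notebook does not use, the edition `title` from `books.csv` could fill those gaps so that fewer books are dropped. The sketch below assumes the `title` column is kept, and `display_title` is a hypothetical column name; the rows are illustrative only.

```python
import pandas as pd

# Toy stand-in for book_data with only the columns needed here
# (illustrative rows; the real frame comes from books.csv).
toy_books = pd.DataFrame({
    "book_id": [1, 2, 3],
    "original_title": ["The Hunger Games", None, "Twilight"],
    "title": ["The Hunger Games (edition)", "Some Edition Title", "Twilight (edition)"],
})

# Use the edition title wherever the original title is missing.
toy_books["display_title"] = toy_books["original_title"].fillna(toy_books["title"])

# No row has to be dropped because of a missing title.
print(toy_books[["book_id", "display_title"]])
```

The trade-off is that edition titles may carry extra text such as series information, so the displayed names would be less uniform than with `original_title` alone.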

### **Non-personalized: popularity-based recommendation**

In [20]:
# Number of ratings each book received
rating_count = rating_data.groupby('book_id')['rating'].count().reset_index()
rating_count.rename(columns={'rating':'rating_count'}, inplace=True)
rating_count

| |book_id|rating_count|
|:--|:--|:--|
|0|1|22806|
|1|2|21850|
|2|3|16931|
|3|4|19088|
|4|5|16604|
|...|...|...|
|9995|9996|141|
|9996|9997|93|
|9997|9998|102|
|9998|9999|130|


In [21]:
# Mean rating per book; selecting 'rating' first avoids averaging user_id as well
mean_rating = rating_data.groupby('book_id')['rating'].mean().round(2).reset_index()
mean_rating.rename(columns={'rating':'mean_rating'}, inplace=True)
mean_rating

| |book_id|mean_rating|
|:--|:--|:--|
|0|1|4.28|
|1|2|4.35|
|2|3|3.21|
|3|4|4.33|
|4|5|3.77|
|...|...|...|
|9995|9996|4.01|
|9996|9997|4.45|
|9997|9998|4.32|
|9998|9999|3.71|


In [22]:
popular = rating_count.merge(mean_rating, on='book_id')
popular

| |book_id|rating_count|mean_rating|
|:--|:--|:--|:--|
|0|1|22806|4.28|
|1|2|21850|4.35|
|2|3|16931|3.21|
|3|4|19088|4.33|
|4|5|16604|3.77|
|...|...|...|...|
|9995|9996|141|4.01|
|9996|9997|93|4.45|
|9997|9998|102|4.32|
|9998|9999|130|3.71|
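The count, the mean, and the merge in the three cells above could also be computed in a single pass with pandas named aggregation. This is a sketch on toy data rather than a replacement for the notebook's code; `toy_ratings` and `popular_toy` are illustrative names.

```python
import pandas as pd

# Toy ratings: book 1 has three ratings, book 2 has two
toy_ratings = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2],
    "book_id": [1, 1, 1, 2, 2],
    "rating":  [5, 4, 4, 3, 2],
})

# One groupby produces both statistics, so no merge is needed afterwards.
popular_toy = (
    toy_ratings.groupby("book_id")["rating"]
    .agg(rating_count="count", mean_rating="mean")
    .round({"mean_rating": 2})
    .reset_index()
)

print(popular_toy)
```

The resulting frame has the same three columns as `popular` at this step (`book_id`, `rating_count`, `mean_rating`).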


In [23]:
popular = popular.merge(book_copy, on="book_id").drop_duplicates("book_id")[["book_id","rating_count","mean_rating","authors","original_publication_year","original_title","image_url"]]
popular

| |book_id|rating_count|mean_rating|authors|original_publication_year|original_title|image_url|
|:--|:--|:--|:--|:--|:--|:--|:--|
|0|1|22806|4.28|Suzanne Collins|2008|The Hunger Games|https://images.gr-assets.com/books/1447303603m...|
|1|2|21850|4.35|J.K. Rowling, Mary GrandPré|1997|Harry Potter and the Philosopher's Stone|https://images.gr-assets.com/books/1474154022m...|
|2|3|16931|3.21|Stephenie Meyer|2005|Twilight|https://images.gr-assets.com/books/1361039443m...|
|3|4|19088|4.33|Harper Lee|1960|To Kill a Mockingbird|https://images.gr-assets.com/books/1361975680m...|
|4|5|16604|3.77|F. Scott Fitzgerald|1925|The Great Gatsby|https://images.gr-assets.com/books/1490528560m...|
|...|...|...|...|...|...|...|...|
|9404|9996|141|4.01|Ilona Andrews|2010|Bayou Moon|https://images.gr-assets.com/books/1307445460m...|
|9405|9997|93|4.45|Robert A. Caro|1990|Means of Ascent|https://s.gr-assets.com/assets/nophoto/book/11...|
|9406|9998|102|4.32|Patrick O'Brian|1977|The Mauritius Command|https://images.gr-assets.com/books/1455373531m...|
|9407|9999|130|3.71|Peggy Orenstein|2011|Cinderella Ate My Daughter: Dispatches from th...|https://images.gr-assets.com/books/1279214118m...|


In [24]:
popular.sort_values("rating_count", ascending=False).head(10)

| |book_id|rating_count|mean_rating|authors|original_publication_year|original_title|image_url|
|:--|:--|:--|:--|:--|:--|:--|:--|
|0|1|22806|4.28|Suzanne Collins|2008|The Hunger Games|https://images.gr-assets.com/books/1447303603m...|
|1|2|21850|4.35|J.K. Rowling, Mary GrandPré|1997|Harry Potter and the Philosopher's Stone|https://images.gr-assets.com/books/1474154022m...|
|3|4|19088|4.33|Harper Lee|1960|To Kill a Mockingbird|https://images.gr-assets.com/books/1361975680m...|
|2|3|16931|3.21|Stephenie Meyer|2005|Twilight|https://images.gr-assets.com/books/1361039443m...|
|4|5|16604|3.77|F. Scott Fitzgerald|1925|The Great Gatsby|https://images.gr-assets.com/books/1490528560m...|
|16|17|16549|4.13|Suzanne Collins|2009|Catching Fire|https://images.gr-assets.com/books/1358273780m...|
|19|20|15953|3.85|Suzanne Collins|2010|Mockingjay|https://images.gr-assets.com/books/1358275419m...|
|17|18|15855|4.42|J.K. Rowling, Mary GrandPré, Rufus Beck|1999|Harry Potter and the Prisoner of Azkaban|https://images.gr-assets.com/books/1499277281m...|
|22|23|15657|4.23|J.K. Rowling, Mary GrandPré|1998|Harry Potter and the Chamber of Secrets|https://images.gr-assets.com/books/1474169725m...|
|6|7|15558|4.15|J.R.R. Tolkien|1937|The Hobbit or There and Back Again|https://images.gr-assets.com/books/1372847500m...|


In [25]:
popular.shape

(9409, 7)
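The ranking in cell 24 can be wrapped in a small reusable function. The sketch below is one way to do it, under stated assumptions: `top_popular` is a hypothetical helper, `min_ratings` is an optional threshold that the notebook does not use, and ties on `rating_count` are broken by `mean_rating`. The toy rows reuse the first four books from the `popular` table.

```python
import pandas as pd

def top_popular(popular: pd.DataFrame, n: int = 10, min_ratings: int = 0) -> pd.DataFrame:
    """Return the n most-rated books, with ties broken by mean rating.

    `min_ratings` is a hypothetical threshold that filters out books
    with too few ratings to be considered popular.
    """
    eligible = popular[popular["rating_count"] >= min_ratings]
    return eligible.sort_values(
        ["rating_count", "mean_rating"], ascending=False
    ).head(n)

# First four rows of the popular table shown earlier
toy_popular = pd.DataFrame({
    "book_id": [1, 2, 3, 4],
    "rating_count": [22806, 21850, 16931, 19088],
    "mean_rating": [4.28, 4.35, 3.21, 4.33],
})

top3 = top_popular(toy_popular, n=3)
print(top3["book_id"].tolist())
```

Because this recommendation is non-personalized, every user would receive the same list. The function only makes the list size and the popularity threshold explicit.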

### **Personalized recommender system**