<h1 align=center><font size = 5>BOOK RECOMMENDATION SYSTEM</font></h1>

### Table of contents

<a href="#ref1">1. Preprocessing data</a>

<a href="#ref2">2. Content-based Recommendation System</a>

<a href="#ref3">3. The final recommendation table</a>

In [None]:
import numpy as np 
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import re
import matplotlib.style as style

In [None]:
books = pd.read_csv("/kaggle/input/goodbooks-10k/books.csv")
book_tags = pd.read_csv("/kaggle/input/goodbooks-10k/book_tags.csv")
tags = pd.read_csv("/kaggle/input/goodbooks-10k/tags.csv")
ratings = pd.read_csv("/kaggle/input/goodbooks-10k/ratings.csv")

<a id="ref1"></a>
# Preprocessing Data

Reviewing the data in ***tags*** and ***book_tags***

In [None]:
tags.head()

In [None]:
book_tags.head()

The two dataframes can be merged into one on their shared column ***'tag_id'***

In [None]:
#Left join between book_tags and tags dataframe
book_tags = pd.merge(book_tags,tags,on='tag_id',how='left')
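
To make the effect of the left join concrete, here is a minimal sketch on made-up data. The frames *toy_book_tags* and *toy_tags* and all their values are hypothetical stand-ins, and the column names other than *tag_id* are assumptions about the CSV layout.

```python
import pandas as pd

# Hypothetical stand-ins for book_tags.csv and tags.csv (all values are made up)
toy_book_tags = pd.DataFrame({
    "goodreads_book_id": [1, 1, 2],
    "tag_id": [10, 11, 99],
    "count": [500, 120, 7],
})
toy_tags = pd.DataFrame({
    "tag_id": [10, 11],
    "tag_name": ["fantasy", "young-adult"],
})

# how="left" keeps every row of toy_book_tags.
# tag_id 99 has no match in toy_tags, so its tag_name becomes NaN.
merged = pd.merge(toy_book_tags, toy_tags, on="tag_id", how="left")
print(merged)
```

A left join never drops rows from the left-hand frame, so a tag id with no name shows up as a missing value rather than disappearing.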

Removing duplicated rows, if any.

In [None]:
book_tags.drop_duplicates(inplace=True)
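
Before removing anything, it can help to count how many rows are actually duplicated. The sketch below does that on a small made-up frame; *toy* and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame in which the second row repeats the first exactly
toy = pd.DataFrame({
    "goodreads_book_id": [1, 1, 1, 2],
    "tag_id": [10, 10, 11, 10],
    "count": [500, 500, 120, 30],
})

# duplicated() marks every row that is identical to an earlier row
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each distinct row
deduped = toy.drop_duplicates().reset_index(drop=True)
print(n_dupes, len(deduped))
```

Only rows that match in every column count as duplicates, so the same tag appearing on two different books is kept.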

**FINAL *book_tags*:**

In [None]:
book_tags

Reviewing the data in ***books***

In [None]:
books.head()

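Removing columns that aren't needed for a content-based recommendation system and renaming some of the remaining ones for readability. Note that the existing ***title*** column is dropped so that ***original_title*** can take over its name.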

In [None]:
#Drop unnecessary columns
books.drop(columns=['id', 'best_book_id', 'work_id', 'isbn', 'isbn13', 'title','work_ratings_count','ratings_count','work_text_reviews_count', 'ratings_1', 'ratings_2', 'ratings_3','ratings_4', 'ratings_5', 'image_url','small_image_url'], inplace= True)

#Rename columns
books.rename(columns={'original_publication_year':'pub_year', 'original_title':'title', 'language_code':'language', 'average_rating':'rating'}, inplace=True)

Checking for nulls, if any.

In [None]:
books.isnull().sum()

In [None]:
#Dropping the null values
books.dropna(inplace= True)

Splitting the values in the ***authors*** column into a ***list of authors*** to simplify future use.

In [None]:
#Using pandas' str.split to create a list of authors, then stripping the space left after each comma
books['authors'] = books['authors'].str.split(',').apply(lambda names: [name.strip() for name in names])
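
The whitespace around the commas matters more than it looks. As a small sketch on a made-up row (the frame *toy_books* is hypothetical), splitting on a bare comma leaves a leading space on every author after the first, so the same person can end up under two different names:

```python
import pandas as pd

# Hypothetical rows in the same "Name, Name" format as the authors column
toy_books = pd.DataFrame({"authors": ["J.K. Rowling, Mary GrandPré", "Harper Lee"]})

# Splitting on "," alone keeps the space that follows each comma
raw = toy_books["authors"].str.split(",")

# Stripping each name gives one consistent spelling per author
clean = raw.apply(lambda names: [name.strip() for name in names])

print(raw[0])
print(clean[0])
```

With consistent names, each author maps to exactly one column when the lists are encoded later on.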

**FINAL *books*:**

In [None]:
books

1. Keeping authors in a list isn't suitable for the content-based technique, so we use ***One Hot Encoding*** to convert the list into a vector in which each column corresponds to one possible author. Categorical data has to be encoded numerically like this before it can be used in the calculations that follow.

2. Every distinct author gets a column that contains either 1 or 0: 1 means the book was written by that author and 0 means it wasn't.

In [None]:
book_authors = books.copy()

#For every row in the dataframe, iterate through the list of authors and place a 1 into the corresponding column
for index, row in books.iterrows():
    for author in row['authors']:
        book_authors.at[index, author] = 1
        
#Filling in the NaN values with 0 to show that a book isn't written by that author
book_authors = book_authors.fillna(0)
book_authors.head()
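
The row-by-row loop is easy to follow but can be slow on thousands of books. As an alternative sketch (not what this notebook uses), the same binary table can be built with *explode* and *crosstab*. The frame *toy_books* and its values are hypothetical.

```python
import pandas as pd

# Hypothetical books whose authors are already stored as lists
toy_books = pd.DataFrame({
    "book_id": [3, 5, 2657],
    "authors": [["J.K. Rowling", "Mary GrandPré"], ["J.K. Rowling"], ["Harper Lee"]],
})

# One row per (book, author) pair
pairs = toy_books.explode("authors")

# Count each pair, then cap at 1 so the table is strictly binary
one_hot = pd.crosstab(pairs["book_id"], pairs["authors"]).clip(upper=1)
print(one_hot)
```

The result is indexed by *book_id* with one column per author, which is the same shape the loop produces once the extra columns are dropped.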

In [None]:
#Generalising the format of author names for simplicity in future
book_authors.columns = [c.lower().strip().replace(' ', '_') for c in book_authors.columns]

#Setting book_id as index of the dataframe (this also removes it from the columns)
book_authors = book_authors.set_index('book_id')

#Dropping unnecessary columns
book_authors.drop(columns=['pub_year','title','rating','books_count','authors','language'], inplace=True)

**FINAL *book_authors*:**

In [None]:
book_authors.head()

<a id="ref2"></a>
# Content-based Recommendation System

A **Content-Based** or **Item-Item recommendation system** attempts to figure out which aspects of an item a user likes most, and then recommends items that share those aspects. 

In this case, I'm going to work out recommendations for a user based on the authors of the books they've read and the ratings they gave.

Creating an input user to recommend books to:

In [None]:
user_1 = pd.DataFrame([{'book_id':2767052, 'rating':5.0},{'book_id':3, 'rating':4.0}, {'book_id':41865, 'rating':4.5},{'book_id':15613, 'rating':3.0},{'book_id':2657, 'rating':2.5}])
user_1

To learn the user's preferences, we take from *book_authors* (the dataframe of binary author columns) the rows for the books the user has already rated.


In [None]:
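#Keep only the rated books that survived preprocessing, then select their rows in the same order as user_1
user_1 = user_1[user_1['book_id'].isin(book_authors.index)].reset_index(drop=True)
user_authors = book_authors.loc[user_1['book_id']].reset_index(drop=True)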
user_authors

We turn the authors into weights by multiplying each rated book's row in the user's author table (*user_authors*) by the rating the user gave that book, and then summing the resulting table by column.
This is the dot product of a matrix and a vector, which pandas provides through its "dot" method.

In [None]:
user_1.rating

In [None]:
#Dot product to get weights
userProfile = user_authors.transpose().dot(user_1['rating'])
#The user profile
userProfile
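
To see what the dot product does, here is a small worked sketch with three rated books and three authors. The names *toy_user_authors*, *toy_ratings*, *author_a* and so on are hypothetical, and the numbers are made up.

```python
import pandas as pd

# Rows: the user's rated books; columns: authors (1 = wrote the book)
toy_user_authors = pd.DataFrame(
    {"author_a": [1, 1, 0], "author_b": [0, 1, 0], "author_c": [0, 0, 1]}
)

# One rating per book, in the same row order as toy_user_authors
toy_ratings = pd.Series([5.0, 4.0, 2.5])

# Each author's weight is the sum of the ratings of the books they wrote
profile = toy_user_authors.T.dot(toy_ratings)
print(profile)
# author_a: 5.0 + 4.0 = 9.0, author_b: 4.0, author_c: 2.5
```

The calculation only makes sense if row *i* of the author table and rating *i* refer to the same book, so it is worth checking that both are in the same order before taking the product.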

*userProfile* contains one weight per author, describing how strongly the user prefers that author. 
Using this, we can recommend books that match the user's preferences.

With the *userProfile* and the *book_authors* table, we compute a **weighted average** for every book (the sum of the profile weights of its authors divided by the sum of all weights) and recommend the twenty highest-scoring books, which are books written by the same authors.

In [None]:
recommendation = (((book_authors*userProfile).sum(axis=1))/(userProfile.sum())).sort_values(ascending=False)
#Top 20 recommendations
recommendation.head(20)
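
The scoring step can be checked by hand on a tiny example. In this sketch the frame *toy_book_authors*, the series *profile* and all values are hypothetical; the formula is the same weighted average used above.

```python
import pandas as pd

# Hypothetical binary author table, indexed by book_id
toy_book_authors = pd.DataFrame(
    {"author_a": [1, 0, 1], "author_b": [0, 1, 1], "author_c": [0, 0, 0]},
    index=pd.Index([101, 102, 103], name="book_id"),
)

# Hypothetical user profile: one weight per author
profile = pd.Series({"author_a": 9.0, "author_b": 4.0, "author_c": 2.5})

# Multiply each author column by its weight, sum per book, normalise by the total weight
scores = (toy_book_authors * profile).sum(axis=1) / profile.sum()
ranked = scores.sort_values(ascending=False)
print(ranked)
```

Book 103 ranks first because it has both of the user's two most heavily weighted authors (13.0 / 15.5), followed by 101 (9.0 / 15.5) and 102 (4.0 / 15.5).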

<a id="ref3"></a>
# The final recommendation table:

In [None]:
#The final recommendation table, ordered from strongest to weakest match
books.set_index('book_id').loc[recommendation.head(20).index].reset_index()
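
Books the user has already rated tend to score highly, since their authors are exactly the ones in the profile. As an optional sketch (an extension, not part of the notebook's method), they can be dropped before building the table, and the score can be shown next to each title. All names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical book metadata and ranked scores
toy_books = pd.DataFrame({"book_id": [101, 102, 103], "title": ["A", "B", "C"]})
ranked = pd.Series(
    [0.84, 0.58, 0.26],
    index=pd.Index([103, 101, 102], name="book_id"),
)
already_rated = [101]  # assumed: ids of books the user has rated

# Remove rated books, then keep the best remaining scores
top = ranked.drop(index=already_rated, errors="ignore").head(2)

# Look the books up in rank order and attach the score as a column
table = toy_books.set_index("book_id").loc[top.index].assign(score=top).reset_index()
print(table)
```

Looking rows up with *.loc* on the ranked index keeps the table in recommendation order.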

### Advantages of a Content-Based Recommendation System

***Advantages***
* Learns the user's preferences from the books they have rated
* Highly personalized for the user