# BOOK RECOMMENDATION SYSTEM

## 1. BUSINESS UNDERSTANDING

### 1.1 INTRODUCTION

Books are essential for personal growth, knowledge acquisition, and entertainment. In Kenya, however, finding books that cater to diverse tastes has been a challenge: high prices, limited library access, and a narrow selection have hindered the reading experience for many book enthusiasts. Recognizing this, Lonestar Incorporated, a Kenyan startup, has secured venture capital funding to revolutionize the Kenyan book market. It aims to offer a wide range of affordable books, both digital and hard copy, through a website where users can explore and purchase them. For the website and the company to succeed, Lonestar Inc. needs a recommendation system that suggests books to users based on their previous purchases and on books that users with similar interests have rated highly.

Lonestar Inc. has hired a fellow startup, Regex Inc., a new company of Data Scientists, Analysts, and Data Engineers. Although Regex Inc. is still small, its team has been tasked with building the recommendation system for Lonestar Inc.'s website. The team will deploy the model to the site with the help of the website's developers and will then present its modeling results to Lonestar Inc.'s board.


### 1.2 OBJECTIVES

#### 1.2.1 MAIN OBJECTIVE

To build a model that recommends books to users based on what they have read before and what other users with similar interests have also read and liked.

#### 1.2.2 SPECIFIC OBJECTIVES

-	Design a sophisticated recommendation algorithm for book suggestions.

-	Utilize user profiles, reading history, and user-generated ratings for model training.

-	Ensure diverse book recommendations spanning different genres and interests.

-	Seamlessly integrate the recommendation system into Lonestar Incorporated's website.

-	Monitor and evaluate user engagement metrics, such as click-through rates and page views.

-	Analyze the impact of the recommendation system on book sales and revenue generation.

-	Comply with data protection regulations to safeguard user privacy.

-	Collaborate closely with the website development team for integration.

-	Deploy the recommendation model on the website for real-time book suggestions.

-	Prepare and deliver a comprehensive presentation to the Lonestar Inc. board members showcasing the project's results and impact.


### 1.3 PROBLEM STATEMENT

Lonestar Incorporated, a Kenyan startup aiming to revolutionize the book market in Kenya, faces the challenge of providing an exceptional reading experience to their customers. The primary problem is the lack of a book recommendation system on their website. Users currently have no efficient way to discover books tailored to their preferences and reading history. Lonestar Inc. seeks to implement a recommendation system that can suggest books to users based on their previous purchases and ratings by users with similar interests. The problem at hand is to design, build, and deploy an effective book recommendation system that enhances user engagement, drives book sales, and improves the overall reading experience on their website.

### 1.4 MEASURE OF SUCCESS

To achieve the lowest possible root mean squared error (RMSE) between predicted and actual ratings on held-out data.
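For reference, RMSE is the square root of the mean squared difference between the predicted and the actual ratings, so it is expressed in the same units as the ratings themselves. The snippet below is a minimal sketch of that calculation using only the standard library; the function name `rmse` and the rating values are invented for illustration and are not project results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings, invented for illustration only.
actual_ratings = [5, 4, 5, 5, 3]
predicted_ratings = [4.5, 4.0, 4.0, 5.0, 3.5]

error = rmse(actual_ratings, predicted_ratings)
print(round(error, 3))
```

A lower value means the predicted ratings sit closer to what users actually gave; an RMSE of 0 would mean every prediction was exact.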

## 2. DATA UNDERSTANDING

The dataset was sourced from https://maciejkula.github.io/spotlight/datasets/goodbooks.html. Here are the libraries to be used.

In [1]:
# For analysis and data manipulation
import pandas as pd
import numpy as np

# For visualisation
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_palette("Spectral")  # Set the colour palette for plots
sns.set_style("darkgrid")  # Set the plot background style
%matplotlib inline

# Surprise module and methods for the recommendation system
from surprise import Dataset, Reader
from surprise import SVD as SVD1  # SVD matrix factorization, used as the baseline model
from surprise.prediction_algorithms import SVD as SVD2   # The same SVD class under a second alias, used when fine tuning the model
from surprise import KNNBasic
from surprise import NMF
from surprise.model_selection import train_test_split
from surprise import accuracy
from surprise.model_selection import GridSearchCV  # for fine tuning the model



### 2.1 RATINGS

These are the ratings given to various books by users.

In [2]:
ratings = pd.read_csv('ratings.csv')
ratings.head()

| Unnamed: 0 | user_id | book_id | rating |
|---|---|---|---|
| 0 | 1 | 258 | 5 |
| 1 | 2 | 4081 | 4 |
| 2 | 2 | 260 | 5 |
| 3 | 2 | 9296 | 5 |
| 4 | 2 | 2318 | 3 |


**As can be seen above, the columns for ratings are `user_id`, `book_id` and `rating`.**

These are defined as follows:
- **user_id** : The unique identifier of the user who gave the rating, used to group the ratings given by each user.

- **book_id** : The unique identifier of the book that was rated, used to group the ratings each book received. This makes it possible to recommend books to a user who is new to the system, based on the ratings of users with similar interests.

- **rating** : The rating a user gave to a specific book, on a scale of 1 to 5, with 5 being the highest score. This is to help in content-based and collaborative filtering.
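To make the idea of "users with similar interests" concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not the Surprise implementation used later in the project: the toy ratings, the choice of cosine similarity, and the helper names `cosine_similarity` and `predict_rating` are all assumptions made for illustration.

```python
import math

# Toy ratings: user_id -> {book_id: rating}. Invented for illustration.
ratings_by_user = {
    1: {10: 5, 20: 4, 30: 1},
    2: {10: 5, 20: 5, 40: 4},
    3: {10: 1, 30: 5, 40: 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the books both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[book] * b[book] for book in shared)
    norm_a = math.sqrt(sum(a[book] ** 2 for book in shared))
    norm_b = math.sqrt(sum(b[book] ** 2 for book in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user_id, book_id):
    """Similarity-weighted average of other users' ratings for book_id."""
    weighted_sum, weight_total = 0.0, 0.0
    for other_id, other_ratings in ratings_by_user.items():
        if other_id == user_id or book_id not in other_ratings:
            continue
        sim = cosine_similarity(ratings_by_user[user_id], other_ratings)
        weighted_sum += sim * other_ratings[book_id]
        weight_total += sim
    # None signals that no similar user has rated this book.
    return weighted_sum / weight_total if weight_total else None


# User 1 has not rated book 40; estimate the rating from users 2 and 3.
prediction = predict_rating(1, 40)
print(prediction)
```

In this toy data user 1 agrees with user 2 far more than with user 3, so the estimate for book 40 is pulled towards user 2's rating of 4 rather than user 3's rating of 2.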

### 2.2 BOOKS

This dataset contains information about books, including details like book IDs, authors, original publication years, and various ratings and counts related to the books.

In [5]:
books = pd.read_csv('books.csv')
books.head()

| Unnamed: 0 | book_id | goodreads_book_id | best_book_id | work_id | books_count | isbn | isbn13 | authors | original_publication_year | original_title | ... | ratings_count | work_ratings_count | work_text_reviews_count | ratings_1 | ratings_2 | ratings_3 | ratings_4 | ratings_5 | image_url | small_image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2767052 | 2767052 | 2792775 | 272 | 439023483 | 9780439000000.0 | Suzanne Collins | 2008.0 | The Hunger Games | ... | 4780653 | 4942365 | 155254 | 66715 | 127936 | 560092 | 1481305 | 2706317 | https://images.gr-assets.com/books/1447303603m... | https://images.gr-assets.com/books/1447303603s... |
| 1 | 2 | 3 | 3 | 4640799 | 491 | 439554934 | 9780440000000.0 | J.K. Rowling, Mary GrandPré | 1997.0 | Harry Potter and the Philosopher's Stone | ... | 4602479 | 4800065 | 75867 | 75504 | 101676 | 455024 | 1156318 | 3011543 | https://images.gr-assets.com/books/1474154022m... | https://images.gr-assets.com/books/1474154022s... |
| 2 | 3 | 41865 | 41865 | 3212258 | 226 | 316015849 | 9780316000000.0 | Stephenie Meyer | 2005.0 | Twilight | ... | 3866839 | 3916824 | 95009 | 456191 | 436802 | 793319 | 875073 | 1355439 | https://images.gr-assets.com/books/1361039443m... | https://images.gr-assets.com/books/1361039443s... |
| 3 | 4 | 2657 | 2657 | 3275794 | 487 | 61120081 | 9780061000000.0 | Harper Lee | 1960.0 | To Kill a Mockingbird | ... | 3198671 | 3340896 | 72586 | 60427 | 117415 | 446835 | 1001952 | 1714267 | https://images.gr-assets.com/books/1361975680m... | https://images.gr-assets.com/books/1361975680s... |
| 4 | 5 | 4671 | 4671 | 245494 | 1356 | 743273567 | 9780743000000.0 | F. Scott Fitzgerald | 1925.0 | The Great Gatsby | ... | 2683664 | 2773745 | 51992 | 86236 | 197621 | 606158 | 936012 | 947718 | https://images.gr-assets.com/books/1490528560m... | https://images.gr-assets.com/books/1490528560s... |


The books dataset describes each book through its identifiers, authors, publication year, and rating statistics, all of which are relevant to book recommendations. Below is an overview of the columns in the dataset:

- `book_id`: The unique identifier for each book.
- `goodreads_book_id`: The book's identifier in the Goodreads database.
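- `best_book_id`: The Goodreads identifier of the most popular edition of the book.
- `work_id`: The Goodreads identifier of the work in the abstract sense, shared by all of its editions.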
- `books_count`: The number of editions/versions of the book.
- `isbn`: The International Standard Book Number of the book.
- `isbn13`: The ISBN-13 number for the book.
- `authors`: The author(s) of the book.
- `original_publication_year`: The year the book was originally published.
- `original_title`: The original title of the book.
- ...

The dataset provides rich information about the books, which can be leveraged for various analytical and recommendation purposes.

For example, the `book_id` can be used to uniquely identify each book, while the `original_publication_year` can help in understanding the historical context of the books. With this data, one can explore book popularity, authorship trends, and even create recommendation systems to suggest books to users based on their interests and reading history.
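As a small worked example of exploring book popularity, the sketch below derives an average rating from the `ratings_1` to `ratings_5` columns of the first previewed row. It assumes that `ratings_k` holds the number of k-star ratings a book received; that interpretation is inferred from the column names and is not stated in the preview.

```python
# Star-rating counts for "The Hunger Games", copied from the first row of
# the books preview (ratings_1 ... ratings_5).
# Assumption: ratings_k is the number of k-star ratings the book received.
star_counts = {1: 66715, 2: 127936, 3: 560092, 4: 1481305, 5: 2706317}

# Total number of ratings across all star levels.
total_ratings = sum(star_counts.values())

# Count-weighted mean of the star values.
mean_rating = sum(stars * count for stars, count in star_counts.items()) / total_ratings

print(total_ratings)
print(round(mean_rating, 2))
```

For this row the five counts add up to 4942365, the same value shown in `work_ratings_count`, which is consistent with the assumption above.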


### 2.3 TAGS

In [9]:
tags = pd.read_csv('tags.csv')
tags.head()

| Unnamed: 0 | tag_id | tag_name |
|---|---|---|
| 0 | 0 | - |
| 1 | 1 | --1- |
| 2 | 2 | --10- |
| 3 | 3 | --12- |
| 4 | 4 | --122- |


The dataset contains information about tags, including their unique identifiers and tag names. Tags are commonly used in various contexts for categorizing and labeling items. This dataset is particularly useful for tasks that involve tagging, categorization, and classification. Below is an overview of the columns in the dataset:

- `tag_id`: A unique numerical identifier for each tag.
- `tag_name`: The name of the tag.

The dataset provides a straightforward way to associate names with unique tag identifiers, making it suitable for applications such as content classification, labeling, and organization. Users can easily map tag names to their respective IDs and vice versa for effective data management and analysis.

For instance, the `tag_id` column is essential for database indexing and identification, while the `tag_name` column serves as the human-readable label for the corresponding tags. Such datasets are valuable in contexts like content management systems and data-driven applications where accurate tagging is essential for efficient data retrieval and analysis.
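The two-way mapping between tag IDs and tag names can be sketched with plain dictionaries. The rows below are copied from the preview; the variable names are illustrative, and the reverse lookup assumes that tag names are unique, which has not been checked against the full file.

```python
# Rows copied from the tags preview: (tag_id, tag_name).
tag_rows = [(0, "-"), (1, "--1-"), (2, "--10-"), (3, "--12-"), (4, "--122-")]

# Forward lookup: tag_id -> tag_name.
id_to_name = dict(tag_rows)

# Reverse lookup: tag_name -> tag_id (assumes tag names are unique).
name_to_id = {name: tag_id for tag_id, name in tag_rows}

print(id_to_name[2])
print(name_to_id["--12-"])
```

With the full dataframe, the same forward lookup could be built with `dict(zip(tags["tag_id"], tags["tag_name"]))`.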

**These are the columns present in the three dataframes. Although some of them may be dropped, the remaining ones will each contribute to building the recommendation system. We now move on to the Data Preparation phase of the project.**