# Deep Learning for a Book Recommendation System
This project builds a recommendation system that suggests books to users based on their past behavior, i.e., their likes and ratings.
## About the dataset:
The Book-Crossing dataset can be found on the following website:

http://www2.informatik.uni-freiburg.de/~cziegler/BX/

This dataset is a collaborative filtering dataset and contains information about users, books, and ratings. It was collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community, and contains 278,858 users (anonymized) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.

## Load all necessary libraries
This section imports the libraries used throughout the project.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf

## Load the datasets into this workspace

In [None]:
ratings = pd.read_csv("BX-CSV/BX-Book-Ratings.csv", delimiter=";", on_bad_lines='skip')
books = pd.read_csv("BX-CSV/BX-Books.csv", delimiter=";", on_bad_lines='skip', low_memory=False)
users = pd.read_csv("BX-CSV/BX-Users.csv", delimiter=";", on_bad_lines='skip')

## Datasets overview and information

### Book Ratings
To get a glimpse of the dataset, I will show the top 3 rows of the book ratings dataset.

In [None]:
ratings.head(3)

* Brief information about different columns of the ratings dataframe:

In [None]:
ratings.info()

The `ISBN` column (i.e., the book ID) is stored as an `object`. This happens because some IDs contain non-numeric characters; for example, the check digit of a 10-digit ISBN can be the letter `X`.
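As a small illustration of why an ISBN column cannot simply be parsed as an integer, the sketch below takes a few ISBN-style strings (chosen for illustration, not read from the CSV files) and checks which ones fail numeric conversion. The variable names are my own.

```python
import pandas as pd

# ISBN-style strings used only for illustration
sample_isbns = pd.Series(["0195153448", "034545104X", "0002005018"])

# errors="coerce" turns values that cannot be parsed as numbers into NaN
as_numbers = pd.to_numeric(sample_isbns, errors="coerce")

# Values that are not purely numeric
non_numeric = sample_isbns[as_numbers.isna()].tolist()

# Values that would lose their leading zero if stored as integers
leading_zero = [s for s in sample_isbns if s.startswith("0")]
```

Even the purely numeric IDs would change if stored as integers, because the leading zeros would be dropped, so keeping `ISBN` as a string until it is encoded is the safer choice.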

### Book information
* Top 3 rows of the books information:

In [None]:
books.head(3)

* Brief information about different columns of the books dataframe:

In [None]:
books.info()

The same is true for this dataset: the `ISBN` column is stored as an `object` instead of an integer.

### Users information
* Top 5 rows of the users information:

In [None]:
users.head()

* Brief information about different columns of the users dataframe:

In [None]:
users.info()

### OBSERVATIONS:
- The `books` dataframe contains columns that are not needed for this analysis, e.g. `"Image-URL-S"`, `"Image-URL-M"`, and `"Image-URL-L"`.
- Some rows have wrong values in the `"Year-Of-Publication"` column of the `books` dataframe, which causes the column to be loaded as an `object` instead of an integer.
- The `ISBN` column (i.e., the book ID) is loaded as an `object` because some IDs contain non-numeric characters.
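One way to locate the rows behind the second observation is to coerce the year column to numbers and look at what fails. This is a minimal sketch on a made-up stand-in for the `books` dataframe; the toy values and variable names are assumptions, not data from the CSV.

```python
import pandas as pd

# Toy stand-in for the books dataframe; the values are made up
toy_books = pd.DataFrame({
    "ISBN": ["111", "222", "333"],
    "Year-Of-Publication": ["2001", "Some Publisher", "1999"],
})

# Non-numeric entries become NaN instead of raising an error
years = pd.to_numeric(toy_books["Year-Of-Publication"], errors="coerce")

# Rows whose year could not be parsed
bad_rows = toy_books[years.isna()]
```

Applied to the real dataframe, `bad_rows` would list the entries that force the column to be loaded as an `object`.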

## Data Cleaning
In this section, I will remove the unnecessary columns and the invalid rows from the dataframes. Some IDs contain non-numeric characters, which causes them to be stored as objects, so I will use `LabelEncoder` from scikit-learn to encode every ID in the dataset as an integer. Before that, I will remove from the ratings dataframe every `User-ID` and `ISBN` that does not appear in the users and books dataframes.

* Drop all redundant columns from the books dataframe:

In [None]:
# Drop unnecessary columns
books.drop(columns=['Image-URL-S', 'Image-URL-M', 'Image-URL-L'], inplace=True)

* Remove IDs with few ratings from the ratings dataframe:

In [None]:
# Remove rarely rated book IDs (fewer than 5 ratings)
rbook_counts = ratings['ISBN'].value_counts()
ratings = ratings[ratings['ISBN'].isin(rbook_counts[rbook_counts >= 5].index)]

# Remove low-activity user IDs (fewer than 10 ratings)
ruser_counts = ratings['User-ID'].value_counts()
ratings = ratings[ratings['User-ID'].isin(ruser_counts[ruser_counts >= 10].index)]
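The count-based filtering above can be traced on a toy example. The sketch below uses made-up ratings and much smaller thresholds (2 instead of 5 and 10) so the effect is visible; the names `toy_ratings`, `min_book_ratings`, and `min_user_ratings` are hypothetical.

```python
import pandas as pd

# Made-up ratings: book "c" is rated once, user 3 rates only book "c"
toy_ratings = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 2, 3],
    "ISBN": ["a", "a", "b", "a", "b", "c"],
})

min_book_ratings = 2  # hypothetical threshold; the notebook uses 5
min_user_ratings = 2  # hypothetical threshold; the notebook uses 10

# Keep only books that have enough ratings
book_counts = toy_ratings["ISBN"].value_counts()
kept_isbns = book_counts[book_counts >= min_book_ratings].index
filtered = toy_ratings[toy_ratings["ISBN"].isin(kept_isbns)]

# Then keep only users that still have enough ratings
user_counts = filtered["User-ID"].value_counts()
kept_users = user_counts[user_counts >= min_user_ratings].index
filtered = filtered[filtered["User-ID"].isin(kept_users)]
```

Because the user filter runs after the book filter, removing users can in principle push some books back below the book threshold. If that matters, one option is to repeat both steps until the number of rows stops changing.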

* Remove rows whose IDs are not in the books and users dataframes:

In [None]:
ratings = ratings[ratings['ISBN'].isin(books['ISBN'].values)]
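# .copy() gives an independent dataframe, so reassigning its columns later does not raise a SettingWithCopyWarning
ratings = ratings[ratings['User-ID'].isin(users['User-ID'].values)].copy()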
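The membership filter can be checked on a small example. This sketch uses three made-up ratings, one with an unknown book and one with an unknown user; all toy values and the `known_book` / `known_user` names are assumptions for illustration.

```python
import pandas as pd

# Made-up data: ISBN "zzz" and user 9 do not exist in the lookup tables
toy_ratings = pd.DataFrame({
    "User-ID": [1, 2, 9],
    "ISBN": ["a", "zzz", "b"],
    "Book-Rating": [5, 0, 8],
})
toy_books = pd.DataFrame({"ISBN": ["a", "b"]})
toy_users = pd.DataFrame({"User-ID": [1, 2]})

# Boolean masks marking ratings whose IDs are known
known_book = toy_ratings["ISBN"].isin(toy_books["ISBN"])
known_user = toy_ratings["User-ID"].isin(toy_users["User-ID"])

# Keep rows where both IDs are known; .copy() makes the result independent
clean = toy_ratings[known_book & known_user].copy()
```

Combining the two masks with `&` gives the same rows as applying the two filters one after the other.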

* Transform all IDs to their integer equivalents:

In [None]:
# Transform both the user and the book IDs
isbn_transformer = LabelEncoder().fit(books['ISBN'])
books['ISBN'] = isbn_transformer.transform(books['ISBN'])
ratings['ISBN'] = isbn_transformer.transform(ratings['ISBN'])

userid_transformer = LabelEncoder().fit(users['User-ID'])
users['User-ID'] = userid_transformer.transform(users['User-ID'])
ratings['User-ID'] = userid_transformer.transform(ratings['User-ID'])
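To make the encoding step concrete, here is a minimal sketch of what `LabelEncoder` does with string IDs, using a short list of ISBN-style strings chosen for illustration. The variable names are my own.

```python
from sklearn.preprocessing import LabelEncoder

# ISBN-style strings used only for illustration; one value repeats
toy_isbns = ["034545104X", "0195153448", "034545104X"]

# fit() learns the sorted set of distinct labels
encoder = LabelEncoder().fit(toy_isbns)

# transform() maps each label to its position in that sorted set
encoded = encoder.transform(toy_isbns)

# inverse_transform() recovers the original labels
decoded = encoder.inverse_transform(encoded)
```

`transform` raises a `ValueError` for labels it did not see during `fit`. That is why the ratings are restricted to IDs present in `books` and `users` before the encoders, which are fitted on those two dataframes, are applied to `ratings`. Keeping the fitted encoders around also makes it possible to map predictions back to the original ISBNs with `inverse_transform`.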

## Brief overview of dataframes after data cleaning

#### Ratings

In [None]:
ratings.head()

In [None]:
ratings.info()

#### Books

In [None]:
books.head()

In [None]:
books.info()

#### Users

In [None]:
users.head()

In [None]:
users.info()