# Introduction

In this project, we will build a web app that recommends books using a machine learning algorithm.

We will use the [Book Recommendation Dataset](https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset?datasetId=1004280) from Kaggle.

## Goals
By the end of this project, we will have a web app with the following features:
1. "Similar book" feature
1. "People also read" feature
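
As a rough illustration of what a "Similar book" feature could compute, the sketch below ranks books by the cosine similarity of their rating columns (an item-based approach, as in one of the Kaggle notebooks listed under Sources). The ratings matrix, the titles, and the helper `most_similar_books` are all hypothetical placeholders, not the method this project finally uses.

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are books (0 = not rated).
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)
book_titles = ["Book A", "Book B", "Book C", "Book D"]  # placeholder titles


def most_similar_books(ratings, book_index, top_n=2):
    """Return indices of the books whose rating columns are closest (cosine) to the given book."""
    item_vectors = ratings.T  # one row per book
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for books nobody rated
    unit_vectors = item_vectors / norms
    similarity = unit_vectors @ unit_vectors[book_index]
    similarity[book_index] = -np.inf  # exclude the query book itself
    return np.argsort(similarity)[::-1][:top_n].tolist()


similar_to_a = [book_titles[i] for i in most_similar_books(ratings, 0, top_n=1)]
print(similar_to_a)
```

In this toy data, users who rated "Book A" highly also rated "Book B" highly, so "Book B" comes out as the closest match.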

## Steps
This project goes through the following steps:
1. Preparing the GitHub repository
1. Data preparation
    1. Data Cleaning
    1. Exploratory Data Analysis (EDA)
    1. Feature engineering
    1. Feature selection
1. Creating the model
    1. Model training
    1. Model assessment
    1. Model serialization (to make the model reusable in the web app)
1. Setting up the web app
    1. Flask
    1. Streamlit
    1. Running and testing the web app (Postman)
1. Web app deployment
    1. Heroku
    1. Docker
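
To make the model serialization step more concrete, here is a minimal sketch of a save-and-load round trip using Python's standard `pickle` module. The `model` object is only a stand-in dictionary and the file name `book_recommender.pkl` is an assumption; the actual model and serialization format are decided later in the project.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model: any picklable Python object is handled the same way.
model = {"name": "placeholder_recommender", "n_neighbors": 5}

# Hypothetical file name, written to the system temp directory for this sketch.
model_path = Path(tempfile.gettempdir()) / "book_recommender.pkl"

# Training side: write the model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Web app side: load the model back without retraining.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model == model)
```

Only unpickle files that come from a trusted source, since loading a pickle can execute arbitrary code.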
    
## Tools

## Sources
1. [Krish Naik's End To End Machine Learning Project Implementation With Dockers,Github Actions And Deployment](https://www.youtube.com/watch?v=MJ1vWb1rGwM)
1. Kaggle
    1. [[Data]Book Recommendation Dataset](https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset?datasetId=1004280)
    1. [[Notebook]Book_Item-Based Collaborative Filtering](https://www.kaggle.com/code/sebnemgurek/book-item-based-collaborative-filtering)
    1. [[Notebook]Book Recommendation System](https://www.kaggle.com/code/fahadmehfoooz/book-recommendation-system)
1. [Additional: How to Build a Book Recommendation System](https://www.analyticsvidhya.com/blog/2021/06/build-book-recommendation-system-unsupervised-learning-project/)
1. Additional
    1. [Rpubs: PCA and Linear Regression](https://rpubs.com/esobolewska/pcr-step-by-step#:~:text=PCA%20in%20linear%20regression%20has,with%20Partial%20Least%20Squares%20Regression.)
    1. [Stat SE: PCA vs Linear Regression](https://stats.stackexchange.com/questions/410516/using-pca-vs-linear-regression)
    1. [Statology: PCRegression](https://www.statology.org/principal-components-regression-in-python/)

# What Can You Expect in This Notebook?

## Goals
1. Data preparation
1. EDA

## Steps
In this notebook we will do the following:
1. Data cleaning
1. EDA of each dataset
1. Feature engineering
1. Feature selection
    1. "Manual" feature selection
    1. PCA
    1. t-SNE(?)
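
For the PCA step listed above, a minimal sketch of the usual pattern is to standardize the features with z-scores and then project them onto the leading principal components. The data here is synthetic and the 90% explained-variance threshold is an arbitrary assumption; the real feature matrix and threshold depend on the cleaned dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic placeholder features: 100 rows, 5 numeric columns.
X = rng.normal(size=(100, 5))
# Make the second column strongly correlated with the first.
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)

# PCA is sensitive to scale, so standardize each column first.
X_scaled = StandardScaler().fit_transform(X)

# A float in (0, 1) asks PCA to keep enough components to explain that share of variance.
pca = PCA(n_components=0.90)
X_reduced = pca.fit_transform(X_scaled)

cumulative_variance = pca.explained_variance_ratio_.cumsum()
print(X_reduced.shape, cumulative_variance)
```

Inspecting `cumulative_variance` shows how much information each additional component retains, which helps when choosing how many components to keep.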

In [None]:
import logging
# Route warnings issued through the warnings module to the logging system
logging.captureWarnings(True)

# Import libraries (add further imports only if you need them)
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Visualization libraries
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import seaborn as sns

# Scale the data using z-scores (standardization)
from sklearn.preprocessing import StandardScaler

# PCA for dimensionality reduction
from sklearn.decomposition import PCA