<a href="https://colab.research.google.com/github/Amolrakhunde/Book-Recommender-System/blob/main/Book_Recommender_System_Amol.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Over the last few decades, with the rise of YouTube, Amazon, Netflix and many other web services, recommender systems have taken an ever larger place in our lives. From e-commerce (suggesting articles that could interest buyers) to online advertising (showing users content that matches their preferences), recommender systems are now an unavoidable part of our daily online journeys.
In general terms, recommender systems are algorithms that suggest relevant items to users (items being movies to watch, texts to read, products to buy, or anything else depending on the industry).

Recommender systems are critical in some industries: when they work well they can generate a large amount of income, and they can also be a way to stand out from competitors. As evidence of their importance, a few years ago Netflix organised a challenge (the “Netflix Prize”) whose goal was to produce a recommender system that performed better than its own algorithm, with a prize of 1 million dollars to win.

Using this simple dataset and the related tasks and notebooks, we will work step by step through different paradigms of recommender algorithms. For each of them, we will present how it works, describe its theoretical basis and discuss its strengths and weaknesses.

* **Content**

The Book-Crossing dataset comprises three files.

* **Users**

Contains the users. Note that user IDs (User-ID) have been anonymized and mapped to integers. Demographic data (Location, Age) is provided if available; otherwise, these fields contain NULL values.

* **Books**

Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (Book-Title, Book-Author, Year-Of-Publication, Publisher), obtained from Amazon Web Services. Note that in case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavours (Image-URL-S, Image-URL-M, Image-URL-L), i.e., small, medium, large. These URLs point to the Amazon web site.

* **Ratings**

Contains the book rating information. Ratings (Book-Rating) are either explicit, expressed on a scale from 1 to 10 (higher values denoting higher appreciation), or implicit, expressed by 0.
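
Because 0 marks an implicit interaction rather than a low score, it usually makes sense to separate the two kinds of rating before computing any averages. The following is a minimal sketch on a made-up toy frame (the rows and the ISBN placeholders are invented for illustration, only the column names come from the dataset description):

```python
import pandas as pd

# Toy ratings using the column names from the dataset description.
# The rows are made up for illustration, not taken from Ratings.csv.
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 3],
    "ISBN": ["A", "B", "A", "C"],
    "Book-Rating": [0, 5, 8, 0],
})

# 0 marks an implicit interaction; 1 to 10 is an explicit score.
is_explicit = ratings["Book-Rating"].between(1, 10)
explicit = ratings[is_explicit]
implicit = ratings[ratings["Book-Rating"] == 0]

print(len(explicit), len(implicit))
# Averaging only the explicit rows keeps the zeros from dragging the mean down.
print(explicit["Book-Rating"].mean())
```

Whether implicit rows should be dropped or kept as a separate signal depends on the recommender being built; this sketch only shows the split.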

In [2]:
# Importing required libraries
import pandas as pd
import numpy as np

In [30]:
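# Load each dataset into its own pandas DataFrame.
# on_bad_lines='skip' replaces error_bad_lines=False (deprecated in pandas 1.3, removed in 2.0).
# dtype=str reads every column as text, so ISBNs keep their leading zeros.
books_df=pd.read_csv('/content/drive/MyDrive/Practice/Book Recommender System/Book Data/Books.csv', sep=',', on_bad_lines='skip', index_col=False, dtype=str)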
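
The loader above asks pandas to skip malformed rows instead of raising an error. A minimal sketch of that behaviour on an in-memory CSV, assuming pandas 1.3 or later (where the option is spelled `on_bad_lines='skip'`); the CSV content here is made up:

```python
import io

import pandas as pd

# Made-up CSV: the third line has one field too many.
raw = "ISBN,title\n111,First\n222,Second,extra\n333,Third\n"

# on_bad_lines="skip" drops rows with too many fields instead of raising.
# dtype=str keeps every value as text.
df = pd.read_csv(io.StringIO(raw), dtype=str, on_bad_lines="skip")
print(df)
```

Skipped rows are dropped silently, so it can be worth comparing the resulting row count against the raw file when data loss matters.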


In [35]:
ratings_df=pd.read_csv('/content/drive/MyDrive/Practice/Book Recommender System/Book Data/Ratings.csv')


In [38]:
users_df=pd.read_csv('/content/drive/MyDrive/Practice/Book Recommender System/Book Data/Users.csv')


Looking at the first rows of each dataframe

In [41]:
books_df.head(2)

| | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L |
|---|---|---|---|---|---|---|---|---|
| 0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... |
| 1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... |


In [42]:
ratings_df.head(2)

| | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
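
The ISBN values in this preview are ISBN-10 strings, whose last character is a check character; that is why one of them ends in `X`. The dataset description says invalid ISBNs were removed from the books file but does not say how. The following is a minimal sketch of the standard ISBN-10 checksum, with a hypothetical helper name `is_valid_isbn10`, as one way such a check could be done:

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Return True if `isbn` passes the standard ISBN-10 checksum."""
    isbn = isbn.strip().upper()
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char in "0123456789":
            value = int(char)
        elif char == "X" and position == 9:
            value = 10  # X stands for 10 and is only allowed as the check character
        else:
            return False
        # Weights run from 10 (first character) down to 1 (check character).
        total += (10 - position) * value
    return total % 11 == 0


for candidate in ["034545104X", "0155061224", "0155061225"]:
    print(candidate, is_valid_isbn10(candidate))
```

This only checks the ISBN-10 form; 13-digit ISBNs use a different checksum and would need their own rule.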


In [43]:
users_df.head(2)

| | User-ID | Location | Age |
|---|---|---|---|
| 0 | 1 | nyc, new york, usa | NaN |
| 1 | 2 | stockton, california, usa | 18.0 |
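
User 1 in the preview above has no Age value, which matches the dataset description: demographic fields are empty when the information was not available. A minimal sketch of measuring how much of a column is missing, using a toy frame rebuilt from the two rows shown above:

```python
import numpy as np
import pandas as pd

# Toy frame rebuilt from the two preview rows above.
users = pd.DataFrame({
    "User-ID": [1, 2],
    "Location": ["nyc, new york, usa", "stockton, california, usa"],
    "Age": [np.nan, 18.0],
})

# isna() flags missing entries; the mean of the flags is the missing share.
missing_age = users["Age"].isna()
print(missing_age.sum())   # number of users without an age
print(missing_age.mean())  # fraction of users without an age
```

On the full users file the same two lines give the missing share for `Age`, which helps decide whether the column is usable as a feature.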


Checking the number of rows and columns in all three dataframes

In [48]:
books_df.shape

(271360, 8)

In [49]:
ratings_df.shape

(1149780, 3)

In [50]:
users_df.shape

(278858, 3)
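
These three shapes give a rough sense of how sparse the user-book rating matrix is. The sketch below is back-of-the-envelope arithmetic only: it assumes every row in the ratings file is a distinct user-book pair and that all rated users and books appear in the other two files, neither of which has been checked here.

```python
# Row counts taken from the shapes printed above.
n_books = 271360
n_ratings = 1149780
n_users = 278858

# Every user could in principle rate every book.
possible_pairs = n_users * n_books

# Share of possible user-book pairs that actually have a rating row.
density = n_ratings / possible_pairs
print(f"{density:.6%}")
```

Under those assumptions the density comes out at roughly 0.0015%, so almost every cell of the user-book matrix is empty.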

Renaming columns

In [45]:
books_df.rename(columns={'Book-Title':'title', 'Book-Author':'author', 'Year-Of-Publication':'year','Publisher':'publisher'}, inplace=True)
ratings_df.rename(columns={'User-ID':'user', 'Book-Rating':'ratings'}, inplace=True)
users_df.rename(columns={'User-ID':'user', 'Location':'location', 'Age':'age'}, inplace=True)

In [46]:
books_df.head(2)

| | ISBN | title | author | year | publisher | Image-URL-S | Image-URL-M | Image-URL-L |
|---|---|---|---|---|---|---|---|---|
| 0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... |
| 1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... |


In [52]:
# Keep only the required columns (the image URL columns are dropped)
books_df = books_df[['ISBN','title','author','year','publisher']]

In [53]:
books_df.head(2)

| | ISBN | title | author | year | publisher |
|---|---|---|---|---|---|
| 0 | 0195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press |
| 1 | 0002005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada |


In [54]:
ratings_df.head(2)

| | user | ISBN | ratings |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |


In [55]:
users_df.head(2)

| | user | location | age |
|---|---|---|---|
| 0 | 1 | nyc, new york, usa | NaN |
| 1 | 2 | stockton, california, usa | 18.0 |
