<a href="https://colab.research.google.com/github/akashbhor1356/Book_Recommendation_system/blob/main/new_Book_Recommendation_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

During the last few decades, with the rise of YouTube, Amazon, Netflix, and many other such
web services, recommender systems have become an ever larger part of our lives. From
e-commerce (suggesting to buyers articles that could interest them) to online advertising
(suggesting to users the right content, matching their preferences), recommender systems are
today unavoidable in our daily online journeys.
In a very general way, recommender systems are algorithms aimed at suggesting relevant
items to users (items being movies to watch, text to read, products to buy, or anything else
depending on the industry).
Recommender systems are critical in some industries: when they are efficient, they can generate
a large amount of revenue or give a company a way to stand out significantly from its
competitors. The main objective of this project is to build a book recommendation system for users.

Content
The Book-Crossing dataset comprises 3 files.

● Users
Contains the users. Note that user IDs (User-ID) have been anonymized and map to
integers. Demographic data is provided (Location, Age) if available. Otherwise, these fields contain NULL values.

● Books
Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (Book-Title, Book-Author, Year-Of-Publication, Publisher), obtained from Amazon Web Services. Note that in the case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavors (Image-URL-S, Image-URL-M, Image-URL-L), i.e., small, medium, large. These URLs point to the Amazon website.

● Ratings
Contains the book rating information. Ratings (Book-Rating) are either explicit,
expressed on a scale from 1-10 (higher values denoting higher appreciation), or implicit,
expressed by 0.
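Because explicit ratings (1 to 10) and implicit interactions (0) share the same Book-Rating column, a plain average mixes the two. A minimal sketch on hypothetical toy data (the rows are made up; only the column names follow the dataset) showing how to separate them with pandas:

```python
import pandas as pd

# Hypothetical toy ratings using the Book-Crossing column names.
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 3, 3],
    "ISBN": ["A", "B", "A", "C", "B"],
    "Book-Rating": [0, 8, 5, 0, 10],
})

# Explicit ratings use the 1-10 scale; implicit interactions are stored as 0.
explicit = ratings[ratings["Book-Rating"] > 0]
implicit = ratings[ratings["Book-Rating"] == 0]

print(len(explicit), len(implicit))          # 3 2
print(explicit["Book-Rating"].mean())        # mean of explicit ratings only
print(ratings["Book-Rating"].mean())         # mean over all rows, zeros included
```

Here the explicit mean is about 7.67, while averaging all five rows gives 4.6 because the two implicit zeros pull it down. The popularity table in section 2 averages over all rows, so its `avg_rating` values include such zeros.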

**Table of Contents**
1. Importing Required Libraries
2. Popularity Based Recommendation System
3. Collaborative Filtering
4. KNN Model
5. Collaborative Filtering Using Cosine Similarity
6. Conclusion





## **1. Importing Required Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

import scipy
import math
import random
import sklearn
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse.linalg import svds

In [None]:
# Mount Google Drive so the CSV files can be read
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Read the Book-Crossing data set (books, users, ratings).
# low_memory must be the boolean False; the string 'False' is truthy and has no effect.
books = pd.read_csv("/content/drive/MyDrive/capstone_project_4_book_recommendation_system/Books.csv", low_memory=False)
users = pd.read_csv("/content/drive/MyDrive/capstone_project_4_book_recommendation_system/Users.csv", low_memory=False)
ratings = pd.read_csv("/content/drive/MyDrive/capstone_project_4_book_recommendation_system/Ratings.csv", low_memory=False)


In [None]:
books.head()

In [None]:
books['Image-URL-M'][0]

In [None]:
users.head()

In [None]:
ratings.head()

In [None]:
print(books.shape)
print(users.shape)
print(ratings.shape)

The dataset is large: it contains 271,360 books, 278,858 registered users, and 1,149,780 ratings, which gives us plenty of data to build a recommender system.

In [None]:
books.isnull().sum()

In [None]:
users.isnull().sum()

The Age column contains missing values, but none of our models use Age, so we leave those nulls untreated.

In [None]:
ratings.isnull().sum()

In [None]:
books.duplicated().sum()

In [None]:
users.duplicated().sum()

In [None]:
ratings.duplicated().sum()

## **2. Popularity Based Recommender System**


In [None]:
rating_with_name = ratings.merge(books,on='ISBN')

In [None]:
num_rating_df = rating_with_name.groupby('Book-Title').count()['Book-Rating'].reset_index()
num_rating_df.rename(columns = {'Book-Rating':'num_rating'}, inplace=True)
num_rating_df

In [None]:
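# Select the rating column before averaging; calling .mean() on the whole group fails on text columns in recent pandas
avg_rating_df = rating_with_name.groupby('Book-Title')['Book-Rating'].mean().reset_index()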
avg_rating_df.rename(columns = {'Book-Rating':'avg_rating'}, inplace=True)
avg_rating_df

In [None]:
popular_df = num_rating_df.merge(avg_rating_df,on='Book-Title')
popular_df

In [None]:
popular_df=popular_df[popular_df['num_rating']>=250].sort_values('avg_rating',ascending=False).head(50)
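The cells above (count per title, mean per title, merge, then a minimum-count filter) can be condensed into one `groupby` with named aggregation. A sketch on hypothetical toy titles, with the threshold lowered from 250 to 2 so the effect is visible:

```python
import pandas as pd

# Hypothetical toy data: one row per rating of a title.
rating_with_name = pd.DataFrame({
    "Book-Title": ["X", "X", "X", "Y", "Y", "Z"],
    "Book-Rating": [8, 6, 10, 9, 9, 10],
})

# Count and average the ratings of each title in a single pass.
stats = (
    rating_with_name.groupby("Book-Title")["Book-Rating"]
    .agg(num_rating="count", avg_rating="mean")
    .reset_index()
)

MIN_RATINGS = 2  # illustrative; the notebook uses 250 on the full dataset

# Keep sufficiently rated titles, then rank them by average rating.
popular = (
    stats[stats["num_rating"] >= MIN_RATINGS]
    .sort_values("avg_rating", ascending=False)
)
print(popular)
```

Title Z has a perfect 10 but only one rating, so the threshold drops it; without a minimum count, books with a single high rating would crowd the top of the list.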

In [None]:
popular_df = popular_df.merge(books,on='Book-Title').drop_duplicates('Book-Title')[['Book-Title','Book-Author','Image-URL-M','num_rating','avg_rating']]

In [None]:
popular_df

In [None]:
popular_df['Image-URL-M'][0]

## **3. Collaborative Filtering Based Recommender System**

In [None]:
books.rename(columns = {'Book-Title':'title', 'Book-Author':'author', 'Year-Of-Publication':'year', 'Publisher':'publisher'}, inplace=True)
users.rename(columns = {'User-ID':'user_id', 'Location':'location', 'Age':'age'}, inplace=True)
ratings.rename(columns = {'User-ID':'user_id', 'Book-Rating':'rating'}, inplace=True)

In [None]:
books.columns


In [None]:
books=books[['ISBN', 'title', 'author', 'year', 'publisher','Image-URL-M']]

In [None]:
books.head()

In [None]:
# Merging Dataframes
# temp_df = books.merge(ratings, how='left', on='ISBN')
# final_df = temp_df.merge(users, how='left', on='user_id')
# final_df.shape

Number of ratings given by each user:

In [None]:
ratings['user_id'].value_counts()

In [None]:
ratings['user_id'].value_counts().shape

In [None]:
x = ratings['user_id'].value_counts()>200

In [None]:
y =x[x].index

In [None]:
y

In [None]:
ratings = ratings[ratings['user_id'].isin(y)]
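The user filter above counts ratings per user, builds a boolean Series, and uses `x[x].index` to keep only the IDs where the condition is `True`. A toy sketch of the same pattern (hypothetical data, threshold lowered from 200 to 2); note that the comparison is strict, so a user with exactly the threshold count is dropped:

```python
import pandas as pd

# Hypothetical toy ratings after the column renames.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3],
    "ISBN": ["A", "B", "C", "A", "B", "C"],
    "rating": [5, 0, 7, 8, 9, 0],
})

THRESHOLD = 2  # illustrative; the notebook keeps users with more than 200 ratings

counts = ratings["user_id"].value_counts()   # number of ratings per user
is_active = counts > THRESHOLD               # boolean Series indexed by user_id
active_users = is_active[is_active].index    # same idiom as y = x[x].index

filtered = ratings[ratings["user_id"].isin(active_users)]
print(filtered)
```

User 1 (3 ratings) is kept, while user 3 (exactly 2 ratings) and user 2 (1 rating) are removed.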

In [None]:
ratings.shape

In [None]:
books.head()

In [None]:
rating_with_books = ratings.merge( books, on = 'ISBN')

In [None]:
rating_with_books.shape

In [None]:
number_rating = rating_with_books.groupby('title')['rating'].count().reset_index()

In [None]:
number_rating.rename(columns={'rating':'number_of_rating'},inplace=True)

In [None]:
number_rating.head()

In [None]:
final_rating = rating_with_books.merge(number_rating,on = 'title')

In [None]:
final_rating.head()

In [None]:
final_rating.shape

Keep only the books that have received at least 50 ratings from the remaining users.

In [None]:
final_rating = final_rating[final_rating['number_of_rating'] >= 50]

In [None]:
final_rating.shape

In [None]:
final_rating = final_rating.drop_duplicates(['user_id', 'title'])

In [None]:
final_rating.shape
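Instead of building `number_rating` and merging it back, `groupby(...).transform('count')` attaches each title's rating count directly to its rows. A sketch of this alternative on hypothetical toy data, with the threshold lowered from 50 to 2:

```python
import pandas as pd

# Hypothetical toy data: ratings already joined with book titles.
rating_with_books = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2, 1],
    "title": ["P", "P", "P", "Q", "Q", "R"],
    "rating": [7, 0, 9, 8, 6, 10],
})

MIN_BOOK_RATINGS = 2  # illustrative; the notebook uses 50

# transform('count') broadcasts each title's count back onto every row of that title,
# producing the same column the notebook builds with groupby + merge.
rating_with_books["number_of_rating"] = (
    rating_with_books.groupby("title")["rating"].transform("count")
)

final_rating = rating_with_books[rating_with_books["number_of_rating"] >= MIN_BOOK_RATINGS]
print(final_rating)
```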

In [None]:
plt.rc("font", size=15)
# Distribution of rating values (0 = implicit, 1-10 = explicit) among the retained ratings
final_rating['rating'].value_counts().sort_index().plot(kind='bar')
plt.title('Rating Distribution')
plt.xlabel("Rating")
plt.ylabel("Count")
plt.savefig("system1.png", bbox_inches='tight')
plt.show()

In [None]:
# from google.colab import files
# final_rating.to_csv('final_rating.csv') 
# files.download('final_rating.csv')

Create a pivot table of every user's rating for every book (books as rows, users as columns).

In [None]:
book_pivot = final_rating.pivot_table(columns='user_id',index = 'title',values = 'rating')
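To see what the pivot produces, here is a toy version with hypothetical users and titles: rows are books, columns are users, and each cell holds that user's rating. Pairs with no rating come out as NaN, which the notebook then fills with 0:

```python
import pandas as pd

# Hypothetical toy data in the shape of final_rating.
final_rating = pd.DataFrame({
    "user_id": [10, 20, 10, 30],
    "title": ["P", "P", "Q", "Q"],
    "rating": [7, 9, 8, 0],
})

# Rows = titles, columns = users; missing (title, user) pairs become NaN.
book_pivot = final_rating.pivot_table(columns="user_id", index="title", values="rating")
book_pivot = book_pivot.fillna(0)
print(book_pivot)
```

One consequence worth keeping in mind: user 30 gave Q an implicit rating of 0, which after `fillna(0)` looks exactly like user 30 never having interacted with P.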

In [None]:
book_pivot

In [None]:
book_pivot.shape

After filtering, 742 books and 888 users remain. Most users have not rated most books, so the pivot table is full of NaN values; we fill them with 0.

In [None]:
book_pivot.fillna(0,inplace=True)

In [None]:
book_pivot

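Because the pivot table is mostly zeros, storing it as a dense matrix wastes memory and makes distance computations slow. We therefore convert it to a compressed sparse row (CSR) matrix, which stores only the non-zero ratings, to speed up the distance calculations of the k-nearest-neighbors algorithm.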

In [None]:
from scipy.sparse import csr_matrix

In [None]:
book_sparse = csr_matrix(book_pivot)

In [None]:
type(book_sparse)

In [None]:
book_sparse
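A small illustration of why CSR helps: only the non-zero entries are stored, spread over three arrays (`data`, `indices`, `indptr`). The matrix below is a hypothetical 3x4 example:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical dense book-by-user matrix: 12 cells, only 2 non-zero.
dense = np.array([
    [0, 0, 7, 0],
    [8, 0, 0, 0],
    [0, 0, 0, 0],
])
sparse = csr_matrix(dense)

print(sparse.nnz)      # 2 stored values instead of 12 cells
print(sparse.data)     # [7 8]    the non-zero values, row by row
print(sparse.indices)  # [2 0]    column of each stored value
print(sparse.indptr)   # [0 1 2 2] row i's values are data[indptr[i]:indptr[i+1]]
```

An empty row (the third one) costs nothing beyond one extra `indptr` entry, which is what makes CSR attractive for a rating matrix where most cells are 0.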

## **4. K-Nearest Neighbors**

In [None]:
from sklearn.neighbors import NearestNeighbors

In [None]:
model = NearestNeighbors(algorithm = 'brute')

In [None]:
model.fit(book_sparse)
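`NearestNeighbors(algorithm='brute')` with the default metric (Minkowski with p=2, i.e. Euclidean distance) compares the query row with every row of the matrix. A NumPy sketch of that brute-force search, using a hypothetical helper `brute_force_knn` and a toy matrix, shows why the first suggestion is always the query book itself:

```python
import numpy as np


def brute_force_knn(X, query, k):
    """Return (distances, indices) of the k rows of X closest to query (Euclidean)."""
    dists = np.sqrt(((X - query) ** 2).sum(axis=1))  # distance to every row
    order = np.argsort(dists, kind="stable")[:k]     # k smallest, ties by row order
    return dists[order], order


# Hypothetical toy book-by-user matrix (rows = books, columns = users).
X = np.array([
    [5.0, 0.0, 3.0],
    [4.0, 0.0, 3.0],
    [0.0, 9.0, 0.0],
    [5.0, 1.0, 3.0],
])

distances, suggestions = brute_force_knn(X, X[0], k=3)
print(suggestions)  # [0 1 3]: the query book comes first at distance 0
print(distances)    # [0. 1. 1.]
```

This is why the recommendation function skips the first returned index. Ties (rows 1 and 3 here) are broken by row order in this sketch; scikit-learn may order tied neighbors differently.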

In [None]:
distances,suggestions = model.kneighbors(book_pivot.iloc[54, :].values.reshape(1,-1),n_neighbors = 6)

In [None]:
distances

In [None]:
suggestions

In [None]:
for i in range(len(suggestions)):
  print(book_pivot.index[suggestions[i]])

In [None]:
np.where(book_pivot.index=='Message in a Bottle')[0][0]

In [None]:
def recommend_book(book_name):
  # Row position of the query book in the pivot table
  book_id = np.where(book_pivot.index == book_name)[0][0]
  distances, suggestions = model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1), n_neighbors=6)
  # The first neighbor is the query book itself (distance 0), so skip it and print the other 5
  print(book_pivot.index[suggestions[0][1:]])

In [None]:
recommend_book('Message in a Bottle')

## **5. Cosine Similarity**

In [None]:
from sklearn.neighbors import NearestNeighbors


model_knn = NearestNeighbors(metric = 'cosine', algorithm = 'brute')
model_knn.fit(book_sparse)
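With `metric='cosine'`, the distance is 1 minus the cosine similarity of the two rating vectors, so it compares the direction of the vectors rather than their magnitude. A toy sketch (hypothetical vectors; each entry is one user's rating of the book, and `cosine_distance` is a hypothetical helper) where the two metrics disagree:

```python
import numpy as np


def cosine_distance(u, v):
    """1 - cosine similarity between two rating vectors."""
    return 1.0 - u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Hypothetical rating vectors of three books across the same three users.
a = np.array([2.0, 0.0, 4.0])
b = np.array([5.0, 0.0, 10.0])  # same users, proportionally higher ratings
c = np.array([2.0, 4.0, 0.0])   # rated by a different pair of users

print(np.linalg.norm(a - b), np.linalg.norm(a - c))  # Euclidean: c looks closer
print(cosine_distance(a, b), cosine_distance(a, c))  # cosine: b looks closer
```

Euclidean distance ranks c closer to a (about 5.66 vs 6.71), while cosine distance ranks b closer (about 0 vs 0.8), because b's ratings are a scaled-up copy of a's.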

In [None]:
def recommend_books(book_name):
  # Row position of the query book in the pivot table
  query_index = np.where(book_pivot.index == book_name)[0][0]
  distances, indices = model_knn.kneighbors(book_pivot.iloc[query_index, :].values.reshape(1, -1), n_neighbors=6)
  distances, indices = distances.flatten(), indices.flatten()
  print('Recommendations for {0}:\n'.format(book_pivot.index[query_index]))
  # Position 0 is the query book itself (cosine distance 0), so start at 1
  for i in range(1, len(distances)):
    print('{0}: {1}, with distance of {2}'.format(i, book_pivot.index[indices[i]], distances[i]))


In [None]:
recommend_books('Naked')

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
similarity_scores = cosine_similarity(book_pivot)

In [None]:
similarity_scores.shape

In [None]:
def recommend(book_name):
    # index fetch
    index = np.where(book_pivot.index==book_name)[0][0]
    similar_items = sorted(list(enumerate(similarity_scores[index])),key=lambda x:x[1],reverse=True)[1:6]
    
    data = []
    for i in similar_items:
        item = []
        temp_df = final_rating[final_rating['title'] == book_pivot.index[i[0]]]
        item.extend(list(temp_df.drop_duplicates('title')['title'].values))
        item.extend(list(temp_df.drop_duplicates('title')['author'].values))
        item.extend(list(temp_df.drop_duplicates('title')['Image-URL-M'].values))
        
        data.append(item)
    
    return data
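`recommend` sorts the whole similarity row and slices `[1:6]`, which assumes the query book lands at position 0. If another book has an identical rating profile (similarity 1.0), a tie could put that book first and the query itself would leak into the results. A sketch that masks the query explicitly, using a hypothetical helper `top_k_similar` and a toy similarity matrix:

```python
import numpy as np


def top_k_similar(similarity, index, k):
    """Indices of the k items most similar to `index`, excluding the item itself."""
    scores = similarity[index].astype(float)  # copy, so the matrix is untouched
    scores[index] = -np.inf                   # never recommend the query book
    return np.argsort(-scores, kind="stable")[:k]


# Hypothetical symmetric cosine-similarity matrix for 4 books.
S = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

print(top_k_similar(S, 0, 2))  # [2 3]
```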

In [None]:
recommend('1984')

In [None]:
import pickle
# Save the popularity table; the with block closes the file once it is written
with open('popular.pkl', 'wb') as f:
    pickle.dump(popular_df, f)

In [None]:
# from google.colab import files
# files.download('popular.pkl')

In [None]:
for name, obj in [('book_pivot', book_pivot), ('final_rating', final_rating), ('similarity_scores', similarity_scores)]:
    with open(f'{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)

In [None]:
# from google.colab import files
# files.download('book_pivot.pkl')
# files.download('final_rating.pkl')
# files.download('similarity_scores.pkl')
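A minimal sketch of the save/load round trip with `with` blocks, so each file is flushed and closed as soon as it is written. The reloading half is an assumption about how a separate app would consume these files; the notebook itself only writes them. The DataFrame and temporary path are hypothetical. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for popular_df.
popular_df = pd.DataFrame({"Book-Title": ["Y", "X"], "avg_rating": [9.0, 8.0]})

path = os.path.join(tempfile.mkdtemp(), "popular.pkl")

with open(path, "wb") as f:  # write the artifact
    pickle.dump(popular_df, f)

with open(path, "rb") as f:  # reload it, as a consuming app might
    loaded = pickle.load(f)

print(loaded)
```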


In [None]:
# pickle.dump(recommend_book.to_dict(),open('books_dict.pkl','wb'))

## **6. Conclusion**

This project provides an introduction to recommender systems. In the context of ever-increasing amounts of available information and data, it is difficult to know what information to look for and where to look for it. Computer-based techniques have been developed to facilitate the search and retrieval process; one of these techniques is recommendation, which guides users in their exploration of available information by seeking and highlighting the most relevant information.

Recommender systems have their origins in a variety of areas of research, including information retrieval, information filtering, text classification, etc. They use techniques such as machine learning and data mining, alongside a range of concepts including algorithms, collaborative and hybrid approaches, and evaluation methods.

We then applied two of the most widespread approaches used to produce recommendations for users: popularity-based filtering and collaborative filtering.
We performed EDA, checked for null values and duplicates, created new columns (number of ratings and average rating per book), built a popularity-based recommender, and built item-based collaborative filtering models using k-nearest neighbors (with Euclidean and cosine distance) and a cosine-similarity matrix.
The popularity-based model shows the top 50 books (among titles with at least 250 ratings, ranked by average rating), and the similarity-based models recommend the 5 books most similar to a given title.
