# Book recommender system

In this project I have made a recommender system based on cosine similarity of book ratings.

The data for this project has been taken from [Here](https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset)

### Importing necessary libraries

In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

### Loading the data

In [2]:
! unzip archive.zip # Unzipping the files necessary for the book recommender system

Archive:  archive.zip
  inflating: Books.csv               
  inflating: DeepRec.png             
  inflating: Ratings.csv             
  inflating: Users.csv               
  inflating: classicRec.png          
  inflating: recsys_taxonomy2.png    


In [3]:
# Loading the data
books = pd.read_csv('Books.csv')
ratings = pd.read_csv('Ratings.csv')

# Merging the 2 tables on the "ISBN" column
data = pd.merge(ratings, books, on='ISBN')


  books = pd.read_csv('Books.csv')


### Exploratory Data Analysis

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1031136 entries, 0 to 1031135
Data columns (total 10 columns):
 #   Column               Non-Null Count    Dtype 
---  ------               --------------    ----- 
 0   User-ID              1031136 non-null  int64 
 1   ISBN                 1031136 non-null  object
 2   Book-Rating          1031136 non-null  int64 
 3   Book-Title           1031136 non-null  object
 4   Book-Author          1031134 non-null  object
 5   Year-Of-Publication  1031136 non-null  object
 6   Publisher            1031134 non-null  object
 7   Image-URL-S          1031136 non-null  object
 8   Image-URL-M          1031136 non-null  object
 9   Image-URL-L          1031132 non-null  object
dtypes: int64(2), object(8)
memory usage: 78.7+ MB


I am interested only in the User-ID, Book-Rating, Book-Title, Book-Author columns and they have correct type and don't have any missing values so I will proceed to further processing of the data

### Data preprocessing

In [5]:
rating_counts = ratings['ISBN'].value_counts()
books_with_enough_ratings = rating_counts[rating_counts >= 50].index

# Filtering the ratings dataframe to only include books with more than or equal to 50 ratings
filtered_ratings = ratings[ratings['ISBN'].isin(books_with_enough_ratings)]

# Merging the filtered ratings with the books information
filtered_data = pd.merge(filtered_ratings, books, on='ISBN')



In [6]:
# Creating a matrix of users and their ratings of particular books
book_user_matrix = filtered_data.pivot_table(index='User-ID', columns='ISBN', values='Book-Rating').fillna(0).T
book_user_matrix.head()

User-ID,9,14,16,17,26,32,39,42,44,51,...,278813,278819,278828,278832,278836,278843,278844,278846,278851,278854
ISBN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
000649840X,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0007110928,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
002026478X,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0020442203,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
002542730X,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [7]:
# Assigning the book titles instead of ISBN as the index to improve clarity
book_user_matrix.index = book_user_matrix.join(books.set_index('ISBN'))['Book-Title']
book_user_matix = book_user_matrix.sort_index()
book_user_matrix.head(15)


User-ID,9,14,16,17,26,32,39,42,44,51,...,278813,278819,278828,278832,278836,278843,278844,278846,278851,278854
Book-Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Angelas Ashes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Billy,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AGE OF INNOCENCE (MOVIE TIE-IN),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Lion, the Witch and the Wardrobe",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Politically Correct Bedtime Stories: Modern Tales for Our Life and Times,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Once upon a More Enlightened Time: More Politically Correct Bedtime Stories,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
The Death of Vishnu: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Angels,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Pagan Babies,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Dr. Atkins' New Diet Revolution,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
# Calculating the cosine to determine how similar books are to each other
book_similarity_matrix = cosine_similarity(book_user_matrix)
book_similarity_df = pd.DataFrame(book_similarity_matrix, index=book_user_matrix.index, columns=book_user_matrix.index)

### Creating the model

In [9]:
def recommend_similar_books(book_title):

  if book_title in book_user_matrix.index:
    # Finding the index of the specific book
    book_index = np.where(book_user_matrix.index == book_title)[0][0]

    # Calculating 'similar_items' by sorting the enumerated list of similarity scores
    similar_items = sorted(list(enumerate(book_similarity_matrix[book_index])), key=lambda x: x[1], reverse=True)[1:6]

    # Creating an empty list to store information about books
    data = []

    # Extracting information (book title and author) about similar books and appending it to the list
    for idx, score in similar_items:
        temp_df = books[books['Book-Title'] == book_user_matrix.index[idx]]
        item = [temp_df['Book-Title'].values[0], temp_df['Book-Author'].values[0]]
        data.append(item)

    return data

In [10]:
# 5 recommendations based on the book title "The Death of Vishnu: A Novel"
recommended_books = recommend_similar_books("The Death of Vishnu: A Novel")
print("Books recommended:\n", recommended_books)


Books recommended:
 [['Conversations with God : An Uncommon Dialogue (Book 1)', 'Neale Donald Walsch'], ['Blood and Gold (Rice, Anne, Vampire Chronicles.)', 'Anne Rice'], ['The Lost Continent: Travels in Small-Town America', 'Bill Bryson'], ["Hanna's Daughters (Ballantine Reader's Circle)", 'Marianne Fredriksson'], ['Toujours Provence (Vintage Departures)', 'Peter Mayle']]
