# Title: "Unveiling the Literary Universe: A Journey through Books, Users, and Ratings"

# Introduction:
Step into the captivating world of literature with this exploration of a dataset that intertwines books, users, and ratings. In this enthralling journey, we delve into the intricacies of a dataset comprising the magic of written words, the diversity of readers, and the nuanced art of rating. By examining this dataset, we aim to uncover patterns, preferences, and insights that illuminate the rich tapestry of the literary universe.

# Importance in today's world:
In today's digital age, analyzing datasets on books, users, and ratings is crucial. This exploration enables personalized reading experiences, fosters community engagement, and aids in content curation. It also provides valuable insights for publishers, authors, and educators, shaping market dynamics, technological advancements, and cultural impact. This dataset is not just information; it's a key to understanding the evolving dynamics between readers, technology, and the literary world.

In [1]:
# Libraries for analysis
import pandas as pd 
import numpy as np 

## Loading and Merging the datasets

In [2]:
# Importing the dataset
books = pd.read_csv('Books.csv')
users = pd.read_csv('Users.csv')
rating = pd.read_csv('Ratings.csv')


In [3]:
# First 5 rows of dataset
books.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [4]:
users.head()

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [5]:
rating.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [6]:
print('books' , books.shape)
print('users' , users.shape)
print('Ratings' , rating.shape)

books (271360, 8)
users (278858, 3)
Ratings (1149780, 3)


## Basic analysis 

In [7]:
# Checking for missing values
books.isnull().sum()

ISBN                   0
Book-Title             0
Book-Author            1
Year-Of-Publication    0
Publisher              2
Image-URL-S            0
Image-URL-M            0
Image-URL-L            3
dtype: int64

In [8]:
users.isnull().sum()

User-ID          0
Location         0
Age         110762
dtype: int64

In [9]:
rating.isnull().sum()

User-ID        0
ISBN           0
Book-Rating    0
dtype: int64

Since we are going to work with Book-Rating, which has no missing values, we don't need to worry about null values in the other columns.

In [10]:
# Checking for duplicate rows
books.duplicated().any()

False

In [11]:
users.duplicated().any()

False

In [12]:
rating.duplicated().any()

False

## Popularity-based recommendation system

## Our formula: a book must pass two conditions
Our popularity-based recommendation system uses a simple rule with two conditions. First, the book must have received at least 250 ratings, so that its average is not driven by a handful of readers. Second, among the books that pass this threshold, it must rank in the top 50 by average rating. Together these criteria favor books that are both widely rated and highly rated.
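
As a rough illustration of the two conditions, here is a minimal sketch on a tiny invented table. The data, the helper name `top_popular`, and the small thresholds are assumptions made for the example; only the column names follow the notebook.

```python
import pandas as pd

# Toy ratings, invented for illustration only (not taken from the dataset)
toy = pd.DataFrame({
    'Book-Title': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'Book-Rating': [8, 6, 10, 9, 9, 9, 10],
})

def top_popular(df, min_ratings, top_n):
    # Hypothetical helper: count and average the ratings per title in one pass
    stats = (df.groupby('Book-Title')['Book-Rating']
               .agg(Num_of_rating='count', avg_rating='mean')
               .reset_index())
    # Keep titles with enough ratings, then take the highest averages among them
    enough = stats[stats['Num_of_rating'] >= min_ratings]
    return enough.nlargest(top_n, 'avg_rating')

toy_popular = top_popular(toy, min_ratings=3, top_n=2)
print(toy_popular)
```

Book C has the highest average but only one rating, so the count threshold removes it before the ranking step.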

In [13]:
# merging rating and books data 
books_with_rating = rating.merge(books , on = 'ISBN' )

In [14]:
books_with_rating.head()

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
1,2313,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
2,6543,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
3,8680,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
4,10314,034545104X,9,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...


In [15]:
# Counting the number of ratings per title, needed for the two conditions above.
# rename_axis/reset_index(name=...) gives the same column names regardless of pandas version.
no_of_rating = books_with_rating['Book-Title'].value_counts().rename_axis('Book-Title').reset_index(name='Num_of_rating')

In [16]:
# Grouping the 'books_with_rating' DataFrame by 'Book-Title' and counting the occurrences of each book's ratings.
# Sorting the results in descending order to identify the most frequently rated books.
books_with_rating.groupby('Book-Title').count()['Book-Rating'].sort_values(ascending=False)

Book-Title
Wild Animus                                                                           2502
The Lovely Bones: A Novel                                                             1295
The Da Vinci Code                                                                      898
A Painted House                                                                        838
The Nanny Diaries: A Novel                                                             828
                                                                                      ... 
Real Love: The Truth About Finding Unconditional Love and Fulfilling Relationships       1
Real Love: The Drawings for Sean                                                         1
Real Love or Fake (Camfield Novel of Love, No 78)                                        1
Fabulous Food for Family and Friends: Healthy Menus for Entertaining With Style          1
Suburban backlash: The battle for the world's most liveable city               

In [17]:
# Checking the first 5 rows
no_of_rating.head()

Unnamed: 0,Book-Title,Num_of_rating
0,Wild Animus,2502
1,The Lovely Bones: A Novel,1295
2,The Da Vinci Code,898
3,A Painted House,838
4,The Nanny Diaries: A Novel,828


In [18]:
# Calculating the average rating for each book by grouping the 'books_with_rating' DataFrame by 'Book-Title'.
# Resetting the index to create a new DataFrame 'avg_rating'.
# Renaming the 'Book-Rating' column to 'avg_rating' for clarity.
avg_rating = books_with_rating.groupby('Book-Title')['Book-Rating'].mean().reset_index()
avg_rating.rename(columns={ 'Book-Rating' : 'avg_rating'}, inplace=True)


In [19]:
# Checking the first 5 rows
avg_rating.head()

Unnamed: 0,Book-Title,avg_rating
0,A Light in the Storm: The Civil War Diary of ...,2.25
1,Always Have Popsicles,0.0
2,Apple Magic (The Collector's series),0.0
3,"Ask Lily (Young Women of Faith: Lily Series, ...",8.0
4,Beyond IBM: Leadership Marketing and Finance ...,0.0


In [20]:
# Merging the 'no_of_rating' and 'avg_rating' DataFrames on the 'Book-Title' column to create 'popular_df'.
# This DataFrame provides a consolidated view of the number of ratings and average rating for each book.
popular_df = no_of_rating.merge(avg_rating , on = 'Book-Title')


In [21]:
# Filtering 'popular_df' to include only books with a minimum of 250 ratings.
# Selecting the top 50 books based on average rating
popular_df = popular_df[popular_df['Num_of_rating'] >= 250].nlargest(50 , 'avg_rating') 

In [22]:
# Merging 'popular_df' with the 'books' DataFrame on the 'Book-Title' column.
# Dropping duplicates based on 'Book-Title' to ensure unique book entries.
# Selecting specific columns for a concise view: 'Book-Title', 'Book-Author', 'Image-URL-M', 'Num_of_rating', 'avg_rating'.
# The result is stored in 'popular_books' so that it is not discarded; 'popular_df' itself is left unchanged.
popular_books = popular_df.merge(books, on='Book-Title').drop_duplicates('Book-Title')[
    ['Book-Title', 'Book-Author', 'Image-URL-M', 'Num_of_rating', 'avg_rating']]
popular_df

Unnamed: 0,Book-Title,Num_of_rating,avg_rating
59,Harry Potter and the Prisoner of Azkaban (Book 3),428,5.852804
69,Harry Potter and the Goblet of Fire (Book 4),387,5.824289
141,Harry Potter and the Sorcerer's Stone (Book 1),278,5.73741
87,Harry Potter and the Order of the Phoenix (Boo...,347,5.501441
23,Harry Potter and the Chamber of Secrets (Book 2),556,5.183453
138,The Hobbit : The Enchanting Prelude to The Lor...,281,5.007117
78,The Fellowship of the Ring (The Lord of the Ri...,368,4.94837
18,Harry Potter and the Sorcerer's Stone (Harry P...,575,4.895652
170,"The Two Towers (The Lord of the Rings, Part 2)",260,4.880769
31,To Kill a Mockingbird,510,4.7


## Collaborative recommendation system


## Blueprint for Collaborative Recommendation System:

Pivot table creation:

Building a table with books as rows and users as columns from the rating data. Each cell holds a user's rating for a particular book.

User filtering:

Considering only users who have provided more than 200 ratings. This keeps the focus on active, engaged users, whose rating histories give the collaborative filtering step more overlap to work with.

Book filtering:

Selecting books with 50 or more ratings from those users. This keeps the focus on books that enough of the active users have rated for the similarity scores to be meaningful.
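
The blueprint above can be sketched end to end on a small invented table. The data, the helper name `build_table`, and the tiny thresholds are assumptions for illustration; only the column names follow the notebook.

```python
import pandas as pd

# Toy interactions, invented for illustration only
toy = pd.DataFrame({
    'User-ID':     [1, 1, 1, 2, 2, 3],
    'Book-Title':  ['A', 'B', 'C', 'A', 'B', 'A'],
    'Book-Rating': [5, 7, 9, 6, 8, 10],
})

def build_table(df, min_user_ratings, min_book_ratings):
    # Hypothetical helper: keep users with more than `min_user_ratings` ratings
    user_counts = df['User-ID'].value_counts()
    active = user_counts[user_counts > min_user_ratings].index
    df = df[df['User-ID'].isin(active)]
    # Among those users, keep books with at least `min_book_ratings` ratings
    book_counts = df['Book-Title'].value_counts()
    kept = book_counts[book_counts >= min_book_ratings].index
    df = df[df['Book-Title'].isin(kept)]
    # Books as rows, users as columns; unrated cells become 0
    return df.pivot_table(values='Book-Rating', index='Book-Title',
                          columns='User-ID').fillna(0)

toy_table = build_table(toy, min_user_ratings=1, min_book_ratings=2)
print(toy_table)
```

Note that the book filter is applied after the user filter, so a book's count only includes ratings from the users that were kept.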

In [23]:
# Calculating the count of ratings provided by each user in the 'books_with_rating' DataFrame.
# Creating a boolean series 'x' indicating whether the count is greater than 200.
# Extracting the indices (User-IDs) of users who meet the criteria.
x = books_with_rating['User-ID'].value_counts() > 200
p_user = x[x].index

In [24]:
# Filtering the 'books_with_rating' DataFrame to include only rows where the 'User-ID' is in the 'p_user' index.
frating_with_user = books_with_rating[books_with_rating['User-ID'].isin(p_user)]

In [25]:
# Calculating the count of ratings for each book in the 'frating_with_user' DataFrame.
# Creating a boolean series 'y' indicating whether the count is greater than or equal to 50.
# Extracting the indices (Book-Titles) of books that meet the criteria.
y = frating_with_user['Book-Title'].value_counts() >= 50 
f_books = y[y].index

In [26]:
# Filtering the 'frating_with_user' DataFrame to include only rows where the 'Book-Title' is in the 'f_books' index.
final_rating = frating_with_user[frating_with_user['Book-Title'].isin(f_books)]

In [27]:
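# drop_duplicates() returns a new DataFrame, so the result is assigned back before displaying it
final_rating = final_rating.drop_duplicates()
final_rating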

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
63,278418,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
65,3363,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
66,7158,0446520802,10,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
69,11676,0446520802,10,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
74,23768,0446520802,6,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
...,...,...,...,...,...,...,...,...,...,...
1026724,266865,0531001725,10,The Catcher in the Rye,Jerome David Salinger,1973,Scholastic Library Pub,http://images.amazon.com/images/P/0531001725.0...,http://images.amazon.com/images/P/0531001725.0...,http://images.amazon.com/images/P/0531001725.0...
1027923,269566,0670809381,0,Echoes,Maeve Binchy,1986,Penguin USA,http://images.amazon.com/images/P/0670809381.0...,http://images.amazon.com/images/P/0670809381.0...,http://images.amazon.com/images/P/0670809381.0...
1028777,271284,0440910927,0,The Rainmaker,John Grisham,1995,Island,http://images.amazon.com/images/P/0440910927.0...,http://images.amazon.com/images/P/0440910927.0...,http://images.amazon.com/images/P/0440910927.0...
1029070,271705,B0001PIOX4,0,Fahrenheit 451,Ray Bradbury,1993,Simon &amp; Schuster,http://images.amazon.com/images/P/B0001PIOX4.0...,http://images.amazon.com/images/P/B0001PIOX4.0...,http://images.amazon.com/images/P/B0001PIOX4.0...


In [28]:
final_table = final_rating.pivot_table(values='Book-Rating' , index='Book-Title' , columns='User-ID' )

In [29]:
final_table.fillna(0 , inplace=True)

In [30]:
# importing library 
from sklearn.metrics.pairwise import cosine_similarity

In [31]:
similar_score = cosine_similarity(final_table)

In [32]:
similar_score.shape

(706, 706)
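
For intuition about what the similarity matrix holds, here is a small NumPy sketch of cosine similarity between rows: the dot product of two rows divided by the product of their lengths. The three rows are made up, `cosine_matrix` is a hypothetical helper rather than the scikit-learn implementation, and the sketch assumes no row is all zeros.

```python
import numpy as np

# Three made-up book rows (columns = users), not taken from the dataset
toy_rows = np.array([
    [5.0, 0.0, 3.0],
    [10.0, 0.0, 6.0],   # same direction as row 0, just scaled
    [0.0, 4.0, 0.0],    # no rater in common with row 0
])

def cosine_matrix(m):
    # Normalise each row to unit length, then take all pairwise dot products
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / norms
    return unit @ unit.T

toy_sim = cosine_matrix(toy_rows)
print(np.round(toy_sim, 3))
```

Rows that point in the same direction score 1 regardless of scale, and rows with no users in common score 0. The result is square, with one row and one column per book, which is why the notebook's matrix has the shape shown above.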

In [33]:
def recommend(book_name):
    # Find the row position of the specified book in the 'final_table' index
    index = np.where(final_table.index == book_name)[0][0]
    # Rank all books by similarity score, highest first; the first entry is
    # normally the book itself, so skip it and keep the next 5
    similar_item = sorted(enumerate(similar_score[index]), key=lambda x: -x[1])[1:6]
    # Display recommended books
    for i in similar_item:
        print(final_table.index[i[0]])

In [34]:
# final product 
recommend('Animal Farm')

1984
Angus, Thongs and Full-Frontal Snogging: Confessions of Georgia Nicolson
Midnight
Second Nature
Call of the Wild
