## BOOK RECOMMENDATION SYSTEM
In today's digital age, the vast expanse of available books can be overwhelming for readers seeking their next great read. To address this challenge, we present a book recommendation system designed to assist users in discovering books tailored to their preferences. Utilizing two comprehensive datasets—books, and ratings—our system employs two distinct recommendation models: a popularity-based model and a collaborative-based model.

The popularity-based model leverages overall book ratings to suggest widely acclaimed books, ensuring users are introduced to popular and trending titles. Conversely, the collaborative-based model harnesses the power of user interaction data, providing personalized recommendations by identifying patterns and similarities among users' reading habits. By combining these models, our system offers a balanced approach, catering to both general trends and individual tastes, ultimately enhancing the user experience in book discovery.

#### DATA GATHERING AND DATA CLEANING

Importing various libraries.

1. The NumPy and pandas libraries are essential for data manipulation and analysis in Python.
2. NumPy is used for numerical computing in Python. It provides support for arrays, matrices, and many mathematical functions.
3. Pandas is a powerful data manipulation library built on top of NumPy. It provides data structures like DataFrame and Series for efficient data manipulation.

In [1]:
import numpy as np
import pandas as pd

DATA GATHERING: This project uses two primary datasets, 'books' and 'ratings'. The 'books' dataset contains detailed information about each book: ISBN, title, author, publication year, publisher, and cover image URLs. The 'ratings' dataset captures user interactions by recording user IDs, book ISBNs, and the corresponding ratings. These datasets form the foundation for the recommendation models.

In [174]:
ratings = pd.read_csv('Ratings.csv')
books = pd.read_csv('Books.csv')
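
When loading CSV files it can help to declare column types explicitly. The sketch below uses a small in-memory stand-in for Books.csv (two rows from the books data, three columns only) and assumes that ISBN and Year-Of-Publication should be read as text. In this toy sample every ISBN happens to be purely numeric, so type inference would turn them into integers and drop the leading zeros.

```python
import io
import pandas as pd

# In-memory stand-in for Books.csv; only three of the columns are reproduced
CSV_TEXT = (
    "ISBN,Book-Title,Year-Of-Publication\n"
    "0195153448,Classical Mythology,2002\n"
    "0002005018,Clara Callan,2001\n"
)

# Default behaviour: pandas infers a type for every column
inferred = pd.read_csv(io.StringIO(CSV_TEXT))

# Declared types: identifiers stay strings, so leading zeros survive
declared = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"ISBN": str, "Year-Of-Publication": str},
)

print(inferred["ISBN"].tolist())
print(declared["ISBN"].tolist())
```

Keeping ISBN as a string matters because it is the key used later to join books with ratings.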


In [175]:
ratings

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6
...,...,...,...
1149775,276704,1563526298,9
1149776,276706,0679447156,0
1149777,276709,0515107662,10
1149778,276721,0590442449,10


In [176]:
books

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...
...,...,...,...,...,...,...,...,...
271355,0440400988,There's a Bat in Bunk Five,Paula Danziger,1988,Random House Childrens Pub (Mm),http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...,http://images.amazon.com/images/P/0440400988.0...
271356,0525447644,From One to One Hundred,Teri Sloat,1991,Dutton Books,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...,http://images.amazon.com/images/P/0525447644.0...
271357,006008667X,Lily Dale : The True Story of the Town that Ta...,Christine Wicker,2004,HarperSanFrancisco,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...,http://images.amazon.com/images/P/006008667X.0...
271358,0192126040,Republic (World's Classics),Plato,1996,Oxford University Press,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...,http://images.amazon.com/images/P/0192126040.0...


We first call the info() method of each DataFrame, which prints a concise summary: the index dtype, the column dtypes, the non-null counts, and the memory usage.

In [177]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
 #   Column       Non-Null Count    Dtype 
---  ------       --------------    ----- 
 0   User-ID      1149780 non-null  int64 
 1   ISBN         1149780 non-null  object
 2   Book-Rating  1149780 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 26.3+ MB


In [178]:
books.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 8 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 271360 non-null  object
 1   Book-Title           271360 non-null  object
 2   Book-Author          271358 non-null  object
 3   Year-Of-Publication  271360 non-null  object
 4   Publisher            271358 non-null  object
 5   Image-URL-S          271360 non-null  object
 6   Image-URL-M          271360 non-null  object
 7   Image-URL-L          271357 non-null  object
dtypes: object(8)
memory usage: 16.6+ MB
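
The summary lists Year-Of-Publication as object rather than an integer type, which may mean the column holds mixed or non-numeric values. One way to check is to coerce the column to numbers and look at what fails. This is a sketch on toy data; the third row, including its ISBN and year value, is invented for illustration.

```python
import pandas as pd

# Toy stand-in for books; the last row is a made-up example of a bad year
toy_books = pd.DataFrame({
    "ISBN": ["0195153448", "0002005018", "0000000000"],
    "Year-Of-Publication": ["2002", "2001", "not a year"],
})

# errors="coerce" turns every value that is not a number into NaN
years = pd.to_numeric(toy_books["Year-Of-Publication"], errors="coerce")

# Rows whose year could not be parsed
bad_rows = toy_books[years.isna()]
print(bad_rows)
```

If the real data contains such rows, they can be inspected and fixed or dropped before any step that depends on the year.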


DATA CLEANING: Dropping columns 'Image-URL-S' and 'Image-URL-L' and keeping only column 'Image-URL-M' as the reference to the cover image.

In [179]:
books.drop(columns = ['Image-URL-S', 'Image-URL-L'], inplace = True)

Dealing with the null values in the 'Book-Author' and 'Publisher' columns: we inspect the affected books, remove the ratings that refer to them, and then drop those rows from 'books'.

In [180]:
books[books['Book-Author'].isnull() | books['Publisher'].isnull()]

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M
118033,0751352497,A+ Quiz Masters:01 Earth,,1999,Dorling Kindersley,http://images.amazon.com/images/P/0751352497.0...
128890,193169656X,Tyrant Moon,Elaine Corvidae,2002,,http://images.amazon.com/images/P/193169656X.0...
129037,1931696993,Finders Keepers,Linnea Sinclair,2001,,http://images.amazon.com/images/P/1931696993.0...
187689,9627982032,The Credit Suisse Guide to Managing Your Perso...,,1995,Edinburgh Financial Publishing,http://images.amazon.com/images/P/9627982032.0...


In [181]:
ratings[(ratings['ISBN'] =='0751352497') | (ratings['ISBN'] =='193169656X') | (ratings['ISBN'] =='1931696993') | (ratings['ISBN'] =='9627982032')]

Unnamed: 0,User-ID,ISBN,Book-Rating
273117,63714,0751352497,10
411773,98391,193169656X,9
411782,98391,1931696993,9
412805,98647,9627982032,8


In [182]:
ratings = ratings[~((ratings['ISBN'] =='0751352497') | (ratings['ISBN'] =='193169656X') | (ratings['ISBN'] =='1931696993') | (ratings['ISBN'] =='9627982032'))]

In [183]:
books.dropna(subset=['Book-Author'], inplace = True)

In [184]:
books.dropna(subset=['Publisher'], inplace = True)
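
The null-handling steps above can also be written without hard-coding the four ISBNs. The following is a minimal sketch on toy frames (all values are illustrative): it collects the ISBNs of incomplete books, removes their ratings with isin, and drops the incomplete rows in a single dropna call.

```python
import pandas as pd

# Toy stand-ins for the two frames; values are illustrative only
toy_books = pd.DataFrame({
    "ISBN": ["A1", "B2", "C3"],
    "Book-Author": ["Author One", None, "Author Three"],
    "Publisher": ["Pub One", "Pub Two", None],
})
toy_ratings = pd.DataFrame({
    "User-ID": [1, 2, 3],
    "ISBN": ["A1", "B2", "C3"],
    "Book-Rating": [7, 10, 9],
})

# ISBNs of books with a missing author or a missing publisher
incomplete = toy_books[["Book-Author", "Publisher"]].isna().any(axis=1)
bad_isbns = toy_books.loc[incomplete, "ISBN"]

# Remove ratings for those ISBNs, then drop the incomplete book rows
toy_ratings = toy_ratings[~toy_ratings["ISBN"].isin(bad_isbns)]
toy_books = toy_books.dropna(subset=["Book-Author", "Publisher"])

print(toy_books)
print(toy_ratings)
```

Deriving the ISBN list from the data keeps the cleaning step valid if the input files change.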

#### POPULARITY BASED MODEL


To build the popularity-based model, we first merge the 'books' and 'ratings' datasets on their common 'ISBN' column. We then keep only books with more than 200 ratings, so that each average rests on a substantial amount of feedback, and sort the remaining books by average rating to recommend the most popular titles.

First, we merge the two datasets on the 'ISBN' column, which is present in both.

In [185]:
books_ratings = books.merge(ratings, on='ISBN')

In [186]:
books_ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1031132 entries, 0 to 1031131
Data columns (total 8 columns):
 #   Column               Non-Null Count    Dtype 
---  ------               --------------    ----- 
 0   ISBN                 1031132 non-null  object
 1   Book-Title           1031132 non-null  object
 2   Book-Author          1031132 non-null  object
 3   Year-Of-Publication  1031132 non-null  object
 4   Publisher            1031132 non-null  object
 5   Image-URL-M          1031132 non-null  object
 6   User-ID              1031132 non-null  int64 
 7   Book-Rating          1031132 non-null  int64 
dtypes: int64(2), object(6)
memory usage: 62.9+ MB
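
The merged frame has 1,031,132 rows, fewer than the 1,149,776 ratings left after cleaning. This suggests that some rated ISBNs have no entry in 'books' and were dropped by the default inner join. A left join with indicator=True is one way to count such rows; the sketch below uses toy frames, and 'Z9' is a made-up ISBN with no matching book.

```python
import pandas as pd

toy_books = pd.DataFrame({"ISBN": ["A1", "B2"], "Book-Title": ["First", "Second"]})
toy_ratings = pd.DataFrame({
    "User-ID": [1, 2, 3],
    "ISBN": ["A1", "B2", "Z9"],  # 'Z9' has no matching book
    "Book-Rating": [8, 0, 5],
})

# A left join with indicator=True labels each rating as 'both' or 'left_only'
checked = toy_ratings.merge(toy_books, on="ISBN", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(len(unmatched), "rating(s) without a matching book")

# The default inner join keeps only the matched rows
inner = toy_books.merge(toy_ratings, on="ISBN")
print(len(inner), "row(s) after the inner join")
```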


In [187]:
books_ratings

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,User-ID,Book-Rating
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,2,0
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,8,5
2,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,11400,0
3,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,11676,8
4,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,41385,0
...,...,...,...,...,...,...,...,...
1031127,0440400988,There's a Bat in Bunk Five,Paula Danziger,1988,Random House Childrens Pub (Mm),http://images.amazon.com/images/P/0440400988.0...,276463,7
1031128,0525447644,From One to One Hundred,Teri Sloat,1991,Dutton Books,http://images.amazon.com/images/P/0525447644.0...,276579,4
1031129,006008667X,Lily Dale : The True Story of the Town that Ta...,Christine Wicker,2004,HarperSanFrancisco,http://images.amazon.com/images/P/006008667X.0...,276680,0
1031130,0192126040,Republic (World's Classics),Plato,1996,Oxford University Press,http://images.amazon.com/images/P/0192126040.0...,276680,0


Grouping the 'books_ratings' dataset on the basis of 'Book-Title' and then finding out count and mean of ratings for each book.

In [188]:
num_of_ratings = books_ratings.groupby('Book-Title').count()['Book-Rating']
num_of_ratings = num_of_ratings.reset_index()
num_of_ratings.rename(columns = {'Book-Rating':'count_ratings'}, inplace = True)

In [189]:
mean_of_ratings = books_ratings.groupby('Book-Title').agg({'Book-Rating':'mean'})
mean_of_ratings = mean_of_ratings.reset_index()
mean_of_ratings.rename(columns = {'Book-Rating':'mean_ratings'}, inplace = True)

In [190]:
num_of_ratings

Unnamed: 0,Book-Title,count_ratings
0,A Light in the Storm: The Civil War Diary of ...,4
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1
4,Beyond IBM: Leadership Marketing and Finance ...,1
...,...,...
241063,Ã?Â?lpiraten.,2
241064,Ã?Â?rger mit Produkt X. Roman.,4
241065,Ã?Â?sterlich leben.,1
241066,Ã?Â?stlich der Berge.,3


In [191]:
mean_of_ratings

Unnamed: 0,Book-Title,mean_ratings
0,A Light in the Storm: The Civil War Diary of ...,2.250000
1,Always Have Popsicles,0.000000
2,Apple Magic (The Collector's series),0.000000
3,"Ask Lily (Young Women of Faith: Lily Series, ...",8.000000
4,Beyond IBM: Leadership Marketing and Finance ...,0.000000
...,...,...
241063,Ã?Â?lpiraten.,0.000000
241064,Ã?Â?rger mit Produkt X. Roman.,5.250000
241065,Ã?Â?sterlich leben.,7.000000
241066,Ã?Â?stlich der Berge.,2.666667


Making a new dataframe df which includes count and mean of rating for each book.

In [192]:
df = num_of_ratings.merge(mean_of_ratings, on = 'Book-Title')
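
The count, mean, and merge steps can also be expressed as one grouped aggregation. This is a sketch on toy data using pandas named aggregation; the column names count_ratings and mean_ratings match the ones used above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Book-Title": ["First", "First", "Second", "Second", "Second"],
    "Book-Rating": [8, 0, 10, 6, 5],
})

# Named aggregation: one output column per (source column, function) pair
stats = (
    toy.groupby("Book-Title")
       .agg(count_ratings=("Book-Rating", "count"),
            mean_ratings=("Book-Rating", "mean"))
       .reset_index()
)
print(stats)
```

This produces the same three columns as df in a single pass over the data.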

In [193]:
df

Unnamed: 0,Book-Title,count_ratings,mean_ratings
0,A Light in the Storm: The Civil War Diary of ...,4,2.250000
1,Always Have Popsicles,1,0.000000
2,Apple Magic (The Collector's series),1,0.000000
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1,8.000000
4,Beyond IBM: Leadership Marketing and Finance ...,1,0.000000
...,...,...,...
241063,Ã?Â?lpiraten.,2,0.000000
241064,Ã?Â?rger mit Produkt X. Roman.,4,5.250000
241065,Ã?Â?sterlich leben.,1,7.000000
241066,Ã?Â?stlich der Berge.,3,2.666667


For the popularity-based model we keep only books with more than 200 ratings, so that the recommendations rest on a substantial amount of user feedback. Of these, we treat as popular only the books whose average rating is above 4. Finally, we sort the books by average rating.

In [194]:
popular_books = df[df['count_ratings'] > 200]

In [195]:
popular_books = popular_books[popular_books['mean_ratings'] > 4]

In [196]:
popular_books = popular_books.sort_values('mean_ratings', ascending = False)

In [197]:
popular_books = popular_books.reset_index()

In [198]:
popular_books.drop(columns = 'index', inplace = True)
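
The filtering and sorting cells above can be collected into one small function with the thresholds as parameters. This is a sketch only: top_by_mean and the toy values are hypothetical names and data, not part of the notebook.

```python
import pandas as pd

def top_by_mean(stats, min_count, min_mean):
    """Keep titles with more than min_count ratings and a mean above min_mean, best first."""
    kept = stats[(stats["count_ratings"] > min_count) & (stats["mean_ratings"] > min_mean)]
    # reset_index(drop=True) renumbers the rows without adding an 'index' column
    return kept.sort_values("mean_ratings", ascending=False).reset_index(drop=True)

toy_stats = pd.DataFrame({
    "Book-Title": ["A", "B", "C", "D"],
    "count_ratings": [250, 500, 10, 300],
    "mean_ratings": [4.5, 5.9, 9.0, 3.0],
})

toy_popular = top_by_mean(toy_stats, min_count=200, min_mean=4)
print(toy_popular)
```

Title C has the highest mean but too few ratings, and D has enough ratings but a low mean, so only B and A remain.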

Merging popular_books dataset with books dataset to get all the information of the most popular books.

In [199]:
popular_books = popular_books.merge(books, on = 'Book-Title')

Dropping duplicates on the basis of 'Book-Title' column if there are any.

In [200]:
popular_books.drop_duplicates(subset = 'Book-Title', inplace = True)

Now presenting final popular_books dataframe by dropping column 'ISBN' and resetting index.

In [201]:
popular_books.drop(columns= 'ISBN', inplace = True)

In [202]:
popular_books = popular_books.reset_index().drop(columns = 'index')

In [203]:
popular_books

Unnamed: 0,Book-Title,count_ratings,mean_ratings,Book-Author,Year-Of-Publication,Publisher,Image-URL-M
0,Harry Potter and the Prisoner of Azkaban (Book 3),428,5.852804,J. K. Rowling,1999,Scholastic,http://images.amazon.com/images/P/0439136350.0...
1,Harry Potter and the Goblet of Fire (Book 4),387,5.824289,J. K. Rowling,2000,Scholastic,http://images.amazon.com/images/P/0439139597.0...
2,Harry Potter and the Sorcerer's Stone (Book 1),278,5.73741,J. K. Rowling,1998,Scholastic,http://images.amazon.com/images/P/0590353403.0...
3,Harry Potter and the Order of the Phoenix (Boo...,347,5.501441,J. K. Rowling,2003,Scholastic,http://images.amazon.com/images/P/043935806X.0...
4,Ender's Game (Ender Wiggins Saga (Paperback)),249,5.409639,Orson Scott Card,1992,Tor Books,http://images.amazon.com/images/P/0312853238.0...
5,Harry Potter and the Chamber of Secrets (Book 2),556,5.183453,J. K. Rowling,2000,Scholastic,http://images.amazon.com/images/P/0439064872.0...
6,The Hobbit : The Enchanting Prelude to The Lor...,281,5.007117,J.R.R. TOLKIEN,1986,Del Rey,http://images.amazon.com/images/P/0345339681.0...
7,The Fellowship of the Ring (The Lord of the Ri...,368,4.94837,J.R.R. TOLKIEN,1986,Del Rey,http://images.amazon.com/images/P/0345339703.0...
8,Harry Potter and the Sorcerer's Stone (Harry P...,575,4.895652,J. K. Rowling,1999,Arthur A. Levine Books,http://images.amazon.com/images/P/059035342X.0...
9,"The Two Towers (The Lord of the Rings, Part 2)",260,4.880769,J.R.R. TOLKIEN,1986,Del Rey,http://images.amazon.com/images/P/0345339711.0...
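
The highest mean in the table above is about 5.85, although ratings go up to 10, and many rows in the ratings data carry a rating of 0. If a 0 records an interaction without an explicit score rather than a genuine lowest rating (an assumption; the data shown here does not say), then averaging over all rows understates how much raters liked a book. A sketch of the difference on toy data:

```python
import pandas as pd

toy = pd.DataFrame({
    "Book-Title": ["First"] * 4,
    "Book-Rating": [0, 0, 8, 10],
})

# Mean with zeros counted as real scores
mean_all = toy["Book-Rating"].mean()

# Mean over explicit scores only (assumes 0 means "no score given")
explicit = toy[toy["Book-Rating"] > 0]
mean_explicit = explicit["Book-Rating"].mean()

print(mean_all, mean_explicit)
```

Whether to exclude zeros depends on what a 0 means in the source data, so this should be verified before changing the model.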


#### COLLABORATIVE BASED MODEL

The collaborative-based model recommends books by analyzing user ratings. It keeps active users with more than 200 ratings and books with more than 50 ratings from those users, so that similarities rest on enough data. It then computes cosine similarity between the rows of a book-by-user pivot table of ratings and suggests the books whose rating patterns are most similar to a given book.

Using the dataset 'books_ratings' that we made earlier by merging datasets 'books' and 'ratings' on the basis of common column 'ISBN'.

In [205]:
books_ratings

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,User-ID,Book-Rating
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,2,0
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,8,5
2,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,11400,0
3,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,11676,8
4,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,41385,0
...,...,...,...,...,...,...,...,...
1031127,0440400988,There's a Bat in Bunk Five,Paula Danziger,1988,Random House Childrens Pub (Mm),http://images.amazon.com/images/P/0440400988.0...,276463,7
1031128,0525447644,From One to One Hundred,Teri Sloat,1991,Dutton Books,http://images.amazon.com/images/P/0525447644.0...,276579,4
1031129,006008667X,Lily Dale : The True Story of the Town that Ta...,Christine Wicker,2004,HarperSanFrancisco,http://images.amazon.com/images/P/006008667X.0...,276680,0
1031130,0192126040,Republic (World's Classics),Plato,1996,Oxford University Press,http://images.amazon.com/images/P/0192126040.0...,276680,0


Now we group the books_ratings dataset by 'User-ID' to count the ratings each user has given, and keep only users with more than 200 ratings in total. Restricting the data to these active users means that similarities are computed from users with enough rating history to compare.

In [206]:
user_ratings = books_ratings.groupby('User-ID').count()['Book-Rating']

In [207]:
user_ratings = user_ratings.reset_index()

In [208]:
user_ratings = user_ratings[user_ratings['Book-Rating'] > 200]

In [209]:
user_ratings

Unnamed: 0,User-ID,Book-Rating
87,254,300
698,2276,456
862,2766,269
919,2977,227
1033,3363,890
...,...,...
90587,274308,1293
91112,275970,1325
91564,277427,490
91639,277639,265


Now we keep only the rows rated by a user who has rated more than 200 books in total.

In [210]:
users = user_ratings['User-ID'].values

In [211]:
books_ratings = books_ratings[books_ratings['User-ID'].isin(users)]

Next we group by the 'Book-Title' column and keep only books with more than 50 ratings from these users. This threshold ensures that each book in the model has enough user feedback for meaningful collaborative filtering.

In [212]:
no_of_ratings = books_ratings.groupby('Book-Title').count()['Book-Rating']

In [213]:
no_of_ratings = no_of_ratings.reset_index()

In [214]:
no_of_ratings = no_of_ratings[no_of_ratings['Book-Rating'] > 50]

Now we create a list of the books that have more than 50 ratings from users with more than 200 ratings in total, and then filter the 'books_ratings' dataframe to include only those books.

In [215]:
# Use a new name so the 'books' DataFrame is not overwritten by a list of titles
popular_titles = no_of_ratings['Book-Title'].to_list()

In [216]:
books_ratings = books_ratings[books_ratings['Book-Title'].isin(popular_titles)]
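
Both filters, by user and then by book, follow the same pattern: count how often each value occurs and keep the rows whose value occurs more than a threshold. The following sketch puts that pattern in one helper; keep_frequent is a hypothetical name, and the toy thresholds of 1 stand in for the 200 and 50 used above.

```python
import pandas as pd

def keep_frequent(df, column, min_count):
    """Keep rows whose value in `column` occurs more than min_count times in df."""
    counts = df[column].map(df[column].value_counts())
    return df[counts > min_count]

toy = pd.DataFrame({
    "User-ID":     [1, 1, 1, 2, 2, 3],
    "Book-Title":  ["A", "B", "C", "A", "B", "A"],
    "Book-Rating": [5, 0, 7, 9, 0, 3],
})

# Step 1: keep active users (user 3 has a single rating and is dropped)
active = keep_frequent(toy, "User-ID", min_count=1)

# Step 2: among active users, keep well-rated titles (title C is dropped)
filtered = keep_frequent(active, "Book-Title", min_count=1)
print(filtered)
```

The order matters: the book counts in step 2 are taken over active users only, as in the cells above.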

In [217]:
books_ratings

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,User-ID,Book-Rating
31,0399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group,http://images.amazon.com/images/P/0399135782.0...,11676,9
33,0399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group,http://images.amazon.com/images/P/0399135782.0...,36836,0
34,0399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group,http://images.amazon.com/images/P/0399135782.0...,46398,9
38,0399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group,http://images.amazon.com/images/P/0399135782.0...,113270,0
39,0399135782,The Kitchen God's Wife,Amy Tan,1991,Putnam Pub Group,http://images.amazon.com/images/P/0399135782.0...,113519,0
...,...,...,...,...,...,...,...,...
1028410,1878702831,Echoes,Nancy Morse,1992,Meteor Publishing Corporation,http://images.amazon.com/images/P/1878702831.0...,238781,0
1028596,0394429869,I Know Why the Caged Bird Sings,Maya Angelou,1996,Random House,http://images.amazon.com/images/P/0394429869.0...,239594,8
1028598,0449001164,The Promise,CHAIM POTOK,1997,Ballantine Books,http://images.amazon.com/images/P/0449001164.0...,239594,7
1028811,0743527631,The Pillars of the Earth,Ken Follett,2002,Encore,http://images.amazon.com/images/P/0743527631.0...,240144,0


Now we build a pivot table: a matrix with one row per book and one column per user, where each cell holds the rating that user gave to that book.

In [218]:
pt = books_ratings.pivot_table(index = 'Book-Title', columns = 'User-ID', values = 'Book-Rating')
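
Because the table is indexed by title rather than ISBN, a user who rated two editions of the same title contributes two rows to one cell. pivot_table resolves this with its default aggregation, the mean. A sketch on toy data, where user 1 has rated title A twice:

```python
import pandas as pd

toy = pd.DataFrame({
    "Book-Title": ["A", "A", "A", "B"],
    "User-ID": [1, 1, 2, 2],  # user 1 rated title A twice (e.g. two editions)
    "Book-Rating": [8, 10, 0, 6],
})

# Rows become titles, columns become users; repeated (title, user) pairs are averaged
toy_pt = toy.pivot_table(index="Book-Title", columns="User-ID", values="Book-Rating")
print(toy_pt)
```

Cells for (title, user) pairs with no rating come out as NaN, which is why a fill step is needed before computing similarities.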

In [219]:
pt

Book-Title \ User-ID,254,2276,2766,2977,3363,4017,4385,6251,6323,6543,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
1984,9.0,,,,,,,,,,...,10.0,,,,,,0.0,,,
1st to Die: A Novel,,,,,,,,,,9.0,...,,,,,,,,,,
2nd Chance,,10.0,,,,,,,,0.0,...,,,,,,0.0,,,0.0,
4 Blondes,,,,,,,,0.0,,,...,,,,,,,,,,
A Bend in the Road,0.0,,7.0,,,,,,,,...,,0.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,,0.0,...,,9.0,,,,,0.0,,,
You Belong To Me,,,,,,,,,0.0,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,0.0,,,...,,,,,,,0.0,,,
Zoya,,,,,,,,,,,...,,0.0,,,,,,,,


Filling NaN values by 0 in the pivot table.

In [220]:
pt = pt.fillna(0)

Now we calculate cosine similarity, the cosine of the angle between two vectors, which is often used to measure similarity between items in a recommendation system. In our case it is computed between the rows of the pivot table pt, where each row corresponds to a book and holds the ratings given by the different users.
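
For two rating vectors u and v, cosine similarity is the dot product divided by the product of the vector lengths: cos(u, v) = (u · v) / (‖u‖ ‖v‖). The sketch below computes it by hand with NumPy for three hypothetical books rated by the same four users; it illustrates the measure and is not a replacement for the library call.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two rating vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Each vector holds one book's ratings from the same four (hypothetical) users
book_a = np.array([9.0, 0.0, 8.0, 0.0])
book_b = np.array([10.0, 0.0, 7.0, 0.0])  # rated by the same users as book_a
book_c = np.array([0.0, 6.0, 0.0, 9.0])   # rated by a different set of users

print(cosine(book_a, book_b))  # close to 1: similar rating pattern
print(cosine(book_a, book_c))  # no users in common
```

Books rated highly by the same users point in nearly the same direction and score close to 1, while books with no raters in common score 0.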

In [221]:
from sklearn.metrics.pairwise import cosine_similarity

similarity_scores is a matrix where each element is the cosine similarity between two books. Higher scores indicate greater similarity.
In our case, similarity_scores gives the similarity of each book with the other 678 books and with itself (which is 1). The shape of the similarity_scores matrix is therefore (679, 679).

In [222]:
similarity_scores = cosine_similarity(pt)

In [223]:
similarity_scores.shape

(679, 679)

Creating a function that recommends 5 books, printing each book's title, author, and cover image URL, based on the collaborative filtering approach.

In [226]:
def recommend(book_name):
    # Row position of the requested book in the pivot table
    ind = pt.index.get_loc(book_name)
    # Rank all books by similarity, highest first; sorting positions (not a dict
    # keyed by score) keeps books that happen to share the same score
    ranked = np.argsort(similarity_scores[ind])[::-1]
    # Skip the book itself and keep the five most similar titles
    top = [i for i in ranked if i != ind][:5]
    # One row per title, used to look up author and cover URL
    details = books_ratings.drop_duplicates('Book-Title').set_index('Book-Title')
    for i in top:
        title = pt.index[i]
        print('Book name:', title)
        print('Author name:', details.loc[title, 'Book-Author'])
        print('Book image url:', details.loc[title, 'Image-URL-M'])
        print('\n')

In [236]:
recommend('Harry Potter and the Goblet of Fire (Book 4)')

Book name: Harry Potter and the Prisoner of Azkaban (Book 3)
Author name: J. K. Rowling
Book image url: http://images.amazon.com/images/P/0439136350.01.MZZZZZZZ.jpg


Book name: Harry Potter and the Chamber of Secrets (Book 2)
Author name: J. K. Rowling
Book image url: http://images.amazon.com/images/P/0439064872.01.MZZZZZZZ.jpg


Book name: Harry Potter and the Order of the Phoenix (Book 5)
Author name: J. K. Rowling
Book image url: http://images.amazon.com/images/P/043935806X.01.MZZZZZZZ.jpg


Book name: Harry Potter and the Sorcerer's Stone (Book 1)
Author name: J. K. Rowling
Book image url: http://images.amazon.com/images/P/0590353403.01.MZZZZZZZ.jpg


Book name: Harry Potter and the Sorcerer's Stone (Harry Potter (Paperback))
Author name: J. K. Rowling
Book image url: http://images.amazon.com/images/P/059035342X.01.MZZZZZZZ.jpg
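
A variant that returns its results instead of printing them is easier to test and to reuse, for example in a web page. The following is a minimal sketch: most_similar, toy_titles, and toy_scores are hypothetical names, and in the notebook the last two would correspond to pt.index and similarity_scores.

```python
import numpy as np
import pandas as pd

def most_similar(title, titles, scores, n=5):
    """Return up to n (title, score) pairs most similar to `title`, best first."""
    ind = titles.get_loc(title)
    order = np.argsort(scores[ind])[::-1]        # positions by descending score
    order = [i for i in order if i != ind][:n]   # drop the query book itself
    return [(titles[i], float(scores[ind][i])) for i in order]

# Hypothetical 3x3 similarity matrix for three toy titles
toy_titles = pd.Index(["A", "B", "C"])
toy_scores = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])

print(most_similar("A", toy_titles, toy_scores, n=2))
```

Returning the scores alongside the titles also makes it possible to apply a minimum-similarity cutoff later.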


