# Book Recommendation System

## Approach: Item-Based Collaborative Filtering

### Recommender Systems

Recommender systems are widely used in commercial applications to suggest products and services to a user based on that user's browsing pattern and purchase history on the business's website. Businesses use them to surface the items a customer is most likely to purchase, which increases sales and profit. Companies such as Amazon, Netflix, Flipkart and other e-commerce websites are reported to generate a large share of their revenue from products and services recommended to customers. A good recommendation system is therefore very important for any business selling products or services online.

**There are different types of Recommendation Systems:**

- **Content Based Recommendation System**

    Content-based filtering uses item features to recommend other items similar to those the user likes, based on their
    purchase history and the ratings they have given. A weight is assigned to each feature based on the items the user has rated, and these weights are multiplied with the feature values of every item to produce a ranked list of items the user is likely to enjoy.

- **Collaborative Filtering based Recommendation System**

    - User based Collaborative Filtering

        In this method, a user's similarity to all other users is determined with a similarity measure such as Euclidean
        distance, Pearson correlation or cosine similarity, computed over the items both users have rated. A rating for every
        product or service the user has not purchased is then predicted as a weighted average of other users' ratings, with the
        similarity scores as weights. The items with the highest predicted ratings are recommended to the user.

    - Item based Collaborative Filtering

        In this method, items are grouped into neighbourhoods based on user behaviour. For example, if the same users tend to
        like a set of items, those items are considered similar. In this way the similarity of each item to every other item on
        the business's website is determined from users' rating patterns, not from the features or content of the items.
        Similarity scores between items are calculated using a measure such as Euclidean distance, Pearson correlation or
        cosine similarity.

- **Hybrid Recommendation System**

    Both collaborative and content-based recommendation systems have advantages and disadvantages. To offset the
    weaknesses of each, many businesses use a hybrid recommendation system on their website or app. It combines both
    methods, giving each a weight that is chosen by evaluating the recommendations produced under different weightings.
    Hybrid recommendation systems can help businesses make more accurate recommendations.
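As a rough illustration of the user-based prediction step described above, the sketch below computes a similarity-weighted average of neighbours' ratings for one unseen item. The ratings, the similarity values and the helper name `predict_rating` are all made up for illustration and are not part of this notebook.

```python
import numpy as np

# Hypothetical data: ratings that three other users gave to one unseen item,
# and each of those users' similarity to the target user.
neighbour_ratings = np.array([8.0, 6.0, 10.0])
similarities = np.array([0.9, 0.1, 0.5])

def predict_rating(ratings, weights):
    """Similarity-weighted average of the neighbours' ratings."""
    total = weights.sum()
    if total == 0:
        return 0.0  # no similar users, so fall back to a neutral score
    return float(np.dot(ratings, weights) / total)

predicted = predict_rating(neighbour_ratings, similarities)
print(round(predicted, 2))  # 8.53
```

The most similar neighbour (weight 0.9) pulls the prediction towards their rating of 8, while the least similar one barely contributes.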

### Brief explanation for this Book Recommendation System

This book recommendation system uses item-based collaborative filtering to recommend books similar to the one a user searches for. The similarity of each book to every other book is measured by cosine similarity over users' rating patterns. Each book is treated as a vector whose coordinates are the ratings given by each user, and its cosine similarity with every other book is calculated. These scores are stored in a matrix, and when a book is searched, a fixed number of books with the highest similarity scores to it are recommended in rank order.
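To make the "book as a vector of user ratings" idea concrete, here is a minimal sketch of cosine similarity on three made-up books rated by four made-up users. The helper `cosine` and the toy vectors are assumptions for illustration; the notebook itself relies on scikit-learn's `cosine_similarity` further down.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / norm) if norm else 0.0

# Hypothetical ratings from four users; one coordinate per user, 0 = not rated
book_a = np.array([9.0, 0.0, 8.0, 0.0])
book_b = np.array([10.0, 0.0, 7.0, 0.0])
book_c = np.array([0.0, 6.0, 0.0, 9.0])

sim_ab = cosine(book_a, book_b)  # rated by the same users in a similar way
sim_ac = cosine(book_a, book_c)  # no users in common
print(sim_ab, sim_ac)
```

Books rated alike by the same users score close to 1, while books with no raters in common score 0.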

In [1]:
import numpy as np
import pandas as pd

#### Importing required datasets

In [2]:
books = pd.read_csv('books.csv')
users = pd.read_csv('users.csv')
ratings = pd.read_csv('ratings.csv')



In [3]:
books['Image-URL-M'][1]

'http://images.amazon.com/images/P/0002005018.01.MZZZZZZZ.jpg'

Viewing books dataframe

In [4]:
books.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


Viewing users dataframe

In [5]:
users.head()

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


Viewing ratings dataframe

In [6]:
ratings.head()

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


Checking shape of all dataframes

In [7]:
print(books.shape)
print(ratings.shape)
print(users.shape)

(271360, 8)
(1149780, 3)
(278858, 3)


Checking for Null values in all the dataframes

In [8]:
books.isnull().sum()

ISBN                   0
Book-Title             0
Book-Author            1
Year-Of-Publication    0
Publisher              2
Image-URL-S            0
Image-URL-M            0
Image-URL-L            3
dtype: int64

These null values are negligible relative to the total number of books. Also, since we are using collaborative filtering, null values in these columns do not affect the results.

In [9]:
users.isnull().sum()

User-ID          0
Location         0
Age         110762
dtype: int64

We don't need the Age column for our recommendation system.

In [10]:
ratings.isnull().sum()

User-ID        0
ISBN           0
Book-Rating    0
dtype: int64

This dataframe is the most important one for building a collaborative filtering based recommendation system, and fortunately it has no null values.

Checking for duplicate entries in all dataframes

In [11]:
books.duplicated().sum()

0

In [12]:
ratings.duplicated().sum()

0

In [13]:
users.duplicated().sum()

0

## Exploratory Data Analysis

In [14]:
books['Book-Title'].nunique()

242135

In [15]:
books.shape

(271360, 8)

There are 242135 unique book titles, whereas the books dataset has 271360 entries.

Let's explore further.

In [16]:
Book_Count_df=pd.DataFrame(books['Book-Title'].value_counts())

In [17]:
Book_Count_df.reset_index(inplace=True)

In [18]:
Book_Count_df.rename(columns={'index':'Book-Title','Book-Title':'Count'})

Unnamed: 0,Book-Title,Count
0,Selected Poems,27
1,Little Women,24
2,Wuthering Heights,21
3,The Secret Garden,20
4,Dracula,20
...,...,...
242130,What Every Kid Should Know,1
242131,The Seventh Enemy (A Brady Coyne Mystery),1
242132,A Brace of Skeet,1
242133,"The Yellow Admiral (O'Brian, Patrick, Aubrey/M...",1


We observe that the books dataframe has multiple rows for the same book title.  
Hence duplicate entries should be dropped on the Book-Title column so that only one row per title is kept.
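The de-duplication described above can be sketched on a toy frame as follows. The rows are invented; the notebook applies the same `drop_duplicates('Book-Title')` call later, when building the popular-books table and inside the recommendation function.

```python
import pandas as pd

# Toy stand-in for the books dataframe (hypothetical rows)
toy_books = pd.DataFrame({
    'ISBN': ['111', '222', '333'],
    'Book-Title': ['Dracula', 'Dracula', 'Little Women'],
    'Publisher': ['A', 'B', 'C'],
})

# Keep only the first row seen for each title
unique_books = toy_books.drop_duplicates('Book-Title')
print(unique_books)
```

Note that this keeps whichever edition happens to appear first, so the author spelling, publisher and cover image of the other editions are discarded.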

In [19]:
User_Rating_Count=pd.DataFrame(ratings['User-ID'].value_counts())

In [20]:
User_Rating_Count.reset_index(inplace=True)

In [21]:
User_Rating_Count.rename(columns={'index':'User-ID','User-ID':'Count'},inplace=True)
User_Rating_Count

Unnamed: 0,User-ID,Count
0,11676,13602
1,198711,7550
2,153662,6109
3,98391,5891
4,35859,5850
...,...,...
105278,116180,1
105279,116166,1
105280,116154,1
105281,116137,1


We observe that a small group of users has rated a large number of books.  
Restricting the data to the users who have rated the most books helps in building an efficient recommendation system.

## Popularity Based Recommender System

Merging books and ratings dataframes on 'ISBN' column

In [22]:
ratings_with_name = ratings.merge(books,on='ISBN')

Making a dataframe for storing count of ratings of each book.

In [23]:
num_rating_df = ratings_with_name.groupby('Book-Title').count()['Book-Rating'].reset_index()
num_rating_df.rename(columns={'Book-Rating':'num_ratings'},inplace=True)
num_rating_df

Unnamed: 0,Book-Title,num_ratings
0,A Light in the Storm: The Civil War Diary of ...,4
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1
4,Beyond IBM: Leadership Marketing and Finance ...,1
...,...,...
241066,Ã?Â?lpiraten.,2
241067,Ã?Â?rger mit Produkt X. Roman.,4
241068,Ã?Â?sterlich leben.,1
241069,Ã?Â?stlich der Berge.,3


Dataframe for storing average rating of each book

In [24]:
avg_rating_df = ratings_with_name.groupby('Book-Title')['Book-Rating'].mean().reset_index()
avg_rating_df.rename(columns={'Book-Rating':'avg_rating'},inplace=True)
avg_rating_df

Unnamed: 0,Book-Title,avg_rating
0,A Light in the Storm: The Civil War Diary of ...,2.250000
1,Always Have Popsicles,0.000000
2,Apple Magic (The Collector's series),0.000000
3,"Ask Lily (Young Women of Faith: Lily Series, ...",8.000000
4,Beyond IBM: Leadership Marketing and Finance ...,0.000000
...,...,...
241066,Ã?Â?lpiraten.,0.000000
241067,Ã?Â?rger mit Produkt X. Roman.,5.250000
241068,Ã?Â?sterlich leben.,7.000000
241069,Ã?Â?stlich der Berge.,2.666667


In [25]:
popular_df = num_rating_df.merge(avg_rating_df,on='Book-Title')
popular_df

Unnamed: 0,Book-Title,num_ratings,avg_rating
0,A Light in the Storm: The Civil War Diary of ...,4,2.250000
1,Always Have Popsicles,1,0.000000
2,Apple Magic (The Collector's series),1,0.000000
3,"Ask Lily (Young Women of Faith: Lily Series, ...",1,8.000000
4,Beyond IBM: Leadership Marketing and Finance ...,1,0.000000
...,...,...,...
241066,Ã?Â?lpiraten.,2,0.000000
241067,Ã?Â?rger mit Produkt X. Roman.,4,5.250000
241068,Ã?Â?sterlich leben.,1,7.000000
241069,Ã?Â?stlich der Berge.,3,2.666667
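As an aside, the rating count and the average rating can also be computed in one pass instead of two separate groupby calls followed by a merge. The sketch below shows this on a made-up frame using pandas named aggregation; it is an alternative formulation, not the code this notebook ran.

```python
import pandas as pd

# Hypothetical ratings for two titles
toy = pd.DataFrame({
    'Book-Title': ['A', 'A', 'A', 'B', 'B'],
    'Book-Rating': [10, 8, 0, 5, 7],
})

# Count and mean of Book-Rating per title in a single groupby
summary = (
    toy.groupby('Book-Title')['Book-Rating']
       .agg(num_ratings='count', avg_rating='mean')
       .reset_index()
)
print(summary)
```

Selecting the `Book-Rating` column before aggregating also avoids trying to average non-numeric columns such as the publisher or image URLs.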


To recommend popular books, we keep only the books with at least 250 ratings, rank them by average rating and take the top 50.

In [26]:
popular_df = popular_df[popular_df['num_ratings']>=250].sort_values('avg_rating',ascending=False).head(50)

In [27]:
popular_df = popular_df.merge(books,on='Book-Title').drop_duplicates('Book-Title')[['Book-Title','Book-Author','Image-URL-M','num_ratings','avg_rating']]

In [28]:
popular_df['Image-URL-M'][0]

'http://images.amazon.com/images/P/0439136350.01.MZZZZZZZ.jpg'

In [29]:
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_colwidth', 1000)

### Recommending most popular books

In [30]:
popular_df

Unnamed: 0,Book-Title,Book-Author,Image-URL-M,num_ratings,avg_rating
0,Harry Potter and the Prisoner of Azkaban (Book 3),J. K. Rowling,http://images.amazon.com/images/P/0439136350.01.MZZZZZZZ.jpg,428,5.852804
3,Harry Potter and the Goblet of Fire (Book 4),J. K. Rowling,http://images.amazon.com/images/P/0439139597.01.MZZZZZZZ.jpg,387,5.824289
5,Harry Potter and the Sorcerer's Stone (Book 1),J. K. Rowling,http://images.amazon.com/images/P/0590353403.01.MZZZZZZZ.jpg,278,5.73741
9,Harry Potter and the Order of the Phoenix (Book 5),J. K. Rowling,http://images.amazon.com/images/P/043935806X.01.MZZZZZZZ.jpg,347,5.501441
13,Harry Potter and the Chamber of Secrets (Book 2),J. K. Rowling,http://images.amazon.com/images/P/0439064872.01.MZZZZZZZ.jpg,556,5.183453
16,The Hobbit : The Enchanting Prelude to The Lord of the Rings,J.R.R. TOLKIEN,http://images.amazon.com/images/P/0345339681.01.MZZZZZZZ.jpg,281,5.007117
17,"The Fellowship of the Ring (The Lord of the Rings, Part 1)",J.R.R. TOLKIEN,http://images.amazon.com/images/P/0345339703.01.MZZZZZZZ.jpg,368,4.94837
26,Harry Potter and the Sorcerer's Stone (Harry Potter (Paperback)),J. K. Rowling,http://images.amazon.com/images/P/059035342X.01.MZZZZZZZ.jpg,575,4.895652
28,"The Two Towers (The Lord of the Rings, Part 2)",J.R.R. TOLKIEN,http://images.amazon.com/images/P/0345339711.01.MZZZZZZZ.jpg,260,4.880769
39,To Kill a Mockingbird,Harper Lee,http://images.amazon.com/images/P/0446310786.01.MZZZZZZZ.jpg,510,4.7


## Collaborative Filtering Based Recommender System

Selecting Users who have rated more than 200 books

In [31]:
x = ratings_with_name.groupby('User-ID').count()['Book-Rating'] > 200
x

User-ID
2         False
8         False
9         False
10        False
12        False
          ...  
278846    False
278849    False
278851    False
278852    False
278854    False
Name: Book-Rating, Length: 92106, dtype: bool

In [32]:
wellread_users = x[x].index

In [33]:
wellread_users.shape

(811,)

There are only 811 users who have rated more than 200 books.

Selecting only the entries in ratings_with_name that were rated by users in wellread_users

In [34]:
filtered_rating = ratings_with_name[ratings_with_name['User-ID'].isin(wellread_users)]
filtered_rating

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
2,6543,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.LZZZZZZZ.jpg
5,23768,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.LZZZZZZZ.jpg
7,28523,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.LZZZZZZZ.jpg
15,77940,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.LZZZZZZZ.jpg
16,81977,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/034545104X.01.LZZZZZZZ.jpg
...,...,...,...,...,...,...,...,...,...,...
1030883,275970,1880837927,0,The Theology of the Hammer,Millard Fuller,1994,Smyth &amp; Helwys Publishing,http://images.amazon.com/images/P/1880837927.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/1880837927.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/1880837927.01.LZZZZZZZ.jpg
1030884,275970,188717897X,0,"The Ordeal of Integration: Progress and Resentment in America's \Racial\"" Crisis (Ordeal of Integration)""",Orlando Patterson,1998,Civitas Book Publisher,http://images.amazon.com/images/P/188717897X.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/188717897X.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/188717897X.01.LZZZZZZZ.jpg
1030885,275970,1888889047,0,Pushcart's Complete Rotten Reviews &amp; Rejections,Bill Henderson,1998,Pushcart Press,http://images.amazon.com/images/P/1888889047.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/1888889047.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/1888889047.01.LZZZZZZZ.jpg
1030886,275970,1931868123,0,There's a Porcupine in My Outhouse: Misadventures of a Mountain Man Wannabe (Capital Discoveries Book),Mike Tougias,2002,Capital Books (VA),http://images.amazon.com/images/P/1931868123.01.THUMBZZZ.jpg,http://images.amazon.com/images/P/1931868123.01.MZZZZZZZ.jpg,http://images.amazon.com/images/P/1931868123.01.LZZZZZZZ.jpg


We can see from this dataframe that, out of around 1.1 million ratings, around 500,000 were given by just 811 of the roughly 278,000 users.
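One way to check a claim like this is to compute the fraction of rating rows that belong to the selected users. The sketch below does so on invented data; the threshold of 2 is purely illustrative, whereas the notebook uses 200.

```python
import pandas as pd

# Hypothetical ratings: user 1 is far more active than the others
toy_ratings = pd.DataFrame({
    'User-ID': [1, 1, 1, 1, 2, 2, 3, 4],
    'Book-Rating': [5, 0, 7, 9, 0, 3, 8, 0],
})

counts = toy_ratings['User-ID'].value_counts()
active_users = counts[counts > 2].index   # illustrative threshold only
# Mean of a boolean mask = share of rows contributed by the active users
share = toy_ratings['User-ID'].isin(active_users).mean()
print(f"{len(active_users)} of {counts.size} users gave {share:.0%} of the ratings")
```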

Selecting only those books which have at least 40 ratings from these users

In [35]:
y = filtered_rating.groupby('Book-Title').count()['Book-Rating']>=40
y

Book-Title
 A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)    False
 Always Have Popsicles                                                                                        False
 Apple Magic (The Collector's series)                                                                         False
 Beyond IBM: Leadership Marketing and Finance for the 1990s                                                   False
 Clifford Visita El Hospital (Clifford El Gran Perro Colorado)                                                False
                                                                                                              ...  
Ã?Â?ber das Fernsehen.                                                                                        False
Ã?Â?ber die Pflicht zum Ungehorsam gegen den Staat.                                                           False
Ã?Â?lpiraten.                                                

In [36]:
famous_books = y[y].index
famous_books

Index(['1984', '1st to Die: A Novel', '2010: Odyssey Two', '204 Rosewood Lane',
       '24 Hours', '2nd Chance', '4 Blondes', '84 Charing Cross Road',
       'A 2nd Helping of Chicken Soup for the Soul (Chicken Soup for the Soul Series (Paper))',
       'A Beautiful Mind: The Life of Mathematical Genius and Nobel Laureate John Nash',
       ...
       'Without Remorse', 'Wizard and Glass (The Dark Tower, Book 4)',
       'Women Who Run with the Wolves',
       'Word Freak: Heartbreak, Triumph, Genius, and Obsession in the World of Competitive Scrabble Players',
       'Wuthering Heights', 'Year of Wonders', 'You Belong To Me',
       'Zen and the Art of Motorcycle Maintenance: An Inquiry into Values',
       'Zoya', '\O\" Is for Outlaw"'],
      dtype='object', name='Book-Title', length=1056)

In [37]:
famous_books.shape

(1056,)

In [38]:
final_ratings = filtered_rating[filtered_rating['Book-Title'].isin(famous_books)]

Making pivot table of User ratings on Books taking Index as Book-Titles and Columns as User-ID

In [39]:
pt = final_ratings.pivot_table(index='Book-Title',columns='User-ID',values='Book-Rating')
pt

Book-Title / User-ID,254,2276,2766,2977,3363,4017,4385,6251,6323,6543,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
1984,9.0,,,,,,,,,,...,10.0,,,,,,0.0,,,
1st to Die: A Novel,,,,,,,,,,9.0,...,,,,,,,,,,
2010: Odyssey Two,,0.0,,,,,,,,,...,,,,,,,,,,
204 Rosewood Lane,,,,,,,,,,,...,,,,,,,,,,
24 Hours,,,,,,,,,2.0,,...,,0.0,,,,,,10.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,,0.0,...,,9.0,,,,,0.0,,,
You Belong To Me,,,,,,,,,0.0,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,0.0,,,...,,,,,,,0.0,,,
Zoya,,,,,,,,,,,...,,0.0,,,,,,,,


Filling Null values with zero

In [40]:
pt.fillna(0,inplace=True)

In [41]:
pt

Book-Title / User-ID,254,2276,2766,2977,3363,4017,4385,6251,6323,6543,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2010: Odyssey Two,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
204 Rosewood Lane,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24 Hours,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
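The pivot-and-fill step can be seen more easily on a toy frame. The sketch below uses invented ratings and passes `fill_value=0` to `pivot_table`, which is an alternative to calling `fillna(0)` afterwards.

```python
import pandas as pd

# Hypothetical ratings: three titles, three users
toy = pd.DataFrame({
    'Book-Title': ['A', 'A', 'B', 'C'],
    'User-ID': [1, 2, 1, 3],
    'Book-Rating': [9, 7, 4, 10],
})

# Rows are titles, columns are users, missing combinations become 0
toy_pt = toy.pivot_table(index='Book-Title', columns='User-ID',
                         values='Book-Rating', fill_value=0)
print(toy_pt)
```

`pivot_table` aggregates with the mean by default, so if a user rated the same title more than once (for example under different ISBNs), the cell holds the average of those ratings. After filling, a 0 no longer distinguishes "not rated" from an actual rating of 0.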


Computing a matrix of cosine similarity scores between each book and every other book

In [42]:
from sklearn.metrics.pairwise import cosine_similarity

In [43]:
similarity_scores = cosine_similarity(pt)

In [44]:
similarity_scores

array([[1.        , 0.10255025, 0.05065443, ..., 0.12110367, 0.07347567,
        0.04316046],
       [0.10255025, 1.        , 0.03349836, ..., 0.07446129, 0.16773875,
        0.14263397],
       [0.05065443, 0.03349836, 1.        , ..., 0.14221586, 0.        ,
        0.        ],
       ...,
       [0.12110367, 0.07446129, 0.14221586, ..., 1.        , 0.07085128,
        0.0196177 ],
       [0.07347567, 0.16773875, 0.        , ..., 0.07085128, 1.        ,
        0.10602962],
       [0.04316046, 0.14263397, 0.        , ..., 0.0196177 , 0.10602962,
        1.        ]])

In [45]:
similarity_scores.shape

(1056, 1056)

Function for recommending similar books given input book

In [46]:
def recommend(book_name):
    # Row position of the searched book in the pivot table
    # (raises IndexError if the title is not in pt.index)
    index = np.where(pt.index==book_name)[0][0]
    # Rank all books by similarity to it; position 0 is the book itself, so keep the next five
    similar_items = sorted(list(enumerate(similarity_scores[index])),key=lambda x:x[1],reverse=True)[1:6]
    
    data = []
    for i in similar_items:
        # Look up title, author and cover URL from the first matching row in books
        temp_df = books[books['Book-Title'] == pt.index[i[0]]].drop_duplicates('Book-Title')
        item = []
        item.extend(list(temp_df['Book-Title'].values))
        item.extend(list(temp_df['Book-Author'].values))
        item.extend(list(temp_df['Image-URL-M'].values))
        
        data.append(item)
    
    return data

In [47]:
recommend("The Hobbit : The Enchanting Prelude to The Lord of the Rings")

[['The Two Towers (The Lord of the Rings, Part 2)',
  'J.R.R. TOLKIEN',
  'http://images.amazon.com/images/P/0345339711.01.MZZZZZZZ.jpg'],
 ['The Fellowship of the Ring (The Lord of the Rings, Part 1)',
  'J.R.R. TOLKIEN',
  'http://images.amazon.com/images/P/0345339703.01.MZZZZZZZ.jpg'],
 ['Where the Red Fern Grows',
  'Wilson Rawls',
  'http://images.amazon.com/images/P/0553274295.01.MZZZZZZZ.jpg'],
 ['One for the Money (A Stephanie Plum Novel)',
  'Janet Evanovich',
  'http://images.amazon.com/images/P/0312990456.01.MZZZZZZZ.jpg'],
 ['The Return of the King (The Lord of the Rings, Part 3)',
  'J.R.R. TOLKIEN',
  'http://images.amazon.com/images/P/0345339738.01.MZZZZZZZ.jpg']]
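The ranking step inside `recommend` could also be written with `np.argsort`, with a guard for titles that are not in the matrix. The sketch below shows this under stated assumptions: the titles, the similarity matrix and the helper name `top_similar` are hypothetical and stand in for `pt.index` and `similarity_scores`.

```python
import numpy as np

titles = np.array(['A', 'B', 'C', 'D'])   # hypothetical titles
sim = np.array([                          # hypothetical symmetric similarity matrix
    [1.0, 0.2, 0.7, 0.1],
    [0.2, 1.0, 0.3, 0.6],
    [0.7, 0.3, 1.0, 0.4],
    [0.1, 0.6, 0.4, 1.0],
])

def top_similar(title, k=2):
    """Return the k titles most similar to `title`, or [] if the title is unknown."""
    matches = np.where(titles == title)[0]
    if matches.size == 0:
        return []
    idx = matches[0]
    order = np.argsort(sim[idx])[::-1]   # indices sorted by descending similarity
    order = order[order != idx][:k]      # drop the query book itself, keep k
    return [(str(titles[j]), float(sim[idx, j])) for j in order]

print(top_similar('A'))   # [('C', 0.7), ('B', 0.2)]
print(top_similar('Z'))   # []
```

Excluding the query book by index, rather than by skipping the first sorted entry, still works when another book happens to have a similarity of exactly 1.0 with it.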