# Book Recommender System

A book recommendation system is a type of recommender system that suggests books to readers based on their interests. Such systems are used by online platforms that offer e-books or book catalogues, such as Google Play Books, Open Library and Goodreads.

### About the Dataset
We will use a dataset from [Kaggle](https://www.kaggle.com/), a web-based data-science platform where users can find and publish datasets, explore and build models, collaborate with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

The [Book Recommendation Dataset](https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset) consists of three files:

1. Books.csv: details about each book, such as the ISBN, Book Title, Book Author, Year of Publication and Publisher, plus links to cover images in three sizes.
2. Users.csv: details about each user, namely their User ID, Location and Age.
3. Ratings.csv: the rating each user gave to a particular book.

In addition, we will do some web scraping to collect extra data.

### Importing the Required Libraries

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import requests as rq
from bs4 import BeautifulSoup as bs

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors
import pickle

## Loading the CSV Files into Pandas DataFrames


In [2]:
books_df = pd.read_csv("Books.csv")
users_df = pd.read_csv("Users.csv")
ratings_df = pd.read_csv("Ratings.csv")



In [3]:
books_df.head(3)

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...


In [4]:
users_df.head(3)

,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",


In [5]:
ratings_df.head(3)

,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0


In [6]:
print("Books Dataset : {} \nUsers Dataset : {} \nRatings Dataset : {}".format(books_df.shape, users_df.shape, ratings_df.shape))

Books Dataset : (271360, 8) 
Users Dataset : (278858, 3) 
Ratings Dataset : (1149780, 3)


## Data Cleaning

In [7]:
books_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 8 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 271360 non-null  object
 1   Book-Title           271360 non-null  object
 2   Book-Author          271359 non-null  object
 3   Year-Of-Publication  271360 non-null  object
 4   Publisher            271358 non-null  object
 5   Image-URL-S          271360 non-null  object
 6   Image-URL-M          271360 non-null  object
 7   Image-URL-L          271357 non-null  object
dtypes: object(8)
memory usage: 16.6+ MB


Some columns have missing values, so let's find out which columns they are and how many values are missing.

In [8]:
print(books_df.isnull().sum())

ISBN                   0
Book-Title             0
Book-Author            1
Year-Of-Publication    0
Publisher              2
Image-URL-S            0
Image-URL-M            0
Image-URL-L            3
dtype: int64


Next, let's inspect the rows where the author and the publisher are missing, starting with the book that has no author.

In [9]:
books_df[books_df['Book-Author'].isnull()]

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
187689,9627982032,The Credit Suisse Guide to Managing Your Perso...,,1995,Edinburgh Financial Publishing,http://images.amazon.com/images/P/9627982032.0...,http://images.amazon.com/images/P/9627982032.0...,http://images.amazon.com/images/P/9627982032.0...


In [10]:
books_df.describe(include="all")

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
count,271360,271360,271359,271360,271358,271360,271360,271357
unique,271360,242135,102023,202,16807,271044,271044,271041
top,0195153448,Selected Poems,Agatha Christie,2002,Harlequin,http://images.amazon.com/images/P/185326119X.0...,http://images.amazon.com/images/P/185326119X.0...,http://images.amazon.com/images/P/225307649X.0...
freq,1,27,632,13903,7535,2,2,2
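The cells above only locate the missing values. One way to handle them, sketched here under assumptions of our own: fill the missing author and publisher with a placeholder string (`"Unknown"` is an arbitrary choice) and convert `Year-Of-Publication`, which `info()` reports as text (`object`), to numbers, turning any non-numeric entry into `NaN`. The helper `clean_books` is hypothetical, not part of the notebook.

```python
import pandas as pd

def clean_books(books: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing text fields and make the year numeric."""
    books = books.copy()  # leave the caller's DataFrame untouched
    # Placeholder label for missing author/publisher (an assumption, not from the dataset)
    text_cols = ["Book-Author", "Publisher"]
    books[text_cols] = books[text_cols].fillna("Unknown")
    # Year-Of-Publication is stored as text; non-numeric entries become NaN
    books["Year-Of-Publication"] = pd.to_numeric(books["Year-Of-Publication"], errors="coerce")
    return books
```

Applying it is a single line, `books_df = clean_books(books_df)`, run before the merges that follow.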


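## Popularity Based Recommender System

A popularity-based recommender suggests the same books to every reader: it ranks titles by their average rating, considering only titles with at least 250 ratings so that a few enthusiastic ratings cannot push an obscure book to the top.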

In [11]:
ratings_books_df = ratings_df.merge(books_df, on='ISBN')
ratings_books_df.shape

(1031136, 10)
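The inner merge keeps only ratings whose ISBN also appears in Books.csv, which is why the row count drops from 1,149,780 to 1,031,136. A minimal sketch for inspecting the dropped ratings with `merge(..., indicator=True)`; the helper name `unmatched_ratings` is hypothetical. Because every ISBN in Books.csv is unique (count and unique are both 271,360 in the `describe` output above), `len(unmatched_ratings(ratings_df, books_df))` should equal the difference, 118,644.

```python
import pandas as pd

def unmatched_ratings(ratings: pd.DataFrame, books: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return the ratings whose ISBN has no row in the books table."""
    # indicator=True adds a "_merge" column with "both", "left_only" or "right_only"
    merged = ratings.merge(books[["ISBN"]].drop_duplicates(), on="ISBN", how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```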

In [12]:
ratings_books_df.head(3)

,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
1,2313,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
2,6543,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...


In [13]:
ratings_books_df.groupby(by='Book-Title').count().head(3)

Book-Title,User-ID,ISBN,Book-Rating,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
"A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)",4,4,4,4,4,4,4,4,4
Always Have Popsicles,1,1,1,1,1,1,1,1,1
Apple Magic (The Collector's series),1,1,1,1,1,1,1,1,1


In [14]:
num_ratings_df = ratings_books_df.groupby(by='Book-Title').count()[['User-ID']]
num_ratings_df.head(3)

Book-Title,User-ID
"A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)",4
Always Have Popsicles,1
Apple Magic (The Collector's series),1


In [15]:
num_ratings_df.rename(columns={"User-ID" : "Count-Ratings"}, inplace=True)
num_ratings_df.head(3)

Book-Title,Count-Ratings
"A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)",4
Always Have Popsicles,1
Apple Magic (The Collector's series),1


In [16]:
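# Select the rating column before averaging: in pandas 2.x, mean() over the text columns raises a TypeError
avg_ratings_df = ratings_books_df.groupby(by="Book-Title")[["Book-Rating"]].mean().round(2)
avg_ratings_df.rename(columns={"Book-Rating" : "Average-Rating"}, inplace=True)
avg_ratings_df.head(3)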

Book-Title,Average-Rating
"A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)",2.25
Always Have Popsicles,0.0
Apple Magic (The Collector's series),0.0


In [17]:
avg_num_df = avg_ratings_df.merge(num_ratings_df, on="Book-Title")
avg_num_df.head(3)

Book-Title,Average-Rating,Count-Ratings
"A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)",2.25,4
Always Have Popsicles,0.0,1
Apple Magic (The Collector's series),0.0,1


In [18]:
avg_num_df[avg_num_df["Count-Ratings"] >= 250].head(3)

Book-Title,Average-Rating,Count-Ratings
1984,4.45,284
1st to Die: A Novel,3.58,509
2nd Chance,3.27,356


In [19]:
# Filter and sort in one chain so we get a new DataFrame instead of editing a slice in place
avg_num_250_df = avg_num_df[avg_num_df["Count-Ratings"] >= 250].sort_values("Average-Rating", ascending=False)
avg_num_250_df.head(3)


Book-Title,Average-Rating,Count-Ratings
Harry Potter and the Prisoner of Azkaban (Book 3),5.85,428
Harry Potter and the Goblet of Fire (Book 4),5.82,387
Harry Potter and the Sorcerer's Stone (Book 1),5.74,278


In [20]:
avg_book_df = avg_num_250_df.merge(books_df, on='Book-Title').reset_index(drop=True)
avg_book_df.head(3)

,Book-Title,Average-Rating,Count-Ratings,ISBN,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,Harry Potter and the Prisoner of Azkaban (Book 3),5.85,428,0439136350,J. K. Rowling,1999,Scholastic,http://images.amazon.com/images/P/0439136350.0...,http://images.amazon.com/images/P/0439136350.0...,http://images.amazon.com/images/P/0439136350.0...
1,Harry Potter and the Prisoner of Azkaban (Book 3),5.85,428,0439136369,J. K. Rowling,2001,Scholastic,http://images.amazon.com/images/P/0439136369.0...,http://images.amazon.com/images/P/0439136369.0...,http://images.amazon.com/images/P/0439136369.0...
2,Harry Potter and the Prisoner of Azkaban (Book 3),5.85,428,0786222743,J. K. Rowling,2000,Thorndike Press,http://images.amazon.com/images/P/0786222743.0...,http://images.amazon.com/images/P/0786222743.0...,http://images.amazon.com/images/P/0786222743.0...


In [21]:
popular_df = avg_book_df.drop_duplicates("Book-Title").reset_index(drop=True)
popular_df.head(3)

# popular_df.to_csv("Temp1.csv")

,Book-Title,Average-Rating,Count-Ratings,ISBN,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,Harry Potter and the Prisoner of Azkaban (Book 3),5.85,428,0439136350,J. K. Rowling,1999,Scholastic,http://images.amazon.com/images/P/0439136350.0...,http://images.amazon.com/images/P/0439136350.0...,http://images.amazon.com/images/P/0439136350.0...
1,Harry Potter and the Goblet of Fire (Book 4),5.82,387,0439139597,J. K. Rowling,2000,Scholastic,http://images.amazon.com/images/P/0439139597.0...,http://images.amazon.com/images/P/0439139597.0...,http://images.amazon.com/images/P/0439139597.0...
2,Harry Potter and the Sorcerer's Stone (Book 1),5.74,278,0590353403,J. K. Rowling,1998,Scholastic,http://images.amazon.com/images/P/0590353403.0...,http://images.amazon.com/images/P/0590353403.0...,http://images.amazon.com/images/P/0590353403.0...
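The popularity steps above (count, average, keep titles with at least 250 ratings, sort) can be condensed into one `groupby(...).agg(...)` call. The sketch below is a hypothetical helper, `popular_books`, not the notebook's code. It adds an optional `explicit_only` switch: according to the dataset's Kaggle description, a rating of 0 marks implicit feedback rather than a score, so averaging the zeros in likely explains why even the top titles stay below 6 on the 0 to 10 scale. Whether to drop them is a design choice the notebook does not make.

```python
import pandas as pd

def popular_books(ratings_books: pd.DataFrame, min_ratings: int = 250,
                  top_n: int = 50, explicit_only: bool = False) -> pd.DataFrame:
    """Hypothetical helper: rank titles with at least `min_ratings` ratings by mean rating."""
    df = ratings_books
    if explicit_only:
        # Assumption: treat 0 as "no explicit score" and exclude it
        df = df[df["Book-Rating"] > 0]
    stats = (
        df.groupby("Book-Title")["Book-Rating"]
          .agg(**{"Average-Rating": "mean", "Count-Ratings": "count"})
          .round({"Average-Rating": 2})
    )
    stats = stats[stats["Count-Ratings"] >= min_ratings]
    return stats.sort_values("Average-Rating", ascending=False).head(top_n).reset_index()
```

`popular_books(ratings_books_df)` should give the same ranking as `avg_num_250_df` (up to the order of ties); merging its result with `books_df` and dropping duplicate titles, as above, adds the author and cover-image columns.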



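## Collaborative Filtering Based Recommender System

Collaborative filtering recommends books from the rating patterns of many users. To keep the rating matrix dense enough to be useful, we keep only users who have rated at least 100 books and then, among those users' ratings, only books with at least 50 ratings.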

In [23]:
ratings_books_df.head(3)

,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
1,2313,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
2,6543,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...


In [24]:
temp = ratings_books_df.groupby("User-ID").count()[["ISBN"]] >= 100
temp.head(3)

User-ID,ISBN
2,False
8,False
9,False


In [25]:
desired_users_df = temp[temp.ISBN]
desired_users_df.head(3)

User-ID,ISBN
254,True
507,True
882,True


In [26]:
filtered_users_rating_df = ratings_books_df[ratings_books_df["User-ID"].isin(desired_users_df.index)]
filtered_users_rating_df.head(3)

,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
2,6543,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
4,10314,034545104X,9,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...
5,23768,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...


In [27]:
len(filtered_users_rating_df['Book-Title'].unique())

178569

In [28]:
temp = filtered_users_rating_df["Book-Title"].value_counts() >= 50
temp

Wild Animus                                           True
The Lovely Bones: A Novel                             True
Bridget Jones's Diary                                 True
The Da Vinci Code                                     True
The Nanny Diaries: A Novel                            True
                                                     ...  
Master of the Senate: The Years of Lyndon Johnson    False
Fresh Faith                                          False
Living in the Open: [Poems]                          False
Robin Hood (Wordsworth Collection)                   False
City of Masks : A Cree Black Novel                   False
Name: Book-Title, Length: 178569, dtype: bool

In [29]:
desired_books_df = temp[temp]
desired_books_df

Wild Animus                                                 True
The Lovely Bones: A Novel                                   True
Bridget Jones's Diary                                       True
The Da Vinci Code                                           True
The Nanny Diaries: A Novel                                  True
                                                            ... 
The Green Mile: Coffey on the Mile (Green Mile Series)      True
Rush Limbaugh Is a Big Fat Idiot: And Other Observations    True
Clear and Present Danger                                    True
Brazen Virtue                                               True
Roll of Thunder, Hear My Cry                                True
Name: Book-Title, Length: 1072, dtype: bool

In [30]:
final_rating_df = filtered_users_rating_df[filtered_users_rating_df["Book-Title"].isin(desired_books_df.index)]
final_rating_df.head(3)

,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
63,278418,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
65,3363,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
66,7158,0446520802,10,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...
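The two filtering steps above (users with at least 100 ratings, then books with at least 50 ratings from those users) can also be written with `value_counts`, which counts rows per value directly. A minimal sketch; the function name `filter_active` and its parameter names are hypothetical. With the default thresholds it should return the same rows as `final_rating_df`.

```python
import pandas as pd

def filter_active(ratings_books: pd.DataFrame, min_user_ratings: int = 100,
                  min_book_ratings: int = 50) -> pd.DataFrame:
    """Hypothetical helper: keep active users, then well-rated books among those users."""
    user_counts = ratings_books["User-ID"].value_counts()
    active_users = user_counts[user_counts >= min_user_ratings].index
    df = ratings_books[ratings_books["User-ID"].isin(active_users)]

    # Count book ratings only among the users kept above, as the notebook does
    book_counts = df["Book-Title"].value_counts()
    popular_titles = book_counts[book_counts >= min_book_ratings].index
    return df[df["Book-Title"].isin(popular_titles)]
```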


In [31]:
pt_item_based = final_rating_df.pivot_table(index="Book-Title", columns="User-ID", values="Book-Rating")
pt_item_based

# pt_item_based.index.to_frame().to_csv("Temp2.csv")

Book-Title \ User-ID,254,507,882,1424,1435,1733,1903,2033,2110,2276,...,275020,275970,276463,276680,277427,277478,277639,278137,278188,278418
1984,9.0,,,,,,,,,,...,,0.0,,,,,,,,
1st to Die: A Novel,,0.0,,,,,,,,,...,,,,,,,,,,
2010: Odyssey Two,,,,,,,,,,0.0,...,,,,,,,,,,
204 Rosewood Lane,,,,,,,,,,,...,,,,,,,,,0.0,
24 Hours,,,,,,,,,,,...,,,,,10.0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,,,...,,0.0,,,,,,,,
You Belong To Me,,,,,,,,,,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,,,,,,,...,,0.0,,,,,,,,
Zoya,,,,,,,,,,,...,,,,,,,,,,


In [32]:
pt_item_based.fillna(0, inplace=True)
pt_item_based

Book-Title \ User-ID,254,507,882,1424,1435,1733,1903,2033,2110,2276,...,275020,275970,276463,276680,277427,277478,277639,278137,278188,278418
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2010: Odyssey Two,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
204 Rosewood Lane,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24 Hours,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [33]:
cosine_similarity(pt_item_based)

array([[1.        , 0.06550754, 0.09317747, ..., 0.07547772, 0.05810769,
        0.03029366],
       [0.06550754, 1.        , 0.02247675, ..., 0.04739871, 0.13548709,
        0.12388301],
       [0.09317747, 0.02247675, 1.        , ..., 0.09310325, 0.        ,
        0.        ],
       ...,
       [0.07547772, 0.04739871, 0.09310325, ..., 1.        , 0.05583662,
        0.0137213 ],
       [0.05810769, 0.13548709, 0.        , ..., 0.05583662, 1.        ,
        0.09410287],
       [0.03029366, 0.12388301, 0.        , ..., 0.0137213 , 0.09410287,
        1.        ]])

In [34]:
cosine_similarity(pt_item_based).shape

(1072, 1072)
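Cosine similarity compares two rating vectors by the angle between them: cos(a, b) = (a · b) / (|a| |b|). The toy example below (three books by four users, values invented for illustration) computes it by hand with NumPy and checks it against scikit-learn's `cosine_similarity`. After `fillna(0)`, a missing rating and an explicit 0 look identical, and zeros add nothing to the dot product, so two books count as similar mainly when the same users gave both non-zero ratings.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy book-by-user matrix: 3 books (rows) x 4 users (columns), 0 = no rating
ratings = np.array([
    [9.0, 0.0, 7.0, 0.0],
    [8.0, 0.0, 6.0, 1.0],
    [0.0, 5.0, 0.0, 0.0],
])

def cosine_matrix(m: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows: dot products of unit-length rows."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / norms  # assumes no all-zero rows
    return unit @ unit.T

manual = cosine_matrix(ratings)
print(np.round(manual, 3))
```

Books 0 and 1 were rated by the same users and come out almost parallel, while book 2 shares no raters with them and gets a similarity of 0.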

In [35]:
similarity_scores_item_based = cosine_similarity(pt_item_based)

In [36]:
def item_based_recommend(book_title, n=5):
    # Case-insensitive lookup of the title's row in the item-based pivot table
    matches = [i for i, x in enumerate(pt_item_based.index) if x.lower() == book_title.lower()]
    if not matches:
        # Unknown title: return nothing instead of silently using row 0
        return []
    idx = matches[0]

    # Rank every book by similarity to the query; position 0 is the query itself
    similar_items = sorted(enumerate(similarity_scores_item_based[idx]), key=lambda x: x[1], reverse=True)[1:n + 1]

    data = []
    for i, _score in similar_items:
        # A title can have several ISBNs (editions); keep the first matching row
        temp_df = books_df[books_df["Book-Title"] == pt_item_based.index[i]].drop_duplicates('Book-Title')
        data.append(list(temp_df[['Book-Title', 'Book-Author', 'Image-URL-M']].values[0]))

    return data

In [37]:
item_based_recommend("The FIrm")

[['The Pelican Brief',
  'John Grisham',
  'http://images.amazon.com/images/P/0440214041.01.MZZZZZZZ.jpg'],
 ['The Chamber',
  'John Grisham',
  'http://images.amazon.com/images/P/0385424728.01.MZZZZZZZ.jpg'],
 ['The Rainmaker',
  'JOHN GRISHAM',
  'http://images.amazon.com/images/P/044022165X.01.MZZZZZZZ.jpg'],
 ['The Client',
  'John Grisham',
  'http://images.amazon.com/images/P/038542471X.01.MZZZZZZZ.jpg'],
 ['A Time to Kill',
  'JOHN GRISHAM',
  'http://images.amazon.com/images/P/0440211727.01.MZZZZZZZ.jpg']]

In [38]:
# pickle.dump(popular_df, open('popular.pkl', 'wb'))
# Save each object to <name>.pkl; `with` closes the file after writing
for name, obj in [("pt_item_based", pt_item_based),
                  ("similarity_scores_item_based", similarity_scores_item_based),
                  ("books_df", books_df)]:
    with open(f"{name}.pkl", "wb") as f:
        pickle.dump(obj, f)
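The pickled objects can be loaded back in another process, for example a separate app that serves recommendations. A minimal sketch with a hypothetical helper, `load_artifacts`, assuming the three `.pkl` files sit in one directory. `pickle.load` can execute arbitrary code, so only load files you created yourself.

```python
import os
import pickle

def load_artifacts(directory: str = ".") -> dict:
    """Hypothetical helper: load the pickled recommender objects by name."""
    names = ["pt_item_based", "similarity_scores_item_based", "books_df"]
    artifacts = {}
    for name in names:
        # The context manager closes each file handle after reading
        with open(os.path.join(directory, f"{name}.pkl"), "rb") as f:
            artifacts[name] = pickle.load(f)
    return artifacts
```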

In [39]:
# Assign an integer code to each title, in order of first appearance
book_names = list(final_rating_df['Book-Title'].unique())
book_code_df = pd.DataFrame({"Book-Title": book_names, "Book-Codes": range(len(book_names))})

book_code_df.head(10)

,Book-Title,Book-Codes
0,The Notebook,0
1,A Painted House,1
2,Lightning,2
3,Manhattan Hunt Club,3
4,Dark Paradise,4
5,Night Sins,5
6,How Stella Got Her Groove Back,6
7,Waiting to Exhale,7
8,The Girl Who Loved Tom Gordon : A Novel,8
9,The Pillars of the Earth,9
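`pd.factorize` numbers values in order of first appearance, which is how the codes above are assigned, so it can attach a code to every rating in one step without the separate merge. A minimal sketch; `add_book_codes` is a hypothetical helper. The rows keep the order of `final_rating_df` instead of being grouped by code, which does not change a pivot table built from them.

```python
import pandas as pd

def add_book_codes(final_rating: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace titles with integer codes by first appearance."""
    # factorize returns (codes per row, unique titles); sort=False keeps appearance order
    codes, _titles = pd.factorize(final_rating["Book-Title"])
    out = final_rating.assign(**{"Book-Codes": codes})
    return out[["User-ID", "Book-Codes", "Book-Rating"]].reset_index(drop=True)
```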


In [40]:
final_code_rating_df = book_code_df.merge(final_rating_df, on='Book-Title').reset_index(drop=True)
final_code_rating_df = final_code_rating_df[['User-ID', 'Book-Codes', 'Book-Rating']]
final_code_rating_df

,User-ID,Book-Codes,Book-Rating
0,278418,0,0
1,3363,0,0
2,7158,0,10
3,8253,0,10
4,11676,0,10
...,...,...,...
97867,194669,1071,0
97868,229011,1071,8
97869,245645,1071,10
97870,259629,1071,0


In [41]:
final_rating_df.shape

(97872, 10)

In [42]:
pt_user_based = pd.pivot_table(final_code_rating_df, index='User-ID', columns='Book-Codes', values='Book-Rating')
pt_user_based

User-ID \ Book-Codes,0,1,2,3,4,5,6,7,8,9,...,1062,1063,1064,1065,1066,1067,1068,1069,1070,1071
254,0.0,,,,,,,,,,...,,,,,,,,,,
507,,,,,,,,,,,...,,,,,,,,,,
882,0.0,,,,,,0.0,,,,...,,,,,,,,,,
1424,,,,,,,,,,,...,,,,,,,,,,
1435,,0.0,,,,,,0.0,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
277478,,,,,,,,,,,...,,,,,,,,,,
277639,,,,,,,,,,,...,,,,,,,,,,
278137,,,,,,,,,,,...,,,,,,,,,,
278188,,,,,,,,,,,...,,,,,,,,,,


In [43]:
pt_user_based.fillna(0, inplace=True)
pt_user_based = pt_user_based.astype(np.int32)

pt_user_based

User-ID \ Book-Codes,0,1,2,3,4,5,6,7,8,9,...,1062,1063,1064,1065,1066,1067,1068,1069,1070,1071
254,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
507,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
882,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1424,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1435,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
277478,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
277639,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
278137,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
278188,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [44]:
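def user_based_recommend(user_id, metric, k):
    # Fit a brute-force k-NN model on the user-by-book rating matrix
    model_knn = NearestNeighbors(metric=metric, algorithm='brute')
    model_knn.fit(pt_user_based)

    # Raises KeyError if user_id is not a row of pt_user_based
    loc = pt_user_based.index.get_loc(user_id)
    # Ask for k + 1 neighbours because the closest match is the user itself
    distances, indices = model_knn.kneighbors(pt_user_based.iloc[loc, :].values.reshape(1, -1), n_neighbors=k + 1)

    # For the cosine metric, similarity = 1 - distance
    similarities = 1 - distances.flatten()
    return similarities, indices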
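`user_based_recommend` returns the nearest neighbours of a user, not books. The sketch below takes one more step under assumptions of our own: it scores each book the user has not rated (a 0 in `pt_user_based`) by the similarity-weighted sum of the neighbours' ratings and returns the best book codes. Weighted summing is one common choice among several, and the function name `recommend_for_user` is hypothetical. It returns an empty list for user IDs that are not in the matrix instead of raising a `KeyError`.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def recommend_for_user(pt: pd.DataFrame, user_id, k: int = 10, n: int = 5) -> list:
    """Hypothetical helper: suggest unrated book codes from the k most similar users."""
    if user_id not in pt.index:
        return []  # e.g. users removed by the rating-count filters
    model = NearestNeighbors(metric="cosine", algorithm="brute").fit(pt.values)
    n_neighbors = min(k + 1, len(pt))
    distances, positions = model.kneighbors(pt.loc[[user_id]].values, n_neighbors=n_neighbors)

    # Drop the user itself and turn cosine distance into similarity
    mask = pt.index[positions[0]] != user_id
    sims = 1 - distances[0][mask]
    neighbours = pt.iloc[positions[0][mask]]

    # Similarity-weighted sum of the neighbours' ratings for every book
    scores = sims @ neighbours.values
    # Only recommend books the user has not rated yet
    scores = np.where(pt.loc[user_id].values == 0, scores, -np.inf)
    order = np.argsort(-scores)[:n]
    return [pt.columns[j] for j in order if np.isfinite(scores[j]) and scores[j] > 0]
```

The returned codes map back to titles through `book_code_df`, for example `book_code_df.set_index('Book-Codes').loc[codes, 'Book-Title']`.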

In [45]:
s, i = user_based_recommend(278843, 'cosine', 10)
i

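KeyError: 278843

User 278843 is not a row of `pt_user_based`, which only contains the users kept by the filtering steps above (at least 100 ratings), so `get_loc` raises a `KeyError`. Passing an ID taken from `pt_user_based.index`, such as 254, avoids the error.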

In [None]:
# `i` holds row positions in pt_user_based; map them back to User-IDs
for pos in i[0]:
    print(pos, pt_user_based.index[pos])

In [None]:
pt_user_based.iloc[i[0]]

In [None]:
type(final_code_rating_df['User-ID'].unique())

In [None]:
type(list(final_code_rating_df['User-ID'].values))
