<a href="https://colab.research.google.com/github/NinaMwangi/PiSwap_Book_Recommender/blob/master/PiSwap2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PiSwap

PiSwap is a machine-learning model that recommends books to users based on their browsing history, preferences, and past ratings. It aims to reduce the cost of education, promote environmental sustainability, and support a circular economy by discouraging over-consumption and encouraging reuse. The model is a recommender system. In this notebook, recommendations are computed from user ratings alone, by finding the books whose rating patterns are most similar to those of a given book (item-based collaborative filtering with k-nearest neighbours). In the future, the model could be incorporated into a web application and a mobile application to create an online marketplace where parents and students can buy and sell second-hand books.

## The Dataset

### Content

The Book-Crossing dataset comprises 3 files.

1. Users:
Contains the users. Note that user IDs (User-ID) have been anonymized and map to integers. Demographic data (Location, Age) is provided where available; otherwise, these fields contain NULL values.
2. Books:
Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Some content-based information is also given (Book-Title, Book-Author, Year-Of-Publication, Publisher), obtained from Amazon Web Services. Note that in the case of several authors, only the first is provided. URLs linking to cover images are also given, in three different sizes (Image-URL-S, Image-URL-M, Image-URL-L), i.e., small, medium, and large. These URLs point to the Amazon website.
3. Ratings:
Contains the book rating information. Ratings (Book-Rating) are either explicit, expressed on a scale from 1 to 10 (higher values denoting higher appreciation), or implicit, expressed by 0.
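Because a rating of 0 marks an implicit interaction rather than a low score, it can be useful to separate the two kinds of rating before modelling. A minimal sketch with pandas, assuming a few made-up rows that use the dataset's column names (the ISBN values are placeholders, not taken from the dataset):

```python
import pandas as pd

# Made-up ratings using the dataset's column names
ratings = pd.DataFrame({
    'User-ID': [1, 1, 2, 3, 3],
    'ISBN': ['isbn-1', 'isbn-2', 'isbn-1', 'isbn-3', 'isbn-2'],
    'Book-Rating': [0, 8, 5, 0, 10],
})

# Explicit ratings lie on the 1-10 scale; implicit interactions are stored as 0
explicit = ratings[ratings['Book-Rating'] > 0]
implicit = ratings[ratings['Book-Rating'] == 0]

print(len(explicit), len(implicit))  # prints: 3 2
```

The notebook below keeps both kinds together, so it is worth keeping this distinction in mind when interpreting the zeros in the rating matrix.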

The dataset is available on Kaggle: https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Data Preprocessing

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
import pickle

In [None]:
books = pd.read_csv('/content/drive/MyDrive/PiSwap Data/Books.csv', low_memory=False)
users = pd.read_csv('/content/drive/MyDrive/PiSwap Data/Users.csv')
ratings = pd.read_csv('/content/drive/MyDrive/PiSwap Data/Ratings.csv')

In [None]:
books.head()

,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [None]:
books.shape

(271360, 8)

In [None]:
# Dropping the small and medium image URLs; only the large one (Image-URL-L) is needed

books = books.drop(['Image-URL-S', 'Image-URL-M'], axis=1)

In [None]:
# renaming the column names
books.rename(columns={'Book-Title':'title',
                      'Book-Author':'author',
                      'Year-Of-Publication':'year',
                      'Publisher':'publisher',
                      'Image-URL-L':'image_url'}, inplace=True)
books.head()

,ISBN,title,author,year,publisher,image_url
0,0195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...
1,0002005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...
2,0060973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...
3,0374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...
4,0393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...


In [None]:
users.head()

,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [None]:
users.shape

(278858, 3)

In [None]:
ratings.head()

,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [None]:
ratings.shape

(1149780, 3)

In [None]:
ratings.rename(columns={'User-ID':'user_id',
                        'Book-Rating':'rating'}, inplace=True)


In [None]:
# checking for user_ids that appear more than 200 times
x = ratings['user_id'].value_counts() > 200

In [None]:
x[x].shape

(899,)

In [None]:
# Assigning those user_ids to y
y = x[x].index
y

Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352, 110973,
       235105,
       ...
       116122,  44296,  28634,  59727,  73681, 274808, 188951,   9856, 155916,
       268622],
      dtype='int64', name='user_id', length=899)

In [None]:
# Keeping only the ratings made by the users in y
ratings = ratings[ratings['user_id'].isin(y)]
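The three cells above keep only the ratings from users who have rated more than 200 books. The same pattern is sketched below on made-up data, with the threshold lowered to 2 (an assumption chosen to fit the toy frame) so the effect of the strict `>` comparison is visible: a user with exactly the threshold number of ratings is dropped.

```python
import pandas as pd

# Made-up ratings: user 10 rated 4 books, user 20 rated 2, user 30 rated 1
ratings = pd.DataFrame({
    'user_id': [10, 10, 10, 10, 20, 20, 30],
    'ISBN': ['a', 'b', 'c', 'd', 'a', 'b', 'c'],
    'rating': [5, 0, 7, 9, 3, 8, 0],
})

min_ratings = 2  # the notebook uses 200 on the full dataset

# value_counts() counts ratings per user; the comparison turns it into a boolean mask
x = ratings['user_id'].value_counts() > min_ratings
active_users = x[x].index  # user_ids whose count is strictly greater than min_ratings

filtered = ratings[ratings['user_id'].isin(active_users)]
print(filtered['user_id'].unique().tolist())  # prints: [10]
```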

In [None]:
# Merging the books dataset and the ratings dataset
ratings_plus_books = ratings.merge(books, on='ISBN')
ratings_plus_books.head()

,user_id,ISBN,rating,title,author,year,publisher,image_url
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,http://images.amazon.com/images/P/002542730X.0...
1,277427,0026217457,0,Vegetarian Times Complete Cookbook,Lucy Moll,1995,John Wiley & Sons,http://images.amazon.com/images/P/0026217457.0...
2,277427,003008685X,8,Pioneers,James Fenimore Cooper,1974,Thomson Learning,http://images.amazon.com/images/P/003008685X.0...
3,277427,0030615321,0,"Ask for May, Settle for June (A Doonesbury book)",G. B. Trudeau,1982,Henry Holt & Co,http://images.amazon.com/images/P/0030615321.0...
4,277427,0060002050,0,On a Wicked Dawn (Cynster Novels),Stephanie Laurens,2002,Avon Books,http://images.amazon.com/images/P/0060002050.0...
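`DataFrame.merge` performs an inner join by default, so any rating whose ISBN does not appear in the books table is silently dropped. A minimal sketch on made-up frames (ISBNs `'A'`, `'B'`, `'Z'` are placeholders) shows the effect, and uses `how='left'` with `indicator=True` to count the rows that would be lost:

```python
import pandas as pd

# Made-up data: ISBN 'Z' is rated but has no entry in the books table
ratings = pd.DataFrame({'user_id': [1, 1, 2], 'ISBN': ['A', 'B', 'Z'], 'rating': [5, 0, 7]})
books = pd.DataFrame({'ISBN': ['A', 'B'], 'title': ['Book A', 'Book B']})

# Default inner join: the rating for 'Z' disappears
merged = ratings.merge(books, on='ISBN')

# Left join with an indicator column to audit which ratings had no matching book
audit = ratings.merge(books, on='ISBN', how='left', indicator=True)
lost = (audit['_merge'] == 'left_only').sum()

print(len(merged))  # prints: 2
print(lost)         # prints: 1
```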


In [None]:
# Grouping the df by title and counting how many ratings each title has
num_rating = ratings_plus_books.groupby('title')['rating'].count().reset_index()
num_rating

,title,rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1
...,...,...
160264,Ã?Â?ber die Pflicht zum Ungehorsam gegen den S...,3
160265,Ã?Â?lpiraten.,1
160266,Ã?Â?rger mit Produkt X. Roman.,1
160267,Ã?Â?stlich der Berge.,1


In [None]:
num_rating.rename(columns={'rating':'num_ratings'}, inplace=True)

In [None]:
# Attaching each title's total number of ratings to every row of ratings_plus_books
final_rating = ratings_plus_books.merge(num_rating, on='title')
final_rating.head()

,user_id,ISBN,rating,title,author,year,publisher,image_url,num_ratings
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,http://images.amazon.com/images/P/002542730X.0...,82
1,277427,0026217457,0,Vegetarian Times Complete Cookbook,Lucy Moll,1995,John Wiley & Sons,http://images.amazon.com/images/P/0026217457.0...,7
2,277427,003008685X,8,Pioneers,James Fenimore Cooper,1974,Thomson Learning,http://images.amazon.com/images/P/003008685X.0...,1
3,277427,0030615321,0,"Ask for May, Settle for June (A Doonesbury book)",G. B. Trudeau,1982,Henry Holt & Co,http://images.amazon.com/images/P/0030615321.0...,1
4,277427,0060002050,0,On a Wicked Dawn (Cynster Novels),Stephanie Laurens,2002,Avon Books,http://images.amazon.com/images/P/0060002050.0...,13


In [None]:
# Keeping only the books (titles) that have at least 50 ratings
final_rating = final_rating[final_rating['num_ratings'] >= 50]
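The groupby, rename, and merge cells above compute each title's rating count and attach it to every row before filtering. As an alternative sketch (not the notebook's method), `groupby(...).transform('count')` returns a per-row count aligned with the original index, so the filter needs no intermediate table or merge. The titles and the threshold of 3 below are made up for the toy data:

```python
import pandas as pd

# Made-up merged ratings
ratings_plus_books = pd.DataFrame({
    'user_id': [1, 2, 3, 1, 2],
    'title': ['Book A', 'Book A', 'Book A', 'Book B', 'Book B'],
    'rating': [8, 0, 9, 7, 0],
})

min_count = 3  # the notebook uses 50

# transform('count') gives each row the number of non-null ratings for its title
counts = ratings_plus_books.groupby('title')['rating'].transform('count')
final_rating = ratings_plus_books[counts >= min_count].copy()

print(final_rating['title'].unique().tolist())  # prints: ['Book A']
```

Note that implicit ratings of 0 are non-null, so they count towards the threshold in both versions.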

In [None]:
final_rating.shape

(61853, 9)

In [None]:
# Dropping duplicate (user_id, title) pairs so each user has at most one rating per title
final_rating = final_rating.drop_duplicates(['user_id', 'title'])
final_rating.shape

(59850, 9)
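A likely source of these duplicates is that several ISBNs (for example, different editions) share one title, so a user can appear to rate the same title twice. The sketch below, on made-up rows with placeholder ISBNs, shows why the step matters: without it, `pivot_table` would average the two ratings because its default aggregation is the mean, while `drop_duplicates` keeps the first row of each pair, so which edition's rating survives depends on row order.

```python
import pandas as pd

# Made-up rows: user 7 rated two editions (different ISBNs) of the same title
final_rating = pd.DataFrame({
    'user_id': [7, 7, 8],
    'ISBN': ['isbn-1', 'isbn-2', 'isbn-1'],
    'title': ['Book A', 'Book A', 'Book A'],
    'rating': [9, 0, 6],
})

# Without de-duplication, pivot_table's default aggfunc ('mean') averages the pair
averaged = final_rating.pivot_table(index='title', columns='user_id', values='rating')
print(averaged.loc['Book A', 7])  # prints: 4.5

# drop_duplicates keeps the first row of each (user_id, title) pair by default
deduped = final_rating.drop_duplicates(['user_id', 'title'])
print(deduped['rating'].tolist())  # prints: [9, 6]
```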

# Training the Model

In [None]:
# Creating a pivot table
book_pivot = final_rating.pivot_table(columns='user_id', index='title', values='rating')

In [None]:
book_pivot.fillna(0, inplace=True)
book_pivot.head()

title \ user_id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
84 Charing Cross Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0
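Each row of `book_pivot` is a title and each column a user, so every title becomes a vector of ratings. A small sketch on made-up data shows the shape of the result and one consequence of `fillna(0)`: a title that a user rated 0 (implicit) becomes indistinguishable from a title that user never rated. The titles and values are placeholders.

```python
import pandas as pd

# Made-up de-duplicated ratings
final_rating = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'title': ['Book B', 'Book A', 'Book B', 'Book C'],
    'rating': [8, 0, 9, 7],
})

# Rows are titles, columns are users; unrated (title, user) pairs start as NaN
book_pivot = final_rating.pivot_table(columns='user_id', index='title', values='rating')
book_pivot = book_pivot.fillna(0)

# Share of cells that are zero after filling (explicit 0s and missing values combined)
zero_fraction = (book_pivot.values == 0).mean()

print(book_pivot.shape)  # prints: (3, 3)
```

On the real data the matrix is far larger and mostly zeros, which is why sparse representations are often used for this step.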


In [None]:
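model = NearestNeighbors(algorithm='brute')  # assumed settings: the original cell was truncated
model.fit(book_pivot.values)  # rows = titles, columns = users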

In [None]:
# Getting the indices of the 5 nearest neighbours of the title in row 333 (the title itself is normally returned first, at distance 0)
_, indices = model.kneighbors(book_pivot.iloc[333,:].values.reshape(1,-1), n_neighbors=5)
indices

array([[333, 209, 303,  68, 184]])
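Since the settings of the fitted model are not shown, the sketch below assumes scikit-learn's default metric for `NearestNeighbors` (Minkowski with p=2, i.e. Euclidean distance) and reproduces a brute-force search in NumPy on a made-up matrix. It shows why the query row comes back first, and one caveat of Euclidean distance on zero-filled ratings: a title with no ratings (an all-zero row) can rank closer than a title that is genuinely rated.

```python
import numpy as np

# Made-up title x user matrix (rows = titles)
X = np.array([
    [9.0, 0.0, 0.0, 8.0],   # title 0
    [8.0, 0.0, 0.0, 7.0],   # title 1
    [0.0, 10.0, 6.0, 0.0],  # title 2
    [0.0, 0.0, 0.0, 0.0],   # title 3 (no ratings from these users)
])

def kneighbors_euclidean(X, query, n_neighbors):
    """Brute-force k-NN: Euclidean distance from `query` to every row, smallest first."""
    distances = np.linalg.norm(X - query, axis=1)
    order = np.argsort(distances, kind='stable')[:n_neighbors]
    return distances[order], order

dist, idx = kneighbors_euclidean(X, X[0], n_neighbors=3)
print(idx)  # prints: [0 1 3]
```

Here the unrated title 3 outranks title 2. A common alternative for rating vectors is cosine distance, which `NearestNeighbors(metric='cosine', algorithm='brute')` supports; whether it improves these recommendations would need to be checked on the data.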

In [None]:
for i in range(len(indices)):
  print(book_pivot.index[indices[i]])

Index(['Mercy', 'Forever... : A Novel of Good and Evil, Love and Hope',
       'Killjoy', 'Beach House', 'Exclusive'],
      dtype='object', name='title')


In [None]:
# Extracting the book names
book_name = []
for book_id in indices:
  book_name.append(book_pivot.index[book_id])

print(book_name[0])

Index(['Mercy', 'Forever... : A Novel of Good and Evil, Love and Hope',
       'Killjoy', 'Beach House', 'Exclusive'],
      dtype='object', name='title')


In [None]:
# Finding the first row position of each recommended title in final_rating
ids_index = []
for name in book_name[0]:
  ids = np.where(final_rating['title'] == name)[0][0]
  ids_index.append(ids)

In [None]:
# Getting the urls from the final_rating df
for idx in ids_index:
  url = final_rating.iloc[idx]['image_url']
  print(url)

http://images.amazon.com/images/P/0671034022.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/067101420X.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0345453816.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/1551668998.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0446604232.01.LZZZZZZZ.jpg
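The loops above scan `final_rating` once per recommended title. A sketch of an alternative (an assumption, not the notebook's code) builds a title-to-URL lookup once and keeps the first URL per title, which matches what `np.where(...)[0][0]` selects. The example.com URLs are placeholders.

```python
import pandas as pd

# Made-up rows; one title appears twice with different cover URLs
final_rating = pd.DataFrame({
    'title': ['Book A', 'Book B', 'Book A'],
    'image_url': ['http://example.com/a.jpg', 'http://example.com/b.jpg', 'http://example.com/a-2.jpg'],
})

# drop_duplicates keeps the first occurrence of each title
title_to_url = final_rating.drop_duplicates('title').set_index('title')['image_url']

recommended = ['Book B', 'Book A']
urls = title_to_url.loc[recommended].tolist()
print(urls)  # prints: ['http://example.com/b.jpg', 'http://example.com/a.jpg']
```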


In [None]:
book_names = list(book_pivot.index)

In [None]:
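# Saving the model and supporting data so an application can load them later
artifacts = {'model.pkl': model,
             'book_names.pkl': book_names,
             'final_rating.pkl': final_rating,
             'book_pivot.pkl': book_pivot}
for path, obj in artifacts.items():
  with open(path, 'wb') as f:  # the context manager closes the file after writing
    pickle.dump(obj, f)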
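A web or mobile back end would read these files back with `pickle.load`. A minimal round-trip sketch, assuming a stand-in list of titles and a temporary directory instead of the notebook's real artifacts:

```python
import os
import pickle
import tempfile

# Stand-in for one of the saved artifacts
book_names = ['1984', '1st to Die: A Novel', '2nd Chance']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'book_names.pkl')
    # Write with a context manager so the file is flushed and closed
    with open(path, 'wb') as f:
        pickle.dump(book_names, f)
    # Later, e.g. in the application: read the artifact back
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(loaded == book_names)  # prints: True
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code, and load a pickled scikit-learn model with the same scikit-learn version that saved it.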

# Testing the Model

In [None]:
def recommend_book(book_name):
  # Row position of the requested title (raises KeyError if it is not in book_pivot)
  book_id = book_pivot.index.get_loc(book_name)
  _, indices = model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1),
                                n_neighbors=5)
  # The first neighbour is normally the requested title itself
  for title in book_pivot.index[indices[0]]:
    print(title)

In [None]:
book_name = 'Killjoy'
recommend_book(book_name)

Killjoy
Temptation
Forever... : A Novel of Good and Evil, Love and Hope
No Safe Place
Paradise
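For use in an application, a function that returns the recommendations instead of printing them, skips the query title, and handles unknown titles may be more convenient. The sketch below uses a hypothetical helper, `recommend_titles`, and, to stay self-contained, computes brute-force Euclidean distances directly as a stand-in for the fitted `model` (an assumption about its metric). The pivot table values and titles are made up.

```python
import numpy as np
import pandas as pd

def recommend_titles(book_pivot, query_title, n_recommendations=4):
    """Hypothetical helper: titles nearest to query_title, excluding the query itself.

    Uses brute-force Euclidean distance over the rows of book_pivot as a
    stand-in for the notebook's fitted NearestNeighbors model.
    """
    if query_title not in book_pivot.index:
        return []  # unknown title: nothing to recommend
    values = book_pivot.to_numpy(dtype=float)
    row = book_pivot.index.get_loc(query_title)
    distances = np.linalg.norm(values - values[row], axis=1)
    order = np.argsort(distances, kind='stable')
    return [book_pivot.index[i] for i in order if i != row][:n_recommendations]

# Made-up title x user matrix
book_pivot = pd.DataFrame(
    [[8.0, 0.0, 7.0], [7.0, 0.0, 6.0], [0.0, 9.0, 0.0], [8.0, 1.0, 7.0]],
    index=pd.Index(['Book A', 'Book B', 'Book C', 'Book D'], name='title'),
    columns=pd.Index([101, 102, 103], name='user_id'),
)

print(recommend_titles(book_pivot, 'Book A', n_recommendations=2))  # prints: ['Book D', 'Book B']
print(recommend_titles(book_pivot, 'Unknown'))                       # prints: []
```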
