# BOOK RECOMMENDATION SYSTEM

## What is a Recommendation System and Why is it important ?

![Image](books.jpg)



A recommendation system is a software algorithm or system that suggests items, products, or content to users based on their preferences, behavior, or similarities to other users. These systems are designed to provide personalized and relevant recommendations, aiming to enhance user experience, engagement, and satisfaction.

Recommendation systems are crucial in various domains, including e-commerce, entertainment, social media, and content streaming platforms

We are going to build a Book recommendation system which can recommend us a book to read based on what are other read and also what books were rated the highest

We are going to use two types of methods for our Recommendation System namely Collaborative and Content Based Filtering , on this particular we will be using Content based filtering

## Collaborative Filtering

Collaborative Filtering tends to find what similar users would like and the recommendations to be provided and in order to classify the users into clusters of similar types and recommend each user according to the preference of its cluster. The main idea that governs the collaborative methods is that through past user-item interactions when processed through the system, it becomes sufficient to detect similar users or similar items to make predictions based on these estimated facts and insights

Import Packages

In [1]:
import pandas as pd
import numpy as np
import re
import matplotlib as plt
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import coo_matrix

In [2]:
liked_books = pd.read_csv(r"C:\Users\PC\OneDrive\Documents\Movie Recommendation System\liked_books.csv")

### Data Engineering 

In [3]:
liked_books['book_id'] = liked_books['book_id'].astype(str)

In [4]:
csv_book_mapping = {}
with open(r"C:\Users\PC\OneDrive\Documents\Movie Recommendation System\book_id_map.csv") as f:
    while True:
        line = f.readline()
        if not line:
            break
        csv_id , book_id = line.strip().split(",")
        csv_book_mapping[csv_id] = book_id

In [5]:
book_set = set(liked_books['book_id'])

In [6]:
overlap_users = {}
with open(r"C:\Users\PC\OneDrive\Documents\Movie Recommendation System\goodreads_interactions.csv") as f:
    
    while True:
        line = f.readline()
        if not line:
            break
        user_id , csv_id , _, rating, _ = line.strip().split(",")
        
        book_id = csv_book_mapping.get(csv_id)
        if book_id in book_set:
            if user_id not in overlap_users:
                overlap_users[user_id] = 1
            else:
                overlap_users[user_id] += 1

In [7]:
len(overlap_users)

316341

We have to find users who have read the same books as ours and cut down users who do not share the same book as ours and this will also make computation for model building less intensive

In [8]:
filtered_overlap_users = set([k for k in overlap_users if overlap_users[k] > liked_books.shape[0]/5])

Below we create a list 'interactions_list' which grabs 'user_id' , 'book_id' and ratings for all the users

In [9]:
interactions_list = []
with open(r"C:\Users\PC\OneDrive\Documents\Movie Recommendation System\goodreads_interactions.csv") as f:
    
    while True:
        line = f.readline()
        if not line:
            break
        user_id , csv_id , _, rating, _ = line.strip().split(",")
        if user_id in filtered_overlap_users:
            book_id = csv_book_mapping[csv_id]
            interactions_list.append([user_id,book_id,rating])
            


### Creating a user/book collaborative sparse matrix

In [10]:
interactions = pd.DataFrame(interactions_list, columns=['user_id','book_id','rating'])

Add our ratings into this matrix

In [11]:
interactions.head()

Unnamed: 0,user_id,book_id,rating
0,282,627206,4
1,282,960,4
2,282,15931,4
3,282,24178,3
4,282,6310,4


In [12]:
interactions = pd.concat([liked_books[['user_id','book_id','rating']], interactions])

In [13]:
interactions

Unnamed: 0,user_id,book_id,rating
0,-1,2517439,5
1,-1,113576,5
2,-1,35100,5
3,-1,228221,5
4,-1,17662739,5
...,...,...,...
5638696,804100,475178,0
5638697,804100,186074,0
5638698,804100,153008,0
5638699,804100,45107,0


### Data Engineering 

convert each column to appropriate data type

In [14]:
interactions['book_id'] = interactions['book_id'].astype(str)
interactions['user_id'] = interactions['user_id'].astype(str)
interactions['rating'] = pd.to_numeric(interactions['rating'])

In [15]:
interactions['user_index'] = interactions['user_id'].astype('category').cat.codes

In [16]:
interactions['book_index'] = interactions['book_id'].astype('category').cat.codes

In [17]:
interactions

Unnamed: 0,user_id,book_id,rating,user_index,book_index
0,-1,2517439,5,0,414880
1,-1,113576,5,0,38971
2,-1,35100,5,0,575858
3,-1,228221,5,0,356004
4,-1,17662739,5,0,214285
...,...,...,...,...,...
5638696,804100,475178,0,1183,617107
5638697,804100,186074,0,1183,258768
5638698,804100,153008,0,1183,141428
5638699,804100,45107,0,1183,611284


In [18]:
ratings_mat_coo = coo_matrix((interactions['rating'] ,(interactions['user_index'],interactions['book_index'])))

In [19]:
ratings_mat = ratings_mat_coo.tocsr()

In [20]:
my_index = 0

Find users who have similar book taste to us

In [21]:
similarity = cosine_similarity(ratings_mat[my_index,:],ratings_mat).flatten()

In [22]:
similarity[2]

0.06143442518998915

Find indices of users who are similar to us and find their positions 

In [23]:
indices = np.argpartition(similarity,-15)[-15:]

In [24]:
indices

array([1188,  942,  218,  129,  496,  435, 1208,  795, 1213, 1210, 1143,
        321,  294,  862,    0], dtype=int64)

Find id of users who are similar to us based on their indices 

In [25]:
similar_users = interactions[interactions['user_index'].isin(indices)].copy()

In [26]:
similar_users

Unnamed: 0,user_id,book_id,rating,user_index,book_index
0,-1,2517439,5,0,414880
1,-1,113576,5,0,38971
2,-1,35100,5,0,575858
3,-1,228221,5,0,356004
4,-1,17662739,5,0,214285
...,...,...,...,...,...
5638521,712588,32388712,3,1143,543119
5638522,712588,16322,5,1143,183365
5638523,712588,860543,0,1143,759827
5638524,712588,853510,5,1143,756768


#### Creating Book Recommendations

In [27]:
book_recs = similar_users.groupby('book_id').rating.agg(['count','mean'])

In [28]:
book_recs

Unnamed: 0_level_0,count,mean
book_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1,6,3.833333
100322,1,0.000000
100365,1,0.000000
10046142,1,0.000000
1005,3,0.000000
...,...,...
99561,2,2.500000
99610,1,3.000000
99664,1,4.000000
9969571,3,2.333333


import book titles to see the names of the books recommended as above we only just had the book id

In [29]:
book_titles = pd.read_json('book_titles.json')

In [30]:
#convert book id to string
book_titles['book_id'] = book_titles['book_id'].astype(str)

In [31]:
#now we merge the book titles with recommended books to display each book's title
book_recs = book_recs.merge(book_titles,how='inner',on='book_id')

In [32]:
book_recs['adjusted-count'] = book_recs['count'] * (book_recs['count']/book_recs['ratings'])

In [33]:
book_recs['score'] = book_recs['mean'] * book_recs['adjusted-count']

In [34]:
book_recs = book_recs[~book_recs['book_id'].isin(liked_books['book_id'])]

In [35]:
book_recs = book_recs[book_recs['mean'] > 3]

In [36]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,title_clean,adjusted-count,score
0,1,6,3.833333,Harry Potter and the Half-Blood Prince (Harry ...,1713866,https://www.goodreads.com/book/show/1.Harry_Po...,https://images.gr-assets.com/books/1361039191m...,harry potter and the halfblood prince harry po...,0.000021,0.000081
8,10079321,1,5.000000,"The Magician King (The Magicians, #2)",53532,https://www.goodreads.com/book/show/10079321-t...,https://images.gr-assets.com/books/1316177353m...,the magician king the magicians 2,0.000019,0.000093
11,100915,6,3.666667,"The Lion, the Witch, and the Wardrobe (Chronic...",1575387,https://www.goodreads.com/book/show/100915.The...,https://images.gr-assets.com/books/1353029077m...,the lion the witch and the wardrobe chronicles...,0.000023,0.000084
12,10098912,2,3.500000,Chanakya's Chant,15807,https://www.goodreads.com/book/show/10098912-c...,https://images.gr-assets.com/books/1327939570m...,chanakyas chant,0.000253,0.000886
13,1009996,1,5.000000,Artisan Bread in Five Minutes a Day: The Disco...,14456,https://www.goodreads.com/book/show/1009996.Ar...,https://images.gr-assets.com/books/1317064461m...,artisan bread in five minutes a day the discov...,0.000069,0.000346
...,...,...,...,...,...,...,...,...,...,...
2814,960,9,3.444444,"Angels & Demons (Robert Langdon, #1)",2046499,https://www.goodreads.com/book/show/960.Angels...,https://images.gr-assets.com/books/1303390735m...,angels demons robert langdon 1,0.000040,0.000136
2815,961520,1,4.000000,Yuganta: The End of an Epoch,1694,https://www.goodreads.com/book/show/961520.Yug...,https://images.gr-assets.com/books/1442640715m...,yuganta the end of an epoch,0.000590,0.002361
2840,9832370,1,5.000000,BookRags Summary: A Storm of Swords,19243,https://www.goodreads.com/book/show/9832370-bo...,https://images.gr-assets.com/books/1369340463m...,bookrags summary a storm of swords,0.000052,0.000260
2847,99085,1,5.000000,"Lord Brocktree (Redwall, #13)",43677,https://www.goodreads.com/book/show/99085.Lord...,https://s.gr-assets.com/assets/nophoto/book/11...,lord brocktree redwall 13,0.000023,0.000114


Our Top Recommended Books 

In [37]:
top_recs = book_recs.sort_values("score",ascending=True)

Let's style our Dataframe to show the each book's image

In [38]:
def show_book_image(val):
    return '<img src={} width=50></img>'.format(val)
top_recs.style.format({'cover_image':show_book_image})

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,title_clean,adjusted-count,score
2069,5139,1,4.0,"The Devil Wears Prada (The Devil Wears Prada, #1)",675927,https://www.goodreads.com/book/show/5139.The_Devil_Wears_Prada,,the devil wears prada the devil wears prada 1,1e-06,6e-06
2666,8442457,1,4.0,Gone Girl,513361,https://www.goodreads.com/book/show/8442457-gone-girl,,gone girl,2e-06,8e-06
460,137791,1,4.0,Divine Secrets of the Ya-Ya Sisterhood,469226,https://www.goodreads.com/book/show/137791.Divine_Secrets_of_the_Ya_Ya_Sisterhood,,divine secrets of the yaya sisterhood,2e-06,9e-06
1770,37470,1,4.0,"The Other Boleyn Girl (The Plantagenet and Tudor Novels, #9)",384888,https://www.goodreads.com/book/show/37470.The_Other_Boleyn_Girl,,the other boleyn girl the plantagenet and tudor novels 9,3e-06,1e-05
74,10592,1,4.0,Carrie,365610,https://www.goodreads.com/book/show/10592.Carrie,,carrie,3e-06,1.1e-05
2197,5886881,1,4.0,Dark Places,344059,https://www.goodreads.com/book/show/5886881-dark-places,,dark places,3e-06,1.2e-05
153,11127,1,5.0,"The Chronicles of Narnia (Chronicles of Narnia, #1-7)",382518,https://www.goodreads.com/book/show/11127.The_Chronicles_of_Narnia,,the chronicles of narnia chronicles of narnia 17,3e-06,1.3e-05
1549,30118,1,4.0,A Light in the Attic,304689,https://www.goodreads.com/book/show/30118.A_Light_in_the_Attic,,a light in the attic,3e-06,1.3e-05
2564,7896527,1,4.0,"Throne of Glass (Throne of Glass, #1)",295609,https://www.goodreads.com/book/show/7896527-throne-of-glass,,throne of glass throne of glass 1,3e-06,1.4e-05
1811,3950967,1,4.0,The Tales of Beedle the Bard,289889,https://www.goodreads.com/book/show/3950967-the-tales-of-beedle-the-bard,,the tales of beedle the bard,3e-06,1.4e-05
