**Approach 1: Collaborative Filtering**

**1. Reading In Books We Like**

In [335]:
import pandas as pd 

my_books = pd.read_csv("liked_books.csv", index_col=0)
my_books["book_id"] = my_books["book_id"].astype(str)

In [336]:
my_books

Unnamed: 0,user_id,book_id,rating,title
0,-1,17788401,3,Ugly Love
1,-1,60310757,5,"Lightlark (Lightlark, #1)"
2,-1,58410355,5,Nightbane (Lightlark #2)
3,-1,64645812,0,If Only I Had Told Her
4,-1,44676678,4,If He Had Been with Me
...,...,...,...,...
128,-1,6448772,5,"Quantum: Einstein, Bohr, and the Great Debate ..."
129,-1,8161140,1,On the Beach
130,-1,28187,1,The Lightning Thief (Percy Jackson and the Ol...
131,-1,25659450,1,Arkwright



**2. Finding Similar Users**

In [338]:
csv_book_mapping = {}

# book_id_map.csv maps the IDs used in goodreads_interactions.csv to Goodreads book IDs
with open("book_id_map.csv", "r") as f:
    for line in f:
        csv_id, book_id = line.strip().split(",")
        csv_book_mapping[csv_id] = book_id

**Set that contains all of the unique books that we have read.**

In [339]:
book_set = set(my_books["book_id"])

**Every user who has read at least one of the same books as us will be placed in the `overlap_users` dictionary, together with a count of how many of our books they have read.**

In [340]:
!wc -l goodreads_interactions.csv

 228648343 goodreads_interactions.csv


**Note: `_` is a throwaway name for values we do not need; here it discards the third and fifth columns of each row.**

In [341]:
overlap_users = {}

# The file has about 228 million lines, so stream it instead of loading it into memory
with open("goodreads_interactions.csv", 'r') as f:
    for line in f:
        user_id, csv_id, _, rating, _ = line.strip().split(",")

        book_id = csv_book_mapping.get(csv_id)

        # Count how many of our books each user has interacted with
        if book_id in book_set:
            overlap_users[user_id] = overlap_users.get(user_id, 0) + 1

In [342]:
# Keep only users who have read more than a fifth of our books (132 / 5 = 26.4, so at least 27)
filtered_overlap_users = {k for k in overlap_users if overlap_users[k] > my_books.shape[0] / 5}
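**The counting and the one-fifth threshold can be written a little more compactly with `collections.Counter`. The sketch below runs both on a few made-up rows in the same five-column layout; the mapping and book IDs are hypothetical and only illustrate the logic.**

```python
from collections import Counter

# Made-up rows: user_id, csv_id, <unused>, rating, <unused>
rows = [
    "1,10,1,5,0",
    "1,11,1,4,0",
    "2,10,1,3,0",
    "3,12,1,0,0",
]
csv_book_mapping = {"10": "b1", "11": "b2", "12": "b9"}   # hypothetical mapping
book_set = {"b1", "b2", "b3", "b4", "b5"}                 # books "we" have read

# Count, per user, how many of our books they interacted with
overlap_users = Counter()
for line in rows:
    user_id, csv_id, _, rating, _ = line.strip().split(",")
    if csv_book_mapping.get(csv_id) in book_set:
        overlap_users[user_id] += 1

# Keep users who share strictly more than a fifth of our books
threshold = len(book_set) / 5
filtered_overlap_users = {u for u, n in overlap_users.items() if n > threshold}
print(dict(overlap_users), filtered_overlap_users)
```

**With five books the threshold is 1.0, so user 2 (one shared book) is dropped because the comparison is strict.**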

**3. Finding Similar User Book Ratings**

In [343]:
interactions_list = []

# Second pass: keep every rating (for any book) from the users selected above
with open("goodreads_interactions.csv") as f:
    for line in f:
        user_id, csv_id, _, rating, _ = line.strip().split(",")
        if user_id in filtered_overlap_users:
            book_id = csv_book_mapping[csv_id]
            interactions_list.append([user_id, book_id, rating])

**4. Creating A User/Book Matrix**

In [344]:
len(interactions_list)

2050306

In [345]:
interactions_list[0]

['1033', '5287473', '5']

**DataFrame**

In [346]:
interactions = pd.DataFrame(interactions_list, columns=["user_id", "book_id", "rating"])

In [347]:
interactions = pd.concat([my_books[["user_id", "book_id", "rating"]], interactions])

In [348]:
interactions

Unnamed: 0,user_id,book_id,rating
0,-1,17788401,3
1,-1,60310757,5
2,-1,58410355,5
3,-1,64645812,0
4,-1,44676678,4
...,...,...,...
2050301,441283,35068798,0
2050302,441283,34227692,0
2050303,441283,32969999,0
2050304,441283,23705532,0


In [349]:
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["user_id"] = interactions["user_id"].astype(str)
interactions["rating"] = pd.to_numeric(interactions["rating"])

In [350]:
interactions["user_id"].unique()

array(['-1', '1033', '2794', '9880', '13856', '18356', '22864', '27120',
       '32978', '34085', '40239', '43546', '45510', '49949', '55191',
       '56922', '59545', '65620', '66585', '70883', '78919', '79624',
       '83250', '83293', '83470', '84900', '89997', '92967', '93625',
       '96010', '99975', '100939', '105106', '113610', '113970', '116512',
       '119660', '121119', '122431', '126987', '130849', '137133',
       '141311', '143201', '154116', '155631', '157964', '161399',
       '165733', '171930', '173042', '180123', '180817', '182379',
       '183847', '188693', '192434', '192632', '193311', '195992',
       '198123', '198161', '199610', '207960', '214546', '214914',
       '215878', '216545', '220824', '222577', '223539', '223750',
       '227507', '238739', '239651', '241317', '243937', '247060',
       '248513', '254986', '262743', '268818', '269658', '275162',
       '275756', '275859', '275919', '280135', '280329', '281975',
       '284597', '286783', '287931', '2

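**Note: converting `user_id` to a categorical and taking `.cat.codes` (category codes) gives each unique user a consecutive integer starting at 0, which we use as that user's row position in the matrix.**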

In [351]:
interactions["user_index"] = interactions["user_id"].astype("category").cat.codes

**User ID and book ID to position: each code is a row index (for users) or a column index (for books) in the ratings matrix.**

In [352]:
interactions["user_index"].unique()

array([  0,   2,  62, 142,  13,  26,  46,  57,  83,  89, 109, 118, 124,
       125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137,
       138, 139, 140, 141, 143,   1,   3,   4,   5,   6,   7,   8,   9,
        10,  11,  12,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,
        24,  25,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,
        38,  39,  40,  41,  42,  43,  44,  45,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  58,  59,  60,  61,  63,  64,  65,  66,
        67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,  78,  79,
        80,  81,  82,  84,  85,  86,  87,  88,  90,  91,  92,  93,  94,
        95,  96,  97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107,
       108, 110, 111, 112, 113, 114, 115, 116, 117, 119, 120, 121, 122,
       123], dtype=int16)

In [353]:
interactions["book_index"] = interactions["book_id"].astype("category").cat.codes
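**The categories behind `.cat.codes` are the sorted unique values, and because the IDs are strings the sort is lexicographic. That is why our own ID `"-1"` ends up with code 0 (and why `my_index = 0` is used later). A small sketch on made-up IDs:**

```python
import pandas as pd

# Made-up user IDs stored as strings, as in the notebook
user_ids = pd.Series(["1033", "-1", "2794", "1033", "10000"])
as_category = user_ids.astype("category")
codes = as_category.cat.codes

# String sort: "-1" < "10000" < "1033" < "2794"
print(as_category.cat.categories.tolist())
print(codes.tolist())
```

**Note that `"10000"` sorts before `"1033"`: the codes are positions, not numeric orderings of the IDs.**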

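**Note 1: We store the ratings in a sparse matrix.**

**Note 2: A sparse matrix stores only the entries that are present, so (user, book) pairs with no interaction take up no memory or storage space. With 144 users and over 500,000 books, almost every cell of a dense matrix would be empty.**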
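**One detail worth knowing: a sparse matrix stores every entry you give it, including explicit zeros. That is why the matrix below reports 2,050,438 stored elements, exactly the 2,050,306 collected interactions plus our 132 ratings, even though many of those ratings are 0. A toy sketch with made-up values:**

```python
import numpy as np
from scipy.sparse import coo_matrix

# Three made-up users, four books; the rating of 0 is still an explicit entry
ratings = np.array([3, 5, 0, 4])
user_index = np.array([0, 0, 1, 2])
book_index = np.array([1, 3, 3, 0])

# No shape given: it is inferred from the largest row and column indices
mat = coo_matrix((ratings, (user_index, book_index)))
print(mat.shape)              # (3, 4)
print(mat.nnz)                # stored entries, explicit zeros included
print(mat.count_nonzero())    # entries whose value is actually non-zero
print(mat.toarray())
```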

In [354]:
from scipy.sparse import coo_matrix

ratings_mat_coo = coo_matrix((interactions["rating"], (interactions["user_index"], interactions["book_index"])))

In [355]:
ratings_mat_coo

<144x542886 sparse matrix of type '<class 'numpy.int64'>'
	with 2050438 stored elements in COOrdinate format>

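**Convert the COO matrix to a CSR matrix.**

**COO (coordinate) format is easy to build directly from (row, column, value) triples, which is why we create the matrix in that format first. CSR (compressed sparse row) format is much faster for selecting rows and for arithmetic such as the cosine similarity below, so we convert to it.**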
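**A caveat when converting: if the same (user, book) pair appears more than once, `tocsr()` sums the duplicates. For ratings that is probably not what you want, so it may be worth de-duplicating `interactions` first (for example with `drop_duplicates`). A sketch with made-up data:**

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up data in which (user 0, book 1) appears twice
data = np.array([2, 3, 4])
rows = np.array([0, 0, 1])
cols = np.array([1, 1, 2])

coo = coo_matrix((data, (rows, cols)), shape=(2, 3))
csr = coo.tocsr()

print(coo.nnz, csr.nnz)      # 3 stored triples become 2 entries
print(csr[0, 1])             # duplicates are summed: 2 + 3
print(csr[1].toarray())      # cheap row slice in CSR format
```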

In [356]:
ratings_mat = ratings_mat_coo.tocsr()

**5. Finding Users Similar To Us**

In [357]:
interactions[interactions["user_id"] == "-1"]

Unnamed: 0,user_id,book_id,rating,user_index,book_index
0,-1,17788401,3,0,140796
1,-1,60310757,5,0,451741
2,-1,58410355,5,0,447417
3,-1,64645812,0,0,465589
4,-1,44676678,4,0,426457
...,...,...,...,...,...
128,-1,6448772,5,0,464984
129,-1,8161140,1,0,508123
130,-1,28187,1,0,332218
131,-1,25659450,1,0,294795


In [358]:
my_index = 0

**Use cosine similarity to find users whose taste in books is similar to ours.**

**Cosine similarity compares two rows of the matrix by the angle between them: the dot product of the two rows divided by the product of their lengths. Users who read the same books as us and rated them similarly score close to 1, and users with no rated books in common score 0.**
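**To make the formula concrete, the sketch below computes it by hand on a tiny made-up ratings matrix and checks it against `sklearn.metrics.pairwise.cosine_similarity`, called the same way as in the next cell.**

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Made-up ratings: rows are users, columns are books, 0 = no entry
ratings = csr_matrix(np.array([
    [5, 0, 3, 0],
    [4, 0, 3, 1],
    [0, 5, 0, 0],
], dtype=float))

def cosine(a, b):
    # a . b / (||a|| * ||b||)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

dense = ratings.toarray()
manual = np.array([cosine(dense[0], row) for row in dense])
sk = cosine_similarity(ratings[0, :], ratings).flatten()
print(manual)
print(sk)
```

**Row 0 compared with itself gives 1, and row 2 shares no books with row 0, so its similarity is 0. A rating of 0 also contributes nothing to the dot product, so books a user read but did not rate do not count toward similarity.**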

In [359]:
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(ratings_mat[my_index,:], ratings_mat).flatten()

In [360]:
similarity[0]

0.9999999999999987

**Next, we find the indices of the users most similar to us with `np.argpartition`. Passing `-15` partially sorts the array so that its last 15 positions hold the indices of the 15 largest similarity values, in no particular order. Because our own similarity is 1, we are one of those 15 users.**
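**A small sketch of `np.argpartition` on made-up similarity values, showing that the selected indices are unordered and how to rank them if the order matters:**

```python
import numpy as np

similarity = np.array([0.1, 0.9, 0.3, 1.0, 0.5])   # made-up values

top3 = np.argpartition(similarity, -3)[-3:]          # 3 largest, in arbitrary order
ranked = top3[np.argsort(similarity[top3])[::-1]]    # sort those 3, most similar first
print(top3, ranked)
```

**`argpartition` avoids a full sort of every user, which is why it is used here; sorting only the selected handful afterwards is cheap.**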

In [361]:
import numpy as np

indices = np.argpartition(similarity, -15)[-15:]

In [362]:
indices

array([ 31,  45,  70,  35, 143, 117, 104,  54, 137,  49,  33,  28,  24,
        19,   0])

**Find all of the rows in interactions where the user index is in our indices.**

In [363]:
similar_users = interactions[interactions["user_index"].isin(indices)].copy()

**We'll just take ourselves out so we do not get book recommendations from ourselves.**

In [364]:
similar_users = similar_users[similar_users["user_id"]!="-1"]

In [365]:
similar_users

Unnamed: 0,user_id,book_id,rating,user_index,book_index
401300,84900,20697586,0,137,200373
401301,84900,18166936,5,137,154818
401302,84900,19063,5,137,182121
401303,84900,18143977,4,137,153956
401304,84900,10357575,3,137,5916
...,...,...,...,...,...
1954515,433594,17465707,0,117,133240
1954516,433594,24233708,0,117,266435
1954517,433594,36067991,0,117,409765
1954518,433594,13480263,0,117,64076


**6. Creating Book Recommendations**

**For each book read by these similar users, count how many of them read it and compute their mean rating.**
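**A sketch of the same aggregation on made-up data. Many books below end up with a mean of 0; if a rating of 0 means "shelved but not rated" (an assumption about this dataset), those zeros pull the mean down. The extra `mean_rated` column is a hypothetical addition that averages only non-zero ratings, using pandas named aggregation.**

```python
import pandas as pd

# Made-up ratings from similar users
similar_users = pd.DataFrame({
    "book_id": ["a", "a", "a", "b"],
    "rating":  [5, 0, 4, 3],
})

book_recs = similar_users.groupby("book_id").rating.agg(
    count="count",                          # how many similar users have the book
    mean="mean",                            # mean rating, zeros included
    mean_rated=lambda s: s[s > 0].mean(),   # hypothetical: mean of non-zero ratings only
)
print(book_recs)
```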

In [366]:
book_recs = similar_users.groupby("book_id").rating.agg(['count', 'mean'])

In [367]:
book_recs

Unnamed: 0_level_0,count,mean
book_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1,11,3.818182
10000191,1,0.000000
10002296,2,0.000000
10006,1,0.000000
1002108,1,0.000000
...,...,...
99944,1,0.000000
9996645,1,0.000000
9999107,2,0.000000
9999576,1,0.000000


**Adding book titles.**

In [368]:
books_titles = pd.read_json("books_titles.json")
books_titles["book_id"] = books_titles["book_id"].astype(str)

**Merge our 2 datasets to get the book titles into our recommendations.**

In [369]:
book_recs = book_recs.merge(books_titles, how="inner", on="book_id")

In [370]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title
0,1,11,3.818182,Harry Potter and the Half-Blood Prince (Harry ...,1713866,https://www.goodreads.com/book/show/1.Harry_Po...,https://images.gr-assets.com/books/1361039191m...,harry potter and the halfblood prince harry po...
1,10000191,1,0.000000,Yellow Crocus,17787,https://www.goodreads.com/book/show/10000191-y...,https://s.gr-assets.com/assets/nophoto/book/11...,yellow crocus
2,10002296,2,0.000000,Wildflower Hill,9475,https://www.goodreads.com/book/show/10002296-w...,https://images.gr-assets.com/books/1314025082m...,wildflower hill
3,10006,1,0.000000,Oracle Night,8972,https://www.goodreads.com/book/show/10006.Orac...,https://images.gr-assets.com/books/1328287302m...,oracle night
4,1002108,1,0.000000,Grace Above All,43,https://www.goodreads.com/book/show/1002108.Gr...,https://images.gr-assets.com/books/1312009382m...,grace above all
...,...,...,...,...,...,...,...,...
18778,99944,1,0.000000,The Bhagavad Gita,33855,https://www.goodreads.com/book/show/99944.The_...,https://images.gr-assets.com/books/1383059639m...,the bhagavad gita
18779,9996645,1,0.000000,"Truly, Madly, Deeply",1268,https://www.goodreads.com/book/show/9996645-tr...,https://images.gr-assets.com/books/1293090850m...,truly madly deeply
18780,9999107,2,0.000000,The American Heiress,24522,https://www.goodreads.com/book/show/9999107-th...,https://images.gr-assets.com/books/1307342832m...,the american heiress
18781,9999576,1,0.000000,Long Gone,3953,https://www.goodreads.com/book/show/9999576-lo...,https://s.gr-assets.com/assets/nophoto/book/11...,long gone


**7. Ranking Our Book Recommendations**

**Figuring out which recommendations are relevant to us.**

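**We'll first create an adjusted count: the count multiplied by the ratio of the count to `ratings`, i.e. `count² / ratings`. This boosts books that are common among users similar to us but relatively uncommon across Goodreads as a whole, and penalizes books that are simply popular with everyone.**

**`ratings` is the number of times the book was rated across all of Goodreads.**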

In [371]:
book_recs["adjusted_count"] = book_recs["count"] * (book_recs["count"] / book_recs["ratings"])

**Score: the mean rating from similar users multiplied by the adjusted count, used as an estimate of how much we might like a book.**

In [372]:
book_recs["score"] = book_recs["mean"] * book_recs["adjusted_count"]
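**A worked example using two rows from the `top_recs` table further down. The Hunger Games was read by 13 of the similar users against 5 for Saga, Vol. 3, but it has roughly 98 times as many ratings on Goodreads, so its adjusted count and score come out lower.**

```python
import pandas as pd

# Two rows copied from the top_recs output in this notebook
recs = pd.DataFrame({
    "title": ["Saga, Vol. 3 (Saga, #3)", "The Hunger Games (The Hunger Games, #1)"],
    "count": [5, 13],
    "mean": [4.8, 4.615385],
    "ratings": [50020, 4899965],
})

# adjusted_count = count^2 / ratings
recs["adjusted_count"] = recs["count"] * (recs["count"] / recs["ratings"])
recs["score"] = recs["mean"] * recs["adjusted_count"]
print(recs[["title", "adjusted_count", "score"]])
```

**The results, about 0.002399 and 0.000159, match the `score` column in the table.**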

In [373]:
# Remove any books with count under 1
#book_recs = book_recs[book_recs["count"] >= 1]

**Take out any books we have already read.**

**First, we remove any book whose ID matches the ID of a book we have read. Second, we remove any book whose normalized title matches one we have read, which should also catch other editions of the same book that have different IDs.**

In [374]:
book_recs = book_recs[~book_recs["book_id"].isin(my_books["book_id"])]

In [375]:
my_books["mod_title"] = my_books["title"].str.replace("[^a-zA-Z0-9 ]", "", regex=True).str.lower()

**Collapse any run of whitespace into a single space.**

In [376]:
my_books["mod_title"] = my_books["mod_title"].str.replace(r"\s+", " ", regex=True)
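**The two title-normalization steps can be checked on a few titles. The first two below come from this notebook; the third, with a doubled space, is made up. The output for Saga matches the `mod_title` column shown in the recommendations.**

```python
import pandas as pd

titles = pd.Series(["Lightlark (Lightlark, #1)", "Saga, Vol. 3 (Saga, #3)", "Into  Thin Air"])

mod_titles = (
    titles.str.replace(r"[^a-zA-Z0-9 ]", "", regex=True)  # drop punctuation and other symbols
          .str.lower()
          .str.replace(r"\s+", " ", regex=True)            # collapse repeated spaces
)
print(mod_titles.tolist())
```

**One limitation: accented letters are removed along with punctuation, since they fall outside `a-zA-Z`.**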

**Remove any recommendation whose `mod_title` matches the title of a book we have already read.**

In [377]:
book_recs = book_recs[~book_recs["mod_title"].isin(my_books["mod_title"])]

In [378]:
#book_recs = book_recs[book_recs["score"] >= 1]

**Remove any book that appeared two times or fewer among similar users (keep books read by at least three of them).**

In [379]:
book_recs = book_recs[book_recs["count"]>2]

**Keep only books whose mean rating from similar users is at least 4.**

In [380]:
book_recs = book_recs[book_recs["mean"] >=4]

**Create our top recommendations by sorting the DataFrame. Note that this sorts on `mean`, not on `score`; pass `"score"` to `sort_values` to rank by the score computed above.**

In [381]:
top_recs = book_recs.sort_values("mean", ascending=False)

In [382]:
top_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count,score
15139,41899,3,5.0,Fantastic Beasts and Where to Find Them,194645,https://www.goodreads.com/book/show/41899.Fant...,https://images.gr-assets.com/books/1303738520m...,fantastic beasts and where to find them,4.6e-05,0.000231
6001,19358975,5,4.8,"Saga, Vol. 3 (Saga, #3)",50020,https://www.goodreads.com/book/show/19358975-s...,https://images.gr-assets.com/books/1486028973m...,saga vol 3 saga 3,0.0005,0.002399
15625,5,11,4.727273,Harry Potter and the Prisoner of Azkaban (Harr...,1876252,https://www.goodreads.com/book/show/5.Harry_Po...,https://images.gr-assets.com/books/1499277281m...,harry potter and the prisoner of azkaban harry...,6.4e-05,0.000305
3716,17131869,7,4.714286,"Saga, Vol. 2 (Saga, #2)",58474,https://www.goodreads.com/book/show/17131869-s...,https://images.gr-assets.com/books/1486028954m...,saga vol 2 saga 2,0.000838,0.00395
326,10637766,3,4.666667,"Silence (Hush, Hush, #3)",196160,https://www.goodreads.com/book/show/10637766-s...,https://images.gr-assets.com/books/1362408152m...,silence hush hush 3,4.6e-05,0.000214
1424,12751687,3,4.666667,"Finale (Hush, Hush, #4)",109752,https://www.goodreads.com/book/show/12751687-f...,https://images.gr-assets.com/books/1362408156m...,finale hush hush 4,8.2e-05,0.000383
17197,7544943,3,4.666667,"Death Note: Black Edition, Vol. 4",1739,https://www.goodreads.com/book/show/7544943-de...,https://images.gr-assets.com/books/1330545562m...,death note black edition vol 4,0.005175,0.024152
15665,5096,3,4.666667,Wizard and Glass,108246,https://www.goodreads.com/book/show/5096.Wizar...,https://images.gr-assets.com/books/1327946510m...,wizard and glass,8.3e-05,0.000388
14380,34084,3,4.666667,The Waste Lands,121199,https://www.goodreads.com/book/show/34084.The_...,https://s.gr-assets.com/assets/nophoto/book/11...,the waste lands,7.4e-05,0.000347
11405,2767052,13,4.615385,"The Hunger Games (The Hunger Games, #1)",4899965,https://www.goodreads.com/book/show/2767052-th...,https://images.gr-assets.com/books/1447303603m...,the hunger games the hunger games 1,3.4e-05,0.000159
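**The choice of sort column matters. Using three rows from the table above, ranking by `mean` and ranking by `score` give exactly opposite orders:**

```python
import pandas as pd

# Three rows copied from the top_recs output above
recs = pd.DataFrame({
    "title": ["Fantastic Beasts and Where to Find Them",
              "Saga, Vol. 3 (Saga, #3)",
              "Death Note: Black Edition, Vol. 4"],
    "count": [3, 5, 3],
    "mean": [5.0, 4.8, 4.666667],
    "score": [0.000231, 0.002399, 0.024152],
})

by_mean = recs.sort_values("mean", ascending=False)["title"].tolist()
by_score = recs.sort_values("score", ascending=False)["title"].tolist()
print(by_mean)
print(by_score)
```

**Death Note ranks last by mean but first by score because it has only 1,739 ratings overall, so its adjusted count is large.**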


**8. Improve The Display Of The Books**

In [383]:
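def make_clickable(val):
    # Render the Goodreads URL as a link that opens in a new tab
    return '<a target="_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    # Render the cover as a 50px-wide thumbnail that links to the full image
    return '<a href="{}"><img src="{}" width="50"></a>'.format(val, val)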

top_recs.style.format({'url': make_clickable, 'cover_image': show_image})

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count,score
15139,41899,3,5.0,Fantastic Beasts and Where to Find Them,194645,Goodreads,,fantastic beasts and where to find them,4.6e-05,0.000231
6001,19358975,5,4.8,"Saga, Vol. 3 (Saga, #3)",50020,Goodreads,,saga vol 3 saga 3,0.0005,0.002399
15625,5,11,4.727273,"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)",1876252,Goodreads,,harry potter and the prisoner of azkaban harry potter 3,6.4e-05,0.000305
3716,17131869,7,4.714286,"Saga, Vol. 2 (Saga, #2)",58474,Goodreads,,saga vol 2 saga 2,0.000838,0.00395
326,10637766,3,4.666667,"Silence (Hush, Hush, #3)",196160,Goodreads,,silence hush hush 3,4.6e-05,0.000214
1424,12751687,3,4.666667,"Finale (Hush, Hush, #4)",109752,Goodreads,,finale hush hush 4,8.2e-05,0.000383
17197,7544943,3,4.666667,"Death Note: Black Edition, Vol. 4",1739,Goodreads,,death note black edition vol 4,0.005175,0.024152
15665,5096,3,4.666667,Wizard and Glass,108246,Goodreads,,wizard and glass,8.3e-05,0.000388
14380,34084,3,4.666667,The Waste Lands,121199,Goodreads,,the waste lands,7.4e-05,0.000347
11405,2767052,13,4.615385,"The Hunger Games (The Hunger Games, #1)",4899965,Goodreads,,the hunger games the hunger games 1,3.4e-05,0.000159
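**`DataFrame.style` needs the `jinja2` package. When that is unavailable, or when you want a static HTML file, `DataFrame.to_html` with `formatters` and `escape=False` can apply the same two functions. A sketch with hypothetical URLs:**

```python
import pandas as pd

def make_clickable(val):
    return '<a target="_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    return '<a href="{}"><img src="{}" width="50"></a>'.format(val, val)

# Hypothetical URLs, for illustration only
recs = pd.DataFrame({
    "title": ["Example Book"],
    "url": ["https://example.com/book/1"],
    "cover_image": ["https://example.com/cover/1.jpg"],
})

# escape=False keeps the generated tags as HTML instead of escaping them
html = recs.to_html(escape=False, formatters={"url": make_clickable, "cover_image": show_image})
print(html)
```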


**Testing**

**Here the ground truth is the list of books we already liked, and the recommendations are the filtered `book_recs` from above. Precision, recall and F1 are computed on the overlap between the two sets of IDs. Because every book we have already read was deliberately removed from `book_recs`, that overlap is empty by construction, so all three metrics are 0. A meaningful test has to hide some liked books from the recommender and check whether they come back.**

In [385]:
import pandas as pd

# Step 1: Prepare the ground truth books (IDs as strings, like book_recs)
liked_books_df = pd.read_csv("liked_books.csv", index_col=0)
ground_truth_books = liked_books_df["book_id"].astype(str).tolist()

# Step 2: Generate recommendations, removing duplicates
recommendations = list(set(book_recs["book_id"].astype(str)))

print("Length of ground_truth_books:", len(ground_truth_books))
print("Length of recommendations:", len(recommendations))

# Step 3: Compare the two as sets; sklearn's precision_score expects aligned
# label vectors, not two unrelated lists of IDs
if recommendations:
    hits = len(set(recommendations) & set(ground_truth_books))
    precision = hits / len(recommendations)
    recall = hits / len(ground_truth_books)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    print("Precision:", precision)
    print("Recall:", recall)
    print("F1 Score:", f1)
else:
    print("No recommendations generated.")

Length of ground_truth_books: 132
Length of recommendations: 22
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
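**A minimal sketch of a hold-out evaluation, under stated assumptions: a random 20% of the liked books is hidden, the recommender would be run on the remaining `train` books only (in place of `my_books`), and precision@k and recall are measured against the hidden books. The book IDs and the ranked list below are made up; `precision_recall_at_k` is a hypothetical helper, not part of any library.**

```python
import random

def precision_recall_at_k(recommended, relevant, k):
    """Set-based precision@k and recall@k for one ranked recommendation list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical liked-book IDs; hold out 20% of them before building recommendations
liked = [str(i) for i in range(1, 21)]
rng = random.Random(0)
held_out = set(rng.sample(liked, k=4))
train = [b for b in liked if b not in held_out]   # would replace my_books in the pipeline

# Stand-in for a ranked list produced from `train` only (made up here)
ranked_recs = ["99", sorted(held_out)[0], "98", sorted(held_out)[1], "97"]

p, r = precision_recall_at_k(ranked_recs, held_out, k=5)
print(p, r)
```

**Two of the five recommendations are held-out books, so precision@5 is 0.4 and recall is 2 / 4 = 0.5.**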


**Approach 2: K-Nearest Neighbor**

**1. Reading In Books We Like**

In [386]:
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.neighbors import NearestNeighbors

my_books = pd.read_csv("liked_books.csv", index_col=0)
my_books["book_id"] = my_books["book_id"].astype(str)

**2. Mapping CSV book IDs To book IDs**

**`book_set` is the set of all the unique books we have read.**

In [387]:
csv_book_mapping = {}
with open("book_id_map.csv", "r") as f:
    while True:
        line = f.readline()
        if not line:
            break
        csv_id, book_id = line.strip().split(",")
        csv_book_mapping[csv_id] = book_id


book_set = set(my_books["book_id"])

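**3. Reading Ratings Of The Books We Have Read**

**This pass keeps only the rows whose book is in `book_set`, from every user who read any of those books. Unlike Approach 1, it does not collect those users' other books, and it does not add our own ratings from `my_books`.**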

In [388]:
interactions_list = []
with open("goodreads_interactions.csv") as f:
    while True:
        line = f.readline()
        if not line:
            break

        user_id, csv_id, _, rating, _ = line.strip().split(",")
        book_id = csv_book_mapping.get(csv_id)

        if book_id in book_set:
            interactions_list.append([user_id, book_id, rating])
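**Because the cell above keeps only books in `book_set`, every column of the matrix built next is a book we have already read, so there is nothing new left to recommend once those are filtered out. A sketch of collecting the data in two passes, as Approach 1 does, on made-up in-memory rows (the real file can be rewound with `f.seek(0)` in the same way; Approach 1's one-fifth threshold is omitted for brevity):**

```python
import io
from collections import Counter

# Made-up rows: user_id, csv_id, <unused>, rating, <unused>
f = io.StringIO("1,10,1,5,0\n1,20,1,4,0\n2,20,1,3,0\n3,10,1,2,0\n3,30,1,5,0\n")
csv_book_mapping = {"10": "b10", "20": "b20", "30": "b30"}   # hypothetical mapping
book_set = {"b10"}                                           # books "we" have read

# Pass 1: which users share at least one of our books?
overlap = Counter()
for line in f:
    user_id, csv_id, _, rating, _ = line.strip().split(",")
    if csv_book_mapping.get(csv_id) in book_set:
        overlap[user_id] += 1

# Pass 2: keep *all* ratings from those users, not only the ones for our books
f.seek(0)
interactions_list = []
for line in f:
    user_id, csv_id, _, rating, _ = line.strip().split(",")
    if user_id in overlap:
        interactions_list.append([user_id, csv_book_mapping[csv_id], rating])

candidate_books = {book for _, book, _ in interactions_list} - book_set
print(interactions_list)
print(candidate_books)
```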

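**4. Creating The User-Book Matrix And Finding Similar Users With KNN**

**`k = 15` is the number of similar users to find. `kneighbors` is asked for `k + 1` neighbors because the nearest neighbor of a row is normally the row itself, which `[1:]` then drops. Note that `my_index = 0` assumes our ratings are row 0, as in Approach 1; since `my_books` is never added to `interactions` here, row 0 is simply whichever user ID sorts first.**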

In [389]:
interactions = pd.DataFrame(interactions_list, columns=["user_id", "book_id", "rating"])
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["user_id"] = interactions["user_id"].astype(str)
interactions["rating"] = pd.to_numeric(interactions["rating"])
interactions["user_index"] = interactions["user_id"].astype("category").cat.codes
interactions["book_index"] = interactions["book_id"].astype("category").cat.codes
ratings_mat_coo = coo_matrix((interactions["rating"], (interactions["user_index"], interactions["book_index"])))
ratings_mat = ratings_mat_coo.tocsr()

k = 15  
knn_model = NearestNeighbors(metric='cosine', algorithm='brute')
knn_model.fit(ratings_mat)

my_index = 0
distances, indices = knn_model.kneighbors(ratings_mat[my_index], n_neighbors=k+1)
similar_user_indices = indices.squeeze()[1:]
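**A toy sketch of the same `NearestNeighbors` call on a small made-up sparse matrix, showing that the query row comes back first and that `[1:]` leaves only the other users, nearest first:**

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Made-up ratings: rows are users, columns are books
ratings_mat = csr_matrix(np.array([
    [5, 0, 3, 0],   # row 0: the query user
    [4, 0, 3, 1],
    [0, 5, 0, 0],
    [5, 0, 2, 0],
], dtype=float))

k = 2
knn_model = NearestNeighbors(metric="cosine", algorithm="brute")
knn_model.fit(ratings_mat)

# Ask for k + 1 neighbors because the query row is its own nearest neighbor
distances, indices = knn_model.kneighbors(ratings_mat[0], n_neighbors=k + 1)
similar_user_indices = indices.squeeze()[1:]
print(distances.round(4), indices, similar_user_indices)
```

**Dropping the first result assumes no other user has a cosine distance of exactly 0 to the query; with such a tie, the query row might not come first.**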

**5. Filter And Process Similar Users Data**

In [390]:
similar_users = interactions[interactions["user_index"].isin(similar_user_indices)].copy()
similar_users = similar_users[similar_users["user_id"] != "-1"]

**6. Creating Book Recommendations**

In [391]:
book_recs = similar_users.groupby("book_id").rating.agg(['count', 'mean'])
book_recs.reset_index(inplace=True)

In [392]:
print("Number of rows in book_recs:", len(book_recs))

Number of rows in book_recs: 13


In [393]:
book_recs

Unnamed: 0,book_id,count,mean
0,11870085,1,0.0
1,128029,15,4.4
2,139069,1,0.0
3,17235026,1,0.0
4,1898,15,4.0
5,20910157,2,1.5
6,22628,3,3.0
7,228221,1,3.0
8,23513349,1,0.0
9,356824,1,0.0


**7. Adding Book Titles**

In [394]:
books_titles = pd.read_json("books_titles.json")
books_titles["book_id"] = books_titles["book_id"].astype(str)

In [395]:
book_recs = book_recs.merge(books_titles, how="inner", on="book_id")

**8. Ranking The Book Recommendations**

In [396]:
book_recs["adjusted_count"] = book_recs["count"] * (book_recs["count"] / book_recs["ratings"])

In [397]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count
0,11870085,1,0.0,The Fault in Our Stars,2429317,https://www.goodreads.com/book/show/11870085-t...,https://images.gr-assets.com/books/1360206420m...,the fault in our stars,4.116383e-07
1,128029,15,4.4,A Thousand Splendid Suns,835172,https://www.goodreads.com/book/show/128029.A_T...,https://images.gr-assets.com/books/1345958969m...,a thousand splendid suns,0.0002694056
2,139069,1,0.0,Endurance: Shackleton's Incredible Voyage,51536,https://www.goodreads.com/book/show/139069.End...,https://images.gr-assets.com/books/1391329559m...,endurance shackletons incredible voyage,1.940391e-05
3,17235026,1,0.0,The Girl with All the Gifts,105967,https://www.goodreads.com/book/show/17235026-t...,https://images.gr-assets.com/books/1403033579m...,the girl with all the gifts,9.4369e-06
4,1898,15,4.0,Into Thin Air: A Personal Account of the Mount...,294556,https://www.goodreads.com/book/show/1898.Into_...,https://images.gr-assets.com/books/1463384482m...,into thin air a personal account of the mount ...,0.0007638615
5,20910157,2,1.5,Yes Please,262157,https://www.goodreads.com/book/show/20910157-y...,https://images.gr-assets.com/books/1402815435m...,yes please,1.525803e-05
6,22628,3,3.0,The Perks of Being a Wallflower,906322,https://www.goodreads.com/book/show/22628.The_...,https://images.gr-assets.com/books/1167352178m...,the perks of being a wallflower,9.930246e-06
7,228221,1,3.0,The Mask,14615,https://www.goodreads.com/book/show/228221.The...,https://s.gr-assets.com/assets/nophoto/book/11...,the mask,6.842285e-05
8,23513349,1,0.0,Milk and Honey,92450,https://www.goodreads.com/book/show/23513349-m...,https://images.gr-assets.com/books/1491595510m...,milk and honey,1.081666e-05
9,356824,1,0.0,India After Gandhi: The History of the World's...,9652,https://www.goodreads.com/book/show/356824.Ind...,https://images.gr-assets.com/books/1500471959m...,india after gandhi the history of the worlds l...,0.0001036055


In [398]:
book_recs["score"] = book_recs["mean"] * book_recs["adjusted_count"]

In [399]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count,score
0,11870085,1,0.0,The Fault in Our Stars,2429317,https://www.goodreads.com/book/show/11870085-t...,https://images.gr-assets.com/books/1360206420m...,the fault in our stars,4.116383e-07,0.0
1,128029,15,4.4,A Thousand Splendid Suns,835172,https://www.goodreads.com/book/show/128029.A_T...,https://images.gr-assets.com/books/1345958969m...,a thousand splendid suns,0.0002694056,0.001185
2,139069,1,0.0,Endurance: Shackleton's Incredible Voyage,51536,https://www.goodreads.com/book/show/139069.End...,https://images.gr-assets.com/books/1391329559m...,endurance shackletons incredible voyage,1.940391e-05,0.0
3,17235026,1,0.0,The Girl with All the Gifts,105967,https://www.goodreads.com/book/show/17235026-t...,https://images.gr-assets.com/books/1403033579m...,the girl with all the gifts,9.4369e-06,0.0
4,1898,15,4.0,Into Thin Air: A Personal Account of the Mount...,294556,https://www.goodreads.com/book/show/1898.Into_...,https://images.gr-assets.com/books/1463384482m...,into thin air a personal account of the mount ...,0.0007638615,0.003055
5,20910157,2,1.5,Yes Please,262157,https://www.goodreads.com/book/show/20910157-y...,https://images.gr-assets.com/books/1402815435m...,yes please,1.525803e-05,2.3e-05
6,22628,3,3.0,The Perks of Being a Wallflower,906322,https://www.goodreads.com/book/show/22628.The_...,https://images.gr-assets.com/books/1167352178m...,the perks of being a wallflower,9.930246e-06,3e-05
7,228221,1,3.0,The Mask,14615,https://www.goodreads.com/book/show/228221.The...,https://s.gr-assets.com/assets/nophoto/book/11...,the mask,6.842285e-05,0.000205
8,23513349,1,0.0,Milk and Honey,92450,https://www.goodreads.com/book/show/23513349-m...,https://images.gr-assets.com/books/1491595510m...,milk and honey,1.081666e-05,0.0
9,356824,1,0.0,India After Gandhi: The History of the World's...,9652,https://www.goodreads.com/book/show/356824.Ind...,https://images.gr-assets.com/books/1500471959m...,india after gandhi the history of the worlds l...,0.0001036055,0.0


In [400]:
book_recs = book_recs[~book_recs["book_id"].isin(my_books["book_id"])]

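In [401]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count,score

**The result is empty. Every book in this matrix is one of our own books (step 3 kept only rows whose book is in `book_set`), so removing books we have already read removes everything, and the remaining steps operate on an empty DataFrame.**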


In [402]:
my_books["mod_title"] = my_books["title"].str.replace("[^a-zA-Z0-9 ]", "", regex=True).str.lower()

**Collapse any run of whitespace into a single space.**

In [403]:
my_books["mod_title"] = my_books["mod_title"].str.replace(r"\s+", " ", regex=True)

**Remove recommendations where the mod title matches the books we already read.**

In [404]:
book_recs = book_recs[~book_recs["mod_title"].isin(my_books["mod_title"])]

**Remove recommendations that appeared two times or fewer among similar users.**

In [405]:
book_recs = book_recs[book_recs["count"] > 2]

**Remove recommendations with a mean rating below 4.**

In [406]:
book_recs = book_recs[book_recs["mean"] >= 4]

**Sort the recommendations by mean rating (pass `"score"` to `sort_values` instead to rank by score).**

In [407]:
top_recs = book_recs.sort_values("mean", ascending=False)

**9. Improve The Display Of The Books**

In [408]:
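def make_clickable(val):
    # Render the Goodreads URL as a link that opens in a new tab
    return '<a target="_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    # Render the cover as a 50px-wide thumbnail that links to the full image
    return '<a href="{}"><img src="{}" width="50"></a>'.format(val, val)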

top_recs.style.format({'url': make_clickable, 'cover_image': show_image})

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count,score
