**1. Reading In Books We Like**

In [3]:
import pandas as pd 

my_books = pd.read_csv("liked_books.csv", index_col=0)
my_books["book_id"] = my_books["book_id"].astype(str)

In [4]:
my_books

Unnamed: 0,user_id,book_id,rating,title
0,-1,2517439,5,"The Forever War (The Forever War, #1)"
1,-1,113576,5,The Smartest Guys in the Room: The Amazing Ris...
2,-1,35100,5,Battle Cry of Freedom
3,-1,228221,5,The Mask of Command
5,-1,17662739,5,"2001: A Space Odyssey (Space Odyssey, #1)"
6,-1,356824,5,India After Gandhi: The History of the World's...
7,-1,12125412,5,The Lady or the Tiger?: and Other Logic Puzzles
8,-1,139069,5,Endurance: Shackleton's Incredible Voyage
10,-1,76680,5,"Foundation (Foundation, #1)"
11,-1,1898,5,Into Thin Air: A Personal Account of the Mount...


In [5]:
my_books["book_id"] = my_books["book_id"].astype(str)

**2. Finding Similar Users**

In [6]:
csv_book_mapping = {}

# Map each csv_id used in the interactions file to its Goodreads book_id.
with open("book_id_map.csv", "r") as f:
    for line in f:
        csv_id, book_id = line.strip().split(",")
        csv_book_mapping[csv_id] = book_id

**Build a set of the unique IDs of the books we have read.**

In [7]:
book_set = set(my_books["book_id"])

**Every user who read at least one of the same books as us will be placed in the `overlap_users` dictionary, together with a count of how many of our books they read.**

In [8]:
!wc -l goodreads_interactions.csv

 228648343 goodreads_interactions.csv


**Note: `_` is a placeholder for fields we do not need.**

In [9]:
overlap_users = {}

# Count, for each user, how many of our books they have also read.
with open("goodreads_interactions.csv", "r") as f:
    for line in f:
        user_id, csv_id, _, rating, _ = line.split(",")

        book_id = csv_book_mapping.get(csv_id)

        if book_id in book_set:
            overlap_users[user_id] = overlap_users.get(user_id, 0) + 1
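The counting step above can be tried on a tiny in-memory sample before running it on the full file. This is only a sketch: the sample rows, the mapping, and the column names in the header line are made up for illustration, and `count_overlap` is a hypothetical helper, not part of the notebook.

```python
import io
from collections import Counter

# Hypothetical stand-in for goodreads_interactions.csv (five comma-separated fields per line).
sample_interactions = io.StringIO(
    "user_id,book_id,is_read,rating,is_reviewed\n"
    "0,10,1,5,0\n"
    "0,11,1,4,0\n"
    "1,10,1,3,0\n"
    "2,12,0,0,0\n"
)
# Hypothetical csv_id -> book_id mapping and set of liked book IDs.
sample_mapping = {"10": "2517439", "11": "113576", "12": "999"}
sample_book_set = {"2517439", "113576"}


def count_overlap(lines, csv_to_book, liked):
    """Count, per user, how many liked books appear in their interactions."""
    counts = Counter()
    for line in lines:
        user_id, csv_id, _, rating, _ = line.strip().split(",")
        # .get() returns None for unknown ids (including the header line),
        # and None is never in the liked set, so those rows are skipped.
        if csv_to_book.get(csv_id) in liked:
            counts[user_id] += 1
    return counts


overlap_demo = count_overlap(sample_interactions, sample_mapping, sample_book_set)
print(dict(overlap_demo))  # {'0': 2, '1': 1}
```

User `2` does not appear in the result because the only book they interacted with is not in the liked set.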

In [10]:
# Keep only users who have read more than one fifth of our books.
filtered_overlap_users = {k for k, n in overlap_users.items() if n > my_books.shape[0] / 5}
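To make the threshold concrete, here is a small sketch with made-up numbers. The counts and the list size of 20 are assumptions chosen for illustration, not values from the real data.

```python
# Hypothetical overlap counts: user_id -> number of our books they also read.
overlap_demo = {"a": 1, "b": 6, "c": 12}
n_liked = 20  # assumed size of our liked-books list

# A user must share strictly more than one fifth of our books to be kept.
threshold = n_liked / 5
filtered_demo = {user for user, n in overlap_demo.items() if n > threshold}

print(threshold)              # 4.0
print(sorted(filtered_demo))  # ['b', 'c']
```

Raising the divisor keeps more users (a looser filter), while lowering it keeps fewer users who overlap with us more heavily.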

**3. Finding Similar User Book Ratings**

In [11]:
interactions_list = []

# Collect every rating made by the users who overlap with us.
with open("goodreads_interactions.csv") as f:
    for line in f:
        user_id, csv_id, _, rating, _ = line.strip().split(",")
        if user_id in filtered_overlap_users:
            book_id = csv_book_mapping[csv_id]
            interactions_list.append([user_id, book_id, rating])

**4. Creating A User/Book Matrix**

In [12]:
len(interactions_list)

5638701

In [13]:
interactions_list[0]

['282', '627206', '4']

**DataFrame**

In [14]:
interactions = pd.DataFrame(interactions_list, columns=["user_id", "book_id", "rating"])

In [15]:
interactions = pd.concat([my_books[["user_id", "book_id", "rating"]], interactions])

In [16]:
interactions

Unnamed: 0,user_id,book_id,rating
0,-1,2517439,5
1,-1,113576,5
2,-1,35100,5
3,-1,228221,5
5,-1,17662739,5
...,...,...,...
5638696,804100,475178,0
5638697,804100,186074,0
5638698,804100,153008,0
5638699,804100,45107,0


In [17]:
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["user_id"] = interactions["user_id"].astype(str)
interactions["rating"] = pd.to_numeric(interactions["rating"])

In [26]:
interactions["user_id"].unique()

array(['-1', '282', '874', ..., '442043', '712588', '804100'],
      dtype=object)

**Note: `cat.codes` returns the integer category code assigned to each value.**

In [22]:
interactions["user_index"] = interactions["user_id"].astype("category").cat.codes

**Map each user ID and book ID to a position (row and column index) in the matrix.**

In [23]:
interactions["user_index"].unique()

array([   0,  555, 1216, ..., 1054, 1143, 1183], dtype=int16)

In [24]:
interactions["book_index"] = interactions["book_id"].astype("category").cat.codes
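A small sketch of what `cat.codes` does, using a made-up column of IDs. Because the IDs are strings, the categories are sorted as text, which is why the codes need not follow the order in which users appear.

```python
import pandas as pd

# Hypothetical user IDs stored as strings, as in the notebook.
demo = pd.DataFrame({"user_id": ["-1", "282", "874", "282"]})
demo["user_index"] = demo["user_id"].astype("category").cat.codes
print(demo)

# Categories are sorted lexicographically, so "10" comes before "9".
text_order = pd.Series(["9", "10"]).astype("category").cat.codes.tolist()
print(text_order)  # [1, 0]
```

Repeated IDs always receive the same code, so each user maps to exactly one row and each book to exactly one column.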

**Note 1: We will use a sparse matrix.**

**Note 2: A sparse matrix stores only the entries that have a value; cells left blank take up no memory or storage space.**

In [27]:
from scipy.sparse import coo_matrix

ratings_mat_coo = coo_matrix((interactions["rating"], (interactions["user_index"], interactions["book_index"])))

In [28]:
ratings_mat_coo

<1259x802870 sparse matrix of type '<class 'numpy.int64'>'
	with 5638728 stored elements in COOrdinate format>
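The constructor call above takes the values plus a pair of row and column index arrays. A minimal sketch with three made-up ratings shows how the shape is inferred from the largest indices when no shape is passed.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical ratings with their (row, column) positions.
ratings = np.array([5, 4, 3])
rows = np.array([0, 1, 1])
cols = np.array([2, 0, 2])

demo_coo = coo_matrix((ratings, (rows, cols)))

print(demo_coo.shape)      # (2, 3): max row index + 1, max column index + 1
print(demo_coo.nnz)        # 3 stored elements
print(demo_coo.toarray())  # dense view, with zeros where nothing is stored
```

Only the three stored values occupy memory; the dense view is built here purely for inspection and would be impractical at the size of the real matrix.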

**Convert the COO matrix to a CSR matrix.**

**COO matrices are a little easier to create, which is why we built the matrix in COO format first; now we convert it to CSR format.**

In [30]:
ratings_mat = ratings_mat_coo.tocsr()
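One practical reason for the conversion is that CSR supports efficient row slicing, which the later `ratings_mat[my_index,:]` lookup relies on. A minimal sketch on a made-up 2x3 matrix:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical 2x3 ratings matrix built in COO format, then converted.
demo_csr = coo_matrix(
    (np.array([5, 4, 3]), (np.array([0, 1, 1]), np.array([2, 0, 2])))
).tocsr()

# Row slicing returns a 1xN sparse matrix for a single user.
first_row = demo_csr[0, :]
print(first_row.shape)      # (1, 3)
print(first_row.toarray())  # [[0 0 5]]
```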

**5. Finding Users Similar To Us**

In [31]:
interactions[interactions["user_id"] == "-1"]

Unnamed: 0,user_id,book_id,rating,user_index,book_index
0,-1,2517439,5,0,414880
1,-1,113576,5,0,38971
2,-1,35100,5,0,575858
3,-1,228221,5,0,356004
5,-1,17662739,5,0,214285
6,-1,356824,5,0,581743
7,-1,12125412,5,0,59763
8,-1,139069,5,0,124430
10,-1,76680,5,0,722098
11,-1,1898,5,0,276178


In [32]:
my_index = 0

**Use a cosine similarity measure to find users that are similar to us and have similar taste in books.**

**Cosine similarity measures how alike two rows of our matrix are, so we can see how similar each user is to us in terms of which books they read and how they rated them.**

In [33]:
from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(ratings_mat[my_index,:], ratings_mat).flatten()

In [34]:
similarity[0]

0.9999999999999999
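To see what the similarity score means, the cosine between two rating vectors can be worked out by hand as their dot product divided by the product of their lengths. The sketch below uses plain NumPy on made-up dense vectors; `cosine` is a hypothetical helper written for illustration and is not how scikit-learn is invoked in the notebook.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical ratings over three books; 0 means no stored rating.
me = np.array([5, 0, 5])
same_taste = np.array([5, 0, 5])
no_overlap = np.array([0, 4, 0])
partial = np.array([5, 5, 0])

print(cosine(me, same_taste))  # approximately 1.0
print(cosine(me, no_overlap))  # 0.0
print(cosine(me, partial))     # approximately 0.5
```

Comparing a row with itself gives a value of about 1, up to floating-point rounding, which matches the `similarity[0]` output above.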

**Next we find the indices of the users who are most similar to us using NumPy's `argpartition` function. Passing -15 gives us the indices of the 15 highest similarity scores, that is, the 15 users whose taste is closest to ours (in no particular order).**

In [35]:
import numpy as np

indices = np.argpartition(similarity, -15)[-15:]

In [36]:
indices

array([1188,  942,  218,  129,  496,  435, 1208,  795, 1213, 1210, 1143,
        321,  294,  862,    0])
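A short sketch of the same idea on a made-up score array. `argpartition` only guarantees that the last `k` positions hold the `k` largest values, not that they are sorted, so an extra `argsort` is needed if a ranking is wanted.

```python
import numpy as np

# Hypothetical similarity scores for six users.
scores = np.array([0.1, 0.9, 0.3, 1.0, 0.7, 0.2])

# Indices of the three largest scores, in unspecified order.
top3 = np.argpartition(scores, -3)[-3:]
print(sorted(top3.tolist()))  # [1, 3, 4]

# Optional: rank those three from most to least similar.
top3_sorted = top3[np.argsort(scores[top3])[::-1]]
print(top3_sorted.tolist())  # [3, 1, 4]
```

Partitioning avoids sorting the whole array, which matters little for about a thousand users but is a sensible habit for larger ones.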

**Find all of the rows in interactions where the user index is in our indices.**

In [37]:
similar_users = interactions[interactions["user_index"].isin(indices)].copy()

**Remove ourselves so that our own ratings do not feed into the recommendations.**

In [38]:
similar_users = similar_users[similar_users["user_id"]!="-1"]

In [39]:
similar_users

Unnamed: 0,user_id,book_id,rating,user_index,book_index
45312,4133,5359,3,942,632143
45313,4133,10464963,4,942,13492
45314,4133,3858,3,942,593622
45315,4133,11827808,4,942,51904
45316,4133,7913305,4,942,732465
...,...,...,...,...,...
5638521,712588,32388712,3,1143,543119
5638522,712588,16322,5,1143,183365
5638523,712588,860543,0,1143,759827
5638524,712588,853510,5,1143,756768


**6. Creating Book Recommendations**

**Count how many times each book appears among the similar users' ratings, and compute its mean rating.**

In [40]:
book_recs = similar_users.groupby("book_id").rating.agg(['count', 'mean'])

In [41]:
book_recs

book_id,count,mean
1,6,3.833333
100322,1,0.000000
100365,1,0.000000
10046142,1,0.000000
1005,3,0.000000
...,...,...
99561,2,2.500000
99610,1,3.000000
99664,1,4.000000
9969571,3,2.333333
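The aggregation above can be checked on a handful of made-up rows. Note that ratings of 0 are included in both the count and the mean, which is why some books in the output show a mean of 0.

```python
import pandas as pd

# Hypothetical ratings from similar users.
demo = pd.DataFrame(
    {
        "book_id": ["1", "1", "2", "2", "2", "3"],
        "rating": [5, 4, 0, 0, 3, 4],
    }
)

# One row per book: number of ratings and their average.
demo_recs = demo.groupby("book_id")["rating"].agg(["count", "mean"])
print(demo_recs)
```

Book `2` ends up with a count of 3 and a mean of 1.0, because its two zero ratings pull the average down.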


**Adding book titles.**

In [42]:
books_titles = pd.read_json("books_titles.json")
books_titles["book_id"] = books_titles["book_id"].astype(str)

**Merge our two datasets to add the book titles to our recommendations.**

In [43]:
book_recs = book_recs.merge(books_titles, how="inner", on="book_id")

In [44]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title
0,1,6,3.833333,Harry Potter and the Half-Blood Prince (Harry ...,1713866,https://www.goodreads.com/book/show/1.Harry_Po...,https://images.gr-assets.com/books/1361039191m...,harry potter and the halfblood prince harry po...
1,100322,1,0.000000,Assata: An Autobiography,11057,https://www.goodreads.com/book/show/100322.Assata,https://images.gr-assets.com/books/1328857268m...,assata an autobiography
2,100365,1,0.000000,The Mote in God's Eye,48736,https://www.goodreads.com/book/show/100365.The...,https://images.gr-assets.com/books/1399490037m...,the mote in gods eye
3,10046142,1,0.000000,Dancing in the Glory of Monsters: The Collapse...,2391,https://www.goodreads.com/book/show/10046142-d...,https://images.gr-assets.com/books/1328757755m...,dancing in the glory of monsters the collapse ...
4,1005,3,0.000000,Think and Grow Rich,87634,https://www.goodreads.com/book/show/1005.Think...,https://s.gr-assets.com/assets/nophoto/book/11...,think and grow rich
...,...,...,...,...,...,...,...,...
2843,99561,2,2.500000,Looking for Alaska,804587,https://www.goodreads.com/book/show/99561.Look...,https://images.gr-assets.com/books/1394798630m...,looking for alaska
2844,99610,1,3.000000,The Best Laid Plans,17434,https://www.goodreads.com/book/show/99610.The_...,https://images.gr-assets.com/books/1353374848m...,the best laid plans
2845,99664,1,4.000000,The Painted Veil,24606,https://www.goodreads.com/book/show/99664.The_...,https://images.gr-assets.com/books/1320421719m...,the painted veil
2846,9969571,3,2.333333,Ready Player One,376328,https://www.goodreads.com/book/show/9969571-re...,https://images.gr-assets.com/books/1500930947m...,ready player one
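A minimal sketch of the inner merge on made-up data. It illustrates two things the step depends on: the `book_id` keys must have the same type on both sides (hence the `astype(str)` call), and an inner join silently drops any recommended book that has no entry in the titles table. The titles used here are placeholders.

```python
import pandas as pd

# Hypothetical recommendations, indexed by book_id as after the groupby.
demo_recs = pd.DataFrame(
    {"count": [2, 1], "mean": [4.5, 3.0]},
    index=pd.Index(["1", "7"], name="book_id"),
)

# Hypothetical titles table with book_id stored as strings.
demo_titles = pd.DataFrame(
    {"book_id": ["1", "2"], "title": ["Book One", "Book Two"]}
)

merged = demo_recs.merge(demo_titles, how="inner", on="book_id")
print(merged)
```

Only book `1` survives the merge: book `7` has no title, and book `2` was never recommended.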
