## Build a recommendation system

In this notebook we are going to build a book recommendation system.

Achieving this will require two steps:

+ Find the users who like the same books as us.
+ Find the other books those users like, since we will probably like them too.

In [53]:
# list of liked books (book IDs stored as strings)
liked_books = ["681495", "380748", "1062354", "638824", "12"]
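The two steps listed above can be sketched on a tiny in-memory data set before touching the real files. Everything in this block (`toy_interactions`, `toy_liked`, the user and book labels) is made up for illustration. Unlike the cells further down, this sketch also drops the liked books from the candidates straight away.

```python
from collections import Counter

# Hypothetical toy interactions: (user_id, book_id, rating)
toy_interactions = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 5),
    ("u2", "A", 2), ("u2", "D", 5),
    ("u3", "B", 5), ("u3", "C", 4), ("u3", "E", 3),
]
toy_liked = {"A", "B"}

# Step 1: users who rated at least one of our liked books 4 or higher
similar_users = {u for u, b, r in toy_interactions if b in toy_liked and r >= 4}

# Step 2: count the other books those users interacted with
candidate_counts = Counter(
    b for u, b, r in toy_interactions
    if u in similar_users and b not in toy_liked
)

print(sorted(similar_users))           # ['u1', 'u3']
print(candidate_counts.most_common())  # [('C', 2), ('E', 1)]
```

User `u2` rated book `A` only 2, so that user is not counted as sharing our taste and book `D` never becomes a candidate.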

## Explore book rating data

Look at the first lines of the `book_id_map.csv` file. This file is available at [this URL](https://www.youtube.com/redirect?event=video_description&redir_token=QUFFLUhqbHcyMzFtQ0N3dUhDcXBtUVEzak8yOWdVRUFjQXxBQ3Jtc0ttQTBpMWRUTS0zT2xJNk14VDM3T0pVSFJxOFNIazBJZU1sS2NWNzJFVGt3VXlYZDFCNkpYSm1GaDVyVlpnbU5JTllZaGhObXlYRGt6bFZoZS1ZWEl6NEZqcVhkMFlHUGs1bngtQjhyeFFRTDFNMlRZSQ&q=https%3A%2F%2Fdrive.google.com%2Fuc%3Fid%3D1CHTAaNwyzvbi1TR08MJrJ03BxA266Yxr&v=x-alwfgQ-cY).

We first have to navigate to the directory containing the CSV file.

In [65]:
project_directory = "/content/drive/MyDrive/Colab Notebooks/book_recommandation_system"

In [66]:
cd '{project_directory}'

/content/drive/MyDrive/Colab Notebooks/book_recommandation_system


In [9]:
!head book_id_map.csv

book_id_csv,book_id
0,34684622
1,34536488
2,34017076
3,71730
4,30422361
5,33503613
6,33517540
7,34467031
8,6383669


Read the file as a stream, one line at a time, and build a dictionary that maps each `book_id_csv` value to its `book_id`.

In [45]:
csv_book_mapping = {}

with open('book_id_map.csv', 'r') as f:
  # iterating over the file object yields one line at a time
  for line in f:
    csv_id, book_id = line.strip().split(",")
    csv_book_mapping[csv_id] = book_id
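As a possible alternative, the same streaming read can be sketched with the standard-library `csv` module. The block below runs on an in-memory sample copied from the `head` output above, so it can be tried without the real file. The names `sample` and `sample_mapping` are introduced here for illustration only. Note one difference: the loop above also stores the header row as an entry (`"book_id_csv"` mapped to `"book_id"`), while `csv.DictReader` consumes the header.

```python
import csv
import io

# Sample rows copied from the `head` output above; a real run would pass
# open('book_id_map.csv', newline='') instead of this in-memory stand-in.
sample = io.StringIO(
    "book_id_csv,book_id\n"
    "0,34684622\n"
    "1,34536488\n"
    "2,34017076\n"
)

sample_mapping = {}
# DictReader reads the header row first, then yields one row at a time,
# so the whole file never has to sit in memory.
for row in csv.DictReader(sample):
    sample_mapping[row["book_id_csv"]] = row["book_id"]

print(sample_mapping)
```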

In [11]:
len(csv_book_mapping)

2360651

### Download the interaction data set

In [49]:
cd /content

/content


In [17]:
!gdown 1zmylV7XW2dfQVCLeg1LbllfQtHD2KUon

Downloading...
From: https://drive.google.com/uc?id=1zmylV7XW2dfQVCLeg1LbllfQtHD2KUon
To: /content/goodreads_interactions.csv
100% 4.32G/4.32G [00:21<00:00, 199MB/s]


Count the lines in the interactions file.

In [18]:
!wc -l goodreads_interactions.csv

228648343 goodreads_interactions.csv


Check the size of the interactions file.

In [19]:
!ls -lh | grep goodreads_interactions.csv

-rw-r--r-- 1 root root 4.1G Sep 25 19:36 goodreads_interactions.csv


### Finding the users who like the same books as us

Look at some lines of the interactions file.

In [20]:
!head goodreads_interactions.csv

user_id,book_id,is_read,rating,is_reviewed
0,948,1,5,0
0,947,1,5,1
0,946,1,5,0
0,945,1,5,0
0,944,1,5,0
0,943,1,5,0
0,942,1,5,0
0,941,1,5,0
0,940,1,5,0


In [54]:
overlap_users = set()

with open("goodreads_interactions.csv", 'r') as f:
  for line in f:
    user_id, csv_id, _, rating, _ = line.split(",")

    # this user is already known to share our taste
    if user_id in overlap_users:
      continue

    # skip the header row and any row whose rating is not an integer
    try:
      rating = int(rating)
    except ValueError:
      continue

    book_id = csv_book_mapping[csv_id]

    # keep users who rated one of our liked books 4 or higher
    if book_id in liked_books and rating >= 4:
      overlap_users.add(user_id)

In [55]:
len(overlap_users)

342

### Find what books those users liked

In [56]:
rec_lines = []

with open("goodreads_interactions.csv", 'r') as f:
  for line in f:
    user_id, csv_id, _, rating, _ = line.split(",")

    # collect every interaction of the users who share our taste
    if user_id in overlap_users:
      book_id = csv_book_mapping[csv_id]
      rec_lines.append([user_id, book_id, rating])

In [58]:
import pandas as pd

recs = pd.DataFrame(rec_lines, columns=["user_id", "book_id", "rating"])
recs["book_id"] = recs["book_id"].astype(str)

Look at the top recommendations: the 10 books that appear most often in the interactions of those users.

In [70]:
top_recs = recs["book_id"].value_counts().head(10)
top_recs = top_recs.index.values

In [67]:
cd '{project_directory}'

/content/drive/MyDrive/Colab Notebooks/book_recommandation_system


In [68]:
books_titles = pd.read_json("books_titles.json")
books_titles["book_id"] = books_titles["book_id"].astype(str)

In [69]:
books_titles.head()

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
0,7327624,"The Unschooled Wizard (Sun Wolf and Starhawk, ...",140,https://www.goodreads.com/book/show/7327624-th...,https://images.gr-assets.com/books/1304100136m...,the unschooled wizard sun wolf and starhawk 12
1,6066819,Best Friends Forever,51184,https://www.goodreads.com/book/show/6066819-be...,https://s.gr-assets.com/assets/nophoto/book/11...,best friends forever
2,287141,The Aeneid for Boys and Girls,46,https://www.goodreads.com/book/show/287141.The...,https://s.gr-assets.com/assets/nophoto/book/11...,the aeneid for boys and girls
3,6066812,All's Fairy in Love and War (Avalon: Web of Ma...,98,https://www.goodreads.com/book/show/6066812-al...,https://images.gr-assets.com/books/1316637798m...,alls fairy in love and war avalon web of magic 8
4,287149,The Devil's Notebook,986,https://www.goodreads.com/book/show/287149.The...,https://images.gr-assets.com/books/1328768789m...,the devils notebook


### Create initial book recommendations

In [76]:
books_titles[books_titles["book_id"].isin(top_recs)]

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
446972,5907,The Hobbit,2099680,https://www.goodreads.com/book/show/5907.The_H...,https://images.gr-assets.com/books/1372847500m...,the hobbit
463463,4671,The Great Gatsby,2758812,https://www.goodreads.com/book/show/4671.The_G...,https://images.gr-assets.com/books/1490528560m...,the great gatsby
477321,1,Harry Potter and the Half-Blood Prince (Harry ...,1713866,https://www.goodreads.com/book/show/1.Harry_Po...,https://images.gr-assets.com/books/1361039191m...,harry potter and the halfblood prince harry po...
615314,5470,1984,2023937,https://www.goodreads.com/book/show/5470.1984,https://images.gr-assets.com/books/1348990566m...,1984
878545,3,Harry Potter and the Sorcerer's Stone (Harry P...,4765497,https://www.goodreads.com/book/show/3.Harry_Po...,https://images.gr-assets.com/books/1474154022m...,harry potter and the sorcerers stone harry pot...
902688,136251,Harry Potter and the Deathly Hallows (Harry Po...,1784684,https://www.goodreads.com/book/show/136251.Har...,https://images.gr-assets.com/books/1474171184m...,harry potter and the deathly hallows harry pot...
995137,15881,Harry Potter and the Chamber of Secrets (Harry...,1821802,https://www.goodreads.com/book/show/15881.Harr...,https://images.gr-assets.com/books/1474169725m...,harry potter and the chamber of secrets harry ...
1155584,2,Harry Potter and the Order of the Phoenix (Har...,1766895,https://www.goodreads.com/book/show/2.Harry_Po...,https://images.gr-assets.com/books/1507396732m...,harry potter and the order of the phoenix harr...
1221893,12,The Ultimate Hitchhiker's Guide: Five Complete...,3471,https://www.goodreads.com/book/show/12.The_Ult...,https://images.gr-assets.com/books/1404658218m...,the ultimate hitchhikers guide five complete n...
1248586,6,Harry Potter and the Goblet of Fire (Harry Pot...,1792561,https://www.goodreads.com/book/show/6.Harry_Po...,https://images.gr-assets.com/books/1361482611m...,harry potter and the goblet of fire harry pott...


### Improving our book recommendations

So far our recommendation system only considers how often a book appears among the overlap users, so globally popular books dominate the list. It recommends books that almost anyone may like, not books that are specifically related to our liked books. To correct this, we have to make sure that the recommended books are popular among the users who rated our liked books highly, and not simply popular among all users.

In [78]:
all_recs = recs["book_id"].value_counts()
all_recs = all_recs.to_frame().reset_index()

In [80]:
all_recs.head(2)

Unnamed: 0,index,book_id
0,12,342
1,3,215


In [81]:
#rename our columns
all_recs.columns = ["book_id", "book_count"]

In [82]:
all_recs.head(2)

Unnamed: 0,book_id,book_count
0,12,342
1,3,215


In [83]:
all_recs = all_recs.merge(books_titles, how="inner", on="book_id")

In [84]:
all_recs

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title
0,12,342,The Ultimate Hitchhiker's Guide: Five Complete...,3471,https://www.goodreads.com/book/show/12.The_Ult...,https://images.gr-assets.com/books/1404658218m...,the ultimate hitchhikers guide five complete n...
1,3,215,Harry Potter and the Sorcerer's Stone (Harry P...,4765497,https://www.goodreads.com/book/show/3.Harry_Po...,https://images.gr-assets.com/books/1474154022m...,harry potter and the sorcerers stone harry pot...
2,5470,205,1984,2023937,https://www.goodreads.com/book/show/5470.1984,https://images.gr-assets.com/books/1348990566m...,1984
3,4671,201,The Great Gatsby,2758812,https://www.goodreads.com/book/show/4671.The_G...,https://images.gr-assets.com/books/1490528560m...,the great gatsby
4,136251,194,Harry Potter and the Deathly Hallows (Harry Po...,1784684,https://www.goodreads.com/book/show/136251.Har...,https://images.gr-assets.com/books/1474171184m...,harry potter and the deathly hallows harry pot...
...,...,...,...,...,...,...,...
140880,18730738,1,The Cheat Code for God Mode,42,https://www.goodreads.com/book/show/18730738-t...,https://images.gr-assets.com/books/1382998378m...,the cheat code for god mode
140881,20878116,1,"The Atlantis Gene (The Origin Mystery, #1)",153,https://www.goodreads.com/book/show/20878116-t...,https://images.gr-assets.com/books/1392938677m...,the atlantis gene the origin mystery 1
140882,18731053,1,Inspector Hobbes and the Curse (Unhuman #2),117,https://www.goodreads.com/book/show/18731053-i...,https://images.gr-assets.com/books/1383409930m...,inspector hobbes and the curse unhuman 2
140883,20622800,1,Gargoyle Knight (Gargoyle Knight #1),107,https://www.goodreads.com/book/show/20622800-g...,https://images.gr-assets.com/books/1390496626m...,gargoyle knight gargoyle knight 1


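Now we are going to create a score that we will use to rank the recommendations. This score should penalize books that are popular among all users, and reward books that are popular mainly among the users who liked the books we liked. Here `book_count` is the number of overlap users who interacted with a book, and `ratings` is the total number of ratings the book has.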

In [85]:
all_recs["score"] = all_recs["book_count"] * (all_recs["book_count"] / all_recs["ratings"])
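To see what this formula does, it can be evaluated by hand on two rows of the `all_recs` output above. The helper name `relevance_score` is introduced here for illustration only and is not part of the notebook.

```python
def relevance_score(book_count, ratings):
    """Same formula as the cell above: book_count * (book_count / ratings)."""
    return book_count * (book_count / ratings)

# (book_count, ratings) pairs copied from the all_recs output above
hitchhiker = relevance_score(342, 3471)          # book_id 12
sorcerers_stone = relevance_score(215, 4765497)  # book_id 3

print(round(hitchhiker, 6))
print(round(sorcerers_stone, 6))
```

Both books have a high `book_count`, but the second one has millions of ratings overall, so dividing by `ratings` pushes its score far below the first.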

In [86]:
all_recs.sort_values("score", ascending=False).head(10)

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title,score
0,12,342,The Ultimate Hitchhiker's Guide: Five Complete...,3471,https://www.goodreads.com/book/show/12.The_Ult...,https://images.gr-assets.com/books/1404658218m...,the ultimate hitchhikers guide five complete n...,33.697494
9185,10085735,5,"Ebon (Pegasus, #2)",34,https://www.goodreads.com/book/show/10085735-ebon,https://s.gr-assets.com/assets/nophoto/book/11...,ebon pegasus 2,0.735294
10384,15999003,5,"Inherit the Night (Gentleman Bastard, #7)",36,https://www.goodreads.com/book/show/15999003-i...,https://s.gr-assets.com/assets/nophoto/book/11...,inherit the night gentleman bastard 7,0.694444
7681,9531458,6,7 Things To Do Before You Die in Talgarth (Sha...,61,https://www.goodreads.com/book/show/9531458-7-...,https://s.gr-assets.com/assets/nophoto/book/11...,7 things to do before you die in talgarth shad...,0.590164
11150,11049978,4,"The Last Great Tortoise Race (Nursery Crime, #3)",28,https://www.goodreads.com/book/show/11049978-t...,https://s.gr-assets.com/assets/nophoto/book/11...,the last great tortoise race nursery crime 3,0.571429
20087,23244609,3,"Tricks for Free (InCryptid, #7)",16,https://www.goodreads.com/book/show/23244609-t...,https://images.gr-assets.com/books/1499885246m...,tricks for free incryptid 7,0.5625
21475,2837684,3,The Levitationist,20,https://www.goodreads.com/book/show/2837684-th...,https://s.gr-assets.com/assets/nophoto/book/11...,the levitationist,0.45
22783,11525109,3,The Pioneer Cookbook: Recipes for Today's Kitchen,21,https://www.goodreads.com/book/show/11525109-t...,https://images.gr-assets.com/books/1337188080m...,the pioneer cookbook recipes for todays kitchen,0.428571
13495,34386617,4,"The Night Masquerade (Binti, #3)",38,https://www.goodreads.com/book/show/34386617-t...,https://images.gr-assets.com/books/1495725402m...,the night masquerade binti 3,0.421053
16896,442480,3,King Arthur: Hero and Legend,22,https://www.goodreads.com/book/show/442480.Kin...,https://images.gr-assets.com/books/1368429800m...,king arthur hero and legend,0.409091


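Notice that most of the top-scoring books have very few ratings and were picked up by only a handful of the overlap users (a `book_count` between 3 and 6), so their scores are not reliable. We are going to remove those books by keeping only the ones with a `book_count` above 75.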

In [87]:
popular_recs = all_recs[all_recs["book_count"] > 75].sort_values("score", ascending=False)
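The effect of this threshold can be sketched in plain Python on three rows copied from the outputs in this notebook. The names `rows`, `MIN_BOOK_COUNT`, `scored`, and `kept` are illustrative only; the notebook itself does the filtering with pandas.

```python
# (title, book_count, ratings) rows copied from the outputs in this notebook
rows = [
    ("Ebon (Pegasus, #2)", 5, 34),
    ("The Long Dark Tea-Time of the Soul", 79, 60364),
    ("Anansi Boys", 96, 140090),
]

MIN_BOOK_COUNT = 75  # same threshold as the cell above

# compute the score, drop rows at or below the threshold, sort by score
scored = [(title, count, count * (count / ratings)) for title, count, ratings in rows]
kept = [row for row in scored if row[1] > MIN_BOOK_COUNT]
kept.sort(key=lambda row: row[2], reverse=True)

for title, count, score in kept:
    print(f"{title}: book_count={count}, score={score:.6f}")
```

The first row has the highest raw score of the three, but it rests on only 5 overlap users, so the threshold removes it.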

In [88]:
def make_clickable(val):
  # Render a URL as a clickable link in the styled dataframe, so we can open
  # the Goodreads page and check that a book is the one we are looking for
  return '<a target="_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
  # Render a cover image URL as a small inline image
  return '<img src="{}" width=50></img>'.format(val)

# avoid recommending books that are already in our liked list
popular_recs[~popular_recs["book_id"].isin(liked_books)].head(10).style.format({'url':make_clickable, 'cover_image':show_image})

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title,score
103,357,79,"The Long Dark Tea-Time of the Soul (Dirk Gently, #2)",60364,Goodreads,,the long dark teatime of the soul dirk gently 2,0.103389
80,365,87,Dirk Gently's Holistic Detective Agency (Dirk Gently #1),91027,Goodreads,,dirk gentlys holistic detective agency dirk gently 1,0.083151
63,2744,96,Anansi Boys,140090,Goodreads,,anansi boys,0.065786
41,7082,113,Do Androids Dream of Electric Sheep?,229370,Goodreads,,do androids dream of electric sheep,0.05567
28,12067,127,"Good Omens: The Nice and Accurate Prophecies of Agnes Nutter, Witch",307430,Goodreads,,good omens the nice and accurate prophecies of agnes nutter witch,0.052464
61,830,96,Snow Crash,179029,Goodreads,,snow crash,0.051478
33,14497,119,Neverwhere,275822,Goodreads,,neverwhere,0.051341
62,22328,96,"Neuromancer (Sprawl, #1)",184611,Goodreads,,neuromancer sprawl 1,0.049921
24,472331,139,Watchmen,406669,Goodreads,,watchmen,0.04751
101,14201,80,Jonathan Strange & Mr Norrell,136983,Goodreads,,jonathan strange mr norrell,0.046721
