**6. Exploring book rating data**

In [4]:
liked_books = ["17788401", "44676678", "36411676", "62058385", "62056935", "58583724", "62057697", "60156155", 
               "62022434", "32620332", "23546634", "60393672", "58346977", "27362503", "25912358"]
#Defines a list called liked_books that contains book IDs.

**Find all the users who liked the same books as us, then find all the books those users liked (assumption: they have similar taste to ours).**
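The two steps described above can be sketched on a tiny in-memory dataset before running them on the full file. The interaction tuples, the user and book labels, and the names `our_liked`, `similar_users` and `candidate_books` are all made up for illustration; only the rating threshold of 4 matches the cells that follow.

```python
# Toy interactions: (user_id, book_id, rating). All values are hypothetical.
interactions = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 5),
    ("u2", "A", 2), ("u2", "D", 5),
    ("u3", "B", 5), ("u3", "C", 3), ("u3", "E", 4),
]
our_liked = {"A", "B"}

# Step 1: users who rated one of our liked books 4 or higher.
similar_users = {
    user for user, book, rating in interactions
    if book in our_liked and rating >= 4
}

# Step 2: every book those users interacted with, whatever the rating.
candidate_books = [
    book for user, book, rating in interactions
    if user in similar_users
]

print(sorted(similar_users))
print(candidate_books)
```

User `u2` rated book A but only gave it 2, so that user is not treated as having similar taste and book D never becomes a candidate.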

In [5]:
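!head book_id_map.csv
#The error below has the shape of an HTTP HEAD request, so `head` likely resolved to a web client rather than the file utility; calling /usr/bin/head explicitly would likely print the first lines of the file instead.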

500 Can't connect to book_id_map.csv:80 (Bad hostname 'book_id_map.csv')
Content-Type: text/plain
Client-Date: Tue, 09 May 2023 01:47:02 GMT



In [6]:
csv_book_mapping = {}

with open("book_id_map.csv", "r") as f:
    # Each line is "csv_id,book_id": map the CSV's internal ID to the Goodreads book ID.
    for line in f:
        csv_id, book_id = line.strip().split(",")
        csv_book_mapping[csv_id] = book_id

In [7]:
csv_book_mapping['0']

'34684622'

In [8]:
!wc -l goodreads_interactions.csv
#Uses the wc -l command to count the number of lines in the "goodreads_interactions.csv" file.

 228648343 goodreads_interactions.csv


**Size of goodreads_interactions.csv**

In [9]:
!ls -lh | grep goodreads_interactions.csv
#Uses the ls -lh command to display the size of the "goodreads_interactions.csv" file.

-rw-r--r--@ 1 britneyagius  staff   4.0G Mar  8 13:47 goodreads_interactions.csv


**7. Finding users who like the same books as us**

**A set is a Python data structure in which every element is unique.**

**`if user_id in overlap_users: continue` means that if we have already added this user to the overlap_users set, we skip the rest of the loop body for that line.**
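A minimal sketch of both points, using made-up rows. The names `seen_users`, `processed` and `rows` are hypothetical and do not appear in the notebook.

```python
# A set silently drops duplicates.
unique_ids = set(["u1", "u1", "u2"])

seen_users = set()
processed = 0  # rows that get past the early `continue`

rows = [("u1", 5), ("u1", 4), ("u2", 3), ("u1", 5), ("u2", 5)]
for user_id, rating in rows:
    if user_id in seen_users:
        continue  # user already recorded, skip the remaining checks
    processed += 1
    if rating >= 4:
        seen_users.add(user_id)

print(unique_ids, seen_users, processed)
```

Only three of the five rows reach the rating check: once `u1` is in the set, its later rows are skipped. Membership tests on a set are fast on average, which matters when the loop runs over hundreds of millions of lines.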

In [10]:
#data = pd.read_csv("goodreads_interactions.csv")
#data.head(n=9)

In [11]:
overlap_users = set()

with open("goodreads_interactions.csv", 'r') as f:
    for line in f:
        user_id, csv_id, _, rating, _ = line.split(",")

        # This user is already in the set; no need to check the line.
        if user_id in overlap_users:
            continue

        # Skip lines whose rating is not an integer (such as a header row).
        try:
            rating = int(rating)
        except ValueError:
            continue

        book_id = csv_book_mapping[csv_id]

        # Keep users who gave one of our liked books a rating of 4 or higher.
        if book_id in liked_books and rating >= 4:
            overlap_users.add(user_id)
        
#Reads the "goodreads_interactions.csv" file and finds users who have rated the same books as in liked_books with a rating of 4 or higher

**8. Finding what books those users liked**

**rec_lines will only contain books that users who like the same books as us have read.**

**rec_lines will contain books we might want to read.**

In [12]:
rec_lines = []

with open("goodreads_interactions.csv", 'r') as f:
    for line in f:
        user_id, csv_id, _, rating, _ = line.split(",")

        # Keep every interaction that belongs to an overlapping user.
        if user_id in overlap_users:
            book_id = csv_book_mapping[csv_id]
            rec_lines.append([user_id, book_id, rating])

#Reads the "goodreads_interactions.csv" file again and collects every book the overlapping users have interacted with, together with the rating they gave (including 0).

In [13]:
import pandas as pd

recs = pd.DataFrame(rec_lines, columns=["user_id", "book_id", "rating"])
recs["book_id"] = recs["book_id"].astype(str)

In [14]:
recs

Unnamed: 0,user_id,book_id,rating
0,14,12881778,5
1,14,10818853,5
2,14,11857408,5
3,14,18806659,5
4,14,13448656,5
...,...,...,...
15082684,875876,8520610,0
15082685,875876,9013,0
15082686,875876,29751398,0
15082687,875876,29780253,5


**top_recs = recs["book_id"].value_counts() counts how many times each book ID occurs and sorts the result from most to least common, so our top recommendations are the book IDs that occur most frequently.**
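As a small illustration of what `value_counts()` returns, here is a sketch on a made-up column. The DataFrame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for recs["book_id"].
toy = pd.DataFrame({"book_id": ["A", "B", "A", "C", "A", "B"]})

counts = toy["book_id"].value_counts()

# The book IDs end up in the index, the counts in the values.
ids = counts.index.tolist()
values = counts.tolist()

print(ids)
print(values)
```

Because the IDs live in the index, any later lookup by book ID has to go through `counts.index` rather than through the Series values.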

In [15]:
top_recs = recs["book_id"].value_counts().head(10)

**To map a book ID to a title, read in the file books_titles.json.**

In [16]:
books_titles = pd.read_json("books_titles.json")
books_titles["book_id"] = books_titles["book_id"].astype(str)

In [17]:
books_titles.head()

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
0,7327624,"The Unschooled Wizard (Sun Wolf and Starhawk, ...",140,https://www.goodreads.com/book/show/7327624-th...,https://images.gr-assets.com/books/1304100136m...,the unschooled wizard sun wolf and starhawk 12
1,6066819,Best Friends Forever,51184,https://www.goodreads.com/book/show/6066819-be...,https://s.gr-assets.com/assets/nophoto/book/11...,best friends forever
2,287141,The Aeneid for Boys and Girls,46,https://www.goodreads.com/book/show/287141.The...,https://s.gr-assets.com/assets/nophoto/book/11...,the aeneid for boys and girls
3,6066812,All's Fairy in Love and War (Avalon: Web of Ma...,98,https://www.goodreads.com/book/show/6066812-al...,https://images.gr-assets.com/books/1316637798m...,alls fairy in love and war avalon web of magic 8
4,287149,The Devil's Notebook,986,https://www.goodreads.com/book/show/287149.The...,https://images.gr-assets.com/books/1328768789m...,the devils notebook


**9. Creating initial book recommendations**

**Finds all of the book titles where the book id is in the top recommendations.**

In [18]:
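#value_counts() puts the book IDs in the index and the counts in the values, so match on top_recs.index.
#Passing top_recs itself compares book IDs against the counts and yields an empty frame like the one recorded below.
books_titles[books_titles["book_id"].isin(top_recs.index)]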

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title


**10. Improving our book recommendations**

In [19]:
all_recs = recs["book_id"].value_counts()
#Counts how many times each book_id appears in recs, this time for every book rather than only the top 10.

In [20]:
all_recs = all_recs.to_frame().reset_index()
all_recs.columns = ["book_id", "book_count"]

In [21]:
all_recs.head(5)

Unnamed: 0,book_id,book_count
0,17788401,15131
1,27362503,13202
2,11870085,13003
3,2767052,12785
4,41865,12417


**An inner merge keeps only the rows whose book_id exists in both DataFrames; all other rows are dropped.**
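A minimal sketch of the inner merge on two made-up frames. The names `counts` and `titles` and their contents are hypothetical stand-ins for `all_recs` and `books_titles`.

```python
import pandas as pd

counts = pd.DataFrame({"book_id": ["A", "B", "C"], "book_count": [3, 2, 1]})
titles = pd.DataFrame({"book_id": ["A", "C", "D"], "title": ["Alpha", "Gamma", "Delta"]})

# Only book IDs present in both frames survive.
merged = counts.merge(titles, how="inner", on="book_id")

print(merged)
```

Book B has a count but no title and book D has a title but no count, so both are missing from `merged`.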

In [22]:
all_recs = all_recs.merge(books_titles, how="inner", on="book_id")

**A score will be created, which will be used to sort these recommendations.**

In [23]:
all_recs["score"] = all_recs["book_count"] * (all_recs["book_count"] / all_recs["ratings"])
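The score formula can be checked by hand. The sketch below wraps it in a hypothetical helper called `score` and plugs in `book_count` and `ratings` values taken from rows of this notebook's output tables. Reading `ratings` as the book's total number of ratings on Goodreads is an assumption.

```python
def score(book_count, ratings):
    # Same arithmetic as the cell above: book_count squared, divided by ratings.
    return book_count * (book_count / ratings)

# "The Comfort Zone": book_count 610, ratings 36.
niche = score(610, 36)

# "November 9": book_count 10465, ratings 56228.
popular = score(10465, 56228)

print(round(niche, 6), round(popular, 6))
```

Under that assumption, `book_count / ratings` is the share of a book's ratings that come from users like us, so a book with few total ratings can outscore a far more widely read one. That would explain why the following cells add a minimum `book_count` filter.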

In [24]:
all_recs.sort_values("score", ascending=False).head(10)

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title,score
3663,31681941,610,The Comfort Zone,36,https://www.goodreads.com/book/show/31681941-t...,https://s.gr-assets.com/assets/nophoto/book/11...,the comfort zone,10336.111111
2310,24909347,889,"Obsidio (The Illuminae Files, #3)",82,https://www.goodreads.com/book/show/24909347-o...,https://images.gr-assets.com/books/1501704611m...,obsidio the illuminae files 3,9638.060976
5772,32333338,415,Save the Date,19,https://www.goodreads.com/book/show/32333338-s...,https://images.gr-assets.com/books/1510611507m...,save the date,9064.473684
2837,29749098,748,"Catwoman: Soulstealer (DC Icons, #3)",73,https://www.goodreads.com/book/show/29749098-c...,https://s.gr-assets.com/assets/nophoto/book/11...,catwoman soulstealer dc icons 3,7664.438356
4365,34705684,527,"Cracked Kingdom (The Royals, #5)",39,https://www.goodreads.com/book/show/34705684-c...,https://s.gr-assets.com/assets/nophoto/book/11...,cracked kingdom the royals 5,7121.25641
3736,23865291,599,"All Closed Off (Rusk University, #4)",54,https://www.goodreads.com/book/show/23865291-a...,https://images.gr-assets.com/books/1490797036m...,all closed off rusk university 4,6644.462963
3980,26181560,566,"Forget You, Ethan",49,https://www.goodreads.com/book/show/26181560-f...,https://images.gr-assets.com/books/1504787872m...,forget you ethan,6537.877551
7791,27212267,309,"Darkness Embraced (Hades Hangmen, #7)",16,https://www.goodreads.com/book/show/27212267-d...,https://s.gr-assets.com/assets/nophoto/book/11...,darkness embraced hades hangmen 7,5967.5625
7198,27212250,334,"Crux Untamed (Hades Hangmen, #6)",19,https://www.goodreads.com/book/show/27212250-c...,https://s.gr-assets.com/assets/nophoto/book/11...,crux untamed hades hangmen 6,5871.368421
1503,31050237,1238,"Untitled (A Court of Thorns and Roses, #4)",264,https://www.goodreads.com/book/show/31050237-u...,https://s.gr-assets.com/assets/nophoto/book/11...,untitled a court of thorns and roses 4,5805.469697


In [25]:
all_recs[all_recs["book_count"] > 700].sort_values("score", ascending=False).head(10)

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title,score
2310,24909347,889,"Obsidio (The Illuminae Files, #3)",82,https://www.goodreads.com/book/show/24909347-o...,https://images.gr-assets.com/books/1501704611m...,obsidio the illuminae files 3,9638.060976
2837,29749098,748,"Catwoman: Soulstealer (DC Icons, #3)",73,https://www.goodreads.com/book/show/29749098-c...,https://s.gr-assets.com/assets/nophoto/book/11...,catwoman soulstealer dc icons 3,7664.438356
1503,31050237,1238,"Untitled (A Court of Thorns and Roses, #4)",264,https://www.goodreads.com/book/show/31050237-u...,https://s.gr-assets.com/assets/nophoto/book/11...,untitled a court of thorns and roses 4,5805.469697
2288,31050358,896,"Untitled (A Court of Thorns and Roses, #5)",146,https://www.goodreads.com/book/show/31050358-u...,https://s.gr-assets.com/assets/nophoto/book/11...,untitled a court of thorns and roses 5,5498.739726
2680,30809786,787,"A Reaper at the Gates (An Ember in the Ashes, #3)",124,https://www.goodreads.com/book/show/30809786-a...,https://images.gr-assets.com/books/1507476834m...,a reaper at the gates an ember in the ashes 3,4994.91129
638,33590260,2139,"Untitled (Throne of Glass, #7)",1190,https://www.goodreads.com/book/show/33590260-u...,https://images.gr-assets.com/books/1488914165m...,untitled throne of glass 7,3844.807563
2576,17699859,816,"Chain of Thorns (The Last Hours, #3)",185,https://www.goodreads.com/book/show/17699859-c...,https://s.gr-assets.com/assets/nophoto/book/11...,chain of thorns the last hours 3,3599.221622
2883,15776693,740,Options,154,https://www.goodreads.com/book/show/15776693-o...,https://images.gr-assets.com/books/1345320956m...,options,3555.844156
1989,17699853,996,"Chain of Gold (The Last Hours, #1)",287,https://www.goodreads.com/book/show/17699853-c...,https://s.gr-assets.com/assets/nophoto/book/11...,chain of gold the last hours 1,3456.501742
1415,13541056,1291,The Queen of Air and Darkness (The Dark Artifi...,516,https://www.goodreads.com/book/show/13541056-q...,https://images.gr-assets.com/books/1510447136m...,the queen of air and darkness the dark artific...,3230.001938


**Popular recommendations.**

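**The output below is ordered by score, our estimate of how likely you are to enjoy each book given your liked books.**

**Star ratings show the mean rating that the overlapping users gave each book (rows with a rating of 0 are included in the mean), truncated to a whole number of stars.**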
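The truncation can be seen in isolation with a short sketch. It repeats the notebook's `add_star_rating` helper and uses mean ratings taken from the output table further down. The variable names `three`, `two` and `missing` are only for illustration.

```python
import pandas as pd

def add_star_rating(val):
    # Missing means produce no stars.
    if pd.isna(val):
        return ""
    # int() truncates, so 2.76 becomes 2 stars rather than 3.
    return "&#11088;" * int(val)

three = add_star_rating(3.041949)   # mean rating of "November 9"
two = add_star_rating(2.760439)     # mean rating of "Maybe Not (Maybe, #1.5)"
missing = add_star_rating(float("nan"))

print(three.count("&#11088;"), two.count("&#11088;"), repr(missing))
```

Using `round(val)` instead of `int(val)` would be one alternative if rounding to the nearest star is preferred.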

In [26]:
popular_recs = all_recs[all_recs["book_count"] > 1000].sort_values("score", ascending=False)

In [27]:
def add_star_rating(val):
    if pd.isna(val):
        return ""
    rating = int(val)
    stars = "&#11088;" * rating
    return stars

recs["rating"] = recs["rating"].astype(float)
recs["star_rating"] = recs["rating"].apply(lambda x: add_star_rating(5) if x >= 4 else add_star_rating(x))

# Aggregate ratings for each book_id
grouped_ratings = recs.groupby("book_id")["rating"].mean().reset_index()

# Filter out books with a rating under 1
grouped_ratings = grouped_ratings[grouped_ratings["rating"] >= 1]

# Merge popular_recs and grouped_ratings DataFrames
popular_recs_with_stars = popular_recs.merge(grouped_ratings, how="left", on="book_id")

# Add estimate book rating column as star rating
popular_recs_with_stars["star_book_rating"] = popular_recs_with_stars["rating"].apply(add_star_rating)

# Filter out books with a rating under 1
popular_recs_filtered = popular_recs_with_stars[popular_recs_with_stars["rating"] >= 1]

# Formatting function for clickable URLs
def make_clickable(val):
    return '<a target="_blank" href="{}">Goodreads</a>'.format(val)

# Formatting function for displaying cover images
def show_image(val):
    return '<a href="{}"><img src="{}" width=50></img></a>'.format(val, val)

# Apply formatting functions and display the DataFrame
popular_recs_formatted = popular_recs_filtered[~popular_recs_filtered["book_id"].isin(liked_books)].head(10).copy()
popular_recs_formatted = popular_recs_formatted.style.format(
    {'url': make_clickable, 'cover_image': show_image}
)

# Add a table style intended to center the star-rating cells
popular_recs_formatted = popular_recs_formatted.set_table_styles([{
    'selector': 'td.col_star_rating',
    'props': [('text-align', 'center')]
}])

popular_recs_formatted

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title,score,rating,star_book_rating
5,25111004,10465,November 9,56228,Goodreads,,november 9,1947.716885,3.041949,⭐⭐⭐
6,22609310,11392,Confess,73455,Goodreads,,confess,1766.764196,3.065923,⭐⭐⭐
7,24378015,9483,"Never Never (Never Never, #1)",56002,Goodreads,,never never never never 1,1605.787097,2.535274,⭐⭐
8,33280872,4605,Without Merit,13852,Goodreads,,without merit,1530.89987,1.351792,⭐
10,29610595,4521,Too Late,16204,Goodreads,,too late,1261.382436,2.25282,⭐⭐
11,23587984,5652,"Maybe Not (Maybe, #1.5)",26935,Goodreads,,maybe not maybe 15,1186.007203,2.760439,⭐⭐
12,24422492,6136,"Never Never: Part Two (Never Never, #2)",32240,Goodreads,,never never part two never never 2,1167.819355,2.657269,⭐⭐
13,25454883,5369,"Never Never: Part Three (Never Never, #3)",27765,Goodreads,,never never part three never never 3,1038.219377,2.153287,⭐⭐
14,26721568,4244,The Problem with Forever,19211,Goodreads,,the problem with forever,937.563687,1.559849,⭐
15,23492533,3498,Swear on This Life,13083,Goodreads,,swear on this life,935.259803,1.801029,⭐
