# Book Recommendation Notebook
This notebook generates book recommendations by finding users who liked the same books we did (similar users) and then ranking the books those users interacted with most. It works from user interaction data and book ratings to produce personalized recommendations.

### Key Features:
- Data loading and cleaning
- Finding similar users from shared, highly rated books
- Extraction of top book recommendations based on user activity

In [None]:
# Continues from the step where we looked up the book_ids of the books we like:
# A Short History of Nearly Everything - 437143
# The Intelligent Investor - 106835
# The Alchemist - 25076674

In [112]:
# The book_id used in the goodreads_books JSON file differs from the id used in the
# 'goodreads_interactions' file (referred to here as csv_id).
# book_id_map.csv holds the mapping between csv_id and book_id.
import pandas as pd

liked_books = {'437143', '106835', '25076674'}  # a set gives fast membership checks
id2 = {}  # csv_id -> book_id
with open('book_id_map.csv') as f:
    for line in f:
        parts = line.strip().split(',')  # strip removes the trailing newline
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        csv_id, book_id = parts
        id2[csv_id] = book_id
# id2 is now a dictionary with csv_id as keys and book_id as values
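
The mapping step can also be sketched with Python's `csv` module. This is a minimal illustration, not the notebook's own code: the helper name `load_id_map`, the header row, and the sample rows are all assumptions, and an in-memory string stands in for `book_id_map.csv` so the snippet runs on its own.

```python
import csv
import io

# Made-up stand-in for book_id_map.csv; the header row and the values
# are assumptions for illustration only.
sample_map = io.StringIO(
    "book_id_csv,book_id\n"
    "0,34684622\n"
    "1,34536488\n"
    "2,437143\n"
)

def load_id_map(handle):
    """Return a dict mapping csv_id -> book_id from an open CSV handle."""
    id_map = {}
    for row in csv.reader(handle):
        # Skip the header, blank lines, and rows without exactly two fields.
        # Assumes csv_id values are plain non-negative integers.
        if len(row) != 2 or not row[0].strip().isdigit():
            continue
        csv_id, book_id = row
        id_map[csv_id.strip()] = book_id.strip()
    return id_map

id_map = load_id_map(sample_map)
print(id_map)
```

With the real file, the same helper could be called as `load_id_map(open('book_id_map.csv', newline=''))`, ideally inside a `with` block.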

        

In [113]:
# Process the goodreads_interactions file line by line, since the file is too large to load at once.
# The file records which books each user has interacted with and the rating they gave.
similar_users = set()

with open("goodreads_interactions.csv", 'r') as f:
    for line in f:
        values = line.strip().split(',')
        if len(values) != 5:
            continue  # skip malformed rows instead of reusing values from the previous row
        user_id, csv_id, _, rating, _ = values

        if user_id in similar_users:
            continue  # this user is already known to be similar

        try:
            rating = int(rating)
        except ValueError:
            continue  # skips the header row and any non-numeric rating

        book_id = id2.get(csv_id)  # None if the csv_id has no mapping

        # Keep users who gave one of our liked books a rating of 4 or higher
        if book_id in liked_books and rating >= 4:
            similar_users.add(user_id)
# similar_users now holds every user who rated at least one of our liked books 4 or higher

* The cell above reads each line of the 'goodreads_interactions' file and splits it into the variables user_id, csv_id, and rating (the other two fields are ignored).
* It converts the csv_id on each line into the corresponding book_id and checks whether that book is one of our liked books.
* If it is, and the user rated the book 4 or higher, we treat the user's taste as similar to ours and add the user id to the set similar_users.
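
The steps above can be sketched as a small function that runs on in-memory lines instead of the full file. The function name `find_similar_users`, the `min_rating` parameter, and the toy rows and mapping are assumptions made for illustration; only the three liked book_ids come from the notebook.

```python
def find_similar_users(lines, id_map, liked_books, min_rating=4):
    """Return the set of user_ids who rated a liked book at least min_rating."""
    similar = set()
    for line in lines:
        values = line.strip().split(",")
        if len(values) != 5:
            continue  # malformed row
        user_id, csv_id, _, rating, _ = values
        if user_id in similar:
            continue  # already recorded
        try:
            rating = int(rating)
        except ValueError:
            continue  # header row or non-numeric rating
        if id_map.get(csv_id) in liked_books and rating >= min_rating:
            similar.add(user_id)
    return similar

# Toy interaction rows (made up): user_id, csv_id, is_read, rating, is_reviewed
toy_lines = [
    "user_id,book_id,is_read,rating,is_reviewed",
    "0,10,1,5,0",  # user 0 rated a liked book 5 -> similar
    "1,10,1,3,0",  # user 1 rated the same book 3 -> below the threshold
    "2,11,1,4,1",  # user 2 rated a book we did not list -> ignored
    "3,12,1,4,0",  # user 3 rated another liked book 4 -> similar
    "bad,line",    # malformed row -> skipped
]
toy_map = {"10": "437143", "11": "999", "12": "106835"}  # hypothetical csv_id -> book_id
liked = {"437143", "106835", "25076674"}

similar = find_similar_users(toy_lines, toy_map, liked)
print(sorted(similar))
```

Because an open file is itself an iterable of lines, the same function could be given the file handle for `goodreads_interactions.csv` directly.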

In [114]:
# Collect every book the similar users have interacted with
recommended_books = []
with open('goodreads_interactions.csv') as f:
    for line in f:
        values = line.strip().split(',')
        if len(values) != 5:
            continue  # skip malformed rows
        user_id, csv_id, _, rating, _ = values
        if user_id in similar_users:
            book_id = id2.get(csv_id)  # map the csv_id to its book_id (id2[csv_id] = book_id)
            if book_id is not None:
                recommended_books.append([user_id, book_id, rating])
# recommended_books holds [user_id, book_id, rating] for every interaction by a similar user.
# It is not filtered by rating, and rating is kept as a string.
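
As an alternative to reading the file by hand, the same filtering could be done with pandas in chunks, which also keeps memory use bounded. This is a rough sketch under stated assumptions: the column names in the toy CSV, the toy mapping, and the tiny `chunksize` are illustrative, and an in-memory string stands in for `goodreads_interactions.csv`.

```python
import io
import pandas as pd

# Made-up stand-in for goodreads_interactions.csv (column names are assumed)
toy_csv = io.StringIO(
    "user_id,book_id,is_read,rating,is_reviewed\n"
    "0,10,1,5,0\n"
    "1,10,1,3,0\n"
    "0,11,1,4,1\n"
    "3,12,0,0,0\n"
)
toy_similar_users = {"0", "3"}
toy_map = {"10": "437143", "11": "999", "12": "106835"}  # hypothetical csv_id -> book_id

kept_chunks = []
# dtype=str keeps ids as strings so they compare equal to the set members
for chunk in pd.read_csv(toy_csv, dtype=str, chunksize=2):
    kept = chunk[chunk["user_id"].isin(toy_similar_users)].copy()
    kept["book_id"] = kept["book_id"].map(toy_map)  # csv_id -> book_id
    kept_chunks.append(kept[["user_id", "book_id", "rating"]])

toy_recommendations = pd.concat(kept_chunks, ignore_index=True)
print(toy_recommendations)
```

On the real file a much larger `chunksize` would be appropriate; the value 2 is used here only so that the toy data spans more than one chunk.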

In [115]:
import pandas as pd
recommendations = pd.DataFrame(recommended_books, columns = ['user_id', 'book_id', 'rating'])
# Build a DataFrame called recommendations with one row per interaction by a similar user (all ratings included)

In [116]:
top_recommendations = recommendations['book_id'].value_counts().head(10)
# Top 10 book_ids by how many interaction rows from similar users they appear in
# Approach: a book ranks higher the more similar users have interacted with it; ratings are not used at this step
top_recommendations = top_recommendations.index.values
# Keep only the index of the top_recommendations Series, i.e. the book_id values without the counts
books_titles = pd.read_csv("cleaned_books.csv")
books_titles["book_id"] = books_titles["Id"].astype(str)  # match the string book_ids used above
books_titles.head()


Unnamed: 0.1,Unnamed: 0,Cover,Id,Rating,Title,book_id
0,0,https://s.gr-assets.com/assets/nophoto/book/11...,1333909,10,good harbor,1333909
1,1,https://images.gr-assets.com/books/1304100136m...,7327624,140,the unschooled wizard sun wolf and starhawk 12,7327624
2,2,https://s.gr-assets.com/assets/nophoto/book/11...,6066819,51184,best friends forever,6066819
3,3,https://images.gr-assets.com/books/1413219371m...,287140,15,runic astrology starcraft and timekeeping in t...,287140
4,4,https://s.gr-assets.com/assets/nophoto/book/11...,287141,46,the aeneid for boys and girls,287141


In [121]:
books_titles[books_titles["book_id"].isin(top_recommendations)]
# Books that appear most often among users who liked the books we like
# This list is likely to be dominated by books that are popular with everyone
# Less common books that similar users have read may be missing from it

Unnamed: 0.1,Unnamed: 0,Cover,Id,Rating,Title,book_id
179347,179347,https://images.gr-assets.com/books/1409602421m...,106835,32476,the intelligent investor,106835
386663,386663,https://images.gr-assets.com/books/1447303603m...,2767052,4899965,the hunger games the hunger games 1,2767052
546297,546297,https://images.gr-assets.com/books/1398034300m...,5107,2086945,the catcher in the rye,5107
608482,608482,https://images.gr-assets.com/books/1372847500m...,5907,2099680,the hobbit,5907
630937,630937,https://images.gr-assets.com/books/1490528560m...,4671,2758812,the great gatsby,4671
838525,838525,https://images.gr-assets.com/books/1348990566m...,5470,2023937,1984,5470
1048210,1048210,https://images.gr-assets.com/books/1327909092m...,1202,529274,freakonomics a rogue economist explores the hi...,1202
1048745,1048745,https://images.gr-assets.com/books/1424037542m...,7613,1928931,animal farm,7613
1077226,1077226,https://images.gr-assets.com/books/1361975680m...,2657,3255518,to kill a mockingbird,2657
1196415,1196415,https://images.gr-assets.com/books/1474154022m...,3,4765497,harry potter and the sorcerers stone harry pot...,3


In [125]:
# Count how many times each book appears in recommendations (top 10 only)
recs_count=recommendations['book_id'].value_counts().head(10)
recs_count = recs_count.to_frame().reset_index()
recs_count.columns = ['book_id', 'Recommendation Count']
recs_count

Unnamed: 0,book_id,Recommendation Count
0,106835,605
1,5470,573
2,2657,525
3,7613,490
4,4671,482
5,5107,456
6,3,445
7,5907,406
8,2767052,403
9,1202,400
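
The counting step can be illustrated on a toy Series. This sketch uses `rename_axis` and `reset_index(name=...)` to name both columns explicitly instead of assigning `columns` afterwards; the book_ids are made up for illustration.

```python
import pandas as pd

# Made-up book_ids, one entry per interaction by a similar user
toy_ids = pd.Series(["a", "b", "a", "c", "a", "b"], name="book_id")

counts = (
    toy_ids.value_counts()                     # occurrences per book_id, most frequent first
    .rename_axis("book_id")                    # name the index so reset_index yields 'book_id'
    .reset_index(name="Recommendation Count")  # name the count column
)
print(counts)
```

Naming the columns this way avoids relying on the default column names that `value_counts` produces.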


In [131]:
recommendation_df = recs_count.merge(books_titles, on = 'book_id', how = 'inner')
recommendation_df

Unnamed: 0.1,book_id,Recommendation Count,Unnamed: 0,Cover,Id,Rating,Title
0,106835,605,179347,https://images.gr-assets.com/books/1409602421m...,106835,32476,the intelligent investor
1,5470,573,838525,https://images.gr-assets.com/books/1348990566m...,5470,2023937,1984
2,2657,525,1077226,https://images.gr-assets.com/books/1361975680m...,2657,3255518,to kill a mockingbird
3,7613,490,1048745,https://images.gr-assets.com/books/1424037542m...,7613,1928931,animal farm
4,4671,482,630937,https://images.gr-assets.com/books/1490528560m...,4671,2758812,the great gatsby
5,5107,456,546297,https://images.gr-assets.com/books/1398034300m...,5107,2086945,the catcher in the rye
6,3,445,1196415,https://images.gr-assets.com/books/1474154022m...,3,4765497,harry potter and the sorcerers stone harry pot...
7,5907,406,608482,https://images.gr-assets.com/books/1372847500m...,5907,2099680,the hobbit
8,2767052,403,386663,https://images.gr-assets.com/books/1447303603m...,2767052,4899965,the hunger games the hunger games 1
9,1202,400,1048210,https://images.gr-assets.com/books/1327909092m...,1202,529274,freakonomics a rogue economist explores the hi...


In [137]:
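# Find books that are popular among similar users but not generally popular among all users.
# score = count * (count / Rating): the count is weighted by how large it is relative to the book's Rating value,
# which appears to be the book's overall number of ratings, so widely rated books are pushed down.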
recommendation_df["score"] = recommendation_df["Recommendation Count"] * (recommendation_df["Recommendation Count"] / recommendation_df["Rating"])
recommendation_df.sort_values('score', ascending = False).head(10)

Unnamed: 0.1,book_id,Recommendation Count,Unnamed: 0,Cover,Id,Rating,Title,score
0,106835,605,179347,https://images.gr-assets.com/books/1409602421m...,106835,32476,the intelligent investor,11.270631
9,1202,400,1048210,https://images.gr-assets.com/books/1327909092m...,1202,529274,freakonomics a rogue economist explores the hi...,0.302301
1,5470,573,838525,https://images.gr-assets.com/books/1348990566m...,5470,2023937,1984,0.162223
3,7613,490,1048745,https://images.gr-assets.com/books/1424037542m...,7613,1928931,animal farm,0.124473
5,5107,456,546297,https://images.gr-assets.com/books/1398034300m...,5107,2086945,the catcher in the rye,0.099637
2,2657,525,1077226,https://images.gr-assets.com/books/1361975680m...,2657,3255518,to kill a mockingbird,0.084664
4,4671,482,630937,https://images.gr-assets.com/books/1490528560m...,4671,2758812,the great gatsby,0.084212
7,5907,406,608482,https://images.gr-assets.com/books/1372847500m...,5907,2099680,the hobbit,0.078505
6,3,445,1196415,https://images.gr-assets.com/books/1474154022m...,3,4765497,harry potter and the sorcerers stone harry pot...,0.041554
8,2767052,403,386663,https://images.gr-assets.com/books/1447303603m...,2767052,4899965,the hunger games the hunger games 1,0.033145
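
To make the score concrete, the sketch below recomputes it for three rows copied from the table above and then drops books we already listed as liked. The exclusion step is an optional extra, not something the notebook does, and the DataFrame name `toy` is hypothetical.

```python
import pandas as pd

# Three rows copied from the output above
toy = pd.DataFrame({
    "book_id": ["106835", "5470", "1202"],
    "Recommendation Count": [605, 573, 400],
    "Rating": [32476, 2023937, 529274],
    "Title": ["the intelligent investor", "1984", "freakonomics"],
})

# score = count * (count / Rating), written as count squared over Rating
toy["score"] = toy["Recommendation Count"] ** 2 / toy["Rating"]

# Optional: hide books we already said we like (assumption, not in the notebook)
liked_books = {"437143", "106835", "25076674"}
unseen = toy[~toy["book_id"].isin(liked_books)].sort_values("score", ascending=False)
print(unseen[["Title", "score"]])
```

The Intelligent Investor scores highest partly because it is one of the three books used to pick similar users, so filtering out already-liked titles may give a more useful ranking.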


### Conclusion:
The recommendation approach in this notebook identifies candidate books by finding users who rated our liked books 4 or higher and then ranking the books those users interacted with. Weighting each book's count against its overall Rating value helps surface titles that are popular among similar users rather than simply popular with everyone, giving a more personalized way to discover new books.
