# BOOK RECOMMENDER

After defining the optimal number of clusters in the previous kernel and assigning one cluster to each of the books in the dataset, I am now going to develop a function to build a book recommender.

The function takes as input a 'book_searched' and the dataset. If the searched book is in the dataset, it recommends the most similar book - using cosine similarity - from the same cluster.

In [1]:
import pandas as pd
import numpy as np
import unidecode
import random

from sklearn.cluster import KMeans
from sklearn import cluster, datasets
from IPython.display import Markdown, display 
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
gr_data = pd.read_csv('./2. Clean_df/gr_data_FOR_CLUSTERING.csv')
gr_data = gr_data.drop(['Unnamed: 0'],axis=1)
#display(gr_data.head(3),gr_data.shape)

# FUNCTION

In [16]:
def book_recommender(book_searched, gr_data):
    
    """This function takes in a book title and a dataframe of books,
    and recommends a book from the same cluster as the input book.
    It first normalizes the input title and finds matching books.
    It then identifies the input book's cluster and retrieves all books
    in that cluster. The most similar book to the input book is 
    recommended using cosine similarity. The function displays a random 
    recommendation message."""
    
    # Used libraries
    import random
    import unidecode
    import numpy as np
    from IPython.display import display, Markdown
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics.pairwise import cosine_similarity    
    
    """-------------------------- Standardize 'title' input -----------------------------"""
    
    # Normalized name of the book
    book_search_norm = unidecode.unidecode(book_searched.lower().strip())
    # Check if the book is in gr_data:
    book_matches = gr_data[gr_data['normalized_title'].str.contains(book_search_norm)]
    
    
    """----- Find match in book_matches dataframe and works on proper 'title' name ------"""

    # Set maximum number of options to display
    max_options = 20
    # If there is at least 1 coincedence or more
    if len(book_matches) > 0:
        #If there is just 1 match
        if len(book_matches) == 1:
            book_searched = book_matches.iloc[0]['title']
        #If there is more than 1 coincence
        else:
            print(f"Found {len(book_matches)} matches:")
            # Display list of options
            for i, row in enumerate(book_matches.head(max_options).itertuples(), start=1):
                print(f"{i}: {row.title} by {row.author}")
            while True:
                # Ask user to select choice (numerical range)
                user_choice = input(f"\nPlease select an option (1-{min(len(book_matches), max_options)}): ")
                try:
                    user_choice = int(user_choice)
                    if user_choice < 1 or user_choice > min(len(book_matches), max_options):
                        raise ValueError
                    break
                except ValueError:
                    print("Invalid input. Please enter a number between 1 and the number of options.")
        
            # Set the new book_searched variable to the selected book title
            book_searched = book_matches.iloc[user_choice - 1]['title']
    else:
        # Set book_searched to None if no matches are found
        print("No matches found.")
        book_searched = None 
        
    """----------------Identify cluster and get books of the same custer--------------------"""
    
    # Look for the book_searched corresponding 'cluster' value
    if book_searched is not None:
        book_searched_cluster = gr_data.loc[gr_data['title'].str.contains(book_searched), 'cluster'].values[0]
    else:
        book_searched_cluster = None
        
    # Get all books in gr_data that shares book_searched_cluster
    matching_cluster = gr_data.loc[gr_data['cluster'] == book_searched_cluster]
    
    
    """------------------ Most similar book in cluster - Cosine Similarity -----------------"""
    
    # Select only numerical columns
    features = matching_cluster.select_dtypes(include=np.number)
    # Drop 'cluster' column
    features = features.drop(['cluster'],axis=1)  # Remove the 'cluster' column from the list
    # Most similar book (nested function)
    #recommended_book = find_most_similar_book(book_searched, features)
    
    # Scale numerical features
    scaler = StandardScaler()
    features_scaled = scaler.fit_transform(features)
    features_scaled

    # Find index of recommended book within same_cluster_books
    recommended_book_idx = matching_cluster[matching_cluster['title'].str.contains(book_searched)].index[0]

    # Compute cosine similarity between books in same cluster
    sims = cosine_similarity(features_scaled)

    # Get the index of the most similar book
    most_similar_book_idx = np.argsort(sims[recommended_book_idx])[::-1][1]

    # Get the most similar book from same_cluster_books
    recommended_book = matching_cluster.iloc[most_similar_book_idx]
    
    """ ----------------------- Random messages to be displayed ----------------------------"""
    
    # Recommendation message
    message = random.choice([
            f"Based on your selection of '**{book_searched}**', we recommend the *hot* book '**{recommended_book['title']}**' by **{recommended_book['author']}**.",
            f"Looks like you've got great taste in books! We recommend the *sizzling* read '**{recommended_book['title']}**' by **{recommended_book['author']}** based on your choice of '**{book_searched}**'.",
            f"You've hit the jackpot with your choice of '**{book_searched}**'! Our recommendation for you is the *hyped* book '**{recommended_book['title']}**' by **{recommended_book['author']}**.",
            f"Based on your selection of '**{book_searched}**', we think you'll love the *popular* book '**{recommended_book['title']}**' by **{recommended_book['author']}**'.",
            f"Based on your selection of '**{book_searched}**', we suggest the *trending* novel '**{recommended_book['title']}**' by **{recommended_book['author']}**.'",
            f"Based on your selection of '**{book_searched}**', you might also like the *bestselling* book '**{recommended_book['title']}**' by **{recommended_book['author']}**.'",
            f"Looks like you're a big fan of **{book_searched}**! Our recommendation algorithm thinks you'll enjoy the hilarious and heartwarming **{recommended_book['title']}** by **{recommended_book['author']}**. Get ready to laugh and cry!",
            f"Oh joy, another person searching for **{book_searched}**. Haven't we had enough of that book already? But since you insist, our recommendation algorithm suggests the timeless classic **{recommended_book['title']}** by **{recommended_book['author']}**. Just kidding, it's probably just more of the same.",
            f"Based on your selection of '**{book_searched}**', we recommend the *hot* book '**{recommended_book['title']}**' by **{recommended_book['author']}**."
        ])
        
    display(Markdown(message))
     
    """ -------------------------- More info about the book recommended -------------------"""
    
    # Display book description
    display(Markdown(f"**Description:** {recommended_book['description']}"))

    # Display Goodreads rating
    display(Markdown(f"**GoodReads users rating:** {recommended_book['rating']}"))
        
    # URL link to search for recommended book on Goodreads
    url_title = '+'.join(recommended_book['title'].split())
    url = f"https://www.goodreads.com/search?q={url_title}"
    display(Markdown(f"Search for the recommended book on Goodreads: {url}"))
        
    # If book search is NOT IN gr_data
    #else:
    #    display(Markdown(f"Sorry, we couldn't find any books matching '**{book_searched.capitalize()}**'. Please try another search term."))


    

# Book Recommender

In [18]:
book_searched = input("Enter the name of the book you are interested in: ")
book = book_recommender(book_searched, gr_data)

Enter the name of the book you are interested in: Pedro paramo


Looks like you've got great taste in books! We recommend the *sizzling* read '**Twice-Told Tales**' by **Nathaniel Hawthorne** based on your choice of '**Pedro Páramo**'.

**Description:** The author of such short-fiction masterpieces as Young Goodman Brown and The Minister's Black Veil, Nathaniel Hawthorne is regarded as one of the most significant American writers of the nineteenth century. This volume collects many of his most famous short works and is a fitting compendium of his literary achievements for newcomers or longtime Hawthorne fans alike.

**GoodReads users rating:** 3.9

Search for the recommended book on Goodreads: https://www.goodreads.com/search?q=Twice-Told+Tales