<div id="container" style="position:relative;">
<div style="float:left"><h1> Capstone Project Modeling: Funk SVD  </h1></div>
<div style="position:relative; float:right"><img style="height:65px" src ="https://drive.google.com/uc?export=view&id=1EnB0x-fdqMp6I5iMoEBBEuxB_s7AmE2k" />
</div>
</div>

<br>
<br>
<br>


### Camilo Salazar <br> BrainStation <br> November 10, 2023

## Introduction

This notebook is part of a capstone on hybrid book recommendation systems, which combine collaborative filtering with content-based methods so that recommendations draw on both user behavior and book attributes. Here we focus on the collaborative-filtering side: a Funk SVD model is trained on user ratings, and the latent factors it learns for each book are compared with cosine similarity to recommend similar titles.


In [1]:
# import useful libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates
import random

# import Surprise to run the model
from surprise import Dataset
from surprise.reader import Reader
from surprise.prediction_algorithms.matrix_factorization import SVD as FunkSVD
from surprise.model_selection import cross_validate, train_test_split, GridSearchCV
from surprise import accuracy

# Filter warnings
from warnings import filterwarnings
filterwarnings('ignore')

In [2]:
# load all dataframes
book_df = pd.read_csv('data/books.csv')
tags_df = pd.read_csv('data/tags.csv')
book_tags_df = pd.read_csv('data/book_tags.csv')
ratings_df = pd.read_csv('data/ratings.csv')

## Collaborative filtering recommender

In [4]:
# define the rating scale, load the ratings, and hold out 10% as a test set
reader = Reader(rating_scale=(1, 5))
my_data = Dataset.load_from_df(ratings_df, reader)
trainset, testset = train_test_split(my_data, test_size=.10, random_state = 42)

In [6]:
final_model = FunkSVD(n_factors=20,
                      n_epochs=20,
                      lr_all=0.0075,
                      biased=False,
                      random_state=42)

final_model.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x2918e6d4dd0>

In [7]:
book_latent = final_model.qi
book_latent

array([[ 0.3369531 ,  0.00643481, -1.02027065, ...,  0.54481097,
        -0.37628551, -0.36398061],
       [-0.2157467 ,  0.54330917, -1.14447225, ...,  0.11042157,
        -0.18798275,  0.00404867],
       [ 0.57734943, -0.28976767, -1.45857707, ...,  0.37267011,
        -0.31876645, -0.51049235],
       ...,
       [ 0.03336135,  0.37107305, -1.20623341, ...,  0.20151792,
        -0.69288273, -0.24857848],
       [ 0.04705083,  0.33119691, -1.07257293, ...,  0.16431601,
        -0.71292374,  0.32603291],
       [-0.11239458,  0.39885339, -1.37572287, ..., -0.05530494,
        -0.76821062,  0.29739487]])
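
Each row of `qi` is one book's latent vector. As a rough illustration of how such vectors are learned, the sketch below runs an unbiased matrix factorization with stochastic gradient descent in plain Python. It is a toy under stated assumptions, not Surprise's implementation: the function names, the tiny ratings list, and the hyperparameter values are all hypothetical.

```python
import random

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def rmse(ratings, P, Q):
    """Root mean squared error of the predictions dot(P[u], Q[i])."""
    squared_error = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (squared_error / len(ratings)) ** 0.5

def funk_svd(ratings, n_users, n_items, n_factors=2, n_epochs=200,
             lr=0.01, reg=0.02, seed=42):
    """Toy unbiased factorization: predicted rating = dot(P[u], Q[i])."""
    rng = random.Random(seed)
    # Small random starting factors for every user and item
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Hypothetical (user, item, rating) triples: two groups with opposite tastes
ratings = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5), (1, 3, 1),
           (2, 2, 5), (2, 3, 4), (2, 0, 1), (3, 2, 4), (3, 3, 5), (3, 1, 2)]

P0, Q0 = funk_svd(ratings, n_users=4, n_items=4, n_epochs=0)  # untrained
P, Q = funk_svd(ratings, n_users=4, n_items=4)                # trained
rmse_before = rmse(ratings, P0, Q0)
rmse_after = rmse(ratings, P, Q)
print(f"training RMSE: {rmse_before:.3f} -> {rmse_after:.3f}")
```

Here `Q` plays the role of `qi`: one row per item, one column per latent factor. Because `biased = False` in the model above, no global mean or per-user and per-item bias terms are involved, which is why the sketch predicts with the dot product alone.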

In [8]:
book_simind = pd.DataFrame(
    list(trainset._raw2inner_id_items.items()),
    columns=['book_id', 'Vindex'],
).set_index('book_id', drop=True)
book_simind

book_id,Vindex
2757,0
134,1
1463,2
71,3
3339,4
...,...
7636,9995
9080,9996
9980,9997
1935,9998
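
The `Vindex` column is Surprise's inner item id, which is the row of `final_model.qi` that holds a book's factors. The sketch below shows the idea of a two-way mapping with plain dictionaries, using the first five rows of the table above; it assumes inner ids are handed out in order of first appearance in the training data. Surprise's `Trainset` also exposes `to_inner_iid` and `to_raw_iid` for these lookups, which may be preferable to reading the private `_raw2inner_id_items` attribute.

```python
# Raw book ids in the order they are assumed to first appear in the trainset
# (the first five rows of the table above).
raw_ids_in_order = [2757, 134, 1463, 71, 3339]

# raw id -> inner id: each new raw id gets the next free integer
raw_to_inner = {}
for raw_id in raw_ids_in_order:
    raw_to_inner.setdefault(raw_id, len(raw_to_inner))

# inner id -> raw id: the inverse lookup, also a constant-time dictionary access
inner_to_raw = {inner: raw for raw, inner in raw_to_inner.items()}

print(raw_to_inner[1463])  # inner id (row in the factor matrix) of book 1463
print(inner_to_raw[4])     # raw book id stored at row 4
```

Keeping both dictionaries avoids scanning the whole table each time an inner id has to be turned back into a book id.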


In [19]:
def id_bookinfo(b_id):
    '''
    Retrieves book information based on a given book ID.

    Parameters
    ----------
    b_id: int
        The book ID (an integer) for the book to retrieve information about. Should be between 1 and 10,000.

    Returns
    -------
    result: pandas.DataFrame
        A DataFrame containing information about the book, including title, authors, and original publication year.
    '''
    # Check that the book ID is an int within the valid range
    if (not isinstance(b_id, int)) or ((b_id < 1) or (b_id > 10000)):
        raise ValueError("Invalid book ID: pick an integer between 1 and 10000")
    
    result = book_df[book_df['book_id'] == b_id][['title', 'authors', 'original_publication_year']]
    return result

def title_to_id(b_title):
    '''
    Retrieves the book ID based on a given book title.

    Parameters
    ----------
    b_title: str
        The title of the book to find the corresponding book ID for.

    Returns
    -------
    result: int
        The book ID associated with the given title.
    '''
    
    result = book_df[book_df['title'] == b_title]['book_id']
    return result.values[0]

def vin_to_id(vin):
    '''
    Retrieves the book ID based on a given Vindex (index in a similarity matrix).

    Parameters
    ----------
    vin: int
        The Vindex value representing a book's index in a similarity matrix.

    Returns
    -------
    result: int
        The book ID associated with the given Vindex.
    '''
    result = book_simind[book_simind['Vindex'] == vin].index[0]
    return result


def id_to_vin(inb):
    '''
    Retrieves the Vindex (index in a similarity matrix) based on a given book ID.

    Parameters
    ----------
    inb: int
        The book ID for which the Vindex is needed.

    Returns
    -------
    result: int
        The Vindex associated with the given book ID.
    '''
    # Look up by row label and column name rather than by position
    result = book_simind.loc[inb, 'Vindex']
    return result

def title_to_vin(b_title):
    '''
    Retrieves the Vindex (index in a similarity matrix) based on a given book title.

    Parameters
    ----------
    b_title: str
        The title of the book for which the Vindex is needed.

    Returns
    -------
    result: int
        The Vindex associated with the given book title.
    '''
        
    result = id_to_vin(title_to_id(b_title))
    return result
    
    

In [21]:
from sklearn.metrics.pairwise import cosine_similarity 
book_similarities = cosine_similarity(book_latent, dense_output=False)
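
To make the similarity measure concrete, here is a minimal plain-Python sketch of cosine similarity between latent vectors. The helper names and the four toy vectors are hypothetical; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_matrix(vectors):
    """All pairwise cosine similarities as a list of lists."""
    return [[cosine(a, b) for b in vectors] for a in vectors]

# Hypothetical 2-factor vectors for four books
factors = [
    [1.0, 0.0],   # book 0
    [2.0, 0.0],   # book 1: same direction as book 0, twice the length
    [0.0, 3.0],   # book 2: orthogonal to book 0
    [-1.0, 0.0],  # book 3: opposite direction to book 0
]
sims = cosine_matrix(factors)
for row in sims:
    print([round(s, 3) for s in row])
```

Cosine similarity depends only on the direction of the vectors, not their length, so books 0 and 1 come out as identical in taste profile even though their factor magnitudes differ.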

In [22]:
def recomender(b_title, book_similarities, bi):
    '''
    Recommends books similar to a given book title based on book similarities.

    Parameters
    ----------
    b_title: str
        The title of the book for which you want book recommendations.
    book_similarities: np.ndarray
        A 2D numpy array containing book similarities where each row represents a book.
    bi: pd.DataFrame
        A DataFrame containing book indices.

    Returns
    -------
    results: pd.DataFrame
        A DataFrame containing book recommendations and their similarities to the input book.
    '''
    # Alias the similarity matrix; it is only read here, so no copy is needed
    sim_arr = book_similarities
    # Create a copy of the book indices DataFrame
    botoind = bi.copy()
    # Get the Vindex (index in similarity matrix) of the input book title
    vin = title_to_vin(b_title)
    # Extract similarity data for the input book
    data = sim_arr[vin]
    # Add the Similarities column to the book indices DataFrame
    botoind['Similarities'] = data
    # Sort books by similarity in descending order
    botoind.sort_values('Similarities', ascending=False, inplace=True)
    # Remove the input book from the recommendations
    botoind = botoind.drop(vin_to_id(vin))
    # Get the top 10 book indices
    top10ind = botoind.head(10).index
    results = pd.DataFrame([])
    # Retrieve book information for the top 10 recommended books
    for b_id in top10ind:
        res = id_bookinfo(b_id)
        results = pd.concat([results, res])
    # Add the Similarities column to the recommendations DataFrame
    results['Similarities'] = botoind.head(10)['Similarities'].values
    return results
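
The core of the function above is a top-k selection over one row of the similarity matrix. Stripped of the DataFrame handling, the step can be sketched as follows; `top_k_similar` and the toy similarity row are hypothetical names and values used only for illustration.

```python
def top_k_similar(sim_row, query_index, k=10):
    """Return the k (index, similarity) pairs most similar to the query.

    The query's own entry is excluded explicitly by index.
    """
    candidates = [(idx, sim) for idx, sim in enumerate(sim_row)
                  if idx != query_index]
    # Highest similarity first
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]

# Hypothetical similarity row for the book at index 0
sim_row = [1.0, 0.2, 0.9, -0.3, 0.7]
top3 = top_k_similar(sim_row, query_index=0, k=3)
print(top3)  # [(2, 0.9), (4, 0.7), (1, 0.2)]
```

Excluding the query by its index, as `recomender` does with `drop`, is safer than discarding the first sorted row, because another book could tie with the query at a similarity of 1.0.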


In [26]:
recomender("Words of Radiance (The Stormlight Archive, #2)",book_similarities,book_simind)

index,title,authors,original_publication_year,Similarities
561,"The Way of Kings (The Stormlight Archive, #1)",Brandon Sanderson,2010.0,0.986627
9964,Mister B. Gone,Clive Barker,2007.0,0.969936
9140,"The Way of Kings, Part 1 (The Stormlight Archi...",Brandon Sanderson,2011.0,0.969516
306,"The Wise Man's Fear (The Kingkiller Chronicle,...",Patrick Rothfuss,2011.0,0.959442
3240,"Crooked Kingdom (Six of Crows, #2)",Leigh Bardugo,2016.0,0.958984
191,The Name of the Wind (The Kingkiller Chronicle...,Patrick Rothfuss,2007.0,0.956774
1199,"The Alloy of Law (Mistborn, #4)",Brandon Sanderson,2011.0,0.956183
3340,"The Bands of Mourning (Mistborn, #6)",Brandon Sanderson,2016.0,0.95519
7379,"Dark Fire (Matthew Shardlake, #2)",C.J. Sansom,2004.0,0.954631
388,"The Final Empire (Mistborn, #1)",Brandon Sanderson,2006.0,0.954596


In [27]:
recomender("5 Very Good Reasons to Punch a Dolphin in the Mouth and Other Useful Guides",book_similarities,book_simind)

index,title,authors,original_publication_year,Similarities
9959,"The Prize Winner of Defiance, Ohio: How My Mot...",Terry Ryan,2001.0,0.96743
4813,Because of Mr. Terupt,Rob Buyea,2010.0,0.967251
6138,Captains and the Kings,Taylor Caldwell,1972.0,0.9672
8580,Gonzo: The Life of Hunter S. Thompson,"Jann S. Wenner, Corey Seymour",2007.0,0.966995
4346,The Heretic's Daughter,Kathleen Kent,2008.0,0.966279
8534,A New Hope (Star Wars: Novelizations #4),"Alan Dean Foster, George Lucas",1976.0,0.964958
4695,"The Defector (Gabriel Allon, #9)",Daniel Silva,2009.0,0.964551
6427,"Short Stories from Hogwarts of Heroism, Hardsh...","J.K. Rowling, MinaLima",2016.0,0.963986
9452,"In This Mountain (Mitford Years, #7)",Jan Karon,2002.0,0.963547
9643,"My Lady Jane (The Lady Janies, #1)","Cynthia Hand, Brodi Ashton, Jodi Meadows",2016.0,0.963285
