## Book Recommender

We recommend new books to a known user based on their preferences.

The project is taken from Kaggle:
https://www.kaggle.com/datasets/ishikajohari/best-books-10k-multi-genre-data

The data were collected from: Books That Everyone Should Read At Least Once (https://booksofbrilliance.com/2023/07/01/23-books-that-everyone-should-read-at-least-once-in-their-lives/)

The dataset contains books and their genres, plus other information that we will not use.

We build a user profile from 5 books that appear in the dataset and give each one a score, so that these 5 books can be compared with the rest.
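
As a hedged summary of the procedure carried out in the cells of this notebook (the symbols are illustrative notation, not taken from the dataset or from any library): let $s_b$ be the user's score for a read book $b$, and let $x_{bg} \in \{0, 1\}$ indicate whether book $b$ has genre $g$. The genre weights of the user and the score of a candidate book $c$ can then be written as:

$$
w_g = \frac{\sum_{b} s_b \, x_{bg}}{\sum_{g'} \sum_{b} s_b \, x_{bg'}},
\qquad
\mathrm{score}(c) = \sum_{g} w_g \, x_{cg}
$$

The weights $w_g$ sum to 1, so a candidate book scores between 0 and 1.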

In [1]:
# Import the required libraries
import collections
import numpy as np 
import pandas as pd
import warnings
import random
warnings.simplefilter('ignore')


In [2]:
# Read the CSV file into a DataFrame
df = pd.read_csv("goodreads_data.csv")
df

Unnamed: 0.1,Unnamed: 0,Book,Author,Description,Genres,Avg_Rating,Num_Ratings,URL
0,0,To Kill a Mockingbird,Harper Lee,The unforgettable novel of a childhood in a sl...,"['Classics', 'Fiction', 'Historical Fiction', ...",4.27,5691311,https://www.goodreads.com/book/show/2657.To_Ki...
1,1,Harry Potter and the Philosopher’s Stone (Harr...,J.K. Rowling,Harry Potter thinks he is an ordinary boy - un...,"['Fantasy', 'Fiction', 'Young Adult', 'Magic',...",4.47,9278135,https://www.goodreads.com/book/show/72193.Harr...
2,2,Pride and Prejudice,Jane Austen,"Since its immediate success in 1813, Pride and...","['Classics', 'Fiction', 'Romance', 'Historical...",4.28,3944155,https://www.goodreads.com/book/show/1885.Pride...
3,3,The Diary of a Young Girl,Anne Frank,Discovered in the attic in which she spent the...,"['Classics', 'Nonfiction', 'History', 'Biograp...",4.18,3488438,https://www.goodreads.com/book/show/48855.The_...
4,4,Animal Farm,George Orwell,Librarian's note: There is an Alternate Cover ...,"['Classics', 'Fiction', 'Dystopia', 'Fantasy',...",3.98,3575172,https://www.goodreads.com/book/show/170448.Ani...
...,...,...,...,...,...,...,...,...
9995,9995,"Breeders (Breeders Trilogy, #1)",Ashley Quigley,How far would you go? If human society was gen...,"['Dystopia', 'Science Fiction', 'Post Apocalyp...",3.44,276,https://www.goodreads.com/book/show/22085400-b...
9996,9996,Dynamo,Eleanor Gustafson,Jeth Cavanaugh is searching for a new life alo...,[],4.23,60,https://www.goodreads.com/book/show/20862902-d...
9997,9997,The Republic of Trees,Sam Taylor,This dark fable tells the story of four Englis...,"['Fiction', 'Horror', 'Dystopia', 'Coming Of A...",3.29,383,https://www.goodreads.com/book/show/891262.The...
9998,9998,"Waking Up (Healing Hearts, #1)",Renee Dyer,For Adriana Monroe life couldn’t get any bette...,"['New Adult', 'Romance', 'Contemporary Romance...",4.13,263,https://www.goodreads.com/book/show/19347252-w...


## Exploratory Data Analysis (EDA).

In [3]:
# Keep only the columns the recommender needs: Book and Genres.

df = df[['Book', 'Genres']]
df



Unnamed: 0,Book,Genres
0,To Kill a Mockingbird,"['Classics', 'Fiction', 'Historical Fiction', ..."
1,Harry Potter and the Philosopher’s Stone (Harr...,"['Fantasy', 'Fiction', 'Young Adult', 'Magic',..."
2,Pride and Prejudice,"['Classics', 'Fiction', 'Romance', 'Historical..."
3,The Diary of a Young Girl,"['Classics', 'Nonfiction', 'History', 'Biograp..."
4,Animal Farm,"['Classics', 'Fiction', 'Dystopia', 'Fantasy',..."
...,...,...
9995,"Breeders (Breeders Trilogy, #1)","['Dystopia', 'Science Fiction', 'Post Apocalyp..."
9996,Dynamo,[]
9997,The Republic of Trees,"['Fiction', 'Horror', 'Dystopia', 'Coming Of A..."
9998,"Waking Up (Healing Hearts, #1)","['New Adult', 'Romance', 'Contemporary Romance..."


In [4]:
# Drop the rows that have no genres (stored as the string "[]").

df = df[df["Genres"] != "[]"].reset_index(drop=True)

In [5]:
# Convert every Genres value to a string
df['Genres'] = df['Genres'].astype(str)

In [6]:
# First strip the outer brackets and quotes, then split to separate the individual genres in the list.
df['Genres'] = df['Genres'].apply(lambda x: x[2:-2])
df['Genres'] = df['Genres'].apply(lambda x: x.split("', '"))
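
The slicing approach above assumes that every genre is wrapped in single quotes and that empty lists were already removed: `"[]"[2:-2]` is an empty string, which `split` turns into `['']`. As an alternative sketch, not the approach used in this notebook, the standard library's `ast.literal_eval` can parse the list literal directly. `parse_genres` is a hypothetical helper name introduced only for this example.

```python
import ast


def parse_genres(raw):
    """Parse a string such as "['Classics', 'Fiction']" into a list of strings."""
    genres = ast.literal_eval(raw)  # safely evaluates a Python literal, no arbitrary code
    return [str(g) for g in genres]


print(parse_genres("['Classics', 'Fiction', 'Historical Fiction']"))
print(parse_genres("[]"))  # an empty list stays empty instead of becoming ['']
```

With a helper like this, the column could be converted with `df['Genres'].apply(parse_genres)`, assuming every value is a valid list literal.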

In [7]:
# Build a list with every genre occurrence and count them. There are many genres, so we keep only 21.
# Note: the slice below keeps the first 21 genres in the order they were first encountered,
# not the 21 most frequent ones.
categorias_unicas = list(df["Genres"].sum())
Categorías = dict(collections.Counter(categorias_unicas))

cat = pd.DataFrame({"Genero": Categorías.keys(), "Frecuencia": Categorías.values()})
Categorias = list(cat["Genero"][0:21])
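
The cell above keeps the first 21 genres in the order in which `Counter` first encountered them, because a `dict` preserves insertion order and the slice `[0:21]` is applied without sorting. If selection by frequency is the goal, `Counter.most_common` is one way to get it. The sketch below uses made-up genre lists, and `genre_lists` and `top_genres` are illustrative names.

```python
from collections import Counter

# Hypothetical per-book genre lists, standing in for df["Genres"]
genre_lists = [
    ["Classics", "Fiction"],
    ["Fantasy", "Fiction"],
    ["Fiction", "Romance"],
    ["Fantasy", "Magic"],
]

# Count each genre once per occurrence across all books
counts = Counter(genre for genres in genre_lists for genre in genres)

# Keep the n most frequent genres (n=2 here; the notebook would use 20 or 21)
top_genres = [genre for genre, _ in counts.most_common(2)]
print(top_genres)  # ['Fiction', 'Fantasy']
```

Switching the notebook to this selection would change the genre columns and therefore every output shown after this point.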

In [8]:
# Drop the DataFrame rows that contain none of the selected genres, i.e. those stored in "Categorias".
for x in range(df.shape[0]):
    Coincidencia=0
    for gen in df["Genres"][x]:
        if gen in Categorias:
            Coincidencia+=1
    if Coincidencia == 0:
        df.drop(x, axis=0, inplace=True)
df.reset_index(inplace=True, drop=True) 
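
Dropping rows inside a loop works here, but the same filter can also be expressed as a boolean mask. The following is a minimal sketch on made-up data; `df_demo` and `top` are illustrative names, not objects from this notebook.

```python
import pandas as pd

# Hypothetical data: each book has a list of genres
df_demo = pd.DataFrame({
    "Book": ["A", "B", "C"],
    "Genres": [["Fiction"], ["Poetry"], ["Fantasy", "Poetry"]],
})
top = {"Fiction", "Fantasy"}  # stands in for the list of selected genres

# True for rows that share at least one genre with `top`
mask = df_demo["Genres"].apply(lambda genres: any(g in top for g in genres))

# Keep matching rows and renumber the index from 0
filtered = df_demo[mask].reset_index(drop=True)
print(filtered["Book"].tolist())  # ['A', 'C']
```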

In [9]:
# For each book, build a list of 0/1 flags marking which of the selected genres it has.

datos = list()

for row in df["Genres"].values:
    categorias_libros = list()
    for cat in Categorias:
        if cat in row:
            categorias_libros.append(1)
        else:
            categorias_libros.append(0)
    datos.append(categorias_libros)

# Build a DataFrame from these flags

df_generos_Libros = pd.DataFrame(data = datos, columns = Categorias)

df_generos_Libros

Unnamed: 0,Classics,Fiction,Historical Fiction,School,Literature,Young Adult,Historical,Fantasy,Magic,Childrens,...,Romance,Audiobook,Nonfiction,History,Biography,Memoir,Holocaust,Dystopia,Politics,France
0,1,1,1,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,1,0,0,0,1,0,1,1,1,...,0,0,0,0,0,0,0,0,0,0
2,1,1,1,0,1,0,1,0,0,0,...,1,1,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,1,0,0,0,...,0,0,1,1,1,1,1,0,0,0
4,1,1,0,1,1,0,0,1,0,0,...,0,0,0,0,0,0,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8729,0,0,1,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8730,0,1,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
8731,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,1,0,0
8732,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


In [10]:
# Copy the genre matrix and attach the book titles so it can be merged with the user's books
df_generos_Libros2=df_generos_Libros.copy()
df_generos_Libros2["Book"]=df["Book"]

In [11]:
# Randomly generate a list of 5 books and 5 scores to simulate the data a user would provide.

Libros_leidos_u=[random.choice(df["Book"]) for x in range(5)]


Puntuaciones_u=[random.randint(0,10) for x in range(5)]
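
`random.choice` draws with replacement, so the same book can be picked more than once, and `random.randint(0,10)` can return 0, which gives that book no influence on the profile. If neither is desired, one possible variant uses `random.sample` together with a seeded generator for reproducibility. The titles, the seed and the variable names below are placeholders.

```python
import random

# Placeholder titles standing in for df["Book"]
titulos = [f"Book {i}" for i in range(20)]

rng = random.Random(42)  # seeded generator so the draw is repeatable

# sample() picks k distinct items, so no book is repeated
libros_demo = rng.sample(titulos, k=5)

# Scores from 1 to 10 inclusive, so every read book contributes to the profile
puntuaciones_demo = [rng.randint(1, 10) for _ in libros_demo]

print(list(zip(libros_demo, puntuaciones_demo)))
```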


In [12]:
# Build a DataFrame with the user's data
dfusuario= pd.DataFrame(zip(Libros_leidos_u,Puntuaciones_u), columns=["Book","Score"])
dfusuario

Unnamed: 0,Book,Score
0,My Lovely Wife,8
1,The Natural,9
2,Feminism Is for Everybody: Passionate Politics,8
3,The Raven,9
4,April Fool's Day,6


In [13]:
# Attach the genre flags to each of the user's books
df_generos_usuario=pd.merge(dfusuario,df_generos_Libros2,on="Book")
df_generos_usuario


Unnamed: 0,Book,Score,Classics,Fiction,Historical Fiction,School,Literature,Young Adult,Historical,Fantasy,...,Romance,Audiobook,Nonfiction,History,Biography,Memoir,Holocaust,Dystopia,Politics,France
0,My Lovely Wife,8,0,1,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,The Natural,9,1,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Feminism Is for Everybody: Passionate Politics,8,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,1,0
3,The Raven,9,1,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,April Fool's Day,6,0,0,0,0,0,0,0,0,...,0,0,1,0,1,1,0,0,0,0


In [14]:
# Build the weighted matrix: each book's genre flags multiplied by the user's score for it
weighted_genre_matrix_ = list()

for punto, libros in zip(dfusuario["Score"].values, df_generos_usuario.iloc[:,2:].values):
    weighted_genre_matrix_.append(punto*libros)
    
weighted_genre_matrix2 = pd.DataFrame(weighted_genre_matrix_, columns = Categorias)

weighted_genre_matrix2    

Unnamed: 0,Classics,Fiction,Historical Fiction,School,Literature,Young Adult,Historical,Fantasy,Magic,Childrens,...,Romance,Audiobook,Nonfiction,History,Biography,Memoir,Holocaust,Dystopia,Politics,France
0,0,8,0,0,0,0,0,0,0,0,...,0,8,0,0,0,0,0,0,0,0
1,9,9,0,0,9,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,8,0,0,0,0,0,8,0
3,9,9,0,9,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,6,0,6,6,0,0,0,0


In [15]:
# Compute the user's genre weights, normalized so that they sum to 1
usuario_pesos = weighted_genre_matrix2.sum()

usuario_pesos = usuario_pesos/usuario_pesos.sum()

usuario_pesos.values

array([0.17307692, 0.25      , 0.        , 0.08653846, 0.08653846,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.07692308, 0.13461538, 0.        ,
       0.05769231, 0.05769231, 0.        , 0.        , 0.07692308,
       0.        ])
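
To make the numbers above easier to check, here is a small NumPy sketch that recomputes the weights from the user table shown earlier. It is restricted to the nine genres that end up with a non-zero weight, since the remaining columns are all zero for these five books. The variable names are illustrative.

```python
import numpy as np

genres = ["Classics", "Fiction", "School", "Literature", "Audiobook",
          "Nonfiction", "Biography", "Memoir", "Politics"]

# User scores, in the same order as the rows of genre_flags
scores = np.array([8, 9, 8, 9, 6])

genre_flags = np.array([
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # My Lovely Wife
    [1, 1, 0, 1, 0, 0, 0, 0, 0],  # The Natural
    [0, 0, 0, 0, 0, 1, 0, 0, 1],  # Feminism Is for Everybody
    [1, 1, 1, 0, 0, 0, 0, 0, 0],  # The Raven
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # April Fool's Day
])

# Sum of scores per genre: e.g. Fiction = 8 + 9 + 9 = 26
raw = scores @ genre_flags

# Normalize so the weights sum to 1 (the total here is 104)
weights = raw / raw.sum()

for genre, weight in zip(genres, weights):
    print(f"{genre}: {weight:.6f}")
```

Fiction, for example, comes out as 26 / 104 = 0.25 and Classics as 18 / 104 ≈ 0.173077, matching the array printed by the cell.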

In [16]:
# Multiply the user's weights by the genre matrix
weighted_genre_matrix3 = list()

for libros in df_generos_Libros.values:
    weighted_genre_matrix3.append(usuario_pesos*libros)

In [17]:
weighted_genre_matrix3=pd.DataFrame(weighted_genre_matrix3, columns= Categorias)

In [18]:
weighted_genre_matrix3

Unnamed: 0,Classics,Fiction,Historical Fiction,School,Literature,Young Adult,Historical,Fantasy,Magic,Childrens,...,Romance,Audiobook,Nonfiction,History,Biography,Memoir,Holocaust,Dystopia,Politics,France
0,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0
1,0.173077,0.25,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0
2,0.173077,0.25,0.0,0.000000,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.076923,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0
3,0.173077,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.134615,0.0,0.057692,0.057692,0.0,0.0,0.000000,0.0
4,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.076923,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8729,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0
8730,0.000000,0.25,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.076923,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0
8731,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0
8732,0.000000,0.25,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0


In [19]:
# Sum across genres to get each book's weight.
weighted_genre_matrix3["Suma"]=weighted_genre_matrix3.sum(axis=1)
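
Multiplying each row by the weights and then summing across columns is the same as a matrix-vector product. The sketch below illustrates this with the weights printed earlier in this notebook, again restricted to the nine genres with a non-zero weight; the array names are illustrative.

```python
import numpy as np

# Genre order: Classics, Fiction, School, Literature, Audiobook,
#              Nonfiction, Biography, Memoir, Politics
weights = np.array([18, 26, 9, 9, 8, 14, 6, 6, 8]) / 104

books = ["Animal Farm", "To Kill a Mockingbird", "The Republic of Trees"]
flags = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0, 1],  # Animal Farm
    [1, 1, 1, 1, 0, 0, 0, 0, 0],  # To Kill a Mockingbird
    [0, 1, 0, 0, 0, 0, 0, 0, 0],  # The Republic of Trees
])

# One dot product per book: the sum of the weights of the genres it has
book_scores = flags @ weights

for title, score in zip(books, book_scores):
    print(f"{title}: {score:.6f}")
```

On the full data, the equivalent call would be along the lines of `df_generos_Libros.values @ usuario_pesos.values`, assuming both use the same genre order.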

In [20]:
weighted_genre_matrix3["Books"]=df["Book"]

In [21]:
weighted_genre_matrix3

Unnamed: 0,Classics,Fiction,Historical Fiction,School,Literature,Young Adult,Historical,Fantasy,Magic,Childrens,...,Nonfiction,History,Biography,Memoir,Holocaust,Dystopia,Politics,France,Suma,Books
0,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.596154,To Kill a Mockingbird
1,0.173077,0.25,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.423077,Harry Potter and the Philosopher’s Stone (Harr...
2,0.173077,0.25,0.0,0.000000,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.586538,Pride and Prejudice
3,0.173077,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.134615,0.0,0.057692,0.057692,0.0,0.0,0.000000,0.0,0.423077,The Diary of a Young Girl
4,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.076923,0.0,0.673077,Animal Farm
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8729,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,Call To Crusade
8730,0.000000,0.25,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.326923,"Die Känguru-Chroniken (Die Känguru-Chroniken, #1)"
8731,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,"Breeders (Breeders Trilogy, #1)"
8732,0.000000,0.25,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.250000,The Republic of Trees


In [22]:
# Sort the books by weight, best recommendation first
weighted_genre_matrix3.sort_values("Suma", ascending=False,inplace=True)
weighted_genre_matrix3

Unnamed: 0,Classics,Fiction,Historical Fiction,School,Literature,Young Adult,Historical,Fantasy,Magic,Childrens,...,Nonfiction,History,Biography,Memoir,Holocaust,Dystopia,Politics,France,Suma,Books
282,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.673077,The Jungle
607,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.673077,Utopia
4,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.673077,Animal Farm
0,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.596154,To Kill a Mockingbird
398,0.173077,0.25,0.0,0.086538,0.086538,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.596154,"My Ántonia (Great Plains Trilogy, #3)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5145,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,"Farsighted (Farsighted, #1)"
5131,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,Tommy Goes Trick-Or-Treating (a Bird Brain Book)
5123,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,"Halfway to the Grave (Night Huntress, #1)"
5100,0.000000,0.00,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,"Acheron (Dark-Hunter, #8; Entire Dark-Hunterve..."


In [23]:
# Print the 5 highest-scoring books from the recommender, skipping those the user has already read.
Contador=0

for x in weighted_genre_matrix3["Books"].values:
    if x not in Libros_leidos_u and Contador<5:
        print(f"I recommend this book: {x}")
        Contador+=1
       

I recommend this book: The Jungle
I recommend this book: Utopia
I recommend this book: Animal Farm
I recommend this book: To Kill a Mockingbird
I recommend this book: My Ántonia (Great Plains Trilogy, #3)
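
The loop above keeps scanning the whole ranking even after five titles have been printed. A more compact variant, sketched here on made-up data, filters out the read books with `isin` and takes the head of the sorted frame. `ranked` and `already_read` are illustrative names.

```python
import pandas as pd

# Hypothetical ranking with the same column names as the notebook
ranked = pd.DataFrame({
    "Books": ["Book A", "Book B", "Book C", "Book D"],
    "Suma": [0.9, 0.7, 0.5, 0.2],
})
already_read = ["Book B"]

# Exclude read books, sort by score and keep the top n (n=2 here, 5 in the notebook)
recommended = (
    ranked[~ranked["Books"].isin(already_read)]
    .sort_values("Suma", ascending=False)
    .head(2)
)

print(recommended["Books"].tolist())  # ['Book A', 'Book C']
```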
