# Sentiment Analysis

Next, we use an LLM to perform sentiment analysis and extract emotion-based features (e.g., books whose content leans toward suspense, self-esteem, or happiness). This information can be quite useful as a filter for book recommendations.

In [1]:
# Import libraries
import pandas as pd

libros = pd.read_csv("books_with_categories.csv")

In [2]:
# Use a model from Hugging Face
from transformers import pipeline
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", top_k=None)
classifier("I love this!")

Device set to use cpu


[[{'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'surprise', 'score': 0.008528684265911579},
  {'label': 'neutral', 'score': 0.005764583125710487},
  {'label': 'anger', 'score': 0.004419783595949411},
  {'label': 'sadness', 'score': 0.002092392183840275},
  {'label': 'disgust', 'score': 0.0016119909705594182},
  {'label': 'fear', 'score': 0.00041385277290828526}]]

We apply the model to predict the overall emotion of the book descriptions.

In [3]:
libros["description"][0]

'A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, Gilead is a song of celebration and acceptance of the best and the worst the world ha

In [4]:
# Apply the model to the whole description
classifier(libros["description"][0])

[[{'label': 'fear', 'score': 0.6548405885696411},
  {'label': 'neutral', 'score': 0.16985228657722473},
  {'label': 'sadness', 'score': 0.11640921980142593},
  {'label': 'surprise', 'score': 0.02070065587759018},
  {'label': 'disgust', 'score': 0.019100677222013474},
  {'label': 'joy', 'score': 0.01516144908964634},
  {'label': 'anger', 'score': 0.003935146611183882}]]

The model assigns the highest probability (about 65%) to fear. However, the full description seems to convey more than one emotion, so we split it into sentences and apply the model again.
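The dominant emotion can also be read off this output programmatically. The following is a minimal sketch that reuses the scores printed above (rounded here for brevity); the helper name `top_emotion` is hypothetical and not part of the original notebook. With `top_k=None` the pipeline returns one list of label/score dictionaries per input text, so for a single description the list to pass in would be `classifier(text)[0]`.

```python
# Scores copied from the classifier output above, rounded for brevity.
prediction = [
    {"label": "fear", "score": 0.6548},
    {"label": "neutral", "score": 0.1699},
    {"label": "sadness", "score": 0.1164},
    {"label": "surprise", "score": 0.0207},
    {"label": "disgust", "score": 0.0191},
    {"label": "joy", "score": 0.0152},
    {"label": "anger", "score": 0.0039},
]


def top_emotion(scores):
    """Return (label, score) for the highest-scoring emotion (hypothetical helper)."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


label, score = top_emotion(prediction)
print(f"{label}: {score:.0%}")  # fear: 65%
```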

In [5]:
classifier(libros["description"][0].split("."))

[[{'label': 'surprise', 'score': 0.7296020984649658},
  {'label': 'neutral', 'score': 0.14038600027561188},
  {'label': 'fear', 'score': 0.06816228479146957},
  {'label': 'joy', 'score': 0.04794260859489441},
  {'label': 'anger', 'score': 0.009156366810202599},
  {'label': 'disgust', 'score': 0.0026284765917807817},
  {'label': 'sadness', 'score': 0.002122163539752364}],
 [{'label': 'neutral', 'score': 0.44937002658843994},
  {'label': 'disgust', 'score': 0.27359163761138916},
  {'label': 'joy', 'score': 0.10908330976963043},
  {'label': 'sadness', 'score': 0.09362746775150299},
  {'label': 'anger', 'score': 0.04047830402851105},
  {'label': 'surprise', 'score': 0.026970159262418747},
  {'label': 'fear', 'score': 0.006879047024995089}],
 [{'label': 'neutral', 'score': 0.6462159752845764},
  {'label': 'sadness', 'score': 0.24273329973220825},
  {'label': 'disgust', 'score': 0.04342271760106087},
  {'label': 'surprise', 'score': 0.028300564736127853},
  {'label': 'joy', 'score': 0.014211

We now get a wider range of emotions. For example, surprise dominates in the first sentence, while the second sentence is predominantly neutral.
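One caveat with `str.split(".")`: a text that ends with a period produces a trailing empty (or whitespace-only) fragment, and the classifier scores that fragment like any other sentence. Abbreviations and decimal numbers are split as well. The following is a minimal sketch of a helper that at least drops the empty fragments; `split_sentences` is a hypothetical name, not part of the original notebook. Scored empty fragments may be why identical values (for example 0.064134 for anger) recur across several books in the results further below, though that is an assumption that is not verified here.

```python
def split_sentences(text):
    """Split on periods and drop empty or whitespace-only fragments (hypothetical helper)."""
    fragments = (fragment.strip() for fragment in text.split("."))
    return [fragment for fragment in fragments if fragment]


sample = "It is 1956 in Gilead, Iowa. John tells of the rift. "

# Plain split keeps a whitespace-only fragment after the final period.
print(sample.split("."))
# ['It is 1956 in Gilead, Iowa', ' John tells of the rift', ' ']

# The helper returns only the non-empty, stripped sentences.
print(split_sentences(sample))
# ['It is 1956 in Gilead, Iowa', 'John tells of the rift']
```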

In [6]:
# Compare the predictions with the sentences
oraciones = libros["description"][0].split(".")
predicciones = classifier(oraciones)

In [7]:
predicciones[0]  # first prediction

[{'label': 'surprise', 'score': 0.7296020984649658},
 {'label': 'neutral', 'score': 0.14038600027561188},
 {'label': 'fear', 'score': 0.06816228479146957},
 {'label': 'joy', 'score': 0.04794260859489441},
 {'label': 'anger', 'score': 0.009156366810202599},
 {'label': 'disgust', 'score': 0.0026284765917807817},
 {'label': 'sadness', 'score': 0.002122163539752364}]

In [8]:
oraciones[0]  # first sentence

'A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives'

Indeed, the first sentence has a tone of surprise.

In [9]:
predicciones[3]  # fourth prediction

[{'label': 'fear', 'score': 0.9281681180000305},
 {'label': 'anger', 'score': 0.032191041857004166},
 {'label': 'neutral', 'score': 0.01280867587774992},
 {'label': 'sadness', 'score': 0.008756861090660095},
 {'label': 'surprise', 'score': 0.008597906678915024},
 {'label': 'disgust', 'score': 0.008431818336248398},
 {'label': 'joy', 'score': 0.0010455837473273277}]

In [10]:
oraciones[3]

' Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist'

Reading the sentence, we can see that fear, anger, and sadness are all present. The classifier does its job well here.

Next, we build a list of the seven emotions for each book, assigning each emotion the highest score it reaches in any sentence of the description. This gives us one set of maximum emotion scores per book.
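The aggregation described above can be sketched on toy data before running it on real predictions. This is an illustrative variant, not the notebook's function: the name `max_score_per_label` and the numbers are made up, and it looks scores up by label name rather than by position. Note that the maxima usually come from different sentences, so they no longer sum to 1.

```python
def max_score_per_label(sentence_predictions):
    """Return the highest score each label reaches across all sentences (hypothetical helper)."""
    max_scores = {}
    for prediction in sentence_predictions:
        for item in prediction:
            label, score = item["label"], item["score"]
            max_scores[label] = max(score, max_scores.get(label, 0.0))
    return max_scores


# Toy predictions for two sentences and three labels (made-up numbers, not model output).
toy = [
    [{"label": "fear", "score": 0.7}, {"label": "joy", "score": 0.2}, {"label": "neutral", "score": 0.1}],
    [{"label": "joy", "score": 0.6}, {"label": "neutral", "score": 0.3}, {"label": "fear", "score": 0.1}],
]

print(max_score_per_label(toy))  # {'fear': 0.7, 'joy': 0.6, 'neutral': 0.3}
```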

In [11]:
# Sort the list of emotions so the labels always appear in the same order
sorted(predicciones[0], key=lambda x: x["label"])  # sort by "label"

[{'label': 'anger', 'score': 0.009156366810202599},
 {'label': 'disgust', 'score': 0.0026284765917807817},
 {'label': 'fear', 'score': 0.06816228479146957},
 {'label': 'joy', 'score': 0.04794260859489441},
 {'label': 'neutral', 'score': 0.14038600027561188},
 {'label': 'sadness', 'score': 0.002122163539752364},
 {'label': 'surprise', 'score': 0.7296020984649658}]

In [12]:
# Function to get the maximum probability of each emotion
import numpy as np

labels_emociones = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
isbn = []  # stores the isbn13 codes
scores_emocion = {label: [] for label in labels_emociones}  # one list of scores per emotion

# Define the function
def calcular_max_scores_emocion(predicciones):
    per_scores_emocion = {label: [] for label in labels_emociones}  # all sentence-level scores for one description
    for prediccion in predicciones:  # iterate over the sentences
        predicciones_ordenadas = sorted(prediccion, key=lambda x: x["label"])  # sort by label; matches the alphabetical order of labels_emociones
        for index, label in enumerate(labels_emociones):
            per_scores_emocion[label].append(predicciones_ordenadas[index]["score"])  # store the score for each emotion
    return {label: np.max(scores) for label, scores in per_scores_emocion.items()}  # keep the maximum score per emotion

In [13]:
# Apply the function to the first 10 books
for i in range(10):
    isbn.append(libros["isbn13"][i])  # store the isbn13 code
    oraciones = libros["description"][i].split(".")  # split the description into sentences
    predicciones = classifier(oraciones)
    max_scores = calcular_max_scores_emocion(predicciones)  # maximum score per emotion
    for label in labels_emociones:
        scores_emocion[label].append(max_scores[label])  # append each emotion's maximum score to the dictionary

In [14]:
scores_emocion

{'anger': [np.float64(0.0641336441040039),
  np.float64(0.6126197576522827),
  np.float64(0.0641336441040039),
  np.float64(0.35148438811302185),
  np.float64(0.08141235262155533),
  np.float64(0.2322252243757248),
  np.float64(0.5381842255592346),
  np.float64(0.0641336441040039),
  np.float64(0.3006700277328491),
  np.float64(0.0641336441040039)],
 'disgust': [np.float64(0.27359163761138916),
  np.float64(0.3482847511768341),
  np.float64(0.10400667786598206),
  np.float64(0.1507224589586258),
  np.float64(0.18449543416500092),
  np.float64(0.7271744608879089),
  np.float64(0.155854731798172),
  np.float64(0.10400667786598206),
  np.float64(0.2794816195964813),
  np.float64(0.17792661488056183)],
 'fear': [np.float64(0.9281681180000305),
  np.float64(0.9425276517868042),
  np.float64(0.9723208546638489),
  np.float64(0.3607059419155121),
  np.float64(0.09504333138465881),
  np.float64(0.05136283114552498),
  np.float64(0.7474274635314941),
  np.float64(0.4044976532459259),
  np.float

Since the procedure works, we apply it to all the books.

In [15]:
# Apply the function to every book
from tqdm import tqdm

# Reset the variables
labels_emociones = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
isbn = []  # stores the isbn13 codes
scores_emocion = {label: [] for label in labels_emociones}  # one list of scores per emotion

for i in tqdm(range(len(libros))):
    isbn.append(libros["isbn13"][i])  # store the isbn13 code
    oraciones = libros["description"][i].split(".")  # split the description into sentences
    predicciones = classifier(oraciones)
    max_scores = calcular_max_scores_emocion(predicciones)  # maximum score per emotion
    for label in labels_emociones:
        scores_emocion[label].append(max_scores[label])  # append each emotion's maximum score to the dictionary

100%|██████████| 5197/5197 [12:40<00:00,  6.83it/s]


In [16]:
# Convert the dictionary to a DataFrame
df_emociones = pd.DataFrame(scores_emocion)
df_emociones["isbn13"] = isbn

In [17]:
df_emociones

Unnamed: 0,anger,disgust,fear,joy,neutral,sadness,surprise,isbn13
0,0.064134,0.273592,0.928168,0.932798,0.646216,0.967158,0.729602,9780002005883
1,0.612620,0.348285,0.942528,0.704422,0.887940,0.111690,0.252546,9780002261982
2,0.064134,0.104007,0.972321,0.767238,0.549477,0.111690,0.078766,9780006178736
3,0.351484,0.150722,0.360706,0.251881,0.732684,0.111690,0.078766,9780006280897
4,0.081412,0.184495,0.095043,0.040564,0.884390,0.475881,0.078766,9780006280934
...,...,...,...,...,...,...,...,...
5192,0.148208,0.030643,0.919165,0.255172,0.853721,0.980877,0.030656,9788172235222
5193,0.064134,0.114383,0.051363,0.400262,0.883198,0.111690,0.227765,9788173031014
5194,0.009997,0.009929,0.339218,0.947779,0.375754,0.066685,0.057625,9788179921623
5195,0.064134,0.104007,0.459269,0.759456,0.951104,0.368111,0.078766,9788185300535


In [18]:
# Merge the new DataFrame into the original DataFrame (libros)
libros = pd.merge(libros, df_emociones, on="isbn13")
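`pd.merge` performs an inner join by default, so a repeated `isbn13` on either side would multiply rows without any warning. The following is a minimal sketch of how the join could be guarded, using toy stand-ins for `libros` and `df_emociones`; the column values are made up, and whether the real dataset contains duplicate ISBNs is not checked here.

```python
import pandas as pd

# Toy stand-ins for `libros` and `df_emociones` (made-up rows for illustration).
books = pd.DataFrame({"isbn13": [1, 2, 3], "title": ["A", "B", "C"]})
emotions = pd.DataFrame({"isbn13": [1, 2, 3], "fear": [0.9, 0.1, 0.4]})

# validate="one_to_one" raises pandas.errors.MergeError if isbn13 repeats on either side.
merged = pd.merge(books, emotions, on="isbn13", how="inner", validate="one_to_one")

# With unique, matching keys on both sides, the join keeps exactly one row per book.
assert len(merged) == len(books)
print(merged.columns.tolist())  # ['isbn13', 'title', 'fear']
```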

In [24]:
# Export the dataset to CSV
libros.to_csv("books_with_emotions.csv", index=False)