Objetivos:

- Analisis del tono emocional (sentimental) de los libros impulsadas por ML
- Clasificar el tono emocional de los libros en base a las siguientes clasificaciones:
  - anger
  - disgust
  - fear
  - joy
  - sadness
  - surprise
  - neutral
- Fine tuning (entrenamiento) de un LLM para que el LLM pueda realizar estas clasificaciones

Ref:

- Modelos usados:
  - j-hartmann/emotion-english-distilroberta-base
    - Tipo: Text Classification
    - [URL](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base)

In [16]:
# Data
import pandas as pd

# ML
from transformers import pipeline

# Math
import numpy as np

# Procesos
from tqdm import tqdm

1. Leer el dataset "book_with_categories.csv" (el dataset mas limpio que hemos producido hasta ahora)

In [2]:
books = pd.read_csv("../datasets/book_with_categories.csv")

2. Ejemplo de prediccion de emocion de un texto

In [3]:
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", top_k= None)

classifier("I love this!")


Loading weights: 100%|██████████| 105/105 [00:00<00:00, 1691.73it/s, Materializing param=roberta.encoder.layer.5.output.dense.weight]             
RobertaForSequenceClassification LOAD REPORT from: j-hartmann/emotion-english-distilroberta-base
Key                             | Status     |  | 
--------------------------------+------------+--+-
roberta.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


[[{'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'surprise', 'score': 0.008528684265911579},
  {'label': 'neutral', 'score': 0.005764594301581383},
  {'label': 'anger', 'score': 0.004419777542352676},
  {'label': 'sadness', 'score': 0.0020923891570419073},
  {'label': 'disgust', 'score': 0.0016119900392368436},
  {'label': 'fear', 'score': 0.00041385178337804973}]]

1. Prueba con descripciones de algunos libros

In [None]:
# Analisa toda la descripcion
print(books["description"][0])

print(classifier(books["description"][0]))

A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, Gilead is a song of celebration and acceptance of the best and the worst the world has

In [None]:
# Analisa la descripcion, parte por parte
predictions_of_a_book = classifier(books["description"][0].split("."))

predictions_of_a_book

[[{'label': 'surprise', 'score': 0.7296019792556763},
  {'label': 'neutral', 'score': 0.1403862088918686},
  {'label': 'fear', 'score': 0.0681622326374054},
  {'label': 'joy', 'score': 0.04794260114431381},
  {'label': 'anger', 'score': 0.009156360290944576},
  {'label': 'disgust', 'score': 0.0026284800842404366},
  {'label': 'sadness', 'score': 0.0021221654023975134}],
 [{'label': 'neutral', 'score': 0.4493708908557892},
  {'label': 'disgust', 'score': 0.27359095215797424},
  {'label': 'joy', 'score': 0.10908332467079163},
  {'label': 'sadness', 'score': 0.09362729638814926},
  {'label': 'anger', 'score': 0.040478307753801346},
  {'label': 'surprise', 'score': 0.026970168575644493},
  {'label': 'fear', 'score': 0.006879050750285387}],
 [{'label': 'neutral', 'score': 0.6462164521217346},
  {'label': 'sadness', 'score': 0.24273289740085602},
  {'label': 'disgust', 'score': 0.0434226393699646},
  {'label': 'surprise', 'score': 0.028300585225224495},
  {'label': 'joy', 'score': 0.01421148

Extrar la probabilidad de la maxima emocion de la descripcion de un libro

In [None]:
# Ahora la emocion con mas probabilidad aparece en [0] de la lista devuelta por el metodo "sorted"
sorted(predictions_of_a_book[0], key=lambda x: x["label"], reverse=True)

[{'label': 'surprise', 'score': 0.7296019792556763},
 {'label': 'sadness', 'score': 0.0021221654023975134},
 {'label': 'neutral', 'score': 0.1403862088918686},
 {'label': 'joy', 'score': 0.04794260114431381},
 {'label': 'fear', 'score': 0.0681622326374054},
 {'label': 'disgust', 'score': 0.0026284800842404366},
 {'label': 'anger', 'score': 0.009156360290944576}]

2. Desarrollo del metodo para extrar la probabilidad de la maxima emocion de la descripcion de todos los libros

Definicion del metodo "calculate_max_emotion_scores"

In [11]:
# Array de emnociones que el modelo "emotion-english-distilroberta-base" puede analizar
emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]

def calculate_max_emotion_scores (predictions):
  per_emotion_scores = { label: [] for label in emotion_labels }

  for prediction in predictions:
    sorted_predictions = sorted(prediction, key=lambda x: x["label"], reverse=True)

    for index, label in enumerate (emotion_labels):
      per_emotion_scores [label].append(sorted_predictions [index]["score"])

  return {
    label: np.max(scores) for label, scores in per_emotion_scores.items()
  }

Aplicar el metodo "calculate_max_emotion_scores" sobre la descripcion de todos los libros

In [None]:
# Array de IDs
isbn13 = []

# "Score" de las emociones
emotion_scores = { label: [] for label in emotion_labels }

for i in tqdm(range(len(books))):
  isbn13.append(books["isbn13"][i])
  sentences = books["description"][i].split(".")
  predictions = classifier(sentences)
  max_scores = calculate_max_emotion_scores(predictions)

  for label in emotion_labels:
    emotion_scores[label].append(max_scores[label])

100%|██████████| 5197/5197 [04:21<00:00, 19.86it/s]


In [None]:
# Mostrar el "score" de las emociones de todos los libros
emotion_scores

{'anger': [np.float64(0.7296019792556763),
  np.float64(0.2525448501110077),
  np.float64(0.07876545190811157),
  np.float64(0.07876545190811157),
  np.float64(0.07876545190811157),
  np.float64(0.2719026505947113),
  np.float64(0.07876545190811157),
  np.float64(0.23448754847049713),
  np.float64(0.13561394810676575),
  np.float64(0.07876545190811157)],
 'disgust': [np.float64(0.9671575427055359),
  np.float64(0.11169004440307617),
  np.float64(0.11169004440307617),
  np.float64(0.11169004440307617),
  np.float64(0.4758809804916382),
  np.float64(0.11169004440307617),
  np.float64(0.40800073742866516),
  np.float64(0.8202821612358093),
  np.float64(0.3544611930847168),
  np.float64(0.11169004440307617)],
 'fear': [np.float64(0.6462164521217346),
  np.float64(0.887939453125),
  np.float64(0.5494773983955383),
  np.float64(0.7326844930648804),
  np.float64(0.8843896985054016),
  np.float64(0.6213928461074829),
  np.float64(0.7121946811676025),
  np.float64(0.5494773983955383),
  np.floa

Incluir los "score" de las emociones de todos los libros (emotion_scores) a los libros (books)

In [None]:
# Crear el DataFrame de "emotion_scores"
emotion_scores_df = pd.DataFrame(emotion_scores)

# Agregarle el "isbn13" al DataFrame de "emotion_scores"
emotion_scores_df["isbn13"] = isbn13

# Mostrar el nuevo DataFrame de "emotion_scores"
emotion_scores_df

Unnamed: 0,anger,disgust,fear,joy,sadness,surprise,neutral,isbn13
0,0.729602,0.967158,0.646216,0.932797,0.928168,0.273591,0.064134,9780002005883
1,0.252545,0.111690,0.887939,0.704421,0.942528,0.348285,0.612620,9780002261982
2,0.078765,0.111690,0.549477,0.767238,0.972321,0.104007,0.064134,9780006178736
3,0.078765,0.111690,0.732684,0.251881,0.360708,0.150722,0.351483,9780006280897
4,0.078765,0.475881,0.884390,0.040564,0.095043,0.184495,0.081412,9780006280934
...,...,...,...,...,...,...,...,...
5192,0.030656,0.980877,0.853721,0.255171,0.919165,0.030643,0.148208,9788172235222
5193,0.227765,0.111690,0.883198,0.400262,0.051363,0.114383,0.064134,9788173031014
5194,0.057625,0.066685,0.375755,0.947779,0.339218,0.009929,0.009997,9788179921623
5195,0.078765,0.368110,0.951104,0.759455,0.459271,0.104007,0.064134,9788185300535


In [21]:
# Hacer el merge de los libros (books) con sus respectivas prediciones sobre las emociones de las descripciones (emotion_scores_df)
books = pd.merge(books, emotion_scores_df, on="isbn13", how="left")

# Mostrar el resultado final
books

Unnamed: 0,isbn13,isbn10,title,authors,categories,thumbnail,description,published_year,average_rating,num_pages,...,title_and_subtitle,tagged_description,simple_categories,anger,disgust,fear,joy,sadness,surprise,neutral
0,9780002005883,0002005883,Gilead,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,...,Gilead,9780002005883 A NOVEL THAT READERS and critics...,Fiction,0.729602,0.967158,0.646216,0.932797,0.928168,0.273591,0.064134
1,9780002261982,0002261987,Spider's Web,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,...,Spider's Web: A Novel,9780002261982 A new 'Christie for Christmas' -...,Fiction,0.252545,0.111690,0.887939,0.704421,0.942528,0.348285,0.612620
2,9780006178736,0006178731,Rage of angels,Sidney Sheldon,Fiction,http://books.google.com/books/content?id=FKo2T...,"A memorable, mesmerizing heroine Jennifer -- b...",1993.0,3.93,512.0,...,Rage of angels,"9780006178736 A memorable, mesmerizing heroine...",Fiction,0.078765,0.111690,0.549477,0.767238,0.972321,0.104007,0.064134
3,9780006280897,0006280897,The Four Loves,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=XhQ5X...,Lewis' work on the nature of love divides love...,2002.0,4.15,170.0,...,The Four Loves,9780006280897 Lewis' work on the nature of lov...,Nonfiction,0.078765,0.111690,0.732684,0.251881,0.360708,0.150722,0.351483
4,9780006280934,0006280935,The Problem of Pain,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=Kk-uV...,"""In The Problem of Pain, C.S. Lewis, one of th...",2002.0,4.09,176.0,...,The Problem of Pain,"9780006280934 ""In The Problem of Pain, C.S. Le...",Nonfiction,0.078765,0.475881,0.884390,0.040564,0.095043,0.184495,0.081412
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5192,9788172235222,8172235224,Mistaken Identity,Nayantara Sahgal,Indic fiction (English),http://books.google.com/books/content?id=q-tKP...,On A Train Journey Home To North India After L...,2003.0,2.93,324.0,...,Mistaken Identity,9788172235222 On A Train Journey Home To North...,Fiction,0.030656,0.980877,0.853721,0.255171,0.919165,0.030643,0.148208
5193,9788173031014,8173031010,Journey to the East,Hermann Hesse,Adventure stories,http://books.google.com/books/content?id=rq6JP...,This book tells the tale of a man who goes on ...,2002.0,3.70,175.0,...,Journey to the East,9788173031014 This book tells the tale of a ma...,Nonfiction,0.227765,0.111690,0.883198,0.400262,0.051363,0.114383,0.064134
5194,9788179921623,817992162X,The Monk Who Sold His Ferrari: A Fable About F...,Robin Sharma,Health & Fitness,http://books.google.com/books/content?id=c_7mf...,"Wisdom to Create a Life of Passion, Purpose, a...",2003.0,3.82,198.0,...,The Monk Who Sold His Ferrari: A Fable About F...,9788179921623 Wisdom to Create a Life of Passi...,Fiction,0.057625,0.066685,0.375755,0.947779,0.339218,0.009929,0.009997
5195,9788185300535,8185300534,I Am that,Sri Nisargadatta Maharaj;Sudhakar S. Dikshit,Philosophy,http://books.google.com/books/content?id=Fv_JP...,This collection of the timeless teachings of o...,1999.0,4.51,531.0,...,I Am that: Talks with Sri Nisargadatta Maharaj,9788185300535 This collection of the timeless ...,Nonfiction,0.078765,0.368110,0.951104,0.759455,0.459271,0.104007,0.064134


3. Exportar el resultado final (los libros con sus respectivas predicciones sobre el tono emocional de sus descripciones) a un dataset

In [23]:
books.to_csv("../datasets/4_books_with_emotion_scores.csv", index=False)