## `sentiment_analysis.ipynb` - анализ тональности описания фильмов
**При помощи языковой модели происходит извлечение эмоций из описания**

In [1]:
import pandas as pd

movies = pd.read_csv("../data/movies_with_genres.csv")

**Запуск модели для классификации эмоций в тексте**

In [2]:
from transformers import pipeline
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", top_k=None, device=device)

classifier("Goida!")

Device set to use cuda


[[{'label': 'surprise', 'score': 0.3491100072860718},
  {'label': 'anger', 'score': 0.22491633892059326},
  {'label': 'joy', 'score': 0.1782553344964981},
  {'label': 'neutral', 'score': 0.1282225251197815},
  {'label': 'disgust', 'score': 0.0725356936454773},
  {'label': 'sadness', 'score': 0.03637048229575157},
  {'label': 'fear', 'score': 0.010589638724923134}]]

In [3]:
movies["overview"][0]

'Cobb, a skilled thief who commits corporate espionage by infiltrating the subconscious of his targets is offered a chance to regain his old life as payment for a task considered to be impossible: "inception", the implantation of another person\'s idea into a target\'s subconscious.'

**Делаем предсказание описания первого фильма**

In [4]:
classifier(movies["overview"][0])

[[{'label': 'neutral', 'score': 0.7862669229507446},
  {'label': 'disgust', 'score': 0.08769696950912476},
  {'label': 'anger', 'score': 0.06784504652023315},
  {'label': 'joy', 'score': 0.031864233314991},
  {'label': 'sadness', 'score': 0.010563821531832218},
  {'label': 'surprise', 'score': 0.009128081612288952},
  {'label': 'fear', 'score': 0.006634943187236786}]]

**Делаем предсказание описания первого фильма по предложениям**

In [5]:
classifier(movies["overview"][0].split("."))

[[{'label': 'neutral', 'score': 0.7524929642677307},
  {'label': 'anger', 'score': 0.08548389375209808},
  {'label': 'disgust', 'score': 0.08509394526481628},
  {'label': 'joy', 'score': 0.04001671448349953},
  {'label': 'surprise', 'score': 0.01256334688514471},
  {'label': 'fear', 'score': 0.012190472334623337},
  {'label': 'sadness', 'score': 0.012158682569861412}],
 [{'label': 'neutral', 'score': 0.549476683139801},
  {'label': 'sadness', 'score': 0.11169017851352692},
  {'label': 'disgust', 'score': 0.10400667041540146},
  {'label': 'surprise', 'score': 0.07876553386449814},
  {'label': 'anger', 'score': 0.06413363665342331},
  {'label': 'fear', 'score': 0.051362816244363785},
  {'label': 'joy', 'score': 0.04056441783905029}]]

In [6]:
sentences = movies["overview"][0].split(".")
predictions = classifier(sentences)

In [7]:
sentences[0]

'Cobb, a skilled thief who commits corporate espionage by infiltrating the subconscious of his targets is offered a chance to regain his old life as payment for a task considered to be impossible: "inception", the implantation of another person\'s idea into a target\'s subconscious'

In [8]:
predictions[0]

[{'label': 'neutral', 'score': 0.7524929642677307},
 {'label': 'anger', 'score': 0.08548389375209808},
 {'label': 'disgust', 'score': 0.08509394526481628},
 {'label': 'joy', 'score': 0.04001671448349953},
 {'label': 'surprise', 'score': 0.01256334688514471},
 {'label': 'fear', 'score': 0.012190472334623337},
 {'label': 'sadness', 'score': 0.012158682569861412}]

**Сортируем предсказания эмоций первого предложения по алфавиту**

In [9]:
sorted(predictions[0], key=lambda x: x["label"])

[{'label': 'anger', 'score': 0.08548389375209808},
 {'label': 'disgust', 'score': 0.08509394526481628},
 {'label': 'fear', 'score': 0.012190472334623337},
 {'label': 'joy', 'score': 0.04001671448349953},
 {'label': 'neutral', 'score': 0.7524929642677307},
 {'label': 'sadness', 'score': 0.012158682569861412},
 {'label': 'surprise', 'score': 0.01256334688514471}]

**Подготавливаем структуру для сбора эмоциональных оценок фильмов и определяем функцию для вычисления максимального значения каждой эмоции**

In [10]:
import numpy as np

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
imdb_ids = []
emotion_scores = {label: [] for label in emotion_labels}

def calculate_max_emotion_scores(sentence):
    per_emotion_scores = {label: [] for label in emotion_labels}
    for prediction in predictions:
        sorted_prediction = sorted(prediction, key=lambda x: x["label"])
        for index, label in enumerate(emotion_labels):
            per_emotion_scores[label].append(sorted_prediction[index]["score"])
    return {label: np.max(scores) for label, scores in per_emotion_scores.items()}

**Извлекаем оценки для первых 10 фильмов**

In [11]:
for i in range(10):
    imdb_ids.append(movies["imdb_id"][i])
    sentences = movies["overview"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


In [12]:
emotion_scores

{'anger': [np.float64(0.08548389375209808),
  np.float64(0.06413363665342331),
  np.float64(0.31546318531036377),
  np.float64(0.06413363665342331),
  np.float64(0.11398930102586746),
  np.float64(0.06413363665342331),
  np.float64(0.214565709233284),
  np.float64(0.107868991792202),
  np.float64(0.06413363665342331),
  np.float64(0.06413363665342331)],
 'disgust': [np.float64(0.10400667041540146),
  np.float64(0.10400667041540146),
  np.float64(0.1721762716770172),
  np.float64(0.10400667041540146),
  np.float64(0.1652555614709854),
  np.float64(0.10400667041540146),
  np.float64(0.10400667041540146),
  np.float64(0.10400667041540146),
  np.float64(0.10400667041540146),
  np.float64(0.10400667041540146)],
 'fear': [np.float64(0.051362816244363785),
  np.float64(0.051362816244363785),
  np.float64(0.9776860475540161),
  np.float64(0.12529602646827698),
  np.float64(0.9705313444137573),
  np.float64(0.6797763109207153),
  np.float64(0.9643508195877075),
  np.float64(0.9049578309059143),

**Анализ всех описаний**

In [13]:
from tqdm import tqdm

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
imdb_ids = []
emotion_scores = {label: [] for label in emotion_labels}

for i in tqdm(range(len(movies))):
    imdb_ids.append(movies["imdb_id"][i])
    sentences = movies["overview"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

100%|██████████| 1228/1228 [00:20<00:00, 60.99it/s]


**Создаем датафрейм для хранения эмоций каждого описания**

In [14]:
emotions_df = pd.DataFrame(emotion_scores)
emotions_df["imdb_id"] = imdb_ids
emotions_df

Unnamed: 0,anger,disgust,fear,joy,sadness,surprise,neutral,imdb_id
0,0.085484,0.104007,0.051363,0.040564,0.752493,0.111690,0.078766,tt1375666
1,0.064134,0.104007,0.051363,0.290713,0.549477,0.111690,0.116634,tt0816692
2,0.315463,0.172176,0.977686,0.548406,0.549477,0.111690,0.078766,tt0468569
3,0.064134,0.104007,0.125296,0.040564,0.751972,0.111690,0.078766,tt0499549
4,0.113989,0.165256,0.970531,0.171054,0.823941,0.212411,0.231226,tt0848228
...,...,...,...,...,...,...,...,...
1223,0.116909,0.104007,0.458601,0.712652,0.968510,0.111690,0.078766,tt2850386
1224,0.508866,0.104659,0.357230,0.253073,0.549477,0.711014,0.078766,tt0435625
1225,0.387601,0.325993,0.276351,0.040564,0.728907,0.183962,0.078766,tt0381707
1226,0.564887,0.209479,0.198819,0.040564,0.549477,0.111690,0.078766,tt0309698


**Объединяем `movies` и `emotions_df` по общему ключу**

In [15]:
movies = pd.merge(movies, emotions_df, on="imdb_id")

In [16]:
movies

Unnamed: 0,id,title,vote_average,vote_count,status,release_date,revenue,runtime,adult,backdrop_path,...,keywords,tagged_overview,simple_genres,anger,disgust,fear,joy,sadness,surprise,neutral
0,27205,Inception,8.364,34495,Released,2010-07-15,825532764,148,False,/8ZTVqvKDQ8emSGUEMjsS4yHAwrp.jpg,...,"rescue, mission, dream, airplane, paris, franc...","tt1375666 Cobb, a skilled thief who commits co...",Action/Adventure,0.085484,0.104007,0.051363,0.040564,0.752493,0.111690,0.078766
1,157336,Interstellar,8.417,32571,Released,2014-11-05,701729206,169,False,/pbrkL804c8yAv3zBZR4QPEafpAR.jpg,...,"rescue, future, spacecraft, race against time,...",tt0816692 The adventures of a group of explore...,Action/Adventure,0.064134,0.104007,0.051363,0.290713,0.549477,0.111690,0.116634
2,155,The Dark Knight,8.512,30619,Released,2008-07-16,1004558444,152,False,/nMKdUUepR0i5zn0y1T4CsSB5chy.jpg,...,"joker, sadism, chaos, secret identity, crime f...",tt0468569 Batman raises the stakes in his war ...,Action/Adventure,0.315463,0.172176,0.977686,0.548406,0.549477,0.111690,0.078766
3,19995,Avatar,7.573,29815,Released,2009-12-15,2923706026,162,False,/vL5LR6WdxWPjLPFRLe133jXWsh5.jpg,...,"future, society, culture clash, space travel, ...","tt0499549 In the 22nd century, a paraplegic Ma...",Action/Adventure,0.064134,0.104007,0.125296,0.040564,0.751972,0.111690,0.078766
4,24428,The Avengers,7.710,29166,Released,2012-04-25,1518815515,143,False,/9BBTo63ANSmhC4e6r62OJFuK2GL.jpg,...,"new york city, superhero, shield, based on com...",tt0848228 When an unexpected enemy emerges and...,Action/Adventure,0.113989,0.165256,0.970531,0.171054,0.823941,0.212411,0.231226
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1223,529203,The Croods: A New Age,7.521,3508,Released,2020-11-25,215905815,95,False,/ytTQoYkdpsgtfDWrNFCei8Mfbxu.jpg,...,"sequel, prehistory, candid, playful, joyous, a...","tt2850386 Searching for a safer habitat, the p...",Action/Adventure,0.116909,0.104007,0.458601,0.712652,0.968510,0.111690,0.078766
1224,9392,The Descent,6.957,3507,Released,2005-07-08,57130027,99,False,/70TIOrfkQli0Smsfjua2McaDPci.jpg,...,"panic, darkness, mutant, expedition, cave, cla...","tt0435625 After a tragic accident, six friends...",Action/Adventure,0.508866,0.104659,0.357230,0.253073,0.549477,0.711014,0.078766
1225,12153,White Chicks,6.919,3505,Released,2004-06-23,113086475,109,False,/di47xqYMCYpjqwnqNlO17X5qXMX.jpg,...,"undercover, fbi, cross dressing, car accident,...","tt0381707 Two FBI agent brothers, Marcus and K...",Comedy,0.387601,0.325993,0.276351,0.040564,0.728907,0.183962,0.078766
1226,2832,Identity,7.180,3502,Released,2003-04-25,90259536,90,False,/7MwDOMrbjrKP3XQ5vw4cgB2DPaF.jpg,...,"prostitute, prisoner, psychopath, nevada, dete...",tt0309698 Complete strangers stranded at a rem...,Thriller,0.564887,0.209479,0.198819,0.040564,0.549477,0.111690,0.078766


**Сохранение обновленного датафрейма**

In [17]:
movies.to_csv("../data/movies_with_emotions.csv", index=False)