# Sentiment Analysis

Using a fine-tuned Hugging Face model for this task.

In [2]:
import pandas as pd
import numpy as np

In [3]:
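from transformers import pipeline
# return_all_scores=True returns a score for every emotion label, not just the top one
# (recent transformers releases deprecate it in favour of top_k=None).
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", return_all_scores=True)
classifier("I love this!")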

Device set to use cpu


[[{'label': 'anger', 'score': 0.004419787786900997},
  {'label': 'disgust', 'score': 0.0016119900392368436},
  {'label': 'fear', 'score': 0.00041385277290828526},
  {'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'neutral', 'score': 0.005764589179307222},
  {'label': 'sadness', 'score': 0.002092392183840275},
  {'label': 'surprise', 'score': 0.008528691716492176}]]
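
Each prediction is a list of label/score dictionaries, so the dominant emotion is the entry with the largest score. A minimal sketch, assuming a hypothetical helper `top_emotion` (not part of transformers) and using the scores printed above, rounded to four decimals:

```python
def top_emotion(scores):
    """Return (label, score) of the highest-scoring entry in one prediction."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Scores for "I love this!" copied from the output above, rounded to 4 decimals.
example_scores = [
    {"label": "anger", "score": 0.0044},
    {"label": "disgust", "score": 0.0016},
    {"label": "fear", "score": 0.0004},
    {"label": "joy", "score": 0.9772},
    {"label": "neutral", "score": 0.0058},
    {"label": "sadness", "score": 0.0021},
    {"label": "surprise", "score": 0.0085},
]

print(top_emotion(example_scores))  # ('joy', 0.9772)
```

With the real classifier this would be called as `top_emotion(classifier("I love this!")[0])`, since the output shown above wraps the scores in an outer list.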

#### Now the question is how to perform sentiment analysis on the descriptions we have. For example, should we apply it to the whole description?

In [4]:
books = pd.read_csv("books_with_categories.csv")

In [5]:
import textwrap

print(textwrap.fill(books.loc[0, 'description'], width=80))
classifier(books.loc[0, 'description'])

A NOVEL THAT READERS and critics have been eagerly anticipating for over a
decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames
is a preacher, the son of a preacher and the grandson (both maternal and
paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the
Reverend Ames’s life, and he is absorbed in recording his family’s story, a
legacy for the young son he will never see grow up. Haunted by his grandfather’s
presence, John tells of the rift between his grandfather and his father: the
elder, an angry visionary who fought for the abolitionist cause, and his son, an
ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames)
Boughton, his best friend’s lost son who returns to Gilead searching for
forgiveness and redemption. Told in John Ames’s joyous, rambling voice that
finds beauty, humour and truth in the smallest of life’s details, Gilead is a
song of celebration and acceptance of the best and the worst the world has

[[{'label': 'anger', 'score': 0.003935148939490318},
  {'label': 'disgust', 'score': 0.019100705161690712},
  {'label': 'fear', 'score': 0.654839813709259},
  {'label': 'joy', 'score': 0.015161501243710518},
  {'label': 'neutral', 'score': 0.16985264420509338},
  {'label': 'sadness', 'score': 0.11640956997871399},
  {'label': 'surprise', 'score': 0.020700659602880478}]]

In [6]:
classifier("Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: \
           the elder, an angry visionary who fought for the abolitionist cause, and his son, anardent pacifist")

[[{'label': 'anger', 'score': 0.001562298508360982},
  {'label': 'disgust', 'score': 0.00043948637903667986},
  {'label': 'fear', 'score': 0.986592710018158},
  {'label': 'joy', 'score': 0.000990753062069416},
  {'label': 'neutral', 'score': 0.001084391144104302},
  {'label': 'sadness', 'score': 0.007464968599379063},
  {'label': 'surprise', 'score': 0.0018653932493180037}]]

Let's split the text into sentences, because it may be better not to assign a single emotion to the whole text.

In [7]:
preds = classifier(books.loc[0, 'description'].split("."))
preds

[[{'label': 'anger', 'score': 0.009156367741525173},
  {'label': 'disgust', 'score': 0.0026284796185791492},
  {'label': 'fear', 'score': 0.06816233694553375},
  {'label': 'joy', 'score': 0.04794260486960411},
  {'label': 'neutral', 'score': 0.14038586616516113},
  {'label': 'sadness', 'score': 0.0021221658680588007},
  {'label': 'surprise', 'score': 0.7296021580696106}],
 [{'label': 'anger', 'score': 0.04047833010554314},
  {'label': 'disgust', 'score': 0.2735915184020996},
  {'label': 'fear', 'score': 0.006879063323140144},
  {'label': 'joy', 'score': 0.10908326506614685},
  {'label': 'neutral', 'score': 0.4493700861930847},
  {'label': 'sadness', 'score': 0.09362751990556717},
  {'label': 'surprise', 'score': 0.026970162987709045}],
 [{'label': 'anger', 'score': 0.011031902395188808},
  {'label': 'disgust', 'score': 0.04342273622751236},
  {'label': 'fear', 'score': 0.01408410258591175},
  {'label': 'joy', 'score': 0.014211490750312805},
  {'label': 'neutral', 'score': 0.64621573686

Let's take, for each emotion, the highest score found in any sentence of the description.

In [8]:
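def calculate_max_emotion_scores(sentiment_analysis):
    """Return, for each emotion, the highest score found in any sentence.

    `sentiment_analysis` is the classifier output for a list of sentences:
    one list of {'label': ..., 'score': ...} dicts per sentence.
    """
    highest_scores = {'anger': 0., 'disgust': 0., 'fear': 0., 'joy': 0., 'neutral': 0., 'sadness': 0., 'surprise': 0.}

    for sentence_scores in sentiment_analysis:
        for dic in sentence_scores:
            label = dic['label']
            score = dic['score']
            # Keep the largest score seen so far for this label.
            if score > highest_scores[label]:
                highest_scores[label] = score
    return highest_scores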

In [9]:
calculate_max_emotion_scores(preds)

{'anger': 0.06413359194993973,
 'disgust': 0.2735915184020996,
 'fear': 0.9281682372093201,
 'joy': 0.9327983260154724,
 'neutral': 0.6462157368659973,
 'sadness': 0.9671575427055359,
 'surprise': 0.7296021580696106}
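
Because the maximum is taken per emotion across sentences, the resulting scores no longer sum to 1: a description can score high on both joy and sadness if different sentences carry those emotions, as in the output above. A toy illustration with made-up scores for two sentences and two labels; `max_per_label` is a hypothetical compact equivalent of the function above, not a replacement for it:

```python
def max_per_label(predictions):
    """For each label, keep the highest score across all sentence predictions."""
    result = {}
    for sentence_scores in predictions:
        for item in sentence_scores:
            label, score = item["label"], item["score"]
            result[label] = max(score, result.get(label, 0.0))
    return result


# Made-up scores: sentence 1 is mostly joyful, sentence 2 is mostly sad.
toy_predictions = [
    [{"label": "joy", "score": 0.9}, {"label": "sadness", "score": 0.1}],
    [{"label": "joy", "score": 0.2}, {"label": "sadness", "score": 0.8}],
]

print(max_per_label(toy_predictions))  # {'joy': 0.9, 'sadness': 0.8}
```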

#### Now let's do it for all the books

In [10]:
emotions_for_each_book = []
isbn13 = []

for i in range(len(books)):
    isbn13.append(books.loc[i, 'isbn13'])
    # Score each sentence separately, then keep the per-emotion maximum.
    sentences = books.loc[i, 'description'].split(".")
    preds = classifier(sentences)
    max_scores = calculate_max_emotion_scores(preds)
    emotions_for_each_book.append(max_scores)

In [11]:
df = pd.DataFrame(emotions_for_each_book)
df['isbn13'] = isbn13

Now let's merge the emotion scores back into the books table on the isbn13 column

In [12]:
books = pd.merge(books, df, on="isbn13")
books.head()

| Unnamed: 0.1 | Unnamed: 0 | isbn13 | isbn10 | title | authors | categories | thumbnail | description | published_year | average_rating | ... | title_and_subtitle | tag_description | simple_categories | anger | disgust | fear | joy | neutral | sadness | surprise |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 9780002005883 | 2005883 | Gilead | Marilynne Robinson | Fiction | http://books.google.com/books/content?id=KQZCP... | A NOVEL THAT READERS and critics have been eag... | 2004.0 | 3.85 | ... | Gilead | 9780002005883_A NOVEL THAT READERS and critics... | Fiction | 0.064134 | 0.273592 | 0.928168 | 0.932798 | 0.646216 | 0.967158 | 0.729602 |
| 1 | 1 | 9780002261982 | 2261987 | Spider's Web | Charles Osborne;Agatha Christie | Detective and mystery stories | http://books.google.com/books/content?id=gA5GP... | A new 'Christie for Christmas' -- a full-lengt... | 2000.0 | 3.83 | ... | Spider's Web_A Novel | 9780002261982_A new 'Christie for Christmas' -... | Fiction | 0.612619 | 0.348285 | 0.942528 | 0.704422 | 0.88794 | 0.11169 | 0.252546 |
| 2 | 2 | 9780006178736 | 6178731 | Rage of angels | Sidney Sheldon | Fiction | http://books.google.com/books/content?id=FKo2T... | A memorable, mesmerizing heroine Jennifer -- b... | 1993.0 | 3.93 | ... | Rage of angels | 9780006178736_A memorable, mesmerizing heroine... | Fiction | 0.064134 | 0.104007 | 0.972321 | 0.767238 | 0.549477 | 0.11169 | 0.078765 |
| 3 | 3 | 9780006280897 | 6280897 | The Four Loves | Clive Staples Lewis | Christian life | http://books.google.com/books/content?id=XhQ5X... | Lewis' work on the nature of love divides love... | 2002.0 | 4.15 | ... | The Four Loves | 9780006280897_Lewis' work on the nature of lov... | Nonfiction | 0.351484 | 0.150722 | 0.360706 | 0.251881 | 0.732685 | 0.11169 | 0.078765 |
| 4 | 4 | 9780006280934 | 6280935 | The Problem of Pain | Clive Staples Lewis | Christian life | http://books.google.com/books/content?id=Kk-uV... | "In The Problem of Pain, C.S. Lewis, one of th... | 2002.0 | 4.09 | ... | The Problem of Pain | 9780006280934_"In The Problem of Pain, C.S. Le... | Nonfiction | 0.081413 | 0.184495 | 0.095043 | 0.040564 | 0.88439 | 0.475881 | 0.078765 |
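
Several values in the table repeat exactly across different books: sadness is 0.11169 in rows 1 to 3, surprise is 0.078765 in rows 2 to 4, and anger is 0.064134 in rows 0 and 2. One plausible explanation, not verified here, is that `split(".")` produces empty or whitespace-only fragments (for instance after a final period), that the classifier returns the same scores for every such fragment, and that those scores then act as a floor for the per-book maximum. A minimal sketch of a splitter that drops such fragments; `split_sentences` is a hypothetical helper, and it still splits on every period, including those inside abbreviations such as "C.S.":

```python
def split_sentences(text):
    """Split on periods and drop fragments that are empty after stripping."""
    fragments = (part.strip() for part in text.split("."))
    return [part for part in fragments if part]


print("Gilead is a story. It is 1956.".split("."))
# ['Gilead is a story', ' It is 1956', '']
print(split_sentences("Gilead is a story. It is 1956."))
# ['Gilead is a story', 'It is 1956']
```

Passing `split_sentences(description)` to the classifier instead of `description.split(".")` would keep empty inputs out of the maximum. Whether this removes the repeated values would need to be checked by rerunning the loop.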


In [13]:
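# index=False avoids writing the row index, which would come back as another 'Unnamed: 0' column on the next read_csv
books.to_csv('books_with_emotion.csv', index=False)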

In [14]:
books.simple_categories.unique()

array(['Fiction', 'Nonfiction', "Children's Fiction",
       "Children's Nonfiction"], dtype=object)