### Sentiment Analysis approach (see Section 4.2.3.7 of thesis)

This notebook shows how the ER system can be used for Sentiment Analysis. The resulting approach is then compared with state-of-the-art Sentiment Analysis approaches for German text corpora.

Please keep in mind that these notebooks are primarily used for conducting experiments, live coding, and implementing and evaluating the approaches presented in the thesis. As a result, the code in this notebook may not strictly adhere to best practice coding standards.

In [None]:
!pip install germansentiment
!pip install datasets
!pip install transformers
!pip install spacy
!python -m spacy download de_core_news_sm
!pip install -U textblob-de
!python3 -m textblob.download_corpora
!pip install spacy-sentiws

In [None]:
# Change into the directory of the Sentiment Analysis notebook. This is required to run the
# Sentiment Analysis approach based on the ER system of this thesis.
from google.colab import drive
drive.mount('/content/gdrive')
%cd /content/gdrive/MyDrive/Experiment/Transformer Models/Sentiment Analysis

Mounted at /content/gdrive


In [None]:
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("amazon_reviews_multi","de", split='test')
test_sent = pd.DataFrame(dataset)[["review_body", "stars"]]

test_sent.stars = test_sent.stars.replace({1:"negative"})
test_sent.stars = test_sent.stars.replace({5:"positive"})
test_sent = test_sent.loc[test_sent['stars'].isin(["negative", "positive"])]

test_sent.reset_index(drop=True, inplace=True)

Generating train split:   0%|          | 0/200000 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Dataset amazon_reviews_multi downloaded and prepared to /root/.cache/huggingface/datasets/amazon_reviews_multi/de/1.0.0/724e94f4b0c6c405ce7e476a6c5ef4f87db30799ad49f765094cf9770e0f7609. Subsequent calls will reuse this data.
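
The label preparation above keeps only the 1-star and 5-star reviews and treats them as "negative" and "positive". As a minimal sketch of the same idea on a toy table (the review texts and the names `reviews`, `star_to_label`, and `binary` are invented for illustration and are not part of the notebook), the mapping and filtering can be done in one pass:

```python
import pandas as pd

# Toy stand-in for the review table; the texts are invented for illustration.
reviews = pd.DataFrame({
    "review_body": ["Sehr gut", "Ganz okay", "Schrecklich", "Top Produkt"],
    "stars": [5, 3, 1, 5],
})

# Map the extreme ratings to labels; every other rating becomes NaN.
star_to_label = {1: "negative", 5: "positive"}
labels = reviews["stars"].map(star_to_label)

# Keep only rows that received a label and renumber the index from 0.
binary = (
    reviews.assign(stars=labels)
    .dropna(subset=["stars"])
    .reset_index(drop=True)
)

print(binary["stars"].tolist())
```

Renumbering the index matters here because the prediction loops in this notebook access rows by position via `test_sent["review_body"][i]`.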


### TextBlobDE Sentiment Analysis

In [None]:
# load and predict on textblob
from textblob_de import TextBlobDE as TextBlob
from tqdm import tqdm

def transform_blob(blob):
  if blob > 0:
    return "positive"
  elif blob < 0:
    return "negative"
  else:
    return "neutral"

blob_res = []
for i in tqdm(range(len(test_sent))):
  res = TextBlob(test_sent["review_body"][i])
  res = transform_blob(res.sentiment[0])
  blob_res.append(res)

100%|██████████| 2000/2000 [00:17<00:00, 115.07it/s]


### Spacy Sentiment Analysis

In [None]:
# load and predict on spacy
import spacy
from spacy_sentiws import spaCySentiWS

nlp = spacy.load('de_core_news_sm')
nlp.add_pipe('sentiws', config={'sentiws_path': '/content/'})

def sentiment_spacy(sentiments):
  neg = [abs(i) for i in sentiments if i < 0]
  pos = [i for i in sentiments if i > 0]

  if sum(pos) > sum(neg):
    return "positive"
  elif sum(pos) < sum(neg):
    return "negative"
  else:
    return "neutral"

def transform_spacy(inp):
  doc = nlp(inp)

  sentiments = []
  for token in doc:
    sentiments.append(token._.sentiws)

  # Tokens without a SentiWS entry have no score (None) and are skipped
  return sentiment_spacy([i for i in sentiments if i is not None])

spacy_res = []
for i in tqdm(range(len(test_sent))):
  res = transform_spacy(test_sent["review_body"][i])
  spacy_res.append(res)
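
The aggregation rule used for the lexicon baseline can be looked at in isolation. The following is a small sketch that does not need spaCy: the function name `aggregate_polarity` and the token scores are hypothetical, chosen only to show how the summed positive and negative weights decide the label and how tokens without a lexicon entry are ignored.

```python
def aggregate_polarity(scores):
    """Compare total positive weight with total negative weight."""
    # Tokens without a lexicon entry arrive as None and are skipped.
    known = [s for s in scores if s is not None]
    positive = sum(s for s in known if s > 0)
    negative = sum(-s for s in known if s < 0)
    if positive > negative:
        return "positive"
    if positive < negative:
        return "negative"
    # Ties, including texts with no scored token at all, end up here.
    return "neutral"


# Hypothetical token scores, not real SentiWS values.
print(aggregate_polarity([0.5, None, -0.25]))
print(aggregate_polarity([None, None]))
print(aggregate_polarity([0.125, -0.5, None]))
```

A text in which no token carries a score falls through to "neutral", which never matches the binary gold labels used in this notebook.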

### oliverguhr's German Sentiment Analysis with BERT

In [None]:
# load and predict on oliver guhr
from germansentiment import SentimentModel

model = SentimentModel()

oliv_res = []
for i in tqdm(range(len(test_sent))):
  res = model.predict_sentiment([test_sent["review_body"][i]])
  oliv_res.append(res)

oliv_res = [i[0] for i in oliv_res]

100%|██████████| 2000/2000 [00:29<00:00, 66.78it/s]


### Sentiment Analysis approach of this ER system
Note that the trained ER model must be placed in "/content/gdrive/MyDrive/Experiment/Transformer Models/Sentiment Analysis/model/"
or, in the local view, in
"./Experiment/Transformer Models/Sentiment Analysis/model/"

In [None]:
# load and predict on own model
import SentimentAnalysis

model = SentimentAnalysis.EmotionModel()
my_res = []
for i in tqdm(range(len(test_sent))):
  res = model.get_sentiment(test_sent["review_body"][i])
  my_res.append(res)

In [None]:
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("MilaNLProc/xlm-emo-t")

model = AutoModelForSequenceClassification.from_pretrained("MilaNLProc/xlm-emo-t")


def predict(inputs):
    # Tokenize the input text
    inputs = tokenizer(inputs, truncation=True, padding=True, return_tensors="pt")
    # Pass the tokenized input to the model; no gradients are needed for inference
    with torch.no_grad():
        outputs = model(**inputs)
    # Get the predicted class probabilities
    predictions = outputs.logits.softmax(dim=1)

    # Extract the labels and scores
    label_list = model.config.id2label
    results = {}

    # Iterate over the predicted probabilities
    for pred in predictions:
        # Iterate over each label and score
        for label_id, score in enumerate(pred):
            label = label_list[label_id]
            results[label] = score.item()
    return results


def get_sentiment(inp):
  pred = predict(inp)

  sentiment = sum([-(pred['anger']),-(pred['fear']),-(pred['sadness']),pred['joy']])
  sentiment_val = np.round(sentiment,5)

  if sentiment_val < 0:
      return  "negative"

  elif sentiment_val > 0:
      return  "positive"

  else:
      return  "neutral"



'negative'
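
The rule in `get_sentiment` treats joy as positive evidence and anger, fear, and sadness as negative evidence. The sketch below isolates that rule on hand-written probability dictionaries, so it runs without downloading the model; the function name `emotion_to_sentiment` and all probability values are hypothetical. Assuming the model exposes exactly these four labels, the softmax probabilities sum to 1, so the score equals `2 * joy - 1` and the result is "positive" only when joy exceeds 0.5.

```python
def emotion_to_sentiment(probs, ndigits=5):
    """Collapse emotion probabilities into a coarse sentiment label."""
    # Joy counts as positive evidence; anger, fear and sadness as negative.
    score = probs["joy"] - (probs["anger"] + probs["fear"] + probs["sadness"])
    score = round(score, ndigits)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical probabilities, chosen to sum to 1; not real model output.
joyful = {"anger": 0.0625, "fear": 0.0625, "joy": 0.75, "sadness": 0.125}
balanced = {"anger": 0.125, "fear": 0.125, "joy": 0.5, "sadness": 0.25}
angry = {"anger": 0.625, "fear": 0.125, "joy": 0.125, "sadness": 0.125}

for example in (joyful, balanced, angry):
    print(emotion_to_sentiment(example))
```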

In [None]:
from tqdm import tqdm

xlm_emo_res = []

for i in tqdm(range(len(test_sent))):
  res = get_sentiment(test_sent["review_body"][i])
  xlm_emo_res.append(res)

100%|██████████| 2000/2000 [06:19<00:00,  5.27it/s]


### Predictions

In [None]:
from sklearn.metrics import accuracy_score

accuracy_score(test_sent["stars"], spacy_res)

0.5155

In [None]:
accuracy_score(test_sent["stars"], my_res)

0.8675

In [None]:
accuracy_score(test_sent["stars"], oliv_res)

0.8965

In [None]:
accuracy_score(test_sent["stars"], xlm_emo_res)

0.9008333333333333
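
When reading these accuracies, note that the gold labels are binary ("negative" or "positive"), while the label functions in this notebook can also return "neutral". A "neutral" prediction therefore always counts as an error. The following sketch, written in plain Python with invented labels and a hypothetical helper `accuracy`, shows how to report the share of neutral predictions alongside the accuracy so the two effects can be told apart.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of positions where the prediction equals the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


# Invented labels for illustration only.
gold = ["positive", "negative", "positive", "negative"]
pred = ["positive", "neutral", "negative", "negative"]

acc = accuracy(gold, pred)
neutral_share = Counter(pred)["neutral"] / len(pred)

print(acc)            # 2 of 4 predictions match
print(neutral_share)  # 1 of 4 predictions is "neutral"
```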
