# Sentiment Analysis with Pretrained Models
This notebook demonstrates several existing sentiment analysis tools for the German language and gives a brief overview of their pros and cons. We track predictive performance and prediction time for each tool to show the trade-offs between them.

## Install dependencies

In [None]:
!pip install ibm-watson
!pip install boto3
!pip install azure-ai-textanalytics==5.1.0
!pip install transformers
!pip install -q t5
!pip install textblob-de
!pip install google-cloud==1.0.3
!pip install google-cloud-language==1.2.0
!pip install google-api-python-client==1.12.8
!pip install SoMaJo

In [None]:
import pandas as pd

from somajo import SoMaJo
from time_tracker import measure_time

import warnings
warnings.filterwarnings("ignore")

import nltk
nltk.download("stopwords")
nltk.download('punkt')
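The `measure_time` decorator is imported from a local `time_tracker` module that is not shown in this notebook. As a rough idea of what such a helper might look like, here is a minimal sketch. The implementation and the `slow_square` demo function are assumptions for illustration, not the actual contents of `time_tracker`.

```python
import functools
import time


def measure_time(func):
    """Hypothetical stand-in: print the wall-clock time of each call to `func`."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # runs even if `func` raises, so failed API calls are timed too
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.2f} s")

    return wrapper


@measure_time
def slow_square(x):
    """Toy function used only to demonstrate the decorator."""
    time.sleep(0.01)
    return x * x


result = slow_square(4)
```

Using `time.perf_counter` rather than `time.time` avoids jumps from system clock adjustments, which matters when comparing tools whose run times differ by only a few seconds.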

## Set flags
Since this notebook calls several different `APIs`, it is better to call only those `APIs` you need.

In [None]:
google_flag = True
ibm_flag = True
aws_flag = True
google_t5_flag = True
textblob_de_flag = True

## Load the data
In this demo, we'll use the data from a challenge (GermEval2017) that focused on sentiment analysis in various online media such as newspapers, Twitter, Facebook and many more. This task is far from easy, especially for the German language, so the scores are expected to be relatively low.

In [None]:
# load the dataset
data = pd.read_csv('data/germeval_2017_sentiment.tsv', sep='\t', header=None)
data.columns = ['id', 'text', 'relevance', 'sentiment', 'aspect:Polarity', 'NAN']
data.drop(columns=['NAN'], inplace=True)
data.head(3)

Randomly choose 20 samples from each category (`positive`, `negative` and `neutral`) on which the different tools will be tested.

In [None]:
data_pos = data[data.sentiment == 'positive'].sample(n=20)
data_neg = data[data.sentiment == 'negative'].sample(n=20)
data_neut = data[data.sentiment == 'neutral'].sample(n=20)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
data = pd.concat([data_pos, data_neg, data_neut])
data.head(3)

Most of the tools predict the sentiment of sentences rather than entire texts. However, the GermEval2017 dataset contains complete texts, so they need to be split into sentences. To do this, we'll use `SoMaJo`, a German tokenizer focused on social media data.

In [None]:
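tokenizer = SoMaJo("de_CMC", split_camel_case=True)
# tokenize each text into sentences; every sentence is a list of token strings
data['tokens'] = data.text.apply(lambda x: [[token.text for token in sent] for sent in tokenizer.tokenize_text([x])])
# keep only the first sentence of each text
data['tokens'] = data.tokens.apply(lambda x: x[0])
# join the tokens back into one whitespace-separated sentence
data['sentence'] = data.tokens.apply(lambda x : ' '.join(x))
data["review_id"] = data.index + 1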
data.head(3)
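The nested structure produced by the tokenization step can be hard to picture. The sketch below uses a hand-written token list (a hypothetical example, not real `SoMaJo` output) to show what selecting `x[0]` does, and how all sentences could be kept instead if that were preferred.

```python
# Hypothetical tokenizer output for one text: a list of sentences,
# each sentence being a list of token strings.
tokenized_text = [
    ["Die", "Bahn", "ist", "pünktlich", "."],
    ["Der", "Service", "war", "aber", "schlecht", "."],
]

# What the notebook cell does: keep the first sentence only.
first_sentence = " ".join(tokenized_text[0])

# Alternative (an assumption, not part of the notebook): keep every sentence.
all_sentences = [" ".join(tokens) for tokens in tokenized_text]

print(first_sentence)
print(all_sentences)
```

Keeping only the first sentence means the gold label of the whole text is compared against a prediction for just part of it, which is worth bearing in mind when reading the scores.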

In [None]:
data['sentiment'].value_counts()

Now we can use different sentiment `APIs` to predict the polarity of our dataset.

## Prediction

### Google API

In [None]:
@measure_time
def run_google():
    from GoogleSentiment import GoogleSentiment
    google_sentiment = GoogleSentiment()
    google_output = google_sentiment.run_sentiment(data)

if google_flag:
    run_google()

### IBMWatson API

In [None]:
@measure_time
def run_ibm():
    from IBMSentiment import IBMSentiment
    ibm_sentiment = IBMSentiment()
    ibm_output = ibm_sentiment.run_sentiment(data)

if ibm_flag:
    run_ibm()

### AWS API

In [None]:
@measure_time
def run_aws():
    from AWSSentiment import AWSSentiment
    aws_sentiment = AWSSentiment()
    aws_output = aws_sentiment.run_sentiment(data)

if aws_flag:
    run_aws()

### Google `T5`

In [None]:
@measure_time
def run_google_t5():
    from google_t5_sentiment import GoogleT5Sentiment
    google_t5_sentiment = GoogleT5Sentiment()
    gt5_output = google_t5_sentiment.run_sentiment(data)

if google_t5_flag:
    run_google_t5()

### `TextBlobDe` package

In [None]:
@measure_time
def run_textblob_de():
    from textblob_de_sentiment import TextBlobDESentiment
    textblob_de = TextBlobDESentiment()
    textblob_de_output = textblob_de.run_sentiment(data)

if textblob_de_flag:
    run_textblob_de()

## Evaluate the models
After the prediction step, we need to evaluate the results obtained by the different tools. We will do this in several ways:
- confusion matrix
- classification report
- accuracy / balanced accuracy
- F1 score
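The metrics listed above can be computed by hand on a toy example. The sketch below is plain Python and is not the code of the local `ModelValidation` class, whose internals are not shown here. In particular, the macro averaging of the F1 score is an assumption.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true, predicted) label pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for a single label (0.0 when undefined)."""
    pairs = confusion_counts(y_true, y_pred)
    tp = pairs[(label, label)]
    fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
    fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the labels that occur in y_true."""
    labels = sorted(set(y_true))
    return sum(precision_recall_f1(y_true, y_pred, l)[1] for l in labels) / len(labels)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, l)[2] for l in labels) / len(labels)


# Toy gold labels and predictions, made up for illustration.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

matrix = confusion_counts(y_true, y_pred)
acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(matrix, acc, bal_acc, f1)
```

Because the evaluation sample contains the same number of examples per class, accuracy and balanced accuracy coincide as long as every gold label is present. On an imbalanced sample they can differ considerably.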

In [None]:
from ModelValidation import ModelValidation
model_validation = ModelValidation()

### Evaluate the `Google API` performance

In [None]:
file_path = 'predictions/google_sentiment.csv'
google_df = pd.read_csv(file_path)
model_validation.evaluate(google_df['true_sentiment'], google_df['predicted_sentiment'], title='Google')

### Evaluate the `IBMWatson API` performance

In [None]:
file_path = 'predictions/ibmwatson_sentiment.csv'
ibm_df = pd.read_csv(file_path)
model_validation.evaluate(ibm_df['true_sentiment'], ibm_df['predicted_sentiment'], title='IBMWatson')

### Evaluate the `AWS API` performance

In [None]:
file_path = 'predictions/amazon_sentiment.csv'
aws_df = pd.read_csv(file_path)
model_validation.evaluate(aws_df['true_sentiment'], aws_df['predicted_sentiment'], title='Amazon')

### Evaluate the `Google T5` performance
This model is a binary classifier: it predicts only `negative` or `positive`.

In [None]:
file_path = 'predictions/googlet5_sentiment.csv'
gt5_df = pd.read_csv(file_path)
model_validation.evaluate(gt5_df['true_sentiment'], gt5_df['predicted_sentiment'], title='GoogleT5')
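Since a binary model can never output `neutral`, every neutral gold example counts as an error in a three-class evaluation. One option is to additionally score the model on the `positive` and `negative` gold examples only. This is an assumption for illustration, not necessarily what `ModelValidation` does. The helper name `restrict_to_binary` and the toy labels below are hypothetical.

```python
def restrict_to_binary(y_true, y_pred, labels=("negative", "positive")):
    """Drop examples whose gold label is outside `labels` (hypothetical helper)."""
    kept = [(t, p) for t, p in zip(y_true, y_pred) if t in labels]
    true_kept = [t for t, _ in kept]
    pred_kept = [p for _, p in kept]
    return true_kept, pred_kept


# Toy gold labels and binary predictions, made up for illustration.
y_true = ["positive", "neutral", "negative", "neutral", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "positive"]

bin_true, bin_pred = restrict_to_binary(y_true, y_pred)

# Accuracy over all five examples versus over the three non-neutral ones.
full_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
binary_accuracy = sum(t == p for t, p in zip(bin_true, bin_pred)) / len(bin_true)
print(full_accuracy, binary_accuracy)
```

Reporting both numbers makes the comparison with the three-class tools fairer: the first shows how the model fares on the full task, the second how well it separates the two classes it was built for.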

### Evaluate the `TextBlobDE` performance

In [None]:
file_path = 'predictions/textblob_de_sentiment.csv'
textblob_de_df = pd.read_csv(file_path)
model_validation.evaluate(textblob_de_df['true_sentiment'], textblob_de_df['predicted_sentiment'], title='TextBlobDE')