# **Milestone 4:**

Creating two types of neural network–based sentiment analyzers


*   built-in neural network-based sentiment analyzer
*   self-built neural network-based sentiment analyzer



### **Setting up the environment**

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


Installing requirements

In [2]:
%%capture
!pip install --upgrade transformers  # make sure compatible with tokenizers
!wget https://raw.githubusercontent.com/crow-intelligence/growth-hacking-sentiment/master/requirements.txt
!pip install -r requirements.txt

Installing apex

In [3]:
%%writefile setup.sh

export CUDA_HOME=/usr/local/cuda-10.1
git clone https://github.com/NVIDIA/apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex

Writing setup.sh


In [4]:
%%capture
!sh setup.sh

### **Importing the required modules**

In [6]:
# importing relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from transformers import pipeline, BertTokenizer
from simpletransformers.classification import ClassificationModel
from simpletransformers.language_modeling import LanguageModelingModel

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

### **Loading in the small corpus csv**

In [7]:
# importing and reading csv from drive
df = pd.read_csv('/content/drive/MyDrive/Sentiment Analysis for Marketing/Data/small_corpus.csv')

In [8]:
# viewing a sample of the data
df.head()

Unnamed: 0,ratings,reviews
0,1,Recently UBISOFT had to settle a huge class-ac...
1,1,"code didn't work, got me a refund."
2,1,"these do not work at all, all i get is static ..."
3,1,well let me start by saying that when i first ...
4,1,"Dont waste your money, you will just end up us..."


In [9]:
# checking for null values
df.isnull().sum()

ratings    0
reviews    4
dtype: int64

In [10]:
# filling null values with an empty string ''
df['reviews'] = df['reviews'].fillna('')

### **Classification with a built-in neural network-based sentiment analyzer**

In [11]:
# building sentiment analysis pipeline
# by default, the Transformers library uses a DistilBERT model that was fine-tuned on the Stanford Sentiment Treebank v2 (SST2) task from the GLUE Dataset
nlp = pipeline("sentiment-analysis")

In [12]:
# constructing a BERT tokenizer based on WordPiece
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [13]:
# preprocessing and data preparation
# BERT-like models have a fixed sequence-length limit, which is often 512 tokens
# process data (e.g., normalizing whitespace, converting to lowercase)
def preprocessing(text):
  '''
  input:
    text: df-column with unprocessed text
  output:
    text_d: list of truncated and encoded/decoded texts
  '''
  text_d = []
  for review in text:  # iterate over the argument rather than the global df
    review = review[:512]  # note: cuts at 512 characters, not 512 tokens
    encoding = tokenizer.encode(review)
    decoding = tokenizer.decode(encoding, skip_special_tokens=True)
    text_d.append(decoding)
  return text_d
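
The slice `review[:512]` in `preprocessing` cuts each review at 512 characters, whereas the limit of BERT-like models is counted in tokens, two of which are taken by the special `[CLS]` and `[SEP]` tokens. The sketch below illustrates the difference. It is only an illustration: a plain whitespace split stands in for the WordPiece tokenizer, and `truncate_tokens` is a hypothetical helper that is not part of the notebook or of the Transformers library.

```python
# Minimal sketch: truncating by tokens instead of by characters.
# Assumption: str.split() stands in for the real WordPiece tokenizer.

def truncate_tokens(tokens, max_len=512, n_special=2):
    """Keep at most max_len - n_special tokens, leaving room for [CLS] and [SEP]."""
    budget = max_len - n_special
    return tokens[:budget]

review = "word " * 600                  # toy review: 600 tokens, 3000 characters
char_cut = review[:512].split()         # character slice, as in preprocessing()
token_cut = truncate_tokens(review.split())

print(len(char_cut))    # tokens that survive the character slice
print(len(token_cut))   # tokens that survive the token-level cut
```

In this toy case the character slice keeps far fewer tokens than the model could accept, so a token-level cut would preserve more of each long review.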

In [14]:
# printing two example outputs
# the pipeline returns a list with one dictionary per input; each dictionary has the keys 'label' and 'score'
preprocessed_data = preprocessing(df['reviews'])
for i in range(2):
  print(preprocessed_data[i], '\n')
  print(nlp(preprocessed_data[i]), '\n\n--\n')

recently ubisoft had to settle a huge class - action suit brought against the company for bundling ( the notoriously harmful ) starforce drm with its released games. so what the geniuses at the helm do next? they decide to make the same mistake yet again - by choosing the same drm scheme that made bioshock, mass effect and spore infamous : securom 7. xx with limited activations! mass effect can be found in clearance bins only months after its release ; spore not only undersold miserably but also made history as t 

[{'label': 'NEGATIVE', 'score': 0.9991271495819092}] 

--

code didn't work, got me a refund. 

[{'label': 'NEGATIVE', 'score': 0.9996480345726013}] 

--



In [15]:
df['reviews_pp'] = preprocessed_data
df.head()

Unnamed: 0,ratings,reviews,reviews_pp
0,1,Recently UBISOFT had to settle a huge class-ac...,recently ubisoft had to settle a huge class - ...
1,1,"code didn't work, got me a refund.","code didn't work, got me a refund."
2,1,"these do not work at all, all i get is static ...","these do not work at all, all i get is static ..."
3,1,well let me start by saying that when i first ...,well let me start by saying that when i first ...
4,1,"Dont waste your money, you will just end up us...","dont waste your money, you will just end up us..."


Converting sentiment scores into sentiment classes

In [17]:
# classifying the reviews as positive, neutral or negative
# the classifier only knows the labels POSITIVE and NEGATIVE, so the neutral class is derived from the confidence score:
# predictions whose score does not exceed the thresholds below are treated as neutral
def categorize_review(text):
  s = nlp(text)
  if s[0]['label'] == 'POSITIVE' and s[0]['score'] > 0.85:
    return 1
  elif s[0]['label'] == 'NEGATIVE' and s[0]['score'] > 0.95:
    return -1
  else:
    return 0
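
The thresholding rule can be checked in isolation, without loading the model. The sketch below applies the same thresholds (0.85 and 0.95) to hand-written predictions; `to_three_class` and the example scores are illustrative and do not come from the notebook's data.

```python
# Minimal sketch: mapping a binary (label, score) prediction to three classes.
# The thresholds mirror categorize_review; the function and inputs are illustrative.

def to_three_class(label, score, pos_threshold=0.85, neg_threshold=0.95):
    """Return 1 (positive), -1 (negative) or 0 (neutral / not confident enough)."""
    if label == "POSITIVE" and score > pos_threshold:
        return 1
    if label == "NEGATIVE" and score > neg_threshold:
        return -1
    return 0

examples = [
    {"label": "NEGATIVE", "score": 0.9991},  # confident negative
    {"label": "NEGATIVE", "score": 0.90},    # below the negative threshold
    {"label": "POSITIVE", "score": 0.86},    # just above the positive threshold
    {"label": "POSITIVE", "score": 0.60},    # weak positive
]
classes = [to_three_class(e["label"], e["score"]) for e in examples]
print(classes)
```

Because the negative threshold is stricter than the positive one, a negative prediction needs a higher score than a positive one before it is accepted.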

In [18]:
# applying the function and augmenting the dataframe
df['sentiment_cat'] = df['reviews_pp'].apply(categorize_review)

Converting ratings into rating classes

In [19]:
# classifying the ratings as positive, neutral or negative
def categorize_rating(rating):
  '''
  input:
    rating: rating of a certain review
  output:
    integer:
      1: for a positive review
      0: for a neutral review 
      -1: for a negative review
  '''
  if rating == 5:
    return 1 
  elif 2 <= rating <= 4:
    return 0
  else:
    return -1
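
To make the mapping explicit, the sketch below applies the same rule to every possible star value. It repeats the function so that it runs on its own; the `mapping` dictionary is only for illustration.

```python
# Minimal sketch: the rating-to-class rule applied to each star value from 1 to 5.
from collections import Counter

def categorize_rating(rating):
    """Same rule as above: 5 stars -> 1, 2 to 4 stars -> 0, otherwise -> -1."""
    if rating == 5:
        return 1
    elif 2 <= rating <= 4:
        return 0
    else:
        return -1

mapping = {stars: categorize_rating(stars) for stars in range(1, 6)}
print(mapping)
print(Counter(mapping.values()))
```

The neutral class covers three of the five star values, so it groups together reviews that may differ considerably in tone.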

In [20]:
# applying the function and augmenting the dataframe
df['rating_cat'] = df['ratings'].apply(categorize_rating)
df.head()

Unnamed: 0,ratings,reviews,reviews_pp,sentiment_cat,rating_cat
0,1,Recently UBISOFT had to settle a huge class-ac...,recently ubisoft had to settle a huge class - ...,-1,-1
1,1,"code didn't work, got me a refund.","code didn't work, got me a refund.",-1,-1
2,1,"these do not work at all, all i get is static ...","these do not work at all, all i get is static ...",-1,-1
3,1,well let me start by saying that when i first ...,well let me start by saying that when i first ...,-1,-1
4,1,"Dont waste your money, you will just end up us...","dont waste your money, you will just end up us...",-1,-1


Evaluating the sentiment analyzer

In [21]:
y_pred = list(df['sentiment_cat'])
y_true = list(df['rating_cat'])

a. Accuracy

In [22]:
# accuracy: share of reviews that are categorized correctly as positive, negative or neutral
# using scikit-learn metric
accuracy = accuracy_score(y_true, y_pred)*100
print(f'The model got {accuracy:.2f}% of the predictions right.')

The model got 61.07% of the predictions right.
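
Accuracy is the number of matching labels divided by the number of reviews. The sketch below computes it by hand on toy labels, which are made up for illustration and are not the notebook's data.

```python
# Minimal sketch: accuracy computed by hand on toy labels.
y_true_toy = [-1, -1, 0, 0, 1, 1]
y_pred_toy = [-1, -1, -1, 1, 1, 1]

# count the positions where prediction and true label agree
correct = sum(t == p for t, p in zip(y_true_toy, y_pred_toy))
accuracy_toy = correct / len(y_true_toy) * 100
print(f"{accuracy_toy:.2f}% of the toy predictions are right.")
```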


b. Classification report

In [23]:
# classification report: text report showing the main classification metrics
# using scikit-learn metric
cr = classification_report(y_true, y_pred)
print(cr)

              precision    recall  f1-score   support

          -1       0.59      0.89      0.71      1500
           0       0.43      0.08      0.14      1500
           1       0.67      0.86      0.75      1500

    accuracy                           0.61      4500
   macro avg       0.56      0.61      0.53      4500
weighted avg       0.56      0.61      0.53      4500



### **Classification with a self-built neural network-based sentiment analyzer**

In [24]:
# transforming labels since the loss function won't work for negative labels
def transform_labels(label):
    """encode [-1, 0, 1] to [0, 1, 2]"""
    if label == -1:
        return 0  # negative
    elif label == 0:
        return 1  # neutral
    elif label == 1:
        return 2  # positive
    else:
        raise ValueError('unknown label value')

transformed_labels = df['rating_cat'].apply(transform_labels)
transformed_labels.unique()

array([0, 1, 2])
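
The same encoding can be written as a lookup table, which also gives the inverse mapping for turning model predictions back into -1, 0 and 1. This is a sketch of an alternative, not the notebook's code. The names `LABEL_TO_ID`, `ID_TO_LABEL`, `encode` and `decode` are hypothetical.

```python
# Minimal sketch: label encoding and decoding with lookup tables.
LABEL_TO_ID = {-1: 0, 0: 1, 1: 2}                      # negative, neutral, positive
ID_TO_LABEL = {v: k for k, v in LABEL_TO_ID.items()}   # inverse mapping

def encode(label):
    """Map a sentiment class in {-1, 0, 1} to a non-negative class id."""
    if label not in LABEL_TO_ID:
        raise ValueError(f"unknown label value: {label!r}")
    return LABEL_TO_ID[label]

def decode(class_id):
    """Map a class id in {0, 1, 2} back to the sentiment class."""
    return ID_TO_LABEL[class_id]

encoded = [encode(label) for label in (-1, 0, 1)]
print(encoded)
print([decode(i) for i in encoded])
```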

Splitting into training, validation and test sets

In [26]:
# creating a training, validation and test set
X = list(df['reviews'])
y = list(transformed_labels)

X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, stratify=y)

X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, stratify=y_train_full)

train_df = pd.DataFrame({
    'text': X_train,
    'labels': y_train,})

val_df = pd.DataFrame({
    'text': X_val,
    'labels': y_val,})

test_df = pd.DataFrame({
    'text': X_test,
    'labels': y_test,})

print(train_df.shape)
print(val_df.shape)
print(test_df.shape)

(2531, 2)
(844, 2)
(1125, 2)
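
Neither call to `train_test_split` sets `test_size`, so scikit-learn's default of 0.25 applies twice. The printed shapes follow from rounding the held-out share up to a whole number of rows. The sketch below reproduces that arithmetic; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
# Minimal sketch: reproducing the split sizes from the default test_size of 0.25.
import math

def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test), with the test share rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_train_full, n_test = split_sizes(4500)        # first split: train+val vs. test
n_train, n_val = split_sizes(n_train_full)      # second split: train vs. val
print(n_train, n_val, n_test)
```

Since no `random_state` is passed, the rows that land in each set differ from run to run; passing a fixed integer to both calls would make the split reproducible.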


In [28]:
# saving training, validation and test corpora
# converting datasets to .tsv format used by the Hugging Face library
with open("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/train.tsv", "w") as outfile:
  outfile.write(train_df.to_csv(index=False, sep="\t"))

with open("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/val.tsv", "w") as outfile:
  outfile.write(val_df.to_csv(index=False, sep="\t"))

with open("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/test.tsv", "w") as outfile:
  outfile.write(test_df.to_csv(index=False, sep="\t"))

Model configuration & training process

In [None]:
# creating the model
# using simpletransformers ClassificationModel (built on top of the Transformers library by Hugging Face)
# the ClassificationModel class is used for all text classification tasks except for multi label classification
model = ClassificationModel(
    "distilbert",
    "distilbert-base-uncased",
    use_cuda=True,
    num_labels=3,
    args={
        "use_cuda": True, 
        "max_seq_length": 128,
        "num_train_epochs": 10,
        "output_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/outputs/",
        "best_model_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/outputs/best_model/",
        "evaluate_during_training": True,
        "train_batch_size": 20,
        "eval_batch_size": 20
      })

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classi

In [None]:
# training the model
model.train_model(train_df=train_df, eval_df=val_df)

(1270,
 {'eval_loss': [0.8187382013298744,
   0.8069924282473188,
   0.974675869525865,
   1.283232920391615,
   1.4819503398828728,
   1.7624836979910385,
   1.7129898701989374,
   1.792810367983441,
   1.890520955934081,
   1.843089029539463],
  'global_step': [127, 254, 381, 508, 635, 762, 889, 1016, 1143, 1270],
  'mcc': [0.46277706015564046,
   0.4495763039434509,
   0.45022894513005685,
   0.4408810726388039,
   0.4635245173672817,
   0.4120472984697979,
   0.4799155016416804,
   0.4919371152638494,
   0.46823036119600214,
   0.48333486003765963],
  'train_loss': [0.7175840139389038,
   0.648752748966217,
   0.3928728997707367,
   0.1962171494960785,
   0.009276074357330799,
   0.15121930837631226,
   0.02738451212644577,
   0.016408884897828102,
   0.0011780043132603168,
   0.001493785879574716]})
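
In the history above, the training loss falls towards zero while the validation loss rises after the second epoch, a pattern that is usually read as overfitting. The sketch below copies the `eval_loss` and `mcc` values (rounded to four decimals) and looks up the epoch at which each metric was best. It also checks that 127 steps per epoch follow from 2531 training rows and a batch size of 20.

```python
# Minimal sketch: locating the best epoch in the training history.
import math

# values copied from the output above, rounded to four decimals
eval_loss = [0.8187, 0.8070, 0.9747, 1.2832, 1.4820,
             1.7625, 1.7130, 1.7928, 1.8905, 1.8431]
mcc = [0.4628, 0.4496, 0.4502, 0.4409, 0.4635,
       0.4120, 0.4799, 0.4919, 0.4682, 0.4833]
global_step = [127, 254, 381, 508, 635, 762, 889, 1016, 1143, 1270]

# one optimizer step per batch: 2531 rows in batches of 20
steps_per_epoch = math.ceil(2531 / 20)

# index of the lowest validation loss and of the highest MCC
best_loss_epoch = min(range(len(eval_loss)), key=eval_loss.__getitem__)
best_mcc_epoch = max(range(len(mcc)), key=mcc.__getitem__)

print(steps_per_epoch)
print(best_loss_epoch, global_step[best_loss_epoch])
print(best_mcc_epoch, global_step[best_mcc_epoch])
```

The two criteria point to different epochs, so the checkpoint considered "best" depends on which metric is monitored. Training for fewer epochs or stopping early may be worth trying.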

Evaluating the sentiment analyzer

In [None]:
# evaluating the model
result, model_outputs, wrong_predictions = model.eval_model(eval_df=val_df, acc=accuracy_score)
print(result)

{'mcc': 0.48333486003765963, 'acc': 0.6552132701421801, 'eval_loss': 1.843089029539463}


In [None]:
y_pred = [list(e[0]) for e in model_outputs]
y_pred = [e.index(max(e)) for e in y_pred]
y_true = y_val
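
The two list comprehensions above pick, for each review, the index of the largest raw model output. Because the labels were encoded as 0, 1 and 2, that index is the predicted class. The sketch below shows the same step on made-up scores; `argmax` and `toy_outputs` are illustrative.

```python
# Minimal sketch: turning per-class scores into class predictions.

def argmax(scores):
    """Return the index of the largest value in a list of scores."""
    return max(range(len(scores)), key=scores.__getitem__)

# toy raw outputs: one row per review, one column per class (0, 1, 2)
toy_outputs = [
    [2.1, -0.3, -1.5],
    [-0.8, 0.4, 0.2],
    [-1.2, 0.1, 1.9],
]
toy_pred = [argmax(row) for row in toy_outputs]
print(toy_pred)
```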

a. Accuracy

In [None]:
# accuracy: share of reviews that are categorized correctly as positive, negative or neutral
# using scikit-learn metric
accuracy = accuracy_score(y_true, y_pred)*100
print(f'The model got {accuracy:.2f}% of the predictions right.')

The model got 65.52% of the predictions right.


b. Classification report

In [None]:
# classification report: text report showing the main classification metrics
# using scikit-learn metric
cr = classification_report(y_true, y_pred)
print(cr)

              precision    recall  f1-score   support

           0       0.71      0.65      0.68       281
           1       0.55      0.60      0.57       282
           2       0.72      0.71      0.71       281

    accuracy                           0.66       844
   macro avg       0.66      0.66      0.66       844
weighted avg       0.66      0.66      0.66       844



### **Fine-tuning the model**

#### **Testing different hyperparameters**

Model configuration & training process

In [None]:
# increasing max_seq_length from 128 to 512
model_512 = ClassificationModel(
    "distilbert",
    "distilbert-base-uncased",
    use_cuda=True,
    num_labels=3,
    args={
        "use_cuda": True, 
        "max_seq_length": 512,
        "num_train_epochs": 10,
        "output_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/outputs_m512/",
        "best_model_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/outputs_m512/best_model_m512/",
        "evaluate_during_training": True,
        "train_batch_size": 20,
        "eval_batch_size": 20
      })

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classi

In [None]:
# training the adjusted model
model_512.train_model(train_df=train_df, eval_df=val_df)

(1270,
 {'eval_loss': [0.7927087822625803,
   0.785042331662289,
   0.8846903408682624,
   1.248641490243202,
   1.397418200969696,
   1.613371426282927,
   1.720530066379281,
   1.8409901427668194,
   1.84936977264493,
   1.8861425477404927],
  'global_step': [127, 254, 381, 508, 635, 762, 889, 1016, 1143, 1270],
  'mcc': [0.46657591536966897,
   0.4756841078054947,
   0.5134017223791005,
   0.4510083493916854,
   0.4929888734522363,
   0.47249388188498687,
   0.461115535929252,
   0.47013265605294136,
   0.47813622457415766,
   0.47699474883233006],
  'train_loss': [1.0378761291503906,
   0.5310162305831909,
   0.48766037821769714,
   0.1333058923482895,
   0.1244797632098198,
   0.15841470658779144,
   0.003794949734583497,
   0.0017462145769968629,
   0.0012575556756928563,
   0.0015826773596927524]})

Evaluating the sentiment analyzer

In [None]:
# evaluating the adjusted model
result_512, model_outputs_512, wrong_predictions_512 = model_512.eval_model(eval_df=val_df, acc=accuracy_score)
print(result_512)

{'mcc': 0.47699474883233006, 'acc': 0.6504739336492891, 'eval_loss': 1.8861425477404927}


In [None]:
# taking the index of the highest score as the predicted class
scores_512 = [list(e[0]) for e in model_outputs_512]
y_pred_512 = [e.index(max(e)) for e in scores_512]
y_true = y_val

a. Accuracy

In [None]:
# accuracy: share of reviews that are categorized correctly as positive, negative or neutral
# using scikit-learn metric
accuracy_512 = accuracy_score(y_true, y_pred_512)*100
print(f'The finetuned model got {accuracy_512:.2f}% of the predictions right.')

The finetuned model got 65.05% of the predictions right.


b. Classification report

In [None]:
# classification report: text report showing the main classification metrics
# using scikit-learn metric
cr_512 = classification_report(y_true, y_pred_512)
print(cr_512)

              precision    recall  f1-score   support

           0       0.70      0.65      0.68       281
           1       0.52      0.59      0.55       282
           2       0.77      0.71      0.74       281

    accuracy                           0.65       844
   macro avg       0.66      0.65      0.65       844
weighted avg       0.66      0.65      0.65       844



#### **Training a language model based on distilbert and using it for classification** 

Preprocessing and data preparation

In [37]:
# preprocessing the data 
df_train = pd.read_csv("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/train.tsv", sep="\t")
reviews_train = list(df_train['text'])
reviews_train = [str(r).lower().strip() for r in reviews_train]
ratings_train = df_train['labels']

df_val = pd.read_csv("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/val.tsv", sep="\t")
reviews_val = list(df_val['text'])
reviews_val = [str(r).lower().strip() for r in reviews_val]
ratings_val = df_val['labels']

df_test = pd.read_csv("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/test.tsv", sep="\t")
reviews_test = list(df_test['text'])
reviews_test = [str(r).lower().strip() for r in reviews_test]
ratings_test = df_test['labels']

# creating the .txt files as input for the language model training process
# data should be in a text file with one text sample per row
train = "\n".join(reviews_train)
val = "\n".join(reviews_val)
test = "\n".join(reviews_test)

with open("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/train.txt", "w") as outfile:
    outfile.write(train)

with open("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/val.txt", "w") as outfile:
    outfile.write(val)

with open("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/test.txt", "w") as outfile:
    outfile.write(test)
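
Because samples are separated by newlines, a review that itself contains line breaks would occupy several rows in the file and be read as several samples. The sketch below shows the effect on two made-up reviews and one possible way to keep each review on a single row. Whether this is desirable depends on the use case, and the notebook does not apply this step.

```python
# Minimal sketch: keeping one review per row when reviews contain line breaks.
reviews = [
    "great game.\nwould buy again.",   # toy review with an internal line break
    "code didn't work.",
]

# joining as-is: the internal line break creates an extra row
naive = "\n".join(reviews)

# collapsing all whitespace runs (including line breaks) to single spaces first
flattened = "\n".join(" ".join(r.split()) for r in reviews)

print(len(naive.split("\n")))
print(len(flattened.split("\n")))
```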

Model configuration & training process for pre-training a language model

In [38]:
# first we continue training the pre-trained DistilBERT language model on our review corpus
# (the weights of distilbert-base-uncased are loaded, so the model is not trained from scratch)
# using simpletransformers LanguageModelingModel (built on top of the Transformers library by Hugging Face)
train_args = {
    "output_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/language_model/",
    "best_model_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/language_model/best_model/",
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "num_train_epochs": 10,
    "evaluate_during_training": True,
}

model_lm = LanguageModelingModel('distilbert', 'distilbert-base-uncased',
                              use_cuda=True,
                              args=train_args)

In [39]:
# training the model
model_lm.train_model("/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/train.txt",
                  eval_file="/content/drive/MyDrive/Sentiment Analysis for Marketing/data/processed/val.txt")

Token indices sequence length is longer than the specified maximum sequence length for this model (532 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1221 > 512). Running this sequence through the model will result in indexing errors


(4570,
 {'eval_loss': [2.627924693391678,
   2.584329238174655,
   2.514475853730601,
   2.5188781733208514,
   2.5110436001567975,
   2.4593157531521843,
   2.4628333766409694,
   2.439303026131704,
   2.4497628744612348,
   2.417667842080407,
   2.433594603910514,
   2.425932097096815],
  'global_step': [457,
   914,
   1371,
   1828,
   2000,
   2285,
   2742,
   3199,
   3656,
   4000,
   4113,
   4570],
  'perplexity': [tensor(13.8450),
   tensor(13.2544),
   tensor(12.3601),
   tensor(12.4147),
   tensor(12.3178),
   tensor(11.6968),
   tensor(11.7380),
   tensor(11.4650),
   tensor(11.5856),
   tensor(11.2197),
   tensor(11.3998),
   tensor(11.3128)],
  'train_loss': [2.056353807449341,
   3.2569618225097656,
   2.5133960247039795,
   2.7291083335876465,
   2.32588267326355,
   2.815772294998169,
   1.9189523458480835,
   2.23355770111084,
   2.2739369869232178,
   2.557093381881714,
   2.7234623432159424,
   2.092848062515259]})
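
The reported perplexity is the exponential of the evaluation loss, so the two lists carry the same information on different scales. The sketch below checks this for the first and last entries of the history above.

```python
# Minimal sketch: perplexity as the exponential of the evaluation loss.
import math

eval_loss_first = 2.627924693391678   # first eval_loss entry above
eval_loss_last = 2.425932097096815    # last eval_loss entry above

ppl_first = math.exp(eval_loss_first)
ppl_last = math.exp(eval_loss_last)
print(f"{ppl_first:.4f} -> {ppl_last:.4f}")
```

The history has twelve entries rather than ten because the model was also evaluated at steps 2000 and 4000, in addition to the end of each epoch.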

Model configuration & training process for using the pre-trained model as classifier

In [42]:
# now we will fine-tune the language model on our classification task
# as shown by Jeremy Howard and Sebastian Ruder, fine-tuning the language model on the target corpus first can improve classification performance (https://arxiv.org/abs/1801.06146)
# using simpletransformers ClassificationModel (built on top of the Transformers library by Hugging Face)
model2 = ClassificationModel(
    model_type="distilbert",
    model_name="/content/drive/MyDrive/Sentiment Analysis for Marketing/language_model/best_model/",
    use_cuda=True,
    num_labels=3,
    args={
        "output_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/outputs_2/",
        "best_model_dir": "/content/drive/MyDrive/Sentiment Analysis for Marketing/outputs_2/best_model/",
        "evaluate_during_training": True,
        "reprocess_input_data": True,
        "sliding_window": True,
        "overwrite_output_dir": True,
        "max_seq_length": 512,
        "num_train_epochs": 10,
        "train_batch_size": 20,
        "eval_batch_size": 20,
    },
)

Some weights of the model checkpoint at /content/drive/MyDrive/Sentiment Analysis for Marketing/language_model/best_model/ were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at /content/drive/MyDrive/Sentiment Analysis for Ma
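
The configuration above sets `"sliding_window": True` together with `"max_seq_length": 512`, so reviews longer than 512 tokens are meant to be split into overlapping windows rather than cut off. The sketch below is a plain-Python illustration of that general idea only; it is not the simpletransformers implementation, and the function name `sliding_windows` and the `window_size` and `stride` values are assumptions chosen for the example.

```python
def sliding_windows(tokens, window_size, stride):
    """Split a token list into overlapping windows (illustrative helper, not library code)."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    # short sequences fit into a single window
    if len(tokens) <= window_size:
        return [list(tokens)]
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(list(tokens[start:start + window_size]))
        # stop once the current window reaches the end of the sequence
        if start + window_size >= len(tokens):
            break
    return windows


# hypothetical example: 10 token ids, windows of 4 tokens, moving 3 tokens at a time
example_windows = sliding_windows(list(range(10)), window_size=4, stride=3)
for w in example_windows:
    print(w)
```

Each window is classified separately, which is why a single long review can yield several rows of model outputs.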

In [43]:
# training the model
model2.train_model(train_df=train_df, eval_df=val_df)

  0%|          | 0/2531 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (551 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (544 > 512). Running this sequence through the model will result in indexing errors


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Running Epoch 0 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 1 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 2 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 3 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 4 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 5 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 6 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 7 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 8 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


Running Epoch 9 of 10:   0%|          | 0/147 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1619 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1028 > 512). Running this sequence through the model will result in indexing errors


(1470,
 {'eval_loss': [0.8030740413814783,
   0.7495734778543314,
   0.831772742792964,
   0.8591354495535294,
   1.184527472903331,
   1.4720028191804886,
   1.5629313482592504,
   1.6952489148825407,
   1.8059410359710455,
   1.770267903804779],
  'global_step': [147, 294, 441, 588, 735, 882, 1029, 1176, 1323, 1470],
  'mcc': [0.4469644139708448,
   0.537307291422588,
   0.5322388474858469,
   0.5734920644716154,
   0.5380680042220793,
   0.5321401931190638,
   0.5494161178768008,
   0.5347379056709621,
   0.5375249772060444,
   0.5459834550200917],
  'train_loss': [0.9342685341835022,
   0.5654231905937195,
   0.3398427665233612,
   0.03508536145091057,
   0.033284228295087814,
   0.39220112562179565,
   0.003957610111683607,
   0.007237046025693417,
   0.00116606371011585,
   0.025926291942596436]})
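
In the training history above, the evaluation loss is lowest after the second epoch and rises afterwards while the training loss falls toward zero, a pattern consistent with overfitting. A minimal sketch for locating the best checkpoint in such a history is shown below; the helper `best_epoch` is hypothetical, and the values are the ones printed above, rounded to four decimals.

```python
def best_epoch(history, metric, higher_is_better):
    """Return (epoch index, global step, value) of the best entry for `metric`."""
    values = history[metric]
    pick = max if higher_is_better else min
    idx = pick(range(len(values)), key=values.__getitem__)
    return idx, history["global_step"][idx], values[idx]


# values copied from the training output above, rounded to 4 decimals
history = {
    "global_step": [147, 294, 441, 588, 735, 882, 1029, 1176, 1323, 1470],
    "eval_loss": [0.8031, 0.7496, 0.8318, 0.8591, 1.1845,
                  1.4720, 1.5629, 1.6952, 1.8059, 1.7703],
    "mcc": [0.4470, 0.5373, 0.5322, 0.5735, 0.5381,
            0.5321, 0.5494, 0.5347, 0.5375, 0.5460],
}

print(best_epoch(history, "eval_loss", higher_is_better=False))
print(best_epoch(history, "mcc", higher_is_better=True))
```

The two criteria disagree here (lowest loss at step 294, highest MCC at step 588), so the choice of metric determines which checkpoint counts as best.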

Evaluating the finetuned sentiment analyzer

In [52]:
# evaluating the fine-tuned model on the test set
result_2, model_outputs_2, wrong_predictions_2 = model2.eval_model(eval_df=test_df, acc=accuracy_score)
print(result_2)  # print the metrics of this model (result_2), not those of the earlier model

  0%|          | 0/1125 [00:00<?, ?it/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (1218 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (661 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1456 > 512). Running this sequence through the model will result in indexing errors


Running Evaluation:   0%|          | 0/64 [00:00<?, ?it/s]

{'mcc': 0.4756060105288119, 'acc': 0.648, 'eval_loss': 2.0994366365484893}


In [66]:
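# with sliding_window=True each entry of model_outputs_2 is expected to hold one row of logits per window;
# e[0] keeps only the first window, and the index of the largest logit is the predicted label
y_pred_2 = [list(e[0]) for e in model_outputs_2]
y_pred_2 = [e.index(max(e)) for e in y_pred_2]
y_true = y_test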
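
The cell above reads the prediction from the first window of each review. An alternative, sketched here under the assumption that each entry of `model_outputs_2` is a sequence of per-window logit rows, is to average the logits over all windows before taking the largest one. The helper `predict_from_windows` and the toy data are hypothetical and only illustrate the difference; this is not how the notebook's reported numbers were produced.

```python
def predict_from_windows(window_logits):
    """Average the logits over all windows and return the index of the largest mean."""
    n_windows = len(window_logits)
    n_classes = len(window_logits[0])
    means = [sum(row[c] for row in window_logits) / n_windows for c in range(n_classes)]
    return means.index(max(means))


# toy stand-in for model_outputs_2: one review with one window, one with two windows
toy_outputs = [
    [[2.0, 0.1, -1.0]],
    [[0.2, 0.1, 0.0], [-1.0, 0.0, 3.0]],
]

# same logic as the notebook cell: use the first window only
first_window = [list(e[0]).index(max(e[0])) for e in toy_outputs]
# alternative: use every window
averaged = [predict_from_windows(e) for e in toy_outputs]

print(first_window, averaged)
```

For the second toy review the first window favors class 0 while the average over both windows favors class 2, which shows how the two strategies can disagree on long reviews.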

a. Accuracy

In [67]:
# accuracy: share of reviews that are correctly categorized as positive, negative or neutral
# using scikit-learn metric
accuracy_2 = accuracy_score(y_true, y_pred_2)*100
print(f'The model got {accuracy_2.round(2)}% of the predictions right.')

The model got 65.51% of the predictions right.


b. Classification report

In [68]:
# classification report: text report showing the main classification metrics
# using scikit-learn metric
cr_2 = classification_report(y_true, y_pred_2)
print(cr_2)

              precision    recall  f1-score   support

           0       0.72      0.67      0.69       375
           1       0.52      0.62      0.57       375
           2       0.76      0.68      0.72       375

    accuracy                           0.66      1125
   macro avg       0.67      0.66      0.66      1125
weighted avg       0.67      0.66      0.66      1125
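
The summary rows of the report can be checked by hand: the F1 score is the harmonic mean of precision and recall, and the macro average is the unweighted mean over the three classes. Because every class has the same support (375), the weighted average coincides with the macro average. The short sketch below recomputes these figures from the rounded values printed above; the helper names are hypothetical, and small deviations from the report are possible because the inputs are already rounded.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# (precision, recall) per class, copied from the report above
report = {0: (0.72, 0.67), 1: (0.52, 0.62), 2: (0.76, 0.68)}

f1_per_class = {label: round(f1_score_from(p, r), 2) for label, (p, r) in report.items()}
macro_precision = round(macro_average([p for p, _ in report.values()]), 2)
macro_recall = round(macro_average([r for _, r in report.values()]), 2)

print(f1_per_class, macro_precision, macro_recall)
```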

