Do you plan to test it on Colab? Make sure you change your settings to: Runtime->Change runtime type->Hardware accelerator->GPU
We import the necessary libraries and load our hate speech dataset. The preprocessing script is available in the GitHub repo.
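
Before going further, it can help to confirm that the GPU runtime is actually active. The following is a minimal sketch, assuming TensorFlow 2.x is installed (as it usually is on Colab); `available_gpus` is a hypothetical helper name, not part of this notebook.

```python
def available_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf  # assumed to be TensorFlow 2.x
    except ImportError:
        return []  # TensorFlow is not installed in this environment
    return [device.name for device in tf.config.list_physical_devices('GPU')]

gpus = available_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none (training would run on CPU)")
```

If the list is empty on Colab, the runtime type is most likely still set to CPU.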

In [None]:
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
#####################################################################
#                           Set D                                   #
#####################################################################
#                    (distil)BERT Tests                             #
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
First, we are going to install the Transformers library.

In [None]:
# First we are going to install the transformers library by Hugging Face!
! pip install transformers

In [None]:
# We save the requirements of this specific notebook to a file (they differ from those of the other notebooks)
! pip freeze > colab_requirements.txt

In [None]:
#Then, we will clone the Ethos repo to extract the preprocessing pipeline and data
!git clone https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset/
!mv '/content/Ethos-Hate-Speech-Dataset/ethos/experiments' '/content/experiments'
!mv '/content/Ethos-Hate-Speech-Dataset/ethos/ethos_data' '/ethos_data'

Cloning into 'Ethos-Hate-Speech-Dataset'...
remote: Enumerating objects: 184, done.
remote: Counting objects: 100% (184/184), done.
remote: Compressing objects: 100% (137/137), done.
remote: Total 184 (delta 101), reused 107 (delta 46), pack-reused 0
Receiving objects: 100% (184/184), 12.63 MiB | 17.22 MiB/s, done.
Resolving deltas: 100% (101/101), done.


In [None]:
#We now import our data
from experiments.utilities.preprocess import Preproccesor
X, y = Preproccesor.load_data(True, False)
class_names = ['noHateSpeech', 'hateSpeech']

from sklearn.model_selection import train_test_split
train_texts, test_texts, train_labels, test_labels = train_test_split(list(X), y, stratify=y, test_size=.2)

In [None]:
print("Total amount:",len(y))
print("Hate speech:",sum(y))
print("Non Hate speech:",len(y)-sum(y))

Total amount: 998
Hate speech: 433
Non Hate speech: 565


We now have a train and a test dataset, but let's also create a validation set which we can use for evaluation
and tuning without tainting our test set results. Sklearn has a convenient utility for creating such splits:

In [None]:
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)
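
To see how much data each split ends up with, the arithmetic can be sketched as follows. This assumes scikit-learn's rounding for a fractional `test_size` (the held-out part is rounded up); `split_sizes` is a hypothetical helper, and the total of 998 is the count printed above.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_kept, n_held_out) for a fractional test_size."""
    n_held_out = math.ceil(test_size * n_samples)  # assumption: held-out part rounds up
    return n_samples - n_held_out, n_held_out

n_total = 998
n_train_full, n_test = split_sizes(n_total, 0.2)   # first split: train vs. test
n_train, n_val = split_sizes(n_train_full, 0.2)    # second split: train vs. validation
print("train:", n_train, "validation:", n_val, "test:", n_test)
```

Under that assumption the two successive 80/20 splits leave roughly 64%, 16% and 20% of the data for training, validation and testing.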

# Running and Fine-tuning (Distil)Bert

Alright, we've read in our dataset. Now let's tackle tokenization. We'll eventually train a classifier using
pre-trained (Distil)Bert, so let's use the (Distil)Bert tokenizer.

Uncomment the correct lines to choose between DistilBert and Bert.

We will evaluate the model with 10-fold cross-validation (CV).
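
Stratified k-fold keeps the class ratio roughly the same in every fold. The sketch below illustrates the idea in plain Python by dealing the indices of each class round-robin into the folds. It is not scikit-learn's `StratifiedKFold` implementation (which can also shuffle), and `stratified_fold_ids` is a hypothetical helper. The class counts (433 hate speech, 565 non-hate speech) are the ones printed earlier.

```python
from collections import Counter

def stratified_fold_ids(labels, n_splits):
    """Assign each sample a fold id so that every class is spread evenly."""
    fold_ids = [None] * len(labels)
    seen_per_class = Counter()
    for idx, label in enumerate(labels):
        # The k-th sample of a class goes to fold k modulo n_splits.
        fold_ids[idx] = seen_per_class[label] % n_splits
        seen_per_class[label] += 1
    return fold_ids

labels = [1] * 433 + [0] * 565
fold_ids = stratified_fold_ids(labels, n_splits=10)

for fold in range(10):
    members = [labels[i] for i, f in enumerate(fold_ids) if f == fold]
    print("fold", fold, "size", len(members), "hate speech", sum(members))
```

Each validation fold therefore holds about 100 texts, of which 43 or 44 are hate speech, mirroring the overall class ratio.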

In [None]:
import tensorflow as tf
import pandas as pd
import numpy as np
import time
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, KFold

folds = StratifiedKFold(n_splits= 10, shuffle=True, random_state=7)

def specificity(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    if (tn+fp) > 0:
        speci = tn/(tn+fp)
        return speci
    return 0

def sensitivity(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    if (tp+fn) > 0:
        sensi = tp/(tp+fn)
        return sensi
    return 0

from transformers import BertTokenizerFast
from transformers import TFBertForSequenceClassification, TFTrainer, TFTrainingArguments
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

### !!! Uncomment these lines for distilbert, and comment the above three lines ###
#from transformers import DistilBertTokenizerFast 
#from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments 
#tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=5,              # total number of training epochs
    per_device_train_batch_size=4,   # batch size per device during training
    per_device_eval_batch_size=4,    # batch size for evaluation
    warmup_steps=100,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

scores = {} #This will help us collect our statistics!
scores.setdefault('fit_time', [])
scores.setdefault('score_time', [])
scores.setdefault('test_F1', [])
scores.setdefault('test_Precision', [])
scores.setdefault('test_Recall', [])
scores.setdefault('test_Accuracy', [])
scores.setdefault('test_Specificity', [])
scores.setdefault('test_Sensitivity', [])

[]
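
As a quick sanity check of the two helper metrics, here is a small worked example in plain Python. The labels are made up for illustration (they are not drawn from the dataset), and `binary_counts` is a hypothetical helper that returns the counts in the same order as `confusion_matrix(...).ravel()` does for binary labels: `tn, fp, fn, tp`.

```python
def binary_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

# Hypothetical labels: 4 non-hate (0) and 6 hate (1) examples.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

tn, fp, fn, tp = binary_counts(y_true_demo, y_pred_demo)
spec_value = tn / (tn + fp)  # share of non-hate texts correctly left alone
sens_value = tp / (tp + fn)  # share of hate texts correctly flagged
print("specificity = %.4f, sensitivity = %.4f" % (spec_value, sens_value))
```

Specificity only looks at the negative class and sensitivity only at the positive class, so reporting both shows whether a model favours one class over the other.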

Now we actually run the 10-fold CV.

In [None]:
for fold_n, (train_index, valid_index) in enumerate(folds.split(X, y)):

    with training_args.strategy.scope():
        model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")

    ### !!! Uncomment these lines for distilbert, and comment the above two lines ###
    #with training_args.strategy.scope():  
    #    model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

    print('Fold', fold_n, 'started at', time.ctime())
    X_train, X_valid = X[train_index], X[valid_index]
    y_train, y_valid = y[train_index], y[valid_index]

    train_encodings = tokenizer(list(X_train), truncation=True, padding=True)
    val_encodings = tokenizer(list(X_valid), truncation=True, padding=True)
    
    train_dataset = tf.data.Dataset.from_tensor_slices((
        dict(train_encodings),
        y_train
    ))
    val_dataset = tf.data.Dataset.from_tensor_slices((
        dict(val_encodings),
        y_valid
    ))

    trainer = TFTrainer(
                model=model,                         # the instantiated 🤗 Transformers model to be trained
                args=training_args,                  # training arguments, defined above
                train_dataset=train_dataset,         # training dataset
                eval_dataset=val_dataset             # evaluation dataset
    )
    trainer.train()

    # Take the highest-scoring class for every validation example.
    y_preds = [np.argmax(logits) for logits in trainer.predict(val_dataset).predictions]

    scores['test_F1'].append(f1_score(y_valid, y_preds, average='macro'))
    scores['test_Precision'].append(
        precision_score(y_valid, y_preds, average='macro'))
    scores['test_Recall'].append(
        recall_score(y_valid, y_preds, average='macro'))
    scores['test_Accuracy'].append(accuracy_score(y_valid, y_preds))
    scores['test_Specificity'].append(specificity(y_valid, y_preds))
    scores['test_Sensitivity'].append(sensitivity(y_valid, y_preds))
    # Print the running average of every metric over the folds completed so far.
    # Note: `name` is only a display label; set it to match the model selected above.
    name = 'DistilBert'
    metric_keys = ['test_F1', 'test_Precision', 'test_Recall',
                   'test_Accuracy', 'test_Specificity', 'test_Sensitivity']
    running_means = ['%.4f' % (sum(scores[key]) / (fold_n + 1)) for key in metric_keys]
    print("{:<10} | {:<7} {:<7} {:<7} {:<7} {:<7} {:<7}".format(name[:7], *running_means))
    !rm -r '/content/results'
    !rm -r '/content/logs'

name = 'Bert'

#name = 'DistilBert'

# Final averages over all 10 folds: F1, precision, recall, accuracy, specificity, sensitivity.
metric_keys = ['test_F1', 'test_Precision', 'test_Recall',
               'test_Accuracy', 'test_Specificity', 'test_Sensitivity']
final_means = ['%.4f' % (sum(scores[key]) / 10) for key in metric_keys]
print("{:<10} | {:<7} {:<7} {:<7} {:<7} {:<7} {:<7}".format(name[:7], *final_means))

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Fold 0 started at Tue Mar  9 13:47:49 2021
DistilB    | 0.8084  0.8077  0.8133  0.8100  0.7895  0.8372 


Fold 1 started at Tue Mar  9 14:06:59 2021
DistilB    | 0.7883  0.7877  0.7929  0.7900  0.7719  0.8140 


Fold 2 started at Tue Mar  9 14:25:36 2021
DistilB    | 0.8145  0.8145  0.8173  0.8167  0.8129  0.8217 


Fold 3 started at Tue Mar  9 14:44:10 2021
DistilB    | 0.8041  0.8054  0.8054  0.8075  0.8202  0.7907 


Fold 4 started at Tue Mar  9 15:02:49 2021
DistilB    | 0.8070  0.8081  0.8093  0.8100  0.8140  0.8047 


Fold 5 started at Tue Mar  9 15:21:29 2021
DistilB    | 0.7954  0.7962  0.7976  0.7983  0.8034  0.7918 


Fold 6 started at Tue Mar  9 15:40:09 2021
DistilB    | 0.7937  0.7954  0.7952  0.7971  0.8110  0.7793 


Fold 7 started at Tue Mar  9 15:58:46 2021
DistilB    | 0.7963  0.7990  0.7970  0.8000  0.8213  0.7728 


Fold 8 started at Tue Mar  9 16:09:13 2021
DistilB    | 0.7998  0.8027  0.8015  0.8031  0.8153  0.7877 


Fold 9 started at Tue Mar  9 16:27:51 2021
DistilB    | 0.7960  0.7989  0.7973  0.7996  0.8159  0.7787 
DistilB    | 0.7960  0.7989  0.7973  0.7996  0.8159  0.7787 
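
Each row printed during the loop is a running average over the folds finished so far, which is why the row after the last fold and the final summary row coincide. A minimal sketch of that bookkeeping, using made-up per-fold F1 values (hypothetical, not the results above):

```python
# Hypothetical per-fold F1 scores, for illustration only.
fold_f1 = [0.80, 0.76, 0.86, 0.78]

running_means = []
for n_done in range(1, len(fold_f1) + 1):
    # Average over the first n_done folds, as the loop above does with fold_n + 1.
    running_means.append(sum(fold_f1[:n_done]) / n_done)

for fold_n, mean in enumerate(running_means):
    print("after fold %d: running mean F1 = %.4f" % (fold_n, mean))

final_mean = sum(fold_f1) / len(fold_f1)
print("final mean F1 = %.4f" % final_mean)
```

Because the rows are cumulative, only the last one is the cross-validated estimate; the earlier rows show how that estimate settles as more folds are added.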
