# Fine-tuned BERT, RoBERTa, and XLNet for Sentiment Analysis
This notebook is organised into the following sections:
1. Load Datasets
  - View datasets
2. Helper Functions
3. BERT
  - BERT trained on the original train dataset
  - BERT trained on dataset with synonym replacement augmentation
  - BERT trained on dataset with back translation augmentation
4. RoBERTa
  - RoBERTa trained on the original train dataset
  - RoBERTa trained on dataset with synonym replacement augmentation
  - RoBERTa trained on dataset with back translation augmentation
5. XLNet
  - XLNet trained on the original train dataset
  - XLNet trained on dataset with synonym replacement augmentation
  - XLNet trained on dataset with back translation augmentation
6. Overall Results

In [None]:
!pip install -qq transformers
!pip install -qq simpletransformers

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dask-cudf 21.8.3 requires cupy-cuda114, which is not installed.
distributed 2021.7.1 requires dask==2021.07.1, but you have dask 2021.10.0 which is incompatible.
dask-cudf 21.8.3 requires dask<=2021.07.1,>=2021.6.0, but you have dask 2021.10.0 which is incompatible.
dask-cudf 21.8.3 requires pandas<1.3.0dev0,>=1.0, but you have pandas 1.3.4 which is incompatible.
allennlp 2.7.0 requires transformers<4.10,>=4.1, but you have transformers 4.12.5 which is incompatible.


In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
import logging
from statistics import mean, mode
from simpletransformers.classification import ClassificationArgs, ClassificationModel

## Load Datasets

In [None]:
# Every label is shifted up by 1 so that the values match labels_list = [0, 1, 2] used by the models below.

# Dataset: Train
train = pd.read_csv('../input/mlproject/original_train.csv')
train['sentiment'] = train['sentiment'] + 1

# Data Augmentation: Synonym Replacement
train_synonym = pd.read_csv('../input/mlproject/synonym_augment_train_v2.csv')
train_synonym.drop(columns=['Unnamed: 0'], inplace=True)
train_synonym['sentiment'] = train_synonym['sentiment'] + 1

# Data Augmentation: Back Translation
train_bt = pd.read_csv('../input/mlproject/bt_augment_train.csv')
train_bt.drop(columns=['Unnamed: 0', 'index'], inplace=True)
train_bt['sentiment'] = train_bt['sentiment'] + 1

# Validation set
validation = pd.read_csv('../input/mlproject/validation.csv')
validation.drop(columns=['Unnamed: 0', 'index', 'lemmatized and stopwords_removed'], inplace=True)
validation['sentiment'] = validation['sentiment'] + 1

# Test set
test = pd.read_csv('../input/mlproject/test.csv')
test.drop(columns=['Unnamed: 0', 'index', 'lemmatized and stopwords_removed'], inplace=True)
test['sentiment'] = test['sentiment'] + 1

In [None]:
print('Train:', train.shape)
print('Train_synonym:', train_synonym.shape)
print('Train_bt:', train_bt.shape)
print('Validation:', validation.shape)
print('Test', test.shape)

Train: (5360, 2)
Train_synonym: (8162, 2)
Train_bt: (8162, 2)
Validation: (670, 2)
Test (671, 2)


### View Datasets
View a random sample of five rows from each of the train, test, and validation sets.

In [None]:
train.sample(5)

Unnamed: 0,text,sentiment
2999,i just saw an autonomous car in lake charles i...,1
3414,there goes the google car,1
5044,cannot wait for self driving cars so i can rid...,2
5303,china to test driverless cars for miles,1
3161,google enters autonomous vehicle in nascar ser...,1


In [None]:
test.sample(5)

Unnamed: 0,text,sentiment
306,imagine your self driving car negotiating traf...,1
497,center breaks ground on facility to test drive...,1
473,automated vehicle conveyance apparatus transpo...,1
303,i have used apple maps therefore apple buildin...,0
611,and you do have the driverless steering less g...,1


In [None]:
validation.sample(5)

Unnamed: 0,text,sentiment
582,google car kaboom,1
525,google unveils a prototype of its new driverle...,1
125,dmv ponders how to regulate driverless cars ca...,1
228,look closely you will see the reflection of a ...,2
159,is it weird to be excited to see the google ca...,2
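
The training and evaluation logs later in this notebook warn that the DataFrame headers were not specified, so the library falls back to column positions. A minimal sketch of one way to avoid relying on column order, assuming Simple Transformers looks for columns named `text` and `labels` (as its documentation describes). The `sample` frame and the helper name `to_simpletransformers_format` are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the frames loaded above (same two columns).
sample = pd.DataFrame(
    {
        "text": ["there goes the google car", "google car kaboom"],
        "sentiment": [1, 1],
    }
)


def to_simpletransformers_format(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose columns are named `text` and `labels`."""
    # rename() returns a new frame, so the original is left untouched.
    return df.rename(columns={"sentiment": "labels"})[["text", "labels"]]


prepared = to_simpletransformers_format(sample)
print(list(prepared.columns))  # ['text', 'labels']
```

The same helper could be applied to `train`, `train_synonym`, `train_bt`, `validation`, and `test` before they are passed to the models.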


## Helper Functions

In [None]:
def f1_multiclass(labels, preds):
    return f1_score(labels, preds, average='weighted')
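
For reference, `average='weighted'` computes an F1 score per class and averages them weighted by the number of true examples in each class. The sketch below reproduces that calculation in plain Python on a made-up toy example; `weighted_f1`, `toy_labels`, and `toy_preds` are illustrative names, and the notebook itself keeps using scikit-learn.

```python
from collections import Counter


def weighted_f1(labels, preds):
    """Support-weighted mean of the per-class F1 scores."""
    support = Counter(labels)  # number of true examples per class
    total = len(labels)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        n_pred = sum(1 for p in preds if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += f1 * n_true / total
    return score


# Toy data, not taken from the datasets above.
toy_labels = [0, 1, 1, 1, 2, 2]
toy_preds = [0, 1, 1, 2, 2, 2]
toy_score = weighted_f1(toy_labels, toy_preds)
print(round(toy_score, 4))  # 0.8333
```

Here the per-class F1 scores are 1.0, 0.8, and 0.8 with supports 1, 3, and 2, giving (1.0 + 2.4 + 1.6) / 6 = 5/6.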

# BERT
Hyperparameter Tuning Sweep Visualisation: https://wandb.ai/datasiens/bert_original_train/sweeps/5v6557nj?workspace=user-

## BERT: Original Train

In [None]:
bert_model_args = ClassificationArgs()
bert_model_args.reprocess_input_data = True
bert_model_args.overwrite_output_dir = True
bert_model_args.manual_seed = 4
bert_model_args.use_multiprocessing = True
bert_model_args.train_batch_size = 32
bert_model_args.labels_list = [0, 1, 2]
bert_model_args.eval_batch_size = 16
# output_dir and best_model_dir are left commented out so that the models overwrite each other.
# This prevents the notebook from running out of memory, as each model is considerably large.
# bert_model_args.output_dir = "bert1_output"
# bert_model_args.best_model_dir = "bert1_output/best_model"

# Set starting learning rate and epoch from WandB
bert_model_args.learning_rate = 0.00013
bert_model_args.num_train_epochs = 2
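
The same argument block is repeated for every run in this notebook, with only the learning rate and epoch count changing per model. A small sketch of how the shared settings could be collected in one place, assuming `ClassificationModel` accepts a plain dict for `args` (the Simple Transformers documentation describes this, but it is not used in the original cells). `make_args` is a hypothetical helper name.

```python
def make_args(learning_rate, num_train_epochs, seed=4):
    """Build the settings shared by every run as a plain dict."""
    return {
        "reprocess_input_data": True,
        "overwrite_output_dir": True,
        "manual_seed": seed,
        "use_multiprocessing": True,
        "train_batch_size": 32,
        "eval_batch_size": 16,
        "labels_list": [0, 1, 2],
        # Only these two values differ between the models in this notebook.
        "learning_rate": learning_rate,
        "num_train_epochs": num_train_epochs,
    }


# Values taken from the argument cells in this notebook.
bert_args = make_args(0.00013, 2)
roberta_args = make_args(0.00001789, 4)
xlnet_args = make_args(0.00003635, 5)
print(bert_args["learning_rate"], bert_args["num_train_epochs"])
```

Each call returns a fresh dict, so changing one run's settings does not leak into another.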

In [None]:
# Create a ClassificationModel
bert_model = ClassificationModel(
    "bert",
    "bert-base-uncased",
    num_labels=3,
    use_cuda=True,
    args=bert_model_args,
)

# Train the model
bert_model.train_model(train)

Downloading:   0%|          | 0.00/570 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/420M [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Downloading:   0%|          | 0.00/226k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/455k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/5360 [00:00<?, ?it/s]

Epoch:   0%|          | 0/2 [00:00<?, ?it/s]

Running Epoch 0 of 2:   0%|          | 0/168 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 2:   0%|          | 0/168 [00:00<?, ?it/s]

(336, 0.6171567415197691)
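
The tuple returned by `train_model` appears to hold the total number of optimizer steps and the average training loss. The step count can be checked against the progress bars with a little arithmetic. This sketch assumes one optimizer step per batch (no gradient accumulation); `optimizer_steps` is an illustrative helper.

```python
import math


def optimizer_steps(num_examples, batch_size, num_epochs):
    """Return (steps per epoch, total steps) for a given dataset size."""
    # The last, partially filled batch still counts as one step.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * num_epochs


per_epoch, total = optimizer_steps(5360, 32, 2)
print(per_epoch, total)  # 168 336
```

With 5360 training rows and a batch size of 32, this gives 168 steps per epoch and 336 steps over 2 epochs, matching the output above.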

In [None]:
# Evaluate on test
result, model_outputs, wrong_predictions = bert_model.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('BERT on Test')
for key,value in result.items():
    print(key.upper()+':', value)

BERT on Test
MCC: 0.5093198870573498
F1: 0.7367014493663928
ACC: 0.7377049180327869
EVAL_LOSS: 0.6199495650473095


In [None]:
# Save results to dataframe
bert_df = pd.DataFrame(result, index=['Original'])
bert_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.50932,0.736701,0.737705,0.61995


In [None]:
# Free up RAM
del bert_model
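
Note that `del` only removes the name. The memory is returned once the object is garbage collected, and PyTorch may keep freed GPU memory in its own cache. A hedged sketch of a more thorough clean-up step follows; `release_memory` is a hypothetical helper, and the CUDA part only runs if PyTorch and a GPU are available.

```python
import gc


def release_memory():
    """Run garbage collection and, if possible, clear PyTorch's CUDA cache."""
    collected = gc.collect()  # number of unreachable objects found
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is no CUDA cache to clear.
        return collected
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected


freed = release_memory()
print(freed)
```

Calling this after each `del` may lower the peak memory use between runs, although how much is reclaimed depends on what else still references the model.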

## BERT: Synonym Replacement

In [None]:
bert_model_args = ClassificationArgs()
bert_model_args.reprocess_input_data = True
bert_model_args.overwrite_output_dir = True
bert_model_args.manual_seed = 4
bert_model_args.use_multiprocessing = True
bert_model_args.train_batch_size = 32
bert_model_args.labels_list = [0, 1, 2]
bert_model_args.eval_batch_size = 16
# bert_model_args.output_dir = "bert2_output"
# bert_model_args.best_model_dir = "bert2_output/best_model"

# Set starting learning rate and epoch from WandB
bert_model_args.learning_rate = 0.00013
bert_model_args.num_train_epochs = 2

In [None]:
# Create a ClassificationModel
bert_model2 = ClassificationModel(
    "bert",
    "bert-base-uncased",
    num_labels=3,
    use_cuda=True,
    args=bert_model_args,
)

# Train the model
bert_model2.train_model(train_synonym)

  0%|          | 0/8162 [00:00<?, ?it/s]

Epoch:   0%|          | 0/2 [00:00<?, ?it/s]

Running Epoch 0 of 2:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 1 of 2:   0%|          | 0/256 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


(512, 0.5200655364897102)

In [None]:
# Evaluate BERT Model 2 on test data
result2, model_outputs, wrong_predictions = bert_model2.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('BERT with Synonym Replacement on Test')
for key,value in result2.items():
    print(key.upper()+':', value)

BERT with Synonym Replacement on Test
MCC: 0.5110726752062041
F1: 0.74034307888134
ACC: 0.7466467958271237
EVAL_LOSS: 0.7639592673097338


In [None]:
row = pd.Series(result2, name='Synonym Replacement')
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
bert_df = pd.concat([bert_df, row.to_frame().T])
bert_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.50932,0.736701,0.737705,0.61995
Synonym Replacement,0.511073,0.740343,0.746647,0.763959


In [None]:
# Free up RAM
del bert_model2

## BERT: Back Translation

In [None]:
bert_model_args = ClassificationArgs()
bert_model_args.reprocess_input_data = True
bert_model_args.overwrite_output_dir = True
bert_model_args.manual_seed = 4
bert_model_args.use_multiprocessing = True
bert_model_args.train_batch_size = 32
bert_model_args.labels_list = [0, 1, 2]
bert_model_args.eval_batch_size = 16
# bert_model_args.output_dir = "bert3_output"
# bert_model_args.best_model_dir = "bert3_output/best_model"

# Set starting learning rate and epoch from WandB
bert_model_args.learning_rate = 0.00013
bert_model_args.num_train_epochs = 2

In [None]:
# Create a ClassificationModel
bert_model3 = ClassificationModel(
    "bert",
    "bert-base-uncased",
    num_labels=3,
    use_cuda=True,
    args=bert_model_args,
)

# Train the model
bert_model3.train_model(train_bt)

  0%|          | 0/8162 [00:00<?, ?it/s]

Epoch:   0%|          | 0/2 [00:00<?, ?it/s]

Running Epoch 0 of 2:   0%|          | 0/256 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 2:   0%|          | 0/256 [00:00<?, ?it/s]

(512, 0.5055899737635627)

In [None]:
result3, model_outputs, wrong_predictions = bert_model3.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('BERT with Back Translation on Test')
for key,value in result3.items():
    print(key.upper()+':', value)

BERT with Back Translation on Test
MCC: 0.5164933121186587
F1: 0.7381579221137389
ACC: 0.736214605067064
EVAL_LOSS: 0.7135642965634664


In [None]:
row = pd.Series(result3, name='Back Translation')
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
bert_df = pd.concat([bert_df, row.to_frame().T])
bert_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.50932,0.736701,0.737705,0.61995
Synonym Replacement,0.511073,0.740343,0.746647,0.763959
Back Translation,0.516493,0.738158,0.736215,0.713564


In [None]:
del bert_model3

# RoBERTa
Hyperparameter Tuning Sweep Visualisation: https://wandb.ai/datasiens/roberta_original_train/sweeps/eluhck1w?workspace=user-

## RoBERTa: Original Train

In [None]:
roberta_model_args = ClassificationArgs()
roberta_model_args.reprocess_input_data = True
roberta_model_args.overwrite_output_dir = True
roberta_model_args.manual_seed = 4
roberta_model_args.use_multiprocessing = True
roberta_model_args.train_batch_size = 32
roberta_model_args.labels_list = [0, 1, 2]
roberta_model_args.eval_batch_size = 16
# roberta_model_args.output_dir = "roberta_output"
# roberta_model_args.best_model_dir = "roberta_output/best_model"

# Set starting learning rate and epoch from WandB
roberta_model_args.learning_rate = 0.00001789
roberta_model_args.num_train_epochs = 4

In [None]:
# Create a ClassificationModel
roberta_model = ClassificationModel(
    "roberta",
    "roberta-base",
    num_labels=3,
    use_cuda=True,
    args=roberta_model_args,
)

# Train the model
roberta_model.train_model(train)

Downloading:   0%|          | 0.00/481 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/478M [00:00<?, ?B/s]

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'roberta.pooler.dense.weight', 'lm_head.bias', 'lm_head.dense.weight', 'roberta.pooler.dense.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.weight', 'classifie

Downloading:   0%|          | 0.00/878k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/446k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.29M [00:00<?, ?B/s]

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/5360 [00:00<?, ?it/s]

Epoch:   0%|          | 0/4 [00:00<?, ?it/s]

Running Epoch 0 of 4:   0%|          | 0/168 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 4:   0%|          | 0/168 [00:00<?, ?it/s]

Running Epoch 2 of 4:   0%|          | 0/168 [00:00<?, ?it/s]

Running Epoch 3 of 4:   0%|          | 0/168 [00:00<?, ?it/s]

(672, 0.531295028825601)

In [None]:
result, model_outputs, wrong_predictions = roberta_model.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('RoBERTa with Original on Test')
for key,value in result.items():
    print(key.upper()+':', value)

RoBERTa with Original on Test
MCC: 0.6032152726105052
F1: 0.7852178896034848
ACC: 0.7839046199701938
EVAL_LOSS: 0.5661408376126063


In [None]:
roberta_df = pd.DataFrame(result, index=['Original'])
roberta_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.603215,0.785218,0.783905,0.566141


In [None]:
del roberta_model

## RoBERTa: Synonym Replacement

In [None]:
roberta_model_args = ClassificationArgs()
roberta_model_args.reprocess_input_data = True
roberta_model_args.overwrite_output_dir = True
roberta_model_args.manual_seed = 4
roberta_model_args.use_multiprocessing = True
roberta_model_args.train_batch_size = 32
roberta_model_args.labels_list = [0, 1, 2]
roberta_model_args.eval_batch_size = 16
# roberta_model_args.output_dir = "roberta2_output"
# roberta_model_args.best_model_dir = "roberta2_output/best_model"

# Set starting learning rate and epoch from WandB
roberta_model_args.learning_rate = 0.00001789
roberta_model_args.num_train_epochs = 4

In [None]:
# Create a ClassificationModel
roberta_model2 = ClassificationModel(
    "roberta",
    "roberta-base",
    num_labels=3,
    use_cuda=True,
    args=roberta_model_args,
)

# Train the model
roberta_model2.train_model(train_synonym)

  0%|          | 0/8162 [00:00<?, ?it/s]

Epoch:   0%|          | 0/4 [00:00<?, ?it/s]

Running Epoch 0 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 2 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 3 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

(1024, 0.4600408202968538)

In [None]:
result2, model_outputs, wrong_predictions = roberta_model2.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('RoBERTa with Synonym Replacement on Test')
for key,value in result2.items():
    print(key.upper()+':', value)

RoBERTa with Synonym Replacement on Test
MCC: 0.5909270416341722
F1: 0.7803393145897085
ACC: 0.7809239940387481
EVAL_LOSS: 0.6845057833762396


In [None]:
row = pd.Series(result2, name='Synonym Replacement')
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
roberta_df = pd.concat([roberta_df, row.to_frame().T])
roberta_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.603215,0.785218,0.783905,0.566141
Synonym Replacement,0.590927,0.780339,0.780924,0.684506


In [None]:
del roberta_model2

## RoBERTa: Back Translation

In [None]:
roberta_model_args = ClassificationArgs()
roberta_model_args.reprocess_input_data = True
roberta_model_args.overwrite_output_dir = True
roberta_model_args.manual_seed = 4
roberta_model_args.use_multiprocessing = True
roberta_model_args.train_batch_size = 32
roberta_model_args.labels_list = [0, 1, 2]
roberta_model_args.eval_batch_size = 16
# roberta_model_args.output_dir = "roberta3_output"
# roberta_model_args.best_model_dir = "roberta3_output/best_model"

# Set starting learning rate and epoch from WandB
roberta_model_args.learning_rate = 0.00001789
roberta_model_args.num_train_epochs = 4

In [None]:
# Create a ClassificationModel
roberta_model3 = ClassificationModel(
    "roberta",
    "roberta-base",
    num_labels=3,
    use_cuda=True,
    args=roberta_model_args,
)

# Train the model
roberta_model3.train_model(train_bt)

  0%|          | 0/8162 [00:00<?, ?it/s]

Epoch:   0%|          | 0/4 [00:00<?, ?it/s]

Running Epoch 0 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 2 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 3 of 4:   0%|          | 0/256 [00:00<?, ?it/s]

(1024, 0.45762764394748956)

In [None]:
result3, model_outputs, wrong_predictions = roberta_model3.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('RoBERTa with Back Translation on Test')
for key,value in result3.items():
    print(key.upper()+':', value)

RoBERTa with Back Translation on Test
MCC: 0.580443556606198
F1: 0.7700947036659351
ACC: 0.767511177347243
EVAL_LOSS: 0.6586223996820904


In [None]:
row = pd.Series(result3, name='Back Translation')
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
roberta_df = pd.concat([roberta_df, row.to_frame().T])
roberta_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.603215,0.785218,0.783905,0.566141
Synonym Replacement,0.590927,0.780339,0.780924,0.684506
Back Translation,0.580444,0.770095,0.767511,0.658622


In [None]:
del roberta_model3

# XLNet
Hyperparameter Tuning Sweep Visualisation: https://wandb.ai/datasiens/xlnet_original_train/sweeps/bylsapm6?workspace=user-

## XLNet: Original Train

In [None]:
xlnet_model_args = ClassificationArgs()
xlnet_model_args.reprocess_input_data = True
xlnet_model_args.overwrite_output_dir = True
xlnet_model_args.manual_seed = 4
xlnet_model_args.use_multiprocessing = True
xlnet_model_args.train_batch_size = 32
xlnet_model_args.labels_list = [0, 1, 2]
xlnet_model_args.eval_batch_size = 16
# xlnet_model_args.output_dir = "xlnet_output"
# xlnet_model_args.best_model_dir = "xlnet_output/best_model"

# Set starting learning rate and epoch from WandB
xlnet_model_args.learning_rate = 0.00003635
xlnet_model_args.num_train_epochs = 5

In [None]:
# Create a ClassificationModel
xlnet_model = ClassificationModel(
    "xlnet",
    "xlnet-base-cased",
    num_labels=3,
    use_cuda=True,
    args=xlnet_model_args,
)

# Train the model
xlnet_model.train_model(train)

Downloading:   0%|          | 0.00/760 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/445M [00:00<?, ?B/s]

Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetForSequenceClassification: ['lm_loss.bias', 'lm_loss.weight']
- This IS expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLNetForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLNetForSequenceClassification were not initialized from the model checkpoint at xlnet-base-cased and are newly initialized: ['sequence_summary.summary.weight', 'sequence_summary.summary.bias', 'logits_proj.bias', 'logits_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions a

Downloading:   0%|          | 0.00/779k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.32M [00:00<?, ?B/s]

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/5360 [00:00<?, ?it/s]

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

Running Epoch 0 of 5:   0%|          | 0/168 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 5:   0%|          | 0/168 [00:00<?, ?it/s]

Running Epoch 2 of 5:   0%|          | 0/168 [00:00<?, ?it/s]

Running Epoch 3 of 5:   0%|          | 0/168 [00:00<?, ?it/s]

Running Epoch 4 of 5:   0%|          | 0/168 [00:00<?, ?it/s]

(840, 0.4678628672623918)

In [None]:
result, model_outputs, wrong_predictions = xlnet_model.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('XLNet with Original on Test')
for key,value in result.items():
    print(key.upper()+':', value)

XLNet with Original on Test
MCC: 0.6075307959226995
F1: 0.7891012527848195
ACC: 0.789865871833085
EVAL_LOSS: 0.7153085342475346


In [None]:
xlnet_df = pd.DataFrame(result, index=['Original'])
xlnet_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.607531,0.789101,0.789866,0.715309


In [None]:
del xlnet_model

## XLNet: Synonym Replacement

In [None]:
xlnet_model_args = ClassificationArgs()
xlnet_model_args.reprocess_input_data = True
xlnet_model_args.overwrite_output_dir = True
xlnet_model_args.manual_seed = 4
xlnet_model_args.use_multiprocessing = True
xlnet_model_args.train_batch_size = 32
xlnet_model_args.labels_list = [0, 1, 2]
xlnet_model_args.eval_batch_size = 16
# xlnet_model_args.output_dir = "xlnet2_output"
# xlnet_model_args.best_model_dir = "xlnet2_output/best_model"

# Set starting learning rate and epoch from WandB
xlnet_model_args.learning_rate = 0.00003635
xlnet_model_args.num_train_epochs = 5

In [None]:
# Create a ClassificationModel
xlnet_model2 = ClassificationModel(
    "xlnet",
    "xlnet-base-cased",
    num_labels=3,
    use_cuda=True,
    args=xlnet_model_args,
)

# Train the model
xlnet_model2.train_model(train_synonym)

  0%|          | 0/8162 [00:00<?, ?it/s]

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

Running Epoch 0 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 2 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 3 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 4 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

(1280, 0.3939067353960127)

In [None]:
result2, model_outputs, wrong_predictions = xlnet_model2.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('XLNet with Synonym Replacement on Test')
for key,value in result2.items():
    print(key.upper()+':', value)

XLNet with Synonym Replacement on Test
MCC: 0.5329967127176
F1: 0.7511585423506572
ACC: 0.7555886736214605
EVAL_LOSS: 1.055756444022769


In [None]:
row = pd.Series(result2, name='Synonym Replacement')
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
xlnet_df = pd.concat([xlnet_df, row.to_frame().T])
xlnet_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.607531,0.789101,0.789866,0.715309
Synonym Replacement,0.532997,0.751159,0.755589,1.055756


In [None]:
del xlnet_model2

## XLNet: Back Translation

In [None]:
xlnet_model_args = ClassificationArgs()
xlnet_model_args.reprocess_input_data = True
xlnet_model_args.overwrite_output_dir = True
xlnet_model_args.manual_seed = 4
xlnet_model_args.use_multiprocessing = True
xlnet_model_args.train_batch_size = 32
xlnet_model_args.labels_list = [0, 1, 2]
xlnet_model_args.eval_batch_size = 16
# xlnet_model_args.output_dir = "xlnet3_output"
# xlnet_model_args.best_model_dir = "xlnet3_output/best_model"

# Set starting learning rate and epoch from WandB
xlnet_model_args.learning_rate = 0.00003635
xlnet_model_args.num_train_epochs = 5

In [None]:
# Create a ClassificationModel
xlnet_model3 = ClassificationModel(
    "xlnet",
    "xlnet-base-cased",
    num_labels=3,
    use_cuda=True,
    args=xlnet_model_args,
)

# Train the model
xlnet_model3.train_model(train_bt)

  0%|          | 0/8162 [00:00<?, ?it/s]

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

Running Epoch 0 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

  model.parameters(), args.max_grad_norm


Running Epoch 1 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 2 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 3 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

Running Epoch 4 of 5:   0%|          | 0/256 [00:00<?, ?it/s]

(1280, 0.37115824849170165)

In [None]:
result3, model_outputs, wrong_predictions = xlnet_model3.eval_model(test, f1=f1_multiclass, acc=accuracy_score)

  "Dataframe headers not specified. Falling back to using column 0 as text and column 1 as labels."


  0%|          | 0/671 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/42 [00:00<?, ?it/s]

In [None]:
print('XLNet with Back Translation on Test')
for key,value in result3.items():
    print(key.upper()+':', value)

XLNet with Back Translation on Test
MCC: 0.5351213327875429
F1: 0.7474450550499605
ACC: 0.7466467958271237
EVAL_LOSS: 1.0633099928853058


In [None]:
row = pd.Series(result3, name='Back Translation')
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
xlnet_df = pd.concat([xlnet_df, row.to_frame().T])
xlnet_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.607531,0.789101,0.789866,0.715309
Synonym Replacement,0.532997,0.751159,0.755589,1.055756
Back Translation,0.535121,0.747445,0.746647,1.06331


# Overall Results

In [None]:
# For BERT
bert_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.50932,0.736701,0.737705,0.61995
Synonym Replacement,0.511073,0.740343,0.746647,0.763959
Back Translation,0.516493,0.738158,0.736215,0.713564


In [None]:
# For RoBERTa
roberta_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.603215,0.785218,0.783905,0.566141
Synonym Replacement,0.590927,0.780339,0.780924,0.684506
Back Translation,0.580444,0.770095,0.767511,0.658622


In [None]:
# For XLNet
xlnet_df

Unnamed: 0,mcc,f1,acc,eval_loss
Original,0.607531,0.789101,0.789866,0.715309
Synonym Replacement,0.532997,0.751159,0.755589,1.055756
Back Translation,0.535121,0.747445,0.746647,1.06331
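
To compare all nine runs at a glance, the three tables can be stacked into one frame with a (model, train_set) index. The sketch below rebuilds the tables from the values reported above so that it runs on its own. In the notebook, `bert_df`, `roberta_df`, and `xlnet_df` could be passed to `pd.concat` directly. The names `overall` and `best` are illustrative.

```python
import pandas as pd

columns = ["mcc", "f1", "acc", "eval_loss"]
rows = ["Original", "Synonym Replacement", "Back Translation"]

# Test-set metrics copied from the tables above.
bert = pd.DataFrame(
    [[0.50932, 0.736701, 0.737705, 0.61995],
     [0.511073, 0.740343, 0.746647, 0.763959],
     [0.516493, 0.738158, 0.736215, 0.713564]],
    index=rows, columns=columns,
)
roberta = pd.DataFrame(
    [[0.603215, 0.785218, 0.783905, 0.566141],
     [0.590927, 0.780339, 0.780924, 0.684506],
     [0.580444, 0.770095, 0.767511, 0.658622]],
    index=rows, columns=columns,
)
xlnet = pd.DataFrame(
    [[0.607531, 0.789101, 0.789866, 0.715309],
     [0.532997, 0.751159, 0.755589, 1.055756],
     [0.535121, 0.747445, 0.746647, 1.06331]],
    index=rows, columns=columns,
)

# Stack the tables; the dict keys become the outer index level.
overall = pd.concat(
    {"BERT": bert, "RoBERTa": roberta, "XLNet": xlnet},
    names=["model", "train_set"],
)
best = overall["f1"].idxmax()  # (model, train_set) with the highest weighted F1
print(overall.round(3))
print(best)
```

On these numbers the highest weighted F1 on the test set belongs to XLNet trained on the original data, with RoBERTa on the original data close behind. Since each configuration was run once with a single seed, small differences between rows should be read with caution.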
