### Import Libraries

In [1]:
!pip install transformers
!pip install tensorboardx
!pip install simpletransformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import sklearn
from sklearn.metrics import accuracy_score

### Import Dataset

In [3]:
train_dataset = pd.read_csv('./ILDC/ILDC_single/train_dataset.csv')
dev_dataset = pd.read_csv('./ILDC/ILDC_single/dev_dataset.csv')
test_dataset = pd.read_csv('./ILDC/ILDC_single/test_dataset.csv')
print(f'Train Dataset: {train_dataset.shape}')
print(f'Dev Dataset: {dev_dataset.shape}')
print(f'Test Dataset: {test_dataset.shape}')

Train Dataset: (5082, 2)
Dev Dataset: (994, 2)
Test Dataset: (1517, 2)


### Train Model (roberta-base)




In [4]:
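# Use the same hyperparameters for every model so the runs are comparable.
model_args = ClassificationArgs()
model_args.num_train_epochs = 5
model_args.learning_rate = 1e-5
model_args.overwrite_output_dir = True  # allow reusing the output directory between runs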

In [5]:
model = ClassificationModel('roberta', 'roberta-base', num_labels=2, args=model_args)

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifi

In [6]:
model.train_model(train_dataset)



  0%|          | 0/5082 [00:00<?, ?it/s]

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

Running Epoch 0 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 1 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 2 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 3 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 4 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

(3180, 0.6463798552938977)
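
The tuple printed by `train_model` appears to be the final global step and the average training loss (an assumption based on how simpletransformers documents its return value when evaluation during training is off). The step count can be cross-checked against the progress bars. The sketch below assumes the library's default batch size of 8, which this notebook never overrides; `batch_size` is a name introduced here for illustration.

```python
import math

# Values taken from the notebook outputs above.
train_examples = 5082
test_examples = 1517
num_train_epochs = 5

# Assumption: simpletransformers' default train/eval batch size of 8 is in effect.
batch_size = 8

# The last batch of an epoch may be partial, hence the ceiling.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * num_train_epochs
eval_batches = math.ceil(test_examples / batch_size)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total training steps: {total_steps}")
print(f"evaluation batches: {eval_batches}")
```

Under that assumption the numbers agree with the `0/636` epoch progress bars, the `3180` in the returned tuple, and the `0/190` bar shown during evaluation.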

In [7]:
# Extra keyword arguments are treated as additional metrics: metric(true_labels, predictions).
result, model_outputs, wrong_predictions = model.eval_model(test_dataset, acc=accuracy_score)
result



  0%|          | 0/1517 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/190 [00:00<?, ?it/s]

{'mcc': 0.14978372927278294,
 'tp': 233,
 'tn': 621,
 'fp': 134,
 'fn': 529,
 'auroc': 0.6011741495889174,
 'auprc': 0.5991341450958279,
 'acc': 0.5629531970995386,
 'eval_loss': 0.7286154094495272}
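
The result dictionary reports the raw confusion-matrix counts but not precision, recall, or F1. As a rough sketch, those can be derived from `tp`, `tn`, `fp`, and `fn`, and the same counts reproduce the reported `acc` and `mcc`. The sketch assumes label 1 is the positive class; the variable names are illustrative and not part of the notebook.

```python
import math

# Confusion-matrix counts copied from the RoBERTa evaluation result above.
tp, tn, fp, fn = 233, 621, 134, 529

total = tp + tn + fp + fn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

# Matthews correlation coefficient from the four counts.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"f1:        {f1:.4f}")
print(f"mcc:       {mcc:.4f}")
```

The low recall relative to precision shows that the model misses most positive cases, which accuracy alone does not reveal.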

### Train Model (bert-base-uncased)

In [8]:
model_args = ClassificationArgs()
model_args.num_train_epochs = 5
model_args.learning_rate = 1e-5
model_args.overwrite_output_dir = True

In [9]:
model = ClassificationModel('bert', 'bert-base-uncased', num_labels=2, args=model_args)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [10]:
model.train_model(train_dataset)



  0%|          | 0/5082 [00:00<?, ?it/s]

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

Running Epoch 0 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 1 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 2 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 3 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 4 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

(3180, 0.6275167105332861)

In [11]:
result, model_outputs, wrong_predictions = model.eval_model(test_dataset, acc=accuracy_score)
result



  0%|          | 0/1517 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/190 [00:00<?, ?it/s]

{'mcc': 0.09116908289246997,
 'tp': 115,
 'tn': 686,
 'fp': 69,
 'fn': 647,
 'auroc': 0.5903191322938937,
 'auprc': 0.5847551417459738,
 'acc': 0.5280158206987475,
 'eval_loss': 0.8052075677796414}

### Train Model (nlpaueb/legal-bert-base-uncased)




In [4]:
model_args = ClassificationArgs()
model_args.num_train_epochs = 5
model_args.learning_rate = 1e-5
model_args.overwrite_output_dir = True

In [5]:
model = ClassificationModel('bert', 'nlpaueb/legal-bert-base-uncased', num_labels=2, args=model_args)

Some weights of the model checkpoint at nlpaueb/legal-bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification wer

In [6]:
model.train_model(train_dataset)



  0%|          | 0/5082 [00:00<?, ?it/s]

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

Running Epoch 0 of 5:   0%|          | 0/636 [00:00<?, ?it/s]



Running Epoch 1 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 2 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 3 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 4 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

(3180, 0.6560604311385245)

In [7]:
result, model_outputs, wrong_predictions = model.eval_model(test_dataset, acc=accuracy_score)
result



  0%|          | 0/1517 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/190 [00:00<?, ?it/s]

{'mcc': 0.028866028751806584,
 'tp': 5,
 'tn': 753,
 'fp': 2,
 'fn': 757,
 'auroc': 0.5785585162781804,
 'auprc': 0.578921891925204,
 'acc': 0.4996704021094265,
 'eval_loss': 0.7542422965953225}
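
With only 7 positive predictions in total, this run looks close to a constant classifier. A quick way to check is to compare its accuracy with a majority-class baseline computed from the same counts. This is only a sketch; `majority_baseline` and the other names are introduced here for illustration.

```python
# Confusion-matrix counts copied from the legal-bert evaluation result above.
tp, tn, fp, fn = 5, 753, 2, 757

total = tp + tn + fp + fn
predicted_positive = tp + fp
actual_positive = tp + fn
actual_negative = tn + fp

# Fraction of test examples the model labels as positive.
positive_rate = predicted_positive / total

# Accuracy of always predicting the more frequent class in the test set.
majority_baseline = max(actual_positive, actual_negative) / total
accuracy = (tp + tn) / total

print(f"predicted positive: {predicted_positive} of {total} ({positive_rate:.2%})")
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
print(f"model accuracy: {accuracy:.4f}")
```

On these counts the model labels fewer than 1% of examples as positive and scores slightly below the baseline, so the reported accuracy mostly reflects the class balance of the test set rather than learned signal.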

### Train Model (saibo/legal-roberta-base)

In [8]:
model_args = ClassificationArgs()
model_args.num_train_epochs = 5
model_args.learning_rate = 1e-5
model_args.overwrite_output_dir = True

In [9]:
model = ClassificationModel('roberta', 'saibo/legal-roberta-base', num_labels=2, args=model_args)

Some weights of the model checkpoint at saibo/legal-roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at saibo/legal-roberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.bia

In [10]:
model.train_model(train_dataset)



  0%|          | 0/5082 [00:00<?, ?it/s]

Epoch:   0%|          | 0/5 [00:00<?, ?it/s]

Running Epoch 0 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 1 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 2 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 3 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

Running Epoch 4 of 5:   0%|          | 0/636 [00:00<?, ?it/s]

(3180, 0.6339187292182971)

In [11]:
result, model_outputs, wrong_predictions = model.eval_model(test_dataset, acc=accuracy_score)
result



  0%|          | 0/1517 [00:00<?, ?it/s]

Running Evaluation:   0%|          | 0/190 [00:00<?, ?it/s]

{'mcc': 0.10793539545098305,
 'tp': 312,
 'tn': 524,
 'fp': 231,
 'fn': 450,
 'auroc': 0.5771001720811388,
 'auprc': 0.5650731919185299,
 'acc': 0.5510876730388925,
 'eval_loss': 0.7065465785955128}
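
To compare the four runs side by side, the confusion-matrix counts reported in each evaluation cell can be collected into one table. The sketch below uses only those reported counts; the `summarize` helper and the table layout are illustrative, and label 1 is assumed to be the positive class.

```python
# Confusion-matrix counts copied from the four evaluation results in this notebook.
results = {
    "roberta-base": {"tp": 233, "tn": 621, "fp": 134, "fn": 529},
    "bert-base-uncased": {"tp": 115, "tn": 686, "fp": 69, "fn": 647},
    "nlpaueb/legal-bert-base-uncased": {"tp": 5, "tn": 753, "fp": 2, "fn": 757},
    "saibo/legal-roberta-base": {"tp": 312, "tn": 524, "fp": 231, "fn": 450},
}


def summarize(counts):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    tp, tn, fp, fn = counts["tp"], counts["tn"], counts["fp"], counts["fn"]
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


summary = {name: summarize(counts) for name, counts in results.items()}

# Rank models by test accuracy, best first.
ranked = sorted(summary, key=lambda name: summary[name]["accuracy"], reverse=True)

print(f"{'model':<34}{'acc':>8}{'prec':>8}{'rec':>8}{'f1':>8}")
for name in ranked:
    m = summary[name]
    print(
        f"{name:<34}{m['accuracy']:>8.4f}{m['precision']:>8.4f}"
        f"{m['recall']:>8.4f}{m['f1']:>8.4f}"
    )
```

Ranking by accuracy and ranking by F1 do not agree here, so which model counts as best depends on how much missed positive cases matter for the task.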