This notebook searches for good hyperparameters (number of epochs, batch size, and base model) for fine-tuning classification models.

In [1]:

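import torch
from numba import cuda

# Reset the CUDA context on GPU 0 so that memory held by earlier runs is released.
cuda.select_device(0)
cuda.close()
cuda.select_device(0)
# Also drop any blocks still cached by PyTorch's allocator.
torch.cuda.empty_cache()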

In [2]:
label_set = ['Negative', 'Positive', 'Neutral']
STR_TO_NUM = {k: i for i, k in enumerate(label_set)}
NUM_TO_STR = {i:k for i, k in enumerate(label_set)}

import pandas as pd
df = pd.read_json("bcs_polsent.jsonl", orient="records", lines=True)
df["label"] = df.label.apply(lambda s: STR_TO_NUM[s])
df = df[["sentence", "label", "split"]].rename(columns={"sentence": "text", "label":"labels"})
train = df[df.split=="train"].drop(columns=["split"])
dev = df[df.split=="dev"].drop(columns=["split"])
test = df[df.split=="test"].drop(columns=["split"])
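
The cell above assumes every record in the JSONL file carries a `sentence`, a `label` from `label_set`, and a `split` field. As a rough, dependency-free sketch of the same steps, the block below runs them on three made-up records. The sample data and the helper name `load_splits` are illustrative assumptions, not taken from bcs_polsent.jsonl or the notebook.

```python
import json

label_set = ["Negative", "Positive", "Neutral"]
STR_TO_NUM = {k: i for i, k in enumerate(label_set)}

# Made-up records in the shape the loading cell expects (hypothetical data).
SAMPLE_JSONL = """\
{"sentence": "example one", "label": "Negative", "split": "train"}
{"sentence": "example two", "label": "Neutral", "split": "dev"}
{"sentence": "example three", "label": "Positive", "split": "test"}
"""


def load_splits(jsonl_text):
    """Group records by split, renaming fields to text/labels and encoding labels."""
    splits = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        if rec["label"] not in STR_TO_NUM:
            # Fail with a readable message instead of a bare KeyError.
            raise ValueError(f"unknown label: {rec['label']!r}")
        row = {"text": rec["sentence"], "labels": STR_TO_NUM[rec["label"]]}
        splits.setdefault(rec["split"], []).append(row)
    return splits


splits = load_splits(SAMPLE_JSONL)
```

Here `splits["train"]`, `splits["dev"]`, and `splits["test"]` play the role of the `train`, `dev`, and `test` DataFrames, with labels encoded as 0 (Negative), 1 (Positive), and 2 (Neutral).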


In [3]:
def train_model(train_df, model_name, model_type, batch_size, NUM_EPOCHS=5):
    """Fine-tune a 3-class simpletransformers ClassificationModel on train_df."""
    from simpletransformers.classification import ClassificationModel
    import torch

    # Release cached GPU memory left over from the previous run.
    torch.cuda.empty_cache()
    model_args = {
        "num_train_epochs": NUM_EPOCHS,
        "learning_rate": 1e-5,
        "overwrite_output_dir": True,
        "train_batch_size": batch_size,
        # Keep nothing on disk: no checkpoints and no cached features.
        "no_save": True,
        "no_cache": True,
        "save_steps": -1,
        "max_seq_length": 512,
        "silent": True,
    }

    model = ClassificationModel(
        model_type, model_name, num_labels=3, use_cuda=True, args=model_args
    )
    model.train_model(train_df)
    return model


def eval_model(model, test_df):
    """Predict on test_df and return micro/macro F1 plus the decoded labels."""
    from sklearn.metrics import f1_score
    from tqdm.auto import tqdm

    y_true_enc = test_df.labels
    # predict() takes a list of texts and returns (predictions, raw_outputs).
    y_pred_enc = [model.predict([text])[0][0] for text in tqdm(test_df.text.values)]

    # Map the integer class ids back to their string labels.
    y_true = [NUM_TO_STR[i] for i in y_true_enc]
    y_pred = [NUM_TO_STR[i] for i in y_pred_enc]

    microF1 = f1_score(y_true, y_pred, labels=label_set, average="micro")
    macroF1 = f1_score(y_true, y_pred, labels=label_set, average="macro")

    return {"microF1": microF1, "macroF1": macroF1, "y_true": y_true, "y_pred": y_pred}
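
eval_model reports both micro- and macro-averaged F1. As a rough illustration of how the two differ, the dependency-free sketch below recomputes them by hand on made-up labels. The toy data and the helper name `f1_scores` are illustrative assumptions and not part of the notebook, which uses sklearn's `f1_score`.

```python
from collections import Counter

LABELS = ["Negative", "Positive", "Neutral"]  # same classes as label_set


def f1_scores(y_true, y_pred, labels=LABELS):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class counts equally.
    per_class = [f1(tp[c], fp[c], fn[c]) for c in labels]
    macro = sum(per_class) / len(labels)
    # Micro: pool the counts over all classes first, so every example counts equally.
    micro = f1(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )
    return micro, macro


# Hypothetical toy data: the single Positive example is misclassified.
toy_true = ["Negative"] * 4 + ["Positive", "Neutral"]
toy_pred = ["Negative"] * 4 + ["Negative", "Neutral"]
micro, macro = f1_scores(toy_true, toy_pred)
```

On this toy data micro-F1 is 5/6 (equal to accuracy), while macro-F1 drops to 17/27 because the Positive class is never predicted correctly. That sensitivity to weak classes is one reason to track the macro score when classes are unbalanced.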


In [4]:


for epoch in [3, 5, 9, 15]:
    for batch in [8, 16]:
        for modeltype, modelname in zip(
            ["xlmroberta", "xlmroberta", "bert", "electra"],
            [
                "xlm-roberta-base",
                "xlm-roberta-large",
                "EMBEDDIA/crosloengual-bert",
                "classla/bcms-bertic",
            ],
        ):
            print(f"training {modelname},{modeltype},{epoch},{batch}")
            model = train_model(train, modelname, modeltype, batch, NUM_EPOCHS=epoch)
            print("Model trained. Evaluating.")
            stats = eval_model(model, dev)
            del model
            with open("results3.csv", "a") as f:
                f.write(f"{modelname},{modeltype},{epoch},{batch},{stats['macroF1']}\n")


training xlm-roberta-base,xlmroberta,3,8


Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.out_p

Model trained. Evaluating.


  0%|          | 0/150 [00:00<?, ?it/s]

training xlm-roberta-large,xlmroberta,3,8


Some weights of the model checkpoint at xlm-roberta-large were not used when initializing XLMRobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-large and are newly initialized: ['classifier.out

Model trained. Evaluating.


  0%|          | 0/150 [00:00<?, ?it/s]

training EMBEDDIA/crosloengual-bert,bert,3,8


Some weights of the model checkpoint at EMBEDDIA/crosloengual-bert were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model chec