Import all necessary libraries and install everything you need for training:

In [2]:
# Import the libraries needed for data wrangling, prediction and result analysis
import json
import numpy as np
import pandas as pd
import logging
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix, f1_score, precision_score, recall_score
import torch
from numba import cuda
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier

In [2]:
# Install transformers
# (this needs to be done on Kaggle each time you start the session)
#!pip install -q transformers

In [3]:
# Install simpletransformers (uncomment on a fresh session) and import the classifier
#!pip install -q simpletransformers
from simpletransformers.classification import ClassificationModel

In [4]:
# Install wandb
#!pip install -q wandb

In [3]:
import wandb

In [4]:
# Login to wandb
wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mtajak[0m (use `wandb login --relogin` to force relogin)


True

In [7]:
# Clean the GPU cache

cuda.select_device(0)
cuda.close()
cuda.select_device(0)
torch.cuda.empty_cache()


### Import the data

In [8]:
# FTD
train_df = pd.read_csv("data/FTD-train.txt", sep="\t", index_col=0)
dev_df = pd.read_csv("data/FTD-dev.txt", sep = "\t", index_col = 0)
test_df = pd.read_csv("data/FTD-test.txt", sep = "\t", index_col = 0)

print("FTD train shape: {}, Dev shape: {}, Test shape: {}.".format(train_df.shape, dev_df.shape, test_df.shape))

FTD train shape: (849, 2), Dev shape: (283, 2), Test shape: (283, 2).
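
The row counts printed above correspond to a 60/20/20 train/dev/test split. A quick check of that arithmetic, using only the counts from the output (the function name `split_fractions` is illustrative and not part of the notebook):

```python
def split_fractions(train: int, dev: int, test: int) -> dict:
    """Return each split's share of the full dataset, rounded to 3 decimals."""
    total = train + dev + test
    sizes = (("train", train), ("dev", dev), ("test", test))
    return {name: round(n / total, 3) for name, n in sizes}


# Row counts taken from the shapes printed above
fractions = split_fractions(849, 283, 283)
print(fractions)
```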


In [9]:
train_df.head()

Unnamed: 0,text,labels
1361,Business continuity plans must address massive...,7
1605,"( INDIANAPOLIS – APRIL 16 , 2010 ) – Ash conti...",8
733,Leek Friends of Israel welcome you to their we...,0
495,Npower announces further price increase Energy...,8
1534,"These businesses often had data , broad direct...",0


## Training and saving

We will use the multilingual XLM-RoBERTa base model: https://huggingface.co/xlm-roberta-base

In [12]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [13]:
# Create a list of labels
LABELS = train_df.labels.unique().tolist()
LABELS

[7, 8, 0, 1, 6, 5, 2, 4, 3, 9]

In [14]:
# Initialize Wandb
wandb.init(project="FTD-learning-manual-hyperparameter-search", entity="tajak", name="saving-trained-model")

In [16]:
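# Calculate how many steps each epoch will have
# Number of steps per epoch = training samples / batch size
# (int() rounds down; the trainer also runs the final partial batch, so its progress bar shows 107)
steps_per_epoch = int(849/8)
steps_per_epoch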

106

I evaluated the model every 10 epochs, i.e. every 1060 steps. I first trained the model with evaluation enabled to find the optimal number of epochs, then trained it again without evaluation and saved it.
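
The step count computed above rounds down, whereas the training log in this notebook reports 107 batches per epoch and 1070 global steps, since the last, smaller batch is also run. A minimal sketch of the rounded-up calculation, assuming one step per batch (no gradient accumulation); the helper names are illustrative and not part of simpletransformers:

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Batches per epoch when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def eval_interval(num_samples: int, batch_size: int, every_n_epochs: int) -> int:
    """Number of steps between evaluations when evaluating every N epochs."""
    return steps_per_epoch(num_samples, batch_size) * every_n_epochs


# 849 training samples with a batch size of 8, as in this notebook
per_epoch = steps_per_epoch(849, 8)
interval = eval_interval(849, 8, 10)
print(per_epoch, interval)
```

With the rounded-down value, the evaluation interval would fall ten steps short of a full ten epochs.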

In [18]:
# Create a ClassificationModel (simpletransformers wrapper around XLM-RoBERTa)
roberta_base_model = ClassificationModel(
        "xlmroberta", "xlm-roberta-base",
        num_labels=len(LABELS),
        use_cuda=True,
        args={
            "overwrite_output_dir": True,
            "num_train_epochs": 10,
            "train_batch_size": 8,
            "learning_rate": 1e-5,
            # Uncomment these parameters to evaluate during training
            #"evaluate_during_training": True,
            #"evaluate_during_training_steps": steps_per_epoch*10,
            #"evaluate_during_training_verbose": True,
            #"use_cached_eval_features": True,
            #"reprocess_input_data": True,
            "labels_list": LABELS,
            # no_cache and no_save stay commented out here so that the trained model is saved
            #"no_cache": True,
            #"no_save": True,
            "max_seq_length": 512,
            "save_steps": -1,
            # Save only the final model, to avoid filling up the disk
            "save_model_every_epoch": False,
            "wandb_project": "FTD-learning-manual-hyperparameter-search",
            "silent": True,
            }
        )

Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaForSequenceClassification: ['lm_head.decoder.weight', 'lm_head.bias', 'roberta.pooler.dense.weight', 'lm_head.dense.bias', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.out_p
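
The two training phases described earlier (a search run with evaluation, then a final run that saves the model) differ only in a few `args` entries. The sketch below builds both variants from one function. `build_training_args` is a hypothetical helper, and setting `no_save` during the search run is an assumption based on the commented-out lines in the configuration; all keys are the ones that appear in this notebook.

```python
def build_training_args(labels, steps_per_epoch, evaluate=False, eval_every_n_epochs=10):
    """Return a simpletransformers-style args dict for either training phase."""
    args = {
        "overwrite_output_dir": True,
        "num_train_epochs": 10,
        "train_batch_size": 8,
        "learning_rate": 1e-5,
        "labels_list": labels,
        "max_seq_length": 512,
        "save_steps": -1,
        "save_model_every_epoch": False,
        "silent": True,
    }
    if evaluate:
        # Search phase: evaluate periodically and skip saving (assumption)
        args.update({
            "evaluate_during_training": True,
            "evaluate_during_training_steps": steps_per_epoch * eval_every_n_epochs,
            "evaluate_during_training_verbose": True,
            "use_cached_eval_features": True,
            "no_save": True,
        })
    return args


search_args = build_training_args([0, 1, 2], steps_per_epoch=106, evaluate=True)
final_args = build_training_args([0, 1, 2], steps_per_epoch=106)
print(search_args["evaluate_during_training_steps"])
```

When evaluation during training is enabled, `train_model` also needs an evaluation set, for example `roberta_base_model.train_model(train_df, eval_df=dev_df)`.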

In [19]:
# Train the model
roberta_base_model.train_model(train_df)

INFO:simpletransformers.classification.classification_utils: Converting to features started. Cache is not used.
INFO:simpletransformers.classification.classification_utils: Saving features into cached file cache_dir/cached_train_xlmroberta_512_10_2


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_model: Initializing WandB run for training.






Running Epoch 0 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 1 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 2 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 3 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 4 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 5 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 6 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 7 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 8 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

Running Epoch 9 of 10:   0%|          | 0/107 [00:00<?, ?it/s]

INFO:simpletransformers.classification.classification_model: Training of xlmroberta model complete. Saved to outputs/.


(1070, 0.9864452896831192)

In [5]:
# Save the trained model to Wandb
run = wandb.init(project="FTD-learning-manual-hyperparameter-search", entity="tajak", name="saving-trained-model")
trained_model_artifact = wandb.Artifact("FTD-classifier", type="model", description="a model trained on the FTD dataset")
trained_model_artifact.add_dir("artifacts/FTD-classifier:v1")
run.log_artifact(trained_model_artifact)

[34m[1mwandb[0m: Adding directory to artifact (./artifacts/FTD-classifier:v1)... Done. 1.9s


<wandb.sdk.wandb_artifacts.Artifact at 0x7fe257743d60>
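
Once the artifact is logged, it can be pulled back in a later session. The following is a sketch under assumptions rather than the notebook's own procedure: `artifact_reference` and `load_classifier_from_wandb` are hypothetical helper names, the `latest` alias is assumed to point at the version logged above, and the artifact directory is assumed to contain the model files directly (config, weights, tokenizer).

```python
def artifact_reference(entity: str, project: str, name: str, alias: str = "latest") -> str:
    """Build a fully qualified W&B artifact name: entity/project/name:alias."""
    return f"{entity}/{project}/{name}:{alias}"


def load_classifier_from_wandb(reference: str, use_cuda: bool = True):
    """Download a logged model artifact and load it with simpletransformers."""
    # Imported lazily so that artifact_reference works without these packages
    import wandb
    from simpletransformers.classification import ClassificationModel

    run = wandb.init(job_type="inference")
    artifact = run.use_artifact(reference, type="model")
    model_dir = artifact.download()  # local directory with the artifact files
    return ClassificationModel("xlmroberta", model_dir, use_cuda=use_cuda)


reference = artifact_reference(
    "tajak", "FTD-learning-manual-hyperparameter-search", "FTD-classifier"
)
print(reference)
```

Calling `load_classifier_from_wandb(reference)` would then return a model ready for `predict`, provided the assumptions above hold.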

In [None]:
# Clean the GPU cache
cuda.select_device(0)
cuda.close()
cuda.select_device(0)
torch.cuda.empty_cache()