*Copyright (c) Microsoft Corporation. All rights reserved.*

*Licensed under the MIT License.*

# Text Classification of MIND News Articles using Multiple Transformer Models  


#### TODO
1. Make sure that all the packages can be installed on Colab


In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

## Import requirements

In [49]:
import json
import os
import pickle
import shutil
import sys
import tempfile
import urllib
import zipfile
from tempfile import TemporaryDirectory

import matplotlib.pyplot as plt
import numpy as np

import pandas as pd
import scrapbook as sb
import torch
import torch.nn as nn
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC
from spacy.lang.en import English
from tqdm import tqdm

# Our code components
from common.article_classification_dataset import (
    ArticleClassificationDataProcessor,
    ArticleClassificationDataSet,
)
from common.article_classification_model import ArticleClassifier
from common.article_classification_utilities import DownloadMindDataset, Timer


In [50]:
# !python -m spacy download en_core_web_md
import en_core_web_md
nlp = en_core_web_md.load()
en = English()

In [51]:
# notebook parameters
CACHE_DIR = TemporaryDirectory().name
NUM_EPOCHS = 1
BATCH_SIZE = 32
NUM_GPUS = torch.cuda.device_count()
MAX_LEN = 100
MODEL_NAMES = ["distilbert-base-uncased", "roberta-base", "xlnet-base-cased"]
MODEL_RESULTS = dict()
LABEL_COL = 'category'
TEXT_COL = 'text'

## Read the files with pandas  
The `news.tsv` file contains detailed information about the news articles referenced in the `behaviors.tsv` file.
It has 8 tab-separated columns:
 - News ID
 - Category
 - Subcategory
 - Title
 - Abstract
 - URL
 - Title Entities (entities contained in the title of this news)
 - Abstract Entities (entities contained in the abstract of this news)
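
As a minimal sketch of how a file with this layout can be read with pandas, the snippet below parses a tab-separated sample with the eight columns listed above. It assumes the file has no header row. The column names and the sample row are illustrative, and an in-memory buffer stands in for the real `news.tsv` path.

```python
import csv
import io

import pandas as pd

# Column names are our own labels for the eight fields listed above
# (an assumption of this sketch, not something read from the file).
NEWS_COLUMNS = [
    "id", "category", "subcategory", "title",
    "abstract", "url", "title_entities", "abstract_entities",
]

# Illustrative stand-in for news.tsv: one made-up tab-separated row, no header line.
sample_tsv = (
    "N00001\tsports\tfootball\tExample title\tExample abstract\t"
    "https://example.com/a.html\t[]\t[]\n"
)

news_df = pd.read_csv(
    io.StringIO(sample_tsv),  # replace with the path to news.tsv
    sep="\t",
    header=None,              # assumed: the file has no header row
    names=NEWS_COLUMNS,
    quoting=csv.QUOTE_NONE,   # keep quote characters inside the entity fields as-is
)
print(news_df.shape)
```

Passing `quoting=csv.QUOTE_NONE` is a design choice for fields that contain JSON-like text with double quotes, so that pandas does not try to interpret them as quoted cells.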

### Download and Process MIND Dataset  

Calling `DownloadMindDataset.process_and_load_dfs()` returns the training and test dataframes.  

> - With the default arguments, the data directory is `mind_dataset` and the small version of the dataset is downloaded.
> - If the data has already been downloaded, the download step is skipped.


In [76]:
df_train, df_test = DownloadMindDataset.process_and_load_dfs()

Bypassing download of already-downloaded file MINDsmall_train.zip
Bypassing download of already-downloaded file MINDsmall_dev.zip
Downloading and extraction complete!
Train:  ['behaviors.tsv', 'relation_embedding.vec', 'https_mind201910small.blob.core.windows.net_release_MINDsmall_train.zip', 'entity_embedding.vec', 'news.tsv']
Test:  ['behaviors.tsv', 'relation_embedding.vec', 'entity_embedding.vec', 'news.tsv', 'https_mind201910small.blob.core.windows.net_release_MINDsmall_dev.zip']


In [77]:
display(df_train.head())

Unnamed: 0,id,category,subcategory,title,abstract,url,title_entities,abstract_entities
0,N55528,lifestyle,lifestyleroyals,"The Brands Queen Elizabeth, Prince Charles, an...","Shop the notebooks, jackets, and more that the...",https://assets.msn.com/labs/mind/AAGH0ET.html,"[{""Label"": ""Prince Philip, Duke of Edinburgh"",...",[]
1,N19639,health,weightloss,50 Worst Habits For Belly Fat,These seemingly harmless habits are holding yo...,https://assets.msn.com/labs/mind/AAB19MK.html,"[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik...","[{""Label"": ""Adipose tissue"", ""Type"": ""C"", ""Wik..."
2,N61837,news,newsworld,The Cost of Trump's Aid Freeze in the Trenches...,Lt. Ivan Molchanets peeked over a parapet of s...,https://assets.msn.com/labs/mind/AAJgNsz.html,[],"[{""Label"": ""Ukraine"", ""Type"": ""G"", ""WikidataId..."
3,N53526,health,voices,I Was An NBA Wife. Here's How It Affected My M...,"I felt like I was a fraud, and being an NBA wi...",https://assets.msn.com/labs/mind/AACk2N6.html,[],"[{""Label"": ""National Basketball Association"", ..."
4,N38324,health,medical,"How to Get Rid of Skin Tags, According to a De...","They seem harmless, but there's a very good re...",https://assets.msn.com/labs/mind/AAAKEkt.html,"[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI...","[{""Label"": ""Skin tag"", ""Type"": ""C"", ""WikidataI..."


In [78]:
display(df_train.shape)
display(df_test.shape)

(51282, 8)

(42416, 8)

## Functions

## Introduction
In this notebook, we fine-tune and evaluate several pretrained models on a subset of the [Microsoft MIND Dataset](https://blogs.msn.com/mind-at-work-news-recommendation-challenge-for-researchers/).

We use an `ArticleClassifier` that wraps [Hugging Face's PyTorch implementation](https://github.com/huggingface/transformers) of different transformers, like [BERT](https://github.com/google-research/bert), [XLNet](https://github.com/zihangdai/xlnet), and [RoBERTa](https://github.com/pytorch/fairseq).  

The `ArticleClassifier` and the other reusable components adapt some of the work done in [Microsoft NLP Recipes](https://github.com/microsoft/nlp) to make it easy to fit these transformer models.  

We rely on Hugging Face's `AutoModels` architecture to load each of the transformer models used for this article classification from its model name.  

We fine-tuned the transformer models on a Microsoft Azure GPU machine with 1 Tesla K80 GPU and 56 GiB of RAM. 


## Read Dataset
We start from the training and test dataframes loaded above. The loading function also downloads and extracts the files if they don't exist in the data folder.

For our classification task, we use the article title and abstract as the text input and the corresponding category as the label, so we keep only those three columns.

In [79]:
df_train = df_train[["title", "abstract", "category"]]
df_test = df_test[["title", "abstract", "category"]]

In [80]:
display(df_train.isnull().sum())

title          0
abstract    2666
category       0
dtype: int64

In [81]:
display(df_test.isnull().sum())

title          0
abstract    2021
category       0
dtype: int64

Exploring the data, we observe that some rows contain NaN values, all of them in the `abstract` column (2,666 in the training set and 2,021 in the test set). Missing values are more commonly associated with numerical data, but they also occur in text fields, so we need to spend some time on data wrangling and cleaning.  

1. First, we remove the rows with NaN values.
1. To avoid scraping the entire news articles from their links, we use the combination of the news title and abstract from the MIND dataset as the full text to train our classifiers.
1. The dataset contains a generic article category called `news`. We keep 8 selected categories and leave out the rest, including `news`.
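
The three steps above can be sketched as a single helper. This is a minimal illustration on a made-up dataframe, not the exact code used in the cells of this notebook: the function name `prepare_articles` and its `sep` parameter are hypothetical, and the sketch joins title and abstract with a space, whereas the notebook cells concatenate the two strings directly.

```python
import pandas as pd


def prepare_articles(df, keep_categories, sep=" "):
    """Drop rows with missing values, merge title and abstract, filter categories."""
    # Select the needed columns and drop incomplete rows on a copy,
    # so the caller's dataframe is left untouched.
    out = df[["title", "abstract", "category"]].dropna().copy()
    # Build the text input; `sep` is an illustrative parameter of this sketch.
    out["text"] = out["title"].astype(str) + sep + out["abstract"].astype(str)
    out = out.drop(columns=["title", "abstract"])
    # Keep only the selected categories and renumber the rows.
    out = out[out["category"].isin(keep_categories)]
    return out.reset_index(drop=True)


# Made-up rows: one complete, one with a missing abstract, one in an excluded category.
toy = pd.DataFrame(
    {
        "title": ["Game recap", "Market update", "Election night"],
        "abstract": ["The home team won.", None, "Results are in."],
        "category": ["sports", "finance", "news"],
    }
)
clean = prepare_articles(toy, keep_categories=["sports", "finance"])
print(clean)
```

Only the first toy row survives: the second is dropped because its abstract is missing, and the third because `news` is not among the kept categories.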

In [82]:
df_train = df_train.dropna()
df_test = df_test.dropna()

In [83]:
display(df_train.isnull().sum())

title       0
abstract    0
category    0
dtype: int64

In [84]:
display(df_test.isnull().sum())

title       0
abstract    0
category    0
dtype: int64

In [85]:
display(df_train.shape)
display(df_test.shape)

(48616, 3)

(40395, 3)

### Merge the title and abstract into text long enough to fine-tune our classifier

In [86]:
df_train["text"] = df_train["title"].astype(str) + df_train["abstract"].astype(str)
df_train.drop(columns=['title', 'abstract'], inplace=True)

In [87]:
display(df_train.head())
display(df_train.shape)

Unnamed: 0,category,text
0,lifestyle,"The Brands Queen Elizabeth, Prince Charles, an..."
1,health,50 Worst Habits For Belly FatThese seemingly h...
2,news,The Cost of Trump's Aid Freeze in the Trenches...
3,health,I Was An NBA Wife. Here's How It Affected My M...
4,health,"How to Get Rid of Skin Tags, According to a De..."


(48616, 2)

In [88]:
df_test["text"] = df_test["title"].astype(str) + df_test["abstract"].astype(str)
df_test.drop(columns=['title', 'abstract'], inplace=True)

In [89]:
display(df_test.head())
display(df_test.shape)

Unnamed: 0,category,text
0,lifestyle,"The Brands Queen Elizabeth, Prince Charles, an..."
2,news,The Cost of Trump's Aid Freeze in the Trenches...
3,health,I Was An NBA Wife. Here's How It Affected My M...
4,health,"How to Get Rid of Skin Tags, According to a De..."
5,sports,Should NFL be able to fine players for critici...


(40395, 2)

### Filter the data to only interesting article types 

In [90]:
# Choose top interesting categories
chosen_articles = [
    "sports",
    "finance",
    "foodanddrink",
    "health",
    "travel",
    "weather",
    "movies",
    "music",
]

In [91]:
display(df_train.shape)
df_train = df_train[df_train.category.isin(chosen_articles)]

(48616, 2)

In [92]:
display(df_test.shape)
df_test = df_test[df_test.category.isin(chosen_articles)]

(40395, 2)

In [93]:
# shuffle and reset index
df_train = df_train.sample(frac=1).reset_index(drop=True)
df_test = df_test.sample(frac=1).reset_index(drop=True)

In [94]:
display(df_train.head(10))
display(df_train.shape)

Unnamed: 0,category,text
0,sports,Michigan State's Cassius Winston makes first s...
1,sports,Hillsborough dad: Son walks to bus stop on bus...
2,foodanddrink,A 'Cake Boss' Pastry Chef Returns to Washingto...
3,finance,"Barneys Is Sold for Scrap, Ending an EraAuthen..."
4,sports,"In its last World Series Game 7, Washington lo..."
5,travel,Cheatham Co. annual ceremony celebrates local ...
6,weather,Tornado confirmed near Philly; central Pa. get...
7,sports,Red Zone Play: Carlos Ain't Hyde'n No MoreThe ...
8,sports,"Which of these Mavericks plays first: Barea, B..."
9,sports,"Grand Canyon signs 3 basketball players, waiti..."


(26084, 2)

In [95]:
display(df_test.head(10))
display(df_test.shape)

Unnamed: 0,category,text
0,travel,Plane crashes near Atlanta Air ShowA Canadian ...
1,sports,7 biggest takeaways from the opening College F...
2,sports,Ravens TE doubts Patriots defense: 'We'll see ...
3,sports,Panthers waive struggling return specialist Ra...
4,foodanddrink,Here Are Four Easy Ways to Chill Your Drinks F...
5,sports,"Sure, it was the Jets, but that was the best p..."
6,sports,MLB world reacts to Justin Verlander winning C...
7,finance,Capital One customers can't withdraw money aft...
8,finance,The best and worst states to live inOne of the...
9,sports,Betting on golf: How our experts have correctl...


(21461, 2)

The examples in the dataset are grouped into 8 news article categories:

In [96]:
df_train[LABEL_COL].value_counts()

sports          13231
finance          3048
foodanddrink     2513
travel           2223
weather          1879
health           1834
music             754
movies            602
Name: category, dtype: int64

In [97]:
df_test[LABEL_COL].value_counts()

sports          10778
finance          2533
foodanddrink     2213
travel           1755
health           1676
weather          1364
music             605
movies            537
Name: category, dtype: int64
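
The counts above show a strong class imbalance, with `sports` dominating both splits. One rough way to quantify it is the accuracy of a trivial classifier that always predicts the most frequent class. The sketch below computes that baseline from the test-set counts printed above; it is only an arithmetic illustration, not part of the training pipeline.

```python
# Test-set counts copied from the value_counts() output above.
test_counts = {
    "sports": 10778,
    "finance": 2533,
    "foodanddrink": 2213,
    "travel": 1755,
    "health": 1676,
    "weather": 1364,
    "music": 605,
    "movies": 537,
}

total = sum(test_counts.values())
# The majority-class baseline always predicts the most frequent label.
majority_label = max(test_counts, key=test_counts.get)
baseline_accuracy = test_counts[majority_label] / total

print(f"{majority_label}: {baseline_accuracy:.3f}")
```

Because roughly half of the test examples are `sports`, accuracy alone can look good even for a weak model, which is one reason to also look at a per-class metric such as the macro-averaged F1-score.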

In [98]:
# encode labels
label_encoder = LabelEncoder()
df_train[LABEL_COL] = label_encoder.fit_transform(df_train[LABEL_COL])
df_test[LABEL_COL] = label_encoder.transform(df_test[LABEL_COL])

num_labels = len(np.unique(df_train[LABEL_COL]))

In [99]:
print("Number of unique labels: {}".format(num_labels))
print("Number of training examples: {}".format(df_train.shape[0]))
print("Number of testing examples: {}".format(df_test.shape[0]))

Number of unique labels: 8
Number of training examples: 26084
Number of testing examples: 21461


In [100]:
display(len(np.unique(df_train[LABEL_COL])))
display(len(np.unique(df_test[LABEL_COL])))

8

8
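
To make the label encoding step concrete, here is a small sketch of how scikit-learn's `LabelEncoder` maps category names to integer ids and back. The list of categories is a made-up sample; the encoder sorts the unique labels, so the integer id of a category is its position in `classes_`.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of category labels.
categories = ["sports", "finance", "weather", "sports", "music"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(categories)

# classes_ holds the sorted unique labels; the index in this array is the integer id.
print(list(encoder.classes_))
print(list(encoded))

# inverse_transform maps integer ids (for example, model predictions) back to names.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

Fitting the encoder on the training labels only and reusing it with `transform` on the test labels, as done above, keeps the integer ids consistent across both splits.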

## Select Pretrained Models

Several pretrained models have been made available by [Hugging Face](https://github.com/huggingface/transformers) for text classification. We use `DistilBERT`, `RoBERTa`, and `XLNet` because of their relatively small size compared to other, larger transformer models.  


## Fine-tune

Our wrappers make it easy to fine-tune different models in a unified way, hiding the preprocessing details that are needed before training. In this example, we select the following models and use the same piece of code to fine-tune them on our article category classification task. Note that other models available from Hugging Face were pretrained on multilingual datasets and can be used with non-English datasets.

In [103]:
print(MODEL_NAMES)

['distilbert-base-uncased', 'roberta-base', 'xlnet-base-cased']
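
One of the preprocessing details that such wrappers typically hide is bringing every tokenized text to a fixed length (`MAX_LEN` in this notebook). The sketch below illustrates the idea in plain Python. It is not the code inside `ArticleClassificationDataProcessor`; the helper name `pad_or_truncate`, the token ids, and the padding id are all made up for illustration, and real tokenizers also take care of special tokens when truncating.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    # Truncate sequences that are too long.
    ids = list(token_ids[:max_len])
    # The attention mask marks real tokens with 1 and padding with 0.
    mask = [1] * len(ids)
    # Pad sequences that are too short.
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


# A short sequence is padded, a long one is truncated.
short_ids, short_mask = pad_or_truncate([101, 7592, 102], max_len=5)
long_ids, long_mask = pad_or_truncate(list(range(10)), max_len=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Fixed-length inputs let examples be stacked into batches of identical shape, while the attention mask tells the model which positions to ignore.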


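For each pretrained model, we preprocess the data, fine-tune the classifier, score the test set, and store the evaluation results. The loop below iterates over `MODEL_NAMES[:1]`, so the run recorded here fine-tunes only `distilbert-base-uncased`; iterate over `MODEL_NAMES` to cover all three models.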

In [38]:
for name in tqdm(MODEL_NAMES[:1], disable=True):

    # preprocess
    processor = ArticleClassificationDataProcessor(
        model_name=str(name),
        to_lower=name.endswith("uncased"),
        batch_size=BATCH_SIZE, 
        num_gpus=NUM_GPUS,
        cache_dir=CACHE_DIR
    )
    
    # Defining training artifacts
    train_dataset = processor.create_dataset_from_dataframe(df_train, TEXT_COL, LABEL_COL, max_len=MAX_LEN)
    train_dataloader = processor.create_dataloader_from_dataset(train_dataset, shuffle=True)
    
    # Defining test artifacts
    test_dataset = processor.create_dataset_from_dataframe(df_test, TEXT_COL, LABEL_COL, max_len=MAX_LEN)
    test_dataloader = processor.create_dataloader_from_dataset(test_dataset, shuffle=False)

    # fine-tune
    classifier = ArticleClassifier(model_name=name, num_labels=num_labels, cache_dir=CACHE_DIR)
    with Timer() as t:
        classifier.fit(
            train_dataloader, num_epochs=NUM_EPOCHS, num_gpus=NUM_GPUS, verbose=False,
        )
    train_time = t.interval / 3600

    # predict
    preds = classifier.predict(test_dataloader, num_gpus=NUM_GPUS, verbose=True)

    # eval
    accuracy = accuracy_score(df_test[LABEL_COL], preds)
    class_report = classification_report(
        df_test[LABEL_COL], preds, target_names=label_encoder.classes_, output_dict=True
    )

    # save results
    MODEL_RESULTS[name] = {
        "accuracy": accuracy,
        "f1-score": class_report["macro avg"]["f1-score"],
        "time(hrs)": train_time,
    }

Iteration:   2%|▏         | 10/408 [00:05<03:27,  1.92it/s]

timestamp: 07/08/2020 01:13:26, average loss: 1.687456, time duration: 5.396389,
                            number of examples in current reporting: 320, step 10
                            out of total 408


Iteration:   5%|▍         | 20/408 [00:10<03:20,  1.94it/s]

timestamp: 07/08/2020 01:13:31, average loss: 1.170485, time duration: 5.141245,
                            number of examples in current reporting: 320, step 20
                            out of total 408


Iteration:   7%|▋         | 30/408 [00:15<03:13,  1.96it/s]

timestamp: 07/08/2020 01:13:36, average loss: 0.865737, time duration: 5.118764,
                            number of examples in current reporting: 320, step 30
                            out of total 408


Iteration:  10%|▉         | 40/408 [00:20<03:07,  1.97it/s]

timestamp: 07/08/2020 01:13:41, average loss: 0.831665, time duration: 5.098286,
                            number of examples in current reporting: 320, step 40
                            out of total 408


Iteration:  12%|█▏        | 50/408 [00:25<03:06,  1.92it/s]

timestamp: 07/08/2020 01:13:47, average loss: 0.654620, time duration: 5.149658,
                            number of examples in current reporting: 320, step 50
                            out of total 408


Iteration:  15%|█▍        | 60/408 [00:31<02:57,  1.96it/s]

timestamp: 07/08/2020 01:13:52, average loss: 0.538452, time duration: 5.108948,
                            number of examples in current reporting: 320, step 60
                            out of total 408


Iteration:  17%|█▋        | 70/408 [00:36<02:53,  1.95it/s]

timestamp: 07/08/2020 01:13:57, average loss: 0.535440, time duration: 5.130575,
                            number of examples in current reporting: 320, step 70
                            out of total 408


Iteration:  20%|█▉        | 80/408 [00:41<02:50,  1.93it/s]

timestamp: 07/08/2020 01:14:02, average loss: 0.447491, time duration: 5.156112,
                            number of examples in current reporting: 320, step 80
                            out of total 408


Iteration:  22%|██▏       | 90/408 [00:46<02:44,  1.93it/s]

timestamp: 07/08/2020 01:14:07, average loss: 0.340346, time duration: 5.201230,
                            number of examples in current reporting: 320, step 90
                            out of total 408


Iteration:  25%|██▍       | 100/408 [00:51<02:40,  1.92it/s]

timestamp: 07/08/2020 01:14:12, average loss: 0.363850, time duration: 5.206596,
                            number of examples in current reporting: 320, step 100
                            out of total 408


Iteration:  27%|██▋       | 110/408 [00:56<02:34,  1.93it/s]

timestamp: 07/08/2020 01:14:18, average loss: 0.368781, time duration: 5.192818,
                            number of examples in current reporting: 320, step 110
                            out of total 408


Iteration:  29%|██▉       | 120/408 [01:02<02:30,  1.91it/s]

timestamp: 07/08/2020 01:14:23, average loss: 0.415847, time duration: 5.224649,
                            number of examples in current reporting: 320, step 120
                            out of total 408


Iteration:  32%|███▏      | 130/408 [01:07<02:25,  1.91it/s]

timestamp: 07/08/2020 01:14:28, average loss: 0.363152, time duration: 5.218906,
                            number of examples in current reporting: 320, step 130
                            out of total 408


Iteration:  34%|███▍      | 140/408 [01:12<02:19,  1.92it/s]

timestamp: 07/08/2020 01:14:33, average loss: 0.386361, time duration: 5.218870,
                            number of examples in current reporting: 320, step 140
                            out of total 408


Iteration:  37%|███▋      | 150/408 [01:17<02:13,  1.93it/s]

timestamp: 07/08/2020 01:14:38, average loss: 0.457655, time duration: 5.196765,
                            number of examples in current reporting: 320, step 150
                            out of total 408


Iteration:  39%|███▉      | 160/408 [01:23<02:09,  1.92it/s]

timestamp: 07/08/2020 01:14:44, average loss: 0.500720, time duration: 5.236064,
                            number of examples in current reporting: 320, step 160
                            out of total 408


Iteration:  42%|████▏     | 170/408 [01:28<02:04,  1.91it/s]

timestamp: 07/08/2020 01:14:49, average loss: 0.383316, time duration: 5.224211,
                            number of examples in current reporting: 320, step 170
                            out of total 408


Iteration:  44%|████▍     | 180/408 [01:33<01:58,  1.93it/s]

timestamp: 07/08/2020 01:14:54, average loss: 0.383354, time duration: 5.197513,
                            number of examples in current reporting: 320, step 180
                            out of total 408


Iteration:  47%|████▋     | 190/408 [01:38<01:53,  1.91it/s]

timestamp: 07/08/2020 01:14:59, average loss: 0.325613, time duration: 5.233720,
                            number of examples in current reporting: 320, step 190
                            out of total 408


Iteration:  49%|████▉     | 200/408 [01:43<01:48,  1.91it/s]

timestamp: 07/08/2020 01:15:05, average loss: 0.329252, time duration: 5.208134,
                            number of examples in current reporting: 320, step 200
                            out of total 408


Iteration:  51%|█████▏    | 210/408 [01:49<01:43,  1.92it/s]

timestamp: 07/08/2020 01:15:10, average loss: 0.302565, time duration: 5.226939,
                            number of examples in current reporting: 320, step 210
                            out of total 408


Iteration:  54%|█████▍    | 220/408 [01:54<01:39,  1.90it/s]

timestamp: 07/08/2020 01:15:15, average loss: 0.351033, time duration: 5.243717,
                            number of examples in current reporting: 320, step 220
                            out of total 408


Iteration:  56%|█████▋    | 230/408 [01:59<01:33,  1.89it/s]

timestamp: 07/08/2020 01:15:20, average loss: 0.376471, time duration: 5.270166,
                            number of examples in current reporting: 320, step 230
                            out of total 408


Iteration:  59%|█████▉    | 240/408 [02:04<01:28,  1.90it/s]

timestamp: 07/08/2020 01:15:26, average loss: 0.346392, time duration: 5.251647,
                            number of examples in current reporting: 320, step 240
                            out of total 408


Iteration:  61%|██████▏   | 250/408 [02:10<01:22,  1.92it/s]

timestamp: 07/08/2020 01:15:31, average loss: 0.348630, time duration: 5.247874,
                            number of examples in current reporting: 320, step 250
                            out of total 408


Iteration:  64%|██████▎   | 260/408 [02:15<01:17,  1.90it/s]

timestamp: 07/08/2020 01:15:36, average loss: 0.377407, time duration: 5.253714,
                            number of examples in current reporting: 320, step 260
                            out of total 408


Iteration:  66%|██████▌   | 270/408 [02:20<01:12,  1.90it/s]

timestamp: 07/08/2020 01:15:41, average loss: 0.357583, time duration: 5.249387,
                            number of examples in current reporting: 320, step 270
                            out of total 408


Iteration:  69%|██████▊   | 280/408 [02:25<01:07,  1.90it/s]

timestamp: 07/08/2020 01:15:47, average loss: 0.320712, time duration: 5.264165,
                            number of examples in current reporting: 320, step 280
                            out of total 408


Iteration:  71%|███████   | 290/408 [02:31<01:02,  1.90it/s]

timestamp: 07/08/2020 01:15:52, average loss: 0.339694, time duration: 5.285933,
                            number of examples in current reporting: 320, step 290
                            out of total 408


Iteration:  74%|███████▎  | 300/408 [02:36<00:57,  1.88it/s]

timestamp: 07/08/2020 01:15:57, average loss: 0.379691, time duration: 5.277739,
                            number of examples in current reporting: 320, step 300
                            out of total 408


Iteration:  76%|███████▌  | 310/408 [02:41<00:51,  1.91it/s]

timestamp: 07/08/2020 01:16:02, average loss: 0.344075, time duration: 5.270586,
                            number of examples in current reporting: 320, step 310
                            out of total 408


Iteration:  78%|███████▊  | 320/408 [02:46<00:46,  1.89it/s]

timestamp: 07/08/2020 01:16:08, average loss: 0.354495, time duration: 5.274941,
                            number of examples in current reporting: 320, step 320
                            out of total 408


Iteration:  81%|████████  | 330/408 [02:52<00:41,  1.89it/s]

timestamp: 07/08/2020 01:16:13, average loss: 0.266078, time duration: 5.260077,
                            number of examples in current reporting: 320, step 330
                            out of total 408


Iteration:  83%|████████▎ | 340/408 [02:57<00:35,  1.91it/s]

timestamp: 07/08/2020 01:16:18, average loss: 0.326496, time duration: 5.229506,
                            number of examples in current reporting: 320, step 340
                            out of total 408


Iteration:  86%|████████▌ | 350/408 [03:02<00:30,  1.91it/s]

timestamp: 07/08/2020 01:16:23, average loss: 0.257823, time duration: 5.239575,
                            number of examples in current reporting: 320, step 350
                            out of total 408


Iteration:  88%|████████▊ | 360/408 [03:07<00:25,  1.89it/s]

timestamp: 07/08/2020 01:16:29, average loss: 0.216088, time duration: 5.278178,
                            number of examples in current reporting: 320, step 360
                            out of total 408


Iteration:  91%|█████████ | 370/408 [03:13<00:19,  1.91it/s]

timestamp: 07/08/2020 01:16:34, average loss: 0.293232, time duration: 5.249712,
                            number of examples in current reporting: 320, step 370
                            out of total 408


Iteration:  93%|█████████▎| 380/408 [03:18<00:14,  1.91it/s]

timestamp: 07/08/2020 01:16:39, average loss: 0.313664, time duration: 5.296565,
                            number of examples in current reporting: 320, step 380
                            out of total 408


Iteration:  96%|█████████▌| 390/408 [03:23<00:09,  1.91it/s]

timestamp: 07/08/2020 01:16:45, average loss: 0.309406, time duration: 5.249838,
                            number of examples in current reporting: 320, step 390
                            out of total 408


Iteration:  98%|█████████▊| 400/408 [03:29<00:04,  1.92it/s]

timestamp: 07/08/2020 01:16:50, average loss: 0.284367, time duration: 5.228394,
                            number of examples in current reporting: 320, step 400
                            out of total 408


Iteration: 100%|██████████| 408/408 [03:33<00:00,  2.15it/s]
Scoring: 100%|██████████| 336/336 [01:12<00:00,  4.65it/s]


## Evaluate

Finally, we report the test-set accuracy and the macro-averaged F1-score for each model, as well as the fine-tuning time in hours.

In [39]:
df_results = pd.DataFrame(MODEL_RESULTS)
df_results

Unnamed: 0,distilbert-base-uncased
accuracy,0.925629
f1-score,0.882435
time(hrs),0.059749
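
The F1-score in the table is lower than the accuracy (about 0.88 versus 0.93), which is consistent with the class imbalance in the data: the macro average weights every class equally, so weaker results on small classes pull it down. The sketch below computes both metrics by hand on a made-up imbalanced example to show the effect. The helper names are hypothetical, and the notebook itself relies on scikit-learn's `accuracy_score` and `classification_report`.

```python
def per_class_f1(y_true, y_pred, label):
    """F1-score of one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


# Made-up example: class 0 is frequent, class 1 is rare and half of it is missed.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, macro F1={score:.3f}")
```

In this toy case a single error on the rare class costs more in macro F1 than in accuracy, because the rare class contributes half of the macro average but only a third of the examples.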
