# Run Text-Only Experiments

This notebook shows the end-to-end pipeline for fine-tuning a pre-trained BERT model for text classification on our dataset.

Parts of this pipeline are adapted from [McCormick and Ryan's tutorial on BERT fine-tuning](http://mccormickml.com/2019/07/22/BERT-fine-tuning/) and from the
Huggingface `run_mmimdb.py` script for the MMBT model, which is available [here](https://github.com/huggingface/transformers/blob/8ea412a86faa8e9edeeb6b5c46b08def06aa03ea/examples/research_projects/mm-imdb/run_mmimdb.py#L305).

## Skip unless on Google Colab


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
%pwd

'/content'

In [3]:
%cd /content/drive/MyDrive/LAP_MMBT
%pwd

/content/drive/MyDrive/LAP_MMBT


'/content/drive/MyDrive/LAP_MMBT'

## Check GPU is Available

In [4]:
import torch

# If there's a GPU available...
if torch.cuda.is_available():    

    # Tell PyTorch to use the GPU.    
    device = torch.device("cuda")

    print('There are %d GPU(s) available.' % torch.cuda.device_count())

    print('We will use the GPU:', torch.cuda.get_device_name(0))

# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

There are 1 GPU(s) available.
We will use the GPU: Tesla T4


## Install Huggingface Transformers and WandB modules

These should have been installed during your environment set-up; you only need to run these cells in Google Colab.

In [5]:
!pip install transformers



In [6]:
%pip install wandb



## Import Required Modules

In [7]:
from textBert_utils import get_train_val_test_data, tokenize_and_encode_data, make_tensor_dataset, make_dataloader, set_seed

In [30]:
import textBert_utils

In [8]:
import argparse 
import pandas as pd
import os
import wandb
import glob
import numpy as np

In [9]:
import logging
import json

In [10]:
from transformers import (
    WEIGHTS_NAME,
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Set-up Experiment Hyperparameters and Arguments

Specify the training, validation, and test files to run the experiment on. The default here is running the model on 'impression' texts.  

To re-make the training, validation, and test data, please refer to the information in the **data/** directory.  

In the following cell, change the `default` values in the `parser.add_argument` calls for any hyperparameters you want to set, or keep the defaults.  

For multiple experiment runs, please make sure to change the `output_dir` argument so that new results don't overwrite existing ones.
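
As an alternative to editing the defaults, values can be overridden by passing an explicit argument list to `parse_args`, since a notebook has no real command line. This is a minimal sketch with a reduced parser; the directory name `text_only_run2` and the learning rate `2e-5` are illustrative values, not settings used in this project.

```python
import argparse

# A reduced parser with three of the arguments defined in this notebook.
parser = argparse.ArgumentParser()
parser.add_argument("--output_dir", default="text_only", type=str)
parser.add_argument("--learning_rate", default=5e-5, type=float)
parser.add_argument("--train_batch_size", default=32, type=int)

# Pass the "command line" as a list of strings; anything omitted keeps its default.
run_args = parser.parse_args(
    ["--output_dir", "text_only_run2", "--learning_rate", "2e-5"]
)

print(run_args.output_dir, run_args.learning_rate, run_args.train_batch_size)
```

Giving each run its own `output_dir` this way keeps earlier checkpoints intact.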

In [11]:
train_file = "image_labels_impression_frontal_train.csv"
val_file = "image_labels_impression_frontal_val.csv"
test_file = "image_labels_impression_frontal_test.csv"

In [12]:
parser = argparse.ArgumentParser(description='Project Hyperparameters and Other Configurations Argument Parser')

In [13]:
parser = argparse.ArgumentParser()

# Required parameters
parser.add_argument(
    "--data_dir",
    default="data/csv",
    type=str,
    help="The input data dir. Should contain the .csv files.",
)
parser.add_argument(
    "--model_name",
    default="bert-base-uncased",
    type=str,
    help="model identifier from huggingface.co/models",
)
parser.add_argument(
    "--output_dir",
    default="text_only",
    type=str,
    help="The output directory where the model predictions and checkpoints will be written.",
)

    
parser.add_argument(
    "--config_name", default="bert-base-uncased", type=str, help="Pretrained config name if not the same as model_name"
)
parser.add_argument(
    "--tokenizer_name",
    default="bert-base-uncased",
    type=str,
    help="Pretrained tokenizer name or path if not the same as model_name",
)

parser.add_argument("--train_batch_size", default=32, type=int, help="Batch size for training.")
parser.add_argument(
    "--eval_batch_size", default=32, type=int, help="Batch size for evaluation."
)
parser.add_argument(
    "--max_seq_length",
    default=256,
    type=int,
    help="The maximum total input sequence length after tokenization. Sequences longer "
    "than this will be truncated, sequences shorter will be padded.",
)
parser.add_argument(
    "--num_image_embeds", default=3, type=int, help="Number of Image Embeddings from the Image Encoder"
)
parser.add_argument("--do_train", default=True, type=bool, help="Whether to run training.")
parser.add_argument("--do_eval", default=True, type=bool, help="Whether to run eval on the dev set.")
parser.add_argument(
    "--evaluate_during_training", default=True, type=bool, help="Run evaluation during training at each logging step."
)


parser.add_argument(
    "--gradient_accumulation_steps",
    type=int,
    default=1,
    help="Number of update steps to accumulate before performing a backward/update pass.",
)
parser.add_argument("--learning_rate", default=5e-5, type=float, help="The initial learning rate for Adam.")
parser.add_argument("--weight_decay", default=0.1, type=float, help="Weight decay if we apply some.")
parser.add_argument("--adam_epsilon", default=1e-8, type=float, help="Epsilon for Adam optimizer.")
parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
parser.add_argument(
    "--num_train_epochs", default=4.0, type=float, help="Total number of training epochs to perform."
)
parser.add_argument("--patience", default=5, type=int, help="Patience for Early Stopping.")
parser.add_argument(
    "--max_steps",
    default=-1,
    type=int,
    help="If > 0: set total number of training steps to perform. Override num_train_epochs.",
)
parser.add_argument("--warmup_steps", default=0, type=int, help="Linear warmup over warmup_steps.")

parser.add_argument("--logging_steps", type=int, default=25, help="Log every X update steps.")
parser.add_argument("--save_steps", type=int, default=25, help="Save checkpoint every X update steps.")
parser.add_argument(
    "--eval_all_checkpoints",
    default=True, type=bool,
    help="Evaluate all checkpoints starting with the same prefix as model_name and ending with the step number",
)

parser.add_argument("--num_workers", type=int, default=8, help="number of worker threads for dataloading")

parser.add_argument("--seed", type=int, default=42, help="random seed for initialization")


args = parser.parse_args("")

# Setup CUDA, GPU & distributed training
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
args.n_gpu = torch.cuda.device_count() if torch.cuda.is_available() else 0
args.device = device

# Setup Train/Val/Test filenames
args.train_file = train_file
args.val_file = val_file
args.test_file = test_file
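
One caveat about the `type=bool` arguments above: argparse passes the raw string to `bool()`, and any non-empty string is truthy, so `--do_train False` would still yield `True`. The defaults behave as intended here because `parse_args("")` supplies no values. Below is a minimal sketch of a safer converter; `str2bool` is a hypothetical helper and not part of this project.

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to a real bool (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


# Parser that mirrors the notebook's definition.
naive = argparse.ArgumentParser()
naive.add_argument("--do_train", default=True, type=bool)

# Parser that uses the explicit converter instead.
safe = argparse.ArgumentParser()
safe.add_argument("--do_train", default=True, type=str2bool)

naive_args = naive.parse_args(["--do_train", "False"])
safe_args = safe.parse_args(["--do_train", "False"])

print("type=bool     ->", naive_args.do_train)
print("type=str2bool ->", safe_args.do_train)
```

This only matters if the flags are ever set from a string; editing the `default` values directly, as this notebook does, is unaffected.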

### Check that the Args dict contains correct configurations

In [14]:
args.__dict__

{'adam_epsilon': 1e-08,
 'config_name': 'bert-base-uncased',
 'data_dir': 'data/csv',
 'device': device(type='cuda'),
 'do_eval': True,
 'do_train': True,
 'eval_all_checkpoints': True,
 'eval_batch_size': 32,
 'evaluate_during_training': True,
 'gradient_accumulation_steps': 1,
 'learning_rate': 5e-05,
 'logging_steps': 25,
 'max_grad_norm': 1.0,
 'max_seq_length': 256,
 'max_steps': -1,
 'model_name': 'bert-base-uncased',
 'n_gpu': 1,
 'num_image_embeds': 3,
 'num_train_epochs': 4.0,
 'num_workers': 8,
 'output_dir': 'text_only',
 'patience': 5,
 'save_steps': 25,
 'seed': 42,
 'test_file': 'image_labels_impression_frontal_test.csv',
 'tokenizer_name': 'bert-base-uncased',
 'train_batch_size': 32,
 'train_file': 'image_labels_impression_frontal_train.csv',
 'val_file': 'image_labels_impression_frontal_val.csv',
 'warmup_steps': 0,
 'weight_decay': 0.1}

## Set-up WandB

We track experiments with Weights & Biases (WandB) so that later runs can be logged and compared in one place. You need to sign up for a WandB account before continuing.

In [15]:
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mlap_mmbtws2021[0m (use `wandb login --relogin` to force relogin)


True

In [16]:
wandb.init(name="Train_Impression_Texts", tags=['Impression', 'frontal'], project="Text_Only", notes="256 size and 32 batch", config=args.__dict__, sync_tensorboard=True)
run_name = wandb.run.name
wandb_config = wandb.config

## Create Dataset

In [17]:
train, val, test = get_train_val_test_data(wandb_config)

Number of training sentences: 1,947

Number of val sentences: 649

Number of test sentences: 650



In [18]:
train.head()

Unnamed: 0.1,Unnamed: 0,img,label,text
0,459,CXR865_IM-2385-1001.png,0,no acute cardiopulmonary findings.
1,443,CXR835_IM-2360-1001.png,0,no acute radiographic cardiopulmonary process.
2,1956,CXR3828_IM-1932-1001.png,0,no acute cardiopulmonary abnormalities.
3,3035,CXR3273_IM-1554-1001.png,1,cardiomegaly without acute cardiopulmonary abn...
4,2044,CXR21_IM-0729-1001-0001.png,1,heart size normal. mediastinal silhouettes and...


In [19]:
val.head()

Unnamed: 0.1,Unnamed: 0,img,label,text
0,950,CXR1849_IM-0550-1001.png,0,no acute cardiopulmonary findings.
1,22,CXR42_IM-2063-1001.png,0,no acute cardiopulmonary abnormalities. .
2,2482,CXR1493_IM-0318-1001.png,1,stable chest. elevated left diaphragm. two bul...
3,1203,CXR2368_IM-0928-1001.png,0,1. no acute cardiopulmonary disease.
4,134,CXR279_IM-1224-1001-0001.png,0,1. no evidence of active disease.


In [20]:
test.head()

Unnamed: 0.1,Unnamed: 0,img,label,text
0,331,CXR639_IM-2218-1001.png,0,no evidence of active disease.
1,98,CXR201_IM-0660-1001.png,0,no acute cardiopulmonary findings. .
2,1918,CXR3749_IM-1874-1001.png,0,no acute cardiopulmonary abnormality.
3,986,CXR1908_IM-0590-1001.png,0,no acute cardiopulmonary disease.
4,1384,CXR2758_IM-1206-1001.png,0,"no acute or active cardiac, pulmonary or pleur..."


# Sentences and Labels

In [21]:
train_sentences = train.text.values
train_labels = train.label.values

val_sentences = val.text.values
val_labels = val.label.values

test_sentences = test.text.values
test_labels = test.label.values

In [22]:
train_sentences[:10]

array(['no acute cardiopulmonary findings.',
       'no acute radiographic cardiopulmonary process.',
       'no acute cardiopulmonary abnormalities.',
       'cardiomegaly without acute cardiopulmonary abnormality.',
       'heart size normal. mediastinal silhouettes and pulmonary vascularity are within normal limits. calcified lingular granuloma. no focal consolidations or pleural effusions. no pneumothorax. breast implants there is a moderate wedge xxxx deformity of the midthoracic vertebrae, xxxx t6, age-indeterminate.',
       '1. no acute cardiopulmonary process.',
       '1. no acute radiographic cardiopulmonary process.',
       'normal chest',
       'stable chest, no active/acute cardiopulmonary disease.',
       '1. low lung volume study with minimal bibasilar atelectasis. stable chest.'],
      dtype=object)

In [23]:
train_labels[:10]

array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1])

# Tokenize and Encode with BERT `encode_plus`

The `tokenizer.encode_plus` function combines multiple steps for us:

1. Split the sentence into tokens.
2. Add the special `[CLS]` and `[SEP]` tokens.
3. Map the tokens to their IDs.
4. Pad or truncate all sentences to the same length.
5. Create the attention masks which explicitly differentiate real tokens from `[PAD]` tokens.

These steps are performed inside the `make_tensor_dataset` function.
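
Steps 2 to 5 can be sketched in plain Python. This is an illustration only, not the `make_tensor_dataset` implementation: `encode_sketch` and `toy_vocab` are hypothetical names, the tokens are assumed to be already split (step 1 is skipped), and the token IDs are copied from the `bert-base-uncased` output for the test sentence printed later in this notebook.

```python
# Special-token IDs in the bert-base-uncased vocabulary.
PAD_ID, UNK_ID, CLS_ID, SEP_ID = 0, 100, 101, 102

# Toy vocabulary covering a single sentence (hypothetical, for illustration only).
toy_vocab = {"no": 2053, "evidence": 3350, "of": 1997,
             "active": 3161, "disease": 4295, ".": 1012}


def encode_sketch(tokens, vocab, max_len):
    """Return (input_ids, attention_mask), both of length max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens]   # step 3: map tokens to IDs
    ids = ids[: max_len - 2]                           # step 4: truncate, keeping room for specials
    ids = [CLS_ID] + ids + [SEP_ID]                    # step 2: add [CLS] and [SEP]
    mask = [1] * len(ids)                              # step 5: real tokens get 1
    n_pad = max_len - len(ids)
    ids += [PAD_ID] * n_pad                            # step 4: pad to max_len
    mask += [0] * n_pad                                # step 5: padding gets 0
    return ids, mask


tokens = ["no", "evidence", "of", "active", "disease", "."]
input_ids, attention_mask = encode_sketch(tokens, toy_vocab, max_len=12)
print(input_ids)
print(attention_mask)
```

With `max_seq_length=256`, as configured above, short impressions like this one are mostly padding, which is why the printed tensors are dominated by zeros.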

# Torch Dataset and DataLoader

In [24]:
train_dataset = make_tensor_dataset(train_sentences, train_labels, wandb_config)
val_dataset = make_tensor_dataset(val_sentences, val_labels, wandb_config)

Original:  no acute cardiopulmonary findings.
Token IDs: tensor([  101,  2053, 11325,  4003,  3695, 14289, 13728,  7856,  2854,  9556,
         1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
       

In [25]:
print(f'{len(train_dataset):>5,} training samples')
print(f'{len(val_dataset):>5,} validation samples')
#print(f'{len(test_dataset):>5,} test samples')

1,947 training samples
  649 validation samples


In [26]:
train_dataset[:3]

(tensor([[  101,  2053, 11325,  4003,  3695, 14289, 13728,  7856,  2854,  9556,
           1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0,     0,     0,     0,   

Create an iterator for the dataset using the torch DataLoader class. 

In [27]:
data_loaders = {
    'train' : make_dataloader(train_dataset, wandb_config, eval=False),
    'train_size': len(train_dataset),
    'eval' : make_dataloader(val_dataset, wandb_config, eval=True),
    'eval_size' : len(val_dataset)
}

# Fine Tune BERT for Classification

## Setup Logging

In [28]:
# Setup logging
logger = logging.getLogger(__name__)
if not os.path.exists(wandb_config.output_dir):
    os.makedirs(wandb_config.output_dir)
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
                    datefmt="%m/%d/%Y %H:%M:%S",
                    filename=os.path.join(wandb_config.output_dir, f"{os.path.splitext(wandb_config.train_file)[0]}_logging.txt"),
                    level=logging.INFO)
logger.warning("device: %s, n_gpu: %s",
        wandb_config.device,
        wandb_config.n_gpu
)
# Set the verbosity to info of the Transformers logger (on main process only):

# Set seed
set_seed(wandb_config)

## Set up the Model and Train

The code trains the model on the training set and validates it on the validation set. 

Outputs and checkpoints are saved in the directory given by the `--output_dir` argument.
Tensorboard data are saved in the `runs/` directory, labelled with the date and time of the experiment and the filename of the train/test data file.

In [31]:
# set up model
transformer_config = AutoConfig.from_pretrained(wandb_config.model_name)
tokenizer = AutoTokenizer.from_pretrained(
        wandb_config.tokenizer_name,
        do_lower_case=True,
        cache_dir=None,
    )
transformer_model = AutoModelForSequenceClassification.from_pretrained(wandb_config.model_name, config=transformer_config)
transformer_model.to(device)
logger.info(f"Training/evaluation parameters: {wandb_config}")
# Training
if wandb_config.do_train:
    global_step, tr_loss = textBert_utils.train(data_loaders, wandb_config, transformer_model)
    logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

    # Saving best practices: if you use default names for the model, you can reload it using from_pretrained()
    logger.info("Saving model checkpoint to %s", wandb_config.output_dir)
    # Save a trained model, configuration and tokenizer using `save_pretrained()`.
    # They can then be reloaded using `from_pretrained()`
    model_to_save = (transformer_model.module if hasattr(transformer_model, "module") else transformer_model)  # Take care of distributed/parallel training
    torch.save(model_to_save.state_dict(), os.path.join(wandb_config.output_dir, WEIGHTS_NAME))
    tokenizer.save_pretrained(wandb_config.output_dir)
    transformer_config.save_pretrained(wandb_config.output_dir)

    # Good practice: save your training arguments together with the trained model
    torch.save(args, os.path.join(wandb_config.output_dir, "training_args.bin"))

    # Load a trained model and vocabulary that you have fine-tuned
    transformer_model = AutoModelForSequenceClassification.from_pretrained(wandb_config.model_name, config=transformer_config)
    transformer_model.load_state_dict(torch.load(os.path.join(wandb_config.output_dir, WEIGHTS_NAME)))
    tokenizer = AutoTokenizer.from_pretrained(wandb_config.output_dir)
    transformer_model.to(device)
logger.info("***** Training Finished *****")
wandb.finish()


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

{"eval_loss": 0.2008007083620344, "eval_accuracy": 0.9352850539291218, "learning_rate": 4.487704918032787e-05, "training_loss": 0.33485968500375746, "step": 25}




{"eval_loss": 0.12466654491921265, "eval_accuracy": 0.9661016949152542, "learning_rate": 3.975409836065574e-05, "training_loss": 0.1165430748462677, "step": 50}




{"eval_loss": 0.14479255028778598, "eval_accuracy": 0.9722650231124808, "learning_rate": 3.463114754098361e-05, "training_loss": 0.06046227022074163, "step": 75}




{"eval_loss": 0.12908778995985076, "eval_accuracy": 0.9661016949152542, "learning_rate": 2.9508196721311478e-05, "training_loss": 0.04320165743120015, "step": 100}




{"eval_loss": 0.1431316461918565, "eval_accuracy": 0.9738058551617874, "learning_rate": 2.4385245901639343e-05, "training_loss": 0.030613028993830085, "step": 125}




{"eval_loss": 0.1600328143741492, "eval_accuracy": 0.9738058551617874, "learning_rate": 1.9262295081967212e-05, "training_loss": 0.0065579659095965324, "step": 150}




{"eval_loss": 0.1599634958991027, "eval_accuracy": 0.9738058551617874, "learning_rate": 1.4139344262295081e-05, "training_loss": 0.005946670318953693, "step": 175}




{"eval_loss": 0.14155679393886766, "eval_accuracy": 0.9768875192604006, "learning_rate": 9.016393442622952e-06, "training_loss": 0.005091448593884706, "step": 200}




{"eval_loss": 0.1451291164641069, "eval_accuracy": 0.9768875192604006, "learning_rate": 3.89344262295082e-06, "training_loss": 0.0003894365415908396, "step": 225}




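
The `learning_rate` values in the training log above are consistent with a linear decay from the initial rate to zero with no warmup. The sketch below reproduces them under that assumption; `linear_decay_lr` is a hypothetical helper, and the scheduler actually used lives in `textBert_utils.train`, which is not shown in this notebook.

```python
import math

# Values taken from the configuration and dataset sizes printed above.
TRAIN_SAMPLES = 1947
BATCH_SIZE = 32
NUM_EPOCHS = 4
BASE_LR = 5e-5
WARMUP_STEPS = 0

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # number of batches per epoch
total_steps = steps_per_epoch * NUM_EPOCHS


def linear_decay_lr(step, base_lr=BASE_LR, total=total_steps, warmup=WARMUP_STEPS):
    """Linear warmup to base_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0, total - step) / max(1, total - warmup)


# Compare against the first, second, and last learning rates in the log.
logged = {25: 4.487704918032787e-05, 50: 3.975409836065574e-05, 225: 3.89344262295082e-06}
for step, lr in logged.items():
    print(step, linear_decay_lr(step), lr)
```

Because the rate reaches zero at the final step, the last few hundred samples contribute very little to the weights, which matches the flattening of `eval_accuracy` late in the run.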

# Evaluation on Test set

## Tokenize and Prepare Test Dataset

Use the tokenizer saved during the training step.

In [36]:
wandb.init(name="Test_Impression_Texts", tags=['Impression', 'frontal'], project="Text_Only", notes="256 size and 32 batch", config=args.__dict__, sync_tensorboard=True)
# wandb.tensorboard.patch(root_logdir="...")
run_name = wandb.run.name
wandb_config = wandb.config


In [37]:
test_dataset = make_tensor_dataset(test_sentences, test_labels, wandb_config, saved_model=True)

Original:  no evidence of active disease.
Token IDs: tensor([ 101, 2053, 3350, 1997, 3161, 4295, 1012,  102,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
           0,    0,    0,    0,    0,    0,    0,    0,    0,    0,

In [38]:
data_loaders['test'] = make_dataloader(test_dataset, wandb_config, eval=True)
data_loaders['test_size'] = len(test_dataset)

In [39]:
# Evaluation
results = {}
if wandb_config.do_eval:
    checkpoints = [wandb_config.output_dir]
    if wandb_config.eval_all_checkpoints:
        checkpoints = list(os.path.dirname(c) 
        for c in sorted(glob.glob(wandb_config.output_dir + "/**/" + 
                                  WEIGHTS_NAME, recursive=False)))
        # recursive=False because otherwise the parent directory gets included,
        # which is not what we want; only subdirectories

    logger.info("Evaluate the following checkpoints: %s", checkpoints)

    for checkpoint in checkpoints:
        global_step = checkpoint.split("-")[-1] if len(checkpoints) > 1 else ""
        prefix = checkpoint.split("/")[-1] if checkpoint.find("checkpoint") != -1 else ""
        transformer_model = AutoModelForSequenceClassification.from_pretrained(wandb_config.model_name, config=transformer_config)
        checkpoint = os.path.join(checkpoint, 'pytorch_model.bin')
        transformer_model.load_state_dict(torch.load(checkpoint))
        transformer_model.to(wandb_config.device)
        result = textBert_utils.evaluate(data_loaders, wandb_config, transformer_model, prefix=prefix, test=True) # test=True uses the test_dataset not val_dataset
        result = dict((k + "_{}".format(global_step), v) for k, v in result.items())
        results.update(result)
    logger.info("***** Evaluation on Test Data Finished *****")
wandb.finish()

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

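
The checkpoint discovery in the evaluation cell relies on `**` behaving like a plain `*` when `recursive=False`, so only direct subdirectories of `output_dir` match. The sketch below demonstrates this on a temporary directory; the folder names `checkpoint-25` and `checkpoint-50` are an assumption about how the training code names its checkpoints, and `find_checkpoint_dirs` is a hypothetical helper.

```python
import glob
import os
import tempfile

WEIGHTS_NAME = "pytorch_model.bin"  # same filename the evaluation cell loads


def find_checkpoint_dirs(output_dir):
    """Return subdirectories of output_dir that contain a weights file."""
    pattern = os.path.join(output_dir, "**", WEIGHTS_NAME)
    # With recursive=False, "**" matches exactly one directory level.
    return [os.path.dirname(p) for p in sorted(glob.glob(pattern, recursive=False))]


with tempfile.TemporaryDirectory() as output_dir:
    # One weights file in the parent directory and one in each checkpoint folder.
    for rel in ("", "checkpoint-25", "checkpoint-50"):
        folder = os.path.join(output_dir, rel)
        os.makedirs(folder, exist_ok=True)
        open(os.path.join(folder, WEIGHTS_NAME), "w").close()

    found = [os.path.basename(d) for d in find_checkpoint_dirs(output_dir)]
    steps = [name.split("-")[-1] for name in found]  # same parsing as the cell above

print(found)
print(steps)
```

Note that `sorted` orders the paths lexicographically, so folder names with different digit counts would not come out in numeric step order.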

## Saving Test Eval Results

The evaluation code automatically saves the result for each checkpoint in its respective folder. The next cell simply collects all of them in one file.

In [40]:
with open(os.path.join(args.output_dir, f"{os.path.splitext(args.test_file)[0]}_eval_results.txt"), mode='w', encoding='utf-8') as out_f:
    print(results, file=out_f)
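
The cell above writes the Python `repr` of the dictionary, which is easy to read but awkward to parse later. If the results need to be reloaded programmatically, writing JSON is one option. This is a sketch only: the keys and values below are placeholders and not results from this experiment, and `default=float` is included on the assumption that some metrics may be NumPy scalars.

```python
import json
import os
import tempfile

# Placeholder dictionary standing in for `results`; not real experiment output.
placeholder_results = {"metric_a_25": 0.5, "metric_b_25": 1.0}

with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "eval_results.json")

    # default=float converts values json cannot serialise natively, when possible.
    with open(path, mode="w", encoding="utf-8") as out_f:
        json.dump(placeholder_results, out_f, indent=2, sort_keys=True, default=float)

    # Reading the file back yields an ordinary dict.
    with open(path, encoding="utf-8") as in_f:
        reloaded = json.load(in_f)

print(reloaded)
```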