# ViT 2: Electric Transformaroo
We learned a few lessons from the previous experiment. First, and primarily, a larger model doesn't always mean better. We used the large imageNet transformer model, but that didn't actually lead to good predictions. Also, I think I screwed up a bunch of stuff, and the large model is taking way too long to train for me to iterate at all. We are going to switch to a smaller model. We are going to follow the ViTMAE method, use masked auto encodings for pretraining in a self-supervised manner on the entire training set, then fine tune on the prediction task. Guide found here: https://github.com/huggingface/transformers/blob/main/examples/pytorch/image-pretraining/run_mae.py

In [2]:
# imports
import numpy as np
import pandas as pd

%load_ext autoreload
%autoreload 2

import torch
print(torch.cuda.is_available())

from transformers import ViTFeatureExtractor, ViTForImageClassification, ViTMAEForPreTraining, ViTMAEConfig
from transformers import TrainingArguments, Trainer
from torchvision.transforms import RandomHorizontalFlip, RandomResizedCrop
from torchvision.transforms.functional import InterpolationMode

from sklearn.model_selection import train_test_split

True


In [3]:
from dataloader import *
from utils import *
from trainer import *

In [13]:
# CONSTANTS
VIT_MODEL_NAME = 'facebook/vit-mae-base'
TRAIN_SPLIT = 0.8
BATCH_SIZE = 48
LEARNING_RATE = 1.5e-4
LR_SCHEDULER_TYPE = "cosine"
WEIGHT_DECAY = 0.05
WARMUP_RATIO = 0.05
LOGGING_STRATEGY = "steps"
LOGGING_STEPS = 10
FP16 = True
EPOCHS = 3
EVALUATION_STRATEGY = "steps"
EVAL_STEPS = 200
OUTPUT_DIR = './vit-mae-chexpert'
REMOVE_UNUSED_COLUMNS = False
GRAD_ACCUM_STEPS = 10
MASK_RATIO = 0.75
NORM_PIX_LOSS = True
DATALOADER_NUM_WORKERS = 4

In [5]:
feature_extractor = ViTFeatureExtractor.from_pretrained(VIT_MODEL_NAME, image_mean=[0.485, 0.456, 0.406], image_std=[0.229, 0.224, 0.225])

In [6]:
# set up our transforms
transforms = [
    RandomResizedCrop(feature_extractor.size, scale=(0.2, 1.0), interpolation=InterpolationMode.BICUBIC),
    RandomHorizontalFlip(),
]

In [7]:
# set up the dataset
np.random.seed(42)
train_df = pd.read_csv("ChexPert/train.csv")
train_df, eval_df = train_test_split(train_df, train_size=TRAIN_SPLIT)

train_dataset = ChexpertViTDataset("ChexPert/data", train_df, feature_extractor, include_labels=False, transforms=transforms, classes=COMPETITION_TASKS,
    uncertainty_method="smooth", smoothing_lower_bound=0.55, smoothing_upper_bound=0.85)
eval_dataset = ChexpertViTDataset("ChexPert/data", eval_df, feature_extractor, include_labels=False, transforms=transforms, classes=COMPETITION_TASKS,
    uncertainty_method="smooth", smoothing_lower_bound=0.55, smoothing_upper_bound=0.85)

In [8]:
train_dataset.labels

['No Finding',
 'Atelectasis',
 'Cardiomegaly',
 'Consolidation',
 'Edema',
 'Pleural Effusion']

In [9]:
config = ViTMAEConfig.from_pretrained(VIT_MODEL_NAME)
config.update({
    "mask_ratio": MASK_RATIO,
    "norm_pix_ratio": NORM_PIX_LOSS
})

In [10]:
model = ViTMAEForPreTraining.from_pretrained(
    VIT_MODEL_NAME,
    config=config
).to("cuda")

In [14]:
# set up training arguments
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    evaluation_strategy=EVALUATION_STRATEGY,
    num_train_epochs=EPOCHS,
    fp16=FP16,
    eval_steps = EVAL_STEPS,
    learning_rate=LEARNING_RATE,
    remove_unused_columns=REMOVE_UNUSED_COLUMNS,
    report_to="tensorboard",
    gradient_accumulation_steps=GRAD_ACCUM_STEPS,
    lr_scheduler_type=LR_SCHEDULER_TYPE,
    weight_decay=WEIGHT_DECAY,
    warmup_ratio=WARMUP_RATIO,
    logging_strategy=LOGGING_STRATEGY,
    logging_steps=LOGGING_STEPS,
    dataloader_num_workers=DATALOADER_NUM_WORKERS
)

PyTorch: setting up devices


In [15]:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

Using amp half precision backend


In [16]:
trainer.train()

***** Running training *****
  Num examples = 152878
  Num Epochs = 3
  Instantaneous batch size per device = 48
  Total train batch size (w. parallel, distributed & accumulation) = 480
  Gradient Accumulation steps = 10
  Total optimization steps = 954


Step,Training Loss,Validation Loss
200,0.08,0.078927
400,0.0754,0.076073
600,0.0745,0.074165
800,0.072,0.073025


***** Running Evaluation *****
  Num examples = 38149
  Batch size = 48
***** Running Evaluation *****
  Num examples = 38149
  Batch size = 48
Saving model checkpoint to ./vit-mae-chexpert/checkpoint-500
Configuration saved in ./vit-mae-chexpert/checkpoint-500/config.json
Model weights saved in ./vit-mae-chexpert/checkpoint-500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 38149
  Batch size = 48
***** Running Evaluation *****
  Num examples = 38149
  Batch size = 48


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=954, training_loss=0.07708730593917255, metrics={'train_runtime': 4488.6791, 'train_samples_per_second': 102.176, 'train_steps_per_second': 0.213, 'total_flos': 4.633080800269566e+19, 'train_loss': 0.07708730593917255, 'epoch': 3.0})

In [18]:
trainer.save_model()

Saving model checkpoint to ./vit-mae-chexpert
Configuration saved in ./vit-mae-chexpert/config.json
Model weights saved in ./vit-mae-chexpert/pytorch_model.bin


I confirmed in another notebook that it worked fairly well! Now let's load that model and fine tune it (in another notebook).