As the third step of this tutorial, we will train a text model. This step can be run in parallel with Step 2 (training the image model).

This notebook was run on an AWS p3.2xlarge instance.

# Octopod Text Model Training Pipeline

In [1]:
%load_ext autoreload

%autoreload 2

In [2]:
import sys
sys.path.append('../../')

In [3]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader
from transformers import AdamW, BertTokenizer, get_cosine_schedule_with_warmup

Note: for text, we use the `MultiTaskLearner`, since the model has only one input: the text.

In [4]:
from octopod import MultiTaskLearner, MultiDatasetLoader
from octopod.text.dataset import OctopodTextDataset
from octopod.text.models.multi_task_bert import BertForMultiTaskClassification

For our BERT model, we need a tokenizer. We'll use the one from Hugging Face's `transformers` library.

In [5]:
bert_tok = BertTokenizer.from_pretrained(
    'bert-base-uncased',
    do_lower_case=True
)

## Load in train and validation datasets

First, we load the CSV files we created in Step 1.
Remember to change the paths if you stored your data somewhere other than the default location.

In [6]:
TRAIN_COLOR_DF = pd.read_csv('data/color_swatches/color_train.csv')

In [7]:
VALID_COLOR_DF = pd.read_csv('data/color_swatches/color_valid.csv')

In [8]:
TRAIN_PATTERN_DF = pd.read_csv('data/pattern_swatches/pattern_train.csv')

In [9]:
VALID_PATTERN_DF = pd.read_csv('data/pattern_swatches/pattern_valid.csv')

You will most likely have to change the batch size to whatever fits in memory on your machine.

In [10]:
batch_size = 16

We use the `OctopodTextDataset` class to create train and validation datasets for each task.

Check out the documentation for information about the `tokenizer` and `max_seq_length` arguments.

In [11]:
max_seq_length = 128
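`max_seq_length` caps how many token IDs each example becomes. As a rough sketch of the usual BERT-style preprocessing (we assume, without having checked the Octopod source, that `OctopodTextDataset` does something similar), each text is tokenized, wrapped in the `[CLS]` and `[SEP]` special tokens, truncated so the total fits in `max_seq_length`, and padded with `[PAD]` tokens; an attention mask marks which positions are real. The IDs 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` IDs in the `bert-base-uncased` vocabulary; the function name and the word-piece IDs in the usage example are hypothetical.

```python
from typing import List, Tuple


def build_bert_inputs(
    token_ids: List[int],
    max_seq_length: int,
    cls_id: int = 101,  # [CLS] in bert-base-uncased
    sep_id: int = 102,  # [SEP] in bert-base-uncased
    pad_id: int = 0,    # [PAD] in bert-base-uncased
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: wrap, truncate and pad one tokenized text to max_seq_length IDs."""
    if max_seq_length < 2:
        raise ValueError("max_seq_length must leave room for [CLS] and [SEP]")
    # Reserve two positions for the special tokens, then truncate the rest.
    body = token_ids[: max_seq_length - 2]
    input_ids = [cls_id] + body + [sep_id]
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [1] * len(input_ids)
    n_pad = max_seq_length - len(input_ids)
    input_ids += [pad_id] * n_pad
    attention_mask += [0] * n_pad
    return input_ids, attention_mask


# Made-up word-piece IDs for a short text.
ids, mask = build_bert_inputs([2417, 2630, 7126], max_seq_length=8)
print(ids)   # [101, 2417, 2630, 7126, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Texts longer than `max_seq_length - 2` word pieces lose their tail, so the choice of 128 trades coverage of long descriptions against memory and speed.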

In [12]:
color_train_dataset = OctopodTextDataset(
    x=TRAIN_COLOR_DF['complex_color'],
    y=TRAIN_COLOR_DF['simple_color_cat'],
    tokenizer=bert_tok,
    max_seq_length=max_seq_length
)
color_valid_dataset = OctopodTextDataset(
    x=VALID_COLOR_DF['complex_color'],
    y=VALID_COLOR_DF['simple_color_cat'],
    tokenizer=bert_tok,
    max_seq_length=max_seq_length
)

pattern_train_dataset = OctopodTextDataset(
    x=TRAIN_PATTERN_DF['fake_text'],
    y=TRAIN_PATTERN_DF['pattern_type_cat'],
    tokenizer=bert_tok,
    max_seq_length=max_seq_length
)
pattern_valid_dataset = OctopodTextDataset(
    x=VALID_PATTERN_DF['fake_text'],
    y=VALID_PATTERN_DF['pattern_type_cat'],
    tokenizer=bert_tok,
    max_seq_length=max_seq_length
)

We then put the datasets into two dictionaries of dataloaders, one for training and one for validation.

In each dictionary, the task name is the key.

In [13]:
train_dataloaders_dict = {
    'color': DataLoader(color_train_dataset, batch_size=batch_size, shuffle=True, num_workers=2),
    'pattern': DataLoader(pattern_train_dataset, batch_size=batch_size, shuffle=True, num_workers=2),
}
valid_dataloaders_dict = {
    'color': DataLoader(color_valid_dataset, batch_size=batch_size, shuffle=False, num_workers=2),
    'pattern': DataLoader(pattern_valid_dataset, batch_size=batch_size, shuffle=False, num_workers=2),
}

The dictionary of dataloaders is then put into an instance of the Octopod `MultiDatasetLoader` class.

In [14]:
TrainLoader = MultiDatasetLoader(loader_dict=train_dataloaders_dict)
len(TrainLoader)

26

In [15]:
ValidLoader = MultiDatasetLoader(loader_dict=valid_dataloaders_dict, shuffle=False)
len(ValidLoader)

9
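Both lengths are counted in batches, not examples. One plausible reading, which we have not checked against the Octopod source, is that `MultiDatasetLoader` lists one slot per batch of every task's `DataLoader`, optionally shuffles that order, and yields each batch tagged with its task name, so its length is the sum of the per-task batch counts. The numbers above are consistent with this: the validation predictions further down contain 109 color labels, which at `batch_size = 16` gives ceil(109 / 16) = 7 batches (a `DataLoader` keeps the final partial batch by default), leaving 2 batches for the pattern task. The `ToyMultiDatasetLoader` class below is a hypothetical stand-in written only for illustration; plain lists play the role of `DataLoader`s.

```python
import math
import random
from typing import Any, Dict, Iterator, List, Sequence, Tuple


def num_batches(n_examples: int, batch_size: int) -> int:
    # With drop_last=False (the DataLoader default) the last partial batch is kept.
    return math.ceil(n_examples / batch_size)


class ToyMultiDatasetLoader:
    """Hypothetical stand-in for MultiDatasetLoader: yields (task_name, batch) pairs."""

    def __init__(self, loader_dict: Dict[str, Sequence[Any]], shuffle: bool = True, seed: int = 0):
        self.loader_dict = loader_dict
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        # Total number of batches across all tasks.
        return sum(len(loader) for loader in self.loader_dict.values())

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # One slot per batch, labeled with the task it belongs to.
        order: List[str] = [
            task for task, loader in self.loader_dict.items() for _ in range(len(loader))
        ]
        if self.shuffle:
            self.rng.shuffle(order)  # interleave tasks in random order
        iterators = {task: iter(loader) for task, loader in self.loader_dict.items()}
        for task in order:
            yield task, next(iterators[task])


color_batches = [f"color_batch_{i}" for i in range(num_batches(109, 16))]
pattern_batches = ["pattern_batch_0", "pattern_batch_1"]
valid_loader = ToyMultiDatasetLoader(
    {"color": color_batches, "pattern": pattern_batches}, shuffle=False
)
print(len(valid_loader))                   # 9
print([task for task, _ in valid_loader])  # 7 x 'color', then 2 x 'pattern'
```

Shuffling the task order during training mixes tasks within an epoch instead of training on all color batches and then all pattern batches.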

We need a dictionary mapping each task to its number of unique target values (classes) so that we can create our model.

In [16]:
new_task_dict = {
    'color': TRAIN_COLOR_DF['simple_color_cat'].nunique(),
    'pattern': TRAIN_PATTERN_DF['pattern_type_cat'].nunique(),
}

In [17]:
new_task_dict

{'color': 2, 'pattern': 2}
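Because the loss is `nn.CrossEntropyLoss`, each task's targets should be integer class indices from `0` to `n_classes - 1`, where `n_classes` is the value stored in `new_task_dict`. `nunique()` only counts distinct values, so if the category codes skipped a value or started at 1, the head would be too small, which typically surfaces later as an out-of-range target error inside the loss. The helper below is a small sanity check we suggest, not part of Octopod; its name is hypothetical, and it accepts plain lists as well as `DataFrame` columns.

```python
from typing import Dict, Iterable


def check_task_labels(task_labels: Dict[str, Iterable[int]]) -> Dict[str, int]:
    """Return {task: n_classes}, raising if labels are not exactly 0..n_classes-1."""
    task_dict = {}
    for task, labels in task_labels.items():
        unique = sorted(set(labels))
        n_classes = len(unique)
        # CrossEntropyLoss expects class indices 0, 1, ..., n_classes - 1.
        if unique != list(range(n_classes)):
            raise ValueError(f"{task}: labels {unique} are not 0..{n_classes - 1}")
        task_dict[task] = n_classes
    return task_dict


print(check_task_labels({"color": [0, 1, 1, 0], "pattern": [1, 0, 0]}))
# {'color': 2, 'pattern': 2}
```

In the notebook, you could pass `TRAIN_COLOR_DF['simple_color_cat']` and `TRAIN_PATTERN_DF['pattern_type_cat']` and compare the result with `new_task_dict`.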

In [18]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

cuda:0


Create Model and Learner
===

These are completely new tasks so we use `new_task_dict`. If we had already trained a model on some tasks, we would use `pretrained_task_dict`.

We are using the pretrained BERT weights from the `transformers` library.

In [19]:
model = BertForMultiTaskClassification.from_pretrained(
    'bert-base-uncased',
    new_task_dict=new_task_dict
)
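Conceptually, a multi-task classifier shares one encoder across tasks and adds a separate output layer per task, sized by the counts in `new_task_dict`. We have not reproduced `BertForMultiTaskClassification`'s internals here; the sketch below only illustrates that idea in plain Python, with a made-up 4-dimensional feature vector standing in for BERT's pooled output (768-dimensional in `bert-base-uncased`), hypothetical weights, and biases omitted for brevity. Each head produces one logit per class, and a softmax turns the logits into probabilities that sum to 1, like the `y_pred` rows shown in the validation section.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_heads(task_dict: Dict[str, int], hidden_size: int) -> Dict[str, Matrix]:
    # One (n_classes x hidden_size) weight matrix per task; zeros keep the demo deterministic.
    return {task: [[0.0] * hidden_size for _ in range(n)] for task, n in task_dict.items()}


def multi_task_forward(features: List[float], heads: Dict[str, Matrix]) -> Dict[str, List[float]]:
    """Apply every task head to the same shared feature vector."""
    out = {}
    for task, weights in heads.items():
        logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
        out[task] = softmax(logits)
    return out


heads = make_heads({"color": 2, "pattern": 2}, hidden_size=4)
heads["color"][1] = [1.0, 0.0, 0.0, 0.0]  # hypothetical weights favoring class 1
probs = multi_task_forward([2.0, 0.5, -1.0, 0.3], heads)
print(probs["pattern"])  # [0.5, 0.5]  (all-zero weights give equal logits)
```

Because the encoder is shared, gradients from every task update the same BERT weights, while each head only learns from its own task's batches.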

You will likely need to try different values for the hyperparameters in this section to find ones that work
for your particular model.

In [20]:
loss_function = nn.CrossEntropyLoss()

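lr = 1e-5
# Length of the cosine schedule, counted in scheduler steps
num_total_steps = len(TrainLoader)
# Linear warmup over the first 10% of those steps
num_warmup_steps = int(len(TrainLoader) * 0.1)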

optimizer = AdamW(model.parameters(), lr=lr, correct_bias=True)

scheduler = get_cosine_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_total_steps
)
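`get_cosine_schedule_with_warmup` multiplies the base `lr` by a factor that rises linearly from 0 to 1 over `num_warmup_steps`, then follows half a cosine wave down to 0 at `num_training_steps`. The function below re-derives that shape in plain Python (with the default `num_cycles=0.5`) so you can inspect the multiplier; it is our own reconstruction with a hypothetical name, not the library code. Steps are counted in calls to `scheduler.step()`, so `num_total_steps` should match how often the learner actually steps the scheduler.

```python
import math


def cosine_warmup_factor(step: int, num_warmup_steps: int, num_training_steps: int,
                         num_cycles: float = 0.5) -> float:
    """Learning-rate multiplier at a given scheduler step."""
    if step < num_warmup_steps:
        # Linear warmup from 0 towards 1.
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # Half a cosine period (num_cycles=0.5) decays the multiplier from 1 to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


# Values from the cell above: len(TrainLoader) == 26, so 26 total steps and 2 warmup steps.
total, warmup = 26, int(26 * 0.1)
lr = 1e-5
for step in (0, 1, 2, 14, 26):
    print(step, lr * cosine_warmup_factor(step, warmup, total))
```

Halfway through the decay (step 14 here) the multiplier is 0.5, and it reaches 0 at step 26.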

In [21]:
learn = MultiTaskLearner(model, TrainLoader, ValidLoader, new_task_dict)

Train Model
===

As the model trains, the learner prints how it is performing overall and on each individual task.

In [22]:
learn.fit(
    num_epochs=10,
    loss_function=loss_function,
    scheduler=scheduler,
    step_scheduler_on_batch=False,
    optimizer=optimizer,
    device=device,
    best_model=True
)

| train_loss | val_loss | color_train_loss | color_val_loss | color_acc | pattern_train_loss | pattern_val_loss | pattern_acc | time |
|---|---|---|---|---|---|---|---|---|
| 0.673507 | 0.678065 | 0.669093 | 0.673634 | 0.633028 | 0.692575 | 0.697387 | 0.48 | 00:04 |
| 0.635844 | 0.681517 | 0.624756 | 0.6772 | 0.633028 | 0.683744 | 0.700339 | 0.48 | 00:04 |
| 0.648323 | 0.677732 | 0.630768 | 0.664646 | 0.633028 | 0.724161 | 0.734785 | 0.48 | 00:04 |
| 0.643457 | 0.668759 | 0.623312 | 0.663126 | 0.633028 | 0.730487 | 0.693319 | 0.52 | 00:04 |
| 0.647667 | 0.661604 | 0.63315 | 0.654076 | 0.633028 | 0.710381 | 0.694424 | 0.48 | 00:04 |
| 0.6326 | 0.654549 | 0.61798 | 0.645752 | 0.633028 | 0.695761 | 0.692903 | 0.52 | 00:04 |
| 0.606633 | 0.498171 | 0.582558 | 0.453125 | 0.880734 | 0.710634 | 0.694575 | 0.48 | 00:04 |
| 0.477038 | 0.467392 | 0.422161 | 0.417661 | 0.816514 | 0.714107 | 0.684223 | 0.48 | 00:04 |
| 0.385024 | 0.422237 | 0.321107 | 0.353005 | 0.844037 | 0.661145 | 0.724087 | 0.52 | 00:04 |
| 0.339846 | 0.386559 | 0.244334 | 0.32232 | 0.862385 | 0.752454 | 0.666643 | 0.64 | 00:04 |


Epoch 9 best model saved with loss of 0.38655938819718005
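The saved loss above, 0.38655938819718005, matches the `val_loss` of the last epoch in the log, which suggests that `best_model=True` keeps the weights with the lowest validation loss. The sketch below shows that selection logic on the `val_loss` column from this run, numbering epochs from 0 to match the "Epoch 9" message. The function name is hypothetical, and we assume rather than know that Octopod snapshots the weights in memory; in PyTorch, a `copy.deepcopy` of `model.state_dict()` would play the role of `get_state`.

```python
import copy
from typing import Any, Callable, Dict, List, Optional, Tuple


def track_best(val_losses: List[float],
               get_state: Callable[[int], Dict[str, Any]]) -> Tuple[int, float, Optional[Dict[str, Any]]]:
    """Return (best_epoch, best_loss, best_state) using the lowest validation loss."""
    best_epoch, best_loss, best_state = -1, float("inf"), None
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            # Snapshot the weights; without a deep copy they would keep changing.
            best_state = copy.deepcopy(get_state(epoch))
    return best_epoch, best_loss, best_state


# val_loss column from the training log above.
val_losses = [0.678065, 0.681517, 0.677732, 0.668759, 0.661604,
              0.654549, 0.498171, 0.467392, 0.422237, 0.386559]
epoch, loss, state = track_best(val_losses, get_state=lambda e: {"epoch": e})
print(epoch, loss)  # 9 0.386559
```

Selecting on validation loss rather than training loss guards against keeping an overfit epoch; here the two happen to agree because validation loss was still falling at epoch 9.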


Validate Model
===

We provide a method on the learner called `get_val_preds`, which makes predictions on the validation data. You can then use this to analyze your model's performance in more detail.

In [23]:
pred_dict = learn.get_val_preds(device)

In [24]:
pred_dict

{'color': {'y_true': array([1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
         0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 0., 1., 1., 0.,
         1., 1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1.,
         1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1.,
         1., 0., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0.,
         0., 1., 1., 0., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1., 0.,
         1., 1., 0., 1., 1., 1., 1.]),
  'y_pred': array([[0.03190912, 0.96809083],
         [0.03047537, 0.96952462],
         [0.0298619 , 0.97013813],
         [0.84294468, 0.15705533],
         [0.03198458, 0.96801543],
         [0.02479118, 0.97520882],
         [0.02475239, 0.97524762],
         [0.04119167, 0.9588083 ],
         [0.18129022, 0.81870973],
         [0.02601881, 0.9739812 ],
         [0.84400207, 0.15599801],
         [0.83692592, 0.16307409],
         [0.03339685, 0.9666031 ],
         [0.031
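As one example of further analysis, per-task accuracy is the fraction of rows where the most probable class in `y_pred` matches `y_true`. The helper functions below (hypothetical names) work on plain lists as well as on the NumPy arrays in `pred_dict`. The sample data are the first 13 color predictions printed above, rounded to three decimals, and all of them match their labels. Run on the full `pred_dict`, the result should match the epoch-9 `color_acc` and `pattern_acc` in the training log if the predictions come from the same weights.

```python
from typing import Dict, Sequence


def argmax(row: Sequence[float]) -> int:
    return max(range(len(row)), key=lambda i: row[i])


def task_accuracy(y_true: Sequence[float], y_pred: Sequence[Sequence[float]]) -> float:
    """Fraction of rows whose highest-probability class equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(argmax(row) == int(label) for row, label in zip(y_pred, y_true))
    return correct / len(y_true)


def accuracy_by_task(pred_dict: Dict[str, Dict[str, Sequence]]) -> Dict[str, float]:
    return {task: task_accuracy(d["y_true"], d["y_pred"]) for task, d in pred_dict.items()}


# First 13 color predictions from the output above, rounded to 3 decimals.
sample = {"color": {
    "y_true": [1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 0., 1.],
    "y_pred": [[0.032, 0.968], [0.030, 0.970], [0.030, 0.970], [0.843, 0.157],
               [0.032, 0.968], [0.025, 0.975], [0.025, 0.975], [0.041, 0.959],
               [0.181, 0.819], [0.026, 0.974], [0.844, 0.156], [0.837, 0.163],
               [0.033, 0.967]],
}}
print(accuracy_by_task(sample))  # {'color': 1.0}
```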

Save/Export Model
===

Once we are happy with our training, we can save our model with the `save` method or export it with the `export` method.

See the docs for the difference between `save` and `export`.

We will need the saved model later for the ensemble model.

In [25]:
model.save(folder='models/', model_id='TEXT_MODEL1')

In [26]:
model.export(folder='models/', model_id='TEXT_MODEL1')

Now that we have an image model and a text model, we can move to `Step4_train_ensemble_model`.