## Using PyTorch Lightning

After you've written a dozen PyTorch models, you'll discover that they share a lot of common structure and a huge amount of boilerplate. It's good to understand what's going on under the hood, but when moving to production use cases you'll want more reliable, reproducible code. PyTorch Lightning and Ignite are great libraries that abstract away this boilerplate.

To install the libraries:

    pip uninstall tensorboard
    conda install tensorboard -y
    conda install pytorch-lightning -y -c conda-forge
    pip install wandb

In [1]:
from sklearn.model_selection import train_test_split
import numpy as np

n = 50000
# X is just a 9D normally distributed dataset
X = np.random.normal(size=(n, 9)).astype(np.float32)
# The prediction is a linear transformation on X
# from 9D to 4D plus additive noise
Y = np.random.normal(size=(n, 4)) * 1e-2 + np.dot(X, np.random.normal(size=(9, 4)))
Y = Y.astype(np.float32)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((37500, 9), (12500, 9), (37500, 4), (12500, 4))
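Because the targets are a linear map of the inputs plus small noise, ordinary least squares gives a useful reference point for any model trained on this data. The sketch below regenerates a dataset in the same way and recovers the map with `np.linalg.lstsq`. The fixed seed and the names `X_demo`, `Y_demo`, and `W_true` are assumptions added here for illustration; the notebook never names the generating matrix.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability
n = 50000
X_demo = rng.normal(size=(n, 9)).astype(np.float32)
W_true = rng.normal(size=(9, 4))  # hypothetical name for the hidden 9x4 map
Y_demo = (rng.normal(size=(n, 4)) * 1e-2 + X_demo @ W_true).astype(np.float32)

# Ordinary least squares: solve min_W ||X W - Y||^2
W_hat, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)

# Mean squared residual per element; it should sit near the noise variance (1e-2)**2
mse = float(np.mean((X_demo @ W_hat - Y_demo) ** 2))
max_abs_err = float(np.abs(W_hat - W_true).max())
print(f"mse={mse:.2e}, max |W_hat - W_true|={max_abs_err:.2e}")
```

Any model that can represent the full 9-to-4 linear map should be able to approach this residual; a model that cannot will plateau above it.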

Let's write our initial model as a Lightning module.

Don't be afraid of how much extra code this injects. Although it initially looks like a ton of little class methods, it's all about being organized, deliberate, standardized, and repeatable. It's not about science, it's about having good lab hygiene.

- We'll move some of the iteration code into `training_step`, `test_step`, and `test_epoch_end`.
- We'll add a `configure_optimizers` function.
- We'll separate out the train & test loaders.
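For comparison, this is roughly the hand-written loop that those hooks replace. It is a minimal sketch in plain PyTorch on a tiny random dataset; the names `net`, `loader`, and `epoch_losses`, along with the sizes and learning rate, are placeholders chosen here and are not part of the notebook.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Tiny stand-in dataset (sizes are arbitrary placeholders)
x = torch.randn(256, 9)
y = x @ torch.randn(9, 4)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

net = nn.Linear(9, 4)
# Lightning asks for this in `configure_optimizers`
optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)

epoch_losses = []
for epoch in range(5):
    total = 0.0
    for inpt, target in loader:
        # The loss computation is what `training_step` is responsible for
        loss = ((net(inpt) - target) ** 2).sum()
        # The Trainer runs these three calls for you
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    epoch_losses.append(total)

print(epoch_losses)
```

With Lightning, only the loss computation and the optimizer choice stay in your code; the loops, device placement, and logging move into the `Trainer`.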

In [2]:
import torch
import numpy as np
from random import shuffle
from torch import from_numpy
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torch.utils.data import TensorDataset
from torch.utils.data import BatchSampler
from torch.utils.data import RandomSampler


class AbstractModel(pl.LightningModule):
    def save_data(self, train_x, train_y, test_x, test_y, train_d=None, test_d=None):
        if train_d is None:
            self.train_arrs = [from_numpy(x) for x in [train_x, train_y]]
            self.test_arrs = [from_numpy(x) for x in [test_x, test_y]]
        else:
            self.train_arrs = [from_numpy(x) for x in [train_x, train_y, train_d]]
            self.test_arrs = [from_numpy(x) for x in [test_x, test_y, test_d]]

    def step(self, batch, batch_nb, prefix='train', add_reg=True):
        inpt, target = batch
        prediction = self.forward(inpt)
        loss = self.likelihood(prediction, target)
        if add_reg:
            loss = loss + self.reg()
        tensorboard_logs = {f'{prefix}_loss': loss}
        return {f'{prefix}_loss': loss, 'loss':loss, 'log': tensorboard_logs}

    def training_step(self, batch, batch_nb):
        return self.step(batch, batch_nb, 'train')
    
    def test_step(self, batch, batch_nb):
        # Note that we do *not* include the regularization / reg loss
        # at test time
        return self.step(batch, batch_nb, 'test', add_reg=False)    
    
    def validation_step(self, batch, batch_nb):
        return self.step(batch, batch_nb, 'val', add_reg=False)    
    
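    def test_epoch_end(self, outputs):
        # Average the per-batch test losses collected from `test_step`
        test_loss_mean = torch.stack([x['test_loss'] for x in outputs]).mean()
        # Note: the mean is logged under the key 'val_loss', which is why
        # 'val_loss' shows up in the test results
        log = {'val_loss': test_loss_mean}
        return {'avg_test_loss': test_loss_mean, 'log': log}

    def validation_epoch_end(self, outputs):
        # Average the per-batch validation losses from `validation_step`
        val_loss_mean = torch.stack([x['val_loss'] for x in outputs]).mean()
        log = {'val_loss': val_loss_mean}
        return {'avg_val_loss': val_loss_mean, 'log': log}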

    def dataloader(self, is_train=True):
        if is_train:
            dataset = TensorDataset(*self.train_arrs)
        else:
            dataset = TensorDataset(*self.test_arrs)
        bs = BatchSampler(RandomSampler(dataset), 
                          batch_size=self.batch_size, drop_last=False)
        return DataLoader(dataset, batch_sampler=bs, num_workers=8)
    
    def train_dataloader(self):
        return self.dataloader(is_train=True)

    def test_dataloader(self):
        return self.dataloader(is_train=False)

    def val_dataloader(self):
        return self.dataloader(is_train=False)
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
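The `dataloader` method builds its batches from an explicit `BatchSampler` wrapped around a `RandomSampler`. As a small sketch of what that yields, the snippet below runs the same construction on a toy 100-row dataset and inspects the batches. The dataset, the batch size of 32, and `num_workers=0` are choices made here for a quick check.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, RandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.zeros(100, 4))

# Same construction as in `dataloader`, with a toy batch size
sampler = BatchSampler(RandomSampler(dataset), batch_size=32, drop_last=False)
loader = DataLoader(dataset, batch_sampler=sampler, num_workers=0)

batch_sizes = [inpt.shape[0] for inpt, _ in loader]
# A second pass over the loader draws a fresh random order
seen = torch.cat([inpt.flatten() for inpt, _ in loader])

print(batch_sizes)  # three full batches plus a remainder, since drop_last=False
```

Passing `batch_size=32, shuffle=True` directly to `DataLoader` is the more common spelling and produces batches of the same shape.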

And then we'll keep our `Bottleneck` model, but now it will inherit from our `AbstractModel`. Over the next few notebooks we'll keep using the `AbstractModel` class and confine our changes to the subclasses.

In [10]:
from torch import nn
# `pytorch_lightning.logging` is the deprecated name of this module
from pytorch_lightning.loggers import WandbLogger


class Bottleneck(AbstractModel):
    def __init__(self, n_in_cols, n_out_cols, n_hidden=3, batch_size=32,
                 lam1=1e-3, lam2=1e-3):
        super().__init__()
        self.lin1 = nn.Linear(n_in_cols, n_hidden)
        self.lin2 = nn.Linear(n_hidden, n_out_cols)
        self.batch_size = batch_size
        # Regularization coefficients
        self.lam1 = lam1
        self.lam2 = lam2
        self.save_hyperparameters()
    
    def forward(self, x):
        # x is a minibatch of rows of our features
        hidden = self.lin1(x)
        # y is a minibatch of our predictions
        y = self.lin2(hidden)
        return y

    def likelihood(self, prediction, target):
        # This is the squared error summed over the minibatch
        # (a sum, not a mean, so it scales with the batch size)
        return ((prediction - target)**2.0).sum()
    
    def reg(self):
        # This computes the squared Frobenius norm of both weight
        # matrices, each scaled by its regularization coefficient.
        # Note that we can access the Linear model's variables
        # directly if we'd like. No tricks here!
        loss_reg_m1 = (self.lin1.weight**2.0 * self.lam1).sum()
        loss_reg_m2 = (self.lin2.weight**2.0 * self.lam2).sum()
        return loss_reg_m1 + loss_reg_m2


model = Bottleneck(9, 4, 3)
model.save_data(X_train, Y_train, X_test, Y_test)

# add a logger
logger = WandbLogger(name="00_intro", log_model=True, project="simple_mf")
# logger = TensorBoardLogger("tb_logs", name="bottleneck_model")

# We could have turned on multiple GPUs here, for example
# trainer = pl.Trainer(gpus=8, precision=16)    
trainer = pl.Trainer(max_epochs=3, progress_bar_refresh_rate=10,
                     reload_dataloaders_every_epoch=True,
                     logger=logger)    

GPU available: True, used: False
TPU available: False, using: 0 TPU cores
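The `reg` method penalizes the sum of squared weights, which is the squared Frobenius norm of each weight matrix scaled by its coefficient. The sketch below checks that equivalence on a stand-alone layer; `layer`, `lam`, `penalty`, and `via_norm` are illustrative names introduced here.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(9, 3)
lam = 1e-3  # same default value as `lam1` in `Bottleneck`

# The penalty as written in `reg`
penalty = (layer.weight ** 2.0 * lam).sum()
# The same quantity via the Frobenius norm, squared
via_norm = lam * torch.linalg.norm(layer.weight, ord='fro') ** 2

print(penalty.item(), via_norm.item())
```

Note that only the weight matrices are penalized; the bias vectors are left unregularized.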


Before we train the model, its weights are randomly initialized, so when we evaluate the test loss it'll be pretty bad.

In [11]:
trainer.test(model)

Failed to query for notebook name, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable


--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(1040.7112), 'val_loss': tensor(1040.7112)}
--------------------------------------------------------------------------------



{'avg_test_loss': 1040.711181640625, 'val_loss': 1040.711181640625}

Now let's fit our model and then check the test loss again. 

In [12]:
trainer.fit(model) 

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 30    
1 | lin2 | Linear | 16    


1
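The parameter counts in the model summary can be reproduced by hand: each `nn.Linear` holds a weight matrix plus a bias vector. A quick sketch, using a helper `n_params` that is introduced here for illustration:

```python
from torch import nn

lin1 = nn.Linear(9, 3)   # 9 inputs -> 3 hidden units
lin2 = nn.Linear(3, 4)   # 3 hidden units -> 4 outputs

def n_params(layer):
    # Count every trainable scalar: weight matrix plus bias vector
    return sum(p.numel() for p in layer.parameters())

print(n_params(lin1), n_params(lin2))  # 9*3 + 3 and 3*4 + 4
```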

In [13]:
trainer.test(model)

--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(61.2347), 'val_loss': tensor(61.2347)}
--------------------------------------------------------------------------------



{'avg_test_loss': 61.2347297668457, 'val_loss': 61.2347297668457}

Voila! The test loss (~61) is much lower than it was before training (~1041).
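One plausible reason the loss does not fall further is the bottleneck itself: with `n_hidden=3` and no nonlinearity, the model can only represent linear maps of rank 3 or less, while the data was generated by a random 9x4 matrix that is almost surely rank 4. The NumPy sketch below illustrates this under the assumption that the matrix `W` is freshly sampled here (it is not the notebook's matrix), using a truncated SVD to find the best rank-k approximation and its error.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(9, 4))  # stand-in for the hidden generating matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)

def best_rank_k_error(k):
    # Eckart-Young: truncating the SVD gives the closest rank-k matrix
    W_k = (U[:, :k] * s[:k]) @ Vt[:k]
    return float(((W - W_k) ** 2).sum())

errors = {k: best_rank_k_error(k) for k in range(1, 5)}
print(errors)  # the error only reaches ~0 at k = 4
```

For standard-normal inputs, the expected squared error per row of a rank-k linear model is this Frobenius error. That is consistent with the Optuna trials further down: runs with `n_hid` of 2 or 3 plateau at large losses, while runs with `n_hid` of 5 or more reach about 0.013, roughly the noise floor of a loss summed over a 32x4 batch (128 x 1e-4).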

### Visualize the model 

Check out the link on wandb to see training progress. For me, that link looks like:

Run page: https://app.wandb.ai/chrisemoody/simple_mf-notebooks/runs/2o5ofsn4

### Tune hyperparameters with Optuna and Weights & Biases

You may have to install Optuna:
    
    pip install optuna

In [16]:
import optuna


def objective(trial):
    # Sample parameters -- without declaring them in advance!
    n_hid = trial.suggest_int('n_hid', 1, 10)
    lam1 = trial.suggest_loguniform('lam1', 1e-8, 1e-3)
    lam2 = trial.suggest_loguniform('lam2', 1e-8, 1e-3)
    
    model = Bottleneck(9, 4, n_hid, lam1=lam1, lam2=lam2)
    model.save_data(X_train, Y_train, X_test, Y_test)
    
    logger = WandbLogger(name="00_intro_optimize", log_model=True, project="simple_mf")
    logger.log_hyperparams(model.hparams)

    # Note that we added early stopping
    trainer = pl.Trainer(max_epochs=3,
                         reload_dataloaders_every_epoch=True,
                         early_stop_callback=True,
                         logger=logger)    
    trainer.fit(model)
    results = trainer.test(model)
    return results['avg_test_loss']
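The regularization strengths are drawn with `suggest_loguniform`, meaning they are uniform in log space rather than in linear space, so each decade between 1e-8 and 1e-3 is equally likely. The NumPy sketch below illustrates that distribution only; the helper `sample_loguniform` is a hypothetical name and is not Optuna's implementation.

```python
import numpy as np

def sample_loguniform(rng, low, high, size):
    # Draw uniformly in log space, then map back with exp
    return np.exp(rng.uniform(np.log(low), np.log(high), size=size))

rng = np.random.default_rng(0)
samples = sample_loguniform(rng, 1e-8, 1e-3, size=100_000)

# Fraction of draws per decade: roughly 0.2 each for the five decades
decades = np.floor(np.log10(samples)).astype(int)
fractions = {d: float(np.mean(decades == d)) for d in range(-8, -3)}
print(fractions)
```

Sampling this way suits scale parameters such as `lam1` and `lam2`, where the difference between 1e-8 and 1e-7 matters as much as the difference between 1e-4 and 1e-3.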

In [None]:
study = optuna.create_study()
study.optimize(objective, n_trials=10)

GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 80    
1 | lin2 | Linear | 36    


--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(0.0130), 'val_loss': tensor(0.0130)}
--------------------------------------------------------------------------------



[I 2020-08-04 02:10:00,494] Trial 0 finished with value: 0.012997656129300594 and parameters: {'n_hid': 8, 'lam1': 4.491775690966461e-08, 'lam2': 2.2438209987003427e-05}. Best is trial 0 with value: 0.012997656129300594.


GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 30    
1 | lin2 | Linear | 16    


--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(61.3223), 'val_loss': tensor(61.3223)}
--------------------------------------------------------------------------------



[I 2020-08-04 02:10:21,830] Trial 1 finished with value: 61.32229995727539 and parameters: {'n_hid': 3, 'lam1': 4.86228824944716e-07, 'lam2': 0.0006346956137931223}. Best is trial 0 with value: 0.012997656129300594.


GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 20    
1 | lin2 | Linear | 12    


--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(158.4703), 'val_loss': tensor(158.4703)}
--------------------------------------------------------------------------------



[I 2020-08-04 02:10:42,820] Trial 2 finished with value: 158.4702606201172 and parameters: {'n_hid': 2, 'lam1': 5.9223224321546895e-06, 'lam2': 3.289407335032943e-08}. Best is trial 0 with value: 0.012997656129300594.


GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 60    
1 | lin2 | Linear | 28    


--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(0.0130), 'val_loss': tensor(0.0130)}
--------------------------------------------------------------------------------



[I 2020-08-04 02:11:10,517] Trial 3 finished with value: 0.012997598387300968 and parameters: {'n_hid': 6, 'lam1': 3.049950131029259e-07, 'lam2': 0.0001885320249328323}. Best is trial 3 with value: 0.012997598387300968.


GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 20    
1 | lin2 | Linear | 12    


--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(158.3181), 'val_loss': tensor(158.3181)}
--------------------------------------------------------------------------------



[I 2020-08-04 02:11:33,621] Trial 4 finished with value: 158.31808471679688 and parameters: {'n_hid': 2, 'lam1': 2.6763899605309204e-08, 'lam2': 4.674158379205071e-06}. Best is trial 3 with value: 0.012997598387300968.


GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 50    
1 | lin2 | Linear | 24    


--------------------------------------------------------------------------------
TEST RESULTS
{'avg_test_loss': tensor(0.0129), 'val_loss': tensor(0.0129)}
--------------------------------------------------------------------------------



[I 2020-08-04 02:11:55,112] Trial 5 finished with value: 0.012939749285578728 and parameters: {'n_hid': 5, 'lam1': 6.961025209360357e-05, 'lam2': 2.104810376363773e-08}. Best is trial 5 with value: 0.012939749285578728.


GPU available: True, used: False
TPU available: False, using: 0 TPU cores

  | Name | Type   | Params
--------------------------------
0 | lin1 | Linear | 30    
1 | lin2 | Linear | 16    

