# Future Talk


## Monitoring and Debugging Deep Neural Networks



by Adrian Wälchli


In [1]:
import torch
torch.__version__

'1.1.0'

## Topics

- Introduction to Weights and Biases
    - Error Curves
    - Images
    - Histograms
    - Hyperparameter Tracking



- Best Practices
    - Managing Runs
    - Datasets
    - Checkpointing and Resuming
    - Multi-GPU
    - Miscellaneous Tips and Tricks

- Debugging Neural Networks
    - Input and Output
    - Batch Normalization
    - Dropout
    - Classification Caveats
    - Gradients
    - Adversarial Examples

### Let's start easy ...

with the good old [MNIST](./01_MNIST_Basic.ipynb) example!

### What It Should Look Like

#### Logging metrics in different runs
<img src="./figures/logging2.png" width="80%"/>

#### Plotting error curves
<img src="./figures/logging1.png" width="50%"/>

#### Visualizing and exploring data
<img src="./figures/logging3.png" width="50%"/>

### Weights & Biases
##### Double You and Bee
1. Sign up with a GitHub or Google account at https://www.wandb.com
2. Copy the API key
3. Log in
```
conda activate pytorch-demo
wandb login API_KEY
```
4. Create a project


See also: https://docs.wandb.com/

### Usage

Call the ```init``` function once to set up your run.

In [17]:
import wandb

wandb.init(
    name='Introduction to W & B', 
    config=dict(),  
    project='pytorch-demo', 
    tags=['baseline'],
    dir='./runs',
    entity='awaelchli',
    group='slides',
    resume=False,
)

W&B Run: https://app.wandb.ai/awaelchli/pytorch-demo/runs/dj5bl507

### Config
Keeping track of training parameters is easy!

In [18]:
wandb.config.batch_size = 16
wandb.config.epochs = 5

In [20]:
wandb.config.update({
    'learning_rate': 1e-3,
    'batch_norm': True,
})

print(wandb.config)

wandb_version: 1

_wandb:
  desc: null
  value:
    cli_version: 0.8.3
batch_norm:
  desc: null
  value: true
batch_size:
  desc: null
  value: 16
epochs:
  desc: null
  value: 5
learning_rate:
  desc: null
  value: 0.001



Even better: Pass your existing ```argparse``` flags to wandb!

In [5]:
import argparse
import sys
sys.argv = ['demo']  # replace the notebook's own arguments so parse_args() sees no unknown flags
parser = argparse.ArgumentParser()
parser.add_argument('--image_height', type=int, default=128)
args = parser.parse_args()

wandb.config.update(args)  # adds all of the arguments as config variables
#print(wandb.config)

### Plotting the Error Curve

In [21]:
#%%wandb

for it in range(50):
    # ... some deep learning stuff here ...
    
    wandb.log({'loss': torch.rand(1), 'accuracy': torch.rand(1)}, step=it)


### Plotting Images

In [31]:
#%%wandb

for it in range(5):
    # ... some deep learning stuff here ...
    
    wandb.log({"examples": [wandb.Image(torch.rand(2, 3, 128, 128), caption="A caption")]})
   

### Plotting Histograms

In [29]:
#%%wandb

for it in range(5):
    # ... some deep learning stuff here ...
    
    wandb.log({"gradients": wandb.Histogram(torch.Tensor(100, 100).normal_(0, it))})


### Other Data Types

Other kinds of data can be logged in a similar way:

- Audio
- Text and Tables
- HTML
- 3D Objects (point clouds)


### Back to the Future!

Now, let's improve our [MNIST](./02_MNIST_Logging.ipynb) example.

## More Improvements

### ArgumentParser vs. Settings File

- ArgumentParser defaults are often misused as inputs
- W&B can track hyperparameters, but what if we want to share the source code?
- Many hyperparameters quickly fill your terminal
- Same source code, multiple experiments
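
One possible way to reconcile the two approaches is sketched below: the command line only selects a settings file and, optionally, overrides a few values, while every hyperparameter lives in the file. This is a minimal sketch, not a recommended layout. The function `build_config` and the flags `--config` and `--set` are hypothetical names introduced for illustration.

```python
import argparse
import json
import tempfile
from argparse import Namespace
from pathlib import Path


def build_config(argv):
    """Read all settings from a JSON file; the CLI only picks the file and overrides."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', type=Path, required=True)  # hypothetical flag
    parser.add_argument('--set', nargs='*', default=[], metavar='KEY=VALUE')  # hypothetical flag
    args = parser.parse_args(argv)

    with open(args.config, 'r') as f:
        settings = json.load(f)

    for item in args.set:
        key, value = item.split('=', 1)
        if key not in settings:
            # refuse to create settings that the file does not define
            raise KeyError(f'unknown setting: {key}')
        settings[key] = json.loads(value)  # parse the override as JSON (numbers, booleans, lists)

    return Namespace(**settings)


# Demo with a throwaway settings file
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'experiment.json'
    path.write_text(json.dumps({'batch_size': 5, 'lr': 1e-5, 'device': 'cuda'}))
    config = build_config(['--config', str(path), '--set', 'batch_size=16'])

print(config.batch_size, config.lr, config.device)
```

With this layout, the same source code can run several experiments by pointing `--config` at different files, and no hyperparameter depends on a parser default.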

### Typical Settings File

``` json
{
    "train_data":               "../../data/HFR/train.txt",
    "test_data":                "../../data/HFR/test.txt",
    "val_data":                 "../../data/HFR/val.txt",

    "batch_size":               5,
    "epochs":                   1000,

    "lr":                       0.00001,
    "lr_decay_factor":          0.98,

    "weight_reconstruction":    1.0,
    "weight_perceptual":        0.1,
    "weight_binarization":      0.8,
    "weight_hinge":             0.8,

    "crop_size":                [224, 224],

    "device":                   "cuda",
    "gpus":                     [2, 3],
    "num_workers":              4,

    "log_interval":             100,
    "log_dir":                  "../../runs/flow/layered/01-initial",
    "save":                     "checkpoint.pt",
    "resume":                   false,

    "comment":                  "Layered representation for optical flow. The mask is now used to inpaint only the background. Added an extra conv layer to the upsampling block in M. Jin's network. The Inpainter now also takes the mask as input, and the non-inpainted pixels are replaced with the original values. The frame skip is now 10 instead of 5."
}
```

### Loading the Settings File

In [34]:
import json
import os
import shutil  # needed by backup_config
from argparse import Namespace


def load_config(file):
    with open(file, 'r') as f:
        config = json.load(f)
        config = Namespace(**config)
    return config

def backup_config(file, log_dir):
    # backup the config file to the log-folder
    shutil.copy(file, os.path.join(log_dir, os.path.basename(file)))
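
A short usage sketch of the two helpers, repeated here so that the block runs on its own. The file name `settings.json`, the temporary directory, and the call to `os.makedirs` are assumptions added for the demo and are not part of the original helpers.

```python
import json
import os
import shutil
import tempfile
from argparse import Namespace


def load_config(file):
    # read the JSON settings and expose them as attributes
    with open(file, 'r') as f:
        return Namespace(**json.load(f))


def backup_config(file, log_dir):
    os.makedirs(log_dir, exist_ok=True)  # assumption: create the log folder if it is missing
    shutil.copy(file, os.path.join(log_dir, os.path.basename(file)))


with tempfile.TemporaryDirectory() as tmp:
    settings_file = os.path.join(tmp, 'settings.json')  # hypothetical file name
    log_dir = os.path.join(tmp, 'runs', '01-initial')
    with open(settings_file, 'w') as f:
        json.dump({'batch_size': 5, 'epochs': 1000, 'log_dir': log_dir}, f)

    config = load_config(settings_file)
    backup_config(settings_file, config.log_dir)
    # check that a copy of the settings file now sits next to the logs
    backup_exists = os.path.exists(os.path.join(config.log_dir, 'settings.json'))

print(config.batch_size, config.epochs, backup_exists)
```

Because `load_config` returns a `Namespace`, the result should be usable with `wandb.config.update` in the same way as the `argparse` flags shown earlier.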

### Alternatives

- YAML
- Text file
- Bash file 
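
As an illustration of the plain text option, the sketch below parses `key = value` lines into a `Namespace` using only the standard library. The file format and the function `parse_text_config` are hypothetical and chosen for this example. One known limitation: a `#` inside a value would be treated as the start of a comment.

```python
import ast
from argparse import Namespace


def parse_text_config(text):
    """Parse 'key = value' lines; '#' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key, value = (part.strip() for part in line.split('=', 1))
        try:
            # numbers, lists, booleans, and quoted strings become Python values
            settings[key] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            settings[key] = value  # anything else is kept as a raw string
    return Namespace(**settings)


example = """
# hypothetical settings in a plain text file
batch_size = 5
lr = 0.00001
crop_size = [224, 224]
device = cuda
resume = False
"""

config = parse_text_config(example)
print(config)
```

Compared with JSON, such a file allows comments, at the cost of maintaining a small custom parser.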