# Debugging

Author: **Christian Lessig et al.**

`christian.lessig@ecmwf.int`

## Introduction

We usually spend more time debugging code than writing it. This is particularly true for Python, where many errors, such as mismatched tensor shapes or devices, only surface at run time.

Debugging usually consists of three steps:

1. Localize the problem.
2. Understand precisely what goes wrong.
3. Fix the problem.

The third step is usually the easy one once the first two have been accomplished.
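For step 1, the traceback of an exception already points at the failing line. Interactively, `pdb.post_mortem()` (or the `%debug` magic in Jupyter) opens a debugger in the frame that raised. The minimal sketch below extracts that location programmatically using only the standard library; `localize`, `get_item` and `run` are hypothetical names chosen for illustration.

```python
import traceback


def localize(fn, *args, **kwargs):
    """Call fn; if it raises, return (exception type, function name, line number)
    of the innermost frame, i.e. where the error actually occurred."""
    try:
        fn(*args, **kwargs)
    except Exception as exc:
        # extract_tb lists frames from outermost to innermost
        frame = traceback.extract_tb(exc.__traceback__)[-1]
        return type(exc).__name__, frame.name, frame.lineno
    return None


def get_item(values, i):
    return values[i]  # raises IndexError if i is out of range


def run():
    data = list(range(128))
    return get_item(data, 512)


# the innermost frame is get_item, not run, even though run made the bad call
print(localize(run))
```

The innermost frame shows where the error surfaced; the outer frames (here `run`) usually show where the bad value came from, which is often where the actual fix belongs.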

To understand 

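```python
import os
from importlib import reload  # re-import edited local modules without restarting the kernel
import code  # stdlib; code.interact(local=locals()) opens an interactive shell at any point

import torch
```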
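The `code` module imported above can pause a running loop and open an interactive shell with the current variables in scope, similar to a breakpoint. A sketch, assuming one only wants to stop when something goes wrong (here: a non-finite loss); `interact_if_nonfinite` is a hypothetical helper. Sending end-of-file in the shell (Ctrl-D on Unix) exits it and resumes execution.

```python
import code
import math


def interact_if_nonfinite(loss_value, local_vars, readfunc=None):
    """Open an interactive shell with local_vars in scope if loss_value is NaN or Inf.

    Returns True if the shell was opened. Leaving the shell resumes the caller.
    readfunc is passed on to code.interact (None means the default input function).
    """
    if math.isfinite(loss_value):
        return False
    code.interact(banner=f'non-finite loss {loss_value}; inspect variables, then exit to continue',
                  readfunc=readfunc, local=local_vars, exitmsg='resuming')
    return True

# inside a training loop one would call, e.g.:
#   interact_if_nonfinite(loss.item(), locals())
```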

```python
import model
reload( model)  # pick up changes made to model.py since the last import
from model import MLP

net = MLP( dim_in=512, dim_out=512)

# check if GPU is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = net.to(device)
```
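Right after construction, it can help to confirm where the parameters live and how many there are. A minimal sketch; since `model.py` is not shown, a hypothetical `torch.nn.Sequential` stand-in (with an assumed hidden width of 1024) replaces `MLP`, and `describe` is an illustrative helper name.

```python
import torch


def describe(net: torch.nn.Module):
    """Return the set of devices holding parameters and the number of trainable parameters."""
    devices = {p.device for p in net.parameters()}
    n_trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
    return devices, n_trainable


# hypothetical stand-in for model.MLP(dim_in=512, dim_out=512)
net = torch.nn.Sequential(
    torch.nn.Linear(512, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 512))
device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = net.to(device)

devices, n_trainable = describe(net)
print(devices, n_trainable)
```

More than one entry in `devices` would usually point to a submodule that was created outside the module hierarchy and therefore missed by `.to(device)`.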


```python
# test if we can evaluate the network

t_in = torch.rand( (16, 512)).to(device)
t_out = net( t_in)
```
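That the forward pass runs without an exception does not mean its output is right. A sketch of a slightly stronger test that also checks the output shape and rejects NaN or Inf values; `check_output` is a hypothetical helper and `torch.nn.Linear` stands in for `MLP`.

```python
import torch


def check_output(t_out: torch.Tensor, expected_shape) -> bool:
    """Raise with an informative message if the output has the wrong shape or non-finite values."""
    if tuple(t_out.shape) != tuple(expected_shape):
        raise ValueError(f'expected shape {tuple(expected_shape)}, got {tuple(t_out.shape)}')
    if not torch.isfinite(t_out).all():
        raise ValueError('output contains NaN or Inf')
    return True


net = torch.nn.Linear(512, 512)  # hypothetical stand-in for MLP(dim_in=512, dim_out=512)
t_in = torch.rand((16, 512))
with torch.no_grad():
    t_out = net(t_in)
check_output(t_out, (16, 512))
```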

```python
# test if data loading works

import dataset
reload( dataset)
from dataset import CustomDataSet

custom_dataset = CustomDataSet( len=32768, batch_size=128, dim_data=512)
data_iter = iter(custom_dataset)

lossfct = torch.nn.MSELoss()

# load sample and move it to the same device as the network
(source, target) = next(data_iter)
source, target = source.to(device), target.to(device)
# evaluate network
pred = net( source)
# compute loss
loss = lossfct( pred, target)

print( f'loss : {loss}')
```

```text
loss : 0.17747855186462402
```
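A loss value on its own says little. For `MSELoss`, one useful reference point is the loss of a trivial predictor that always outputs the per-feature mean of the targets; this loss equals the per-feature (population) variance of the targets, averaged over features. A sketch; `baseline_mse` is an illustrative name, and uniform random tensors stand in for a batch from `CustomDataSet`.

```python
import torch


def baseline_mse(target: torch.Tensor) -> torch.Tensor:
    """MSE of always predicting the per-feature mean of the batch targets."""
    mean_pred = target.mean(dim=0, keepdim=True).expand_as(target)
    return torch.nn.functional.mse_loss(mean_pred, target)


torch.manual_seed(0)
target = torch.rand((128, 512))  # stand-in for one batch of targets
print(f'baseline loss : {baseline_mse(target)}')
```

If the trained network's loss is not clearly below this baseline, it has not learned much more than the mean of the data.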


```python
# training loop

optimizer = torch.optim.AdamW( net.parameters(), lr=0.00005)

# parallel data loader; batch_size=None disables automatic batching
# because CustomDataSet already yields batches of size batch_size
loader_params = { 'batch_size': None, 'batch_sampler': None, 'shuffle': False, 
                   'num_workers': 8, 'pin_memory': True}
dataloader = torch.utils.data.DataLoader( custom_dataset, **loader_params, sampler = None)

num_epochs = 8

optimizer.zero_grad()
for epoch in range( num_epochs) :

  # data_iter = iter( custom_dataset)  # single-process alternative, easier to step through in a debugger
  data_iter = iter( dataloader)
  
  for bidx, (source, target) in enumerate(data_iter) :

    # move the batch to the same device as the network
    source, target = source.to(device), target.to(device)

    pred = net( source)
    loss = lossfct( pred, target)

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

  # loss of the last batch of the epoch
  print( f'Finished epoch={epoch} with loss={loss}.')
```

```text
Finished epoch=0 with loss=0.14155101776123047.
Finished epoch=1 with loss=0.13087543845176697.
Finished epoch=2 with loss=0.12232525646686554.
Finished epoch=3 with loss=0.11737056076526642.
Finished epoch=4 with loss=0.11309609562158585.
Finished epoch=5 with loss=0.10898953676223755.
Finished epoch=6 with loss=0.10630710422992706.
Finished epoch=7 with loss=0.1054118201136589.
```
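If the loss decreases slowly or not at all, a common next step is to check that gradients reach every parameter and are finite. A minimal sketch, meant to be called after `loss.backward()` and before `optimizer.zero_grad()`; `grad_norms` is an illustrative name, and the small network and random batch are hypothetical stand-ins for `MLP` and `CustomDataSet`. To locate the operation that first produces a NaN, PyTorch also provides `torch.autograd.set_detect_anomaly(True)`, at a considerable runtime cost.

```python
import torch


def grad_norms(net: torch.nn.Module) -> dict:
    """Map each parameter name to the L2 norm of its gradient (None if no gradient arrived)."""
    return {name: (None if p.grad is None else p.grad.norm().item())
            for name, p in net.named_parameters()}


# hypothetical small network and batch
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 8))
source, target = torch.rand((4, 8)), torch.rand((4, 8))

loss = torch.nn.functional.mse_loss(net(source), target)
loss.backward()

for name, norm in grad_norms(net).items():
    print(f'{name:10s} {norm}')
```

A `None` entry means the parameter is not connected to the loss (for example, a layer that is defined but never used in `forward`); very large or non-finite norms hint at an unsuitable learning rate or a numerically unstable operation.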


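```python
# the batch holds only 128 samples (batch_size=128), so indices >= 128 are out of bounds
idx = torch.arange( 512)
loss = lossfct( source[idx], target[idx])
```

```text
IndexError: index 128 is out of bounds for dimension 0 with size 128
```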
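An error like this is easier to localize when the assumption is checked where it is made, with a message that names the sizes involved. A sketch; `checked_index` is a hypothetical helper, and a random tensor stands in for one batch of 128 samples.

```python
import torch


def checked_index(t: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Index dim 0 of t with idx, failing early with the sizes involved if idx is out of range."""
    n = t.shape[0]
    if idx.numel() > 0 and (idx.max() >= n or idx.min() < -n):
        raise IndexError(f'indices span [{idx.min().item()}, {idx.max().item()}], '
                         f'but dimension 0 has size {n}')
    return t[idx]


source = torch.rand((128, 512))  # stand-in for one batch (batch_size=128)
# index only as many samples as the batch actually contains
idx = torch.arange(min(512, source.shape[0]))
print(checked_index(source, idx).shape)  # torch.Size([128, 512])
```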

## What to do if my training doesn't work?

- Try to overfit a small subset of the data, e.g. a single batch!
  - The loss function needs to be meaningful in the first place.
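Overfitting a single fixed batch is a cheap test of the whole pipeline: if model, loss and optimizer are wired up correctly, the loss on that batch should drop close to zero. If it does not, the problem most likely lies in the model, loss or optimization setup rather than in the amount of data or training time. A sketch with hypothetical small stand-ins for `MLP` and `CustomDataSet`; `overfit_one_batch`, Adam, the learning rate of 1e-2 and the 500 steps are illustrative choices, not taken from the notebook above.

```python
import torch


def overfit_one_batch(net, source, target, steps=500, lr=1e-2):
    """Train repeatedly on one fixed batch; return (initial loss, final loss)."""
    lossfct = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    with torch.no_grad():
        initial = lossfct(net(source), target).item()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = lossfct(net(source), target)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        final = lossfct(net(source), target).item()
    return initial, final


torch.manual_seed(0)
# hypothetical small network and batch
net = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
source, target = torch.rand((8, 16)), torch.rand((8, 16))
initial, final = overfit_one_batch(net, source, target)
print(f'initial loss={initial:.4f}, final loss={final:.6f}')
```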