# Octave Convolution Tests

We can use this notebook to test our implementation of the OctConv module.

The OctConv module itself is defined in `modules.py`.

## Setup

In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [12]:
%load_ext autoreload
%autoreload 2

from modules import OctConv2dStackable, get_stacked_4
from octconv_tests import test_octconv_shapes, test_octconv_as_conv

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [13]:
USE_GPU = True

dtype = torch.float32

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

print('using device:', device)

using device: cpu


## Testing OctConv Behavior

Testing code is located in `octconv_tests.py`.

The tests may emit an `nn.Upsample` warning. We can disregard it and safely use `nn.Upsample` as a layer, according to these [posts](https://discuss.pytorch.org/t/which-function-is-better-for-upsampling-upsampling-or-interpolate/21811/12).
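
As a quick sanity check on the point above, the sketch below compares `nn.Upsample` with `F.interpolate` on a random tensor. It assumes nearest-neighbor mode and a scale factor of 2, which may not match what `modules.py` actually uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed settings: nearest-neighbor upsampling by 2x (not taken from modules.py).
upsample = nn.Upsample(scale_factor=2, mode='nearest')

x_low = torch.randn(2, 4, 16, 16)
via_layer = upsample(x_low)
via_function = F.interpolate(x_low, scale_factor=2, mode='nearest')

# Both routes should give the same tensor.
same_values = torch.equal(via_layer, via_function)
print(via_layer.shape, same_values)
```

If the two results agree, using the layer form inside an `nn.Sequential` is a matter of convenience rather than behavior.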

In [15]:
# Example test for an OctConv layer with padding and stride
oc = OctConv2dStackable(16, 32, (3, 3), 0.25, 0.25, stride=1, padding=1)
input_stacked = torch.randn(128, 13, 32, 32)
out = oc(input_stacked)
assert out.shape == (128, 26, 32, 32), "Shape mismatch for stride=1, padding=1"
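
The channel counts 13 and 26 in the example above are consistent with one way of stacking: low-frequency maps, which have half the spatial resolution, are reshaped so that four low-frequency channels fill one full-resolution channel. The helper below, `stacked_channels`, is hypothetical and not part of `modules.py`; it only reproduces that arithmetic under this assumption.

```python
def stacked_channels(channels, alpha):
    """Channel count of a stacked tensor, assuming low-frequency maps at half
    resolution are packed four-to-one into full-resolution channels."""
    low = int(alpha * channels)   # channels assigned to the low-frequency branch
    high = channels - low         # channels kept at full resolution
    if low % 4 != 0:
        raise ValueError("low-frequency channel count must be divisible by 4")
    return high + low // 4

stacked_in = stacked_channels(16, 0.25)   # 12 high + 4 low packed into 1
stacked_out = stacked_channels(32, 0.25)  # 24 high + 8 low packed into 2
print(stacked_in, stacked_out)
```

Under this assumption, the 16-channel input becomes 13 stacked channels and the 32-channel output becomes 26, matching the shapes used in the cell above.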

In [16]:
test_octconv_shapes()
test_octconv_as_conv()

## Building an OctConv Network

Here we use the `FourLayerOctConvNet` defined in `modules.py`. That code is not very flexible, but it shows that a network built from OctConv layers can overfit a small dataset.

In [23]:
# Initialize random training data
N, C, H, W, D_out = 64, 3, 32, 32, 10
x = torch.randn(N, C, H, W, dtype=dtype, device=device)
y = torch.randint(0, D_out, (N,), dtype=torch.long, device=device)  # integer class labels

In [24]:
# Create our model
alpha, hidden_channels = .25, 32
model = get_stacked_4(alpha, hidden_channels, C, H, W, D_out)

In [25]:
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)

0.conv_hh.weight
0.conv_hh.bias
0.conv_hl.weight
0.conv_hl.bias
2.conv_hh.weight
2.conv_hh.bias
2.conv_ll.weight
2.conv_ll.bias
2.conv_lh.weight
2.conv_lh.bias
2.conv_hl.weight
2.conv_hl.bias
5.conv_hh.weight
5.conv_hh.bias
5.conv_ll.weight
5.conv_ll.bias
5.conv_lh.weight
5.conv_lh.bias
5.conv_hl.weight
5.conv_hl.bias
7.conv_hh.weight
7.conv_hh.bias
7.conv_lh.weight
7.conv_lh.bias
11.weight
11.bias
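
The parameter names hint at how the layers are wired. The first OctConv layer has only `conv_hh` and `conv_hl`, which is what we would expect when the input has no low-frequency part. The last has only `conv_hh` and `conv_lh`, which is what we would expect when the output has none. The sketch below is a minimal octave convolution on separate high- and low-frequency tensors rather than the stacked format. `TinyOctConv` is a hypothetical class, not the `OctConv2dStackable` implementation, and its pooling and upsampling choices are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyOctConv(nn.Module):
    """Hypothetical octave convolution on separate (high, low) tensors.
    Assumes alpha_in < 1 and alpha_out < 1 so the high branch always exists."""

    def __init__(self, in_ch, out_ch, alpha_in, alpha_out, kernel_size=3, padding=1):
        super().__init__()
        in_l, out_l = int(alpha_in * in_ch), int(alpha_out * out_ch)
        in_h, out_h = in_ch - in_l, out_ch - out_l

        def conv(c_in, c_out):
            # A path exists only if both of its branches have channels.
            if c_in == 0 or c_out == 0:
                return None
            return nn.Conv2d(c_in, c_out, kernel_size, padding=padding)

        self.conv_hh = conv(in_h, out_h)
        self.conv_hl = conv(in_h, out_l)
        self.conv_lh = conv(in_l, out_h)
        self.conv_ll = conv(in_l, out_l)

    def forward(self, x_h, x_l=None):
        out_h = self.conv_hh(x_h)
        out_l = None
        if self.conv_hl is not None:
            # High -> low: downsample first, then convolve.
            out_l = self.conv_hl(F.avg_pool2d(x_h, 2))
        if x_l is not None and self.conv_lh is not None:
            # Low -> high: convolve, then upsample to full resolution.
            out_h = out_h + F.interpolate(self.conv_lh(x_l), scale_factor=2, mode='nearest')
        if x_l is not None and self.conv_ll is not None:
            ll = self.conv_ll(x_l)
            out_l = ll if out_l is None else out_l + ll
        return out_h, out_l

first = TinyOctConv(3, 32, alpha_in=0.0, alpha_out=0.25)
middle = TinyOctConv(32, 32, alpha_in=0.25, alpha_out=0.25)
last = TinyOctConv(32, 32, alpha_in=0.25, alpha_out=0.0)

h, l = first(torch.randn(2, 3, 32, 32))
h, l = middle(h, l)
h, l = last(h, l)

first_paths = sorted({name.split('.')[0] for name, _ in first.named_parameters()})
last_paths = sorted({name.split('.')[0] for name, _ in last.named_parameters()})
print(first_paths, last_paths, h.shape, l)
```

In this sketch the set of convolution paths depends only on whether `alpha_in` and `alpha_out` are zero, which reproduces the pattern of parameter names printed above.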


In [26]:
# Overfit on our fake dataset
# This training code is shamelessly adapted from Justin Johnson's PyTorch examples
model = model.to(device=device)
x = x.to(device=device, dtype=dtype)
y = y.to(device=device, dtype=torch.long)

learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(250):
    y_pred = model(x)
    
    loss = F.cross_entropy(y_pred, y)
    if t % 50 == 0:
        print("Iteration {}, loss: {}".format(t, loss.item()))
        _, class_preds = torch.max(y_pred, 1)
        correct = (class_preds == y).sum()
        print("Iteration {}, train accuracy: {}".format(t, float(correct) / len(y)))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
y_pred = model(x)


Iteration 0, loss: 2.30806040763855
Iteration 0, train accuracy: 0.0625
Iteration 50, loss: 2.1193060874938965
Iteration 50, train accuracy: 0.5
Iteration 100, loss: 1.6581358909606934
Iteration 100, train accuracy: 0.859375
Iteration 150, loss: 0.35289162397384644
Iteration 150, train accuracy: 1.0
Iteration 200, loss: 0.0251496359705925
Iteration 200, train accuracy: 1.0
