# Octave Convolution Tests

We can use this notebook to test our implementation of the OctConv module.

The OctConv module itself is defined in `modules.py`.
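For reference, here is a minimal sketch of what an octave convolution computes, as we understand the idea. The channels are split into a full-resolution high-frequency group and a half-resolution low-frequency group. Four convolutions (high-to-high, high-to-low, low-to-high, low-to-low) then exchange information between the two groups. The class name `OctConvSketch`, the helper `_maybe_conv`, the `int(alpha * channels)` channel split, average pooling for downsampling, nearest-neighbour upsampling, and the stride-1-only restriction are all our assumptions. This is not the code in `modules.py`, although the attribute names deliberately mirror its parameter names (`conv_hh`, `conv_hl`, `conv_lh`, `conv_ll`).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def _maybe_conv(c_in, c_out, kernel_size, padding):
    # A path only exists when both its input and output groups are non-empty
    if c_in > 0 and c_out > 0:
        return nn.Conv2d(c_in, c_out, kernel_size, padding=padding)
    return None


class OctConvSketch(nn.Module):
    """Hypothetical minimal octave convolution (stride 1 only); not the code in modules.py."""

    def __init__(self, in_channels, out_channels, kernel_size, alpha_in, alpha_out, padding=0):
        super().__init__()
        # Assumed split: the low-frequency group gets int(alpha * channels) channels
        in_l, out_l = int(alpha_in * in_channels), int(alpha_out * out_channels)
        in_h, out_h = in_channels - in_l, out_channels - out_l
        self.conv_hh = _maybe_conv(in_h, out_h, kernel_size, padding)  # high -> high
        self.conv_hl = _maybe_conv(in_h, out_l, kernel_size, padding)  # high -> low
        self.conv_lh = _maybe_conv(in_l, out_h, kernel_size, padding)  # low -> high
        self.conv_ll = _maybe_conv(in_l, out_l, kernel_size, padding)  # low -> low

    def forward(self, x_h, x_l=None):
        # High-frequency output: H->H plus the L->H result upsampled to full resolution
        y_h = self.conv_hh(x_h)
        if self.conv_lh is not None and x_l is not None:
            y_h = y_h + F.interpolate(self.conv_lh(x_l), scale_factor=2, mode="nearest")
        y_l = None
        if self.conv_hl is not None:
            # Low-frequency output: H->L on the 2x average-pooled input, plus L->L
            y_l = self.conv_hl(F.avg_pool2d(x_h, 2))
            if self.conv_ll is not None and x_l is not None:
                y_l = y_l + self.conv_ll(x_l)
        return y_h, y_l


torch.manual_seed(0)
oc = OctConvSketch(16, 32, 3, 0.25, 0.25, padding=1)
y_h, y_l = oc(torch.randn(2, 12, 32, 32), torch.randn(2, 4, 16, 16))
print(y_h.shape, y_l.shape)  # torch.Size([2, 24, 32, 32]) torch.Size([2, 8, 16, 16])
```

The sketch returns `None` for a missing group instead of an empty tensor. This lets the same class act as a first layer (`alpha_in = 0`, no low-frequency input) or a last layer (`alpha_out = 0`, no low-frequency output).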

## Setup

In [103]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [104]:
%load_ext autoreload
%autoreload 2

from modules import OctConv2d, FourLayerOctConvNet
from octconv_tests import test_octconv_shapes, test_octconv_as_conv



In [105]:
USE_GPU = True

dtype = torch.float32

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

print('using device:', device)

using device: cpu


## Testing OctConv Behavior

Testing code is located in `octconv_tests.py`.

Some PyTorch versions warn that `nn.Upsample` is deprecated in favor of `nn.functional.interpolate`. We can disregard that warning and keep using `nn.Upsample` as a layer, as discussed in this [PyTorch forum thread](https://discuss.pytorch.org/t/which-function-is-better-for-upsampling-upsampling-or-interpolate/21811/12).
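In recent PyTorch releases, `nn.Upsample.forward` simply calls `F.interpolate` with the same arguments, so the two should give identical results. The quick check below compares them. The tensor sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 8)

# Module form: can be registered as a layer (e.g. inside nn.Sequential)
up_layer = nn.Upsample(scale_factor=2, mode="nearest")
y_layer = up_layer(x)

# Functional form: called directly inside a forward() method
y_func = F.interpolate(x, scale_factor=2, mode="nearest")

print(y_layer.shape)                 # torch.Size([1, 4, 16, 16])
print(torch.equal(y_layer, y_func))  # True
```

The module form is handy when the upsampling step should appear in `print(model)` or in a `Sequential`. The functional form avoids creating a module at all.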

In [106]:
# Example test for an OctConv layer with padding and stride
oc = OctConv2d(16, 32, (3, 3), 0.25, 0.25, stride=1, padding=1)
input_h = torch.randn(128, 12, 32, 32)
input_l = torch.randn(128, 4, 16, 16)
output_h, output_l = oc(input_h, input_l)
assert output_h.shape == (128, 24, 32, 32), "Shape mismatch for stride=1, padding=1"
assert output_l.shape == (128, 8, 16, 16), "Shape mismatch for stride=1, padding=1"
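The expected shapes in this example follow from simple arithmetic. With `alpha = 0.25`, a quarter of the channels go to the low-frequency group, which lives at half the spatial resolution. The small helper below reproduces the numbers asserted above. `octconv_shapes` is hypothetical and not part of `modules.py`, and the `int(alpha * channels)` rounding rule is our assumption.

```python
def octconv_shapes(batch, channels, height, width, alpha):
    """Hypothetical helper: expected (high, low) shapes of an OctConv feature map.

    Assumes the low-frequency group gets int(alpha * channels) channels
    at half the spatial resolution.
    """
    c_low = int(alpha * channels)
    c_high = channels - c_low
    return (batch, c_high, height, width), (batch, c_low, height // 2, width // 2)


# Input of the example layer: 16 channels split 12 / 4
print(octconv_shapes(128, 16, 32, 32, 0.25))  # ((128, 12, 32, 32), (128, 4, 16, 16))
# Output of the example layer: 32 channels split 24 / 8
print(octconv_shapes(128, 32, 32, 32, 0.25))  # ((128, 24, 32, 32), (128, 8, 16, 16))
```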

In [107]:
test_octconv_shapes()
test_octconv_as_conv()

## Building an OctConv Network

Here we use the `FourLayerOctConvNet` defined in `modules.py`. Its architecture is not very flexible, but it demonstrates that a network built from OctConv layers can overfit a small dataset.

In [117]:
# Initialize random training data
N, C, H, W, D_out = 64, 3, 32, 32, 10
x = torch.randn(N, C, H, W, dtype=dtype, device=device)
y = torch.randint(0, D_out, (N, ), dtype=torch.long, device=device)  # integer class labels for cross-entropy

In [118]:
# Create our model
alpha, hidden_channels = 0.25, 32
model = FourLayerOctConvNet(alpha, hidden_channels, C, H, W, D_out)

In [119]:
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)

oc1.conv_hh.weight
oc1.conv_hh.bias
oc1.conv_hl.weight
oc1.conv_hl.bias
oc2.conv_hh.weight
oc2.conv_hh.bias
oc2.conv_ll.weight
oc2.conv_ll.bias
oc2.conv_lh.weight
oc2.conv_lh.bias
oc2.conv_hl.weight
oc2.conv_hl.bias
oc3.conv_hh.weight
oc3.conv_hh.bias
oc3.conv_ll.weight
oc3.conv_ll.bias
oc3.conv_lh.weight
oc3.conv_lh.bias
oc3.conv_hl.weight
oc3.conv_hl.bias
oc4.conv_hh.weight
oc4.conv_hh.bias
oc4.conv_lh.weight
oc4.conv_lh.bias
fc1.weight
fc1.bias
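The printed names suggest that `oc1` has no `conv_lh` or `conv_ll` because its input (the image) has no low-frequency group. Likewise, `oc4` has no `conv_hl` or `conv_ll` because its output is purely high-frequency. We assume here that `conv_xy` means "from group x to group y", and that the first and last layers use `alpha_in = 0` and `alpha_out = 0` respectively; both points are inferred from the parameter list, not read from `modules.py`. The hypothetical helpers below reproduce which paths each layer needs. They also show that when all four paths are present, the kernel weight count (biases excluded) matches an ordinary `nn.Conv2d` with the same channel counts.

```python
def octconv_paths(alpha_in, alpha_out):
    """Hypothetical helper: which conv_xy paths (x -> y) an OctConv layer needs.

    Assumes alpha < 1, so a high-frequency group always exists.
    """
    low_in, low_out = alpha_in > 0, alpha_out > 0
    paths = ["hh"]
    if low_out:
        paths.append("hl")
    if low_in:
        paths.append("lh")
    if low_in and low_out:
        paths.append("ll")
    return paths


def octconv_weight_count(c_in, c_out, k, alpha_in, alpha_out):
    """Hypothetical helper: total k x k kernel weights across all paths, biases excluded."""
    in_l, out_l = int(alpha_in * c_in), int(alpha_out * c_out)
    groups_in = {"h": c_in - in_l, "l": in_l}
    groups_out = {"h": c_out - out_l, "l": out_l}
    return sum(groups_in[p[0]] * groups_out[p[1]] * k * k
               for p in octconv_paths(alpha_in, alpha_out))


print(octconv_paths(0.0, 0.25))   # ['hh', 'hl']              (like oc1)
print(octconv_paths(0.25, 0.25))  # ['hh', 'hl', 'lh', 'll']  (like oc2, oc3)
print(octconv_paths(0.25, 0.0))   # ['hh', 'lh']              (like oc4)
print(octconv_weight_count(32, 32, 3, 0.25, 0.25), 32 * 32 * 3 * 3)  # 9216 9216
```

The weight counts agree because the four paths partition the full `c_in x c_out` channel grid: 24·24 + 24·8 + 8·24 + 8·8 = 1024 = 32·32. Each path keeps its own bias, so bias counts do differ from a plain convolution.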


In [120]:
# Overfit on our fake dataset
# This training code is shamelessly adapted from Justin Johnson's PyTorch examples
model = model.to(device=device)
x = x.to(device=device, dtype=dtype)
y = y.to(device=device, dtype=torch.long)

learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(250):
    y_pred = model(x)
    
    loss = F.cross_entropy(y_pred, y)
    if t % 50 == 0:
        print("Iteration {}, loss: {}".format(t, loss.item()))
        _, class_preds = torch.max(y_pred, 1)
        correct = (class_preds == y).sum()
        print("Iteration {}, train accuracy: {}".format(t, float(correct) / len(y)))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
# Final predictions after training (no gradients needed)
with torch.no_grad():
    y_pred = model(x)


Iteration 0, loss: 2.3111572265625
Iteration 0, train accuracy: 0.109375
Iteration 50, loss: 2.102719783782959
Iteration 50, train accuracy: 0.21875
Iteration 100, loss: 1.5918688774108887
Iteration 100, train accuracy: 0.546875
Iteration 150, loss: 0.25606366991996765
Iteration 150, train accuracy: 1.0
Iteration 200, loss: 0.023907391354441643
Iteration 200, train accuracy: 1.0
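As a sanity check on the iteration-0 numbers: the labels are random over `D_out = 10` classes, so an untrained model with roughly uniform outputs should give a cross-entropy near ln(10) ≈ 2.3026 and an accuracy near 1/10. The logged values (2.311 and 0.109) are close to that. By iteration 150 the network has memorized the random labels, which is the overfitting behavior we wanted. The standalone check below uses all-zero logits as a stand-in for an untrained model; that stand-in is our simplification, not the actual model.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D_out = 64, 10
labels = torch.randint(0, D_out, (N,))

# All-zero logits: softmax assigns probability 1/D_out to every class
uniform_logits = torch.zeros(N, D_out)
loss = F.cross_entropy(uniform_logits, labels)

print(loss.item(), math.log(D_out))  # both approximately 2.3026
```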
