# Octave Convolution Tests

This notebook tests our implementation of the OctConv module.

The OctConv module itself is defined in `modules.py`.

## Setup

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [2]:
%load_ext autoreload
%autoreload 2

from modules import OctConv2dStackable, OctConv2dBN, get_stacked_4, get_stacked_4BN
from octconv_tests import test_octconv_shapes, test_octconv_as_conv

In [3]:
USE_GPU = False

dtype = torch.float32

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

print('using device:', device)

using device: cpu


## Testing OctConv Behavior

Testing code is located in `octconv_tests.py`.

We can disregard the `nn.Upsample` warning and safely use `nn.Upsample` as a layer, according to this [forum thread](https://discuss.pytorch.org/t/which-function-is-better-for-upsampling-upsampling-or-interpolate/21811/12).
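As a quick sanity check on that claim, the sketch below compares the layer form with the functional form on a random tensor. The tensor shape and the `nearest` mode are illustrative assumptions, not necessarily the settings used in `modules.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative low-frequency feature map: batch 2, 4 channels, 16x16.
x_low = torch.randn(2, 4, 16, 16)

# Layer form, convenient inside nn.Sequential.
upsample = nn.Upsample(scale_factor=2, mode='nearest')
up_layer = upsample(x_low)

# Functional form with the same settings.
up_func = F.interpolate(x_low, scale_factor=2, mode='nearest')

# Both should double the spatial size and agree elementwise.
same_values = torch.equal(up_layer, up_func)
print(tuple(up_layer.shape), same_values)
```

If the two results match, keeping `nn.Upsample` as a layer costs nothing beyond the warning.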

In [4]:
# Example test for an OctConv layer with padding and stride
oc = OctConv2dStackable(16, 32, (3, 3), 0.25, 0.25, stride=1, padding=1)
input_stacked = torch.randn(128, 13, 32, 32)
out = oc(input_stacked)
assert out.shape == (128, 26, 32, 32), "Shape mismatch for stride=1, padding=1"
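The channel counts 13 and 26 in the cell above can be reproduced with a little arithmetic. This sketch assumes that the positional arguments are `in_channels, out_channels, kernel_size, alpha_in, alpha_out`, and that the "stacked" layout folds the half-resolution low-frequency channels into a quarter as many full-resolution channels. Both are guesses from the shapes, not confirmed against `modules.py`, and `stacked_channels` is a hypothetical helper.

```python
def stacked_channels(channels, alpha):
    """Hypothetical helper: channel count of a stacked OctConv tensor.

    Assumes a fraction `alpha` of the channels is low-frequency, stored at
    half the height and width, so four of them fit in one full-size channel.
    """
    low = int(alpha * channels)
    high = channels - low
    if low % 4 != 0:
        raise ValueError("low-frequency channels must be divisible by 4")
    return high + low // 4

# Input: 16 channels, alpha_in = 0.25 -> 12 high + 4 low -> 12 + 1
print(stacked_channels(16, 0.25))  # 13
# Output: 32 channels, alpha_out = 0.25 -> 24 high + 8 low -> 24 + 2
print(stacked_channels(32, 0.25))  # 26
```

Under these assumptions, the input of shape `(128, 13, 32, 32)` and the asserted output of shape `(128, 26, 32, 32)` are consistent with 16 input and 32 output channels.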



In [5]:
test_octconv_shapes()
test_octconv_as_conv()

TypeError: forward() takes 2 positional arguments but 3 were given
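The notebook does not diagnose the `TypeError` above. One way this message can arise, offered only as a possibility and not as the confirmed cause, is a test that passes the high- and low-frequency tensors as two separate arguments to a layer whose `forward` accepts a single stacked tensor. The sketch below reproduces the message in plain Python; `SingleInputLayer` is a hypothetical stand-in, not a class from `modules.py`.

```python
class SingleInputLayer:
    """Hypothetical stand-in for a module whose forward takes one input."""

    def forward(self, x):
        return x

    def __call__(self, *inputs):
        # Like nn.Module, pass every positional input on to forward.
        return self.forward(*inputs)


layer = SingleInputLayer()
try:
    # Two inputs plus `self` makes 3 positional arguments; forward accepts 2.
    layer("high_freq", "low_freq")
    message = None
except TypeError as err:
    message = str(err)

print(message)
```

If this is what is happening, comparing the call signature used in `octconv_tests.py` with the `forward` signature of the layer under test would be the first thing to check.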

## Building an OctConv Network

Here we use the `FourLayerOctConvNet` defined in `modules.py`. That code is not very flexible, but it shows that a network built with OctConv layers can overfit a small dataset.

In [4]:
# Initialize random training data
N, C, H, W, D_out = 64, 3, 30, 30, 10
x = torch.randn(N, C, H, W, dtype=dtype, device=device)
y = torch.randint(0, D_out, (N,), dtype=torch.long, device=device)  # integer class labels

In [5]:
# Create our model
alpha, freq_ratio, hidden_channels = .25, 3, 32
model = get_stacked_4(alpha, freq_ratio, hidden_channels, C, H, W, D_out)

In [6]:
for name, param in list(model.named_parameters())[:10]:
    if param.requires_grad:
        print(name)

11.weight
11.bias


In [8]:
# Overfit on our fake dataset
# This training code is shamelessly adapted from Justin Johnson's PyTorch examples
model = model.to(device=device)
x = x.to(device=device, dtype=dtype)
y = y.to(device=device, dtype=torch.long)

learning_rate = 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(250):
    y_pred = model(x)
    
    loss = F.cross_entropy(y_pred, y)
    if t % 25 == 0:
        _, class_preds = torch.max(y_pred, 1)
        correct = (class_preds == y).sum()
        print("Iteration {}, loss: {}, train accuracy: {}".format(t, loss.item(), float(correct) / len(y)))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
y_pred = model(x)


Iteration 0, loss: 2.121100425720215, train accuracy: 0.296875
Iteration 25, loss: 1.9618334770202637, train accuracy: 0.5625
Iteration 50, loss: 1.8080220222473145, train accuracy: 0.6875
Iteration 75, loss: 1.661226749420166, train accuracy: 0.828125
Iteration 100, loss: 1.5221688747406006, train accuracy: 0.953125
Iteration 125, loss: 1.3913532495498657, train accuracy: 1.0
Iteration 150, loss: 1.269147515296936, train accuracy: 1.0
Iteration 175, loss: 1.155777096748352, train accuracy: 1.0
Iteration 200, loss: 1.0513089895248413, train accuracy: 1.0
Iteration 225, loss: 0.9556519985198975, train accuracy: 1.0


## Building an OctConv Network with BatchNorm

In [9]:
model = get_stacked_4BN(alpha, freq_ratio, hidden_channels, C, H, W, D_out)
for name, param in list(model.named_parameters())[:10]:
    if param.requires_grad:
        print(name)

11.weight
11.bias


In [10]:
# Overfit on our fake dataset
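# BatchNorm speeds up training: the loss reaches ~0.94 by iteration 125 here,
# versus ~0.96 at iteration 225 without it, despite a 10x smaller learning rate
# This training code is shamelessly adapted from Justin Johnson's PyTorch examples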
model = model.to(device=device)
x = x.to(device=device, dtype=dtype)
y = y.to(device=device, dtype=torch.long)

model.train()
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(250):
    y_pred = model(x)
    
    loss = F.cross_entropy(y_pred, y)
    if t % 25 == 0:
        _, class_preds = torch.max(y_pred, 1)
        correct = (class_preds == y).sum()
        print("Iteration {}, loss: {}, train accuracy: {}".format(t, loss.item(), float(correct) / len(y)))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

model.eval()
y_pred = model(x)

Iteration 0, loss: 2.3677077293395996, train accuracy: 0.078125
Iteration 25, loss: 1.9777846336364746, train accuracy: 0.421875
Iteration 50, loss: 1.6550426483154297, train accuracy: 0.75
Iteration 75, loss: 1.373095989227295, train accuracy: 0.9375
Iteration 100, loss: 1.1338887214660645, train accuracy: 0.984375
Iteration 125, loss: 0.935090959072113, train accuracy: 1.0
Iteration 150, loss: 0.7725397348403931, train accuracy: 1.0
Iteration 175, loss: 0.6412968039512634, train accuracy: 1.0
Iteration 200, loss: 0.5361815094947815, train accuracy: 1.0
Iteration 225, loss: 0.4522348642349243, train accuracy: 1.0
