# Debugging AlexNet and ResNet

This notebook covers a few counterintuitive things about using real-world networks in practice:
* Why is my network very inaccurate?
* How to debug a network step-by-step.
* Where is my softmax?
* Why does it always have `grad_fn`?
* Are deeper networks bigger?


# Show setup code

In [None]:
%%bash
# If you are on Google Colab, this sets up everything needed.
# If not, you will want to pip install cs7150lib as shown below.
!(stat -t /usr/local/lib/*/dist-packages/google/colab > /dev/null 2>&1) && exit
pip install git+https://github.com/cs7150/cs7150lib@main

wget -N https://cs7150.baulab.info/2022-Fall/data/dog-and-cat-example.jpg
wget -N https://cs7150.baulab.info/2022-Fall/data/hungry-cat.jpg
wget -N https://cs7150.baulab.info/2022-Fall/data/imagenet-labels.txt


This defines some visualization functions.

In [None]:
import torch, os
import numpy
import PIL.Image
from matplotlib import cm  # colormaps used by rgb_heatmap
from torchvision.models import alexnet, resnet18
from torchvision.transforms import Compose, ToTensor, Normalize, Resize
from baukit import ImageFolderSet, show, renormalize, set_requires_grad
from torchvision.datasets.utils import download_and_extract_archive

if not os.path.isdir('imagenet10k'):
    download_and_extract_archive('https://cs7150.baulab.info/2022-Fall/data/imagenet10k.zip', 'imagenet10k')

def rgb_heatmap(
    data,
    size=None,
    colormap="hot",
    amax=None,
    amin=None,
    mode="bicubic",
    symmetric=False,
):
    size = spec_size(size)
    mapping = getattr(cm, colormap)
    scaled = torch.nn.functional.interpolate(data[None, None], size=size, mode=mode)[
        0, 0
    ]
    if amax is None:
        amax = data.max()
    if amin is None:
        amin = data.min()
    if symmetric:
        amax = max(amax, -amin)
        amin = min(amin, -amax)
    normed = (scaled - amin) / (amax - amin + 1e-10)
    return PIL.Image.fromarray((255 * mapping(normed)).astype("uint8"))


def rgb_threshold(data, size=None, mode="bicubic", p=0.2):
    size = spec_size(size)
    scaled = torch.nn.functional.interpolate(data[None, None], size=size, mode=mode)[
        0, 0
    ]
    ordered = scaled.view(-1).sort()[0]
    threshold = ordered[int(len(ordered) * (1 - p))]
    result = numpy.tile((scaled > threshold)[:, :, None], (1, 1, 3))
    return PIL.Image.fromarray((255 * result).astype("uint8"))


def overlay(im1, im2, alpha=0.5):
    import numpy

    return PIL.Image.fromarray(
        (
            numpy.array(im1)[..., :3] * alpha + numpy.array(im2)[..., :3] * (1 - alpha)
        ).astype("uint8")
    )


def overlay_threshold(im1, im2, alpha=0.5):
    import numpy

    return PIL.Image.fromarray(
        (
            numpy.array(im1)[..., :3] * (1 - numpy.array(im2)[..., :3] / 255) * alpha
            + numpy.array(im2)[..., :3] * (numpy.array(im1)[..., :3] / 255)
        ).astype("uint8")
    )


def spec_size(size):
    # Convert an int, tensor, or PIL image into an (H, W) size; default is 224x224.
    if isinstance(size, int):
        size = (size, size)
    if isinstance(size, torch.Tensor):
        size = size.shape[:2]
    if isinstance(size, PIL.Image.Image):
        size = (size.size[1], size.size[0])
    if size is None:
        size = (224, 224)
    return size


def resize_and_crop(im, d):
    if im.size[0] >= im.size[1]:
        im = im.resize((int(im.size[0] / im.size[1] * d), d))
        return im.crop(((im.size[0] - d) // 2, 0, (im.size[0] + d) // 2, d))
    else:
        im = im.resize((d, int(im.size[1] / im.size[0] * d)))
        return im.crop((0, (im.size[1] - d) // 2, d, (im.size[1] + d) // 2))

## 1. Load a pretrained alexnet

The code below loads a pretrained AlexNet, the famous network introduced by Alex Krizhevsky et al. in 2012.

Examine the network's layers.  Notice that `anet.features` is a stack of convolutions.

In [None]:
anet = alexnet(pretrained=True)
anet

## 2. Run a single image through AlexNet

First, load the first image in the training set.

In [None]:
preprocess = Compose([
    ToTensor(),
    Resize(227),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

ds = ImageFolderSet('imagenet10k/train', transform=preprocess, classification=True, shuffle=True)
image_tensor = ds[0][0]

show(renormalize.as_image(image_tensor, source=ds))
print('shape', image_tensor.shape)
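`Normalize` shifts and scales each color channel independently, computing `(x - mean) / std` with the ImageNet statistics given above, and `renormalize.as_image(..., source=ds)` has to undo this before the tensor can be displayed. A minimal sketch of both directions in plain `torch` (the helpers `normalize` and `denormalize` are hypothetical, not part of torchvision or baukit):

```python
import torch

# ImageNet channel statistics: the same values passed to Normalize above.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(x):
    """Map a [0, 1] image tensor of shape (3, H, W) to per-channel standardized values."""
    return (x - MEAN) / STD

def denormalize(x):
    """Invert normalize(): recover the original [0, 1] pixel values."""
    return x * STD + MEAN

pixels = torch.rand(3, 4, 5)  # a small random "image" with values in [0, 1]
z = normalize(pixels)
restored = denormalize(z)
print(torch.allclose(restored, pixels, atol=1e-6))
# A pure-white red-channel pixel becomes (1 - 0.485) / 0.229, about 2.25.
print(normalize(torch.ones(3, 1, 1))[0, 0, 0].item())
```

Note also that `Resize(227)` with a single integer scales the *shorter* side of the image to 227, so the printed shape is not necessarily square.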

Then run it through the network.  Try repeating this several times.

In [None]:
input_batch = image_tensor[None]
output = anet(input_batch)
value, index = output.max(dim=1)
print('Predicted class', index, ds.classes[index])
print('Predicted value', value)

## Questions to discuss

* Is the network outputting a probability?  How can I tell?
* What if I want one?
* What is the `grad_fn` business?  Do I want it?  When would I want it?
* Does the network give the same answer each time?
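One quick way to approach the first question: a probability distribution over classes has no negative entries and sums to 1 along the class dimension. A minimal sketch of that check (the helper `looks_like_probabilities` is hypothetical), applied to some made-up scores before and after `torch.softmax`:

```python
import torch

def looks_like_probabilities(scores, dim=1, tol=1e-5):
    """True if every entry is nonnegative and the entries sum to 1 along `dim`."""
    nonnegative = bool((scores >= 0).all())
    totals = scores.sum(dim=dim)
    sums_to_one = torch.allclose(totals, torch.ones_like(totals), atol=tol)
    return nonnegative and sums_to_one

logits = torch.tensor([[2.0, -1.0, 0.5]])  # made-up raw network outputs
probs = torch.softmax(logits, dim=1)

print(looks_like_probabilities(logits))  # False: a negative entry, and the sum is 1.5
print(looks_like_probabilities(probs))   # True
print(logits.argmax(dim=1).item(), probs.argmax(dim=1).item())  # 0 0
```

Because softmax is monotonic, it never changes which class has the highest score.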

## Let's fix it
1. Go through all the parameters of the network and mark them as `requires_grad=False`.
2. Debug the nondeterminism: run the first half of the network, then the second half, and figure out which layer is causing the problem.
3. Put the network in the right mode to fix the problem.
4. Finally, put a softmax around the whole thing.  Do we need the softmax to measure accuracy?
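These steps can be rehearsed on a small stand-in model before touching AlexNet. The sketch below assumes a toy `nn.Sequential` containing a `Dropout` layer (torchvision's AlexNet classifier also contains `Dropout` layers). The helper `first_nondeterministic_layer` is hypothetical; it scans layer prefixes one at a time, a simpler variant of the halving strategy in step 2.

```python
import torch
from torch import nn

torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 3))
x = torch.randn(1, 8)

# Step 1: freeze every parameter so outputs no longer carry a grad_fn.
for param in toy.parameters():
    param.requires_grad_(False)
print(toy(x).grad_fn)  # None

# Step 2: run longer and longer prefixes of the network twice each;
# the first prefix whose two runs disagree ends at the culprit layer.
def first_nondeterministic_layer(model, x):
    layers = list(model)
    for k in range(1, len(layers) + 1):
        prefix = nn.Sequential(*layers[:k])
        if not torch.equal(prefix(x), prefix(x)):
            return k - 1, layers[k - 1]
    return None

print(first_nondeterministic_layer(toy, x))  # expect the Dropout layer, at index 2

# Step 3: eval mode turns Dropout into the identity, so repeated runs agree.
toy.eval()
print(first_nondeterministic_layer(toy, x))  # None

# Step 4: softmax gives probabilities but leaves the argmax unchanged.
scores = toy(x)
probs = torch.softmax(scores, dim=1)
print(probs.sum().item(), torch.equal(scores.argmax(1), probs.argmax(1)))
```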


## 3. Test the accuracy of alexnet

The code below tests the accuracy of AlexNet on a small sample of ImageNet (downloaded by the setup code).

It shows the first 12 examples.  How does it do?

 1. Modify the code (remove the "break") so that it tests all 10k training examples. Speed things up by using the GPU/TPU.
 2. Now modify the code (change from the "/train" directory to the "/val" directory) to test it on held-out examples.

What is your impression of the accuracy of the model?

In [None]:
from baukit import pbar

examples = []
correct = 0
tested = 0
for i, (im, label) in enumerate(pbar(ds)):
    pred = anet(im[None]).argmax(1).item()
    if len(examples) < 12:
        examples.append([
            f'pred: {ds.classes[pred]}',
            f'true: {ds.classes[label]}',
            [renormalize.as_image(im, source=ds)]])
        if len(examples) == 12:
            show(show.WRAP, *[examples])
            break
    tested += 1
    if pred == label:
        correct += 1

print('correct:', correct, 'out of', tested)
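For exercise 1, evaluating one image at a time is slow. A minimal sketch of a batched accuracy loop, assuming a standard `torch.utils.data.DataLoader` and moving the work to a GPU when one is available (the function name `accuracy` is hypothetical). Note that it switches the model to eval mode and disables gradient tracking. Default batching also assumes every example has the same shape; with the `Resize(227)` transform above, images of different aspect ratios do not, so a fixed-size crop would be needed before passing `ds` directly. To keep the sketch runnable anywhere, it is demonstrated on a synthetic `TensorDataset` and a hand-set linear model:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, dataset, batch_size=100):
    """Fraction of examples whose argmax prediction matches the label."""
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device).eval()  # eval mode: no dropout, fixed batchnorm statistics
    correct = tested = 0
    with torch.no_grad():  # no grad_fn bookkeeping is needed just to test
        for images, labels in DataLoader(dataset, batch_size=batch_size):
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            tested += len(labels)
    return correct / tested

# Synthetic demo: the label is 1 exactly when the first feature is positive.
torch.manual_seed(0)
features = torch.randn(1000, 4)
labels = (features[:, 0] > 0).long()
model = torch.nn.Linear(4, 2)
with torch.no_grad():
    model.weight.zero_()
    model.bias.zero_()
    model.weight[1, 0] = 1.0  # class-1 score equals the first feature; class-0 score is 0
print(accuracy(model, TensorDataset(features, labels)))  # 1.0
```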

## 4. Now compare it with ResNet

In [None]:
rnet = resnet18(pretrained=True)
rnet

## Things to discuss
* This is resnet18.  Is it larger or smaller than AlexNet?
* Which one took longer to download?  Why?  Let's write the code to figure it out, like this:

```
num_params = 0
for n, t in anet.named_parameters():
    print(n, t.shape, 'param count:', t.numel())
    num_params += t.numel()
print('total params:', num_params)
```
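A rough rule for the download question: each `float32` parameter takes 4 bytes, so a checkpoint's size is dominated by the layers holding the most parameters. A `Linear(n_in, n_out)` layer holds `n_in * n_out + n_out` parameters, while a `Conv2d` layer's count depends only on its channel counts and kernel size, not on the image size. The sketch below (the helper `count_params` is hypothetical) compares AlexNet's first fully connected layer, `Linear(9216, 4096)`, with ResNet-18's final classifier, `Linear(512, 1000)`, and one of the 3×3, 512-channel convolutions in ResNet-18's last stage:

```python
import torch
from torch import nn

def count_params(module):
    """Total number of scalar parameters in a module."""
    return sum(p.numel() for p in module.parameters())

alexnet_fc = nn.Linear(9216, 4096)                            # AlexNet classifier[1]
resnet_fc = nn.Linear(512, 1000)                              # ResNet-18 fc
resnet_conv = nn.Conv2d(512, 512, 3, padding=1, bias=False)   # one layer4 convolution

for name, layer in [('alexnet_fc', alexnet_fc), ('resnet_fc', resnet_fc),
                    ('resnet_conv', resnet_conv)]:
    n = count_params(layer)
    print(f'{name}: {n:,} params, about {4 * n / 2**20:.1f} MiB as float32')
```

Running the loop above on both `anet` and `rnet` shows where each network's parameters are concentrated.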

Now let's try measuring the accuracy.  Run the code below.

* Isn't resnet supposed to be newer?  Why is it always wrong?
* Let's debug it.
* What input size is resnet supposed to have?  Let's fix it.

In [None]:
examples = []
correct = 0
tested = 0
for i, (im, label) in enumerate(pbar(ds)):
    pred = rnet(im[None]).argmax(1).item()
    if len(examples) < 12:
        examples.append([
            f'pred: {ds.classes[pred]}',
            f'true: {ds.classes[label]}',
            [renormalize.as_image(im, source=ds)]])
        if len(examples) == 12:
            show(show.WRAP, *[examples])
            break
    tested += 1
    if pred == label:
        correct += 1

print('correct:', correct, 'out of', tested)
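One thing worth checking while debugging: ResNet-18 contains many `BatchNorm2d` layers, which behave differently in training and evaluation mode. In training mode they normalize with the statistics of the current batch (here, a single image) and update their stored running averages as a side effect, so even running a pretrained ResNet in training mode alters it. In eval mode they use the stored running averages instead. A minimal sketch on standalone `BatchNorm2d` layers with a batch of one image:

```python
import torch
from torch import nn

torch.manual_seed(0)
image = 5 + 2 * torch.randn(1, 3, 8, 8)  # one image whose channels have mean ~5, std ~2

bn = nn.BatchNorm2d(3)  # fresh layer: running_mean = 0, running_var = 1
bn.train()
out_train = bn(image)
# Training mode: each channel is normalized with this image's own statistics.
print(out_train.mean(dim=(0, 2, 3)))  # close to 0 for every channel
print(bn.running_mean)                # moved from 0 toward the image mean (momentum 0.1)

bn2 = nn.BatchNorm2d(3).eval()
out_eval = bn2(image)
# Eval mode with the default running stats (mean 0, var 1) is nearly the identity.
print(torch.allclose(out_eval, image, atol=1e-3))  # True
```

For the input-size question, torchvision documents its pretrained ImageNet `resnet18` weights as expecting images resized to 256 and center-cropped to 224×224.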

## 5. Make a sliding-window salience heatmap for AlexNet and ResNet

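Here, if there is enough time, we will build a salience heatmap by hand using Matt Zeiler's masking (occlusion) technique: blank out one 32×32 patch of the input at a time and record how the class scores change.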

Try it with resnet also.  What do you have to fix?

In [None]:
net = anet  # set this to rnet to try the same experiment with ResNet

imgnum = 3
img, target_class = ds[imgnum]
pred = net(img[None]).argmax(1).item()
show(renormalize.as_image(img, source=ds))
print('shape', img.shape)
print('pred:', ds.classes[pred], pred)
print('true:', ds.classes[target_class], target_class)

# For every pixel, accumulate the scores of all runs whose occluding patch covered it.
H, W = img.shape[1:]
total_pred = torch.zeros(H, W)
total_true = torch.zeros(H, W)
weight = torch.zeros(H, W)
set_requires_grad(False, net)
for y in range(0, H, 8):
    for x in range(0, W, 8):
        inp = img.clone()
        inp[:, y:y+32, x:x+32] = 0  # blank out a 32x32 patch
        # out = torch.nn.functional.softmax(net(inp[None]), dim=1)
        out = net(inp[None])
        # print(y, x, out[0, target_class])
        total_pred[y:y+32, x:x+32] += out[0, pred]
        total_true[y:y+32, x:x+32] += out[0, target_class]
        weight[y:y+32, x:x+32] += 1

heatmap_pred = (total_pred / weight)
heatmap_pred = heatmap_pred - heatmap_pred.min()
heatmap_pred = heatmap_pred / heatmap_pred.max() * -2 + 1

heatmap_true = (total_true / weight)
heatmap_true = heatmap_true - heatmap_true.min()
heatmap_true = heatmap_true / heatmap_true.max() * -2 + 1

show([[
    ['input', [renormalize.as_image(img, source=ds)]],
    [ds.classes[pred], [renormalize.as_image(heatmap_pred[None])]],
    [ds.classes[target_class], [renormalize.as_image(heatmap_true[None])]]
    ]])
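The cell above can be packaged as a reusable function. The sketch below is one way to do it (the name `occlusion_map` and its parameters are hypothetical): it puts the model in eval mode, disables gradient tracking with `torch.no_grad()`, and averages the target-class score over every occluding patch that covers each pixel. It is checked on a toy "detector" whose score is just the sum of the image's top-left 8×8 corner, so occluding that corner should give the lowest values.

```python
import torch
from torch import nn

def occlusion_map(model, img, target, patch=32, stride=8, fill=0.0):
    """Average target-class score over all occluding patches covering each pixel.

    Assumes patch >= stride, so every pixel is covered at least once."""
    _, H, W = img.shape
    total = torch.zeros(H, W)
    weight = torch.zeros(H, W)
    model.eval()
    with torch.no_grad():
        for y in range(0, H, stride):
            for x in range(0, W, stride):
                inp = img.clone()
                inp[:, y:y + patch, x:x + patch] = fill
                score = model(inp[None])[0, target]
                total[y:y + patch, x:x + patch] += score
                weight[y:y + patch, x:x + patch] += 1
    return total / weight

class TopLeftDetector(nn.Module):
    """Toy two-class model: class 0 scores the sum of the top-left 8x8 corner."""
    def forward(self, x):
        s = x[:, :, :8, :8].sum(dim=(1, 2, 3))
        return torch.stack([s, -s], dim=1)

heat = occlusion_map(TopLeftDetector(), torch.ones(3, 32, 32), target=0, patch=8, stride=4)
print(heat[0, 0].item(), heat[31, 31].item())  # 0.0 192.0
```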

## 6. Try running resnet layers

Try running resnet layers one-at-a-time and making one channel very large partway through.

How does it behave?
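One way to start, as a sketch: treat the network as a list of stages, run them one at a time, and add a large constant to one channel right after a chosen stage. The helper `run_with_boost` is hypothetical; the comment lists the stages of a torchvision `ResNet` in the order its `forward()` applies them. The demo uses a tiny convolutional stand-in so it runs without downloading weights, and checks the effect arithmetically: boosting a channel before global average pooling shifts the final scores by exactly `amount` times that channel's column of the last linear layer's weights.

```python
import torch
from torch import nn

def run_with_boost(stages, x, boost_after=None, channel=0, amount=100.0):
    """Apply `stages` in order; right after stage index `boost_after`, add `amount` to one channel."""
    for i, stage in enumerate(stages):
        x = stage(x)
        if i == boost_after:
            x = x.clone()  # modify a copy rather than the tensor the stage returned
            x[:, channel] += amount
    return x

# For a torchvision ResNet `rnet` (in eval mode), the equivalent stage list would be:
# [rnet.conv1, rnet.bn1, rnet.relu, rnet.maxpool, rnet.layer1, rnet.layer2,
#  rnet.layer3, rnet.layer4, rnet.avgpool, nn.Flatten(), rnet.fc]

torch.manual_seed(0)
stages = [nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2)]
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    base = run_with_boost(stages, x)
    boosted = run_with_boost(stages, x, boost_after=1, channel=2, amount=100.0)
print(base)
print(boosted)
```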