In [2]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torchvision import transforms
from torchvision import models
from PIL import Image

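To run the AlexNet architecture on an input image, we first create an instance of the AlexNet class. Constructed this way, the network has randomly initialized weights, so its outputs are meaningless until it is trained or pretrained weights are loaded. This is how it’s done: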

In [3]:
alexnet = models.AlexNet()

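Now let’s create an instance of a ResNet using the resnet101 function. We’ll pass an argument that instructs the function to download the weights of resnet101 trained on the ImageNet dataset, which has 1.2 million images and 1,000 categories. Note that newer torchvision releases deprecate the pretrained argument in favor of a weights argument: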


In [4]:
resnet = models.resnet101(pretrained=True)



Let’s take a peek at what a resnet101 looks like. Evaluating the resnet variable in a cell prints a textual representation of the network's structure, layer by layer (the line is commented out below because the output is very long). For now, this will be information overload, but as we progress through this project, we’ll increase our ability to understand what this output is telling us.

In [5]:
#resnet

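Next, we define a preprocess function that will scale the input image so that its shorter side is 256 pixels (preserving the aspect ratio), crop the image to 224 × 224 around the center, transform it to a tensor (a PyTorch multidimensional array: in this case, a 3D array with color, height, and width), and normalize its RGB (red, green, blue) components so that they have defined means and standard deviations. We then load an image, run it through the pipeline, and use torch.unsqueeze to add a leading batch dimension, since the network expects a batch of images.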

In [9]:
preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )])
img = Image.open("data/project1/croco.jpeg")
img_t = preprocess(img)
batch_t = torch.unsqueeze(img_t, 0)

The process of running a trained model on new data is called inference in deep learning circles. To do inference, we need to put the network in eval mode, which switches layers such as dropout and batch normalization to their inference behavior.

In [45]:
resnet.eval()

out = resnet(batch_t)
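
The effect of eval mode is easiest to see on a small model. The sketch below uses a hypothetical two-layer network (`toy_net`) containing a dropout layer, and also wraps the forward pass in `torch.no_grad()`, a common companion to eval mode that skips gradient tracking during inference. It is an illustration of the idea, not part of the ResNet pipeline above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy network; dropout behaves differently in train and eval mode.
toy_net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

toy_net.eval()  # dropout becomes a no-op
with torch.no_grad():  # skip gradient tracking during inference
    out_a = toy_net(x)
    out_b = toy_net(x)

print(toy_net.training)           # False
print(torch.equal(out_a, out_b))  # True: eval-mode outputs are repeatable
print(out_a.requires_grad)        # False
```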

Let’s load the file containing the 1,000 labels for the ImageNet dataset classes:

In [47]:
with open('data/project1/imagenet_classes.txt') as f:
    labels = [line.strip() for line in f.readlines()]

We determine the index corresponding to the maximum score in the out tensor we obtained previously:

In [49]:
_, index = torch.max(out, 1)
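
As a small illustration of what `torch.max` returns when given a dimension, the sketch below applies it to a made-up score tensor (the values are hypothetical). The call returns both the maximum values and their positions, and the positions come back as a tensor rather than as plain Python integers.

```python
import torch

# Hypothetical scores for a batch of 2 inputs over 4 classes.
scores = torch.tensor([[0.1, 2.5, 0.3, 1.0],
                       [3.0, 0.2, 0.1, 0.4]])

# Reduce over dimension 1 (the class dimension).
values, index = torch.max(scores, 1)

print(index)            # tensor([1, 0])
print(index[0].item())  # 1, a plain Python int usable as a list index
```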

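We can now use the index to look up the label. Because index is a one-element tensor rather than a plain Python number, we use index[0] to index into the labels list. We also use torch.nn.functional.softmax (http://mng.bz/BYnq) to normalize our outputs: each score is exponentiated and divided by the sum of all exponentiated scores, so the results lie in the range [0, 1] and sum to 1. Multiplying by 100 expresses them as percentages: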

In [55]:
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100

print(labels[index[0]])
print(percentage[index[0]].item())

African crocodile, Nile crocodile, Crocodylus niloticus
99.9884262084961
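
The softmax normalization can be checked by hand on a tiny example. The sketch below uses made-up scores and compares `torch.nn.functional.softmax` with the same computation written out explicitly.

```python
import torch

# Hypothetical raw scores for one input over three classes.
scores = torch.tensor([[1.0, 2.0, 3.0]])
probs = torch.nn.functional.softmax(scores, dim=1)

# Same computation written out: exponentiate, then divide by the sum.
manual = torch.exp(scores) / torch.exp(scores).sum(dim=1, keepdim=True)

print(probs)
print(probs.sum(dim=1))               # sums to 1 along the class dimension
print(torch.allclose(probs, manual))  # True
```

Softmax preserves the ordering of the scores, so the class with the highest raw score also has the highest normalized value.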


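Since the model produces a score for every class, we can also find the second best, third best, and so on. To do this, we sort the scores in descending order with torch.sort, which also returns the indices of the sorted values in the original tensor, and keep the first five:

In [56]: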
_, indices = torch.sort(out, descending=True)
[(labels[idx], percentage[idx].item()) for idx in indices[0][:5]]

[('African crocodile, Nile crocodile, Crocodylus niloticus', 99.9884262084961),
 ('American alligator, Alligator mississipiensis', 0.010866689495742321),
 ('alligator lizard', 0.0003804856678470969),
 ('Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis',
  0.00010293831292074174),
 ('frilled lizard, Chlamydosaurus kingi', 2.4088680220302194e-05)]
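
When only the first few entries are needed, `torch.topk` is an alternative to sorting the whole tensor. The sketch below, on made-up scores, shows that it selects the same indices as a full descending sort followed by slicing; it is offered as an option, not as the approach used above.

```python
import torch

# Hypothetical scores for one input over six classes.
scores = torch.tensor([[0.2, 3.1, 1.5, 0.7, 2.4, 0.9]])

# Keep the 3 largest scores and their positions.
top_values, top_indices = torch.topk(scores, k=3, dim=1)

# Full descending sort, for comparison.
_, sorted_indices = torch.sort(scores, descending=True)

print(top_indices[0].tolist())  # [1, 4, 2]
print(torch.equal(top_indices, sorted_indices[:, :3]))  # True
```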