# Building a Simple Feed-Forward NN for Image Classification

Practice session based on concepts and code explained in Chapter 2 of Programming PyTorch for Deep Learning (2019) by Ian Pointer.

In [1]:
import torch
import torch.nn as nn                     # building blocks for creating and training neural networks
import torch.optim as optim               # package implementing various optimization algorithms
import torch.utils.data                   # data loading utilities
import torch.nn.functional as F           # contains activation functions
import torchvision                        # package consisting of datasets, model architectures, and common image transformations for computer vision
from torchvision import transforms
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True    # allows PIL to load truncated image files instead of raising an error

## Setting up data loaders

In [2]:
def check_image(path):
    try:
        with Image.open(path):            # succeeds only if PIL can identify and open the file at path
            return True
    except Exception:                     # unreadable or non-image files are reported as invalid
        return False

GPUs are built to be fast at performing calculations on inputs of a standard size. However, the resolutions of the images that make up datasets usually vary, so we scale every incoming image to 64x64 via the Resize((64, 64)) transform. After that, images are converted to tensors and finally normalized using a specific set of per-channel means and standard deviations (in this case the mean and standard deviation of the ImageNet dataset as a whole). The 64x64 resolution is arbitrary and can be changed, although the input size of the first layer of the model (64 * 64 * 3 = 12288) would then have to change with it.

In [3]:
img_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std =[0.229, 0.224, 0.225])
    ])

Here we build the training, validation and test datasets using the torchvision class ImageFolder, which takes the path, any transforms we want to apply and a function that checks whether an image is valid. Note that a dataset is the totality of images used for a particular task (training, validation or testing).

In [4]:
train_data_path = "/Users/nikolavetnic/Desktop/Datasets/img_catfish/train"
train_data = torchvision.datasets.ImageFolder(
    root=train_data_path,
    transform=img_transforms,
    is_valid_file=check_image)

val_data_path = "/Users/nikolavetnic/Desktop/Datasets/img_catfish/val/"
val_data = torchvision.datasets.ImageFolder(
    root=val_data_path,
    transform=img_transforms,
    is_valid_file=check_image)

test_data_path = "/Users/nikolavetnic/Desktop/Datasets/img_catfish/test/"
test_data = torchvision.datasets.ImageFolder(
    root=test_data_path,
    transform=img_transforms,
    is_valid_file=check_image)

While the datasets give us a means of reaching the data we're supplying to the network, the data loader feeds the data into the network. An important parameter is the batch size, which defines how many images are fed through the network before its weights are updated; over one epoch the loader goes through the entire dataset batch by batch. Other parameters further define how the images are chosen from the set (for example, whether they are shuffled).

In [5]:
batch_size = 64

A simple definition of three data loaders, each taking nothing more than the dataset and the batch size.

In [6]:
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
val_data_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size)
test_data_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)
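The relationship between dataset size, batch size and the number of batches can be checked on a small stand-in dataset. The sketch below assumes 150 random tensors in place of real images (the names `demo_data` and `demo_loader` are made up for illustration) and uses the same batch size of 64.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical dataset: 150 random "images" of shape 3x64x64 with binary labels
num_samples = 150
demo_data = TensorDataset(torch.randn(num_samples, 3, 64, 64),
                          torch.randint(0, 2, (num_samples,)))
demo_loader = DataLoader(demo_data, batch_size=64)

# the number of batches per epoch is ceil(dataset size / batch size)
print(len(demo_loader), math.ceil(num_samples / 64))

# the last batch is smaller when the sizes do not divide evenly
batch_sizes = [inputs.size(0) for inputs, targets in demo_loader]
print(batch_sizes)                        # [64, 64, 22]
```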

Here we check if we can use the GPU for our calculations and if not we decide to use the CPU.

In [7]:
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

## Creating the model

The base class for all neural network models is torch.nn.Module, and every model should subclass it.

The forward() method of the model below calls `x.view(-1, 12288)` to flatten each image before the first fully connected layer. If you don't know how many rows you want but are sure of the number of columns, you can pass -1 for the rows (this extends to tensors with more dimensions, but only one of the dimensions can be -1). It is a way of telling the library: "give me a tensor that has this many columns and compute the number of rows needed to make that happen". Here each batch arrives with shape (batch size, 3, 64, 64); we ask PyTorch for 12288 columns (3 * 64 * 64) and let it work out the number of rows, which ends up being the batch size.
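A minimal sketch of this reshaping, using random tensors as stand-ins for real image batches (the batch size of 4 is an arbitrary assumption):

```python
import torch

# hypothetical batch of 4 RGB images at 64x64, shaped like a batch from the data loader
batch = torch.randn(4, 3, 64, 64)

flat = batch.view(-1, 12288)              # 12288 columns; -1 lets PyTorch infer the rows
print(flat.shape)                         # torch.Size([4, 12288])

# a single image without a batch dimension becomes one row
single = torch.randn(3, 64, 64).view(-1, 12288)
print(single.shape)                       # torch.Size([1, 12288])
```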

In [8]:
class SimpleNet(nn.Module):

    def __init__(self):                   # complete setup is done in __init__ (calling superclass constructor and three fc layers)
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(12288, 84)   # fc - fully connected layer, called 'Linear' in PyTorch
        self.fc2 = nn.Linear(84, 50)      # fc - fully connected layer, called 'Linear' in PyTorch
        self.fc3 = nn.Linear(50, 2)       # fc - fully connected layer, called 'Linear' in PyTorch

    def forward(self, x):                 # describes how data flows through the network in both training and making predictions
        x = x.view(-1, 12288)             # convert to 1D vector, 64 * 64 * RGB = 12288
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [9]:
simplenet = SimpleNet()

Here we create an Adam optimizer with a learning rate of 0.001. Although this particular value works well in many cases, there are ways of finding a better one, which will be explained in follow-up notebooks.

In [10]:
optimizer = optim.Adam(simplenet.parameters(), lr=0.001)

Here we copy the model to our previously selected device - GPU or CPU.

In [11]:
simplenet.to(device)

SimpleNet(
  (fc1): Linear(in_features=12288, out_features=84, bias=True)
  (fc2): Linear(in_features=84, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

## Training the model

Data loaders process the entire dataset in each epoch by passing as many batches as needed. How many batches are created is determined by the dataset size and the batch size.

`torch.eq()` method computes element-wise equality and returns a `torch.BoolTensor` containing a `True` at each location where comparison is true: `torch.eq(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]]))`

`torch.max()` returns a namedtuple `(values, indices)` where values is the maximum value of each row of the input tensor in the given dimension `dim`, and indices is the index location of each maximum value found (argmax): `torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)`

So, what happens in the validation part of the training function is this (reading the line from the inside out). `F.softmax()` with `dim=1` turns the model's raw outputs into probabilities across the classes of each image; it acts as the activation function of the output layer, which is why we left it out of the `forward()` method when defining the model class (`CrossEntropyLoss` expects raw outputs). `torch.max()` with `dim=1` then finds the maximum along the class dimension for each image in the batch, and the `[1]` index selects the index of that maximum (i.e. the predicted label) rather than the maximum value itself. Finally, `torch.eq()` compares the predicted labels with the targets acquired from the data loader and returns a boolean tensor. In short, this line compares model predictions to the actual labels of the data fed.
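The same chain of calls can be traced on a hand-written example. The outputs and targets below are made-up values for a batch of four images and two classes, chosen so that one prediction is wrong.

```python
import torch
import torch.nn.functional as F

# hypothetical raw model outputs for a batch of 4 images and 2 classes
output = torch.tensor([[ 2.0, -1.0],
                       [ 0.5,  1.5],
                       [-0.3,  0.2],
                       [ 1.0,  0.0]])
targets = torch.tensor([0, 1, 0, 0])      # hypothetical ground-truth labels

probs = F.softmax(output, dim=1)          # each row now sums to 1
predicted = torch.max(probs, dim=1)[1]    # index of the largest probability per row
correct = torch.eq(predicted, targets)    # True where prediction and target agree

print(predicted)                          # tensor([0, 1, 1, 0])
print(correct)                            # only the third entry is False
print(torch.sum(correct).item() / correct.shape[0])   # 0.75
```

Because softmax preserves the ordering of the values within a row, taking the maximum of the raw outputs would select the same labels; the softmax step matters only when the probabilities themselves are of interest.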

In [17]:
def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device="cpu"):

    for epoch in range(epochs):

        training_loss = 0.0               # losses should be set to zero on each epoch's start
        valid_loss = 0.0

        # during training we run the data through the model, calculate the loss and adjust the weights accordingly

        model.train()
        for batch in train_loader:        # we take a batch from our training set on every iteration of the loop, which is handled by our data loader
            optimizer.zero_grad()         # the calculated gradients accumulate by default, we want to reset them to zero on each run
            inputs, targets = batch
            inputs = inputs.to(device)
            targets = targets.to(device)
            output = model(inputs)            # we then run those through our model...
            loss = loss_fn(output, targets)   # ...and compute the loss from the expected output
            loss.backward()                   # to compute the gradients, we call the backward() method on the calculated loss
            optimizer.step()                  # uses those gradients afterward to perform the adjustment of the weights
            training_loss += loss.item() * inputs.size(0)   # weight the batch's mean loss by the batch size
    
        training_loss /= len(train_loader.dataset)

        # during validation we run validation data through the model and measure how good it is

        model.eval()
        num_correct = 0
        num_examples = 0
        with torch.no_grad():             # gradients are not needed for validation
            for batch in val_loader:
                inputs, targets = batch
                inputs = inputs.to(device)
                targets = targets.to(device)
                output = model(inputs)
                loss = loss_fn(output, targets)
                valid_loss += loss.item() * inputs.size(0)
                correct = torch.eq(torch.max(F.softmax(output, dim=1), dim=1)[1], targets)
                num_correct += torch.sum(correct).item()
                num_examples += correct.shape[0]
    
        valid_loss /= len(val_loader.dataset)

        print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'
            .format(epoch, training_loss, valid_loss, num_correct / num_examples))
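The reason for resetting the gradients on every iteration can be seen on a single made-up weight: calling `backward()` twice without a reset adds the second gradient to the first. This is only an illustration with a bare tensor; in the training function the reset is left to `optimizer.zero_grad()`.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # a single hypothetical weight

(w ** 2).backward()                        # d(w^2)/dw = 2w = 4
first = w.grad.item()

(w ** 2).backward()                        # without a reset, the new gradient is added on top
accumulated = w.grad.item()

w.grad.zero_()                             # manual reset of the stored gradient
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)     # 4.0 8.0 4.0
```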

In [21]:
train(simplenet, optimizer, torch.nn.CrossEntropyLoss(), train_data_loader, val_data_loader, epochs=10, device=device)

Epoch: 0, Training Loss: 0.05, Validation Loss: 0.79, accuracy = 0.74
Epoch: 1, Training Loss: 0.04, Validation Loss: 0.79, accuracy = 0.75
Epoch: 2, Training Loss: 0.03, Validation Loss: 0.84, accuracy = 0.76
Epoch: 3, Training Loss: 0.03, Validation Loss: 0.88, accuracy = 0.76
Epoch: 4, Training Loss: 0.02, Validation Loss: 0.85, accuracy = 0.75
Epoch: 5, Training Loss: 0.02, Validation Loss: 0.86, accuracy = 0.75
Epoch: 6, Training Loss: 0.01, Validation Loss: 0.89, accuracy = 0.75
Epoch: 7, Training Loss: 0.01, Validation Loss: 0.90, accuracy = 0.75
Epoch: 8, Training Loss: 0.01, Validation Loss: 0.95, accuracy = 0.75
Epoch: 9, Training Loss: 0.01, Validation Loss: 1.04, accuracy = 0.76


## Making predictions

Because ImageFolder assigns labels in alphanumeric order of the class folder names, cat will be `0` and fish `1`. We open the image we would like to make a prediction for, apply the transforms and copy it to the device.

In [24]:
labels = ['cat', 'fish']

img = Image.open("/Users/nikolavetnic/Desktop/Datasets/img_catfish/test/fish/2226705269_0234008814.jpg")
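img = img_transforms(img).to(device)
img = torch.unsqueeze(img, 0)             # add a batch dimension, as the model works on batches of images

prediction = F.softmax(simplenet(img), dim=1)
prediction = prediction.argmax().item()   # index of the most probable class as a plain int
print(labels[prediction])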

fish
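The label order used above can be sketched without touching the file system. ImageFolder sorts the class folder names and numbers them from zero; the snippet below mimics that with plain Python, and the folder names are assumptions mirroring this dataset's layout.

```python
# hypothetical class folder names, listed in arbitrary order
class_folders = ["fish", "cat"]

classes = sorted(class_folders)                          # alphanumeric order
class_to_idx = {name: idx for idx, name in enumerate(classes)}

print(classes)                                           # ['cat', 'fish']
print(class_to_idx)                                      # {'cat': 0, 'fish': 1}
```

With the real dataset, `train_data.classes` should give the same list, which avoids writing `labels` out by hand.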


## Saving models

We can either save the entire model by passing it to `torch.save()`, or save just its parameters by passing the model's `state_dict()` instead. The latter is normally preferable, as it allows you to reuse parameters even if the model's structure changes (or apply parameters from one model to another).
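A minimal sketch of the parameter round trip, assuming a tiny `nn.Linear` layer as a stand-in for SimpleNet and an in-memory buffer instead of a file on disk (both are choices made for this illustration only):

```python
import io
import torch
import torch.nn as nn

source = nn.Linear(4, 2)                  # hypothetical tiny model standing in for SimpleNet
buffer = io.BytesIO()                     # in-memory file, so nothing is written to disk
torch.save(source.state_dict(), buffer)

buffer.seek(0)                            # rewind before reading the saved bytes back
restored = nn.Linear(4, 2)                # fresh instance with different random weights
restored.load_state_dict(torch.load(buffer))

x = torch.randn(3, 4)
same_output = torch.equal(source(x), restored(x))
print(same_output)                        # True
print(list(source.state_dict().keys()))   # ['weight', 'bias']
```

The state dict is just a mapping from parameter names to tensors, which is why it can be loaded into any model instance whose layers have matching names and shapes.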

In [15]:
# saving the entire model
torch.save(simplenet, "/Users/nikolavetnic/Desktop/Datasets/img_catfish/simplenet_model")
simplenet = torch.load("/Users/nikolavetnic/Desktop/Datasets/img_catfish/simplenet_model")

In [17]:
# saving the parameters using state_dict
torch.save(simplenet.state_dict(), "/Users/nikolavetnic/Desktop/Datasets/img_catfish/simplenet_state_dict")
simplenet = SimpleNet()
simplenet_state_dict = torch.load("/Users/nikolavetnic/Desktop/Datasets/img_catfish/simplenet_state_dict")
simplenet.load_state_dict(simplenet_state_dict)

<All keys matched successfully>