In [1]:
import numpy as np

Reading an image file:
- When working with images, we can read them into NumPy arrays with the uint8 (unsigned 8-bit integer) data type, which uses less memory than 16-bit, 32-bit, or 64-bit integer types.
- Unsigned 8-bit integers take values in the range [0, 255], which matches the value range of each color channel in a standard 8-bit RGB image, so this type is sufficient to store the pixel information.
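A quick NumPy check of both points above: the representable range of uint8, and how much memory one RGB image needs under different integer types. The 800 x 1200 x 3 shape is an assumption chosen to match the size of the example image loaded below; the arrays are just zeros.

```python
import numpy as np

# Range of values representable by an unsigned 8-bit integer
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

# Memory needed for one 800 x 1200 RGB image with different integer dtypes
shape = (800, 1200, 3)
for dtype in (np.uint8, np.int16, np.int32, np.int64):
    n_bytes = np.zeros(shape, dtype=dtype).nbytes  # bytes = element count * itemsize
    print(f'{np.dtype(dtype).name:>6}: {n_bytes:,} bytes')
```

Each doubling of the integer width doubles the footprint, so int64 needs 8 times the memory of uint8 for the same image.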

In [2]:
import torch
from torchvision.io import read_image

In [3]:
img = read_image('../../data/cat_dog_images/dog-01.jpg')

In [4]:
print('Image shape: ',img.shape)

Image shape:  torch.Size([3, 800, 1200])


In [5]:
print('Number of channels: ',img.shape[0])

Number of channels:  3


In [6]:
print('Image data type: ',img.dtype)

Image data type:  torch.uint8


Note: with torchvision, the input and output tensors are in the format of Tensor[channels, image_height, image_width]
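Because read_image returns a uint8 tensor in [channels, height, width] order, two conversions are common before plotting or training. A minimal sketch, using a random tensor as a hypothetical stand-in for the dog image loaded above: permute reorders the axes to [height, width, channels] (the layout matplotlib's imshow expects), and dividing by 255 gives floats in [0, 1], which also avoids the wrap-around that uint8 arithmetic can produce on overflow.

```python
import torch

# Hypothetical stand-in for the output of read_image: uint8, [channels, height, width]
img = torch.randint(0, 256, (3, 800, 1200), dtype=torch.uint8)

# Reorder axes to [height, width, channels] for plotting libraries
img_hwc = img.permute(1, 2, 0)
print(img_hwc.shape)  # torch.Size([800, 1200, 3])

# Convert to float32 and scale to [0, 1] before doing arithmetic on pixel values
img_float = img.float() / 255.0
print(img_float.dtype)  # torch.float32
```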

For multichannel images, the convolution is performed separately for each input channel, and the per-channel results are then added together element-wise (a matrix sum). The convolution associated with each channel c uses its own kernel matrix, W[:,:,c].
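The per-channel decomposition can be verified numerically. This sketch (random input, bias disabled so that only the kernel sum is compared) convolves each input channel with its own slice of the kernel and sums the results. Note that PyTorch stores Conv2d weights as [out_channels, in_channels, kH, kW], so the text's W[:,:,c] corresponds to conv.weight[:, c] here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1)
x = torch.randn(1, 3, 8, 8)  # one 3-channel 8x8 input
conv = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, bias=False)
out = conv(x)  # shape [1, 5, 6, 6]

# Convolve each input channel c alone with its own kernel slice
per_channel = [
    F.conv2d(x[:, c:c+1], conv.weight[:, c:c+1])  # [1, 5, 6, 6] each
    for c in range(x.shape[1])
]
# Element-wise sum over the input channels
manual = torch.stack(per_channel).sum(dim=0)
print(torch.allclose(out, manual, atol=1e-5))  # True
```

With bias enabled, the bias of each output channel would be added once to this sum, not once per input channel.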

**Regularizing a NN with L2 regularization and Dropout:**

In [7]:
import torch.nn as nn

In [8]:
loss_fn = nn.BCELoss()
loss = loss_fn(torch.tensor([0.9]), torch.tensor([1.0]))
loss

tensor(0.1054)

In [10]:
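l2_lambda = 0.001

# L2 penalty for a convolutional layer: lambda * sum of squared parameters
# (parameters() yields both the weights and the bias terms)
conv_layer = nn.Conv2d(in_channels = 3,
                       out_channels = 5,
                       kernel_size = 5)
l2_penalty = l2_lambda * sum(
    [(p**2).sum() for p in conv_layer.parameters()]
)
loss_with_penalty = loss + l2_penalty

# The same penalty for a fully connected layer
linear_layer = nn.Linear(10,16)
l2_penalty = l2_lambda * sum(
    [(p**2).sum() for p in linear_layer.parameters()]
)
loss_with_penalty = loss + l2_penalty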

In [13]:
# conv layer closer examination:
conv_layer = nn.Conv2d(in_channels = 3, out_channels=5, kernel_size = 5)

for param in conv_layer.parameters():
    print(param.shape)

torch.Size([5, 3, 5, 5])
torch.Size([5])


In [14]:
lin_layer = nn.Linear(10, 16)
for param in lin_layer.parameters():
    print(param.shape)

torch.Size([16, 10])
torch.Size([16])
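The shapes above fix each layer's parameter count: a Conv2d layer has out_channels * in_channels * kernel_height * kernel_width weights plus one bias per output channel, and a Linear layer has out_features * in_features weights plus out_features biases. A quick check (n_params is a hypothetical helper introduced here):

```python
import torch.nn as nn

conv_layer = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=5)
lin_layer = nn.Linear(10, 16)

def n_params(layer):
    # Total number of scalar parameters (weights + biases)
    return sum(p.numel() for p in layer.parameters())

print(n_params(conv_layer))  # 5*3*5*5 + 5 = 380
print(n_params(lin_layer))   # 16*10 + 16 = 176
```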


Weight decay v. L2 Regularization:
- An alternative way to use L2 regularization is to set the weight_decay parameter of a PyTorch optimizer to a positive value.
- While L2 regularization and weight decay are not strictly identical in general, it can be shown that they are equivalent when using stochastic gradient descent (SGD) optimizers.
- We can also use dropout, which is usually applied to the hidden units of higher layers. During the training phase of an NN, a fraction of the hidden units is randomly dropped at every iteration with probability p_drop. This dropout probability is chosen by the user; a common choice is p = 0.5. When a fraction of the units is dropped, the activations of the remaining units are rescaled to account for the missing (dropped) units (PyTorch's nn.Dropout multiplies them by 1/(1 - p) during training).
- The effect is that the network is forced to learn a redundant representation of the data. Because any set of hidden units may be turned off at any time during training, the network cannot rely on their activations and is forced to learn more general and robust patterns from the data.
- Random dropout can therefore effectively help prevent overfitting.
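A minimal sketch that checks two claims from the list above on a hypothetical toy model (an nn.Linear layer with random data and an MSE loss, none of which come from the original notebook). First, with plain SGD, PyTorch's weight_decay=wd adds wd * p to each gradient, which is the gradient of the penalty (wd / 2) * sum(p**2); so the l2_lambda * sum(p**2) penalty used earlier corresponds to weight_decay = 2 * l2_lambda. Second, nn.Dropout zeroes units only in training mode and rescales the kept ones by 1/(1 - p).

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
x = torch.randn(8, 10)
y = torch.randn(8, 16)
lr, wd = 0.1, 0.01
loss_fn = nn.MSELoss()

# --- 1) weight_decay vs. an explicit L2 penalty under plain SGD ---
model_a = nn.Linear(10, 16)
model_b = nn.Linear(10, 16)
model_b.load_state_dict(model_a.state_dict())  # identical starting parameters

# (a) built-in weight decay: SGD adds wd * p to each gradient
opt_a = torch.optim.SGD(model_a.parameters(), lr=lr, weight_decay=wd)
opt_a.zero_grad()
loss_fn(model_a(x), y).backward()
opt_a.step()

# (b) explicit penalty (wd / 2) * sum(p**2), whose gradient is also wd * p
opt_b = torch.optim.SGD(model_b.parameters(), lr=lr)
opt_b.zero_grad()
penalty = (wd / 2) * sum((p**2).sum() for p in model_b.parameters())
(loss_fn(model_b(x), y) + penalty).backward()
opt_b.step()

same = all(torch.allclose(pa, pb, atol=1e-6)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print('Same parameters after one SGD step:', same)  # True

# --- 2) dropout in training vs. evaluation mode ---
drop = nn.Dropout(p=0.5)
h = torch.ones(1, 10)
drop.train()
h_train = drop(h)  # each entry is 0 (dropped) or 1 / (1 - 0.5) = 2 (kept, rescaled)
drop.eval()
h_eval = drop(h)   # dropout disabled: output equals input
print(h_train)
print(torch.equal(h_eval, h))  # True
```

Remember to call model.train() and model.eval() on a real network, since dropout layers only drop units in training mode.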

**Loss functions:**

In [None]:
# binary cross-entropy
logits = torch.tensor([0.8])
probas = torch.sigmoid(logits)
target = torch.tensor([1.0])
bce_loss_fn = nn.BCELoss() # expects (predicted probabilities, target)
bce_logits_loss_fn = nn.BCEWithLogitsLoss() # expects (logits, target); applies the sigmoid internally
print(f'BCE (w Probas): {bce_loss_fn(probas, target):.4f}')
print(f'BCE (w Logits): {bce_logits_loss_fn(logits, target):.4f}')

BCE (w Probas): 0.3711
BCE (w Logits): 0.3711


In [16]:
## Categorical cross entropy loss
logits = torch.tensor([[1.5, 0.8, 2.1]])
probas = torch.softmax(logits, dim=1)
target = torch.tensor([2])
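cce_loss_fn = nn.NLLLoss() # expects (log-probabilities, class indices)
cce_logits_loss_fn = nn.CrossEntropyLoss() # expects (logits, class indices); applies log-softmax internally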
print(f'CCE (w Logits): {cce_logits_loss_fn(logits, target):.4f}')
print(f'CCE (w Probas): {cce_loss_fn(torch.log(probas),target):.4f}')

CCE (w Logits): 0.5996
CCE (w Probas): 0.5996
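To see what these loss objects compute, both values can be reproduced from their formulas: binary cross-entropy is -[y log(p) + (1 - y) log(1 - p)] with p = sigmoid(logit), and categorical cross-entropy is the negative log-softmax probability of the target class. A short sketch using the same inputs as the two cells above:

```python
import torch
import torch.nn.functional as F

# Binary cross-entropy by hand: -[y*log(p) + (1-y)*log(1-p)]
logit = torch.tensor([0.8])
y = torch.tensor([1.0])
p = torch.sigmoid(logit)
bce_manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

# Categorical cross-entropy by hand: -log(softmax(z)[target])
logits = torch.tensor([[1.5, 0.8, 2.1]])
target = torch.tensor([2])
log_probas = F.log_softmax(logits, dim=1)
cce_manual = -log_probas[0, target[0]]

print(f'BCE (manual): {bce_manual:.4f}')  # 0.3711
print(f'CCE (manual): {cce_manual:.4f}')  # 0.5996
```

In practice, the logits-based losses (BCEWithLogitsLoss, CrossEntropyLoss) are generally preferred because they combine the activation and the logarithm in a numerically stable way.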


**Implementing a deep CNN using PyTorch:**

**Loading and preprocessing the data:**
- First, we load the MNIST dataset using the torchvision module and construct the training, validation, and test sets, as before:

In [17]:
import torchvision 
from torchvision import transforms

In [19]:
from torch.utils.data import Subset

In [21]:
from torch.utils.data import DataLoader

In [18]:
image_path = '../../data/mnist'

transform = transforms.Compose([
    transforms.ToTensor()
])

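# download=False assumes the data already exists under image_path; set download=True on the first run otherwise
mnist_dataset = torchvision.datasets.MNIST(root=image_path, train=True, transform=transform, download=False)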

In [20]:
mnist_valid_dataset = Subset(mnist_dataset, torch.arange(10000))
mnist_train_dataset = Subset(mnist_dataset, torch.arange(10000, len(mnist_dataset)))
mnist_test_dataset = torchvision.datasets.MNIST(
    root = image_path, train=False, transform=transform, download=False
)

Next, we construct the data loaders with batches of 64 images for the training and validation sets, respectively:

In [22]:
batch_size = 64
torch.manual_seed(1)

train_dl = DataLoader(mnist_train_dataset, batch_size, shuffle=True)
valid_dl = DataLoader(mnist_valid_dataset, batch_size, shuffle=False)
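To inspect what these loaders yield without downloading MNIST, the same Subset split and DataLoader setup can be sketched on a small synthetic dataset. The random 1x28x28 "images" and the 1,000/100 sizes below are hypothetical stand-ins, not the real data.

```python
import torch
from torch.utils.data import TensorDataset, Subset, DataLoader

# Hypothetical stand-in for MNIST: 1,000 random 1x28x28 images with labels 0-9
torch.manual_seed(1)
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

# Same split pattern as above: first 100 examples for validation, the rest for training
valid_ds = Subset(dataset, torch.arange(100))
train_ds = Subset(dataset, torch.arange(100, len(dataset)))

train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=64, shuffle=False)

x_batch, y_batch = next(iter(train_dl))
print(x_batch.shape, y_batch.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])
print(len(train_dl), len(valid_dl))  # 15 2  (the last batch of each is smaller than 64)
```

With the real split above (50,000 training and 10,000 validation images), the same arithmetic gives 782 training batches and 157 validation batches per epoch.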