# Working with Images

In [6]:
import numpy as np
import torch
import os
torch.set_printoptions(edgeitems=2, threshold=50)

## Load Image File

In [3]:
import imageio.v2 as imageio

img_arr = imageio.imread('../../../images/p2ch4/bobby.jpg')
img_arr.shape

(720, 1280, 3)

## Changing the layout (`permute`)

PyTorch modules that work with image data expect tensors laid out as `C × H × W`: channels, height, and width, respectively. The array loaded above is `H × W × C`, so we reorder its dimensions with `permute`.

In [4]:
img = torch.from_numpy(img_arr)
out = img.permute(2, 0, 1)
out.shape

torch.Size([3, 720, 1280])

`out` uses the same underlying storage as `img`; `permute` only changes the size and stride information at the tensor level. This makes the operation very cheap, since no pixel data is copied, but it also means that changing a pixel in `img` changes `out` as well.
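The storage sharing described above can be checked directly. The following is a minimal sketch on a small synthetic tensor (a stand-in for the real image, so no file is needed); the variable names `shares_storage` and `changed` are illustrative only.

```python
import torch

# Synthetic stand-in for an H x W x C image: 2 x 3 pixels, 3 channels.
img = torch.arange(18, dtype=torch.uint8).reshape(2, 3, 3)
out = img.permute(2, 0, 1)  # C x H x W view of the same data

# Both tensors start at the same memory address.
shares_storage = img.data_ptr() == out.data_ptr()

# Only the shape and stride metadata differ.
print(img.shape, img.stride())
print(out.shape, out.stride())

# A write through one tensor is visible through the other.
img[0, 0, 0] = 255
changed = out[0, 0, 0].item()
print(shares_storage, changed)
```

If an independent copy in the new layout is needed, calling `out.contiguous()` or `out.clone()` allocates fresh storage.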

## Load Image as Batches

In [13]:
data_dir = '../../../images/p2ch4/image-cats'
png_files = [f for f in os.listdir(data_dir) if f.endswith('.png')]
batch = torch.zeros(len(png_files), 3, 256, 256, dtype=torch.uint8)
batch.shape

torch.Size([3, 3, 256, 256])

In [12]:
for i, filename in enumerate(png_files):
    img_arr = imageio.imread(os.path.join(data_dir, filename))
    img_t = torch.from_numpy(img_arr)
    img_t = img_t.permute(2, 0, 1)
    img_t = img_t[:3] # <1> Keep 1st three channels
    batch[i] = img_t

`#1` Here we keep only the first three channels. Sometimes images also have an alpha channel indicating transparency, but our network only wants RGB input.
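To see the channel slicing in isolation, here is a small sketch using a hypothetical 4 × 4 RGBA array in place of a PNG loaded from disk. It assumes the image has at least three channels; a grayscale image loaded as a 2D array would need separate handling before `permute(2, 0, 1)`.

```python
import numpy as np
import torch

# Hypothetical 4 x 4 RGBA image standing in for a PNG with transparency.
rgba_arr = np.zeros((4, 4, 4), dtype=np.uint8)
rgba_arr[..., 3] = 255  # fully opaque alpha channel

img_t = torch.from_numpy(rgba_arr).permute(2, 0, 1)  # 4 x H x W
rgb_t = img_t[:3]  # keep R, G, B; drop the alpha channel

print(img_t.shape, rgb_t.shape)
```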

## Normalizing the data

Neural networks exhibit the best training performance when the input data ranges roughly from 0 to 1, or from -1 to 1 (this is an effect of how their building blocks are defined).

In [None]:
batch = batch.float()

In [16]:
img_t.dtype

torch.uint8

### Simple

The simplest option is to divide the pixel values by 255 (the maximum number representable as an 8-bit unsigned integer):

In [17]:
batch /= 255.0
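As a quick sanity check of this scaling, the sketch below applies it to three hand-picked pixel values rather than to the real batch. The conversion to float comes first because the result of the division is not representable in `uint8`.

```python
import torch

# Three sample pixel intensities: darkest, mid-gray, brightest.
pixels = torch.tensor([0, 128, 255], dtype=torch.uint8)

# Convert to floating point, then map [0, 255] onto [0, 1].
scaled = pixels.float() / 255.0

lo, hi = scaled.min().item(), scaled.max().item()
print(scaled.dtype, lo, hi)
```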

### Standardization

Another option is to compute the mean and standard deviation of the input data and scale it so that the output has zero mean and unit standard deviation across each channel, a technique commonly known as standardization:

In [18]:
n_channels = batch.shape[1]
for c in range(n_channels):
    mean = torch.mean(batch[:, c])
    std = torch.std(batch[:, c])
    batch[:, c] = (batch[:, c] - mean) / std

Here we normalize just a single batch of images because we do not yet know how to operate on an entire dataset. When working with images, it is good practice to compute the mean and standard deviation over all the training data in advance, and then subtract and divide by these fixed, precomputed quantities.
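One way to sketch the precomputed approach is shown below. It uses random data as a hypothetical stand-in for the training images, and the helper name `standardize` is made up for illustration. The per-channel statistics are computed once over the batch, height, and width dimensions, then reused through broadcasting instead of a per-channel loop.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for training images: N x C x H x W, values in [0, 1].
train = torch.rand(8, 3, 16, 16)

# Per-channel statistics over batch, height, and width.
# keepdim=True yields shape 1 x C x 1 x 1, which broadcasts against a batch.
mean = train.mean(dim=(0, 2, 3), keepdim=True)
std = train.std(dim=(0, 2, 3), keepdim=True)

def standardize(batch):
    """Apply the fixed, precomputed per-channel mean and std."""
    return (batch - mean) / std

train_std = standardize(train)
print(train_std.mean(dim=(0, 2, 3)), train_std.std(dim=(0, 2, 3)))
```

The same `mean` and `std` would then be applied unchanged to validation and test images, so that every input is scaled consistently.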