# Working with Images

Andreas Sünder

# Basics

In [11]:
import imageio.v3 as imageio
import torch

In [10]:
img_arr = imageio.imread("../data/p1ch4/image-dog/bobby.jpg")
img_arr.shape

(720, 1280, 3)

This NumPy array-like object currently has three dimensions: two spatial dimensions (height and width, in that order) as well as a third dimension corresponding to the RGB channels, giving an $H \times W \times C$ layout. To change the layout of this array, we can use methods of the PyTorch library:

In [18]:
img = torch.from_numpy(img_arr)
out = img.permute(2, 0, 1)

In [19]:
out.shape

torch.Size([3, 720, 1280])
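
One detail worth checking is that `permute` does not copy the pixel data; it returns a view that reuses the original storage with different strides. The following is a minimal sketch using a small synthetic tensor (the 2 x 3 "image" and all variable names are made up for illustration, not taken from the dataset above):

```python
import torch

# Hypothetical tiny "image": 2 rows, 3 columns, 3 channels (H x W x C).
img = torch.arange(2 * 3 * 3, dtype=torch.uint8).reshape(2, 3, 3)

# Reorder the dimensions to C x H x W, as done for the dog image above.
out = img.permute(2, 0, 1)

shape_out = tuple(out.shape)                        # (3, 2, 3)
shares_storage = out.data_ptr() == img.data_ptr()   # same underlying memory
is_contig = out.is_contiguous()                     # strides no longer row-major

# Because the memory is shared, a write through `img` is visible in `out`.
img[0, 0, 0] = 255
reflected = out[0, 0, 0].item() == 255

# If an independent, densely laid-out copy is needed, request one explicitly.
out_copy = out.contiguous()
```

If later code needs a tensor that is independent of `img`, calling `.contiguous()` or `.clone()` on the permuted result is one way to get it.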

## Using a batch of images

At this point we have successfully created a $C \times H \times W$ tensor. Next, we can create a dataset of multiple images by preallocating a tensor of a specific size and filling it with images:

In [20]:
BATCH_SIZE = 3

In [21]:
batch = torch.zeros(BATCH_SIZE, 3, 256, 256, dtype=torch.uint8)

This creates a tensor that can hold three RGB images, each 256 pixels in height and 256 pixels in width. By specifying the `dtype` parameter as `torch.uint8`, we expect each color value to be represented as an 8-bit unsigned integer (0 to 255).
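
To make the effect of the `dtype` choice concrete, the sketch below compares the memory footprint of the same batch shape stored as `torch.uint8` and as the default `torch.float32`. The names `batch_u8` and `batch_f32` are illustrative only:

```python
import torch

BATCH_SIZE = 3

# Same shape as above: N x C x H x W, one byte per color value.
batch_u8 = torch.zeros(BATCH_SIZE, 3, 256, 256, dtype=torch.uint8)

# For comparison, the same shape with PyTorch's default 32-bit floats.
batch_f32 = torch.zeros(BATCH_SIZE, 3, 256, 256)

# Total size in bytes = bytes per element * number of elements.
bytes_u8 = batch_u8.element_size() * batch_u8.nelement()
bytes_f32 = batch_f32.element_size() * batch_f32.nelement()
```

With these shapes the 8-bit batch occupies a quarter of the memory of the 32-bit one, which is one reason to keep raw image data in `uint8` until a floating-point representation is actually needed.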

In [24]:
import os
DATA_DIR = "../data/p1ch4/image-cats/"

In [28]:
filenames = [name for name in os.listdir(DATA_DIR) if os.path.splitext(name)[-1] == ".png"]
for i, filename in enumerate(filenames):
    img_arr = imageio.imread(os.path.join(DATA_DIR, filename))
    img_t = torch.from_numpy(img_arr)
    img_t = img_t.permute(2, 0, 1)
    img_t = img_t[:3]  # Keep only the first three channels (RGB); drops e.g. an alpha channel
    batch[i] = img_t
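
The channel slicing in the loop above can be tried without any image files. This sketch assumes a PNG was decoded with a fourth (alpha) channel and uses a synthetic NumPy array in its place; the array contents and variable names are hypothetical:

```python
import numpy as np
import torch

# Stand-in for a decoded 256 x 256 RGBA image in H x W x C layout.
rgba = np.zeros((256, 256, 4), dtype=np.uint8)
rgba[..., 0] = 200   # red channel
rgba[..., 3] = 255   # alpha channel (fully opaque)

# Same steps as in the loop: wrap, reorder to C x H x W, keep RGB only.
img_t = torch.from_numpy(rgba).permute(2, 0, 1)
rgb = img_t[:3]

# Assigning into a preallocated batch requires matching shapes.
batch_demo = torch.zeros(1, 3, 256, 256, dtype=torch.uint8)
batch_demo[0] = rgb
```

Note that the assignment only works when each image already has the preallocated spatial size (here 256 x 256) and when there are no more files than slots in the batch; images of other sizes would need to be resized first.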

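Neural networks usually train best when input values lie roughly between 0 and 1, so a good strategy is to normalize our data. We first cast the batch to floating point and then divide by 255, the maximum value an 8-bit unsigned integer can take: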

In [37]:
batch = batch.float()
batch /= 255
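
The same scaling can be verified on a handful of values. This is a small sketch with made-up pixel values rather than the cat images; the cast to float comes first because, in recent PyTorch versions as far as I am aware, an in-place true division on an integer tensor is rejected:

```python
import torch

# Three hypothetical 8-bit pixel values: darkest, mid-gray, brightest.
pixels = torch.tensor([[0, 128, 255]], dtype=torch.uint8)

# Cast to 32-bit float, then map the range 0..255 onto 0..1.
scaled = pixels.float() / 255

lowest = scaled.min().item()
highest = scaled.max().item()
```

After this step `scaled` has dtype `torch.float32` and all values lie in the closed interval from 0 to 1.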

In [38]:
batch

tensor([[[[0.7922, 0.7569, 0.7451,  ..., 0.0510, 0.0510, 0.0471],
          [0.7804, 0.7529, 0.7412,  ..., 0.0549, 0.0549, 0.0549],
          [0.7765, 0.7569, 0.7373,  ..., 0.0471, 0.0471, 0.0471],
          ...,
          [0.3647, 0.3216, 0.2980,  ..., 0.1412, 0.1412, 0.1412],
          [0.2941, 0.2667, 0.3961,  ..., 0.1412, 0.1412, 0.1451],
          [0.3333, 0.4039, 0.3529,  ..., 0.1412, 0.1451, 0.1490]],

         [[0.5922, 0.5451, 0.5216,  ..., 0.0353, 0.0353, 0.0314],
          [0.5922, 0.5490, 0.5255,  ..., 0.0431, 0.0431, 0.0431],
          [0.5961, 0.5608, 0.5255,  ..., 0.0431, 0.0431, 0.0431],
          ...,
          [0.2235, 0.1765, 0.1529,  ..., 0.1020, 0.1020, 0.1020],
          [0.1294, 0.1020, 0.2314,  ..., 0.1020, 0.1020, 0.1059],
          [0.1569, 0.2275, 0.1765,  ..., 0.1020, 0.1059, 0.1098]],

         [[0.2667, 0.2078, 0.1725,  ..., 0.0235, 0.0235, 0.0196],
          [0.2627, 0.2118, 0.1725,  ..., 0.0235, 0.0235, 0.0235],
          [0.2627, 0.2196, 0.1725,  ..., 0