# Working with images
- Need to load images from common image formats & turn them into tensors
- Images are represented as a collection of scalars arranged on a regular grid
    - A single scalar per grid point gives a grayscale image
    - Multiple scalars per grid point can represent RGB color channels or other features, e.g. depth
    - Scalars are usually stored as 8-bit ints, but 12- or 16-bit ints are also used
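
To make the grid-of-scalars picture concrete, here is a minimal sketch that uses synthetic NumPy arrays instead of decoded files. The names `gray`, `rgb` and `deep` and the tiny 4 x 6 size are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Synthetic stand-ins for decoded images (no files are read here).
height, width = 4, 6

# Grayscale: one 8-bit scalar per grid point.
gray = np.zeros((height, width), dtype=np.uint8)

# RGB: three 8-bit scalars (red, green, blue) per grid point.
rgb = np.zeros((height, width, 3), dtype=np.uint8)
rgb[..., 0] = 255  # set the red channel to the largest 8-bit value

# Assumption: 12- or 16-bit data is held in a 16-bit integer container.
deep = np.zeros((height, width), dtype=np.uint16)

print(gray.shape, rgb.shape, deep.dtype)
```

The only difference between the three cases is the number of scalars per grid point and the integer width used to store each one.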

# Loading images 
- `imageio` library: handles different data types with a uniform API
- `torchvision` is generally a good alternative

In [1]:
import os

import torch
import imageio.v2
import numpy as np

In [2]:
img_arr = imageio.v2.imread('bobby.jpg') # Returns an n-d array: height, width, channels

In [3]:
img_arr.shape, type(img_arr) # NumPy array-like object: 2 spatial dimensions & a third dim for the RGB values
# Issue: PyTorch modules that deal with image data expect tensors laid out as channels x height x width

((720, 1280, 3), numpy.ndarray)

In [4]:
# Use permute to fix the layout
img = torch.from_numpy(img_arr)
# Move the color channels up to the first dim.
# The operation does not copy the tensor (it uses the same underlying storage), so it is cheap.
out = img.permute(2, 0, 1)
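
The claim that `permute` returns a view rather than a copy can be checked directly. This is a small sketch on a synthetic array (the 4 x 6 size and the variable names are assumptions chosen for illustration); it compares data pointers and writes through the original tensor.

```python
import numpy as np
import torch

# Synthetic H x W x C image, standing in for a decoded file.
img_arr = np.zeros((4, 6, 3), dtype=np.uint8)
img = torch.from_numpy(img_arr)   # shares memory with the NumPy array
out = img.permute(2, 0, 1)        # C x H x W view of the same storage

# Both tensors start at the same memory address: no data was copied.
same_storage = out.data_ptr() == img.data_ptr()

# A write through the original is visible through the permuted view.
img[0, 0, 1] = 200                          # row 0, column 0, green channel
value_seen_through_view = out[1, 0, 0].item()

# The view is not contiguous; .contiguous() makes a real copy when one is needed.
out_contig = out.contiguous()

print(same_storage, value_seen_through_view, out.is_contiguous())
```

Because the storage is shared, later in-place changes to `img` (or to `img_arr`) also change `out`; call `.contiguous()` or `.clone()` if an independent copy is required.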

# Creating a dataset with multiple images
- Stack the image tensors along a new first (batch) dim: N x C x H x W
- Slightly more efficient alternative to building the tensor with `torch.stack`: preallocate a tensor of the correct size, then fill it with the images
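
Both ways of building a batch can be compared on synthetic data. In this sketch the list `images`, the 8 x 8 size and the name `batch_demo` are hypothetical; the point is only that the two approaches produce the same N x C x H x W tensor.

```python
import torch

# Three synthetic C x H x W images, each filled with a different constant.
images = [torch.full((3, 8, 8), fill_value=i, dtype=torch.uint8) for i in range(3)]

# Approach 1: torch.stack adds a new leading dim and copies every image into a new tensor.
stacked = torch.stack(images)

# Approach 2: preallocate the batch once, then copy each image into its slot.
batch_demo = torch.zeros(len(images), 3, 8, 8, dtype=torch.uint8)
for i, image in enumerate(images):
    batch_demo[i] = image

print(stacked.shape, torch.equal(stacked, batch_demo))
```

Preallocating avoids keeping every image in a Python list before the batch exists, which matters more as the number of images grows.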

In [5]:
import os
data_dir = 'image-cats'

In [6]:
batch_size = 3
batch = torch.zeros(batch_size, 3, 256, 256, dtype=torch.uint8)

In [7]:
filenames = [name for name in os.listdir(data_dir) if os.path.splitext(name)[-1] == '.png']
# List comprehension: lists all the files in the specified dir & keeps only those that are PNGs
# os.path.splitext(name) splits the file name into a tuple (root, extension); [-1] picks the extension, which is compared against '.png'

In [8]:
for i, filename in enumerate(filenames):
    img_arr = imageio.v2.imread(os.path.join(data_dir, filename)) # use imageio to read in the images (use os to join the filename with the rest of the path)
    img = torch.from_numpy(img_arr)
    out = img.permute(2, 0, 1)[:3] # keep only the first 3 channels (discard any others, e.g. an alpha channel for transparency)
    batch[i] = out
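
The effect of the `[:3]` slice in the loop can be seen on a synthetic RGBA array. This is an illustrative sketch; `rgba_arr` and its 8 x 8 size are assumptions, and no file is read.

```python
import numpy as np
import torch

# Synthetic H x W x 4 image: RGB set to 0, alpha (4th channel) fully opaque.
rgba_arr = np.zeros((8, 8, 4), dtype=np.uint8)
rgba_arr[..., 3] = 255

img = torch.from_numpy(rgba_arr)
rgb_only = img.permute(2, 0, 1)[:3]  # C x H x W, then keep channels 0..2

print(img.permute(2, 0, 1).shape, rgb_only.shape)
```

For images that already have exactly three channels the slice changes nothing, so it is safe to apply to a mix of RGB and RGBA files.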

# Data normalization
- Neural networks typically train best when the input data ranges roughly from 0 to 1 or from -1 to 1
- Need to cast the tensors to floating point & normalize the pixel values
- There are a few options
    - Divide the pixel values by 255 (the maximum 8-bit value) -> values are bounded between 0 & 1
    - Compute the mean & standard deviation of the input per channel & scale so that each channel has 0 mean & unit standard deviation
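
The two options can be sketched on a synthetic batch without looping over channels. The tensor `raw`, its shape and the result names are assumptions for illustration; `mean` and `std` are reduced over the batch, height and width dims so that one statistic per channel remains.

```python
import torch

torch.manual_seed(0)
# Synthetic uint8 batch: N x C x H x W with values in 0..255.
raw = torch.randint(0, 256, (3, 3, 16, 16), dtype=torch.uint8)

# Option 1: cast to float and divide by the largest 8-bit value.
scaled = raw.float() / 255.0

# Option 2: per-channel standardization.
as_float = raw.float()
mean = as_float.mean(dim=(0, 2, 3), keepdim=True)  # shape 1 x C x 1 x 1
std = as_float.std(dim=(0, 2, 3), keepdim=True)    # shape 1 x C x 1 x 1
standardized = (as_float - mean) / std              # broadcasts over N, H, W

print(scaled.min().item(), scaled.max().item())
print(standardized.mean(dim=(0, 2, 3)), standardized.std(dim=(0, 2, 3)))
```

Keeping the reduced dims with `keepdim=True` lets broadcasting line the statistics up with the channel dim, which gives the same result as handling one channel at a time.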

In [9]:
# Option 1
batch = batch.float()
# batch /= 255  # uncomment to scale the values into the range 0 to 1

In [10]:
# Option 2
n_channels = batch.shape[1] # grab the number of channels (shape: batch size x num_channels x height x width)
for c in range(n_channels):
    mean = torch.mean(batch[:, c])
    std = torch.std(batch[:, c])
    batch[:, c] = (batch[:, c] - mean) / std

In [11]:
n_channels

3

In [12]:
batch

tensor([[[[ 0.1439,  0.0730, -0.4234,  ...,  0.0375,  0.0198,  0.1794],
          [ 0.4631, -0.2461,  0.3035,  ..., -0.4944, -0.2107, -0.1752],
          [-0.3703,  0.1439, -0.7249,  ..., -0.2993, -0.0866,  0.2858],
          ...,
          [-0.5653, -0.3171, -0.3348,  ..., -0.3703, -0.5298, -0.6362],
          [-0.3348, -0.3171, -0.4412,  ..., -0.5830, -0.4766, -0.6007],
          [-0.3348, -0.4412, -0.5298,  ..., -0.6185, -0.4766, -0.4944]],

         [[ 0.4632,  0.3874, -0.1058,  ...,  0.3874,  0.3874,  0.6150],
          [ 0.8615,  0.0839,  0.6529,  ..., -0.1816,  0.1408,  0.1787],
          [-0.0299,  0.4822, -0.4661,  ...,  0.0649,  0.2736,  0.7098],
          ...,
          [-0.2954, -0.0868, -0.0678,  ...,  0.0460, -0.1247, -0.2196],
          [-0.0678, -0.0678, -0.1627,  ..., -0.1627, -0.0489, -0.1816],
          [-0.0678, -0.2006, -0.2385,  ..., -0.2196, -0.0868, -0.0678]],

         [[ 0.7792,  0.6573,  0.1495,  ...,  0.8198,  0.8401,  1.1041],
          [ 1.3072,  0.3933,  

# 3D images
* Important in e.g. medical imaging applications such as CT scans:
    * A scan is a sequence of 2D images stacked along the head-to-foot axis, each one a slice at a different level of the body
    * In CT scans, the intensity represents the density of the different parts of the body
    * CT scans have only 1 intensity channel (like grayscale images)
    * Stacking the 2D slices on top of each other adds a third spatial dimension, giving a 3-dimensional image
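
The stacking described above can be sketched with random 2D slices in place of real scan data. The slice count, the 64 x 64 size and the variable names are assumptions; no medical image file is loaded here.

```python
import torch

# Ten synthetic single-channel slices (H x W), standing in for the slices of one scan.
slices = [torch.rand(64, 64) for _ in range(10)]

# Stack along a new first dim to get a volume: depth x height x width.
vol = torch.stack(slices)

# Add the single intensity channel: channels x depth x height x width.
vol = vol.unsqueeze(0)

# Add a batch dim to get the usual 5D layout: N x C x D x H x W.
vol_batch = vol.unsqueeze(0)

print(vol.shape, vol_batch.shape)
```

The layout mirrors the 2D case, with one extra depth dim placed after the channel dim.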

# Tabular data
- Load a CSV file into a NumPy array & turn it into a tensor

In [13]:
import csv

In [15]:
wine_path = 'tabular_wine.csv'
# Specify the file path, the dtype, the delimiter that separates the values & skip the first row (the header)
wine_np = np.loadtxt(wine_path, dtype=np.float32, delimiter=';', skiprows=1)
wine_np

array([[ 7.  ,  0.27,  0.36, ...,  0.45,  8.8 ,  6.  ],
       [ 6.3 ,  0.3 ,  0.34, ...,  0.49,  9.5 ,  6.  ],
       [ 8.1 ,  0.28,  0.4 , ...,  0.44, 10.1 ,  6.  ],
       ...,
       [ 6.5 ,  0.24,  0.19, ...,  0.46,  9.4 ,  6.  ],
       [ 5.5 ,  0.29,  0.3 , ...,  0.38, 12.8 ,  7.  ],
       [ 6.  ,  0.21,  0.38, ...,  0.32, 11.8 ,  6.  ]], dtype=float32)

In [17]:
wineq = torch.from_numpy(wine_np)
wineq.shape, wineq.dtype

(torch.Size([4898, 12]), torch.float32)