# Day 19 - Real-world data representation using tensors

## Working with images

* Images are represented either with a single scalar per grid point (pixel), in which case they are grayscale, or with multiple scalars per pixel, which typically encode colors or other features
* These scalars are usually 8-bit integers, but medical and scientific equipment may work with higher precision, such as 12 or 16 bits

### Adding color channels

* The most common way to encode color is in RGB channels
* Each channel itself can be seen as an intensity map of that color in the image
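
* As a minimal sketch of the "intensity map" idea, the cell below builds a tiny synthetic RGB image (the pixel values are made up for illustration) and slices out each channel; the unweighted mean used for the grayscale version is just one simple choice

```python
import torch

# Synthetic 2x2 RGB image in H x W x C layout (values chosen for illustration)
img = torch.tensor(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=torch.uint8,
)

# Each channel is a 2D intensity map with the same height and width as the image
red = img[..., 0]
green = img[..., 1]
blue = img[..., 2]

# A naive grayscale version: unweighted mean over the channel dimension
gray = img.float().mean(dim=-1)

print(red)
print(gray.shape)
```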

### Loading an image file

* We can load images easily with `imageio`

In [1]:
import imageio.v2 as imageio

img_arr = imageio.imread("./DLPT/data/golden-retriever-dog.jpg")
img_arr.shape

(1280, 1920, 3)

* We can see that the dimensions here are height, width, and RGB channels, respectively
* PyTorch modules dealing with images expect $C\times H\times W$ instead

### Changing the layout

* Using the `permute` method, we can change the layout of the tensor to the desired $C \times H \times W$

In [2]:
import torch

img = torch.from_numpy(img_arr)
out = img.permute(2, 0, 1)

* This does not make a copy of `img`; `out` is simply a new view over the same data
* We store images as batches, so the tensor will be of shape $N \times C \times H \times W$
* As a slightly more efficient alternative to using `stack`, we can preallocate the batch tensor and then fill it with images

In [3]:
batch_size = 3
batch = torch.zeros(batch_size, 3, 256, 256, dtype=torch.uint8)
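
* The two points above can be checked on small synthetic tensors: that `permute` returns a view sharing storage with the original, and that preallocate-and-fill produces the same batch as `torch.stack`. The shapes and fill values here are arbitrary assumptions, not the images used in these notes

```python
import torch

# permute returns a view: both tensors share the same underlying storage
img = torch.zeros(4, 5, 3, dtype=torch.uint8)   # H x W x C
out = img.permute(2, 0, 1)                      # C x H x W
img[0, 0, 0] = 7                                # write through the original
shares_storage = out.data_ptr() == img.data_ptr()
print(shares_storage, out[0, 0, 0].item())      # the view sees the change

# Two ways to build an N x C x H x W batch from a list of C x H x W images
images = [torch.full((3, 4, 5), i, dtype=torch.uint8) for i in range(3)]

stacked = torch.stack(images)                   # allocate and copy in one call

prealloc = torch.zeros(len(images), 3, 4, 5, dtype=torch.uint8)
for i, im in enumerate(images):
    prealloc[i] = im                            # fill the preallocated batch

print(torch.equal(stacked, prealloc))
```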

* We now load all images from a directory into a tensor

In [4]:
import os

data_dir = "./DLPT/data/image-cats/"
filenames = [name for name in os.listdir(data_dir)
            if os.path.splitext(name)[-1] == ".png"]
for i, filename in enumerate(filenames):
    img_arr = imageio.imread(os.path.join(data_dir, filename))
    img_t = torch.from_numpy(img_arr)
    img_t = img_t.permute(2, 0, 1)
    img_t = img_t[:3] # Drop the alpha channel, as these are PNGs
    batch[i] = img_t
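
* To see what the `[:3]` slice does in isolation, here is a sketch on a synthetic array standing in for a decoded RGBA PNG (the array `rgba_arr` is hypothetical; no file is read)

```python
import numpy as np
import torch

# Hypothetical stand-in for a decoded RGBA PNG: H x W x 4, dtype uint8
rgba_arr = np.zeros((256, 256, 4), dtype=np.uint8)
rgba_arr[..., 3] = 255                                # fully opaque alpha

img_t = torch.from_numpy(rgba_arr).permute(2, 0, 1)   # 4 x 256 x 256
rgb_t = img_t[:3]                                     # keep R, G, B only

print(img_t.shape, rgb_t.shape)
```

* For an image that already has only three channels, the same slice leaves the tensor unchanged, so the loop can handle both cases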

### Normalizing the data

* Neural networks work best with floating-point input data roughly in the range from 0 to 1, or from -1 to 1
* One simple way to scale the input data is to divide it by the maximum possible value, which is 255 for 8-bit images

In [5]:
batch = batch.float()
batch /= 255.0
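
* A small sketch of both target ranges on a handful of made-up 8-bit values: dividing by 255 maps to $[0, 1]$, and a further affine rescale maps to $[-1, 1]$

```python
import torch

# A few representative 8-bit pixel values (chosen for illustration)
pixels = torch.tensor([0, 64, 128, 255], dtype=torch.uint8)

unit = pixels.float() / 255.0     # values in [0, 1]
signed = unit * 2.0 - 1.0         # values in [-1, 1]

print(unit.min().item(), unit.max().item())
print(signed.min().item(), signed.max().item())
```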

* A somewhat more principled way is to compute the mean and standard deviation, and normalize with these

In [6]:
n_channels = batch.shape[1]
for c in range(n_channels):
    mean = torch.mean(batch[:, c])
    std = torch.std(batch[:, c])
    batch[:, c] = (batch[:, c] - mean) / std

* If possible, it is good practice to calculate `mean` and `std` on the entire dataset, instead of a single batch
* There are many other image transformations, which we may need to apply so that our images conform to a specific network's input requirements
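
* The per-channel loop can also be written without a loop by reducing over the batch, height, and width dimensions at once. This is a sketch on random data (the batch below is synthetic), and it should give the same result as the loop version

```python
import torch

torch.manual_seed(0)
batch = torch.rand(3, 3, 8, 8)    # synthetic N x C x H x W batch

# Reduce over N, H and W; keepdim leaves shape (1, C, 1, 1) for broadcasting
mean = batch.mean(dim=(0, 2, 3), keepdim=True)
std = batch.std(dim=(0, 2, 3), keepdim=True)

normalized = (batch - mean) / std

print(mean.shape)
print(normalized.mean(dim=(0, 2, 3)))   # per-channel means, close to zero
```

* When statistics for the entire dataset are available, `mean` and `std` can be computed once and reused for every batch in the same way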

## 3D images: Volumetric data

* In areas like medical imaging involving CT (computed tomography), there is an extra dimension representing depth in 3D physical space
* In our tensor, we will place this depth dimension $D$ after the channel dimension: $N\times C\times D\times H\times W$
* In CT specifically, there is only one channel, representing density

### Loading a specialized format

* To read volumetric data, the `imageio` module provides `volread`

In [7]:
import imageio.v2 as imageio

dir_path = "./DLPT/data/volumetric-dicom/2-LUNG 3.0  B70f-04083" # I should grab my own MRI scan!
vol_arr = imageio.volread(dir_path, "DICOM")
vol_arr.shape

Reading DICOM (examining files): 99/99 files (100.0%)
  Found 1 correct series.
Reading DICOM (loading data): 99/99  (100.0%)


(99, 512, 512)

* We can see that the depth is 99, but we are missing the channel dimension that PyTorch expects

In [8]:
vol = torch.from_numpy(vol_arr).float()
vol = vol.unsqueeze(0)
vol.shape

torch.Size([1, 99, 512, 512])
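
* The tensor above is still a single volume. As a sketch of the remaining step, a second `unsqueeze` adds the batch dimension to reach $N\times C\times D\times H\times W$; the small zero-filled array below is a synthetic stand-in for the CT data

```python
import numpy as np
import torch

# Small synthetic stand-in for a CT volume in D x H x W layout
vol_arr = np.zeros((9, 16, 16), dtype=np.int16)

vol = torch.from_numpy(vol_arr).float()
vol = vol.unsqueeze(0)        # add the channel dimension: C x D x H x W
batch = vol.unsqueeze(0)      # add the batch dimension:   N x C x D x H x W

print(batch.shape)
```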