<img src="data/images/div/lecture-notebook-header.png" />

# Image Preprocessing

Image preprocessing is a fundamental stage in preparing image data for image analytics and machine learning tasks, especially in computer vision. It involves a suite of operations applied to images before they are fed into models, and this preparatory phase significantly affects the performance and effectiveness of those models.

The significance of image preprocessing lies in its ability to standardize, clean, and optimize raw image data. Normalization, for instance, ensures that pixel values across images are scaled uniformly, aiding in faster convergence during model training and preventing biases toward certain intensity ranges. Additionally, operations like noise reduction and cleaning eliminate irrelevant elements or artifacts, refining the data to help models extract relevant features more accurately. Resizing and rescaling ensure uniform image dimensions, reducing computational cost and giving models a consistent input format to learn from.

Overall, image preprocessing acts as a crucial enabler, enhancing the quality of input data for machine learning models. By preparing images systematically, it empowers models to learn more efficiently, generalize better to new data, and produce more reliable and precise outputs in various computer vision tasks.

Let's get started...

## Setting up the Notebook

### Importing Required Packages

`torchvision` is a popular library in the PyTorch ecosystem primarily used for computer vision tasks. It provides tools and utilities for image and video processing, including datasets, image transformations, pre-trained models, and common image-based operations. Here are some advantages of `torchvision`:

* **Datasets and Data Loaders:** `torchvision` offers easy access to standard computer vision datasets such as MNIST, CIFAR-10, and ImageNet. Its dataset classes work directly with PyTorch's `DataLoader` to efficiently load and preprocess data for training and testing neural networks.

* **Image Transformations:** It offers a wide range of image transformations (such as cropping, resizing, normalization, etc.) that can be applied to datasets during training or inference. These transformations help in augmenting data, improving model generalization, and preprocessing images for neural network input.

* **Pre-trained Models:** `torchvision` includes widely used pre-trained models such as ResNet, VGG, and AlexNet, trained on large datasets like ImageNet. These models can be easily loaded and fine-tuned for specific tasks, saving time and computational resources.

* **Utilities for Computer Vision:** It provides various utility functions for common computer vision tasks, such as image filtering, object detection, segmentation, and more. These utilities simplify the implementation of complex vision algorithms.

* **Integration with PyTorch:** Being a part of the PyTorch ecosystem, `torchvision` seamlessly integrates with other PyTorch functionalities, allowing for easy incorporation of computer vision components into deep learning workflows.

Overall, `torchvision` streamlines the development process for computer vision tasks by offering a set of pre-built tools, datasets, models, and transformations, which significantly simplifies the implementation and experimentation with deep learning models in PyTorch.

In this notebook, we focus on **Image Transformations**, as they cover the essential preprocessing steps required to prepare images for further analysis.

```python
import torch
from torchvision.transforms import v2
```

`torchvision.transforms` is a module within the `torchvision` library that offers a wide range of image transformations commonly used in computer vision tasks. These transformations can be applied to images or datasets to augment data, preprocess images, and prepare them for consumption by neural networks. The examples in this notebook use the `v2` API, which is the recommended version in recent `torchvision` releases.

The Python Imaging Library, commonly known as `PIL`, is a library for performing basic image processing tasks in Python. However, the original `PIL` library is no longer maintained and has not been updated since 2011. Its fork, `Pillow`, has taken its place as the actively maintained and widely used image processing library in Python. `Pillow` keeps the original package name, which is why it is still imported via `PIL` (e.g., `from PIL import Image`).

`Pillow` extends the capabilities of the original `PIL` library and provides a wide range of functionalities, including:

* **Image Opening and Saving:** `Pillow` allows you to open and save various image file formats, such as JPEG, PNG, BMP, TIFF, and more.

* **Image Manipulation:** It enables you to perform basic image manipulations like resizing, cropping, rotating, flipping, and transforming images.

* **Image Filtering:** `Pillow` provides a set of filters and enhancements like blurring, sharpening, edge detection, and applying various effects to images.

* **Color Space Manipulation:** You can convert images between different color spaces, such as RGB, grayscale, CMYK, etc.

* **Image Drawing:** `Pillow` allows you to draw on images, add text, shapes, and annotations.

* **Image Metadata Handling:** It supports handling image metadata, including EXIF data, allowing you to access and modify metadata information associated with images.

Pillow (the Python Imaging Library fork) is widely used in the Python ecosystem for tasks related to image processing, computer vision, web development, scientific computing, and more due to its ease of use and extensive capabilities in handling and manipulating images. Here, we only need it to open image files such as JPEGs.

```python
from PIL import Image

# IPython's display() renders PIL images inline in a Jupyter notebook
from IPython.display import display
```

---

## Load & Inspect Image

### Load PIL Image from File

The most important class in the Python Imaging Library is the `Image` class, defined in the module with the same name. You can create instances of this class in several ways: by loading images from files, by processing other images, or by creating images from scratch. To load an image from a file, use the `open()` function in the `Image` module:

```python
image = Image.open('data/images/examples/cruise-ship-01.jpg')
```

```python
display(image)
```

### Check Basic Information

We can now use instance attributes to examine the file contents:

```python
print('Image format: {}'.format(image.format))
print('Image size: {}'.format(image.size))
print('Image mode: {}'.format(image.mode))
```
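The `format` attribute is only set for images that were read from a file. The sketch below illustrates this, along with the other two ways of creating `Image` instances mentioned earlier: creating an image from scratch with `Image.new()` and deriving one by processing another (here, a grayscale conversion). The image size and color are arbitrary choices for illustration, and an in-memory buffer stands in for a real file.

```python
import io
from PIL import Image

# Create an image from scratch: 800x533 pixels (width x height), RGB, single color
blank = Image.new("RGB", (800, 533), color=(30, 144, 255))
print(blank.format, blank.size, blank.mode)   # None (800, 533) RGB

# Derive a new image by processing an existing one (here: grayscale conversion)
gray = blank.convert("L")
print(gray.mode)   # L

# Save to an in-memory JPEG "file" and load it back with Image.open()
buffer = io.BytesIO()
blank.save(buffer, format="JPEG")
buffer.seek(0)
reloaded = Image.open(buffer)
print(reloaded.format, reloaded.size, reloaded.mode)   # JPEG (800, 533) RGB
```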

### Convert Image to Tensor

After loading, the image is stored in an internal `PIL` data structure. Most analytics algorithms, however, expect images to be represented as **tensors**, i.e., as multidimensional arrays. The `torchvision.transforms` package provides the functions required to convert a `PIL` image into its corresponding tensor representation.

In the code cell below, we use `v2.Compose()` to define a list of preprocessing steps to perform on an image, and we store the resulting transform, which can be called like a function, as `convert_image`. Later, we will extend this approach with additional preprocessing steps that are commonly performed on images. For now, we limit ourselves to the conversion to a tensor (and ignore resizing, cropping, etc.).

```python
convert_image = v2.Compose([
    v2.ToImage(),                         # Convert to tensor, only needed if input is a PIL image
    v2.ToDtype(torch.uint8, scale=True),  # Optional; most inputs are already uint8 at this point
])
```

Let's now apply the method `convert_image()` on our original input image and print the result.

```python
image_converted = convert_image(image)

print(image_converted)
```

The output of the code cell above shows an abbreviated version of the tensor. Recall that our image has 800x533 pixels (width x height) and 3 color channels (Red, Green, Blue). This means that our tensor contains 800x533x3 = 1,279,200 entries, i.e., 3 entries per pixel (one per color channel). We can also get this information explicitly by looking at the `shape` of the tensor:

```python
print(image_converted.shape)
```

This output shows that the first dimension reflects the number of color channels (3), the second dimension the height of the image (533 pixels), and the third dimension the width of the image (800 pixels). Since `image_converted` is now just a 3D tensor of numerical values, calling `display()` no longer shows the image; it just prints the tensor again.

```python
display(image_converted)
```

To actually show the image again, the `torchvision.transforms` package comes with an auxiliary function to convert the tensor back into a `PIL` image.

```python
transform_to_PIL = v2.ToPILImage()

display(transform_to_PIL(image_converted))
```

In the following, we use `transform_to_PIL()` to visualize the image after performing various preprocessing steps.

---

## Basic Preprocessing

Most data analytics methods require all data samples to have a fixed, predefined size. In the context of image analytics, this means that all images must have the same height and width in pixels. In practice, images in a dataset very often have different sizes, so we need to convert all images to the same size. The two most commonly applied steps are:

* **Resize:** For performance reasons, image analytics and machine learning tasks are typically performed on smaller images. The first step is therefore to resize the image, typically to a smaller version of the original. Note that images are often not resized directly to the final target size but to a slightly larger size, leaving a margin that the subsequent crop removes. The underlying assumption is that the important parts of an image are more likely to be near the center than near the edges.

* **Crop:** By default, resizing does not change the aspect ratio of the image. However, in the end, all images need to have the same size in terms of height and width. The straightforward way to accomplish this is to crop the image. In a nutshell, cropping the image refers to removing certain outer parts of an image.

Again, the `torchvision.transforms` package provides the functions required to perform these two steps, as shown in the code cell below. We first use `v2.Resize` to resize the input image so that its shortest side (for our input image: the height) has 256 pixels; the width is scaled accordingly to preserve the aspect ratio. We then apply `v2.CenterCrop` to extract a square patch of 224x224 pixels from the center of the image.

```python
preprocess_image = v2.Compose([
    v2.ToImage(),                         # Convert to tensor, only needed if input is a PIL image
    v2.ToDtype(torch.uint8, scale=True),  # Optional; most inputs are already in uint8
    v2.Resize(256, antialias=True),       # Resize image so the shortest side has 256 pixels
    v2.CenterCrop(224)                    # Crop out a square patch of 224x224 pixels from the center
])
```

**Side note:** In the example above, we chose a target size of 224x224 pixels because it is a common input size: many state-of-the-art models for image analytics (e.g., ImageNet classifiers such as ResNet) were trained on images of that size. However, keep in mind that there is nothing intrinsically special about 224x224!

Let's apply the method `preprocess_image()` on our input image and have a look at the result.

```python
image_preprocessed = preprocess_image(image)

display(transform_to_PIL(image_preprocessed))
```

In a practical application, we can now apply `preprocess_image()` to all images in our analysis to ensure that all processed images have the same dimensions, i.e., the same height, width, and number of pixels. This is a common requirement for many image analytics tasks (e.g., image classification).

**Side note:** You can easily see that cropping in particular might remove important parts of the image. As mentioned before, the common assumption is that the most important parts of an image are more likely to be in the center than towards the edges. Of course, this assumption does not always hold.

---

## Advanced Preprocessing & Augmentation

When applying `preprocess_image()` to an image, the output is always the same. However, there are often reasons to randomize the preprocessing steps. The most common reason is **Data Augmentation**. Data augmentation for images refers to a set of techniques used to create new training examples from existing ones by applying various transformations or modifications to the original images. It's a crucial step in training machine learning models, particularly in computer vision, to improve their generalization, robustness, and ability to handle different variations in the input data.

Common examples of data augmentation techniques for images include:

* **Rotation:** Rotating images by a certain angle (e.g., 90 degrees, 180 degrees) to simulate different orientations.

* **Flip (Horizontal/Vertical):** Flipping images horizontally or vertically to create mirror images. For instance, flipping an image of a cat horizontally would show the cat facing the opposite direction.

* **Random Crop:** Extracting random sections of the image to create variations in framing or object placement. This helps models become more tolerant to object positions within an image.

* **Scaling and Resizing:** Changing the size of images while maintaining their aspect ratios. Scaling images up or down can help models generalize better to different object sizes.

* **Translation:** Shifting an image along its horizontal or vertical axis. This can simulate changes in object location within an image.

* **Brightness and Contrast Adjustment:** Modifying brightness, contrast, saturation, or hue of images to simulate different lighting conditions.

* **Noise Injection:** Adding random noise to images to make models more robust to noise in real-world scenarios.

* **Color Jitter:** Randomly altering color channels to change the appearance of images.

* **Shearing:** Distorting images by shifting pixels in a certain direction, creating a sheared effect.

* **Elastic Transformations:** Applying local deformations to images to simulate distortions or warping.

These augmentation techniques help in increasing the diversity of the training dataset without collecting new data. By presenting modified versions of images during training, models become more robust and less sensitive to variations that might exist in the real-world data. However, it's essential to apply these transformations judiciously, considering the specific requirements of the task and the characteristics of the dataset, to avoid introducing unrealistic variations that could potentially confuse the model.

The `torchvision.transforms` package provides a wide range of methods for data augmentation. Let's look at some examples.

```python
augment_image = v2.Compose([
    v2.ToImage(),                         # Convert to tensor, only needed if input is a PIL image
    v2.ToDtype(torch.uint8, scale=True),  # Optional; most inputs are already in uint8
    v2.Resize(256, antialias=True),       # Resize image so the shortest side has 256 pixels
    v2.RandomCrop(224),                   # Crop out a square patch of 224x224 pixels from a random position
    v2.RandomGrayscale(p=0.5),            # Convert to grayscale (still 3 channels) with 50% probability
    # Randomly adjust brightness, contrast, saturation, and hue (strengths chosen for illustration;
    # ColorJitter() without arguments leaves the image unchanged)
    v2.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    v2.RandomHorizontalFlip(),            # Flip image horizontally with 50% probability
    v2.RandomErasing()                    # Erase a random rectangular patch with 50% probability
])
```

The code cell below applies `augment_image()` to our original input image. Run it multiple times to observe how the output changes.

```python
image_augmented = augment_image(image)

display(transform_to_PIL(image_augmented))
```

A complete list of preprocessing methods together with their parameters can be found [here](https://pytorch.org/vision/stable/transforms.html#v2-api-ref).

---

## Summary

Image preprocessing and augmentation are critical steps in preparing and enhancing datasets for training machine learning models, especially in computer vision tasks.

Preprocessing involves transforming raw images into a format suitable for machine learning algorithms. This step includes tasks like resizing, normalization, and noise reduction. Proper preprocessing ensures that the data is standardized, making it easier for models to learn patterns and features effectively. Normalizing pixel values, for example, by scaling them to a certain range, helps models converge faster during training by reducing the effect of varying pixel intensities.

Augmenting the dataset through techniques like rotation, flipping, and cropping generates additional training examples, thereby improving model robustness and generalization. By presenting modified versions of images, augmentation helps models learn invariant features and become more tolerant to variations in the input data. It reduces overfitting by exposing the model to a wider range of scenarios and variations that might occur in real-world data.

The combined impact of preprocessing and augmentation is significant. Preprocessing ensures data uniformity and consistency, making it easier for models to learn, while augmentation expands the dataset's diversity, making models more adaptable to real-world complexities. These steps collectively contribute to enhancing a model's performance, accuracy, and ability to handle unseen variations, leading to more reliable and robust AI systems in various applications like object detection, image classification, and segmentation.