# Overview

Data does not always come in the final processed form required for training machine learning algorithms. We use transforms to manipulate the data and make it suitable for training.

For example, the FashionMNIST features are PIL images and the labels are integers. For training, we need the features as normalized tensors and the labels as one-hot encoded tensors. To make these transformations, we use `ToTensor` and `Lambda`.

```python
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

ds = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    # PIL image -> float tensor with values scaled to [0, 1]
    transform=ToTensor(),
    # integer label -> one-hot encoded float tensor of length 10
    target_transform=Lambda(
        lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1)
    ),
)
```

## ToTensor()

`ToTensor` converts a PIL image or a NumPy `ndarray` of shape (H, W, C) into a `FloatTensor` of shape (C, H, W) and scales the pixel intensity values from [0, 255] to the range [0.0, 1.0].


## Lambda Transforms

Lambda transforms apply any user-defined lambda function. Here, we define a function that turns the integer label into a one-hot encoded tensor. It first creates a zero tensor of size 10 (the number of labels in our dataset) and then calls `scatter_`, which assigns `value=1` at the index given by the label `y`.

```python
target_transform = Lambda(
    lambda y: torch.zeros(10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1)
)
```

# Transforming and Augmenting Images

Torchvision supports common computer vision transformations in the `torchvision.transforms` and `torchvision.transforms.v2` modules. Transforms can be used to transform or augment data for training or inference on different tasks, such as:

* image classification
* detection
* segmentation
* video classification

Most transforms accept both PIL images and tensor inputs, and both CPU and CUDA tensors are supported. Here we rely on the tensor backend **for performance**.

**Tensor image**

Tensor images are expected to have shape (C, H, W), where C is the number of channels and H and W are the height and width. Most transforms support batched tensor input: a batch of tensor images is a tensor of shape (N, C, H, W), where N is the number of images in the batch. The v2 transforms generally accept an arbitrary number of leading dimensions (..., C, H, W) and can handle batched images or batched videos.

**Dtype and expected value range**

The expected range of the values of a tensor image is implicitly defined by the tensor dtype. Tensor images with a float dtype are expected to have values in [0, 1]. Tensor images with an integer dtype are expected to have values in [0, MAX_DTYPE], where MAX_DTYPE is the largest value that can be represented in that dtype. Typically, images of dtype `torch.uint8` are expected to have values in [0, 255].

```python
# Image classification
import torch
from torchvision.transforms import v2

H, W = 32, 32
# a random uint8 tensor image of shape (C, H, W)
img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)

transforms = v2.Compose(
    [
        # crop a random region and resize it to 224x224
        v2.RandomResizedCrop(size=(224, 224), antialias=True),
        v2.RandomHorizontalFlip(p=0.5),
        # Normalize expects float input
        v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

img = transforms(img)
img
```

```text
tensor([[[ 0.7933,  0.7933,  0.7933,  ..., -1.0562, -1.0562, -1.0562],
         [ 0.7933,  0.7933,  0.7933,  ..., -1.0562, -1.0562, -1.0562],
         [ 0.7933,  0.7933,  0.7933,  ..., -1.0562, -1.0562, -1.0562],
         ...,
         [-0.1828, -0.1828, -0.1828,  ..., -0.9020, -0.9020, -0.9020],
         [-0.1828, -0.1828, -0.1828,  ..., -0.9020, -0.9020, -0.9020],
         [-0.1828, -0.1828, -0.1828,  ..., -0.9020, -0.9020, -0.9020]],

        [[ 0.1176,  0.1176,  0.1176,  ...,  2.2010,  2.2010,  2.2010],
         [ 0.1176,  0.1176,  0.1176,  ...,  2.2010,  2.2010,  2.2010],
         [ 0.1176,  0.1176,  0.1176,  ...,  2.2010,  2.2010,  2.2010],
         ...,
         [-0.1450, -0.1450, -0.1450,  ...,  2.0609,  2.0609,  2.0609],
         [-0.1450, -0.1450, -0.1450,  ...,  2.0609,  2.0609,  2.0609],
         [-0.1450, -0.1450, -0.1450,  ...,  2.0609,  2.0609,  2.0609]],

        [[-1.3861, -1.3861, -1.3861,  ..., -0.7936, -0.7936, -0.7936],
         [-1.3861, -1.3861, -1.3861,  ..., -0
```

```python
# Detection (re-uses torch, H, W and transforms from the previous cell)
from torchvision import tv_tensors

img = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)
# three random boxes: the last two columns become x1 + w and y1 + h (XYXY format)
boxes = torch.randint(0, H // 2, size=(3, 4))
boxes[:, 2:] += boxes[:, :2]
boxes = tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=(H, W))

# the same transforms can be used; the boxes are transformed together with the image
img, boxes = transforms(img, boxes)
# and you can pass arbitrary input structures
# (note: this call re-applies the pipeline to the already-transformed img and boxes)
output_dict = transforms({"image": img, "boxes": boxes})
output_dict
```

```text
{'image': tensor([[[-4.6361, -4.7707, -4.9725,  ..., -0.5351,  0.0146, -1.1962],
          [-3.3942, -3.5753, -3.8254,  ..., -0.7680, -0.3320, -1.4032],
          [-1.9681, -2.2026, -2.5081,  ..., -1.0356, -0.7337, -1.6773],
          ...,
          [-7.2151, -7.3478, -7.5477,  ..., -0.4729, -1.1234, -1.0871],
          [-7.1326, -7.1792, -7.2897,  ..., -0.7800, -1.4644, -1.2884],
          [-7.0291, -7.0291, -7.0914,  ..., -1.0473, -1.7644, -1.4954]],

         [[-1.9013, -0.7061,  0.5336,  ..., -7.6463, -8.2241, -8.1538],
          [-2.7125, -1.6633, -0.5750,  ..., -6.5285, -6.9479, -7.0722],
          [-3.6441, -2.7625, -1.8480,  ..., -5.2447, -5.4785, -5.7920],
          ...,
          [ 0.8555,  0.4355, -0.0317,  ...,  1.4106,  1.3505,  0.8279],
          [-0.2004, -0.4869, -0.8242,  ...,  0.8549,  0.7226,  0.4793],
          [-1.1197, -1.2603, -1.5363,  ...,  0.4042,  0.2089,  0.2089]],

         [[ 2.2600,  1.2148,  0.1307,  ...,  0.4651,  0.0136,  0.0136],
          [ 2.4745,
```

# Reference

* https://pytorch.org/tutorials/beginner/basics/transforms_tutorial.html
* https://pytorch.org/vision/stable/transforms.html
* https://pytorch.org/vision/stable/auto_examples/transforms/plot_custom_transforms.html#sphx-glr-auto-examples-transforms-plot-custom-transforms-py