# Overview

Data does not always arrive in the final, processed form required for training machine learning algorithms. We use `transforms` to manipulate the data and make it suitable for training.

For example, the FashionMNIST features are in PIL Image format, and the labels are integers. For training, we need the features as normalized tensors and the labels as one-hot encoded tensors. To make these transformations, we use `ToTensor` and `Lambda`.

In [1]:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

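ds = datasets.FashionMNIST(
    root='data',
    train=True,
    download=True,
    # convert each PIL image to a float tensor scaled to [0, 1]
    transform=ToTensor(),
    # convert each integer label to a one-hot vector of length 10
    target_transform=Lambda(
        lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)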

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data/FashionMNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 26421880/26421880 [00:03<00:00, 7833102.11it/s] 


Extracting data/FashionMNIST/raw/train-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 29515/29515 [00:00<00:00, 138744.77it/s]


Extracting data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 4422102/4422102 [00:01<00:00, 2534033.48it/s]


Extracting data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to data/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5148/5148 [00:00<00:00, 13939494.51it/s]

Extracting data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to data/FashionMNIST/raw






## ToTensor()

`ToTensor` converts a PIL image or NumPy `ndarray` into a `FloatTensor` and scales the image's pixel intensity values to the range [0, 1].


## Lambda Transforms

Lambda transforms apply any user-defined lambda function. Here, we define a function that turns the integer label into a one-hot encoded tensor. It first creates a zero tensor of size 10 (the number of labels in our dataset) and then calls `scatter_`, which assigns `value=1` at the index given by the label `y`.

In [2]:
target_transform=Lambda(lambda y: torch.zeros(
    10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1))

# Transforming and Augmenting Images

Torchvision supports common computer vision transformations in the `torchvision.transforms` and `torchvision.transforms.v2` modules. Transforms can be used to transform or augment data for training or inference on different tasks, such as:

* image classification
* detection
* segmentation
* video classification

Most transformations accept both PIL images and tensor inputs. Both CPU and CUDA tensors are supported. However, we use the tensor backend **for performance**.

**Tensor image**

Tensor images are expected to be of shape (C, H, W), where C is the number of channels and H and W are the image height and width. Most transforms support batched tensor input. A batch of tensor images is a tensor of shape (N, C, H, W), where N is the number of images in the batch. The v2 transforms generally accept an arbitrary number of leading dimensions (..., C, H, W) and can handle batched images or batched videos.

**Dtype and expected value range**

The expected range of the values of a tensor image is implicitly defined by the tensor dtype. Tensor images with a float dtype are expected to have values in [0, 1]. Tensor images with an integer dtype are expected to have values in [0, MAX_DTYPE], where MAX_DTYPE is the largest value that can be represented in that dtype. Typically, images of dtype `torch.uint8` are expected to have values in [0, 255].

In [3]:
# Image classification
import torch
from torchvision.transforms import v2

H,W=32,32
# here we use tensor image
img=torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)

transforms=v2.Compose(
    [
        # Resize(antialias=True)
        v2.RandomResizedCrop(size=(224,224), antialias=True),
        v2.RandomHorizontalFlip(p=0.5),
        # Normalize expects float input
        v2.ToDtype(torch.float32, scale=True),
        v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

img=transforms(img)
img

tensor([[[-1.3987, -1.3987, -1.3987,  ...,  0.1597,  0.1597,  0.1597],
         [-1.3987, -1.3987, -1.3987,  ...,  0.1597,  0.1597,  0.1597],
         [-1.3987, -1.3987, -1.3987,  ...,  0.1597,  0.1597,  0.1597],
         ...,
         [-0.7137, -0.7137, -0.7137,  ..., -1.9980, -1.9980, -1.9980],
         [-0.7137, -0.7137, -0.7137,  ..., -1.9980, -1.9980, -1.9980],
         [-0.7137, -0.7137, -0.7137,  ..., -1.9980, -1.9980, -1.9980]],

        [[ 1.7108,  1.7108,  1.7108,  ..., -0.6176, -0.6176, -0.6176],
         [ 1.7108,  1.7108,  1.7108,  ..., -0.6176, -0.6176, -0.6176],
         [ 1.7108,  1.7108,  1.7108,  ..., -0.6176, -0.6176, -0.6176],
         ...,
         [ 2.1310,  2.1310,  2.1310,  ...,  1.4132,  1.4132,  1.4132],
         [ 2.1310,  2.1310,  2.1310,  ...,  1.4132,  1.4132,  1.4132],
         [ 2.1310,  2.1310,  2.1310,  ...,  1.4132,  1.4132,  1.4132]],

        [[-0.0441, -0.0441, -0.0441,  ...,  0.6182,  0.6182,  0.6182],
         [-0.0441, -0.0441, -0.0441,  ...,  0

In [4]:
# detection
from torchvision import tv_tensors

img=torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)
boxes=torch.randint(0, H//2, size=(3,4))
boxes[:, 2:]+=boxes[:,:2]
boxes=tv_tensors.BoundingBoxes(boxes, format="XYXY", canvas_size=(H,W))

# the same transforms can be used
img, boxes=transforms(img, boxes)
# and you can pass arbitrary input structures
output_dict=transforms({"image": img, "boxes": boxes})
output_dict

{'image': tensor([[[ 0.6733,  0.9070,  0.9631,  ...,  2.3558,  2.1315,  1.9446],
          [ 0.6733,  0.9070,  0.9631,  ...,  2.3558,  2.1315,  1.9446],
          [ 0.6733,  0.9070,  0.9631,  ...,  2.3558,  2.1315,  1.9446],
          ...,
          [-0.4484, -0.1213,  0.2994,  ...,  2.9354,  2.1782,  1.5706],
          [-0.4484, -0.1213,  0.2994,  ...,  2.9354,  2.1782,  1.5706],
          [-0.4484, -0.1213,  0.2994,  ...,  2.9354,  2.1782,  1.5706]],
 
         [[ 7.0867,  6.9401,  6.7936,  ..., -5.5062, -3.8943, -2.5265],
          [ 7.0867,  6.9401,  6.7936,  ..., -5.5062, -3.8943, -2.5265],
          [ 7.0867,  6.9401,  6.7936,  ..., -5.5062, -3.8943, -2.5265],
          ...,
          [-1.0416, -1.2370, -1.4128,  ...,  6.7448,  6.7741,  6.7741],
          [-1.0416, -1.2370, -1.4128,  ...,  6.7448,  6.7741,  6.7741],
          [-1.0416, -1.2370, -1.4128,  ...,  6.7448,  6.7741,  6.7741]],
 
         [[-3.4722, -4.7310, -6.0091,  ...,  4.8744,  4.7582,  4.6614],
          [-3.4722,

In [5]:
# Load image to PyTorch tensor
from torchvision.io import read_image

img=read_image("/kaggle/input/vision/astronaut.jpg")
img

tensor([[[148, 112,  73,  ..., 125, 125, 125],
         [178, 150, 119,  ..., 125, 125, 125],
         [203, 186, 168,  ..., 125, 125, 125],
         ...,
         [185, 184, 184,  ...,   3,   7,  10],
         [181, 181, 180,  ...,   9,  10,   3],
         [180, 179, 177,  ...,   3,   7,   5]],

        [[139, 103,  65,  ..., 116, 116, 116],
         [169, 141, 111,  ..., 116, 116, 116],
         [195, 178, 160,  ..., 116, 116, 116],
         ...,
         [170, 169, 167,  ...,   0,   3,   6],
         [168, 169, 165,  ...,   5,   6,   0],
         [167, 167, 162,  ...,   0,   3,   1]],

        [[158, 120,  80,  ..., 111, 111, 111],
         [186, 158, 124,  ..., 111, 111, 111],
         [210, 191, 171,  ..., 111, 111, 111],
         ...,
         [177, 174, 173,  ...,   0,   0,   3],
         [175, 173, 170,  ...,   2,   3,   0],
         [174, 171, 167,  ...,   0,   0,   0]]], dtype=torch.uint8)

In [6]:
print(img.dtype)
print(img.shape)

torch.uint8
torch.Size([3, 512, 512])


# Reference

* https://pytorch.org/tutorials/beginner/basics/transforms_tutorial.html
* https://pytorch.org/vision/stable/transforms.html
* https://pytorch.org/vision/stable/auto_examples/transforms/plot_custom_transforms.html#sphx-glr-auto-examples-transforms-plot-custom-transforms-py