# Transforms


Data rarely arrives in the form a machine learning model expects. It may be raw, or stored in a format the model cannot consume directly, so it has to be transformed first. Applying the same transformations to every sample gives the model consistent inputs during training, which affects both its performance and its ability to generalize.

TorchVision datasets such as FashionMNIST are no exception: the samples they return are not yet in the form a deep learning model needs. For this reason, every TorchVision dataset accepts two arguments: transform, which modifies the features, and target_transform, which modifies the labels.

The torchvision.transforms module provides many commonly used transformations out of the box, including normalization, resizing, and cropping of images, as well as a wrapper for custom functions such as label conversions.

For FashionMNIST, the features are PIL (Python Imaging Library) images and the labels are integers. Training, however, typically requires the features as normalized tensors and the labels as one-hot encoded tensors.

Two transformations provided by TorchVision handle this: ToTensor and Lambda. ToTensor converts a PIL image or NumPy array into a PyTorch float tensor and scales the pixel values to the range [0, 1]. Lambda wraps any user-defined function so it can be used as a transformation. Here the function turns an integer label into a one-hot encoded tensor: it creates a zero tensor of size 10 (the number of classes) and calls scatter_ to write a 1 at the index given by the label.

In summary, transformations are the preprocessing step that puts features and labels into the form the model expects before training begins.

In [4]:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

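def function_lambda(y):
    # One-hot encode the integer label y: start from ten zeros (one per class)
    # and write a 1 at index y. The result must be returned, otherwise the
    # dataset would yield None as the label.
    return torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1)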

ds = datasets.FashionMNIST(
    root="data",
    train=True,
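    download=False,  # assumes the files already exist under root; use True to fetch them
    transform=ToTensor(),  # applied to each image when a sample is accessed
    target_transform=Lambda(function_lambda)  # applied to each label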
)