In [1]:
import mxnet as mx
from mxnet import gluon, nd

### Rescaling

In [2]:
# Rescale the input pixel values of the image dataset from the range [0, 255] to the range [0, 1].
def transform_fn(data, label):
    data = data.astype('float32')/255
    return data, label

# The preloaded MNIST dataset in Gluon accepts the `transform_fn` function we just created through its `.transform` method,
# which applies the function to every (data, label) sample of the dataset.
train_dataset = gluon.data.vision.datasets.MNIST(train=True).transform(transform_fn)
valid_dataset = gluon.data.vision.datasets.MNIST(train=False).transform(transform_fn)

In [3]:
# Verify that the transformation has been applied: the maximum pixel value should now be 1.
sample_image = train_dataset[19][0]
nd.max(sample_image)


[1.]
<NDArray 1 @cpu(0)>

### Vision Transforms

The Gluon data API already implements a number of transformation functions commonly used in computer vision in the `gluon.data.vision.transforms` module.

In [5]:
from mxnet.gluon.data.vision import transforms

#### ToTensor

Converts an image NDArray of shape (H x W x C) in the range \[0,255] to a float32 tensor NDArray of shape (C x H x W) in the range \[0,1].

One transform implemented in `gluon.data.vision.transforms` is `ToTensor`, which performs the rescaling we implemented manually earlier. In addition to scaling the image data from \[0,255] to \[0,1], `ToTensor` rearranges the data from image format (height x width x channels) to tensor format (channels x height x width). The layout change matters because neural network operations such as convolutions are much faster in MXNet when the input is in tensor format.

In [6]:
train_dataset = gluon.data.vision.datasets.MNIST(train=True)
train_dataset[19][0].shape

(28, 28, 1)

In [7]:
# Each MNIST sample is an (image, label) pair, and each image is a grayscale (single-channel) square.
# The .transform_first method applies the transformation only to the first entry of each sample (the image data) and leaves the label untouched.
to_tensor = transforms.ToTensor()
train_dataset = train_dataset.transform_first(to_tensor)
train_dataset[19][0].shape

(1, 28, 28)
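
To make the two effects of a `ToTensor`-style conversion concrete, here is a minimal NumPy sketch. It is an illustration only, not MXNet's implementation: the helper name `to_tensor_sketch` and the synthetic `fake_image` are assumptions introduced for this example.

```python
import numpy as np

def to_tensor_sketch(image_hwc):
    """NumPy illustration (not MXNet's code) of a ToTensor-style conversion."""
    # Cast first so the division happens in floating point, then scale to [0, 1].
    scaled = image_hwc.astype(np.float32) / 255.0
    # Move the channel axis to the front: (H, W, C) -> (C, H, W).
    return np.transpose(scaled, (2, 0, 1))

# Hypothetical stand-in for one MNIST image: 28 x 28 pixels, 1 channel.
fake_image = np.zeros((28, 28, 1), dtype=np.uint8)
fake_image[10, 12, 0] = 255  # one fully bright pixel

tensor = to_tensor_sketch(fake_image)
print(tensor.shape, tensor.dtype, tensor.max())
```

The bright pixel at row 10, column 12 of the image ends up at `tensor[0, 10, 12]`, which shows that only the position of the channel axis changes while the spatial layout is preserved.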

#### Normalize

Normalizes a tensor of shape (C x H x W) channel by channel: it subtracts the given mean and divides by the given standard deviation.

In [8]:
mean, std = (0.1307,), (0.3081,)
normalize = transforms.Normalize(mean, std)
train_dataset = train_dataset.transform_first(normalize)
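
The arithmetic behind this kind of normalization can be sketched in NumPy as follows. The helper `normalize_sketch` and the two-pixel input are hypothetical and exist only for illustration; the sketch assumes one mean and one standard deviation per channel, applied as `(x - mean) / std`.

```python
import numpy as np

def normalize_sketch(tensor_chw, mean, std):
    """NumPy illustration of per-channel normalization: (x - mean) / std."""
    # Reshape to (C, 1, 1) so each channel's statistic broadcasts over H and W.
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (tensor_chw - mean) / std

mean, std = (0.1307,), (0.3081,)

# Hypothetical single-channel tensor holding the darkest and brightest possible pixels.
tensor = np.array([[[0.0, 1.0]]], dtype=np.float32)  # shape (C=1, H=1, W=2)

normalized = normalize_sketch(tensor, mean, std)
print(normalized)  # roughly -0.424 for the black pixel and 2.821 for the white pixel
```

After normalization the values are no longer confined to \[0,1]: a black pixel maps to about -0.42 and a white pixel to about 2.82 with these statistics.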

#### Compose

Sequentially compose multiple transforms

- The `transforms.Compose` class takes a list of transformations and returns a single transformation that applies each transformation in the list sequentially.
- `transforms.Compose` also lets you build your own custom transformations from the predefined Gluon data vision transforms.

In [9]:
transform_fn = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean, std)])

In [10]:
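# Use .transform_first so the composed transform receives only the image, not the (image, label) pair.
train_dataset = gluon.data.vision.datasets.MNIST(train=True).transform_first(transform_fn)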
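
The idea of sequential composition can be sketched in plain Python. The class `ComposeSketch` and the two lambdas below are hypothetical stand-ins, not Gluon's implementation; they only show that the output of each step is fed to the next and that the order of the list matters.

```python
class ComposeSketch:
    """Hypothetical stand-in that chains callables in the order given."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, x):
        # Feed the output of each transform into the next one.
        for transform in self.transforms:
            x = transform(x)
        return x

# Scalar stand-ins for the two steps used above: rescale, then normalize.
scale = lambda x: x / 255.0
standardize = lambda x: (x - 0.1307) / 0.3081

pipeline = ComposeSketch([scale, standardize])
result = pipeline(255)  # a fully bright pixel
print(result)

# Reversing the order gives a different value, so the list order matters.
reversed_result = ComposeSketch([standardize, scale])(255)
print(reversed_result)
```

This is why `ToTensor` is listed before `Normalize` in the cell above: the mean and standard deviation are expressed on the \[0,1] scale, so rescaling has to happen first.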

#### Data Augmentation via Transformations

Transformations can also augment the dataset, artificially introducing diversity into the training data to reduce model overfitting and improve how the model generalizes to unseen data. Augmenting transformations adjust the dataset samples either randomly or deterministically.

Some other transforms provided by Gluon include:
- `transforms.Resize`
- `transforms.CenterCrop`
- `transforms.RandomResizedCrop`
- `transforms.RandomFlipLeftRight`
- `transforms.RandomBrightness`
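
To illustrate the difference between a random and a deterministic augmentation, here is a small NumPy sketch of a left-right flip and a center crop. The helper names, the flip probability parameter `p`, and the assumed (H x W x C) input layout are illustrative assumptions, not Gluon's implementation of the transforms listed above.

```python
import numpy as np

def random_flip_left_right_sketch(image_hwc, rng, p=0.5):
    """Random augmentation: with probability p, mirror the image horizontally."""
    if rng.random() < p:
        # Reverse the width axis (axis 1 of an H x W x C array).
        return image_hwc[:, ::-1, :]
    return image_hwc

def center_crop_sketch(image_hwc, size):
    """Deterministic augmentation: keep the central size x size window."""
    height, width = image_hwc.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return image_hwc[top:top + size, left:left + size, :]

rng = np.random.default_rng(0)

# Hypothetical 4 x 4 single-channel image whose pixel values are 0..15.
image = np.arange(16).reshape(4, 4, 1)

flipped = random_flip_left_right_sketch(image, rng, p=1.0)  # p=1.0 forces the flip
cropped = center_crop_sketch(image, 2)

print(flipped[0, :, 0])  # first row, mirrored
print(cropped[:, :, 0])  # central 2 x 2 window
```

Random transforms such as the flip produce a different variant of a sample each time it is loaded, which is why they are usually applied to the training set only, while deterministic ones such as the crop return the same result on every call.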