site: https://docs.pytorch.org/tutorials/beginner/basics/transforms_tutorial.html

Data does not always come in the final processed form required for training machine learning algorithms. We use transforms to manipulate the data and make it suitable for training.

All torchvision datasets have two parameters, transform (to modify the features) and target_transform (to modify the labels), that accept callables containing the transformation logic.
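Because transform and target_transform are just callables, a dataset only needs to call them inside `__getitem__`. The sketch below illustrates that pattern with a hypothetical in-memory `ToyDataset` (not a torchvision class); the features and labels are made-up plain Python values, and the lambdas stand in for real transforms.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset showing how transform / target_transform are applied."""

    def __init__(self, features, labels, transform=None, target_transform=None):
        self.features = features
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x, y = self.features[idx], self.labels[idx]
        # any callable works: a function, a lambda, or an object with __call__
        if self.transform is not None:
            x = self.transform(x)
        if self.target_transform is not None:
            y = self.target_transform(y)
        return x, y


ds = ToyDataset(
    features=[[0, 128, 255], [10, 20, 30]],
    labels=[2, 0],
    transform=lambda x: torch.tensor(x, dtype=torch.float) / 255,  # scale to [0, 1]
    target_transform=lambda y: torch.tensor(y),                    # int -> tensor
)
x, y = ds[0]
print(x, y)
```

The features and labels are transformed independently, which is why the two parameters are kept separate.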

In [2]:
import torch
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda # transform classes

# transform = transformation for features
# target_transform = transformation for targets

# The FashionMNIST features are in PIL Image format, and the labels are integers.
# For training, we need the features as normalized tensors and the labels as
# one-hot encoded tensors. To make these transformations, we use ToTensor and
# Lambda.

ds = datasets.FashionMNIST(
    root = "data",
    train = True,
    download = True,
    transform = ToTensor(),
    # scatter_ needs a value (or src) to write; value = 1 sets the label's position to 1
    target_transform = Lambda(lambda y : torch.zeros(10, dtype = torch.float).scatter_(0, torch.tensor(y), value = 1)),
)

## ToTensor()
Converts a PIL image or NumPy ndarray into a FloatTensor and scales the image's pixel intensity values to the range [0., 1.].
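For a uint8 NumPy array, the conversion can be sketched by hand with plain torch operations: move the channel axis first (H x W x C to C x H x W) and divide by 255. The helper `to_tensor_sketch` below is hypothetical and only approximates ToTensor's behavior for uint8 arrays; it is not torchvision's implementation.

```python
import numpy as np
import torch


def to_tensor_sketch(arr: np.ndarray) -> torch.Tensor:
    """Hypothetical helper mimicking ToTensor for uint8 arrays."""
    if arr.ndim == 2:
        # grayscale image (H x W): add a trailing channel axis -> H x W x 1
        arr = arr[:, :, None]
    # H x W x C -> C x H x W, then scale 0..255 to 0.0..1.0
    t = torch.from_numpy(arr.transpose(2, 0, 1).copy())
    return t.float().div(255)


img = np.array([[0, 255], [51, 102]], dtype=np.uint8)  # a made-up 2 x 2 grayscale "image"
t = to_tensor_sketch(img)
print(t.shape, t.dtype)                # torch.Size([1, 2, 2]) torch.float32
print(t.min().item(), t.max().item())  # 0.0 1.0
```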

## Lambda Transforms
Lambda transforms apply any user-defined lambda function. Here, we define a function to turn the integer label into a one-hot encoded tensor. It first creates a zero tensor of size 10 (the number of labels in our dataset) and calls scatter_, which assigns value=1 at the index given by the label y.

In [6]:
target_transform = Lambda(lambda y : torch.zeros(
    10, dtype = torch.float).scatter_(dim = 0, index = torch.tensor(y), value = 1))
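As a quick check, the scatter_-based encoding can be compared against `torch.nn.functional.one_hot`, which produces the same vector as an integer tensor. The label 3 is arbitrary, and the plain function `one_hot_scatter` (a hypothetical name) stands in for the Lambda wrapper.

```python
import torch
import torch.nn.functional as F


def one_hot_scatter(y: int, num_classes: int = 10) -> torch.Tensor:
    # same logic as the Lambda above: zeros, then write 1 at index y along dim 0
    return torch.zeros(num_classes, dtype=torch.float).scatter_(0, torch.tensor(y), value=1)


label = 3
encoded = one_hot_scatter(label)
print(encoded)  # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])

# F.one_hot returns an int64 tensor, so cast to float before comparing
reference = F.one_hot(torch.tensor(label), num_classes=10).float()
print(torch.equal(encoded, reference))  # True
```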

video: https://www.youtube.com/watch?v=X_QOZEko5uE

In [8]:
import torchvision
from torch.utils.data import Dataset
import numpy as np

In [10]:
class WineDataset(Dataset):
    def __init__(self, transform = None):
        xy = np.loadtxt('./data/wine.csv', delimiter = ',', 
                        dtype = np.float32, skiprows = 1)
        self.n_samples = xy.shape[0]

        # note that we do not convert to tensor here
        self.x = xy[:, 1:]
        self.y = xy[:, [0]]

        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]

        if self.transform:
            sample = self.transform(sample)

        return sample

    def __len__(self):
        return self.n_samples

In [12]:
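# custom transform: unlike torchvision's ToTensor (imported above and shadowed
# by this definition), it takes the whole (features, labels) sample tuple
class ToTensor:
    def __call__(self, sample):
        inputs, targets = sample
        # torch.from_numpy shares memory with the NumPy arrays (no copy)
        return torch.from_numpy(inputs), torch.from_numpy(targets)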
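Because a transform is just a callable object, several of them can be chained. The sketch below defines a hypothetical `MulTransform` that scales the features and a minimal `compose` helper that applies transforms in order; `torchvision.transforms.Compose` offers similar chaining. The two-feature sample is made up for illustration.

```python
import numpy as np
import torch


class ToTensor:
    # converts a (features, labels) tuple of NumPy arrays into tensors
    def __call__(self, sample):
        inputs, targets = sample
        return torch.from_numpy(inputs), torch.from_numpy(targets)


class MulTransform:
    # hypothetical transform: multiply the features by a constant, leave labels alone
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, sample):
        inputs, targets = sample
        return inputs * self.factor, targets


def compose(*transforms):
    # returns a callable that feeds each transform's output into the next one
    def apply(sample):
        for t in transforms:
            sample = t(sample)
        return sample
    return apply


sample = (np.array([1.0, 2.0], dtype=np.float32), np.array([0.0], dtype=np.float32))
pipeline = compose(ToTensor(), MulTransform(2))
features, targets = pipeline(sample)
print(features, targets)  # tensor([2., 4.]) tensor([0.])
```

Order matters: here ToTensor runs first, so MulTransform operates on tensors rather than NumPy arrays.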

In [14]:
dataset = WineDataset(transform = ToTensor())
first_data = dataset[0]
features, targets = first_data
print(type(features), type(targets))

Output: FileNotFoundError: ./data/wine.csv not found. (the cell fails because wine.csv is not present in ./data)
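If wine.csv is not available, the same Dataset-plus-transform pattern can still be exercised with in-memory data. The sketch below is a hypothetical `ArrayDataset` that takes a NumPy array laid out like the CSV above (label in column 0, features in the remaining columns); the three rows of numbers are made up for illustration.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class ArrayDataset(Dataset):
    # hypothetical variant of WineDataset that takes an array instead of a file path
    def __init__(self, xy, transform=None):
        self.n_samples = xy.shape[0]
        self.x = xy[:, 1:]   # features
        self.y = xy[:, [0]]  # labels, kept 2-D with shape (n_samples, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.n_samples


class ToTensor:
    # converts a (features, labels) tuple of NumPy arrays into tensors
    def __call__(self, sample):
        inputs, targets = sample
        return torch.from_numpy(inputs), torch.from_numpy(targets)


xy = np.array([[1, 14.2, 1.7],
               [2, 12.4, 0.9],
               [3, 13.1, 3.8]], dtype=np.float32)
dataset = ArrayDataset(xy, transform=ToTensor())
features, targets = dataset[0]
print(type(features), type(targets))  # <class 'torch.Tensor'> <class 'torch.Tensor'>
print(len(dataset), features.shape, targets.shape)  # 3 torch.Size([2]) torch.Size([1])
```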