# Dataset Transforms

In this tutorial, we will talk about transforms for our datasets. In the last tutorial, we discussed the `Dataset` and `DataLoader` classes. 

Below, we will use a built-in `pytorch` dataset with the `transform = ` argument to apply a transform. For example, `torchvision.transforms.ToTensor()` converts images and `numpy` arrays to tensors. `pytorch` already has a lot of transforms implemented for us. You can look at all of the transforms in the documentation.

`dataset = torchvision.datasets.MNIST(
    root = './data', transform = torchvision.transforms.ToTensor())`

First, let's import some modules that we will be using.

In [1]:
import torch
import torchvision
from torch.utils.data import Dataset, DataLoader
import numpy as np
import math

Now, let's make the wine dataset, as we did in the previous tutorial. Before we write our own transforms, we have to modify the wine dataset code so that it accepts an optional transform.

In [8]:
class WineDataset(Dataset):
    
    def __init__(self, transform=None):
        #data loading 
        #we keep the data as numpy arrays here - the conversion to torch is left to the transform
        xy = np.loadtxt('C:\\Users\\onef0\\Desktop\\PyTorch Tutorial\\wine.csv', delimiter = ",", dtype = np.float32, skiprows = 1)
       
        self.x = xy[:, 1:] #we want only the features, so all rows and all columns excluding the first column
        self.y = xy[:, [0]] #we want only the outcome, so all rows and only the first column - we add the extra 
        #brackets so that the shape is (n_samples, 1)
        self.n_samples = xy.shape[0] #first dimension is the number of samples
        
        self.transform = transform
        
    def __getitem__(self, index):
        #dataset[0] - this will allow us to perform indexing 
        #we have to modify the get item function for transforms - we want it to apply a transform, if it is available
        sample = self.x[index], self.y[index]
        
        if self.transform:
            sample = self.transform(sample)
        
        return sample
    
    def __len__(self):
        #len(dataset) - this will allow us to use the length function
        return self.n_samples
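
The transform hook in `__getitem__` accepts any callable that takes a sample and returns a sample. The sketch below shows this with a small in-memory dataset, so it runs without the CSV file. `ToyDataset` and `negate_features` are hypothetical names, and the numbers are made up.

```python
import numpy as np
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for WineDataset that keeps made-up data in memory."""

    def __init__(self, transform=None):
        # 4 samples; the first column is the label, the other 3 are features
        xy = np.array([[1, 0.1, 0.2, 0.3],
                       [2, 0.4, 0.5, 0.6],
                       [3, 0.7, 0.8, 0.9],
                       [1, 1.0, 1.1, 1.2]], dtype=np.float32)
        self.x = xy[:, 1:]
        self.y = xy[:, [0]]
        self.n_samples = xy.shape[0]
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        # Apply the transform only if one was given
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.n_samples


def negate_features(sample):
    # A plain function works as a transform too: it flips the sign of the features
    inputs, targets = sample
    return -inputs, targets


plain = ToyDataset()
flipped = ToyDataset(transform=negate_features)

print(len(plain))
print(plain[1])
print(flipped[1])
```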

Now, let's create some custom transform classes. In the previous tutorial, we already converted the data to a tensor using: 

`self.x = torch.from_numpy(xy[:, 1:])`
`self.y = torch.from_numpy(xy[:, [0]])`

We do not need to do this now, since we will write our own `ToTensor` transform. The dataset just keeps the data as `numpy` arrays.

The `ToTensor()` class will be passed to our dataset and will convert each sample to tensors when it is accessed.

In [9]:
class ToTensor():
    def __call__(self, sample):
        inputs, targets = sample
        return torch.from_numpy(inputs), torch.from_numpy(targets) 
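
As a quick check of what `torch.from_numpy` does with a sample, the sketch below converts two made-up arrays and prints their dtype and shape. The values are assumptions for illustration only. It also shows why loading the data with `dtype = np.float32` matters: the tensor keeps whatever dtype the array had.

```python
import numpy as np
import torch

# Made-up feature and label arrays, shaped like one row of the wine data
features = np.array([14.23, 1.71, 2.43], dtype=np.float32)
label = np.array([1.0], dtype=np.float32)

features_t = torch.from_numpy(features)
label_t = torch.from_numpy(label)

# dtype and shape carry over unchanged from the numpy arrays
print(features_t.dtype, tuple(features_t.shape))
print(label_t.dtype, tuple(label_t.shape))

# Without an explicit dtype, numpy uses float64, and so does the tensor
default_t = torch.from_numpy(np.array([1.0, 2.0]))
print(default_t.dtype)
```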

Now, we can apply this transform.

In [13]:
dataset = WineDataset(transform = ToTensor())

#looking at the first item
first_data = dataset[0]
print(first_data)

#unpack the data
features, labels = first_data
print(type(features), type(labels))

(tensor([1.4230e+01, 1.7100e+00, 2.4300e+00, 1.5600e+01, 1.2700e+02, 2.8000e+00,
        3.0600e+00, 2.8000e-01, 2.2900e+00, 5.6400e+00, 1.0400e+00, 3.9200e+00,
        1.0650e+03]), tensor([1.]))
<class 'torch.Tensor'> <class 'torch.Tensor'>


As we can see, each sample is now of type `torch.Tensor`. If we do not pass the `transform =` argument, the samples are not converted to tensors.

In [14]:
dataset = WineDataset(transform = None)

#looking at the first item
first_data = dataset[0]
print(first_data)

#unpack the data
features, labels = first_data
print(type(features), type(labels))

(array([1.423e+01, 1.710e+00, 2.430e+00, 1.560e+01, 1.270e+02, 2.800e+00,
       3.060e+00, 2.800e-01, 2.290e+00, 5.640e+00, 1.040e+00, 3.920e+00,
       1.065e+03], dtype=float32), array([1.], dtype=float32))
<class 'numpy.ndarray'> <class 'numpy.ndarray'>


As we can see, with `transform` set to `None`, the samples are still of class `numpy.ndarray`.

Let's write another custom transform.

In [17]:
class MulTransform:
    def __init__(self, factor):
        self.factor = factor
    
    def __call__(self, sample):
        #unpack the sample
        inputs, target = sample
        inputs = inputs * self.factor #apply the multiplication to only our features - not in place, so the stored data is left unchanged
        return inputs, target #as a tuple
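
A point worth keeping in mind for transforms like this one: a tensor created by `torch.from_numpy` shares memory with the source array. An in-place operation such as `*=` on that tensor therefore also changes the array, whereas `inputs * factor` returns a new tensor. In a dataset that stores `numpy` arrays, an in-place multiplication would scale the stored row again on every access to the same index. The sketch below shows the difference on made-up arrays; the variable names are hypothetical.

```python
import numpy as np
import torch

# In-place: the tensor shares memory with the array, so the array changes too
array_a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
shared = torch.from_numpy(array_a)
shared *= 2
print(array_a)  # the source array has been doubled as well

# Out-of-place: a new tensor is returned and the source array is left alone
array_b = np.array([1.0, 2.0, 3.0], dtype=np.float32)
scaled = torch.from_numpy(array_b) * 2
print(array_b)  # unchanged
print(scaled)
```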

Let's use a compose transform to see how we can chain these. A compose transform composes several transforms together and applies them in the order given. As a note, `Compose` does not support torchscript.
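
To make the idea of composing concrete, here is a minimal sketch of what such a class could look like. `SimpleCompose` is a hypothetical, simplified stand-in and not the actual `torchvision` implementation. It calls each transform in turn and feeds the result into the next one, so the order of the list matters.

```python
class SimpleCompose:
    """Hypothetical, simplified stand-in for torchvision.transforms.Compose."""

    def __init__(self, transforms):
        self.transforms = transforms  # a list of callables

    def __call__(self, sample):
        # Feed the output of each transform into the next one
        for transform in self.transforms:
            sample = transform(sample)
        return sample


def add_one(x):
    return x + 1


def double(x):
    return x * 2


add_then_double = SimpleCompose([add_one, double])
double_then_add = SimpleCompose([double, add_one])

print(add_then_double(3))  # (3 + 1) * 2 = 8
print(double_then_add(3))  # 3 * 2 + 1 = 7
```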

In [19]:
#making a composed transform - the input needs to be a list - for MulTransform, we are multiplying by a factor of 2
composed = torchvision.transforms.Compose([ToTensor(), MulTransform(2)])

#new dataset
dataset = WineDataset(transform = composed)

#looking at the first item
first_data = dataset[0]
print(first_data)

#unpack the data
features, labels = first_data
print(type(features), type(labels))

(tensor([2.8460e+01, 3.4200e+00, 4.8600e+00, 3.1200e+01, 2.5400e+02, 5.6000e+00,
        6.1200e+00, 5.6000e-01, 4.5800e+00, 1.1280e+01, 2.0800e+00, 7.8400e+00,
        2.1300e+03]), tensor([1.]))
<class 'torch.Tensor'> <class 'torch.Tensor'>


As we can see, the class is `torch.Tensor` and each feature value got doubled, while the label is unchanged. If we use a factor of 4 instead, every feature is multiplied by 4.

In [20]:
#making a composed transform - the input needs to be a list - for MulTransform, we are multiplying by a factor of 4
composed = torchvision.transforms.Compose([ToTensor(), MulTransform(4)])

#new dataset
dataset = WineDataset(transform = composed)

#looking at the first item
first_data = dataset[0]
print(first_data)

#unpack the data
features, labels = first_data
print(type(features), type(labels))

(tensor([5.6920e+01, 6.8400e+00, 9.7200e+00, 6.2400e+01, 5.0800e+02, 1.1200e+01,
        1.2240e+01, 1.1200e+00, 9.1600e+00, 2.2560e+01, 4.1600e+00, 1.5680e+01,
        4.2600e+03]), tensor([1.]))
<class 'torch.Tensor'> <class 'torch.Tensor'>
