### Data Loaders in PyTorch

In [2]:
import torch
import torchvision
from torch.utils.data import DataLoader, Dataset
import numpy as np
import pandas as pd

> **Point** - computing gradients over the whole dataset at once is not efficient, so the data is split into so-called `batches`.

> **Training Loop** - with the data split into batches

```
for epoch in range(epochs):
    ## loop over all batches
    for batch in range(total_batches):
        batch_x, batch_y = ...
```
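As a rough illustration of what the `...` in the loop above might stand for, the sketch below slices each batch out of two tensors by hand. The data is synthetic and the names `X`, `y`, and `seen` are assumptions made for this example only; a real loop would run the forward pass, loss, and backward pass where the comment indicates.

```python
import math
import torch

# Synthetic stand-in data (assumption): 100 samples, 3 features, 1 target.
X = torch.randn(100, 3)
y = torch.randn(100, 1)

epochs = 2
batch_size = 20
total_batches = math.ceil(len(X) / batch_size)

seen = 0  # total number of samples visited across all epochs
for epoch in range(epochs):
    # loop over all batches
    for batch in range(total_batches):
        start = batch * batch_size
        batch_x = X[start:start + batch_size]
        batch_y = y[start:start + batch_size]
        seen += len(batch_x)
        # forward pass, loss, backward pass and optimizer step would go here
```

Slicing by hand works, but it does not shuffle the samples between epochs, which is one of the chores a `DataLoader` can take over.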

#### Terms
1. **epoch** - one forward and backward pass over **ALL** training samples.
2. **batch_size** - number of training samples used in one forward/backward pass.
3. **number_of_iterations** - number of passes (forward + backward) per epoch, each using `batch_size` samples. For example, with `100 samples, batch_size = 20`: `iterations = 100/20 = 5` for each epoch.
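The arithmetic for the number of iterations can be checked with a few lines of Python. Rounding up with `math.ceil` is an assumption made here to cover the case where the batch size does not divide the sample count evenly, so that the last, smaller batch still counts as one iteration.

```python
import math

n_samples = 100
batch_size = 20

# Number of forward/backward passes needed to see every sample once.
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)  # 5

# When the batch size does not divide the sample count, round up:
# 178 samples with batches of 4 gives 44 full batches plus one batch of 2.
uneven_iterations = math.ceil(178 / 4)
print(uneven_iterations)  # 45
```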

### The DataLoader
> The `DataLoader` handles batching (via `batch_size`), shuffling, and similar bookkeeping for us.

#### Creating a custom Dataset
There are 2 steps to do this:
* inherit from the `Dataset` class
* implement the following methods
    * `__init__`
    * `__getitem__`
    * `__len__`

In [3]:
class Wine(Dataset):
    def __init__(self):
        # Alternative: np.loadtxt('wine.csv', delimiter=',', dtype='float32', skiprows=1)
        xy = pd.read_csv('wine.csv').values
        self.n_samples = xy.shape[0]

        self.X = torch.from_numpy(xy[:, 1:].astype('float32'))  # all the columns except the first one
        self.y = torch.from_numpy(xy[:, 0:1].astype('float32'))  # the first column

    # To allow indexing such as dataset[i]
    def __getitem__(self, index):
        return self.X[index], self.y[index]

    # when we call len(dataset)
    def __len__(self):
        return self.n_samples

In [4]:
wine = Wine()
wine

<__main__.Wine at 0x19f047fcd00>

In [5]:
len(wine), wine[:2]

(178,
 (tensor([[1.4230e+01, 1.7100e+00, 2.4300e+00, 1.5600e+01, 1.2700e+02, 2.8000e+00,
           3.0600e+00, 2.8000e-01, 2.2900e+00, 5.6400e+00, 1.0400e+00, 3.9200e+00,
           1.0650e+03],
          [1.3200e+01, 1.7800e+00, 2.1400e+00, 1.1200e+01, 1.0000e+02, 2.6500e+00,
           2.7600e+00, 2.6000e-01, 1.2800e+00, 4.3800e+00, 1.0500e+00, 3.4000e+00,
           1.0500e+03]]),
  tensor([[1.],
          [1.]])))

### Load Wine dataset using the DataLoader class

In [6]:
train = DataLoader(dataset=wine,
                  batch_size=4,
                  shuffle=True,
                  )
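To see what such a loader yields, the sketch below pulls one batch and counts the batches per epoch. Because `wine.csv` may not be at hand, it uses a synthetic stand-in built with `TensorDataset` that only mimics the shapes shown above (178 samples, 13 features, one label column); the random values and the names `features`, `labels`, and `loader` are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the Wine dataset (assumption): 178 samples, 13 features, 1 label column.
features = torch.randn(178, 13)
labels = torch.randint(1, 4, (178, 1)).float()
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset=dataset, batch_size=4, shuffle=True)

# Each item from the loader is one batch: a stack of 4 samples and their labels.
batch_X, batch_y = next(iter(loader))
print(batch_X.shape, batch_y.shape)  # torch.Size([4, 13]) torch.Size([4, 1])

# len(loader) is the number of batches per epoch: ceil(178 / 4) = 45.
print(len(loader))  # 45
```

With the default `drop_last=False`, the final batch of an epoch holds the 2 leftover samples rather than 4.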

### Loading data from `torchvision.datasets`

In [39]:
train_dataset = torchvision.datasets.MNIST(
                    '',
                    train = True,
                    transform=torchvision.transforms.ToTensor(),  
                    download=True
                    )

In [38]:
test_dataset = torchvision.datasets.MNIST(
                    '',
                    train = False,
                    transform=torchvision.transforms.ToTensor(),  
                    download=True
                    )

### Creating Loaders

In [41]:
train_loader = DataLoader(dataset=train_dataset,
                          batch_size = 10,
                          shuffle = True
                         )

In [42]:
test_loader = DataLoader(dataset=test_dataset,
                          batch_size = 10,
                          shuffle = True
                         )

### Iterating over datasets

In [43]:
for data in test_loader:
    X, y = data
    print(X, y)
    break
    

tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        ...,


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0.

> Do the training magic with the data.
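One possible shape for that training step is sketched below. It is not the notebook's own training code: the model (a single linear layer on flattened images), the loss, the optimizer settings, and the blank stand-in images that replace the MNIST download are all assumptions chosen to keep the example small and self-contained.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST (assumption): 100 blank 1x28x28 images with random labels 0-9.
images = torch.zeros(100, 1, 28, 28)
targets = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(images, targets), batch_size=10, shuffle=True)

# Hypothetical model: flatten each image and apply one linear layer.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

steps = 0  # number of optimizer updates performed
for epoch in range(2):
    for X, y in loader:
        optimizer.zero_grad()        # clear gradients from the previous batch
        loss = loss_fn(model(X), y)  # forward pass and loss
        loss.backward()              # backward pass
        optimizer.step()             # parameter update
        steps += 1
```

Swapping the stand-in `loader` for the `train_loader` built earlier should work without other changes, since MNIST batches transformed with `ToTensor()` have the same `[batch, 1, 28, 28]` layout.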