# PyTorch DataLoader

Computing the loss over the entire dataset at every step is not efficient, so we split the data into batches instead: in each epoch, we loop over the batches and feed them to the model one at a time.
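
To make the idea concrete, here is a minimal sketch of batching a tensor by hand. Each epoch shuffles the sample indices and then walks over them in slices of `batch_size`, which is the bookkeeping that `DataLoader` automates. The `iterate_minibatches` helper and the tensor sizes are hypothetical, chosen only for illustration:

```python
import torch

def iterate_minibatches(x, y, batch_size, shuffle=True):
    """Yield (x_batch, y_batch) pairs that together cover the data once (one epoch)."""
    n = x.shape[0]
    # A random permutation of the indices gives a fresh order every epoch.
    order = torch.randperm(n) if shuffle else torch.arange(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        # Indexing x and y with the same indices keeps samples and labels aligned.
        yield x[idx], y[idx]

# Hypothetical data: 10 samples with 3 features each.
x = torch.randn(10, 3)
y = torch.randn(10, 1)

for epoch in range(2):
    for x_batch, y_batch in iterate_minibatches(x, y, batch_size=4):
        # A real training loop would run the forward and backward passes here.
        print(epoch, tuple(x_batch.shape))
```

With 10 samples and `batch_size=4`, every epoch yields batches of 4, 4 and 2 samples.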

In neural network terminology:
* One `epoch`: One forward pass and one backward pass of all the training examples.
* `Batch Size`: The number of training examples in one forward/backward pass. The higher the batch size, the more memory you'll need.
* Number of `iterations`: The number of passes, each pass using `batch size` examples. To be clear, one pass = one forward pass + one backward pass (the forward and backward passes are not counted as two separate passes).

Example: if you have 1000 training examples and the batch size is 500, you'll need 2 iterations to complete one epoch.
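
In general, the number of iterations per epoch is the dataset size divided by the batch size, rounded up, because the last batch may be smaller than the others. The minimal sketch below checks this against `len()` of a `DataLoader`, which reports the number of batches. `TensorDataset` stands in for a real dataset, and the sizes are illustrative:

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

def iterations_per_epoch(num_examples, batch_size, drop_last=False):
    """Number of forward/backward passes needed to see every example once."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)

# 1000 examples with batch size 500 -> 2 iterations per epoch.
data = TensorDataset(torch.zeros(1000, 1))
loader = DataLoader(data, batch_size=500)
print(iterations_per_epoch(1000, 500), len(loader))  # 2 2

# 1050 examples: a third, smaller batch holds the remaining 50 examples.
data = TensorDataset(torch.zeros(1050, 1))
print(iterations_per_epoch(1050, 500), len(DataLoader(data, batch_size=500)))  # 3 3
print(iterations_per_epoch(1050, 500, drop_last=True),
      len(DataLoader(data, batch_size=500, drop_last=True)))  # 2 2
```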

Here's how the DataLoader works:

<img src="DataLoader.png" />
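
In code, each pass of the iteration over a `DataLoader` yields one batch. The loader draws indices (shuffled when `shuffle=True`), fetches each sample with the dataset's `__getitem__`, and stacks the samples into batch tensors. Here is a minimal sketch using `TensorDataset` with made-up sizes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 10 samples, 3 features and one binary label each.
features = torch.randn(10, 3)
labels = torch.randint(0, 2, (10,))
dataset = TensorDataset(features, labels)

# Each sample is a (features, label) tuple returned by __getitem__.
x0, y0 = dataset[0]
print(x0.shape, y0.shape)  # torch.Size([3]) torch.Size([])

loader = DataLoader(dataset, batch_size=4, shuffle=True)
for i, (x_batch, y_batch) in enumerate(loader):
    # The default collate function stacks samples along a new first dimension.
    print(i, x_batch.shape, y_batch.shape)
```

With 10 samples and `batch_size=4`, the loop sees batches of 4, 4 and 2 samples.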

And here's how to work with data loaders in PyTorch:

```python
# Imports.
import torch
from torch.utils.data import DataLoader, Dataset
```

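```python
class DiabetesDataset(Dataset):
    '''
    The Diabetes Dataset class, based on Torch's Dataset class.
    '''

    def __init__(self, features, labels):
        # Download/read the data here (e.g. from a CSV file) and store it
        # on `self`. This version simply receives ready-made tensors.
        self.x = features
        self.y = labels

    def __getitem__(self, index):
        # Return one (features, label) sample by index.
        return self.x[index], self.y[index]

    def __len__(self):
        # Return the number of samples in the dataset.
        return len(self.x)
```

```python
# Placeholder random data for illustration: 100 samples with 8 features each.
features = torch.randn(100, 8)
labels = torch.randint(0, 2, (100, 1)).float()
dataset = DiabetesDataset(features, labels)
```

```python
train_loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=2)
```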
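
With the loader in place, a training loop runs over epochs and, inside each epoch, over the batches the loader yields. The minimal sketch below makes several assumptions:

* The model is a hypothetical logistic regression (`torch.nn.Linear` followed by a sigmoid).
* It uses binary cross-entropy loss and plain SGD.
* Random placeholder data sits in a `TensorDataset`, so the block runs on its own.
* `num_workers=0` keeps data loading in the main process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder data: 100 samples, 8 features, binary labels stored as floats.
features = torch.randn(100, 8)
labels = torch.randint(0, 2, (100, 1)).float()
train_loader = DataLoader(TensorDataset(features, labels),
                          batch_size=32, shuffle=True, num_workers=0)

# Hypothetical model: logistic regression.
model = torch.nn.Sequential(torch.nn.Linear(8, 1), torch.nn.Sigmoid())
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    for x_batch, y_batch in train_loader:   # one iteration per batch
        y_pred = model(x_batch)             # forward pass
        loss = criterion(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()                     # backward pass
        optimizer.step()
        running_loss += loss.item() * x_batch.size(0)
    # Average loss over all samples seen in this epoch.
    epoch_losses.append(running_loss / len(train_loader.dataset))
    print(f"epoch {epoch}: loss {epoch_losses[-1]:.4f}")
```

Here 100 samples with `batch_size=32` give 4 iterations per epoch, the last one holding the remaining 4 samples.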

A real example:
<img src="DataLoaderExample.png" />
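
A text-only sketch of a file-backed dataset follows. It reads a numeric CSV file once in `__init__`, keeps the values in memory, and serves rows by index. Two things here are assumptions: the `CSVDataset` name, and the file layout (comma-separated, no header row, feature columns followed by a label in the last column). The demo writes a tiny temporary file with made-up values so the block runs on its own:

```python
import csv
import os
import tempfile

import torch
from torch.utils.data import DataLoader, Dataset

class CSVDataset(Dataset):
    """Loads a headerless numeric CSV: feature columns, then one label column."""

    def __init__(self, path):
        with open(path, newline="") as f:
            rows = [[float(v) for v in row] for row in csv.reader(f) if row]
        data = torch.tensor(rows, dtype=torch.float32)
        self.x = data[:, :-1]   # all columns except the last
        self.y = data[:, -1:]   # last column, kept 2-D: shape (N, 1)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.x.shape[0]

# Demo: write a small made-up CSV file, then load it.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    csv.writer(f).writerows([[0.1, 0.2, 0.3, 1], [0.4, 0.5, 0.6, 0],
                             [0.7, 0.8, 0.9, 1], [1.0, 1.1, 1.2, 0],
                             [1.3, 1.4, 1.5, 1]])
    path = f.name

dataset = CSVDataset(path)
os.remove(path)  # the data is already in memory

loader = DataLoader(dataset, batch_size=2, shuffle=True)
for x_batch, y_batch in loader:
    print(x_batch.shape, y_batch.shape)
```

Reading the whole file up front suits small tabular datasets. For data that does not fit in memory, `__getitem__` could instead read one record from disk per call.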

The following dataset loaders are available:

* MNIST and FashionMNIST
* COCO (Captioning and Detection)
* LSUN Classification
* ImageFolder
* ImageNet-12
* CIFAR10 and CIFAR100
* STL10
* SVHN
* PhotoTour

```python
import torchvision
```

All of the datasets above reside in `torchvision.datasets`.

### Exercise
Build a DataLoader for the Kaggle Titanic Dataset.
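
One possible starting point is sketched below. It assumes the column layout of Kaggle's Titanic `train.csv` (`PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked`) and keeps a few numeric columns as features. Several choices are simplifications to revisit:

* `Sex` is encoded as 0/1.
* Missing values such as an empty `Age` become 0.
* The `TitanicDataset` name is hypothetical.
* The two data rows in the demo are made up, so the block runs without the Kaggle file.

```python
import csv
import io

import torch
from torch.utils.data import DataLoader, Dataset

FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]

class TitanicDataset(Dataset):
    """Reads rows in the Kaggle Titanic train.csv format from an open text file."""

    def __init__(self, f):
        xs, ys = [], []
        for row in csv.DictReader(f):
            row["Sex"] = "1" if row["Sex"] == "female" else "0"  # encode as 0/1
            # Simplification: missing values (e.g. an empty Age) become 0.
            xs.append([float(row[c]) if row[c] else 0.0 for c in FEATURES])
            ys.append([float(row["Survived"])])
        self.x = torch.tensor(xs)
        self.y = torch.tensor(ys)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

# Made-up rows in the train.csv column layout, so the sketch runs standalone.
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,22,1,0,A/5 0000,7.25,,S\n'
    '2,1,1,"Doe, Mrs. Jane",female,,1,0,PC 0000,71.28,C00,C\n'
)
loader = DataLoader(TitanicDataset(sample), batch_size=2, shuffle=True)
```

With the real file, open it with `open("train.csv", newline="")` and pass the file object to `TitanicDataset`. Filling a missing `Age` with the mean age rather than 0 is a common refinement.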