# Data batching in PyTorch

When training neural networks, data is usually passed to the network in small batches (typically 8-64 records at a time).

Batching can be done manually with NumPy, but PyTorch also has built-in tools that make it easier.
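For comparison, manual batching with NumPy might look roughly like the sketch below. The arrays `X` and `y` are hypothetical stand-ins (random values, not the iris data), and the batch size of 8 is chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in data: 20 records with 4 features each, plus integer labels
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(20, 4))
y = rng.integers(0, 3, size=20)

batch_size = 8

# Shuffle the row indices once per pass through the data
indices = rng.permutation(len(X))

batches = []
for start in range(0, len(X), batch_size):
    batch_idx = indices[start:start + batch_size]
    # Index features and labels with the same indices so they stay aligned
    batches.append((X[batch_idx], y[batch_idx]))

# The last batch is smaller when the record count is not a multiple of batch_size
print([len(X_batch) for X_batch, _ in batches])  # [8, 8, 4]
```

The shuffling, slicing, and handling of the final partial batch all have to be written by hand here, which is the bookkeeping that the PyTorch tools take over.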

Here we load the iris data set to demonstrate the PyTorch tools.

In [1]:
import torch
import numpy as np
import pandas as pd

from torch.utils.data import TensorDataset, DataLoader

## Load the data into a Pandas DataFrame

In [2]:
df = pd.read_csv('./data/iris.csv')
df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0.0
1,4.9,3.0,1.4,0.2,0.0
2,4.7,3.2,1.3,0.2,0.0
3,4.6,3.1,1.5,0.2,0.0
4,5.0,3.6,1.4,0.2,0.0


## Create a PyTorch dataset from the loaded Pandas DataFrame

We use the PyTorch `TensorDataset` class to wrap the feature and label tensors in a single PyTorch dataset.

In [11]:
data = df.drop('target', axis=1).values
labels = df['target'].values

# FloatTensor is the same as Tensor (by default), and creates a 32-bit float tensor for the X data
# LongTensor creates a 64-bit integer tensor for the y labels

iris = TensorDataset(torch.FloatTensor(data), torch.LongTensor(labels))

# Show data type of iris
type(iris)

torch.utils.data.dataset.TensorDataset

Let's look at the first record.

In [12]:
iris[0]

(tensor([5.1000, 3.5000, 1.4000, 0.2000]), tensor(0))
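As a rough illustration of how a `TensorDataset` behaves, the sketch below builds one from small made-up arrays (hypothetical stand-ins for the iris data) and checks its length, indexing, and data types. It uses `torch.tensor` with an explicit `dtype`, which is an alternative to the `FloatTensor` and `LongTensor` constructors used above, not the approach taken in this notebook.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# Hypothetical stand-in for the iris arrays: 6 records, 4 features, 3 classes
data = np.arange(24, dtype=np.float64).reshape(6, 4)
labels = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])

# An explicit dtype converts the float64 arrays to the types the network expects
X = torch.tensor(data, dtype=torch.float32)
y = torch.tensor(labels, dtype=torch.long)

# All tensors passed to TensorDataset must have the same first dimension
dataset = TensorDataset(X, y)

print(len(dataset))           # number of records
features, label = dataset[2]  # indexing returns one (features, label) tuple
print(features, label)
print(features.dtype, label.dtype)
```

Each index returns a tuple with one entry per tensor passed to `TensorDataset`, which is why `iris[0]` above shows a feature tensor followed by a label tensor.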

## Batching

We use the PyTorch `DataLoader` class to create batches from the PyTorch `TensorDataset`. Setting `shuffle=True` reshuffles the records on every pass through the data.


In [13]:
iris_loader = DataLoader(iris, batch_size=8, shuffle=True)

# Create numbered batches (batches will not usually need to be numbered)
for i_batch, sample_batched in enumerate(iris_loader):
    # Print first 2 batches
    if i_batch < 2:
        print(i_batch, sample_batched)

0 [tensor([[6.9000, 3.1000, 5.1000, 2.3000],
        [6.3000, 2.7000, 4.9000, 1.8000],
        [6.4000, 3.2000, 4.5000, 1.5000],
        [7.7000, 2.6000, 6.9000, 2.3000],
        [4.9000, 3.1000, 1.5000, 0.1000],
        [5.1000, 3.5000, 1.4000, 0.2000],
        [4.7000, 3.2000, 1.3000, 0.2000],
        [7.1000, 3.0000, 5.9000, 2.1000]]), tensor([2, 2, 1, 2, 0, 0, 0, 2])]
1 [tensor([[5.0000, 3.0000, 1.6000, 0.2000],
        [4.8000, 3.0000, 1.4000, 0.1000],
        [6.8000, 3.2000, 5.9000, 2.3000],
        [6.5000, 3.0000, 5.8000, 2.2000],
        [4.9000, 3.0000, 1.4000, 0.2000],
        [5.7000, 2.9000, 4.2000, 1.3000],
        [7.0000, 3.2000, 4.7000, 1.4000],
        [5.0000, 3.4000, 1.6000, 0.4000]]), tensor([0, 0, 2, 2, 0, 1, 1, 0])]
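
When the batch number is not needed, each batch can be unpacked directly into features and labels. The sketch below uses random stand-in tensors and assumes 150 records, as in the standard iris data set, to show how many batches a batch size of 8 produces and how `drop_last` affects the final partial batch.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in for iris: 150 records, 4 features, 3 classes
X = torch.randn(150, 4)
y = torch.randint(0, 3, (150,))
dataset = TensorDataset(X, y)

loader = DataLoader(dataset, batch_size=8, shuffle=True)

batch_sizes = []
for X_batch, y_batch in loader:
    # X_batch has shape (batch, 4); y_batch has shape (batch,)
    batch_sizes.append(X_batch.shape[0])

print(len(loader))      # 19 batches: 18 full batches plus one partial batch
print(batch_sizes[-1])  # 6, since 150 = 18 * 8 + 6

# drop_last=True discards the final partial batch
full_only = DataLoader(dataset, batch_size=8, shuffle=True, drop_last=True)
print(len(full_only))   # 18
```

In a training loop, the forward pass, loss calculation, and optimizer step would go inside the `for` loop, operating on `X_batch` and `y_batch`.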
