# PyTorch Datasets and Data Loaders


PyTorch makes accessing data for your model a breeze! A dataset describes how to fetch each sample, and a data loader delivers those samples to the model in batches, so the model sees a steady, well-organized flow of data while it trains.

## Technical Terms:
- ```PyTorch Dataset class```: This is like a recipe that tells your computer how to get the data it needs to learn from, including where to find it and how to parse it, if necessary.

- ```PyTorch Data Loader```: Think of this as a delivery truck that brings the data to your AI in small, manageable loads called batches; this makes it easier for the AI to process and learn from the data.

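- ```Batches```: Batches are small groups of samples that the AI looks at and learns from one step at a time. Every batch has the same size except possibly the last one, which holds whatever samples are left over.

- ```Shuffle```: Shuffling means mixing up the data so that it is not in the same order on every pass, which keeps the AI from learning patterns that come only from the ordering.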
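To make the last two terms concrete, here is a minimal pure-Python sketch of batching and shuffling. The helper `make_batches` and its `seed` parameter are hypothetical names chosen for illustration; this is not how PyTorch implements its data loader, only a picture of the idea.

```python
import random


def make_batches(items, batch_size, shuffle=False, seed=None):
    """Split items into consecutive batches, optionally shuffling the order first."""
    order = list(range(len(items)))
    if shuffle:
        # A seeded Random instance keeps the example repeatable
        random.Random(seed).shuffle(order)
    return [
        [items[i] for i in order[start:start + batch_size]]
        for start in range(0, len(order), batch_size)
    ]


numbers = list(range(5))

batches = make_batches(numbers, batch_size=3)
print(batches)  # [[0, 1, 2], [3, 4]]

# Same items, same batch sizes, different order
shuffled = make_batches(numbers, batch_size=3, shuffle=True, seed=0)
print(shuffled)
```

With five items and a batch size of three, the second batch holds only two items, which is why batches are not always the same size.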

In [1]:
from torch.utils.data import Dataset

In [2]:
# Create a toy dataset
class NumberProductDataset(Dataset):
    def __init__(self, data_range=(1, 10)):
        # range() excludes the end value, so (1, 10) yields the numbers 1 through 9
        self.numbers = list(range(data_range[0], data_range[1]))

    def __getitem__(self, index):
        # Each sample pairs a number with its successor and labels it with their product
        number1 = self.numbers[index]
        number2 = self.numbers[index] + 1
        return (number1, number2), number1 * number2

    def __len__(self):
        return len(self.numbers)


In [3]:
# Instantiate the dataset
dataset = NumberProductDataset(
    data_range=(0, 11)
)

In [4]:
# Access the dataset
data_sample = dataset[3]
print(data_sample)

((3, 4), 12)


**An Example Data Loader**


In [9]:
from torch.utils.data import DataLoader

In [10]:
# Instantiate the dataset
dataset = NumberProductDataset(data_range=(0, 5))

In [11]:
# Create a DataLoader instance
dataloader = DataLoader(dataset, batch_size=3, shuffle=True)

In [19]:
# Iterate over the batches
for (num_pairs, products) in dataloader:
    print(num_pairs, products)

[tensor([3, 1, 4]), tensor([4, 2, 5])] tensor([12,  2, 20])
[tensor([0, 2]), tensor([1, 3])] tensor([0, 6])
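The printed batches may look surprising: each sample is `((number1, number2), product)`, yet a batch arrives as a list of two tensors plus one tensor. That regrouping is done by the data loader's default collate function. The sketch below calls `default_collate` directly on three hand-written samples; it assumes a PyTorch version that exposes `default_collate` from `torch.utils.data` (1.11 or later, as far as I know).

```python
import torch
from torch.utils.data import default_collate

# Three samples in the same shape the dataset returns: ((number1, number2), product)
samples = [((3, 4), 12), ((1, 2), 2), ((4, 5), 20)]

# Collating groups the values position by position across the samples
num_pairs, products = default_collate(samples)

print(num_pairs)  # [tensor([3, 1, 4]), tensor([4, 2, 5])]
print(products)   # tensor([12,  2, 20])
```

So `num_pairs[0]` holds every first number in the batch, `num_pairs[1]` holds every second number, and `products` holds the matching labels.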


# Code Examples


## ```Datasets```

In [20]:
from torch.utils.data import Dataset

# Create a toy dataset
class NumberProductDataset(Dataset):
    def __init__(self, data_range=(1, 10)):
        self.numbers = list(range(data_range[0], data_range[1]))

    def __getitem__(self, index):
        number1 = self.numbers[index]
        number2 = self.numbers[index] + 1
        return (number1, number2), number1 * number2

    def __len__(self):
        return len(self.numbers)

# Instantiate the dataset
dataset = NumberProductDataset(
    data_range=(0, 11)
)

# Access a data sample
data_sample = dataset[3]
print(data_sample)
# ((3, 4), 12)

((3, 4), 12)
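As a quick check of how `data_range` maps to samples, the sketch below repeats the class so it runs on its own and then inspects the length and the first and last samples. Because `range()` excludes its end value, `data_range=(0, 11)` covers the numbers 0 through 10.

```python
from torch.utils.data import Dataset


class NumberProductDataset(Dataset):
    def __init__(self, data_range=(1, 10)):
        self.numbers = list(range(data_range[0], data_range[1]))

    def __getitem__(self, index):
        number1 = self.numbers[index]
        number2 = self.numbers[index] + 1
        return (number1, number2), number1 * number2

    def __len__(self):
        return len(self.numbers)


dataset = NumberProductDataset(data_range=(0, 11))

print(len(dataset))               # 11
print(dataset[0])                 # ((0, 1), 0)
print(dataset[len(dataset) - 1])  # ((10, 11), 110)

# Indexing past the end raises IndexError, which comes from the underlying list
try:
    dataset[len(dataset)]
except IndexError:
    print("index out of range")
```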


## ```Data Loaders```


In [25]:
from torch.utils.data import DataLoader

# Instantiate the dataset
dataset = NumberProductDataset(data_range=(0, 5))

# Create a DataLoader instance
dataloader = DataLoader(dataset, batch_size=3, shuffle=True)

# Iterating over batches
for (num_pairs, products) in dataloader:
    print(num_pairs, products)
# Example output (the order varies between runs because shuffle=True):
# [tensor([4, 3, 1]), tensor([5, 4, 2])] tensor([20, 12,  2])
# [tensor([2, 0]), tensor([3, 1])] tensor([6, 0])

[tensor([0, 1, 2]), tensor([1, 2, 3])] tensor([0, 2, 6])
[tensor([3, 4]), tensor([4, 5])] tensor([12, 20])
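Because `shuffle=True` draws a new random order on each pass, the printed batches differ from run to run. If repeatable batches are needed, one option is to pass a seeded `torch.Generator` through the `generator` argument of `DataLoader`. The sketch below also shows `drop_last`, which discards a final batch that is smaller than `batch_size`. To keep it short, a plain list of numbers stands in for the dataset, and the helper `one_pass` is a hypothetical name.

```python
import torch
from torch.utils.data import DataLoader

# A plain list stands in for the dataset here (assumption made for brevity)
data = list(range(5))


def one_pass(seed):
    """Return the batches of one shuffled pass, using a seeded generator."""
    generator = torch.Generator().manual_seed(seed)
    loader = DataLoader(data, batch_size=3, shuffle=True, generator=generator)
    return [batch.tolist() for batch in loader]


run_a = one_pass(seed=0)
run_b = one_pass(seed=0)
print(run_a == run_b)  # True: the same seed gives the same order

# len() of a DataLoader is the number of batches per pass
full = DataLoader(data, batch_size=3)
trimmed = DataLoader(data, batch_size=3, drop_last=True)
print(len(full), len(trimmed))  # 2 1
```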


## Resources

- [PyTorch Dataset documentation](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset)

- [PyTorch DataLoader documentation](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)

- [Index of PyTorch data utilities](https://pytorch.org/docs/stable/data.html)