# **Loading Datasets**

## **Utilizing Datasets Provided by PyTorch**

`torchvision` is a PyTorch package that provides many well-known datasets, such as *MNIST* and *ImageNet*. The full list of available datasets is at [https://pytorch.org/vision/main/datasets.html](https://pytorch.org/vision/main/datasets.html)

The following example downloads the MNIST dataset, normalizes the images, and iterates over the training set in mini-batches of five samples.

In [None]:
import torchvision.transforms as transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

# ToTensor scales pixel values to [0, 1]; Normalize then computes
# (x - 0.5) / 1.0, which shifts them to [-0.5, 0.5].
mnist_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (1.0,))
])

download_root = './data/mnist'
train_dataset = MNIST(download_root, transform=mnist_transform, train=True, download=True)
test_dataset = MNIST(download_root, transform=mnist_transform, train=False, download=True)

# Wrap the training set in a DataLoader that yields shuffled batches of 5 samples.
train_loader = DataLoader(train_dataset, batch_size=5, shuffle=True)

# Note: this loop prints every batch, so the output is very long.
for x, y in train_loader:
    print('x =', x)
    print('y =', y)

        [[[-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          ...,
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000]]],


        [[[-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          ...,
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000],
          [-0.5000, -0.5000, -0.5000,  ..., -0.5000, -0.5000, -0.5000]]],


        ...
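
Printing every batch floods the output, so it can help to inspect a single batch first. The following is a minimal sketch, assuming a synthetic stand-in for MNIST (the random tensors `images` and `labels` are not part of the original notebook) so that it runs without a download. The manual `(images - 0.5) / 1.0` step mirrors the arithmetic that `Normalize((0.5,), (1.0,))` performs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumption: stand-in for MNIST. 20 random 1x28x28 "images" with values
# in [0, 1), the range ToTensor() produces, plus random digit labels.
torch.manual_seed(0)
images = torch.rand(20, 1, 28, 28)
labels = torch.randint(0, 10, (20,))

# Same arithmetic as Normalize((0.5,), (1.0,)): (x - mean) / std.
normalized = (images - 0.5) / 1.0

loader = DataLoader(TensorDataset(normalized, labels), batch_size=5, shuffle=True)

# Fetch one batch instead of looping over the whole dataset.
x, y = next(iter(loader))
print('x shape:', tuple(x.shape))  # (5, 1, 28, 28)
print('y shape:', tuple(y.shape))  # (5,)
print('value range:', x.min().item(), x.max().item())
```

Checking shapes and value ranges on one batch is usually enough to confirm that the transform and the batching behave as intended.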

## **Creating and Using Custom Datasets**

A custom dataset is a subclass of `torch.utils.data.Dataset` that implements the following three methods:
```python
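import torch

class CustomDataset(torch.utils.data.Dataset):
    def __init__(self):
        # Load or prepare the data here.
        pass

    def __len__(self):
        # Return the number of samples in the dataset.
        raise NotImplementedError

    def __getitem__(self, index):
        # Return the sample (for example, an (x, y) pair) at the given index.
        raise NotImplementedError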
```
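
To make the three methods concrete, here is a minimal sketch of a toy in-memory dataset. The class name `SquaresDataset` and its contents are hypothetical and chosen only for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        # Build all samples up front, since the data is tiny.
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.x)

    def __getitem__(self, index):
        # One (input, target) pair.
        return self.x[index], self.y[index]

squares = SquaresDataset(10)
print(len(squares))   # 10
print(squares[3])     # (tensor(3.), tensor(9.))

# DataLoader uses __len__ and __getitem__ to assemble batches.
loader = DataLoader(squares, batch_size=4, shuffle=False)
batch_sizes = [len(xb) for xb, yb in loader]
print(batch_sizes)    # [4, 4, 2]
```

With 10 samples and `batch_size=4`, the last batch holds the remaining 2 samples, because `drop_last` defaults to `False`.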

The following example implements a `CustomDataset` that reads the Iris dataset from a CSV file:

In [None]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, csv_file):
        # Read the CSV (local path or URL) into a DataFrame.
        self.data = pd.read_csv(csv_file)

    def __len__(self):
        # The number of samples equals the number of rows.
        return len(self.data)

    def __getitem__(self, index):
        # Columns 0-3 hold the features; column 4 holds the label.
        # Both are returned as float64 tensors.
        x = torch.tensor(self.data.iloc[index, 0:4].to_numpy(dtype=float), dtype=torch.float64)
        y = torch.tensor(self.data.iloc[index, 4], dtype=torch.float64)
        return x, y

url = 'https://raw.githubusercontent.com/dkims/CSCI4341/main/iris.csv'
tensor_dataset = CustomDataset(url)

# Yield shuffled batches of 5 samples.
iris_loader = DataLoader(tensor_dataset, batch_size=5, shuffle=True)
for x, y in iris_loader:
    print('x =', x)
    print('y =', y)

x = tensor([[5.1000, 3.8000, 1.5000, 0.3000],
        [7.9000, 3.8000, 6.4000, 2.0000],
        [4.9000, 3.1000, 1.5000, 0.1000],
        [6.6000, 2.9000, 4.6000, 1.3000],
        [5.0000, 3.0000, 1.6000, 0.2000]], dtype=torch.float64)
y = tensor([0., 2., 0., 1., 0.], dtype=torch.float64)
x = tensor([[6.7000, 3.3000, 5.7000, 2.1000],
        [5.8000, 4.0000, 1.2000, 0.2000],
        [6.0000, 2.7000, 5.1000, 1.6000],
        [5.9000, 3.0000, 4.2000, 1.5000],
        [5.2000, 4.1000, 1.5000, 0.1000]], dtype=torch.float64)
y = tensor([2., 0., 1., 1., 0.], dtype=torch.float64)
x = tensor([[6.8000, 3.0000, 5.5000, 2.1000],
        [5.4000, 3.4000, 1.7000, 0.2000],
        [5.2000, 2.7000, 3.9000, 1.4000],
        [7.7000, 2.6000, 6.9000, 2.3000],
        [6.4000, 2.7000, 5.3000, 1.9000]], dtype=torch.float64)
y = tensor([2., 0., 1., 2., 2.], dtype=torch.float64)
x = tensor([[4.3000, 3.0000, 1.1000, 0.1000],
        [5.6000, 3.0000, 4.5000, 1.5000],
        [7.2000, 3.2000, 6.0000, 1.8000],
