# Using DataLoader
---

```python
class DataLoader(Generic[T_co]):
    r"""
    Data loader. Combines a dataset and a sampler, and provides an iterable over
    the given dataset.

    The :class:`~torch.utils.data.DataLoader` supports both map-style and
    iterable-style datasets with single- or multi-process loading, customizing
    loading order and optional automatic batching (collation) and memory pinning.

    See :py:mod:`torch.utils.data` documentation page for more details.

    Args:
        dataset (Dataset): dataset from which to load the data.
        batch_size (int, optional): how many samples per batch to load
            (default: ``1``).
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: ``False``).
        sampler (Sampler or Iterable, optional): defines the strategy to draw
            samples from the dataset. Can be any ``Iterable`` with ``__len__``
            implemented. If specified, :attr:`shuffle` must not be specified.
        batch_sampler (Sampler or Iterable, optional): like :attr:`sampler`, but
            returns a batch of indices at a time. Mutually exclusive with
            :attr:`batch_size`, :attr:`shuffle`, :attr:`sampler`,
            and :attr:`drop_last`.
        num_workers (int, optional): how many subprocesses to use for data
            loading. ``0`` means that the data will be loaded in the main process.
            (default: ``0``)
        collate_fn (callable, optional): merges a list of samples to form a
            mini-batch of Tensor(s).  Used when using batched loading from a
            map-style dataset.
        pin_memory (bool, optional): If ``True``, the data loader will copy Tensors
            into CUDA pinned memory before returning them.  If your data elements
            are a custom type, or your :attr:`collate_fn` returns a batch that is a custom type,
            see the example below.
        drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
            if the dataset size is not divisible by the batch size. If ``False`` and
            the size of dataset is not divisible by the batch size, then the last batch
            will be smaller. (default: ``False``)
        timeout (numeric, optional): if positive, the timeout value for collecting a batch
            from workers. Should always be non-negative. (default: ``0``)
        worker_init_fn (callable, optional): If not ``None``, this will be called on each
            worker subprocess with the worker id (an int in ``[0, num_workers - 1]``) as
            input, after seeding and before data loading. (default: ``None``)
        prefetch_factor (int, optional, keyword-only arg): Number of samples loaded
            in advance by each worker. ``2`` means there will be a total of
            2 * num_workers samples prefetched across all workers. (default: ``2``)
        persistent_workers (bool, optional): If ``True``, the data loader will not shutdown
            the worker processes after a dataset has been consumed once. This allows to
            maintain the workers `Dataset` instances alive. (default: ``False``)
    """
```
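The `collate_fn` argument described in the docstring is easiest to understand with samples that cannot be stacked directly. The following is a minimal sketch, assuming a toy list of variable-length 1-D tensors as the dataset. The names `sequences` and `pad_collate` are made up for illustration; `pad_sequence` comes from `torch.nn.utils.rnn`.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical toy data: three 1-D sequences of different lengths.
# A plain Python list works as a map-style dataset (it has __getitem__ and __len__).
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]


def pad_collate(batch):
    """Pad a list of 1-D tensors to a common length and return (padded, lengths)."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths


loader = DataLoader(sequences, batch_size=3, shuffle=False, collate_fn=pad_collate)
padded, lengths = next(iter(loader))
print(padded.shape)  # torch.Size([3, 3])
print(lengths)       # tensor([3, 2, 1])
```

Without a custom `collate_fn`, the default collation would try to stack tensors of different lengths and fail. For fixed-size samples such as CIFAR10 images, the default is usually sufficient.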
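 + epoch: one full pass of all training samples through the model.
 + iteration: one batch of samples passed through the model.
 + batch size: the number of samples in one batch; it determines how many iterations make up one epoch.
 + iterations per epoch = total number of samples / batch size (rounded up when `drop_last=False`, rounded down when `drop_last=True`).
 + dataset (Dataset) – the dataset to load from; it determines where the data comes from and how each sample is read.
 + batch_size (int, optional) – number of samples per batch (default: 1).
 + shuffle (bool, optional) – whether to reshuffle the data at every epoch (default: False).
 + num_workers (int, optional) – number of subprocesses used to load data; 0 means the data is loaded in the main process (default: 0).
 + drop_last (bool, optional) – whether to drop the last incomplete batch when the number of samples is not divisible by the batch size (default: False).
 + pin_memory (bool, optional) – if True, tensors are copied into pinned (page-locked) host memory before being returned, which can speed up the later copy to the GPU; it does not move the data to the GPU by itself (default: False).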
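The relationship between sample count, batch size, and iterations per epoch can be checked with a few lines of arithmetic. This is a small sketch; `iterations_per_epoch` is a hypothetical helper, not a PyTorch function. For map-style datasets, `len(DataLoader(...))` reports the same number. The 10,000 used here is the size of the CIFAR10 test split.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches yielded in one epoch (hypothetical helper, not a PyTorch API)."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept, so round up.
    return math.ceil(num_samples / batch_size)


print(iterations_per_epoch(10000, 64))                  # 157
print(iterations_per_epoch(10000, 64, drop_last=True))  # 156
```

With `batch_size=64`, 156 full batches cover 9,984 samples, and the remaining 16 samples form a smaller final batch unless `drop_last=True`.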


```python
import torchvision
from torch.utils.data import DataLoader
```

```python
# Prepare the test dataset
test_data = torchvision.datasets.CIFAR10("../L14/dataset", train=False, transform=torchvision.transforms.ToTensor())
test_loader = DataLoader(dataset=test_data, batch_size=64, shuffle=True, num_workers=0, drop_last=False)
```
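The loader above leaves `pin_memory` at its default. To try the same settings without downloading CIFAR10, and to see that pinned batches still arrive on the CPU, here is a rough sketch. It assumes a random `TensorDataset` as a stand-in for the real data; `fake_data`, `batch_shapes`, and `devices_before_transfer` are illustrative names only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR10: 100 random 3x32x32 "images" with labels 0-9.
images = torch.rand(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
fake_data = TensorDataset(images, labels)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Request pinned memory only when a GPU is present; it has no benefit otherwise.
loader = DataLoader(fake_data, batch_size=64, shuffle=True, num_workers=0,
                    drop_last=False, pin_memory=use_cuda)

batch_shapes = []
devices_before_transfer = []
for imgs, targets in loader:
    devices_before_transfer.append(imgs.device.type)  # batches arrive on the CPU
    batch_shapes.append(tuple(imgs.shape))
    # The explicit .to(device) call is what moves the batch to the GPU.
    # non_blocking=True allows an asynchronous copy when the source is pinned.
    imgs = imgs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)

print(batch_shapes)             # [(64, 3, 32, 32), (36, 3, 32, 32)]
print(devices_before_transfer)  # ['cpu', 'cpu']
```

The second batch holds only 36 samples because 100 is not divisible by 64 and `drop_last=False`.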

```python
# First image in the test dataset
img, target = test_data[0]
print(test_data[0])                    # test_data[0] is an (image tensor, label index) tuple
print(img)
print(img.shape)
print(target)
```

```text
(tensor([[[0.6196, 0.6235, 0.6471,  ..., 0.5373, 0.4941, 0.4549],
         [0.5961, 0.5922, 0.6235,  ..., 0.5333, 0.4902, 0.4667],
         [0.5922, 0.5922, 0.6196,  ..., 0.5451, 0.5098, 0.4706],
         ...,
         [0.2667, 0.1647, 0.1216,  ..., 0.1490, 0.0510, 0.1569],
         [0.2392, 0.1922, 0.1373,  ..., 0.1020, 0.1137, 0.0784],
         [0.2118, 0.2196, 0.1765,  ..., 0.0941, 0.1333, 0.0824]],

        [[0.4392, 0.4353, 0.4549,  ..., 0.3725, 0.3569, 0.3333],
         [0.4392, 0.4314, 0.4471,  ..., 0.3725, 0.3569, 0.3451],
         [0.4314, 0.4275, 0.4353,  ..., 0.3843, 0.3725, 0.3490],
         ...,
         [0.4863, 0.3922, 0.3451,  ..., 0.3804, 0.2510, 0.3333],
         [0.4549, 0.4000, 0.3333,  ..., 0.3216, 0.3216, 0.2510],
         [0.4196, 0.4118, 0.3490,  ..., 0.3020, 0.3294, 0.2627]],

        [[0.1922, 0.1843, 0.2000,  ..., 0.1412, 0.1412, 0.1294],
         [0.2000, 0.1569, 0.1765,  ..., 0.1216, 0.1255, 0.1333],
         [0.1843, 0.1294, 0.1412,  ..., 0.1333, 0.1333, 0
```

```python
# Inspect what test_loader yields
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("./logs")
step = 0
for data in test_loader:
    imgs, targets = data  # imgs: [batch, 3, 32, 32]; targets: class indices
    print(type(imgs))
    print(imgs.shape)
    print(targets)
    writer.add_images(tag="Dataloader Test", img_tensor=imgs, global_step=step)
    step = step + 1

writer.close()  # flush pending events to ./logs
```

```text
<class 'torch.Tensor'>
torch.Size([64, 3, 32, 32])
tensor([6, 6, 9, 7, 6, 4, 0, 3, 1, 0, 6, 6, 2, 8, 7, 4, 3, 1, 1, 6, 6, 8, 8, 2,
        8, 0, 5, 8, 7, 3, 7, 1, 1, 1, 8, 3, 3, 3, 1, 4, 0, 5, 1, 8, 2, 1, 6, 0,
        5, 0, 7, 9, 1, 1, 6, 3, 2, 0, 2, 8, 1, 5, 2, 8])
<class 'torch.Tensor'>
torch.Size([64, 3, 32, 32])
tensor([1, 0, 0, 5, 9, 8, 4, 6, 3, 7, 2, 7, 0, 5, 9, 2, 4, 1, 3, 2, 9, 7, 5, 0,
        6, 9, 8, 9, 7, 2, 5, 6, 4, 5, 2, 4, 3, 3, 3, 6, 7, 6, 0, 6, 4, 3, 3, 3,
        8, 9, 7, 2, 7, 6, 5, 4, 5, 9, 7, 3, 5, 4, 6, 4])
<class 'torch.Tensor'>
torch.Size([64, 3, 32, 32])
tensor([1, 0, 1, 4, 8, 7, 5, 3, 4, 4, 5, 4, 1, 2, 8, 2, 8, 4, 8, 8, 4, 7, 3, 1,
        1, 6, 0, 1, 8, 9, 0, 4, 8, 2, 3, 4, 5, 0, 3, 0, 7, 4, 8, 4, 1, 9, 3, 6,
        7, 0, 6, 4, 9, 6, 3, 0, 7, 2, 9, 2, 3, 0, 6, 4])
<class 'torch.Tensor'>
torch.Size([64, 3, 32, 32])
tensor([1, 3, 7, 0, 1, 4, 5, 9, 1, 1, 9, 0, 6, 6, 4, 8, 8, 7, 4, 9, 9, 6, 2, 9,
        9, 3, 3, 7, 3, 4, 4, 4, 5, 2, 8, 5, 0, 6, 6, 1, 6, 1, 6, 
```