# Get your data ready for training

This module defines the basic [`DataBunch`](/basic_data#DataBunch) object that is used inside [`Learner`](/basic_train#Learner) to train a model. It is a generic class that can take any kind of fastai [`Dataset`](https://pytorch.org/docs/stable/data#torch.utils.data.Dataset) or [`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader). The data module of every application provides helper functions that create this [`DataBunch`](/basic_data#DataBunch) for you directly.

```python
from fastai.gen_doc.nbdoc import *
from fastai.basic_data import *
```

```python
show_doc(DataBunch, doc_string=False)
```

## <a id=DataBunch></a>`class` `DataBunch`
> `DataBunch`(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader), `test_dl`:`Optional`\[[`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader)\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\[`Collection`\[`Callable`\]\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L69">[source]</a>

Bind together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensure they are on `device`, and apply `tfms` to them as batches are drawn. `path` is used internally to store temporary files. `collate_fn` is passed to the PyTorch `DataLoader` (replacing the one there) to specify how the samples picked for a batch are collated. By default, it extracts the `data` attribute of each object it receives (see [`vision.image`](/vision.image#vision.image) for why this can be important).

An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in [`DeviceDataLoader`](/basic_data#DeviceDataLoader).
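To make the binding concrete, here is a minimal plain-Python sketch, not fastai's implementation. `MiniDataBunch`, `TfmLoader` and `normalize` are hypothetical stand-ins for `DataBunch`, the loader wrapper and a normalization transform. The loaders here are plain lists of batches, and device placement is left out.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence

Batch = List[float]
Tfm = Callable[[Batch], Batch]

class TfmLoader:
    "Hypothetical wrapper: yields the batches of `dl` after applying `tfms` in order."
    def __init__(self, dl: Iterable[Batch], tfms: Sequence[Tfm] = ()):
        self.dl, self.tfms = dl, list(tfms)

    def __iter__(self) -> Iterator[Batch]:
        for b in self.dl:
            for tfm in self.tfms:
                b = tfm(b)  # transforms run lazily, as each batch is drawn
            yield b

class MiniDataBunch:
    "Hypothetical stand-in for `DataBunch`: wraps every loader with the same `tfms`."
    def __init__(self, train_dl: Iterable[Batch], valid_dl: Iterable[Batch],
                 test_dl: Optional[Iterable[Batch]] = None, tfms: Optional[Sequence[Tfm]] = None):
        tfms = list(tfms or [])
        self.train_dl = TfmLoader(train_dl, tfms)
        self.valid_dl = TfmLoader(valid_dl, tfms)
        self.test_dl = TfmLoader(test_dl, tfms) if test_dl is not None else None

def normalize(mean: float, std: float) -> Tfm:
    "Hypothetical batch transform computing (x - mean) / std element-wise."
    return lambda b: [(x - mean) / std for x in b]

data = MiniDataBunch(train_dl=[[1.0, 3.0], [5.0, 7.0]],
                     valid_dl=[[4.0, 4.0]],
                     tfms=[normalize(mean=4.0, std=2.0)])
print(list(data.train_dl))  # [[-1.5, -0.5], [0.5, 1.5]]
print(list(data.valid_dl))  # [[0.0, 0.0]]
```

Because the transforms are applied per batch, the stored data itself is never modified, and the training and validation sets are guaranteed to go through the same normalization.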

```python
show_doc(DataBunch.create, full_name='create', doc_string=False)
```

#### <a id=create></a>`create`
> `create`(`train_ds`:[`Dataset`](https://pytorch.org/docs/stable/data#torch.utils.data.Dataset), `valid_ds`:[`Dataset`](https://pytorch.org/docs/stable/data#torch.utils.data.Dataset), `test_ds`:[`Dataset`](https://pytorch.org/docs/stable/data#torch.utils.data.Dataset)=`None`, `path`:`PathOrStr`=`'.'`, `bs`:`int`=`64`, `num_workers`:`int`=`8`, `tfms`:`Optional`\[`Collection`\[`Callable`\]\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `collate_fn`:`Callable`=`'data_collate'`) → `DataBunch`
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L82">[source]</a>

Create a [`DataBunch`](/basic_data#DataBunch) from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and `num_workers` worker processes. `tfms` and `device` are passed to the init method.
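The factory step (datasets in, batched loaders out, with the test loader only when `test_ds` is given) can be sketched with the standard library alone. `batchify` and `create_sketch` are hypothetical names. Shuffling, `num_workers` and the actual `DataLoader` construction are not modelled.

```python
from typing import List, Optional, Sequence

def batchify(ds: Sequence, bs: int) -> List[list]:
    "Hypothetical helper: consecutive batches of at most `bs` items."
    return [list(ds[i:i + bs]) for i in range(0, len(ds), bs)]

def create_sketch(train_ds: Sequence, valid_ds: Sequence,
                  test_ds: Optional[Sequence] = None, bs: int = 64) -> List[List[list]]:
    "Hypothetical analogue of `DataBunch.create`: one batched loader per dataset given."
    datasets = [train_ds, valid_ds] + ([test_ds] if test_ds is not None else [])
    return [batchify(ds, bs) for ds in datasets]

loaders = create_sketch(list(range(10)), list(range(5)), bs=4)
print(len(loaders))  # 2
print(loaders[0])    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```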

```python
show_doc(DataBunch.holdout, doc_string=False)
```

#### <a id=DataBunch.holdout></a>`holdout`
> `holdout`(`is_test`:`bool`=`False`) → [`DeviceDataLoader`](/basic_data#DeviceDataLoader)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L94">[source]</a>

Return `self.test_dl` if `is_test` is `True`, else `self.valid_dl`.
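The following is a minimal sketch of this selection logic. `HoldoutDemo` is a hypothetical class, and the strings stand in for real loaders. A typical use is running one evaluation loop over either held-out split by flipping a single flag. In this sketch, asking for the test set when none was given simply returns `None`.

```python
class HoldoutDemo:
    "Hypothetical object holding the two loaders that `holdout` chooses between."
    def __init__(self, valid_dl, test_dl=None):
        self.valid_dl, self.test_dl = valid_dl, test_dl

    def holdout(self, is_test: bool = False):
        "Return `self.test_dl` if `is_test` is True, else `self.valid_dl`."
        return self.test_dl if is_test else self.valid_dl

d = HoldoutDemo(valid_dl="valid batches", test_dl="test batches")
print(d.holdout())              # valid batches
print(d.holdout(is_test=True))  # test batches
```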

```python
show_doc(DeviceDataLoader, doc_string=False)
```

## <a id=DeviceDataLoader></a>`class` `DeviceDataLoader`
> `DeviceDataLoader`(`dl`:[`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader), `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device), `tfms`:`List`\[`Callable`\]=`None`, `collate_fn`:`Callable`=`'data_collate'`)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L23">[source]</a>

Move the batches of `dl` to `device`, then apply an optional list of `tfms` to them. `collate_fn` replaces the `collate_fn` of `dl`. All dataloaders of a [`DataBunch`](/basic_data#DataBunch) are of this type.
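The `collate_fn` replacement can be illustrated in isolation. In this sketch, `Item`, `ListLoader`, `data_collate_sketch` and `CollateOverride` are all hypothetical. The collate function assumes items keep their raw value in a `.data` attribute, which mirrors the default behaviour described for `DataBunch` above. Device placement and `tfms` are deliberately left out.

```python
from typing import Any, Callable, Iterator, List, Sequence

class Item:
    "Hypothetical item type that keeps its raw value in `.data`."
    def __init__(self, data: Any):
        self.data = data

def data_collate_sketch(samples: Sequence[Any]) -> List[Any]:
    "Hypothetical collate: unwrap `.data` where present, then group the samples into a batch."
    return [getattr(s, "data", s) for s in samples]

class ListLoader:
    "Hypothetical minimal loader with a replaceable `collate_fn`, like PyTorch's `DataLoader`."
    def __init__(self, ds: Sequence[Any], bs: int, collate_fn: Callable = list):
        self.ds, self.bs, self.collate_fn = ds, bs, collate_fn

    def __iter__(self) -> Iterator[Any]:
        for i in range(0, len(self.ds), self.bs):
            yield self.collate_fn(self.ds[i:i + self.bs])

class CollateOverride:
    "Hypothetical wrapper: its own `collate_fn` replaces the one of `dl`."
    def __init__(self, dl: ListLoader, collate_fn: Callable = data_collate_sketch):
        self.dl = dl
        self.dl.collate_fn = collate_fn  # overwrite the wrapped loader's collate_fn

    def __iter__(self) -> Iterator[Any]:
        yield from self.dl  # device placement and tfms are omitted here

dl = ListLoader([Item(1), Item(2), Item(3)], bs=2)
print(list(CollateOverride(dl)))  # [[1, 2], [3]]
```

Without the override, the loader's original `collate_fn` (here `list`) would yield batches of `Item` objects rather than their raw values.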

### Factory method

```python
show_doc(DeviceDataLoader.create, doc_string=False)
```

#### <a id=DeviceDataLoader.create></a>`create`
> `create`(`dataset`:[`Dataset`](https://pytorch.org/docs/stable/data#torch.utils.data.Dataset), `bs`:`int`=`64`, `shuffle`:`bool`=`False`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`device(type='cpu')`, `tfms`:`Collection`\[`Callable`\]=`None`, `num_workers`:`int`=`8`, `collate_fn`:`Callable`=`'data_collate'`, `kwargs`:`Any`)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L62">[source]</a>

Create a [`DeviceDataLoader`](/basic_data#DeviceDataLoader) on `device` from a `dataset` with batch size `bs`, `num_workers` processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to `True`, and `tfms` are passed to the init method. All `kwargs` are passed to the PyTorch [`DataLoader`](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader) class initialization.
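The split between named arguments and forwarded `kwargs` is illustrated by the sketch below. `SimpleLoader` and `create_loader` are hypothetical stand-ins written in plain Python. `SimpleLoader` only imitates two options of PyTorch's `DataLoader`, `shuffle` and `drop_last`, and the `seed` argument is an addition for reproducibility.

```python
import random
from typing import Any, Iterator, List, Sequence

class SimpleLoader:
    "Hypothetical stand-in for PyTorch's `DataLoader`, supporting `shuffle` and `drop_last`."
    def __init__(self, ds: Sequence[Any], batch_size: int = 1, shuffle: bool = False,
                 drop_last: bool = False, seed: int = 0):
        self.ds, self.bs = ds, batch_size
        self.shuffle, self.drop_last, self.seed = shuffle, drop_last, seed

    def __iter__(self) -> Iterator[List[Any]]:
        idxs = list(range(len(self.ds)))
        if self.shuffle:
            random.Random(self.seed).shuffle(idxs)  # permute indices, not the dataset
        for i in range(0, len(idxs), self.bs):
            chunk = idxs[i:i + self.bs]
            if self.drop_last and len(chunk) < self.bs:
                break  # skip the final incomplete batch
            yield [self.ds[j] for j in chunk]

def create_loader(dataset: Sequence[Any], bs: int = 64, shuffle: bool = False,
                  **kwargs: Any) -> SimpleLoader:
    "Hypothetical factory: `bs` and `shuffle` are handled here, all other kwargs go to the loader."
    return SimpleLoader(dataset, batch_size=bs, shuffle=shuffle, **kwargs)

print(list(create_loader(list(range(5)), bs=2)))                  # [[0, 1], [2, 3], [4]]
print(list(create_loader(list(range(5)), bs=2, drop_last=True)))  # [[0, 1], [2, 3]]
```

Forwarding `**kwargs` keeps the factory's signature short while still exposing every option of the underlying loader.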

### Internal methods

```python
show_doc(DeviceDataLoader.proc_batch)
```

#### <a id=DeviceDataLoader.proc_batch></a>`proc_batch`
> `proc_batch`(`b`:`Tensor`) → `Tensor`
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L44">[source]</a>

Process batch `b` of `TensorImage`.
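The order of operations inside batch processing matters whenever a transform expects data that is already on the target device. The sketch below assumes the order is: first move `b` to `device`, then apply each transform in list order. `TaggedBatch`, `to_device_sketch`, `proc_batch_sketch` and `double_on_gpu` are hypothetical. A device string stands in for actually moving tensors.

```python
from typing import Callable, List, Sequence

class TaggedBatch:
    "Hypothetical batch that records which device it lives on."
    def __init__(self, values: List[float], device: str = "cpu"):
        self.values, self.device = values, device

def to_device_sketch(b: TaggedBatch, device: str) -> TaggedBatch:
    "Hypothetical stand-in for moving a batch to `device`."
    return TaggedBatch(b.values, device)

def proc_batch_sketch(b: TaggedBatch, device: str,
                      tfms: Sequence[Callable[[TaggedBatch], TaggedBatch]]) -> TaggedBatch:
    "Move `b` to `device` first, then apply each transform in order."
    b = to_device_sketch(b, device)
    for tfm in tfms:
        b = tfm(b)
    return b

def double_on_gpu(b: TaggedBatch) -> TaggedBatch:
    "Hypothetical transform that only accepts batches already on the GPU."
    if b.device != "cuda":
        raise ValueError("transform expected a batch on 'cuda'")
    return TaggedBatch([2 * v for v in b.values], b.device)

out = proc_batch_sketch(TaggedBatch([1.0, 2.0]), "cuda", [double_on_gpu])
print(out.device, out.values)  # cuda [2.0, 4.0]
```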

```python
show_doc(DeviceDataLoader.add_tfm)
```

#### <a id=DeviceDataLoader.add_tfm></a>`add_tfm`
> `add_tfm`(`tfm`:`Callable`)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L41">[source]</a>

Add `tfm` to `self.tfms`.

```python
show_doc(DeviceDataLoader.remove_tfm)
```

#### <a id=DeviceDataLoader.remove_tfm></a>`remove_tfm`
> `remove_tfm`(`tfm`:`Callable`)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L42">[source]</a>

Remove `tfm` from `self.tfms`.
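The following sketch shows `add_tfm` and `remove_tfm` operating on a plain list. `TfmListSketch` is a hypothetical class, and the list semantics (`append` adds at the end, `remove` deletes the first equal entry) are an assumption of the sketch. Functions compare by identity, so removal needs the same function object that was added.

```python
from typing import Callable, List, Optional

class TfmListSketch:
    "Hypothetical holder of a mutable transform list, mirroring `add_tfm` / `remove_tfm`."
    def __init__(self, tfms: Optional[List[Callable]] = None):
        self.tfms = list(tfms or [])

    def add_tfm(self, tfm: Callable) -> None:
        self.tfms.append(tfm)  # appended transforms run last

    def remove_tfm(self, tfm: Callable) -> None:
        self.tfms.remove(tfm)  # raises ValueError if `tfm` is not present

    def apply(self, x):
        for tfm in self.tfms:
            x = tfm(x)
        return x

def add_one(x): return x + 1
def times_ten(x): return x * 10

dl = TfmListSketch([add_one])
dl.add_tfm(times_ten)
print(dl.apply(1))  # 20
dl.remove_tfm(add_one)
print(dl.apply(1))  # 10
```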

## Generic classes

These last two classes are empty shells meant to be subclassed by the applications.

```python
show_doc(DatasetBase, title_level=3)
```

### <a id=DatasetBase></a>`class` `DatasetBase`
> `DatasetBase`() :: [`Dataset`](https://pytorch.org/docs/stable/data#torch.utils.data.Dataset)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L6">[source]</a>

Base class for all fastai datasets.

```python
show_doc(LabelDataset, title_level=3)
```

### <a id=LabelDataset></a>`class` `LabelDataset`
> `LabelDataset`() :: [`DatasetBase`](/basic_data#DatasetBase)
<a href="https://github.com/fastai/fastai/blob/master/fastai/basic_data.py#L15">[source]</a>

Base class for fastai datasets that do classification.
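As an illustration of the subclassing pattern, here is a pure-Python sketch. Map-style PyTorch datasets implement `__len__` and `__getitem__`. This sketch assumes an application subclass does the same and also records its class names. `DatasetBaseSketch`, `LabelDatasetSketch`, its `n_classes` property and `ToyLabelDataset` are all hypothetical names, not fastai's attributes.

```python
from typing import Any, List, Sequence, Tuple

class DatasetBaseSketch:
    "Hypothetical stand-in for `DatasetBase`: subclasses must provide `__len__` and `__getitem__`."
    def __len__(self) -> int:
        raise NotImplementedError

    def __getitem__(self, idx: int) -> Any:
        raise NotImplementedError

class LabelDatasetSketch(DatasetBaseSketch):
    "Hypothetical classification base: subclasses fill in `classes`."
    classes: List[str] = []

    @property
    def n_classes(self) -> int:
        return len(self.classes)

class ToyLabelDataset(LabelDatasetSketch):
    "Hypothetical application dataset returning (value, label index) pairs."
    def __init__(self, xs: Sequence[float], labels: Sequence[str]):
        self.classes = sorted(set(labels))
        self.xs = list(xs)
        self.ys = [self.classes.index(lbl) for lbl in labels]

    def __len__(self) -> int:
        return len(self.xs)

    def __getitem__(self, idx: int) -> Tuple[float, int]:
        return self.xs[idx], self.ys[idx]

ds = ToyLabelDataset([0.1, 0.7, 0.4], ["cat", "dog", "cat"])
print(len(ds), ds.n_classes, ds[1])  # 3 2 (0.7, 1)
```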