A dataloader is the component that provides data to models. A dataloader usually (but not necessarily) takes raw information from datasets and processes it into the format needed by the model.
X-modaler contains a builtin data loading pipeline. It’s good to understand how it works, in case you need to write a custom one.
X-modaler provides the functions `xmodaler.datasets.build_xmodaler_{train,valtest}_loader` that create a default dataloader from a given config. Here is how `build_xmodaler_{train,valtest}_loader` works:
- It takes a helper class (e.g., `xmodaler.datasets.common.DatasetFromList`) and loads a list[dict] representing the dataset items in a lightweight format. These dataset items are not yet ready to be used by the model (e.g., images are not loaded into memory).
- Each dict in this list is processed by the class `xmodaler.datasets.common.MapDataset`. Users can customize this data loading for specific datasets by implementing the `__call__` function in a wrapper class (e.g., `xmodaler.datasets.MSCoCoDataset`), which is one of the arguments used to initialize `MapDataset`. The role of the wrapper class is to transform the lightweight representation of a dataset item into a format that is ready for the model to consume (e.g., reading images, sampling captions, or converting fields to torch Tensors).
- After gathering a list of items, the batching schema is handled by the `collate_fn` argument of `torch.utils.data.DataLoader` in the `xmodaler.datasets.build_xmodaler_{train,valtest}_loader` functions.
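The steps above can be sketched as a small, torch-free re-implementation. This is a simplified illustration of the pattern, not the real X-modaler code: the actual `DatasetFromList` and `MapDataset` live in `xmodaler.datasets.common` and wrap `torch.utils.data.Dataset`, and `ToyMapper` here is a hypothetical stand-in for a wrapper class such as `MSCoCoDataset`.

```python
class DatasetFromList:
    """Holds the lightweight list[dict] of dataset items (simplified)."""
    def __init__(self, lst):
        self._lst = lst

    def __len__(self):
        return len(self._lst)

    def __getitem__(self, idx):
        return self._lst[idx]


class MapDataset:
    """Applies a wrapper class's __call__ to each lightweight item (simplified)."""
    def __init__(self, dataset, map_func):
        self._dataset = dataset
        self._map_func = map_func  # e.g., an MSCoCoDataset-style wrapper instance

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, idx):
        return self._map_func(self._dataset[idx])


class ToyMapper:
    """Hypothetical mapper: a real one would read images, sample captions,
    and convert fields to torch Tensors."""
    def __call__(self, item):
        return {"id": item["id"], "feature": [x * 2 for x in item["raw"]]}


def collate_fn(batch):
    # Batching schema: merge a list of per-item dicts into one batched dict.
    return {k: [d[k] for d in batch] for k in batch[0]}


items = [{"id": 0, "raw": [1, 2]}, {"id": 1, "raw": [3, 4]}]
dataset = MapDataset(DatasetFromList(items), ToyMapper())
batch = collate_fn([dataset[i] for i in range(len(dataset))])
# batch groups each field across items: {"id": [0, 1], "feature": [[2, 4], [6, 8]]}
```

In the real pipeline, `collate_fn` is passed to `torch.utils.data.DataLoader`, which performs the gathering step automatically over worker processes.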
The batched data is the output of the dataloader. Typically, it is also the input of `model.forward()`.
Using a different "wrapper class" as the `dataset_mapper` argument of `build_xmodaler_{train,valtest}_loader` works for most cases of custom data loading. See Use Custom Datasets for how to customize the "wrapper class".
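A custom wrapper class only needs to implement the `__call__` protocol described above: take one lightweight dict and return a model-ready dict. The sketch below is hypothetical — the class name, fields, and constructor arguments are illustrative, not part of the real X-modaler API, and a real mapper would return torch Tensors rather than plain lists.

```python
class MyCaptionDatasetMapper:
    """Hypothetical wrapper class usable as a dataset_mapper-style argument."""

    def __init__(self, max_seq_len=20):
        self.max_seq_len = max_seq_len

    def __call__(self, dataset_dict):
        # Transform one lightweight dataset item into model-ready fields.
        tokens = dataset_dict["caption"].lower().split()[: self.max_seq_len]
        return {
            "image_id": dataset_dict["image_id"],
            "tokens": tokens,      # a real mapper would convert these to Tensors
            "length": len(tokens),
        }


mapper = MyCaptionDatasetMapper(max_seq_len=3)
out = mapper({"image_id": 7, "caption": "A dog runs fast"})
# out == {"image_id": 7, "tokens": ["a", "dog", "runs"], "length": 3}
```

Because the mapper is an ordinary callable, it can carry config-dependent state (vocabulary, sequence length, augmentation flags) set up once in `__init__` and reused for every item.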
If you use `DefaultTrainer`, you can override its `build_xmodaler_{train,valtest}_loader` method to use your own dataloader.