Basic datasets including MNIST and CIFAR will auto-download. Source code for these datamodules are in basic.py.
By default, data is downloaded to ./data/
by default, where .
is the top level directory of this repository (e.g. 'safari').
After downloading and preparing data, the paths can be configured in several ways.
-
Suppose that it is desired to download all data to a different folder, for example a different disk. The data path can be configured by setting the environment variable
DATA_PATH
, which defaults to./data
. -
For fine-grained control over the path of a particular dataset, set
dataset.data_dir
in the config. For example, if the LRA ListOps files are located in/home/lra/listops-1000/
instead of the default./data/listops/
, pass in+dataset.data_dir=/home/lra/listops-1000
on the command line or modify the config file directly. -
As a simple workaround, softlinks can be set, e.g.
ln -s /home/lra/listops-1000 ./data/listops
LRA must be manually downloaded.
By default, these should go under $DATA_PATH/
, which defaults to ./data
. For the remainder of this README, these are used interchangeably.
LRA can be downloaded from the GitHub page. These datasets should be organized as follows:
$DATA_PATH/
pathfinder/
pathfinder32/
pathfinder64/
pathfinder128/
pathfinder256/
aan/
listops/
The other two datasets in the suite ("Image" or grayscale sequential CIFAR-10; "Text" or char-level IMDB sentiment classification) are both auto-downloaded.