Albert Villanova del Moral edited this page Jan 14, 2021 · 3 revisions

The datasets library uses three caches:

  • one to store the datasets data,
  • one to store the metrics data and
  • one to store the modules (the Python scripts of either datasets or metrics).

By default, each of these caches is a directory inside the Hugging Face cache home directory (~/.cache/huggingface unless overridden).

The three directories are named:

  • datasets,
  • metrics and
  • modules.

We keep them separate so that users who want to clear the data cache while keeping the modules, for example, can do so easily.
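Because each cache is its own directory, clearing one of them amounts to deleting a single folder. The following is a minimal sketch of that idea, not part of the datasets library: the helper name clear_cache and the CACHE_NAMES tuple are hypothetical, and the demonstration runs against a temporary directory rather than a real cache home.

```python
import shutil
import tempfile
from pathlib import Path

# The three cache directory names described above.
CACHE_NAMES = ("datasets", "metrics", "modules")


def clear_cache(cache_home, name):
    """Delete one cache directory under cache_home and leave the others alone.

    Hypothetical helper for illustration; returns True if a directory was removed.
    """
    if name not in CACHE_NAMES:
        raise ValueError(f"unknown cache: {name!r}")
    target = Path(cache_home).expanduser() / name
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Demonstration on a throwaway directory that mimics the cache layout.
with tempfile.TemporaryDirectory() as tmp:
    for cache_name in CACHE_NAMES:
        (Path(tmp) / cache_name).mkdir()
    removed = clear_cache(tmp, "datasets")
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

To apply this to a real installation, one would pass the actual cache home (for example ~/.cache/huggingface) instead of the temporary directory; note that the deletion is irreversible.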

Default cache paths

From utils.file_utils (the snippets rely on os and pathlib.Path being imported):

  • Cache home:
hf_cache_home = os.path.expanduser(
    os.getenv("HF_HOME", os.path.join(os.getenv("XDG_CACHE_HOME", "~/.cache"), "huggingface"))
)
  • Datasets cache:
default_datasets_cache_path = os.path.join(hf_cache_home, "datasets")
HF_DATASETS_CACHE = Path(os.getenv("HF_DATASETS_CACHE", default_datasets_cache_path))
  • Metrics cache:
default_metrics_cache_path = os.path.join(hf_cache_home, "metrics")
HF_METRICS_CACHE = Path(os.getenv("HF_METRICS_CACHE", default_metrics_cache_path))
  • Modules cache:
default_modules_cache_path = os.path.join(hf_cache_home, "modules")
HF_MODULES_CACHE = Path(os.getenv("HF_MODULES_CACHE", default_modules_cache_path))
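The precedence implied by the definitions above can be checked without touching the real environment. The sketch below mirrors the same logic but reads from a plain dictionary instead of os.environ; the function name resolve_cache_paths and the example paths are assumptions made for illustration, not part of the library.

```python
import os
from pathlib import Path


def resolve_cache_paths(env):
    """Mirror the default-path logic above, reading variables from a mapping.

    `env` stands in for os.environ so the precedence can be tested in isolation.
    """
    # HF_HOME wins; otherwise fall back to $XDG_CACHE_HOME/huggingface,
    # and finally to ~/.cache/huggingface.
    cache_home = os.path.expanduser(
        env.get(
            "HF_HOME",
            os.path.join(env.get("XDG_CACHE_HOME", "~/.cache"), "huggingface"),
        )
    )
    # Each cache has its own override variable; the default is a
    # subdirectory of the cache home.
    return {
        "datasets": Path(env.get("HF_DATASETS_CACHE", os.path.join(cache_home, "datasets"))),
        "metrics": Path(env.get("HF_METRICS_CACHE", os.path.join(cache_home, "metrics"))),
        "modules": Path(env.get("HF_MODULES_CACHE", os.path.join(cache_home, "modules"))),
    }


# Hypothetical settings: move the cache home and redirect only the metrics cache.
paths = resolve_cache_paths({"HF_HOME": "/data/hf", "HF_METRICS_CACHE": "/scratch/metrics"})
```

With these example settings, the datasets and modules caches follow HF_HOME while the metrics cache uses its own override. Passing dict(os.environ) instead would show the paths that apply to the current shell.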
