forked from huggingface/datasets
Cache
Albert Villanova del Moral edited this page Jan 14, 2021 · 3 revisions
The datasets library uses three caches:
- one to store the datasets data,
- one to store the metrics data and
- one to store the modules (the Python scripts for either datasets or metrics).
By default, each of these caches is a directory inside the Hugging Face cache home directory (~/.cache/huggingface by default).
The three directories are named:
- datasets,
- metrics and
- modules.
We keep them separate so that users can, for example, easily clear the data cache while keeping the modules.
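As an illustration of why the separation is convenient, here is a minimal sketch of clearing one cache directory while leaving the other two untouched. The helper name `clear_cache_subdir` is hypothetical and is not part of the datasets library; it only assumes the three-directory layout described above.

```python
import shutil
from pathlib import Path

# Directory names taken from the list above.
CACHE_SUBDIRS = {"datasets", "metrics", "modules"}


def clear_cache_subdir(cache_home: Path, name: str) -> bool:
    """Delete one cache subdirectory (hypothetical helper, not a library API).

    Returns True if a directory was removed, False if it did not exist.
    """
    if name not in CACHE_SUBDIRS:
        raise ValueError(f"unknown cache directory: {name}")
    target = Path(cache_home).expanduser() / name
    if not target.is_dir():
        return False
    # Only the chosen subdirectory is removed; its siblings are left alone.
    shutil.rmtree(target)
    return True
```

For example, `clear_cache_subdir(Path("~/.cache/huggingface"), "datasets")` would remove the cached data under the default cache home but keep the `modules` directory.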
From utils.file_utils:
- Cache home:
    hf_cache_home = os.path.expanduser(
        os.getenv("HF_HOME", os.path.join(os.getenv("XDG_CACHE_HOME", "~/.cache"), "huggingface"))
    )
- Datasets cache:
    default_datasets_cache_path = os.path.join(hf_cache_home, "datasets")
    HF_DATASETS_CACHE = Path(os.getenv("HF_DATASETS_CACHE", default_datasets_cache_path))
- Metrics cache:
    default_metrics_cache_path = os.path.join(hf_cache_home, "metrics")
    HF_METRICS_CACHE = Path(os.getenv("HF_METRICS_CACHE", default_metrics_cache_path))
- Modules cache:
    default_modules_cache_path = os.path.join(hf_cache_home, "modules")
    HF_MODULES_CACHE = Path(os.getenv("HF_MODULES_CACHE", default_modules_cache_path))
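The precedence implied by the excerpt (a per-cache variable wins over HF_HOME, which wins over XDG_CACHE_HOME, which falls back to ~/.cache) can be checked with a small standalone sketch. The function `resolve_cache_dirs` is a hypothetical name introduced here for illustration; it mirrors the logic above but takes the environment as an argument instead of reading `os.environ` at import time.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def resolve_cache_dirs(env: Mapping[str, str]) -> Dict[str, Path]:
    """Resolve the three cache directories from an environment mapping.

    Hypothetical helper that mirrors the excerpt from utils.file_utils.
    """
    # Cache home: HF_HOME if set, else <XDG_CACHE_HOME or ~/.cache>/huggingface.
    xdg_cache_home = env.get("XDG_CACHE_HOME", "~/.cache")
    default_home = os.path.join(xdg_cache_home, "huggingface")
    hf_cache_home = os.path.expanduser(env.get("HF_HOME", default_home))

    dirs = {}
    for name in ("datasets", "metrics", "modules"):
        # e.g. HF_DATASETS_CACHE overrides <cache home>/datasets.
        variable = f"HF_{name.upper()}_CACHE"
        default_path = os.path.join(hf_cache_home, name)
        dirs[name] = Path(env.get(variable, default_path))
    return dirs
```

Calling `resolve_cache_dirs(os.environ)` resolves the paths for the current process, while passing a plain dict such as `{"HF_HOME": "/data/hf"}` shows how a single override relocates all three directories at once. As in the excerpt, a per-cache override is used as given, without `~` expansion.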