Describe the bug
Dataset verification is slow when the dataset contains many small files. This is especially noticeable on network filesystems such as NFS.
To reproduce
Download a dataset, then download it again.
from clearml import Dataset
d = Dataset.get(dataset_id="abcdefg")
# Populate cache, verification happens here and is slow
d.get_local_copy()
# Verification on a pre-downloaded/cached dataset is also slow
d.get_local_copy()
Expected behaviour
Verification (i.e. file size checking) could in principle run in parallel on some storage backends, especially network filesystems that keep multiple replicas of the stored data (e.g. Ceph, GlusterFS, or in my case, GCP Filestore).
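To illustrate the idea, here is a minimal sketch of parallel size checking with a thread pool. It is not ClearML's implementation: the helper name `find_size_mismatches`, the `expected_sizes` mapping (relative path to size in bytes), and the default of 32 workers are all assumptions made for this example. Threads are a reasonable fit because `os.stat` releases the GIL while waiting on the filesystem, so many lookups can be in flight at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def find_size_mismatches(
    root: str,
    expected_sizes: Dict[str, int],  # hypothetical manifest: relative path -> size in bytes
    max_workers: int = 32,           # assumed default, tune for the storage backend
) -> List[str]:
    """Return the relative paths that are missing or whose size differs."""
    root_path = Path(root)

    def check(item: Tuple[str, int]) -> Optional[str]:
        rel_path, expected = item
        try:
            actual = os.stat(root_path / rel_path).st_size
        except OSError:
            # A missing or unreadable file counts as a mismatch.
            return rel_path
        return rel_path if actual != expected else None

    # Each stat call blocks on I/O, so running them in threads overlaps the latency.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(check, expected_sizes.items())
        return sorted(path for path in results if path is not None)
```

An empty return value means every file matched its expected size. Whether this is actually faster depends on the backend: on a local disk the gain may be small, while on a high-latency network mount the per-file round trips dominate and overlap well.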
Environment
Related Discussion
Slack thread: https://odin-vision.slack.com/archives/C055MNE258R/p1696591022780369