Enhance threaded_walk with exclusion of vcs and other subfolders #1086

Open
yarikoptic opened this issue Jul 29, 2022 · 2 comments

Comments

@yarikoptic
Member

We have https://github.com/dandi/dandi-cli/blob/master/dandi/support/threaded_walk.py, which is used during zarr uploads. Now that we have Datalad datasets for (some/most) of the zarr folders, we should add exclusion of .git, .datalad, and other dot-directories there, as we do in find_files. Ideally the refactoring (RF) should then also refactor find_files to use threaded_walk with those added exclusion features, so that we gain a speedup wherever we use find_files.
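
A rough sketch of the kind of exclusion meant here, written against plain `os.walk` rather than the actual `threaded_walk` code. The names `EXCLUDED_DIRS` and `walk_excluding` are hypothetical and do not come from dandi-cli; the exact set of directories to skip is an assumption.

```python
import os
from pathlib import Path
from typing import Iterator

# Hypothetical exclusion list for illustration; not taken from dandi-cli.
EXCLUDED_DIRS = {".git", ".datalad"}


def walk_excluding(root, exclude_dotdirs: bool = True) -> Iterator[Path]:
    """Yield files under `root`, never descending into excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning `dirnames` in place stops os.walk from entering those
        # directories at all, so their contents are never listed.
        dirnames[:] = [
            d
            for d in dirnames
            if d not in EXCLUDED_DIRS
            and not (exclude_dotdirs and d.startswith("."))
        ]
        for name in filenames:
            yield Path(dirpath) / name
```

Only directories are pruned in this sketch; dot-files such as Zarr's `.zattrs` are still yielded, which is probably what a Zarr walk would want.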

@jwodder
Member

jwodder commented Jul 29, 2022

@yarikoptic The threaded walk is not used during uploads; it's only used to verify downloads and by dandi digest. Also, the results in con/fscacher#67 indicate that the threaded walk is not an improvement when you're just listing files; its value in Zarr digesting comes from being able to calculate MD5 hashes of files concurrently.
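
To illustrate the point about concurrency, here is a minimal sketch of hashing many files in a thread pool. It is not dandi-cli's actual implementation; `md5_file`, `md5_many`, and the `max_workers` default are hypothetical names and values chosen for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterable


def md5_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def md5_many(paths: Iterable[Path], max_workers: int = 8) -> Dict[Path, str]:
    """Hash several files concurrently and map each path to its digest."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path
        # with its own digest.
        return dict(zip(paths, pool.map(md5_file, paths)))
```

Any gain in a sketch like this would come from overlapping the per-file reads and hashing, not from listing the directory tree faster.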

@yarikoptic
Member Author

Hm, interesting... maybe I was under the wrong impression that it does provide some benefits; I will need to recall the details better.
