zarr backup might need "optimization" #363
Comments
Which asset is this? I want to check that the shape/compression characteristics did not change in the process and that this is indeed a hefty zarr (i.e., it could be one of the 4mm slices). Also, I'm going to start rolling out storing stitched data rather than the rawest data.
This is dandiset 26, not 108. It's probably the TB one: an entire hemisphere and more at 15um resolution.
I didn't read "non" 000108 dandiset - I thought it was in 108. But this one is beautiful. Yael posted the neuroglancer rendering in the BIDS spec addition of HiPCT.
Is there a link?
Let's consider migrated to
I found a 4-day-old process still running for a non-000108 dandiset. The process tree
and looking at that zarr
so it is a "hefty" zarr -- half a million files. I wonder if we could make that process any faster. There was some splitindex etc.
FWIW -- the above count is with folders. Without folders:
and that particular zarr is almost done so I will keep it going for now
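For reference, the "with folders" versus "without folders" distinction can be reproduced with a small walk over the tree. This is only a sketch: `count_entries` is a hypothetical helper name, not part of any backup tooling, and it counts symlinks as whatever `os.walk` reports them as (symlinked files as files, symlinked directories as directories that are not descended into).

```python
import os


def count_entries(root):
    """Return (n_files, n_dirs) found below root, not counting root itself."""
    n_files = 0
    n_dirs = 0
    # os.walk does not follow directory symlinks by default, so each
    # entry on disk is counted exactly once.
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_files, n_dirs
```

The count "with folders" is then `n_files + n_dirs`, and the count "without folders" is `n_files` alone.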
edits:
- receive.autogc=0 and gc.auto=0 -- should we trigger it "manually"? But wouldn't it then interfere with running batched processes? We might need to stop and redo. It might be worth simulating all of that with a dedicated script to time it. It might also be worth moving all the dandizarrs to some faster / dedicated medium (SSDs?)
- py-spy top sampling gives the top of
so is it just jumping between different async items or really doing some useful work???
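A dedicated timing script for the manual gc idea could start from something like the sketch below. It is an assumption about how such a script might look, not what the backup tooling does: `time_command` and `manual_gc_commands` are hypothetical names, and the only git invocations used are `git -C <repo> count-objects -v` and `git -C <repo> gc`.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments); return (returncode, seconds elapsed)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.perf_counter() - start


def manual_gc_commands(repo):
    """Commands for one hypothetical timing run on a single repository."""
    return [
        # Report loose/packed object statistics before collecting.
        ["git", "-C", repo, "count-objects", "-v"],
        # Explicit gc; with gc.auto=0 git does not start one on its own.
        ["git", "-C", repo, "gc"],
    ]
```

Running `time_command` over each entry of `manual_gc_commands(path)` on a copy of one zarr repository, while no backup process is writing to it, would give a first estimate of how long a manual gc takes.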
edit: some stats from ncdu. A LOT of files during the backup, then just a few.
At some point there were over 900,000 files in .git/annex/journal!
    --- /mnt/backup/dandi/dandizarrs/5c37c233-222f-4e60-96e7-a7536e08ef61/.git/annex ---
                                    /..
        3.5 GiB [##########] 904.4k /journal
        2.0 MiB [          ]         index
        1.5 MiB [          ]      1 /keysdb
       12.0 KiB [          ]      3 /fsck
        4.0 KiB [          ]         index.lck
and separate objects (no packing performed) for each tiny file
which then all get handled eventually and .git/objects packed too:
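To tell whether a repository is still in the "one loose object per tiny file" state or has already been packed, a small check of `.git/objects` can help. The sketch below relies only on git's standard on-disk layout (two-hex-character fan-out directories for loose objects, `pack/*.pack` for packs); `object_stats` is a hypothetical helper name.

```python
import os
import re

# Loose objects live in directories named by the first two hex digits
# of the object id, e.g. .git/objects/ab/<remaining hex digits>.
_FANOUT = re.compile(r"^[0-9a-f]{2}$")


def object_stats(git_dir):
    """Count loose objects and pack files under git_dir/objects."""
    objects = os.path.join(git_dir, "objects")
    loose = 0
    packs = 0
    for name in os.listdir(objects):
        path = os.path.join(objects, name)
        if _FANOUT.match(name) and os.path.isdir(path):
            loose += len(os.listdir(path))
        elif name == "pack" and os.path.isdir(path):
            packs = sum(1 for f in os.listdir(path) if f.endswith(".pack"))
    return {"loose": loose, "packs": packs}
```

A large `loose` count with few or no packs corresponds to the unpacked state described above; after packing, `loose` should drop sharply.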