#viewsetups >100 #26

Closed
martinschorb opened this issue Jul 1, 2020 · 10 comments

Comments

@martinschorb (Contributor)

Hi,

the HDF5 structure does not support ViewSetup IDs with more than two digits.

Would N5 support it? I can see the setupN directory structure. Is there a limitation to the digits of N?

BTW: Where can I find the elf package? I cannot get pybdv's n5 conversion to work.

@constantinpape (Owner)

> the HDF5 structure does not support ViewSetup IDs with more than two digits.
> Would N5 support it? I can see the setupN directory structure. Is there a limitation to the digits of N?

Yes, the n5 structure supports an arbitrary number of setups. To make it work in pybdv I would need to change this check so it is only triggered for hdf5:
https://github.com/constantinpape/pybdv/blob/master/pybdv/converter.py#L90-L92
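
For illustration, a minimal sketch of what an hdf5-only version of that check could look like (the function and argument names here are made up for the example and do not match the actual converter.py code):

```python
# Illustrative sketch only; names do not match the actual pybdv code.
# Idea: the legacy bdv.hdf5 layout encodes setups as 'sXX', so it can only
# address setup ids 0-99, while the n5 layout ('setupN') has no such limit.
def validate_setup_id(setup_id, is_h5):
    if setup_id < 0:
        raise ValueError("Setup id must be non-negative, got %i" % setup_id)
    if is_h5 and setup_id >= 100:
        raise ValueError("The bdv.hdf5 layout only supports setup ids 0-99; "
                         "use the n5 format for more view setups.")
```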

> BTW: Where can I find the elf package? I cannot get pybdv's n5 conversion to work.

You can find it here; but maybe I shouldn't depend on it in pybdv. I will check if it's easy to replace later.
https://github.com/constantinpape/elf

@martinschorb (Contributor, Author) commented Jul 2, 2020

OK, cool,

then I will just use n5 if there are more than 100 ViewSetups.

elf is a rather ambiguous package name; I did some searching and just could not find the right one... Maybe rename it. And yes, the obvious place is the one I did not check...

@constantinpape (Owner)

I have updated this:

  • elf is fully optional now and you can convert to n5 as long as you have z5py in your env (conda install -c conda-forge z5py)
  • I have disabled the check for number of setup ids for n5.
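
A rough usage sketch of the n5 conversion (this assumes pybdv's make_bdv entry point with a setup_id keyword and that the output format is inferred from the file extension; check the pybdv README for the exact signature):

```python
# Rough sketch: write several view setups to the n5 layout.
# Assumes z5py is installed (conda install -c conda-forge z5py).
import numpy as np
from pybdv import make_bdv

data = np.random.rand(32, 256, 256).astype("float32")

# setup ids of 100 and above only work with the n5 layout
for setup_id in range(100, 110):
    make_bdv(data, "volume.n5", setup_id=setup_id)
```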

@martinschorb (Contributor, Author)

Hi,

this seems to work.

The conversion to n5 with the default chunk size (that is 64, correct?) takes ages. I guess this is because the group share file servers are not optimized for dealing with so many small files...?

Is there any mechanism in n5 (or similar other data format) that would overcome this?

@martinschorb (Contributor, Author)

I just found that even reading the chunks seems very slow from the group shares as compared to h5.
Is there some hybrid format, or would you just increase the chunk size?

@constantinpape (Owner)

> I just found that even reading the chunks seems very slow from the group shares as compared to h5.
> Is there some hybrid format, or would you just increase the chunk size?

Normally h5 and n5 should be more or less the same speed; could you post the h5 and n5 files where you observed this, the exact environment you used, and the access pattern?
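
If it helps, something along these lines could serve as a minimal read benchmark (the dataset keys assume the standard bdv layouts, 't00000/s00/0/cells' for h5 and 'setup0/timepoint0/s0' for n5, and the file names are placeholders):

```python
# Minimal read-timing sketch; adjust file names and dataset keys to your data.
import time
import h5py
import z5py

def time_read(file_constructor, path, key, bb):
    f = file_constructor(path, "r")
    t0 = time.time()
    _ = f[key][bb]  # read one block with the given bounding box
    return time.time() - t0

bb = (slice(0, 1), slice(0, 512), slice(0, 512))  # a single 2d tile
print("h5:", time_read(h5py.File, "bdv_LLP.h5", "t00000/s00/0/cells", bb))
print("n5:", time_read(z5py.File, "bdv_LLP.n5", "setup0/timepoint0/s0", bb))
```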

@martinschorb (Contributor, Author)

can you see /g/emcf/schorb/data/BDV/montages/LLP_001/bdv_LLP ?

Both contain the same data; the n5 version took about 5x as long to create using the current master commit.

When loading in BDV, h5 appears instantaneously while N5 takes >20 s until reaching the stage where bdv-playground considers the data loaded and performs the centering and auto-contrast. This is in a VM, so IO to the group share should be comparable.

@martinschorb (Contributor, Author)

That's with the default chunk size (64, 64, 64). It gets a bit better when setting the chunks to something like (1, 512, 512) instead.

@constantinpape (Owner)

Ok, I had a look at the data. Indeed I also see quite a big difference in the loading speed.
However, this data is 2d, so (1, 64, 64) chunks are tiny! I would definitely go with (1, 512, 512).

In my experience, the overhead of reading individual chunks from the file system is not too large; at some point, though, it does become problematic.

I measured this on the Janelia distributed file system once, and there it wasn't a big problem as long as chunks were around 64^3 in size; we should measure this at EMBL as well to determine a good minimal chunk size.

In any case, for 2d data I would always go with at least (1, 512, 512) chunks.
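
To put numbers on it: an uncompressed (1, 64, 64) uint8 chunk is only 4 KB on disk, while (1, 512, 512) gives 256 KB per chunk and 64x fewer files per plane, so the per-file overhead on a network share matters much less. As a sketch (assuming make_bdv accepts a chunks keyword), the conversion for 2d data could look like:

```python
# Sketch: write a single-plane (2d) volume with 2d-friendly chunks,
# assuming pybdv's make_bdv exposes a 'chunks' argument.
import numpy as np
from pybdv import make_bdv

plane = np.zeros((1, 4096, 4096), dtype="uint8")
make_bdv(plane, "montage.n5", chunks=(1, 512, 512))
```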

@constantinpape (Owner)

I closed this, feel free to reopen if this is still relevant.
