Added benchmarks for Zarr compared to Hub #512
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master     #512   +/-   ##
=======================================
  Coverage   88.46%   88.46%
=======================================
  Files          52       52
  Lines        3745     3745
=======================================
  Hits         3313     3313
  Misses        432      432
=======================================
Continue to review full report at Codecov.
Awesome, thanks again for your sustained support. Would it be possible for you to include …
I have added the dataset and modified the existing functions to support it. However, the process is really memory-inefficient. The dataset is stored thrice, once as …
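The memory problem described here, multiple full copies of a dataset kept alive by successive conversion steps, is a common pattern. The sketch below is illustrative only (the pipeline stages and names are hypothetical, not taken from the PR): it contrasts materializing every intermediate representation against streaming samples through generators so only one sample is resident at a time.

```python
import sys

# Hypothetical three-stage pipeline: every stage materializes a full copy,
# so raw, decoded, and converted data are all alive at once.
def materialize_pipeline(n):
    raw = list(range(n))                   # copy 1: raw samples
    decoded = [x * 2 for x in raw]         # copy 2: decoded samples
    converted = [x + 1 for x in decoded]   # copy 3: target-format samples
    return raw, decoded, converted

# Streaming alternative: generators pull one sample at a time through
# the same stages, so no intermediate list is ever materialized.
def streaming_pipeline(n):
    raw = range(n)
    decoded = (x * 2 for x in raw)
    for x in decoded:
        yield x + 1

peak_bytes = sum(sys.getsizeof(c) for c in materialize_pipeline(100_000))
assert peak_bytes > 0  # three full containers held simultaneously
assert sum(1 for _ in streaming_pipeline(100_000)) == 100_000
```

The streaming version produces identical samples while holding only one in memory at a time, which is the usual remedy when a conversion step forces a dataset to be "stored thrice."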
@DebadityaPal Where is places365 saved as a …
What is happening is that …
@DebadityaPal Gotcha, thanks. Can you show me where …
In line 939 of dataset.py,
I think the way tfds.load() is programmed is that it looks for the data in the tensorflow_datasets root directory; if the dataset doesn't exist there, it downloads and prepares it. It has an argument …
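The look-in-the-data-dir-then-download behaviour described above can be sketched in plain Python. This is a simplified stand-in, not tfds internals; the directory layout and the `load`/`data_dir` names below mirror the real `tfds.load(name, data_dir=...)` call only loosely, and the "download" step is faked with a local file write.

```python
import os
import tempfile

def load(name, data_dir):
    """Sketch of the cache-or-download logic: reuse a prepared copy in
    data_dir if present, otherwise 'download and prepare' it first."""
    prepared = os.path.join(data_dir, name)
    if not os.path.isdir(prepared):
        os.makedirs(prepared)  # stands in for the download-and-prepare step
        with open(os.path.join(prepared, "data.txt"), "w") as f:
            f.write("prepared")
    with open(os.path.join(prepared, "data.txt")) as f:
        return f.read()

root = tempfile.mkdtemp()
assert load("places365", root) == "prepared"  # first call: prepares the data
assert load("places365", root) == "prepared"  # second call: cache hit, no re-download
```

This is why repeated `tfds.load()` calls are cheap after the first: the prepared dataset is found under the root directory and the download path is skipped entirely.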
Is it necessary that we use …
It's not necessary; however, it is faster.
We've been discussing this issue. I'll try to run these benchmarks on our setup. Afterwards, we'll investigate the problem. I think @AbhinavTuli wanted to explore the issue with the transforms.
I'll merge it. I ran the benchmarks and the results seem alright, although I noted the memory issue. I'm hoping to raise this problem separately before our next benchmark call.
Added benchmarking files similar to those for TileDB, except these are for Zarr.
For reproducibility, they have also been tested in a Google Colab notebook environment.
The benchmarks are as follows:
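The benchmark figures themselves are not reproduced here. As a rough illustration of the pattern such benchmark files typically follow, the harness below times a read workload with `time.perf_counter` and reports the best of several repeats; the array shape, repeat count, and the `sequential_read` workload are illustrative placeholders, not the PR's actual Zarr or Hub code.

```python
import time
import numpy as np

def benchmark(label, fn, repeats=3):
    """Time fn() several times and report the best wall-clock run."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    best = min(times)
    print(f"{label}: {best:.4f}s (best of {repeats})")
    return best

data = np.random.rand(256, 256)

def sequential_read():
    # Stand-in workload: iterate row by row, as a chunked Zarr or Hub
    # read would iterate over chunks.
    total = 0.0
    for row in data:
        total += row.sum()
    return total

benchmark("sequential read", sequential_read)
```

Taking the best of several repeats reduces noise from caching and background load, which matters when comparing backends whose per-run differences are small.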