Benchmark for TileDB and Hub added #508
Locust summary
Git references
Initial: c2a5bd9
Terminal: da4e42f
benchmarks/benchmark_tiledb_hub.py
Changes:
Codecov Report
@@ Coverage Diff @@
## master #508 +/- ##
=======================================
Coverage 88.46% 88.46%
=======================================
Files 52 52
Lines 3745 3745
=======================================
Hits 3313 3313
Misses 432 432
@DebadityaPal Thanks a lot for posting it. I'll fetch the results 👍 and I'll see if everything is in order.
I just realized I missed
@DebadityaPal Now it looks more realistic ;) Thanks!
@DebadityaPal just a clarification question: you are testing the performance of sequential reads, right?
Yes, these are sequential reads. Would you like me to submit a file with randomized reads as well?
@DebadityaPal Eventually, yes. At this moment, this is great and sufficient! It would be more useful for us if you focus on zarr for now, as we discussed :) Great job! 💯
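To make the sequential vs. randomized distinction above concrete, here is a minimal sketch of timing the same read loop under both access orders. It is not the code from this PR: `read_sample`, `time_reads`, and `read_orders` are hypothetical names, and the in-memory list merely stands in for a real TileDB or Hub dataset.

```python
import random
import time
from typing import Callable, Dict, List, Sequence


def time_reads(read_sample: Callable[[int], object], indices: Sequence[int]) -> float:
    """Return the wall-clock seconds taken to read every index, in the given order."""
    start = time.perf_counter()
    for i in indices:
        read_sample(i)
    return time.perf_counter() - start


def read_orders(n: int, seed: int = 0) -> Dict[str, List[int]]:
    """Build a sequential order and a shuffled order over the same n indices."""
    sequential = list(range(n))
    shuffled = sequential[:]
    # A fixed seed keeps the "random" order identical between runs.
    random.Random(seed).shuffle(shuffled)
    return {"sequential": sequential, "random": shuffled}


if __name__ == "__main__":
    # Hypothetical stand-in: a real benchmark would index into a stored dataset.
    data = [bytes(1024) for _ in range(1000)]
    for name, order in read_orders(len(data)).items():
        print(name, time_reads(data.__getitem__, order))
```

Using the same index set in both orders keeps the amount of data read identical, so any difference comes from the access pattern alone.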
@haiyangdeperci do you think these benchmarks are a bit unfair towards Hub? The TileDB data is stored locally, whereas Hub fetches its data from the cloud, which adds overhead on cache misses. I modified this code to store both locally, and Hub performs significantly better in that case. Like, instead of
@DebadityaPal I think it would be good for the benchmark to present both cases. We want to show that even if the data is fetched remotely, Hub still outperforms TileDB.
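One way to present both cases side by side is to run the identical read loop against each storage setup and report the timings together. The sketch below assumes each setup can be wrapped as a callable that returns one sample; `compare_backends` and `with_latency` are hypothetical helpers, and the fixed sleep is only a crude stand-in for a remote fetch, not a model of Hub's actual caching behavior.

```python
import time
from typing import Callable, Dict


def compare_backends(
    readers: Dict[str, Callable[[int], object]], n_samples: int
) -> Dict[str, float]:
    """Time the same sequential read loop against every named reader."""
    results = {}
    for name, read_sample in readers.items():
        start = time.perf_counter()
        for i in range(n_samples):
            read_sample(i)
        results[name] = time.perf_counter() - start
    return results


def with_latency(read_sample: Callable[[int], object], delay_s: float):
    """Wrap a reader so that each call pays a fixed delay before returning."""
    def wrapped(i: int):
        time.sleep(delay_s)  # assumption: simulated network round trip
        return read_sample(i)
    return wrapped


if __name__ == "__main__":
    data = [bytes(1024) for _ in range(50)]  # in-memory stand-in for a dataset
    readers = {
        "local": data.__getitem__,
        "remote (simulated)": with_latency(data.__getitem__, 0.001),
    }
    for name, seconds in compare_backends(readers, len(data)).items():
        print(f"{name}: {seconds:.4f} s")
```

In a real run, the dictionary would hold one reader per configuration under comparison (for example, locally stored and cloud-stored copies of the same dataset), so the results table shows every case from a single invocation.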
Another thing I noticed is that the benchmarks for the cloud-stored Hub dataset are a little inconsistent: the results vary with Internet speed and latency.
@DebadityaPal That's a good observation. I'm running these benchmarks on a machine located closer to Hub's cloud, so it is less of an issue there, but for the end user it may be significant.
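Since network conditions make single measurements noisy, one common mitigation is to repeat each benchmark several times and report a robust summary rather than a single number. This is a sketch under that assumption, not something the PR implements; `repeat_timing` and `summarize` are hypothetical names.

```python
import statistics
import time
from typing import Callable, Dict, List


def repeat_timing(run: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call run() several times and collect the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Report the best case, the median, and the spread of the timings."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        # stdev needs at least two data points; report 0.0 for a single run.
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical workload standing in for one full pass over a dataset.
    print(summarize(repeat_timing(lambda: sum(range(100_000)), repeats=5)))
```

The median is less sensitive than the mean to an occasional slow run caused by a latency spike, while the standard deviation shows how much the cloud-backed numbers actually fluctuate.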
@DebadityaPal Thanks again for the great work! I'll update the results.
For reproducibility, I used a Google Colab environment with Hardware Acceleration set to None.