Benchmark for TileDB and Hub added #508

DebadityaPal · 2021-01-28T16:25:22Z

For reproducibility, I used a Google Colab environment with Hardware Acceleration set to None.

github-actions · 2021-01-28T16:25:55Z

Locust summary

Git references

Initial: c2a5bd9
Terminal: da4e42f

benchmarks/benchmark_tiledb_hub.py

Changes:

Name: time_tiledb
Type: function
Changed lines: 30
Total lines: 30
Name: time_hub
Type: function
Changed lines: 17
Total lines: 17

codecov · 2021-01-28T16:30:03Z

Codecov Report

Merging #508 (9b60667) into master (c2a5bd9) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #508   +/-   ##
=======================================
  Coverage   88.46%   88.46%           
=======================================
  Files          52       52           
  Lines        3745     3745           
=======================================
  Hits         3313     3313           
  Misses        432      432

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c2a5bd9...9b60667.

haiyangdeperci · 2021-01-28T16:39:40Z

@DebadityaPal Thanks a lot for posting it. I'll fetch the results 👍 and I'll see if everything is in order.

DebadityaPal · 2021-01-28T17:31:07Z

I just realized I had missed compute() on the Hub dataset calls, so I have added it. The current results are:

Dataset:  activeloop/mnist with Batch Size:  70000
Performance of TileDB
Batch 1 dt: 1.1255619525909424
Time: 1.1257479190826416s
Performance of Hub
Batch 1 dt: 1.1036744117736816
Time: 1.1043024063110352s
Dataset:  activeloop/mnist with Batch Size:  7000
Performance of TileDB
Batch 1 dt: 0.9032697677612305
Batch 2 dt: 1.0399649143218994
Batch 3 dt: 1.0450794696807861
Batch 4 dt: 1.0824565887451172
Batch 5 dt: 1.0563287734985352
Batch 6 dt: 1.0369079113006592
Batch 7 dt: 1.0990698337554932
Batch 8 dt: 1.0535051822662354
Batch 9 dt: 1.1018755435943604
Batch 10 dt: 1.0933120250701904
Time: 10.51236867904663s
Performance of Hub
Batch 1 dt: 0.8278625011444092
Batch 2 dt: 0.01743602752685547
Batch 3 dt: 0.016743183135986328
Batch 4 dt: 0.2736356258392334
Batch 5 dt: 0.016682863235473633
Batch 6 dt: 0.016436100006103516
Batch 7 dt: 0.21753954887390137
Batch 8 dt: 0.019435405731201172
Batch 9 dt: 0.019253969192504883
Batch 10 dt: 0.08768510818481445
Time: 1.5131916999816895s
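The per-batch output above comes from a sequential loop that reads one batch at a time and records its duration. Below is a minimal sketch of such a loop. It is not the code in benchmarks/benchmark_tiledb_hub.py. `time_batches`, `read_batch` and the in-memory `data` list are hypothetical stand-ins for the actual TileDB or Hub read calls, and the sketch assumes `time.perf_counter` for interval timing.

```python
import time
from typing import Callable, List


def time_batches(read_batch: Callable[[int, int], object],
                 n_items: int, batch_size: int) -> List[float]:
    """Read items [0, n_items) sequentially in batches, printing per-batch
    timings in the same format as the output above."""
    timings = []
    t_start = time.perf_counter()
    for batch_no, start in enumerate(range(0, n_items, batch_size), start=1):
        stop = min(start + batch_size, n_items)
        t0 = time.perf_counter()
        # The reader must materialise the data; returning a lazy view
        # would time only the indexing, not the read itself.
        read_batch(start, stop)
        dt = time.perf_counter() - t0
        timings.append(dt)
        print("Batch", batch_no, "dt:", dt)
    print(f"Time: {time.perf_counter() - t_start}s")
    return timings


# Hypothetical in-memory stand-in for a 70000-item dataset such as MNIST.
data = list(range(70000))


def read_from_list(start: int, stop: int) -> int:
    # Summing forces every item in the batch to be touched.
    return sum(data[start:stop])


timings = time_batches(read_from_list, len(data), batch_size=7000)
```

With a batch size of 7000 the loop runs 10 batches, and with 70000 it runs a single batch, matching the two result blocks above.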

haiyangdeperci · 2021-01-28T17:32:59Z

@DebadityaPal Now it looks more realistic ;) Thanks!
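The thread only says that compute() was missing on the Hub dataset calls. The sketch below shows why that matters for timings when indexing is lazy. `LazySlice` and `slow_loader` are hypothetical toys that mimic a lazily evaluated slice. They are not Hub's actual implementation. Timing only the slice measures object creation, and the simulated I/O runs only inside `compute()`.

```python
import time


class LazySlice:
    """Hypothetical lazily evaluated slice: creating it is cheap, and the
    per-item loads run only when compute() is called."""

    def __init__(self, loader, start, stop):
        self._loader = loader
        self._start = start
        self._stop = stop

    def compute(self):
        return [self._loader(i) for i in range(self._start, self._stop)]


def slow_loader(i):
    time.sleep(0.0001)  # simulated per-item I/O cost
    return i


t0 = time.perf_counter()
view = LazySlice(slow_loader, 0, 200)  # no I/O happens here
t_slice_only = time.perf_counter() - t0

t0 = time.perf_counter()
values = view.compute()  # all simulated I/O happens here
t_with_compute = time.perf_counter() - t0

print(f"slice only: {t_slice_only:.6f}s, with compute(): {t_with_compute:.6f}s")
```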

haiyangdeperci · 2021-01-28T18:04:03Z

@DebadityaPal Just a clarifying question: you are testing sequential-read performance, right?

DebadityaPal · 2021-01-28T18:06:52Z

Yes, these are sequential reads. Would you like me to submit a file with randomized reads as well?

haiyangdeperci · 2021-01-28T18:10:11Z

@DebadityaPal Eventually, yes. For now, this is great and sufficient! It would be more useful for us if you focus on zarr for now, as we discussed :) Great job! 💯
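A randomized-read benchmark would differ from the sequential one mainly in the order of the indices it reads. Below is a minimal sketch of generating both orders. `sequential_batches` and `random_batches` are hypothetical helpers. How each random batch is then fetched (item by item or with fancy indexing) depends on the library and is left open here.

```python
import random
from typing import List, Tuple


def sequential_batches(n_items: int, batch_size: int) -> List[Tuple[int, int]]:
    """Contiguous [start, stop) ranges, read in storage order."""
    return [(s, min(s + batch_size, n_items)) for s in range(0, n_items, batch_size)]


def random_batches(n_items: int, batch_size: int, seed: int = 0) -> List[List[int]]:
    """Shuffle all item indices once, then cut them into batches.
    A fixed seed keeps the read order reproducible across runs."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[s:s + batch_size] for s in range(0, n_items, batch_size)]


seq = sequential_batches(70000, 7000)
rnd = random_batches(70000, 7000)
print(len(seq), len(rnd))  # 10 10
```

Using the same seed for both libraries means TileDB and Hub would be asked for exactly the same items in the same order.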

DebadityaPal · 2021-01-29T11:50:03Z

@haiyangdeperci Do you think these benchmarks are a bit unfair to Hub? The TileDB data is stored locally, whereas Hub fetches its data from the cloud, which adds overhead on cache misses. I modified the code to store both datasets locally, and Hub performs significantly better in that case. In the second task (batch size 7000), instead of
Time: 1.5131916999816895s
it takes only
Time: 0.233504056930542s

haiyangdeperci · 2021-01-29T11:53:35Z

@DebadityaPal I think it would be good for the benchmark to present both cases. We want to show that even if the data is fetched remotely, Hub still outperforms TileDB.
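The local-versus-remote difference described above can be illustrated with a toy model in which every uncached chunk costs one simulated round trip. This is a hypothetical sketch. `ToyChunkStore`, the 20000-item chunk size and the 0.05 s latency are made-up values. It does not claim to explain the measured numbers. It only shows how cache misses add to the remote total while an otherwise identical local store avoids them.

```python
import time


class ToyChunkStore:
    """Hypothetical model: items are grouped into fixed-size chunks, and
    fetching an uncached chunk costs `latency` seconds (a simulated remote
    round trip). Cached chunks are free."""

    def __init__(self, n_items, chunk_size, latency):
        self.n_items = n_items
        self.chunk_size = chunk_size
        self.latency = latency
        self.cache = set()
        self.fetches = 0

    def read(self, start, stop):
        first = start // self.chunk_size
        last = (stop - 1) // self.chunk_size
        for chunk in range(first, last + 1):
            if chunk not in self.cache:
                time.sleep(self.latency)  # cache miss: pay the round trip
                self.cache.add(chunk)
                self.fetches += 1
        return list(range(start, stop))


def run(store, batch_size=7000):
    """Sequentially read the whole store, returning per-batch durations."""
    dts = []
    for start in range(0, store.n_items, batch_size):
        t0 = time.perf_counter()
        store.read(start, min(start + batch_size, store.n_items))
        dts.append(time.perf_counter() - t0)
    return dts


remote = ToyChunkStore(70000, 20000, latency=0.05)
local = ToyChunkStore(70000, 20000, latency=0.0)
remote_dts = run(remote)
local_dts = run(local)
print(f"remote: {sum(remote_dts):.3f}s over {remote.fetches} chunk fetches")
print(f"local:  {sum(local_dts):.3f}s")
```

In this model only the batches that cross into a new chunk are slow, and the remote total grows with the number of chunk fetches times the latency.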

DebadityaPal · 2021-01-29T17:09:04Z

Another thing I noticed is that the benchmarks for the cloud-stored Hub dataset are somewhat inconsistent: the timings vary with internet speed and latency.

haiyangdeperci · 2021-01-29T17:12:55Z

@DebadityaPal That's a good observation. I'm running these benchmarks on a machine located closer to Hub's cloud, so it is less of an issue for me, but for end users it may be significant.
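A common way to make network-bound timings more comparable is to repeat each measurement and report the median and spread instead of a single run. Below is a minimal sketch using the standard `statistics` module. `repeat_timing` is a hypothetical helper. In practice `fn` would wrap one of the benchmark functions (for example `time_hub`, whose signature is not shown in this thread). Here it wraps a trivial stand-in workload.

```python
import statistics
import time


def repeat_timing(fn, repeats=5):
    """Run fn several times and summarise the wall-clock durations."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        # stdev needs at least two samples
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "samples": samples,
    }


# Stand-in workload; replace with a call into the real benchmark.
summary = repeat_timing(lambda: sum(range(100000)), repeats=5)
print(f"median {summary['median']:.4f}s "
      f"(min {summary['min']:.4f}s, stdev {summary['stdev']:.4f}s)")
```

The median is less sensitive than the mean to a single slow run caused by a network hiccup. The minimum approximates the best case the connection allows.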

haiyangdeperci · 2021-01-29T18:05:44Z

@DebadityaPal Thanks again for the great work! I'll update the results.

Benchmark for TileDB and Hub added

da4e42f

Linting Error Solved

3deb9d0

Hub Dataset compute() added to TileDB benchmarks

b986d17

mynameisvinn added this to Committed in Development Roadmap Jan 29, 2021

mynameisvinn moved this from Committed to In Development in Development Roadmap Jan 29, 2021

Added test for locally stored hub Dataset

9b60667

haiyangdeperci merged commit 31943d2 into activeloopai:master Jan 29, 2021
