Put zarr, tileDB and hub benchmarks in one file #534

DebadityaPal · 2021-02-04T15:48:54Z

A preliminary attempt at standardizing the benchmarks. Also did some optimizations. Linked to #529

github-actions · 2021-02-04T15:49:30Z

Locust summary

Git references

Initial: 0128704
Terminal: d76b956

benchmarks/benchmark_tiledb_zarr_hub.py

Changes:

Name         Type      Changed lines  Total lines
time_tiledb  function  33             33
time_zarr    function  44             44
time_hub     function  20             20

codecov · 2021-02-04T15:57:05Z

Codecov Report

Merging #534 (f90f962) into master (4d40553) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #534   +/-   ##
=======================================
  Coverage   88.46%   88.46%           
=======================================
  Files          52       52           
  Lines        3745     3745           
=======================================
  Hits         3313     3313           
  Misses        432      432

haiyangdeperci · 2021-02-04T16:01:40Z

Good job! Could you rename the file to benchmark_sequential_access.py?

DebadityaPal · 2021-02-04T16:28:38Z

Yeah, I changed it to the more appropriate name.

haiyangdeperci · 2021-02-04T16:37:34Z

@DebadityaPal Great! I'll review it soon.

mynameisvinn · 2021-02-04T18:05:03Z

@DebadityaPal I'll defer to @haiyangdeperci but I see duplicate code that can be factored out.

for example:

with Timer("Time"):
    ...
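
A minimal sketch of the kind of factoring suggested here, assuming a `Timer` context manager that prints elapsed wall-clock time. The `Timer` class below is a hypothetical stand-in written for illustration, not the helper used in this PR, and `time_backend`, `read_batch` and `num_batches` are invented names.

```python
import time


class Timer:
    """Hypothetical stand-in: times the body of a `with` block."""

    def __init__(self, label):
        self.label = label
        self.elapsed = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.elapsed = time.perf_counter() - self._start
        print(f"{self.label}: {self.elapsed:.4f}s")
        return False  # do not swallow exceptions raised in the block


def time_backend(label, read_batch, num_batches):
    """Shared timing loop, so each backend only supplies `read_batch`."""
    with Timer(label) as timer:
        for batch in range(num_batches):
            read_batch(batch)
    return timer.elapsed
```

With a helper like this, each of `time_tiledb`, `time_zarr` and `time_hub` would only need to pass its own per-batch read function rather than repeat the timing loop.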

DebadityaPal · 2021-02-05T05:45:16Z

@mynameisvinn I have removed the duplicate code. Thanks for pointing it out.

mynameisvinn · 2021-02-05T13:20:49Z

@DebadityaPal much better! There is nothing better than concise, well organized code :-)

haiyangdeperci

Is it absolutely necessary to put the entire dataset in memory, as below?

ds["image"].compute().reshape(ds.shape[0], -1),

The machine we are using for benchmarks has 160 GB of memory. This step requires 330 GiB of memory.

MemoryError: Unable to allocate 330. GiB for an array with shape (1803460, 256, 256, 3) and data type uint8

I've noted some other inefficiencies and I am working on them as well but this one is a blocker.
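
The figure in the error can be checked by hand: a uint8 array takes one byte per element, so the shape alone determines the allocation. The sketch below uses only the shape from the MemoryError above; `BATCH_SIZE` is an arbitrary illustrative value, not one taken from this PR.

```python
import math

shape = (1803460, 256, 256, 3)  # shape reported in the MemoryError
itemsize = 1                    # uint8 is one byte per element

total_bytes = math.prod(shape) * itemsize
print(f"full array: {total_bytes / 2**30:.1f} GiB")

# Assumption for illustration: process fixed-size batches instead of the whole array.
BATCH_SIZE = 1000
batch_bytes = BATCH_SIZE * math.prod(shape[1:]) * itemsize
print(f"one batch of {BATCH_SIZE}: {batch_bytes / 2**20:.1f} MiB")
```

Under that assumption a single batch is a few hundred MiB at most, which is why working batch by batch is the natural way around the blocker.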

haiyangdeperci · 2021-02-11T12:01:53Z

@DebadityaPal I am still having some memory issues after the patch. For instance:

Traceback (most recent call last):
  File "Hub/benchmarks/benchmark_sequential_access.py", line 165, in <module>
    time_tiledb(dataset, batch_size, split)
  File "Hub/benchmarks/benchmark_sequential_access.py", line 90, in time_tiledb
    ds_tldb[batch * batch_size : (batch + 1) * batch_size] = ds_numpy
  File "tiledb/libtiledb.pyx", line 4909, in tiledb.libtiledb.DenseArrayImpl.__setitem__
  File "tiledb/libtiledb.pyx", line 416, in tiledb.libtiledb._write_array
  File "tiledb/libtiledb.pyx", line 480, in tiledb.libtiledb._raise_ctx_err
  File "tiledb/libtiledb.pyx", line 465, in tiledb.libtiledb._raise_tiledb_error
tiledb.libtiledb.TileDBError: [TileDB::ChunkedBuffer] Error: Chunk write error; malloc() failed

DebadityaPal · 2021-02-12T09:26:24Z

The current changes resolve the memory crash issue. The whole benchmark should now take less than 15 GB of RAM.
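
For readers following along, a rough sketch of what batchwise writing looks like in general. This is not the actual patch: plain NumPy arrays stand in for the Hub dataset and the TileDB array, and `copy_in_batches` is a hypothetical helper name.

```python
import numpy as np


def copy_in_batches(src, dst, batch_size):
    """Copy `src` into `dst` one slice at a time; returns the number of batches."""
    n = src.shape[0]
    num_batches = 0
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # With a lazily loaded source, only this slice needs to be in memory.
        dst[start:stop] = np.asarray(src[start:stop]).reshape(stop - start, -1)
        num_batches += 1
    return num_batches


# Small stand-in arrays (assumption: the real source and target are far larger).
src = np.random.randint(0, 256, size=(10, 4, 4, 3), dtype=np.uint8)
dst = np.zeros((10, 4 * 4 * 3), dtype=np.uint8)
print(copy_in_batches(src, dst, batch_size=3))
```

The last batch is allowed to be shorter than `batch_size`, so the sample count does not have to be a multiple of it.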

haiyangdeperci · 2021-02-12T09:28:21Z

@DebadityaPal Great! Thanks for the update. I'll check it out right away.

DebadityaPal added 2 commits February 4, 2021 21:12

Put all benchmarks in one file

b5904fc

Added places365_small

d76b956

File renamed

fe1f48c

Removed some duplicate code.

3fd4d88

mynameisvinn requested a review from haiyangdeperci February 5, 2021 13:21

mynameisvinn assigned DebadityaPal Feb 5, 2021

mynameisvinn added the enhancement New feature or request label Feb 5, 2021

haiyangdeperci mentioned this pull request Feb 8, 2021

Added sequential write benchmarks #538

Merged

haiyangdeperci suggested changes Feb 10, 2021

DebadityaPal added 2 commits February 10, 2021 21:29

Added batchwise writing to create TileDB Array

6e1166c

Merge conflicts resolved

94132f2

haiyangdeperci approved these changes Feb 10, 2021

Memory Crash Issue resolved

f90f962

davidbuniat merged commit 11a7cb3 into activeloopai:master Feb 21, 2021
