Benchmark restructuring and memory profiling (#642)
* Refactor compression benchmarks to runnable design

Plan: each benchmark exports two functions, filename_setup and
filename_run, which designate how the benchmark is run. The setup
function takes any required parameters and returns a processed param
tuple to be passed as the argument to the runner. The runner is designed
to be as slim as possible so we only measure the crucial code. Then, we
can externally call these functions and time/profile/benchmark the
runtime of the function call, allowing much finer control. Further
refactors along this design are coming soon.
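As a sketch, a runner following this setup/run convention might look like the following (the `run_benchmark` helper and the `example_*` benchmark pair are hypothetical illustrations, not the actual runner added in the later commits):

```python
import time


def run_benchmark(setup, run, *args, **kwargs):
    """Time only the run function; setup cost is excluded."""
    params = setup(*args, **kwargs)
    start = time.perf_counter()
    run(params)
    return time.perf_counter() - start


# A dummy benchmark pair following the filename_setup / filename_run convention.
def example_setup(n):
    # Do all expensive preparation here and hand the runner a param tuple.
    return (list(range(n)),)


def example_run(params):
    (data,) = params
    sum(data)  # the "crucial code" being measured


elapsed = run_benchmark(example_setup, example_run, 10_000)
print(f"example benchmark ran in {elapsed:.6f}s")
```

Because setup and run are plain functions, the same pair can be timed, profiled, or memory-sampled by swapping out the harness without touching the benchmark itself.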

* Refactor dataset iteration benchmarks

Following the previous commit, this refactors the benchmark_dataset_iter
into separate files with the same design as the now-refactored
`benchmark_compress_hub.py`. One step closer to full control.

* Add full dataset compute benchmark

It'll be nice to keep track of this as well. Might be subsumed by the
dataset_comparison file, but I'll get to that next.

* Refactor benchmark_random_access into new format

Improves `benchmark_access_hub_full.py` and uses that as a base for
`benchmark_access_hub_slice.py` which replaces functionality from
`benchmark_random_access.py` (now deleted).

* Remove unused line in benchmark_iterate_hub TF

* Local variants of iteration benchmarks using tfds

* Remove dataset compare benchmarks

Existing refactored benchmarks now cover all cases once present in this
file.

* Rename remaining un-refactored benchmarks "legacy"

Until these can be converted, I want a distinction between what is and
isn't compatible with the new runner (next few commits). This will
probably be resolved before merging.

* Fix minor issues in total access benchmarks

* Initial prototype for benchmark runner notebook

* Update benchmark runner notebook

* Add psutil to benchmark requirements

* Fix pytorch and tensorflow local benchmarks

* Add network benchmarking and expand suites

* Update .gitignore with benchmark local data

* Auto-fix issues with black

* Add time to network monitor output to plot better
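The memory profiling in the title relies on psutil (added to the benchmark requirements above); as a rough illustration of the idea that runs without extra dependencies, the peak allocation during a run function can be sampled with the stdlib tracemalloc instead (the `profile_peak_memory` helper and toy benchmark below are hypothetical):

```python
import tracemalloc


def profile_peak_memory(run, params):
    # Trace Python allocations only while the run function executes.
    tracemalloc.start()
    run(params)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak  # peak bytes allocated during the run


def toy_run(params):
    (nbytes,) = params
    buf = bytearray(nbytes)  # allocate nbytes to make the peak visible


peak = profile_peak_memory(toy_run, (1_000_000,))
print(f"peak traced memory: {peak / 1e6:.1f} MB")
```

Note that tracemalloc only sees Python-level allocations; psutil's process-level RSS also captures memory allocated by native libraries, which matters for numpy-heavy benchmarks like these.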
benchislett committed Mar 30, 2021
1 parent 286eae2 commit da105b0
Showing 18 changed files with 527 additions and 421 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -195,3 +195,7 @@ cov.xml
hub/api/cov.xml
hub/api/nested_seq
nested_seq

# Benchmark local test data (auto-downloaded)
benchmarks/hub_data
benchmarks/torch_data
16 changes: 16 additions & 0 deletions benchmarks/benchmark_access_hub_full.py
@@ -0,0 +1,16 @@
from hub import Dataset


def benchmark_access_hub_full_setup(dataset_name, field=None):
    dset = Dataset(dataset_name, cache=False, storage_cache=False, mode="r")

    keys = dset.keys
    if field is not None:
        keys = (field,)
    return (dset, keys)


def benchmark_access_hub_full_run(params):
    dset, keys = params
    for k in keys:
        dset[k].compute()
16 changes: 16 additions & 0 deletions benchmarks/benchmark_access_hub_slice.py
@@ -0,0 +1,16 @@
from hub import Dataset


def benchmark_access_hub_slice_setup(dataset_name, slice_bounds, field=None):
    dset = Dataset(dataset_name, cache=False, storage_cache=False, mode="r")

    keys = dset.keys
    if field is not None:
        keys = (field,)
    return (dset, slice_bounds, keys)


def benchmark_access_hub_slice_run(params):
    dset, slice_bounds, keys = params
    for k in keys:
        dset[k][slice_bounds[0] : slice_bounds[1]].compute()
28 changes: 28 additions & 0 deletions benchmarks/benchmark_compress_hub.py
@@ -0,0 +1,28 @@
import numpy as np
from PIL import Image

import hub


def benchmark_compress_hub_setup(
    times, image_path="./images/compression_benchmark_image.png"
):
    img = Image.open(image_path)
    arr = np.array(img)
    ds = hub.Dataset(
        "./data/bench_png_compression",
        mode="w",
        shape=times,
        schema={"image": hub.schema.Image(arr.shape, compressor="png")},
    )

    batch = np.zeros((times,) + arr.shape, dtype="uint8")
    for i in range(times):
        batch[i] = arr

    return (ds, times, batch)


def benchmark_compress_hub_run(params):
    ds, times, batch = params
    ds["image", :times] = batch
16 changes: 16 additions & 0 deletions benchmarks/benchmark_compress_pillow.py
@@ -0,0 +1,16 @@
from PIL import Image
from io import BytesIO


def benchmark_compress_pillow_setup(
    times, image_path="./images/compression_benchmark_image.png"
):
    img = Image.open(image_path)
    return (img, times)


def benchmark_compress_pillow_run(params):
    img, times = params
    for _ in range(times):
        b = BytesIO()
        img.save(b, format="png")
47 changes: 0 additions & 47 deletions benchmarks/benchmark_compress_time.py

This file was deleted.
