
Benchmark restructuring and memory profiling #642

Merged · 18 commits into activeloopai:master · Mar 30, 2021

Conversation

@benchislett (Contributor) commented Mar 3, 2021:

Issues

Problems I've been having with benchmarks:

  • Unclear which tests exist and what they cover; unclear file contents have led to duplicated benchmark code
  • No way to isolate setup from runtime for a given benchmark, so higher-order composition can't produce analysis over arbitrary system metrics such as memory and network usage (#638, #516)
  • No consistent way to run benchmarks
  • No clean way to tinker with visualizations of benchmarks

Progress

First thing is to refactor all the existing benchmarks. The style I went for is a file benchmark_title.py that exports two functions: benchmark_title_setup, which takes whatever parameters the benchmark needs, and benchmark_title_run, which takes the parameter tuple returned by the setup call. All files have been refactored in this way except for the tiledb/zarr benchmarks, see c7bbbd0 (ping @DebadityaPal). In the future, if our benchmarks become sufficiently complex, we can further refactor these into classes.
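To illustrate, here's a minimal sketch of the convention; the module name, parameters, and workload are hypothetical, not actual benchmark code:

```python
# benchmarks/benchmark_title.py -- hypothetical sketch of the
# setup/run convention; names and parameters are illustrative.


def benchmark_title_setup(num_samples=100, sample_size=1024):
    # All expensive preparation happens here, outside the measured region.
    dataset = [bytes(sample_size) for _ in range(num_samples)]
    return (dataset,)


def benchmark_title_run(params):
    # The runner stays as slim as possible so only the crucial code is measured.
    (dataset,) = params
    total = 0
    for sample in dataset:
        total += len(sample)
    return total
```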

Next we need to sort out which suites we want to run and when, how we want to store output, and what we want to profile. Time is the obvious metric, but memory and network matter as well. As of right now, time, memory, and network benchmarks are all working. We'll need to choose exactly which suites we want to run and store, but that is now a trivial extension.
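Because setup and run are split, any profiler can wrap the run call externally. As a sketch using only the standard library (time.perf_counter for wall time, tracemalloc for peak memory), against the hypothetical module above:

```python
import time
import tracemalloc

from benchmark_title import benchmark_title_setup, benchmark_title_run

# Setup cost is excluded from the measurement entirely.
params = benchmark_title_setup(num_samples=100)

tracemalloc.start()
start = time.perf_counter()
benchmark_title_run(params)
elapsed = time.perf_counter() - start
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"time: {elapsed:.4f} s, peak memory: {peak_bytes / 1e6:.2f} MB")

# Network profiling can be layered in the same way, e.g. by diffing
# byte counters (such as psutil.net_io_counters()) around the run call.
```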

The next step after this PR is to configure the full suite, output to a file (yaml or similar), and run the visualizations on that (or do them in the same notebook and store them as artifacts as well).
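As a sketch of that output step, assuming PyYAML and a made-up result schema (the actual file format is still undecided, per "yaml or similar" above):

```python
import yaml  # PyYAML; format not yet settled, this is one option

# Hypothetical result schema; values would come from the profiling step.
results = {
    "benchmark_title": {"time_s": 0.0123, "peak_mem_mb": 4.2},
}

with open("benchmark_results.yaml", "w") as f:
    yaml.safe_dump(results, f)
```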

TL;DR

Time, memory, and network benchmarks are live across the refactored suites. This PR is ready to go!

Plan: each benchmark will export two functions, filename_setup and filename_run, which designate how the benchmark is to be run. setup will take any required parameters and return a processed param tuple to be passed as an argument to the runner. The runner is designed to be as slim as possible so we only measure the crucial code. Then we can externally call these functions and time/profile/benchmark the function call, allowing for much greater control. Further refactors along this design coming soon.

Following the previous commit, this refactors benchmark_dataset_iter into separate files with the same design as the now-refactored `benchmark_compress_hub.py`. One step closer to full control.

It'll be nice to keep track of this as well. Might be subsumed by the dataset_comparison file, but I'll get to that next.

Improves `benchmark_access_hub_full.py` and uses that as a base for `benchmark_access_hub_slice.py`, which replaces functionality from `benchmark_random_access.py` (now deleted).

Existing refactored benchmarks now cover all cases once present in this file.

Until these can be converted, I want to have a distinction to know what is and isn't compatible with the new runner (next few commits). This will probably be fixed before going in.
@github-actions (bot) commented Mar 3, 2021:

Locust summary

Git references

Initial: c214a53
Terminal: 63504ce

benchmarks/benchmark_access_hub_full.py
Changes:
benchmarks/benchmark_access_hub_slice.py
Changes:
benchmarks/benchmark_compress_hub.py
Changes:
benchmarks/benchmark_compress_pillow.py
Changes:
benchmarks/benchmark_iterate_hub_local_pytorch.py
Changes:
benchmarks/benchmark_iterate_hub_local_tensorflow.py
Changes:
benchmarks/benchmark_iterate_hub_pytorch.py
Changes:
benchmarks/benchmark_iterate_hub_tensorflow.py
Changes:
benchmarks/legacy_benchmark_sequential_access.py
Changes:
  • Name: time_batches
    Type: function
    Changed lines: 23
    Total lines: 23
  • Name: time_tiledb
    Type: function
    Changed lines: 56
    Total lines: 56
  • Name: time_zarr
    Type: function
    Changed lines: 36
    Total lines: 36
  • Name: time_hub
    Type: function
    Changed lines: 11
    Total lines: 11

benchmarks/legacy_benchmark_sequential_write.py
Changes:
  • Name: time_batches
    Type: function
    Changed lines: 25
    Total lines: 25
  • Name: time_tiledb
    Type: function
    Changed lines: 25
    Total lines: 25
  • Name: time_zarr
    Type: function
    Changed lines: 12
    Total lines: 12
  • Name: time_hub
    Type: function
    Changed lines: 22
    Total lines: 22

@benchislett mentioned this pull request Mar 3, 2021
@DebadityaPal (Contributor):

@benchislett I'll change the zarr/tiledb benchmarks to match the style of the other benchmarks.

@haiyangdeperci (Contributor):

I'll take a look at the benchmarks, thanks a lot for your wonderful input. @benchislett Could you fix the linting in the meantime?

@benchislett (Contributor, Author):

Yeah, I haven't been able to get the linter going nicely in my IDE, so I usually save it for my last commit and fix it all then. Coming soon.

@haiyangdeperci (Contributor):

@benchislett Sure, no worries 👍

@haiyangdeperci (Contributor):

@benchislett Could you give me an estimate of when you will incorporate these changes?

@benchislett (Contributor, Author):

As soon as I get a spare couple of hours. Probably this weekend; by Monday at the latest.

@haiyangdeperci (Contributor):

@benchislett Amazing! Thank you. Monday sounds great!

@haiyangdeperci (Contributor):

@benchislett Have you encountered any obstacles while working on this PR? Perhaps I could help.

@benchislett (Contributor, Author):

> @benchislett Have you encountered any obstacles while working on this PR? Perhaps I could help.

Thanks, but there really haven't been any issues other than a distinct lack of time. This weekend was crunch time for me at work and I was busy non-stop. I'll have my commits in tonight and we can go from there.

@haiyangdeperci (Contributor):

@benchislett No worries, I just wanted to make sure everything is all right. That's great. Looking forward to the update 💯

@haiyangdeperci (Contributor):

@benchislett Thanks for the changes in the notebook, I quite appreciate those. Would you mind fixing the linting though? Just run black over the code, I assume.

@haiyangdeperci self-requested a review March 30, 2021 14:08
@haiyangdeperci merged commit da105b0 into activeloopai:master Mar 30, 2021