
benchmark_time includes cluster startup time #238

Closed
gjoseph92 opened this issue Aug 10, 2022 · 4 comments · Fixed by #239

@gjoseph92 (Contributor)

The benchmark_time fixture simply times how long it takes the entire test to run:
https://github.com/coiled/coiled-runtime/blob/54c49ccac9546188acc5bc1099ab1b31e3ccae1a/conftest.py#L206-L212

However, that doesn't distinguish between cluster startup time (and any test initialization time) and actual runtime of the core workload.

  1. Cluster startup is somewhat availability-dependent, so we should expect it to be a little variable.
  2. Improvements/regressions to cluster startup time might look like improvements/regressions in dask performance, and vice versa.

Both are valuable metrics to track, but they shouldn't be combined into one.
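For reference, that fixture (reconstructed from the linked lines; it also appears as the removed lines in the diff later in this thread) is roughly:

```python
import datetime
import time

import pytest


@pytest.fixture(scope="function", autouse=True)
def benchmark_time(test_run_benchmark):
    # The timed window spans the entire test body, including any cluster
    # startup or other setup that happens inside the test.
    if not test_run_benchmark:
        yield
    else:
        start = time.time()
        yield
        end = time.time()
        test_run_benchmark.duration = end - start
        test_run_benchmark.start = datetime.datetime.utcfromtimestamp(start)
        test_run_benchmark.end = datetime.datetime.utcfromtimestamp(end)
```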

cc @ian-r-rose @jrbourbeau

@ian-r-rose (Contributor)

That's right, this is a very crude metric, but a few notes:

  1. It's a function-scoped benchmark, so tests which include a module-scoped cluster fixture (as most of them do) will not include the cluster startup, unless they specifically include a cluster spinup in the test, like test_default_cluster_spinup_time() (a minimal sketch of this pattern follows the list). Other potentially costly things like transferring large benchmarking data from the scheduler to the client, uploading performance reports, restarting the client, etc. might get caught up in this, however.
  2. Important metrics that we'll want to include are timing data for task groups, including compute time, disk spill time, transfer time, etc. I have some WIP work for that, but it hasn't landed yet (though note there are spots in the benchmark schema for them already). I think these are complementary to a "test wall-clock" time fixture.
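A minimal sketch of the module-scoped pattern from point 1 (the fixture names and coiled.Cluster arguments here are illustrative, not the repo's actual fixtures):

```python
import pytest
from coiled import Cluster
from distributed import Client


@pytest.fixture(scope="module")
def example_cluster():
    # Created once per module, during fixture setup, so its spin-up time
    # falls outside the function-scoped benchmark_time window.
    with Cluster(name="example-benchmark-cluster", n_workers=10) as cluster:
        yield cluster


def test_some_workload(example_cluster):
    # benchmark_time (function-scoped, autouse) only times this test body.
    with Client(example_cluster) as client:
        ...  # the core workload
```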

I'm happy to hear suggestions on how to improve this fixture, or what additional metrics to collect.

@gjoseph92 (Contributor, Author)

> It's a function-scoped benchmark, so tests which include a module-scoped cluster fixture (as most of them do) will not include the cluster startup

Okay, maybe that's the pattern we need to use then. For the benchmarks I'm adding in #191 though, I was planning to use a number of different cluster sizes and configurations, which makes using a single cluster fixture hard. Just creating the cluster inline in the test is easier to read for those cases IMO.

Maybe benchmark_time should be a contextmanager, so you can time just the relevant block, and there could be an auto_benchmark_time fixture to time the whole test that you can use when appropriate?
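For example, something like this (a sketch of the suggested API; the cluster arguments are illustrative):

```python
from coiled import Cluster
from distributed import Client


def test_large_cluster_workload(benchmark_time):
    # The cluster is created inline (easy to vary size/configuration per
    # test), but outside the timed block, so spin-up time isn't counted.
    with Cluster(n_workers=100) as cluster, Client(cluster) as client:
        with benchmark_time:
            ...  # time only the core workload
```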

@ian-r-rose (Contributor)

> Maybe benchmark_time should be a contextmanager, so you can time just the relevant block, and there could be an auto_benchmark_time fixture to time the whole test that you can use when appropriate?

Yes, this is very doable -- a number of other fixtures here are structured as context managers, including the memory sampler: https://github.com/coiled/coiled-runtime/blob/80ffc2e2564e19ea4d623b4f5b52975dc9845ef6/conftest.py#L217-L231

@ian-r-rose (Contributor)

Something like the following would probably do the trick:

diff --git a/conftest.py b/conftest.py
index a9ae89a..9528444 100644
--- a/conftest.py
+++ b/conftest.py
@@ -201,17 +201,27 @@ def test_run_benchmark(benchmark_db_session, request, testrun_uid):
         benchmark_db_session.commit()
 
 
-@pytest.fixture(scope="function", autouse=True)
+@pytest.fixture(scope="function")
 def benchmark_time(test_run_benchmark):
-    if not test_run_benchmark:
-        yield
-    else:
-        start = time.time()
+    @contextlib.contextmanager
+    def _benchmark_time():
+        if not test_run_benchmark:
+            yield
+        else:
+            start = time.time()
+            yield
+            end = time.time()
+            test_run_benchmark.duration = end - start
+            test_run_benchmark.start = datetime.datetime.utcfromtimestamp(start)
+            test_run_benchmark.end = datetime.datetime.utcfromtimestamp(end)
+
+    return _benchmark_time()
+
+
+@pytest.fixture(scope="function")
+def auto_benchmark_time(benchmark_time):
+    with benchmark_time:
         yield
-        end = time.time()
-        test_run_benchmark.duration = end - start
-        test_run_benchmark.start = datetime.datetime.utcfromtimestamp(start)
-        test_run_benchmark.end = datetime.datetime.utcfromtimestamp(end)
 

Note that benchmark_time is currently an autouse fixture, so we'd need to switch to including it explicitly where appropriate (or autouse it at the module scope).
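With that change, usage might look something like this (test and fixture names other than benchmark_time/auto_benchmark_time are illustrative):

```python
from coiled import Cluster
from distributed import Client


# Opt in to timing the whole test body (the old autouse behavior):
def test_whole_test(auto_benchmark_time, example_client):
    ...


# Or time just the relevant block, excluding inline cluster startup:
def test_timed_block(benchmark_time):
    with Cluster(n_workers=20) as cluster, Client(cluster) as client:
        with benchmark_time:
            ...
```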
