Integration test: common physical science workload #174

Closed · Tracked by #138

gjoseph92 opened this issue Jun 13, 2022 · 1 comment

Labels: stability (work related to stability), test idea

Comments

gjoseph92 (Contributor) commented Jun 13, 2022

dask/distributed#6571

I believe this could be simplified to something like:

import dask.array as da

# Two large 3D arrays and two 2D fields; chunks=100 gives
# 100-element chunks along every axis.
a = da.random.random((1000, 900, 800), chunks=100)
b = da.random.random((1000, 900, 800), chunks=100)
x = da.random.random((900, 800), chunks=100)
y = da.random.random((900, 800), chunks=100)

# The 2D fields broadcast against the trailing axes of the sliced
# 3D arrays; the mean then reduces everything to a scalar.
result = a[1:] * x + b[1:] * y
result.mean().compute()

This is basically the vorticity example from dask/distributed#6560 (comment). (Ideally downsized a bit, so it's faster and cheaper to run frequently.)

Add rechunks before the result operation for bonus complexity, as sketched below. (Note that the data and chunk sizes here are just representative and would need to be scaled up to something larger than the cluster's memory.)
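
A minimal sketch of that variant (the new chunk sizes are my own illustration, not from the original workload):

# Hypothetical extra step: rechunking inserts an all-to-all shuffle into
# the graph, giving the scheduler many more task dependencies to order.
a = a.rechunk((50, 200, 100))
b = b.rechunk((50, 200, 100))

result = a[1:] * x + b[1:] * y
result.mean().compute()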

This is expected to perform poorly right now (use a ton of memory, spill a ton to disk, and be really slow) because of the scheduling issues linked above.

As we work on fixes to those issues, we should hopefully see performance improve.

gjoseph92 (Contributor, Author) commented
I think I already did this in #243: https://github.com/coiled/coiled-runtime/blob/669a3d36eefec2c72af12dae08e2bee8db4a7de5/tests/benchmarks/test_array.py#L90-L133

I used the original vorticity example rather than the further-simplified one here, but I think they're equivalent in terms of the problems they exercise.

One big difference is that I did wait(arr_to_devnull(result)) instead of wait(result). In @TomNicholas's original workload, he wanted to write the array to zarr (I think, right?). In my benchmarks in dask/distributed#6560 (comment), the "something else is going on here" thing that confused me was that memory just went 📈 and ended at the same level in all cases.

I realized later that I was persisting the entire result, so dask had to keep all the output in memory; of course it used a lot. In #243 I instead simulated writing it to zarr, which in theory should work with a much-larger-than-memory array, but didn't due to all the problems listed.
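
For reference, here's a minimal sketch of what a helper like arr_to_devnull could look like; the actual helper in #243 may differ, and the Devnull class here is my own illustration:

import dask.array as da

class Devnull:
    """A write-only "store": accepts chunk assignments and discards the
    data, so each output chunk can be released as soon as it's written."""

    def __setitem__(self, key, value):
        pass

def arr_to_devnull(arr: da.Array):
    # compute=False returns a Delayed; computing (or wait-ing on) it streams
    # every chunk through Devnull without holding the full array in memory.
    return arr.store(Devnull(), lock=False, compute=False)

Unlike persist, nothing here asks dask to retain the computed chunks, so it mimics a zarr write without the actual I/O.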
