Cubed example using Coiled on AWS #271

rsignell · 2024-01-31T20:42:59Z

I tried following the instructions to run a Cubed demo using Coiled on AWS following these instructions, and the environment creation went smoothly:

mamba create -n cubed-coiled-examples python=3.9 pip -y
conda activate cubed-coiled-examples
pip install 'cubed[coiled]'

but when I tried running the example:

git clone git@github.com:tomwhite/cubed.git
cd cubed/examples/coiled/aws
export AWS_PROFILE=esip-qhub
python coiled-add-asarray.py "s3://esip-qhub/rsignell/testing/cubed"

I got:

Traceback (most recent call last):
  File "/home/rsignell/repos/cubed/examples/coiled/aws/coiled-add-asarray.py", line 22, in <module>
    res = c.compute(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/core/array.py", line 136, in compute
    result = compute(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/core/array.py", line 225, in compute
    plan.execute(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/core/plan.py", line 171, in execute
    executor.execute_dag(
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/runtime/executors/coiled.py", line 31, in execute_dag
    coiled_function = make_coiled_function(pipeline.function, coiled_kwargs)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/cubed/runtime/executors/coiled.py", line 13, in make_coiled_function
    return coiled.function(**coiled_kwargs)(execution_stats(func))
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/coiled/function.py", line 404, in decorator
    Function(func, cluster_kwargs, keepalive=keepalive, environ=environ, local=local, name=name)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/site-packages/coiled/function.py", line 49, in __init__
    self._code = inspect.getsource(function)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 1024, in getsource
    lines, lnum = getsourcelines(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 1006, in getsourcelines
    lines, lnum = findsource(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 817, in findsource
    file = getsourcefile(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 697, in getsourcefile
    filename = getfile(object)
  File "/home/rsignell/miniforge3/envs/cubed-coiled-examples/lib/python3.9/inspect.py", line 677, in getfile
    raise TypeError('module, class, method, function, traceback, frame, or '
TypeError: module, class, method, function, traceback, frame, or code object was expected, got partial

I'm raising the issue here as I'm guessing something on the Coiled side changed?
But of course I don't really know so I'm prepared to be told to post it over at cubed... :)

The text was updated successfully, but these errors were encountered:

jrbourbeau · 2024-01-31T20:54:20Z

Thanks @rsignell. Looking at the traceback, I think this is probably on our end. I'll take closer look

jrbourbeau · 2024-01-31T21:02:41Z

We have some code collection logic that doesn't handle functools.partial objects well. I'll push a fix up for that

In the meantime, I'm not sure where the partial is coming from inside cubed. Unpacking the original function by getting the .func attribute on the partial-wrapped function could be a temporary workaround

rsignell · 2024-01-31T21:28:03Z

Unpacking the original function by getting the .func attribute on the partial-wrapped function could be a temporary workaround

Is this a workaround I could implement? I'm afraid I don't understand this! 😕

jrbourbeau · 2024-01-31T21:37:47Z

Ah, sorry, I meant if you're playing around with a development version of cubed, then you could make some changes to fix things. We should have a fix out in coiled soon, so it may be faster to just wait for that

jrbourbeau · 2024-01-31T21:59:55Z

We'll likely have a new release out sometime later today or tomorrow, but if you want to try a dev coiled release with the fix you can pip install coiled==1.3.12.dev10

rsignell · 2024-02-01T13:01:09Z

Thanks @jrbourbeau!

Both tests ran successfully with pip install coiled==1.3.12.dev12.

Here's the results of the larger coiled-add-random.py test:

python coiled-add-random.py "s3://esip-qhub/rsignell/testing/cubed"

rsignell · 2024-02-01T13:42:25Z

@tomwhite or @TomNicholas does the Coiled performance here appear (from the screenshots above) to be comparable to other executors? If not, I'm hoping to work with Coiled engineers to see if they have ideas about how to improve performance.

tomwhite · 2024-02-01T15:58:51Z

Thanks for looking at this @rsignell, and for the bugfix @jrbourbeau.

I tried the Coiled executor a while back and had a similar experience with the number of instances not scaling as much as it could have done (see cubed-dev/cubed#309). It would be great if Coiled could ramp up the number of instances faster (is it possible to tell it to do that?). Other executors like Lithops or Modal can scale up to 100s or 1000s of instances in seconds.

TomNicholas · 2024-02-01T16:04:15Z

We don't yet have good benchmarks so it's hard to know what the total time should be (and a lot has changed since I last ran anything at scale), but here's a way to get an idea of what the performance should be.

If I attempt to run this example locally¹ (just for demonstration purposes) then you see a progress bar for each step in the plan (i.e. each intermediate array that needs to be written to zarr).

Ignoring the "create arrays", there are 3 steps, each of which consists of 100 embarrassingly parallel tasks (each is a chunk to be saved to an intermediate zarr store). You can see them laid out in the plan:

If you don't have 100 chunks being computed in parallel on Coiled between each step then it could go faster.² If you were to run this on Lambda/GCF then that's what you would see.

Assuming your coiled cluster is computing one chunk per worker, it looks like you're only computing 10 chunks at once, so it could be at least 10x faster (before any other graph optimizations). The bottleneck here is the dynamic scaling up of the Coiled Cluster to match the very wide task graphs that Cubed is passing it.

There's probably a neater way to get to this information that I'm blanking on right now. EDIT: I could simply have hovered over the graph nodes in jupyter to see the number of tasks for that node. ↩
In fact I think since the first two arrays are independent, and since https://github.com/tomwhite/cubed/issues/80 was fixed, the first part of the calculation should be able to compute 200 chunks at once. ↩

mrocklin · 2024-02-01T16:19:00Z

Coiled uses raw VMs. This means that things are super-cheap, but also that there's a minute-long cold startup time one has to deal with (systems like Modal get around this by having VMs up all the time, but this is expensive and assumes that you're in their region/account. This gets rolled into the 10x cost difference).

Because things have minute-long startup times, the adaptive settings are configured to scale up to something large enough so that it completes in ~5 minutes. If we're going to spend one minute spinning up a VM, we want that VM to be active for at least a couple more minutes before spinning it down.

You can tweak those configurations if you like, but you'll never get sub-minute with Coiled. There's a cost-speed tradeoff here.

shughes-uk · 2024-02-01T16:28:58Z

https://docs.coiled.io/user_guide/clusters/scale.html#adaptive-scaling adaptive scaling docs if you want to tweak some settings.

I suspect Cubed can probably make better decisions about when to scale in/out though, and just achieve that through manual scale calls.

tomwhite · 2024-02-01T16:51:47Z

I suspect Cubed can probably make better decisions about when to scale in/out though, and just achieve that through manual scale calls.

Yes, we could pre-scale to the maximum width of the graph.

rsignell · 2024-02-01T18:55:53Z

I was planning on using this for rechunking a large Zarr dataset (reading 1TB Zarr from S3, writing a new rechunked 1 TB Zarr to S3). For this use case I don't care about super fast autoscaling, do I?

mrocklin · 2024-02-01T19:37:48Z

Yes, we could pre-scale to the maximum width of the graph

Cost-wise this might be bad, especially if the tasks run quickly. If you spend a minute spinning up VMs that then just run for a second, then you're spending the vast majority of VM-rental-time doing nothing other than setting up.

I was planning on using this for rechunking a large Zarr dataset (reading 1TB Zarr from S3, writing a new rechunked 1 TB Zarr to S3). For this use case I don't care about super fast autoscaling, do I?

Not at all. This is a case where Coiled's approach of trading slow spin-up time for dirt cheap spot VMs is ideal. You're spending $0.02 per CPU hour vs $0.20 per CPU hour, as long as you're fine waiting a few minutes for everything to complete.

rsignell closed this as completed Feb 1, 2024

rsignell mentioned this issue Feb 1, 2024

small changes to coiled/aws examples cubed-dev/cubed#369

Merged

rsignell reopened this Feb 1, 2024

TomNicholas mentioned this issue Feb 1, 2024

Fuse operations with different numbers of tasks cubed-dev/cubed#368

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cubed example using Coiled on AWS #271

Cubed example using Coiled on AWS #271

rsignell commented Jan 31, 2024 •

edited

Loading

jrbourbeau commented Jan 31, 2024

jrbourbeau commented Jan 31, 2024

rsignell commented Jan 31, 2024

jrbourbeau commented Jan 31, 2024

jrbourbeau commented Jan 31, 2024

rsignell commented Feb 1, 2024 •

edited

Loading

rsignell commented Feb 1, 2024 •

edited

Loading

tomwhite commented Feb 1, 2024

TomNicholas commented Feb 1, 2024 •

edited

Loading

mrocklin commented Feb 1, 2024

shughes-uk commented Feb 1, 2024

tomwhite commented Feb 1, 2024

rsignell commented Feb 1, 2024

mrocklin commented Feb 1, 2024

Cubed example using Coiled on AWS #271

Cubed example using Coiled on AWS #271

Comments

rsignell commented Jan 31, 2024 • edited Loading

jrbourbeau commented Jan 31, 2024

jrbourbeau commented Jan 31, 2024

rsignell commented Jan 31, 2024

jrbourbeau commented Jan 31, 2024

jrbourbeau commented Jan 31, 2024

rsignell commented Feb 1, 2024 • edited Loading

rsignell commented Feb 1, 2024 • edited Loading

tomwhite commented Feb 1, 2024

TomNicholas commented Feb 1, 2024 • edited Loading

Footnotes

mrocklin commented Feb 1, 2024

shughes-uk commented Feb 1, 2024

tomwhite commented Feb 1, 2024

rsignell commented Feb 1, 2024

mrocklin commented Feb 1, 2024

rsignell commented Jan 31, 2024 •

edited

Loading

rsignell commented Feb 1, 2024 •

edited

Loading

rsignell commented Feb 1, 2024 •

edited

Loading

TomNicholas commented Feb 1, 2024 •

edited

Loading