Can't serialize functions decorated with @span #7954

Open
ntabris opened this issue Jun 27, 2023 · 2 comments
Labels: bug (Something is broken), p3 (Affects a small number of users or is largely cosmetic)

Comments

ntabris (Contributor) commented Jun 27, 2023

This gives me an error:

import distributed
from distributed.spans import span

client = distributed.Client()

@span("my-double")
def double(x):
    return x * 2

X = client.map(double, range(10))

I get

TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x166f25e90>\n 0. 6029444544\n>')

This is with dask and distributed 2023.6.2a230627, installed from dask/label/dev, as well as 2023.6.1 and 2023.6.0.

Am I supposed to be able to do this? Or am I doing something wrong? The same code minus the @span decorator works.

crusaderky (Collaborator) commented

I've reproduced the issue; it's a cloudpickle bug (cloudpipe/cloudpickle#509).

That said, this is not how spans are meant to be used.
span is to be used as a context manager or decorator around the client code that generates the dask graph, not around the function that runs as a task.

The correct way to write your example is

import distributed
from distributed import span

client = distributed.Client()

def double(x):
    return x * 2

with span("my-double"):
    X = client.map(double, range(10))

or as a decorator:

def double(x):
    return x * 2

@span("my-double")
def gen_graph(client):
    return client.map(double, range(10))

X = gen_graph(client)

In the minimal example above the decorator form feels a bit silly, but it makes a lot more sense when, instead of a one-liner, you have complicated code, for example:

@span("load")
def load(url: str) -> dd.DataFrame:
    ...

@span("preprocess")
def preprocess(data: dd.DataFrame) -> dd.DataFrame:
    ...

@span("train")
def train(training_data: dd.DataFrame) -> xgboost.dask.Model:
    ...

raw_data = load("s3://mybucket/mydata.parquet")
training_data = preprocess(raw_data)
model = train(training_data)

Using span inside a function that runs on a worker makes sense only when you want to create tasks from tasks; in other words, when you use the get_client and secede APIs:

def f():
    client = distributed.get_client()
    with span("bar"):
        x = ... # define dask collection
    fut = client.compute(x)
    distributed.secede()
    return fut.result()

def main():
    client = distributed.Client()
    with span("foo"):
        client.submit(f).result()

The above example will generate span foo and then a subspan foo->bar.

I'm leaving this issue open (tracking the upstream cloudpickle ticket) because your usage is potentially useful in this latter use case; in other words, it would make sense to write

@span("bar")
def f():
    client = distributed.get_client()
    x = ... # define dask collection
    fut = client.compute(x)
    distributed.secede()
    return fut.result()

def main():
    client = distributed.Client()
    with span("foo"):
        client.submit(f).result()

crusaderky changed the title from "span decorator not working?" to "Can't serialize functions decorated with @span" on Jul 6, 2023
crusaderky added the bug and p3 labels and removed the needs triage label on Jul 6, 2023

fjetter (Member) commented Aug 2, 2023

I understand the reasoning behind what the decorator does, but I think this distinction is confusing for users. Given this limitation, I wonder whether it would be best to remove the decorator entirely.
