[BUG] Failing to find dependencies in tuple of futures #4177

Description

@rjzamora

What happened: After #4139, a number of tests in the custreamz CI are failing. The failures appear to be related to how streamz/custreamz uses futures. More specifically, the task graph contains tuples of future objects, and the new get_all_dependencies method fails to find the dependencies inside those tuples.

What you expected to happen:
I would expect distributed to handle a tuple of futures as an argument.

Minimal Complete Verifiable Example:

For example, the following seems to work with 3e5b506, but fails on master:

import pandas as pd
from distributed import LocalCluster, Client

def _create(size):
    return pd.DataFrame({"a": range(size)})

client = Client(LocalCluster(n_workers=1))

x = client.submit(_create, 5)
df = client.submit(pd.concat, (x,))  # Note the tuple argument
df.result()

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-0bc1dd2ca416> in <module>
      9 x = client.submit(_create, 5)
     10 df = client.submit(pd.concat, (x,))
---> 11 df.result()

~/workspace/cudf-0.16/distributed/distributed/client.py in result(self, timeout)
    224         if self.status == "error":
    225             typ, exc, tb = result
--> 226             raise exc.with_traceback(tb)
    227         elif self.status == "cancelled":
    228             raise result

/datasets/rzamora/miniconda3/envs/cudf_16/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat()
    282         verify_integrity=verify_integrity,
    283         copy=copy,
--> 284         sort=sort,
    285     )
    286 

/datasets/rzamora/miniconda3/envs/cudf_16/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__()
    357                     "only Series and DataFrame objs are valid"
    358                 )
--> 359                 raise TypeError(msg)
    360 
    361             # consolidate

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

Note that the code works fine when a list of futures is used in place of a tuple: df = client.submit(pd.concat, [x])
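
To illustrate the behavior I would expect, here is a minimal sketch of a dependency search that treats tuples the same way as lists. It is not the actual get_all_dependencies implementation: `StandInFuture` and `find_future_keys` are hypothetical names invented for this example, and the stand-in class only mimics the idea of a future carrying a key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInFuture:
    """Hypothetical placeholder for a future; NOT distributed.Future."""
    key: str


def find_future_keys(obj):
    """Collect the keys of every StandInFuture nested anywhere inside obj.

    Lists, tuples, sets, frozensets and dict values are all traversed,
    so a tuple argument is searched exactly like a list argument.
    """
    found = set()
    stack = [obj]
    while stack:
        item = stack.pop()
        if isinstance(item, StandInFuture):
            found.add(item.key)
        elif isinstance(item, (list, tuple, set, frozenset)):
            stack.extend(item)
        elif isinstance(item, dict):
            stack.extend(item.values())
    return found


x = StandInFuture("create-1")
y = StandInFuture("create-2")

list_keys = find_future_keys([x])      # list argument
tuple_keys = find_future_keys((x,))    # tuple argument, as in the report
nested_keys = find_future_keys({"frames": (x, [y])})
```

With a traversal like this, the list and tuple forms of the argument yield the same set of dependencies. The `str` in the TypeError above would be consistent with the future's key reaching pd.concat in place of its result, although I have not confirmed that.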

cc @madsbk
