Modifications to published, persisted datasets don't seem to be visible in another client. #1675

@funnydevnull

I may well be misunderstanding something or doing something wrong, but I want multiple clients to work on a shared dataset. This is the approach I take:

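import numpy as np
import dask.array as da
from dask.distributed import Client

# two clients connected to the same scheduler
client1 = Client('dask-scheduler:8786')
client2 = Client('dask-scheduler:8786')

# persist a random array on the cluster and publish it as 'myarr'
cx = da.from_array(np.random.normal(0, 1, (1000, 100)), chunks=(10, 100))
cxp = client1.persist(cx)
client1.publish_dataset(myarr=cxp)
x2 = client2.get_dataset('myarr')

# check that this is zero
x2.compute() - cxp.compute()

x2 += 1
# this is now all 1's -- I would have thought it would still be all 0's
x2.compute() - cxp.compute()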

Do I have to somehow manually re-publish the dataset after editing it? I tried persisting x2, but this did not seem to help. I also tried re-publishing x2 under the same name (myarr), but this raised an error saying that such a dataset already existed.

So is there any way to update a published dataset so that all clients can access the update (ideally immediately)?
