I may well be misunderstanding something or doing something wrong, but I want multiple clients to work on a shared dataset, so this is the approach I take:
import numpy as np
import dask.array as da
from dask.distributed import Client

client1 = Client('dask-scheduler:8786')
client2 = Client('dask-scheduler:8786')
cx = da.from_array(np.random.normal(0, 1, (1000, 100)), chunks=(10, 100))
cxp = client1.persist(cx)
client1.publish_dataset(myarr=cxp)
x2 = client2.get_dataset('myarr')
# check that this is zero
x2.compute() - cxp.compute()
x2 += 1
# this is now all 1's -- I would have thought it would still be all 0's
x2.compute() - cxp.compute()
Do I have to somehow manually re-publish the dataset after editing it? I've tried persisting x2, but this did not seem to help. I also tried re-publishing x2 under the same name (myarr), but this raised an error saying that such a dataset already exists.
So is there any way to update a published dataset so that all clients can access the update (ideally immediately)?