Hi. I'm trying to fit xgboost models concurrently using asynchronous dask clients. However, calling a collection's own `persist` method (as `map_local_data` in `xgboost/dask.py` does with `data = data.persist()`) triggers the computation on whatever the default client happens to be at that moment. As a result, creating a `DaskDMatrix` with a specific client (as in the `train` function below) won't necessarily create the collections on that client, and it triggers errors like these:
File "./api_xgb.py", line 71, in train
dvalid = await xgb.dask.DaskDMatrix(client, X_valid, y_valid)
File "/home/jose_morales/miniconda3/envs/fastapi/lib/python3.8/site-packages/xgboost/dask.py", line 257, in map_local_data
data = data.persist()
File "/home/jose_morales/miniconda3/envs/fastapi/lib/python3.8/site-packages/dask/base.py", line 254, in persist
(result,) = persist(self, traverse=False, **kwargs)
File "/home/jose_morales/miniconda3/envs/fastapi/lib/python3.8/site-packages/dask/base.py", line 755, in persist
results = client.persist(
File "/home/jose_morales/miniconda3/envs/fastapi/lib/python3.8/site-packages/distributed/client.py", line 2944, in persist
futures = self._graph_to_futures(
File "/home/jose_morales/miniconda3/envs/fastapi/lib/python3.8/site-packages/distributed/client.py", line 2543, in _graph_to_futures
dsk = highlevelgraph_pack(dsk, self, keyset)
File "/home/jose_morales/miniconda3/envs/fastapi/lib/python3.8/site-packages/distributed/protocol/highlevelgraph.py", line 115, in highlevelgraph_pack
"state": _materialized_layer_pack(
File "/home/jose_morales/miniconda3/envs/fastapi/lib/python3.8/site-packages/distributed/protocol/highlevelgraph.py", line 40, in _materialized_layer_pack
raise ValueError(
ValueError: Inputs contain futures that were created by another client.
The solution would be to replace `data = data.persist()` with `data = client.persist(data)`. Since `client` is already passed to these functions, this should be fairly straightforward. I'd be happy to give it a go.
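A minimal sketch of the proposed pattern, assuming `dask` and `distributed` are installed; it does not touch xgboost itself. Two asynchronous clients each start their own in-process cluster, and every `persist` and `compute` call names its client explicitly, so no work depends on which client dask currently treats as the default. The helper `persist_on` is hypothetical and is not part of xgboost or dask.

```python
import asyncio

import dask.array as da
from distributed import Client


def persist_on(client, collection):
    # Hypothetical helper: submit the graph to *this* client instead of calling
    # collection.persist(), which uses dask's current default client.
    return client.persist(collection)


async def main():
    # Each Client starts its own small, in-process LocalCluster.
    cluster_kwargs = dict(
        processes=False, n_workers=1, threads_per_worker=1, dashboard_address=":0"
    )
    async with Client(asynchronous=True, **cluster_kwargs) as c1:
        async with Client(asynchronous=True, **cluster_kwargs) as c2:
            # c2 was started last, so dask will typically treat it as the
            # default client; a bare .persist() could send c1's work to c2.
            x = persist_on(c1, da.ones(1_000, chunks=100))
            y = persist_on(c2, da.arange(1_000, chunks=100))

            # Follow-up computations also go through the client that owns
            # the persisted futures; awaiting the returned Future yields the value.
            total_x = await c1.compute(x.sum())
            total_y = await c2.compute(y.sum())
            return total_x, total_y


total_x, total_y = asyncio.run(main())
print(total_x, total_y)
```

The proposed change to `xgboost/dask.py` applies this explicit-client rule to `persist`; the sketch also applies it to the later `compute` calls, since those can otherwise hit the same "created by another client" problem.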
Thanks for raising the issue. Feel free to open a PR and ping me on GitHub. Please note that training multiple models is not yet supported; I need to try out dask/distributed#4503 later.