RuntimeError: cannot schedule new futures after shutdown when using external Kubernetes cluster #707
Comments
Thanks for raising this. It definitely looks related to the cluster-reaping finalizer using things that may have already been shut down. In the short term a workaround would be to manually close the client and cluster yourself:

def main():
    cluster = KubeCluster(
        name="my-dask-cluster",
        image="ghcr.io/dask/dask:2023.3.2-py3.11",
        env={"EXTRA_PIP_PACKAGES": "joblib"},
    )
    print("Cluster created")
    cluster.scale(1)
    client = cluster.get_client()
    print("Client", client)
    joblib.parallel_backend(
        "dask", client=client, pure=False, wait_for_workers_timeout=60
    )
    results = joblib.Parallel(n_jobs=2)(
        joblib.delayed(square)(arg) for arg in range(10)
    )
    print(results)
+   client.close()
+   cluster.close()
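An alternative sketch (my suggestion rather than something from this thread, and it assumes KubeCluster and the Dask client support the usual with-statement context-manager protocol of Dask cluster/client objects) is to let context managers do the explicit close:

import time
import random

import joblib
from dask_kubernetes.operator import KubeCluster


def square(x):
    time.sleep(random.expovariate(1.5))
    return x**2


def main():
    # Context managers close the client and cluster before main() returns,
    # so nothing is left for a finalizer to clean up at interpreter exit.
    with KubeCluster(
        name="my-dask-cluster",
        image="ghcr.io/dask/dask:2023.3.2-py3.11",
        env={"EXTRA_PIP_PACKAGES": "joblib"},
    ) as cluster:
        cluster.scale(1)
        with cluster.get_client() as client:
            with joblib.parallel_backend(
                "dask", client=client, pure=False, wait_for_workers_timeout=60
            ):
                results = joblib.Parallel(n_jobs=2)(
                    joblib.delayed(square)(arg) for arg in range(10)
                )
            print(results)


if __name__ == "__main__":
    main()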
Thank you for the quick response! Yes, this helps in this example. Registering the close with atexit instead still fails, though:

def main():
    cluster = KubeCluster(
        name="my-dask-cluster",
        image="ghcr.io/dask/dask:2023.3.2-py3.11",
        namespace="ana-ixian",
        env={"EXTRA_PIP_PACKAGES": "joblib"},
        shutdown_on_close=True,
    )
    print("Cluster created")
    cluster.scale(1)
    client = cluster.get_client()
    print("Client", client)
    joblib.parallel_backend(
        "dask", client=client, pure=False, wait_for_workers_timeout=60
    )
    results = joblib.Parallel(n_jobs=2)(
        joblib.delayed(square)(arg) for arg in range(10)
    )
    print(results)
    client.close()
    atexit.register(cluster.close)

I guess we just can't use atexit for this.
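The same kind of failure can be reproduced outside of Dask: if cleanup is deferred to atexit but the executor that the cleanup needs has already been shut down, the callback hits exactly this RuntimeError. (A standalone sketch with plain concurrent.futures, not the dask-kubernetes code path.)

import atexit
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()


def cleanup():
    # By the time atexit runs this, the executor has already been shut down,
    # so submitting new work raises:
    # RuntimeError: cannot schedule new futures after shutdown
    executor.submit(print, "closing")


atexit.register(cleanup)
executor.shutdown()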
I think that we could bypass the […]. However, to use it I would need to be able to use my connector in […].
It's a little risky because you can't guarantee that asyncio things will work correctly at that point. I would generally advise against what you are doing. However, another workaround could be to use the asyncio API here and use asyncio_atexit:

import asyncio
from dask_kubernetes.operator import KubeCluster
import time
import random
import joblib
import asyncio_atexit


def square(x):
    time.sleep(random.expovariate(1.5))
    return x**2


async def main():
    cluster = KubeCluster(
        name="my-dask-cluster",
        image="ghcr.io/dask/dask:2023.3.2-py3.11",
        env={"EXTRA_PIP_PACKAGES": "joblib"},
        asynchronous=True,
    )
    print("Cluster created")
    await cluster.scale(1)
    client = cluster.get_client()  # Not sure how this will behave off the top of my head
    print("Client", client)
    joblib.parallel_backend(
        "dask", client=client, pure=False, wait_for_workers_timeout=60
    )
    results = joblib.Parallel(n_jobs=2)(
        joblib.delayed(square)(arg) for arg in range(10)
    )
    print(results)
    asyncio_atexit.register(cluster.close)


if __name__ == "__main__":
    asyncio.run(main())
Describe the issue:
Hi,
I'm encountering the following error when trying to run Dask jobs on an external Kubernetes cluster:

RuntimeError: cannot schedule new futures after shutdown

However, when I use a local Kubernetes cluster like rancher-desktop, everything works as expected without any RuntimeError. I suspect that the issue might be related to the getaddrinfo call, which utilizes a ThreadPoolExecutor for asynchronous operation. It seems that the default ThreadPoolExecutor might already be closed by the time it is called, leading to the error.

Minimal Complete Verifiable Example:
This results in the RuntimeError described above.
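As a standalone illustration of the getaddrinfo hypothesis above (a sketch against plain asyncio, not the actual dask-kubernetes code path; the hostname is arbitrary): if the event loop's default executor has already been shut down, getaddrinfo fails with exactly this RuntimeError.

import asyncio
from concurrent.futures import ThreadPoolExecutor


async def resolve():
    loop = asyncio.get_running_loop()

    # Install a default executor and shut it down, to mimic the executor
    # already being gone by the time the lookup happens.
    executor = ThreadPoolExecutor()
    loop.set_default_executor(executor)
    executor.shutdown()

    # getaddrinfo offloads the blocking socket.getaddrinfo call to the
    # (now shut down) default executor, so this raises:
    # RuntimeError: cannot schedule new futures after shutdown
    await loop.getaddrinfo("example.com", 443)


asyncio.run(resolve())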
Anything else we need to know?:
Environment: