operator.KubeCluster support? #12982
It seems to error on: ... Can you double check if this URL is valid?
Hi @ahuang11! Thanks for your response. I did some tests and found out that this line is responsible for the error. My guess is the cluster passed to the client is never started.
I think the relevant part is the way the task runner creates the cluster, which is equivalent to `self.cluster_class(asynchronous=True, **self.cluster_kwargs)`, and in your case that means instantiating the operator `KubeCluster` directly. Can you try whether running this works:
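A sketch of what that amounts to, assuming the operator-based `KubeCluster` and placeholder cluster kwargs (not the exact snippet from the original comment):

```python
import asyncio

from dask_kubernetes.operator import KubeCluster


async def main():
    # Mirror what DaskTaskRunner does internally:
    #     self.cluster_class(asynchronous=True, **self.cluster_kwargs)
    async with KubeCluster(
        name="debug-cluster", namespace="default", asynchronous=True
    ) as cluster:
        print(cluster.status)
        print(cluster.scheduler_address)


asyncio.run(main())
```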
When I try this with

```python
async with self.cluster_class(asynchronous=True, **self.cluster_kwargs) as cluster:
    print(cluster.status)
    print(cluster.scheduler_address)
```

it outputs a `created` status, and the scheduler address doesn't look right either. Maybe it's unrelated, but I thought I'd mention it.
Yes, it seems like it doesn't get started; only gets "created"
Maybe you can call `dask_kubernetes.operator.kubecluster.kubecluster.KubeCluster` outside of Prefect, e.g. as sketched below.
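A minimal sketch of that standalone check (the cluster name and namespace are placeholders):

```python
from dask_kubernetes.operator import KubeCluster
from distributed import Client

# Stand up the cluster outside of Prefect to see whether it starts on its own.
cluster = KubeCluster(name="standalone-test", namespace="default")
client = Client(cluster)

print(cluster.status)
print(cluster.scheduler_address)

client.close()
cluster.close()
```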
If that worked for you, maybe we can add a warning about manually starting the cluster here, under an `if` check.
We can add a warning, or call the cluster's `_start` method directly. I made it work with the following updates after this line:

```python
from distributed.core import Status
import asyncio
...
self._connect_to = self._cluster = await exit_stack.enter_async_context(
    self.cluster_class(asynchronous=True, **self.cluster_kwargs)
)
# If used with the operator implementation of KubeCluster, the cluster is not automatically started
if self._cluster.status is not Status.running:
    await self._cluster._start()
if self.adapt_kwargs:
    # Depending on the cluster type (Cluster or SpecCluster), adapt returns a future or not
    adapt_response = self._cluster.adapt(**self.adapt_kwargs)
    if asyncio.isfuture(adapt_response):
        await adapt_response
...
```

Let me know if you want me to create a PR!
I think this could work. My only concern is that once this cluster is started, it won't be closed (unlike with the async exit context), so maybe we should add a warning that we started it and that it's up to the user to close it. I would appreciate a PR!
Actually I think they handle the cluster cleanup with this method. The only problem is that, when used with Prefect, we got the following error:

```
Traceback (most recent call last):
  File "/home/john/projects/venv/lib/python3.8/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 845, in reap_clusters
    loop = asyncio.get_event_loop()
  File "/home/john/.pyenv/versions/3.8.13/lib/python3.8/asyncio/events.py", line 639, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'MainThread'.
```

It seems this line can't get the current event loop when used with Prefect. If I replace it with a call that falls back to creating a new loop, it works (a sketch of that pattern is below). On a side note, their dask-kubernetes-operator pod already cleans up the clusters properly, so it's maybe not necessary.
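A minimal sketch of that fallback, assuming a helper that returns the running loop when there is one and otherwise creates a fresh one (an illustration, not the exact change from the draft PR):

```python
import asyncio


def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    """Return the running loop if there is one, otherwise create a fresh loop.

    Avoids the RuntimeError raised by asyncio.get_event_loop() when called
    from a thread that has no current event loop (e.g. during interpreter exit).
    """
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```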
Can you provide an example of how this is being called, along with the error?
Where in Prefect's code is this code being called? We run several event loops and there is often one present.
I think this is related to the `atexit` registration. I created a draft PR with the fix.
That's awesome, TIL about the atexit decorator! Thanks for making the PR too. Maybe we can wrap that logic you have and replace `self.cluster_class` with a custom contextmanager under our control that handles starting and closing, e.g. as sketched below.
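A rough sketch of that idea, assuming the start/close handling lives in a small async context manager (the helper name and structure here are illustrative, not Prefect's actual implementation):

```python
from contextlib import asynccontextmanager

from distributed.core import Status


@asynccontextmanager
async def managed_cluster(cluster_class, **cluster_kwargs):
    """Instantiate an async Dask cluster, start it if needed, and close it on exit."""
    cluster = cluster_class(asynchronous=True, **cluster_kwargs)
    try:
        # Some implementations (e.g. the operator KubeCluster) are only
        # "created" on instantiation and still need an explicit start.
        if cluster.status is not Status.running:
            await cluster._start()
        yield cluster
    finally:
        await cluster.close()
```

This could then be entered via `exit_stack.enter_async_context(managed_cluster(self.cluster_class, **self.cluster_kwargs))` in place of the bare `self.cluster_class(...)` call, so the cluster is both started and closed by the runner.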
Sure, I'll add it and add some tests.
Regarding the ...
Hi!

When I try to use the new `KubeCluster` implementation (from the `operator` module, not the one from the `classic` module) with the `DaskTaskRunner`, I get an error. It seems the `DaskTaskRunner` tries to close the cluster before starting it.

Is this new class supported by Prefect yet? If yes, do you have a working example?
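A minimal sketch of the kind of setup being described, assuming `prefect-dask`'s `DaskTaskRunner` pointed at the operator cluster class (the cluster name, namespace, and image are placeholders, not the reporter's exact snippet):

```python
from prefect import flow, task
from prefect_dask import DaskTaskRunner


@task
def double(x: int) -> int:
    return x * 2


@flow(
    task_runner=DaskTaskRunner(
        # Use the operator-based KubeCluster rather than the classic one.
        cluster_class="dask_kubernetes.operator.KubeCluster",
        cluster_kwargs={
            "name": "prefect-dask-test",
            "namespace": "default",
            "image": "ghcr.io/dask/dask:latest",
        },
    )
)
def my_flow():
    return double.map(list(range(4)))


if __name__ == "__main__":
    my_flow()
```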