[draft] Drop Python 3.7 and allow tests to work in isolation #565
base: main
Conversation
dask_jobqueue/core.py (outdated):

```
@@ -94,7 +96,22 @@
""".strip()

class Job(ProcessInterface, abc.ABC):
class FixedProcessInterface(ProcessInterface):
```
Currently it's not safe to create a ProcessInterface without a running loop. I can adapt ProcessInterface upstream to support this use case (see distributed.actor._LateLoopEvent), and I would also need to make a _LateLoopLock. However, it's better if jobs are not created until an event loop has started.
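As a sketch of that idea (a hypothetical illustration mirroring the pattern behind distributed.actor._LateLoopEvent, not the actual upstream implementation), a _LateLoopLock-style wrapper would defer creating its asyncio primitive until it is first awaited inside a running loop, so it can be constructed before any loop exists:

```python
import asyncio


class LateLoopLock:
    """Hypothetical sketch: defer creating the asyncio.Lock until we
    are first awaited inside a running event loop, so instances can be
    constructed safely before any loop exists (on Python <= 3.9,
    asyncio.Lock() itself calls events.get_event_loop())."""

    def __init__(self):
        self._lock = None  # created lazily, inside a running loop

    def _get_lock(self):
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock

    async def __aenter__(self):
        await self._get_lock().acquire()
        return self

    async def __aexit__(self, *exc_info):
        self._get_lock().release()


lock = LateLoopLock()  # safe even with no event loop running yet


async def main():
    async with lock:
        return True
```

Calling `asyncio.run(main())` then works regardless of whether a loop existed when the lock was constructed.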
The problem line is commented "trigger property to ensure that the job is valid":

dask-jobqueue/dask_jobqueue/core.py, lines 575 to 580 at 3c26731:

```
]
self._dummy_job  # trigger property to ensure that the job is valid
super().__init__(
    scheduler=scheduler,
```
https://github.com/graingert/dask-jobqueue/runs/7541364445?check_suite_focus=true#step:8:841

```
dask_jobqueue/core.py:560: in __init__
    self._dummy_job  # trigger property to ensure that the job is valid
dask_jobqueue/core.py:589: in _dummy_job
    return self.job_cls(
dask_jobqueue/local.py:40: in __init__
    super().__init__(
dask_jobqueue/core.py:168: in __init__
    super().__init__()
/usr/share/miniconda/envs/dask-jobqueue/lib/python3.8/site-packages/distributed/deploy/spec.py:55: in __init__
    self.lock = asyncio.Lock()
/usr/share/miniconda/envs/dask-jobqueue/lib/python3.8/asyncio/locks.py:164: in __init__
    self._loop = events.get_event_loop()
```
I'm using @functools.cached_property here just to demo the fix.
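The mechanics of that demo can be shown with a stripped-down, hypothetical stand-in (class names invented here, not dask-jobqueue's real classes): functools.cached_property delays the loop-touching work until first attribute access, and caches the result afterwards:

```python
import asyncio
import functools


class NeedsLoop:
    """Hypothetical stand-in for a ProcessInterface-like object that
    touches the event loop when instantiated."""

    def __init__(self):
        # Fails if no loop is running, like asyncio.Lock() on <= 3.9.
        self.loop = asyncio.get_running_loop()


class Cluster:
    @functools.cached_property
    def _dummy_job(self):
        # Not evaluated in __init__: only on first attribute access,
        # by which time an event loop can be running.
        return NeedsLoop()


cluster = Cluster()  # safe: _dummy_job has not been created yet


async def main():
    return cluster._dummy_job.loop is asyncio.get_running_loop()
```

Running `asyncio.run(main())` succeeds because the first access to `_dummy_job` happens inside the running loop.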
OK, so as far as I'm aware, we never get into loop questions in dask-jobqueue. I guess we only use the default loop from the MainThread when using it.
From what I understand, it is the clean function of distributed.utils_test that removes the default loop when we use the loop fixture. Is it normal that this function might wipe the MainThread loop?
See the last commit of my branch, d50cc2d (sorry, I did not cherry-pick some of your modifications; I still lack git knowledge on top of asyncio stuff).
Maybe this means that we are not doing things right, but could this be a temporary workaround until a bigger refactor?
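For reference (a stdlib-only illustration; the actual distributed test cleanup does more than this), the loop-wiping effect described above reduces to clearing the MainThread's default loop, after which anything that falls back to it fails:

```python
import asyncio

# Roughly the loop-related effect of the test cleanup described
# above: drop the default loop so nothing can silently reuse it.
asyncio.set_event_loop(None)

# Anything relying on the default loop now fails:
try:
    asyncio.get_event_loop()
    wiped = False
except RuntimeError:
    # "There is no current event loop in thread 'MainThread'"
    wiped = True
```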
Yes, the default event loop is deprecated, and it is already removed when asyncio.run is used. I added this to our cleanup to make sure we (distributed) are ready for the removal and support running with multiple calls to asyncio.run:

```python
import asyncio
import distributed
import dask_jobqueue

# creating a dask_jobqueue object works here

async def noop():
    pass

asyncio.run(noop())

# creating a dask_jobqueue object here raises:
# RuntimeError: There is no current event loop in thread 'MainThread'
```
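The asyncio side of this can be reproduced with the standard library alone, without distributed or dask-jobqueue installed:

```python
import asyncio


async def noop():
    pass


# asyncio.run creates a fresh loop, runs the coroutine, then closes
# the loop and calls set_event_loop(None), leaving no current loop.
asyncio.run(noop())

try:
    asyncio.get_event_loop()
    no_loop = False
except RuntimeError:
    # "There is no current event loop in thread 'MainThread'"
    no_loop = True
```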
Thanks for the clarification. If I understand correctly, at some point dask-jobqueue will have to be ready for the removal of the default event loop, is that right? When will it be removed?
In the meantime, do you think it's reasonable to use the workaround I proposed, and in parallel open an issue about this problem so we can fix it later on?
> If I understand correctly, at one point dask-jobqueue would have to be ready for the removal of the default event loop

The default event loop already gets removed whenever anyone uses asyncio.run, so this has been a latent bug since Python 3.6+. It's deprecated in 3.10 and scheduled for removal in 3.12, so that version of Python will behave as though someone has already called asyncio.run.
I have an idea for a workaround that I want to try tomorrow
My workaround didn't work, so I've made an issue upstream: dask/distributed#6858.
For the last remaining tests, it looks like there's a ThreadPoolExecutor running and not being shut down. I'm not sure whether this is a bug in distributed or in dask-jobqueue right now.
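The executor-leak symptom can be demonstrated with the stdlib alone (a hypothetical reproduction, not the actual distributed or dask-jobqueue code): a ThreadPoolExecutor keeps its worker threads alive after their tasks finish, until it is explicitly shut down:

```python
import concurrent.futures
import threading

before = {t.name for t in threading.enumerate()}

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
executor.submit(lambda: None).result()  # spawns one worker thread

# The idle worker thread lingers after the task finishes...
leaked = [t.name for t in threading.enumerate() if t.name not in before]

# ...until the executor is shut down explicitly, which is the step
# missing when an executor is "running and not being shut down".
executor.shutdown(wait=True)
```

A test harness can detect such leaks by diffing `threading.enumerate()` before and after each test, which is essentially what distributed's cleanup checks do.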
Thanks @graingert for all the contributions here. I think I'll need some time to digest all this. I'm a bit confused by the modification for the running loop. Outside of the test environment, the current code of dask-jobqueue seems to work well with recent versions of distributed, and I didn't encounter these errors in my tests; how would you explain that?
force-pushed from cc5992b to 3799185