🐛♻️ Bugfix/handle ever running tasks - refactor director-v2 workflow scheduler #2798
Merged
sanderegg
merged 82 commits into
ITISFoundation:master
from
sanderegg:bugfix/handle_ever_running_tasks
Mar 1, 2022
Codecov Report
@@ Coverage Diff @@
## master #2798 +/- ##
========================================
+ Coverage 77.2% 79.1% +1.8%
========================================
Files 666 672 +6
Lines 27327 27468 +141
Branches 3162 3205 +43
========================================
+ Hits 21112 21733 +621
+ Misses 5494 4988 -506
- Partials 721 747 +26
sanderegg force-pushed the bugfix/handle_ever_running_tasks branch 2 times, most recently from d0ab9a2 to 7ff8031, on February 17, 2022 14:52
sanderegg force-pushed the bugfix/handle_ever_running_tasks branch 2 times, most recently from e9ccfd4 to e84e4fb, on February 22, 2022 20:30
This was referenced Feb 23, 2022
sanderegg force-pushed the bugfix/handle_ever_running_tasks branch from 6bdae8f to 2825dbb on February 23, 2022 22:05
sanderegg changed the title from "WIP: Bugfix/handle ever running tasks" to "WIP: Bugfix/handle ever running tasks - refactor director-v2 workflow scheduler" on Feb 24, 2022
sanderegg commented on Feb 24, 2022
please fix my own errors
services/dask-sidecar/src/simcore_service_dask_sidecar/dask_utils.py (outdated, resolved)
services/dask-sidecar/src/simcore_service_dask_sidecar/dask_utils.py (outdated, resolved)
services/director-v2/src/simcore_service_director_v2/modules/comp_scheduler/base_scheduler.py (outdated, resolved)
services/director-v2/src/simcore_service_director_v2/modules/dask_client.py (resolved)
services/director-v2/src/simcore_service_director_v2/modules/dask_client.py (outdated, resolved)
services/director-v2/src/simcore_service_director_v2/modules/dask_client.py (outdated, resolved)
services/director-v2/src/simcore_service_director_v2/modules/dask_client.py (outdated, resolved)
9 tasks
sanderegg force-pushed the bugfix/handle_ever_running_tasks branch from 0eab8b9 to 08a4dab on February 24, 2022 20:24
sanderegg changed the title from "WIP: Bugfix/handle ever running tasks - refactor director-v2 workflow scheduler" to "Bugfix/handle ever running tasks - refactor director-v2 workflow scheduler" on Feb 24, 2022
sanderegg changed the title from "Bugfix/handle ever running tasks - refactor director-v2 workflow scheduler" to "🐛♻️ Bugfix/handle ever running tasks - refactor director-v2 workflow scheduler" on Feb 24, 2022
sanderegg requested review from pcrespov, GitHK, KZzizzle, mguidon, odeimaiz, mrnicegyu11 and Surfict on February 24, 2022 21:50
sanderegg force-pushed the bugfix/handle_ever_running_tasks branch from 8562527 to 985fd62 on February 25, 2022 15:03
refactor separate utils
added blosc/lz4 to dask-distributed such that director-v2 also has the required libraries
remove unused method remove unused user_id
sanderegg force-pushed the bugfix/handle_ever_running_tasks branch from 5a3c1b2 to ef9a5f2 on February 28, 2022 16:10
What do these changes do?
Context
The original issue where a computational pipeline would be forever locked in Running state (as stated in #2786) stems from the following issues:
Also, while analyzing these problems, I found that the dask backend has an issue with how we retrieve the logs/progress/state of tasks through the dask pub/sub mechanism (reference).
Details
(TaskCancelledError) when cancelled. NOTE: returning asyncio.CancelledError is not supported and actually breaks the internals of the dask-worker application.
Journey of a task
a. checking the state of PENDING/STARTED tasks in the selected dask backend, retrieving their results/issues, and updating the comp_tasks table
b. finding which tasks are candidates to run next by analyzing the DAG (directed acyclic graph)
c1. starting the relevant tasks,
c2. or stopping the pipeline if the user pressed the stop button
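The steps a, b, c1 and c2 can be sketched roughly as follows. This is only an illustrative sketch, not the actual director-v2 scheduler code: the names TaskState, find_next_tasks and schedule_iteration are hypothetical, the DAG is assumed to be a plain dict mapping each task to its predecessors, and the dask backend and the comp_tasks table are replaced by in-memory dicts.

```python
from enum import Enum
from typing import Dict, List, Set


class TaskState(str, Enum):
    # Hypothetical subset of states, for illustration only.
    NOT_STARTED = "NOT_STARTED"
    PENDING = "PENDING"
    STARTED = "STARTED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    ABORTED = "ABORTED"


ACTIVE = (TaskState.PENDING, TaskState.STARTED)


def find_next_tasks(
    dag: Dict[str, List[str]], states: Dict[str, TaskState]
) -> Set[str]:
    """Step b: a task may run once all of its predecessors succeeded."""
    return {
        task
        for task, predecessors in dag.items()
        if states[task] is TaskState.NOT_STARTED
        and all(states[p] is TaskState.SUCCESS for p in predecessors)
    }


def schedule_iteration(
    dag: Dict[str, List[str]],
    states: Dict[str, TaskState],
    backend_states: Dict[str, TaskState],
    stop_requested: bool,
) -> Set[str]:
    """Run one scheduling pass; `states` stands in for the comp_tasks table."""
    # Step a: refresh PENDING/STARTED tasks from what the backend reports.
    for task, state in list(states.items()):
        if state in ACTIVE and task in backend_states:
            states[task] = backend_states[task]

    # Step c2: the user pressed stop, so abort everything not yet finished.
    if stop_requested:
        for task, state in list(states.items()):
            if state is TaskState.NOT_STARTED or state in ACTIVE:
                states[task] = TaskState.ABORTED
        return set()

    # Steps b and c1: pick the runnable tasks and mark them as submitted.
    next_tasks = find_next_tasks(dag, states)
    for task in next_tasks:
        states[task] = TaskState.PENDING
    return next_tasks
```

In this sketch the function would be called periodically (or whenever a task changes state), so a task that never reports back stays visible to step a on every pass instead of leaving the pipeline locked.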
Notes on Persistency
Related issue/s
fixes #2786
How to test
IMPORTANT: test in production mode, because the dask backend appears to be more resilient this way
# set-up prod build
make build up-prod
Once running tasks: