-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix starting runs #459
Fix starting runs #459
Conversation
did half of the work of `check_for_executable_runs`. And as a side effect, run stages were increased too many times, broke the race condition disabler code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely spotted.
Only things to close this PR is to update the changelog.
@@ -263,7 +252,7 @@ def get_created_runs_and_queue(db, project_id): | |||
UPDATE runs SET | |||
state = 'queued' | |||
WHERE | |||
state IN ('created', 'queued') AND project = %s | |||
state = 'created' AND project = %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -20,7 +20,7 @@ pytest==5.1.3 | |||
pytest-xdist==1.29.0 | |||
PyYAML==5.1 | |||
redis==3.2.1 | |||
requests==2.21.0 | |||
requests==2.22.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add this dependency update in the changelog.
It all started with this strange failure in a clkhash test:
It turns out, that the entity service is naughty.
The problem is around the detection of the race condition to start a run twice.
The idea is that a run's state is set to
created
after creation, and only once all data provider have uploaded the necessary clks, the run state gets promoted toqueued
and, consequently, the run execution gets started.However, some naughty code in
views/run/list.py:post
did some premature clk checking and run state changing. After all that, it would still call thecheck_for_executable_runs
task, which will then do exactly the same again.Now, as we previously "fixed" the
get_created_runs_and_queue
by allowing to return runs that are eithercreated
ORqueued
, together with the fact thatcheck_for_executable_runs
can be called twice almost in parallel, led to double queuing of the run and double stage progressing.To stop all this nonsense, I propose to not include run logic in the views of ngnix. If we always call the same task as entrypoint (
check_for_executable_runs
), it will, most likely, keep as saner for longer. Hopefully.