fix: tone down run inactive cutoff default #392

Merged: 1 commit, Oct 23, 2023

saikonen
Collaborator

Changes the default Run inactive cutoff time from 1 day to 6 minutes. After the latest query changes, this value is used to determine whether a run heartbeat has expired.
This gives runs whose tasks are stuck in a scheduler a 6-minute grace period for any of them to launch, after which the run-level heartbeat expires and the run is marked as failed. If a task starts after this, the run flips back to running status; otherwise it remains failed (correctly).
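
To illustrate the behavior described above, here is a minimal sketch of a cutoff-based status check. It is not the service's actual query logic: the function `run_status`, the constant `RUN_INACTIVE_CUTOFF_SECONDS`, and the status strings are hypothetical names chosen for this example, and timestamps are assumed to be epoch seconds.

```python
# Hypothetical constant name; 6 minutes is the new default described in this PR.
RUN_INACTIVE_CUTOFF_SECONDS = 6 * 60


def run_status(
    last_heartbeat_ts: int,
    now_ts: int,
    finished: bool = False,
    cutoff_seconds: int = RUN_INACTIVE_CUTOFF_SECONDS,
) -> str:
    """Derive a run status from the most recent run-level heartbeat.

    Illustrative only: the real service derives this in SQL.
    """
    if finished:
        return "completed"
    # The heartbeat counts as expired once it is older than the cutoff.
    if now_ts - last_heartbeat_ts > cutoff_seconds:
        return "failed"
    return "running"


start = 1_700_000_000
print(run_status(start, start + 120))        # within the grace period: running
print(run_status(start, start + 361))        # heartbeat expired: failed
print(run_status(start + 400, start + 410))  # a task launched late: running again
```

Because the status is recomputed from the latest heartbeat each time, a run that was shown as failed returns to running as soon as a late task updates the heartbeat.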

Having to wait a whole day for runs to be marked as failed by default is such a large departure from how the UI used to display failed runs that this change is proposed as a possible middle ground, instead of reverting the query changes from #338.

@saikonen
Collaborator Author

Another option that has been considered is to make the heavy query logic from before #338 conditional on an env var. This would complicate the code significantly, but would allow both solutions to coexist and be configurable based on deployment needs.
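
As a rough sketch of what such an env var gate could look like (this option was not implemented in this PR): the variable name `USE_LEGACY_RUN_QUERY`, the helper `env_flag`, and the returned labels are all hypothetical and serve only to show the shape of the switch.

```python
import os
from typing import Mapping


def env_flag(name: str, default: bool = False,
             env: Mapping[str, str] = os.environ) -> bool:
    """Parse a boolean feature flag from an environment mapping."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def choose_run_query(env: Mapping[str, str] = os.environ) -> str:
    """Pick which run status query variant to use (labels are illustrative)."""
    # USE_LEGACY_RUN_QUERY is an assumed name, not an existing setting.
    if env_flag("USE_LEGACY_RUN_QUERY", env=env):
        return "legacy"
    return "heartbeat_cutoff"


print(choose_run_query({}))
print(choose_run_query({"USE_LEGACY_RUN_QUERY": "true"}))
```

The cost mentioned above would come from maintaining both query implementations behind this switch, not from the flag itself.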

@saikonen saikonen merged commit 65d41dd into master Oct 23, 2023
6 checks passed
@saikonen saikonen deleted the fix/tone-down-run-timeout branch October 23, 2023 12:37
saikonen added a commit that referenced this pull request Oct 25, 2023
* Upgrade Github actions used in `dockerimage` action (#379)

* upgrade github actions used in dockerimage action

* remove setup-buildx-action and pin to hashes.

* change deprecated pkg_resources to importlib.metadata (#387)

* In a previous commit, the detection of a failure became too aggressive. (#386)

* In a previous commit, the detection of a failure became too aggressive.

This remediates this by considering a run 'failed' if the hb hasn't been
updated within heartbeat_cutoff time as opposed to the heartbeat_threshold time

* change run finished at query to heartbeat_cutoff from threshold

* clean up unused values from run query

---------

Co-authored-by: Sakari Ikonen <sakari.a.ikonen@gmail.com>

* fix PATH_PREFIX handling in metadata service so it doesn't interfere with mfgui routes (#388)

* Configurable SSL Connection (#373)

* [TRIS-297] Configurable SSL Connection (#1)

* Configurable SSL connection

* Update services/utils/__init__.py

* no ssl unit testing (#3)

* ssl seperate test (#4)

* dsn generator sslmode none (#5)

* fix run_goose.py not working without SSL mode env variables. (#390)

* change run inactive cutoff default to 6 minutes. cleanup unused constant (#392)

* clarify comment on read replica hosts

* make USE_SEPARATE_READER_POOL a boolean

* remove unnecessary conditionals for pool choice in execute_sql

---------

Co-authored-by: Tom Furmston <tfurmston@googlemail.com>
Co-authored-by: Romain <romain-intel@users.noreply.github.com>
Co-authored-by: Oleg Avdeev <oleg.v.avdeev@gmail.com>
Co-authored-by: RikishK <69884402+RikishK@users.noreply.github.com>
saikonen added a commit that referenced this pull request Oct 30, 2023
…nection pools. (#344)

* Changes for using a separate reader pool for Aurora-like use cases

* Avoid some expensive logging operations when not needed

* Refactoring execute_sql implementations and separating reader/writer endpoints

choosing the right pool in execute_sql

* Adding documentation for using separate reader pools

* use [PREFIX]_READ_REPLICA_HOST as a feature gate instead of localhost

* In a previous commit, the detection of a failure became too aggressive.

This remediates this by considering a run 'failed' if the hb hasn't been
updated within heartbeat_cutoff time as opposed to the heartbeat_threshold time

* Patch pjoshi aurora (#395)

* Upgrade Github actions used in `dockerimage` action (#379)

* upgrade github actions used in dockerimage action

* remove setup-buildx-action and pin to hashes.

* change deprecated pkg_resources to importlib.metadata (#387)

* In a previous commit, the detection of a failure became too aggressive. (#386)

* In a previous commit, the detection of a failure became too aggressive.

This remediates this by considering a run 'failed' if the hb hasn't been
updated within heartbeat_cutoff time as opposed to the heartbeat_threshold time

* change run finished at query to heartbeat_cutoff from threshold

* clean up unused values from run query

---------

Co-authored-by: Sakari Ikonen <sakari.a.ikonen@gmail.com>

* fix PATH_PREFIX handling in metadata service so it doesn't interfere with mfgui routes (#388)

* Configurable SSL Connection (#373)

* [TRIS-297] Configurable SSL Connection (#1)

* Configurable SSL connection

* Update services/utils/__init__.py

* no ssl unit testing (#3)

* ssl seperate test (#4)

* dsn generator sslmode none (#5)

* fix run_goose.py not working without SSL mode env variables. (#390)

* change run inactive cutoff default to 6 minutes. cleanup unused constant (#392)

* clarify comment on read replica hosts

* make USE_SEPARATE_READER_POOL a boolean

* remove unnecessary conditionals for pool choice in execute_sql

---------

Co-authored-by: Tom Furmston <tfurmston@googlemail.com>
Co-authored-by: Romain <romain-intel@users.noreply.github.com>
Co-authored-by: Oleg Avdeev <oleg.v.avdeev@gmail.com>
Co-authored-by: RikishK <69884402+RikishK@users.noreply.github.com>

* fix broken connection string after conflict resolve

* make codestyles happy

* fix test cases

* cleanup

* merge run_goose.py from master

* revert unnecessary changes

---------

Co-authored-by: Preetam Joshi <preetamj@netflix.com>
Co-authored-by: Romain Cledat <rcledat@netflix.com>
Co-authored-by: Chaoying Wang <chaoyingw@netflix.com>
Co-authored-by: Sakari Ikonen <64256562+saikonen@users.noreply.github.com>
Co-authored-by: Tom Furmston <tfurmston@googlemail.com>
Co-authored-by: Romain <romain-intel@users.noreply.github.com>
Co-authored-by: Oleg Avdeev <oleg.v.avdeev@gmail.com>
Co-authored-by: RikishK <69884402+RikishK@users.noreply.github.com>
Co-authored-by: Sakari Ikonen <sakari.a.ikonen@gmail.com>