Initial commit to split timers to own process #4180
conf/st2.conf.sample
enable = True
# Timezone pertaining to the location where st2 is run.
local_timezone = America/Los_Angeles
local_tz = America/Los_Angeles
logging = st2reactor/conf/logging.timersengine.conf
Probably logging = conf/logging.timersengine.conf fits this config
conf/st2.conf.sample
enable = True
# Timezone pertaining to the location where st2 is run.
local_timezone = America/Los_Angeles
local_tz = America/Los_Angeles
local_tz vs local_timezone?
It was auto-generated but I'll look into why the auto-gen code did this.
@lakshmi-kannan Since we're on An additional request to include a HA description for the service in |
* master: (235 commits) Use default scope of "all" for list command and "system" for get, set and delete commands. Update ALLOWED_SCOPES - all should not be there. Make http runner password parameter a secret. Update CHANGELOG.rst Use consistent formatting. Sync changelog with v2.8.1 release. Fix typo in description Remove unused variable. Replace get_terminal_size with get_terminal_size_columns. Number the various fallbacks. Add tests for get_terminal_size. Add a note. Update get_terminal_size method to check LINES and COLUMNS environment variables first. Rewording. Add changelog entry. Truncate extra whitespace. Make sure we cast it to int. 200 -> 150.. Also use a more reasonable default terminal size. Allow user to force terminal size used by the st2 CLI formattes. ...
* master: Add a test case for it. Simplify the logic, fix test which didn't pass in Content-Type header. Update .gitignore. Also blacklist webhooks API endpoint which can take multipart/form-data content type. Add a workaround for eventlet WSGI http server. Refactor orchestra conductor interface to support the state machine updates
    LOG.info(TIMER_ENABLED_LOG_LINE)
    return timer_thread.wait()
else:
    LOG.info(TIMER_DISABLED_LOG_LINE)
It looks like if the timer engine is disabled, the service will just exit immediately on startup, right?
Just something to keep in mind / document for monitoring purposes (e.g. if the timer engine service is disabled, the st2timerengine service won't be running and it will exit immediately on startup).
Correct. +1 to documenting it in monitoring docs.
Documented: StackStorm/st2docs@30e714a
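For readers following the monitoring point above, here is a minimal sketch of the startup flow being discussed. It is not the code from this PR: `run_timers_engine`, `start_timer`, the logger name, the log messages, and the exit code of 0 in the disabled case are all assumptions made for illustration.

```python
import logging

LOG = logging.getLogger("timersengine_sketch")  # hypothetical logger name


def run_timers_engine(enabled, start_timer):
    """Sketch of the service entry point discussed in the review.

    `enabled` stands in for the resolved config flag and `start_timer`
    for whatever spawns the timer thread; both are hypothetical names.
    """
    if not enabled:
        # Disabled: nothing to wait on, so the process returns right away.
        # Returning 0 here is an assumption, not confirmed by the PR.
        LOG.info("Timer is disabled.")
        return 0

    timer_thread = start_timer()
    LOG.info("Timer is enabled.")
    # Enabled: block until the timer thread finishes.
    return timer_thread.wait()
```

Under these assumptions, a process supervisor or monitoring check would see the service as "not running" whenever the timer is disabled, which is why the reviewers asked for it to be documented.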
try:
    timer_thread = None
    if cfg.CONF.timer.enable or cfg.CONF.timersengine.enable:
It would be a bit confusing if one is False and the other is True, so perhaps we should be more explicit and throw in such a scenario? Or?
With default configs, you'll always have this behavior. So we would have to actually detect whether those variables are defined in /etc/st2/st2.conf. I think that's not really required. We have documented this in the upgrade notes. And when people upgrade, the configuration change diff will be shown to them too. So there are multiple checks and alerts for the user.
One thing which is also confusing is that because we default one option to True, when a user wants to disable the service they need to set both options to False.
I will document that (if not already).
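To make the point about the two options concrete, the sketch below enumerates every combination of the two flags under the `or` condition quoted in the diff. `timers_enabled` is a hypothetical helper, not a function from the codebase, and its two plain boolean arguments stand in for `cfg.CONF.timer.enable` and `cfg.CONF.timersengine.enable`.

```python
from itertools import product


def timers_enabled(timer_enable: bool, timersengine_enable: bool) -> bool:
    """Hypothetical helper mirroring the `or` check from the diff."""
    return timer_enable or timersengine_enable


# Print the full truth table: the timer runs unless BOTH flags are False.
for legacy, new in product([True, False], repeat=2):
    print(
        f"timer.enable={legacy!s:<5} "
        f"timersengine.enable={new!s:<5} "
        f"-> runs={timers_enabled(legacy, new)}"
    )
```

Only the combination with both flags set to False disables the timer, which matches the reviewer's note that users must set both options to False.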
Two small comments, besides that, LGTM.
* master: (172 commits) Remove global cache_name environment variable definition. Ignore CryptographyDeprecationWarning deprecation warning which appears on our Ubuntu build server which runs old 2.7 release. Don't run tests under MongoDB 3.6 until we figure out why the tests are so much slower under 3.6. Add new examples.python_runner_print_python_environment action which will allow us to debug various Python runner action issues. Instead of failing the build, just warn if the job exceeds the thresold. Make sure mongodb user can write to the lib dir. Make sure we clean any old MongoDB 3.4 files laying around otherwise the service won't start. Only tail last 30 lines. Cat mongo log to see what is going on. Remove lines we don't need. Check service status. Use longer sleep. Use longer thresholds. Fix syntax error. Also print out mongod version. Add a new Travis build task which runs tests under MongoDB 3.6. MongoDB 3.6 supports 64 bit ints, update affected tests. Add changelog entry. Also upgrade pymongo. Upgrade to our forked version of mongoengine which is based on v0.15.3 and contains a fix for regression in memory usage introduced in v0.13.0. ...
I will take over that and try to get it finished and merged this week.
timersengine config section.
There is a chicken-and-egg problem with our e2e tests: they depend on st2-packages changes, but for those changes to work, this PR needs to be merged first.
What?
This PR splits out the timer portion (the one that injects trigger instances based on user-specified cron expressions in rules) into its own process.
Why?
For the Kubernetes (k8s) HA story, we are going to rely on a single timers engine container with failover handled natively by k8s. The scale requirements for timers aren't that rigorous, and the timer doesn't do anything other than inject a trigger instance into RabbitMQ. This keeps the story for timers simple. It also allows us to scale the rules engine horizontally without worrying about partitioning timers. Since rules engines don't modify state (other than adding operational entries to the DB), we can use k8s to set a scale number for the rules engine and handle both failover and scaling with k8s primitives. So the scaling story for the rules engine becomes simpler too.
In the future, we can decide to split timers into different partitions and go for a more complex HA model if needed. So this change will enable future scaling optimizations for timers. We are also thinking about adding the scheduling logic into this for workflow orchestrators.
TODO