Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commit to split timers to own process #4180

Merged
merged 18 commits into from
Aug 7, 2018
Merged

Initial commit to split timers to own process #4180

merged 18 commits into from
Aug 7, 2018

Conversation

lakshmi-kannan
Copy link
Contributor

@lakshmi-kannan lakshmi-kannan commented Jun 15, 2018

What?

This PR splits out the timer portion (one that injects trigger instances based on user specified cron expressions in rules) into its own process.

Why?

For the kubernetes (k8s) HA story, we are going to rely on a single timers engine container with failover handed natively by k8s. The scale requirements for timers aren't that rigorous and the timer doesn't do anything other than inject a trigger instance into rabbitmq. This makes the story for timers simple. This also allows us to scale rules engine horizontally without worrying about partitioning timers. Since rules engine don't modify state (other than add operational entries to DB), we can use k8s to set a scale number for rules engine and handle both failover and scaling with k8s primitives. So we make scaling rules engine story simpler too.

In the future, we can decide to split timers into different partitions and go for a more complex HA model if needed. So this change will enable future scaling optimizations for timers. We are also thinking about adding the scheduling logic into this for workflow orchestrators.

TODO

enable = True
# Timezone pertaining to the location where st2 is run.
local_timezone = America/Los_Angeles
local_tz = America/Los_Angeles
logging = st2reactor/conf/logging.timersengine.conf
Copy link
Member

@arm4b arm4b Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably logging = conf/logging.timersengine.conf fits this config

enable = True
# Timezone pertaining to the location where st2 is run.
local_timezone = America/Los_Angeles
local_tz = America/Los_Angeles
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local_tz vs local_timezone ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was auto-generated but I'll look into why the auto-gen code did this.

@arm4b arm4b added the K8s label Jun 18, 2018
@lakshmi-kannan lakshmi-kannan added this to the 2.9.0 milestone Jun 19, 2018
@lakshmi-kannan lakshmi-kannan changed the title WIP: Initial commit to split timers to own process DO NOT MERGE UNTIL 2.8 RELEASE: Initial commit to split timers to own process Jun 19, 2018
@arm4b
Copy link
Member

arm4b commented Jul 18, 2018

@lakshmi-kannan Since we're on 2.9 roadmap now, is it possible to complete this PR so we'll start integrating with the new st2timersegine?

An additional request to include a HA description for the service in
https://docs.stackstorm.com/reference/ha.html#components

@lakshmi-kannan lakshmi-kannan changed the title DO NOT MERGE UNTIL 2.8 RELEASE: Initial commit to split timers to own process Initial commit to split timers to own process Jul 19, 2018
Lakshmi Kannan added 2 commits July 19, 2018 15:24
* master: (235 commits)
  Use default scope of "all" for list command and "system" for get, set and delete commands.
  Update ALLOWED_SCOPES - all should not be there.
  Make http runner password parameter a secret.
  Update CHANGELOG.rst
  Use consistent formatting.
  Sync changelog with v2.8.1 release.
  Fix typo in description
  Remove unused variable.
  Replace get_terminal_size with get_terminal_size_columns.
  Number the various fallbacks.
  Add tests for get_terminal_size.
  Add a note.
  Update get_terminal_size method to check LINES and COLUMNS environment variables first.
  Rewording.
  Add changelog entry.
  Truncate extra whitespace.
  Make sure we cast it to int.
  200 -> 150..
  Also use a more reasonable default terminal size.
  Allow user to force terminal size used by the st2 CLI formattes.
  ...
* master:
  Add a test case for it.
  Simplify the logic, fix test which didn't pass in Content-Type header.
  Update .gitignore.
  Also blacklist webhooks API endpoint which can take multipart/form-data content type.
  Add a workaround for eventlet WSGI http server.
  Refactor orchestra conductor interface to support the state machine updates
LOG.info(TIMER_ENABLED_LOG_LINE)
return timer_thread.wait()
else:
LOG.info(TIMER_DISABLED_LOG_LINE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if timer engine is disabled service will just exit immediately on startup, right?

Just something to keep in mind / document for monitoring purposes (e.g. if timer engine service is disables, st2timerengine service won't be running and it will exit immediately on startup).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. +1 to documenting it in monitoring docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


try:
timer_thread = None
if cfg.CONF.timer.enable or cfg.CONF.timersengine.enable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a bit config is one is False and other is True, so perhaps we should be more explicit and throw in such scenario? Or?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With default configs, you'll always have this behavior. So we have to actually detect if those variables are defined in /etc/st2/st2.conf. I think that's not really required. We have documented in upgrade notes. And when people upgrade, the configuration change diff will be shown to them too. So there are multiple checks and alerts to the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing which is also config is that because we default one option to True, when user wants to disable the service they need to set both options to False.

I will document that (if not already).

@Kami
Copy link
Member

Kami commented Jul 23, 2018

Two small comments, besides that, LGTM.

lakshmi-kannan and others added 2 commits July 24, 2018 10:31
* master: (172 commits)
  Remove global cache_name environment variable definition.
  Ignore CryptographyDeprecationWarning deprecation warning which appears on our Ubuntu build server which runs old 2.7 release.
  Don't run tests under MongoDB 3.6 until we figure out why the tests are so much slower under 3.6.
  Add new examples.python_runner_print_python_environment action which will allow us to debug various Python runner action issues.
  Instead of failing the build, just warn if the job exceeds the thresold.
  Make sure mongodb user can write to the lib dir.
  Make sure we clean any old MongoDB 3.4 files laying around otherwise the service won't start.
  Only tail last 30 lines.
  Cat mongo log to see what is going on.
  Remove lines we don't need.
  Check service status.
  Use longer sleep.
  Use longer thresholds.
  Fix syntax error.
  Also print out mongod version.
  Add a new Travis build task which runs tests under MongoDB 3.6.
  MongoDB 3.6 supports 64 bit ints, update affected tests.
  Add changelog entry.
  Also upgrade pymongo.
  Upgrade to our forked version of mongoengine which is based on v0.15.3 and contains a fix for regression in memory usage introduced in v0.13.0.
  ...
@Kami
Copy link
Member

Kami commented Aug 7, 2018

I will take over that and try to get it finished and merged this week.

@Kami
Copy link
Member

Kami commented Aug 7, 2018

Pushed some fixes and test changes - 2422b8c, 04d0d98, 21dea06.

As mentioned above, I'm still not a fan of this duplicated enable option which makes for a confusing behavior, but it is what is is now.

@Kami
Copy link
Member

Kami commented Aug 7, 2018

There is a chicken and the egg problem with our e2e tests - they depend on st2-packages changes.

But for that changes to work, this PR needs to be merged first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants