Skip to content

SAMZA-2595: Updated MonitorService to use separate ScheduleExecutor for each monitor#1434

Merged
prateekm merged 8 commits intoapache:masterfrom
shekhars-li:MonitorUpdate
Oct 13, 2020
Merged

SAMZA-2595: Updated MonitorService to use separate ScheduleExecutor for each monitor#1434
prateekm merged 8 commits intoapache:masterfrom
shekhars-li:MonitorUpdate

Conversation

@shekhars-li
Copy link
Contributor

@shekhars-li shekhars-li commented Oct 12, 2020

Symptoms:

  • On deploying a new monitor that takes variable time, we observed that there is an immediate drop in the metrics emitted by other samza-admin monitors.

Cause:

  • We schedule all the monitors in samza-admin in a thread pool of size 1 with a fixed rate scheduling strategy.
  • Our new Monitor depends on a script to get data from the hosts. The script takes variable time to get the data and can sometimes takes a long time to return if the files to be scanned for change is too high.
  • Other monitors are waiting for the new Monitor to complete execution since the thread pool size is 1.

Fix:

  • We create a new ScheduledExecutorService for every monitor with a thread pool of size 1.
  • Every monitor now runs in it's own thread and do not block other monitors from scheduling/starting.
  • If a monitor takes too long to execute (time to return > scheduledTime), new work is not scheduled until previous execution is complete. This prevents queueing up of work.
  • Updated default scheduling jitter to 100, so that every thread/monitor has some jitter in the first time it is scheduled.

Test:

  • Updated relevant unit tests.

API Changes:

  • Monitors are now scheduled with their own ScheduledExecutionService and are effectively running in their own thread rather than a shared thread. This may potentially impact Monitors that access or modify shared resources like disk, external store etc. This change is backward incompatible with previous API that guaranteed the monitors are scheduled sequentially and allows monitors to be scheduled concurrently. New (or potentially old) monitors have to ensure that they get access to the shared resource in a thread safe manner.
  • Jitter default value has been changed from 0 to 100. This is another backward incompatible change since previously, any monitor not defining/overriding jitter would not have had any jitter in scheduling interval. However, with this config update, every monitor will have a jitter ms added to scheduling interval.

@shekhars-li shekhars-li changed the title Updated MonitorService to use separate ScheduleExecutor for each monitor Samza-2595: Updated MonitorService to use separate ScheduleExecutor for each monitor Oct 12, 2020
@shekhars-li shekhars-li changed the title Samza-2595: Updated MonitorService to use separate ScheduleExecutor for each monitor SAMZA-2595: Updated MonitorService to use separate ScheduleExecutor for each monitor Oct 12, 2020
@shekhars-li shekhars-li marked this pull request as ready for review October 12, 2020 21:21
* Creates a ScheduledThreadPoolExecutor with core pool size 1
* @return ScheduledExecutorService
*/
private ScheduledExecutorService createScheduler() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious.

Initially we had a single threaded scheduler to ensure when two monitors in samza-admin are trying to update/modify some shared state(either on disk or in some external store), then their executions are serialized. How are we planning to achieve that goal when we allow multiple monitors to be running at the same time in their own threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've already run into performance limits of what single threaded execution for all monitors can do with some monitors within LinkedIn, so while the current implementation avoided premature optimization, it's not sustainable. A better way to achieve the exclusivity goal will be to write the monitors with such side-effects in a "thread-safe" way. @shanthoosh are you aware of any existing monitors could be impacted by this change? If so, we can update them.

Shekhar, let's also call this out as another potentially backwards incompatible change in the "API changes" section.

Copy link
Contributor

@shanthoosh shanthoosh Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you aware of any existing monitors could be impacted by this change?

Let's please double-check if the parallel execution of monitors(StateStoreGCMonitor, StateStoreDeltaMonitor) internally could potentially cause any correctness issues.

Other than that, updating the API spec of Monitor abstraction(& calling out backwards compatibility) should be sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, thanks for the heads up!

Copy link
Contributor

@prateekm prateekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@prateekm prateekm merged commit 13b5303 into apache:master Oct 13, 2020
lakshmi-manasa-g pushed a commit to lakshmi-manasa-g/samza that referenced this pull request Feb 9, 2021
…or each monitor (apache#1434)

Co-authored-by: Shekhar Sharma <shesharm@shesharm-mn2.linkedin.biz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants