[STORM-3112] Incremental scheduling supports #2723

Closed

danny0405 wants to merge 4 commits into apache:master from danny0405:delta-s

Conversation

@danny0405 danny0405 commented Jun 18, 2018

What is this patch for

Incremental scheduling support.

As described in STORM-3093, a scheduling round currently performs a complete scan and computation over all topologies on the cluster, which becomes very heavy once the number of topologies grows into the hundreds.

This patch refactors the scheduling logic so that it only considers the topologies that actually need scheduling.

Improvements

  1. Cache the id -> StormBase mapping, which reduces the pressure on ZooKeeper.
  2. Only schedule the topologies that need it: those with dead executors or not enough running workers.
  3. Some schedulers, e.g. the IsolationScheduler, still need a full scheduling pass.
  4. Reuse the scheduling resources across scheduling rounds, i.e. nodeId -> used slots, nodeId -> used resources, nodeId -> total resources.

Since STORM-3093 already caches the storm-id -> executors mapping, a scheduling round now does the following:

  1. Scan all the active storm bases (cached) and the local storm-conf/storm-topology, then refresh the heartbeats cache; this tells us which topologies need scheduling (a small sketch of this check follows below).
  2. Compute scheduleAssignment only for the topologies that need scheduling.
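
For illustration only, here is a minimal sketch of the "needs scheduling" decision in step 1, with hypothetical names that do not match the actual Nimbus code:

```java
// Hypothetical sketch of the "needs scheduling" check; names are illustrative only.
import java.util.Map;
import java.util.Set;

class NeedsSchedulingCheck {
    /**
     * A topology needs scheduling when fewer workers are running than requested,
     * or when some of its executors have no live heartbeat (dead executors).
     */
    static boolean needsScheduling(Set<String> executors,
                                   Map<String, Long> lastHeartbeatMs,
                                   int runningWorkers,
                                   int requestedWorkers,
                                   long nowMs,
                                   long timeoutMs) {
        if (runningWorkers < requestedWorkers) {
            return true;
        }
        for (String executor : executors) {
            Long lastBeat = lastHeartbeatMs.get(executor);
            if (lastBeat == null || nowMs - lastBeat > timeoutMs) {
                return true; // missing or stale heartbeat => dead executor
            }
        }
        return false;
    }
}
```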

About Robustness When Nimbus Restarts

  1. The cached storm bases are taken care of by the ILocalAssignmentsBackend.
  2. The scheduling cache is refreshed on the first scheduling round after a restart by running a full scheduling over all topologies.

The New Scheduling Mode

[Diagram: the new scheduling workflow]

Test data

This is the time-cost data for scheduling one incremental topology, produced by LargeAmountsOfTopologiesSchedulingTest.java. I also added an ISchedulerTracer in case you want more fine-grained time-cost data during scheduling.

I need to emphasize that this data already benefits from removing the storm-base access time and the re-computation of the id -> executors mapping compared to the old scheduling mode, yet we can still see a remarkable improvement and very good performance for the new scheduling mode.

[Chart: storm_sche_cost — scheduling time cost comparison]

We can see that incremental scheduling is very lightweight and fast.

JIRA: STORM-3112.

@danny0405 (Author)

@revans2 @HeartSaVioR
Hi Bobby, HeartSaVioR, please help review this if you have time. Thanks in advance.


revans2 commented Jun 20, 2018

@danny0405 There is a lot of code here and I am trying to understand what exactly this is doing, because it is hard to get it from just the code. I can see the extra caching that you put in for the StormBase, which looks great. The JIRA says that we are only scheduling topologies that need to be scheduled, but I see that you had to change the isolation scheduler to disable that feature. What exactly are the differences that the scheduler sees?

I am also a little confused about how this, besides the caching, improves performance. All of the schedulers will loop through the list of topologies and only schedule the ones that need to be scheduled. It is a tiny check that takes a few ms at the most.


danny0405 commented Jun 21, 2018

[Diagram: old-schedule — the scheduling workflow before this patch]
@revans2
I drew a diagram to describe what the workflow was before this patch. I will add the new workflow for this patch soon.

Then the reasons why this patch improves performance:

  1. The storm-base fetching time is eliminated entirely because of the cache.
  2. We refresh the heartbeats cache first, then quickly diff it against the full id -> executors mapping to decide which topologies need to be scheduled; if none do, we skip the round quickly (as sketched below).
  3. Because of step 2, we only need to compute the required structures and resource mappings for very few topologies, which is very fast.
  4. Although we already cache the storm-topology and storm-conf for all topologies, the useless computation avoided in step 3 is still considerable, including the computation in Nimbus#mkAssignments and the repeated calculation in all kinds of schedulers.
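
As a rough illustration of step 2 (hypothetical names, not the actual implementation), the diff between the refreshed heartbeat view and the cached id -> executors mapping could look like this:

```java
// Illustrative sketch only: diff the set of alive executors against the cached
// id -> executors mapping to find the topologies that need scheduling.
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IncrementalRound {
    static Set<String> topologiesToSchedule(Map<String, Set<String>> idToExecutors,
                                            Set<String> aliveExecutors) {
        Set<String> needScheduling = new HashSet<>();
        for (Map.Entry<String, Set<String>> entry : idToExecutors.entrySet()) {
            if (!aliveExecutors.containsAll(entry.getValue())) {
                needScheduling.add(entry.getKey());
            }
        }
        return needScheduling; // an empty set means the whole round can be skipped
    }
}
```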

The reason why the IsolationScheduler needs a full scheduling:

  1. It needs the details of all topologies in order to know which ones are isolated, so it can assign slots based on that information.

@HeartSaVioR (Contributor)

@danny0405
Providing some numbers would help persuade others, if they are easy to measure. If it is not easy to get the numbers, I'm OK with skipping them and comparing the difference through the diagram/code.


danny0405 commented Jun 22, 2018

@HeartSaVioR @revans2
Hi, I have updated the design of the new scheduling mode, but I'm very sorry that I do not have a large cluster to test this patch on; I tried on my laptop, but it got stuck because too many processes were opened.

Could you run a benchmark test for this patch? I would really appreciate it.

@HeartSaVioR (Contributor)

@danny0405 Unfortunately I also can't run a large cluster. I could run Storm on 3~4 VM nodes, but I think the patch will not show a benefit on such a small cluster. I'll take a look at the design difference and comment soon.

@HeartSaVioR (Contributor)

@danny0405
I'm sorry, but I feel I need to focus on the current milestone, Storm 2.0.0, and for Storm 2.0.0 this is an improvement and a non-blocker.
There are some issues/PRs still open for getting Storm 2.0.0 out, so for me those are the first things to review. Smaller issues can be handled along with that.

@danny0405 (Author)

@HeartSaVioR
Thanks a lot. BTW, I can help fix any remaining bugs for the Storm 2.0.0 release.

@HeartSaVioR (Contributor)

@danny0405
The remaining tasks are mostly reviewing existing PRs. I'll sort them out and add them to the epic issue for Storm 2.0.0.


revans2 commented Jun 25, 2018

@danny0405 I have spent some time looking at your patch. I have not found any issues with the code itself, but I easily could have missed something. My biggest problem is that I just cannot get past the backwards incompatibility imposed by NeedsFullTopologiesScheduler. I also don't want to merge in a performance improvement without any actual numbers to back it up.

To get the performance numbers, you really only want to know how long it takes to schedule. You don't actually need to run a full cluster. The simplest way to make that happen is to fake out the heartbeats for the supervisors and the workers. You could do it as a stand-alone application, but it might be nice to have a bit more control over it so you can simulate workers that don't come up or workers that crash.
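
A rough sketch of what such a stand-alone driver could look like (the fake cluster state and the timed scheduling step are placeholders, not Storm's real test utilities):

```java
// Stand-alone timing sketch: build fake heartbeat state in memory and time only the
// scheduling step. The Supplier below is a placeholder for whatever computes assignments.
import java.util.function.Supplier;

public class SchedulingTimer {
    static <T> T timed(String label, Supplier<T> scheduleOnce) {
        long start = System.nanoTime();
        T result = scheduleOnce.get();
        System.out.println(label + " took " + (System.nanoTime() - start) / 1_000_000 + " ms");
        return result;
    }

    public static void main(String[] args) {
        // In a real harness, scheduleOnce would build a Cluster from faked
        // supervisor/worker heartbeats (including workers that never come up or
        // that crash) and run the scheduler over it.
        timed("one scheduling round", () -> "fake assignments");
    }
}
```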

Once you have that working I really would like to see a breakdown of how much time is being spent computing the different parts that go into creating the new Cluster.

As for NeedsFullTopologiesScheduler, I would either like to see this switched so that schedulers opt in to getting less information, or, even better, have us cache the fully computed inputs to Cluster and just update the cache incrementally instead of leaving things out.
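
A minimal sketch of the "update the cache incrementally" idea, with hypothetical names, just to make the suggestion concrete:

```java
// Hypothetical sketch: keep the fully computed per-topology inputs to Cluster and
// recompute entries only for topologies that changed since the last round.
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class TopologyInputCache<T> {
    private final Map<String, T> cache = new HashMap<>();

    /** Drop removed topologies, recompute changed or missing ones, keep the rest. */
    Map<String, T> refresh(Set<String> activeIds,
                           Set<String> changedIds,
                           Function<String, T> compute) {
        cache.keySet().retainAll(activeIds);
        for (String id : activeIds) {
            if (changedIds.contains(id) || !cache.containsKey(id)) {
                cache.put(id, compute.apply(id));
            }
        }
        return cache;
    }
}
```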


danny0405 commented Jun 30, 2018

@revans2
Hi Bobby, I have updated the test data for this patch.

As for NeedsFullTopologiesScheduler, I agree to remove it. We can either tweak the IsolationScheduler or cache the topologies data; either is okay.

But I am inclined to choose the first option, because we do not have a powerful, reliable cache for the master right now; everything we cache needs a recovery strategy, and too many caches would make things a little messy.

For the first option, I think more refactoring of the code will be needed, so I will file another JIRA to solve that problem.

For this patch, what we have is enough.

@danny0405 (Author)

@revans2 @HeartSaVioR
Could you help review this again? Thanks very much.


HeartSaVioR commented Jul 9, 2018

@danny0405
Could you address @revans2's comment, mostly backing up your proposed patch with numbers? You may want to use a well-known library like JMH to measure the numbers correctly.
If you're not familiar with JMH, this blog post may help: https://blog.codecentric.de/en/2017/10/performance-measurement-with-jmh-java-microbenchmark-harness/
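
A minimal JMH skeleton, assuming the faked cluster state is built in the @Setup method (the benchmark body here is only a placeholder for the scheduling step):

```java
// Minimal JMH sketch; the benchmarked body is a stand-in for the scheduling round.
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class SchedulingBench {
    private int[] fakeClusterState;

    @Setup
    public void setUp() {
        // Build the faked topologies/heartbeats once, outside the measured code.
        fakeClusterState = new int[100_000];
    }

    @Benchmark
    public long scheduleRound() {
        // Placeholder for the scheduling round being measured.
        long acc = 0;
        for (int v : fakeClusterState) {
            acc += v;
        }
        return acc;
    }
}
```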

I'd say showing the numbers is more persuasive than letting others dive into the code to find the value, especially since this patch is not addressing a Storm performance issue we already know about. STORM-2693 addressed a long-lived issue, so it needed less persuasion, but this doesn't look like that case.

@HeartSaVioR (Contributor)

I'd like to say this doesn't mean I don't plan to review it. Rather, other issues that are directly coupled with releases (mostly Storm 2.0.0) should be reviewed before this one, unless you show interesting numbers that make us want to include this in Storm 2.0.0.

@danny0405 (Author)

@revans2
Hi Bobby, do you have time to review this patch now? It has been quite a while and I am starting to forget the details of this patch. Thanks in advance.

@danny0405 (Author)

@srdo @revans2 @HeartSaVioR Can you please review this for me? It has been a long time since this patch was proposed; I would really appreciate it if you could review it.

@danny0405 danny0405 closed this Jun 28, 2020