STORM-2190: reduce contention between submission and scheduling #1764
Conversation
@@ -90,7 +90,9 @@ public static void main(String[] args) throws Exception {
    if (args != null && args.length > 0) {
      conf.setNumWorkers(3);

      StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
      for (String name: args) {
See comment on #1765. Is this change an artifact of manual testing?
One question about the for loop in WordCountTopology. I'm +1 once that's answered/addressed.
      (into {})
      (.assignSlots inimbus topologies)))
    (log-message "not a leader, skipping assignments")))
(locking (:sched-lock nimbus)
Couldn't a wrong ordering of events happen here? We take the lock while calculating a scheduling, release it, and then take the lock again to upload the new scheduling.
For example:
T0: submit
T1: rebalance
T2: rebalance - calculate new scheduling
T3: submit - calculate new scheduling
T4: rebalance - upload new scheduling to ZK
T5: submit - upload new scheduling to ZK
Even though we should end up with the scheduling calculated by the rebalance, we end up with the scheduling calculated from the original submit.
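The interleaving above can be replayed deterministically. This is a minimal sketch, not Nimbus's actual code: `compute` and `upload` are hypothetical stand-ins for the two separately-locked phases, and an `AtomicReference` stands in for ZooKeeper. Because the two critical sections are distinct lock acquisitions, a stale schedule can be uploaded after a newer one:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.atomic.AtomicReference;

public class ScheduleRace {
    static final ReentrantLock schedLock = new ReentrantLock();
    static final AtomicReference<String> zk = new AtomicReference<>("none");

    // Phase 1: compute a schedule under the lock, then release it.
    static String compute(String actor) {
        schedLock.lock();
        try { return "schedule-from-" + actor; }
        finally { schedLock.unlock(); }
    }

    // Phase 2: re-acquire the lock and upload. Nothing ties the two
    // phases together, so uploads can happen in the wrong order.
    static void upload(String schedule) {
        schedLock.lock();
        try { zk.set(schedule); }
        finally { schedLock.unlock(); }
    }

    public static void main(String[] args) {
        String fromRebalance = compute("rebalance"); // T2
        String fromSubmit    = compute("submit");    // T3
        upload(fromRebalance);                       // T4
        upload(fromSubmit);                          // T5: stale write wins
        System.out.println(zk.get()); // prints schedule-from-submit
    }
}
```

The rebalance's schedule is silently overwritten even though it was computed later, which is exactly the hazard described above.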
@jerrypeng You are correct that this could happen. I don't think it is very likely in practice, but I'll think about it and see if we can fix it.
Maybe it's also time to start thinking about decentralized scheduling mechanisms, since certain scheduling strategies may take a while to compute a schedule. That would require a major overhaul of nimbus, though.
I agree, and now that nimbus is in Java we can look at doing some refactoring along those lines. If you feel we need to do it now and that this is a blocker, I can spend some time looking into how to do that better.
Just so we don't miss the comment from @jerrypeng
Yes, that is correct. We should do something here, and @jerrypeng suggested that perhaps, as part of a refactor of Nimbus, we should look at supporting long-running scheduling. In the short term I think I will make scheduling and writing to ZK atomic; long term I will file a JIRA to look at better scheduling.
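The short-term fix mentioned here can be sketched as a single critical section. This is a hypothetical illustration, not the actual Nimbus change: `computeAndUpload` is a made-up stand-in for the mk-assignments path, and an `AtomicReference` stands in for ZooKeeper. Holding the lock across both the computation and the write means no stale schedule can be uploaded in between:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.atomic.AtomicReference;

public class AtomicScheduling {
    private final ReentrantLock schedLock = new ReentrantLock();
    private final AtomicReference<String> zk = new AtomicReference<>("none");

    // Compute the schedule AND write it out under one lock acquisition,
    // so compute/upload pairs from different triggers cannot interleave.
    public void computeAndUpload(String trigger) {
        schedLock.lock();
        try {
            String schedule = "schedule-from-" + trigger; // compute
            zk.set(schedule);                             // write to "ZK"
        } finally {
            schedLock.unlock();
        }
    }

    public String current() { return zk.get(); }
}
```

The trade-off is that the lock is now held for the full duration of scheduling, which is why long-running scheduling strategies would still need the larger refactor discussed above.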
I made a few changes to things. I fixed the race condition and addressed the review comments, but I also put in some optimizations to StormSubmitter. We were literally calling getClusterInfo 3+ times for each topology submission, and because the ultimate goal of STORM-2190 is to make it more scalable, this helps a lot. There is still some lock contention, but it is much better than it was before. If things look good here I will backport the changes to my other pull request.
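The getClusterInfo optimization can be sketched as fetch-once-and-reuse per submission. This is a minimal illustration under assumptions: `SubmissionContext` and the `Supplier` stand-in for the Nimbus RPC are hypothetical names, not Storm's actual API.

```java
import java.util.function.Supplier;

public class SubmissionContext {
    private final Supplier<String> fetchClusterInfo; // stand-in for the getClusterInfo RPC
    private String cached;                           // filled on first use

    public SubmissionContext(Supplier<String> fetchClusterInfo) {
        this.fetchClusterInfo = fetchClusterInfo;
    }

    // Every check during submission (name collision, capacity, etc.)
    // goes through here, so the RPC fires once instead of 3+ times.
    public synchronized String clusterInfo() {
        if (cached == null) {
            cached = fetchClusterInfo.get();
        }
        return cached;
    }
}
```

Caching is safe here because the checks within a single submission only need a consistent snapshot, not the very latest cluster state.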
+1
+1. Please apply the optimization to #1765 as well.
LGTM +1 @revans2 thanks for making the optimizations
@HeartSaVioR I plan on doing that.
+1 again.