STORM-3321: Fix race in LocalCluster regarding Nimbus leadership, reduce poll timers for Nimbus and supervisor to speed up tests and avoid timeouts #2945

Merged 1 commit into apache:master on Mar 7, 2019

Conversation

@srdo (Contributor) commented Jan 21, 2019

Reduce poll timers for Nimbus and supervisor to speed up tests and avoid timeouts.

https://issues.apache.org/jira/browse/STORM-3321

I also updated an error message to print how long Slots take to shut down; previously it would print 1000 ms no matter how long the shutdown actually took.
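
For illustration, a minimal sketch of the kind of change described, reporting the measured elapsed time instead of a fixed 1000 ms constant (the identifiers and message text here are assumptions, not the actual diff):

    long start = System.currentTimeMillis();
    // ... wait for the Slot to finish shutting down ...
    long elapsedMs = System.currentTimeMillis() - start;
    LOG.error("Slot took {} ms to shut down", elapsedMs);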

The timer interval reductions in LocalCluster only happen if time simulation is disabled. Some tests (e.g. nimbus_test.clj) use time simulation and want those timers to be longer, so it seemed easier to just leave them alone for those tests.
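
As a side note, a minimal sketch of how a test opts into time simulation (and therefore keeps the longer default timers), assuming the Time.SimulatedTime try-with-resources helper from org.apache.storm.utils.Time:

    try (Time.SimulatedTime simulated = new Time.SimulatedTime()) {
        try (LocalCluster cluster = new LocalCluster.Builder().build()) {
            // Time.isSimulating() is true here, so LocalCluster leaves the
            // Nimbus/supervisor poll intervals at their defaults and the test
            // advances the clock manually (e.g. via Time.advanceTimeSecs).
        }
    }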

@HeartSaVioR (Contributor) left a comment:

Thanks for looking into this, and sorry for reviewing it so late. I left some minor comments, but it looks good overall.

@@ -231,6 +233,10 @@ private LocalCluster(Builder builder) throws Exception {
        } else {
            this.clusterState = builder.clusterState;
        }
        if (!Time.isSimulating()) {
            //Ensure Nimbus assigns topologies as quickly as possible
            conf.put(DaemonConfig.NIMBUS_MONITOR_FREQ_SECS, 1);
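
The snippet above is cut off; presumably the guarded block closes right after the conf.put, roughly like this sketch (the closing brace and exact surroundings are assumptions):

    if (!Time.isSimulating()) {
        // Ensure Nimbus assigns topologies as quickly as possible
        conf.put(DaemonConfig.NIMBUS_MONITOR_FREQ_SECS, 1);
    }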
HeartSaVioR (Contributor) commented:
Is this change also useful for other (non-test) cases? If we're unsure about that, could we set up a global config for non-time-simulating tests and add this there instead?

srdo (Contributor, Author) replied:
I think this is still useful outside testing, since the topology will start faster, and topologies will usually be submitted right after the local Nimbus boots up. Are we intending for people to use LocalCluster for anything other than testing, though?

srdo (Contributor, Author) replied:
By "testing" I mean both automatic tests and manual tests

HeartSaVioR (Contributor) replied:
OK, makes sense. Thinking about it again, it doesn't seem like a big deal.

@@ -690,6 +700,10 @@ public synchronized Supervisor addSupervisor(Number ports, Map<String, Object> c
        }
        superConf.put(Config.STORM_LOCAL_DIR, tmpDir.getPath());
        superConf.put(DaemonConfig.SUPERVISOR_SLOTS_PORTS, portNumbers);
        if (!Time.isSimulating()) {
            //Monitor for assignment changes as often as possible, so e.g. shutdown happens as fast as possible.
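
This snippet is also truncated; presumably it continues by lowering the supervisor's poll interval, along these lines (the exact config key, DaemonConfig.SUPERVISOR_MONITOR_FREQUENCY_SECS, is an assumption, not taken from the diff):

    if (!Time.isSimulating()) {
        // Monitor for assignment changes as often as possible, so e.g. shutdown happens as fast as possible.
        superConf.put(DaemonConfig.SUPERVISOR_MONITOR_FREQUENCY_SECS, 1);
    }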
HeartSaVioR (Contributor) commented:
same here

     * @return true if leadership was acquired, false otherwise
     */
    @VisibleForTesting
    boolean awaitLeadership(long timeout, TimeUnit timeUnit) throws InterruptedException;
HeartSaVioR (Contributor) commented:
I assume we don't consider ILeaderElector a user-facing (public) API, so we don't need to worry about backward compatibility.

srdo (Contributor, Author) replied:
No, I don't think this API is supposed to be used by anyone other than Nimbus.
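
For readers following along, a hedged sketch of how the race fix can use this new method: wait for the local Nimbus to actually acquire leadership before topologies are submitted. The accessor and timeout below are illustrative only, not taken from the diff:

    // getLeaderElector() is a hypothetical accessor, shown for illustration only.
    ILeaderElector elector = nimbus.getLeaderElector();
    if (!elector.awaitLeadership(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("Local Nimbus did not become leader in time");
    }
    // Only now is it safe to submit topologies without racing leadership acquisition.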

@HeartSaVioR (Contributor) left a comment:
+1

@@ -131,8 +130,6 @@ public void nextTuple() {
            String id = UUID.randomUUID().toString();
            _pending.put(id, ft);
            _collector.emit(ft.stream, ft.values, id);
        } else {
            Utils.sleep(100);

HeartSaVioR (Contributor) commented:
Why remove this sleep? Can this cause a CPU hotspot if _serverTuples is always empty?

srdo (Contributor, Author) replied:
Storm already sleeps between nextTuple calls if nothing is emitted, based on the configured spout wait strategy. The extra sleep here doesn't do much for us, I think, and manually sleeping in nextTuple is a bad habit I'd like to discourage (it blocks other message processing for the spout, e.g. ack handling).

See: spoutWaitStrategy(reachedMaxSpoutPending, emptyStretch);
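
To make the recommendation concrete, a minimal sketch of the pattern: emit when data is available and simply return otherwise, letting the executor's configured wait strategy (e.g. topology.sleep.spout.wait.strategy.time.ms) handle backoff. The poll() helper and FeederTuple type are illustrative, not from this PR:

    @Override
    public void nextTuple() {
        FeederTuple ft = poll();          // hypothetical source of pending tuples
        if (ft != null) {
            String id = UUID.randomUUID().toString();
            _pending.put(id, ft);
            _collector.emit(ft.stream, ft.values, id);
        }
        // No Utils.sleep(100) on the empty path: when nothing is emitted, the
        // executor applies the spout wait strategy, so the spout thread stays
        // responsive to acks and fails instead of blocking here.
    }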

@asfgit merged commit da12d89 into apache:master on Mar 7, 2019