-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STORM-3321: Fix race in LocalCluster regarding Nimbus leadership, red… #2945
Conversation
…uce poll timers for Nimbus and supervisor to speed up tests and avoid timeouts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this and sorry to review this so lately. Left some minor comments but it looks good overall.
@@ -231,6 +233,10 @@ private LocalCluster(Builder builder) throws Exception { | |||
} else { | |||
this.clusterState = builder.clusterState; | |||
} | |||
if (!Time.isSimulating()) { | |||
//Ensure Nimbus assigns topologies as quickly as possible | |||
conf.put(DaemonConfig.NIMBUS_MONITOR_FREQ_SECS, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change also useful for other cases (non-test) as well? If we are unsure about it, could we setup global config for non time simulating test and add this to there instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still useful outside testing, since the topology will start faster, and topologies will usually be submitted right after bootup of the local Nimbus. Are we intending for people to use LocalCluster for cases other than testing though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "testing" I mean both automatic tests and manual tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK makes sense. Doesn't seem to be a big deal as I think about this again.
@@ -690,6 +700,10 @@ public synchronized Supervisor addSupervisor(Number ports, Map<String, Object> c | |||
} | |||
superConf.put(Config.STORM_LOCAL_DIR, tmpDir.getPath()); | |||
superConf.put(DaemonConfig.SUPERVISOR_SLOTS_PORTS, portNumbers); | |||
if (!Time.isSimulating()) { | |||
//Monitor for assignment changes as often as possible, so e.g. shutdown happens as fast as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
* @return true is leadership was acquired, false otherwise | ||
*/ | ||
@VisibleForTesting | ||
boolean awaitLeadership(long timeout, TimeUnit timeUnit) throws InterruptedException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we don't think ILeaderElector as user facing (public) API so don't mind about backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think this API is supposed to be used by anyone other than Nimbus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -131,8 +130,6 @@ public void nextTuple() { | |||
String id = UUID.randomUUID().toString(); | |||
_pending.put(id, ft); | |||
_collector.emit(ft.stream, ft.values, id); | |||
} else { | |||
Utils.sleep(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove this sleep, can this cause CPU hotspot if _serverTuples is always empty ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storm already sleeps between nextTuple calls if nothing is emitted based on the sleep strategy. The extra sleep here doesn't do much for us I think, and manually sleeping in nextTuple is a bad habit I'd like to discourage (it blocks other message processing for the spout, e.g. ack handling).
See
spoutWaitStrategy(reachedMaxSpoutPending, emptyStretch); |
…uce poll timers for Nimbus and supervisor to speed up tests and avoid timeouts
https://issues.apache.org/jira/browse/STORM-3321
I also updated an error message to print how long Slots take to shut down, previously it would print 1000ms no matter how long the shutdown took.
The timer interval reductions in LocalCluster are set to only happen if time simulation is disabled. There are some tests (e.g. nimbus_test.clj) that use time simulation and want those timers to be longer, so it seemed easier to just leave it alone for them.