STORM-3120: Clean up leftover null checks in Time, ensure idle thread… by srdo · Pull Request #2734 · apache/storm

srdo · 2018-06-24T11:26:21Z

…s get to run when cluster time is advanced

https://issues.apache.org/jira/browse/STORM-3120

Some of the Time code didn't make sense, e.g. checking for null on the final THREAD_SLEEP_TIMES_NANOS map. It was left over from an earlier refactor of Time. Cleaned it up, and deleted the deprecated Time methods. Also made sure that when simulated time is advanced, idle threads are removed from the THREAD_SLEEP_TIMES_NANOS immediately. When the LocalCluster waits for the topology to be idle, it checks whether all timer threads are in the THREAD_SLEEP_TIMES_NANOS map. It looks to me like there was no guarantee that sleeping threads actually got a chance to run when time was advanced, because they remained in the THREAD_SLEEP_TIMES_NANOS map until they happened to be scheduled. With bad luck, a thread might end up being counted as idle when it just hadn't exited the Time.sleepUntilNanos loop yet.

HeartSaVioR · 2018-06-24T20:29:47Z

storm-server/src/main/java/org/apache/storm/LocalCluster.java

            this.tmpDirs.add(nimbusTmp);
            stormHomeBackup = System.getProperty(ConfigUtils.STORM_HOME);
            TmpPath stormHome = new TmpPath();
+            if (!stormHome.getFile().mkdirs()) {


I might be going too strict, but this looks like follow-up of STORM-3116 (not relevant to Time) which would be better to have individual PR. The change is minor so no need to file a new issue. Just would like to avoid things getting mixed up in a commit.

Will continue reviewing other change in PR.

Yes, it is. Do you want me to put this into a STORM-3116 PR, or what do you mean?

I hope to see this being separated PR, tagging as STORM-3116 or just tagging as MINOR. Either is fine.

Moved to #2736

…s get to run when cluster time is advanced

HeartSaVioR · 2018-06-24T23:52:58Z

storm-client/src/jvm/org/apache/storm/utils/Time.java

    public static void sleep(long ms) throws InterruptedException {
        if (ms > 0) {
-            sleepUntil(currentTimeMillis() + ms);
+            if (SIMULATING.get()) {


Is there any reason to not use sleepUntil? If you think sleepUntil is broken, the method must be fixed since the method is exposed to public. If not, why not using sleepUntil?

sleepUntil works fine. It's just a little silly in the case where we aren't simulating, because this method would calculate currentTimeMillis() + ms, and sleepUntil would then do the reverse to calculate how many ms to pass to Thread.sleep.

HeartSaVioR · 2018-06-24T23:54:11Z

storm-client/src/jvm/org/apache/storm/utils/Time.java

+    public static void parkNanos(long nanos) throws InterruptedException {
        if (nanos > 0) {
-            sleepUntilNanos(nanoTime() + nanos);
+            if (SIMULATING.get()) {


Same here, and we may note that we're renaming public method, though Time class is more likely internal class though.

In this case, I'd like to use LockSupport.parkNanos if we're not simulating, rather than sleeping for the closest number of milliseconds as the method did before.

Whoops, forgot to respond to the renaming bit: Yes, the method is renamed. If we want to port this to 1.x, we can easily deprecate sleepNanos and provide both methods.

HeartSaVioR · 2018-06-24T23:59:30Z

Looks good to me overall except two comments above. (I forgot to comment...)

revans2

+1 just a single nit.

revans2 · 2018-06-25T15:28:55Z

storm-client/src/jvm/org/apache/storm/utils/Time.java

+                }
+            }
+            LOG.debug("Advanced simulated time to {}", newTime);
+        }


nit: I don't think this code change is needed, as you fixed the bug with checking for THREAD_SLEEP_TIMES_NANOS == null, so the finally should work. If it does not work, then java itself has some bugs in it.

That's not why this change is added. I want to make sure that when a test does LocalCluster.advanceClusterTime, all sleeping threads that are supposed to wake up actually get a chance to run before waitForIdle is checked again. The issue is that if we don't remove the thread from the map here, there isn't (as far as I can tell) a guarantee that the sleeping threads actually woke up.

For example, if a test has a loop

while (some condition) some code LocalCluster.advanceClusterTime

the first call to advanceClusterTime will wait until all threads are sleeping. If thread 1 is told to sleep for 1ms, it will enter the THREAD_SLEEP_TIMES_NANOS map. On the next iteration, LocalCluster.advanceClusterTime will increment the time, which should cause thread 1 to wake up. If the testing thread is "too fast", it might recheck waitForIdle and catch thread 1 before it manages to wake up and remove itself from THREAD_SLEEP_TIMES_NANOS. This causes LocalCluster.advanceClusterTime to return before the cluster is actually idle, because thread 1 will wake up and start running again after advanceClusterTime returns.

The effect is you can't be sure that all threads are actually idle once LocalCluster.advanceClusterTime returns.

…-3120 STORM-3120: Clean up leftover null checks in Time, ensure idle thread This closes #2734

srdo mentioned this pull request Jun 24, 2018

STORM-3119: Build Storm with Java 10 #2733

Merged

HeartSaVioR reviewed Jun 24, 2018

View reviewed changes

STORM-3120: Clean up leftover null checks in Time, ensure idle thread…

e5ca0c9

…s get to run when cluster time is advanced

srdo force-pushed the STORM-3120 branch from 4700949 to e5ca0c9 Compare June 24, 2018 21:02

HeartSaVioR reviewed Jun 24, 2018

View reviewed changes

revans2 approved these changes Jun 25, 2018

View reviewed changes

asfgit merged commit e5ca0c9 into apache:master Jun 27, 2018

asfgit pushed a commit that referenced this pull request Jun 27, 2018

Merge branch 'STORM-3120' of https://github.com/srdo/storm into STORM…

d880767

…-3120 STORM-3120: Clean up leftover null checks in Time, ensure idle thread This closes #2734

Conversation

srdo commented Jun 24, 2018

Uh oh!

HeartSaVioR Jun 24, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HeartSaVioR commented Jun 24, 2018

Uh oh!

revans2 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

HeartSaVioR Jun 24, 2018 •

edited

Loading