
STORM-1516 Fixed issue in writing pids with distributed cluster mode. #1084

Merged — 1 commit, Feb 18, 2016

Conversation

@satishd (Member) commented Feb 5, 2016

Whenever a topology is submitted, it creates workers on one or more supervisors. These worker pids are stored as files in ${storm-localdir}/workers/{worker-id}/pids/ on the supervisor. But there was an issue in storing worker pids, so the supervisor could not find the respective worker pids when a topology was killed. Workers for subsequent topology deployments then failed because the earlier workers were still alive and bound to their ports.

Fixed worker.clj to apply the right checks while writing pids to their respective locations.

The change in worker.clj, shown as a diff:

```diff
 ;; because in local mode, its not a separate
 ;; process. supervisor will register it in this case
-(when (= :distributed (ConfigUtils/clusterMode conf))
+;; if (ConfigUtils/isLocalMode conf) returns false then it is in distributed mode.
+(when-not (ConfigUtils/isLocalMode conf)
```
Contributor

How is this different from checking state == "distributed"?


@HeartSaVioR (Contributor)

Easy explanation:

```clojure
=> (= :dist "dist")
false
```
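The point above is that `(ConfigUtils/clusterMode conf)` apparently returns a string, while the old check compared it against the Clojure keyword `:distributed`, so the equality could never hold and the pid-writing branch was dead code. A rough analogy in plain Java — a hedged sketch with hypothetical names, not Storm's actual API — is comparing an enum constant to a `String`:

```java
// Hypothetical sketch: comparing values of two different types is never
// equal, just as (= :distributed "distributed") is always false in Clojure.
public class ModeCheck {
    enum ClusterMode { DISTRIBUTED, LOCAL }

    public static void main(String[] args) {
        String mode = "DISTRIBUTED";                    // what the config call returns
        ClusterMode expected = ClusterMode.DISTRIBUTED; // what the check compares against

        // An enum constant never equals a String, so this condition is always false:
        System.out.println(expected.equals(mode));      // prints false

        // Comparing like with like behaves as intended:
        System.out.println(ClusterMode.valueOf(mode) == expected); // prints true
    }
}
```

The fix sidesteps the type mismatch entirely by calling a boolean predicate (`isLocalMode`) instead of comparing a keyword against a string.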

Contributor

Thanks @HeartSaVioR. Got it now.

@arunmahadevan (Contributor)

@satishd Is it happening with all topologies?

@satishd (Member, Author) commented Feb 5, 2016

@arunmahadevan Right, this issue happens with all topologies. There is nothing specific about the topology mentioned in the JIRA.

@arunmahadevan (Contributor)

+1

@revans2 (Contributor) commented Feb 8, 2016

The code looks fine to me, but for some reason nimbus-test is failing fairly consistently. Also, storm-hdfs is very unstable right now.

@revans2 (Contributor) commented Feb 8, 2016

I just did some poking around and it looks like these failures are likely unrelated.

@harshach (Contributor)

@satishd Can you upmerge this, and also open another PR for the 1.x branch if we want it there as well? Thanks.

@satishd (Member, Author) commented Feb 17, 2016

@harshach Upmerged and resolved conflicts. We do not need this on 1.x.

@harshach (Contributor)

+1

@asfgit merged commit 8749523 into apache:master on Feb 18, 2016
@ndtreviv

Which version was this fixed in? I'm seeing the same thing in 1.0.1

@HeartSaVioR (Contributor) commented Aug 24, 2016

@ndtreviv This patch is only for 2.0.0. You might be hitting STORM-1934, which is fixed in 1.0.2. Lots of things have been fixed in 1.0.2, so you're encouraged to give it a try.

@ndtreviv

@HeartSaVioR I'm not sure that's the one; I'm pretty sure I'm seeing this issue. supervisor.log says it can't find the worker file in workers-users, and as a result the worker processes don't get shut down. It's not a race condition either: I've killed the topology and waited for everything to settle before re-deploying, and done this three times over, but the worker processes still belong to the very first topology that was run.

@HeartSaVioR (Contributor) commented Aug 24, 2016

@ndtreviv
This bug was in code ported only to master (2.0.0), so it is not related to 1.x.

What you're describing sounds like STORM-1879, which occurs after the supervisor hits STORM-1934 and deletes the workers (worker root) directory instead of just one worker's directory.

@ndtreviv

@HeartSaVioR Perfect. Thanks

8 participants