STORM-1976 Remove cleanup-corrupt-topologies! (1.x) #1572

Merged
merged 1 commit into from Jul 19, 2016

Conversation

4 participants
@HeartSaVioR
Contributor

HeartSaVioR commented Jul 18, 2016

For master branch: #1571

  • cleanup-corrupt-topologies! dates from the pre-H/A Nimbus era
  • Keeping it under Nimbus H/A causes various issues that result in topologies going away

Please refer to STORM-1976 for more details.

STORM-1976 Remove cleanup-corrupt-topologies!
* cleanup-corrupt-topologies! was born from non-H/A Nimbus age
* Keeping this with Nimbus H/A makes various issues which in result topology going away
@harshach

Contributor

harshach commented Jul 18, 2016

+1. Build failures are not related to the patch.

@revans2

Contributor

revans2 commented Jul 18, 2016

+1, but I would like to see a tool that can kill a corrupted topology without nimbus being involved, or instructions somewhere on how to do it by logging directly into ZooKeeper.
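Until such a tool exists, the ZooKeeper route revans2 mentions might look roughly like the sketch below. This is an assumption-laden outline, not a documented procedure: the znode paths assume the default storm.zookeeper.root of /storm, the server address and topology id are hypothetical, and you should verify the paths on your own cluster (ls them first) before deleting anything.

```shell
# Connect with the CLI shipped with ZooKeeper
bin/zkCli.sh -server zk1.example.com:2181

# Find the corrupted topology's id
ls /storm/storms

# Remove its state so no Nimbus keeps tripping over it
# (rmr is deprecated in newer ZooKeeper releases; use `deleteall` there)
rmr /storm/storms/mytopo-1-1468800000
rmr /storm/assignments/mytopo-1-1468800000
rmr /storm/workerbeats/mytopo-1-1468800000
rmr /storm/errors/mytopo-1-1468800000
```

Workers already running for that topology would still need to be cleaned up on the supervisors; this only removes the ZooKeeper state.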

@HeartSaVioR


Contributor

HeartSaVioR commented Jul 18, 2016

@revans2
Yes, agreed. After removing this code we can't remove a corrupted topology in the normal way, since no Nimbus can gain leadership. It would be nice to have a tool that shows the replication count of topologies without the leader Nimbus involved, so users can identify a corrupted topology and kill it if they want to.

@harshach

Contributor

harshach commented Jul 18, 2016

@revans2 @HeartSaVioR why can't we enforce min.replication.count by looking at nimbus.seeds? If there is more than one host in that list, we should default min.replication.count to 2. Users can override the setting by specifying it in storm.yaml. My understanding is that they are not doing so, and we are relying on min.replication.count being 1.
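For reference, harshach's suggestion would look roughly like this in storm.yaml. The key names (nimbus.seeds, topology.min.replication.count) are the Storm 1.x config keys; the hostnames are placeholders, and deriving the default of 2 from the seed list is the proposed behavior, not what the code does today.

```yaml
# storm.yaml (hypothetical two-Nimbus H/A setup)
nimbus.seeds: ["nimbus1.example.com", "nimbus2.example.com"]

# Proposal: when nimbus.seeds lists more than one host, default this to 2
# instead of relying on the current default of 1. Users could still
# override it here explicitly.
topology.min.replication.count: 2
```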

@revans2

Contributor

revans2 commented Jul 18, 2016

@HeartSaVioR actually I have thought about it more, and I am probably overreacting. If you want to call these blockers, it is not that big of a deal. I would just like the community to have a follow-on discussion about how we really want HA to behave in these situations, now that we have had some experience with people running with HA.

@revans2

Contributor

revans2 commented Jul 18, 2016

@harshach I don't think that would fix the issue, although it is an improvement. Thinking about it more, I believe what happened was that they were not running with HA or expecting HA to work, but ran into a bad situation where they really would have wanted HA; then, when they tried to recover by bringing up a new Nimbus instance, all of the topologies were killed.

@harshach

Contributor

harshach commented Jul 18, 2016

@HeartSaVioR @revans2 are we OK with merging this in and filing a follow-up JIRA to discuss HA and user expectations?

@revans2

Contributor

revans2 commented Jul 19, 2016

Yes, I am good with merging this in.

@HeartSaVioR

Contributor

HeartSaVioR commented Jul 19, 2016

Yes, I think the discussion of Nimbus H/A and the BlobStore will take a while, so we need to apply the short-term fix first. There are lots of critical bug fixes in 1.0.2, so we shouldn't drag out the release while struggling with the long-term fix.

@asfgit asfgit merged commit d677f54 into apache:1.x-branch Jul 19, 2016

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
@HeartSaVioR

Contributor

HeartSaVioR commented Jul 20, 2016

Regarding the tool, I filed an issue: https://issues.apache.org/jira/browse/STORM-1985

@HeartSaVioR HeartSaVioR deleted the HeartSaVioR:STORM-1976-1.x branch Jun 30, 2017
