Skip to content
This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

cluster: don't report failed on controller restore #629

Closed
wants to merge 1 commit into from

Conversation

hongchaodeng
Copy link
Member

@hongchaodeng hongchaodeng commented Jan 7, 2017

If needDeleteCluster is false, it's controller restore case.
On controller restore, we didn’t delete or fail the cluster, and shouldn't report failed.

@@ -211,11 +211,10 @@ func (c *Cluster) run(stopC <-chan struct{}, wg *sync.WaitGroup) {
if needDeleteCluster {
c.logger.Infof("deleting cluster")
c.delete()
go c.reportFailedStatus()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not seem right either. we should still report failure if reconcile fails, not just when we delete the cluster.

also for cluster delete case, the tpr might already be deleted. there is no point to update it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if reconcile fails => needDeleteCluster = true

the tpr might already be deleted => See reportFailedStatus() impl. It will return if TPR deleted

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not delete the cluster when reconcile fails. we should keep the trp so that user can see why it fails. user should delete the tpr themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not delete the cluster when reconcile fails.

Reconcile only fail when it should. Create an new issue and let's discuss elsewhere.
This PR is trying to fix controller restore case. We shouldn't report failed.

we should keep the trp so that user can see why it fails.

Operator didn't delete it.

user should delete the tpr themselves.

Same as above

Copy link
Collaborator

@xiang90 xiang90 Jan 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconcile only fail when it should. Create an new issue and let's discuss elsewhere.

This pr is incorrect. Now the code logic is that we report TPR status as failure when the cluster TPR is deleted. This makes no sense to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pr doesn't change previous logic. If reconcile fail, we already set status failed.

Actually this PR is fixing it. If reconcile exits and needDeleteCluster = true, it must be error case and must be reported failed. But if needDeleteCluster != true, we shouldn't report failed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right logic is to set cluster TPR status to fail when cluster fails. It should be straightforward and obvious.

That's what's happening now and what this PR is going to fix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to fix this in another pr. needDeleteCluster is incorrect and confusing at the first place. That is why we put c.reportFailedStatus() outside before without even noticing it is wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally understand why the current behavior is wrong. but the root problem is the logical error.

moving reportFailedStatus into a bad designed condition will not make things better from code logic perspective.

@hongchaodeng
Copy link
Member Author

Closing in favor of #631

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants