Revert "HDDS-5740. Enable ratis by default for SCM." #3362

Merged
merged 1 commit into from May 3, 2022


adoroszlai
Contributor

@adoroszlai adoroszlai commented Apr 28, 2022

What changes were proposed in this pull request?

Revert #2637. The PR was merged despite my concern about changing the default setting without providing a seamless upgrade path.

It turns out that enabling Ratis by default is indeed causing problems during upgrade. SCM fails to start with:

Configuration ozone.scm.ratis.enable cannot be used until SCM upgrade has been finalized

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5740

How was this patch tested?

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/2237408109

@adoroszlai adoroszlai self-assigned this Apr 28, 2022
@adoroszlai adoroszlai added the bug, upgrade, and compatibility labels Apr 28, 2022
Contributor

@mukul1987 mukul1987 left a comment

@adoroszlai I agree that SCM HA should not be enabled by default for upgrades, but for other profiles I feel we can still enable SCM HA?

Also, can you please include the problem faced during upgrade in the description?

@adoroszlai
Contributor Author

@mukul1987 SCM HA is definitely recommended for new deployments. Please correct me if I'm wrong, but my understanding is that simply enabling Ratis for single-node SCM is not SCM HA, as that requires multiple nodes and explicit configuration.

@errose28
Contributor

When upgrading from an old version to a version supporting SCM HA, none of the SCM HA configs are allowed to be turned on, to prevent writing incompatible data before finalization. What happened with the original change was that when a non-Ratis SCM was upgraded to a version supporting SCM HA, the Ratis enabled flag was automatically set to true. The upgrade framework stopped this before any changes could be made, hence all pre-SCM HA clusters being upgraded will fail immediately on startup with the message: Configuration ozone.scm.ratis.enable cannot be used until SCM upgrade has been finalized.
For the user to proceed, the config must be manually set to false and the cluster restarted, which is a poor experience IMO. The upgrade acceptance tests correctly flagged this as incompatible, but they were modified in the original PR to mask the issue.
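
To make the failure mode concrete, here is a minimal sketch of how a changed default can trip a pre-finalization guard. It is an illustration only, not Ozone's actual upgrade code: the class `ScmStartupCheck`, its methods, and the `finalized` flag are hypothetical names. Only the config key `ozone.scm.ratis.enable` and the error message are taken from this discussion.

```java
import java.util.HashMap;
import java.util.Map;

public class ScmStartupCheck {
    static final String RATIS_ENABLE_KEY = "ozone.scm.ratis.enable";

    // Resolve the effective flag: an explicit user setting wins,
    // otherwise the default shipped with the software applies.
    static boolean ratisEnabled(Map<String, String> userConf, boolean shippedDefault) {
        String value = userConf.get(RATIS_ENABLE_KEY);
        return value == null ? shippedDefault : Boolean.parseBoolean(value);
    }

    // Hypothetical guard: refuse to start if Ratis is enabled
    // while the upgrade has not been finalized yet.
    static void validate(Map<String, String> userConf, boolean shippedDefault, boolean finalized) {
        if (!finalized && ratisEnabled(userConf, shippedDefault)) {
            throw new IllegalStateException("Configuration " + RATIS_ENABLE_KEY
                + " cannot be used until SCM upgrade has been finalized");
        }
    }

    public static void main(String[] args) {
        // A pre-SCM HA cluster whose operator never set the key.
        Map<String, String> conf = new HashMap<>();

        // Old default (false): the unfinalized SCM starts normally.
        validate(conf, false, false);

        // New default (true): the same untouched config now fails at startup.
        try {
            validate(conf, true, false);
        } catch (IllegalStateException e) {
            System.out.println("startup failed: " + e.getMessage());
        }

        // Manual workaround: set the key to false explicitly, then restart.
        conf.put(RATIS_ENABLE_KEY, "false");
        validate(conf, true, false);
        System.out.println("started with explicit false");
    }
}
```

The point of the sketch is that the operator changed nothing: only the shipped default moved, yet the guard treats the resolved value the same as an explicit opt-in.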

I am +1 for reverting the change for now to get the incompatibility out of master. However, I see @mukul1987's point that for new clusters, at least 1 node Ratis for SCM would be the preferred default. I think we can come up with a way to do this in a follow-up PR.

@adoroszlai
Contributor Author

> I am +1 for reverting the change for now to get the incompatibility out of master. However, I see @mukul1987's point that for new clusters, at least 1 node Ratis for SCM would be the preferred default. I think we can come up with a way to do this in a follow-up PR.

I agree: if enabling SCM Ratis for a single node has benefits, we can file an improvement Jira for that. But it is totally out of scope for this revert, as the original change is clearly breaking upgrades.

@adoroszlai adoroszlai dismissed mukul1987’s stale review May 3, 2022 16:46

information provided, change request out of scope

@adoroszlai adoroszlai merged commit e70bfd0 into apache:master May 3, 2022
@adoroszlai adoroszlai deleted the revert-HDDS-5740 branch May 3, 2022 16:47
@adoroszlai
Contributor Author

Thanks @errose28, @mukul1987 for the review.
