
HDDS-15138. SCM safemode pipeline rules should honor default EC replication config. #10157

Open
aryangupta1998 wants to merge 1 commit into apache:master from aryangupta1998:ec_safemode

@aryangupta1998
Contributor

What changes were proposed in this pull request?

Today, SCM safemode pipeline exit checks are effectively hardcoded to RATIS/THREE, which is incorrect when the cluster's default replication is EC. In EC-default deployments, safemode should validate pipelines for the configured default replication instead of only RATIS/THREE.

This patch generalizes safemode pipeline validation to use ReplicationConfig.getDefault(conf):

HealthyPipelineSafeModeRule now evaluates pipelines that match the configured default replication config and uses the required node count from that config.
OneReplicaPipelineSafeModeRule now tracks and validates reports for pipelines that match the configured default replication config.
To keep behavior consistent, BackgroundPipelineCreator is updated to also create EC pipelines when the default replication type is EC, while preserving the existing RATIS behavior when EC is not configured.
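A minimal sketch of the two rule changes above, under a simplified model. `ReplicationSpec`, `PipelineInfo`, `countHealthy`, `oneReplicaRuleSatisfied` and the `threshold` parameter are hypothetical names, not Ozone's classes or signatures. The sketch only illustrates filtering on the default config (what `ReplicationConfig.getDefault(conf)` supplies in the patch) instead of a hardcoded RATIS/THREE, with the required node count taken from that config.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; not Ozone's real safemode rule classes.
class DefaultReplicationSafeModeSketch {

  enum ReplicationType { RATIS, EC }

  // Stand-in for a replication config: type, a label, and the number of
  // nodes a pipeline of this config needs (3 for RATIS/THREE, data + parity for EC).
  record ReplicationSpec(ReplicationType type, String name, int requiredNodes) { }

  // Stand-in for a pipeline as seen by SCM while in safemode.
  record PipelineInfo(String id, ReplicationSpec spec, int healthyNodes) { }

  // Healthy-pipeline check: only pipelines matching the default config count,
  // and only when they have at least the node count that config requires.
  static long countHealthy(List<PipelineInfo> pipelines, ReplicationSpec defaultSpec) {
    return pipelines.stream()
        .filter(p -> p.spec().equals(defaultSpec))
        .filter(p -> p.healthyNodes() >= defaultSpec.requiredNodes())
        .count();
  }

  // One-replica check: among pipelines matching the default config, the
  // fraction reported by at least one datanode must reach `threshold`
  // (a hypothetical stand-in for a configurable percentage).
  static boolean oneReplicaRuleSatisfied(List<PipelineInfo> pipelines,
      Set<String> reportedIds, ReplicationSpec defaultSpec, double threshold) {
    List<PipelineInfo> tracked = pipelines.stream()
        .filter(p -> p.spec().equals(defaultSpec))
        .toList();
    if (tracked.isEmpty()) {
      return true; // sketch's choice: nothing to track, so nothing to wait for
    }
    long reported = tracked.stream()
        .filter(p -> reportedIds.contains(p.id()))
        .count();
    return (double) reported / tracked.size() >= threshold;
  }
}
```

With an RS-3-2 default in this model, an EC pipeline that has only four of its five nodes does not count as healthy, and Ratis pipelines are ignored by both checks.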

Expected outcome:
RATIS default: behavior remains unchanged.
EC default: safemode pipeline checks validate EC pipelines, and the background pipeline creator can create EC pipelines as well.
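A hypothetical sketch of the background creator's selection step under these expectations. With a RATIS default, only the existing Ratis config is returned. With an EC default, the configured EC scheme is added. `BackgroundCreationSketch`, `configsToCreate`, the string encoding and the single `RATIS/THREE` entry are illustrative assumptions; the real creator's Ratis handling is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Ozone's BackgroundPipelineCreator API.
class BackgroundCreationSketch {

  // Illustrative placeholder for the Ratis configs the creator already handles.
  static final List<String> RATIS_CONFIGS = List.of("RATIS/THREE");

  // Returns the replication configs to create pipelines for, given the
  // values of ozone.replication.type and ozone.replication.
  static List<String> configsToCreate(String defaultType, String defaultReplication) {
    List<String> configs = new ArrayList<>(RATIS_CONFIGS); // Ratis behavior kept as-is
    if ("EC".equals(defaultType)) {
      configs.add("EC/" + defaultReplication); // add the EC default scheme
    }
    return configs;
  }
}
```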
Validation
Added or updated the following SCM tests to cover the EC-default and RATIS-default paths:

TestHealthyPipelineSafeModeRule
TestOneReplicaPipelineSafeModeRule
TestSCMSafeModeManager
TestBackgroundPipelineCreator

Note:
For EC-default setups, ensure both of the following are set for the SCM safemode paths:

ozone.replication.type=EC
ozone.replication=RS-3-2-1024k
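The `ozone.replication` value above packs the EC scheme into one string. The following illustrative parser assumes the `codec-data-parity-chunk` layout of `RS-3-2-1024k`. The `EcScheme` type and the `parse` and `parseSize` helpers are hypothetical, not Ozone's parser. It shows that this scheme means 3 data and 2 parity units, so a pipeline for it presumably spans 5 datanodes, the kind of required node count the healthy-pipeline rule would take from the default config.

```java
import java.util.Locale;

// Hypothetical helper; not Ozone's ECReplicationConfig parsing.
class EcSchemeSketch {

  record EcScheme(String codec, int data, int parity, long chunkBytes) {
    // Nodes an EC pipeline of this scheme spans: one per data and parity unit.
    int requiredNodes() {
      return data + parity;
    }
  }

  // Decodes e.g. "RS-3-2-1024k" into codec RS, 3 data, 2 parity, 1024 KiB chunks.
  static EcScheme parse(String value) {
    String[] parts = value.trim().split("-");
    if (parts.length != 4) {
      throw new IllegalArgumentException("Expected codec-data-parity-chunk: " + value);
    }
    String codec = parts[0].toUpperCase(Locale.ROOT);
    int data = Integer.parseInt(parts[1]);
    int parity = Integer.parseInt(parts[2]);
    return new EcScheme(codec, data, parity, parseSize(parts[3]));
  }

  // Accepts a plain byte count or a k/m suffix (binary multiples).
  static long parseSize(String s) {
    String lower = s.toLowerCase(Locale.ROOT);
    if (lower.endsWith("k")) {
      return Long.parseLong(lower.substring(0, lower.length() - 1)) * 1024L;
    }
    if (lower.endsWith("m")) {
      return Long.parseLong(lower.substring(0, lower.length() - 1)) * 1024L * 1024L;
    }
    return Long.parseLong(lower);
  }
}
```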

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15138

How was this patch tested?

Unit tests.

@sodonnel
Contributor

sodonnel commented May 5, 2026

It's never really been clear to me what the purpose of the pipeline safemode rule is, and how it differs between EC and Ratis. Could you explain why it's important?

Also, while a cluster can be configured to use EC as the default, that does not stop Ratis pipelines from being created and some data on the cluster being written in the Ratis format. The same applies the other way around: a Ratis-default cluster can have EC data on it.

One difference between EC and Ratis pipelines is that EC pipelines are short-lived and Ratis pipelines are long-lived. Ratis pipelines also survive cluster restarts; EC pipelines do not.

There will be a separate set of pipelines for each EC scheme in use (3-2, 6-3, 10-4, etc.).

I guess my main question is: if the cluster defaults to EC and holds only EC data, what problem could occur if there is no safemode pipeline check?
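To make the mixed-config point concrete, the following hypothetical sketch groups pipelines by replication config. `PipelineGroupingSketch`, `Pipeline`, `byReplication` and `visibleToDefaultOnlyCheck` are illustrative names. Consider an EC-default cluster holding RS-3-2, RS-6-3 and Ratis pipelines: a check keyed only on the default sees just the RS-3-2 group. The sketch does not model pipeline lifetimes or restarts.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration; not Ozone's pipeline manager API.
class PipelineGroupingSketch {

  record Pipeline(String id, String replication) { }

  // Groups pipeline IDs by replication config, e.g. one entry per EC scheme
  // plus one for RATIS/THREE, preserving input order within each group.
  static Map<String, List<String>> byReplication(List<Pipeline> pipelines) {
    return pipelines.stream().collect(Collectors.groupingBy(
        Pipeline::replication, TreeMap::new,
        Collectors.mapping(Pipeline::id, Collectors.toList())));
  }

  // The pipelines a check keyed only on the default config would look at.
  static List<String> visibleToDefaultOnlyCheck(List<Pipeline> pipelines, String defaultReplication) {
    return byReplication(pipelines).getOrDefault(defaultReplication, List.of());
  }
}
```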



2 participants