New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-4295. Add SCMServiceManager to SCM HA. #1784
Conversation
7978388
to
b03897f
Compare
*/ | ||
public synchronized void leaderToFollower() { | ||
LOG.info("leaderToFollower is called."); | ||
raftCondition = RaftCondition.FOLLOWER_TO_LEADER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RaftCondition.FOLLOWER_TO_LEADER -> RaftCondition.LEADER_TO_FOLLOWER
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch.
* @return term | ||
* @throws NotLeaderException if isLeader is false | ||
*/ | ||
public long getTerm() throws NotLeaderException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function name shoulld be changed to getTermOfLeader() or something like this to make it intuitive enough as this is supposed to be only a leader executed code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
* @param newIsLeader : is leader or not | ||
* @param newTerm : term if current SCM becomes leader | ||
*/ | ||
public void updateIsLeaderAndTerm(boolean newIsLeader, long newTerm) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newIsLeader doesn't make much sense here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, will find a better name. Previously it is isLeader
and term
, but can not pass checkstyle.
* @param newIsLeader : is leader or not | ||
* @param newTerm : term if current SCM becomes leader | ||
*/ | ||
public void updateIsLeaderAndTerm(boolean newIsLeader, long newTerm) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateIsLeaderAndTerm -> updateLeaderAndTerm
* @param raftCondition latest raft condition | ||
* @param safeModeCondition latest safe mode condition | ||
*/ | ||
void notifyRaftCondOrSafeModeCondChanged( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some more info here, regarding what would be one time transitional events and what would be recurring which needs to be notified ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
Thanks @GlenGeng for working on this. I went through the patch once and given some comments inline. Still need to look through in more details. I also want to know, how are we handling case of a stale leader here? I know there is an effort inside ratis to address/detect a case of stale leader., but don't we really need anything specific to handle such case here ? Also, the transitional cases seem to be handled for LEADER_TO_FOLLOWER and FOLLOWER_TO_LEADER? How about transition to CANDIDATE state in general? |
cc @amaliujia |
@bshashikant Thanks for reviewing this patch!
The request inside SCM can be divided into two categories, read/write request and read only request. The read/write request will go through ratis as a RaftClientRequest, so we don’t need to care about whether the underlying RaftServer is a still leader or has already stepped down. These read/write requests include add/remove/update pipeline/container/deleted block. For the read only request, they will not go through ratis, a stale leader SCM might make a dangerous decision if it misses the latest committed writes, a.k.a the split-brain issues occurred, two leaders of different terms live together. Nanda pointed out such a case:
See SCM HA Handling Stale Leader drafted by @nandakumar131 for more details. With help of the lease solution, leader SCM can handle the read only request, then call
The candidate state is not expose by Ratis, and SCM does not cares about this state either. SCM just treat candidate the same as follower: it should stop the work if not leader.
We may have a offline talk to discuss more about the subtle cases in how we integrate SCM HA with ratis. |
@bshashikant @amaliujia I've update according to your comments, please take one more look Thanks! |
2686e67
to
6e8b787
Compare
* - SafeMode related info, e.g., inSafeMode, preCheckComplete. | ||
*/ | ||
public final class SCMContext implements EventHandler<SafeModeStatus> { | ||
private static final Logger LOG = LoggerFactory.getLogger(SCMContext.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case where SCM HA feature is not enabled (no Ratis), SCM context still seem to be relevant. In such cases, leader info and term info don't make sense. Can we add some constants to define leader state and term when ratis is not enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bshashikant, this is a good point.
Due to PR #1744, integration test which is bases on MiniOzoneCluster has been running upon single-server raft cluster. But we still have cases that run without Ratis, e.g., UnitTest, Recon.
Here is the design decision of SCMContext:
The initial value of currentTerm
in raft is 0. After RaftServer starts, they are all followers of term 0, then election timeout happens, one becomes candidate of term 1, finally the first leader will be elected, which is on a term that is greater or equal to 1.
SCMContext has two raft related field, term
and isLeader
.
- For the cases using Ratis (no matter single server or multi server), the initial value of
term
andisLeader
for SCMContext is0
andfalse
, they will be updated bySCMStateMachine#notifyNotLeader
andSCMStateMachine#notifyLeaderChanged
. - For the cases not using Ratis, the initial value of
term
andisLeader
for SCMContext is0
andtrue
, means leader of term 0.
Since all the isLeader
and getTermOfLeader
check are going through SCMConext
, and they are spreading across the whole SCM, for Ratis mode, it returns real raft info, for non Ratis mode, it behaviors as leader on term 0, so that we treat them in the same way.
What do you think ? We can discuss offline if needed.
6e8b787
to
d0c9ef9
Compare
972c969
to
58145fa
Compare
cc @bshashikant Please take one more look ! |
...-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/SCMBlockDeletingService.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMService.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMServiceManager.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMServiceManager.java
Outdated
Show resolved
Hide resolved
...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/BackgroundPipelineCreatorV2.java
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMContext.java
Show resolved
Hide resolved
@bshashikant Comments for SCMServiceManager has been addressed, Please take one more look. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @GlenGeng for updating the patch.The changes look good to me. I need a clarification regarding replicationManager.
*/ | ||
enum ServiceStatus { | ||
RUNNING, | ||
PAUSING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we should add a status called STOPPED as well here. For example, if i want to suspend replication i can issue, ozone admin replicationManager stop. This state might need to be propagated over Ratis as well. Even after leader re-relection, the service should not auto start on the new leader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the insight.
I will handle bin/ozone admin replicationmanager start|stop|status
in this PR.
I agree that the status of RM should also be replicated through ratis. I created a jira https://issues.apache.org/jira/browse/HDDS-4740 to track this issue, will schedule a discussion in this week's sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create jira for Add STOP state to SCMService.
https://issues.apache.org/jira/browse/HDDS-4744
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the JIRA link of adding STOP state to SCM service was not properly linked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch.
* SCMService related variables. | ||
*/ | ||
private final Lock serviceLock = new ReentrantLock(); | ||
private ServiceStatus serviceStatus = ServiceStatus.PAUSING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of setting default value as ServiceStatus.PAUSING
, will it make sense to initialize this variable as RUNNING during construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not. BlockDeletingService should not run on follower SCMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O yes thanks!
*/ | ||
enum ServiceStatus { | ||
RUNNING, | ||
PAUSING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the JIRA link of adding STOP state to SCM service was not properly linked?
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMContext.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMServiceManager.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMServiceManager.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMServiceManager.java
Outdated
Show resolved
Hide resolved
/** | ||
* Notify raft or safe mode related status changed. | ||
*/ | ||
public synchronized void notifyStatusChanged() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Does notifyStatusChanged
has to be a blocking call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The execution of this method is quite quick. We may consider relax it if we meet performance issue in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG!
/** | ||
* Implements api for running background pipeline creation jobs. | ||
*/ | ||
public class BackgroundPipelineCreatorV2 implements SCMService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed: is this class tested somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BackgroundPipelineCreator
is replaced by BackgroundPipelineCreatorV2
, and it has been fully test in integration test and docker test.
LGTM! |
@GlenGeng Thanks the patch. @bshashikant @amaliujia Thanks for review. I have merged it. |
What changes were proposed in this pull request?
SCM ServiceManager is going to control all the SCM background service so that they are only serving as the leader.
Design doc: https://docs.google.com/document/d/1DbbqP0m3g_iEpY9qkSGOuQgcCN-QqlSNgWpvBOLv5h0/edit
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4295
How was this patch tested?
CI