Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-4295. Add SCMServiceManager to SCM HA. #1784

Merged
merged 5 commits into from Jan 28, 2021

Conversation

GlenGeng-awx
Copy link
Contributor

What changes were proposed in this pull request?

SCM ServiceManager is going to control all the SCM background service so that they are only serving as the leader.

Design doc: https://docs.google.com/document/d/1DbbqP0m3g_iEpY9qkSGOuQgcCN-QqlSNgWpvBOLv5h0/edit

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4295

How was this patch tested?

CI

*/
public synchronized void leaderToFollower() {
LOG.info("leaderToFollower is called.");
raftCondition = RaftCondition.FOLLOWER_TO_LEADER;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RaftCondition.FOLLOWER_TO_LEADER -> RaftCondition.LEADER_TO_FOLLOWER

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch.

* @return term
* @throws NotLeaderException if isLeader is false
*/
public long getTerm() throws NotLeaderException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function name shoulld be changed to getTermOfLeader() or something like this to make it intuitive enough as this is supposed to be only a leader executed code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

* @param newIsLeader : is leader or not
* @param newTerm : term if current SCM becomes leader
*/
public void updateIsLeaderAndTerm(boolean newIsLeader, long newTerm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newIsLeader doesn't make much sense here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will find a better name. Previously it is isLeader and term, but can not pass checkstyle.

* @param newIsLeader : is leader or not
* @param newTerm : term if current SCM becomes leader
*/
public void updateIsLeaderAndTerm(boolean newIsLeader, long newTerm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateIsLeaderAndTerm -> updateLeaderAndTerm

* @param raftCondition latest raft condition
* @param safeModeCondition latest safe mode condition
*/
void notifyRaftCondOrSafeModeCondChanged(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some more info here, regarding what would be one time transitional events and what would be recurring which needs to be notified ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@bshashikant
Copy link
Contributor

Thanks @GlenGeng for working on this. I went through the patch once and given some comments inline. Still need to look through in more details.

I also want to know, how are we handling case of a stale leader here? I know there is an effort inside ratis to address/detect a case of stale leader., but don't we really need anything specific to handle such case here ?

Also, the transitional cases seem to be handled for LEADER_TO_FOLLOWER and FOLLOWER_TO_LEADER? How about transition to CANDIDATE state in general?

@amaliujia
Copy link
Contributor

cc @amaliujia

@GlenGeng-awx
Copy link
Contributor Author

@bshashikant Thanks for reviewing this patch!

I also want to know, how are we handling case of a stale leader here? I know there is an effort inside ratis to address/detect a case of stale leader., but don't we really need anything specific to handle such case here ?

The request inside SCM can be divided into two categories, read/write request and read only request.

The read/write request will go through ratis as a RaftClientRequest, so we don’t need to care about whether the underlying RaftServer is a still leader or has already stepped down. These read/write requests include add/remove/update pipeline/container/deleted block.

For the read only request, they will not go through ratis, a stale leader SCM might make a dangerous decision if it misses the latest committed writes, a.k.a the split-brain issues occurred, two leaders of different terms live together.

Nanda pointed out such a case:

Our major concern here is the DeleteContainerCommand sent by the stale leader. It is possible that the ICR of the recently created container is received by the stale leader which has no idea about the new container. The stale leader will end up sending DeleteContainerCommand to the datanode in this case.

See SCM HA Handling Stale Leader drafted by @nandakumar131 for more details.

With help of the lease solution, leader SCM can handle the read only request, then call isLeader/getTermOfLeader in SCMContext to confirm its leadership, ensuring that when it handles the read only request, it has seen the latest commit writes.

Also, the transitional cases seem to be handled for LEADER_TO_FOLLOWER and FOLLOWER_TO_LEADER? How about transition to CANDIDATE state in general?

The candidate state is not expose by Ratis, and SCM does not cares about this state either. SCM just treat candidate the same as follower: it should stop the work if not leader.

SCMServiceManager#leaderToFollower is called in StateMachine#notifyNotLeader and SCMServiceManager#followerToLeader is called in StateMachine#notifyLeaderChanged when groupMemberId.getPeerId().equals(newLeaderId)

We may have a offline talk to discuss more about the subtle cases in how we integrate SCM HA with ratis.

@GlenGeng-awx
Copy link
Contributor Author

@bshashikant @amaliujia I've update according to your comments, please take one more look Thanks!

@GlenGeng-awx GlenGeng-awx force-pushed the HDDS-4295-Latest branch 3 times, most recently from 2686e67 to 6e8b787 Compare January 18, 2021 09:38
* - SafeMode related info, e.g., inSafeMode, preCheckComplete.
*/
public final class SCMContext implements EventHandler<SafeModeStatus> {
private static final Logger LOG = LoggerFactory.getLogger(SCMContext.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case where SCM HA feature is not enabled (no Ratis), SCM context still seem to be relevant. In such cases, leader info and term info don't make sense. Can we add some constants to define leader state and term when ratis is not enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bshashikant, this is a good point.

Due to PR #1744, integration test which is bases on MiniOzoneCluster has been running upon single-server raft cluster. But we still have cases that run without Ratis, e.g., UnitTest, Recon.

Here is the design decision of SCMContext:

The initial value of currentTerm in raft is 0. After RaftServer starts, they are all followers of term 0, then election timeout happens, one becomes candidate of term 1, finally the first leader will be elected, which is on a term that is greater or equal to 1.

SCMContext has two raft related field, term and isLeader.

  • For the cases using Ratis (no matter single server or multi server), the initial value of term and isLeader for SCMContext is 0 and false, they will be updated by SCMStateMachine#notifyNotLeader and SCMStateMachine#notifyLeaderChanged.
  • For the cases not using Ratis, the initial value of term and isLeader for SCMContext is 0 and true, means leader of term 0.

Since all the isLeader and getTermOfLeader check are going through SCMConext, and they are spreading across the whole SCM, for Ratis mode, it returns real raft info, for non Ratis mode, it behaviors as leader on term 0, so that we treat them in the same way.

What do you think ? We can discuss offline if needed.

@GlenGeng-awx
Copy link
Contributor Author

cc @bshashikant Please take one more look !

@GlenGeng-awx
Copy link
Contributor Author

@bshashikant Comments for SCMServiceManager has been addressed, Please take one more look. Thanks!

Copy link
Contributor

@bshashikant bshashikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GlenGeng for updating the patch.The changes look good to me. I need a clarification regarding replicationManager.

*/
enum ServiceStatus {
RUNNING,
PAUSING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we should add a status called STOPPED as well here. For example, if i want to suspend replication i can issue, ozone admin replicationManager stop. This state might need to be propagated over Ratis as well. Even after leader re-relection, the service should not auto start on the new leader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the insight.
I will handle bin/ozone admin replicationmanager start|stop|status in this PR.
I agree that the status of RM should also be replicated through ratis. I created a jira https://issues.apache.org/jira/browse/HDDS-4740 to track this issue, will schedule a discussion in this week's sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create jira for Add STOP state to SCMService.
https://issues.apache.org/jira/browse/HDDS-4744

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the JIRA link of adding STOP state to SCM service was not properly linked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch.

* SCMService related variables.
*/
private final Lock serviceLock = new ReentrantLock();
private ServiceStatus serviceStatus = ServiceStatus.PAUSING;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of setting default value as ServiceStatus.PAUSING, will it make sense to initialize this variable as RUNNING during construction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually not. BlockDeletingService should not run on follower SCMs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O yes thanks!

*/
enum ServiceStatus {
RUNNING,
PAUSING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the JIRA link of adding STOP state to SCM service was not properly linked?

/**
* Notify raft or safe mode related status changed.
*/
public synchronized void notifyStatusChanged() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Does notifyStatusChanged has to be a blocking call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The execution of this method is quite quick. We may consider relax it if we meet performance issue in future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG!

/**
* Implements api for running background pipeline creation jobs.
*/
public class BackgroundPipelineCreatorV2 implements SCMService {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed: is this class tested somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackgroundPipelineCreator is replaced by BackgroundPipelineCreatorV2, and it has been fully test in integration test and docker test.

@amaliujia
Copy link
Contributor

LGTM!

@runzhiwang runzhiwang merged commit 3b6918e into apache:HDDS-2823 Jan 28, 2021
@runzhiwang
Copy link
Contributor

@GlenGeng Thanks the patch. @bshashikant @amaliujia Thanks for review. I have merged it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants