Skip to content

HDDS-5202. Use scm#checkLeader before processing client requests .#2229

Merged
bharatviswa504 merged 2 commits intoapache:masterfrom
bharatviswa504:HDDS-5202
May 13, 2021
Merged

HDDS-5202. Use scm#checkLeader before processing client requests .#2229
bharatviswa504 merged 2 commits intoapache:masterfrom
bharatviswa504:HDDS-5202

Conversation

@bharatviswa504
Copy link
Contributor

What changes were proposed in this pull request?

SCM server should start accepting requests when it is leader and isLeaderReady.

We need isLeaderReady also because Statemachine should apply all the log committed transactions to start accepting requests.

So, instead of scmContext#isLeader, use scm#checkLeader.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5202

How was this patch tested?

Existing CI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we separate NotLeaderException and LeaderNotReadyException ? client should failover for the former one, and retry the same SCM for the latter one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether ratis now has the feature of leader-lease, with which scm will not need to check this every time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not implemented yet, tracked by https://issues.apache.org/jira/browse/RATIS-1273

Copy link
Contributor Author

@bharatviswa504 bharatviswa504 May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason behind adding this check is that read requests should not be processed before becoming leader becomes ready.

The reason is We do not want to serve the reads based on not up-to-date SCM DB. For writes as this is going via ratis, this is already taken care.

Copy link
Contributor Author

@bharatviswa504 bharatviswa504 May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we separate NotLeaderException and LeaderNotReadyException ? client should failover for the former one, and retry the same SCM for the latter one

Good idea, let me update based on that.

Copy link
Contributor Author

@bharatviswa504 bharatviswa504 May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we separate NotLeaderException and LeaderNotReadyException ? client should failover for the former one, and retry the same SCM for the latter one.

@GlenGeng As we have suggested Leader handling in Client, even though it throws error NotLeader Exception to the client, but it passes the leaderAddress in the case of LeaderNotReady it will be the same node address so retry will happen on the same Node. So, I believe we are good here. (We can keep simple as we have suggested leader handling in SCM)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  public boolean checkLeader() {
    // For NON-HA setup, the node will always be the leader
    if (!SCMHAUtils.isSCMHAEnabled(configuration)) {
      Preconditions.checkArgument(scmContext.isLeader());
      return true;
    } else {
      // FOR HA setup, the node has to be the leader and ready to serve
      // requests.
      return scmContext.isLeader() && getScmHAManager().getRatisServer()
          .getDivision().getInfo().isLeaderReady();
    }
  }

    public boolean isLeaderReady() {
      return this.isLeader() && RaftServerImpl.this.getRole().isLeaderReady();
    }

in checkLeader , the above isLeaderReady will be called and another isLeader will be called ( not scmcontext.isLeader()), which is implemented by ratis.

scmcontext.isLeader is notified and thus changed by ratis. so, maybe here we can just only use getScmHAManager().getRatisServer().getDivision().getInfo().isLeaderReady() to checkLeader(), and I think that is enough. what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya we can use getScmHAManager().getRatisServer().getDivision().getInfo().isLeaderReady() in checkLeader.
Thanks for the suggestion, I will change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, PTAL.

@bharatviswa504
Copy link
Contributor Author

Thank You @GlenGeng and @JacksonYao287 for the review.
I have addressed/replied to the review comments.

@GlenGeng-awx
Copy link
Contributor

+1.
Thanks @bharatviswa504 for the work. Thanks @JacksonYao287 for the review. Waiting for CI.

@JacksonYao287
Copy link
Contributor

Thanks @bharatviswa504 for the work. Thanks @GlenGeng for the review.
LGTM. +1

@bharatviswa504 bharatviswa504 merged commit f8a06e0 into apache:master May 13, 2021
@bharatviswa504
Copy link
Contributor Author

Thank You @GlenGeng and @JacksonYao287 for the review.

bharatviswa504 added a commit to bharatviswa504/hadoop-ozone that referenced this pull request Jul 25, 2021
…pache#2229)

(cherry picked from commit f8a06e0)
Change-Id: I514fb26992a5b562dacc1768b721940cb03c6c03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants