Implement raft journal and snapshot management with Ratis #12181

Merged
merged 20 commits into master from ratis-journal
Oct 6, 2020

@bf8086 bf8086 commented Oct 5, 2020

This PR contains the following changes, which implement a Raft-based journal system and snapshot management using the Ratis framework:

  • Updated the embedded journal state machine to use the Ratis framework
  • Removed the dependency on Copycat
  • Implemented a snapshot management system that periodically takes snapshots of master metadata on secondary masters and replicates them to the primary master
  • Added proper timeouts when clients and workers attempt to ping masters

bf8086 and others added 19 commits August 25, 2020 10:46
Initial changes for hooking up the embedded journal state machine with
the Ratis API. Implemented leader election, journal application, and
snapshot taking on followers.

pr-link: #11940
change-id: cid-0e6c9dff7df0bfdacd6d88bd320a5728ad938edb
pr-link: #12056
change-id: cid-45b0ca0655735d011ca3babafc83fc817b67860f
Remove the Copycat dependency and replace it with the much smaller
Catalyst dependency.
Catalyst is still needed for gRPC message serialization and
deserialization.

pr-link: #12082
change-id: cid-c8b21501a3379f0ee334ccbf7067fe1c5604a223
This change implements snapshot replication workflows for the embedded
journal:

- When a Raft leader needs a snapshot, instead of taking a snapshot
locally, it copies a recent snapshot from one of the followers.
- When a Raft follower receives a notification to download a snapshot,
it downloads the latest snapshot from the leader.
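
The leader-side choice of which follower to copy from could be sketched as below. This is only an illustration: the class `SnapshotSourceSelector`, its method, and the "newest snapshot wins" policy are assumptions, not code from this PR.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: picks the follower a leader copies a snapshot from. */
final class SnapshotSourceSelector {
  private SnapshotSourceSelector() {}

  /**
   * @param leaderSnapshotIndex last log index covered by the leader's own snapshot
   * @param followerSnapshotIndexes follower id mapped to the last log index its snapshot covers
   * @return the follower whose snapshot is newest and strictly newer than the leader's, if any
   */
  static Optional<String> pickSource(long leaderSnapshotIndex,
      Map<String, Long> followerSnapshotIndexes) {
    return followerSnapshotIndexes.entrySet().stream()
        // Assumption: a snapshot that is not newer than the leader's is not worth copying.
        .filter(e -> e.getValue() > leaderSnapshotIndex)
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey);
  }
}
```

An empty result means no follower has anything newer, so the leader would skip the copy.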

pr-link: #12053
change-id: cid-d2f322ef07db1db70e1a2ab6182afbb5cd439764
When the journal is suspended, a snapshot should not be taken, because
some entries up to the snapshot index might not yet have been applied by
the suspended journal applier.
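
A minimal sketch of that guard, assuming a state machine that tracks committed and applied indexes separately. `SuspendAwareJournal` and its methods are hypothetical names, not the classes touched by this change.

```java
import java.util.OptionalLong;

/** Hypothetical sketch; class and method names are illustrative, not Alluxio's. */
final class SuspendAwareJournal {
  private boolean suspended;
  private long lastCommittedIndex = -1; // newest entry received from the Raft log
  private long lastAppliedIndex = -1;   // newest entry applied to master state

  /** Receives a committed entry; it is applied only while the applier is running. */
  synchronized void onCommitted(long index) {
    lastCommittedIndex = index;
    if (!suspended) {
      lastAppliedIndex = index;
    }
  }

  synchronized void suspend() {
    suspended = true;
  }

  /** Resumes the applier and catches up on entries that arrived while suspended. */
  synchronized void resume() {
    suspended = false;
    lastAppliedIndex = lastCommittedIndex;
  }

  /** Returns the index the snapshot covers, or empty when no snapshot may be taken. */
  synchronized OptionalLong takeSnapshot() {
    if (suspended) {
      // State may lag behind the log here, so a snapshot would miss entries.
      return OptionalLong.empty();
    }
    return OptionalLong.of(lastAppliedIndex);
  }
}
```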

pr-link: #12100
change-id: cid-cd1adb3aa61dc28b8dfde0283f48303c1dd20e44
This change implements the quorum join process for a new master.

Ratis provides an API for updating the quorum member list. To implement
an automatic join comparable to what we did with Copycat, the new master
sends a join request to the quorum leader after it has initialized the
journal. The leader atomically converts the request into a Raft
configuration change and applies it to the quorum.

This operation is a no-op for a master that is already in the quorum.
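
The leader-side handling of a join request can be modeled roughly as follows. This sketch only models the membership check and the idempotent behavior; `QuorumMembership` is a hypothetical class, and the real change goes through the Ratis configuration API, which is not shown here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the leader's view of quorum membership. */
final class QuorumMembership {
  private final Set<String> members = new LinkedHashSet<>();
  private int configChanges; // how many configuration changes were applied

  QuorumMembership(Set<String> initialMembers) {
    members.addAll(initialMembers);
  }

  /**
   * Handles a join request. The synchronized keyword stands in for the
   * atomic check-and-change the leader performs.
   *
   * @return true if a configuration change was applied, false if it was a no-op
   */
  synchronized boolean handleJoin(String masterAddress) {
    if (members.contains(masterAddress)) {
      return false; // already in the quorum
    }
    members.add(masterAddress);
    configChanges++;
    return true;
  }

  synchronized int configChanges() {
    return configChanges;
  }

  synchronized Set<String> members() {
    return new LinkedHashSet<>(members);
  }
}
```

Because a repeated request changes nothing, a restarting master can send the join request unconditionally.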

pr-link: #12099
change-id: cid-35f7de4ffcc56f0ce4969fd632f48c68adf1887a
Added a deadline for master ping requests.

pr-link: #12145
change-id: cid-9c7dcf07e967728651df7367aaf295a297e8ea0b
Sometimes a master attempts to join a quorum during leader election.
This change extends the retry period so that retries can extend past an
election. It also lowers the log severity, given the best-effort nature
of the join.
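
A time-budgeted retry of this kind might look like the sketch below. The helper `DeadlineRetry` and its parameters are assumptions for illustration; the PR does not show the retry policy it uses.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper: keeps retrying until a time budget is used up. */
final class DeadlineRetry {
  private DeadlineRetry() {}

  /**
   * Runs the action until it succeeds or the budget elapses. The budget should
   * be longer than an expected leader election so a join can outlast one.
   */
  static <T> T retryUntil(Supplier<T> action, long budgetMs, long sleepMs) {
    long start = System.nanoTime();
    long budgetNs = budgetMs * 1_000_000L;
    RuntimeException last = null;
    do {
      try {
        return action.get();
      } catch (RuntimeException e) {
        last = e; // remember the failure and try again after a pause
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        break;
      }
    } while (System.nanoTime() - start < budgetNs);
    throw last;
  }
}
```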

pr-link: #12152
change-id: cid-65c9b9f2f4360c5b16cc4edb237467a3d5c34ad3
Sometimes a snapshot is ordered when the journal log is not consistent
with the state machine. This change adds the restriction that snapshots
can only be taken when a follower is in the running state.

pr-link: #12150
change-id: cid-91dca0f71c9a9688e1a3cd58240b61b5df3df79b
Sometimes a Ratis client times out and closes its connection while the
master tries to gain primacy. This change makes the master notify the
Ratis client to reconnect when an exception is thrown, so we don't just
keep retrying on a closed client.
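
The reconnect-on-failure idea can be sketched as a wrapper that drops its client whenever a call throws. `ReconnectingClient` is a hypothetical class; the client type and factory are placeholders, not the Ratis client API.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical wrapper that rebuilds its client after a failed call. */
final class ReconnectingClient<C> {
  private final Supplier<C> factory;
  private C client;
  private int connects; // how many times a client was created

  ReconnectingClient(Supplier<C> factory) {
    this.factory = factory;
  }

  /** Sends a request, creating a client first if none is currently usable. */
  synchronized <R> R call(Function<C, R> request) {
    if (client == null) {
      client = factory.get();
      connects++;
    }
    try {
      return request.apply(client);
    } catch (RuntimeException e) {
      // Drop the client so the caller's next retry reconnects
      // instead of reusing a closed connection.
      client = null;
      throw e;
    }
  }

  synchronized int connects() {
    return connects;
  }
}
```

A real client would also be closed before being dropped; that step is omitted here.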

pr-link: #12151
change-id: cid-2da82374da558c9ca3c9ae2446134938b9343e8a
In a cluster with multiple masters using the embedded journal, when a
network partition isolates the original leading master, leadership
changes to a new leading master. The workers hang in gRPC connections
with the previous leading master and don't register with the new leading
master.

This PR adds a timeout to the periodic connections (operations) between
workers and leading masters to prevent workers from hanging forever.
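
Bounding a periodic call so it cannot hang forever can be sketched with plain `java.util.concurrent`. This is not how the PR does it: gRPC stubs offer `withDeadlineAfter`, which may be the mechanism used, but the sketch below avoids the gRPC dependency. `BoundedCall` is a hypothetical helper.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that gives a blocking call an upper time bound. */
final class BoundedCall {
  private BoundedCall() {}

  /** Runs the operation on the pool and fails if it exceeds the timeout. */
  static <T> T callWithTimeout(ExecutorService pool, Callable<T> op, long timeoutMs) {
    Future<T> future = pool.submit(op);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck call so the thread is freed
      throw new IllegalStateException("no response within " + timeoutMs + " ms", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }
}
```

After a timeout, the worker can look up the current leading master and retry there rather than waiting on the old connection.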

pr-link: #12149
change-id: cid-47af90077dfc0fed081a315e0f2b2bcd20f9fb96
In an unlucky situation where the Raft server throws an exception while
losing primacy, the master attempts to shut down but gets stuck in a
deadlock of `RaftJournalSystem::losePrimacy -> System::exit ->
ProcessUtils::stopProcessOnShutdown -> RaftJournalSystem::stop`. It can
be fixed by throwing directly within `losePrimacy`.

pr-link: #12166
change-id: cid-e20b652c8baf6a53f26b79635a9ad4813c66afe5
Currently, the Raft client for writing journal entries always checks for
leadership and sends the request through the network interface,
introducing significant overhead. This change implements a Raft client
that first attempts to send messages directly to the local server, and
falls back to the default client if the local server is no longer the
leader.

The client also has a built-in retry for a Raft client that was closed
due to a timeout.
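
The local-first dispatch can be sketched as below. `LocalFirstClient` and `NotLeaderException` are hypothetical names defined only for this example, and the two `Function` parameters stand in for the local server and the default network client.

```java
import java.util.function.Function;

/** Hypothetical signal that the local server has lost leadership. */
final class NotLeaderException extends RuntimeException {
  NotLeaderException(String message) {
    super(message);
  }
}

/** Hypothetical client that tries the in-process server before the network. */
final class LocalFirstClient<Q, R> {
  private final Function<Q, R> localServer;
  private final Function<Q, R> defaultClient;

  LocalFirstClient(Function<Q, R> localServer, Function<Q, R> defaultClient) {
    this.localServer = localServer;
    this.defaultClient = defaultClient;
  }

  /** Sends locally when possible and falls back if this node is not the leader. */
  R send(Q request) {
    try {
      return localServer.apply(request);
    } catch (NotLeaderException e) {
      // The default client locates the current leader over the network.
      return defaultClient.apply(request);
    }
  }
}
```

The common case, where the writing master is the leader, then skips both the leadership lookup and the network hop.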

pr-link: #12174
change-id: cid-df05031cb47f6f931e67c9b4c853b9b426ba145d
@alluxio-bot alluxio-bot added the POM Change and API Change (changes covering public API) labels Oct 5, 2020

bf8086 commented Oct 5, 2020

@yuzhu PTAL.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Alluxio-PR-Builder/11388/
Test FAILed.

The Ratis leader sometimes sends a request to a follower to install a
snapshot that is older than the follower's log. This change implements a
check to avoid reloading the state machine after such snapshots are
downloaded, to prevent some accounting issues.
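
The check can be sketched as an index comparison on the follower. `SnapshotInstallGuard` is a hypothetical class, and treating an equal index as stale is an assumption.

```java
/** Hypothetical follower-side guard; names and the comparison are assumptions. */
final class SnapshotInstallGuard {
  private long lastAppliedIndex;
  private int reloads; // how many times the state machine was reloaded

  SnapshotInstallGuard(long lastAppliedIndex) {
    this.lastAppliedIndex = lastAppliedIndex;
  }

  /**
   * Called after a snapshot has been downloaded from the leader.
   *
   * @return true if the state machine was reloaded from the snapshot
   */
  synchronized boolean onSnapshotDownloaded(long snapshotIndex) {
    if (snapshotIndex <= lastAppliedIndex) {
      return false; // the local log is already at or past this snapshot
    }
    lastAppliedIndex = snapshotIndex;
    reloads++;
    return true;
  }

  synchronized long lastAppliedIndex() {
    return lastAppliedIndex;
  }

  synchronized int reloads() {
    return reloads;
  }
}
```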

pr-link: #12179
change-id: cid-fc9d759f0ec0370b89dd8789b47e0cc7b807df12
@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Alluxio-PR-Builder/11389/
Test PASSed.

@yuzhu yuzhu left a comment

Just a minor comment otherwise LGTM

<groupId>org.apache.ratis</groupId>
<artifactId>ratis-grpc</artifactId>
</dependency>
<!-- TODO(lu) remove catalyst dependency which used for serialization and move to Grpc impl -->

does this todo need to be addressed?

this is a TODO for future, not target for 2.4

bf8086 commented Oct 6, 2020

alluxio-bot, feature-merge this please.

@alluxio-bot alluxio-bot merged commit f0f72c6 into master Oct 6, 2020
@Alluxio Alluxio deleted a comment from alluxio-bot Oct 6, 2020
@apc999 apc999 deleted the ratis-journal branch March 11, 2021 04:48
alluxio-bot pushed a commit that referenced this pull request Mar 2, 2023
### What changes are proposed in this pull request?

Correct docs about Ratis.

### Why are the changes needed?

The embedded journal was introduced in #8219,
and Copycat was then replaced by Ratis in #12181.
Some docs were not updated.

### Does this PR introduce any user facing changes?

No.

pr-link: #16985
change-id: cid-592a5c06991c5b067690031ef7fffed9d2e6fac6
YangchenYe323 pushed a commit to YangchenYe323/alluxio that referenced this pull request Apr 16, 2023
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 17, 2023
Labels
API Change (changes covering public API), POM Change

7 participants