Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RATIS-2111] Reinitialize should load the latest snapshot #1111

Merged
merged 5 commits into from
Jun 13, 2024

Conversation

AngersZhuuuu
Copy link
Contributor

What changes were proposed in this pull request?

When we setting INSTALL_SNAPSHOT_ENABLED_KEY as true and when master sync snapshot to follower,
the status will be PAUSED and won't take snapshot, also follower StateMachine will call reinitialize().
Then we need to load the the latest snapshot.

  /**
   * Re-initializes the State Machine in PAUSED state. The
   * state machine is responsible reading the latest snapshot from the file system (if any) and
   * initialize itself with the latest term and index there including all the edits.
   */
  void reinitialize() throws IOException;

But current SimpleStateMachineStorage when init(), latestSnapshot was set. But the master's snapshot won't be taken so the latestSnapshot won't be updated to master's snapshot.

So here we should load the latest snapshot and update it before do reinitialize.
Similar thing was done in alluxio https://github.com/Alluxio/alluxio/blob/efed93c95e6f1cf3b2131066208f03cfd7821b58/dora/core/server/common/src/main/java/alluxio/master/journal/raft/JournalStateMachine.java#L209-L214

and celeborn also meet such issue. apache/celeborn#2547

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2111

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests)
(If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)

@AngersZhuuuu
Copy link
Contributor Author

ping @szetszwo

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AngersZhuuuu , thanks a lot for working on this!

Please see the comment inlined. After that change we make replace getLatestSnapshot() with loadLatestSnapshot() as below.

diff --git a/ratis-examples/src/main/java/org/apache/ratis/examples/arithmetic/ArithmeticStateMachine.java b/ratis-examples/src/main/java/org/apache/ratis/examples/arithmetic/ArithmeticStateMachine.java
index e8b142f5dc..c4adff5987 100644
--- a/ratis-examples/src/main/java/org/apache/ratis/examples/arithmetic/ArithmeticStateMachine.java
+++ b/ratis-examples/src/main/java/org/apache/ratis/examples/arithmetic/ArithmeticStateMachine.java
@@ -81,7 +81,7 @@ public class ArithmeticStateMachine extends BaseStateMachine {
   @Override
   public void reinitialize() throws IOException {
     close();
-    loadSnapshot(storage.getLatestSnapshot());
+    loadSnapshot(storage.loadLatestSnapshot());
   }
 
   @Override
diff --git a/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java b/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java
index 47880af553..a65582ea00 100644
--- a/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java
+++ b/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java
@@ -138,7 +138,7 @@ public class CounterStateMachine extends BaseStateMachine {
    */
   @Override
   public void reinitialize() throws IOException {
-    load(storage.getLatestSnapshot());
+    load(storage.loadLatestSnapshot());
   }
 
   /**

@@ -228,6 +228,17 @@ public SingleFileSnapshotInfo getLatestSnapshot() {
}
}

public void loadLatestSnapshot() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it return SingleFileSnapshotInfo and then reuse it in getLatestSnapshot().

diff --git a/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java b/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java
index 88cc57dab5..7e8afbaa85 100644
--- a/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java
+++ b/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/SimpleStateMachineStorage.java
@@ -217,6 +217,10 @@ public class SimpleStateMachineStorage implements StateMachineStorage {
     if (s != null) {
       return s;
     }
+    return loadLatestSnapshot();
+  }
+
+  public SingleFileSnapshotInfo loadLatestSnapshot() {
     final File dir = stateMachineDir;
     if (dir == null) {
       return null;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@AngersZhuuuu
Copy link
Contributor Author

@szetszwo GA passed

@szetszwo szetszwo merged commit 25a41e3 into apache:master Jun 13, 2024
12 checks passed
szetszwo pushed a commit to szetszwo/ratis that referenced this pull request Jun 16, 2024
RexXiong pushed a commit to apache/celeborn that referenced this pull request Jul 11, 2024
### What changes were proposed in this pull request?

Bump Ratis version from 3.0.1 to 3.1.0. Meanwhile, remove `CelebornStateMachineStorage` with the release of apache/ratis#1111.

### Why are the changes needed?

Bump Ratis version from 3.0.1 to 3.1.0. Ratis has released v3.1.0, of which release note refers to [3.1.0](https://ratis.apache.org/post/3.1.0.html). The 3.1.0 version is a minor release with multiple improvements and bugfixes including [[RATIS-2111] Reinitialize should load the latest snapshot](https://issues.apache.org/jira/browse/RATIS-2111). See the [changes between 3.0.1 and 3.1.0](apache/ratis@ratis-3.0.1...ratis-3.1.0) releases.

Follow up #2547.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`MasterStateMachineSuiteJ#testInstallSnapshot`

Closes #2610 from SteNicholas/CELEBORN-1499.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: Shuang <lvshuang.xjs@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants