RATIS-1587. Fix snapshot multi-chunk bug & support snapshot hierarchy#655
szetszwo merged 4 commits into apache:master
szetszwo
left a comment
@SzyWilliam, thanks a lot for working on this. Some comments/questions are inlined. Please see if you could create a unit test.
String fileNameToStateMachineDir = fileName.substring(
    (dir.STATE_MACHINE_DIR_NAME.length()));
We need to handle the path separator. How about using Path to relativize it?
@@ -74,6 +74,7 @@ public class SnapshotManager {
// TODO: Make sure that subsequent requests for the same installSnapshot are coming in order,
// and are not lost when whole request cycle is done. Check requestId and requestIndex here
+ final Path stateMachineDir = dir.getStateMachineDir().toPath();
for (FileChunkProto chunk : snapshotChunkRequest.getFileChunksList()) {
SnapshotInfo pi = stateMachine.getLatestSnapshot();
if (pi != null && pi.getTermIndex().getIndex() >= lastIncludedIndex) {
@@ -83,9 +84,9 @@ public class SnapshotManager {
}
String fileName = chunk.getFilename(); // this is relative to the root dir
- // TODO: assumes flat layout inside SM dir
- File tmpSnapshotFile = new File(tmpDir,
- new File(dir.getRoot(), fileName).getName());
+ final Path relative = stateMachineDir.relativize(new File(dir.getRoot(), fileName).toPath());
+ final File tmpSnapshotFile = new File(tmpDir, relative.toString());
+ FileUtils.createDirectories(tmpSnapshotFile);
That would be great; I'll do it this way.
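A minimal sketch of the relativize approach suggested above, using only `java.nio.file`. The class name `SnapshotPathSketch`, the method `toTmpPath`, and the directory names in `main` are hypothetical and chosen for illustration; this is not the code that was merged.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SnapshotPathSketch {
  /**
   * Maps a chunk file name (relative to the storage root) to a path under tmpDir,
   * keeping any sub-directories that sit below the state machine directory.
   * All parameters are illustrative stand-ins, not the real Ratis types.
   */
  static Path toTmpPath(Path root, Path stateMachineDir, Path tmpDir, String fileName) {
    final Path resolved = root.resolve(fileName).normalize();
    if (!resolved.startsWith(stateMachineDir)) {
      // Reject names that would escape the state machine directory.
      throw new IllegalArgumentException(fileName + " is not under " + stateMachineDir);
    }
    // relativize works on path elements, so the platform separator is handled by Path.
    final Path relative = stateMachineDir.relativize(resolved);
    return tmpDir.resolve(relative);
  }

  public static void main(String[] args) {
    final Path root = Paths.get("root");           // hypothetical storage root
    final Path smDir = root.resolve("sm");         // hypothetical state machine dir
    final Path tmpDir = root.resolve("tmp");       // hypothetical tmp dir
    System.out.println(toTmpPath(root, smDir, tmpDir, "sm/snapshot/nested/file1"));
  }
}
```

Compared with `substring` on the directory-name length, this avoids counting separator characters by hand. A caller would still need to create the parent directories of the returned path before writing the chunk.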
  rpc installSnapshot(stream ratis.common.InstallSnapshotRequestProto)
-     returns(ratis.common.InstallSnapshotReplyProto) {}
+     returns(stream ratis.common.InstallSnapshotReplyProto) {}
Is this an incompatible change? If yes, we should document it.
I think it is a bug that shows up when gRPC is the communication protocol and the snapshot involves multiple chunks. The InstallSnapshot handler is implemented as bidirectional streaming, but the proto declares it as client streaming. The mismatch causes the leader to receive an HTTP RST CANCEL for the last installSnapshot request.
I agree that this is a bug. If it is an incompatible change, we still have to document it, even for a bug, since it will break existing, working applications. Currently, applications with a single InstallSnapshotRequestProto are working.
Indeed, it may not be incompatible. Could you test whether an old server (without stream) can talk to a new server (with stream)?
@szetszwo I added a unit test that covers the scenario of a leader installing a snapshot on followers, where the snapshot contains multiple files in nested folders.
8a13514 to 39eae00
szetszwo
left a comment
@SzyWilliam, thanks for the update! Both new files need a license header. Please test whether adding stream is incompatible or not.
@@ -0,0 +1,154 @@
package org.apache.ratis;
@szetszwo Added the license header. Also, I did the test and I think the [...]
9e5e719 to b15b3a5
b15b3a5 to b0cfda3
szetszwo
left a comment
+1 the change looks good.
@SzyWilliam, thanks a lot for testing the compatibility!
What changes were proposed in this pull request?
Fix snapshot multiple-chunk bug. Currently, when a leader installs a snapshot (multiple chunks) on a newly joined follower, the leader sends multiple InstallSnapshot RPCs. However, each RPC creates a tmp dir with a random UUID, places its chunk in that tmp dir, and only the last tmp dir is renamed to the sm-dir, so the chunks written by earlier RPCs never reach the sm-dir. In this PR, I propose to create the tmp dir using request.uuid(), which remains unchanged across the RPCs of one snapshot.
Fix gRPC stream errors. Currently, in grpc.proto, InstallSnapshot is declared as a client-streaming RPC, but it is actually a bidirectional-streaming RPC. In this PR, I added stream to the InstallSnapshot reply in the proto so that it becomes bidirectional.
Support snapshot file hierarchy. Currently, all files of a snapshot are placed directly in the statemachine dir and the file hierarchy is flattened. In this PR, I name each file using its original filename (which contains the hierarchy information).
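The first fix can be illustrated with a small sketch, assuming the request carries an identifier that stays the same for every chunk RPC of one snapshot. Here a plain `java.util.UUID` parameter stands in for `request.uuid()`; the class `SnapshotTmpDirSketch` and both method names are hypothetical and not taken from the Ratis code base.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

final class SnapshotTmpDirSketch {
  /** Behaviour described as the bug: every RPC gets a fresh, unrelated tmp dir. */
  static Path randomTmpDir(Path tmpRoot) {
    return tmpRoot.resolve(UUID.randomUUID().toString());
  }

  /** Proposed behaviour: the tmp dir name is derived from the id shared by all chunk RPCs. */
  static Path stableTmpDir(Path tmpRoot, UUID requestUuid) {
    return tmpRoot.resolve(requestUuid.toString());
  }

  public static void main(String[] args) {
    final Path tmpRoot = Paths.get("sm", "tmp");    // hypothetical tmp root
    final UUID requestUuid = UUID.randomUUID();     // stand-in for request.uuid()
    final Set<Path> randomDirs = new HashSet<>();
    final Set<Path> stableDirs = new HashSet<>();
    // Simulate three chunk RPCs that belong to the same snapshot.
    for (int chunk = 0; chunk < 3; chunk++) {
      randomDirs.add(randomTmpDir(tmpRoot));
      stableDirs.add(stableTmpDir(tmpRoot, requestUuid));
    }
    System.out.println("random naming used " + randomDirs.size() + " directories");
    System.out.println("stable naming used " + stableDirs.size() + " directory");
  }
}
```

With stable naming, every chunk lands in the same directory, so renaming that one directory at the end moves the complete snapshot rather than only the last chunk.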
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/RATIS-1587
How was this patch tested?