Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: distinguish between normal node startup and snapshot loading #319

Merged

Conversation

panlei-coder
Copy link
Collaborator

@panlei-coder panlei-coder commented May 15, 2024

1.方案:
(1)如果数据库正常关闭,理论上是不需要进行日志回放的,因为数据库在关闭是会将内存数据刷盘,并将日志截断到最新的位置,这样就可以保证正常关闭下的节点启动是no-op的。那么在数据库非正常关闭时,节点启动如果加载了快照,只会是一个多余的操作,因为快照本身就是空的,只有snapshot_meta数据。所以,在节点启动的时候,第一次调用on_snapshot_load接口理论上是不需要进行快照的加载,需要对第一个启动和正常快照安装进行区分。但是有一点需要注意,节点在刚加入集群且没有数据时,它在初始化之后是不会加载本地的快照(因为快照文件夹是空的),所以在安装来自 leader 的快照时需要 LoadDBFromCheckpoint 过程。
(2)节点在启动的时候,它是按照快照中 snapshot_meta 记录的 last_applied_index 作为日志回放点,那么在故障重启时,节点需要过滤掉一些日志,这无疑是影响效率的。我们可以在节点调用 on_load_snapshot 开始位置就设置这个日志回放点的位置,根据 DB 中最小 flush-index 来设置(这里的逻辑需要进一步的完善)。
(3)braft 的 pr : pikiwidb/braft#4

2.测试
因为获取 DB 中最小的 flush-index 目前还没有完善,所以在测试的时候将一个 DB 下 rocksdb 的数量设置成了1,使用save_load.sh 脚本进行测试。
47b1ee317bbd6abc2713ba8fc34f44a
d8c1c35829b6ed6534ff03275342364

Summary by CodeRabbit

  • 新功能

    • 添加了获取最小刷盘日志索引的新功能(适用于 StorageRedis 类)。
    • 新增了 PRaft 类中的 GetTermGetLastLogIndex 方法。
  • 配置更新

    • 更新了 pikiwidb.conf 配置文件:db-instance-num 从 3 改为 1,use-raft 从 "no" 改为 "yes"。
    • 修改了 CMakeLists.txtADDRESS_SANITIZER 选项默认设置,从 ON 改为 OFF
  • 性能改进

    • 优化了 DB::Open 方法的快照功能和数据库加载逻辑。
    • 更新了快照加载和节点启动时的行为逻辑。
  • 代码重构

    • 重构了 DoSnapshot 方法以处理同步和异步快照操作。
    • 修正了 PRaft 类中的注释错误,并优化了 prefix 的构造方式。

@github-actions github-actions bot added 🧹 Updates This will not be worked on Invalid PR Title labels May 15, 2024
@panlei-coder panlei-coder changed the title New log playback to latest feat: distinguish between normal node startup and snapshot loading May 15, 2024
@AlexStocks
Copy link
Collaborator

@panlei-coder please fix ci failure and file confliction.

@panlei-coder
Copy link
Collaborator Author

@panlei-coder please fix ci failure and file confliction.

ok

@github-actions github-actions bot added the ✏️ Feature New feature or request label May 27, 2024
pikiwidb.conf Outdated Show resolved Hide resolved
src/praft/praft.cc Outdated Show resolved Hide resolved
src/praft/praft.cc Outdated Show resolved Hide resolved
Copy link

coderabbitai bot commented Jun 4, 2024

Walkthrough

此次变更涉及多个文件,主要集中在对外部项目获取的调整、配置文件的修改、存储和快照处理方法的更新,以及类方法的重构。这些更改旨在提升项目的功能和性能,同时增强系统的稳定性与灵活性。

Changes

文件 变更摘要
cmake/braft.cmake 修改了GIT_REPOSITORYGIT_TAG以指向不同的仓库和分支进行测试。
CMakeLists.txt ADDRESS_SANITIZER选项的默认设置从ON更改为OFF
pikiwidb.conf 修改了db-instance-num从3到1,use-raftnoyes
src/db.cc 使用lambda函数替代std::bind以捕获引用raft;在LoadDBFromCheckpoint方法中,添加了关闭旧存储的步骤。
src/praft/CMakeLists.txt PRIVATE范围内添加了新的包含目录${PROJECT_SOURCE_DIR}/src/storage/include
src/praft/praft.cc 增加了注释修正、构建prefix、新增方法GetTermGetLastLogIndex,重构DoSnapshot以处理同步和异步快照操作。
src/praft/praft.h 添加了新方法GetTermGetLastLogIndex,以及新的布尔成员变量is_node_first_start_up_
src/storage/include/storage/... Storage类中添加了新方法GetSmallestFlushedLogIndex()
src/storage/src/log_index.h 修改了GetLastFlushIndex方法以返回last_flush_index_的常量引用。
src/storage/src/redis.h Redis类中添加了新方法GetSmallestFlushedLogIndex
src/storage/src/storage.cc 移除了LoadCheckpointInternal函数中对insts_[index]的重置调用,添加了GetSmallestFlushedLogIndex方法。

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant RaftNodeCmd
    participant PSTORE
    participant PRAFT
    User->>RaftNodeCmd: 启动DoCmdSnapshot
    RaftNodeCmd->>PSTORE: 获取self_snapshot_index
    PSTORE->>RaftNodeCmd: 返回self_snapshot_index
    RaftNodeCmd->>PRAFT: 调用DoSnapshot(self_snapshot_index)
    PRAFT-->>RaftNodeCmd: 快照完成
    RaftNodeCmd-->>User: 快照成功
Loading

Poem

在代码的世界里,我们舞动,
新的功能,悄然加入;
快照与存储,皆有提升,
兔子欢跃,系统更坚固。

🌟🚀代码更新,未来可期!🚀🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Outside diff range and nitpick comments (1)
src/praft/psnapshot.cc (1)

Line range hint 69-89: The snapshot generation logic is comprehensive. However, consider replacing the hardcoded last_log_index with a dynamically retrieved or configured value to avoid potential issues in different environments.

- auto last_log_index = 30000;  // @todo
+ auto last_log_index = PSTORE.GetBackend(db_id)->GetStorage()->GetSmallestFlushedLogIndex();  // Dynamic retrieval
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between eec3030 and dcdf232.

Files selected for processing (14)
  • cmake/braft.cmake (1 hunks)
  • cmake/openssl.cmake (2 hunks)
  • pikiwidb.conf (1 hunks)
  • save_load.sh (2 hunks)
  • src/CMakeLists.txt (1 hunks)
  • src/cmd_raft.cc (1 hunks)
  • src/db.cc (1 hunks)
  • src/praft/praft.cc (5 hunks)
  • src/praft/praft.h (2 hunks)
  • src/praft/psnapshot.cc (3 hunks)
  • src/storage/include/storage/storage.h (1 hunks)
  • src/storage/src/log_index.cc (2 hunks)
  • src/storage/src/redis.h (1 hunks)
  • src/storage/src/storage.cc (1 hunks)
Files skipped from review due to trivial changes (3)
  • cmake/openssl.cmake
  • pikiwidb.conf
  • src/CMakeLists.txt
Additional context used
Shellcheck
save_load.sh

[info] 6-6: Use ./glob or -- glob so names with dashes won't become options. (SC2035)


[info] 7-7: Use ./glob or -- glob so names with dashes won't become options. (SC2035)

Additional comments not posted (12)
save_load.sh (1)

14-16: The addition of repeated redis-benchmark and redis-cli commands appears to be for stress testing or functionality checks.

cmake/braft.cmake (1)

19-23: The change in GIT repository and tag points to a different branch for testing. Ensure that this is intended for permanent integration or revert after testing.

src/db.cc (1)

40-43: Switching from std::bind to a lambda expression for do_snapshot_function enhances readability and performance. Good use of modern C++ practices.

src/praft/psnapshot.cc (1)

Line range hint 69-89: Handling of files in the snapshot directory is appropriately managed to ensure only relevant files are considered.

src/praft/praft.h (1)

140-141: The addition of new methods and a member variable supports new functionalities and enhances the RAFT module's capabilities. Good use of default arguments for backward compatibility.

Also applies to: 170-170

src/storage/src/log_index.cc (1)

172-172: The addition of logging statements enhances system visibility and aids in debugging and monitoring. Good practice.

src/storage/src/redis.h (1)

113-115: Implementation of GetSmallestFlushedLogIndex looks correct.

This method correctly retrieves the smallest flushed log index, which is crucial for ensuring that the node's state is consistent with the persisted state.

src/praft/praft.cc (3)

245-252: Added methods GetTerm and GetLastLogIndex enhance the functionality of the PRaft class by providing access to specific log details.

Also applies to: 254-261


659-682: The logic in on_snapshot_load handles different startup scenarios effectively. However, ensure that the method GetSmallestFlushedLogIndex is implemented correctly and efficiently.

Verification successful

The method GetSmallestFlushedLogIndex is indeed implemented in the codebase. Here are the relevant locations:

  • Declaration: src/storage/include/storage/storage.h
  • Implementation: src/storage/src/storage.cc

This confirms that the method is implemented and used correctly.

  • src/storage/include/storage/storage.h: LogIndex GetSmallestFlushedLogIndex() const;
  • src/storage/src/storage.cc: LogIndex Storage::GetSmallestFlushedLogIndex() const {
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the implementation of `GetSmallestFlushedLogIndex`.

# Test: Search for the implementation and usage of `GetSmallestFlushedLogIndex`. Expect: Efficient and correct implementation.
ast-grep --lang cpp --pattern $'class Storage { $$$ LogIndex GetSmallestFlushedLogIndex() const { $$$ } $$$ }' src/storage/include/storage/storage.h

Length of output: 494



Script:

#!/bin/bash
# Description: Verify the implementation of `GetSmallestFlushedLogIndex`.

# Test: Search for the implementation and usage of `GetSmallestFlushedLogIndex`. Expect: Efficient and correct implementation.
# Refine the search pattern and broaden the search scope.
ast-grep --lang cpp --pattern $'LogIndex GetSmallestFlushedLogIndex() const { $$$ }' 

Length of output: 85



Script:

#!/bin/bash
# Description: Verify the implementation of `GetSmallestFlushedLogIndex`.

# Test: Search for the implementation and usage of `GetSmallestFlushedLogIndex`. Expect: Efficient and correct implementation.
# Broaden the search to include potential variations of the method name and search across all files.
rg --glob '!*.md' 'GetSmallestFlushedLogIndex'

Length of output: 869


154-154: Ensure the path construction uses secure and efficient string handling.

src/storage/include/storage/storage.h (1)

1126-1127: Add documentation for GetSmallestFlushedLogIndex.

It's good practice to document public methods, especially for complex functionalities like this one. Consider adding a brief comment explaining what "smallest flushed log index" refers to and how it should be used.

src/storage/src/storage.cc (1)

2520-2527: The implementation of GetSmallestFlushedLogIndex method is correct and efficient.

src/cmd_raft.cc Outdated
Comment on lines 125 to 127
// @todo need to get self_snapshot_index
// auto self_snapshot_index = PSTORE.GetBackend(client->GetCurrentDB())->GetStorage()->GetSmallestFlushedLogIndex();
auto s = PRAFT.DoSnapshot(); // self_snapshot_index
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete the implementation for self_snapshot_index or clarify the TODO.

The commented-out code suggests that self_snapshot_index is intended to be used but is not implemented. Would you like assistance in completing this part of the code?

@@ -9,6 +9,7 @@

#include <cassert>

#include "braft/raft.h"
Copy link

@coderabbitai coderabbitai bot Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip

Codebase Verification

The verification process did not find any direct or indirect usage of entities from braft/raft.h in src/praft/praft.cc. The references to raft found were in comments and error messages, which do not necessitate the inclusion of the header file.

  • Lines: 12-12
  • File: src/praft/praft.cc
Analysis chain

Ensure the newly added #include "braft/raft.h" is necessary.


Let's further investigate if there are any indirect usages or dependencies that might require the inclusion of braft/raft.h.

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify if the newly added include is used in the file.

# Test: Search for usage of entities from `braft/raft.h`. Expect: At least one usage.
ast-grep --lang cpp --pattern $'$_ { $$$ braft::raft::$$$ }' src/praft/praft.cc

Length of output: 80



Script:

#!/bin/bash
# Description: Verify if there are any indirect usages or dependencies that might require the inclusion of `braft/raft.h`.

# Test 1: Search for any usage of `braft` namespace in the file.
ast-grep --lang cpp --pattern $'$_ { $$$ braft::$$$ }' src/praft/praft.cc

# Test 2: Search for any usage of `raft` keyword in the file.
rg --word-regexp 'raft' src/praft/praft.cc

Length of output: 343

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

兔子 留言说这个头文件没用到,却引入了,请确认

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexStocks 经过验证,#include "braft/raft.h" 头文件在 src/praft/praft.cc 中没有被直接使用。仅在注释和错误消息中出现了 raft 关键字。因此,可以安全地移除这个头文件。

Analysis chain
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify if the newly added include is used in the file.

# Test: Search for usage of entities from `braft/raft.h`. Expect: At least one usage.
ast-grep --lang cpp --pattern $'$_ { $$$ braft::raft::$$$ }' src/praft/praft.cc

# Test 1: Search for any usage of `braft` namespace in the file.
ast-grep --lang cpp --pattern $'$_ { $$$ braft::$$$ }' src/praft/praft.cc

# Test 2: Search for any usage of `raft` keyword in the file.
rg --word-regexp 'raft' src/praft/praft.cc

Length of output: 424

@@ -1,4 +1,5 @@
#!/bin/bash

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip

Codebase Verification

The save_load.sh script contains glob patterns (rm -fr *) that could potentially be affected by filenames with dashes. To prevent filenames with dashes from being interpreted as options, consider using ./* instead of *.

  • Line 2: rm -fr * should be updated to rm -fr ./ *
  • Line 7: rm -fr * should be updated to rm -fr ./ *
Analysis chain

Consider using ./ or -- with glob patterns to prevent filenames with dashes from being interpreted as options.

Also applies to: 7-7

Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify if filenames with dashes are being interpreted as options.

# Test: Check if filenames with dashes exist and could potentially be interpreted as options.
find . -name "*-*" -exec echo "Filename with dash found: {}" \;

Length of output: 3925



Script:

#!/bin/bash
# Description: Check for glob patterns in save_load.sh that could be affected by filenames with dashes.

# Test: Search for glob patterns in save_load.sh
grep -E '\*|\?' save_load.sh

Length of output: 217

Comment on lines +14 to +21
#include "braft/snapshot.h"
#include "butil/files/file_path.h"

#include "pstd/log.h"
#include "pstd/pstd_string.h"

#include "config.h"
#include "praft.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additions and changes to snapshot path parsing are necessary for the new functionality. Consider adding unit tests to cover potential edge cases in the parsing logic.

Would you like me to help in creating unit tests for the snapshot path parsing logic?

Also applies to: 31-51

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between dcdf232 and 6f9f6fb.

Files selected for processing (1)
  • src/praft/praft.cc (5 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/praft/praft.cc

@AlexStocks
Copy link
Collaborator

@CodiumAI-Agent /improve

@CodiumAI-Agent
Copy link

PR Code Suggestions ✨

CategorySuggestion                                                                                                                                    Score
Possible bug
Ensure the snapshot index is correctly retrieved and used in the snapshot operation

The line where DoSnapshot is called uses a placeholder comment for self_snapshot_index but
does not implement the retrieval of this index. It's crucial to ensure the snapshot index
is correctly retrieved and passed to the DoSnapshot method to maintain data consistency.

src/cmd_raft.cc [127]

-auto s = PRAFT.DoSnapshot();  // self_snapshot_index
+auto self_snapshot_index = PSTORE.GetBackend(client->GetCurrentDB())->GetStorage()->GetSmallestFlushedLogIndex();
+auto s = PRAFT.DoSnapshot(self_snapshot_index);
 
Suggestion importance[1-10]: 10

Why: This suggestion addresses a critical issue where the snapshot index is not retrieved and passed correctly, which can lead to data inconsistency. Implementing this change ensures the snapshot operation is performed accurately.

10
Enhancement
Replace the hardcoded last log index with a dynamically retrieved value

The snapshot meta update process uses a hardcoded last_log_index value. This should be
dynamically retrieved to ensure the snapshot reflects the current state accurately.

src/praft/psnapshot.cc [84]

-auto last_log_index = 30000;  // @todo PSTORE.GetBackend(db_id)->GetStorage()->GetSmallestFlushedLogIndex();
+auto last_log_index = PSTORE.GetBackend(db_id)->GetStorage()->GetSmallestFlushedLogIndex();
 
Suggestion importance[1-10]: 9

Why: Replacing the hardcoded value with a dynamically retrieved value ensures that the snapshot accurately reflects the current state, which is crucial for data integrity and correctness.

9
Possible issue
Initialize the status object with a default value to ensure consistent behavior

The method DoSnapshot should handle the case where is_sync is false more robustly.
Currently, it initializes a butil::Status object but does not set any value before
returning it, which could lead to undefined behavior or incorrect status handling.

src/praft/praft.cc [551-552]

-butil::Status status;
+butil::Status status = butil::Status::OK();
 return status;
 
Suggestion importance[1-10]: 8

Why: Initializing the status object with a default value ensures consistent behavior and prevents potential undefined behavior or incorrect status handling, which is important for robustness.

8
Best practice
Add checks to ensure only valid log indices are considered when determining the smallest flushed log index

The method GetSmallestFlushedLogIndex iterates over instances and uses std::min without
checking if the instances are valid or if GetSmallestFlushedLogIndex returns a meaningful
value. It's important to add error handling or checks to ensure the value is valid.

src/storage/src/storage.cc [2523]

-smallest_flushed_log_index = std::min(smallest_flushed_log_index, inst->GetSmallestFlushedLogIndex());
+auto inst_index = inst->GetSmallestFlushedLogIndex();
+if (inst_index != INT64_MAX) {
+  smallest_flushed_log_index = std::min(smallest_flushed_log_index, inst_index);
+}
 
Suggestion importance[1-10]: 7

Why: Adding checks to ensure only valid log indices are considered improves the robustness and reliability of the method, although it addresses a less critical issue compared to the other suggestions.

7

@AlexStocks
Copy link
Collaborator

@CodiumAI-Agent /review

@CodiumAI-Agent
Copy link

PR Review 🔍

⏱️ Estimated effort to review [1-5]

4, due to the complexity of the changes involving snapshot management, log indexing, and integration with the raft protocol. The PR modifies critical parts of the system which require careful review to ensure correctness and stability.

🧪 Relevant tests

No

⚡ Possible issues

Possible Bug: The method PRaft::DoSnapshot uses a synchronous closure only when is_sync is true, but does not handle the asynchronous case properly. The method returns an uninitialized butil::Status which might lead to undefined behavior.

Possible Bug: The snapshot path parsing in PPosixFileSystemAdaptor::open assumes directory structure which might not be robust against changes or errors in path formatting.

🔒 Security concerns

No

Code feedback:
relevant filesrc/praft/praft.cc
suggestion      

Initialize butil::Status before returning in the asynchronous branch of PRaft::DoSnapshot. This avoids returning an uninitialized status which can lead to undefined behavior. [important]

relevant linereturn status;

relevant filesrc/praft/psnapshot.cc
suggestion      

Validate db_id after parsing from the snapshot path to ensure it is a valid database identifier before using it. This adds a layer of error checking that can prevent potential runtime errors. [important]

relevant lineassert(db_id >= 0);

relevant filesrc/praft/praft.cc
suggestion      

Consider handling the case where node_ is null in PRaft::GetLastLogIndex and PRaft::GetTerm by returning a default value or an error status, instead of just logging an error. This can prevent further undefined behavior in the system. [important]

relevant lineERROR("Node is not initialized");

relevant filesrc/praft/psnapshot.cc
suggestion      

Add error handling for the failure of snapshot_meta_memtable.save_to_file in PPosixFileSystemAdaptor::open. Logging the error is good, but the system should also handle the error appropriately, possibly by retrying or aborting the operation. [important]

relevant lineif (rc == 0) {

@AlexStocks
Copy link
Collaborator

@CodiumAI-Agent /improve

@CodiumAI-Agent
Copy link

PR Code Suggestions ✨

CategorySuggestion                                                                                                                                    Score
Possible bug
Add error handling for uninitialized db_id after parsing the snapshot path

Add error handling for the scenario where db_id remains uninitialized (-1) after parsing
the snapshot path, which could lead to incorrect behavior or crashes.

src/praft/psnapshot.cc [69]

-assert(db_id >= 0);
+if (db_id < 0) {
+    ERROR("Failed to parse db_id from snapshot path");
+    return -1;
+}
 
Suggestion importance[1-10]: 10

Why: Adding error handling for an uninitialized db_id is crucial to prevent potential crashes or incorrect behavior. This suggestion addresses a significant potential bug and improves the robustness of the code.

10
Initialize the self_snapshot_index properly before using it in DoSnapshot()

Ensure that the self_snapshot_index is properly initialized before calling DoSnapshot().
Currently, the code is marked with a @todo comment and uses a placeholder, which might
lead to incorrect behavior or runtime errors.

src/cmd_raft.cc [125-127]

-// @todo need to get self_snapshot_index
-// auto self_snapshot_index = PSTORE.GetBackend(client->GetCurrentDB())->GetStorage()->GetSmallestFlushedLogIndex();
-auto s = PRAFT.DoSnapshot();  // self_snapshot_index
+auto self_snapshot_index = PSTORE.GetBackend(client->GetCurrentDB())->GetStorage()->GetSmallestFlushedLogIndex();
+auto s = PRAFT.DoSnapshot(self_snapshot_index);
 
Suggestion importance[1-10]: 9

Why: This suggestion addresses a potential bug where self_snapshot_index is not initialized, which could lead to incorrect behavior or runtime errors. Proper initialization is crucial for the correct functioning of the snapshot process.

9
Enhancement
Use a dynamically retrieved last_log_index for setting snapshot metadata

Replace the hard-coded last_log_index with a dynamic value retrieved from the storage
system to ensure the snapshot metadata is accurate and up-to-date.

src/praft/praft.cc [84-85]

-auto last_log_index = 30000;  // @todo PSTORE.GetBackend(db_id)->GetStorage()->GetSmallestFlushedLogIndex();
+auto last_log_index = PSTORE.GetBackend(db_id)->GetStorage()->GetSmallestFlushedLogIndex();
 new_meta.set_last_included_index(last_log_index);
 
Suggestion importance[1-10]: 8

Why: This suggestion improves the accuracy and reliability of the snapshot metadata by using a dynamically retrieved last_log_index instead of a hard-coded value. This ensures that the snapshot reflects the current state of the system.

8
Performance
Optimize the loop for calculating the smallest flushed log index by adding an early exit condition

Optimize the calculation of smallest_flushed_log_index by early exiting the loop if the
smallest possible index is found, reducing unnecessary iterations.

src/storage/src/storage.cc [2522-2523]

 for (auto& inst : insts_) {
     smallest_flushed_log_index = std::min(smallest_flushed_log_index, inst->GetSmallestFlushedLogIndex());
+    if (smallest_flushed_log_index == 0) break;  // Early exit if smallest possible index is found
 }
 
Suggestion importance[1-10]: 6

Why: This suggestion offers a minor performance improvement by adding an early exit condition to the loop. While it optimizes the code, the impact is relatively small compared to other suggestions.

6

@@ -66,6 +79,14 @@ braft::FileAdaptor* PPosixFileSystemAdaptor::open(const std::string& path, int o
PSTORE.HandleTaskSpecificDB(tasks);
AddAllFiles(snapshot_path, &snapshot_meta_memtable, snapshot_path);

// update snapshot last log index and last_log_term
auto& new_meta = const_cast<braft::SnapshotMeta&>(snapshot_meta_memtable.meta());
auto last_log_index = 30000; // @todo PSTORE.GetBackend(db_id)->GetStorage()->GetSmallestFlushedLogIndex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这行是不是忘换了

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯嗯 忘记换了

*/
// @todo GetSmallestFlushedLogIndex
uint64_t replay_point = PSTORE.GetBackend(db_id_)->GetStorage()->GetSmallestFlushedLogIndex();
node_->set_self_playback_point(replay_point);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node_->set_self_playback_point(replay_point+1)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里 set_self_playback_point 设置的 last_applied_index,所以就是 replay_point

src/praft/praft.cc Outdated Show resolved Hide resolved
src/praft/praft.cc Outdated Show resolved Hide resolved
src/praft/praft.cc Outdated Show resolved Hide resolved
@@ -137,6 +137,8 @@ class PRaft : public braft::StateMachine {
std::string GetGroupID() const;
braft::NodeStatus GetNodeStatus() const;
butil::Status GetListPeers(std::vector<braft::PeerId>* peers);
uint64_t GetTerm(uint64_t log_index);
uint64_t GetLastLogIndex(bool is_flush = false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如上所示,把 uint64_t 改为 LogIndex

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/praft/praft.cc Outdated Show resolved Hide resolved
src/praft/praft.cc Outdated Show resolved Hide resolved
src/praft/praft.cc Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (1)
src/praft/psnapshot.cc (1)

验证失败: db_id 在快照元数据更新逻辑中未正确初始化。

  • src/praft/psnapshot.cc 文件中,db_id 被初始化为 -1,但没有其他赋值操作。
  • 这可能导致 assert(db_id >= 0); 断言失败,并引发潜在问题。

请检查并确保 db_id 在使用前被正确初始化。

Analysis chain

Line range hint 69-89:
验证快照元数据更新逻辑

确保 db_id 被正确初始化并用于更新快照元数据。

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that db_id is correctly initialized and used in snapshot metadata update.

# Test: Search for db_id initialization and usage. Expect: Proper initialization and usage in snapshot metadata update.
rg --type cpp -A 5 'db_id ='

Length of output: 1375


Script:

#!/bin/bash
# Description: Trace the assignment of db_id in src/praft/psnapshot.cc to ensure it is set to a valid value before usage.

# Search for assignments to db_id in src/praft/psnapshot.cc
rg 'db_id\s*=\s*' -A 5 src/praft/psnapshot.cc

Length of output: 212

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6f9f6fb and e46d213.

Files selected for processing (8)
  • CMakeLists.txt (1 hunks)
  • cmake/braft.cmake (1 hunks)
  • pikiwidb.conf (2 hunks)
  • src/cmd_raft.cc (1 hunks)
  • src/praft/CMakeLists.txt (1 hunks)
  • src/praft/praft.cc (6 hunks)
  • src/praft/praft.h (3 hunks)
  • src/praft/psnapshot.cc (3 hunks)
Files skipped from review due to trivial changes (2)
  • CMakeLists.txt
  • cmake/braft.cmake
Additional comments not posted (14)
src/praft/CMakeLists.txt (1)

38-38: 更改已批准

添加 src/storage/include 目录是必要的,以便访问存储相关的头文件。

src/praft/psnapshot.cc (3)

14-14: 更改已批准

添加 braft/snapshot.h 是必要的,以便处理快照相关的功能。


18-18: 更改已批准

添加 pstd/pstd_string.h 是必要的,以便处理字符串相关的功能。


21-21: 更改已批准

添加 praft.h 是必要的,以便访问 PRaft 类的功能。

src/praft/praft.h (3)

21-21: 更改已批准

添加 storage/storage.h 是必要的,以便访问存储相关的功能。


141-142: 更改已批准

新方法 GetTermGetLastLogIndex 是必要的,以便检索日志索引和任期。


171-171: 更改已批准

新成员变量 is_node_first_start_up_ 是必要的,以跟踪节点启动状态。

pikiwidb.conf (2)

322-322: 确认减少 db-instance-num 的原因

db-instance-num 从 3 改为 1 可能会影响数据库性能。请确认这样做的原因。


346-346: 确认启用 RAFT 的影响

启用 use-raft 选项会对系统的分布式一致性产生重大影响。请确认所有相关配置已正确设置,并且系统已准备好启用 RAFT。

src/praft/praft.cc (5)

245-252: 验证 GetTerm 方法的错误处理

请确保 GetTerm 方法在节点未初始化时正确处理错误,并确认返回类型是否合适。


254-261: 验证 GetLastLogIndex 方法的错误处理

请确保 GetLastLogIndex 方法在节点未初始化时正确处理错误,并确认返回类型是否合适。


543-552: 验证 DoSnapshot 方法的重构

请确保 DoSnapshot 方法在处理同步和异步快照操作时正确处理所有情况。


657-687: 验证节点启动和快照加载的逻辑

请确保新逻辑正确区分正常启动和快照加载,并正确处理重放点。


12-12: 确认包含 braft/raft.h 的必要性

请确认 #include "braft/raft.h" 是否在本文件中有直接或间接的使用。

Comment on lines +31 to +51
int db_id = -1;

// parse snapshot path
butil::FilePath parse_snapshot_path(path);
std::vector<std::string> components;
bool is_find_db = false;
parse_snapshot_path.GetComponents(&components);
for (auto component : components) {
for (const auto& component : components) {
snapshot_path += component + "/";

if (is_find_db && pstd::String2int(component, &db_id)) {
is_find_db = false;
}

if (component.find("snapshot_") != std::string::npos) {
break;
} else if (component == "db") {
is_find_db = true;
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议:为快照路径解析逻辑添加单元测试

快照路径解析逻辑对于新功能是必要的。建议添加单元测试以覆盖潜在的边界情况。

需要我帮助创建快照路径解析逻辑的单元测试吗?

Comment on lines +125 to +127
auto self_snapshot_index = PSTORE.GetBackend(client->GetCurrentDB())->GetStorage()->GetSmallestFlushedLogIndex();
INFO("DoCmdSnapshot self_snapshot_index:{}", self_snapshot_index);
auto s = PRAFT.DoSnapshot(self_snapshot_index);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

更改已批准

检索并使用 self_snapshot_index 是必要的,以便处理新的快照功能。

需要帮助完成 self_snapshot_index 的实现吗?

注释的代码表明 self_snapshot_index 计划被使用,但尚未实现。需要我帮助完成这部分代码吗?

GIT_REPOSITORY "https://github.com/pikiwidb/braft.git"
GIT_TAG master
GIT_TAG v1.1.2-alpha1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改为 v1.1.2-alpha2

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯嗯

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e46d213 and 929a5db.

Files selected for processing (6)
  • cmake/braft.cmake (1 hunks)
  • pikiwidb.conf (1 hunks)
  • src/db.cc (3 hunks)
  • src/storage/src/log_index.h (1 hunks)
  • src/storage/src/redis.h (1 hunks)
  • src/storage/src/storage.cc (2 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cmake/braft.cmake
Additional comments not posted (5)
src/db.cc (1)

97-103: 处理旧存储的逻辑

在打开新存储之前关闭并重置旧存储的处理逻辑是合理的。这确保了资源得到正确释放,避免了潜在的资源泄露问题。

src/storage/src/log_index.h (1)

137-137: 增强常量正确性

通过将 GetLastFlushIndex 方法返回类型改为常量引用,并将方法标记为 const,增强了代码的常量正确性。这有助于防止不必要的修改并可能提高性能,因为避免了不必要的数据复制。

pikiwidb.conf (1)

322-322: 调整数据库实例数量

db-instance-num 从 3 改为 1,这可能是为了简化配置或优化某些环境的性能。确保这一改变与系统需求一致,并且不会引入配置错误。

src/storage/src/redis.h (1)

113-113: 新增方法获取最小刷新日志索引

Redis 类中新增 GetSmallestFlushedLogIndex 方法,用于获取最小刷新日志索引。这对于确保数据一致性和优化日志管理非常重要。确保该方法正确实现,并与现有功能良好集成。

src/storage/src/storage.cc (1)

2519-2526: 审查新增的 GetSmallestFlushedLogIndex 方法

此方法用于计算最小的已刷新日志索引。代码逻辑看起来正确,使用了 INT64_MAX 作为初始值,并在所有实例中找到最小值。这种方法是有效的,因为它确保了即使在极端情况下也能找到正确的最小日志索引。

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (1)
src/praft/praft.cc (1)

Line range hint 113-153: 优化初始化逻辑和错误处理

函数 PRaft::Init 在初始化过程中多次调用 server_.reset() 来处理错误状态。建议将这些重复的清理逻辑封装到一个单独的函数中,以提高代码的可维护性和可读性。

- server_.reset();
+ resetServerAndNode();

并在类中添加一个新的成员函数来处理重置逻辑。

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 929a5db and 8e56daf.

Files selected for processing (2)
  • src/praft/praft.cc (5 hunks)
  • src/storage/include/storage/storage.h (1 hunks)
Additional comments not posted (1)
src/storage/include/storage/storage.h (1)

1126-1127: 新增方法审查:GetSmallestFlushedLogIndex

此方法似乎用于获取最小的已刷新日志索引,这对于区分正常节点启动和快照加载非常关键。建议添加适当的文档注释来描述此方法的用途、预期行为以及任何可能的边界条件或异常情况。

Comment on lines +657 to +679
if (is_node_first_start_up_) {
// get replay point of one db's
/*
1. When a node starts normally, because all memory data is flushed to disks and
snapshots are truncated to the latest, the flush-index and apply-index are the
same when the node starts, so the maximum log index should be obtained.
2. When a node is improperly shut down and restarted, the minimum flush-index should
be obtained as the starting point for fault recovery.
*/
uint64_t replay_point = PSTORE.GetBackend(db_id_)->GetStorage()->GetSmallestFlushedLogIndex();
node_->set_self_playback_point(replay_point);
is_node_first_start_up_ = false;
INFO("set replay_point: {}", replay_point);

/*
If a node has just joined the cluster and does not have any data,
it does not load the local snapshot at startup. Therefore,
LoadDBFromCheckPoint is required after loading the snapshot from the leader.
*/
if (GetLastLogIndex() != 0) {
return 0;
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

优化快照加载逻辑

函数 PRaft::on_snapshot_load 使用布尔标志 is_node_first_start_up_ 来区分首次启动。考虑使用状态枚举而不是布尔值,以清晰地表示节点的不同状态。

- bool is_node_first_start_up_;
+ enum class NodeStartupState { First, Subsequent };
+ NodeStartupState startup_state_;

在逻辑中相应地替换这个状态的检查和设置。

Committable suggestion was skipped due to low confidence.

Comment on lines +244 to +260
storage::LogIndex PRaft::GetTerm(uint64_t log_index) {
if (!node_) {
ERROR("Node is not initialized");
return 0;
}

return node_->get_term(log_index);
}

storage::LogIndex PRaft::GetLastLogIndex(bool is_flush) {
if (!node_) {
ERROR("Node is not initialized");
return 0;
}

return node_->get_last_log_index(is_flush);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

检查默认返回值和错误处理

函数 PRaft::GetTermPRaft::GetLastLogIndex 在节点未初始化时返回 0。考虑使用一个明确的错误代码或异常处理机制,而不是返回一个可能会被误解的魔术数字。

- return 0;
+ throw std::runtime_error("Node is not initialized");
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
storage::LogIndex PRaft::GetTerm(uint64_t log_index) {
if (!node_) {
ERROR("Node is not initialized");
return 0;
}
return node_->get_term(log_index);
}
storage::LogIndex PRaft::GetLastLogIndex(bool is_flush) {
if (!node_) {
ERROR("Node is not initialized");
return 0;
}
return node_->get_last_log_index(is_flush);
}
storage::LogIndex PRaft::GetTerm(uint64_t log_index) {
if (!node_) {
ERROR("Node is not initialized");
throw std::runtime_error("Node is not initialized");
}
return node_->get_term(log_index);
}
storage::LogIndex PRaft::GetLastLogIndex(bool is_flush) {
if (!node_) {
ERROR("Node is not initialized");
throw std::runtime_error("Node is not initialized");
}
return node_->get_last_log_index(is_flush);
}

@@ -66,6 +79,14 @@ braft::FileAdaptor* PPosixFileSystemAdaptor::open(const std::string& path, int o
PSTORE.HandleTaskSpecificDB(tasks);
AddAllFiles(snapshot_path, &snapshot_meta_memtable, snapshot_path);

// update snapshot last log index and last_log_term
auto& new_meta = const_cast<braft::SnapshotMeta&>(snapshot_meta_memtable.meta());
auto last_log_index = PSTORE.GetBackend(db_id)->GetStorage()->GetSmallestFlushedLogIndex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已经维护了一个 last_flush_index_ ,这里使用 GetLastFlushIndex 可以少很多计算。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯嗯,已经用了

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8e56daf and deedd3c.

Files selected for processing (5)
  • cmake/braft.cmake (1 hunks)
  • cmake/openssl.cmake (1 hunks)
  • src/CMakeLists.txt (1 hunks)
  • src/db.cc (3 hunks)
  • src/praft/CMakeLists.txt (1 hunks)
Files skipped from review due to trivial changes (2)
  • cmake/openssl.cmake
  • src/CMakeLists.txt
Files skipped from review as they are similar to previous changes (2)
  • cmake/braft.cmake
  • src/praft/CMakeLists.txt

Comment on lines +97 to +103
// close the old storage, then open the new storage
std::unique_ptr<storage::Storage> old_storage = std::move(storage_);
if (old_storage != nullptr) {
old_storage->Close();
old_storage.reset();
}
storage_ = std::make_unique<storage::Storage>();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议:在 DB::LoadDBFromCheckpoint 中使用 Lambda 表达式。

为了保持一致性,建议将 storage_options.do_snapshot_function 中的 std::bind 替换为 Lambda 表达式。

- storage_options.do_snapshot_function =
-     std::bind(&pikiwidb::PRaft::DoSnapshot, &pikiwidb::PRAFT, std::placeholders::_1, std::placeholders::_2);
+ storage_options.do_snapshot_function = [raft = &pikiwidb::PRAFT](auto&& self_snapshot_index, auto&& is_sync) {
+   raft->DoSnapshot(std::forward<decltype(self_snapshot_index)>(self_snapshot_index),
+                    std::forward<decltype(is_sync)>(is_sync));
+ };

Committable suggestion was skipped due to low confidence.

Comment on lines +40 to +43
storage_options.do_snapshot_function = [raft = &pikiwidb::PRAFT](auto&& self_snapshot_index, auto&& is_sync) {
raft->DoSnapshot(std::forward<decltype(self_snapshot_index)>(self_snapshot_index),
std::forward<decltype(is_sync)>(is_sync));
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议:在 Lambda 表达式中使用值捕获。

使用引用捕获 raft 可能导致潜在的生命周期问题,尤其是在 raft 对象的生命周期可能比 storage_options 短的场景中。建议改为值捕获或确保 raft 的生命周期总是足够长。

- storage_options.do_snapshot_function = [raft = &pikiwidb::PRAFT](auto&& self_snapshot_index, auto&& is_sync) {
+ storage_options.do_snapshot_function = [raft = pikiwidb::PRAFT](auto&& self_snapshot_index, auto&& is_sync) {
  raft->DoSnapshot(std::forward<decltype(self_snapshot_index)>(self_snapshot_index),
                   std::forward<decltype(is_sync)>(is_sync));
};
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
storage_options.do_snapshot_function = [raft = &pikiwidb::PRAFT](auto&& self_snapshot_index, auto&& is_sync) {
raft->DoSnapshot(std::forward<decltype(self_snapshot_index)>(self_snapshot_index),
std::forward<decltype(is_sync)>(is_sync));
};
storage_options.do_snapshot_function = [raft = pikiwidb::PRAFT](auto&& self_snapshot_index, auto&& is_sync) {
raft->DoSnapshot(std::forward<decltype(self_snapshot_index)>(self_snapshot_index),
std::forward<decltype(is_sync)>(is_sync));
};

@AlexStocks AlexStocks merged commit 2848690 into OpenAtomFoundation:unstable Jul 20, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✏️ Feature New feature or request 🧹 Updates This will not be worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants