Data loss caused by repeated restarts? #280

Open
zouyonghao opened this issue Apr 26, 2021 · 1 comment

Comments

@zouyonghao
Contributor

After restarting different nodes multiple times, one of the nodes reported the following errors:

F0417 12:13:43.014115 108347 /home/zyh/braft/src/braft/log_manager.cpp:310] Can't truncate logs before _applied_id=1, last_log_kept=0
F0417 12:13:43.043589 108347 /home/zyh/braft/src/braft/configuration_manager.cpp:24] Check failed: false. Did you forget to call truncate_suffix before  the last log index goes back
#0 0x000000b76fd0 logging::LogMessage::~LogMessage()
#1 0x000000b56111 braft::ConfigurationManager::add()
#2 0x00000092cd43 braft::LogManager::append_entries()
#3 0x0000009cc7d4 braft::NodeImpl::handle_append_entries_request()
#4 0x000000a406e1 braft::RaftServiceImpl::append_entries()
#5 0x0000008907ca braft::RaftService::CallMethod()
#6 0x000000bfb833 brpc::policy::ProcessRpcRequest()
#7 0x000000bef587 brpc::ProcessInputMessage()
#8 0x000000bf0442 brpc::InputMessenger::OnNewMessages()
#9 0x000000cb3dfd brpc::Socket::ProcessEvent()
#10 0x000000bb575f bthread::TaskGroup::task_runner()
#11 0x000000d4b051 bthread_make_fcontext

F0417 12:13:43.043766 108347 /home/zyh/braft/src/braft/log.cpp:720] There's gap between appending entries and _last_log_index path: ./data2/log

Another node produced logs similar to those in #279.

As a result, the Jepsen-style operation_log generated by the test framework looks like this:

[{:process 1362142777, :type :invoke, :f :cas, :value [458004641 3039794309 ]},
{:process 1362142777, :type :fail, :f :cas, :value [458004641 3039794309 ]},
{:process 1353750073, :type :invoke, :f :cas, :value [1772247958 1358966402 ]},
{:process 1353750073, :type :fail, :f :cas, :value [1772247958 1358966402 ]},
{:process 1353750073, :type :invoke, :f :write, :value 3876838356},
{:process 1353750073, :type :ok, :f :write, :value 3876838356},
{:process 1362142777, :type :invoke, :f :read, :value nil},
{:process 1353750073, :type :invoke, :f :write, :value 395967576},
{:process 1353750073, :type :ok, :f :write, :value 395967576},
{:process 1362142777, :type :ok, :f :read, :value nil},
{:process 1362142777, :type :invoke, :f :write, :value 450086335},
{:process 1362142777, :type :fail, :f :write, :value 450086335},
{:process 1362142777, :type :invoke, :f :read, :value nil},
{:process 1362142777, :type :ok, :f :read, :value 0},
]

As you can see, the last read returned 0, even though earlier writes had succeeded.
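The anomaly can also be checked mechanically. Below is a minimal Python sketch, not part of the test framework used in this issue: the history is transcribed by hand into dictionaries, and the function name `find_lost_write_reads` is hypothetical. It assumes that a `cas` value is `[old new]`, and that a read reported as `nil` carries no value and can be skipped (the issue does not say what `nil` means here). It only tests one necessary condition: once some write has been acknowledged before a read is invoked, that read must return a value that a write or cas actually tried to store.

```python
def op(process, kind, f, value):
    """Build one history entry mirroring the :process/:type/:f/:value keys."""
    return {"process": process, "type": kind, "f": f, "value": value}


# Hand-transcribed copy of the operation_log above (nil becomes None).
HISTORY = [
    op(1362142777, "invoke", "cas", [458004641, 3039794309]),
    op(1362142777, "fail", "cas", [458004641, 3039794309]),
    op(1353750073, "invoke", "cas", [1772247958, 1358966402]),
    op(1353750073, "fail", "cas", [1772247958, 1358966402]),
    op(1353750073, "invoke", "write", 3876838356),
    op(1353750073, "ok", "write", 3876838356),
    op(1362142777, "invoke", "read", None),
    op(1353750073, "invoke", "write", 395967576),
    op(1353750073, "ok", "write", 395967576),
    op(1362142777, "ok", "read", None),
    op(1362142777, "invoke", "write", 450086335),
    op(1362142777, "fail", "write", 450086335),
    op(1362142777, "invoke", "read", None),
    op(1362142777, "ok", "read", 0),
]


def find_lost_write_reads(history):
    """Return (index, value) for reads that cannot match any attempted write.

    A read is flagged when at least one write was acknowledged before the
    read was invoked, yet the value it returned was never written by anyone.
    """
    written = set()       # every value a write or cas has tried to store
    acked_writes = 0      # writes acknowledged with "ok" so far
    acked_at_invoke = {}  # process -> acked_writes when its read was invoked
    suspicious = []
    for index, entry in enumerate(history):
        kind, f, value = entry["type"], entry["f"], entry["value"]
        if kind == "invoke":
            if f == "write":
                written.add(value)
            elif f == "cas":
                written.add(value[1])  # assumption: value is [old, new]
            elif f == "read":
                acked_at_invoke[entry["process"]] = acked_writes
        elif kind == "ok":
            if f == "write":
                acked_writes += 1
            elif f == "read":
                seen = acked_at_invoke.pop(entry["process"], 0)
                # Assumption: a None read reports no value, so skip it.
                if value is not None and seen > 0 and value not in written:
                    suspicious.append((index, value))
    return suspicious


if __name__ == "__main__":
    print(find_lost_write_reads(HISTORY))
```

On this history the sketch flags only the final read (index 13, value 0). It is far weaker than a full linearizability checker, but it is enough to show that the returned 0 cannot come from any write in the log.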

@PFZheng
Collaborator

PFZheng commented Apr 27, 2021

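“_applied_id=1, last_log_kept=0” means that all of the logs are about to be discarded, which looks like a split-brain case. To diagnose this, we need the values of the sync parameters and the detailed logs from every node.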
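To illustrate why that log line is fatal, here is a toy Python model of the invariant it reports. This is not braft's code: the function name `can_truncate_suffix` and its parameters are hypothetical, chosen to echo the fields in the message. It relies only on the general Raft rule that entries already applied to the state machine must never be removed from the log.

```python
def can_truncate_suffix(applied_index: int, last_index_kept: int) -> bool:
    """Return True if dropping every entry after last_index_kept is safe.

    Entries up to applied_index have already been applied to the state
    machine, so a truncation must keep at least that many entries.
    """
    return last_index_kept >= applied_index


if __name__ == "__main__":
    # The values from the fatal log line: index 1 is applied, yet the
    # request would keep nothing, so the truncation must be refused.
    print(can_truncate_suffix(applied_index=1, last_index_kept=0))
```

A follower that receives such a request from a leader has a log that disagrees with the leader at an already applied index, which is why the comment above suspects split brain.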
