Data loss caused by repeated restarts? #280

zouyonghao · 2021-04-26T10:04:50Z

After restarting different nodes multiple times, one of the nodes reported the following errors:

F0417 12:13:43.014115 108347 /home/zyh/braft/src/braft/log_manager.cpp:310] Can't truncate logs before _applied_id=1, last_log_kept=0
F0417 12:13:43.043589 108347 /home/zyh/braft/src/braft/configuration_manager.cpp:24] Check failed: false. Did you forget to call truncate_suffix before  the last log index goes back
#0 0x000000b76fd0 logging::LogMessage::~LogMessage()
#1 0x000000b56111 braft::ConfigurationManager::add()
#2 0x00000092cd43 braft::LogManager::append_entries()
#3 0x0000009cc7d4 braft::NodeImpl::handle_append_entries_request()
#4 0x000000a406e1 braft::RaftServiceImpl::append_entries()
#5 0x0000008907ca braft::RaftService::CallMethod()
#6 0x000000bfb833 brpc::policy::ProcessRpcRequest()
#7 0x000000bef587 brpc::ProcessInputMessage()
#8 0x000000bf0442 brpc::InputMessenger::OnNewMessages()
#9 0x000000cb3dfd brpc::Socket::ProcessEvent()
#10 0x000000bb575f bthread::TaskGroup::task_runner()
#11 0x000000d4b051 bthread_make_fcontext

F0417 12:13:43.043766 108347 /home/zyh/braft/src/braft/log.cpp:720] There's gap between appending entries and _last_log_index path: ./data2/log

Another node produced logs similar to those in #279.

As a result, the Jepsen-style operation_log generated by the test framework is as follows:

[{:process 1362142777, :type :invoke, :f :cas, :value [458004641 3039794309 ]},
{:process 1362142777, :type :fail, :f :cas, :value [458004641 3039794309 ]},
{:process 1353750073, :type :invoke, :f :cas, :value [1772247958 1358966402 ]},
{:process 1353750073, :type :fail, :f :cas, :value [1772247958 1358966402 ]},
{:process 1353750073, :type :invoke, :f :write, :value 3876838356},
{:process 1353750073, :type :ok, :f :write, :value 3876838356},
{:process 1362142777, :type :invoke, :f :read, :value nil},
{:process 1353750073, :type :invoke, :f :write, :value 395967576},
{:process 1353750073, :type :ok, :f :write, :value 395967576},
{:process 1362142777, :type :ok, :f :read, :value nil},
{:process 1362142777, :type :invoke, :f :write, :value 450086335},
{:process 1362142777, :type :fail, :f :write, :value 450086335},
{:process 1362142777, :type :invoke, :f :read, :value nil},
{:process 1362142777, :type :ok, :f :read, :value 0},
]

As you can see, the last read returns 0, even though earlier writes had succeeded.
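
The observation above can be checked mechanically. The following is a minimal Python sketch, not part of the test framework: `find_suspicious_reads` is a hypothetical helper, and the history is transcribed by hand from the operation_log above (the cas entries are omitted because all of them failed). It assumes that a `nil` read carries no value and skips it, since the meaning of `nil` in this framework is not stated here. It is a weak sanity check, not a full linearizability checker: it only flags a read that returns a value no write ever attempted, although some write had already been acknowledged before the read was invoked.

```python
# History transcribed from the operation_log above (failed cas operations omitted).
history = [
    {"process": 1353750073, "type": "invoke", "f": "write", "value": 3876838356},
    {"process": 1353750073, "type": "ok", "f": "write", "value": 3876838356},
    {"process": 1362142777, "type": "invoke", "f": "read", "value": None},
    {"process": 1353750073, "type": "invoke", "f": "write", "value": 395967576},
    {"process": 1353750073, "type": "ok", "f": "write", "value": 395967576},
    {"process": 1362142777, "type": "ok", "f": "read", "value": None},
    {"process": 1362142777, "type": "invoke", "f": "write", "value": 450086335},
    {"process": 1362142777, "type": "fail", "f": "write", "value": 450086335},
    {"process": 1362142777, "type": "invoke", "f": "read", "value": None},
    {"process": 1362142777, "type": "ok", "f": "read", "value": 0},
]


def find_suspicious_reads(history):
    """Return (index, value) for reads that return a never-written value
    even though a write was acknowledged before the read was invoked."""
    attempted = set()      # values of every write invoked so far
    any_write_acked = False
    acked_at_invoke = {}   # process -> was a write acked when its read started?
    suspicious = []

    for index, op in enumerate(history):
        kind, status, process = op["f"], op["type"], op["process"]
        if kind == "write":
            if status == "invoke":
                attempted.add(op["value"])
            elif status == "ok":
                any_write_acked = True
        elif kind == "read":
            if status == "invoke":
                acked_at_invoke[process] = any_write_acked
            elif status == "ok":
                had_ack = acked_at_invoke.pop(process, False)
                value = op["value"]
                if value is None:
                    # Assumption: nil means "no value returned"; skip it.
                    continue
                if had_ack and value not in attempted:
                    suspicious.append((index, value))
    return suspicious


print(find_suspicious_reads(history))
```

On this history the only flagged operation is the final read, which returns 0 after two writes were acknowledged.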


PFZheng · 2021-04-27T01:04:14Z

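"_applied_id=1, last_log_kept=0" means that all of the logs are to be discarded, which looks like a split-brain case. To diagnose this, we would need the values of the sync parameters you are using and detailed logs from each node.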
