-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support log length based snapshot #318
base: master
Are you sure you want to change the base?
Conversation
在Closure里处理不是很优雅。这里是否可以抽象一个通用的机制,卡住未来的某个log index做snapshot(将log index提交给fsm caller的on_apply线程),这个未来可以扩展到一些额外的用途,例如做副本间的一致性校验(相同的log index的snapshot数据应该是等价的)。 然后具体到本issue,只需要在snapshot开始/完成时计算下一次的卡位点即可。 NodeImpl的参数上,可以增加一个snapshot方式的参数,interval/log length选一个。log length需要改一个更贴切的名字 |
卡住未来的某个 log index 做 snapshot,这 个 log index 一定要求是一个精确值吗?比如卡位点在 10, 那么 index 为 10 的 log apply之后就要做snapshot? “(将log index提交给fsm caller的 on_apply 线程)" 这句话不是很理解,fsm caller 会在 do_commit 的时候会调用用户自己实现的on_apply,所以是等待用户on_apply 卡点 log 之后,开始do_snapshot吗? |
现有的 snapshot 流程是由 int FSMCaller::on_snapshot_save(SaveSnapshotClosure* done) 触发的: braft/src/braft/fsm_caller.cpp Line 321 in f5fbf92
在这个函数里会提交一个task到execution queue,和调用用户回调on_apply函数是一个线程。在同一个execution queue里做检查几乎没有冗余的开销,不需要像现在提交的这个patch检查较重。例如,一种改法是将在void FSMCaller::do_committed(int64_t committed_index)中将遍历日志的迭代器做一下修改,分成snapshot前和snapshot后两个,就可以在中间插入一个snapshot任务: braft/src/braft/fsm_caller.cpp Line 279 in f5fbf92
这里的实现也有两种方式:
考虑到第二种做法未来可以有其它可扩展的用途,所以个人认为第二种做法更好。 |
9c5c3db
to
6c57200
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
参见评论,另外,需要补充一些单测
src/braft/fsm_caller.cpp
Outdated
if(_snapshot_log_interval > 0 && | ||
last_index == break_index){ | ||
// Achieve break point to trigger snapshot. | ||
_node->snapshot(NULL, true/* in_place*/); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
出现 error 的情况下不应该触发 snapshot,这时候不能确定状态机是否正确,snapshot可能做进去错误的数据
src/braft/node.cpp
Outdated
@@ -363,6 +363,10 @@ int NodeImpl::init_fsm_caller(const LogId& bootstrap_id) { | |||
fsm_caller_options.closure_queue = _closure_queue; | |||
fsm_caller_options.node = this; | |||
fsm_caller_options.bootstrap_id = bootstrap_id; | |||
if (_options.snapshot_trigger_type == LOG_INTERVAL) { | |||
assert(_options.snapshot_log_interval > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert -> CHECK
src/braft/node.cpp
Outdated
@@ -498,7 +502,10 @@ int NodeImpl::init(const NodeOptions& options) { | |||
CHECK_EQ(0, _vote_timer.init(this, options.election_timeout_ms + options.max_clock_drift_ms)); | |||
CHECK_EQ(0, _election_timer.init(this, options.election_timeout_ms)); | |||
CHECK_EQ(0, _stepdown_timer.init(this, options.election_timeout_ms)); | |||
CHECK_EQ(0, _snapshot_timer.init(this, options.snapshot_interval_s * 1000)); | |||
if (options.snapshot_trigger_type == TIMER) { | |||
assert(options.snapshot_interval_s > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert -> CHECK
src/braft/raft.h
Outdated
@@ -474,13 +474,22 @@ struct LeaderLeaseStatus { | |||
int64_t lease_epoch; | |||
}; | |||
|
|||
enum SnapshotTriggerType{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum SnapshotTriggerType {
src/braft/raft.h
Outdated
// Trigger saving snapshot according to timer. | ||
TIMER = 2, | ||
// Trigger saving snapshot according to log interval. | ||
LOG_INTERVAL = 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这些enum变量应该加上合适的前缀,以和其他变量区分,便于理解含义
src/braft/snapshot_executor.cpp
Outdated
run_closure_in_bthread(snapshot_save_done, _usercode_in_pthread); | ||
} | ||
|
||
if (in_place){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这两个分支看着没差别?
已经增加相关测试用例 |
Support log length based snapshot