bug? With shake in active/standby mode, data changes on the source are lost during failover #56

Closed
shiftyman opened this issue Sep 30, 2018 · 7 comments

Comments

@shiftyman

1. Environment:
Source: replica set
Target: standalone mongod
shake: two instances, active/standby mode
MongoDB 3.2

2. Scenario:
Kill the source primary with kill -9; a few seconds later a secondary is promoted. If a record A is inserted into the source during that window, it never shows up on the target. A subsequent insert of record B does get synced, yet A is still missing. Only after stopping all shake instances and restarting them (possibly also resetting the checkpoint) does A finally reach the target.
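
A rough reproduction sketch in Python/pymongo is below; the replica-set name "rs0", the test.demo namespace, and the sleep durations are my own placeholders, not part of the original setup:

```python
import time
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

source = MongoClient(
    "mongodb://172.17.160.241:27001,172.17.160.241:27002,"
    "172.17.160.241:27003/?replicaSet=rs0")  # "rs0" is a placeholder
target = MongoClient("mongodb://172.17.160.242:27001")

def insert_with_retry(doc):
    # Retry through the election window until the new primary accepts the write.
    while True:
        try:
            source.test.demo.insert_one(doc)
            return
        except AutoReconnect:
            time.sleep(0.5)

# Step 1: kill -9 the source primary out of band, then immediately:
insert_with_retry({"_id": "A"})  # record A, written during/after the election
insert_with_retry({"_id": "B"})  # record B, written once things settle

time.sleep(10)  # give shake time to replay
print("A on target:", target.test.demo.find_one({"_id": "A"}))  # None -> lost
print("B on target:", target.test.demo.find_one({"_id": "B"}))  # synced
```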

3. Configuration of the two shake instances:
(1)
mongo_urls = mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003
collector.id = mongoshake2
checkpoint.interval = 5000
http_profile = 9100
system_profile = 9200
log_level = info
log_file = collector.log
log_buffer = false
filter.namespace.black =
filter.namespace.white =
oplog.gids =
shard_key = collection
syncer.reader.buffer_time = 3
worker = 3
worker.batch_queue_size = 64
worker.oplog_compressor = none
tunnel = direct
tunnel.address = mongodb://172.17.160.242:27001
context.storage = database
context.address = ckpt_default
context.start_position = 2000-01-01T00:00:01Z
master_quorum = true
replayer.dml_only = true
replayer.executor = 3
replayer.executor.upsert = false
replayer.executor.insert_on_dup_update = false
replayer.conflict_write_to = none
replayer.durable = true

(2)
mongo_urls = mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003
collector.id = mongoshake
checkpoint.interval = 5000
http_profile = 9101
system_profile = 9201
log_level = info
log_file = collector2.log
log_buffer = false
filter.namespace.black =
filter.namespace.white =
oplog.gids =
shard_key = collection
syncer.reader.buffer_time = 3
worker = 1
worker.batch_queue_size = 64
worker.oplog_compressor = none
tunnel = direct
tunnel.address = mongodb://172.17.160.242:27001
context.storage = database
context.address = ckpt_default
context.start_position = 2000-01-01T00:00:01Z
master_quorum = true
replayer.dml_only = true
replayer.executor = 1
replayer.executor.upsert = true
replayer.executor.insert_on_dup_update = false
replayer.conflict_write_to = none
replayer.durable = true

@vinllen
Collaborator

vinllen commented Oct 1, 2018

Are the active and standby instances running on the same machine? shake records a checkpoint and, after a restart, resumes pulling from that checkpoint. Are the machine clocks consistent?

@shiftyman
Author

The active and standby run on the same machine, so the clocks are consistent.

Besides, the checkpoint stored for the source is defined relative to the source, isn't it? It marks the last position in the source's oplog that has already been pulled, so the next pull resumes from that checkpoint position in the oplog. There should be no gap regardless of whether the clocks of the individual components agree, right?

Yet right now data is being lost.
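
In other words, resuming from the checkpoint should behave roughly like this (a pymongo sketch; the mongoshake database name and the ts field are my guesses at the checkpoint schema, loosely matching context.storage = database and context.address = ckpt_default above, not shake's confirmed layout):

```python
import pymongo
from bson.timestamp import Timestamp

client = pymongo.MongoClient(
    "mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003")

# Guessed checkpoint location/shape: one doc holding a BSON timestamp.
ckpt = client.mongoshake.ckpt_default.find_one()
resume_ts = ckpt["ts"] if ckpt else Timestamp(946684801, 0)  # 2000-01-01T00:00:01Z

# Everything after the checkpoint is re-read from the source's own oplog,
# ordered by the oplog timestamp -- no wall-clock agreement is involved.
cursor = client.local.oplog.rs.find(
    {"ts": {"$gt": resume_ts}},
    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
    oplog_replay=True)
for entry in cursor:
    print(entry["ts"], entry["op"], entry.get("ns"))
```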

@vinllen
Collaborator

vinllen commented Oct 5, 2018

Are there any error logs?

@shiftyman
Author

I found no error messages in the logs. You can try simulating the scenario I described.

@vinllen
Collaborator

vinllen commented Oct 8, 2018

Please give the shake and MongoDB version numbers.

@shiftyman
Author

shake is built from the master branch, so 1.4.3?
MongoDB is 3.2.11, deployed as a replica set.

@vinllen
Collaborator

vinllen commented Oct 9, 2018

Enabling the upsert and insert_on_dup_update switches solves this. The cause is that inserts go through a bulkWriter: multiple oplog entries are aggregated into a single batch insert, and one duplicate entry makes the whole batch fail. Enabling the switches works around the problem:
replayer.executor.upsert = true
replayer.executor.insert_on_dup_update = true
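
The failure mode is easy to demonstrate outside shake; a rough pymongo sketch (host and namespace are placeholders):

```python
from pymongo import MongoClient, ReplaceOne
from pymongo.errors import BulkWriteError

coll = MongoClient("mongodb://172.17.160.242:27001").test.demo
coll.delete_many({})
coll.insert_one({"_id": 2})  # pre-existing doc -> duplicate key below

try:
    # Ordered insert_many stops at _id=2, so _id=3 is never written.
    coll.insert_many([{"_id": 1}, {"_id": 2}, {"_id": 3}])
except BulkWriteError as exc:
    print("batch aborted at index", exc.details["writeErrors"][0]["index"])

# With upsert semantics (what the two switches above enable) all docs land.
coll.bulk_write([ReplaceOne({"_id": i}, {"_id": i}, upsert=True)
                 for i in (1, 2, 3)])
print(sorted(doc["_id"] for doc in coll.find()))  # [1, 2, 3]
```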

@vinllen vinllen closed this as completed Oct 9, 2018