bug? With shake in active/standby mode, data changes on the source are lost during failover #56

Closed
shiftyman opened this issue Sep 30, 2018 · 7 comments

Comments

@shiftyman

1. Environment:
Source: replica set
Target: standalone mongod
shake: two instances, active/standby mode
MongoDB 3.2

2. Scenario:
Kill the source primary with kill -9; a few seconds later a secondary is promoted. If a record A is inserted into the source during that window, it never shows up on the target. A subsequent insert of record B does get synced, yet A is still missing. Only after stopping all shake instances and restarting them (possibly also resetting the checkpoint) does A finally reach the target.
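
A rough reproduction sketch in Python/pymongo is below; the replica-set name "rs0", the test.demo namespace, and the sleep durations are my own placeholders, not part of the original setup:

```python
import time
from pymongo import MongoClient
from pymongo.errors import AutoReconnect

source = MongoClient(
    "mongodb://172.17.160.241:27001,172.17.160.241:27002,"
    "172.17.160.241:27003/?replicaSet=rs0")  # "rs0" is a placeholder
target = MongoClient("mongodb://172.17.160.242:27001")

def insert_with_retry(doc):
    # Retry through the election window until the new primary accepts the write.
    while True:
        try:
            source.test.demo.insert_one(doc)
            return
        except AutoReconnect:
            time.sleep(0.5)

# Step 1: kill -9 the source primary out of band, then immediately:
insert_with_retry({"_id": "A"})  # record A, written during/after the election
insert_with_retry({"_id": "B"})  # record B, written once things settle

time.sleep(10)  # give shake time to replay
print("A on target:", target.test.demo.find_one({"_id": "A"}))  # None -> lost
print("B on target:", target.test.demo.find_one({"_id": "B"}))  # synced
```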

3. Configuration of the two shake instances:
(1)
mongo_urls = mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003
collector.id = mongoshake2
checkpoint.interval = 5000
http_profile = 9100
system_profile = 9200
log_level = info
log_file = collector.log
log_buffer = false
filter.namespace.black =
filter.namespace.white =
oplog.gids =
shard_key = collection
syncer.reader.buffer_time = 3
worker = 3
worker.batch_queue_size = 64
worker.oplog_compressor = none
tunnel = direct
tunnel.address = mongodb://172.17.160.242:27001
context.storage = database
context.address = ckpt_default
context.start_position = 2000-01-01T00:00:01Z
master_quorum = true
replayer.dml_only = true
replayer.executor = 3
replayer.executor.upsert = false
replayer.executor.insert_on_dup_update = false
replayer.conflict_write_to = none
replayer.durable = true

(2)
mongo_urls = mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003
collector.id = mongoshake
checkpoint.interval = 5000
http_profile = 9101
system_profile = 9201
log_level = info
log_file = collector2.log
log_buffer = false
filter.namespace.black =
filter.namespace.white =
oplog.gids =
shard_key = collection
syncer.reader.buffer_time = 3
worker = 1
worker.batch_queue_size = 64
worker.oplog_compressor = none
tunnel = direct
tunnel.address = mongodb://172.17.160.242:27001
context.storage = database
context.address = ckpt_default
context.start_position = 2000-01-01T00:00:01Z
master_quorum = true
replayer.dml_only = true
replayer.executor = 1
replayer.executor.upsert = true
replayer.executor.insert_on_dup_update = false
replayer.conflict_write_to = none
replayer.durable = true

@vinllen
Collaborator

vinllen commented Oct 1, 2018

Are the active and standby instances running on the same machine? shake records a checkpoint and, after a restart, resumes pulling from that checkpoint. Are the machine clocks consistent?

@shiftyman
Author

The active and standby run on the same machine, so the clocks are consistent.

Besides, the checkpoint stored for the source is defined relative to the source, isn't it? It marks the last position in the source's oplog that has already been pulled, so the next pull resumes from that checkpoint position in the oplog. There should be no gap regardless of whether the clocks of the individual components agree, right?

Yet right now data is being lost.
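
In other words, resuming from the checkpoint should behave roughly like this (a pymongo sketch; the mongoshake database name and the ts field are my guesses at the checkpoint schema, loosely matching context.storage = database and context.address = ckpt_default above, not shake's confirmed layout):

```python
import pymongo
from bson.timestamp import Timestamp

client = pymongo.MongoClient(
    "mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003")

# Guessed checkpoint location/shape: one doc holding a BSON timestamp.
ckpt = client.mongoshake.ckpt_default.find_one()
resume_ts = ckpt["ts"] if ckpt else Timestamp(946684801, 0)  # 2000-01-01T00:00:01Z

# Everything after the checkpoint is re-read from the source's own oplog,
# ordered by the oplog timestamp -- no wall-clock agreement is involved.
cursor = client.local.oplog.rs.find(
    {"ts": {"$gt": resume_ts}},
    cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
    oplog_replay=True)
for entry in cursor:
    print(entry["ts"], entry["op"], entry.get("ns"))
```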

@vinllen
Collaborator

vinllen commented Oct 5, 2018

Are there any error logs?

@shiftyman
Author

I found no error messages in the logs. You can try simulating the scenario I described.

@vinllen
Collaborator

vinllen commented Oct 8, 2018

Please give the shake and MongoDB version numbers.

@shiftyman
Author

shake is built from the master branch, so 1.4.3?
MongoDB is 3.2.11, deployed as a replica set.

@vinllen
Collaborator

vinllen commented Oct 9, 2018

Enabling the upsert and insert_on_dup_update switches solves this. The cause is that inserts go through a bulkWriter: multiple oplog entries are aggregated into a single batch insert, and one duplicate entry makes the whole batch fail. Enabling the switches works around the problem:
replayer.executor.upsert = true
replayer.executor.insert_on_dup_update = true
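
The failure mode is easy to demonstrate outside shake; a rough pymongo sketch (host and namespace are placeholders):

```python
from pymongo import MongoClient, ReplaceOne
from pymongo.errors import BulkWriteError

coll = MongoClient("mongodb://172.17.160.242:27001").test.demo
coll.delete_many({})
coll.insert_one({"_id": 2})  # pre-existing doc -> duplicate key below

try:
    # Ordered insert_many stops at _id=2, so _id=3 is never written.
    coll.insert_many([{"_id": 1}, {"_id": 2}, {"_id": 3}])
except BulkWriteError as exc:
    print("batch aborted at index", exc.details["writeErrors"][0]["index"])

# With upsert semantics (what the two switches above enable) all docs land.
coll.bulk_write([ReplaceOne({"_id": i}, {"_id": i}, upsert=True)
                 for i in (1, 2, 3)])
print(sorted(doc["_id"] for doc in coll.find()))  # [1, 2, 3]
```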

@vinllen vinllen closed this as completed Oct 9, 2018