
Master-slave replication failure #70

Closed
DearEirs opened this issue Feb 8, 2021 · 5 comments

Comments

@DearEirs

DearEirs commented Feb 8, 2021

The cluster has 6 masters and 6 slaves, but after creating it I found that replication had failed on some nodes. The error is as follows:

Replication info as seen from the slave node:

tidb3:30011> info Replication
role:slave
master_host:192.168.30.14
master_port:30016
master_link_status:down
master_last_io_seconds_ago:676
master_sync_in_progress:0
slave_repl_offset:93507108
master_link_down_since_seconds:676
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:93507108
rocksdb0_master:ip=192.168.30.14,port=30016,src_store_id=0,state=error,binlog_pos=9390342,lag=1612767803,error=store:0 incrsync master bad return:-ERR invalid binlogPos,storeId:0,master firstPos:22703562,slave binlogPos:9390342,lastFlushBinlogId:0
rocksdb1_master:ip=192.168.30.14,port=30016,src_store_id=1,state=error,binlog_pos=9234531,lag=1612767803,error=store:1 incrsync master bad return:-ERR invalid binlogPos,storeId:1,master firstPos:22529194,slave binlogPos:9234531,lastFlushBinlogId:0
rocksdb2_master:ip=192.168.30.14,port=30016,src_store_id=2,state=error,binlog_pos=9322788,lag=1612767803,error=store:2 incrsync master bad return:-ERR invalid binlogPos,storeId:2,master firstPos:22532273,slave binlogPos:9322788,lastFlushBinlogId:0
rocksdb3_master:ip=192.168.30.14,port=30016,src_store_id=3,state=error,binlog_pos=9322274,lag=1612767803,error=store:3 incrsync master bad return:-ERR invalid binlogPos,storeId:3,master firstPos:22535760,slave binlogPos:9322274,lastFlushBinlogId:0
rocksdb4_master:ip=192.168.30.14,port=30016,src_store_id=4,state=error,binlog_pos=9316453,lag=1612767803,error=store:4 incrsync master bad return:-ERR invalid binlogPos,storeId:4,master firstPos:22532519,slave binlogPos:9316453,lastFlushBinlogId:0
rocksdb5_master:ip=192.168.30.14,port=30016,src_store_id=5,state=error,binlog_pos=9292681,lag=1612767803,error=store:5 incrsync master bad return:-ERR invalid binlogPos,storeId:5,master firstPos:22539063,slave binlogPos:9292681,lastFlushBinlogId:0
rocksdb6_master:ip=192.168.30.14,port=30016,src_store_id=6,state=error,binlog_pos=9445644,lag=1612767803,error=store:6 incrsync master bad return:-ERR invalid binlogPos,storeId:6,master firstPos:22698010,slave binlogPos:9445644,lastFlushBinlogId:0
rocksdb7_master:ip=192.168.30.14,port=30016,src_store_id=7,state=error,binlog_pos=9386296,lag=1612767803,error=store:7 incrsync master bad return:-ERR invalid binlogPos,storeId:7,master firstPos:22711204,slave binlogPos:9386296,lastFlushBinlogId:0
rocksdb8_master:ip=192.168.30.14,port=30016,src_store_id=8,state=error,binlog_pos=9375383,lag=1612767803,error=store:8 incrsync master bad return:-ERR invalid binlogPos,storeId:8,master firstPos:22701614,slave binlogPos:9375383,lastFlushBinlogId:0
rocksdb9_master:ip=192.168.30.14,port=30016,src_store_id=9,state=error,binlog_pos=9420716,lag=1612767803,error=store:9 incrsync master bad return:-ERR invalid binlogPos,storeId:9,master firstPos:22701503,slave binlogPos:9420716,lastFlushBinlogId:0
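
The per-store lines above can be checked mechanically. Below is a minimal sketch that pulls `state`, `master firstPos` and `slave binlogPos` out of each `rocksdbN_master` line. The field names are copied from the output above; the helper name `binlog_gaps` is hypothetical, and the `missing` figure assumes binlog ids are contiguous and that `slave binlogPos` is the last id the slave applied.

```python
import re
from typing import Dict

# Matches the fields of interest in one rocksdbN_master line of
# `info Replication` (field names taken from the output above).
_LINE_RE = re.compile(
    r"^rocksdb(?P<store>\d+)_master:.*?state=(?P<state>\w+)"
    r".*?master firstPos:(?P<first>\d+),slave binlogPos:(?P<slave>\d+)"
)


def binlog_gaps(info_text: str) -> Dict[int, dict]:
    """Return {store_id: details} for stores reporting 'invalid binlogPos'."""
    result: Dict[int, dict] = {}
    for line in info_text.splitlines():
        m = _LINE_RE.match(line.strip())
        if not m:
            continue  # other info fields, or stores without this error
        first, slave = int(m["first"]), int(m["slave"])
        result[int(m["store"])] = {
            "state": m["state"],
            "master_first_pos": first,
            "slave_binlog_pos": slave,
            # Ids the slave still needs (slave+1 .. first-1) that the
            # master no longer holds; > 0 means incremental sync cannot resume.
            "missing": max(0, first - slave - 1),
        }
    return result
```

For store 0 above this reports roughly 13.3 million missing binlog ids, so the slave cannot catch up incrementally.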

Corresponding error log on this node:

I0208 15:08:14.730785 4091 spov.cpp:344] store:8 reconn with:192.168.30.14,30016,8
W0208 15:08:14.732232 4091 spov.cpp:356] store:8 incrsync master bad return:-ERR invalid binlogPos,storeId:8,master firstPos:22701614,slave binlogPos:9375383,lastFlushBinlogId:0
I0208 15:08:15.953619 4092 spov.cpp:344] store:6 reconn with:192.168.30.14,30016,6
W0208 15:08:15.955471 4092 spov.cpp:356] store:6 incrsync master bad return:-ERR invalid binlogPos,storeId:6,master firstPos:22698010,slave binlogPos:9445644,lastFlushBinlogId:0
I0208 15:08:17.015657 4091 spov.cpp:344] store:0 reconn with:192.168.30.14,30016,0
I0208 15:08:17.015720 4092 spov.cpp:344] store:1 reconn with:192.168.30.14,30016,1
W0208 15:08:17.017359 4092 spov.cpp:356] store:1 incrsync master bad return:-ERR invalid binlogPos,storeId:1,master firstPos:22529194,slave binlogPos:9234531,lastFlushBinlogId:0
W0208 15:08:17.017401 4091 spov.cpp:356] store:0 incrsync master bad return:-ERR invalid binlogPos,storeId:0,master firstPos:22703562,slave binlogPos:9390342,lastFlushBinlogId:0
I0208 15:08:17.106743 4092 spov.cpp:344] store:5 reconn with:192.168.30.14,30016,5
W0208 15:08:17.108459 4092 spov.cpp:356] store:5 incrsync master bad return:-ERR invalid binlogPos,storeId:5,master firstPos:22539063,slave binlogPos:9292681,lastFlushBinlogId:0
I0208 15:08:17.360031 4091 spov.cpp:344] store:4 reconn with:192.168.30.14,30016,4
W0208 15:08:17.361721 4091 spov.cpp:356] store:4 incrsync master bad return:-ERR invalid binlogPos,storeId:4,master firstPos:22532519,slave binlogPos:9316453,lastFlushBinlogId:0
I0208 15:08:17.501593 4092 spov.cpp:344] store:2 reconn with:192.168.30.14,30016,2
W0208 15:08:17.503087 4092 spov.cpp:356] store:2 incrsync master bad return:-ERR invalid binlogPos,storeId:2,master firstPos:22532273,slave binlogPos:9322788,lastFlushBinlogId:0
I0208 15:08:17.542035 4091 spov.cpp:344] store:3 reconn with:192.168.30.14,30016,3
W0208 15:08:17.543447 4091 spov.cpp:356] store:3 incrsync master bad return:-ERR invalid binlogPos,storeId:3,master firstPos:22535760,slave binlogPos:9322274,lastFlushBinlogId:0
I0208 15:08:18.148780 4092 spov.cpp:344] store:7 reconn with:192.168.30.14,30016,7
W0208 15:08:18.150100 4092 spov.cpp:356] store:7 incrsync master bad return:-ERR invalid binlogPos,storeId:7,master firstPos:22711204,slave binlogPos:9386296,lastFlushBinlogId:0
I0208 15:08:18.906631 4091 spov.cpp:344] store:9 reconn with:192.168.30.14,30016,9
W0208 15:08:18.908128 4091 spov.cpp:356] store:9 incrsync master bad return:-ERR invalid binlogPos,storeId:9,master firstPos:22701503,slave binlogPos:9420716,lastFlushBinlogId:0
W0208 15:08:21.040879 3845 rocks_kvttlcompactfilter.cpp:94] The currentTs is 0, the kvttlcompaction would do nothing
I0208 15:08:24.739842 4092 spov.cpp:344] store:8 reconn with:192.168.30.14,30016,8

@DearEirs
Author

DearEirs commented Feb 8, 2021

I tried removing the faulty slave node and then adding a new slave node, but the same error still occurs.
Steps:

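  1. Stop the slave process
  2. Remove the node from the cluster with cluster forget slave_id
  3. Delete the slave node's data directories (data and dump)
  4. Start the slave process
  5. Add the node back to the cluster with cluster meet
  6. Use cluster replicate to make the node a slave of the same master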
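
The cluster-protocol part of these steps (2, 5 and 6) can be sketched as a list of commands per target node. This is a minimal sketch assuming Tendis follows Redis Cluster semantics; `replace_slave_plan`, the plan format and any addresses or IDs are hypothetical. One Redis Cluster detail worth checking: `CLUSTER FORGET` only affects the node that receives it, so it has to be sent to every remaining node (Redis bans the forgotten ID for only about 60 seconds), otherwise gossip re-adds the old node.

```python
from typing import List, Tuple

# (target node "host:port", command argv): a hypothetical plan format
Step = Tuple[str, List[str]]


def replace_slave_plan(old_slave_id: str, slave_addr: str,
                       master_id: str, master_addr: str,
                       other_addrs: List[str]) -> List[Step]:
    """Hypothetical helper: cluster commands for steps 2, 5 and 6.

    Steps 1, 3 and 4 (stop the process, delete data/dump, restart)
    happen on the host and are not represented here.
    """
    plan: List[Step] = []
    # Step 2: FORGET is per-node, so send it to every remaining node.
    for addr in [master_addr, *other_addrs]:
        plan.append((addr, ["CLUSTER", "FORGET", old_slave_id]))
    # Step 5: the restarted, empty node introduces itself via the master.
    host, port = master_addr.rsplit(":", 1)
    plan.append((slave_addr, ["CLUSTER", "MEET", host, port]))
    # Step 6: attach it as a replica of the same master.
    plan.append((slave_addr, ["CLUSTER", "REPLICATE", master_id]))
    return plan
```

Each argv could then be sent to its target with a Redis client, for example redis-py's `Redis.execute_command(*argv)`.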

@yanghaojmvp

@TendisDev Do you know what causes the replication failure, and how to fix it?

@TendisDev
Collaborator

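When a master has no slave nodes attached, its binlog is cleaned up automatically.
That is what happened here: if a slave stays disconnected from its master for too long and then reconnects, the binlog it needs no longer exists, which produces this error.

You can increase the binlog retention parameters; see the documentation for details: http://tendis.cn/#/Tendisplus/%E8%BF%90%E7%BB%B4/configuration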
minbinlogkeepsec
maxbinlogkeepnum
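
To see why raising these two parameters helps, here is a toy model of master-side binlog retention. The purge rule (drop the oldest entry only when more than `max_keep_num` entries are held and it is at least `min_keep_sec` old) is an assumption chosen for illustration, not Tendis's actual truncation logic, and it ignores whether slaves are attached. `ToyBinlogRetention` and its methods are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class ToyBinlogRetention:
    """Toy model of master binlog retention (NOT Tendis's real logic)."""
    max_keep_num: int
    min_keep_sec: float
    entries: Deque[Tuple[int, float]] = field(default_factory=deque)
    next_id: int = 1

    def append(self, now: float) -> None:
        """Write one binlog entry at time `now`, then apply the assumed rule."""
        self.entries.append((self.next_id, now))
        self.next_id += 1
        while (len(self.entries) > self.max_keep_num
               and now - self.entries[0][1] >= self.min_keep_sec):
            self.entries.popleft()

    def first_pos(self) -> int:
        """Oldest id still held (the 'master firstPos' in the error)."""
        return self.entries[0][0] if self.entries else self.next_id

    def can_incr_sync(self, slave_pos: int) -> bool:
        """A slave that applied up to slave_pos needs slave_pos + 1 onward."""
        return self.first_pos() <= slave_pos + 1


tight = ToyBinlogRetention(max_keep_num=100, min_keep_sec=0)
loose = ToyBinlogRetention(max_keep_num=100, min_keep_sec=10_000)
for t in range(1000):
    tight.append(now=float(t))
    loose.append(now=float(t))
print(tight.first_pos(), tight.can_incr_sync(50))  # 901 False
print(loose.first_pos(), loose.can_incr_sync(50))  # 1 True
```

With tight retention, a slave stuck at id 50 falls behind `firstPos` and hits the same kind of "invalid binlogPos" condition; with a longer minimum keep time, the entries it needs are still there.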

@DearEirs
Author

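@TendisDev For master nodes that have already lost part of their binlog, how can the affected nodes be recovered while minimizing the impact on the existing cluster?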

@TendisDev
Collaborator

Rebuild the slave, the same way as in Redis.

TendisDev added a commit that referenced this issue Apr 21, 2021
fix some bugs
fix some bugs:
1. add check for changing ReplManager::_logRecycStatus::saveBinlogId
2. RocksKVStore::truncateBinlogV2 needs to change newSave #71 
3. RepllogCursorV2 use BinlogCursor instead of Cursor
4. change Transaction::createCursor() and RocksTxn::createCursor() to be protected #67 
5. RocksKVCursor no longer calls seek(""), and adds a check of seeked. #67 

[OPT] deleteRangeBinlog call compactRange() after deleteRange() #70
TendisDev added a commit that referenced this issue Jun 4, 2021
[feature] getMinbinlogid may be too time-consuming: saveMinBinlogId in rocksdb
### MR description
<!--- Describe the details of the MR -->
Save minbinlogid into rocksdb


### Motivation and context
<!--- Why is this change needed, and what problem does it solve? -->
<!--- If it resolves a related issue, link it here (#issue, close #issue) -->
getMinbinlogid performs poorly, so save minbinlogid into rocksdb

#70  

#52   
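
The idea behind this MR, persisting the minimum binlog id instead of recomputing it, can be illustrated with a toy store. A dict stands in for rocksdb, and `ToyBinlogStore`, its methods and the `min_binlog_id` key are hypothetical; the real change lives in Tendis's rocksdb layer and may work differently.

```python
from typing import Dict, Optional, Union

Key = Union[int, str]


class ToyBinlogStore:
    """Toy contrast for this MR's idea (NOT Tendis's implementation)."""
    MIN_KEY = "min_binlog_id"  # hypothetical meta key name

    def __init__(self) -> None:
        # Binlog entries use int keys; the saved minimum uses MIN_KEY.
        self.kv: Dict[Key, bytes] = {}

    def append(self, binlog_id: int, payload: bytes) -> None:
        self.kv[binlog_id] = payload
        if self.MIN_KEY not in self.kv:
            self.kv[self.MIN_KEY] = str(binlog_id).encode()

    def truncate_before(self, new_min: int) -> None:
        """Drop entries with id < new_min and persist the new minimum."""
        for k in [k for k in self.kv if isinstance(k, int) and k < new_min]:
            del self.kv[k]
        current = self.min_id_saved()
        if current is None or new_min > current:
            self.kv[self.MIN_KEY] = str(new_min).encode()

    def min_id_by_scan(self) -> Optional[int]:
        """Slow path: inspect every binlog key (O(n) here)."""
        ids = [k for k in self.kv if isinstance(k, int)]
        return min(ids) if ids else None

    def min_id_saved(self) -> Optional[int]:
        """Fast path: one point lookup of the persisted value."""
        raw = self.kv.get(self.MIN_KEY)
        return int(raw) if raw is not None else None
```

The trade-off is that every truncation must also update the saved value, in exchange for a constant-cost read.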

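### How was this MR tested?
<!--- Describe how the MR was tested -->
<!--- Include the test environment and the test cases that were run -->
<!--- Explain how the change affects other parts of the code, etc. -->
See #70 #52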

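### Type of change
<!---What type of change does your code introduce? Put an "x" in every box that applies -->
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to behave unexpectedly)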

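### Checklist
<!--- Go over the following items and mark them with "x" -->
<!--- If you are unsure about any of them, feel free to ask us -->
- [ ] Follows the project's code style
- [ ] The change requires a documentation update
- [ ] I have updated the documentation accordingly
- [ ] My MR has passed the relevant pipeline tests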