3.5: slave ignores the manually specified master IP and uses the master's NIC IP instead, so replication fails over a logical network #1904

Closed
aploium opened this issue Aug 14, 2023 · 2 comments

Comments

aploium commented Aug 14, 2023

Describe the bug
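The master and slave are connected through a logical network (SDN, VPN, or something similar). During the actual sync, the slave ignores the master's logical IP given in the slaveof setting, reads the master's physical NIC IP instead, and tries to connect to it, so the connection cannot be established.
This bug is not present in 3.3.6.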

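Reproduce
master:
Primary NIC (public network) 192.168.1.1
Logical NIC 23.23.23.1

slave:
Primary NIC (public network) 192.168.100.1, not on the same LAN as the master
Logical NIC 23.23.23.2, reachable from the master's logical network
Set slaveof : 23.23.23.1:9221 in the config

Start the master and the slave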

Logs

I20230814 16:38:22.278599 1348452 pika_repl_client.cc:138] Try Send Meta Sync Request to Master (23.23.23.2:9221)  # this part is correct
I20230814 16:38:22.416779 1346716 pika_repl_client_conn.cc:126] Run id is not equal, need to do full sync, remote master run id: 2d55602e1b5796ffbaa713a8f22356f9918e2cba, local run id:
I20230814 16:38:22.795418 1346716 pika_server.cc:493] Mark try connect finish
I20230814 16:38:22.795466 1346716 pika_repl_client_conn.cc:136] Finish to handle meta sync response
I20230814 16:38:23.015070 1346717 pika_repl_client_conn.cc:169] Slot: db5 Need Wait To Sync
I20230814 16:38:23.015354 1346718 pika_repl_client_conn.cc:169] Slot: db7 Need Wait To Sync
I20230814 16:38:23.015386 1346719 pika_repl_client_conn.cc:169] Slot: db4 Need Wait To Sync
I20230814 16:38:23.015527 1346720 pika_repl_client_conn.cc:169] Slot: db3 Need Wait To Sync
I20230814 16:38:23.015596 1346721 pika_repl_client_conn.cc:169] Slot: db6 Need Wait To Sync
I20230814 16:38:23.015647 1346722 pika_repl_client_conn.cc:169] Slot: db2 Need Wait To Sync
I20230814 16:38:23.015705 1346723 pika_repl_client_conn.cc:169] Slot: db0 Need Wait To Sync
I20230814 16:38:23.059607 1346724 pika_repl_client_conn.cc:169] Slot: db1 Need Wait To Sync
W20230814 16:38:23.080103 1348452 pika_rm.cc:650] ActivateRsync ...
I20230814 16:38:23.216496 1348452 rsync_client.cc:363] receive rsync meta infos, snapshot_uuid: 75b14398131dcfa4505dca5e7b2c64a0files count: 16
W20230814 16:38:23.216588 1348452 rsync_client.cc:381] DUMP_META_DATA not exist
I20230814 16:38:23.216984 1348452 rsync_client.cc:315] copy meta data done, slot_id: 0 snapshot_uuid: 75b14398131dcfa4505dca5e7b2c64a0 file count: 16 expired file count: 0 local file count: 0 remote file count: 16 remote snapshot_uuid: 75b14398131dcfa4505dca5e7b2c64a0 local snapshot_uuid:  file_set_: 16
W20230814 16:38:23.217003 1348452 rsync_client.cc:325] file_set: hashes/CURRENT
W20230814 16:38:23.217021 1348452 rsync_client.cc:325] file_set: hashes/MANIFEST-000160
W20230814 16:38:23.217034 1348452 rsync_client.cc:325] file_set: hashes/OPTIONS-000162

... many more sync details ...

I20230814 16:38:24.210623 1348504 rsync_client.cc:40] copy remote file, filename: info
# In the log lines below, master_ip has become the master's primary NIC address instead of the manually specified logical NIC address
I20230814 16:38:24.310474 1348452 pika_slot.cc:205] Slot: db7 Information from dbsync info,  master_ip: 192.168.1.1, master_port: 9221, filenum: 0, offset:0, term: 0, index: 0 
W20230814 16:38:24.310526 1348452 pika_slot.cc:211] Slot: db7 Error master node ip port: 192.168.1.1:9221
I20230814 16:38:24.310561 1348452 pika_slot.cc:205] Slot: db5 Information from dbsync info,  master_ip: 192.168.1.1, master_port: 9221, filenum: 0, offset:0, term: 0, index: 0
W20230814 16:38:24.310570 1348452 pika_slot.cc:211] Slot: db5 Error master node ip port: 192.168.1.1:9221
W20230814 16:38:25.064426 1348499 rsync_client.cc:196] rsync request timeout
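
To check a slave log for this symptom, a small script along the following lines could compare the address in each "Information from dbsync info" line against the configured slaveof address. This is only a sketch: the regular expression is derived from the log lines above, and the function name `find_master_mismatches` is hypothetical and not part of Pika.

```python
import re

# Pattern derived from the "Information from dbsync info" lines in the log above.
DBSYNC_INFO = re.compile(
    r"Slot: (?P<slot>\S+) Information from dbsync info,\s+"
    r"master_ip: (?P<ip>[^,\s]+), master_port: (?P<port>\d+)"
)


def find_master_mismatches(log_lines, configured_ip, configured_port):
    """Return (slot, ip, port) for each dbsync info line that disagrees with slaveof."""
    mismatches = []
    for line in log_lines:
        match = DBSYNC_INFO.search(line)
        if match is None:
            continue  # not a dbsync info line
        ip, port = match.group("ip"), int(match.group("port"))
        if (ip, port) != (configured_ip, configured_port):
            mismatches.append((match.group("slot"), ip, port))
    return mismatches


sample = [
    "I20230814 16:38:24.310474 1348452 pika_slot.cc:205] Slot: db7 Information "
    "from dbsync info,  master_ip: 192.168.1.1, master_port: 9221, filenum: 0, "
    "offset:0, term: 0, index: 0",
    "W20230814 16:38:24.310526 1348452 pika_slot.cc:211] Slot: db7 Error master "
    "node ip port: 192.168.1.1:9221",
]

# slaveof is 23.23.23.1:9221 in the setup described above.
print(find_master_mismatches(sample, "23.23.23.1", 9221))
# prints [('db7', '192.168.1.1', 9221)]
```

Any non-empty result means the address recorded by the master differs from the one configured on the slave, which is the condition that precedes the "Error master node ip port" warning in this log.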

Expected behavior
The slave should always use the master IP provided in slaveof.

Additional context
Version: 3.5.0-alpha

@wangshao1
Collaborator

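Yes, this is indeed a problem. The cause is that the data the master syncs to the slave includes an info file, which records the master's host and port. After the slave has received all the data, it compares the master's host and port against that record, and if they do not match it treats the sync as failed. This logic is indeed flawed.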
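
The check described above can be illustrated roughly as follows. This is a minimal sketch and not Pika's actual code: the two-line info layout (host, then port) and the names `MasterAddr`, `parse_info`, and `sync_accepted` are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterAddr:
    host: str
    port: int


def parse_info(text):
    """Parse a hypothetical info file: host on line 1, port on line 2."""
    lines = text.strip().splitlines()
    return MasterAddr(lines[0].strip(), int(lines[1]))


def sync_accepted(info_text, configured):
    """Mimic the strict comparison: accept only if the recorded address matches."""
    return parse_info(info_text) == configured


# The master records the address of its own primary NIC in the info file.
info_from_master = "192.168.1.1\n9221\n"

# The slave was configured with the logical address via slaveof.
configured = MasterAddr("23.23.23.1", 9221)

print(sync_accepted(info_from_master, configured))
# prints False
```

With the addresses from this issue the comparison fails even though the data came from the intended master, because the master describes itself by an address the slave never used.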

Collaborator

chejinge commented Aug 22, 2023

#1922
