Two nodes can't connect, but consul rtt reports estimated time of milliseconds #20955

Open
misterbisson opened this issue Apr 4, 2024 · 0 comments

Overview of the Issue

The consul rtt command reports an estimated round-trip time between two hosts that, according to the agent logs, cannot connect to each other. Test scenario: all Consul servers (one, in this test) are in network A, and clients are in networks A, B, and C. Hosts in B and C can each communicate with A, but not with each other. Yet consul rtt between a host in network B and a host in network C reports an estimated round-trip time of a fraction of a millisecond:

[auto@b-httpd-02 ~]$ consul rtt c-httpd-06
Estimated c-httpd-06 <-> b-httpd-02 rtt: 0.296 ms (using LAN coordinates)

Reproduction Steps

Set up a cluster in which all Consul servers are in network A and clients are in networks A, B, and C. Hosts in B and C can communicate with A, but not with each other.
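The reachability part of this scenario can be checked independently of Consul. The following is a minimal sketch, assuming a plain TCP connection attempt to the Serf LAN port (8301, as shown in the node list) is a fair stand-in for what the agent attempts. The helper name `can_connect` and the timeout value are hypothetical, and this probes TCP only, not the UDP gossip traffic.

```python
import socket


def can_connect(host: str, port: int = 8301, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any OSError (timeout, connection refused, no route) counts as a failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from b-httpd-02, a call such as `can_connect("192.168.20.148")` (c-httpd-06) would be expected to fail in this scenario, while `can_connect("192.168.20.3")` (a-consul-01) would be expected to succeed.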

The first letter of each node name indicates which network the node is on:

Node         Address              Status  Type    Build   Protocol  DC        Partition  Segment
a-consul-01  192.168.20.3:8301    alive   server  1.18.1  2         dca       default    <all>
a-httpd-03   192.168.20.16:8301   alive   client  1.18.1  2         dca       default    <default>
a-httpd-04   192.168.20.18:8301   alive   client  1.18.1  2         dca       default    <default>
a-httpd-08   192.168.20.22:8301   alive   client  1.18.1  2         dca       default    <default>
a-httpd-11   192.168.20.25:8301   alive   client  1.18.1  2         dca       default    <default>
a-httpd-13   192.168.20.27:8301   alive   client  1.18.1  2         dca       default    <default>
b-httpd-02   192.168.20.80:8301   alive   client  1.18.1  2         dca       default    <default>
b-httpd-04   192.168.20.82:8301   alive   client  1.18.1  2         dca       default    <default>
b-httpd-06   192.168.20.84:8301   alive   client  1.18.1  2         dca       default    <default>
b-httpd-08   192.168.20.86:8301   alive   client  1.18.1  2         dca       default    <default>
c-httpd-02   192.168.20.144:8301  alive   client  1.18.1  2         dca       default    <default>
c-httpd-04   192.168.20.146:8301  alive   client  1.18.1  2         dca       default    <default>
c-httpd-05   192.168.20.147:8301  alive   client  1.18.1  2         dca       default    <default>
c-httpd-06   192.168.20.148:8301  alive   client  1.18.1  2         dca       default    <default>
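To make the naming convention explicit, here is a small sketch that groups nodes from a listing like the one above by the first letter of the node name. The function name `group_by_network` is hypothetical, and the sample contains only a few rows copied from the list.

```python
from collections import defaultdict

# A few rows copied from the node list above (columns after Type omitted).
SAMPLE_MEMBERS = """\
Node         Address              Status  Type
a-consul-01  192.168.20.3:8301    alive   server
a-httpd-03   192.168.20.16:8301   alive   client
b-httpd-02   192.168.20.80:8301   alive   client
c-httpd-06   192.168.20.148:8301  alive   client
"""


def group_by_network(members_output: str) -> dict[str, list[str]]:
    """Map network letter (A, B, C) to the node names that start with it."""
    groups = defaultdict(list)
    for line in members_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        node = fields[0]
        groups[node[0].upper()].append(node)
    return dict(groups)
```

With this grouping, any pair of nodes drawn from the B and C groups is a pair that should not be able to connect in this test.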

Logs from a client in network B show that it cannot reach clients in network C:

[auto@b-httpd-02 ~]$ consul monitor c-httpd-02 --debug
2024-04-04T13:49:19.994-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-02 has failed, no acks received
2024-04-04T13:49:31.469-0500 [ERROR] agent.client.memberlist.lan: memberlist: Push/Pull with c-httpd-05 failed: dial tcp 192.168.20.147:8301: i/o timeout
2024-04-04T13:49:37.995-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-05 has failed, no acks received
2024-04-04T13:49:43.994-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-04 has failed, no acks received
2024-04-04T13:49:47.995-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-06 has failed, no acks received
2024-04-04T13:49:56.531-0500 [WARN]  agent.client.memberlist.lan: memberlist: Refuting a suspect message (from: c-httpd-04)
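The pattern in these logs can be summarized with a short script. This is a sketch only: the regular expressions are written against the exact message text shown above, and the function name `unreachable_nodes` is hypothetical.

```python
import re

# Patterns match the memberlist messages exactly as they appear in the log above.
SUSPECT_RE = re.compile(r"memberlist: Suspect (\S+) has failed, no acks received")
PUSHPULL_RE = re.compile(r"memberlist: Push/Pull with (\S+) failed: (.+)$")

# Lines copied from the log excerpt above.
SAMPLE_LOG = """\
2024-04-04T13:49:19.994-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-02 has failed, no acks received
2024-04-04T13:49:31.469-0500 [ERROR] agent.client.memberlist.lan: memberlist: Push/Pull with c-httpd-05 failed: dial tcp 192.168.20.147:8301: i/o timeout
2024-04-04T13:49:37.995-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-05 has failed, no acks received
2024-04-04T13:49:43.994-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-04 has failed, no acks received
2024-04-04T13:49:47.995-0500 [INFO]  agent.client.memberlist.lan: memberlist: Suspect c-httpd-06 has failed, no acks received
2024-04-04T13:49:56.531-0500 [WARN]  agent.client.memberlist.lan: memberlist: Refuting a suspect message (from: c-httpd-04)
"""


def unreachable_nodes(log_text: str) -> set[str]:
    """Collect node names that appear in suspect or failed push/pull messages."""
    nodes = set()
    for line in log_text.splitlines():
        match = SUSPECT_RE.search(line) or PUSHPULL_RE.search(line)
        if match:
            nodes.add(match.group(1))
    return nodes
```

Applied to the excerpt, every node this collects has a name beginning with "c-", which is consistent with b-httpd-02 being unable to reach network C.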

Yet querying the RTT between hosts in B and C yields an estimated time like the following:

[auto@b-httpd-02 ~]$ consul rtt c-httpd-06
Estimated c-httpd-06 <-> b-httpd-02 rtt: 0.296 ms (using LAN coordinates)
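For context on the "(using LAN coordinates)" note in that output, here is a minimal sketch of how a round-trip estimate can be derived from two network coordinates. It assumes the distance calculation described in Consul's network coordinates documentation (Euclidean distance between the vectors, plus both heights, with the adjustment terms applied only if the result stays positive). The class, field, and function names are hypothetical, and this is not Consul's own code.

```python
import math
from dataclasses import dataclass


@dataclass
class Coordinate:
    """Hypothetical container for one node's network coordinate (values in seconds)."""
    vec: list[float]
    height: float
    adjustment: float
    error: float = 0.0


def estimated_rtt_seconds(a: Coordinate, b: Coordinate) -> float:
    """Estimate the round-trip time between two coordinates, in seconds."""
    # Euclidean distance between the two coordinate vectors.
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a.vec, b.vec)))
    rtt = dist + a.height + b.height
    # Apply the adjustment terms only if the adjusted value remains positive.
    adjusted = rtt + a.adjustment + b.adjustment
    return adjusted if adjusted > 0.0 else rtt
```

If the estimate is computed this way, it needs only the two stored coordinates and not a direct measurement between the two nodes, which may be why a value is reported for a pair that cannot connect.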

Consul info for both Client and Server

Client info
[auto@b-httpd-02 ~]$ consul info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 1
	services = 1
build:
	prerelease =
	revision = 98cb473c
	version = 1.18.1
	version_metadata =
consul:
	acl = disabled
	known_servers = 1
	server = false
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 52
	max_procs = 1
	os = linux
	version = go1.21.8
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 6
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 17
	members = 14
	query_queue = 0
	query_time = 1
Client agent HCL config
Server info
[root@a-consul-01 ~]# consul info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 0
build:
	prerelease =
	revision = 98cb473c
	version = 1.18.1
	version_metadata =
consul:
	acl = disabled
	bootstrap = true
	known_datacenters = 1
	leader = true
	leader_addr = 192.168.20.3:8300
	server = true
raft:
	applied_index = 19333
	commit_index = 19333
	fsm_pending = 0
	last_contact = 0
	last_log_index = 19333
	last_log_term = 3
	last_snapshot_index = 16390
	last_snapshot_term = 3
	latest_configuration = [{Suffrage:Voter ID:2535d27a-b261-5c06-f925-98f0a080decf Address:192.168.20.3:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 3
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 322
	max_procs = 2
	os = linux
	version = go1.21.8
serf_lan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 6
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 17
	members = 14
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
Server agent HCL config

Operating system and Environment details
