Skip to content

infinite ResolveNow() loop on net transients #8154

Closed as not planned
Closed as not planned
@lev-lb

Description

@lev-lb

What version of gRPC are you using?

client: v1.70.0, server: v1.61.1, v1.63.2, v1.66.0.

What version of Go are you using (go version)?

go version go1.22.11 linux/amd64

What operating system (Linux, Windows, …) and version?

several Linux versions, including: kernel 4.15.0-112-generic on Ubuntu

What did you do?

gRPC client with a custom resolver (using pick_first LB), connecting to a multi-node server cluster. this codebase appeared to be working fine for about 6 years, up to and including gRPC v1.63.2 - or at least the issue below was never brought to our attention before. since upgrading to v1.70.0, the same code appears to occasionally enter infinite loop calling the custom resolver's ResolveNow() method. the issue is non-deterministic, but has been observed 4-5 times in the past month, AFAICT - always on network transients (impaired connectivity to the gRPC servers cluster).

What did you expect to see?

the problem appears some time after the initial conn was established, lost, re-established, etc. after a net hiccup, i'd expect the lower gRPC layers to keep trying to dial the addresses returned by ResolveNow() as per the backoff and connect timeout config - which it normally does, and has been doing for years. an example of this is at the top of the attached log.

What did you see instead?

occasionally (but rarely, despite forcibly injecting various network faults), the gRPC code enters an infinite loop of calling ResolveNow() at a rate of >15 thousand calls/second, pegging several CPU cores (>210% CPU on an otherwise near-idle machine). the process does not recover. the issue appears to manifest somewhere around here:

2025-03-07T12:48:28.072	INFO	[core] [Channel #1 SubChannel #2]Subchannel Connectivity change to IDLE
2025-03-07T12:48:28.072	INFO	resolver.ResolveNow()	{"lbgrpc": "1i855hu", "targets": "192.168.19.217:443,192.168.19.218:443,192.168.19.220:443,192.168.25.23:443,192.168.26.25:443,192.168.29.142:443,192.168.29.144:443,192.168.16.233:443,192.168.17.106:443,192.168.17.107:443,192.168.17.30:443,192.168.19.213:443,192.168.19.214:443,192.168.19.215:443", "calls-last-sec": 0}
2025-03-07T12:48:28.072	INFO	[transport] [client-transport 0xc000bc46c8] loopyWriter exiting with error: received GOAWAY with no active streams
2025-03-07T12:48:28.072	INFO	[pick-first-lb] [pick-first-lb 0xc000a37d70] Received SubConn state update: 0xc0002bb5e0, {ConnectivityState:IDLE ConnectionError:<nil> connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}
2025-03-07T12:48:28.073	INFO	[core] blockingPicker: the picked transport is not ready, loop back to repick
2025-03-07T12:48:28.073	INFO	[core] [Channel #1]Resolver state updated: { "Addresses": [ { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null} ()
2025-03-07T12:48:28.073	INFO	[core] [Channel #1]Channel Connectivity change to IDLE
2025-03-07T12:48:28.073	INFO	[transport] [client-transport 0xc000bc46c8] Closing: connection error: desc = "error reading from server: read tcp 10.212.1.6:32844->192.168.19.215:443: use of closed network connection"
2025-03-07T12:48:28.073	INFO	[pick-first-lb] [pick-first-lb 0xc000a37d70] Received new config { "shuffleAddressList": false}, resolver state { "Addresses": [ { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null}
2025-03-07T12:48:28.073	INFO	[core] [Channel #1 SubChannel #2]addrConn: updateAddrs addrs (5 of 14): [{Addr: "192.168.19.217:443", ServerName: "", } {Addr: "192.168.19.218:443", ServerName: "", } {Addr: "192.168.19.220:443", ServerName: "", } {Addr: "192.168.25.23:443", ServerName: "", } {Addr: "192.168.26.25:443", ServerName: "", }]
2025-03-07T12:48:28.073	INFO	[core] [Channel #1 SubChannel #2]Subchannel Connectivity change to CONNECTING
2025-03-07T12:48:28.073	INFO	[core] [Channel #1 SubChannel #2]Subchannel picks a new address "192.168.19.215:443" to connect
2025-03-07T12:48:28.073	INFO	[pick-first-lb] [pick-first-lb 0xc000a37d70] Received SubConn state update: 0xc0002bb5e0, {ConnectivityState:CONNECTING ConnectionError:<nil> connectedAddress:{Addr: ServerName: Attributes:<nil> BalancerAttributes:<nil> Metadata:<nil>}}
2025-03-07T12:48:28.073	INFO	[core] [Channel #1]Channel Connectivity change to CONNECTING
2025-03-07T12:48:28.073	INFO	[core] [Channel #1 SubChannel #2]Subchannel picks a new address "192.168.19.217:443" to connect
2025-03-07T12:48:28.073	INFO	[core] Creating new client transport to "{Addr: \"192.168.19.215:443\", ServerName: \"lb-resolver\", }": connection error: desc = "transport: Error while dialing: dial tcp 192.168.19.215:443: operation was canceled"
2025-03-07T12:48:28.073	WARN	[core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "192.168.19.215:443", ServerName: "lb-resolver", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 192.168.19.215:443: operation was canceled"
2025-03-07T12:48:28.073	INFO	resolver.ResolveNow()	{"lbgrpc": "1i855hu", "targets": "192.168.19.218:443,192.168.19.220:443,192.168.25.23:443,192.168.26.25:443,192.168.29.142:443,192.168.29.144:443,192.168.16.233:443,192.168.17.106:443,192.168.17.107:443,192.168.17.30:443,192.168.19.213:443,192.168.19.214:443,192.168.19.215:443,192.168.19.217:443", "calls-last-sec": 1}
2025-03-07T12:48:28.074	INFO	[core] [Channel #1]Resolver state updated: { "Addresses": [ { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null} ()
2025-03-07T12:48:28.074	INFO	[pick-first-lb] [pick-first-lb 0xc000a37d70] Received new config { "shuffleAddressList": false}, resolver state { "Addresses": [ { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null}
2025-03-07T12:48:28.074	INFO	[core] [Channel #1 SubChannel #2]addrConn: updateAddrs addrs (5 of 14): [{Addr: "192.168.19.218:443", ServerName: "", } {Addr: "192.168.19.220:443", ServerName: "", } {Addr: "192.168.25.23:443", ServerName: "", } {Addr: "192.168.26.25:443", ServerName: "", } {Addr: "192.168.29.142:443", ServerName: "", }]
2025-03-07T12:48:28.074	INFO	[core] Creating new client transport to "{Addr: \"192.168.19.217:443\", ServerName: \"lb-resolver\", }": connection error: desc = "transport: Error while dialing: dial tcp 192.168.19.217:443: operation was canceled"
2025-03-07T12:48:28.074	WARN	[core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "192.168.19.217:443", ServerName: "lb-resolver", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 192.168.19.217:443: operation was canceled"
2025-03-07T12:48:28.074	INFO	resolver.ResolveNow()	{"lbgrpc": "1i855hu", "targets": "192.168.19.220:443,192.168.25.23:443,192.168.26.25:443,192.168.29.142:443,192.168.29.144:443,192.168.16.233:443,192.168.17.106:443,192.168.17.107:443,192.168.17.30:443,192.168.19.213:443,192.168.19.214:443,192.168.19.215:443,192.168.19.217:443,192.168.19.218:443", "calls-last-sec": 2}
2025-03-07T12:48:28.074	INFO	[core] [Channel #1 SubChannel #2]Subchannel picks a new address "192.168.19.218:443" to connect
2025-03-07T12:48:28.074	INFO	[core] [Channel #1]Resolver state updated: { "Addresses": [ { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null} ()
2025-03-07T12:48:28.074	INFO	[pick-first-lb] [pick-first-lb 0xc000a37d70] Received new config { "shuffleAddressList": false}, resolver state { "Addresses": [ { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null}
2025-03-07T12:48:28.074	INFO	[core] [Channel #1 SubChannel #2]addrConn: updateAddrs addrs (5 of 14): [{Addr: "192.168.19.220:443", ServerName: "", } {Addr: "192.168.25.23:443", ServerName: "", } {Addr: "192.168.26.25:443", ServerName: "", } {Addr: "192.168.29.142:443", ServerName: "", } {Addr: "192.168.29.144:443", ServerName: "", }]
2025-03-07T12:48:28.075	INFO	[core] Creating new client transport to "{Addr: \"192.168.19.218:443\", ServerName: \"lb-resolver\", }": connection error: desc = "transport: Error while dialing: dial tcp 192.168.19.218:443: operation was canceled"
2025-03-07T12:48:28.075	WARN	[core] [Channel #1 SubChannel #2]grpc: addrConn.createTransport failed to connect to {Addr: "192.168.19.218:443", ServerName: "lb-resolver", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 192.168.19.218:443: operation was canceled"
2025-03-07T12:48:28.075	INFO	resolver.ResolveNow()	{"lbgrpc": "1i855hu", "targets": "192.168.25.23:443,192.168.26.25:443,192.168.29.142:443,192.168.29.144:443,192.168.16.233:443,192.168.17.106:443,192.168.17.107:443,192.168.17.30:443,192.168.19.213:443,192.168.19.214:443,192.168.19.215:443,192.168.19.217:443,192.168.19.218:443,192.168.19.220:443", "calls-last-sec": 3}
2025-03-07T12:48:28.075	INFO	[core] [Channel #1]Resolver state updated: { "Addresses": [ { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null} ()
2025-03-07T12:48:28.075	INFO	[core] [Channel #1 SubChannel #2]Subchannel picks a new address "192.168.19.220:443" to connect
2025-03-07T12:48:28.075	INFO	[pick-first-lb] [pick-first-lb 0xc000a37d70] Received new config { "shuffleAddressList": false}, resolver state { "Addresses": [ { "Addr": "192.168.25.23:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.26.25:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.142:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.29.144:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.16.233:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.106:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.107:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.17.30:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.213:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.214:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.215:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.217:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.218:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null }, { "Addr": "192.168.19.220:443", "ServerName": "", "Attributes": null, "BalancerAttributes": null, "Metadata": null } ], "Endpoints": null, "ServiceConfig": null, "Attributes": null}
2025-03-07T12:48:28.075	INFO	[core] [Channel #1 SubChannel #2]addrConn: updateAddrs addrs (5 of 14): [{Addr: "192.168.25.23:443", ServerName: "", } {Addr: "192.168.26.25:443", ServerName: "", } {Addr: "192.168.29.142:443", ServerName: "", } {Addr: "192.168.29.144:443", ServerName: "", } {Addr: "192.168.16.233:443", ServerName: "", }]

see attached log, the relevant portion (reproduced above) starts at around PROBLEM MANIFESTS AROUND HERE text.

sample.2025-03-07.12-54-08.log.gz

the only unusual entry in the log that i see before this ResolveNow() infinite loop but don't normally find in the logs on any other net transients (which are handled gracefully) is this one (UPDATED): [core] blockingPicker: the picked transport is not ready, loop back to repick loopyWriter exiting with error: connection error: desc = "keepalive ping failed to receive ACK within timeout".

potentially relevant gRPC client options:

kal := keepalive.ClientParameters{
    Time:                10 * time.Second,
    Timeout:             10 * time.Second,
    PermitWithoutStream: true, 
}
dialBackoffConfig := backoff.Config{
    BaseDelay:  10 * time.Millisecond,
    Multiplier: 5,
    Jitter:     0.1,  
    MaxDelay:   3 * time.Second,
}
cp := grpc.ConnectParams{
    Backoff:           dialBackoffConfig,
    MinConnectTimeout: 1 * time.Second,
}
opts := []grpc.DialOption{
    grpc.WithBlock(),
    grpc.WithDisableRetry(),
    grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),
    grpc.WithKeepaliveParams(kal),
    grpc.WithConnectParams(cp),
    grpc.WithResolvers(customResolver),
}

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions