fix(gravity): keep discovery enabled for DNS-derived direct IPs#246
fix(gravity): keep discovery enabled for DNS-derived direct IPs#246
Conversation
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughReplaces the previous “all URLs are direct IPs” check with a new predicate, shouldDisablePeerDiscoveryForResolvedURLs, which disables peer discovery only for explicit direct-IP primary overrides, when the resolver is absent, or when the normalized resolved direct‑IP set differs from resolver output. Adds sameGravityURLSet, shouldResolveForPeerDiscoveryDisableCheck, and tests. ChangesPeer Discovery Disable Decision Logic
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
5b5356b to
7bff694
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gravity/grpc_client.go`:
- Around line 1252-1259: The startup path currently calls
g.discoveryResolveFunc() directly (in the return !sameGravityURLSet(urls,
g.discoveryResolveFunc()) line) which can block; instead invoke the resolver
through the existing timeout-wrapping path used by startMultiEndpoint so the
call cannot hang indefinitely—replace the direct g.discoveryResolveFunc() call
with the timeout-protected resolver invocation (the same helper/wrapper used by
startMultiEndpoint) and pass its result into sameGravityURLSet(urls, ...) so
discovery check uses the timed resolver.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a29c8d4c-efc9-412a-99c0-0b231fa12682
📒 Files selected for processing (2)
gravity/grpc_client.gogravity/peer_discovery_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
gravity/grpc_client.go (1)
1215-1239: Set comparison helper is solid
sameGravityURLSetcorrectly normalizes trim/dedupe and ignores order, which fits the DNS-derived direct-IP equivalence check.gravity/peer_discovery_test.go (1)
253-316: Great regression coverage for direct-IP origin semanticsThese cases collectively validate the intended distinction between explicit/static direct IP configs and DNS-derived direct-IP sets, including startup gating and below-capacity add behavior.
Also applies to: 509-540, 972-1043
| if g.discoveryResolveFunc == nil { | ||
| return true | ||
| } | ||
|
|
||
| // Hadron may resolve DNS before constructing the Gravity client, which | ||
| // means DNS-derived peers arrive here as direct IP URLs. Keep peer discovery | ||
| // enabled when the resolver confirms this URL set came from discovery. | ||
| return !sameGravityURLSet(urls, g.discoveryResolveFunc()) |
There was a problem hiding this comment.
Bound resolver execution in startup path
Line 1259 calls discoveryResolveFunc() directly during startup. If the resolver blocks (DNS stall/network hang), startMultiEndpoint() can hang indefinitely before the client is usable. Please use the existing timeout path instead of a direct call.
Suggested fix
func (g *GravityClient) shouldDisablePeerDiscoveryForResolvedURLs(urls []string) bool {
if !allGravityURLsAreDirectIPs(urls) {
return false
}
@@
if g.discoveryResolveFunc == nil {
return true
}
@@
- return !sameGravityURLSet(urls, g.discoveryResolveFunc())
+ resolved, ok := g.resolvePeerDiscoveryURLs()
+ if !ok {
+ // Resolver timed out/canceled; avoid blocking startup.
+ // Conservative choice: keep discovery disabled when origin cannot be confirmed.
+ return true
+ }
+ return !sameGravityURLSet(urls, resolved)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@gravity/grpc_client.go` around lines 1252 - 1259, The startup path currently
calls g.discoveryResolveFunc() directly (in the return !sameGravityURLSet(urls,
g.discoveryResolveFunc()) line) which can block; instead invoke the resolver
through the existing timeout-wrapping path used by startMultiEndpoint so the
call cannot hang indefinitely—replace the direct g.discoveryResolveFunc() call
with the timeout-protected resolver invocation (the same helper/wrapper used by
startMultiEndpoint) and pass its result into sameGravityURLSet(urls, ...) so
discovery check uses the timed resolver.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
gravity/grpc_client.go (1)
844-848:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBound resolver execution in startup discovery gating.
Line 1261 directly invokes
discoveryResolveFunc(). If resolver execution stalls,startMultiEndpoint()can block during startup. Use the existing timeout-bounded resolver path and handle timeout/cancel conservatively.Suggested fix
- primaryURL := g.url - discoveryResolveFunc := g.discoveryResolveFunc + primaryURL := g.url g.mu.Unlock() - g.peerDiscoveryDisabled = shouldDisablePeerDiscoveryForResolvedURLs(urls, primaryURL, discoveryResolveFunc) + g.peerDiscoveryDisabled = g.shouldDisablePeerDiscoveryForResolvedURLs(urls, primaryURL)-func shouldDisablePeerDiscoveryForResolvedURLs(urls []string, primaryURL string, discoveryResolveFunc func() []string) bool { +func (g *GravityClient) shouldDisablePeerDiscoveryForResolvedURLs(urls []string, primaryURL string) bool { if !allGravityURLsAreDirectIPs(urls) { return false } @@ - if discoveryResolveFunc == nil { + if g.discoveryResolveFunc == nil { return true } @@ - return !sameGravityURLSet(urls, discoveryResolveFunc()) + resolved, ok := g.resolvePeerDiscoveryURLs() + if !ok { + return true + } + return !sameGravityURLSet(urls, resolved) }Also applies to: 1243-1262
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gravity/grpc_client.go` around lines 844 - 848, The code currently calls discoveryResolveFunc() directly while holding no timeout, which can block startMultiEndpoint() during startup; instead, use the same timeout-bounded resolver helper already used elsewhere (the bounded resolve/cancel path) to invoke discoveryResolveFunc with a context deadline and handle timeout/cancellation conservatively; update the call site around startMultiEndpoint()/shouldDisablePeerDiscoveryForResolvedURLs to call the timeout-wrapped resolver, treat timeouts as non-fatal (assume discovery disabled or retry later), and ensure g.peerDiscoveryDisabled is set based on the bounded-result or explicit timeout error rather than a potentially blocking direct call to discoveryResolveFunc.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@gravity/grpc_client.go`:
- Around line 844-848: The code currently calls discoveryResolveFunc() directly
while holding no timeout, which can block startMultiEndpoint() during startup;
instead, use the same timeout-bounded resolver helper already used elsewhere
(the bounded resolve/cancel path) to invoke discoveryResolveFunc with a context
deadline and handle timeout/cancellation conservatively; update the call site
around startMultiEndpoint()/shouldDisablePeerDiscoveryForResolvedURLs to call
the timeout-wrapped resolver, treat timeouts as non-fatal (assume discovery
disabled or retry later), and ensure g.peerDiscoveryDisabled is set based on the
bounded-result or explicit timeout error rather than a potentially blocking
direct call to discoveryResolveFunc.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd7a96a7-b488-4c79-afad-fe721245563a
📒 Files selected for processing (2)
gravity/grpc_client.gogravity/peer_discovery_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- gravity/peer_discovery_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (1)
gravity/grpc_client.go (1)
1217-1241: URL-set comparison helper is a good fit.
sameGravityURLSetcorrectly ignores order, duplicates, and surrounding whitespace, which is exactly what this discovery-origin check needs.
7bff694 to
a945d54
Compare
Summary
Fixes the Gravity peer discovery regression introduced by
32ce707.That change correctly tried to prevent peer discovery when an operator explicitly configures Gravity with direct IP URLs, but it also disabled discovery for Hadron's production path where DNS is resolved into per-ion direct IP URLs before constructing the Gravity client.
This PR keeps the explicit/static direct-IP safeguard while allowing DNS-derived direct-IP URL sets to continue peer discovery.
Changes
shouldDisablePeerDiscoveryForResolvedURLs.GravityURLswith no resolverGravityURLswhose resolver returns a different setDiscoveryResolveFuncreturns the same direct-IP URL set, which indicates the URLs were DNS-derived.MaxGravityPeers.Validation
Summary by CodeRabbit
Bug Fixes
Tests