Skip to content

fix: preserve session config on multi-tunnel reuse#229

Merged
jhaynie merged 12 commits intomainfrom
fix/preserve-session-config
Apr 19, 2026
Merged

fix: preserve session config on multi-tunnel reuse#229
jhaynie merged 12 commits intomainfrom
fix/preserve-session-config

Conversation

@jhaynie
Copy link
Copy Markdown
Member

@jhaynie jhaynie commented Apr 18, 2026

Summary

Multi-tunnel reuse sessions return minimal responses without ApiUrl, OtlpUrl, HostMapping, etc. Previously these empty values overwrote the good config from the primary session, causing the hadron to lose its API URL and fail to download deployment code.

Now only non-empty values update the stored config. Added debug logging for session config response and effective values.

Summary by CodeRabbit

  • Bug Fixes

    • Prevented session settings from being unintentionally cleared during reconnect so endpoints, tokens, and session identity persist across handshakes.
    • Preserved primary session and machine identifiers (keys, certificates, routing info) across multi-tunnel reuse sessions.
    • Improved session-config logging to show incoming and effective configuration values.
    • Fixed errors when control streams or circuit breakers were uninitialized and improved ping handling when no control streams exist.
  • New Features

    • Added optional SNI hostname support for endpoint mappings.

Multi-tunnel reuse sessions return minimal responses without ApiUrl,
OtlpUrl, HostMapping, etc. Previously these empty values overwrote the
good config from the primary session, causing the hadron to lose its
API URL and fail to download deployment code.

Now only non-empty values update the stored config. Added debug logging
for session config response and effective values.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

gravity/grpc_client.go now preserves primary session fields across reused multi-tunnel sessions and only updates them when response values are non-empty; it also tightens control-stream error checks and adds reconnection panic stack logging. gravity/gossip/types.go adds SNIHostname to EndpointMapping.

Changes

Cohort / File(s) Summary
Session configuration & provider
gravity/grpc_client.go
Adds persisted GravityClient fields (subnetRoutes, machineSubnet, sessionHostname, sessionOrgID, machineToken, machineCertBundle, sshPublicKey, signingPublicKey). handleSessionHelloResponse conditionally updates session/OTLP/API/host fields only when response values are non-empty; provider configuration and subnet routing use preserved g.* values; MachineSubnet validation uses preserved g.machineSubnet.
Control-stream safety & reconnection handling
gravity/grpc_client.go
establishControlStreams errors when zero gRPC connections or zero session clients and bounds stream establishment to allocated slice size. sendSessionHello (single-endpoint path) returns errors if controlStreams or circuitBreakers are uninitialized. sendPing checks len(streams)==0 before indexing. Reconnection panics now capture and log a stack trace before returning the error.
Logging and diagnostics
gravity/grpc_client.go
Adds [session-config] Info logs for raw SessionHelloResponse fields and the effective session configuration after conditional updates.
Gossip message struct
gravity/gossip/types.go
Adds SNIHostname string serialized as sni_hostname with omitempty to EndpointMapping.
🚥 Pre-merge checks | ✅ 1
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
gravity/grpc_client.go (1)

2582-2583: Use debug level for session-config dumps.

These logs are emitted at Info and include stable identifiers/URLs. Since this is diagnostic merge tracing, Debug is safer and less noisy in production.

💡 Proposed change
-g.logger.Info("[session-config] response: apiUrl=%q otlpUrl=%q env=%d hostMappings=%d machineId=%s server=%s",
+g.logger.Debug("[session-config] response: apiUrl=%q otlpUrl=%q env=%d hostMappings=%d machineId=%s server=%s",
 	response.ApiUrl, response.OtlpUrl, len(response.Environment), len(response.HostMapping), response.MachineId, response.GravityServer)

-g.logger.Info("[session-config] effective: apiUrl=%q otlpUrl=%q hostMappings=%d", g.apiURL, g.otlpURL, len(g.hostMapping))
+g.logger.Debug("[session-config] effective: apiUrl=%q otlpUrl=%q hostMappings=%d", g.apiURL, g.otlpURL, len(g.hostMapping))

Also applies to: 2599-2599

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gravity/grpc_client.go` around lines 2582 - 2583, The session-config
diagnostic log is using g.logger.Info and should be lowered to Debug to avoid
noisy production logs; change the calls that format the response (the
g.logger.Info call that logs "response: apiUrl=%q otlpUrl=%q env=%d
hostMappings=%d machineId=%s server=%s" with response.ApiUrl, response.OtlpUrl,
len(response.Environment), len(response.HostMapping), response.MachineId,
response.GravityServer) and the similar logger invocation around the same
session-config handling (the other g.logger.Info at the later occurrence) to
g.logger.Debug so these diagnostic URL/ID dumps are only emitted at debug level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gravity/grpc_client.go`:
- Around line 2587-2589: The code saves the OTLP token to g.otlpToken but
provider.Configure still receives response.OtlpKey, which can overwrite a
preserved token with an empty value; modify the call site in the Configure flow
(where provider.Configure is invoked) to pass the effective token (e.g., use
response.OtlpKey if non-empty, otherwise g.otlpToken) instead of raw
response.OtlpKey so preserved tokens are reused when responses omit the key.

---

Nitpick comments:
In `@gravity/grpc_client.go`:
- Around line 2582-2583: The session-config diagnostic log is using
g.logger.Info and should be lowered to Debug to avoid noisy production logs;
change the calls that format the response (the g.logger.Info call that logs
"response: apiUrl=%q otlpUrl=%q env=%d hostMappings=%d machineId=%s server=%s"
with response.ApiUrl, response.OtlpUrl, len(response.Environment),
len(response.HostMapping), response.MachineId, response.GravityServer) and the
similar logger invocation around the same session-config handling (the other
g.logger.Info at the later occurrence) to g.logger.Debug so these diagnostic
URL/ID dumps are only emitted at debug level.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87e8dab3-b6a3-4604-a316-fdc28176809d

📥 Commits

Reviewing files that changed from the base of the PR and between 2164202 and a971efd.

📒 Files selected for processing (1)
  • gravity/grpc_client.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
gravity/grpc_client.go (1)

2584-2598: Good non-empty merge guards for reused sessions.

Nice fix: these guards prevent minimal multi-tunnel hello responses from clobbering previously valid session config values.

Comment thread gravity/grpc_client.go
jhaynie added 2 commits April 18, 2026 17:45
…ponse

The provider.Configure call was passing response.OtlpKey directly,
bypassing the preservation logic that keeps good values from the
primary session when multi-tunnel reuse sessions send empty responses.
Store SubnetRoutes, MachineSubnet, Hostname, OrgID, MachineToken,
MachineCertBundle, SSHPublicKey, and SigningPublicKey on the gravity
client. Pass preserved values to provider.Configure() instead of raw
response values that may be empty from multi-tunnel reuse sessions.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
gravity/grpc_client.go (2)

2585-2624: ⚠️ Potential issue | 🟠 Major

Preserve authorizationToken together with MachineToken.

Line 2587 still overwrites g.authorizationToken from the raw response on every hello. On a minimal reuse response, that blanks the bearer token even though the primary token is now preserved in g.machineToken, so GetSecret(), GetDeploymentMetadata(), and GetSandboxMetadata() can start sending an empty token.

💡 Suggested fix
  g.machineID = response.MachineId
- g.authorizationToken = response.MachineToken
  g.connectionIDs = append(g.connectionIDs, response.MachineId)
  ...
  if response.MachineToken != "" {
  	g.machineToken = response.MachineToken
+ 	g.authorizationToken = response.MachineToken
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gravity/grpc_client.go` around lines 2585 - 2624, The code unconditionally
sets g.authorizationToken = response.MachineToken on every hello, which lets
minimal reuse responses clear the bearer token; only update g.authorizationToken
when the response actually contains a non-empty MachineToken (i.e., mirror the
pattern used for g.machineToken and other fields). Modify the assignment so that
g.authorizationToken is only overwritten if response.MachineToken != ""
(otherwise leave the existing g.authorizationToken intact) and ensure
g.machineToken continues to be updated as before so GetSecret(),
GetDeploymentMetadata(), and GetSandboxMetadata() always have a valid token.

2590-2705: ⚠️ Potential issue | 🔴 Critical

Snapshot the merged session config before releasing g.mu to prevent concurrent mutations.

Lines 2636 unlocks g.mu, but lines 2664–2705 then read g.machineSubnet, g.apiURL, g.hostMapping, g.hostEnvironment, g.subnetRoutes, g.sessionHostname, g.sessionOrgID, g.machineToken, g.machineID, g.machineCertBundle, g.sshPublicKey, and g.signingPublicKey without holding the lock. Since handleSessionHelloResponse runs concurrently on multiple control-stream goroutines (lines 1864, 1947, 4076), another hello can mutate these fields while this one is validating/configuring/routing, creating a race that feeds provider.Configure or RouteTraffic a mixed config from two different responses.

Additionally, line 2587 unconditionally assigns g.authorizationToken = response.MachineToken, so multi-tunnel reuse sessions returning an empty token will blank the auth token used by lines 6034 and 6052 for bearer authentication. While g.machineToken is preserved conditionally (line 2603), g.authorizationToken should also be preserved conditionally to maintain auth across reuse sessions.

💡 Suggested fix
  g.logger.Info("[session-config] effective: apiUrl=%q otlpUrl=%q hostMappings=%d subnetRoutes=%d machineSubnet=%q", g.apiURL, g.otlpURL, len(g.hostMapping), len(g.subnetRoutes), g.machineSubnet)
+
+ effectiveAPIURL := g.apiURL
+ effectiveOTLPURL := g.otlpURL
+ effectiveOTLPToken := g.otlpToken
+ effectiveHostMapping := append([]*pb.HostMapping(nil), g.hostMapping...)
+ effectiveEnvironment := append([]string(nil), g.hostEnvironment...)
+ effectiveSubnetRoutes := append([]string(nil), g.subnetRoutes...)
+ effectiveMachineSubnet := g.machineSubnet
+ effectiveHostname := g.sessionHostname
+ effectiveOrgID := g.sessionOrgID
+ effectiveMachineToken := g.machineToken
+ effectiveMachineID := g.machineID
+ effectiveMachineCertBundle := g.machineCertBundle
+ effectiveSSHPublicKey := append([]byte(nil), g.sshPublicKey...)
+ effectiveSigningPublicKey := append([]byte(nil), g.signingPublicKey...)
 
  g.mu.Unlock()
 
- if g.machineSubnet != "" {
- 	if _, _, err := net.ParseCIDR(g.machineSubnet); err != nil {
- 		g.logger.Error("session hello returned invalid MachineSubnet %q: %v", g.machineSubnet, err)
+ if effectiveMachineSubnet != "" {
+ 	if _, _, err := net.ParseCIDR(effectiveMachineSubnet); err != nil {
+ 		g.logger.Error("session hello returned invalid MachineSubnet %q: %v", effectiveMachineSubnet, err)
  		signalConnectionID("")
  		closeSessionReady()
  		return
  	}
  }
 
  if err := g.provider.Configure(provider.Configuration{
  	Server:            g,
  	Context:           g.context,
  	Logger:            g.logger,
- 	APIURL:            g.apiURL,
- 	SSHPublicKey:      g.sshPublicKey,
- 	TelemetryURL:      g.otlpURL,
- 	TelemetryAPIKey:   g.otlpToken,
+ 	APIURL:            effectiveAPIURL,
+ 	SSHPublicKey:      effectiveSSHPublicKey,
+ 	TelemetryURL:      effectiveOTLPURL,
+ 	TelemetryAPIKey:   effectiveOTLPToken,
  	GravityURL:        g.url,
  	AgentuityCACert:   g.caCert,
- 	HostMapping:       g.hostMapping,
- 	Environment:       g.hostEnvironment,
- 	SubnetRoutes:      g.subnetRoutes,
- 	Hostname:          g.sessionHostname,
- 	OrgID:             g.sessionOrgID,
- 	MachineToken:      g.machineToken,
- 	MachineID:         g.machineID,
- 	MachineCertBundle: g.machineCertBundle,
- 	MachineSubnet:     g.machineSubnet,
- 	SigningPublicKey:  g.signingPublicKey,
+ 	HostMapping:       effectiveHostMapping,
+ 	Environment:       effectiveEnvironment,
+ 	SubnetRoutes:      effectiveSubnetRoutes,
+ 	Hostname:          effectiveHostname,
+ 	OrgID:             effectiveOrgID,
+ 	MachineToken:      effectiveMachineToken,
+ 	MachineID:         effectiveMachineID,
+ 	MachineCertBundle: effectiveMachineCertBundle,
+ 	MachineSubnet:     effectiveMachineSubnet,
+ 	SigningPublicKey:  effectiveSigningPublicKey,
  }); err != nil {
  	...
  }
 
- if g.networkInterface != nil {
- 	if err := g.networkInterface.RouteTraffic(g.subnetRoutes); err != nil {
+ if g.networkInterface != nil {
+ 	if err := g.networkInterface.RouteTraffic(effectiveSubnetRoutes); err != nil {
  		...
  	}
  }

For the auth token, change line 2587:

- g.authorizationToken = response.MachineToken
+ if response.MachineToken != "" {
+     g.authorizationToken = response.MachineToken
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gravity/grpc_client.go` around lines 2590 - 2705, The handler unlocks g.mu
too early causing races: snapshot all session-derived fields (apiURL,
otlpURL/otlpToken, hostMapping, hostEnvironment, subnetRoutes, machineSubnet,
sessionHostname, sessionOrgID, machineToken, machineID, machineCertBundle,
sshPublicKey, signingPublicKey, etc.) into local variables while holding g.mu,
then release the lock and use those locals for validation, provider.Configure
and networkInterface.RouteTraffic to avoid concurrent mutation; also change the
unconditional assignment to g.authorizationToken = response.MachineToken so it
is only updated when response.MachineToken != "" (preserve existing
authorizationToken on reuse sessions).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gravity/grpc_client.go`:
- Around line 2703-2705: The code currently calls
g.networkInterface.RouteTraffic(g.subnetRoutes) on every session hello even when
the incoming message omits SubnetRoutes, reusing the preserved g.subnetRoutes
and risking duplicate route application; update the handshake logic in the
function handling hellos to either 1) compare the new SubnetRoutes payload to
the preserved g.subnetRoutes and skip calling
g.networkInterface.RouteTraffic(...) when they are equal/unchanged, or 2) if
reapplication is required, call
g.networkInterface.UnrouteTraffic(g.subnetRoutes) (or the appropriate unroute
method) before invoking g.networkInterface.RouteTraffic(...) to clear prior
state; use the g.subnetRoutes field and the
networkInterface.RouteTraffic/UnrouteTraffic methods to locate and implement
this change.

---

Outside diff comments:
In `@gravity/grpc_client.go`:
- Around line 2585-2624: The code unconditionally sets g.authorizationToken =
response.MachineToken on every hello, which lets minimal reuse responses clear
the bearer token; only update g.authorizationToken when the response actually
contains a non-empty MachineToken (i.e., mirror the pattern used for
g.machineToken and other fields). Modify the assignment so that
g.authorizationToken is only overwritten if response.MachineToken != ""
(otherwise leave the existing g.authorizationToken intact) and ensure
g.machineToken continues to be updated as before so GetSecret(),
GetDeploymentMetadata(), and GetSandboxMetadata() always have a valid token.
- Around line 2590-2705: The handler unlocks g.mu too early causing races:
snapshot all session-derived fields (apiURL, otlpURL/otlpToken, hostMapping,
hostEnvironment, subnetRoutes, machineSubnet, sessionHostname, sessionOrgID,
machineToken, machineID, machineCertBundle, sshPublicKey, signingPublicKey,
etc.) into local variables while holding g.mu, then release the lock and use
those locals for validation, provider.Configure and
networkInterface.RouteTraffic to avoid concurrent mutation; also change the
unconditional assignment to g.authorizationToken = response.MachineToken so it
is only updated when response.MachineToken != "" (preserve existing
authorizationToken on reuse sessions).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2572332-689e-4d1c-9434-494fb192d8a2

📥 Commits

Reviewing files that changed from the base of the PR and between ba05040 and 538fd27.

📒 Files selected for processing (1)
  • gravity/grpc_client.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)

Comment thread gravity/grpc_client.go
Comment on lines +2703 to +2705
g.logger.Debug("configuring subnet routing for routes %v", g.subnetRoutes)
if g.networkInterface != nil {
if err := g.networkInterface.RouteTraffic(response.SubnetRoutes); err != nil {
if err := g.networkInterface.RouteTraffic(g.subnetRoutes); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== RouteTraffic declarations and call sites =="
rg -n -C3 '\bRouteTraffic\s*\('

echo
echo "== NetworkInterface definitions =="
rg -n -C3 'type .*NetworkInterface|interface .*RouteTraffic'

Repository: agentuity/go-common

Length of output: 3764


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Context around RouteTraffic call in grpc_client.go (lines 2695-2715) =="
sed -n '2695,2715p' gravity/grpc_client.go

echo
echo "== Search for function containing line 2703 (context) =="
sed -n '2650,2750p' gravity/grpc_client.go | head -100

echo
echo "== Check for 'reuse hello' or similar logic =="
rg -n -i 'reuse.*hello|hello.*reuse' gravity/

echo
echo "== Look for actual implementations of NetworkInterface =="
rg -n 'func.*RouteTraffic|type.*NetworkInterface' --glob '!*test*' --glob '!*_test*'

Repository: agentuity/go-common

Length of output: 4242


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Where is g.subnetRoutes assigned/updated =="
rg -n 'g\.subnetRoutes\s*=' gravity/

echo
echo "== Session hello response structure =="
rg -n -A5 'type.*SessionHello|struct.*SessionHello' gravity/ --type go

echo
echo "== Check what gets updated from response =="
rg -n 'response\.' gravity/grpc_client.go | grep -i subnet | head -20

echo
echo "== Search for any cleanup or UnrouteTraffic calls =="
rg -n 'UnrouteTraffic' gravity/

Repository: agentuity/go-common

Length of output: 4975


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Context around UnrouteTraffic call at line 3461 =="
sed -n '3450,3475p' gravity/grpc_client.go

echo
echo "== Search for actual RouteTraffic implementations (non-test) =="
rg -n -C5 'func.*RouteTraffic' gravity/ --glob '!*test*' --glob '!*.pb.go'

echo
echo "== Search for comments about reuse hello or multi-tunnel =="
rg -n -B2 -A2 'reuse.*session|multi.*tunnel' gravity/ --type go

echo
echo "== Check SessionHelloResponse fields for SubnetRoutes =="
sed -n '790,850p' gravity/proto/gravity_session.pb.go | grep -i subnet

Repository: agentuity/go-common

Length of output: 872


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for tests involving reuse hello or SessionHelloResponse =="
rg -n -C3 'SessionHelloResponse|reuse.*hello' gravity/ --type go

echo
echo "== Check network/README for RouteTraffic documentation =="
cat gravity/network/README.md 2>/dev/null || echo "No README found"

echo
echo "== Search for any comments about idempotency or route handling =="
rg -n -i 'idempotent|duplicate.*route|route.*duplicate' gravity/ --type go

echo
echo "== Check if there are integration tests for RouteTraffic =="
fd -e go -path '*/test*' | xargs grep -l 'RouteTraffic' 2>/dev/null | head -5

Repository: agentuity/go-common

Length of output: 19740


RouteTraffic is reapplied unconditionally on every session hello, risking duplicate route operations.

On reuse hellos where SubnetRoutes is omitted from the response, the preserved g.subnetRoutes value (lines 2610–2611) is still passed to RouteTraffic at line 2705. If the implementation treats duplicate routes as errors rather than idempotently, the handshake will fail on the second or subsequent hello message (e.g., multi-stream scenarios where multiple endpoints acknowledge their hellos).

Consider either:

  • Skipping RouteTraffic if routes haven't changed since the last call, or
  • Calling UnrouteTraffic before reapplying to clear prior state
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gravity/grpc_client.go` around lines 2703 - 2705, The code currently calls
g.networkInterface.RouteTraffic(g.subnetRoutes) on every session hello even when
the incoming message omits SubnetRoutes, reusing the preserved g.subnetRoutes
and risking duplicate route application; update the handshake logic in the
function handling hellos to either 1) compare the new SubnetRoutes payload to
the preserved g.subnetRoutes and skip calling
g.networkInterface.RouteTraffic(...) when they are equal/unchanged, or 2) if
reapplication is required, call
g.networkInterface.UnrouteTraffic(g.subnetRoutes) (or the appropriate unroute
method) before invoking g.networkInterface.RouteTraffic(...) to clear prior
state; use the g.subnetRoutes field and the
networkInterface.RouteTraffic/UnrouteTraffic methods to locate and implement
this change.

jhaynie added 3 commits April 18, 2026 20:31
The reconnection path could panic with 'index out of range [0]' when
controlStreams was empty (not yet initialized). Add length checks
before accessing index 0.
Propagate the deployment d-hash / sandbox hostname via gossip so peer
ions can use it as the TLS SNI for backend connections.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
gravity/grpc_client.go (2)

2592-2593: ⚠️ Potential issue | 🟠 Major

Don't clear authorizationToken on minimal hello responses.

g.machineToken is preserved below, but g.authorizationToken is still overwritten unconditionally here. On reuse hellos with an empty MachineToken, downstream authenticated calls in the same file still send an empty bearer token at Line [6040] and Line [6058].

Suggested fix
  // Store session response fields
  g.machineID = response.MachineId
- g.authorizationToken = response.MachineToken
  g.connectionIDs = append(g.connectionIDs, response.MachineId)
@@
- if response.MachineToken != "" {
- 	g.machineToken = response.MachineToken
- }
+ if response.MachineToken != "" {
+ 	g.authorizationToken = response.MachineToken
+ 	g.machineToken = response.MachineToken
+ }

Also applies to: 2628-2630

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gravity/grpc_client.go` around lines 2592 - 2593, The code unconditionally
overwrites g.authorizationToken from response.MachineToken even when the
response provides an empty MachineToken; change the assignment so
g.authorizationToken is only updated if response.MachineToken is non-empty
(i.e., check response.MachineToken != "" before setting g.authorizationToken)
while still updating g.machineID and preserving g.machineToken behavior; apply
the same conditional update in the other similar block (the assignments around
the second occurrence covering the 2628-2630 region) so downstream authenticated
calls use the previously valid bearer token instead of an empty string.

2589-2642: ⚠️ Potential issue | 🟠 Major

Take a stable config snapshot before unlocking.

handleSessionHelloResponse merges into shared g.* state under g.mu, then unlocks and later validates/configures the provider from those same fields. In multi-endpoint mode, another hello can land in between and rewrite g.apiURL, g.machineSubnet, g.hostMapping, etc., so provider.Configure() can observe a mixed config assembled from multiple responses.

Suggested direction
  g.mu.Lock()
  ...
+ effectiveAPIURL := g.apiURL
+ effectiveOTLPURL := g.otlpURL
+ effectiveOTLPToken := g.otlpToken
+ effectiveHostMapping := append([]*pb.HostMapping(nil), g.hostMapping...)
+ effectiveEnvironment := append([]string(nil), g.hostEnvironment...)
+ effectiveSubnetRoutes := append([]string(nil), g.subnetRoutes...)
+ effectiveMachineSubnet := g.machineSubnet
+ effectiveHostname := g.sessionHostname
+ effectiveOrgID := g.sessionOrgID
+ effectiveMachineToken := g.machineToken
+ effectiveMachineID := g.machineID
+ effectiveMachineCertBundle := g.machineCertBundle
+ effectiveSSHPublicKey := append([]byte(nil), g.sshPublicKey...)
+ effectiveSigningPublicKey := append([]byte(nil), g.signingPublicKey...)
  g.mu.Unlock()

- if g.machineSubnet != "" {
- 	if _, _, err := net.ParseCIDR(g.machineSubnet); err != nil {
+ if effectiveMachineSubnet != "" {
+ 	if _, _, err := net.ParseCIDR(effectiveMachineSubnet); err != nil {
  		...
  	}
  }

  if err := g.provider.Configure(provider.Configuration{
- 	APIURL:            g.apiURL,
- 	SSHPublicKey:      g.sshPublicKey,
- 	TelemetryURL:      g.otlpURL,
- 	TelemetryAPIKey:   g.otlpToken,
- 	HostMapping:       g.hostMapping,
- 	Environment:       g.hostEnvironment,
- 	SubnetRoutes:      g.subnetRoutes,
- 	Hostname:          g.sessionHostname,
- 	OrgID:             g.sessionOrgID,
- 	MachineToken:      g.machineToken,
- 	MachineID:         g.machineID,
- 	MachineCertBundle: g.machineCertBundle,
- 	MachineSubnet:     g.machineSubnet,
- 	SigningPublicKey:  g.signingPublicKey,
+ 	APIURL:            effectiveAPIURL,
+ 	SSHPublicKey:      effectiveSSHPublicKey,
+ 	TelemetryURL:      effectiveOTLPURL,
+ 	TelemetryAPIKey:   effectiveOTLPToken,
+ 	HostMapping:       effectiveHostMapping,
+ 	Environment:       effectiveEnvironment,
+ 	SubnetRoutes:      effectiveSubnetRoutes,
+ 	Hostname:          effectiveHostname,
+ 	OrgID:             effectiveOrgID,
+ 	MachineToken:      effectiveMachineToken,
+ 	MachineID:         effectiveMachineID,
+ 	MachineCertBundle: effectiveMachineCertBundle,
+ 	MachineSubnet:     effectiveMachineSubnet,
+ 	SigningPublicKey:  effectiveSigningPublicKey,
  }); err != nil {

Also applies to: 2670-2700

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gravity/grpc_client.go` around lines 2589 - 2642, The code currently writes
multiple shared fields under g.mu then unlocks, allowing another hello to mutate
g.* before provider.Configure() reads them; fix by taking a stable snapshot of
all config values you will pass to provider.Configure (e.g., copy g.apiURL,
g.otlpURL, g.hostMapping, g.subnetRoutes, g.machineSubnet, g.sessionHostname,
g.sessionOrgID, g.machineToken, g.machineCertBundle, g.sshPublicKey,
g.signingPublicKey, etc.) into local variables or a small struct while still
holding g.mu, then unlock and call provider.Configure using that snapshot; apply
the same snapshotting change to the other similar block around the 2670-2700
region so provider.Configure always sees a consistent config.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gravity/grpc_client.go`:
- Around line 2601-2633: The code currently uses proto3 scalar/repeated fields
(e.g., response.Environment, response.HostMapping, response.SubnetRoutes and
string fields assigned to g.otlpURL, g.otlpToken, g.apiURL, g.machineSubnet,
g.sessionHostname, g.sessionOrgID, g.machineToken, g.machineCertBundle) so it
cannot distinguish “empty” vs “unset”; update the protobuf and assignment logic
to carry presence: either add an explicit boolean signal in the
SessionHelloResponse (e.g., apply_full_state or clear_state) and when true apply
fields even if empty, or mark the relevant fields as proto3 optional (or wrapper
types) to get presence semantics, regenerate the Go protos, and then replace the
current zero-value guards with presence checks (use the optional/pointer
presence or the new signal) so empty values will clear previous g.* state when
intended.

---

Outside diff comments:
In `@gravity/grpc_client.go`:
- Around line 2592-2593: The code unconditionally overwrites
g.authorizationToken from response.MachineToken even when the response provides
an empty MachineToken; change the assignment so g.authorizationToken is only
updated if response.MachineToken is non-empty (i.e., check response.MachineToken
!= "" before setting g.authorizationToken) while still updating g.machineID and
preserving g.machineToken behavior; apply the same conditional update in the
other similar block (the assignments around the second occurrence covering the
2628-2630 region) so downstream authenticated calls use the previously valid
bearer token instead of an empty string.
- Around line 2589-2642: The code currently writes multiple shared fields under
g.mu then unlocks, allowing another hello to mutate g.* before
provider.Configure() reads them; fix by taking a stable snapshot of all config
values you will pass to provider.Configure (e.g., copy g.apiURL, g.otlpURL,
g.hostMapping, g.subnetRoutes, g.machineSubnet, g.sessionHostname,
g.sessionOrgID, g.machineToken, g.machineCertBundle, g.sshPublicKey,
g.signingPublicKey, etc.) into local variables or a small struct while still
holding g.mu, then unlock and call provider.Configure using that snapshot; apply
the same snapshotting change to the other similar block around the 2670-2700
region so provider.Configure always sees a consistent config.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 48135857-66eb-4814-9206-73f724e495ce

📥 Commits

Reviewing files that changed from the base of the PR and between 538fd27 and fa41af0.

📒 Files selected for processing (2)
  • gravity/gossip/types.go
  • gravity/grpc_client.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
gravity/gossip/types.go (1)

44-53: Backward-compatible EndpointMapping extension.

omitempty keeps legacy payloads unchanged, and older JSON decoders will just ignore sni_hostname, so this is a safe wire-format addition.

gravity/grpc_client.go (2)

2169-2174: Nice guard before indexing control-stream state.

This turns the reconnection panic into a normal error path when startup hasn't populated the slices yet.


5887-5890: Nice empty/nil guard in the ping path.

This closes the remaining streams[0] panic window during reconnection.

Comment thread gravity/grpc_client.go
Comment on lines +2601 to +2633
if response.OtlpUrl != "" {
g.otlpURL = response.OtlpUrl
}
if response.OtlpKey != "" {
g.otlpToken = response.OtlpKey
}
if response.ApiUrl != "" {
g.apiURL = response.ApiUrl
}
if len(response.Environment) > 0 {
g.hostEnvironment = response.Environment
}
if len(response.HostMapping) > 0 {
g.hostMapping = response.HostMapping
}
if len(response.SubnetRoutes) > 0 {
g.subnetRoutes = response.SubnetRoutes
}
if response.MachineSubnet != "" {
g.machineSubnet = response.MachineSubnet
}
if response.Hostname != "" {
g.sessionHostname = response.Hostname
}
if response.OrgId != "" {
g.sessionOrgID = response.OrgId
}
if response.MachineToken != "" {
g.machineToken = response.MachineToken
}
if response.MachineCertBundle != "" {
g.machineCertBundle = response.MachineCertBundle
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Generated SessionHelloResponse struct =="
sed -n '/type SessionHelloResponse struct {/,/^}/p' gravity/proto/gravity_session.pb.go

echo
echo "== Protobuf source, if present =="
fd 'gravity_session.proto' gravity -x sed -n '/message SessionHelloResponse {/,/^}/p' {}

Repository: agentuity/go-common

Length of output: 4564


Cannot distinguish between empty and omitted fields in protobuf response.

The SessionHelloResponse fields being checked (Environment, HostMapping, SubnetRoutes, and all strings) are plain proto3 scalars and repeated fields with no presence metadata. In proto3, an empty value (empty slice, empty string) is indistinguishable from an unset field, collapsing both to zero values. This means the current logic cannot clear stale state from prior sessions when a field is legitimately sent as empty—the code will skip applying empty values, leaving old data intact.

To fix this, either:

  1. Add an explicit signal field indicating whether the full response should be applied or only non-empty fields preserved, or
  2. Mark fields as optional in the protobuf definition (proto3.12+ feature) to introduce presence semantics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gravity/grpc_client.go` around lines 2601 - 2633, The code currently uses
proto3 scalar/repeated fields (e.g., response.Environment, response.HostMapping,
response.SubnetRoutes and string fields assigned to g.otlpURL, g.otlpToken,
g.apiURL, g.machineSubnet, g.sessionHostname, g.sessionOrgID, g.machineToken,
g.machineCertBundle) so it cannot distinguish “empty” vs “unset”; update the
protobuf and assignment logic to carry presence: either add an explicit boolean
signal in the SessionHelloResponse (e.g., apply_full_state or clear_state) and
when true apply fields even if empty, or mark the relevant fields as proto3
optional (or wrapper types) to get presence semantics, regenerate the Go protos,
and then replace the current zero-value guards with presence checks (use the
optional/pointer presence or the new signal) so empty values will clear previous
g.* state when intended.

jhaynie added 6 commits April 18, 2026 20:48
…lients

Prevent index-out-of-range panics during reconnection when gRPC
connections haven't been established yet. The contexts/cancels slices
are allocated from g.connections length, but the loop iterates
g.sessionClients which could be longer.
Guard ALL slice accesses (contexts, cancels, controlStreams, controlSendMu)
against concurrent modification during reconnection. Log warning when
index exceeds capacity to aid debugging.
@jhaynie jhaynie merged commit 526b6b6 into main Apr 19, 2026
5 checks passed
@jhaynie jhaynie deleted the fix/preserve-session-config branch April 19, 2026 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant