lakebox: read SSH gateway host from Sandbox response, cache per profile#5292

Merged: akshaysingla-db merged 1 commit into databricks:demo-lakebox from akshaysingla-db:akshay/lakebox-gateway-host on May 22, 2026
@akshaysingla-db
Collaborator

Consumes `Sandbox.gateway_host` (universe#1966484) so the CLI stops hardcoding regional defaults. Falls through to the existing `resolveGatewayHost` heuristic when the field is absent, so it is safe to land against any manager build.
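To illustrate why an absent field is harmless on the wire, here is a minimal sketch of decoding and re-encoding a Sandbox payload with an optional gateway host. The struct below is a hypothetical stand-in, not the real type in api.go: the exported field names, the JSON keys `id` and `gateway_host`, and the helpers `decodeSandbox` / `encodeSandbox` are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sandboxEntry is a hypothetical, trimmed-down stand-in for the CLI's type.
// The JSON key names are assumptions, not taken from the real api.go.
type sandboxEntry struct {
	ID          string `json:"id"`
	GatewayHost string `json:"gateway_host,omitempty"`
}

// decodeSandbox parses one Sandbox payload. A missing gateway_host key
// simply leaves GatewayHost as the empty string.
func decodeSandbox(raw string) (sandboxEntry, error) {
	var e sandboxEntry
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

// encodeSandbox serializes an entry. omitempty drops an empty GatewayHost,
// so a payload from an older manager re-encodes without the new key.
func encodeSandbox(e sandboxEntry) (string, error) {
	out, err := json.Marshal(e)
	return string(out), err
}

func main() {
	payloads := []string{
		`{"id":"sb-1"}`, // older manager: no gateway_host
		`{"id":"sb-2","gateway_host":"gw.example.test"}`, // newer manager
	}
	for _, raw := range payloads {
		e, err := decodeSandbox(raw)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		out, err := encodeSandbox(e)
		if err != nil {
			fmt.Println("encode error:", err)
			continue
		}
		fmt.Printf("gateway=%q re-encoded=%s\n", e.GatewayHost, out)
	}
}
```

An empty `GatewayHost` after decoding is the signal that the caller should fall back to the cache or the heuristic.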

  • api.go: add `gatewayHost` to `sandboxEntry` and `createResponse` with `omitempty` so old/new wire shapes round-trip cleanly.

  • state.go: per-profile `gatewayHosts` map in `lakebox.json`, with `getGatewayHost` / `setGatewayHost` helpers that skip the write when the value is unchanged or empty (matches the existing `setDefault` pattern from f6f28eb).

  • ssh.go: 4-tier gateway resolution: `--gateway` flag → fresh API response (the default's `get` or auto-create) → cached value for the profile → workspace-host heuristic. Explicit-id ssh does a one-time `get` to warm the cache only when the cache is empty, so steady-state SSH against a known sandbox stays at zero round-trips.

  • list.go: after fetching all pages, validate that the saved default still exists (clear + warn if not) and cache the gateway from any returned entry. Self-heals the cache on the cheapest user-driven path.

  • create.go, status.go, config.go, stop.go: opportunistically cache the gateway host from each Sandbox response.
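The 4-tier resolution order described for ssh.go can be sketched as a simple first-non-empty selection. This is only an illustration under stated assumptions: `gatewaySources`, `pickGateway`, and the source labels are hypothetical names, and the `Heuristic` callback merely stands in for `resolveGatewayHost`, whose real signature is not shown in this PR.

```go
package main

import "fmt"

// gatewaySources collects the candidate hosts. All names are hypothetical.
type gatewaySources struct {
	Flag      string        // value of --gateway, empty if not passed
	Fresh     string        // gateway host from an API response in this invocation
	Cached    string        // per-profile value read from the local state file
	Heuristic func() string // stand-in for the workspace-host heuristic
}

// pickGateway returns the first non-empty candidate in priority order,
// together with a label naming the tier that supplied it.
func pickGateway(s gatewaySources) (host, source string) {
	switch {
	case s.Flag != "":
		return s.Flag, "flag"
	case s.Fresh != "":
		return s.Fresh, "api"
	case s.Cached != "":
		return s.Cached, "cache"
	case s.Heuristic != nil:
		return s.Heuristic(), "heuristic"
	default:
		return "", "none"
	}
}

func main() {
	heuristic := func() string { return "gw-guess.example.test" }

	host, source := pickGateway(gatewaySources{
		Cached:    "gw-cached.example.test",
		Heuristic: heuristic,
	})
	fmt.Println(host, source)

	host, source = pickGateway(gatewaySources{Heuristic: heuristic})
	fmt.Println(host, source)
}
```

Keeping the selection in one pure function makes the priority order easy to test without touching the network or the state file.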

Net wire change vs. today: zero extra round-trips once warm; one extra `get` on the very first explicit-id ssh per (machine × profile) until something else populates the cache. The hardcoded constants and `resolveGatewayHost` stay until every deployed manager stamps `gateway_host`; removing them is a separate cleanup PR.
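The "skip the write when the value is unchanged or empty" behavior of the cache helpers can be sketched in memory as follows. This is not the code in state.go: `profileState`, its method signatures, and the `writes` counter (which stands in for persisting `lakebox.json`) are assumptions made so the example runs without touching disk.

```go
package main

import "fmt"

// profileState is a hypothetical in-memory stand-in for the state file.
type profileState struct {
	gatewayHosts map[string]string // profile name -> cached gateway host
	writes       int               // counts simulated writes of the state file
}

// getGatewayHost returns the cached host for a profile, or "" if none.
func (s *profileState) getGatewayHost(profile string) string {
	return s.gatewayHosts[profile] // reading a nil map is safe in Go
}

// setGatewayHost caches host for profile and reports whether a write
// happened. Empty or unchanged values are ignored, so repeated calls with
// the same API response never rewrite the file.
func (s *profileState) setGatewayHost(profile, host string) bool {
	if host == "" || s.gatewayHosts[profile] == host {
		return false
	}
	if s.gatewayHosts == nil {
		s.gatewayHosts = make(map[string]string)
	}
	s.gatewayHosts[profile] = host
	s.writes++ // a real implementation would persist the file here
	return true
}

func main() {
	var s profileState
	fmt.Println(s.setGatewayHost("dev", "gw.example.test")) // first value: written
	fmt.Println(s.setGatewayHost("dev", "gw.example.test")) // unchanged: skipped
	fmt.Println(s.setGatewayHost("dev", ""))                // empty: skipped
	fmt.Println(s.getGatewayHost("dev"), s.writes)
}
```

Because every command that sees a Sandbox response can call the setter unconditionally, the no-op path is what keeps opportunistic caching cheap.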

Co-authored-by: Isaac

@github-actions
Contributor

Waiting for approval

Based on git history, these people are best suited to review:

  • @pietern -- recent work in cmd/lakebox/
  • @shuochen0311 -- recent work in cmd/lakebox/

Eligible reviewers: @andrewnester, @anton-107, @denik, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

@akshaysingla-db akshaysingla-db merged commit db3e3b9 into databricks:demo-lakebox May 22, 2026
13 of 20 checks passed
akshaysingla-db added a commit that referenced this pull request May 22, 2026
Two small additions surfaced during end-to-end testing of #5292.

1. `databricks lakebox start <id>` — wraps the existing StartSandbox
RPC. Symmetric to `stop`. Useful for pre-warming a sandbox before
connecting (the gateway also auto-starts on ssh, so this is mostly for
scripting).

2. `databricks lakebox ssh <unknown-id>` now fails with `no lakebox
named "..." — `databricks lakebox list` shows available IDs` instead of
falling through to ssh and surfacing the generic `Permission denied
(publickey)` from the gateway. Implemented by making the existing
pre-ssh `api.get` (added in #5292 for gateway discovery) treat
`apierr.ErrNotFound` as fatal. Non-NotFound errors still fall through so
transient API hiccups don't block a connection the gateway can still
route.
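The error handling in item 2 can be sketched as follows: a not-found result from the pre-ssh lookup is fatal, while any other error is swallowed so ssh can still be attempted. This is a minimal sketch under assumptions: `errNotFound` is a local sentinel standing in for `apierr.ErrNotFound`, and `preflightGet`, its callback parameter, `errTransient`, and the shortened error message are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a local stand-in for apierr.ErrNotFound.
var errNotFound = errors.New("not found")

// errTransient is a demo value representing any non-NotFound API failure.
var errTransient = errors.New("temporary API failure")

// preflightGet looks up the sandbox before ssh. It returns the gateway host
// on success, a fatal error when the ID does not exist, and ("", nil) for
// any other failure so the caller can still try to connect.
func preflightGet(id string, get func(id string) (string, error)) (string, error) {
	host, err := get(id)
	switch {
	case err == nil:
		return host, nil
	case errors.Is(err, errNotFound):
		return "", fmt.Errorf("no lakebox named %q: %w", id, err)
	default:
		return "", nil // transient problem: fall through to ssh
	}
}

func main() {
	missing := func(string) (string, error) { return "", errNotFound }
	flaky := func(string) (string, error) { return "", errTransient }

	_, err := preflightGet("sb-unknown", missing)
	fmt.Println("missing:", err)

	host, err := preflightGet("sb-1", flaky)
	fmt.Printf("flaky: host=%q err=%v\n", host, err)
}
```

Using `errors.Is` rather than direct comparison keeps the check working if the API client wraps the sentinel with extra context.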

Co-authored-by: Isaac
