feat(providers): add gateway-owned credential refresh for short-lived provider tokens #1306

@johntmyers

Problem Statement

OpenShell needs first-class gateway-owned credential refresh for providers that use short-lived access tokens. The current provider model can inject static credentials into sandbox traffic, and recent provider work added provider profiles, attach/detach, custom profiles, and provider environment revision polling for running sandboxes. However, OpenShell does not yet own the lifecycle for refresh material such as OAuth refresh tokens, client secrets, or service-account private keys.

This blocks long-running sandboxes that depend on providers such as Microsoft Graph / Azure / Entra-backed APIs, Google Workspace, Google Vertex AI, IBM watsonx, ChatGPT/OpenAI OAuth-style integrations, and other enterprise SSO-bound APIs. Access tokens commonly expire during a sandbox session. Users should not have to restart sandboxes, expose refresh material inside the sandbox, or build one-off host refresh daemons.

Refresh material must stay in the OpenShell control plane / gateway. Sandboxes should continue to see only placeholder values or proxy-injected short-lived access tokens, and should observe refreshed access tokens through the existing provider environment revision path.

Proposed Design

Add gateway-owned provider credential refresh while keeping placeholder-based credential injection as the delivery path for this first increment.

The core model should be:

  • Store refresh material in a separate non-injectable refresh-state object type in the existing objects table.
  • Keep Provider.credentials as the injectable current credential map used by GetSandboxProviderEnvironment.
  • Add refresh metadata to provider profile credential declarations so profiles can describe how a credential is renewed.
  • Add a gateway refresh worker that reads refresh-state objects, refreshes due credentials, writes the resulting short-lived access token into the owning provider's injectable credential key, and updates the provider object so the provider environment revision changes.
  • Rely on the existing sandbox provider-env revision polling and ProviderCredentialState mechanics so running sandboxes observe refreshed credentials without restarting.
  • Keep refresh tokens, client secrets, service-account JSON, private keys, and equivalent refresh material out of Provider.credentials and out of GetSandboxProviderEnvironment.
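
The separation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the OpenShell implementation: the `Provider` and `RefreshState` shapes, the `revision` counter, and the `is_due`, `apply_refresh`, and `refresh_pass` helpers are all hypothetical names, and the `mint` closure stands in for whatever token-endpoint call a real strategy would make.

```rust
use std::collections::BTreeMap;

/// Hypothetical injectable provider record: only short-lived values live here.
#[derive(Debug, Clone, Default)]
pub struct Provider {
    pub name: String,
    /// Sandbox-visible credential map (what provider env resolution would return).
    pub credentials: BTreeMap<String, String>,
    /// Assumed counter, bumped on every credential write so pollers see a change.
    pub revision: u64,
}

/// Hypothetical non-injectable refresh-state object, stored apart from the provider.
/// Deliberately does not derive Debug, so refresh material is harder to log by accident.
#[derive(Clone)]
pub struct RefreshState {
    pub provider: String,
    pub credential_key: String,
    /// Refresh token, client secret, or similar. Never copied into `Provider`.
    pub refresh_material: String,
    /// Unix seconds at which the current access token expires.
    pub expires_at: u64,
}

/// A refresh is due once `now` is within `skew_secs` of expiry.
pub fn is_due(state: &RefreshState, now: u64, skew_secs: u64) -> bool {
    now.saturating_add(skew_secs) >= state.expires_at
}

/// Write only the new access token into the injectable map and bump the revision.
pub fn apply_refresh(
    provider: &mut Provider,
    state: &mut RefreshState,
    access_token: String,
    expires_at: u64,
) {
    provider
        .credentials
        .insert(state.credential_key.clone(), access_token);
    provider.revision += 1;
    state.expires_at = expires_at;
}

/// One worker pass over all refresh-state objects.
/// `mint` is a stand-in for the real token exchange; it returns the new
/// access token and its expiry, or `None` if the exchange failed.
pub fn refresh_pass<F>(
    providers: &mut BTreeMap<String, Provider>,
    states: &mut [RefreshState],
    now: u64,
    skew_secs: u64,
    mut mint: F,
) -> usize
where
    F: FnMut(&RefreshState) -> Option<(String, u64)>,
{
    let mut refreshed = 0;
    for state in states.iter_mut() {
        if !is_due(state, now, skew_secs) {
            continue;
        }
        let Some(provider) = providers.get_mut(&state.provider) else {
            continue; // owning provider no longer exists
        };
        if let Some((token, expires_at)) = mint(&*state) {
            apply_refresh(provider, state, token, expires_at);
            refreshed += 1;
        }
    }
    refreshed
}
```

In this sketch the refresh material is only ever read by `mint`; nothing in `apply_refresh` can move it into the sandbox-visible map, which is the property the design asks for.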

Initial refresh strategies should include:

  • static: no active minting. Provider credential updates are picked up through the current provider environment revision path.
  • external: OpenShell does not mint tokens. An external process updates the provider credential store, and running sandboxes pick up the update through the revision path.
  • oauth2_refresh_token: gateway refreshes an access token using stored refresh-token material.
  • oauth2_client_credentials: gateway mints access tokens using the OAuth 2.0 client-credentials grant, covering Microsoft service-to-service (S2S) / Entra-style flows.
  • google_service_account_jwt: gateway signs a JWT assertion using stored service-account material and exchanges it for a Google access token, covering Vertex/GCP service-account use cases.
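
One way the strategy names above could map onto a type is sketched below. The `RefreshStrategy` enum and its `as_str` and `gateway_mints` methods are hypothetical; only the five strategy strings come from this proposal. A real strategy interface would likely be a trait so that future strategies can be added without touching existing ones.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the strategy names proposed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshStrategy {
    Static,
    External,
    OAuth2RefreshToken,
    OAuth2ClientCredentials,
    GoogleServiceAccountJwt,
}

impl RefreshStrategy {
    /// The name as it would appear in profile refresh metadata.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Static => "static",
            Self::External => "external",
            Self::OAuth2RefreshToken => "oauth2_refresh_token",
            Self::OAuth2ClientCredentials => "oauth2_client_credentials",
            Self::GoogleServiceAccountJwt => "google_service_account_jwt",
        }
    }

    /// True when the gateway itself mints tokens; `static` and `external`
    /// rely only on the provider environment revision path.
    pub fn gateway_mints(self) -> bool {
        !matches!(self, Self::Static | Self::External)
    }
}

impl FromStr for RefreshStrategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "static" => Ok(Self::Static),
            "external" => Ok(Self::External),
            "oauth2_refresh_token" => Ok(Self::OAuth2RefreshToken),
            "oauth2_client_credentials" => Ok(Self::OAuth2ClientCredentials),
            "google_service_account_jwt" => Ok(Self::GoogleServiceAccountJwt),
            other => Err(format!("unknown refresh strategy: {other}")),
        }
    }
}
```

Rejecting unknown names at parse time means a profile that references a not-yet-implemented strategy fails validation rather than silently behaving like `static`.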

The first implementation should prioritize Microsoft S2S / Entra, Google OAuth refresh-token, and Google service-account / Vertex flows. The design should leave room for IBM watsonx IAM, AWS STS, OIDC variants, and other future refresh strategies without coupling them to the first implementation.

User-facing UX should include a way to inspect refresh status, force rotation, and update refresh configuration without exposing refresh material in normal provider output. Candidate CLI shape:

openshell provider refresh-status <provider>
openshell provider rotate <provider> [--credential <name>]
openshell provider refresh-config <provider> ...

Provider create/update should accept refresh material separately from injectable credentials, so users cannot accidentally place refresh tokens or service-account keys in the sandbox-visible credential map.

Related Requirements

Alternatives Considered

  • Store refresh tokens directly in Provider.credentials. Rejected because Provider.credentials is the injectable map returned through GetSandboxProviderEnvironment; refresh material must not enter the sandbox.
  • Wait for profile-side/proxy-side credential injection before refresh. Rejected because placeholder-based injection can already deliver refreshed access tokens, and the need for short-lived token support is immediate.
  • Require host-side refresh daemons. Rejected because this duplicates provider-specific logic, fails for long-running autonomous sandboxes, and moves lifecycle ownership outside OpenShell.
  • Add a new database table for refresh state. Rejected for this scope; profile and provider follow-up work has kept provider storage in the existing objects table.
  • Implement a single provider-specific refresher first. Rejected because Microsoft S2S, Google OAuth, and Google service-account flows are already known requirements and need a shared scheduler/state model.

Agent Investigation

The current repo already has much of the sandbox-side propagation path needed for refreshed credentials:

  • Provider.credentials is the current secret map on provider records.
  • GetSandboxProviderEnvironment resolves attached providers into an environment map and returns provider_env_revision.
  • crates/openshell-sandbox/src/provider_credentials.rs maintains revision-scoped credential snapshots and keeps recent generations so existing placeholders continue to resolve.
  • The sandbox settings poll loop refreshes provider environment when provider_env_revision changes and installs the new environment into ProviderCredentialState.
  • The proxy resolves placeholders per request through SecretResolver, so updated provider credential values can affect later proxied requests without mutating already-running process environments.
  • Provider profiles currently define credentials, endpoints, binaries, category, and inference capability, but they do not yet define refresh metadata.
  • Custom provider profiles are stored in the existing objects table; this issue should use the same persistence boundary for refresh-state objects rather than adding new tables.
  • Existing OpenShell CLI OIDC login code already contains OAuth refresh-token helper logic for gateway authentication, but provider refresh needs separate gateway-side storage, scheduling, status, and provider credential updates.
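
The sandbox-side behaviour summarized above (install a new environment when the revision changes, keep a few recent generations so existing placeholders still resolve) can be illustrated with a small sketch. This does not reproduce the actual code in provider_credentials.rs: the `CredentialGenerations` type, its retention limit, and the newest-first lookup order are assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical store of revision-scoped credential snapshots.
pub struct CredentialGenerations {
    max_generations: usize,
    /// Newest snapshot at the front, each tagged with its revision.
    generations: VecDeque<(u64, HashMap<String, String>)>,
}

impl CredentialGenerations {
    pub fn new(max_generations: usize) -> Self {
        Self {
            // Always retain at least the current generation.
            max_generations: max_generations.max(1),
            generations: VecDeque::new(),
        }
    }

    /// Install a snapshot only if its revision is newer than the current one.
    /// Returns whether the snapshot was installed.
    pub fn install(&mut self, revision: u64, env: HashMap<String, String>) -> bool {
        if let Some((current, _)) = self.generations.front() {
            if revision <= *current {
                return false;
            }
        }
        self.generations.push_front((revision, env));
        // Drop the oldest generations beyond the retention limit.
        self.generations.truncate(self.max_generations);
        true
    }

    pub fn current_revision(&self) -> Option<u64> {
        self.generations.front().map(|(rev, _)| *rev)
    }

    /// Resolve a key against the newest generation first, then older retained ones.
    pub fn resolve(&self, key: &str) -> Option<&str> {
        self.generations
            .iter()
            .find_map(|(_, env)| env.get(key).map(String::as_str))
    }
}
```

Because lookups happen per request, a proxied call made after `install` sees the refreshed value while process environments that were already started are left untouched.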

Non-Goals

  • Do not replace placeholder-based credential injection in this issue.
  • Do not implement profile-side/proxy-side credential injection in this issue.
  • Do not expose refresh tokens, client secrets, service-account JSON, or private keys through provider env resolution.
  • Do not add new database tables.
  • Do not make already-running process environments mutable.
  • Do not implement every possible provider strategy in the first PR; keep the strategy interface extensible.

Metadata

Labels

  • area:cli (CLI-related work)
  • area:gateway (Gateway server and control-plane work)
  • area:inference (Inference routing and configuration work)
  • area:supervisor (Proxy and routing-path work)
  • state:review-ready (Ready for human review)
  • test:e2e (Requires end-to-end coverage)