feat(server): declare gRPC auth (mode + scope + role) at the handler, enforce at the router#1596

Merged
TaylorMutch merged 7 commits into
NVIDIA:mainfrom
mrunalp:feat/per-handler-rpc-auth
May 27, 2026

Conversation

@mrunalp
Collaborator

@mrunalp mrunalp commented May 27, 2026

Summary

Move gRPC auth metadata (auth mode, Bearer scope, required role) from four
hand-maintained constants into per-handler #[rpc_auth(...)] annotations
generated by a new proc macro, and close the asymmetric router enforcement
where a Principal::User could reach handlers intended for sandbox
supervisors. A descriptor-set-driven exhaustiveness test pins the surface
so a new RPC can't silently fall back to openshell:all.

Related Issue

Fixes #1586

Changes

  • New proc-macro crate openshell-server-macros exposing #[rpc_authz]
    (impl-level) and #[rpc_auth] (per-method) attributes. First proc macro
    in the workspace; intentionally small and focused on auth metadata.
  • Per-handler annotations on every RPC in OpenShellService and
    InferenceService, declaring auth = "unauthenticated" | "sandbox" | "bearer" | "dual" and (when bearer / dual) the required scope and
    role. The macro derives canonical gRPC paths from the proto service
    name + PascalCased method name, so paths cannot drift from the proto.
  • Aggregator + lookup in auth/method_authz.rs exposing
    lookup, required_scope, required_role, is_unauthenticated,
    is_sandbox_callable, and the new is_user_callable. Single source of
    truth for the four old constants.
  • Router-side AuthMode check in multiplex.rs: Principal::User is
    now rejected with "PermissionDenied: this method requires a sandbox
    principal" on methods declared sandbox (and on unauthenticated
    methods, which would already short-circuit). Mirrors the existing
    is_sandbox_callable check on Principal::Sandbox. Closes the gap
    where GetSandboxProviderEnvironment, ReportPolicyStatus,
    SubmitPolicyAnalysis, PushSandboxLogs, ConnectSupervisor, and
    RelayStream were reachable by a user token because their handlers
    use ensure_sandbox_scope (which intentionally lets users through) or
    no guard at all.
  • Deleted SCOPED_METHODS and ADMIN_METHODS from auth/authz.rs,
    UNAUTHENTICATED_METHODS from auth/oidc.rs, and
    ALLOWED_SANDBOX_METHODS from auth/sandbox_methods.rs. Their public
    predicates (is_unauthenticated_method, is_sandbox_callable) now
    delegate to the aggregator; the existing unit tests keep passing.
    UNAUTHENTICATED_PREFIXES stays — prefix matching for
    /grpc.reflection.* and /grpc.health.* is structural, not per-method.
  • Compile-time enforcement. #[rpc_authz] fails compilation on:
    missing #[rpc_auth], scope/role on unauthenticated/sandbox
    methods, missing scope/role on bearer/dual methods, duplicate
    gRPC paths within a service, duplicate keys inside one #[rpc_auth],
    and invalid auth mode/role strings.
  • Descriptor-set exhaustiveness test. openshell-core/build.rs now
    calls tonic_build::configure().file_descriptor_set_path(...) and
    exposes openshell_core::FILE_DESCRIPTOR_SET. A new test in
    openshell-server enumerates every (service, method) from the
    descriptor and asserts it is covered exactly once by a MethodAuth
    entry. Catches new RPCs without annotations, stale annotations after
    renames, and duplicates across services.
  • Router regression test asserting an openshell-admin + openshell:all
    bearer user is denied on each sandbox-annotated method.
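
The path derivation and the scope/role rules listed above can be illustrated with plain functions. This is a minimal sketch under stated assumptions, not the macro itself: `parse_auth_mode`, `to_pascal_case`, `grpc_path`, and `validate` are hypothetical names, the service name `example.v1.OpenShell` is a placeholder, and the real checks run inside the proc macro at compile time rather than at run time.

```rust
/// Auth modes accepted by `#[rpc_auth(auth = "...")]`, per the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthMode {
    Unauthenticated,
    Sandbox,
    Bearer,
    Dual,
}

/// Parse the `auth` string; any other string is rejected.
fn parse_auth_mode(s: &str) -> Result<AuthMode, String> {
    match s {
        "unauthenticated" => Ok(AuthMode::Unauthenticated),
        "sandbox" => Ok(AuthMode::Sandbox),
        "bearer" => Ok(AuthMode::Bearer),
        "dual" => Ok(AuthMode::Dual),
        other => Err(format!("invalid auth mode: {other}")),
    }
}

/// Convert a snake_case Rust method name to a PascalCase proto method name.
fn to_pascal_case(method: &str) -> String {
    method
        .split('_')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect()
}

/// Build the canonical gRPC path "/<service>/<Method>".
/// The service name passed in is a placeholder in this sketch.
fn grpc_path(service: &str, rust_method: &str) -> String {
    format!("/{}/{}", service, to_pascal_case(rust_method))
}

/// Scope/role rules from the list above: bearer and dual need both,
/// unauthenticated and sandbox must declare neither.
fn validate(mode: AuthMode, scope: Option<&str>, role: Option<&str>) -> Result<(), String> {
    match mode {
        AuthMode::Bearer | AuthMode::Dual => {
            if scope.is_none() || role.is_none() {
                return Err("bearer/dual methods require both scope and role".to_string());
            }
        }
        AuthMode::Unauthenticated | AuthMode::Sandbox => {
            if scope.is_some() || role.is_some() {
                return Err("scope/role are not allowed on unauthenticated/sandbox methods".to_string());
            }
        }
    }
    Ok(())
}
```

For example, `grpc_path("example.v1.OpenShell", "push_sandbox_logs")` yields a path ending in `/PushSandboxLogs`, and `validate(AuthMode::Sandbox, Some("provider:write"), None)` returns an error because a sandbox method may not carry a scope.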

Auth model

| Auth mode | Principal::Sandbox accepted? | Bearer accepted? | Scope applies? | Role applies? |
| --- | --- | --- | --- | --- |
| unauthenticated | n/a | n/a | no | no |
| sandbox | yes | no | no | no |
| bearer | no | yes | yes | yes |
| dual | yes | yes | Bearer path only | Bearer path only |

sandbox here refers to the per-sandbox gateway-minted JWT introduced in
#1404 — the old shared sandbox secret no longer exists. A handler
annotated sandbox authenticates as a specific Principal::Sandbox.
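
The auth model table can be read as a small decision function. The following is a sketch, assuming simplified `AuthMode` and `Principal` enums without payloads; `accepts` and `scope_and_role_apply` are hypothetical names and do not reproduce the predicates in auth/method_authz.rs. The "n/a" cells are modeled as `None`, since unauthenticated methods short-circuit before any principal check.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthMode {
    Unauthenticated,
    Sandbox,
    Bearer,
    Dual,
}

/// Simplified principals; the real types carry identity data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Principal {
    Sandbox,
    User,
}

/// Whether a principal is accepted for a method with the given auth mode.
/// `None` stands for the "n/a" cells of the table.
fn accepts(mode: AuthMode, principal: Principal) -> Option<bool> {
    match (mode, principal) {
        (AuthMode::Unauthenticated, _) => None,
        (AuthMode::Sandbox, Principal::Sandbox) => Some(true),
        (AuthMode::Sandbox, Principal::User) => Some(false),
        (AuthMode::Bearer, Principal::Sandbox) => Some(false),
        (AuthMode::Bearer, Principal::User) => Some(true),
        (AuthMode::Dual, _) => Some(true),
    }
}

/// Scope and role checks run only on the Bearer path of bearer/dual methods.
fn scope_and_role_apply(mode: AuthMode, principal: Principal) -> bool {
    matches!(
        (mode, principal),
        (AuthMode::Bearer | AuthMode::Dual, Principal::User)
    )
}
```

Under this reading, a sandbox principal calling a dual method is accepted without any scope or role check, while a user calling the same method goes through both.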

Backwards compatibility

Visible behavior changes for deployed gateways:

  • Six methods (GetSandboxProviderEnvironment, ReportPolicyStatus,
    SubmitPolicyAnalysis, PushSandboxLogs, ConnectSupervisor,
    RelayStream) start rejecting Bearer users at the router. Nothing in
    the CLI or any user-facing flow calls them; only sandbox supervisors
    do, via the per-sandbox JWT path.
  • A handful of provider-profile methods and ExecSandboxInteractive that
    previously fell back to openshell:all now have explicit scope/role
    declarations. openshell:all tokens still work; tokens that carry only
    provider:read gain access to ListProviderProfiles / GetProviderProfile.
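
The scope behavior described in the second bullet can be sketched as a single predicate. This assumes that openshell:all satisfies any required scope, which is consistent with "openshell:all tokens still work" but is not spelled out in the PR; `scope_satisfied` is a hypothetical helper, and the `provider:read` scope name is inferred from the text.

```rust
/// Catch-all scope named in the PR description.
const ALL_SCOPE: &str = "openshell:all";

/// Assumed rule: a token passes if it carries the required scope
/// or the catch-all scope.
fn scope_satisfied(token_scopes: &[&str], required: &str) -> bool {
    token_scopes
        .iter()
        .any(|scope| *scope == required || *scope == ALL_SCOPE)
}
```

With an explicit declaration on ListProviderProfiles, a token holding only `provider:read` passes this check, whereas under the old fallback it would have needed `openshell:all`.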

Testing

  • mise run pre-commit passes (rust:lint, rust:format, rust:check,
    helm:lint, helm:docs:check — all green; the only mise run ci failures
    are markdown lint errors in pre-existing files outside this branch).
  • Unit tests added/updated:
    • auth::method_authz::tests — three exhaustiveness tests
      (every_proto_rpc_has_an_annotation,
      every_annotated_path_matches_a_real_rpc,
      no_duplicate_paths_across_services) plus user_callable_matches_auth_mode.
    • multiplex::tests::auth_router::user_principal_is_denied_on_sandbox_only_methods
      — proves the router rejects admin + openshell:all on all nine
      sandbox-only methods.
    • Existing tests in authz.rs, oidc.rs, sandbox_methods.rs,
      multiplex.rs continue to pass against the new aggregator.
  • E2E tests added/updated (if applicable): N/A — no new e2e harness
    added in this PR; existing e2e:kubernetes smoke run continues to
    exercise the auth path through the gateway.
  • Full workspace test suite: 2509 tests pass, 0 failures.
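
The three exhaustiveness tests amount to comparing two sets of gRPC paths. Below is a minimal sketch of that comparison, assuming the paths have already been extracted from the descriptor set and from the annotations; `CoverageReport` and `check_coverage` are hypothetical names, and decoding the FileDescriptorSet is deliberately left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Result of comparing proto-declared paths against annotated paths.
#[derive(Debug, Default, PartialEq, Eq)]
struct CoverageReport {
    /// RPCs in the descriptor with no annotation.
    missing: Vec<String>,
    /// Annotated paths that match no RPC (for example after a rename).
    stale: Vec<String>,
    /// Paths annotated more than once.
    duplicated: Vec<String>,
}

fn check_coverage(proto_paths: &[&str], annotated_paths: &[&str]) -> CoverageReport {
    let proto: BTreeSet<&str> = proto_paths.iter().copied().collect();

    // Count how often each annotated path appears.
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for path in annotated_paths {
        *counts.entry(*path).or_insert(0) += 1;
    }

    CoverageReport {
        missing: proto
            .iter()
            .filter(|p| !counts.contains_key(*p))
            .map(|p| p.to_string())
            .collect(),
        stale: counts
            .keys()
            .filter(|p| !proto.contains(*p))
            .map(|p| p.to_string())
            .collect(),
        duplicated: counts
            .iter()
            .filter(|(_, n)| **n > 1)
            .map(|(p, _)| p.to_string())
            .collect(),
    }
}
```

A test in this style would assert that the report equals `CoverageReport::default()`, so that any missing, stale, or duplicated path fails the build with the offending paths listed.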

Manual verification

Verified end-to-end against a local OIDC + Keycloak deployment.

| Method | Before this PR | After this PR |
| --- | --- | --- |
| GetSandboxProviderEnvironment | NotFound: sandbox not found (handler reached, store queried) | PermissionDenied: this method requires a sandbox principal |
| ReportPolicyStatus | InvalidArgument: sandbox_id is required | PermissionDenied |
| SubmitPolicyAnalysis | InvalidArgument: name is required | PermissionDenied |
| PushSandboxLogs | {} (stream accepted, push succeeded) | PermissionDenied |
| ConnectSupervisor | InvalidArgument: expected SupervisorHello (bidi opened) | PermissionDenied |
| RelayStream | InvalidArgument: first RelayFrame must be init… (bidi opened) | PermissionDenied |
| IssueSandboxToken / RefreshSandboxToken / GetInferenceBundle | PermissionDenied (handler-level guard) | PermissionDenied (router-level) |

Positive paths (reader token can ListSandboxes, writer token reaches
CreateSandbox validation, admin + provider:write can CreateProvider
/ ListProviders / DeleteProvider, scope-only and role-only denials
return the existing AuthzPolicy messages) are unchanged.
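
The "After this PR" column follows from a router-level check that runs before the handler is reached. The sketch below shows the user-principal side of such a gate. It rests on several assumptions: the table entries, the `example.v1.OpenShell` path prefix, the `Denied` type, the bearer mode given to ListSandboxes, and the fail-closed handling of unannotated paths are all illustrative choices, not taken from multiplex.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthMode {
    Unauthenticated,
    Sandbox,
    Bearer,
    Dual,
}

/// One entry of the per-method auth table (fields reduced for the sketch).
struct MethodAuth {
    path: &'static str,
    mode: AuthMode,
}

/// Hypothetical entries; the real table is generated from the annotations.
const AUTH_METADATA: &[MethodAuth] = &[
    MethodAuth { path: "/example.v1.OpenShell/PushSandboxLogs", mode: AuthMode::Sandbox },
    MethodAuth { path: "/example.v1.OpenShell/ListSandboxes", mode: AuthMode::Bearer },
];

#[derive(Debug, PartialEq, Eq)]
enum Denied {
    PermissionDenied(&'static str),
    /// Sketch-only choice: paths without an annotation are refused.
    Unannotated,
}

fn lookup(path: &str) -> Option<&'static MethodAuth> {
    AUTH_METADATA.iter().find(|entry| entry.path == path)
}

/// Gate applied to a user (Bearer) principal before dispatching to a handler.
fn check_user_principal(path: &str) -> Result<(), Denied> {
    let entry = lookup(path).ok_or(Denied::Unannotated)?;
    match entry.mode {
        AuthMode::Bearer | AuthMode::Dual => Ok(()),
        AuthMode::Sandbox | AuthMode::Unauthenticated => Err(Denied::PermissionDenied(
            "this method requires a sandbox principal",
        )),
    }
}
```

Because the rejection happens before dispatch, the handler-level errors in the "Before this PR" column (NotFound, InvalidArgument) can no longer be observed with a user token.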

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@mrunalp mrunalp requested review from a team, derekwaynecarr and maxamillion as code owners May 27, 2026 16:42
@copy-pr-bot

copy-pr-bot Bot commented May 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@TaylorMutch
Collaborator

/ok to test 5de68c4

mrunalp added 4 commits May 27, 2026 10:21
Move scope, role, and auth-mode metadata to the handler definition site
via #[rpc_authz] + #[rpc_auth] proc macros. The previously hand-maintained
SCOPED_METHODS, ADMIN_METHODS, UNAUTHENTICATED_METHODS, and
ALLOWED_SANDBOX_METHODS tables are now generated from per-method
annotations on the tonic service impls, with canonical gRPC paths
derived from the service name and method name.

Adds a new openshell-server-macros proc-macro crate, an aggregator in
auth/method_authz.rs, and an exhaustiveness test that decodes the
protobuf FileDescriptorSet (now emitted by openshell-core/build.rs) and
verifies every RPC has an annotation.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

PR NVIDIA#1404 replaced the shared sandbox secret with per-sandbox
gateway-minted JWTs. A handler marked `sandbox` now authenticates as a
specific `Principal::Sandbox`, not as a holder of a shared credential.

Rename `auth = "sandbox-secret"` to `auth = "sandbox"` and
`AuthMode::SandboxSecret` to `AuthMode::Sandbox` so the name matches
the post-NVIDIA#1404 identity model.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

Addresses review feedback on the per-handler auth-annotation work.

- Router-level enforcement of #[rpc_auth] auth mode (HIGH). The previous
  router only checked is_sandbox_callable() for Principal::Sandbox; user
  principals still flowed into AuthzPolicy::check() and bypassed the
  per-handler declaration. A user with `openshell:all` could therefore
  reach `sandbox`-only handlers like GetSandboxProviderEnvironment,
  ReportPolicyStatus, PushSandboxLogs, and SubmitPolicyAnalysis even
  though their annotations said sandbox-only. Adds an
  is_user_callable() predicate and rejects User principals at the
  router for `sandbox` / `unauthenticated` methods.

- Proc macro now errors on duplicate keys in #[rpc_auth(...)] (LOW). A
  second `auth`, `scope`, or `role` previously silently overwrote the
  first value; now it fails to compile.

- Regression tests: a unit test for is_user_callable() and a router
  test that proves a user with admin role + openshell:all cannot reach
  the nine sandbox-only handlers.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

…hz doc comments

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
@mrunalp mrunalp force-pushed the feat/per-handler-rpc-auth branch from 5de68c4 to 0f54133 Compare May 27, 2026 17:22
Comment thread crates/openshell-server/src/inference.rs Outdated
Comment thread crates/openshell-server-macros/src/lib.rs Outdated
mrunalp added 2 commits May 27, 2026 11:19
The stub was a safety net that fired only when a method had
`#[rpc_auth(...)]` without an enclosing `#[rpc_authz]`. Triggering it
required `rpc_auth` to be imported, which is why both call sites carried
`#[allow(unused_imports)] use openshell_server_macros::{rpc_auth, rpc_authz};`.

Drop the stub and the unused-import workaround. A missing `#[rpc_authz]`
now surfaces as rustc's standard "cannot find attribute `rpc_auth` in
this scope" — clear enough, and one fewer import + lint exception.

Addresses review comment on PR NVIDIA#1596.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

The previous trait-derived const name turned `OpenShell` into
`OPEN_SHELL_AUTH_METADATA`, splitting the project name across an
underscore. Each impl already lives in its own module
(`crate::grpc::`, `crate::inference::`), so the module path is enough
to disambiguate between services — a fixed `AUTH_METADATA` name reads
more naturally.

Aggregator in `auth/method_authz.rs` now references
`crate::grpc::AUTH_METADATA` and `crate::inference::AUTH_METADATA`
directly.

Addresses review comment on PR NVIDIA#1596.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
@TaylorMutch
Collaborator

/ok to test a8f611e

@TaylorMutch TaylorMutch added the test:e2e Requires end-to-end coverage label May 27, 2026
@github-actions

Label test:e2e applied for a8f611e. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

Comment thread crates/openshell-server-macros/src/lib.rs Outdated
TaylorMutch previously approved these changes May 27, 2026

OpenShell is one word; reference name in the doc should be
OPENSHELL_AUTH_METADATA, not OPEN_SHELL_AUTH_METADATA.

Addresses review nit on PR NVIDIA#1596.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
@TaylorMutch
Collaborator

/ok to test cd2669f

@TaylorMutch TaylorMutch merged commit 3f520dd into NVIDIA:main May 27, 2026
37 checks passed
mrunalp added a commit to mrunalp/OpenShell that referenced this pull request May 28, 2026
…ust clients

Addresses three findings from the branch review (`feat-python-sdk-bearer-auth-review.md`):

Finding 1 (HIGH): HTTPS OIDC gateways without a full mTLS bundle were
falling back to `grpc.insecure_channel`. Made `TlsConfig.ca_path`,
`cert_path`, and `key_path` all optional with the cert/key pair
required-together-or-not-at-all, so callers can express:

- Full mTLS (all three): server trusts client identity.
- CA-only (`ca_path` only): custom CA trust, no client identity.
- System roots (`TlsConfig()`): OS trust store; the right default for
  OIDC gateways behind a public CA.

`from_active_cluster` now mirrors `crates/openshell-tui/src/lib.rs`
`build_oidc_channel`: for any `https://` gateway, always build a
secure channel and pick the strongest TLS profile available
(mTLS → CA-only → system roots).

Finding 2 (MEDIUM): `from_active_cluster` snapshotted the access token
once. Replaced with `_make_cluster_bearer_provider`, a per-RPC closure
that re-reads `oidc_token.json` each call. A long-lived
`SandboxClient` now picks up rotations performed by `openshell gateway
login` without being reconstructed. Provider fails closed with
`SandboxError` (and a "re-authenticate with: openshell gateway login"
hint) when the token file is missing, malformed, or expired.

Finding 3 (MEDIUM): `from_active_cluster` was attaching bearer
metadata whenever `oidc_token.json` existed, even for gateways
registered as `mtls` or `plaintext`. Now gates on
`metadata.json.auth_mode == "oidc"`, matching
`crates/openshell-cli/src/main.rs` and the TUI.

Test coverage expands 14 → 23: HTTPS-OIDC-without-mTLS, CA-only
layout, stale-token-with-wrong-auth-mode, per-RPC reload, expired
token rejection, missing-file rejection, partial-TlsConfig validation,
and the existing channel/interceptor matrix.

Verified end-to-end against the OpenShift Keycloak deployment:
positive admin / reader paths, scope enforcement, and PR NVIDIA#1596
sandbox-principal gate all still pass via the SDK.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>