Remoting: long-poll wait endpoint (with session ownership enforcement) #1474

@michaellwest

Problem

Wait-RemoteScriptSession and Wait-RemoteSitecoreJob poll by re-submitting a PowerShell scriptblock to the remoting endpoint every N seconds. Each poll pays the full HTTP + JWT validation + policy scan + runspace setup overhead and burns one request against the API Key's throttle budget.

Two real-world consequences:

  1. Rate-limit pressure. Polling a long job at 1-second intervals against a key with RequestLimit=60/minute consumes the entire budget within 60 seconds. The wait then fails silently or, since the fix in feature/stream-fix, surfaces as a 429.
  2. ConstrainedLanguage incompatibility. Wait-RemoteSitecoreJob's polling script uses [Sitecore.Jobs.JobManager]::GetJob(...) and [Sitecore.Handle]::Parse(...), which are .NET static method invocations that ConstrainedLanguage blocks regardless of AllowedCommands. Operators with restrictive policies therefore cannot use the cmdlet.
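To put the first consequence in numbers, here is a back-of-the-envelope sketch in Python comparing how many requests each approach spends while waiting on one job. It assumes a fixed legacy poll interval and the 60 s maximum hold proposed below, and it ignores the final receive call; the function names are hypothetical.

```python
import math


def legacy_poll_requests(job_seconds: float, interval_seconds: float = 1.0) -> int:
    """Requests spent by the per-poll scriptblock path: one per poll interval."""
    return max(1, math.ceil(job_seconds / interval_seconds))


def long_poll_requests(job_seconds: float, max_hold_seconds: float = 60.0) -> int:
    """Requests spent on the long-poll route: one per hold window, at least one."""
    return max(1, math.ceil(job_seconds / max_hold_seconds))
```

For a five-minute job, `legacy_poll_requests(300)` is 300 requests (the full 60/minute budget for five minutes straight), while `long_poll_requests(300)` is 5.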

A third concern surfaced during review: the existing per-poll script path also doesn't enforce session ownership. ScriptSessionManager.GetSession(sessionId, ...) returns whatever session matches the id, so any authenticated caller who guesses or observes a session id can attach to a session created by another identity. The gap is pre-existing, but the new long-poll endpoint would inherit it, so this issue fixes it alongside.

Proposed design

New wire route: GET /-/script/wait/

Added to the existing RemoteScriptCall.ashx.cs dispatcher. No new handler file, no new config, no new assembly - ships with the normal SPE deployment pipeline.

GET /-/script/wait/?sessionId=X&jobId=Y&jobType=scriptsession|sitecore&timeoutSeconds=30
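A minimal Python sketch of how a client could assemble this URL, assuming all four query parameters are always sent. `build_wait_url` is a hypothetical name (the real builder is the PowerShell helper described under client changes), and the client-side clamp simply mirrors the 1..60 server clamp from the implementation notes.

```python
from urllib.parse import urlencode

WAIT_ROUTE = "/-/script/wait/"
JOB_TYPES = {"scriptsession", "sitecore"}


def build_wait_url(base_url: str, session_id: str, job_id: str,
                   job_type: str, timeout_seconds: int = 30) -> str:
    """Build the GET URL for the long-poll wait route (hypothetical helper)."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"jobType must be one of {sorted(JOB_TYPES)}")
    # Mirror the server-side clamp so the client never requests more than 60 s.
    timeout_seconds = max(1, min(60, timeout_seconds))
    query = urlencode({
        "sessionId": session_id,
        "jobId": job_id,
        "jobType": job_type,
        "timeoutSeconds": timeout_seconds,
    })
    return base_url.rstrip("/") + WAIT_ROUTE + "?" + query
```

`urlencode` percent-encodes ids that contain reserved characters, so job handles can be passed through unchanged.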

Auth: same JWT + API Key flow as every other remoting request (AuthenticateRequest). Throttle: counts as one request against the key's budget regardless of hold time.

Response (200 regardless of isDone):

{ "isDone": true, "status": "Idle|Busy|Done|Failed|NotFound", "name": "...", "elapsedSeconds": 17 }

401 / 403 / 429 on auth / policy / throttle failures - same as existing endpoints.
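One way a client might interpret these replies, sketched in Python under the assumption that the JSON fields are exactly those shown above. `WaitResult`, `WaitRequestRejected`, and `parse_wait_response` are hypothetical names, and any status code other than 200/401/403/429 is treated as unexpected here.

```python
import json
from dataclasses import dataclass

STATUSES = {"Idle", "Busy", "Done", "Failed", "NotFound"}


@dataclass(frozen=True)
class WaitResult:
    is_done: bool
    status: str
    name: str
    elapsed_seconds: int


class WaitRequestRejected(Exception):
    """Auth (401), policy (403), or throttle (429) failure."""

    def __init__(self, http_status: int) -> None:
        super().__init__(f"wait endpoint rejected the request: HTTP {http_status}")
        self.http_status = http_status


def parse_wait_response(http_status: int, body: str) -> WaitResult:
    """A 200 carries the payload whether or not the job has finished."""
    if http_status in (401, 403, 429):
        raise WaitRequestRejected(http_status)
    if http_status != 200:
        raise RuntimeError(f"unexpected HTTP {http_status} from wait endpoint")
    data = json.loads(body)
    status = data["status"]
    if status not in STATUSES:
        raise ValueError(f"unknown status {status!r}")
    return WaitResult(
        is_done=bool(data["isDone"]),
        status=status,
        name=data.get("name", ""),
        elapsed_seconds=int(data.get("elapsedSeconds", 0)),
    )
```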

Implementation notes

  • Handler becomes async (HttpTaskAsyncHandler). Existing sync routes still work - ProcessRequestAsync delegates to the existing ProcessRequest for anything that isn't the new route.
  • Internal 200 ms Task.Delay poll loop inside ProcessWaitAsync, checking IJobManager.GetJob(handle) or ScriptSessionManager.GetSession(id, ...) for state transitions.
  • Not event-based. Subscribing to JobManager / ScriptSessionManager internals would tightly couple code across the Spe.Abstractions / Spe.Sitecore92 boundary, and 200 ms polling inside one async handler already gives the caller sub-second latency without paying per-call HTTP overhead.
  • Max hold time: 60 s. timeoutSeconds clamped to 1..60. Client loops on timeout.
  • Uniform response for unknown ids: {"status":"NotFound","isDone":true} with 200. Prevents session-id enumeration via 404 vs 200 probing.
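The control flow of ProcessWaitAsync is sketched below in Python asyncio purely for illustration; the real handler is C#. Several parts are assumptions: `lookup` stands in for IJobManager.GetJob(handle) / ScriptSessionManager.GetSession(id, ...) and returns `(status, name)` or `None`; only Done and Failed are treated as terminal (how Idle maps for script sessions is left open); and `elapsedSeconds` here measures time spent in this request, which the issue does not pin down.

```python
import asyncio
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 0.2  # the 200 ms internal poll
MAX_HOLD_SECONDS = 60
TERMINAL_STATES = {"Done", "Failed"}  # assumption: Idle/Busy are non-terminal


def clamp_timeout(requested: int) -> int:
    """timeoutSeconds is clamped to 1..60."""
    return max(1, min(MAX_HOLD_SECONDS, requested))


async def process_wait(lookup: Callable[[], Optional[tuple[str, str]]],
                       timeout_seconds: int) -> dict:
    """Hold until the job reaches a terminal state or the hold time elapses."""
    hold_seconds = clamp_timeout(timeout_seconds)
    started = time.monotonic()
    while True:
        found = lookup()
        elapsed = time.monotonic() - started
        if found is None:
            # Same 200 shape for unknown ids, so probing cannot tell them apart.
            return {"isDone": True, "status": "NotFound", "name": "",
                    "elapsedSeconds": int(elapsed)}
        status, name = found
        done = status in TERMINAL_STATES
        if done or elapsed >= hold_seconds:
            # isDone=false tells the client the hold timed out and it should loop.
            return {"isDone": done, "status": status, "name": name,
                    "elapsedSeconds": int(elapsed)}
        await asyncio.sleep(POLL_INTERVAL_SECONDS)
```

Because the loop awaits between checks, the request thread is released while the caller is held, which is the point of switching to HttpTaskAsyncHandler.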

Client changes

  • New internal helper modules/SPE/Invoke-RemoteWait.ps1: builds the GET URL, issues the request, handles the 404 fallback signal.
  • Wait-RemoteScriptSession: uses Invoke-RemoteWait for polling; keeps the existing Invoke-RemoteScript { Receive-ScriptSession } for receive-after-done.
  • Wait-RemoteSitecoreJob: uses Invoke-RemoteWait only. No more .NET static calls in a scriptblock - closes the CLM gap.
  • Back-compat fallback: on 404 from the wait endpoint, fall back to the legacy per-poll scriptblock path for that session. One verbose log line per session so operators see the downgrade.

Session ownership enforcement (expanded scope)

  • Capture creator identity at session creation time. Add CreatedByIdentity on ScriptSession or equivalent. For API Key auth, store the API Key name. For config-based shared-secret auth, store the username.
  • Verify on every subsequent session access (existing script-execute endpoint + new wait endpoint). Mismatch returns 403 with X-SPE-Restriction: session-not-owned.
  • Backward compatibility: sessions are in-memory, app-domain-bound. App recycle (automatic on SPE deploy) wipes them. No migration needed - pre-existing sessions are gone after the deploy that introduces this change. Clean break.
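A minimal Python sketch of the ownership rule, assuming identities are compared as exact strings (whether the real check should be case-insensitive is left open). `ScriptSession` here is a stand-in for the server type, and `resolve_identity` and `authorize_session_access` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

SESSION_NOT_OWNED = "session-not-owned"


@dataclass(frozen=True)
class ScriptSession:
    session_id: str
    created_by_identity: str  # API Key name, or username for shared-secret auth


def resolve_identity(api_key_name: Optional[str], username: Optional[str]) -> str:
    """API Key auth records the key name; shared-secret auth records the username."""
    identity = api_key_name or username
    if not identity:
        raise ValueError("request carries no identity to record")
    return identity


def authorize_session_access(session: ScriptSession,
                             caller_identity: str) -> Optional[tuple[int, dict[str, str]]]:
    """None when the caller owns the session, otherwise the 403 to return."""
    if session.created_by_identity == caller_identity:
        return None
    return 403, {"X-SPE-Restriction": SESSION_NOT_OWNED}
```

The same check would run on both the execute endpoint and the wait endpoint, so the two routes cannot disagree about who may attach to a session.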

Acceptance criteria

  • GET /-/script/wait/ route responds under the existing auth / throttle / policy pipeline.
  • Wait-RemoteScriptSession completes a short-running -AsJob within the expected time (no 429 on tight budgets).
  • Wait-RemoteSitecoreJob works under ConstrainedLanguage + narrow AllowedCommands policy (no scriptblock shipped).
  • Accessing a session created by identity A from identity B returns 403 on both the execute endpoint and the wait endpoint.
  • Old client against new server: existing per-poll flow still works (no regression).
  • New client against old server: 404 triggers automatic fallback with a single verbose log line per session.
  • Integration test: Wait-RemoteSitecoreJob against a Constrained policy (currently fails, should pass).
  • Unit test: mocked HttpMessageHandler verifies long-poll + 404 fallback + 429 retry paths.

Out of scope

  • Event-driven notification from JobManager (version-coupling).
  • WebSocket / SSE push (requires IIS WebSocket module - violates the constraint of no additional server-side install).
  • Combining wait + receive into one response (keeping the receive pipeline separate preserves its auditability).
  • New throttle bucket class for long-polls (existing bucket is fine).

Related

  • Follow-up to feature/stream-fix, which added 429/503 retry, policy stream-baseline, and server-side stream capture.
  • Supersedes the tracking note under "Deferred UX / compat items" in .claude_worklog.md that flagged the Wait-RemoteSitecoreJob CLM incompatibility.
