
feat: spawner app + native uv support + deploy lifecycle fix #52

Closed
dgokeeffe wants to merge 32 commits into datasciencemonkey:main from dgokeeffe:feat/opencode-fork-gh-latency

Contributor

@dgokeeffe dgokeeffe commented Mar 10, 2026

Summary

  • Spawner app (spawner/): One-click provisioning UI that creates personal coding-agents instances for any developer. Admin bootstraps once; users paste a PAT and get their own app deployed from a shared template.
  • Native uv support: Replaced requirements.txt/pip with pyproject.toml + uv.lock for both the main app and spawner, enabling the platform's native uv sync + uv run flow.
  • Deploy lifecycle fix: Discovered that Databricks Apps deploy API requires compute_status == ACTIVE (~80s after app creation), not app_status == RUNNING. Added polling + bumped gunicorn timeout to 300s to handle the full provision flow.
  • Misc fixes: template app.yaml upload (write to a temp file and import with --file, because pipe-based import wrote empty content), deploy error surfacing, WebSocket transport detection, supply chain hardening.
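The deploy lifecycle fix boils down to "wait for compute, then deploy". A minimal polling sketch, assuming the app description is a dict shaped like `{"compute_status": {"state": "ACTIVE"}}`; `get_app` is a hypothetical callable standing in for whatever fetches the app, and the timeout values are illustrative:

```python
import time
from typing import Any, Callable, Mapping


def wait_for_compute_active(
    get_app: Callable[[], Mapping[str, Any]],
    timeout_s: float = 240.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll until compute_status.state == "ACTIVE" (assumed JSON shape).

    get_app is hypothetical: any callable returning the app as a dict.
    sleep/clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        app = get_app()
        state = (app.get("compute_status") or {}).get("state")
        if state == "ACTIVE":
            # Only now is the deploy call expected to succeed.
            return app
        if clock() >= deadline:
            raise TimeoutError(f"compute_status still {state!r} after {timeout_s}s")
        sleep(interval_s)
```

Since compute takes roughly 80s to become ACTIVE, this wait alone exceeds gunicorn's default worker timeout when it runs inside a request, which is why the timeout was raised to 300s.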

Test plan

  • Spawner deployed and RUNNING on daveok workspace
  • Full provision flow tested via /api/provision — all 6 steps complete in ~114s
  • Spawned app reaches RUNNING and passes health check
  • Main coding-agents app deployed with native uv support

This pull request was AI-assisted by Isaac.

@dgokeeffe dgokeeffe changed the title feat: OpenCode fork, GitHub CLI, latency fix feat: OpenCode fork integration, GitHub CLI, perf fixes Mar 10, 2026
Contributor Author

Updated: Spawner admin bootstrap + self-service provisioning

New commits add the full spawner app with:

  • Admin bootstrap — workspace admin PAT handles secret scopes, ACLs, deploy
  • SCIM identity — resolves PAT owner to derive coding-agents-{username} app name
  • Secret resource in app creation — no separate PATCH needed
  • UUID secret keys — each PAT stored with unique key
  • Spawned apps dashboard: /api/apps endpoint + UI table
  • Makefile with deploy/redeploy/run-polling targets
  • README documenting architecture, token model, and deploy steps
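The SCIM step maps a PAT owner to an app name of the form `coding-agents-{username}`. A sketch of that mapping, assuming the SCIM `/Me` response is parsed into a dict with a `userName` (typically an email) and that app names must be lowercase alphanumerics and hyphens; the exact slug rules are an assumption, not the spawner's code:

```python
import re


def app_name_from_scim_me(me: dict, prefix: str = "coding-agents") -> str:
    """Derive an app name from a parsed SCIM /Me response (assumed shape)."""
    user_name = me["userName"]
    # Keep the local part of an email-style userName, lowercased.
    local = user_name.split("@", 1)[0].lower()
    # Collapse any run of characters outside [a-z0-9] into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", local).strip("-")
    if not slug:
        raise ValueError(f"cannot derive an app name from {user_name!r}")
    return f"{prefix}-{slug}"
```

For example, `Jane.Doe@example.com` maps to `coding-agents-jane-doe`. Real naming limits (such as maximum length) would need to be checked against the Apps API.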

datasciencemonkey added a commit that referenced this pull request Mar 11, 2026
…nCode

OpenCode intermittently sends empty text content blocks in messages, which
Databricks Foundation Model API strictly rejects with "text content blocks
must be non-empty" (OpenCode #5028). This adds a LiteLLM proxy running on
localhost:4000 inside the container that strips these blocks before they
reach the API.

Simpler alternative to PR #52's fork approach — no fork maintenance, proven
fix via LiteLLM PR #20384, preserves full AI Gateway/MLflow/UC governance.

Changes:
- setup_litellm.py: new setup script, starts LiteLLM proxy with health check
- setup_opencode.py: route baseURL through localhost:4000 instead of direct
- app.py: add litellm setup step (sequential, before parallel agent setup)
- requirements.txt: add litellm>=1.60
- docs/plans: design document with analysis of PR #52 trade-offs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
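What the proxy does to each request can be pictured as a filter over the message list. This is an illustrative sketch, not LiteLLM's implementation; it assumes Anthropic-style messages whose `content` is either a string or a list of `{"type": "text", "text": ...}` blocks, and it treats whitespace-only text as empty (an assumption):

```python
from typing import Any


def strip_empty_text_blocks(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages with empty text content blocks removed."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            blocks = [
                b
                for b in content
                if not (
                    isinstance(b, dict)
                    and b.get("type") == "text"
                    and not str(b.get("text", "")).strip()
                )
            ]
            if not blocks:
                # Nothing left to send; this sketch drops the message, but a real
                # proxy may need to preserve user/assistant alternation instead.
                continue
            # Build a new dict so the caller's message is not mutated.
            msg = {**msg, "content": blocks}
        cleaned.append(msg)
    return cleaned
```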
datasciencemonkey added a commit that referenced this pull request Mar 11, 2026
Socket.IO reports connected=true even when falling back to HTTP
long-polling through the Databricks Apps reverse proxy. The app was
prematurely stopping the poll-worker, leaving users with no data
transport when true WebSocket wasn't available.

Now checks socket.io.engine.transport.name before deciding:
- 'websocket' → stop poll-worker, use WS as primary
- 'polling' → keep poll-worker active as primary transport
- Listen for late 'upgrade' event if transport upgrades later

Cherry-picked from PR #52 (dgokeeffe).

Fixes #54

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
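The real check lives in the browser client (`socket.io.engine.transport.name`), but the decision itself is a small state machine. A Python model of it, where `TransportController` and its attributes are hypothetical names used only for illustration:

```python
class TransportController:
    """Models when the poll-worker may be stopped (hypothetical names)."""

    def __init__(self) -> None:
        self.poll_worker_running = True
        self.primary = "poll-worker"

    def on_connect(self, transport_name: str) -> None:
        # connected=True alone is not enough: Socket.IO may be on HTTP long-polling.
        if transport_name == "websocket":
            self._use_websocket()
        # For 'polling', keep the poll-worker as the primary transport.

    def on_upgrade(self, transport_name: str) -> None:
        # A late upgrade to a real WebSocket can still retire the poll-worker.
        if transport_name == "websocket":
            self._use_websocket()

    def _use_websocket(self) -> None:
        self.poll_worker_running = False
        self.primary = "websocket"
```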
datasciencemonkey added a commit that referenced this pull request Mar 11, 2026
… (#59)

dgokeeffe and others added 16 commits March 13, 2026 16:00
- Install OpenCode from dgokeeffe/opencode fork with native Databricks
  provider (auto-discovers models, shares Claude Code skills)
- Add GitHub CLI (gh) setup with xterm.js-safe auth wrapper
- Reduce select() timeout 500ms→50ms and poll interval 100ms→50ms
- Add Makefile for deployment automation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
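The latency change shortens how long the output reader blocks in `select()`. A sketch of such a reader, assuming output arrives on a file descriptor (for example a PTY); `read_available` is a hypothetical helper, not the app's function:

```python
import os
import select


def read_available(fd: int, timeout_s: float = 0.05, chunk: int = 4096) -> bytes:
    """Wait up to timeout_s for output on fd, then drain what is already buffered."""
    out = bytearray()
    wait = timeout_s
    while True:
        ready, _, _ = select.select([fd], [], [], wait)
        if not ready:
            break
        data = os.read(fd, chunk)
        if not data:  # EOF: the writer side was closed
            break
        out += data
        wait = 0  # after the first read, do not block again
    return bytes(out)
```

With a 50ms timeout an idle terminal costs at most one short wait per poll, while a 500ms timeout can delay the first keystroke echo by up to half a second.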
Replace single global sessions_lock block in get_output_batch() with
3-step resolve/swap/join pattern matching get_output(). Snapshot session
dict in cleanup_stale_sessions() to iterate with per-session locks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
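The resolve/swap/join pattern keeps the global lock held only for dictionary lookups. A sketch under assumed types (a `Session` with its own lock and a list of output chunks); the names mirror the commit message but the bodies are illustrative:

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    lock: threading.Lock = field(default_factory=threading.Lock)
    buffer: list[str] = field(default_factory=list)


sessions: dict[str, Session] = {}
sessions_lock = threading.Lock()


def get_output_batch(session_ids: list[str]) -> dict[str, str]:
    # 1. Resolve: hold the global lock only long enough to look sessions up.
    with sessions_lock:
        resolved = {sid: sessions.get(sid) for sid in session_ids}
    result = {}
    for sid, sess in resolved.items():
        if sess is None:
            continue
        # 2. Swap: take the buffer under the per-session lock, leave an empty one.
        with sess.lock:
            chunks, sess.buffer = sess.buffer, []
        # 3. Join: string work happens outside every lock.
        result[sid] = "".join(chunks)
    return result
```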
Socket.IO sets connected=true even when falling back to its own
long-polling (Databricks Apps proxy blocks WS upgrade). This stopped
the fast poll-worker, routing all output through slow long-polling.

Now checks socket.io.engine.transport.name and only stops poll-worker
when transport is true 'websocket'. Also listens for late upgrades.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy prd-writer, test-generator, implementer, and build-feature agent
definitions to ~/.claude/agents/ during setup. Stripped model overrides
so agents inherit the Databricks model serving endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
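Claude Code agent definitions are Markdown files with YAML frontmatter, and a `model:` key there pins the agent to a specific model. A sketch of stripping that key while copying, assuming it appears as a top-level `model:` line; `strip_model_override` is a hypothetical helper:

```python
def strip_model_override(markdown: str) -> str:
    """Drop a top-level `model:` line from YAML frontmatter, keep everything else."""
    lines = markdown.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return markdown  # no frontmatter, nothing to strip
    out = [lines[0]]
    in_frontmatter = True
    for line in lines[1:]:
        if in_frontmatter and line.strip() == "---":
            in_frontmatter = False  # closing delimiter
        elif in_frontmatter and line.startswith("model:"):
            continue  # let the agent inherit the serving endpoint instead
        out.append(line)
    return "".join(out)
```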
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Admin token handles privileged ops (secret scopes, ACLs, deploy)
- User PAT creates app (ownership) + stored as runtime secret
- SCIM /Me resolves PAT owner to derive app name
- Secret resource included in app creation (no separate PATCH)
- Each PAT stored with unique UUID key
- /api/apps endpoint lists all spawned coding-agents apps
- Makefile for deploy/redeploy with run polling
- README documenting architecture and token model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch to uv run gunicorn in app.yaml (10-100x faster installs)
- Delete requirements.txt, use pyproject.toml + uv.lock exclusively
- Require Python >=3.12, add missing deps (gunicorn, flask-socketio)
- Run setup scripts via uv run python for consistent Python 3.12 env
- Strip DATABRICKS_TOKEN whitespace at startup to fix auth failures
- Add ~/.local/bin to PATH in _run_step for uv/tool discovery

Co-authored-by: Isaac
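Two of these bullets concern the environment each setup step runs in. A sketch of building that environment, assuming steps are launched as subprocesses; `prepare_step_env` is a hypothetical helper, not the app's `_run_step`:

```python
import os
from typing import Mapping, Optional


def prepare_step_env(base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment, trim the token, and put ~/.local/bin on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    token = env.get("DATABRICKS_TOKEN")
    if token is not None:
        # A trailing newline or space makes the Authorization header invalid.
        env["DATABRICKS_TOKEN"] = token.strip()
    local_bin = os.path.expanduser("~/.local/bin")
    parts = env["PATH"].split(os.pathsep) if env.get("PATH") else []
    if local_bin not in parts:
        # Prepend so freshly installed tools (uv, gh, databricks) win.
        env["PATH"] = os.pathsep.join([local_bin, *parts])
    return env
```

A step could then run as `subprocess.run(["uv", "run", "python", script], env=prepare_step_env(), check=True)`, where `script` stands for one of the setup scripts.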
Pipe-based workspace import writes empty content. Write to temp file
first, then import with --file flag.

Co-authored-by: Isaac
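A sketch of the temp-file approach, assuming the upload shells out to the Databricks CLI. Only the `--file` flag comes from the commit above; other flags the real call may need (format, overwrite) are deliberately left out, and the helper name is hypothetical:

```python
import os
import tempfile


def build_import_command(content: str, workspace_path: str) -> tuple[list[str], str]:
    """Write content to a temp file and return (argv, temp_path) for the import."""
    fd, tmp_path = tempfile.mkstemp(suffix=".yaml")
    with os.fdopen(fd, "w") as f:
        f.write(content)
    cmd = ["databricks", "workspace", "import", workspace_path, "--file", tmp_path]
    return cmd, tmp_path
```

The caller would run `subprocess.run(cmd, check=True)` and then `os.remove(tmp_path)` in a `finally` block.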
New apps are UNAVAILABLE until first deploy, so waiting for
RUNNING causes a deadlock. Retry the deploy call with backoff.

Co-authored-by: Isaac
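A generic exponential-backoff retry is enough to express "retry the deploy call". This sketch retries on any exception for brevity; a real version should retry only on the "app not ready" error. The delays and attempt count are illustrative:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 6,
    base_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the delay between attempts."""
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last deploy error
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
    raise AssertionError("unreachable")
```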
Deploy API requires compute_status=ACTIVE (~80s after app creation).
Gunicorn timeout bumped to 300s to handle the full provision flow.

Co-authored-by: Isaac
@dgokeeffe dgokeeffe force-pushed the feat/opencode-fork-gh-latency branch from d2fb8f5 to 98b68cf March 13, 2026 10:53
@dgokeeffe dgokeeffe changed the title feat: OpenCode fork integration, GitHub CLI, perf fixes feat: spawner app + native uv support + deploy lifecycle fix Mar 13, 2026
Falls back to direct model serving when no AI Gateway is configured.

Co-authored-by: Isaac
Prevents fitAddon.fit() from thrashing scroll position on every
resize pixel. Adds explicit scrollback and scrollOnUserInput.

Co-authored-by: Isaac
Previously skipped install if binary existed, leaving stale versions
across redeployments.

Co-authored-by: Isaac
Random UUID secret keys caused re-provisions to store the PAT under a
new key while the app still referenced the old one. Users had to enter
their PAT twice because the first attempt's secret was orphaned.

Co-authored-by: Isaac
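The commit does not say what replaced the random keys. One way to get a key that is stable across re-provisions is to derive it from the user, for example with a name-based UUID. This is a sketch under that assumption, and the namespace string is hypothetical:

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
SPAWNER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "coding-agents-spawner.example")


def secret_key_for(user_name: str) -> str:
    """Same user -> same secret key, so a re-provision overwrites the old PAT."""
    return f"pat-{uuid.uuid5(SPAWNER_NAMESPACE, user_name.lower())}"
```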
Secret values stored via `echo | databricks secrets put-secret`
include a trailing newline, causing invalid Authorization headers.

Co-authored-by: Isaac
The Databricks Apps API returns state under `app_status`, not `status`.
This caused the early-exit check to never detect running apps, and the
spawned apps table to always show UNKNOWN.

Co-authored-by: Isaac
Provision runs in a background thread so the endpoint returns
immediately. UI polls /api/provision-status every 3s showing
step-by-step progress with checkmarks. Apps table auto-refreshes
every 10s and shows in-flight provisions. Supports multiple
concurrent provisions.

Co-authored-by: Isaac
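The background-thread flow can be sketched with an in-memory status table keyed by provision ID. The step names and status fields are hypothetical; `start_provision` would back `/api/provision` and `provision_status` would back `/api/provision-status`:

```python
import threading
import uuid
from typing import Callable, Optional

_provisions: dict[str, dict] = {}
_provisions_lock = threading.Lock()


def start_provision(steps: list[tuple[str, Callable[[], None]]]) -> str:
    """Run steps in a daemon thread and return an ID the UI can poll."""
    provision_id = uuid.uuid4().hex
    with _provisions_lock:
        _provisions[provision_id] = {"state": "RUNNING", "done": [], "error": None}

    def run() -> None:
        for name, step in steps:
            try:
                step()
            except Exception as exc:
                with _provisions_lock:
                    _provisions[provision_id].update(state="FAILED", error=f"{name}: {exc}")
                return
            with _provisions_lock:
                _provisions[provision_id]["done"].append(name)  # one UI checkmark
        with _provisions_lock:
            _provisions[provision_id]["state"] = "SUCCEEDED"

    threading.Thread(target=run, daemon=True).start()
    return provision_id


def provision_status(provision_id: str) -> Optional[dict]:
    """Return a copy of the status so callers never see a half-updated list."""
    with _provisions_lock:
        status = _provisions.get(provision_id)
        return None if status is None else {**status, "done": list(status["done"])}
```

Because each provision has its own ID and entry, several can run concurrently.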
The list apps endpoint doesn't return app_status. Derive state from
compute_status and active_deployment.status instead.

Co-authored-by: Isaac
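A sketch of deriving a display state when `app_status` is missing. The nested `{"state": ...}` shapes and the mapping to RUNNING/DEPLOYING are assumptions for illustration, not the API's documented semantics:

```python
from typing import Any, Mapping


def _state(obj: Any) -> Any:
    return obj.get("state") if isinstance(obj, Mapping) else None


def derive_app_state(app: Mapping[str, Any]) -> str:
    # Prefer app_status when the response includes it (e.g. a single-app get).
    explicit = _state(app.get("app_status"))
    if explicit:
        return explicit
    compute = _state(app.get("compute_status"))
    deployment = app.get("active_deployment")
    deploy = _state(deployment.get("status")) if isinstance(deployment, Mapping) else None
    if compute == "ACTIVE" and deploy == "SUCCEEDED":
        return "RUNNING"  # assumed equivalence
    if deploy == "IN_PROGRESS":
        return "DEPLOYING"
    return compute or "UNKNOWN"
```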
xterm.js intercepts Ctrl+V/C as raw control characters on non-Mac
platforms. Added attachCustomKeyEventHandler to let the browser handle
Ctrl+V (paste), Ctrl+C (copy when text selected), and Ctrl+Shift+C/V.
Also added clipboard section to shortcuts help with platform-aware
labels (Cmd on Mac, Ctrl on Windows) and upload toast for image paste.

Co-authored-by: Isaac
Runtime image ships an older CLI (v0.251.0). Added a setup step that
fetches the latest release from GitHub API and installs it to
~/.local/bin, same pattern as the GitHub CLI install.

Co-authored-by: Isaac
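A sketch of the "latest release" lookup against the GitHub REST API for `databricks/cli`. It assumes release assets are zip files named with an `_{os}_{arch}.zip` suffix; unzipping into `~/.local/bin` is left out:

```python
import json
import urllib.request

LATEST_RELEASE_URL = "https://api.github.com/repos/databricks/cli/releases/latest"


def fetch_latest_release(url: str = LATEST_RELEASE_URL) -> dict:
    """Return the parsed JSON for the latest release."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def pick_cli_asset(release: dict, os_name: str = "linux", arch: str = "amd64") -> tuple[str, str]:
    """Return (tag_name, download_url) for the matching asset (naming is assumed)."""
    suffix = f"_{os_name}_{arch}.zip"
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(suffix):
            return release["tag_name"], asset["browser_download_url"]
    raise LookupError(f"no asset ending in {suffix!r} in release {release.get('tag_name')}")
```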
Contributor Author

Closing in favor of focused PRs: #68 (OpenCode/perf/WebSocket), #69 (TDD subagents), #70 (native uv), #71 (spawner), #72 (misc fixes)

@dgokeeffe dgokeeffe closed this Mar 17, 2026