feat: bundle React frontend in Python wheel (Streamlit-style)#275

Open
jamesbroadhead wants to merge 14 commits into databricks:main from jamesbroadhead:feat/bundle-frontend-in-wheel

Conversation

@jamesbroadhead
Contributor

Summary

  • Adds a standalone React frontend (packages/appkit-py/frontend/) that dynamically discovers enabled plugins from window.__appkit__ config — pages for analytics, files, and genie appear automatically based on what the Python backend exposes
  • Pre-built static assets are bundled into the Python wheel via pyproject.toml package-data, so pip install appkit-py gives Python devs a working UI without needing Node.js/npm (same pattern as Streamlit)
  • ServerPlugin._find_static_dir() now falls back to the bundled appkit_py/static/ when no user-provided frontend directory (e.g., client/dist) exists

Details

  • scripts/build_frontend.sh builds appkit-ui then compiles the Vite+React app into src/appkit_py/static/ (run at release time, not by end users)
  • Frontend is ~2.1MB total (React, appkit-ui, all page components)
  • User-provided static directories still take priority — the bundled frontend is only a fallback
  • Unit tests added for static file discovery logic (51 unit tests pass)
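The package-data wiring mentioned above would look roughly like this in pyproject.toml (a sketch assuming a setuptools backend; the exact table names in this repo may differ):

```toml
[tool.setuptools.package-data]
# Ship every pre-built asset under src/appkit_py/static/ inside the wheel.
appkit_py = ["static/**/*"]
```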

Test plan

  • npm run build produces correct output in src/appkit_py/static/
  • pytest tests/unit/ — all 51 tests pass
  • Manual: run python -m appkit_py without a client/dist dir and verify the bundled UI loads
  • Manual: run with a client/dist dir and verify it takes priority over bundled assets
  • Verify wheel includes static assets: pip wheel . && unzip -l *.whl | grep static

Depends on #274.

This pull request was AI-assisted by Isaac.

Some serverless warehouses only support ARROW_STREAM with INLINE
disposition, but the analytics plugin only offered JSON_ARRAY (INLINE)
and ARROW_STREAM (EXTERNAL_LINKS). This adds a new "ARROW_STREAM"
format option that uses INLINE disposition, making the plugin
compatible with these warehouses.

Fixes databricks#242

Tests verify:
- ARROW_STREAM format passes INLINE disposition + ARROW_STREAM format
- ARROW format passes EXTERNAL_LINKS disposition + ARROW_STREAM format
- Default JSON format does not pass disposition or format overrides
The server-side ARROW_STREAM format added in the previous commit was
not exposed to the frontend or typegen:

- Add "ARROW_STREAM" to AnalyticsFormat in appkit-ui hooks
- Add "arrow_stream" to DataFormat in chart types
- Handle "arrow_stream" in useChartData's resolveFormat()
- Make typegen resilient to ARROW_STREAM-only warehouses by
  retrying DESCRIBE QUERY without format when JSON_ARRAY is rejected

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
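The typegen resilience bullet could be sketched like this (hypothetical: `run_describe` and the error-matching are stand-ins, not the real typegen code):

```python
def describe_query(run_describe, sql: str):
    """Try DESCRIBE QUERY with JSON_ARRAY; retry without a format override
    if the warehouse rejects that format (e.g. an ARROW_STREAM-only
    serverless warehouse)."""
    try:
        return run_describe(sql, fmt="JSON_ARRAY")
    except RuntimeError:
        # Assumed: the connector surfaces format rejection as an error;
        # retrying with no explicit format lets the warehouse pick one.
        return run_describe(sql, fmt=None)
```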
…compatibility

ARROW_STREAM with INLINE disposition is the only format that works
across all warehouse types, including serverless warehouses that reject
JSON_ARRAY. Change the default from JSON to ARROW_STREAM throughout:

- Server: defaults.ts, analytics plugin request handler
- Client: useAnalyticsQuery, UseAnalyticsQueryOptions, useChartData
- Tests: update assertions for new default

JSON and ARROW formats remain available via explicit format parameter.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
When using the default ARROW_STREAM format, the analytics plugin now
automatically falls back through formats if the warehouse rejects one:
ARROW_STREAM → JSON → ARROW.

This handles warehouses that only support a subset of format/disposition
combinations without requiring users to know their warehouse's
capabilities. Explicit format requests (JSON, ARROW) are respected
without fallback.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Previously, _transformDataArray unconditionally called updateWithArrowStatus
for any ARROW_STREAM response, which discards inline data and returns only
statement_id + status. This was designed for EXTERNAL_LINKS (where data is
fetched separately) but broke INLINE disposition where data is in data_array.

Changes:
- _transformDataArray now checks for data_array before routing to the
  EXTERNAL_LINKS path: if data_array is present, it falls through to the
  standard row-to-object transform.
- JSON format now explicitly sends JSON_ARRAY + INLINE rather than relying
  on connector defaults. This prevents the connector default format from
  leaking into explicit JSON requests.
- Connector defaults reverted to JSON_ARRAY for backward compatibility with
  classic warehouses (the analytics plugin sets formats explicitly).
- Added connector-level tests for _transformDataArray covering ARROW_STREAM
  + INLINE, ARROW_STREAM + EXTERNAL_LINKS, and JSON_ARRAY paths.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
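The routing fix is easiest to see in a Python sketch (the real function is TypeScript; the response keys here are illustrative, not the exact connector shapes):

```python
def transform_data_array(result: dict, columns: list[str]) -> dict:
    """Route INLINE ARROW_STREAM results (which carry data_array) through
    the normal row transform instead of the EXTERNAL_LINKS status path."""
    data_array = result.get("data_array")
    if data_array is not None:
        # INLINE: rows are present, convert to named row objects.
        rows = [dict(zip(columns, row)) for row in data_array]
        return {"type": "result", "rows": rows}
    # EXTERNAL_LINKS: no inline rows; report status so data is fetched later.
    return {"type": "arrow",
            "statement_id": result.get("statement_id"),
            "status": result.get("status")}
```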
Some serverless warehouses return ARROW_STREAM + INLINE results as base64
Arrow IPC in `result.attachment` rather than `result.data_array`. This adds
server-side decoding using apache-arrow's tableFromIPC to convert the
attachment into row objects, producing the same response shape as JSON_ARRAY
regardless of warehouse backend.

This abstracts a Databricks internal implementation detail (different
warehouses returning different response formats) so app developers get a
consistent `type: "result"` response with named row objects.

Changes:
- Add apache-arrow@21.1.0 as a server dependency (already used client-side)
- _transformDataArray detects `attachment` field and decodes via tableFromIPC
- Connector tests use real base64 Arrow IPC captured from a live serverless
  warehouse, covering: classic JSON_ARRAY, classic EXTERNAL_LINKS,
  serverless INLINE attachment, data_array fallback, and edge cases

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Python implementation of the AppKit backend using FastAPI, providing
the same HTTP API surface as the TypeScript version for all plugins:
analytics (SSE query streaming), files (11 endpoints), and genie
(3 SSE endpoints). Includes full test suite (48 unit + 41 integration
tests), SSE streaming infrastructure with reconnection support,
contextvars-based user context, interceptor chain (retry/timeout/cache),
and Databricks SDK connector wiring.

Co-authored-by: Isaac
- Fix path traversal in SPA static file serving (use resolve() + prefix check)
- Fix upload endpoint OOM: stream body with running size counter
- Fix CacheInterceptor to actually use TTL (was storing forever)
- Fix StreamManager reconnection: persist EventRingBuffer per stream_id
- Fix _UserContextProxy: only wrap async methods, leave sync methods alone
- Fix _load_query path traversal: reject /, \, .. in query_key
- Fix Content-Disposition header injection: sanitize filename
- Fix format_buffered_event: apply sanitize_event_type on replay
- Fix ruff target-version to match requires-python (py312)
- Fix __main__.py: load dotenv, use APPKIT_HOST env var
- Add abort_all() implementation to StreamManager

Co-authored-by: Isaac
…es, path traversal

- Fix OBO: create per-request WorkspaceClient from x-forwarded-access-token
  instead of reusing global service-principal client for all routes
- Fix ARROW format: use EXTERNAL_LINKS disposition and emit arrow event
  with statement_id (matching TS FORMAT_CONFIGS)
- Fix SQL connector: check for FAILED/CANCELED/CLOSED states after polling
  and raise with error message instead of returning empty result
- Fix FilesConnector.resolve_path: reject path traversal (..) sequences
- Update all file/genie endpoints to use per-request user client

Co-authored-by: Isaac
…aceId

- Add pyarrow-based Arrow IPC attachment decoding (decode_arrow_attachment)
  matching TS _transformArrowAttachment for serverless warehouse support
- Implement get_arrow_data: download external link chunks via httpx
- Use transform_result() in analytics handler for unified result processing
- Add maxSize enforcement to FilesConnector.read()
- Auto-inject workspaceId parameter in process_query_params when query
  references :workspaceId
- Add pyarrow and httpx to runtime dependencies

Co-authored-by: Isaac
…→ ARROW)

Mirrors the TS _executeWithFormatFallback: when the default ARROW_STREAM
format is rejected by a warehouse (classic warehouses don't support
INLINE + ARROW_STREAM), automatically falls back through JSON then ARROW.
Verified working against live Databricks SQL Warehouse.

Co-authored-by: Isaac
Extract monolithic server.py into proper Plugin subclasses:
- AnalyticsPlugin: SQL query execution with format fallback, query file loading
- FilesPlugin: 11 routes with volume discovery, path validation, OBO
- GeniePlugin: 3 SSE routes with space alias resolution
- ServerPlugin: orchestrates plugin mounting, static serving, shutdown

Add create_app() factory matching TS createApp():
- Plugin phase ordering (core → normal → deferred)
- WorkspaceClient injection into plugins
- Plugin exports for programmatic API (appkit.analytics.query(...))
- Client config aggregation from all plugins

Plugin base class now has:
- execute() with interceptor chain (timeout → retry → cache)
- execute_stream() for SSE responses
- route() helper for endpoint registration and tracking
- to_plugin() factory matching TS toPlugin()

server.py is now a thin wrapper: create plugins → create_app() → return app.
All 89 tests pass. Live Databricks queries verified.

Co-authored-by: Isaac
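The phase ordering in create_app() can be sketched as a stable sort (phase names come from the message above; the `phase` attribute name is an assumption):

```python
from typing import Any

_PHASE_ORDER = {"core": 0, "normal": 1, "deferred": 2}

def order_plugins(plugins: list[Any]) -> list[Any]:
    """Mount core plugins first, then normal, then deferred. sorted() is
    stable, so registration order is preserved within a phase."""
    return sorted(
        plugins,
        key=lambda p: _PHASE_ORDER.get(getattr(p, "phase", "normal"), 1),
    )
```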
Python developers can now `pip install appkit-py` and get a working
frontend without needing Node.js/npm. The pre-built React app (using
appkit-ui components) is included as static assets in the wheel and
served automatically when no user-provided frontend directory is found.

- Add frontend/ with standalone Vite+React app that dynamically
  discovers enabled plugins from window.__appkit__ config
- Add scripts/build_frontend.sh to compile frontend at release time
- Update pyproject.toml with package-data for static/**/*
- Update ServerPlugin._find_static_dir() to fall back to bundled assets
- Add MANIFEST.in for source distribution support
- Add unit tests for static file discovery logic

Co-authored-by: Isaac