feat: Python backend (appkit-py) #274
jamesbroadhead wants to merge 13 commits into databricks:main
Conversation
Some serverless warehouses only support ARROW_STREAM with INLINE disposition, but the analytics plugin only offered JSON_ARRAY (INLINE) and ARROW_STREAM (EXTERNAL_LINKS). This adds a new "ARROW_STREAM" format option that uses INLINE disposition, making the plugin compatible with these warehouses. Fixes databricks#242
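The three format options described here amount to a small lookup table mapping the plugin's format name to the Statement Execution API's format/disposition pair. The name FORMAT_CONFIGS appears in a later commit, but the exact shape below is an assumption:

```python
# Hypothetical mapping from the plugin's format option to the
# statement execution (format, disposition) pair.
FORMAT_CONFIGS = {
    # Default JSON path: inline rows as JSON arrays.
    "JSON": {"format": "JSON_ARRAY", "disposition": "INLINE"},
    # New option: Arrow IPC stream returned inline, for serverless
    # warehouses that reject JSON_ARRAY.
    "ARROW_STREAM": {"format": "ARROW_STREAM", "disposition": "INLINE"},
    # Existing option: Arrow stream fetched via presigned external links.
    "ARROW": {"format": "ARROW_STREAM", "disposition": "EXTERNAL_LINKS"},
}

def resolve_format(name: str) -> dict:
    """Return the execution parameters for a requested format name."""
    return FORMAT_CONFIGS[name]
```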
Tests verify:
- ARROW_STREAM format passes INLINE disposition + ARROW_STREAM format
- ARROW format passes EXTERNAL_LINKS disposition + ARROW_STREAM format
- Default JSON format does not pass disposition or format overrides
The server-side ARROW_STREAM format added in the previous commit was not exposed to the frontend or typegen:
- Add "ARROW_STREAM" to AnalyticsFormat in appkit-ui hooks
- Add "arrow_stream" to DataFormat in chart types
- Handle "arrow_stream" in useChartData's resolveFormat()
- Make typegen resilient to ARROW_STREAM-only warehouses by retrying DESCRIBE QUERY without format when JSON_ARRAY is rejected

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
…compatibility

ARROW_STREAM with INLINE disposition is the only format that works across all warehouse types, including serverless warehouses that reject JSON_ARRAY. Change the default from JSON to ARROW_STREAM throughout:
- Server: defaults.ts, analytics plugin request handler
- Client: useAnalyticsQuery, UseAnalyticsQueryOptions, useChartData
- Tests: update assertions for new default

JSON and ARROW formats remain available via explicit format parameter.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
When using the default ARROW_STREAM format, the analytics plugin now automatically falls back through formats if the warehouse rejects one: ARROW_STREAM → JSON → ARROW. This handles warehouses that only support a subset of format/disposition combinations without requiring users to know their warehouse's capabilities. Explicit format requests (JSON, ARROW) are respected without fallback.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Previously, _transformDataArray unconditionally called updateWithArrowStatus for any ARROW_STREAM response, which discarded inline data and returned only statement_id + status. This was designed for EXTERNAL_LINKS (where data is fetched separately) but broke INLINE disposition, where data is in data_array.

Changes:
- _transformDataArray now checks for data_array before routing to the EXTERNAL_LINKS path: if data_array is present, it falls through to the standard row-to-object transform.
- JSON format now explicitly sends JSON_ARRAY + INLINE rather than relying on connector defaults. This prevents the connector default format from leaking into explicit JSON requests.
- Connector defaults reverted to JSON_ARRAY for backward compatibility with classic warehouses (the analytics plugin sets formats explicitly).
- Added connector-level tests for _transformDataArray covering ARROW_STREAM + INLINE, ARROW_STREAM + EXTERNAL_LINKS, and JSON_ARRAY paths.

Co-authored-by: Isaac
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
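The routing decision this commit describes can be sketched in isolation. The field names (`data_array`, `statement_id`) follow the commit message; the function and parameter names below are hypothetical:

```python
def transform_data_array(result: dict, row_transform, arrow_status):
    """Route a statement result to the right transform.

    INLINE responses carry rows in `data_array`; EXTERNAL_LINKS
    responses carry no inline data and are resolved separately.
    (Sketch of the routing described above; names are illustrative.)
    """
    if result.get("data_array") is not None:
        # INLINE disposition: transform raw rows into named row objects.
        return row_transform(result["data_array"])
    # EXTERNAL_LINKS: return only statement_id + status so the data
    # can be fetched in a separate step.
    return arrow_status(result)
```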
Some serverless warehouses return ARROW_STREAM + INLINE results as base64 Arrow IPC in `result.attachment` rather than `result.data_array`. This adds server-side decoding using apache-arrow's tableFromIPC to convert the attachment into row objects, producing the same response shape as JSON_ARRAY regardless of warehouse backend. This abstracts a Databricks internal implementation detail (different warehouses returning different response formats) so app developers get a consistent `type: "result"` response with named row objects. Changes: - Add apache-arrow@21.1.0 as a server dependency (already used client-side) - _transformDataArray detects `attachment` field and decodes via tableFromIPC - Connector tests use real base64 Arrow IPC captured from a live serverless warehouse, covering: classic JSON_ARRAY, classic EXTERNAL_LINKS, serverless INLINE attachment, data_array fallback, and edge cases Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Python implementation of the AppKit backend using FastAPI, providing the same HTTP API surface as the TypeScript version for all plugins: analytics (SSE query streaming), files (11 endpoints), and genie (3 SSE endpoints). Includes a full test suite (48 unit + 41 integration tests), SSE streaming infrastructure with reconnection support, contextvars-based user context, an interceptor chain (retry/timeout/cache), and Databricks SDK connector wiring.

Co-authored-by: Isaac
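The contextvars-based user context mentioned above can be sketched as follows; the variable and helper names are assumptions, not the actual appkit-py API:

```python
import contextvars

# Hypothetical per-request user context; appkit-py's real names may differ.
_user_token: contextvars.ContextVar = contextvars.ContextVar(
    "user_token", default=None
)

def bind_user_context(headers: dict) -> None:
    """Capture the forwarded OBO token for the current request context.

    Under asyncio (as in FastAPI), each task runs in a copy of the
    current context, so concurrent requests never see each other's token.
    """
    _user_token.set(headers.get("x-forwarded-access-token"))

def current_user_token():
    """Return the token bound in this context, or None."""
    return _user_token.get()
```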
- Fix path traversal in SPA static file serving (use resolve() + prefix check)
- Fix upload endpoint OOM: stream body with running size counter
- Fix CacheInterceptor to actually use TTL (was storing forever)
- Fix StreamManager reconnection: persist EventRingBuffer per stream_id
- Fix _UserContextProxy: only wrap async methods, leave sync methods alone
- Fix _load_query path traversal: reject /, \, .. in query_key
- Fix Content-Disposition header injection: sanitize filename
- Fix format_buffered_event: apply sanitize_event_type on replay
- Fix ruff target-version to match requires-python (py312)
- Fix __main__.py: load dotenv, use APPKIT_HOST env var
- Add abort_all() implementation to StreamManager

Co-authored-by: Isaac
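The CacheInterceptor TTL fix above amounts to storing an expiry timestamp with each entry and evicting on read. A minimal sketch, with assumed names (the real class wraps plugin execution rather than exposing get/put):

```python
import time

class CacheInterceptor:
    """TTL-honoring cache sketch (the bug fixed above stored forever)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry expired: evict it and report a miss.
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        # Record the deadline at insert time, not just the value.
        self._store[key] = (time.monotonic() + self.ttl, value)
```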
…es, path traversal

- Fix OBO: create per-request WorkspaceClient from x-forwarded-access-token instead of reusing global service-principal client for all routes
- Fix ARROW format: use EXTERNAL_LINKS disposition and emit arrow event with statement_id (matching TS FORMAT_CONFIGS)
- Fix SQL connector: check for FAILED/CANCELED/CLOSED states after polling and raise with error message instead of returning empty result
- Fix FilesConnector.resolve_path: reject path traversal (..) sequences
- Update all file/genie endpoints to use per-request user client

Co-authored-by: Isaac
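The resolve() + prefix-check pattern used for path-traversal fixes in these commits can be sketched as a small helper; the function name is hypothetical:

```python
from pathlib import Path

def safe_join(root: str, requested: str) -> Path:
    """Resolve `requested` under `root`, rejecting traversal.

    Sketch of the resolve() + prefix-check approach described above:
    normalize the candidate path first, then verify it is still
    inside the root, so `..` sequences and symlink tricks cannot
    escape it.
    """
    base = Path(root).resolve()
    candidate = (base / requested).resolve()
    if base != candidate and base not in candidate.parents:
        raise ValueError(f"path escapes root: {requested}")
    return candidate
```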
…aceId

- Add pyarrow-based Arrow IPC attachment decoding (decode_arrow_attachment) matching TS _transformArrowAttachment for serverless warehouse support
- Implement get_arrow_data: download external link chunks via httpx
- Use transform_result() in analytics handler for unified result processing
- Add maxSize enforcement to FilesConnector.read()
- Auto-inject workspaceId parameter in process_query_params when query references :workspaceId
- Add pyarrow and httpx to runtime dependencies

Co-authored-by: Isaac
…→ ARROW)

Mirrors the TS _executeWithFormatFallback: when the default ARROW_STREAM format is rejected by a warehouse (classic warehouses don't support INLINE + ARROW_STREAM), it automatically falls back through JSON, then ARROW. Verified working against a live Databricks SQL Warehouse.

Co-authored-by: Isaac
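The fallback logic can be sketched as a small loop; the exception type standing in for "format rejected by warehouse" and the function names are assumptions:

```python
# Fallback order for the default case, per the commits above.
FALLBACK_CHAIN = ["ARROW_STREAM", "JSON", "ARROW"]

def execute_with_format_fallback(execute, requested=None):
    """Try formats in order until one succeeds.

    An explicit `requested` format skips the fallback entirely and
    any rejection propagates to the caller, matching the behavior
    described above. ValueError stands in for a warehouse rejection.
    """
    formats = FALLBACK_CHAIN if requested is None else [requested]
    last_error = None
    for fmt in formats:
        try:
            return execute(fmt)
        except ValueError as exc:
            last_error = exc
    raise last_error
```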
Extract monolithic server.py into proper Plugin subclasses:
- AnalyticsPlugin: SQL query execution with format fallback, query file loading
- FilesPlugin: 11 routes with volume discovery, path validation, OBO
- GeniePlugin: 3 SSE routes with space alias resolution
- ServerPlugin: orchestrates plugin mounting, static serving, shutdown

Add create_app() factory matching TS createApp():
- Plugin phase ordering (core → normal → deferred)
- WorkspaceClient injection into plugins
- Plugin exports for programmatic API (appkit.analytics.query(...))
- Client config aggregation from all plugins

Plugin base class now has:
- execute() with interceptor chain (timeout → retry → cache)
- execute_stream() for SSE responses
- route() helper for endpoint registration and tracking
- to_plugin() factory matching TS toPlugin()

server.py is now a thin wrapper: create plugins → create_app() → return app. All 89 tests pass. Live Databricks queries verified.

Co-authored-by: Isaac
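The core → normal → deferred phase ordering can be sketched independently of the rest of the factory; the Plugin shape here is a toy stand-in, not the actual base class:

```python
from dataclasses import dataclass

# Mounting phases in order, per the commit above.
PHASES = ("core", "normal", "deferred")

@dataclass
class Plugin:
    """Toy stand-in for the real Plugin base class."""
    name: str
    phase: str = "normal"

def order_plugins(plugins):
    """Sort plugins core -> normal -> deferred for mounting.

    sorted() is stable, so plugins keep their registration order
    within a phase.
    """
    rank = {phase: i for i, phase in enumerate(PHASES)}
    return sorted(plugins, key=lambda p: rank[p.phase])
```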
Review: Python Backend (appkit-py)

Reviewed by nitpicker (multi-model debate: Claude Opus + Gemini Pro + Gemini Flash, 3 rounds) and manually. Verdict: do not merge. Multiple critical bugs, security flaws, and pervasive JS-style patterns.

Critical Bugs

1. Infinite recursion in interceptor chain. All lambdas capture the loop variables by late binding, so every link resolves to the final handler at call time and the chain calls itself forever. Fix: use functools.partial to bind eagerly:

    from functools import partial
    current = partial(interceptor.intercept, prev)
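The late-binding bug the review describes can be reproduced in isolation. The Interceptor shape below is a toy stand-in, not the actual appkit-py class; only the buggy-lambda versus partial contrast matters:

```python
from functools import partial

class Interceptor:
    """Toy interceptor: tags the value, then calls the next handler."""

    def __init__(self, name):
        self.name = name

    def intercept(self, next_handler, value):
        return next_handler(value + [self.name])

def build_chain_buggy(interceptors, final):
    current = final
    for interceptor in interceptors:
        prev = current
        # BUG: `interceptor` and `prev` are looked up when the lambda
        # runs, not when it is created, so with 2+ interceptors every
        # link sees the final loop values and the chain calls itself.
        current = lambda value: interceptor.intercept(prev, value)
    return current

def build_chain_fixed(interceptors, final):
    current = final
    for interceptor in interceptors:
        # partial binds the *current* value of `current` immediately.
        current = partial(interceptor.intercept, current)
    return current
```

Calling the fixed chain runs the interceptors outermost-first; calling the buggy chain with two or more interceptors raises RecursionError.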
Summary
Python implementation of the AppKit backend using FastAPI, providing the same HTTP API surface as the TypeScript version. Drop-in replacement that works with the existing React frontend (appkit-ui).

Depends on #273 (inline Arrow IPC support); merge that first.
What's included
- SSE streaming via connectSSE()
- Per-request WorkspaceClient from x-forwarded-* headers via contextvars
- Interceptor chain: retry, timeout (asyncio.wait_for), TTL cache
- databricks-sdk + pyarrow for SQL warehouse, files, genie
- Format fallback (_executeWithFormatFallback)
- <script id="__appkit__"> config injection

Security (ACE multi-model review: GPT 5.4 + Gemini 3.1 Pro + Claude)
Running
Test plan
Related
This pull request was AI-assisted by Isaac.