[hackathon] UDF Copilot: schema-aware AI in the Python UDF editor #5081

Open: mengw15 wants to merge 3 commits into apache:main from mengw15:hackathon-1

@mengw15 mengw15 commented May 15, 2026

Demo

Screen.Recording.2026-05-15.at.4.48.45.PM.mov

The problem

Open a Python UDF in Texera today and the editor greets you with an empty file. You don't know what's in tuple_. You don't know which columns the upstream operator emits, what their types are, or what a sample row looks like. So you guess: write tuple_["name"], run, watch it crash with a KeyError, flip back to the upstream operator's output tab to check the real column name, edit, run, and crash again on a type mismatch. When something does break at runtime, the traceback lands in a separate panel disconnected from the editor; you read it, decode it, and fix the code by hand. And when your code starts producing a new column, you have to remember to also declare it in the operator's "Extra Output Columns" property; forget, and the execution fails.

Summary

  • Schema-aware AI assistant embedded in the Python UDF Monaco editor — four interaction modes (ghost text, Cmd+K rewrite, Quick Fix on errors, side chat) plus a dataflow context panel.
  • AI sees the upstream schema, a real sample row, and the downstream consumer's expected schema — so suggestions reference actual column names and types instead of generic placeholders.
  • Auto-detects schema drift between UDF code and the operator's Extra Output Columns; one-click sync via regex fast-path or AI deep-analysis (handles add / remove / type-update / retainInputColumns).
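To make the regex fast-path for schema drift concrete, here is a minimal sketch of how such a check could work. It is not the code in this PR: the function names, the regex, and every type name other than `integer` are assumptions made for illustration. It scans UDF source for literal assignments of the form `tuple_["col"] = <literal>` and diffs the result against the declared Extra Output Columns.

```python
import re

# Hypothetical pattern: matches `tuple_["col"] = <value>` but not `==` comparisons.
ASSIGN_RE = re.compile(
    r'''tuple_\[\s*(["'])(?P<name>[^"']+)\1\s*\]\s*=(?!=)\s*(?P<value>[^#\n]+)'''
)


def infer_type(value: str) -> str:
    """Guess a column type from a Python literal (type names are assumed)."""
    value = value.strip()
    if value in ("True", "False"):
        return "boolean"
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+)", value):
        return "double"
    if re.fullmatch(r'''(["']).*\1''', value):
        return "string"
    return "unknown"  # anything non-literal would need the deeper analysis


def detect_drift(code: str, declared: dict) -> dict:
    """Compare columns assigned in `code` with the declared output columns."""
    found = {}
    for match in ASSIGN_RE.finditer(code):
        found[match.group("name")] = infer_type(match.group("value"))
    added = {n: t for n, t in found.items() if n not in declared}
    removed = sorted(n for n in declared if n not in found)
    retyped = {
        n: t
        for n, t in found.items()
        if n in declared and t != "unknown" and declared[n] != t
    }
    return {"added": added, "removed": removed, "retyped": retyped}
```

Under these assumptions, `detect_drift('tuple_["foo"] = 1', {})` reports `foo` as an added `integer` column, and calling it on code without that assignment against `{"foo": "integer"}` reports `foo` as removed. A regex like this cannot see columns built dynamically, which is presumably where the AI deep-analysis path comes in.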

What's in this PR

  • Monaco integrations: registerInlineCompletionsProvider (ghost text + column dropdown), addAction (Cmd+K rewrite, Fix-with-AI), registerCodeActionProvider (Pyright lightbulb), side panel for chat.
  • New agent-service router under /api/udf-copilot/: /complete, /chat, /rewrite, /fix, /sync-schema, /sample-capture, /sample-row. Diagnose-then-fix prompt with 3-way classification (UDF code error vs API-contract violation vs framework error). Output validation + one-shot retry for known anti-patterns (yield tuple_["x"] scalar yield, .items() on Tuple).
  • Python worker hook (amber/.../data_processor.py): captures the first input tuple per UDF and asynchronously POSTs to agent-service so the AI gets real data even for workflows where no operator is paginated.
  • "Fix with AI" buttons on console + error panels that auto-open the editor and pre-fill the traceback. Cross-component flow via UdfCopilotService.requestFixAndOpen.
  • Reindent-after-Accept so AI output lands at the right indent level relative to the surrounding selection.
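The output validation with one-shot retry described above can be sketched roughly as follows. This is an illustrative approximation, not the agent-service implementation: `ANTI_PATTERNS`, `find_anti_patterns`, `generate_with_retry`, and the retry prompt wording are all hypothetical, and `generate` stands in for whatever function calls the model.

```python
import re
from typing import Callable, List

# Hypothetical detectors for the two anti-patterns named in this PR.
ANTI_PATTERNS = {
    # `yield tuple_["x"]` yields a scalar instead of a tuple.
    "scalar-yield": re.compile(
        r'''^\s*yield\s+tuple_\[\s*["'][^"']+["']\s*\]\s*$''', re.MULTILINE
    ),
    # `.items()` called directly on the Tuple object.
    "items-on-tuple": re.compile(r"\btuple_\.items\(\)"),
}


def find_anti_patterns(code: str) -> List[str]:
    """Return the names of all known anti-patterns present in `code`."""
    return [name for name, pattern in ANTI_PATTERNS.items() if pattern.search(code)]


def generate_with_retry(generate: Callable[[str], str], prompt: str) -> str:
    """Call the model once; if the output is flagged, retry exactly once."""
    code = generate(prompt)
    problems = find_anti_patterns(code)
    if not problems:
        return code
    retry_prompt = (
        f"{prompt}\n\nThe previous answer contained known anti-patterns: "
        f"{', '.join(problems)}. Rewrite the code without them."
    )
    return generate(retry_prompt)
```

The second result is returned without re-validation, which keeps latency bounded at two model calls; whether the real service re-checks the retried output is not stated in this description.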

Test plan

  • Open a Python UDF; type tuple_[" — column-name dropdown appears with all upstream columns
  • Type after tuple_["a"] > — ghost text suggests a value-aware threshold based on the sample row
  • Select a line, Cmd+K, "add None handling" — preview shows diff, Accept lands code at correct indent
  • Add tuple_["foo"] = 1 — yellow banner shows + foo:integer; Sync writes to Extra Output Columns
  • Remove that line — banner shows − foo (strikethrough); Sync removes from property panel
  • Click the 🔍 audit icon — AI-driven schema analysis runs and proposes outputColumns + retainInputColumns
  • Run a workflow with a tuple_.items() bug; in the Console tab, click the red "Fix with AI" button next to the error title; the editor auto-opens with the Fix overlay pre-filled with the traceback, and the AI rewrites the call to as_key_value_pairs()
  • Side chat: ask "what columns do I have?" — AI quotes the real schema and sample values
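For the Cmd+K step, where Accept should land code at the correct indent, one possible approach is sketched below. It assumes the reindent works by dedenting the generated snippet and re-applying the leading whitespace of the selection's first non-blank line; the function name `reindent` and this exact strategy are assumptions, and the editor-side code in this PR may differ.

```python
import textwrap


def reindent(generated: str, selection: str) -> str:
    """Shift `generated` so it starts at the indent level of `selection`."""
    indent = ""
    for line in selection.splitlines():
        if line.strip():
            # Leading whitespace of the first non-blank selected line.
            indent = line[: len(line) - len(line.lstrip())]
            break
    # Remove any common indent the model added, then apply the target indent.
    body = textwrap.dedent(generated).strip("\n")
    return textwrap.indent(body, indent)
```

With a selection indented by eight spaces, a two-line `if` block generated at column zero comes back with its first line at eight spaces and its nested line at twelve, so the relative structure is preserved.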

🤖 Generated with [Claude Code]

mengw15 and others added 3 commits May 14, 2026 19:14
Restores gui.conf, llm.conf, storage.conf, udf.conf, user-system.conf, and
docker-compose.yml to upstream defaults. The leaked values (LiteLLM master
key, local DB credentials, Google OAuth client ID, host paths) were
introduced in 0d9a128 ("init"). Local overrides should be supplied via
the existing `${?VAR_NAME}` environment indirection in each .conf file
instead.

Note: the secret in the prior commit remains in git history. The LiteLLM
key must be rotated independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the python, frontend (Changes related to the frontend GUI), and agent-service labels on May 15, 2026