Durable execution for agent-openai-advanced + agent-langgraph-advanced #195

Open
dhruv0811 wants to merge 39 commits into main from
dhruv0811/durable-execution-templates
Conversation


@dhruv0811 dhruv0811 commented Apr 20, 2026

Summary

Wires agent-openai-advanced + agent-langgraph-advanced + the shared e2e-chatbot-app-next frontend to the durable-execution contract in databricks-ai-bridge PR #416 (ML-64230).

Agent code stays unchanged. All durability logic lives in the bridge or in the chatbot proxy.

Template changes

  • pyproject.toml — pin databricks-ai-bridge + integration package to the bridge PR branch (revert to registry once released).
  • start_server.py — raise databricks_ai_bridge logger to LOG_LEVEL so [durable] messages surface in app logs. The LongRunningAgentServer subclass already exists on main.
  • No changes to agent.py. Read-time repair happens inside AsyncDatabricksSession.get_items() (openai) and _repair_loaded_checkpoint_tuple wrapping the checkpointer (langgraph) — both in the bridge.

Chatbot proxy (e2e-chatbot-app-next/server/src/index.ts)

The Express /invocations handler rewrites streaming POSTs into the bridge's background-mode contract and transparently resumes on upstream drops. Zero client-side changes.

  • Rewrite POST /invocations {stream: true} → backend {background: true, stream: true}
  • pumpStream forwards SSE frames to the browser; three pure helpers (parseSseFrame, extractResponseId, isTerminalErrorFrame) classify each frame
  • On upstream close without [DONE], the loop reconnects via GET /responses/{id}?stream=true&starting_after={lastSeq}, capped at 10 attempts
  • Short-circuits on task_failed / task_timeout terminal errors
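
The reconnect decision described above can be sketched as two pure helpers (a minimal sketch; `buildResumeUrl` and `shouldResume` are illustrative names, not the actual functions in the proxy):

```typescript
// Hypothetical sketch of the proxy's resume decision logic. Assumes the
// GET /responses/{id}?stream=true&starting_after={seq} contract and the
// 10-attempt cap described above.
const MAX_RESUME_ATTEMPTS = 10;

// Build the durable-resume URL for a dropped stream.
function buildResumeUrl(base: string, responseId: string, lastSeq: number): string {
  return `${base}/responses/${responseId}?stream=true&starting_after=${lastSeq}`;
}

// Decide whether the pump loop should reconnect after an upstream close.
function shouldResume(opts: {
  attempt: number;        // reconnects so far
  sawDone: boolean;       // upstream emitted [DONE]
  terminalError: boolean; // task_failed / task_timeout frame seen
}): boolean {
  // Terminal errors and clean completion both short-circuit the loop.
  if (opts.sawDone || opts.terminalError) return false;
  return opts.attempt < MAX_RESUME_ATTEMPTS;
}
```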

AI SDK provider (packages/ai-sdk-providers/src/request-context.ts)

  • New getApiProxyUrl() helper resolves the proxy URL:
    1. explicit API_PROXY env var wins
    2. DATABRICKS_SERVING_ENDPOINT set → direct-endpoint mode, no proxy
    3. default → route via this Node server's own /invocations (advanced-template convention)
  • Advanced templates no longer declare API_PROXY / AGENT_BACKEND_URL in databricks.yml / app.yaml — the defaults live in chatbot code.
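
The three-step resolution order can be sketched as follows (a simplified stand-in for `getApiProxyUrl`, with env access parameterized so the precedence is easy to test; the default port chain is an assumption based on the convention described above):

```typescript
// Sketch of the proxy-URL resolution described above, not the exact
// implementation in packages/ai-sdk-providers.
function resolveApiProxyUrl(env: Record<string, string | undefined>): string | undefined {
  // 1. An explicit API_PROXY always wins.
  if (env.API_PROXY) return env.API_PROXY;
  // 2. Direct-endpoint mode: talk straight to the serving endpoint, no proxy.
  if (env.DATABRICKS_SERVING_ENDPOINT) return undefined;
  // 3. Advanced-template default: this Node server's own /invocations route.
  const port = env.CHAT_APP_PORT ?? env.PORT ?? "3000";
  return `http://localhost:${port}/invocations`;
}
```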

Testing

  • UI end-to-end on Claude via deployed agent-openai-advanced: multi-tool turns (get_current_time, get_weather, get_stock_price, deep_research) interrupted mid-stream via /_debug/kill_task/{id}. Durable resume inherits completed tool pairs and injects synthetic [INTERRUPTED] output for the killed call; the agent continues without re-running completed tools. Tool cards dedupe across attempts.
  • HTTP-only crash-and-recover loop (no browser) exits status=completed, attempt_number=2.
  • Local e2e suite green on openai-advanced[autoscaling]. Other failures in the local run are environmental (Python 3.14 mlflow simulator compat, vector-embedding cold-start on the langgraph LTM test) — not PR-caused.

How to test

cd agent-openai-advanced   # or agent-langgraph-advanced

uv run quickstart --profile <profile> --lakebase-provisioned-name <instance>

# Enable the debug kill endpoint in databricks.yml / app.yaml:
#   env:
#     - name: LONG_RUNNING_ENABLE_DEBUG_KILL
#       value: "1"

databricks bundle deploy --profile <profile>
databricks bundle run agent_openai_advanced --profile <profile>

# Grant Lakebase permissions to the app SP
SP=$(databricks apps get <app-name> --profile <profile> --output json | jq -r .service_principal_client_id)
DATABRICKS_CONFIG_PROFILE=<profile> uv run python scripts/grant_lakebase_permissions.py $SP

Mid-stream crash test (UI)

  1. Open the deployed app, send a long prompt (e.g. "Do deep_research on quantum computing basics").
  2. Tail app logs and grab response_id from [/invocations] background started response_id=resp_....
  3. While the response is still streaming:
    curl -sS -X POST -H "Authorization: Bearer $TOKEN" "$APP_URL/_debug/kill_task/$RID"
  4. After the ~10s stale window, the UI continues from where it left off; completed tool calls don't re-run.

Mid-stream crash test (HTTP only)

RESP=$(curl -sS -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"input":[{"role":"user","content":"Write 500 words about Linux history."}],"background":true}' \
  "$APP_URL/responses")
RID=$(echo "$RESP" | jq -r .id)

sleep 3
curl -sS -X POST -H "Authorization: Bearer $TOKEN" "$APP_URL/_debug/kill_task/$RID"
sleep 12

for i in $(seq 1 15); do
  sleep 3
  curl -sS -H "Authorization: Bearer $TOKEN" "$APP_URL/responses/$RID" | jq '{status, attempt_number}'
done
# Expect: status=completed, attempt_number=2

Pre-merge checklist

  • Bridge PR #416 merged and a release cut
  • Revert pyproject.toml git-branch pins in both advanced templates to registry versions
  • Revert the APP_TEMPLATES_BRANCH default in both templates' scripts/start_app.py from dhruv0811/durable-execution-templates to main
  • Remove LONG_RUNNING_ENABLE_DEBUG_KILL=1 from deploy configs before production use

Pins both advanced templates to the ai-bridge PR branch so the long-running
agent server crash-resumes in-flight runs via heartbeat + CAS claim. Revert
the [tool.uv.sources] entry once that PR merges and a new release is cut.

Also fixes a latent IndexError in agent-openai-advanced's deduplicate_input:
when the long-running server re-invokes the handler with input=[] to resume
from the session (the agnostic resume contract validated by prototyping),
messages[-1] blew up. Now we return [] for empty input — the session already
has prior turns so there is nothing to dedupe.

No change to either template's agent.py.
Makes the bundled chat UI durable end-to-end without any client-side
changes. The Express /invocations proxy in e2e-chatbot-app-next now:

- Rewrites streaming POSTs to { ...body, background: true, stream: true },
  so every user turn persists each SSE event to Lakebase via
  LongRunningAgentServer.
- Sniffs response.id + sequence_number out of the forwarded SSE stream.
- If upstream closes before [DONE] (pod died, lost connection), the proxy
  transparently reconnects via
    GET /responses/{id}?stream=true&starting_after=N
  and resumes emitting events to the still-connected browser client. The
  browser sees one continuous stream.

Non-streaming requests and non-POST methods keep the original passthrough
behavior.
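
The rewrite itself is a small body transform; a hedged sketch (`rewriteForBackgroundMode` is an illustrative name):

```typescript
// Only streaming POST bodies are rewritten into background mode; everything
// else keeps the original passthrough behavior, as described above.
type InvocationsBody = { stream?: boolean; background?: boolean; [k: string]: unknown };

function rewriteForBackgroundMode(body: InvocationsBody): InvocationsBody {
  if (!body.stream) return body; // non-streaming: untouched passthrough
  return { ...body, background: true, stream: true };
}
```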

Also points agent-openai-advanced/scripts/start_app.py at the
dhruv0811/durable-execution-templates branch of app-templates so the new
proxy code is actually deployed (override via APP_TEMPLATES_BRANCH env
var). Revert once this lands on main.
… actually fires

Previous attempt left the proxy dead-code: the Node AI SDK honored API_PROXY
verbatim and sent requests straight to http://localhost:8000/invocations
(FastAPI), skipping the Express /invocations handler at :3000 entirely.
Confirmed in logs: requests reached the backend with {"stream": true}
but never with "background": true.

Split the two concerns across env vars:
  API_PROXY=http://localhost:3000/invocations  (AI SDK -> Express proxy)
  AGENT_BACKEND_URL=http://localhost:8000/invocations  (Express proxy -> FastAPI)

Express handler prefers AGENT_BACKEND_URL, falls back to API_PROXY for
backwards compat so existing templates don't break.
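
The fallback is a one-liner; sketched here with env access parameterized (illustrative, not the actual handler code):

```typescript
// The Express handler's backend-URL choice: prefer the dedicated var, fall
// back to API_PROXY so pre-split templates keep working.
function resolveBackendUrl(env: Record<string, string | undefined>): string | undefined {
  return env.AGENT_BACKEND_URL ?? env.API_PROXY;
}
```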
response_id is buried in the raw backend SSE stream and never surfaces to
the browser because the Vercel AI SDK re-wraps the stream as its own
message format before sending to the client. Log it on the server side
instead so test instructions can `grep 'background started response_id=' `
from apps logs. Also distinguish the startup log so it's clear the
durable-resume code path is live.

No behavior change; pure observability.
app.yaml env vars were overriding databricks.yml at runtime, so the AI SDK
was still talking directly to the Python FastAPI backend and the Express
/invocations proxy never saw the request. Keep both files in sync.
…RL to FastAPI

The script was unconditionally overwriting API_PROXY with the backend URL
right before launching the frontend, which defeated our whole durable-
resume-rewrite story: the Node AI SDK bypassed the Express /invocations
handler and streamed straight from FastAPI.

Fix: API_PROXY now points at CHAT_APP_PORT (the Express proxy), and we
default AGENT_BACKEND_URL (previously unset) to the Python backend. Use
os.environ.setdefault for AGENT_BACKEND_URL so operators can still override
via databricks.yml or app.yaml.
…resp_*

Broadens the response_id parser so it works whether the backend tags frames
with top-level response_id (preferred) or the older nested-only shape.
…tally

Matches the [/invocations] prefix so the full story is greppable from apps
logs without correlating Node and Python timestamps.
The library logger inherits from root (default WARNING) so INFO-level
lifecycle messages from LongRunningAgentServer (heartbeat, claim, resume,
stream lifecycle) were being dropped. Set both the ai-bridge logger and
the root level to LOG_LEVEL so apps logs carry the full durable-resume
story without requiring callers to tune logging themselves.
When a response is killed mid-stream, the partial assistant text that was
already rendered to the client kept receiving fresh deltas from attempt 2 —
users saw attempt-1-partial + attempt-2-full concatenated in one bubble.

Express /invocations proxy now seals the in-progress assistant message
across an attempt boundary:

1. On upstream close without [DONE], immediately append a
   '(connection interrupted — reconnecting…)' suffix delta to the active
   message so the user sees something is happening during the ~10s stale
   window.
2. On the response.resumed sentinel, emit synthetic
   response.content_part.done + response.output_item.done events for the
   active message — effectively ending the first assistant bubble at
   OpenAI Responses API level.
3. Attempt 2's natural response.output_item.added (with a fresh item_id)
   then creates a clean second bubble showing the full answer.

Tool calls naturally de-dup by call_id across attempts, so no closure
synthesis needed for them.

Also mirrors the routing + logging fixes previously applied to
agent-openai-advanced onto agent-langgraph-advanced so both templates get
durable resume with the full [durable] log lifecycle visible:

- app.yaml + databricks.yml: split API_PROXY (-> Express :3000) from
  AGENT_BACKEND_URL (-> FastAPI :8000).
- scripts/start_app.py: honor AGENT_BACKEND_URL, point API_PROXY at the
  Express proxy, clone e2e-chatbot-app-next from the durable-execution
  branch.
- agent_server/start_server.py: raise databricks_ai_bridge + root logger to
  LOG_LEVEL so [durable] INFO lines surface in apps logs.
Durable-resume can interrupt the pod between an LLM emitting tool_calls
and the SDK finishing the tool executions — the Session is left with
function_call items whose matching function_call_output never got written.
The next LLM request over that session fails:

  400 BAD_REQUEST: An assistant message with 'tool_calls' must be followed
  by tool messages responding to each 'tool_call_id'. The following
  tool_call_ids did not have response messages: call_xxx, call_yyy, ...

Piggy-back on deduplicate_input (which already touches the session each
turn) to inject synthetic function_call_output items for every orphan
function_call. Message is plain-text, so the LLM sees 'tool X was
interrupted, please retry if needed' and can decide whether to re-call
or continue. No change to agent.py.
The previous heal added synthetic function_call_output at the END of the
session (add_items only appends). When the conversation has a message
between the orphan function_call and the synthetic output, the SDK
rebuilds the LLM request as an assistant-with-tool_calls message that
doesn't have its tool responses right after it, and the API rejects with
'assistant message with tool_calls must be followed by tool messages'.

Also: the Vercel AI SDK client echoes the full conversation back each
turn. deduplicate_input drops most of it but the Runner.run path can
still re-persist prior items, leaving DUPLICATE function_call rows for
the same call_id.

Replace with a clear+rebuild sanitize pass: dedupe function_call /
function_call_output by call_id, inject synthetic outputs immediately
after any orphan function_call, clear the session, and re-add the
canonical sequence. No-op when already clean.
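
The sanitize pass is data munging over the item list; a language-agnostic sketch (written in TypeScript for consistency with the other examples here — the real code is Python in the template, and the item shapes below are assumptions):

```typescript
// Sketch of the clear+rebuild sanitize: dedupe function_call /
// function_call_output by call_id, then inject a synthetic output immediately
// after any orphan function_call. No-op when the sequence is already clean.
type Item =
  | { type: "function_call"; call_id: string }
  | { type: "function_call_output"; call_id: string; output: string }
  | { type: "message"; content: string };

function sanitizeSequence(items: Item[]): Item[] {
  const seenCalls = new Set<string>();
  const seenOutputs = new Set<string>();
  // Pass 1: drop duplicate rows by call_id.
  const deduped = items.filter((it) => {
    if (it.type === "function_call") {
      if (seenCalls.has(it.call_id)) return false;
      seenCalls.add(it.call_id);
    } else if (it.type === "function_call_output") {
      if (seenOutputs.has(it.call_id)) return false;
      seenOutputs.add(it.call_id);
    }
    return true;
  });
  // Pass 2: rebuild, closing each orphan call right where it appears.
  const result: Item[] = [];
  for (const it of deduped) {
    result.push(it);
    if (it.type === "function_call" && !seenOutputs.has(it.call_id)) {
      result.push({
        type: "function_call_output",
        call_id: it.call_id,
        output: "tool was interrupted, please retry if needed",
      });
      seenOutputs.add(it.call_id);
    }
  }
  return result;
}
```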
Keep the UI minimal but fix the doubled-text issue: when a mid-stream
kill happens, the AI SDK merges all deltas within one streamText call
into one UIMessage — so our proxy-level seal events were valid but
invisible, and attempt 2's text kept appending to attempt 1's partial.

Minimal solution:

1. Express /invocations proxy already emits response.resumed at the
   attempt boundary (unchanged).
2. chat.ts server: detect response.resumed via onChunk and forward it
   to the UI stream as { type: 'data-resumed', data: { attempt } }.
3. chat.tsx client: on 'data-resumed', call setMessages to drop all
   text parts from the last (assistant) message. Tool call parts stay
   because they dedupe by call_id naturally.

Also: fix auto-resume loop burning MAX_RESUME_ATTEMPTS on terminal
errors by exiting early when an error event with code=task_failed or
code=task_timeout comes through the proxy.

No changes to agent.py. Agnosticism tenet intact.
Your 'clean up at end of stream' idea — much more robust than relying on
mid-stream mutation sticking. On data-resumed we now snapshot the
attempt-1 text length, and in onFinish we slice exactly that many chars
off the front of the last assistant message's text parts. Whatever the AI
SDK accumulator did during streaming, the final rendered state contains
only attempt 2's content.

The mid-stream mutation wipe stays in place too — when it sticks the
text visibly clears during the 10s stale window, which is nicer UX than
waiting for onFinish. When it doesn't stick, onFinish catches it.
PreviewMessage is memoized: while loading it compares prevProps.message
to nextProps.message by reference; when not loading it deep-equals the
parts array (which short-circuits on identical references). Our previous
truncate mutated part.text in place and returned [...prev] — same
message + same parts array refs, so the memo skipped the re-render and
the old text stuck on screen even though state was technically updated.

Map to NEW part objects with sliced text and wrap a NEW message object
so both the reference check (loading path) and deep-equal (done path)
see a change and re-render.
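
The shape of the fix, sketched with simplified stand-ins for the UIMessage types (the real types come from the AI SDK; these are assumptions for illustration):

```typescript
// Truncate by building NEW part objects and a NEW message wrapper, so both
// the reference check (loading path) and deep-equal (done path) see a change.
type TextPart = { type: string; text: string };
type Message = { id: string; parts: TextPart[] };

function truncateLeading(message: Message, cut: number): Message {
  return {
    ...message, // fresh message object
    parts: message.parts.map((p) =>
      p.type === "text" ? { ...p, text: p.text.slice(cut) } : p, // fresh parts
    ),
  };
}
```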
State-level wipes were getting clobbered by the AI SDK accumulator —
ReactChatState.replaceMessage deep-clones state.message on every write(),
and activeTextParts keeps mutating the originals behind the UI's back.

Solution: transform at the VIEW layer instead of fighting the state
machine. Chat component tracks attempt1TextLen per messageId (state, not
ref, so it propagates to children). Messages maps each message through a
render-time slice that drops the leading attempt-1 chars from text parts
before passing to PreviewMessage. Creates new message + part objects so
the memo's reference check trips and the component re-renders.

onFinish still does the authoritative setMessages truncate so the
persisted-to-DB final message reflects only attempt 2. That truncate now
also clears attempt1TextLen, so the render-time slice becomes a no-op
after completion (state is already truncated).
…cution-templates

# Conflicts:
#	agent-openai-advanced/databricks.yml
Drop the [chat][onData] / [chat][onFinish] / [chat][onChunk] tracing
statements that were used to trace the attempt-1 → attempt-2 flow
while tuning the render-time slice and post-stream truncate. The
server-side Express proxy still logs resume lifecycle (background
started / resume fetch / terminal error / stream done) since that's
operationally useful; the ai-bridge backend's [durable] INFO logs
stay as-is.

Co-authored-by: Isaac
Move the per-template workarounds for mid-tool crash-resume into the
databricks-ai-bridge library and wire them in:

- agent-openai-advanced/utils.py: deduplicate_input now calls
  session.repair() (new public method on AsyncDatabricksSession) instead
  of the 100-line in-template _sanitize_session. Same behavior — dedupe
  function_call/function_call_output by call_id, inject synthetic
  outputs for orphans — just owned by the library.
- agent-langgraph-advanced/agent.py: before agent.astream, call
  build_tool_resume_repair on the checkpointer's messages and apply via
  agent.aupdate_state(..., as_node="tools"). The as_node is critical —
  without it LangGraph re-evaluates the model→{tools,END} branch from
  the updated state and crashes with KeyError: 'model'.
- agent-langgraph-advanced/agent.py: when the checkpointer already has
  a thread, only forward the latest user turn from request.input — the
  UI client (Vercel AI SDK) re-echoes the full history on every turn,
  which can re-inject orphan tool_uses from a previously-interrupted
  attempt that the client kept in its buffer.

Both pyproject.toml files now pin databricks-openai / databricks-langchain
to the same ai-bridge branch (subdirectory git sources) so the new
helpers are picked up. Temporary; revert to registry once the bridge PR
merges.

Co-authored-by: Isaac
Library side (databricks-langchain, PR #416):
- New build_tool_resume_repair_middleware() returns an AgentMiddleware whose
  before_model hook runs build_tool_resume_repair. Swaps the manual
  aget_state / aupdate_state(as_node="tools") surgery in the template for a
  one-line `middleware=[...]` arg to create_agent.
- The as_node="tools" footgun (KeyError: 'model' in the model→{tools,END}
  conditional branch re-eval) disappears entirely; repair runs inside the
  graph's own execution flow, not as external state surgery.

Template (agent-langgraph-advanced):
- init_agent: add middleware=[build_tool_resume_repair_middleware()] to
  create_agent. stream_handler drops the 8-line repair block.
- utils.py process_agent_astream_events: skip None node_data (the graph's
  updates stream emits {middleware_node: None} when the middleware is a
  no-op, which is every turn on the happy path).

UI (e2e-chatbot-app-next):
- On data-resumed from the backend, wipe text parts from the last assistant
  message in one setMessages. Tool-call parts are kept as-is (they already
  dedupe across attempts by call_id). Dropped:
    * attempt1TextLen state + per-message snapshot in onData
    * render-time text slice in Messages.tsx
    * onFinish authoritative post-stream truncate
  The AI SDK's seal-on-resume synthesis (Express proxy) still creates a
  fresh output_item_id for attempt 2, so new deltas land in a fresh text
  part — our wipe of the old text part is sufficient.

Net: -99 LOC across 4 files. Same behavior for the "delete old text,
leave tools alone" UX; substantially less state-machine choreography.

Co-authored-by: Isaac
setMessages can't wipe mid-stream — the AI SDK's activeResponse.state
is a snapshot taken at makeRequest time, and every text-delta calls
write() → this.state.replaceMessage(lastIdx, activeResponse.state.message),
which overwrites any setMessages we do. Our wipe was visible for a single
chunk then reverted.

Fix: snapshot the assistant message's parts.length at data-resumed, and
at render time hide text parts at indices BEFORE that cutoff. Tool / step
parts render normally at every index. Works for openai and langgraph
because it transforms at the view layer rather than fighting the AI SDK
state machine.
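
The render-time filter can be sketched as a pure function over the message's parts (illustrative shapes; the cutoff is the parts.length snapshot taken at data-resumed):

```typescript
// Hide text parts at indices before the resume cutoff; tool / step parts
// render normally at every index, as described above.
type Part = { type: string; text?: string };

function visibleParts(parts: Part[], resumeCutIndex: number | undefined): Part[] {
  if (resumeCutIndex === undefined) return parts; // no resume this turn
  return parts.filter((p, i) => p.type !== "text" || i >= resumeCutIndex);
}
```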

Removes server-side debug log. Keeps the minimal delete-old-text UX.

Co-authored-by: Isaac
…lper

- Removed the "_(connection interrupted — reconnecting…)_" delta block.
  Render-time slice hides attempt-1 text on resume anyway, so the suffix
  was invisible past the 10s stale window and too subtle during it.
- Extracted writeEvent(type, payload) helper; sealActiveMessage went from
  45 → 22 lines, no behavior change.
- Removed readActive() TS-widening helper (no longer needed without the
  suffix block).
- Inlined onFirstResponseId helper into its single call site.

Net: 92 lines removed, 36 added in this file.

Co-authored-by: Isaac
@dhruv0811 dhruv0811 marked this pull request as ready for review April 21, 2026 23:14
@dhruv0811 dhruv0811 requested a review from bbqiu April 21, 2026 23:20
Durability mechanics now live entirely in databricks-ai-bridge's
LongRunningAgentServer (rotate conv_id on resume + full-history input
sanitizer, see ai-bridge PR #416). Templates can drop the explicit repair
surface:

- agent-langgraph-advanced/agent.py: drop
  middleware=[build_tool_resume_repair_middleware()] from create_agent
  and the unused import. Also drop the stream_handler UI-echo dedupe
  block — the server sanitizer handles mid-history orphans end-to-end.
- agent-openai-advanced/utils.py: drop await session.repair() from
  deduplicate_input. session.repair() stays available as a public method
  for callers who want destructive session cleanup.

Net: agent.py / utils.py in both advanced templates have zero
durability-specific lines. The contract becomes "use our checkpointer/
session classes with LongRunningAgentServer — durable resume + orphan
repair is free."

Co-authored-by: Isaac
Temporarily short-circuit the resumeCutIndex write so attempt-1's text
stays visible while attempt-2 streams over it. Lets us see how the
server-side inheritance + synthetic-output prompt shape the LLM's
mid-turn continuation behavior without the visual wipe hiding what
attempt-2 actually emits.

Re-enable by uncommenting the block; the rest of the wipe plumbing
(state hook, Messages prop threading, render-time slice) is left in
place so re-enabling is a 1-line flip.

Co-authored-by: Isaac
…les resume

Server-side changes earlier in this branch (prior-attempt tool-event
inheritance + partial-stream reassembly in databricks-ai-bridge) make
the client-side "wipe attempt-1 text when resume fires" machinery
unnecessary: attempt-2's LLM sees attempt-1's work as history and
continues seamlessly instead of restarting. The wipe was also hiding
the new continuation quality from the user. Turning the wipe off in
UI testing confirmed the server-side story is sufficient.

Delete the full stack:

- packages/core/src/types.ts: drop `resumed` from CustomUIDataTypes.
- server/src/routes/chat.ts: drop writerRef + emittedResumedAttempts
  + the onChunk raw-event branch that emitted data-resumed parts.
  Trace-extraction stays; only the resume-forwarding path is removed.
- client/src/components/chat.tsx: drop resumeCutIndex state hook, the
  data-resumed onData handler (was already commented out), and the
  prop pass to <Messages/>.
- client/src/components/messages.tsx: drop resumeCutIndex prop from
  MessagesProps + its destructuring + the render-time text-part slice.

The server still emits `response.resumed` as a sentinel so the Express
proxy's sealActiveMessage() call correctly closes attempt-1's open
text part before attempt-2's fresh output_item.added creates a new
one. The proxy no longer extracts it into a UI data part.

Co-authored-by: Isaac
Remove everything that isn't strictly required for durable resume with
the server-side-only approach in ai-bridge PR #416:

- agent-langgraph-advanced/agent_server/agent.py: revert entirely. The
  test-scaffolding tools (get_weather, get_stock_price, deep_research)
  were only for crash-test harnesses; the asyncio import only existed
  to support them. User-space durability surface for this template is
  now zero lines.
- agent-openai-advanced/agent_server/agent.py: revert entirely. Drop
  the test-scaffolding tools (get_weather, get_stock_price,
  search_best_restaurants, deep_research) and asyncio import. Same
  zero-user-space result.
- agent-langgraph-advanced/agent_server/utils.py: revert. The
  "middleware nodes that no-op return None" guard was defensive
  against middleware we no longer install.
- agent-openai-advanced/agent_server/utils.py: revert. The empty-input
  guard was defensive against the old input=[] resume replay that no
  longer happens — server always replays the original input.
- e2e-chatbot-app-next/server/src/index.ts: drop the activeMessage /
  sealActiveMessage / writeEvent machinery. Was synthesizing closure
  events on response.resumed to seal attempt-1's text part for the UI
  wipe. UI wipe is gone; the AI SDK creates parts by item_id so
  attempt-2's fresh output_item.added naturally starts a new part and
  attempt-1's open part finalizes on stream end.
- Plus the earlier UI cleanup (chat.tsx, messages.tsx, types.ts,
  routes/chat.ts) that removed the data-resumed / resumeCutIndex
  plumbing.

Remaining essentials:
- agent_server/start_server.py: log-level setup so [durable] logs
  surface in app logs.
- scripts/start_app.py: API_PROXY / AGENT_BACKEND_URL wiring so the
  Node AI SDK routes streaming POSTs through the Express
  background-mode + auto-resume proxy. Clone-from-branch is marked
  TEMPORARY (revert when ai-bridge ships).
- pyproject.toml: databricks-ai-bridge git source pointer (TEMPORARY).
- e2e-chatbot-app-next/server/src/index.ts: background-mode rewrite +
  auto-resume proxy for the /invocations route.

Co-authored-by: Isaac
Infinite stream-resume loop seen with Claude multi-tool turns via
durable retrieve. Root cause:

  - useChat's onStreamPart reset resumeAttemptCountRef on every chunk,
    so the 3-retry cap was only enforced when a stream ended empty.
    When Claude's provider failed to emit a clean `finish` UIMessageChunk
    at the end of the stream, lastPart.type !== 'finish' kept
    streamIncomplete = true. Each resume replayed the cached stream,
    delivered chunks, reset the counter to 0, onFinish fired without
    `finish`, looped.

Fix:

  - Remove the per-chunk reset in onStreamPart.
  - Reset only in prepareSendMessagesRequest when the last message is a
    user message (a genuine new turn). Tool-result continuations
    (non-user-message continuations) don't reset.
  - Cap stays at 3; after that, fetchChatHistory() pulls the
    DB-persisted state so the user sees the final assistant output
    instead of spinning forever.

Co-authored-by: Isaac
Final stable state for durable execution. End-to-end UI-validated
scenarios that now work:

  - Multi-tool turn interrupted mid-sequence, durable resume inherits
    completed tool pairs + narrative (reordered) + synthetic output
    for the interrupted call, agent continues from where it left off.
  - Text-only mid-stream crash, partial-text reassembly + Claude
    prefill → continuation.
  - Cross-turn recall after crash-and-resume (stable thread via read-
    time checkpoint repair on LangGraph / session auto-repair on
    OpenAI).
  - Multi-tool on GPT-5 + openai-agents (single-response-per-turn).

Template fix here: process_agent_stream_events now disambiguates by
(a) item.type bucket for delta routing and (b) call_id bucket for
multiple open function_calls. The original single curr_item_id bucket
worked for GPT-5's strictly serial events but collided on Claude's
interleaved + parallel tool-call events, which produced two items
sharing one id and broke the client's part tracking.

Pairs with databricks-ai-bridge PR #416 changes (rotate + replay +
full-history sanitizer + prior-attempt tool-pair inheritance +
narrative hoist + checkpoint read-time repair + session auto-repair).

Co-authored-by: Isaac
End-to-end UI test on Claude (via deployed agent-openai-advanced with
the updated databricks-ai-bridge) confirmed that the bridge-side
ordering fix (sanitizer + narrative hoist + tool-pair inheritance +
session auto-repair) is sufficient on its own. The two template-side
guards added in earlier commits are no longer needed:

- Revert 0ddbd60: `process_agent_stream_events` per-type + per-call-id
  id tracking. The single-bucket implementation handles Claude's
  interleaved + parallel tool-call events correctly now that the
  upstream ordering is clean.
- Revert 5f3c507: `chat.tsx` user-message-only resume-counter reset.
  Claude now emits a clean `finish` UIMessageChunk through the durable
  retrieve path, so the per-chunk reset no longer traps the 3-retry
  cap in an infinite loop.

Keeps the advanced templates lean — durability logic lives entirely in
databricks-ai-bridge (LongRunningAgentServer).

Co-authored-by: Isaac
Extract three pure helpers above the route handler so the SSE frame
loop reads like prose:

- parseSseFrame(frame): classifies a frame as done / passthrough / data.
- extractResponseId(payload): tolerates FastAPI's three response_id
  locations (response_id, response.id, top-level id with resp_ prefix).
- isTerminalErrorFrame(payload): detects task_failed / task_timeout so
  the resume loop can short-circuit.

pumpStream now just drives the reader + forwards bytes; the parsing
logic is testable in isolation and the handler body is substantially
shorter.
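
Simplified versions of the three helpers, as a sketch (frame and payload shapes are assumptions based on this description, not the exact upstream code):

```typescript
// parseSseFrame: classify a raw SSE frame as done / passthrough / data.
type FrameKind =
  | { kind: "done" }
  | { kind: "passthrough" }
  | { kind: "data"; payload: any };

function parseSseFrame(frame: string): FrameKind {
  const data = frame
    .split("\n")
    .filter((l) => l.startsWith("data:"))
    .map((l) => l.slice(5).trim())
    .join("\n");
  if (!data) return { kind: "passthrough" }; // comments, event-only frames
  if (data === "[DONE]") return { kind: "done" };
  try {
    return { kind: "data", payload: JSON.parse(data) };
  } catch {
    return { kind: "passthrough" }; // non-JSON data: forward untouched
  }
}

// extractResponseId: tolerate the three response_id locations named above.
function extractResponseId(payload: any): string | undefined {
  if (typeof payload?.response_id === "string") return payload.response_id;
  if (typeof payload?.response?.id === "string") return payload.response.id;
  if (typeof payload?.id === "string" && payload.id.startsWith("resp_")) return payload.id;
  return undefined;
}

// isTerminalErrorFrame: task_failed / task_timeout short-circuit the resume loop.
function isTerminalErrorFrame(payload: any): boolean {
  const code = payload?.error?.code ?? payload?.code;
  return code === "task_failed" || code === "task_timeout";
}
```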

Co-authored-by: Isaac
Both advanced templates were setting these env vars to hard-coded
localhost URLs that match the bundled-process topology (Node on 3000,
FastAPI on 8000). The values are fixed by the templates themselves —
a customer deploying the advanced stack can't change them without
breaking the bundle. Making them required in yaml adds noise without
adding configurability.

Push the defaults into the chatbot:

- New ``getApiProxyUrl()`` helper in ``packages/ai-sdk-providers/src/
  api-proxy.ts`` resolves the effective proxy URL:
    1. explicit ``API_PROXY`` wins,
    2. ``DATABRICKS_SERVING_ENDPOINT`` set → direct-endpoint mode, no
       proxy,
    3. otherwise → ``http://localhost:${CHAT_APP_PORT|PORT|3000}/invocations``
      (advanced-template convention).
  Used from ``providers-server.ts`` and ``request-context.ts`` so both
  agree on proxy activation.

- ``server/src/index.ts`` defaults ``AGENT_BACKEND_URL`` to
  ``http://localhost:8000/invocations`` when unset. Explicit empty
  string still disables the ``/invocations`` proxy route.

- Drop the ``API_PROXY`` / ``AGENT_BACKEND_URL`` block (and its comment)
  from both advanced templates' ``app.yaml`` and ``databricks.yml``.

Preserves direct-serving-endpoint CUJs: when
``DATABRICKS_SERVING_ENDPOINT`` is set (basic chatbot deployments), the
AI SDK talks straight to the endpoint and never hits ``/invocations``.

Co-authored-by: Isaac
Prior cleanup commit dropped ``API_PROXY=http://localhost:8000/invocations``
from the advanced templates' ``app.yaml`` and ``databricks.yml``. That
line pre-existed on ``main``; the PR never meant to remove it. Scope of
the previous change was only the *newly-added* ``API_PROXY`` +
``AGENT_BACKEND_URL`` block that activated the Node proxy path.

Restore the four files to exactly match ``main``. The chatbot-side
``getApiProxyUrl()`` default only fires when ``API_PROXY`` is unset, so
users with main's explicit setting keep their existing behavior.

Co-authored-by: Isaac
Both helpers answer routing-decision questions for the provider layer
(proxy URL + context-injection gate), and the separate file wasn't
buying isolation — providers-server.ts already imports from
request-context.ts. One file, same logic.

Co-authored-by: Isaac
