
fix(vtex): read MESH_REQUEST_CONTEXT per-request, not from cached factory env #431

Merged
viktormarinho merged 2 commits into main from viktormarinho/vtex-tool-env-fix
May 6, 2026

Conversation

Contributor

@viktormarinho viktormarinho commented May 6, 2026

Summary

Switch every vtex tool's execute to read MESH_REQUEST_CONTEXT from runtimeContext.env (per-request, filled by the runtime's AsyncLocalStorage) instead of from the env captured in the factory closure. Affects:

  • server/lib/tool-adapter.ts: createToolFromOperation (used by all ~705 generated registry tools)
  • server/tools/custom/reorder-collection.ts: VTEX_REORDER_COLLECTION
  • server/tools/custom/search-collections.ts: VTEX_SEARCH_COLLECTIONS
  • server/tools/custom/update-product-specifications.ts: VTEX_UPDATE_PRODUCT_SPECIFICATIONS
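
The change has the same shape at each of these call sites. The following is a minimal sketch of that shape, not the actual diff: `MeshRequestContext`, `Env`, `RuntimeContext`, `Tool`, and `createExampleTool` are hypothetical, simplified stand-ins, and the real @decocms/runtime types and tool signatures may differ.

```typescript
// Hypothetical, simplified shapes; the real @decocms/runtime types may differ.
interface MeshRequestContext {
  token?: string;
  state: Record<string, string>;
}
interface Env {
  MESH_REQUEST_CONTEXT?: MeshRequestContext;
}
interface RuntimeContext {
  env: Env;
}
interface Tool {
  id: string;
  execute(input: unknown, runtimeContext: RuntimeContext): string;
}

// The factory still receives an env, but execute no longer reads from it.
function createExampleTool(_factoryEnv: Env): Tool {
  return {
    id: "EXAMPLE_TOOL",
    execute(_input, runtimeContext) {
      // Per-request read: resolved on every call, never frozen at registration.
      const ctx = runtimeContext.env.MESH_REQUEST_CONTEXT;
      const accountName = ctx?.state.accountName;
      if (!accountName) throw new Error("VTEX accountName is missing");
      return accountName;
    },
  };
}

// Registered with an empty env, then called with a populated one.
const exampleTool = createExampleTool({});
const exampleResult = exampleTool.execute(
  {},
  { env: { MESH_REQUEST_CONTEXT: { token: "t", state: { accountName: "lojafarm" } } } },
);
console.log(exampleResult); // "lojafarm"
```

Because the factory argument is unused inside execute, whatever env happened to be present at registration time cannot influence later calls.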

Why this is the actual root cause

Studio is doing the right thing: I confirmed by introspecting the farmrio connection (COLLECTION_CONNECTIONS_GET conn_9uNQJVyv_ECiRcDZl290Y) that configuration_state is { accountName: "lojafarm", appKey, appToken } (fully populated), and that buildRequestHeaders mints a JWT with that state and forwards it as x-mesh-token to https://sites-vtex.decocache.com/mcp.

The bug was in vtex. @decocms/runtime caches tool registrations after the first request (tools.ts: let cached: Registrations | null) and creates a fresh bindings env per request (index.ts: env: { ...process.env, ...env }). Tools that capture env in the factory closure see only the FIRST request's env — every subsequent call uses that frozen snapshot. The runtime's own comment at tools.ts:821 explicitly says: "Tool execution reads per-request context from State (AsyncLocalStorage), so reusing definitions is safe." — implying tools must read from runtimeContext.env, not the captured factory env.

Symptom: when the pod's first request happened to carry a state-bearing JWT, every subsequent call appeared to work. When the first request was an unauthenticated tools/list (typical right after a Knative scale-up), every subsequent call saw state: {} and failed with "VTEX accountName is missing" — the exact intermittent behavior reported ("worked for a little bit then back to error").
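
The timing dependence can be reproduced in isolation. The sketch below is a toy model under stated assumptions: `makePod` imitates the cached-registrations behavior (`let cached: Registrations | null`) with a single cached tool, and `createCapturingTool` imitates the pre-fix closure capture. Neither is the runtime's real code, and the `appKey`/`appToken` values are placeholders.

```typescript
// Hypothetical stand-ins for the runtime's registration cache and env handling.
type State = Record<string, string>;
interface Env {
  MESH_REQUEST_CONTEXT: { state: State };
}
interface Tool {
  execute(): string[];
}

// Pre-fix behavior: execute reads the env captured by the factory closure.
const createCapturingTool = (env: Env): Tool => ({
  execute: () => Object.keys(env.MESH_REQUEST_CONTEXT.state),
});

// Toy pod: the factory runs once, but a fresh env is built for every request.
function makePod(factory: (env: Env) => Tool) {
  let cached: Tool | null = null;
  return function handle(requestState: State): string[] {
    const env: Env = { MESH_REQUEST_CONTEXT: { state: requestState } };
    // Only the first request's env ever reaches the factory.
    const tool = cached ?? (cached = factory(env));
    return tool.execute();
  };
}

const vtexState: State = { accountName: "lojafarm", appKey: "k", appToken: "t" };

const luckyPod = makePod(createCapturingTool);
luckyPod(vtexState); // first request carries state
const luckyKeys = luckyPod({}); // later calls reuse the stale snapshot

const unluckyPod = makePod(createCapturingTool);
unluckyPod({}); // first request is an unauthenticated tools/list
const unluckyKeys = unluckyPod(vtexState); // the real state is ignored

console.log(luckyKeys); // ["accountName", "appKey", "appToken"]
console.log(unluckyKeys); // []
```

Both pods run identical code; only the order of requests differs, which matches the "worked for a little bit then back to error" report.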

Verified by direct port-forward to the pod with a fake x-mesh-token containing state — the captured-env code returned hasToken: false regardless of what the request actually carried.
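
For a check like this, a fake token only needs a payload that carries state. The helpers below are a hedged sketch: `fakeMeshToken` and `stateKeysOf` are hypothetical names, the token is unsigned, and the assumption that state sits under a top-level `state` claim comes from the description above rather than from the runtime's source. A runtime that verifies signatures would reject such a token.

```typescript
// Base64url helpers built on the global btoa/atob (available in Bun and Node 16+).
const toBase64Url = (text: string): string =>
  btoa(text).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

const fromBase64Url = (text: string): string => {
  const b64 = text.replace(/-/g, "+").replace(/_/g, "/");
  return atob(b64.padEnd(Math.ceil(b64.length / 4) * 4, "="));
};

// Builds an unsigned, test-only token; a real x-mesh-token is minted by studio.
function fakeMeshToken(state: Record<string, string>): string {
  const header = toBase64Url(JSON.stringify({ alg: "none", typ: "JWT" }));
  const payload = toBase64Url(JSON.stringify({ state }));
  return `${header}.${payload}.`;
}

// Reads the payload without verifying it, for diagnostics only.
function stateKeysOf(token: string): string[] {
  const payload = JSON.parse(fromBase64Url(token.split(".")[1] ?? ""));
  return Object.keys(payload.state ?? {});
}

const fakeToken = fakeMeshToken({ accountName: "lojafarm", appKey: "k", appToken: "t" });
console.log(stateKeysOf(fakeToken)); // ["accountName", "appKey", "appToken"]
```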

Test plan

  • bun run check — clean
  • bun test — 77 pass, 0 fail (mock runtimeContext.env propagates correctly through all paths)
  • bun run build — bundles cleanly
  • After deploy, confirm the diagnostic log in client-factory.ts flips from stateKeys: [] to stateKeys: ["accountName", "appKey", "appToken"] and VTEX_GET_COLLECTION_PRODUCTS returns real data
  • Verify across pod scale events — the bug was timing-dependent, so the fix should be insensitive to which request hits a fresh pod first

Once verified, follow-up PR can strip the diagnostic logging added in #423.

Note about #430

The runtime bump in #430 (1.3.1 → ^1.6.2) didn't fix anything by itself — both runtime versions had the cached-registrations behavior — but it doesn't hurt and cleans up some unrelated divergence from the recent rollback, so I'm leaving it merged.

🤖 Generated with Claude Code


Summary by cubic

Fix VTEX tools to read MESH_REQUEST_CONTEXT from per-request runtimeContext.env instead of the cached factory env. This removes state leakage across requests and fixes intermittent “VTEX accountName is missing” errors after cold starts or unauthenticated first calls.

  • Bug Fixes

    • Read MESH_REQUEST_CONTEXT from runtimeContext.env in createToolFromOperation and custom tools: VTEX_REORDER_COLLECTION, VTEX_SEARCH_COLLECTIONS, VTEX_UPDATE_PRODUCT_SPECIFICATIONS.
    • Avoids using a cached env snapshot from tool registration; tools now receive the correct state on every call.
  • Dependencies

    • Bump @decocms/runtime to ^1.6.2 and @decocms/bindings to ^1.4.0.
    • Add @modelcontextprotocol/sdk to dependencies at ^1.27.1 (was in devDependencies).

Written for commit c05b085. Summary will update on new commits.

viktormarinho and others added 2 commits May 6, 2026 17:35
The kubernetes-bun rollback in #429 dropped @decocms/runtime from ^1.6.2
back to 1.3.1. With 1.3.1, requests reach the pod with a populated
MESH_REQUEST_CONTEXT envelope (token/connectionId/meshUrl all set) but
state arrives as an empty object — so state.accountName is null and
every tool call fails with "VTEX accountName is missing".

Confirmed in the deployed pod logs:
  hasMeshContext: true, hasToken: true, hasConnectionId: true,
  hasMeshUrl: true, stateKeys: [], stateAccountNamePresent: false

The Workers latency that prompted the revert was startup-CPU-budget
specific to Cloudflare Workers, not a Bun problem, so this only bumps
the runtime/bindings/sdk versions and keeps the kubernetes-bun deploy
and serve()-style entrypoint intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… closure

Every tool was capturing `env` in its factory closure and reading
`env.MESH_REQUEST_CONTEXT` from inside execute. The @decocms/runtime
caches tool registrations after the first request (see tools.ts:
`let cached: Registrations | null`) and creates a fresh `bindings` env
per request — so the captured env is the FIRST request's snapshot,
frozen for the lifetime of the pod.

When a pod's first request happened to carry an `x-mesh-token` with
populated state, every subsequent call reused that captured state
(seemingly worked). When the first request was an unauthenticated
`tools/list` (e.g. just after a Knative scale-up), every later call
saw `state: {}` and failed with "VTEX accountName is missing" — even
though studio was correctly forwarding the JWT with the connection's
configuration_state. Verified end-to-end: studio's `buildRequestHeaders`
mints a JWT containing `state: { accountName, appKey, appToken }` for
this connection, the JWT reaches the pod, but the cached tool closure
ignores it.

The runtime expects `execute` to read per-request env from
`runtimeContext.env` (filled from AsyncLocalStorage on every call) — see
the comment in @decocms/runtime tools.ts:821 ("Tool *execution* reads
per-request context from State (AsyncLocalStorage), so reusing
definitions is safe"). Switch all four execute paths
(createToolFromOperation + the three custom tools) to read from
`runtimeContext.env` and discard the captured factory env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
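
The second commit message relies on AsyncLocalStorage keeping each request's env separate even when a single tool definition is shared. A small sketch of that property, using Node's `node:async_hooks` module (also available in Bun): `requestStore`, `sharedTool`, and `handleRequest` are hypothetical names, and how @decocms/runtime actually fills `runtimeContext.env` is assumed rather than copied from its source.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface Env {
  MESH_REQUEST_CONTEXT: { state: { accountName?: string } };
}

// Hypothetical per-request store standing in for the runtime's State.
const requestStore = new AsyncLocalStorage<Env>();

// One tool definition, created once and reused by every request.
const sharedTool = {
  execute(runtimeContext: { env: Env }): string {
    return runtimeContext.env.MESH_REQUEST_CONTEXT.state.accountName ?? "(missing)";
  },
};

// Each request runs inside its own store; runtimeContext is built at call time.
function handleRequest(state: { accountName?: string }): Promise<string> {
  const env: Env = { MESH_REQUEST_CONTEXT: { state } };
  return requestStore.run(env, async () => {
    // Yield so that concurrent requests interleave before the tool runs.
    await new Promise((resolve) => setTimeout(resolve, 5));
    const current = requestStore.getStore();
    if (!current) throw new Error("no request context");
    return sharedTool.execute({ env: current });
  });
}

const concurrentResults = Promise.all([
  handleRequest({}),
  handleRequest({ accountName: "lojafarm" }),
]);
concurrentResults.then((results) => console.log(results)); // ["(missing)", "lojafarm"]
```

The store lookup happens after the await, so each request still sees its own env even though both share `sharedTool` and overlap in time.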
@viktormarinho viktormarinho merged commit ded0ce1 into main May 6, 2026
2 checks passed