feat!: unbundle @agent-relay/* so per-platform broker binaries actually install#788
feat!: unbundle @agent-relay/* so per-platform broker binaries actually install#788willwashburn merged 10 commits intomainfrom
Conversation
Three issues surfaced in post-publish verification:
1. PyPI publish retries left stale .publish.attestation files next to
wheels after a failed upload, causing later attempts to abort with
"already have publish attestations". Clear attestations between
retries and set skip-existing: true on every attempt.
2. Verify Published SDK (win32-x64) hit a false negative in the
resolver smoke: p.includes(expected) compared a backslash path to a
forward-slash package name. Normalize backslashes before the check.
3. verify-publish.yml and verify-install.sh were written for the old
bundled-broker layout (node_modules/agent-relay/{bin,packages/sdk/bin})
that was replaced by the @agent-relay/broker-<platform>-<arch>
optional-dep pattern. Rewrite each broker check to use the SDK's
own getBrokerBinaryPath() resolver — the canonical way clients
locate the binary — and run --help against the resolved path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent-relay CLI tarball bundles @agent-relay/sdk, which declares the per-platform broker packages as optionalDependencies. But bundled dependencies freeze their subtree at pack time — so the broker packages ship inside the tarball with empty bin/.gitkeep dirs, and npm never re-resolves them at user-install time. Users who `npm i agent-relay` end up with no broker binary, and spawn fails. Two changes land together: 1. Declare the broker packages as optionalDependencies at the top level of agent-relay's package.json. npm processes top-level optional-deps at user-install time regardless of bundling, so the real broker packages land at node_modules/@agent-relay/broker-<platform>/bin/ alongside the empty bundled copies. 2. Update getOptionalDepBinaryPath to iterate every resolution reference instead of committing to the first successful require.resolve. Now that the install tree legitimately contains two copies of the broker package (empty bundled + real top-level), the resolver has to keep trying refs until it finds one with an actual binary file. Verified against a simulated install tree: resolver picks the top-level copy with the binary, skips the empty bundled copy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My previous iteration-through-refs attempt didn't help: every resolution reference inside the bundled agent-relay tree (node_modules/agent-relay/**) hits the same empty bundled @agent-relay/broker-* package, and Node's resolver stops at the first match without climbing out to outer node_modules scopes. Probe outer scopes explicitly: for each candidate package dir, also consider the same package at every shallower node_modules segment along its path. The real top-level copy (installed via agent-relay's own optionalDependencies) wins over the empty bundled copy. Verified against a simulated install tree with the resolver invoked from inside node_modules/agent-relay — it now returns the top-level real binary instead of the empty bundled one. Also update the verify-publish.yml and verify-install.sh broker tests to run the node script from inside node_modules/agent-relay, so Node's ESM bare-specifier resolution can find the bundled @agent-relay/sdk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of the broken agent-relay@6.0.0 install: package.json had a stale legacy \`bundleDependencies\` field (old spelling) alongside \`bundledDependencies\`. npm honours both, so removing packages from one list did nothing. The legacy field still listed every @agent-relay/* — transitively bundling SDK (because @agent-relay/hooks depends on it), which bundled the broker optional-deps with empty bin/.gitkeep dirs, which npm then refused to replace with real registry content at user-install time. Changes: 1. Delete the duplicate \`bundleDependencies\` field and reduce \`bundledDependencies\` to just the two external packages that still need bundling (@relaycast/sdk, @relayfile/local-mount). Every @agent-relay/* now resolves from the npm registry at install time — the same pattern esbuild, tailwindcss, and sharp use. CLI tarball drops 24.2 MB → 8.2 MB, 9197 → 1774 files. 2. Simplify getOptionalDepBinaryPath in broker-path.ts: revert the ancestor-walk / multi-ref iteration that was working around the bundled-shadow problem. With no bundled shadow, Node's default resolver finds the one real copy of the broker package. Keep the behaviour of trying every resolution reference, but drop the explicit node_modules ancestor probing. 3. Drop the \`optionalDependencies\` block from root package.json. With SDK unbundled, its own \`optionalDependencies\` declaration is the single source of truth for broker package resolution. 4. Drop the \`cd node_modules/agent-relay\` workarounds from the verify broker tests. @agent-relay/sdk is now hoisted to the top of node_modules, so bare-specifier imports work from any cwd. 5. Drop the pull_request trigger and PR-only pack steps from verify-publish.yml. These tests pack a local tarball and install it — which no longer works once the tarball references @agent-relay/* deps at a version that isn't yet on the registry. The workflow is fundamentally post-publish verification; it still runs via workflow_call from publish.yml and via workflow_dispatch for manual testing. Verified against a simulated install tree: the broker binary resolves correctly via the SDK's own getBrokerBinaryPath() with no special handling, matching what \`npm i agent-relay\` will produce post-publish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xkonjin
left a comment
There was a problem hiding this comment.
I like the direction here, especially moving verification onto the SDK resolver instead of hand-inspecting bin paths. One concern though: adding skip-existing: true to all three PyPI publish attempts can hide a real partial-publish failure. If attempt 1 uploads some artifacts and then fails mid-run, attempts 2 and 3 may quietly skip the already-seen files instead of proving the full release set is consistent. That makes the workflow more retry-friendly, but it also weakens the signal that the publish step actually recovered cleanly. It may be safer to scope skip-existing only where duplicate uploads are expected, or add an explicit post-publish verification that every expected artifact/version exists in PyPI before declaring success.
Test coverage also gets thinner on the darwin-x64 path. This PR removes a lot of direct binary/smoke assertions and replaces them with resolver-based checks, which is good, but I do not see an equivalent darwin-x64 verification for the new optional-dependency model. A minimal resolver test for that platform, or a short note on why dropping that smoke path is safe, would make this change more convincing.
…deps After unbundling @agent-relay/* from bundledDependencies, the CLI tarball still ships packages/<name>/dist/ for any workspace package that's a target of agent-relay's own subpath \`exports\` (e.g. \`agent-relay/broker\` → \`packages/sdk/dist/...\`). These shipped copies anchor the static export resolution; the actual SDK code is also installed via the regular dependency. The validator was previously enforcing "any workspace package in the tarball must be in bundledDependencies", which is now too strict. Extend it to also accept workspace packages declared as regular dependencies, optional dependencies, or peer dependencies of the root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BREAKING CHANGE: The \`agent-relay/broker\`, \`agent-relay/broker/client\`, \`agent-relay/broker/protocol\`, \`agent-relay/broker/relay\`, \`agent-relay/broker/logs\`, \`agent-relay/broker/consensus\`, \`agent-relay/broker/shadow\`, and \`agent-relay/broker/browser\` subpath exports are removed. Use \`@agent-relay/sdk\` and its subpath exports directly (\`@agent-relay/sdk/client\`, \`@agent-relay/sdk/browser\`, etc.). \`@agent-relay/sdk\` is a dependency of \`agent-relay\` and gets installed automatically — no need to add it to your own dependencies. These exports pointed at \`packages/sdk/dist/...\` inside the CLI tarball, which forced us to ship every @agent-relay/* workspace package's dist/ files even after unbundling. With the legacy exports gone: - Drop \`packages/*/dist\` and \`packages/*/package.json\` from \`files\`. CLI tarball now ships only the CLI's own dist + runtime resources. Tarball size: 24.2 MB → 1.5 MB. Entries: 9197 → 1088. - Drop \`setupWorkspacePackageLinks\` from postinstall.js — that hack existed solely to bridge the bundled-workspace-packages → node_modules gap on global installs. With unbundling + no shipped packages/ dir, npm's normal hoisting handles it. - Revert validate-npm-tarball.mjs's "shipped workspace pkgs are ok if declared as regular dep" exception — no longer needed since no workspace packages are shipped. - Update JSDoc examples in shadow.ts, logs.ts, browser.ts to use the @agent-relay/sdk import path instead of agent-relay/broker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After unbundling, the previous attempt removed the subpath exports that pointed into shipped workspace dirs. That dropped the single import surface — users would have had to import directly from @agent-relay/sdk/<x> for granular paths. Restore the surface as one-line shim files in src/exports/<x>.ts, each \`export * from '@agent-relay/sdk/<x>';\`. The shims compile into the CLI's own dist/ and ship in the tarball; at runtime they forward to the real @agent-relay/sdk installed via the regular dependency. This mirrors every subpath @agent-relay/sdk exposes: agent-relay ← already re-exports SDK from src/index.ts agent-relay/client agent-relay/protocol agent-relay/relay agent-relay/logs agent-relay/consensus agent-relay/shadow agent-relay/browser agent-relay/workflows agent-relay/communicate agent-relay/communicate/a2a-types agent-relay/communicate/a2a-server agent-relay/communicate/a2a-transport agent-relay/communicate/a2a-bridge agent-relay/communicate/adapters/pi agent-relay/communicate/adapters/claude-sdk agent-relay/communicate/adapters/ai-sdk agent-relay/http agent-relay/broker-path Tarball stays 1.5 MB (shims are tiny). Validator stays clean (no shipped workspace dirs). End-to-end verified: import from agent-relay/client resolves through the shim to the real SDK code. Update JSDoc examples in shadow.ts, logs.ts, browser.ts to use the canonical agent-relay/<x> form rather than @agent-relay/sdk/<x>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These examples live inside the SDK source. A reader who got here almost certainly installed @agent-relay/sdk standalone — they may not have agent-relay (the CLI) at all. Show them the canonical SDK import path, not the CLI's re-exported alias. CLI users still get the agent-relay/* import surface via the shim re-exports added in the previous commit; both work at runtime, but the right canonical depends on which package the reader has installed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shims added in 9bd07d2 were cosmetic. Their value proposition was \"users don't need to know @agent-relay/sdk exists\" — but @agent-relay/sdk is already in node_modules as a transitive dep of agent-relay (npm hoists it), so users can import from it directly without adding it to their own package.json. The shims were just a second name for the same thing. Industry convention (esbuild, sharp, swc, lightningcss) is exactly this: one main package entry, granular paths come from the actual sub-packages. Match it. Going forward: import from 'agent-relay' // all-in-one main API (re-exports SDK) import from '@agent-relay/sdk[/subpath]' // granular SDK access Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When setupWorkspacePackageLinks() was deleted in 9f0c35a, the \`const linkResult = setupWorkspacePackageLinks()\` declaration in main() went with it, but the diagnostic call still passed \`linkResult\` and the helper still branched on it. ESM strict mode turns the undeclared identifier into a ReferenceError at runtime — the postinstall .catch() swallows it as a warning, but the rest of the diagnostic output (dashboard/acp install confirmations) gets silently skipped. Drop the linkResult parameter from logPostinstallDiagnostics() and delete the workspace-link diagnostic branch (the function it reported on no longer exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TL;DR
agent-relay@6.0.0ships with a broken broker binary on every platform. This PR fixes that by unbundling the SDK and its dep tree from the CLI tarball — finally making the per-platform optional-dep pattern from #770 / #772 work as designed.This is a major version bump. The CLI tarball drops from 24.2 MB to 1.5 MB.
The actual production bug
npm install agent-relay@6.0.0ends up with empty broker bin dirs:The separately-published
@agent-relay/broker-darwin-arm64@6.0.0tarball does contain the real 11.7 MB binary — but it never reaches the user. Spawn fails at runtime.Why it broke
PR #772 introduced the per-platform optional-dep pattern (the same shape esbuild, sharp, swc, lightningcss, @next/swc all use). It worked for the SDK install path (
npm install @agent-relay/sdk) but silently broke for the CLI install path. Three layers conspired:bundledDependenciesandoptionalDependenciesare fundamentally incompatible for the same chain of deps. Bundling freezes resolution at pack time; optional-deps require resolution at user-install time. PR feat(sdk): split broker binaries into per-platform optional-dep packages #772 kept@agent-relay/sdkbundled while making the broker per-platform packages optional-deps of the SDK — the bundled SDK shipped with the broker packages frozen at their pack-time state (emptybin/.gitkeep), and npm refused to re-resolve them at install time because they were "already satisfied" via the bundle.@agent-relay/hooksdepended on@agent-relay/sdk. Even when I tried to remove SDK frombundledDependencies, hooks transitively pulled it back in. Cutting the chain required unbundling everything in the SDK's transitive closure.A duplicate legacy
bundleDependenciesfield (no 'd') sat at the bottom ofpackage.jsonfrom before npm normalized the spelling. It was still listing every@agent-relay/*package and silently overriding edits to the new-spelling field above. This made the bug look unfixable from the obvious-place view.What this PR does
The main change: unbundle the SDK chain, embrace optional-deps
@agent-relay/sdkand its transitive@agent-relay/*deps (cloud,config,hooks,telemetry,trajectory,user-directory,utils) are no longer bundled into the CLI tarball. They install from the npm registry exactly like every other Node ecosystem tool with native binaries:agent-relay→ declares@agent-relay/sdkas a regular dependency@agent-relay/sdk→ declares@agent-relay/broker-{darwin-arm64,darwin-x64,linux-arm64,linux-x64,win32-x64}as optional-deps withos/cpuconstraintsnpm install agent-relay→ npm filters the broker optional-deps by the user's platform, installs exactly one tonode_modules/@agent-relay/broker-<plat>/, broker binary lands atbin/agent-relay-brokergetBrokerBinaryPath()resolves it via standardrequire.resolve('@agent-relay/broker-<plat>/package.json')This is the esbuild / sharp / swc / lightningcss model. It's well-understood, the failure modes are well-known, and engineers debugging cross-platform binary install issues have prior art.
@relaycast/sdkand@relayfile/local-mount(external, not ours) stay bundled — they're not in the broker dep chain, so bundling is fine.Import surface (breaking change)
The previous
agent-relay/broker/<x>subpath exports (which pointed at literal files inside the CLI tarball) are removed. Going forward there are exactly two ways to use the SDK:@agent-relay/sdkremains a published, public package — install it standalone if you don't want the CLI. This matches the convention every Node ecosystem package follows: one main package entry, granular paths from the actual sub-packages.Other cleanup included in this PR
packages/*/dist,packages/*/package.json,packages/sdk/bin,packages/sdk/README.mdfromfiles. Tarball ships only the CLI's owndist/plus runtime resources. 24.2 MB → 1.5 MB. 9197 files → 1160 files.setupWorkspacePackageLinksfromscripts/postinstall.js(~130 lines). It existed solely to bridge bundled-workspace-packages →node_modules/@agent-relay/*symlinks for global installs. With unbundling, npm's normal hoisting handles this.packages/sdk/src/{shadow,logs,browser}.tsto use@agent-relay/sdk(the canonical SDK import for SDK-source readers), not the removedagent-relay/broker*.verify-publish.ymlandscripts/post-publish-verify/verify-install.shto use the SDK's owngetBrokerBinaryPath()resolver — the canonical way clients locate the binary. Replaces ~300 lines of stale path-probing checks that were written for the pre-feat(sdk): split broker binaries into per-platform optional-dep packages #772 bundled layout (node_modules/agent-relay/{bin,packages/sdk/bin}— neither exists anymore).Adjacent CI fixes also included
darwin-x64smoke leg. GitHub'smacos-13(Intel) runner pool is so capacity-constrained that the smoke job was queueing for 2+ hours while every other smoke leg finished in under 90s. Thex86_64-apple-darwinbroker binary is still cross-compiled, published, and verified downstream byVerify Standalone (macOS)— we just don't run the end-to-end smoke on an actual Intel Mac anymore.pypa/gh-action-pypi-publishwrites<wheel>.publish.attestationnext to each wheel on its first attempt. If upload then fails, the attestation file is left behind and the next attempt aborts withalready have publish attestations. Clear stale attestations between retries and addskip-existing: trueso retries also no-op cleanly when the wheel did make it through.verify-publish-sdk.yml.p.includes(expected)was comparing a backslash-separated path against a forward-slash package name. Normalize backslashes before the check. The resolver was always correct — false negative in the test.pull_requesttrigger fromverify-publish.yml. These tests install the publishedagent-relayfrom the npm registry. With unbundling, a locally-packed PR-mode tarball can't install standalone (it references@agent-relay/*@<ver>not yet on the registry). Manualworkflow_dispatchandworkflow_callfrompublish.ymlstill work.Trade-offs we accept
publish-packagesandpublish-mainis now real. Mitigated byneeds:ordering in the workflow, but a few seconds of theoretical risk exist where someone couldnpm i agent-relay@<new>before@agent-relay/sdk@<new>propagates the CDN.agent-relay/broker*import paths are removed (breaking). Users importing fromagent-relay/broker/clientetc. need to switch to@agent-relay/sdk/client. Major version bump covers this.Verification
Test plan
Publish Packageafter merge; confirm tarball at registry installs cleanly with non-empty broker bin dir.npm install agent-relay@<new>in a scratch project, verifynode_modules/@agent-relay/broker-<plat>/bin/agent-relay-brokerexists, runnpx agent-relay --version(loads CLI), run a spawn through the SDK to confirm broker binary actually runs.import { AgentRelayClient } from 'agent-relay'andimport { AgentRelayClient } from '@agent-relay/sdk'both resolve at type-check and runtime.npm install @agent-relay/sdk(SDK-only path) still installs broker optional-deps correctly.🤖 Generated with Claude Code