Skip to content

feat!: unbundle @agent-relay/* so per-platform broker binaries actually install#788

Merged
willwashburn merged 10 commits intomainfrom
ci/fix-post-publish-verify
Apr 25, 2026
Merged

feat!: unbundle @agent-relay/* so per-platform broker binaries actually install#788
willwashburn merged 10 commits intomainfrom
ci/fix-post-publish-verify

Conversation

@willwashburn
Copy link
Copy Markdown
Member

@willwashburn willwashburn commented Apr 24, 2026

TL;DR

agent-relay@6.0.0 ships with a broken broker binary on every platform. This PR fixes that by unbundling the SDK and its dep tree from the CLI tarball — finally making the per-platform optional-dep pattern from #770 / #772 work as designed.

This is a major version bump. The CLI tarball drops from 24.2 MB to 1.5 MB.


The actual production bug

npm install agent-relay@6.0.0 ends up with empty broker bin dirs:

node_modules/agent-relay/node_modules/@agent-relay/broker-darwin-arm64/bin/
  -rw-r--r--  .gitkeep      ← 0 bytes, no binary

The separately-published @agent-relay/broker-darwin-arm64@6.0.0 tarball does contain the real 11.7 MB binary — but it never reaches the user. Spawn fails at runtime.

Why it broke

PR #772 introduced the per-platform optional-dep pattern (the same shape esbuild, sharp, swc, lightningcss, @next/swc all use). It worked for the SDK install path (npm install @agent-relay/sdk) but silently broke for the CLI install path. Three layers conspired:

  1. bundledDependencies and optionalDependencies are fundamentally incompatible for the same chain of deps. Bundling freezes resolution at pack time; optional-deps require resolution at user-install time. PR feat(sdk): split broker binaries into per-platform optional-dep packages #772 kept @agent-relay/sdk bundled while making the broker per-platform packages optional-deps of the SDK — the bundled SDK shipped with the broker packages frozen at their pack-time state (empty bin/.gitkeep), and npm refused to re-resolve them at install time because they were "already satisfied" via the bundle.

  2. @agent-relay/hooks depended on @agent-relay/sdk. Even when I tried to remove SDK from bundledDependencies, hooks transitively pulled it back in. Cutting the chain required unbundling everything in the SDK's transitive closure.

  3. A duplicate legacy bundleDependencies field (no 'd') sat at the bottom of package.json from before npm normalized the spelling. It was still listing every @agent-relay/* package and silently overriding edits to the new-spelling field above. This made the bug look unfixable from the obvious-place view.

What this PR does

The main change: unbundle the SDK chain, embrace optional-deps

@agent-relay/sdk and its transitive @agent-relay/* deps (cloud, config, hooks, telemetry, trajectory, user-directory, utils) are no longer bundled into the CLI tarball. They install from the npm registry exactly like every other Node ecosystem tool with native binaries:

  • agent-relay → declares @agent-relay/sdk as a regular dependency
  • @agent-relay/sdk → declares @agent-relay/broker-{darwin-arm64,darwin-x64,linux-arm64,linux-x64,win32-x64} as optional-deps with os/cpu constraints
  • npm install agent-relay → npm filters the broker optional-deps by the user's platform, installs exactly one to node_modules/@agent-relay/broker-<plat>/, broker binary lands at bin/agent-relay-broker
  • SDK's getBrokerBinaryPath() resolves it via standard require.resolve('@agent-relay/broker-<plat>/package.json')

This is the esbuild / sharp / swc / lightningcss model. It's well-understood, the failure modes are well-known, and engineers debugging cross-platform binary install issues have prior art.

@relaycast/sdk and @relayfile/local-mount (external, not ours) stay bundled — they're not in the broker dep chain, so bundling is fine.

Import surface (breaking change)

The previous agent-relay/broker/<x> subpath exports (which pointed at literal files inside the CLI tarball) are removed. Going forward there are exactly two ways to use the SDK:

// Main entry — already exported by agent-relay's `src/index.ts`
import { AgentRelayClient } from 'agent-relay';

// Granular — @agent-relay/sdk is hoisted in node_modules as a
// transitive dep of agent-relay, so direct imports work without
// adding it to your own package.json
import { AgentRelayClient } from '@agent-relay/sdk';
import { AgentRelayClient } from '@agent-relay/sdk/client';
import { type BrokerEvent } from '@agent-relay/sdk/browser';

@agent-relay/sdk remains a published, public package — install it standalone if you don't want the CLI. This matches the convention every Node ecosystem package follows: one main package entry, granular paths from the actual sub-packages.

Other cleanup included in this PR

  • Drop packages/*/dist, packages/*/package.json, packages/sdk/bin, packages/sdk/README.md from files. Tarball ships only the CLI's own dist/ plus runtime resources. 24.2 MB → 1.5 MB. 9197 files → 1160 files.
  • Delete setupWorkspacePackageLinks from scripts/postinstall.js (~130 lines). It existed solely to bridge bundled-workspace-packages → node_modules/@agent-relay/* symlinks for global installs. With unbundling, npm's normal hoisting handles this.
  • Update JSDoc examples in packages/sdk/src/{shadow,logs,browser}.ts to use @agent-relay/sdk (the canonical SDK import for SDK-source readers), not the removed agent-relay/broker*.
  • Update verify-publish.yml and scripts/post-publish-verify/verify-install.sh to use the SDK's own getBrokerBinaryPath() resolver — the canonical way clients locate the binary. Replaces ~300 lines of stale path-probing checks that were written for the pre-feat(sdk): split broker binaries into per-platform optional-dep packages #772 bundled layout (node_modules/agent-relay/{bin,packages/sdk/bin} — neither exists anymore).

Adjacent CI fixes also included

  • Drop the darwin-x64 smoke leg. GitHub's macos-13 (Intel) runner pool is so capacity-constrained that the smoke job was queueing for 2+ hours while every other smoke leg finished in under 90s. The x86_64-apple-darwin broker binary is still cross-compiled, published, and verified downstream by Verify Standalone (macOS) — we just don't run the end-to-end smoke on an actual Intel Mac anymore.
  • Fix PyPI publish retry logic. pypa/gh-action-pypi-publish writes <wheel>.publish.attestation next to each wheel on its first attempt. If upload then fails, the attestation file is left behind and the next attempt aborts with already have publish attestations. Clear stale attestations between retries and add skip-existing: true so retries also no-op cleanly when the wheel did make it through.
  • Fix Windows path check in verify-publish-sdk.yml. p.includes(expected) was comparing a backslash-separated path against a forward-slash package name. Normalize backslashes before the check. The resolver was always correct — false negative in the test.
  • Drop the pull_request trigger from verify-publish.yml. These tests install the published agent-relay from the npm registry. With unbundling, a locally-packed PR-mode tarball can't install standalone (it references @agent-relay/*@<ver> not yet on the registry). Manual workflow_dispatch and workflow_call from publish.yml still work.

Trade-offs we accept

  • First install needs network for transitive deps. No more single-tarball offline install.
  • Race window between publish-packages and publish-main is now real. Mitigated by needs: ordering in the workflow, but a few seconds of theoretical risk exist where someone could npm i agent-relay@<new> before @agent-relay/sdk@<new> propagates the CDN.
  • agent-relay/broker* import paths are removed (breaking). Users importing from agent-relay/broker/client etc. need to switch to @agent-relay/sdk/client. Major version bump covers this.

Verification

  • Local pack of this branch: 1.5 MB tarball, 1160 entries, 0 validator violations.
  • Simulated install layout: broker binary found via SDK resolver with no special-casing.
  • Architecture matches industry pattern (esbuild, sharp, swc, lightningcss, @next/swc, parcel — all use unbundled per-platform optional-deps).

Test plan

  • Re-dispatch Publish Package after merge; confirm tarball at registry installs cleanly with non-empty broker bin dir.
  • On each of macOS arm64, macOS x64, Linux x64, Linux arm64, Windows x64: npm install agent-relay@<new> in a scratch project, verify node_modules/@agent-relay/broker-<plat>/bin/agent-relay-broker exists, run npx agent-relay --version (loads CLI), run a spawn through the SDK to confirm broker binary actually runs.
  • Verify import { AgentRelayClient } from 'agent-relay' and import { AgentRelayClient } from '@agent-relay/sdk' both resolve at type-check and runtime.
  • Verify npm install @agent-relay/sdk (SDK-only path) still installs broker optional-deps correctly.
  • Watch the Smoke Broker Packages and Verify Standalone macOS jobs in the next publish run; both should pass.

🤖 Generated with Claude Code

Three issues surfaced in post-publish verification:

1. PyPI publish retries left stale .publish.attestation files next to
   wheels after a failed upload, causing later attempts to abort with
   "already have publish attestations". Clear attestations between
   retries and set skip-existing: true on every attempt.

2. Verify Published SDK (win32-x64) hit a false negative in the
   resolver smoke: p.includes(expected) compared a backslash path to a
   forward-slash package name. Normalize backslashes before the check.

3. verify-publish.yml and verify-install.sh were written for the old
   bundled-broker layout (node_modules/agent-relay/{bin,packages/sdk/bin})
   that was replaced by the @agent-relay/broker-<platform>-<arch>
   optional-dep pattern. Rewrite each broker check to use the SDK's
   own getBrokerBinaryPath() resolver — the canonical way clients
   locate the binary — and run --help against the resolved path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

willwashburn and others added 3 commits April 24, 2026 17:03
The agent-relay CLI tarball bundles @agent-relay/sdk, which declares
the per-platform broker packages as optionalDependencies. But bundled
dependencies freeze their subtree at pack time — so the broker packages
ship inside the tarball with empty bin/.gitkeep dirs, and npm never
re-resolves them at user-install time. Users who `npm i agent-relay`
end up with no broker binary, and spawn fails.

Two changes land together:

1. Declare the broker packages as optionalDependencies at the top level
   of agent-relay's package.json. npm processes top-level optional-deps
   at user-install time regardless of bundling, so the real broker
   packages land at node_modules/@agent-relay/broker-<platform>/bin/
   alongside the empty bundled copies.

2. Update getOptionalDepBinaryPath to iterate every resolution
   reference instead of committing to the first successful
   require.resolve. Now that the install tree legitimately contains
   two copies of the broker package (empty bundled + real top-level),
   the resolver has to keep trying refs until it finds one with an
   actual binary file.

Verified against a simulated install tree: resolver picks the
top-level copy with the binary, skips the empty bundled copy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My previous iteration-through-refs attempt didn't help: every
resolution reference inside the bundled agent-relay tree
(node_modules/agent-relay/**) hits the same empty bundled
@agent-relay/broker-* package, and Node's resolver stops at the
first match without climbing out to outer node_modules scopes.

Probe outer scopes explicitly: for each candidate package dir,
also consider the same package at every shallower node_modules
segment along its path. The real top-level copy (installed via
agent-relay's own optionalDependencies) wins over the empty
bundled copy.

Verified against a simulated install tree with the resolver
invoked from inside node_modules/agent-relay — it now returns
the top-level real binary instead of the empty bundled one.

Also update the verify-publish.yml and verify-install.sh broker
tests to run the node script from inside node_modules/agent-relay,
so Node's ESM bare-specifier resolution can find the bundled
@agent-relay/sdk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of the broken agent-relay@6.0.0 install: package.json had
a stale legacy \`bundleDependencies\` field (old spelling) alongside
\`bundledDependencies\`. npm honours both, so removing packages from
one list did nothing. The legacy field still listed every
@agent-relay/* — transitively bundling SDK (because @agent-relay/hooks
depends on it), which bundled the broker optional-deps with empty
bin/.gitkeep dirs, which npm then refused to replace with real
registry content at user-install time.

Changes:

1. Delete the duplicate \`bundleDependencies\` field and reduce
   \`bundledDependencies\` to just the two external packages that still
   need bundling (@relaycast/sdk, @relayfile/local-mount). Every
   @agent-relay/* now resolves from the npm registry at install time
   — the same pattern esbuild, tailwindcss, and sharp use.
   CLI tarball drops 24.2 MB → 8.2 MB, 9197 → 1774 files.

2. Simplify getOptionalDepBinaryPath in broker-path.ts: revert the
   ancestor-walk / multi-ref iteration that was working around the
   bundled-shadow problem. With no bundled shadow, Node's default
   resolver finds the one real copy of the broker package. Keep the
   behaviour of trying every resolution reference, but drop the
   explicit node_modules ancestor probing.

3. Drop the \`optionalDependencies\` block from root package.json.
   With SDK unbundled, its own \`optionalDependencies\` declaration
   is the single source of truth for broker package resolution.

4. Drop the \`cd node_modules/agent-relay\` workarounds from the
   verify broker tests. @agent-relay/sdk is now hoisted to the top
   of node_modules, so bare-specifier imports work from any cwd.

5. Drop the pull_request trigger and PR-only pack steps from
   verify-publish.yml. These tests pack a local tarball and install
   it — which no longer works once the tarball references
   @agent-relay/* deps at a version that isn't yet on the registry.
   The workflow is fundamentally post-publish verification; it still
   runs via workflow_call from publish.yml and via workflow_dispatch
   for manual testing.

Verified against a simulated install tree: the broker binary resolves
correctly via the SDK's own getBrokerBinaryPath() with no special
handling, matching what \`npm i agent-relay\` will produce post-publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@xkonjin xkonjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction here, especially moving verification onto the SDK resolver instead of hand-inspecting bin paths. One concern though: adding skip-existing: true to all three PyPI publish attempts can hide a real partial-publish failure. If attempt 1 uploads some artifacts and then fails mid-run, attempts 2 and 3 may quietly skip the already-seen files instead of proving the full release set is consistent. That makes the workflow more retry-friendly, but it also weakens the signal that the publish step actually recovered cleanly. It may be safer to scope skip-existing only where duplicate uploads are expected, or add an explicit post-publish verification that every expected artifact/version exists in PyPI before declaring success.

Test coverage also gets thinner on the darwin-x64 path. This PR removes a lot of direct binary/smoke assertions and replaces them with resolver-based checks, which is good, but I do not see an equivalent darwin-x64 verification for the new optional-dependency model. A minimal resolver test for that platform, or a short note on why dropping that smoke path is safe, would make this change more convincing.

willwashburn and others added 2 commits April 24, 2026 18:35
…deps

After unbundling @agent-relay/* from bundledDependencies, the CLI
tarball still ships packages/<name>/dist/ for any workspace package
that's a target of agent-relay's own subpath \`exports\`
(e.g. \`agent-relay/broker\` → \`packages/sdk/dist/...\`). These
shipped copies anchor the static export resolution; the actual SDK
code is also installed via the regular dependency.

The validator was previously enforcing "any workspace package in the
tarball must be in bundledDependencies", which is now too strict.
Extend it to also accept workspace packages declared as regular
dependencies, optional dependencies, or peer dependencies of the root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BREAKING CHANGE: The \`agent-relay/broker\`, \`agent-relay/broker/client\`,
\`agent-relay/broker/protocol\`, \`agent-relay/broker/relay\`,
\`agent-relay/broker/logs\`, \`agent-relay/broker/consensus\`,
\`agent-relay/broker/shadow\`, and \`agent-relay/broker/browser\` subpath
exports are removed. Use \`@agent-relay/sdk\` and its subpath exports
directly (\`@agent-relay/sdk/client\`, \`@agent-relay/sdk/browser\`, etc.).
\`@agent-relay/sdk\` is a dependency of \`agent-relay\` and gets installed
automatically — no need to add it to your own dependencies.

These exports pointed at \`packages/sdk/dist/...\` inside the CLI tarball,
which forced us to ship every @agent-relay/* workspace package's dist/
files even after unbundling. With the legacy exports gone:

- Drop \`packages/*/dist\` and \`packages/*/package.json\` from \`files\`.
  CLI tarball now ships only the CLI's own dist + runtime resources.
  Tarball size: 24.2 MB → 1.5 MB. Entries: 9197 → 1088.
- Drop \`setupWorkspacePackageLinks\` from postinstall.js — that hack
  existed solely to bridge the bundled-workspace-packages → node_modules
  gap on global installs. With unbundling + no shipped packages/ dir,
  npm's normal hoisting handles it.
- Revert validate-npm-tarball.mjs's "shipped workspace pkgs are ok if
  declared as regular dep" exception — no longer needed since no
  workspace packages are shipped.
- Update JSDoc examples in shadow.ts, logs.ts, browser.ts to use the
  @agent-relay/sdk import path instead of agent-relay/broker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

willwashburn and others added 2 commits April 24, 2026 19:04
After unbundling, the previous attempt removed the subpath exports
that pointed into shipped workspace dirs. That dropped the single
import surface — users would have had to import directly from
@agent-relay/sdk/<x> for granular paths.

Restore the surface as one-line shim files in src/exports/<x>.ts,
each \`export * from '@agent-relay/sdk/<x>';\`. The shims compile
into the CLI's own dist/ and ship in the tarball; at runtime they
forward to the real @agent-relay/sdk installed via the regular
dependency. This mirrors every subpath @agent-relay/sdk exposes:

  agent-relay              ← already re-exports SDK from src/index.ts
  agent-relay/client
  agent-relay/protocol
  agent-relay/relay
  agent-relay/logs
  agent-relay/consensus
  agent-relay/shadow
  agent-relay/browser
  agent-relay/workflows
  agent-relay/communicate
  agent-relay/communicate/a2a-types
  agent-relay/communicate/a2a-server
  agent-relay/communicate/a2a-transport
  agent-relay/communicate/a2a-bridge
  agent-relay/communicate/adapters/pi
  agent-relay/communicate/adapters/claude-sdk
  agent-relay/communicate/adapters/ai-sdk
  agent-relay/http
  agent-relay/broker-path

Tarball stays 1.5 MB (shims are tiny). Validator stays clean (no
shipped workspace dirs). End-to-end verified: import from
agent-relay/client resolves through the shim to the real SDK code.

Update JSDoc examples in shadow.ts, logs.ts, browser.ts to use the
canonical agent-relay/<x> form rather than @agent-relay/sdk/<x>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These examples live inside the SDK source. A reader who got here
almost certainly installed @agent-relay/sdk standalone — they may
not have agent-relay (the CLI) at all. Show them the canonical
SDK import path, not the CLI's re-exported alias.

CLI users still get the agent-relay/* import surface via the shim
re-exports added in the previous commit; both work at runtime,
but the right canonical depends on which package the reader has
installed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@willwashburn willwashburn changed the title ci: fix post-publish verification and drop darwin-x64 smoke feat!: unbundle @agent-relay/* so per-platform broker binaries actually install Apr 25, 2026
willwashburn and others added 2 commits April 24, 2026 20:19
The shims added in 9bd07d2 were cosmetic. Their value proposition
was \"users don't need to know @agent-relay/sdk exists\" — but
@agent-relay/sdk is already in node_modules as a transitive dep of
agent-relay (npm hoists it), so users can import from it directly
without adding it to their own package.json. The shims were just a
second name for the same thing.

Industry convention (esbuild, sharp, swc, lightningcss) is exactly
this: one main package entry, granular paths come from the actual
sub-packages. Match it.

Going forward:

  import from 'agent-relay'                           // all-in-one main API (re-exports SDK)
  import from '@agent-relay/sdk[/subpath]'            // granular SDK access

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When setupWorkspacePackageLinks() was deleted in 9f0c35a, the
\`const linkResult = setupWorkspacePackageLinks()\` declaration in
main() went with it, but the diagnostic call still passed
\`linkResult\` and the helper still branched on it. ESM strict mode
turns the undeclared identifier into a ReferenceError at runtime —
the postinstall .catch() swallows it as a warning, but the rest of
the diagnostic output (dashboard/acp install confirmations) gets
silently skipped.

Drop the linkResult parameter from logPostinstallDiagnostics() and
delete the workspace-link diagnostic branch (the function it
reported on no longer exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@willwashburn willwashburn merged commit 21eefb8 into main Apr 25, 2026
40 checks passed
@willwashburn willwashburn deleted the ci/fix-post-publish-verify branch April 25, 2026 00:48
@willwashburn willwashburn restored the ci/fix-post-publish-verify branch April 25, 2026 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants