feat(agents): prod-readiness wiring — Modal sandbox, Redis-only bus, approvals, gateway#61452
Conversation
…approvals, gateway Several changes across the agent platform services to make the dev deploy actually work end-to-end: - Sandbox: implement Modal-backed `ModalSandboxPool` (per-session sandbox via the official `modal` npm SDK; tools + dispatch.js written to /sandbox + /workdir; per-invoke exec). Drop `in-process` from the selector and fail-fast at boot if `SANDBOX_BACKEND` isn't `modal` or `docker`. The `InProcessSandboxPool` class stays for the test harness; just no longer reachable from prod entry points. - Strip dead `egressProxyUrl` / `EgressPolicy` plumbing — egress filtering lives at the sandbox boundary + cluster-level smokescreen now. - Bus: drop `MemorySessionEventBus` from the runner + ingress boot paths. Both throw at boot if `REDIS_URL` is unset; harness still wires the in-memory bus directly. - Approvals: wire `PgApprovalStore` into both the runner Worker and the janitor (server + sweep), matching the test harness. - web-fetch: drop the unenforced-allowlist warning + dead spec field. Outbound host filtering lives at smokescreen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Size Change: 0 B Total Size: 81.1 MB ℹ️ View Unchanged
|
Adds a real Modal e2e test for the ModalSandboxPool (opt-in by env, same shape as real-inference.test.ts): - Skips unless both MODAL_TOKEN_ID + MODAL_TOKEN_SECRET are present (in process env or repo-root .env). - Provisions an actual Modal sandbox under a per-test app, lays out a trivial echo tool + dispatcher, exercises happy path + tool-not-loaded + action-not-found, asserts response shapes, terminates the sandbox. - Loads SSL_CERT_FILE from /etc/ssl/cert.pem on darwin so the Modal gRPC handshake works (same workaround as real-inference.test.ts). Caught a runtime bug while writing this: - Modal's `exec()` rejects `timeoutMs: 0` with `timeoutMs must be positive` even though the type def claims "default 0 (no timeout)". The previous impl was passing `req.timeoutMs ?? 0` unconditionally, which would have broken every production tool call without an explicit timeout. Fix: only pass `timeoutMs` to exec when the caller supplied a positive value. Runtime: ~7s end-to-end including Modal sandbox boot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🤖 Comment from a Claude Code session, not Ben directly. Follow-up: Ben asked whether I actually validated Modal end-to-end. I hadn't — only verified credentials worked and the bundle compiles. Pushed f7c8d26 which:
The test ran green against Ben's Modal account in ~7s (provisions a sandbox, writes dispatch.js + a trivial |
…hans
Closes the loop the previous Modal commit left open: the
`agent_sandbox_instance` row already existed, but `provider_sandbox_id`
was stuffed with the runner's session UUID — useless for out-of-process
termination. If the runner pod crashes mid-session, the Modal sandbox
runs (and bills) until its own `timeoutMs` cap.
What lands here:
- `Sandbox` interface now requires `providerSandboxId`. Modal returns
`handle.sandboxId` (the real `sb-...`); Docker returns the container
hash; InProcess falls back to sessionId (no separate handle exists).
- Worker writes the real id into `agent_sandbox_instance.provider_sandbox_id`
on `markReady` so out-of-process callers can use it.
- New `SandboxTerminator` abstraction in agent-shared, plus
`MultiBackendSandboxTerminator` that routes by `SandboxKind`:
- `modal` → lazy-imports `modal`, calls
`client.sandboxes.fromId(id).terminate()`.
Idempotent — 404 / "not found" responses count as ok.
- `in-process` → no-op (sandbox died with the runner that owned it).
- `docker` → no-op (janitor pod can't reach the runner's
docker socket; local-dev terminators can wire
their own).
- New sweep policy in agent-janitor:
- `findStale(sandboxStaleMs)` → list `(provisioning|ready|terminating)`
rows older than 10m (default).
- Per row: call `terminator.terminate(kind, providerSandboxId)`.
On success → `markTerminated`. On failure → leave the row alone
so the next sweep tick retries.
- SweepResult gains `reaped_sandboxes` + `sandbox_reap_failures`.
- `SANDBOX_STALE_MS` env var (default 10m = 2× the stuck-running
threshold so a healthy session re-queue + resume cycle doesn't race
the reaper).
- Janitor `index.ts` wires `PgSandboxInstanceStore` +
`MultiBackendSandboxTerminator(createModalSandboxTerminator())` into
the sweep deps. MODAL_TOKEN_ID/SECRET inherits from the existing
shared `secret_env` block; no chart change needed.
Coverage:
- 5 new unit tests in `agent-janitor/src/sweep.test.ts` exercising
reap-success, fresh-row no-op, terminator-failure retry path,
missing-dep no-op, custom threshold.
- New real-Modal test case in
`agent-shared/src/sandbox/sandbox-modal.test.ts`:
- Asserts `providerSandboxId` matches Modal's `sb-...` shape.
- Spawns a **fresh** terminator (no shared state with the original
pool), terminates by id only — proves the row's
provider_sandbox_id alone is enough to kill the compute
out-of-process.
- Confirms idempotency (second terminate of the same id → ok) and
that the sandbox reports dead afterwards.
- Existing e2e janitor fixtures updated for the new SweepResult fields.
All 158 e2e cases + 116 janitor unit tests pass. Real-Modal tests both
green (~7s + ~5s) against Ben's account.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vitest's Vite layer searches upward for a postcss.config.js and hits the repo-root one (Tailwind). In CI's filtered install (`pnpm --filter '@posthog/agent-X...' install --frozen-lockfile`), `@tailwindcss/postcss` isn't a transitive dep of any agent service and the test run blows up at load with `Cannot find module '@tailwindcss/postcss'`. Backend services don't process CSS at all, so set `css.postcss.plugins: []` per workspace to short-circuit the lookup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts the Modal sandbox impl from "v0 stub-replacement" to "production
shape" and converges Docker + Modal on the same per-session sandbox-host
image. Picks up the patterns from products/tasks/backend/services/modal_sandbox.py
that we'd otherwise rediscover the hard way.
Modal sandbox hardening (sandbox-modal.ts):
- Region pinning: MODAL_REGION env wins; otherwise derived from
CLOUD_DEPLOYMENT (US→us-east, EU→eu-west, else→us-east). Matches the
Tasks side so EU-tenant agents stay in EU.
- Named sandboxes: `name: agent-<sessionId-prefix>` so the Modal
dashboard is browsable.
- CPU + memory caps: passed as reservation+limit pairs (Modal rejects
one without the other), wired from new `spec.limits.max_memory_mb`
(default 512 MiB) and `spec.limits.max_cpu_cores` (default 0.25).
- Provision diagnostics: `verbose: true` + a try/catch around
`sandboxes.create` that logs image/region/cpu/mem with the error so a
wedged image pull or scheduling failure shows up in our logs
immediately instead of as a silent timeout.
- SANDBOX_HOST_IMAGE env supports a shared image reference; selector
resolves backend-specific override → SANDBOX_HOST_IMAGE → backend
default. Default stays `node:24-alpine` so dev keeps working; chart
switches to the canonical image once the first CI build lands.
Canonical sandbox-host image (services/agent-sandbox-host/):
- Dockerfile + README rewritten to make the image's role explicit:
baked `/sandbox/dispatch.js` + `/sandbox/host.js` consumed by BOTH
pools. No CMD — Modal idles by default; the Docker pool overrides
with `node /sandbox/host.js` for the alive-marker handshake.
- New `scripts/smoke-test-image.sh` containerized smoke test. One
fresh container + workdir per scenario (avoids a macOS bind-mount
cache race when re-using a single container across sequential
dispatches). Asserts happy / bad-action / missing-tool. Hooked into
the CI build step so a broken image fails before GHCR push.
- CI workflow adds posthog-agent-sandbox-host to both the build and
deploy matrices; deploy dispatches a state.yaml row the chart
consumes as `SANDBOX_HOST_IMAGE`.
Sandbox e2e parity (sandbox-{modal,docker}.test.ts):
- New `sandbox-docker.test.ts` mirrors the Modal e2e: provisions a real
container from the canonical image, lays out a tool, dispatches,
asserts response + providerSandboxId shape, releases. Opt-in by
`docker info` + `docker image inspect` so CI without a daemon
silently skips.
- Modal reaper test gets a poll-with-deadline on the post-terminate
`isAlive()` check — Modal's `poll()` is eventually consistent (1-2s)
and the test was occasionally racing the state update.
Spec (spec.ts):
- New `max_memory_mb` (default 512) and `max_cpu_cores` (default 0.25)
on `SpecLimitsSchema`. Authors get sensible defaults; tools that
load large model artifacts or are compute-bound can bump. Wired into
Worker.acquireForSession's SandboxLimits.
All 158 e2e cases + 116 janitor unit tests + Modal e2e (×2) + Docker
e2e + containerized smoke test all green against the local build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups after the first CI run: - **agent-shared / agent-janitor / agent-tools unit tests need SeaweedFS.** Their memory.test.ts (and tools/memory.test.ts) hit a real S3MemoryStore by design — see services/agent-shared/CLAUDE.md. Bring up the `seaweedfs` compose service in the unit-tests matrix via `bin/ci-wait-for-docker` and point `AGENT_MEMORY_TEST_S3_ENDPOINT` at it. Uniform across the matrix so we don't need per-workspace conditionals. - **agent-migrations needed a vitest.config.ts.** Without one, vitest walks up to the repo-root postcss.config.js and tries to require `@tailwindcss/postcss` (same issue the other workspaces had — but agent-migrations had no config at all so the earlier fix didn't apply). Adds the same `css.postcss.plugins: []` stub + `passWithNoTests` so the script can stay `vitest run`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`@posthog/quill` ships `main: ./dist/index.cjs`, so tsc can't resolve
`import { … } from '@posthog/quill'` until quill is built. The build
job and Dockerfile already do this; the typecheck job was missing it,
producing TS2307 across every console source file that imports quill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…heck stub CI run for the prior commit failed on the new sandbox-host image build: Error: failed to push 795637471508.dkr.ecr.us-east-1.amazonaws.com/posthog-agent-sandbox-host: unexpected status from POST: 404 Not Found The ECR repo for posthog-agent-sandbox-host doesn't exist, and there's no reason to create it — this image is consumed via GHCR by Modal and by the chart-side SANDBOX_HOST_IMAGE env (both reach GHCR fine). The chart doesn't deploy this image as a service so it doesn't need the ECR-based image pipeline. - `.github/actions/docker-meta/action.yml`: new `push-to-ecr` input (default `true`, no behavior change for existing callers). When `false`: skips configure-aws-credentials + amazon-ecr-login, omits the ECR tag from the images list. Aws-role-to-assume becomes optional in that case. - `.github/workflows/ci-agent-container.yml`: matrix gets a `push_to_ecr` per image; sandbox-host sets `'false'`. The digest + smoke-test steps fall back to the GHCR ref when ECR is skipped so the pulled image still resolves. - `services/agent-sandbox-host/package.json`: add a `typescript:check` script. The package is pure CJS (host.js + dispatch.js), so no real TS; node --check on the three .js files is the equivalent syntax check and lets the matrix's `typescript:check` step succeed for this workspace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-mount Last CI build pushed the sandbox-host image to GHCR successfully (so the push-to-ecr toggle works), but the smoke test failed: smoke: FAIL — host.alive never appeared The image's CMD runs `node /sandbox/host.js` as the in-container `sandbox` user, which writes `/workdir/host.alive`. On Linux the bind-mounted host dir carries the runner's UID/GID; the in-container non-root user can't write to it and exits silently. macOS Docker hides this with VirtioFS UID translation, which is why the test passed locally. Fix: chmod the per-scenario tmpdir to world-writable before bind-mount. The test owns the dir end to end so the looser perms are fine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The posthog-agent-sandbox-host image is now public on GHCR (verified — Modal pulls anonymously). Switching the Modal pool to use it by default and dropping the runtime dispatcher fallback now that the canonical image bakes /sandbox/dispatch.js. sandbox-modal.ts: - DEFAULT_IMAGE_TAG → ghcr.io/posthog/posthog-agent-sandbox-host:master (mutable tag for dev; prod chart pins a digest). - Drop the 80-line DISPATCH_JS string constant + the per-acquire writeText that handled the vanilla-node fallback. The image is now the contract; pointing at a base without /sandbox/dispatch.js will surface ENOENT at first invoke, by design. sandbox-modal.test.ts: - Both e2e tests honour SANDBOX_HOST_IMAGE env override so an in-flight PR can be validated against its `:pr-<num>` tag before `:master` exists. - Validated against ghcr.io/posthog/posthog-agent-sandbox-host:pr-61452 — acquire/dispatch + reaper both green in ~11s. posthog-agents smoke test fix (.github/scripts/smoke-test-agent-bundle.sh): - The branch's earlier change made REDIS_URL required at boot. The CI smoke test wasn't passing it, so the ingress bundle fatal-exited on the missing var before reaching the dial-out stage the smoke test recognizes — flagged as "no recognisable boot output". Adds REDIS_URL + SANDBOX_BACKEND + MODAL_TOKEN_* + AGENT_USE_AI_GATEWAY + POSTHOG_AI_GATEWAY_URL + POSTHOG_API_BASE_URL to the env (all unreachable 127.0.0.1:1 values; the bundle proceeds past config gates and fails on ECONNREFUSED, which the smoke test already recognizes). All four entrypoints (ingress/runner/janitor/migrate) pass locally against the published ghcr.io/posthog/posthog-agents:pr-61452 image. services/agents/scripts/smoke-local.sh (new): - One-command "build + smoke all four entrypoints" wrapper around the CI smoke script. SKIP_BUILD=1 reuses an existing tag; IMAGE=... smokes a specific reference (handy for validating a published PR image without a local rebuild). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
services/agent-shared/src/sandbox/sandbox-modal.ts:194-211
Stale rejected promise permanently breaks the pool — if the first `acquireForSession` call fails due to a transient Modal auth error, rate limit, or network blip, `this.clientPromise` is set to a rejected `Promise`. Because the guard is `if (!this.clientPromise)`, every subsequent call skips re-initialization and immediately rejects with the same original error. The pool stays broken until the runner pod restarts. Clearing `clientPromise` on rejection lets the next acquire retry the handshake.
```suggestion
private async getClient(): Promise<{ client: ModalClientType; app: App; image: Image }> {
if (!this.clientPromise) {
this.clientPromise = (async () => {
// Dynamic import — keeps `modal` (gRPC + protobuf, heavy) off
// the load path for tests / packages that never construct a
// ModalSandboxPool. The selector imports this module
// unconditionally; the SDK is paid for only when chosen.
const { ModalClient } = await import('modal')
const client = new ModalClient()
const app = await client.apps.fromName(this.opts.appName ?? DEFAULT_APP_NAME, {
createIfMissing: true,
})
const image = client.images.fromRegistry(this.opts.image ?? DEFAULT_IMAGE_TAG)
return { client, app, image }
})().catch((err) => {
// Clear so the next acquire can retry rather than
// permanently failing with the same transient error.
this.clientPromise = null
throw err
})
}
return this.clientPromise
}
```
### Issue 2 of 2
services/agent-shared/src/sandbox/sandbox-modal.ts:118
The JSDoc says `ap-...` but `ap-` is Modal's App ID prefix, not the Sandbox ID prefix. Modal Sandbox IDs start with `sb-`, which is what `client.sandboxes.fromId()` actually expects — the e2e test at line 625 of `sandbox-modal.test.ts` confirms this with `expect(sandbox.providerSandboxId).toMatch(/^sb-/)`. The wrong prefix in the comment will mislead anyone debugging the janitor's reaper or checking the `provider_sandbox_id` column in the DB.
```suggestion
/** Modal's `sb-...` sandbox id — what `client.sandboxes.fromId()` consumes. */
```
Reviews (1): Last reviewed commit: "feat(agents): switch Modal pool to canon..." | Re-trigger Greptile |
| private async getClient(): Promise<{ client: ModalClientType; app: App; image: Image }> { | ||
| if (!this.clientPromise) { | ||
| this.clientPromise = (async () => { | ||
| // Dynamic import — keeps `modal` (gRPC + protobuf, heavy) off | ||
| // the load path for tests / packages that never construct a | ||
| // ModalSandboxPool. The selector imports this module | ||
| // unconditionally; the SDK is paid for only when chosen. | ||
| const { ModalClient } = await import('modal') | ||
| const client = new ModalClient() | ||
| const app = await client.apps.fromName(this.opts.appName ?? DEFAULT_APP_NAME, { | ||
| createIfMissing: true, | ||
| }) | ||
| const image = client.images.fromRegistry(this.opts.image ?? DEFAULT_IMAGE_TAG) | ||
| return { client, app, image } | ||
| })() | ||
| } | ||
| return this.clientPromise | ||
| } |
There was a problem hiding this comment.
Stale rejected promise permanently breaks the pool — if the first
acquireForSession call fails due to a transient Modal auth error, rate limit, or network blip, this.clientPromise is set to a rejected Promise. Because the guard is if (!this.clientPromise), every subsequent call skips re-initialization and immediately rejects with the same original error. The pool stays broken until the runner pod restarts. Clearing clientPromise on rejection lets the next acquire retry the handshake.
| private async getClient(): Promise<{ client: ModalClientType; app: App; image: Image }> { | |
| if (!this.clientPromise) { | |
| this.clientPromise = (async () => { | |
| // Dynamic import — keeps `modal` (gRPC + protobuf, heavy) off | |
| // the load path for tests / packages that never construct a | |
| // ModalSandboxPool. The selector imports this module | |
| // unconditionally; the SDK is paid for only when chosen. | |
| const { ModalClient } = await import('modal') | |
| const client = new ModalClient() | |
| const app = await client.apps.fromName(this.opts.appName ?? DEFAULT_APP_NAME, { | |
| createIfMissing: true, | |
| }) | |
| const image = client.images.fromRegistry(this.opts.image ?? DEFAULT_IMAGE_TAG) | |
| return { client, app, image } | |
| })() | |
| } | |
| return this.clientPromise | |
| } | |
| private async getClient(): Promise<{ client: ModalClientType; app: App; image: Image }> { | |
| if (!this.clientPromise) { | |
| this.clientPromise = (async () => { | |
| // Dynamic import — keeps `modal` (gRPC + protobuf, heavy) off | |
| // the load path for tests / packages that never construct a | |
| // ModalSandboxPool. The selector imports this module | |
| // unconditionally; the SDK is paid for only when chosen. | |
| const { ModalClient } = await import('modal') | |
| const client = new ModalClient() | |
| const app = await client.apps.fromName(this.opts.appName ?? DEFAULT_APP_NAME, { | |
| createIfMissing: true, | |
| }) | |
| const image = client.images.fromRegistry(this.opts.image ?? DEFAULT_IMAGE_TAG) | |
| return { client, app, image } | |
| })().catch((err) => { | |
| // Clear so the next acquire can retry rather than | |
| // permanently failing with the same transient error. | |
| this.clientPromise = null | |
| throw err | |
| }) | |
| } | |
| return this.clientPromise | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: services/agent-shared/src/sandbox/sandbox-modal.ts
Line: 194-211
Comment:
Stale rejected promise permanently breaks the pool — if the first `acquireForSession` call fails due to a transient Modal auth error, rate limit, or network blip, `this.clientPromise` is set to a rejected `Promise`. Because the guard is `if (!this.clientPromise)`, every subsequent call skips re-initialization and immediately rejects with the same original error. The pool stays broken until the runner pod restarts. Clearing `clientPromise` on rejection lets the next acquire retry the handshake.
```suggestion
private async getClient(): Promise<{ client: ModalClientType; app: App; image: Image }> {
if (!this.clientPromise) {
this.clientPromise = (async () => {
// Dynamic import — keeps `modal` (gRPC + protobuf, heavy) off
// the load path for tests / packages that never construct a
// ModalSandboxPool. The selector imports this module
// unconditionally; the SDK is paid for only when chosen.
const { ModalClient } = await import('modal')
const client = new ModalClient()
const app = await client.apps.fromName(this.opts.appName ?? DEFAULT_APP_NAME, {
createIfMissing: true,
})
const image = client.images.fromRegistry(this.opts.image ?? DEFAULT_IMAGE_TAG)
return { client, app, image }
})().catch((err) => {
// Clear so the next acquire can retry rather than
// permanently failing with the same transient error.
this.clientPromise = null
throw err
})
}
return this.clientPromise
}
```
How can I resolve this? If you propose a fix, please make it concise.…JSDoc Two issues Greptile flagged on the latest commit, plus the regenerated OpenAPI artifacts the spec.limits.max_memory_mb + max_cpu_cores additions from 964df72 should have produced (lint-staged regenerated them this commit). 1. (P1) Stale rejected promise wedged the pool until pod restart. If the FIRST `acquireForSession` call rejected on the lazy client/app/image init (transient Modal auth blip, rate limit, network), the rejected promise stayed cached in `this.clientPromise`. The `if (!this.clientPromise)` guard skipped re-initialization on every subsequent call, so the pool kept rethrowing the same stale error forever. Clearing the field on rejection lets the next acquire retry the handshake. 2. (P2) JSDoc on `providerSandboxId` getter said `ap-...`, which is Modal's App ID prefix — Sandbox IDs start with `sb-`. The test regex already asserts `/^sb-/`; only the comment was stale. 3. Regenerated `spec_schema.py` (Django), `api.zod.ts` (frontend), and `api.ts` (MCP) from `hogli build:openapi` — picks up the new `max_memory_mb` (default 512) + `max_cpu_cores` (default 0.25) fields in `SpecLimitsSchema`. The lint-staged hook produced these when the previous commit's spec change was reformatted but the regen step didn't fire then. Both Modal e2e tests still green against pr-61452 image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Query snapshots: Backend query snapshots updatedChanges: 2 snapshots (2 modified, 0 added, 0 deleted) What this means:
Next steps:
|
There was a problem hiding this comment.
Supply Chain Security Review
All 14 added packages published 7+ days ago with no new CVEs or MAL- findings. Workflow changes use SHA-pinned actions and look well-structured. The only notable concern is the unpinned base image in the new Dockerfile.
Tag @mendral-app with feedback or questions. View session
| # identical content) AND against a plain node base if the chart ever points | ||
| # at one. | ||
|
|
||
| FROM node:24-alpine |
There was a problem hiding this comment.
maintainability (P2): Base image node:24-alpine uses a floating tag. A compromised or unexpected upstream push could silently change the image contents between builds. Pin to a digest for reproducibility.
Suggested change
| FROM node:24-alpine | |
| FROM node:24-alpine@sha256:e2a4f5759776190b81bde04a498a462aad85810369e929ca8254f4c167b2bb8c |
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At services/agent-sandbox-host/Dockerfile, line 19:
<issue>
Base image `node:24-alpine` uses a floating tag. A compromised or unexpected upstream push could silently change the image contents between builds. Pin to a digest for reproducibility.
</issue>
… opts, selector
Regression + unit coverage for the things the prior commits left bare:
1. **Greptile P1 lock-in.** Mocks `ModalClient` to throw on first call,
asserts the second `acquireForSession` invokes the constructor again
instead of returning the cached rejected promise. Without the
catch-and-clear in `getClient()`, `expect(ctor).toHaveBeenCalledTimes(2)`
would fail with `1` — proves the fix is load-bearing.
2. **`resolveRegion` exported + tested.** Covers MODAL_REGION env wins,
CLOUD_DEPLOYMENT US/EU map correctly, unknown deployment falls back to
us-east, unset falls back to us-east, MODAL_REGION beats CLOUD_DEPLOYMENT
when both set. Was previously a module-private helper consumed only
through the e2e.
3. **`timeoutMs: 0` regression.** I hit this exact bug on the first
real-Modal run — Modal rejects `exec({ timeoutMs: 0 })` despite the
type def claiming "default 0 (no timeout)". Asserts the field is
*omitted* when undefined or zero, and passed through when positive.
Without this, a refactor that reverts to `timeoutMs: req.timeoutMs ?? 0`
would silently re-break.
4. **`selectSandboxPool` fail-fast + image fallback.** Throws clearly when
SANDBOX_BACKEND is unset or set to `in-process` (the intentionally-
removed case). SANDBOX_HOST_IMAGE applies to both pools as the shared
fallback. Backend-specific overrides (SANDBOX_MODAL_IMAGE /
SANDBOX_DOCKER_IMAGE) beat the shared one.
All tests use top-level imports + `vi.doMock` — the dynamic
`await import(...)` workaround in the first draft was unnecessary;
`vi.doMock` takes effect for the dynamic `await import('modal')` inside
`getClient()` at call time, not at the time `sandbox-modal` itself is
imported by the test file.
10 unit tests, ~250ms total. Full sandbox suite (28 tests including the
two real-Modal e2e + the real-Docker e2e) green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promotes a working rule into agent-tests/CLAUDE.md so it lives in the repo (and auto-loads when working anywhere under services/agent-*) instead of one engineer's personal config: - Bug fix → regression test (must fail when fix is reverted; mental- trace counts when reverting is annoying because of formatter rewrites or hooks). - New helper / pure function → unit test for the obvious axes. - New plumbing → wire test that asserts the value lands at the downstream consumer. - E2E doesn't substitute for unit tests — harness covers end-to-end feature, unit tests cover single-function contract. - Run the relevant file before declaring done. "Wrote a test, didn't run it" is how broken assertions ship. Plus the vitest mock trap call-out: top-level imports + `vi.doMock` is the right pattern, not `await import` inside `it` blocks — the mock registry is consulted when the SUT's dynamic import runs, not when the test file imports the SUT. (Worked example: services/agent-shared/src/sandbox/sandbox-modal-unit.test.ts.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Supply Chain Security Review
✅ Approve — 1 finding in 1 file
14 packages added, all published 7+ days ago with no new CVEs or MAL- findings. GitHub Actions remain pinned to commit SHAs. The modal npm package addition aligns with the stated sandbox-pool feature.
Tag @mendral-app with feedback or questions. View session
| @@ -1,23 +1,41 @@ | |||
| # Canonical sandbox-host image used by BOTH the Docker and Modal pools. | |||
There was a problem hiding this comment.
maintainability (P2): The node:24-alpine tag is mutable; now that this Dockerfile is built and pushed in CI, pin to a digest to ensure reproducible builds and prevent silent base-image substitution.
Suggested change
| # Canonical sandbox-host image used by BOTH the Docker and Modal pools. | |
| FROM node:24-alpine@sha256:b39b36df10420d52576a1b22b50b29aed1264ab576a89fae9388cd7a363dd8d2 |
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At services/agent-sandbox-host/Dockerfile, line 1:
<issue>
The `node:24-alpine` tag is mutable; now that this Dockerfile is built and pushed in CI, pin to a digest to ensure reproducible builds and prevent silent base-image substitution.
</issue>
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub. |
🔍 Migration Risk AnalysisWe've analyzed your migrations for potential risks. Summary: 2 Safe | 1 Needs Review | 0 Blocked
|
The repo-wide semgrep-general job excludes services/, so every services/agent-*/ (and services/agents/) silently dropped out of SAST and tripped the WF004-semgrep-services-coverage workflow lint check. Add them all to the semgrep-js target list. Bringing them under scan surfaced pre-existing benign findings — local dev/test redis:// defaults, a docstring URL, the Aurora-CA pg pool ssl setting, and an example seed script's urllib call. Suppress each with a documented `// nosemgrep` per the existing services/mcp convention rather than weakening the rules. Also pin the sandbox-host base image to node:24.13.0-alpine (matching services/agents/Dockerfile's NODE_VERSION) so it can't silently shift between builds, addressing the floating-tag review note. Generated-By: PostHog Code Task-Id: 3441c90a-dda9-4317-bf06-83ad39346d1e
|
Warning Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
|
| if (allowlist.length && !allowlist.includes(u.host)) { | ||
| throw new Error(`host not allowed: ${u.host}`) | ||
| } | ||
| const res = await fetch(args.url, { method: 'GET' }) |
There was a problem hiding this comment.
High: Server-side request forgery
@posthog/web-fetch runs as a native tool inside the runner process, so this direct fetch(args.url) lets an agent invoke arbitrary URLs from the runner's network position and return the response body. Route this through an explicit egress proxy/dispatcher or validate and block private, link-local, and cluster-local destinations here; Node's global fetch will not enforce that just because the tool description says infrastructure does.
PR overviewThis PR wires the agents system for production readiness, including Modal sandbox integration, a Redis-only bus, approval flows, and gateway-related plumbing. It also includes a native web-fetch tool used by agents to retrieve URL contents. There is one open security issue: the agent web-fetch tool can directly request arbitrary URLs from the runner’s network position and return the response body. This creates a meaningful server-side request forgery risk because internal, private, or cluster-local services may be reachable unless egress is proxied or destinations are explicitly blocked. No issues have been fixed yet, so the PR still needs a targeted mitigation before it is in a safe security posture. Open issues (1)
Fixed/addressed: 0 · PR risk: 7/10 |
|
🎭 Playwright report · View test results →
These issues are not necessarily caused by your changes. |
dmarticus
left a comment
There was a problem hiding this comment.
gonna stamp and merge this
Problem
The agent platform services were deployable in dev but several pieces blocked actually running real workloads end-to-end:
in-process, so author JS ran in the runner pod with no isolation; Modal was a stub that threw onacquireForSession.SessionEventBuswhenREDIS_URLwas unset, so SSE on ingress pod A silently missed events from runner pod B.PgApprovalStorewas fully implemented but never constructed in either prod entrypoint, so everyrequires_approvaltool was silently ungated and/approvals/*returned 503.egressProxyUrlplumbing on the sandbox interface that no caller actually wired (smokescreen owns egress now).@posthog/web-fetchshipped a "permitting all hosts" warning since the allowlist was never threaded into context — infra-level smokescreen handles this now.Companion chart + terraform PRs:
Changes
Modal sandbox (B1)
ModalSandboxPoolagainst the officialmodalnpm SDK (one Modal Sandbox perAgentSession,node:24-alpinebase image, tools + dispatch.js laid out viasandbox.filesystem.writeText, per-invokesandbox.exec(['node', '/sandbox/dispatch.js', ...])). The dispatcher script is inlined as a string constant — mirrors the existingservices/agent-sandbox-hostwire format so the file-based contract is portable.selectSandboxPoolno longer acceptsin-process; it throws at boot whenSANDBOX_BACKENDisn'tmodalordocker.InProcessSandboxPoolthe class stays, but tests construct it directly — it's just not reachable from prod entry points anymore.MODAL_TOKEN_ID+MODAL_TOKEN_SECRET(read directly by the Modal SDK) wired in the chart fromagent-platform-shared-secrets.Redis-only SessionEventBus (B8)
MemorySessionEventBuswhenREDIS_URLis unset — they throw at boot with a clear error pointing at the chart wiring.PlatformConfigSchema.redisUrlkeeps an optionalisDev()default ofredis://localhost:6379so test fixtures / local dev still work; prod (NODE_ENV=production) gets no default and the entrypoint check fires.valkey: agent-platform:map wiresREDIS_URLfromREDIS_WRITER_URLautomatically.Approvals (B4)
new PgApprovalStore(agentDb)into both the runnerWorkerandbuildJanitorApp(plus thesweepconfig) — matches the test harness exactly.requires_approvaltools now actually intercept;/approvals/*HTTP surface no longer 503s.0; deciding an approval happens via curl or MCP for now. Tracked in the plan as H3.Egress plumbing cleanup (B2)
egressProxyUrlfromAcquireOpts+InProcessSandbox. StripEgressPolicy+egressfromSandboxLimits. Update the secret-broker comment.web-fetch warning (B3)
allowlist not yet enforced; permitting allwarning + the dead spec-field hook. Description now says outbound host filtering lives at the infrastructure layer.How did you test this code?
I'm an agent (Claude Opus 4.7 via Claude Code). Manual testing is what you'd do in the cluster — I ran:
pnpm --filter @posthog/agent-shared --filter @posthog/agent-runner --filter @posthog/agent-ingress --filter @posthog/agent-janitor exec tsc --noEmit— clean across every iteration.pnpm --filter @posthog/agent-{shared,runner,ingress,janitor,tools} test --run— all unit suites pass (294 tests across the four packages).AGENT_SKIP_REAL_INFERENCE=1 pnpm --filter @posthog/agent-tests test --run— full e2e harness, 158 passed / 7 skipped (the 7 skipped are real-inference-only by env flag).pnpm --filter @posthog/agent-shared --filter @posthog/agent-runner --filter @posthog/agent-ingress --filter @posthog/agent-janitor --filter @posthog/agent-tools lint— no new warnings or errors.pnpm run buildinservices/agents— esbuild bundles cleanly; runner bundle picks up the Modal SDK (grep -c "ModalClient\|sandboxes.create" dist/runner.mjs→ 19 references).Not verified by me:
sandboxes.create→filesystem.writeText→exec→waitflow needs a real session to confirm.Automatic notifications
Docs update
docs/agent-platform/docs/deploy-runbook.mdwill need the new env vars (SANDBOX_BACKEND=modal,MODAL_TOKEN_ID/MODAL_TOKEN_SECRET, removal of the "Redis optional" note). Happy to follow up in this PR or a separate one.🤖 Agent context
Authored by Claude Opus 4.7 (1M context) via Claude Code, working from the running plan at
~/.claude/plans/agent-platform-prod-readiness.md. Ben drove the scope decisions:modalnpm SDK rather than a "TODO" placeholder.in-processfrom the prod selector entirely (fail-fast at boot) instead of warning-then-using.MemorySessionEventBusfrom prod boot, requireREDIS_URL. New Valkey serverless in the agent-platform terraform module backs it.PgTeamApiKeyResolverreadsposthog_team.api_tokenso no upstream provider keys are needed in the agent-platform secrets bag.Things explicitly skipped per Ben's scoping: prod-us / prod-eu Argo blocks, prod terraform, public ingress for agent-console / agent-ingress (Tailscale dev is fine for now). Tracked as the "Out of scope" section in the plan.
The dispatcher script is inlined as a string constant in
sandbox-modal.tsrather than reused fromservices/agent-sandbox-host/to keepagent-sharedself-contained (no runtime file resolution, no new package to publish). If Docker continues to be useful for local dev, the two should eventually share a single source — flagged as a follow-up but not blocking.