feat: sandbox tags and usage visibility API by ZIJ · Pull Request #184 · diggerhq/opencomputer

ZIJ · 2026-04-22T20:23:11Z

Ships

Five endpoints for per-sandbox and per-tag spend attribution, plus tags + tagsLastUpdatedAt on existing sandbox reads. Units are GB-seconds — Stripe remains the pricing source of truth.

GET /api/usage                    aggregator: groupBy=sandbox | tag:<key>
GET /api/tags                     org-wide tag-key discovery
GET /api/sandboxes/{id}/usage     per-sandbox drilldown
GET /api/sandboxes/{id}/tags      read current tags
PUT /api/sandboxes/{id}/tags      full-replace

GET /api/sandboxes and GET /api/sandboxes/{id} responses gain tags + tagsLastUpdatedAt (additive, all four code paths).

Design: .agents/design/sandbox-tags-and-usage.md
Impl plan + review-flag log (F1–F15): .agents/work/sandbox-tags-impl.md
TS + Python SDKs updated in lockstep; API reference under docs/api-reference/usage/ and docs/api-reference/sandboxes/{get,set}-tags.mdx, wired into docs/mint.json (new "Usage" nav group).

Tag storage: new table, PK on `(org_id, sandbox_id, key)`

Sandbox IDs are short sb-xxxxxxxx strings generated independently per create path and not schema-unique across orgs. A (sandbox_id, key) PK plus sandbox-id-only lookups would let a single cross-org ID collision alias tag state across tenants, or deny the rightful owner access to their own sandbox via ownsSandbox. Every tag read, write, and join — including the session lookups used by ownsSandbox and the drilldown — scopes on (org_id, sandbox_id). TestBuildUsageQuery_JoinsIncludeOrgID pins the SQL-level invariant.
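As a rough illustration of why org_id belongs in the key itself, the sketch below models the table as an in-memory map. The `tagKey` and `tagStore` names are hypothetical and do not come from the repository; only the `(org_id, sandbox_id, key)` shape is taken from the description above.

```go
package main

import "fmt"

// tagKey mirrors the (org_id, sandbox_id, key) primary key described above.
// The type and field names are illustrative only.
type tagKey struct {
	OrgID     string
	SandboxID string
	Key       string
}

// tagStore is an in-memory stand-in for the sandbox_tags table.
type tagStore map[tagKey]string

// Get scopes every lookup on org and sandbox together, so a sandbox ID that
// collides across orgs never resolves to another tenant's row.
func (s tagStore) Get(orgID, sandboxID, key string) (string, bool) {
	v, ok := s[tagKey{OrgID: orgID, SandboxID: sandboxID, Key: key}]
	return v, ok
}

func main() {
	// Two orgs that happen to share the same short sandbox ID.
	store := tagStore{
		{OrgID: "org-a", SandboxID: "sb-12345678", Key: "team"}: "payments",
		{OrgID: "org-b", SandboxID: "sb-12345678", Key: "team"}: "search",
	}
	v, _ := store.Get("org-a", "sb-12345678", "team")
	fmt.Println(v)
}
```

With a `(sandbox_id, key)` key the two rows in `main` would be the same map entry, which is the aliasing case the PR is guarding against.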

sandbox_sessions.metadata is left untouched — semantically it's a per-session create-time snapshot, not sandbox-level tags. No SDK surface for it changes.

Math: bit-for-bit identical to `GetOrgUsage`

Per-sandbox, per-tag, and untagged sums must reconcile to the org rollup the Stripe pipeline reports. Inline GB-second math mirrors GetOrgUsage and DiskOverageGBSeconds verbatim — same COALESCE(ended_at, LEAST(now(), $to)) - GREATEST(started_at, $from) idiom, same max(0, disk_mb - 20480) / 1024 * duration formula. The pgfixture test pins the full chain within 1e-6:

GetOrgUsage rollup == ExecuteOrgTotals
                   == Σ ExecuteUsageQuery(groupBy=sandbox)
                   == Σ ExecuteUsageQuery(groupBy=tag:<k>) + ExecuteUntaggedTotals

Any future change to the rollup math must change this code in lockstep.
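The two expressions quoted above can be restated in Go as a small sketch. It only mirrors the quoted idiom and formula; the function names, the Unix-second arguments, and the omission of any surrounding WHERE clauses are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// includedDiskMB is the 20480 MB allowance from the formula quoted above.
const includedDiskMB = 20480

// billableSeconds follows
//   COALESCE(ended_at, LEAST(now(), $to)) - GREATEST(started_at, $from)
// with all times as Unix seconds. endedAt == nil means the event is open.
func billableSeconds(startedAt int64, endedAt *int64, from, to, now int64) float64 {
	end := now
	if to < end {
		end = to
	}
	if endedAt != nil {
		end = *endedAt
	}
	start := startedAt
	if from > start {
		start = from
	}
	return float64(end - start)
}

// diskOverageGBSeconds follows max(0, disk_mb - 20480) / 1024 * duration.
func diskOverageGBSeconds(diskMB int64, seconds float64) float64 {
	overageMB := math.Max(0, float64(diskMB-includedDiskMB))
	return overageMB / 1024 * seconds
}

func main() {
	// An open event that started before the window and is still running.
	secs := billableSeconds(100, nil, 200, 1000, 500)
	fmt.Println(secs, diskOverageGBSeconds(30720, secs))
}
```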

Attribution: retagging rewrites history

Queries join live sandbox_tags. A retag re-buckets all of that sandbox's prior spend under the new tags, which is fine for ops but hazardous for chargebacks. tagsLastUpdatedAt (max updated_at across the sandbox's tag rows) is surfaced on every sandbox-level response so dashboards can annotate edits. No snapshot audit in v1; the upgrade path if needed is a separate sandbox_tag_changes table.
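A minimal sketch of the live-join behavior, using in-memory maps in place of the real tables. `usageByTag` and its argument shapes are hypothetical; the point is only that grouping reads the current tag value, so a retag moves historical spend.

```go
package main

import "fmt"

// usageByTag groups per-sandbox GB-seconds by the *current* value of tagKey.
// Sandboxes without that key contribute to the untagged total instead.
func usageByTag(usage map[string]float64, tags map[string]map[string]string, tagKey string) (map[string]float64, float64) {
	byValue := map[string]float64{}
	untagged := 0.0
	for sandboxID, gbSeconds := range usage {
		if v, ok := tags[sandboxID][tagKey]; ok {
			byValue[v] += gbSeconds
		} else {
			untagged += gbSeconds
		}
	}
	return byValue, untagged
}

func main() {
	usage := map[string]float64{"sb-1": 100, "sb-2": 50}
	tags := map[string]map[string]string{"sb-1": {"team": "payments"}}

	before, _ := usageByTag(usage, tags, "team")
	tags["sb-1"]["team"] = "search" // retag: usage rows are unchanged
	after, _ := usageByTag(usage, tags, "team")

	fmt.Println(before, after)
}
```

After the retag, the 100 GB-seconds that were reported under `payments` appear under `search`, even though none of the usage rows changed.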

Freshness: fresh on clean shutdown, lagged until worker restart otherwise

ReconcileWorkerSessions closes zombie scale events at worker startup, not on a heartbeat — a silently-dead worker leaves events open and COALESCE(ended_at, now()) accruing until it restarts. This matches what GetOrgUsage and the Stripe rollup already do today; we surface the existing behavior through new endpoints rather than introducing a new risk. Called out in the /usage reference docs.

Not in v1

Dollars (Stripe), time-series bucketing (adds as ?interval=1d), multi-dim groupBy (extends to comma-separated), filter trees, historical tag snapshotting / audit log, GET /sandboxes?tag=k:v, CSV export, CLI/dashboard. The endpoint shape accommodates each additively.

PUT is full-replace only — no merge mode. One filter[<dim>] param per dimension; comma-OR values inside, AND across dimensions, repeats return 400. Aligns the HTTP surface with the map-shaped SDK.
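The filter rule can be sketched as a small parser. This is not the repository's parseUsageQuery; `parseFilters` and `errRepeatedFilter` are hypothetical names, and the sketch only covers the three stated rules: one param per dimension, comma-separated OR values, and rejection of repeats (which the handler would surface as a 400).

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

var errRepeatedFilter = errors.New("filter param repeated for one dimension")

// parseFilters extracts filter[<dim>]=a,b params from a raw query string.
// Values inside one param are OR'd; separate dimensions are AND'd by the caller.
func parseFilters(rawQuery string) (map[string][]string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	out := map[string][]string{}
	for name, vals := range values {
		if !strings.HasPrefix(name, "filter[") || !strings.HasSuffix(name, "]") {
			continue // not a filter param
		}
		if len(vals) > 1 {
			return nil, errRepeatedFilter
		}
		dim := name[len("filter[") : len(name)-1]
		out[dim] = strings.Split(vals[0], ",")
	}
	return out, nil
}

func main() {
	f, err := parseFilters("groupBy=sandbox&filter[tag:team]=payments,search&filter[status]=running")
	fmt.Println(f, err)

	_, err = parseFilters("filter[status]=running&filter[status]=stopped")
	fmt.Println(err)
}
```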

Testing

Go: go test ./internal/db/ ./internal/api/ covers query builder SQL shape, tenancy-predicate pinning, parsing, cursor round-trip, input validation, PUT size/charset/reserved-prefix rules, and a handler smoke test. All pass.
TS SDK: npx tsc --noEmit under sdks/typescript/ is clean.
Python SDK: module imports and dataclasses load cleanly.
Reconciliation invariant (internal/db/usage_query_pgfixture_test.go, build tag pgfixture): deterministic fixture covering tagged + untagged + still-running + pre-window sandboxes, asserts the four-way equality chain above. Gated on TEST_DATABASE_URL.

TEST_DATABASE_URL=postgres://... go test -tags=pgfixture \
  ./internal/db/ -run Reconciliation -v
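For readers who want the shape of the assertion without opening the fixture test, here is a simplified sketch. It works on plain float slices rather than a database, and `reconciles` is a hypothetical helper, not the function used in usage_query_pgfixture_test.go. The by-tag sum is for a single tag key, matching the chain above.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon is the 1e-6 tolerance named in the PR description.
const epsilon = 1e-6

func sum(vals []float64) float64 {
	total := 0.0
	for _, v := range vals {
		total += v
	}
	return total
}

// reconciles checks: org total == sum(by sandbox) == sum(by tag) + untagged.
func reconciles(orgTotal float64, bySandbox, byTag []float64, untagged float64) bool {
	sandboxTotal := sum(bySandbox)
	tagTotal := sum(byTag) + untagged
	return math.Abs(orgTotal-sandboxTotal) <= epsilon &&
		math.Abs(sandboxTotal-tagTotal) <= epsilon
}

func main() {
	fmt.Println(reconciles(150, []float64{100, 50}, []float64{100}, 50))
}
```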

Test plan

go test ./internal/db/ ./internal/api/ green in CI
Migration 026_sandbox_tags applies cleanly on an environment with existing sandbox_sessions data
TS SDK npx tsc --noEmit clean in CI; Python SDK imports
pgfixture reconciliation test wired into CI and passing — merge gate

🤖 Generated with Claude Code

The directory is no longer a one-off WIP experiment; it's where agent-facing design docs and completed plans live. Renaming drops the "wip" qualifier and introduces the durable layout:

.agents/design/   in-flight design docs
.agents/done/     shipped / superseded docs, kept for time-travel

docs-plan.md moves to done/ as a completed plan; future design docs land in design/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Capture the prefix pattern that's already dominant on origin so future agents (and humans) default to it instead of following whichever branch they happened to land on.

Current state on origin at time of writing:

feat/*   26 branches
fix/*    25 branches
docs/*   17 branches

Plus a long tail of unprefixed kebab-case names and one-offs. The rule prefers the three prefixes for any new branch, and explicitly warns off personal-initials prefixes (ig/..., etc.) because they make in-flight work harder to find by topic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

API-only surface for attributing sandbox spend to customer-defined groupings (team, env, customer, etc.) and drilling down to individual sandboxes. Unit is GB-seconds; Stripe stays the source of truth for dollars.

Key decisions captured in the doc:

- Tags reuse the existing `sandbox_sessions.metadata` JSONB column. Audit showed the field is persisted but never queried today, so we rescope it as tags, which breaks no SDKs, rather than introducing a parallel Tags field that would overlap semantically.
- Attribution model: current tags drive all historical spend. Retagging re-buckets. Chose this over per-event tag snapshotting for simplicity; revisit if stable historical attribution is asked for.
- Unit: GB-seconds (memory + disk overage). Disk exposed as overage only (matches what's actually billed). CPU not exposed because it's deterministic from memory (1 vCPU per 4GB).
- API shape: one `GET /usage` aggregator with `groupBy` and `filter[]` query params treating dimensions as data (sandbox, tag:<key>, future status/template/region), plus `GET /tags` for discovery, `GET /sandboxes/{id}/usage` as a drilldown, and `PATCH /sandboxes/{id}` for tag updates.

Alternatives considered and rejected:

- Narrow per-dimension endpoints (/usage/by-sandbox, /usage/by-tag). Privileges tags over other dimensions in the URL, forces a new route per future dimension.
- Composable query DSL (POST /usage/query with full filter trees and nested aggregations, ELK/Prometheus shape). Reinvents a query engine for a problem without multi-dim demand yet.

The middle (REST aggregator with dimensioned query params, Stripe list-endpoint style) keeps the door open to both without paying up front for either.

Explicit non-goals for v1: dollars, time series / bucketing, multi-dim group-by, filter trees, per-event snapshotting, tag audit log, CSV export, dashboard, CLI. Each noted as additive-later, not breaking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-04-22T20:23:17Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
opensandbox	Ready	Preview, Comment	Apr 22, 2026 10:17pm

A pass from a fresh reviewer surfaced a load-bearing flaw and several real issues in the first cut. Revisions:

- Tags move to a new `sandbox_tags` table, off `sandbox_sessions.metadata`. The first cut assumed one-session-per-sandbox. Verified: a sandbox owns MANY `sandbox_sessions` rows (every get sorts `ORDER BY started_at DESC LIMIT 1`), each with metadata captured at that session's create call. Reusing that column would silently require a "latest session" subquery on every tag query and conflict with PATCH mutation semantics (overwriting an otherwise-immutable create-time snapshot). New table: (org_id, sandbox_id, key, value, updated_at) PK (sandbox_id, key), indexed on (org_id, key, value). Row-per-tag keeps grouping SQL trivial and enables tag-count limits. `sandbox_sessions.metadata` left intact and unused; no SDK surface for it changes.
- Endpoint scope narrowed from `PATCH /sandboxes/{id}` to `PUT /sandboxes/{id}/tags` + `GET /sandboxes/{id}/tags`. Closes the door on PATCH feature-creep for alias/memory/etc, each of which has its own semantic issues we haven't designed.
- Retagging addressed explicitly. Attribution stays live (retag rewrites all history), with `tagsLastUpdatedAt` surfaced in every sandbox-level response so dashboards can annotate edits. Full snapshotting / audit log named as an upgrade path, out of v1 scope. Previously this was a single sentence; now it's a whole section.
- Response shape normalized: variable key `"tag:team"` replaced with `{tagKey, tagValue}`. Untagged bucket moved from a null-valued item to a sibling `untagged` field, so typed SDKs no longer have to null-check items. `sandboxCount` removed when groupBy=sandbox (always 1).
- Status values reconciled with the actual state machine: removed "destroyed" (not a real state), kept running|hibernated|stopped|error.
- GB-second math explicitly tied to `GetOrgUsage` and `DiskOverageGBSeconds` bit-for-bit. Reconciliation test required: Σ by-sandbox = Σ by-tag + untagged = GetOrgUsage(org) within float epsilon.
- Data freshness claim qualified: fresh on clean shutdown, reconciliation-lagged on crashes (worker heartbeat closes zombie scale events via `ReconcileWorkerSessions`).
- Validation rules added: ≤50 tags, 128/256 char limits, reserved `oc:` key prefix for future system-set tags.
- Query guardrails added: ≤90-day window, ≤500 limit, 10s handler timeout.
- Added SDK + docs implementation notes. Added authz note (inherits org-scope, no sub-org visibility; pre-existing pattern).
- Trimmed the narrow-endpoints-vs-composable-DSL alternatives discussion from a full subsection to three sentences. The prose was re-litigating the chat.
- `GET /sandboxes?tag=k:v` filter explicitly named as a follow-up PR, not v1 scope. It is the obvious next ask once tags exist.

Open questions list updated: tag-destroy behavior, optional PUT merge-mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Working doc for the sandbox-tags-and-usage delivery. Captures the plan of record and calls out issues the design doesn't flag that surfaced during code exploration.

Flagged:

- alias lives in sandbox_sessions.config JSONB, not a column. The design's responses show a top-level alias; plan is to extract via config->>'alias'.
- ReconcileWorkerSessions runs at worker startup, not on a heartbeat, so the design's "reconciliation-lagged (minutes)" framing is misleading (inherited from the existing billing pipeline, not new).
- Repo has no Postgres test fixture, so the reconciliation invariant test needs a pgfixture-tagged integration path gated on TEST_DATABASE_URL, plus pure-Go query-builder tests for the rest.
- `:` in tag keys collides with tag:<key> syntax; resolved by SplitN(s, ":", 2).
- Additive tags/tagsLastUpdatedAt response fields land in four read paths (getSandbox, getSandboxRemote, listSandboxes, listSandboxesRemote), not two.

Also records the three design-level open-question decisions (firstStartedAt/lastEndedAt from sandbox_sessions; tag rows left on destroy; no PUT merge-mode) and the planned commit order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fresh code reading against the referenced primary sources (internal/db/usage.go, cmd/worker/main.go) surfaced one load-bearing inaccuracy and let the three open questions be closed. Revisions:

- Freshness section rewritten. Previous wording framed staleness as "reconciliation-lagged" with an implied heartbeat sweep via ReconcileWorkerSessions. Verified: that function is called once at worker boot (cmd/worker/main.go:253), not on a heartbeat. A silently-dead worker that never restarts leaves scale events open and COALESCE(ended_at, now()) accruing indefinitely. Same behavior as GetOrgUsage and the Stripe pipeline today: we expose the existing risk, we don't introduce a new one. Section now says "lagged until worker restart" and names the Stripe pipeline as the precedent.
- Open-question 1 (firstStartedAt / lastEndedAt source) closed in favour of sandbox_sessions MIN(started_at) / MAX(COALESCE(stopped_at, now())), clamped to the query window. Rationale folded into the /sandboxes/{id}/usage section: stable across scale-event churn and matches the user's "when did this sandbox exist" intent rather than "when was it billed."
- Open-question 2 (tag-row behaviour on destroy) closed as "leave," with the minor /tags key-count overstatement accepted. Rationale folded into the /sandboxes/{id}/usage section alongside the "works for torn-down sandboxes" note.
- Open-question 3 (PUT ?mode=merge) closed as "no." Rationale folded into the PUT section: one semantic per verb, atomicity stays simple, GET+PUT covers partial updates.
- Validation section gained a bullet on `:`-in-keys parsing (SplitN(s, ":", 2), so everything after the first `:` is the tag key). Resolves ambiguity between user-namespaced keys like "team:payments" and the tag:<key> syntax in groupBy / filter.
- PUT section gained a one-line empty-tags response contract: `tags` is `{}` (not null), `tagsLastUpdatedAt` is `null`, so typed SDKs avoid a null map-check.
- "Open questions for implementation" block removed entirely; all three items now have in-body answers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
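The `SplitN(s, ":", 2)` rule mentioned in the validation bullet can be shown with a short sketch. `parseDimension` is a hypothetical helper name; only the SplitN call and the `tag:<key>` syntax come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDimension splits a groupBy or filter dimension on the first ':' only,
// so a user-namespaced key such as "team:payments" survives intact.
func parseDimension(s string) (kind, tagKey string) {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) == 2 && parts[0] == "tag" {
		return "tag", parts[1]
	}
	return s, ""
}

func main() {
	fmt.Println(parseDimension("tag:team:payments"))
	fmt.Println(parseDimension("sandbox"))
}
```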

New table owning per-sandbox customer-defined tags for the upcoming usage-attribution endpoints (design: .agents/design/sandbox-tags-and-usage.md). Row-per-tag with PK (sandbox_id, key) so PUT semantics stay cheap and grouping SQL is a single join. org_id is denormalized so GET /tags can filter on (org_id, key) without joining sandbox_sessions, which matches the indexing pattern already used in sandbox_scale_events.

No change to sandbox_sessions.metadata: left intact and unused, as the design spelled out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GetSandboxTags, GetSandboxTagsMulti, ReplaceSandboxTags, ListOrgTagKeys. The multi variant avoids an N+1 when hydrating tags onto GET /sandboxes list responses. ReplaceSandboxTags diffs against current state before mutating so no-op PUTs don't bump updated_at. tagsLastUpdatedAt is the signal dashboards use to annotate retagging events (design: "Attribution is live"); an idempotent PUT would confuse that signal if it refreshed the timestamp on rows whose value didn't change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
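A minimal sketch of the diff-before-mutate idea, assuming tags are plain string maps. `diffTags` is a hypothetical helper, not the body of ReplaceSandboxTags; it only shows how a full-replace PUT can be reduced to the rows that actually change, so untouched rows keep their updated_at.

```go
package main

import (
	"fmt"
	"sort"
)

// diffTags compares the stored tags with the desired full-replace set and
// returns only the rows that need writing or deleting.
func diffTags(current, desired map[string]string) (upserts map[string]string, deletes []string) {
	upserts = map[string]string{}
	for k, v := range desired {
		if cur, ok := current[k]; !ok || cur != v {
			upserts[k] = v // new key or changed value
		}
	}
	for k := range current {
		if _, ok := desired[k]; !ok {
			deletes = append(deletes, k) // key dropped by the PUT
		}
	}
	sort.Strings(deletes) // deterministic order for callers and tests
	return upserts, deletes
}

func main() {
	current := map[string]string{"team": "payments", "env": "prod", "owner": "zij"}
	desired := map[string]string{"team": "payments", "env": "dev"}
	fmt.Println(diffTags(current, desired))
}
```

An idempotent PUT yields empty `upserts` and `deletes`, so no row's timestamp moves.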

Pure-Go builder for the /usage aggregator: BuildUsageQuery, BuildUntaggedTotals, BuildOrgTotals. Split into separate queries so the items page and the untagged sibling bucket can paginate independently. The design puts untagged in a sibling field, not in items, and keyset pagination doesn't want a sentinel row interleaved.

GB-second math mirrors GetOrgUsage and DiskOverageGBSeconds bit-for-bit: same `COALESCE(ended_at, LEAST(now(), $to)) - GREATEST(started_at, $from)` idiom, same `max(0, disk_mb - 20480) / 1024` formula. Reconciliation depends on the per-sandbox / per-tag sums adding back to GetOrgUsage totals by linearity; any change here requires an audit against those two call sites.

Cursor is base64(JSON {v, t}): sort value + tiebreaker (sandbox_id or tag value). Keyset filter lives in an outer WHERE over a subquery, not HAVING, because Postgres disallows output aliases in HAVING.

Pure-Go tests exercise: groupBy sandbox / tag, tag key containing `:` (SplitN rule from the design), value-present and key-absent filters, default and disk-overage sorts, cursor round-trip, all input-validation error paths. No DB required. The reconciliation invariant itself still needs a pgfixture-tagged integration test (tracked).

SandboxUsageWindow covers the drilldown endpoint: per-sandbox GB-seconds plus first/last session bounds from sandbox_sessions (the closed decision on that open question; more stable than scale-event edges across scale churn).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
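The cursor format can be sketched as a round-trip pair. The commit only states base64(JSON {v, t}); the float64 sort value, the URL-safe base64 alphabet, and the `cursor`, `encodeCursor`, and `decodeCursor` names are assumptions made for this example.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// cursor carries the last row's sort value and tiebreaker.
// Field types are assumed; the source only names the keys v and t.
type cursor struct {
	V float64 `json:"v"` // sort value of the last item on the page
	T string  `json:"t"` // tiebreaker: sandbox_id or tag value
}

func encodeCursor(c cursor) (string, error) {
	raw, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(raw), nil
}

func decodeCursor(s string) (cursor, error) {
	var c cursor
	raw, err := base64.URLEncoding.DecodeString(s)
	if err != nil {
		return c, fmt.Errorf("invalid cursor encoding: %w", err)
	}
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, fmt.Errorf("invalid cursor payload: %w", err)
	}
	return c, nil
}

func main() {
	token, _ := encodeCursor(cursor{V: 1234.5, T: "sb-12345678"})
	back, err := decodeCursor(token)
	fmt.Println(token, back, err)
}
```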

Five new endpoints from the design:

GET /api/usage                  # aggregator
GET /api/tags                   # org-wide key discovery
GET /api/sandboxes/:id/usage    # per-sandbox drilldown
GET /api/sandboxes/:id/tags     # read tag set
PUT /api/sandboxes/:id/tags     # full-replace

Tenancy enforcement lives in the handler (ownsSandbox), not in the sandbox_tags PK. (sandbox_id, key) is sufficient because the PUT path reads sandbox_sessions.org_id and rejects mismatches as 404 (does not leak existence across orgs). Design F3: PK stays narrow intentionally.

PUT body is decoded as map[string]json.RawMessage so nested values fail fast with a typed 400 instead of coercing to string. Validation applies:

- ≤ 50 tags
- key 1..128 chars, [A-Za-z0-9_.\-:]+ (`:` allowed for user namespacing; see SplitN parsing rule in the design)
- value ≤ 256 chars
- `oc:` prefix reserved

Empty-tag responses render as `"tags": {}`, `"tagsLastUpdatedAt": null`, so typed SDKs avoid a null map-check.

/usage delegates SQL to the query builder; handler hydrates items with alias/status/tags on groupBy=sandbox (alias extracted from sandbox_sessions.config JSONB per design F1). A per-row GetSandboxSession call is N+1 but bounded by the 500-row limit and the 10s handler timeout. Fine for v1; promote to a batched store method if it shows up hot.

isUserInputError inspects BuildUsageQuery's well-known error strings to map validation failures to 400 instead of 500. Brittle but self-contained; only the builder emits these.

Pure-Go validation tests cover the PUT guardrails end-to-end. A handler-level HTTP test requires a Postgres fixture the repo doesn't have yet, the same gap as the reconciliation integration test (tracked).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
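The PUT validation rules listed in this commit can be sketched as follows. `validateTags` is a hypothetical function, not the handler's actual code, and counting value length in runes rather than bytes is an assumption. The limits, the key character class, and the `oc:` reservation are taken from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

var keyPattern = regexp.MustCompile(`^[A-Za-z0-9_.\-:]+$`)

// validateTags decodes a PUT body and applies the guardrails listed above.
func validateTags(body []byte) (map[string]string, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, fmt.Errorf("body must be a JSON object: %w", err)
	}
	if len(raw) > 50 {
		return nil, fmt.Errorf("too many tags: %d > 50", len(raw))
	}
	tags := make(map[string]string, len(raw))
	for k, msg := range raw {
		if len(k) > 128 || !keyPattern.MatchString(k) {
			return nil, fmt.Errorf("invalid tag key %q", k)
		}
		if strings.HasPrefix(k, "oc:") {
			return nil, fmt.Errorf("tag key %q uses the reserved oc: prefix", k)
		}
		// Decoding into *string rejects objects, arrays, and numbers,
		// and leaves the pointer nil for JSON null.
		var v *string
		if err := json.Unmarshal(msg, &v); err != nil || v == nil {
			return nil, fmt.Errorf("tag %q: value must be a string", k)
		}
		if utf8.RuneCountInString(*v) > 256 {
			return nil, fmt.Errorf("tag %q: value longer than 256 chars", k)
		}
		tags[k] = *v
	}
	return tags, nil
}

func main() {
	fmt.Println(validateTags([]byte(`{"team":"payments","team:region":"eu"}`)))
	fmt.Println(validateTags([]byte(`{"team":{"nested":true}}`)))
}
```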

Additive fields on four read paths: getSandbox and listSandboxes in both local/combined and server/remote modes (design F8: easy to miss). Responses now carry a tags map (always, empty when no tags are set) and tagsLastUpdatedAt (null when empty).

Extraction factored into sandbox_tags_hydrate.go:

- mergeTagsInto for remote paths that already build response maps
- withTagsHydrated / withTagsHydratedList for local paths that return sandbox types as-is. Marshal-through-JSON preserves the original shape without coupling this file to pkg/types.Sandbox.

Fail-soft semantics: a tag query error never fails the primary sandbox read. The response comes back without the tag fields rather than 500. Tags are informational; making them load-bearing for the list/get endpoints would be a regression.

listSandboxesRemote batches via GetSandboxTagsMulti: one query per page, not N+1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
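The empty-tags contract and the fail-soft rule can be sketched together. `mergeTagsSketch` is a hypothetical stand-in and does not reproduce the signature of mergeTagsInto; the `sandboxID` response field and the string timestamp are also illustrative.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errTagQuery stands in for a real tag-query failure.
var errTagQuery = errors.New("tag query failed")

// mergeTagsSketch adds the two tag fields to an already-built response map.
// On a tag query error it leaves the response untouched (fail-soft).
func mergeTagsSketch(resp map[string]any, tags map[string]string, updatedAt *string, err error) {
	if err != nil {
		return // the primary read still succeeds, just without tag fields
	}
	if tags == nil {
		tags = map[string]string{} // render {} rather than null
	}
	resp["tags"] = tags
	if len(tags) == 0 || updatedAt == nil {
		resp["tagsLastUpdatedAt"] = nil
		return
	}
	resp["tagsLastUpdatedAt"] = *updatedAt
}

func main() {
	resp := map[string]any{"sandboxID": "sb-12345678"}
	mergeTagsSketch(resp, nil, nil, nil)
	out, _ := json.Marshal(resp)
	fmt.Println(string(out))
}
```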

Thin wrappers over the five new endpoints. Types mirror the server responses; dataclasses on the Python side, interfaces + classes on the TS side. Convenience wrappers live in the SDK layer, not the server — bySandbox / byTag / forSandbox are sugar around the single GET /usage route with different groupBy values. TS: type-checks under tsc --noEmit. Python: imports cleanly. Filter param handling preserves the repeatable `filter[tag:<k>]=...` shape — URLSearchParams in TS and list-of-tuples in Python both retain duplicates. Tag keys containing `:` pass through as literal substrings, matching the server's SplitN parsing rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five new pages under docs/api-reference/, plus additive-field notes on the existing GET /sandboxes and GET /sandboxes/{id} pages. mint.json wires these into two places:

- api-reference/sandboxes/{get,set}-tags.mdx appended to the existing "Sandboxes" group (PUT/GET are sandbox-scoped, so they belong with the other sandbox verbs).
- A new "Usage" navigation group holds api-reference/usage/{get-usage,get-sandbox-usage,list-tags}.mdx. These are org-level reporting, a distinct concept from sandbox lifecycle.

Docs mention the two reader-facing caveats from the design: retagging rewrites attribution going forward (with tagsLastUpdatedAt as the signal), and freshness lags until worker restart after a silent-death worker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Update the design and working notes after critical review. Key changes:

- move sandbox_tags keying to (org_id, sandbox_id, key)
- require org-scoped tag joins and reads instead of relying on sandbox ID uniqueness
- narrow filter semantics to one param per dimension with comma-separated OR values
- require clamped drilldown timestamps and explicit to<=from rejection
- call out all four sandbox read paths, especially listSandboxesRemote
- raise the testing bar so reconciliation against GetOrgUsage must be proven against real Postgres
- make SDK/docs parity part of review readiness

The main reason for the schema change is tenancy safety: sandbox IDs are short sb-xxxxxxxx values today, so sandbox_id-only tag keying is not a safe boundary.

The design and work docs were updated to tighten four areas that the first cut did not handle. All four are coordinated (they change the same method signatures and code paths), so they land together.

F3: org_id in the keyspace, not just in a lookup index.

Sandbox IDs are short `sb-xxxxxxxx` strings generated independently per create path (internal/api/sandbox.go, internal/qemu/manager.go, internal/firecracker/manager.go), a short 32-bit space with no schema-enforced cross-org uniqueness. A `(sandbox_id, key)` PK plus sandbox_id-only lookups would let a single ID collision alias tag state across tenants. Changes:

- Migration 026 PK is now (org_id, sandbox_id, key); the lookup index on (org_id, key, value) is unchanged.
- GetSandboxTags / GetSandboxTagsMulti / ReplaceSandboxTags all take orgID and scope every read/write on (org_id, sandbox_id).
- Every sandbox_tags join in the usage query builder (filter joins for /usage, the groupBy=tag join, the untagged-bucket join) adds `x.org_id = e.org_id` alongside the sandbox_id predicate.
- All handler call sites plumb orgID from auth.GetOrgID down to the store. Hydration helpers (mergeTagsInto, withTagsHydrated*) take orgID too, because hydration must not cross tenancy.
- Added TestBuildUsageQuery_JoinsIncludeOrgID pinning the invariant at the SQL level so a future refactor can't drop the org scope silently.

Migration has not been applied in prod (feature branch), so editing 026 in place rather than stacking a DROP/RECREATE fix.

F9: drilldown timestamps clamped, invalid window rejected.

SandboxUsageWindow now wraps MIN(started_at) in GREATEST(…, $from) and MAX(COALESCE(stopped_at, now())) in LEAST(…, $to), so firstStartedAt / lastEndedAt can never leak outside the query window. The /sandboxes/:id/usage handler rejects to <= from with 400 to match the aggregator path.

F10: one filter param per dimension.

The design was narrowed: `filter[tag:<key>]` may only appear once; multiple OR values go comma-separated within that one param. Accepting repeated filter[] keys would make the SDK map-shaped API diverge from the HTTP surface and buy no additional expressiveness. parseUsageQuery now returns 400 when it sees more than one value for a `filter[...]` key.

F11: batched session hydration for groupBy=sandbox.

Under a 500-row × 10s handler budget, issuing one GetSandboxSession per result row is a silent multiplier. Added GetLatestSandboxSessionsMulti, a single DISTINCT ON (sandbox_id)-ordered-by-started_at DESC query that returns latest session per ID. hydrateSandboxUsageItems now does two batched reads (tags + sessions) and no per-row lookups.

Builder tests still pass; the new org-scope test was added on top, not in place of existing ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
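To make the F11 change concrete, here is an in-memory analogue of the "latest session per sandbox" selection that the DISTINCT ON query performs in Postgres. The `session` struct and `latestSessions` function are hypothetical and are not the body of GetLatestSandboxSessionsMulti; the org filter reflects the F3 scoping rule.

```go
package main

import "fmt"

// session is a pared-down stand-in for a sandbox_sessions row.
type session struct {
	OrgID     string
	SandboxID string
	StartedAt int64 // Unix seconds
	Status    string
}

// latestSessions returns, for one org, the most recently started session per
// sandbox ID in a single pass instead of one lookup per result row.
func latestSessions(orgID string, sessions []session) map[string]session {
	latest := map[string]session{}
	for _, s := range sessions {
		if s.OrgID != orgID {
			continue // never hydrate from another tenant's rows
		}
		if cur, ok := latest[s.SandboxID]; !ok || s.StartedAt > cur.StartedAt {
			latest[s.SandboxID] = s
		}
	}
	return latest
}

func main() {
	rows := []session{
		{"org-a", "sb-1", 100, "stopped"},
		{"org-a", "sb-1", 200, "running"},
		{"org-b", "sb-1", 300, "error"}, // same ID, different org
		{"org-a", "sb-2", 150, "hibernated"},
	}
	fmt.Println(latestSessions("org-a", rows))
}
```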

The design and parseUsageQuery now reject duplicate filter[] params — one param per dimension, comma-separated OR values inside. The earlier copy on the SDK type and the /usage reference page still said "repeatable, AND'd," which no longer matches server behavior. Tightening the wording so readers don't write SDK calls that the server will 400. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Second review against the branch surfaced four more items; all now addressed.

F12: org-scoped session lookups throughout the feature.

The F3 pass put tag storage on (org_id, sandbox_id, key) but left ownsSandbox and the drilldown handler reading sandbox_sessions by sandbox_id alone via GetSandboxSession. Sandbox IDs aren't globally unique, so on a cross-org ID collision the old code could return another org's session row, causing ownsSandbox to deny the rightful owner, or hydrating the drilldown's alias/status from the wrong tenant. Added GetSandboxSessionInOrg as the org-scoped primitive and swapped both call sites. ownsSandbox's 404 rationale is unchanged (same response for "doesn't exist" and "not your org"), but now it actually rejects the collision case correctly.

F13: SandboxUsageWindow no longer swallows DB errors.

The session-bounds query treated any scan failure as "no sessions in window" and returned nil timestamps. That downgraded real DB/context failures into silent 200 responses with missing fields. Rewrote the scan: COALESCE(BOOL_OR(...), false) so an empty result set produces a normal successful scan with NULL pointer fields, and a scan error now means a genuine failure and is returned up the stack.

F14: reconciliation invariant has an executable test.

internal/db/usage_query_pgfixture_test.go under build tag `pgfixture`, gated on TEST_DATABASE_URL. Seeds a fresh org with a deterministic fixture (two tagged sandboxes including one with disk overage and one still running, one untagged sandbox, and one sandbox entirely before the window) and asserts the full chain:

GetOrgUsage rollup == ExecuteOrgTotals
                   == Σ(ExecuteUsageQuery bySandbox items)
                   == Σ(ExecuteUsageQuery byTag items) + ExecuteUntaggedTotals

within 1e-6 float epsilon. Runs with:

TEST_DATABASE_URL=... go test -tags=pgfixture ./internal/db/ \
  -run Reconciliation -v

Still needs CI wiring to be a merge gate; the test itself now exists so "we don't have the test" stops being the blocker.

F15: Python SDK docstring matches the narrowed filter contract.

sdks/python/opencomputer/usage.py _build_params docstring was carrying the old "repeatable, httpx preserves duplicates" language. Rewrote to match the server: one param per dimension, comma-separated OR values, repeated keys rejected. Runtime surface (dict[str, str]) already matched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ZIJ and others added 3 commits April 22, 2026 21:22

vercel Bot deployed to Preview April 22, 2026 20:42 View deployment

ZIJ and others added 2 commits April 22, 2026 22:30

vercel Bot deployed to Preview April 22, 2026 21:34 View deployment

ZIJ and others added 7 commits April 22, 2026 22:35

vercel Bot deployed to Preview April 22, 2026 21:50 View deployment

ZIJ and others added 3 commits April 22, 2026 22:55

vercel Bot deployed to Preview April 22, 2026 22:03 View deployment

ZIJ and others added 2 commits April 22, 2026 23:13

docs: capture remaining usage review findings

1b9b133

vercel Bot deployed to Preview April 22, 2026 22:17 View deployment

ZIJ marked this pull request as ready for review April 22, 2026 22:18

ZIJ requested review from breardon2011 and motatoes April 22, 2026 22:24

motatoes approved these changes Apr 22, 2026


ZIJ merged commit 8f2e4b1 into main Apr 22, 2026
3 checks passed

ZIJ merged 18 commits into main from feat/sandbox-tags-and-usage
