feat: sandbox tags and usage visibility API #184
Merged
The directory is no longer a one-off WIP experiment; it's where agent-facing design docs and completed plans live. Renaming drops the "wip" qualifier and introduces the durable layout:

- .agents/design/: in-flight design docs
- .agents/done/: shipped / superseded docs, kept for time-travel

docs-plan.md moves to done/ as a completed plan; future design docs land in design/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture the prefix pattern that's already dominant on origin so future agents (and humans) default to it instead of following whichever branch they happened to land on. Current state on origin at time of writing:

- feat/*: 26 branches
- fix/*: 25 branches
- docs/*: 17 branches

Plus a long tail of unprefixed kebab-case names and one-offs. The rule prefers the three prefixes for any new branch and explicitly warns off personal-initials prefixes (ig/..., etc.) because they make in-flight work harder to find by topic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
API-only surface for attributing sandbox spend to customer-defined
groupings (team, env, customer, etc.) and drilling down to individual
sandboxes. Unit is GB-seconds — Stripe stays the source of truth for
dollars.
Key decisions captured in the doc:
- Tags reuse the existing `sandbox_sessions.metadata` JSONB column.
Audit showed the field is persisted but never queried today, so we
rescope it as tags rather than introducing a parallel Tags field
that would overlap it semantically. The reuse breaks no SDKs.
- Attribution model: current tags drive all historical spend.
Retagging re-buckets. Chose this over per-event tag snapshotting
for simplicity; revisit if stable historical attribution is asked
for.
- Unit: GB-seconds (memory + disk overage). Disk exposed as overage
only (matches what's actually billed). CPU not exposed — it's
deterministic from memory (1 vCPU per 4GB).
- API shape: one `GET /usage` aggregator with `groupBy` and
`filter[]` query params treating dimensions as data (sandbox,
tag:<key>, future status/template/region), plus `GET /tags` for
discovery, `GET /sandboxes/{id}/usage` as a drilldown, and
`PATCH /sandboxes/{id}` for tag updates.
Alternatives considered and rejected:
- Narrow per-dimension endpoints (/usage/by-sandbox, /usage/by-tag).
Privileges tags over other dimensions in the URL, forces a new
route per future dimension.
- Composable query DSL (POST /usage/query with full filter trees and
nested aggregations, ELK/Prometheus shape). Reinvents a query
engine for a problem without multi-dim demand yet.
The middle — REST aggregator with dimensioned query params, Stripe
list-endpoint style — keeps the door open to both without paying up
front for either.
Explicit non-goals for v1: dollars, time series / bucketing,
multi-dim group-by, filter trees, per-event snapshotting, tag audit
log, CSV export, dashboard, CLI. Each noted as additive-later, not
breaking.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A pass from a fresh reviewer surfaced a load-bearing flaw and several
real issues in the first cut. Revisions:
- Tags move to a new `sandbox_tags` table, off
`sandbox_sessions.metadata`. The first cut assumed one-session-per-
sandbox. Verified: a sandbox owns MANY `sandbox_sessions` rows
(every read picks the latest via `ORDER BY started_at DESC LIMIT 1`), each with
metadata captured at that session's create call. Reusing that
column would silently require a "latest session" subquery on every
tag query and conflict with PATCH mutation semantics (overwriting
an otherwise-immutable create-time snapshot). New table:
(org_id, sandbox_id, key, value, updated_at) PK (sandbox_id, key),
indexed on (org_id, key, value). Row-per-tag keeps grouping SQL
trivial and enables tag-count limits. `sandbox_sessions.metadata`
left intact and unused; no SDK surface for it changes.
- Endpoint scope narrowed from `PATCH /sandboxes/{id}` to `PUT
/sandboxes/{id}/tags` + `GET /sandboxes/{id}/tags`. Closes the
door on PATCH feature-creep for alias/memory/etc, each of which
has its own semantic issues we haven't designed.
- Retagging addressed explicitly. Attribution stays live (retag
rewrites all history), with `tagsLastUpdatedAt` surfaced in every
sandbox-level response so dashboards can annotate edits. Full
snapshotting / audit log named as an upgrade path, out of v1
scope. Previously this was a single sentence; now it's a whole
section.
- Response shape normalized: variable key `"tag:team"` replaced with
`{tagKey, tagValue}`. Untagged bucket moved from a null-valued
item to a sibling `untagged` field — typed SDKs no longer have to
null-check items. `sandboxCount` removed when groupBy=sandbox
(always 1).
- Status values reconciled with the actual state machine: removed
"destroyed" (not a real state), kept
running|hibernated|stopped|error.
- GB-second math explicitly tied to `GetOrgUsage` and
`DiskOverageGBSeconds` bit-for-bit. Reconciliation test required:
Σ by-sandbox = Σ by-tag + untagged = GetOrgUsage(org) within float
epsilon.
- Data freshness claim qualified: fresh on clean shutdown,
reconciliation-lagged on crashes (worker heartbeat closes zombie
scale events via `ReconcileWorkerSessions`).
- Validation rules added: ≤50 tags, 128/256 char limits, reserved
`oc:` key prefix for future system-set tags.
- Query guardrails added: ≤90-day window, ≤500 limit, 10s
handler timeout.
- Added SDK + docs implementation notes. Added authz note (inherits
org-scope, no sub-org visibility — pre-existing pattern).
- Trimmed the narrow-endpoints-vs-composable-DSL alternatives
discussion from a full subsection to three sentences. The prose
was re-litigating the chat.
- `GET /sandboxes?tag=k:v` filter explicitly named as a follow-up
PR, not v1 scope — obvious next ask once tags exist.
Open questions list updated: tag-destroy behavior, optional PUT
merge-mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Working doc for the sandbox-tags-and-usage delivery. Captures the plan of record and calls out issues that surfaced during code exploration but that the design doesn't flag.

Flagged:
- alias lives in sandbox_sessions.config JSONB, not a column. The design's responses show a top-level alias; the plan is to extract it via config->>'alias'.
- ReconcileWorkerSessions runs at worker startup, not on a heartbeat, so the design's "reconciliation-lagged (minutes)" framing is misleading (inherited from the existing billing pipeline, not new).
- The repo has no Postgres test fixture, so the reconciliation invariant test needs a pgfixture-tagged integration path gated on TEST_DATABASE_URL, plus pure-Go query-builder tests for the rest.
- `:` in tag keys collides with the tag:<key> syntax; resolved by SplitN(s, ":", 2).
- Additive tags/tagsLastUpdatedAt response fields land in four read paths (getSandbox, getSandboxRemote, listSandboxes, listSandboxesRemote), not two.

Also records the three design-level open-question decisions (firstStartedAt/lastEndedAt from sandbox_sessions; tag rows left on destroy; no PUT merge-mode) and the planned commit order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fresh code reading against the referenced primary sources
(internal/db/usage.go, cmd/worker/main.go) surfaced one load-bearing
inaccuracy and let the three open questions be closed. Revisions:
- Freshness section rewritten. Previous wording framed staleness as
"reconciliation-lagged" with an implied heartbeat sweep via
ReconcileWorkerSessions. Verified: that function is called once at
worker boot (cmd/worker/main.go:253), not on a heartbeat. A
silently-dead worker that never restarts leaves scale events open
and COALESCE(ended_at, now()) accruing indefinitely. Same behavior
as GetOrgUsage and the Stripe pipeline today — we expose the
existing risk, we don't introduce a new one. Section now says
"lagged until worker restart" and names the Stripe pipeline as the
precedent.
- Open-question 1 (firstStartedAt / lastEndedAt source) closed in
favour of sandbox_sessions MIN(started_at) / MAX(COALESCE(stopped_at,
now())), clamped to the query window. Rationale folded into the
/sandboxes/{id}/usage section: stable across scale-event churn and
matches the user's "when did this sandbox exist" intent rather than
"when was it billed."
- Open-question 2 (tag-row behaviour on destroy) closed as "leave,"
with the minor /tags key-count overstatement accepted. Rationale
folded into the /sandboxes/{id}/usage section alongside the
"works for torn-down sandboxes" note.
- Open-question 3 (PUT ?mode=merge) closed as "no." Rationale folded
into the PUT section: one semantic per verb, atomicity stays
simple, GET+PUT covers partial updates.
- Validation section gained a bullet on `:`-in-keys parsing
(SplitN(s, ":", 2) — everything after the first `:` is the tag
key). Resolves ambiguity between user-namespaced keys like
"team:payments" and the tag:<key> syntax in groupBy / filter.
- PUT section gained a one-line empty-tags response contract:
`tags` is `{}` (not null), `tagsLastUpdatedAt` is `null` — typed
SDKs avoid a null map-check.
- "Open questions for implementation" block removed entirely; all
three items now have in-body answers.
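A minimal Go sketch of the `SplitN(s, ":", 2)` rule for `groupBy` / `filter` dimensions. `Dimension` and `parseDimension` are hypothetical names rather than the builder's API, and only the `sandbox` and `tag:<key>` dimensions are handled.

```go
package main

import (
	"errors"
	"strings"
)

// Dimension is a hypothetical parsed groupBy/filter dimension.
type Dimension struct {
	Kind   string // "sandbox" or "tag"
	TagKey string // set only when Kind == "tag"
}

// parseDimension applies the SplitN(s, ":", 2) rule: only the first ':'
// separates the dimension kind from the tag key, so any further ':'
// characters belong to the tag key itself.
func parseDimension(s string) (Dimension, error) {
	parts := strings.SplitN(s, ":", 2)
	switch {
	case len(parts) == 1 && parts[0] == "sandbox":
		return Dimension{Kind: "sandbox"}, nil
	case len(parts) == 2 && parts[0] == "tag" && parts[1] != "":
		return Dimension{Kind: "tag", TagKey: parts[1]}, nil
	default:
		return Dimension{}, errors.New("unsupported dimension: " + s)
	}
}
```

Under this rule `tag:team:payments` resolves to the user key `team:payments`: a tag key may contain `:`, but the dimension name before the first `:` cannot.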
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New table owning per-sandbox customer-defined tags for the upcoming usage-attribution endpoints (design: .agents/design/sandbox-tags-and-usage.md). Row-per-tag with PK (sandbox_id, key) so PUT semantics stay cheap and grouping SQL is a single join. org_id is denormalized so GET /tags can filter on (org_id, key) without joining sandbox_sessions; this matches the indexing pattern already used in sandbox_scale_events. No change to sandbox_sessions.metadata: it is left intact and unused, as the design spelled out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GetSandboxTags, GetSandboxTagsMulti, ReplaceSandboxTags, ListOrgTagKeys. The multi variant avoids an N+1 when hydrating tags onto GET /sandboxes list responses.

ReplaceSandboxTags diffs against current state before mutating so no-op PUTs don't bump updated_at. tagsLastUpdatedAt is the signal dashboards use to annotate retagging events (design: "Attribution is live"); an idempotent PUT would confuse that signal if it refreshed the timestamp on rows whose value didn't change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
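A pure-Go sketch of the diff step that keeps no-op PUTs from touching `updated_at`. `TagDiff` and `diffTags` are hypothetical helpers; how the store applies the result in SQL is not shown.

```go
package main

import "sort"

// TagDiff is what a full-replace PUT actually needs to write.
type TagDiff struct {
	Upserts map[string]string // new keys, or existing keys whose value changed
	Deletes []string          // keys present now but absent from the request
}

// diffTags compares the current and desired tag sets. Keys whose value is
// unchanged appear in neither list, so their rows (and updated_at) are untouched.
func diffTags(current, desired map[string]string) TagDiff {
	d := TagDiff{Upserts: map[string]string{}}
	for k, v := range desired {
		if old, ok := current[k]; !ok || old != v {
			d.Upserts[k] = v
		}
	}
	for k := range current {
		if _, ok := desired[k]; !ok {
			d.Deletes = append(d.Deletes, k)
		}
	}
	sort.Strings(d.Deletes) // deterministic order for statements and tests
	return d
}

// IsNoop reports whether applying the diff would change nothing.
func (d TagDiff) IsNoop() bool { return len(d.Upserts) == 0 && len(d.Deletes) == 0 }
```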
Pure-Go builder for the /usage aggregator: BuildUsageQuery,
BuildUntaggedTotals, BuildOrgTotals. Split into separate queries so
the items page and the untagged sibling bucket can paginate
independently — the design puts untagged in a sibling field, not in
items, and keyset pagination doesn't want a sentinel row interleaved.
GB-second math mirrors GetOrgUsage and DiskOverageGBSeconds
bit-for-bit — same `COALESCE(ended_at, LEAST(now(), $to)) -
GREATEST(started_at, $from)` idiom, same `max(0, disk_mb - 20480) /
1024` formula. Reconciliation depends on the per-sandbox /
per-tag sums adding back to GetOrgUsage totals by linearity; any
change here requires an audit against those two call sites.
Cursor is base64(JSON {v, t}) — sort value + tiebreaker (sandbox_id
or tag value). Keyset filter lives in an outer WHERE over a subquery,
not HAVING — Postgres disallows output aliases in HAVING.
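A minimal Go sketch of the `{v, t}` cursor encoding. The struct name, the float64 type for `v`, and the choice of URL-safe base64 are assumptions; the commit only specifies base64 of a JSON object holding a sort value and a tiebreaker.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// pageCursor holds the last row's sort value (v) and tiebreaker (t):
// a sandbox_id or a tag value, depending on groupBy.
type pageCursor struct {
	V float64 `json:"v"`
	T string  `json:"t"`
}

// encodeCursor serializes the keyset position as base64(JSON).
func encodeCursor(v float64, t string) (string, error) {
	b, err := json.Marshal(pageCursor{V: v, T: t})
	if err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(b), nil
}

// decodeCursor reverses encodeCursor and reports malformed input, which a
// handler would map to a 400.
func decodeCursor(s string) (pageCursor, error) {
	var c pageCursor
	b, err := base64.URLEncoding.DecodeString(s)
	if err != nil {
		return c, fmt.Errorf("invalid cursor: %w", err)
	}
	if err := json.Unmarshal(b, &c); err != nil {
		return c, fmt.Errorf("invalid cursor: %w", err)
	}
	return c, nil
}
```

On the next page the decoded pair would feed the outer `WHERE` comparison against the subquery's sort column and tiebreaker; the comparison direction depends on the sort order and isn't shown here.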
Pure-Go tests exercise: groupBy sandbox / tag, tag key containing `:`
(SplitN rule from the design), value-present and key-absent filters,
default and disk-overage sorts, cursor round-trip, all input-
validation error paths. No DB required. The reconciliation invariant
itself still needs a pgfixture-tagged integration test (tracked).
SandboxUsageWindow covers the drilldown endpoint: per-sandbox GB-
seconds plus first/last session bounds from sandbox_sessions (the
closed decision on that open question — more stable than scale-event
edges across scale churn).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five new endpoints from the design:
GET /api/usage # aggregator
GET /api/tags # org-wide key discovery
GET /api/sandboxes/:id/usage # per-sandbox drilldown
GET /api/sandboxes/:id/tags # read tag set
PUT /api/sandboxes/:id/tags # full-replace
Tenancy enforcement lives in the handler (ownsSandbox), not in the
sandbox_tags PK — (sandbox_id, key) is sufficient because the PUT
path reads sandbox_sessions.org_id and rejects mismatches as 404
(does not leak existence across orgs). Design F3: PK stays narrow
intentionally.
PUT body is decoded as map[string]json.RawMessage so nested values
fail fast with a typed 400 instead of coercing to string. Validation
applies:
- ≤ 50 tags
- key 1..128 chars, [A-Za-z0-9_.\-:]+ (`:` allowed for user
namespacing — see SplitN parsing rule in the design)
- value ≤ 256 chars
- `oc:` prefix reserved
Empty-tag responses render as `"tags": {}`, `"tagsLastUpdatedAt":
null` — typed SDKs avoid a null map-check.
/usage delegates SQL to the query builder; handler hydrates items
with alias/status/tags on groupBy=sandbox (alias extracted from
sandbox_sessions.config JSONB per design F1). A per-row
GetSandboxSession call is N+1 but bounded by the 500-row limit and
the 10s handler timeout — fine for v1; promote to a batched store
method if it shows up hot.
isUserInputError inspects BuildUsageQuery's well-known error strings
to map validation failures to 400 instead of 500. Brittle but
self-contained; only the builder emits these.
Pure-Go validation tests cover the PUT guardrails end-to-end. A
handler-level HTTP test requires a Postgres fixture the repo
doesn't have yet — same gap as the reconciliation integration test
(tracked).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Additive fields on four read paths: getSandbox and listSandboxes in both local/combined and server/remote modes (design F8: easy to miss). Responses now carry a tags map (always present, empty when no tags are set) and tagsLastUpdatedAt (null when empty).

Extraction factored into sandbox_tags_hydrate.go:
- mergeTagsInto for remote paths that already build response maps
- withTagsHydrated / withTagsHydratedList for local paths that return sandbox types as-is; marshal-through-JSON preserves the original shape without coupling this file to pkg/types.Sandbox.

Fail-soft semantics: a tag query error never fails the primary sandbox read; the response comes back without the tag fields rather than as a 500. Tags are informational; making them load-bearing for the list/get endpoints would be a regression.

listSandboxesRemote batches via GetSandboxTagsMulti: one query per page, not N+1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thin wrappers over the five new endpoints. Types mirror the server responses: dataclasses on the Python side, interfaces + classes on the TS side.

Convenience wrappers live in the SDK layer, not the server: bySandbox / byTag / forSandbox are sugar around the single GET /usage route with different groupBy values.

TS: type-checks under tsc --noEmit. Python: imports cleanly.

Filter param handling preserves the repeatable `filter[tag:<k>]=...` shape: URLSearchParams in TS and list-of-tuples in Python both retain duplicates. Tag keys containing `:` pass through as literal substrings, matching the server's SplitN parsing rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five new pages under docs/api-reference/, plus additive-field notes
on the existing GET /sandboxes and GET /sandboxes/{id} pages. mint.json
wires these into two places:
- api-reference/sandboxes/{get,set}-tags.mdx appended to the existing
"Sandboxes" group (PUT/GET are sandbox-scoped, so they belong with
the other sandbox verbs).
- A new "Usage" navigation group holds api-reference/usage/{get-
usage,get-sandbox-usage,list-tags}.mdx — these are org-level
reporting, a distinct concept from sandbox lifecycle.
Docs mention the two reader-facing caveats from the design: retagging
rewrites attribution for all past usage (with tagsLastUpdatedAt as the
signal), and freshness lags until worker restart when a worker dies
silently.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the design and working notes after critical review. Key changes:
- move sandbox_tags keying to (org_id, sandbox_id, key)
- require org-scoped tag joins and reads instead of relying on sandbox ID uniqueness
- narrow filter semantics to one param per dimension with comma-separated OR values
- require clamped drilldown timestamps and explicit to<=from rejection
- call out all four sandbox read paths, especially listSandboxesRemote
- raise the testing bar so reconciliation against GetOrgUsage must be proven against real Postgres
- make SDK/docs parity part of review readiness

The main reason for the schema change is tenancy safety: sandbox IDs are short sb-xxxxxxxx values today, so sandbox_id-only tag keying is not a safe boundary.
The design and work docs were updated to tighten four areas that the
first cut did not handle. All four are coordinated (they change the
same method signatures and code paths), so they land together.
F3 — org_id in the keyspace, not just in a lookup index.
Sandbox IDs are short `sb-xxxxxxxx` strings generated independently
per create path (internal/api/sandbox.go, internal/qemu/manager.go,
internal/firecracker/manager.go), a short 32-bit space with no
schema-enforced cross-org uniqueness. A `(sandbox_id, key)` PK plus
sandbox_id-only lookups would let a single ID collision alias tag
state across tenants. Changes:
- Migration 026 PK is now (org_id, sandbox_id, key); the lookup
index on (org_id, key, value) is unchanged.
- GetSandboxTags / GetSandboxTagsMulti / ReplaceSandboxTags all
take orgID and scope every read/write on (org_id, sandbox_id).
- Every sandbox_tags join in the usage query builder (filter joins
for /usage, the groupBy=tag join, the untagged-bucket join)
adds `x.org_id = e.org_id` alongside the sandbox_id predicate.
- All handler call sites plumb orgID from auth.GetOrgID down to the
store. Hydration helpers (mergeTagsInto, withTagsHydrated*) take
orgID too — hydration must not cross tenancy.
- Added TestBuildUsageQuery_JoinsIncludeOrgID pinning the invariant
at the SQL level so a future refactor can't drop the org scope
silently.

Migration 026 has not been applied in prod (feature branch), so it is
edited in place rather than stacking a DROP/RECREATE fix.
F9 — drilldown timestamps clamped, invalid window rejected.
SandboxUsageWindow now wraps MIN(started_at) in GREATEST(…, $from)
and MAX(COALESCE(stopped_at, now())) in LEAST(…, $to), so
firstStartedAt / lastEndedAt can never leak outside the query
window. The /sandboxes/:id/usage handler rejects to <= from with
400 to match the aggregator path.
F10 — one filter param per dimension.
The design was narrowed: `filter[tag:<key>]` may only appear once;
multiple OR values go comma-separated within that one param.
Accepting repeated filter[] keys would make the SDK map-shaped API
diverge from the HTTP surface and buy no additional expressiveness.
parseUsageQuery now returns 400 when it sees more than one value
for a `filter[...]` key.
F11 — batched session hydration for groupBy=sandbox.
Under a 500-row × 10s handler budget, issuing one GetSandboxSession
per result row is a silent multiplier. Added
GetLatestSandboxSessionsMulti: a single DISTINCT ON (sandbox_id)
query, ordered by started_at DESC, that returns the latest session
per ID. hydrateSandboxUsageItems now does two batched reads (tags +
sessions) and no per-row lookups.
Builder tests still pass; the new org-scope test was added on top,
not in place of existing ones.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The design and parseUsageQuery now reject duplicate filter[] params: one param per dimension, comma-separated OR values inside. The earlier copy on the SDK type and the /usage reference page still said "repeatable, AND'd," which no longer matches server behavior. Tightening the wording so readers don't write SDK calls that the server will 400.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second review against the branch surfaced four more items; all now
addressed.
F12 — org-scoped session lookups throughout the feature.
The F3 pass put tag storage on (org_id, sandbox_id, key) but left
ownsSandbox and the drilldown handler reading sandbox_sessions by
sandbox_id alone via GetSandboxSession. Sandbox IDs aren't globally
unique, so on a cross-org ID collision the old code could return
another org's session row — causing ownsSandbox to deny the
rightful owner, or hydrating the drilldown's alias/status from the
wrong tenant. Added GetSandboxSessionInOrg as the org-scoped
primitive and swapped both call sites. ownsSandbox's 404 rationale
is unchanged (same response for "doesn't exist" and "not your
org"), but now it actually rejects the collision case correctly.
F13 — SandboxUsageWindow no longer swallows DB errors.
The session-bounds query treated any scan failure as "no sessions
in window" and returned nil timestamps. That downgraded real
DB/context failures into silent 200 responses with missing fields.
Rewrote the scan: COALESCE(BOOL_OR(...), false) so an empty result
set produces a normal successful scan with NULL pointer fields,
and a scan error now means a genuine failure and is returned up
the stack.
F14 — reconciliation invariant has an executable test.
internal/db/usage_query_pgfixture_test.go under build tag
`pgfixture`, gated on TEST_DATABASE_URL. Seeds a fresh org with a
deterministic fixture (two tagged sandboxes including one with
disk overage and one still running, one untagged sandbox, and one
sandbox entirely before the window) and asserts the full chain:
GetOrgUsage rollup == ExecuteOrgTotals
== Σ(ExecuteUsageQuery bySandbox items)
== Σ(ExecuteUsageQuery byTag items)
+ ExecuteUntaggedTotals
within 1e-6 float epsilon. Runs with:
TEST_DATABASE_URL=... go test -tags=pgfixture ./internal/db/ \
-run Reconciliation -v
Still needs CI wiring to be a merge gate; the test itself now
exists so "we don't have the test" stops being the blocker.
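The assertion chain can be sketched in pure Go, independent of the database. `checkReconciliation` is a hypothetical helper, not the fixture test's code; it uses the same absolute 1e-6 tolerance.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon is the absolute tolerance used by the fixture test.
const epsilon = 1e-6

func sum(xs []float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += x
	}
	return total
}

// checkReconciliation compares every leg of the chain against the
// GetOrgUsage rollup and reports the first leg that drifts past epsilon.
func checkReconciliation(rollup, orgTotals float64, bySandbox, byTag []float64, untagged float64) error {
	legs := []struct {
		name string
		got  float64
	}{
		{"ExecuteOrgTotals", orgTotals},
		{"sum(bySandbox)", sum(bySandbox)},
		{"sum(byTag) + untagged", sum(byTag) + untagged},
	}
	for _, leg := range legs {
		if math.Abs(leg.got-rollup) > epsilon {
			return fmt.Errorf("%s = %v, GetOrgUsage rollup = %v", leg.name, leg.got, rollup)
		}
	}
	return nil
}
```

The by-tag leg adds up because the PK allows at most one value per key per sandbox: grouping on a single tag key places every sandbox either in exactly one item or in the untagged bucket.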
F15 — Python SDK docstring matches the narrowed filter contract.
sdks/python/opencomputer/usage.py _build_params docstring was
carrying the old "repeatable, httpx preserves duplicates"
language. Rewrote to match the server: one param per dimension,
comma-separated OR values, repeated keys rejected. Runtime surface
(dict[str, str]) already matched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
motatoes approved these changes on Apr 22, 2026
Ships
Five endpoints for per-sandbox and per-tag spend attribution, plus
`tags` + `tagsLastUpdatedAt` on existing sandbox reads. Units are GB-seconds; Stripe remains the pricing source of truth.

- `GET /api/sandboxes` and `GET /api/sandboxes/{id}` responses gain `tags` + `tagsLastUpdatedAt` (additive, all four code paths).
- Design: `.agents/design/sandbox-tags-and-usage.md`
- Working notes: `.agents/work/sandbox-tags-impl.md`
- Docs: `docs/api-reference/usage/` and `docs/api-reference/sandboxes/{get,set}-tags.mdx`, wired into `docs/mint.json` (new "Usage" nav group).

Tag storage: new table, PK on `(org_id, sandbox_id, key)`

Sandbox IDs are short `sb-xxxxxxxx` strings generated independently per create path and not schema-unique across orgs. A `(sandbox_id, key)` PK plus sandbox-id-only lookups would let a single cross-org ID collision alias tag state across tenants, or deny the rightful owner access to their own sandbox via `ownsSandbox`. Every tag read, write, and join (including the session lookups used by `ownsSandbox` and the drilldown) scopes on `(org_id, sandbox_id)`. `TestBuildUsageQuery_JoinsIncludeOrgID` pins the SQL-level invariant. `sandbox_sessions.metadata` is left untouched: semantically it's a per-session create-time snapshot, not sandbox-level tags. No SDK surface for it changes.

Math: bit-for-bit identical to `GetOrgUsage`

Per-sandbox, per-tag, and untagged sums must reconcile to the org rollup the Stripe pipeline reports. Inline GB-second math mirrors `GetOrgUsage` and `DiskOverageGBSeconds` verbatim: the same `COALESCE(ended_at, LEAST(now(), $to)) - GREATEST(started_at, $from)` idiom and the same `max(0, disk_mb - 20480) / 1024 * duration` formula. The `pgfixture` test pins the full chain within 1e-6:

    GetOrgUsage rollup == ExecuteOrgTotals
                       == Σ(ExecuteUsageQuery bySandbox items)
                       == Σ(ExecuteUsageQuery byTag items) + ExecuteUntaggedTotals

Any future change to the rollup math must change this code in lockstep.
Attribution: retagging rewrites history

Queries join live `sandbox_tags`. A retag re-attributes all of that sandbox's past usage in every query run after it: fine for ops, hazardous for chargebacks. `tagsLastUpdatedAt` (max `updated_at` across the sandbox's tag rows) is surfaced on every sandbox-level response so dashboards can annotate edits. No snapshot audit in v1; the upgrade path, if needed, is a separate `sandbox_tag_changes` table.

Freshness: fresh on clean shutdown, lagged until worker restart otherwise

`ReconcileWorkerSessions` closes zombie scale events at worker startup, not on a heartbeat: a silently dead worker leaves events open and `COALESCE(ended_at, now())` accruing until it restarts. This matches what `GetOrgUsage` and the Stripe rollup already do today; we surface the existing behavior through new endpoints rather than introducing a new risk. Called out in the `/usage` reference docs.

Not in v1
Dollars (Stripe), time-series bucketing (adds as `?interval=1d`), multi-dim `groupBy` (extends to comma-separated), filter trees, historical tag snapshotting / audit log, `GET /sandboxes?tag=k:v`, CSV export, CLI/dashboard. The endpoint shape accommodates each additively.

PUT is full-replace only; there is no merge mode. One `filter[<dim>]` param per dimension: comma-separated values inside are OR'd, dimensions are AND'd, and repeats return 400. This aligns the HTTP surface with the map-shaped SDK.

Testing
- `go test ./internal/db/ ./internal/api/`: query builder SQL shape, tenancy-predicate pinning, parsing, cursor round-trip, input validation, PUT size/charset/reserved-prefix rules, handler smoke. All pass.
- `npx tsc --noEmit` under `sdks/typescript/` is clean.
- Reconciliation integration test (`internal/db/usage_query_pgfixture_test.go`, build tag `pgfixture`): deterministic fixture covering tagged + untagged + still-running + pre-window sandboxes; asserts the four-way equality chain above. Gated on `TEST_DATABASE_URL`:

      TEST_DATABASE_URL=postgres://... go test -tags=pgfixture \
        ./internal/db/ -run Reconciliation -v

Test plan

- `go test ./internal/db/ ./internal/api/` green in CI
- `026_sandbox_tags` applies cleanly on an environment with existing `sandbox_sessions` data
- `npx tsc --noEmit` clean in CI; Python SDK imports

🤖 Generated with Claude Code