feat(plans): add vault tier policy (max entries + allowed envs) by mastermanas805 · Pull Request #1 · InstaNode-dev/common

mastermanas805 · 2026-05-11T07:59:10Z

Summary

Adds vault-feature policy fields to `PlanLimits`:

`VaultMaxEntries int` — per-team cap. `-1` = unlimited, `0` = feature unavailable.
`VaultEnvsAllowed []string` — allowed env scopes for vault entries.
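
As a minimal sketch of how a caller might read the two fields together (the trimmed `PlanLimits` struct, the `CanAddVaultEntry` helper, and the sample limits are hypothetical, not part of this PR):

```go
package main

import "fmt"

// PlanLimits is a trimmed, hypothetical stand-in for the real struct;
// only the two vault fields described above are shown.
type PlanLimits struct {
	VaultMaxEntries  int      // -1 = unlimited, 0 = feature unavailable
	VaultEnvsAllowed []string // allowed env scopes for vault entries
}

// CanAddVaultEntry is a hypothetical helper (not in the PR). It reports
// whether a team that already holds `current` entries may add one more in env.
func CanAddVaultEntry(l PlanLimits, current int, env string) bool {
	if l.VaultMaxEntries == 0 {
		return false // vault feature is off on this tier
	}
	if l.VaultMaxEntries > 0 && current >= l.VaultMaxEntries {
		return false // per-team cap reached
	}
	for _, allowed := range l.VaultEnvsAllowed {
		if allowed == env {
			return true
		}
	}
	return false // env scope not permitted on this tier
}

func main() {
	// Illustrative values only; real per-tier numbers live in the plan registry.
	tier := PlanLimits{VaultMaxEntries: 50, VaultEnvsAllowed: []string{"production"}}
	fmt.Println(CanAddVaultEntry(tier, 10, "production"))
	fmt.Println(CanAddVaultEntry(tier, 10, "staging"))
	fmt.Println(CanAddVaultEntry(tier, 50, "production"))
}
```

Checking the `0` sentinel before the cap keeps "feature unavailable" distinct from "cap reached", which matters if the caller wants to return different errors for the two cases.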

Test coverage in `plans_test.go` for both fields across all tiers.

Test plan

`go test ./plans/...` passes

🤖 Generated with Claude Code

Adds two fields to PlanLimits:
- VaultMaxEntries (int): per-team cap on vault entries. -1 = unlimited, 0 = vault feature unavailable on this tier.
- VaultEnvsAllowed ([]string): list of environment names permitted for vault entries (production / staging / dev / ...).

Test cases extend plans_test.go to cover both fields across all tiers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… OSV-Scanner) (#16)

* feat(plans): add vault tier policy (max entries + allowed envs) (#1)

Adds two fields to PlanLimits:
- VaultMaxEntries (int): per-team cap on vault entries. -1 = unlimited, 0 = vault feature unavailable on this tier.
- VaultEnvsAllowed ([]string): list of environment names permitted for vault entries (production / staging / dev / ...).

Test cases extend plans_test.go to cover both fields across all tiers.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* common: add buildinfo package for compile-time GitSHA/BuildTime/Version

New `instant.dev/common/buildinfo` exposes three package vars (`GitSHA`, `BuildTime`, `Version`) defaulting to sentinel strings. Real values are wired in at link time via `go build -ldflags -X`: the Dockerfile in each service passes `--build-arg GIT_SHA=...` into the ldflag so /healthz and slog log lines stamp the exact commit the running pod was built from.

This is track 1 of 8 in the observability rollout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* common/logctx: relocate from api repo into the canonical common module

Track 2 of the observability rollout originally created common/logctx inside the api repo as a side effect of dispatching from an api worktree. This blocked the obsstubs→common refactor in the api router because the api/go.mod has `replace instant.dev/common => ../common`, so imports of instant.dev/common/logctx were resolving to the monorepo common dir which didn't have the package.

This commit puts common/logctx where its module path says it lives. After this lands, the api repo's fix-obsstubs-to-common-2026-05-12 PR can drop its obsstubs/ stubs and import instant.dev/common/logctx directly.

No code changes to the package itself: straight relocation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plans: restore free tier in Default() to mirror anonymous

The api repo's plans tests (TestDefault_AllStandardTiersPresent, TestAll_ReturnsAllPlans, TestFreeTier_MirrorsAnonymous) require a `free` tier in the default registry. The api-level plans.yaml already defines `free` as a byte-for-byte clone of `anonymous` (same limits, same features), the only difference being audience (free = claimed-but-unpaid teams, anonymous = pre-claim agents). Both still get reaped at 24h, so the pay-from-day-one policy holds.

The `free` tier is real product surface, not test scaffolding:
- api/internal/handlers/billing.go:361 sets tier="free" for unpaid teams
- api/internal/handlers/webhook.go:411-416 reaps both anonymous and free
- api/internal/handlers/openapi.go advertises "free" in 3 schemas
- api/internal/models/resource_elevate_test.go uses tier "free"
- api/internal/handlers/onboarding_test.go asserts tier == "free"

The FREE-TIER-RECYCLE-2026-05-12.md plan also depends on `free` existing in the registry (Option B email-gate falls into this tier).

Mirroring rule: anonymous and free must stay byte-identical so that an anonymous->free flip at claim time cannot widen or narrow quotas.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* logctx: read commit_id from buildinfo.GitSHA, drop env var fallback

Today's B1 + B2 dispatches both surfaced that /healthz returned the real commit SHA (via buildinfo.GitSHA from the ldflag-patched Dockerfile) but slog lines showed commit_id=dev because logctx read os.Getenv(COMMIT_ID). The two systems disagreed.

The env-var fallback was a decoupling shim from when logctx shipped before buildinfo. Now both live on the same common module; collapse to a direct import.
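
The link-time stamping described in the buildinfo and logctx commits above can be sketched in a single file. This is an illustration under assumptions, not the packages' actual code: the `"dev"` default is inferred from the `commit_id=dev` symptom, and `newLogger` / `logLine` are hypothetical names.

```go
package main

import (
	"bytes"
	"io"
	"log/slog"
	"os"
)

// GitSHA plays the role of a buildinfo-style package var. The "dev" default
// is an assumed sentinel. Override it at link time, for example:
//
//	go build -ldflags "-X main.GitSHA=$(git rev-parse --short HEAD)"
//
// For a var in another package, -X takes the full import path instead of
// "main" (e.g. instant.dev/common/buildinfo.GitSHA).
var GitSHA = "dev"

// newLogger (hypothetical) returns a JSON logger whose every record carries
// commit_id, read straight from the package var rather than an env var.
func newLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With("commit_id", GitSHA)
}

// logLine (hypothetical) renders one record into a string, which makes the
// stamped attribute easy to inspect in a test.
func logLine(msg string) string {
	var buf bytes.Buffer
	newLogger(&buf).Info(msg)
	return buf.String()
}

func main() {
	newLogger(os.Stdout).Info("service started")
}
```

Reading the var directly means the health endpoint and the log lines cannot disagree, since both resolve to the same linker-patched string.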
* plans: add yearly variants (hobby/pro/team) + BillingPeriod helpers

Adds hobby_yearly ($90/yr), pro_yearly ($490/yr), team_yearly ($1990/yr) to the embedded default registry. Each mirrors its monthly counterpart's limits + features exactly, only `price_monthly_cents` (annual amount in cents) and `billing_period: yearly` differ.

New helpers:
- Plan.BillingPeriod field
- Registry.BillingPeriod(tier): "monthly" | "yearly"
- CanonicalTier(tier): strips "_yearly" suffix so the webhook can map yearly plan_ids back to the bare tier and teams.plan_tier stays cycle-agnostic.

Tests pin the mirror invariant (limits + features identical to base tier) and that yearly_price < monthly_price * 12 so the "save $X/yr" badge is honest.

* plans: yearly discount 17% -> 10% (hobby $97.20 / pro $529.20 / team $2149.20)

P2 shipped the yearly variants at ~17% off monthly. User feedback: 17% is too steep a give-up on annual revenue; standardize on 10% off across all three tiers to keep yearly attractive without leaving margin on the table.

New prices (annual amount in cents, stored in price_monthly_cents per the existing schema):
  hobby_yearly: 9000 -> 9720 ($90.00 -> $97.20)
  pro_yearly: 49000 -> 52920 ($490.00 -> $529.20)
  team_yearly: 199000 -> 214920 ($1990.00 -> $2149.20)

Each new price = (monthly * 12 * 0.9), giving an effective monthly rate of $8.10 / $44.10 / $179.10 respectively.

Tests:
- existing TestYearlyVariants_MirrorMonthlyLimits still passes (limits + features unchanged)
- existing TestYearlyPrices_DiscountedVsMonthlyTimesTwelve still passes
- new TestYearlyDiscountIsExactly10Percent locks the contract: (yearly / 12) / monthly == 0.9 +/- 0.01 for hobby/pro/team. Future price changes that drift the discount fail loudly.

Operator action required (not automatable from this PR): the existing RAZORPAY_PLAN_ID_HOBBY_YEARLY / _PRO_YEARLY / _TEAM_YEARLY env vars still point at the OLD prices in the Razorpay dashboard.
Operator must EITHER edit the 3 existing yearly plans in Razorpay to the new prices ($97.20, $529.20, $2149.20) OR create 3 new plans + rotate the env vars in the k8s secret. Until then, checkout will charge the old amounts even though the dashboard quotes the new ones.

Dashboard impact: none. The "Save $X/yr" badge reads PriceMonthly from the registry, so it auto-updates once this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plans: yearly back to '2 months free' (hobby $90 / pro $490 / team $1990)

Reverts common#7 (yearly @ 10% off) back to the original 17%-ish pricing expressed as exactly monthly x 10, the mathematical form of "2 months free".

Per PRICING-BEST-PRACTICES-2026-05-13.md (top recommendation #3, Athenic case study), the "2 months free" framing outperforms percentage-off copy by ~3.4x in conversion. To use that framing honestly we need yearly_cents == monthly_cents * 10.

- hobby_yearly: 9720 -> 9000 cents ($97.20 -> $90/yr)
- pro_yearly: 52920 -> 49000 cents ($529.20 -> $490/yr)
- team_yearly: 214920 -> 199000 cents ($2149.20 -> $1990/yr)

Tests:
- Renamed TestYearlyDiscountIsExactly10Percent -> TestYearlyIsTwoMonthsFree (asserts (yearly/12)/monthly == 10/12 within 0.01).
- Added TestYearlyIsExactlyMonthlyTimesTen: strict integer-cents lock so the "2 months free" claim is provable to the cent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plans: differentiate yearly discount: hobby 'save 1 month', pro/team '2 months free'

Hobby Annual is now $99/yr (= $9 x 11 = 8.3% off, "save 1 month").
Pro Annual stays $490/yr (= $49 x 10 = 17% off, "2 months free").
Team Annual stays $1990/yr (= $199 x 10 = 17% off, "2 months free").

Strategic intent: when a hobby user sees their annual savings is small but Pro Annual saves "2 months free / $98", the differential nudges them to tier-skip into Pro Annual rather than just upgrade frequency.
Tests:
- Split TestYearlyIsTwoMonthsFree into TestProAnnualIsTwoMonthsFree (pro+team only, 10/12 ratio) + TestHobbyAnnualIsOneMonthFree (hobby only, 11/12 ratio).
- Renamed TestYearlyIsExactlyMonthlyTimesTen to TestProTeamYearlyIsMonthlyTimesTen and added TestHobbyYearlyIsMonthlyTimesEleven for the new x11 lock.
- Added TestTierDiscountDifferentiation locking the strategic intent: pro_yearly_ratio < hobby_yearly_ratio (and same for team).

* plans: shared Rank() helper for tier ordering

Two package-private rank functions used to live in the api repo (internal/handlers/billing.go::tierRank and internal/handlers/admin_customers.go::adminTierRank). They had subtly different orderings: billing.go covered 6 tiers (anonymous .. team), admin_customers.go covered 4 (free .. team) and was off-by-one against billing for the same names. The discrepancy never bit production because the admin surface never sees anonymous/growth, but it's a footgun.

Promote a single canonical ordering here so all modules share one rank function. Returns -1 for unknown tiers; callers must guard against the sentinel when comparing ranks (a negative rank means "no transition direction"). Yearly variants are NOT auto-normalised; callers pass them through CanonicalTier first if they want "pro_yearly" to rank as "pro".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plans: add hobby_plus tier, $19/mo mid-step between Hobby and Pro (W11) (#11)

Inserts a new hobby_plus tier between hobby ($9) and pro ($49):
- 2 deployment apps (vs hobby's 1)
- custom_domains: true (first paid tier with this feature)
- 5 GB object storage, 1 GB MongoDB, multi-env vault (50 entries)
- 14-day backups with 1-click restore (vs hobby's 7d, no restore)
- $199/yr annual variant (hobby_plus_yearly, ~13% discount)

Research-backed pricing decoy: triple-tier $9/$19/$49 lifts conversion ~22% vs $9/$49 by anchoring against the middle price.
Rank ordering: anonymous=0, free=1, hobby=2, hobby_plus=3, growth=4, pro=5, team=6. Every previous upgrade transition still resolves as "upgrade" because the relative ordering is preserved (only absolute values shifted).

Also removes the legacy TrialDays field from Plan + Registry to keep common in lockstep with the api (which removed trial in W10).

* plans: add custom_domains_max per-tier cap (FIX-G) (#12)

Adds Limits.CustomDomainsMax field + Registry.CustomDomainsMaxLimit() method so handlers can enforce a per-team count cap on custom hostnames.

Tier ladder (mirrors defaultYAML and api/plans.yaml):
  anonymous / free / hobby / hobby_yearly = 0 (feature off; boolean trips first)
  hobby_plus / hobby_plus_yearly = 1 (first tier with the feature)
  growth = 3
  pro / pro_yearly = 5
  team / team_yearly = 50 (effectively unlimited for dashboards)

Closes BugBash U10 / #128. Previously the boolean Features.CustomDomains flag was the only gate, letting any Hobby Plus+ team bind an unbounded number of hostnames. Pairs with api PR that enforces the cap in custom_domain.go before the row insert.

Tests:
- TestCustomDomainsMaxLimit locks the per-tier numbers above.
- TestCustomDomainsMax_PairedWithBooleanFlag guards the invariant that custom_domains_max > 0 always pairs with features.custom_domains:true (and vice versa); drift between the two is dead code or unreachable capacity.

* plans: add rpo_minutes / rto_minutes per-tier (FIX-H #Q50) (#13)

Adds two Limits fields surfaced on GET /api/v1/capabilities so an agent can reason about a tier's durability promises before provisioning. Pairs with the FIX-H api/worker backup-integrity work: the api handler reads RPOMinutes/RTOMinutes via the new Registry methods.

Anonymous/free return 0 ("not promised") because those tiers don't take scheduled backups; hobby/hobby_plus = 1440/30, pro/team = 60/15. No yaml updates here: plans.yaml lives in the api repo and FIX-H ships the values there in the same wave.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plans: Pro storage bump + Growth bump (PRICING-AUDIT-2026-05-15)

Pro: postgres 5→10 GB, vector 5→10 GB, redis 256→512 MB, mongo 2→5 GB, object 10→50 GB. Same $49/mo. Defensible against Supabase Pro ($25/8 GB PG/100 GB object) on a 30-second side-by-side.

Growth: postgres + vector 5→20 GB, redis 256→1024 MB so the tier ladder stays ordered above Pro.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* plans: hobby_plus rolled back to production-only vault envs

W12 pricing pass (2026-05-15): multi-env is Pro+. Mirrors the api/plans.yaml change and updates TestHobbyPlus_TierMatrix + TestVaultEnvsAllowed_HobbyIsProductionOnly to assert the new production-only posture. Code gate lives in api/internal/handlers/stack.go::multiEnvTierAllowed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(plans): add QueueCount limit field + QueueCountLimit() method (A6)

Adds `queue_count: int` to the Limits struct and `QueueCountLimit(tier string) int` to Registry. The zero-value fallback treats absent fields as unlimited (-1) for backward compatibility with YAML files that predate this change.

queue_count values in defaultYAML:
  anonymous/free/growth/team/team_yearly: -1 (unlimited)
  hobby/hobby_yearly: 3
  hobby_plus/hobby_plus_yearly: 5
  pro/pro_yearly: 20

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(plans): correct growth/pro tier-rank inversion

P1, BUGHUNT-REPORT-2026-05-17-round2: the canonical Rank table had growth=4, pro=5, i.e. growth ranked BELOW pro. This contradicted plans.yaml pricing (pro $49/mo < growth $99/mo) and the worker's billingTierRankMap (pro=4, growth=5). The api consumes common's Rank, the worker uses its own table; the two disagreed, so an automatic plan transition could be misclassified as an upgrade when it was a downgrade (and vice versa).
Rank is now anchored to price: anonymous 0, free 1, hobby 2, hobby_plus 3, pro 4, growth 5, team 6, matching the worker.

rank_test.go updated: TestRank_AllStandardTiers / _MonotonicallyIncreasing / _CaseInsensitive reflect the corrected order; new TestRank_ProRanksBelowGrowth pins pro < growth < team explicitly so the inversion cannot regress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(plans): correct hobby_yearly price drift in defaultYAML

defaultYAML's hobby_yearly block had price_monthly_cents: 9900, while api/plans.yaml (the source of truth, confirmed correct against the instanode-web PricingPage FIX-K note "$90/yr = $7.50/mo") holds 9000. defaultYAML is documented to be a byte-mirror of api/plans.yaml.

Diffed all four _yearly blocks (hobby_yearly, hobby_plus_yearly, pro_yearly, team_yearly): only the hobby_yearly price disagreed; every other yearly-block price and limit field was already in sync.

The 9000 value puts hobby_yearly at hobby x10 ("save 2 months"), which contradicted three tests that pinned the stale x11 "save 1 month" model (TestHobbyAnnualIsOneMonthFree, TestHobbyYearlyIsMonthlyTimesEleven, TestTierDiscountDifferentiation). Since plans.yaml is authoritative, those tests encoded the drift and are replaced:
- TestHobbyAnnualIsTwoMonthsFree (10/12 ratio for hobby)
- TestYearlyIsMonthlyTimesTen (x10 lock for hobby/pro/team)
- TestTierDiscountUniformity (uniform 10/12 across core tiers)
- TestHobbyPlusYearlyDiscount (hobby_plus's distinct mid-discount)

Added TestHobbyYearlyPriceIsPinned, a value-pinning guard that fails if defaultYAML's hobby_yearly price drifts off 9000 again.

go build ./... / go vet ./... / go test ./... -count=1 all pass.
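
A minimal sketch of the price-anchored rank table from the rank-inversion fix above, combined with the `CanonicalTier` normalisation and the `-1` sentinel guard that the log asks callers to apply. The map layout, the lower-casing (inferred from the `_CaseInsensitive` test name), and the `Direction` helper are assumptions for illustration, not the package's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// tierRank mirrors the corrected ordering quoted in the commit log.
var tierRank = map[string]int{
	"anonymous": 0, "free": 1, "hobby": 2, "hobby_plus": 3,
	"pro": 4, "growth": 5, "team": 6,
}

// Rank returns the tier's position, or -1 for unknown tiers.
func Rank(tier string) int {
	if r, ok := tierRank[strings.ToLower(tier)]; ok {
		return r
	}
	return -1
}

// CanonicalTier strips the "_yearly" suffix so "pro_yearly" maps to "pro".
func CanonicalTier(tier string) string {
	return strings.TrimSuffix(tier, "_yearly")
}

// Direction is a hypothetical caller-side helper: it normalises yearly
// variants first and refuses to classify when either rank is the sentinel.
func Direction(from, to string) string {
	a, b := Rank(CanonicalTier(from)), Rank(CanonicalTier(to))
	switch {
	case a < 0 || b < 0:
		return "unknown"
	case b > a:
		return "upgrade"
	case b < a:
		return "downgrade"
	default:
		return "same"
	}
}

func main() {
	fmt.Println(Direction("pro_yearly", "growth"))
	fmt.Println(Direction("growth", "pro"))
	fmt.Println(Direction("pro", "enterprise"))
}
```

Without the sentinel guard, an unknown tier at `-1` would compare as lower than `anonymous`, so any move away from it would be classified as an upgrade.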
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(plans): add rpo_minutes/rto_minutes to every defaultYAML tier

BugBash 2026-05-18 P2-W2-41: common/plans.go's defaultYAML const set no rpo_minutes/rto_minutes on any tier block, so plans.Default() reported RPO=RTO=0 for every tier, including Pro/Team whose real values are 60/15. The Limits.RPOMinutes/RTOMinutes struct fields and the RPOMinutes()/RTOMinutes() accessors already existed; only the embedded YAML was missing the keys.

GET /api/v1/capabilities is served from a Default()-backed registry in any environment without a plans.yaml file present, so an agent reasoning about a workload's durability requirement got a false "not promised" (0/0) signal for paid tiers.

- Add rpo_minutes/rto_minutes to all 11 tier blocks in defaultYAML, matching api/plans.yaml exactly (anon/free 0/0, hobby* 1440/30, pro*/team*/growth 60/15).
- Re-verified the whole defaultYAML is a faithful mirror of api/plans.yaml: programmatic limits/features/price/billing_period diff is now clean (audience is YAML-only metadata, no struct field).
- Add TestRPORTOMinutes_DefaultYAMLMatchesAPIPlansYAML, a registry-iterating regression test that fails if a new tier is added without RPO/RTO coverage or if Pro's values regress to 0.

Symptom: plans.Default() RPOMinutes/RTOMinutes == 0 for all tiers
Enumeration: grep -c 'rpo_minutes:' plans/plans.go (was 0, now 11)
Sites found: 11 tier blocks
Sites touched: 11
Coverage test: TestRPORTOMinutes_DefaultYAMLMatchesAPIPlansYAML

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(resourcestatus): canonical ResourceStatus enum + expiry-stage derivation

BugBash flagged "expiry-stage predicate divergence": api and worker each carried independently-drifting hand-written predicates for resource status (active/paused/suspended/expired/deleted) and for the expiry-warning stage derived from expires_at vs now.
New package instant.dev/common/resourcestatus is the single source of truth:
- Status enum + Valid/IsActive/IsPaused/IsSuspended/IsExpired/IsDeleted/IsTerminal/IsReapable predicates, AllStatuses(), Parse(), ReapableStatuses()
- ExpiryStage enum (none/12h/6h/1h/past-ttl) + DeriveExpiryStage(), HoursUntilExpiry(), IsPastTTL(): the worker's selectStage/hoursLeft logic centralised, P2-12 "most-imminent-bucket-wins" behaviour preserved

Exhaustive tests TestStatusPredicates_ExhaustiveOverEnum and TestDeriveExpiryStage_ExhaustiveOverStagesAndBoundaries iterate AllStatuses()/AllExpiryStages(); adding an enum value without handling it fails the build.

Cross-repo contract change (CLAUDE.md rule 22): api + worker convert to this package in follow-up commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(resourcestatus): add StatusPending for two-phase provision lifecycle

MR-P0-2 (BugBash 2026-05-20). The api's provisioner_reconciler sweeps `WHERE status='pending'` to recover rows stranded by an api crash mid-provision, but no code ever wrote 'pending': every CreateResource INSERT landed on the column DEFAULT 'active' immediately, so the crash-recovery subsystem was dead code that matched zero rows.

Add the StatusPending constant + IsPending predicate + cases in AllStatuses/Valid so the api side can insert pending and flip to active only after the backend provision RPC + persistence succeed. Pending is NOT reapable (the reconciler, not the TTL reaper, handles a stranded pending row) and NOT terminal.

Update the exhaustive-status table test to add the StatusPending case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* storageprovider: cloud-agnostic storage credential abstraction

Define the StorageCredentialProvider interface so /storage/new can switch from DO Spaces shared-master-key to Cloudflare R2 prefix-scoped tokens (or AWS S3 STS sessions) via OBJECT_STORE_BACKEND env flip + data migration, with no application code changes.

Per STORAGE-ABSTRACTION-DESIGN-2026-05-20.md:

  Provider           PrefixScoped  STS  BucketPerTenant  MaxKeys
  ─────────────────  ────────────  ───  ───────────────  ─────────
  do-spaces (today)  no            no   ~100/account     200
  r2                 yes           yes  yes              unbounded
  s3 (skeleton)      yes           yes  yes              unbounded

Each impl reports its actual capabilities; the api's POST /storage/new consults Capabilities() to pick credential vs broker mode. The S3 impl is skeleton-only: session-policy assembly is real and tested, AWS SDK wiring is injected via SetAssumeRoleFunc. The MinIO impl lives in api/ so common stays free of madmin-go transitive deps.

Tests (CLAUDE.md rule 18: registry-iterating, not hand-typed):
- contract_test.go iterates ListRegistered() and validates every backend satisfies the interface
- dospaces_test.go: capability shape, shared-master-key issuance
- r2_test.go: mocks Cloudflare R2 API; asserts the buckets/keys request body carries parameters.prefixes (prefix-scoping) AND the temp-creds request carries ttlSeconds + session token
- s3_test.go: stub AssumeRole; asserts session policy carries Condition.StringLike.s3:prefix = <token>/*

build/vet/test green on instant.dev/common.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(queueprovider): per-tenant queue isolation interface + 4 backends

MR-P0-5 (NATS per-tenant isolation, 2026-05-20). Held architecture P0. See NATS-ISOLATION-MIGRATION-2026-05-20.md in repo root for the design doc.

# What this adds

`common/queueprovider/`: provider-agnostic interface for per-tenant queue credential issuance, mirroring the `common/storageprovider/` pattern.
Implementations:
- nats/: real impl, NATS operator-mode (per-tenant accounts + signed user JWTs via nats-io/nkeys + nats-io/jwt/v2). Falls back to legacy_open transparently when no operator seed is configured, so api can deploy BEFORE the operator runs `nsc generate`.
- rabbitmq/: skeleton; ErrNotImplemented. Portability proof.
- kafka/: skeleton; ErrNotImplemented. Portability proof.
- legacyopen/: cutover shim returning no creds (grandfathered behavior).

# Why

NATS in `instant-data` runs unauthenticated. Any pod in the cluster can dial nats://nats.instant-data.svc.cluster.local:4222 and read/write every other tenant's subjects + JetStream streams. The "subject prefix derived from token" pattern is naming convention, not isolation.

Post-cutover: tenant accounts are signed by the operator key; each tenant gets its own NATS account = its own JetStream namespace = its own subject namespace. Cross-tenant pub/sub is denied at the server.

# Tests

- contract_test.go iterates every registered backend (CLAUDE.md rule 18), no hand-typed slices.
- nats/nats_test.go verifies (a) IssueIsolatedCredentials mints a valid user JWT with subject-scoped permissions, (b) two tenants get DISJOINT subject allow-lists (the breach we're fixing), (c) TTL applies to user JWT expiry, (d) Revoke pushes an updated account claim.

# Coverage block

Symptom: NATS unauthenticated cross-tenant access
Enumeration: rg -F 'nats://' across all 6 repos (see design doc)
Sites found: ~36 hits across api/worker/provisioner/common/infra/dashboard
Sites touched: common/queueprovider lands the interface; this PR ships common only. api wires the interface in a paired PR.
Coverage test: TestRegistry_AllProvidersSatisfyContract + TestNATS_TwoTenants_DisjointSubjectPermissions
Live verified: pending operator key generation (needs operator action)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* common: add readiness package for deep /readyz checks

Shared library for the api / worker / provisioner deep readiness probe. Each service mounts a /readyz handler that runs component-by-component checks (platform_db, brevo, razorpay, do_spaces, provisioner_grpc, river, etc.) in parallel under a per-check 10s cache, then derives overall=ok|degraded|failed per the per-service criticality matrix.

Wired to k8s readinessProbe (not livenessProbe: a Brevo outage MUST NOT SIGKILL every api pod). A failed critical check returns 503 so kubelet pulls the pod from the Service endpoints; a failed non-critical check returns 200 + overall=degraded so the pod keeps serving while the NR alert fires for the operator.

This is the surface the Brevo silent-rejection bug from 2026-05-20 would have caught weeks earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(storageprovider): accept shared-key / shared-master-key as do-spaces aliases

Live prod deploys OBJECT_STORE_BACKEND=shared-key (legacy naming from api/internal/config.go mode-resolution), which previously failed NormalizeBackend() and forced the factory to fall back to ErrUnknownBackend. This commit teaches the factory to collapse "shared-key" / "shared_key" / "sharedkey" / "shared-master-key" / "shared_master_key" onto "do-spaces", matching the storage-mode label surfaced in /storage/new responses.
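
The alias collapsing described above can be sketched as a single switch. The signature of `NormalizeBackend`, the error text, and the lower-casing/trimming step are assumptions (the log names the function and the error but does not show them); the alias list and canonical names are taken from the log.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrUnknownBackend is a stand-in for the sentinel error named in the log.
var ErrUnknownBackend = errors.New("storageprovider: unknown backend")

// NormalizeBackend maps an operator-supplied backend label onto a canonical
// identifier. The (string, error) return shape is assumed for this sketch.
func NormalizeBackend(raw string) (string, error) {
	switch strings.ToLower(strings.TrimSpace(raw)) { // normalisation is an assumption
	case "do-spaces",
		"shared-key", "shared_key", "sharedkey",
		"shared-master-key", "shared_master_key":
		return "do-spaces", nil
	case "r2":
		return "r2", nil
	case "s3":
		return "s3", nil
	case "minio":
		return "minio", nil
	default:
		// Empty and unknown values are rejected rather than defaulted.
		return "", ErrUnknownBackend
	}
}

func main() {
	for _, in := range []string{"shared-key", "r2", ""} {
		backend, err := NormalizeBackend(in)
		fmt.Printf("%q -> %q (unknown=%v)\n", in, backend, errors.Is(err, ErrUnknownBackend))
	}
}
```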
Coverage block (per CLAUDE.md rule 17):
Symptom: live OBJECT_STORE_BACKEND=shared-key didn't match factory enum
Enumeration: grep -rn 'NormalizeBackend\|OBJECT_STORE_BACKEND' common/ api/
Sites found: 2 (factory.go switch + contract_test.go cases)
Sites touched: 2
Coverage test: TestNormalizeBackend covers shared-key + variants
Live verified: next deploy of api will boot cleanly with the existing k8s secret instead of crashing on unknown-backend.

Closes P1 from DOC-REALITY-DELTA-2026-05-20.md §3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* storageprovider: B17-P1 godoc fix + canonical Backend constants

Two B17 BugBash findings for the SDK-side storage abstraction:

1. Config.Backend godoc claimed "empty or unknown values land on minio". The implementation actually returns ErrUnknownBackend for empty/unknown Backend values (deliberately: defaulting to a real provider has masked operator misconfiguration in the past). Godoc updated to match the shipped behavior and explain why empty is rejected loudly.

2. Canonical Backend identifiers exported as constants (BackendDOSpaces / BackendR2 / BackendS3 / BackendMinIO) so callers can compare against typed names instead of stringly-typed magic strings. BackendSharedKey kept as a Deprecated: alias for legacy operator configs that emitted "shared-key"; NormalizeBackend collapses it to BackendDOSpaces, so both reach the same implementation.

Gate green: go build / vet / test ./storageprovider/... all PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* storageprovider: B17 P2/P3 sweep: hardened sanitiser + Capabilities docs

Closes the storage-broker P2/P3 findings from BUGBASH-2026-05-20 (B17). P0/P1s on the broker route (rate-limit, auth, signing key) ship separately in the api repo (they touch handler middleware, not common).

Fixes in this commit:

- B17-STORAGE-P2-14: Add common/storageprovider/sanitise.go with SanitiseTenantKey(in string) string.
The api-side legacy `sanitisePresignKey` covers `..`, `.`, leading `/` and double-slash but not the shapes the audit flagged:
- URL-encoded `..` (%2e%2e, %2E%2E, ..%2f, mixed case, double-encoded)
- NUL bytes (raw \x00 and percent-encoded %00) anywhere in the key
- Windows-style \\\\ separators that minio-go treats as literals
- Mixed Unicode dots (documented as NOT collapsed; homoglyphs like U+2025 are regular key segments)

Sanitisation is conservative: `.` / `..` components are DROPPED, never path-resolved. That's strictly safer than path.Clean (which would pop a legitimate parent segment if a tenant snuck `..` past the decoder).

Tests cover 25+ traversal shapes and pin three invariants:
- no leading slash on output
- no `.` or `..` component survives
- no NUL byte survives

The api's legacy sanitiser is kept for now; migration of callsites is a separate slice. This commit is the canonical helper + coverage.

- B17-STORAGE-P2-16: Document the previously "dead" Capabilities fields (ServerAccessLogs, MaxKeysPerAccount) explicitly as INFORMATIONAL ONLY. Both are populated by every backend impl (do-spaces 200, r2 0, s3 0, minio 0) but consumed by no routing code today. The doc now spells out why they exist (operator audits + future credential-pool / cap-alert hooks have one source of truth) and tells readers NOT to branch routing decisions on them. Avoids the next reviewer concluding they're dead and removing them, breaking forward-compat for consumers that started reading the fields after the abstraction shipped.
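
The drop-don't-resolve behaviour described for B17-STORAGE-P2-14 above can be illustrated with a short sketch. This is not the shipped `sanitise.go`: the bounded decode loop, the choice to treat backslashes as separators, and the five-round limit are assumptions made to cover the listed shapes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// SanitiseTenantKey is an illustrative re-implementation. It decodes
// percent-escapes, strips NUL bytes, and DROPS "." / ".." components
// instead of resolving them.
func SanitiseTenantKey(in string) string {
	s := in
	// Decode repeatedly so double-encoded shapes (%252e%252e) unwrap too.
	// Bounded at five rounds (an arbitrary choice for this sketch); a
	// malformed escape stops decoding and leaves the remainder literal.
	for i := 0; i < 5; i++ {
		dec, err := url.PathUnescape(s)
		if err != nil || dec == s {
			break
		}
		s = dec
	}
	s = strings.ReplaceAll(s, "\x00", "") // raw and decoded %00
	s = strings.ReplaceAll(s, `\`, "/")   // assumption: treat backslash as a separator

	var kept []string
	for _, part := range strings.Split(s, "/") {
		if part == "" || part == "." || part == ".." {
			continue // dropped, never popped against the previous segment
		}
		kept = append(kept, part)
	}
	// Joining non-empty parts guarantees no leading slash and no "//".
	return strings.Join(kept, "/")
}

func main() {
	for _, in := range []string{"/a/../b", "%2e%2e/%2E%2E/etc", "..%2fsecret", "a%00b"} {
		fmt.Printf("%q -> %q\n", in, SanitiseTenantKey(in))
	}
}
```

The difference from `path.Clean` shows up on `a/../b`: `path.Clean` resolves it to `b`, popping the legitimate `a` segment, while dropping the `..` component keeps `a/b`.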
Coverage block per CLAUDE.md rule 17:
Symptom: path-traversal sanitiser missing URL-encoded / NUL / Windows-separator shapes (B17-STORAGE-P2-14) + dead Capabilities fields with no consumer (B17-STORAGE-P2-16)
Enumeration: `grep -rn sanitisePresignKey api/` (1 site, kept) + `grep -rn 'ServerAccessLogs\\|MaxKeysPerAccount'` (5 sites: provider.go + 4 backend impls; doc-only change, no behavior delta)
Sites found: 2 sanitisers + 5 Capabilities field references
Sites touched: 1 new canonical sanitiser in common (api-side migration deferred; sanitise.go is the canonical surface; api's legacy sanitisePresignKey is documented in api/internal/handlers/storage_presign.go and will be swapped in a follow-up slice) + provider.go godoc
Coverage test: TestSanitiseTenantKey_DefenseInDepth (25 cases) + TestSanitiseTenantKey_NoLeadingSlash + TestSanitiseTenantKey_NoTraversalComponentSurvives + TestSanitiseTenantKey_StripsNUL
Gates green: go build ./... clean / go vet ./... clean / go test ./... -count=1 PASS (all 12 packages green; ok instant.dev/common/storageprovider 4.398s)
Live verified: Library change; api/worker/provisioner pick it up on their next CI run (they depend on instant.dev/common via go.mod replace or version bump).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(plans): B6-P3: growth.deployments_apps 5 → 50

Pro's deployments_apps = 10; the previous Growth value of 5 placed Growth ($99/mo) BELOW Pro ($49/mo) on a customer-facing dimension. Bumped to 50: preserves tier-ladder ordering above Pro while staying short of Team's unlimited (-1).

Kept synchronised with api/plans.yaml (the api repo's wave-3 consolidated commit also flips the value); the api's tier-ladder invariants pinning test loads api/plans.yaml directly, so this commit only affects the embedded defaultYAML fallback used in package-default tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* security(readiness): redact secrets in scrub() before truncation

Wave-3 audit P1, 2026-05-21. scrub() in common/readiness/checks.go truncated upstream errors to 80 chars but did NOT redact credential fragments. A real-world pq error like 'password authentication failed for user "instant" password=...' would surface verbatim via the publicly-reachable /readyz endpoint on api/worker/provisioner.

Affects two callsites: PingDB, PingRedis. HTTPHeadCheck + GRPCHealth already used scrubNetError which maps to a fixed enum.

Fix:
- Redact BEFORE truncate. Truncate-first leaks credentials that land in the first 80 chars of the upstream message.
- Package-level regexp registry covers: pq password=/passwd=/pwd= kv pairs, URL-embedded credentials (scheme://user:pass@host), pq 'for user "..."' username leak (semi-sensitive), Authorization: Bearer/Basic, known secret-shape prefixes (xkeysib-, sk-, rzp_), catch-all 32+ hex.

Tests (CLAUDE.md rule 18: registry-iterating, not hand-typed):
- TestScrub_RedactsDBPassword, _URLCredentials, _Bearer, _HexSecrets, _KnownPrefixes: per-pattern unit assertions
- TestScrub_RedactsBeforeTruncating: pins the load-bearing redact-before-truncate invariant
- TestScrub_RegistryWalk: 15-row registry walks every shape; a new secretPatterns entry without a registry row trips review
- TestPingRedis_RedactsCredentialsEndToEnd: exercises the public callsite end-to-end via fakePinger
- TestScrub_TruncatesAfterRedaction / _TrimsWhitespace / _PreservesNonSecretShape: defensive regression coverage

Coverage block:
Symptom: /readyz last_error leaked DB/URL/Bearer creds
Enumeration: rg -F 'scrub(' common/readiness
Sites found: 2 (PingDB, PingRedis)
Sites touched: 2; fix is in scrub() itself; both callers inherit
Coverage test: TestScrub_RegistryWalk + TestPingRedis_RedactsCredentialsEndToEnd
Live verified: /readyz JSON shape: last_error empty in healthy state on api/worker/provisioner;
degraded paths will now redact ExportForTest pattern keeps the scrub() helper unexported in production binaries while letting external _test packages assert on the raw output directly. Gate: cd common && go build ./... && go vet ./... && go test ./readiness/... -count=1 -race ALL GREEN (24 tests inc. 15 registry rows). Pre-existing plans/TestDeploymentsAppsLimit_Tiers failure is from cc97d4f (growth 5→50) and out of scope for this security fix. * fix(bugbash 2026-05-21): NATS AccountSeed for post-restart revocation + test alignment (#14) * fix(queueprovider/nats): A04-F3 — expose AccountSeed for post-restart revocation Migration 060 added resources.queue_account_seed_encrypted to make NATS account revocation survive a provisioner pod restart, but IssueTenantCredentials was discarding the freshly-minted account seed (`_ = accountSeed`). Without the seed reaching the api caller, the column was never populated and RevokeWith Seed could never re-sign the account claim after a restart wiped the in-memory accountCache. This change: - Adds TenantCreds.AccountSeed (documented as a secret; NEVER log). - Populates AccountSeed in nats.IssueTenantCredentials. - Adds round-trip test proving RevokeWithSeed works without accountCache (simulates the post-restart path that migration 060 was built for). Cross-repo: api + worker must (a) bump common, (b) AES-256-GCM-encrypt AccountSeed via the existing keyring and persist to queue_account_seed_ encrypted, (c) decrypt + pass to RevokeWithSeed on teardown. Tracked separately. Forward-compatible: AccountSeed is only populated on isolated provisions, so legacy_open prod is unaffected. 
Coverage block (rule 17):
  Symptom: queue_account_seed_encrypted always NULL; revocation no-ops post-restart
  Enumeration: rg -n 'AccountSeed|queue_account_seed_encrypted' common/
  Sites found: 3 (TenantCreds field, IssueTenantCredentials return, RevokeWithSeed param)
  Sites touched: all 3 (RevokeWithSeed already accepted seed; populating it now activates the path)
  Coverage test: TestNATS_IssueExposesAccountSeed_AndRevokeWithSeed_RoundTrips

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): growth tier DeploymentsAppsLimit asserts 50 (wave-3 BugBash value)

Wave-3 BugBash bumped growth tier deployments_apps from 5 → 50 in plans.yaml; the test was not updated. Test fix only: plans.yaml + common/plans/plans.go defaultYAML are the authoritative source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: Tier 1 OSS security scanners

Adds GitHub-native + free OSS vulnerability scanners. 100% free for public repos.
- CodeQL with security-extended query suite
- Dependabot for gomod + github-actions
- govulncheck (Go reachability-filtered CVE scan)
- OSV-Scanner (cross-ecosystem CVE scan)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: scanner workflows clone sibling proto repo

The Tier 1 CodeQL + govulncheck workflows failed on PR #16 because common uses `replace instant.dev/proto => ../proto` in go.mod. Fix: each workflow now checks out common into ./common, plus clones the public sibling repo InstaNode-dev/proto.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(go): bump toolchain to 1.25.10 - fixes reachable stdlib CVEs

govulncheck on PR #16 flagged Go-stdlib vulnerabilities reachable from production code paths. All fixed in Go 1.25.9–1.25.10. Also merges any in-flight master commits onto the scanner-install branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
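
For readers following the security(readiness) commit above, the redact-before-truncate ordering can be sketched roughly as follows. This is an illustration only, not the code in common/readiness/checks.go: the `scrub(string) string` signature, the `secretPattern` struct, the `[REDACTED]` placeholder, and every regular expression here are assumptions chosen to mirror the shapes listed in the commit message. Only the names `scrub` and `secretPatterns` and the 80-char limit come from the commit text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maxLen mirrors the 80-char truncation mentioned in the commit message.
const maxLen = 80

// secretPattern is a hypothetical registry row: a regexp plus its replacement.
type secretPattern struct {
	name string
	re   *regexp.Regexp
	repl string
}

// secretPatterns is an illustrative registry; the real patterns may differ.
var secretPatterns = []secretPattern{
	// password=/passwd=/pwd= key-value pairs
	{"kv-password", regexp.MustCompile(`\b(password|passwd|pwd)=\S+`), "${1}=[REDACTED]"},
	// scheme://user:pass@host
	{"url-credentials", regexp.MustCompile(`([a-zA-Z][a-zA-Z0-9+.-]*://)[^/\s:@]*:[^/\s@]+@`), "${1}[REDACTED]@"},
	// pq 'for user "..."' username leak
	{"pq-user", regexp.MustCompile(`for user "[^"]*"`), `for user "[REDACTED]"`},
	// Authorization: Bearer/Basic tokens
	{"authorization", regexp.MustCompile(`\b(Bearer|Basic)\s+[A-Za-z0-9._~+/=-]+`), "${1} [REDACTED]"},
	// known secret-shape prefixes
	{"known-prefix", regexp.MustCompile(`\b(xkeysib-|sk-|rzp_)[A-Za-z0-9_-]+`), "${1}[REDACTED]"},
	// catch-all: 32 or more hex characters
	{"long-hex", regexp.MustCompile(`\b[0-9a-fA-F]{32,}\b`), "[REDACTED]"},
}

// scrub redacts every registered secret shape first and only then truncates,
// so a secret that would straddle the cut-off cannot leak a partial value.
func scrub(msg string) string {
	msg = strings.TrimSpace(msg)
	for _, p := range secretPatterns {
		msg = p.re.ReplaceAllString(msg, p.repl)
	}
	// Truncate by runes so a multi-byte character is never split.
	if r := []rune(msg); len(r) > maxLen {
		msg = string(r[:maxLen])
	}
	return msg
}

func main() {
	raw := `pq: password authentication failed for user "instant" password=hunter2`
	fmt.Println(scrub(raw))
}
```

If the two steps were swapped, a message whose secret begins before character 80 and ends after it would be cut mid-secret, and the remaining fragment might no longer match any pattern. That is the invariant TestScrub_RedactsBeforeTruncating is described as pinning.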

mastermanas805 merged commit 0a31ebb into master May 11, 2026

mastermanas805 deleted the feat/vault-plan-policy branch May 11, 2026 08:00

mastermanas805 merged 1 commit into master from feat/vault-plan-policy

mastermanas805 commented May 11, 2026
