feat(scripts): add Stripe admin backfills #2002
📝 Walkthrough
Introduces Stripe backfill automation infrastructure comprising two new CLI scripts that reconstruct database records: one backfills org conversion rates from org creation timestamps and global stats; the other backfills Stripe customer countries from the Stripe API. Includes shared utility functions and comprehensive unit tests.
Sequence Diagram(s)
sequenceDiagram
actor User
participant CLI as Backfill Script
participant Supabase as Supabase<br/>(Database)
participant Stripe as Stripe API
User->>CLI: Run with --from/--to dates (--apply flag)
CLI->>Supabase: Load global_stats rows in date range
CLI->>Supabase: Load orgs.created_at timestamps
CLI->>CLI: Paginate & transform:<br/>For each global_stats row,<br/>calculate next_rate & orgs count
CLI->>CLI: Identify changed rows<br/>(delta > 0.0001)
alt Dry-run mode
CLI-->>User: Report sample changes,<br/>row counts
else Apply mode
CLI->>Supabase: Update changed rows<br/>(concurrent batches)
Supabase-->>CLI: Confirm updates
CLI-->>User: Report final totals
end
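The counting and change-detection steps in the diagram above might look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: the helper names `countOrgsCreatedThrough` and `rateChanged` are hypothetical, org timestamps are assumed to be pre-sorted epoch milliseconds, and a missing current rate is assumed to count as a change. Only the 0.0001 threshold comes from the walkthrough.

```typescript
// Hypothetical helpers: names, shapes, and the null handling are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000

/** Count orgs created before the end of the given UTC day (YYYY-MM-DD). */
function countOrgsCreatedThrough(sortedCreatedAtMs: readonly number[], dateId: string): number {
  const endOfDayMs = Date.parse(`${dateId}T00:00:00.000Z`) + DAY_MS
  // Binary search for the first timestamp that is >= endOfDayMs.
  let lo = 0
  let hi = sortedCreatedAtMs.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    const value = sortedCreatedAtMs[mid] as number
    if (value < endOfDayMs)
      lo = mid + 1
    else
      hi = mid
  }
  return lo
}

/** A row counts as changed when the stored and recomputed rates differ by more than epsilon. */
function rateChanged(current: number | null, next: number, epsilon = 0.0001): boolean {
  if (current === null)
    return true
  return Math.abs(current - next) > epsilon
}
```

Sorting the timestamps once and binary-searching per date keeps the per-row cost logarithmic, which matters when every `global_stats` row in the range needs its own org count.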
sequenceDiagram
actor User
participant CLI as Backfill Script
participant Supabase as Supabase<br/>(Database)
participant Stripe as Stripe API
participant FileSystem as File System
User->>CLI: Run with optional --customer-id, --limit (--apply flag)
CLI->>Supabase: Load stripe_info rows<br/>(scope by customer if provided)
loop Concurrent batches (configurable)
CLI->>Stripe: Fetch customer address for each ID
Stripe-->>CLI: Return customer data or error
CLI->>CLI: Extract & normalize<br/>country code to ISO-2
CLI->>CLI: Decide: update if missing<br/>or refresh mode enabled
end
CLI->>FileSystem: Write failures to<br/>./tmp/backfill_failures.json
alt Dry-run mode
CLI-->>User: Report sample from→to changes
alt Any failures
CLI-->>CLI: Throw error
end
else Apply mode
CLI->>Supabase: Write updates (concurrent)
Supabase-->>CLI: Confirm writes
CLI-->>User: Report update totals
alt Any failures
CLI-->>CLI: Throw error
end
end
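The "extract and normalize" and "decide" steps in the diagram above could be sketched as follows. Both function names are hypothetical, and the exact rules (what counts as missing, how refresh mode behaves when the value is unchanged) are assumptions rather than the PR's actual logic.

```typescript
// Hypothetical helpers illustrating the normalize/decide steps; not the PR's code.

/** Normalize a raw country value to an upper-case two-letter code, or null if it does not fit. */
function normalizeCountryCode(raw: string | null | undefined): string | null {
  const code = raw?.trim().toUpperCase()
  return code !== undefined && /^[A-Z]{2}$/.test(code) ? code : null
}

/** Update when the stored value is missing, or when refresh mode is on and the value differs. */
function shouldUpdateCountry(current: string | null, next: string | null, refresh: boolean): boolean {
  if (!next)
    return false // nothing usable came back from Stripe
  if (!current)
    return true // fill a missing value
  return refresh && current !== next
}
```

The shape check only confirms that the value looks like a two-letter code; it does not verify that the code is actually assigned in ISO 3166-1.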
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c53aeef28
  .order('created_at', { ascending: true })
  .range(offset, offset + DEFAULT_PAGE_SIZE - 1)
Make org pagination deterministic for backfill reads
Ordering orgs only by created_at and paginating with range(offset, ...) can skip or duplicate rows when multiple orgs share the same timestamp, because tie ordering is not stable across pages. In that case the reconstructed denominator becomes incorrect and org_conversion_rate updates are wrong for affected dates. This is especially likely on historical bulk inserts where many rows have identical created_at values; include a unique secondary sort key (or keyset pagination) to guarantee deterministic paging.
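A minimal in-memory sketch of the keyset approach mentioned above: rows are ordered by the pair `(created_at, id)` and each page starts strictly after the last row of the previous page, so ties on `created_at` cannot be skipped or repeated. The `OrgRow` shape, the `id` column, and the helper names are assumptions for illustration. In the real script the same idea would be expressed in the database query, for example by adding a second sort key on a unique column.

```typescript
// Assumed row shape; the real orgs table may use different column names.
interface OrgRow { id: string, created_at: string }

// Total order on (created_at, id). Assumes created_at strings share one ISO-8601 format,
// so lexicographic comparison matches chronological order.
function compareOrgs(a: OrgRow, b: OrgRow): number {
  if (a.created_at !== b.created_at)
    return a.created_at < b.created_at ? -1 : 1
  if (a.id === b.id)
    return 0
  return a.id < b.id ? -1 : 1
}

// In-memory stand-in for a database query: rows strictly after the cursor, in order.
function fetchPageAfter(all: readonly OrgRow[], cursor: OrgRow | null, pageSize: number): OrgRow[] {
  return [...all]
    .sort(compareOrgs)
    .filter(row => cursor === null || compareOrgs(row, cursor) > 0)
    .slice(0, pageSize)
}

// Walk every page, carrying the last row seen as the cursor.
function readAllOrgs(all: readonly OrgRow[], pageSize: number): OrgRow[] {
  const result: OrgRow[] = []
  let cursor: OrgRow | null = null
  while (true) {
    const page: OrgRow[] = fetchPageAfter(all, cursor, pageSize)
    if (page.length === 0)
      break
    result.push(...page)
    cursor = page[page.length - 1] as OrgRow
  }
  return result
}
```

Because the cursor is a value rather than an offset, the result also stays stable if rows are inserted between page reads.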
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/admin_stripe_backfill_utils.ts`:
- Around line 111-113: The isActionableStripeCustomerId helper currently treats
any trimmed, non-`pending_` value as valid; change it to only accept real Stripe
customer IDs by ensuring the trimmed value both startsWith('cus_') and is not
prefixed with 'pending_'. Update the function isActionableStripeCustomerId to
return true only when customerId is non-null/defined, trimmed,
startsWith('cus_'), and does not startWith('pending_') so malformed IDs like
'sub_' or 'acct_' are rejected before calling stripe.customers.retrieve.
In `@scripts/backfill_org_conversion_rate_trend.ts`:
- Around line 237-245: The asyncPool worker currently lets a single rejected
updateConversionRate() reject the whole pool and leaves in-flight tasks
untracked; instead, wrap the per-row call to updateConversionRate(supabase, row)
in a try/catch inside the asyncPool callback, push failed row identifiers (e.g.,
row.date_id or the full row) into a local failures array, increment updated only
on success, and continue so other in-flight tasks can finish; after asyncPool
resolves, log or throw a summary using failures and updated to surface which
date_id(s) failed and ensure the script exits non-zero if you need a failing CI
signal.
- Around line 46-49: assertDateId currently only checks DATE_ID_REGEX; update it
to also reject impossible calendar dates by parsing the YYYY-MM-DD parts and
validating them: split value into year, month, day, construct a Date (or use a
reliable date library) and confirm the Date is valid and that its year/month/day
match the parsed components (to catch things like 2026-02-31), and throw the
same Error(`${label} must use YYYY-MM-DD and be a valid date`) when invalid;
keep the original function name assertDateId and leave DATE_ID_REGEX in place
for the shape check so fetchGlobalStatsRows will never receive nonsensical date
strings.
📒 Files selected for processing (5)
- package.json
- scripts/admin_stripe_backfill_utils.ts
- scripts/backfill_org_conversion_rate_trend.ts
- scripts/backfill_stripe_customer_countries.ts
- tests/admin-stripe-backfill-scripts.unit.test.ts
export function isActionableStripeCustomerId(customerId: string | null | undefined) {
  const trimmedCustomerId = customerId?.trim()
  return !!trimmedCustomerId && !trimmedCustomerId.startsWith('pending_')
Only accept real Stripe customer IDs here.
This helper currently treats any non-empty, non-pending_ value as actionable. In scripts/backfill_stripe_customer_countries.ts, that is the last gate before stripe.customers.retrieve(...), so malformed IDs like sub_... or acct_... become avoidable failures and can abort the run. Filter explicitly to cus_ IDs.
Suggested fix
  export function isActionableStripeCustomerId(customerId: string | null | undefined) {
    const trimmedCustomerId = customerId?.trim()
-   return !!trimmedCustomerId && !trimmedCustomerId.startsWith('pending_')
+   return !!trimmedCustomerId
+     && trimmedCustomerId.startsWith('cus_')
+     && !trimmedCustomerId.startsWith('pending_')
  }
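As a quick illustration of the intended behavior, here is a standalone sketch of the stricter check with a few sample inputs. The sample IDs are made up. One observation: a string that starts with `cus_` cannot also start with `pending_`, so a single prefix test appears to be equivalent to the two-condition form in the suggestion; the sketch uses the single test on that assumption.

```typescript
// Standalone sketch of the stricter helper; the single-prefix form is a simplification,
// not the exact code from the suggestion.
function isActionableStripeCustomerId(customerId: string | null | undefined): boolean {
  const trimmed = customerId?.trim()
  return !!trimmed && trimmed.startsWith('cus_')
}

// Made-up sample values showing which inputs would reach the Stripe call.
const samples: Array<string | null | undefined> = [
  'cus_ABC123', // actionable
  '  cus_ABC123  ', // actionable after trimming
  'pending_org_1', // placeholder, skipped
  'sub_ABC123', // wrong object type, skipped
  '', // empty, skipped
  null, // missing, skipped
]
const actionable = samples.filter(isActionableStripeCustomerId)
console.log(actionable.length)
```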
function assertDateId(value: string, label: string) {
  if (!DATE_ID_REGEX.test(value))
    throw new Error(`${label} must use YYYY-MM-DD`)
  return value
Reject impossible calendar dates, not just YYYY-MM-DD strings.
assertDateId() only checks the shape. Inputs like 2026-02-31 still pass, and --from is later used as a plain string filter in fetchGlobalStatsRows(), so the script can target the wrong range instead of failing fast.
Suggested fix
  function assertDateId(value: string, label: string) {
    if (!DATE_ID_REGEX.test(value))
      throw new Error(`${label} must use YYYY-MM-DD`)
+
+   const parsed = new Date(`${value}T00:00:00.000Z`)
+   if (Number.isNaN(parsed.getTime()) || getDateId(parsed) !== value)
+     throw new Error(`${label} must be a real UTC calendar date`)
+
    return value
  }
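An alternative sketch of the same validation, following the component-wise approach the review comment describes: split the string, build a UTC date from the parts, and confirm the parts round-trip. This version does not depend on `getDateId` or on how the runtime parses out-of-range ISO strings. The regex shown is an assumed definition of `DATE_ID_REGEX`, since the real one is not visible in this review.

```typescript
// Assumed shape check; the script's actual DATE_ID_REGEX may differ.
const DATE_ID_REGEX = /^\d{4}-\d{2}-\d{2}$/

function assertDateId(value: string, label: string): string {
  if (!DATE_ID_REGEX.test(value))
    throw new Error(`${label} must use YYYY-MM-DD`)

  const [year, month, day] = value.split('-').map(Number) as [number, number, number]
  // Date.UTC rolls impossible dates forward (Feb 31 becomes Mar 3), so a mismatch
  // between the inputs and the resulting UTC parts means the date does not exist.
  // Years below 100 are also rejected, because Date.UTC maps them to 1900-1999.
  const parsed = new Date(Date.UTC(year, month - 1, day))
  const roundTrips = parsed.getUTCFullYear() === year
    && parsed.getUTCMonth() === month - 1
    && parsed.getUTCDate() === day
  if (!roundTrips)
    throw new Error(`${label} must be a real UTC calendar date`)

  return value
}
```

With this check, `assertDateId('2026-02-31', '--from')` throws instead of letting the string reach the range filter.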
let updated = 0
await asyncPool(concurrency, changedRows, async (row) => {
  await updateConversionRate(supabase, row)
  updated++
  if (updated % 100 === 0 || updated === changedRows.length)
    console.log(`Updated ${updated}/${changedRows.length}`)
})

console.log(`Done. Updated ${updated}/${changedRows.length} org conversion rate rows.`)
Handle per-row update failures before continuing.
A single rejected updateConversionRate() makes asyncPool() reject immediately, but the other in-flight updates have already started. That can leave a partially applied backfill with no record of which date_ids failed. Catch and collect row-level failures here, then report them at the end.
Suggested fix
  let updated = 0
+ const failures: Array<{ date_id: string, error: string }> = []
  await asyncPool(concurrency, changedRows, async (row) => {
-   await updateConversionRate(supabase, row)
-   updated++
-   if (updated % 100 === 0 || updated === changedRows.length)
-     console.log(`Updated ${updated}/${changedRows.length}`)
+   try {
+     await updateConversionRate(supabase, row)
+     updated++
+     if (updated % 100 === 0 || updated === changedRows.length)
+       console.log(`Updated ${updated}/${changedRows.length}`)
+   }
+   catch (error) {
+     failures.push({
+       date_id: row.date_id,
+       error: error instanceof Error ? error.message : String(error),
+     })
+   }
  })

+ if (failures.length > 0)
+   throw new Error(`Org conversion rate backfill completed with ${failures.length} failures`)
+
  console.log(`Done. Updated ${updated}/${changedRows.length} org conversion rate rows.`)
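To make the failure-collection pattern concrete without relying on the PR's `asyncPool` helper (whose exact behavior is not shown here), the sketch below uses a small stand-in pool. `runPoolCollectingFailures` and the `Failure` type are hypothetical names; the point is only that each worker call is wrapped so one rejection cannot abort the others, and that the caller gets a full list of what failed.

```typescript
// Hypothetical stand-in for a concurrency pool; not the PR's asyncPool.
interface Failure<T> { item: T, error: string }

async function runPoolCollectingFailures<T>(
  concurrency: number,
  items: readonly T[],
  worker: (item: T) => Promise<void>,
): Promise<{ succeeded: number, failures: Failure<T>[] }> {
  let next = 0
  let succeeded = 0
  const failures: Failure<T>[] = []

  // Each lane pulls the next unclaimed item until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const item = items[next++] as T
      try {
        await worker(item)
        succeeded++
      }
      catch (error) {
        // Record the failure and keep going so other items still run.
        failures.push({ item, error: error instanceof Error ? error.message : String(error) })
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, items.length))
  await Promise.all(Array.from({ length: laneCount }, () => lane()))
  return { succeeded, failures }
}
```

A caller could then log every failed `date_id` and throw once at the end if `failures.length > 0`, which keeps the non-zero exit while still leaving a record of which rows need a retry.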



Summary (AI generated)
Motivation (AI generated)
The admin dashboard has historical Stripe-backed analytics fields that can be empty or stale for older rows. These scripts let us safely backfill the derived conversion trend and billing country data without changing runtime API behavior.
Business Impact (AI generated)
More complete admin analytics improve visibility into conversion health and customer geography, which helps reporting, growth analysis, and billing operations.
Test Plan (AI generated)
- bunx eslint --no-ignore scripts/admin_stripe_backfill_utils.ts scripts/backfill_org_conversion_rate_trend.ts scripts/backfill_stripe_customer_countries.ts tests/admin-stripe-backfill-scripts.unit.test.ts
- bunx tsc --ignoreConfig --noEmit --allowImportingTsExtensions --moduleResolution bundler --module ESNext --target ES2020 --lib DOM,ESNext --types bun,vitest --strict --esModuleInterop --skipLibCheck scripts/admin_stripe_backfill_utils.ts scripts/backfill_org_conversion_rate_trend.ts scripts/backfill_stripe_customer_countries.ts tests/admin-stripe-backfill-scripts.unit.test.ts
- bunx vitest run tests/admin-stripe-backfill-scripts.unit.test.ts
- bun run cli:build && vue-tsc --noEmit; it exited successfully while printing existing Vue macro plugin resolution warnings.
Generated with AI