perf: switch analytics gas reads to the denormalized column #1434

Merged: suisuss merged 1 commit into staging from feature/KEEP-683-analytics-read-column on Jun 2, 2026

@suisuss suisuss commented Jun 2, 2026

Stacked on #1433 (base branch is feature/KEEP-683-denormalize-workflow-gas, not staging). Draft until #1433 merges and the backfill has run - retarget to staging then.

What

Switches the two heaviest /analytics gas reads off the per-step logs JSONB scan and onto the denormalized workflow_executions.gas_used_wei column added in #1433:

  • getWorkflowGasTotal - powers the summary KPI for both the current and previous period
  • the workflow portion of getSpendCapData - today's gas against the daily cap

Both now SUM(workflow_executions.gas_used_wei) over the org+window via the workflows join only. No logs table, no JSONB parse, no TOAST detoast.
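
The query shape described above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database, not the PR's actual code: the `organization_id` and `workflow_id` column names, the half-open time window, the index column order, and the `workflow_gas_total` helper are all assumptions made for the example. Only `workflow_executions.gas_used_wei`, `started_at`, the `workflows` join, and the index name come from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workflows (
    id INTEGER PRIMARY KEY,
    organization_id TEXT NOT NULL          -- assumed column name
);
CREATE TABLE workflow_executions (
    id INTEGER PRIMARY KEY,
    workflow_id INTEGER NOT NULL REFERENCES workflows(id),
    started_at TEXT NOT NULL,
    gas_used_wei INTEGER                   -- a numeric column in the real schema
);
-- Index name is from the PR; the column order is an assumption.
CREATE INDEX idx_workflow_executions_workflow_started
    ON workflow_executions (workflow_id, started_at);
""")
conn.executemany("INSERT INTO workflows VALUES (?, ?)", [(1, "org_a"), (2, "org_b")])
conn.executemany(
    "INSERT INTO workflow_executions (workflow_id, started_at, gas_used_wei) "
    "VALUES (?, ?, ?)",
    [
        (1, "2026-05-10T12:00:00Z", 21000),
        (1, "2026-05-20T08:30:00Z", 42000),
        (1, "2026-04-01T00:00:00Z", 99000),  # same org, outside the window
        (2, "2026-05-15T09:00:00Z", 70000),  # inside the window, other org
    ],
)


def workflow_gas_total(org_id, start, end):
    """Sum run-level gas for one org over [start, end), keyed to run start."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(we.gas_used_wei), 0)
        FROM workflow_executions AS we
        JOIN workflows AS w ON w.id = we.workflow_id
        WHERE w.organization_id = ?
          AND we.started_at >= ?
          AND we.started_at < ?
        """,
        (org_id, start, end),
    ).fetchone()
    return row[0]


total = workflow_gas_total("org_a", "2026-05-03T00:00:00Z", "2026-06-02T00:00:00Z")
print(total)
```

No logs table appears anywhere in the statement, which is the point of the change: the aggregate touches only the two tables that were already part of the query.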

Why / impact

This is the cold-scan that caching could not fix and that drove the 2026-05-29 RDS saturation. With the value as a first-class numeric column on a table already in the query, the org+window slice aggregates directly and the date filter pushes into an index.

Staging EXPLAIN, heaviest org, 30-day window

The new column is not on staging yet (#1433 not deployed there), so this is a shape-identical proxy: SUM(duration) (an existing numeric column) over the exact same workflow_executions JOIN workflows WHERE org + started_at the gas query now uses.

| Metric | Before (logs JSONB) | After (column) |
| --- | --- | --- |
| Execution time | 2,140 ms | 397 ms |
| Buffer traffic | ~653k buffers (~5.1 GB) | ~20.7k buffers (~162 MB) |
| Date filter | residual Filter, not index-bound | pushed into idx_workflow_executions_workflow_started |
| Logs probe / TOAST | full per-execution probe + detoast | none |

The date filter is now index-pushed - the exact thing that was structurally impossible on the old log-keyed path. Staging has no TOAST pressure; on prod the win is larger because the old path's blob detoast is eliminated entirely.
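
For readers who want the ratios behind the staging figures, the arithmetic can be reproduced as below. This is a rough sketch: it assumes PostgreSQL's default 8 KiB block size (the PR does not state the block size), and the buffer counts are the rounded "~" values from the comparison, so the results are approximate.

```python
BLOCK_SIZE_KIB = 8  # PostgreSQL default block size; assumed, not stated in the PR


def buffers_to_mib(buffers):
    """Convert a buffer count to MiB of page traffic."""
    return buffers * BLOCK_SIZE_KIB / 1024


before_ms, after_ms = 2140, 397
before_buffers, after_buffers = 653_000, 20_700  # rounded figures from the EXPLAIN

speedup = before_ms / after_ms
buffer_reduction = before_buffers / after_buffers
before_mib = buffers_to_mib(before_buffers)
after_mib = buffers_to_mib(after_buffers)

print(f"{speedup:.1f}x faster, {buffer_reduction:.1f}x fewer buffers")
print(f"{before_mib:.0f} MiB -> {after_mib:.0f} MiB")
```

Under that assumption the reported sizes line up with the buffer counts: roughly 5.1k MiB before and about 162 MiB after.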

Behavioral note

Gas is now windowed by run start (workflow_executions.started_at) rather than per-step time, because the column is a run-level rollup. This makes gas consistent with every other summary metric (all already keyed to run start). Only runs that straddle a window boundary are reattributed: gas from a step that executes after the boundary is now counted in the window where its run started. The shift is bounded by the gap between run start and that late gas-bearing step, which is immaterial at dashboard granularity. The full per-network breakdown, which genuinely needs step-level time/network, is unchanged and tracked as a separate follow-up.
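
A small illustration of the reattribution, using made-up runs. The data layout (`started_at`, `steps`, `at`, `gas_wei`) and both helper functions are hypothetical and exist only to contrast the two windowing rules.

```python
from datetime import datetime, timezone


def utc(text):
    """Parse a naive ISO timestamp and mark it as UTC."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


# Hypothetical runs: the first one starts just before midnight and
# spends its gas just after midnight.
runs = [
    {"started_at": utc("2026-06-01T23:59:30"),
     "steps": [{"at": utc("2026-06-02T00:00:10"), "gas_wei": 50_000}]},
    {"started_at": utc("2026-06-01T10:00:00"),
     "steps": [{"at": utc("2026-06-01T10:00:05"), "gas_wei": 21_000}]},
]


def in_window(ts, window):
    start, end = window
    return start <= ts < end


def gas_by_step_time(runs, window):
    """Old behavior: each step's gas lands in the window containing the step."""
    return sum(step["gas_wei"]
               for run in runs for step in run["steps"]
               if in_window(step["at"], window))


def gas_by_run_start(runs, window):
    """New behavior: a run's whole gas lands in the window containing its start."""
    return sum(sum(step["gas_wei"] for step in run["steps"])
               for run in runs
               if in_window(run["started_at"], window))


june_1 = (utc("2026-06-01T00:00:00"), utc("2026-06-02T00:00:00"))
june_2 = (utc("2026-06-02T00:00:00"), utc("2026-06-03T00:00:00"))

print(gas_by_step_time(runs, june_1), gas_by_step_time(runs, june_2))
print(gas_by_run_start(runs, june_1), gas_by_run_start(runs, june_2))
```

The straddling run's 50,000 wei moves from June 2 to June 1, while the total across both days is the same under either rule.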

Not changed

  • getNetworkBreakdown and the runs-table aggregation still read the logs (network breakdown needs step granularity; the runs path is already paged and cheap).

Merge prerequisites

  1. feat: denormalize workflow gas onto workflow_executions #1433 merged and deployed (columns exist, writer populating new rows).
  2. Backfill run on the target env and the equivalence check passed: per org, all-time SUM(gas_used_wei) equals the old JSON sum over that org's logs.

Only then does switching the read keep totals stable.
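
The equivalence check in prerequisite 2 could look roughly like the sketch below. It is not the script that was run: the row shapes, the `gasUsedWei` key inside the log output, and all function names are hypothetical, since the PR does not show the logs JSONB layout. Wei values are carried as strings and summed as Python integers so that large values are not truncated.

```python
from collections import defaultdict


def column_totals(executions):
    """Per-org all-time sum of the denormalized column (NULL counts as 0)."""
    totals = defaultdict(int)
    for row in executions:
        totals[row["org_id"]] += int(row["gas_used_wei"] or 0)
    return dict(totals)


def json_totals(logs):
    """Per-org all-time sum re-extracted from per-step log output."""
    totals = defaultdict(int)
    for row in logs:
        gas = row["output"].get("gasUsedWei")  # hypothetical JSON key
        if gas is not None:
            totals[row["org_id"]] += int(gas)
    return dict(totals)


def mismatched_orgs(executions, logs):
    """Return {org: (old_json_sum, column_sum)} for every org that disagrees."""
    col, old = column_totals(executions), json_totals(logs)
    return {
        org: (old.get(org, 0), col.get(org, 0))
        for org in sorted(set(col) | set(old))
        if col.get(org, 0) != old.get(org, 0)
    }


# Made-up sample data: org_a matches, org_b has a run the backfill missed.
executions = [
    {"org_id": "org_a", "gas_used_wei": "63000"},
    {"org_id": "org_b", "gas_used_wei": None},
]
logs = [
    {"org_id": "org_a", "output": {"gasUsedWei": "21000"}},
    {"org_id": "org_a", "output": {"gasUsedWei": "42000"}},
    {"org_id": "org_a", "output": {}},  # step that spent no gas
    {"org_id": "org_b", "output": {"gasUsedWei": "70000"}},
]

print(mismatched_orgs(executions, logs))
```

An empty result would mean the backfill is complete for every org and the read can be switched without totals moving.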

@suisuss suisuss force-pushed the feature/KEEP-683-analytics-read-column branch from 6cdde89 to 03d2376 Compare June 2, 2026 04:11
Base automatically changed from feature/KEEP-683-denormalize-workflow-gas to staging June 2, 2026 04:27
…spend-cap

Switch getWorkflowGasTotal (summary current + previous period) and the
workflow portion of getSpendCapData to SUM workflow_executions.gas_used_wei
instead of re-extracting and summing the per-step logs JSONB. Removes the
logs join, the JSONB parse, and the TOAST detoast - the org+window slice is
aggregated straight off workflow_executions with the date filter pushed into
idx_workflow_executions_workflow_started.

Staging proxy EXPLAIN (heaviest org, 30d): 2140ms / 5.1GB -> 397ms / 162MB.

Gas is now windowed by run start (workflow_executions.started_at) rather than
per-step time, consistent with the other summary metrics. Depends on the
denormalized columns and backfill from the previous PR.
@suisuss suisuss force-pushed the feature/KEEP-683-analytics-read-column branch from 03d2376 to 29e2070 Compare June 2, 2026 06:06
@suisuss suisuss marked this pull request as ready for review June 2, 2026 06:10
@suisuss suisuss merged commit 55d62e0 into staging Jun 2, 2026
48 of 49 checks passed
@suisuss suisuss deleted the feature/KEEP-683-analytics-read-column branch June 2, 2026 06:21

github-actions Bot commented Jun 2, 2026

🧹 PR Environment Cleaned Up

The PR environment has been successfully deleted.

Deleted Resources:

  • Namespace: pr-1434
  • All Helm releases (Keeperhub, Scheduler, Event services)
  • PostgreSQL Database (including data)
  • LocalStack, Redis
  • All associated secrets and configs

All resources have been cleaned up and will no longer incur costs.


github-actions Bot commented Jun 2, 2026

ℹ️ No PR Environment to Clean Up

No PR environment was found for this PR. This is expected if:

  • The PR never had the deploy-pr-environment label
  • The environment was already cleaned up
  • The deployment never completed successfully
