To Prod #169

Merged
eskp merged 131 commits into prod from staging
Jan 22, 2026
eskp commented Jan 22, 2026

No description provided.

suisuss and others added 30 commits January 14, 2026 12:00
- Add scripts/analyze-steps.ts for static code analysis of step functions
- Add scripts/profile-step.ts for CPU profiling individual steps
- Add scripts/workflow-runner-profiled.ts for profiled workflow execution
- Add scripts/reset-password.ts utility for password resets
- Add @noble/hashes and prom-client dependencies
Workflow runner jobs exit before Prometheus can scrape them, causing
metrics to be lost. This module queries execution statistics directly
from the database to provide persistent metrics.

- getWorkflowStatsFromDb(): queries workflow_executions table for
  status counts and duration histogram data
- getStepStatsFromDb(): queries workflow_execution_logs table for
  step-level metrics by type and status
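
The aggregation this commit describes (status counts plus duration histogram data) can be sketched as below. This is an illustration only: the row shape, the bucket bounds, and the function name `aggregateWorkflowStats` are assumptions, and the in-memory loop stands in for whatever SQL `getWorkflowStatsFromDb()` actually runs against `workflow_executions`.

```typescript
// Hypothetical row shape; the real workflow_executions columns are not shown in this PR.
type ExecutionRow = { status: string; durationMs: number | null };

type WorkflowStats = {
  countsByStatus: Record<string, number>;
  // Cumulative counts per upper bound, mirroring Prometheus `le` buckets.
  durationBuckets: { le: number; count: number }[];
  durationCount: number;
  durationSumMs: number;
};

// Illustrative bucket bounds in milliseconds (an assumption).
const BUCKETS_MS = [100, 500, 1000, 5000];

function aggregateWorkflowStats(rows: ExecutionRow[]): WorkflowStats {
  const countsByStatus: Record<string, number> = {};
  const durationBuckets = BUCKETS_MS.map((le) => ({ le, count: 0 }));
  let durationCount = 0;
  let durationSumMs = 0;

  for (const row of rows) {
    countsByStatus[row.status] = (countsByStatus[row.status] ?? 0) + 1;
    if (row.durationMs === null) continue; // no duration recorded yet
    durationCount += 1;
    durationSumMs += row.durationMs;
    for (const bucket of durationBuckets) {
      // Buckets are cumulative: a 50 ms run counts toward every bound >= 50.
      if (row.durationMs <= bucket.le) bucket.count += 1;
    }
  }
  return { countsByStatus, durationBuckets, durationCount, durationSumMs };
}
```

Because the result is recomputed from stored rows, it does not depend on the runner process still being alive when Prometheus scrapes.
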
Replace Counter/Histogram metrics with Gauge metrics for workflow
executions and steps. These gauges are populated from the database
on each Prometheus scrape via updateDbMetrics().

Metrics now use original names (no _db suffix):
- keeperhub_workflow_executions_total{status}
- keeperhub_workflow_execution_errors_total
- keeperhub_workflow_execution_duration_ms_bucket{le}
- keeperhub_workflow_step_executions_total{step_type,status}
- keeperhub_workflow_step_errors_total{step_type}
- keeperhub_workflow_step_duration_ms_bucket{le}

This ensures metrics persist even when workflow runner jobs exit
before Prometheus can scrape them.
Call updateDbMetrics() before returning metrics to ensure fresh
workflow execution data from the database on each scrape.
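
A rough sketch of the refresh-on-scrape flow described above. `SimpleGauge` is a stand-in written for this example, not the prom-client API, and the `fetchCounts` parameter and `handleScrape` name are hypothetical; the real `updateDbMetrics()` signature is not shown in this PR.

```typescript
// Minimal stand-in for a labeled gauge (not prom-client).
class SimpleGauge {
  readonly name: string;
  private values = new Map<string, number>();

  constructor(name: string) {
    this.name = name;
  }

  set(labels: Record<string, string>, value: number): void {
    this.values.set(JSON.stringify(labels), value);
  }

  // Render in the Prometheus text exposition style: name{label="value"} number
  render(): string {
    const lines: string[] = [];
    this.values.forEach((value, key) => {
      const labels = JSON.parse(key) as Record<string, string>;
      const pairs = Object.keys(labels)
        .map((k) => `${k}="${labels[k]}"`)
        .join(",");
      lines.push(pairs ? `${this.name}{${pairs}} ${value}` : `${this.name} ${value}`);
    });
    return lines.join("\n");
  }
}

type StatusCounts = Record<string, number>;

const executionsTotal = new SimpleGauge("keeperhub_workflow_executions_total");

// Hypothetical signature: the fetcher is injected so the sketch runs without a database.
async function updateDbMetrics(fetchCounts: () => Promise<StatusCounts>): Promise<void> {
  const counts = await fetchCounts();
  for (const status of Object.keys(counts)) {
    executionsTotal.set({ status }, counts[status]);
  }
}

async function handleScrape(fetchCounts: () => Promise<StatusCounts>): Promise<string> {
  await updateDbMetrics(fetchCounts); // refresh from the DB before rendering
  return executionsTotal.render();
}
```

Awaiting the refresh inside the scrape handler trades a slightly slower scrape for values that are current at scrape time.
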
Add additional metrics that can be derived from the database:
- workflow_queue_depth: pending workflow count from DB
- workflow_concurrent_count: running workflow count from DB
- user_active_daily: distinct users with active sessions in 24h

These metrics now reflect actual database state rather than
relying on in-process counters that may not be scraped.
- Add Data Sources section explaining DB vs API process metrics
- Add Source column to all metric tables
- Document db-metrics.ts functions
- Update Prometheus metric names table with correct types
- Add DB-Sourced Metrics section explaining why DB queries are needed
Add getUserStatsFromDb() and getOrgStatsFromDb() functions to query
user and organization statistics from the database for Prometheus metrics.

User stats: total, verified, anonymous, with_workflows, with_integrations
Org stats: total, members_total, members_by_role, invitations_pending, with_workflows
Add DB-sourced gauges for user and organization metrics:
- keeperhub_user_total, keeperhub_user_verified_total, etc.
- keeperhub_org_total, keeperhub_org_members_by_role, etc.

Update updateDbMetrics() to populate these gauges on each scrape.
Add section 5 documenting user and organization metrics.
Update Prometheus metric names table, instrumentation files,
and DB-sourced metrics section with new tables queried.
Add new DB query functions for comprehensive metrics coverage:
- getWorkflowDefinitionStatsFromDb(): total, public, private, anonymous
- getScheduleStatsFromDb(): total, enabled, by last_status
- getIntegrationStatsFromDb(): total, managed, by type
- getInfraStatsFromDb(): API keys, chains, wallets, sessions
Add DB-sourced gauges:
- Workflow definitions: total, by visibility, anonymous
- Schedules: total, enabled, by last status
- Integrations: total, managed, by type
- Infrastructure: API keys, chains, wallets, sessions

Update updateDbMetrics() to populate all new gauges on each scrape.
Add documentation for new metric categories:
- Workflow Definition Metrics (total, visibility, anonymous)
- Schedule Metrics (total, enabled, by status)
- Integration Metrics (total, managed, by type)
- Infrastructure Metrics (API keys, chains, wallets, sessions)

Update Prometheus metric names table and DB-sourced metrics section.
Add section explaining user/org/wallet model and expected metric
relationships:
- org.total ≈ para_wallet.total (1:1)
- user.total ≥ org.total (shared orgs via invites)
- user.anonymous = users without orgs (trial mode)
- Web3 steps require org + wallet
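
The expected relationships listed above could be turned into a simple sanity check over a metrics snapshot. This is a sketch under assumptions: the `MetricSnapshot` field names and the tolerance parameter are hypothetical and do not come from the PR.

```typescript
// Hypothetical snapshot shape; field names are illustrative.
type MetricSnapshot = {
  userTotal: number;
  userAnonymous: number;
  orgTotal: number;
  paraWalletTotal: number;
};

// Returns human-readable problems; an empty array means the snapshot looks consistent.
function checkRelationships(s: MetricSnapshot, walletTolerance = 0): string[] {
  const problems: string[] = [];
  // org.total ≈ para_wallet.total (1:1), allowing a small tolerance
  if (Math.abs(s.orgTotal - s.paraWalletTotal) > walletTolerance) {
    problems.push("org.total and para_wallet.total diverge");
  }
  // user.total ≥ org.total (orgs can be shared via invites)
  if (s.userTotal < s.orgTotal) {
    problems.push("user.total is below org.total");
  }
  // Anonymous users are a subset of all users.
  if (s.userAnonymous > s.userTotal) {
    problems.push("user.anonymous exceeds user.total");
  }
  return problems;
}
```
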
Add 15 missing metrics to the Prometheus Metric Names table:
- Histogram metrics: api.status.latency_ms, plugin.action.duration_ms,
  ai.generation.duration_ms, external.service.latency_ms
- Counter metrics: plugin.invocations.total, ai.tokens.consumed,
  api.errors.total, external.service.errors, db.query.slow_count
- Gauge metrics: duration histogram sum/count variants,
  db.pool.utilization
Remove unimplemented metrics from documentation:
- api.requests.total, ai.tokens.consumed (never incremented)
- api.errors.total, external.service.errors (never called)
- db.pool.utilization, db.query.slow_count (dead code)
- external.service.latency_ms (never called)

Add cancelled status support to workflow execution stats.

Update Data Sources table and add note about DB histogram-as-gauge
semantics for Prometheus tooling compatibility.
Reset orgMembersByRole, scheduleByLastStatus, integrationByType,
stepExecutionsTotal, and stepErrorsTotal gauges before repopulating
to prevent labels from lingering when categories are removed.
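
Why the reset matters can be shown with a small stand-in gauge. `LabeledGauge` and `repopulate` are written for this example and are not the project's code or the prom-client API. Without a reset, a label whose category disappeared from the database keeps its last value.

```typescript
// Stand-in for a gauge keyed by a single label value.
class LabeledGauge {
  private values = new Map<string, number>();

  set(label: string, value: number): void {
    this.values.set(label, value);
  }

  reset(): void {
    this.values.clear();
  }

  labels(): string[] {
    return Array.from(this.values.keys()).sort();
  }
}

// Copies fresh counts into the gauge, optionally clearing old label values first.
function repopulate(
  gauge: LabeledGauge,
  counts: Record<string, number>,
  resetFirst: boolean,
): void {
  if (resetFirst) gauge.reset();
  for (const label of Object.keys(counts)) {
    gauge.set(label, counts[label]);
  }
}
```

If a first scrape sees the roles `owner`, `admin`, and `member` and a later one sees only `owner` and `member`, repopulating without the reset leaves `admin` exported with its old value; repopulating with the reset drops it.
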

Also clarify collector behavior in docs: Prometheus uses DB snapshots
while console logger receives runtime increments.
- Add dbSourcedMetrics set to silently skip workflow/step metrics
  in Prometheus mode (they're populated via DB scrape instead)
- Fix api.webhook.latency_ms labels: add execution_id
- Fix api.status.latency_ms labels: add status_code, execution_status
- Add api.errors.total back to docs (emitted by webhook handler)
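
The skip behavior in the first bullet might look roughly like this. The collector class, its return values, and the metric name `workflow.executions.total` are assumptions made for the sketch; only the `dbSourcedMetrics` set name and `api.errors.total` come from the commit message.

```typescript
// Metrics populated from the DB at scrape time; runtime increments are ignored for these.
// "workflow.executions.total" is an illustrative name, not confirmed by the PR.
const dbSourcedMetrics = new Set<string>(["workflow.executions.total"]);

type IncrementResult = "recorded" | "skipped" | "unknown";

class PrometheusCollector {
  private counters = new Map<string, number>([["api.errors.total", 0]]);
  readonly warnings: string[] = [];

  increment(name: string, by = 1): IncrementResult {
    // Silently skip DB-sourced metrics instead of warning about a missing counter.
    if (dbSourcedMetrics.has(name)) return "skipped";
    const current = this.counters.get(name);
    if (current === undefined) {
      this.warnings.push(`unknown metric: ${name}`);
      return "unknown";
    }
    this.counters.set(name, current + by);
    return "recorded";
  }

  value(name: string): number | undefined {
    return this.counters.get(name);
  }
}
```

Checking the set before the counter lookup is what keeps DB-sourced names from producing warnings.
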
Remove unused metrics infrastructure:
- apiRequests counter and counterMap entry (api.requests.total)
- aiTokensConsumed counter and counterMap entry (ai.tokens.consumed)
- startApiMetrics function and helper (never called by routes)
- API_REQUESTS_TOTAL and AI_TOKENS_CONSUMED from MetricNames
…owlist

- Add workflow.queue.depth and workflow.concurrent.count to dbSourcedMetrics
  set to silence Prometheus warnings (these are populated from DB)
- Remove externalServiceLatency histogram (never used in production)
- Remove externalServiceErrors counter (never used in production)
- Remove recordExternalServiceCall function from plugin instrumentation
- Remove externalService parameter from recordPluginMetrics
- Remove EXTERNAL_SERVICE_ERRORS from MetricNames
- Remove dead startApiMetrics tests from api-metrics.test.ts
- Clean up plugin-metrics tests
Link to metrics reference doc with brief overview of what's tracked:
- Workflow execution performance
- API latency
- Plugin action metrics
- User & organization stats
- Infrastructure metrics
…for-wallet

feat: Allow user to reassign email for Para wallet
* Add pricing model documentation and smart contract specs

- Add billing section to docs with pricing and credits pages
- Document NFT-based tier system (Developer/Team/Company/Enterprise)
- Document credit system for workflow runs and gas payments
- Add smart contract specifications for KeeperHubTiers and KeeperHubCredits
- New orgs receive 2,500 free credits
- Credits never expire, no rollover complexity
- All payments via smart contract (ETH, USDC, USDT, USDS)

* Apply 50% price bump and add competitor comparison

- Team: $225/yr, $675 lifetime (was $199/$599)
- Company: $675/yr, $2,025 lifetime (was $449/$1,349)
- Add "How We Compare" section with Zapier, n8n, Pabbly comparison
- Update smart contract pricing configuration

* Hide billing section from docs sidebar

* Fix docs navbar border and sidebar ordering

- Add full-width navbar border using pseudo-element
- Rename Documentation to Overview in sidebar
- Move FAQ to end of sidebar
- Hide internal spec files from sidebar (organization-*, etc.)

* Fix pagefind on Alpine by adding libc6-compat to builder stage

* Revert "Fix pagefind on Alpine by adding libc6-compat to builder stage"

This reverts commit 4cea31e.
- Rename unused WORKFLOW_LABELS/STEP_LABELS to _WORKFLOW_LABELS/_STEP_LABELS
- Add default case to status switch statement
- Extract parseStepDurationBuckets helper to reduce cognitive complexity
- Fix test file formatting (remove extra blank line)
Add workflow steps for ERC20 token operations:
- check-token-balance: Query balanceOf with auto-fetch of symbol/decimals/name
- transfer-token: ERC20 transfer with balance validation before send

Add shared contract infrastructure:
- lib/contracts/abis/erc20.json: OpenZeppelin IERC20Metadata ABI
- lib/contracts/abis/multicall3.json: Multicall3 ABI for batch calls
- lib/contracts/tokens.ts: Common token addresses per chain
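
The "balance validation before send" idea in transfer-token can be sketched without any chain access. Everything here is illustrative: `formatUnits` and `validateTransfer` are written for this example and are not the step's actual code, and raw amounts are assumed to be non-negative `bigint` values in the token's smallest unit.

```typescript
// Format a raw token amount (smallest units) as a decimal string.
// Assumes raw >= 0 and decimals >= 0.
function formatUnits(raw: bigint, decimals: number): string {
  const base = 10n ** BigInt(decimals);
  const whole = raw / base;
  const fraction = (raw % base)
    .toString()
    .padStart(decimals, "0")
    .replace(/0+$/, ""); // drop trailing zeros
  return fraction ? `${whole}.${fraction}` : whole.toString();
}

type TransferCheck = { ok: true } | { ok: false; reason: string };

// Reject a transfer up front when the sender's balance cannot cover it.
function validateTransfer(
  balance: bigint,
  amount: bigint,
  decimals: number,
  symbol: string,
): TransferCheck {
  if (amount <= 0n) return { ok: false, reason: "amount must be positive" };
  if (amount > balance) {
    return {
      ok: false,
      reason: `insufficient ${symbol}: have ${formatUnits(balance, decimals)}, need ${formatUnits(amount, decimals)}`,
    };
  }
  return { ok: true };
}
```

Comparing raw `bigint` amounts avoids floating-point rounding; the decimals and symbol fetched from the token are only needed to produce a readable error message.
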
suisuss and others added 11 commits January 21, 2026 16:41
…ce-management-and-gas-estimation

# Conflicts:
#	tests/e2e/api-key-auth.test.ts
#	tests/integration/web3-steps.test.ts
…-profiling

feat/KEEP-1229 workflow and step function profiling complexity and operation cost
…nagement-and-gas-estimation

feat/keep 1240 nonce management and gas estimation
…eding-env-var-json

fix: KEEP-1258 Chain seeding always takes values from env var JSON
…r-docker-size

perf: KEEP-1257 Reduce scheduler Docker image from 1.85GB to 267MB
eskp requested review from a team, OleksandrUA, joelorzet, suisuss and taitsengstock and removed request for a team January 22, 2026 02:31
eskp merged commit 48b9c50 into prod Jan 22, 2026
4 checks passed
5 participants