Skip to content

feat(tx): lower default gas_adjustment to 1.3, make tuning configurable, retry on OOG#283

Merged
mateeullahmalik merged 3 commits intomasterfrom
feat/sn-finalize-gas-adjustment-configurable
Apr 22, 2026
Merged

feat(tx): lower default gas_adjustment to 1.3, make tuning configurable, retry on OOG#283
mateeullahmalik merged 3 commits intomasterfrom
feat/sn-finalize-gas-adjustment-configurable

Conversation

@mateeullahmalik
Copy link
Copy Markdown
Collaborator

Summary

Supernode was using gas_adjustment=1.5 + 50k padding for every outbound tx (including MsgFinalizeAction), inflating fee spend. This PR:

  1. Lowers the default GasAdjustment from 1.51.3.
  2. Makes the full fee/gas tuning bundle operator-configurable via YAML.
  3. Adds an OOG-retry path with multiplicative escalation as a safety net, so a lower default can never cause transaction failure.

Client-side only. No chain change. No migration. No upgrade handler.

Ref: Notion — SN Gas used for Finalize tx is too high

Behavior change

Defaults

Knob Before After
GasAdjustment 1.5 1.3
GasAdjustmentMultiplier 1.3
GasAdjustmentMaxAttempts 3 (cap 10)
GasPadding 50000 50000
GasPrice 0.025 0.025
FeeDenom ulume ulume

New optional YAML (supernode.yml under lumera:):

lumera:
  grpc_addr: ...
  chain_id: ...
  gas_adjustment: 1.3
  gas_adjustment_multiplier: 1.3
  gas_adjustment_max_attempts: 3
  gas_padding: 50000
  gas_price: "0.025"      # or "0.025ulume"
  fee_denom: ulume

Omitted / zero-valued fields fall back to package defaults deterministically.

OOG retry

  • Detects both "out of gas" simulation errors and code=11 codespace=sdk broadcast errors.
  • Escalates GasAdjustment *= Multiplier on each retry. Default sequence: 1.3 → 1.69 → 2.197.
  • Cloned per attempt — never mutates caller/shared TxConfig.
  • Non-OOG errors bail immediately (preserves the outer sequence-mismatch retry semantics).
  • Hard-capped at 10 attempts as a safety net against fee runaway.

Observability

Every retry emits a structured logtrace line that's facet-able in Datadog:

metric="tx_oog_retry"
outcome={retrying|success|exhausted}
attempt / max_attempts
initial_gas_adjustment / prev_gas_adjustment / new_gas_adjustment
gas_adjustment_multiplier
error

Files changed

  • pkg/lumera/modules/tx/impl.go, helper.go, interface.go — defaults, isOutOfGas, executeWithOOGRetry, applyTxHelperDefaults, TxConfig/TxHelperConfig extensions, retry wiring in ExecuteTransaction.
  • pkg/lumera/config.go, client.goTxOptions variadic on NewConfig; Config.toTxHelperConfig(); all three msg modules receive explicit helper config.
  • pkg/lumera/modules/{action_msg,supernode_msg,audit_msg} — new NewModuleWithTxHelperConfig constructors; legacy NewModule preserved for backward compat.
  • supernode/config/config.go — six new optional LumeraClientConfig YAML fields.
  • supernode/cmd/start.go — propagates YAML → lumera.NewConfig.

Tests

16 new tests, all passing:

pkg/lumera/modules/tx/helper_test.go:

  • Default is exactly 1.3.
  • applyTxHelperDefaults — zero → defaults, operator values preserved, max-attempts cap honored.
  • isOutOfGas — 6 subcases: nil / simulation string / code=11 codespace=sdk / raw_log-only / sequence-mismatch (false) / wrong codespace (false).
  • executeWithOOGRetry:
    • success on first attempt
    • escalates multiplicatively and succeeds on 3rd attempt (verifies exact 1.3 → 1.69 → 2.197 sequence)
    • non-OOG errors bail immediately (single attempt)
    • max-attempts honored; final error includes attempt count
    • clone isolates caller TxConfig from mutation
    • nil base cfg returns error

pkg/lumera/config_test.go:

  • NewConfig rejects invalid inputs (4 subcases).
  • TxOptions propagate through toTxHelperConfig().

supernode/config/config_gas_test.go:

  • Round-trips all six new YAML keys.
  • Omitted keys yield zero values (defaults applied at tx layer).

Verification:

  • go build ./pkg/... ./supernode/... clean.
  • go vet ./pkg/lumera/... ./supernode/config/... ./supernode/cmd/... clean.
  • gofmt -l clean.
  • Full go test sweep over pkg/..., supernode/..., sdk/... — 0 failures.

Risks

  • Under-estimation. 1.3 × simGas + 50k is industry-standard for Cosmos SDK; retry loop covers pathological edges. Max-attempts cap at 10 prevents runaway fee spend.
  • Concurrency. TxHelper already serializes via sync.Mutex; per-attempt TxConfig is cloned.
  • State-machine safety. None — this is fee accounting only; determinism preserved.
  • API break for external pkg/lumera consumers. Mitigated via variadic TxOptions on NewConfig and legacy NewModule constructors preserved.

Rollout / Rollback

  • Tag SN release → testnet SNs → 48h soak → mainnet.
  • No binary rollback needed: operators can set gas_adjustment: 1.5 in YAML if required.

Follow-ups (not in this PR)

  1. Companion PR in lumera-tools-internal to document the new keys in sample launch-bundle configs.
  2. Datadog monitor on metric:tx_oog_retry post-deploy.

…OG retry with multiplicative escalation

Supernode was using gas_adjustment=1.5 + 50k padding for every outbound
tx (including MsgFinalizeAction), inflating fee spend. Default lowered
to 1.3 with an escalating OOG-retry path as a safety net, and all
gas/fee tuning knobs are now driven by optional YAML config so operators
can tune without a code change.

Changes
-------
pkg/lumera/modules/tx:
  - DefaultGasAdjustment 1.5 -> 1.3.
  - Add DefaultGasAdjustmentMultiplier (1.3) and
    DefaultGasAdjustmentMaxAttempts (3, hard-capped at 10).
  - Extend TxConfig / TxHelperConfig with GasAdjustmentMultiplier and
    GasAdjustmentMaxAttempts.
  - Add applyTxHelperDefaults so zero-valued TxHelperConfig fields fall
    back to package defaults deterministically.
  - isOutOfGas detector (matches both "out of gas" and
    code=11 codespace=sdk patterns).
  - executeWithOOGRetry: cloned TxConfig per attempt (no mutation of
    caller state); multiplicative escalation
    (gas_adjustment *= multiplier); structured Datadog-faceted log
    lines with metric="tx_oog_retry" + outcome=(retrying|success|exhausted).
  - Wire the OOG retry loop into TxHelper.ExecuteTransaction so all
    msg types (action_msg, supernode_msg, audit_msg) benefit.

pkg/lumera (client config):
  - NewConfig gains variadic TxOptions for source-level backward compat.
  - Config.toTxHelperConfig materializes a normalized TxHelperConfig.

pkg/lumera/modules/{action_msg,supernode_msg,audit_msg}:
  - New NewModuleWithTxHelperConfig constructor (explicit TxHelperConfig).
  - Legacy NewModule(...) preserved for backward compatibility; now
    delegates to the helper-config constructor with defaults.

supernode/config:
  - LumeraClientConfig: add optional YAML fields
    gas_adjustment, gas_adjustment_multiplier,
    gas_adjustment_max_attempts, gas_padding, gas_price, fee_denom.
  - Omitted / zero values preserve the default behavior.

supernode/cmd/start.go:
  - initLumeraClient now propagates the new TxOptions from YAML into
    lumera.NewConfig.

Tests
-----
pkg/lumera/modules/tx:
  - DefaultGasAdjustment == 1.3.
  - applyTxHelperDefaults: zero-valued fields use defaults, operator
    values preserved, max-attempts cap honored.
  - isOutOfGas: simulation and broadcast OOG patterns matched; unrelated
    codes (e.g. sequence mismatch, code=11 wrong codespace) ignored.
  - executeWithOOGRetry:
      * success on first attempt
      * escalates multiplicatively and succeeds on 3rd attempt
        (1.3 -> 1.69 -> 2.197)
      * non-OOG errors bail immediately (single attempt)
      * max-attempts honored (stops at N, surfaces "out of gas after N attempts")
      * clone isolates caller TxConfig from mutation
      * nil base cfg returns error

pkg/lumera:
  - NewConfig rejects invalid inputs and propagates TxOptions through
    toTxHelperConfig().

supernode/config:
  - LoadConfig round-trips all six new YAML keys.
  - Omitted YAML keys yield zero values (defaults applied at tx layer).

Risk / rollout
--------------
Client-side only. No chain change, no migration, no upgrade handler.
Operators can override gas_adjustment at runtime via YAML if the new
default ever proves too tight; OOG retry is a safety net in the
meantime.

Refs: https://www.notion.so/pastelnetwork/SN-Gas-used-for-Finalize-tx-is-too-high-345df11fee148011bac1fe1268c99326
@roomote-v0
Copy link
Copy Markdown

roomote-v0 Bot commented Apr 21, 2026

Rooviewer Clock   See task

Both items from the previous review are now resolved in e324c6a. No new issues found.

  • ExecuteTransactionWithMsgs bypasses the new OOG retry wrapper (fixed: now wrapped in executeWithOOGRetry with tests)
  • UpdateConfig doesn't enforce the max-attempts cap of 10 that applyTxHelperDefaults does (fixed: cap applied via MaxGasAdjustmentAttemptsCap constant with tests)
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

Simulates the production scenario:
  1.3 is rejected by the chain (code=11 codespace=sdk, out of gas)
  → retry at 1.69 → retry at 2.197 → success

Three new integration tests using a scripted tx.Module stub:
  - OOG×2 → success on 3rd attempt; verifies exact
    1.3 → 1.69 → 2.197 sequence; sequence advances exactly once;
    base config not mutated.
  - All attempts OOG → clear error, no sequence hole.
  - Non-OOG error bails at first attempt, no sequence advance.
Comment on lines +195 to +201
resp, err := executeWithOOGRetry(
ctx,
h.config,
func(cfg *TxConfig) (*sdktx.BroadcastTxResponse, error) {
return h.txmod.ProcessTransaction(ctx, []types.Msg{msg}, accountInfo, cfg)
},
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OOG retry wrapper is wired into ExecuteTransaction here, but ExecuteTransactionWithMsgs (line 366) still calls ProcessTransaction directly without it. Any caller using the pre-built-messages path won't get the escalating gas adjustment on OOG failures. Currently there are no callers of ExecuteTransactionWithMsgs in the repo, so it's not a live bug, but it's an inconsistency in the public API that could bite external pkg/lumera consumers. Consider wrapping that call in executeWithOOGRetry as well for parity.

Fix it with Roo Code or mention @roomote and request a fix.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e324c6a. Wrapped ExecuteTransactionWithMsgs in executeWithOOGRetry for public-API parity with ExecuteTransaction. Sequence-mismatch retry intentionally not added here because the caller owns accountInfo (documented in the doc comment). New tests: TestExecuteTransactionWithMsgs_AppliesOOGRetry + TestExecuteTransactionWithMsgs_NonOOGBailsImmediately.

Comment on lines +426 to +428
if config.GasAdjustmentMaxAttempts > 0 {
h.config.GasAdjustmentMaxAttempts = config.GasAdjustmentMaxAttempts
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UpdateConfig accepts any positive GasAdjustmentMaxAttempts without enforcing the hard cap of 10 that applyTxHelperDefaults applies. If an operator (or a runtime code path) calls UpdateConfig with a value like 50, the safety net against fee runaway is bypassed. Consider adding the same cap here.

Suggested change
if config.GasAdjustmentMaxAttempts > 0 {
h.config.GasAdjustmentMaxAttempts = config.GasAdjustmentMaxAttempts
}
if config.GasAdjustmentMaxAttempts > 0 {
if config.GasAdjustmentMaxAttempts > 10 {
config.GasAdjustmentMaxAttempts = 10
}
h.config.GasAdjustmentMaxAttempts = config.GasAdjustmentMaxAttempts
}

Fix it with Roo Code or mention @roomote and request a fix.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e324c6a. Extracted the hard cap into MaxGasAdjustmentAttemptsCap = 10 as a single source of truth, and applied it in UpdateConfig mirroring applyTxHelperDefaults. New tests: TestTxHelper_UpdateConfig_CapsMaxAttempts (9999 → 10) and TestTxHelper_UpdateConfig_PreservesReasonableMaxAttempts (5 → 5).

…p ExecuteTransactionWithMsgs

Two consistency gaps surfaced in code review:

1. UpdateConfig accepted any positive GasAdjustmentMaxAttempts, bypassing
   the 10-attempt fee-runaway safety cap enforced by applyTxHelperDefaults.
   A runtime-reconfig caller (SetTxHelperConfig → UpdateConfig) could set
   MaxAttempts=9999 and defeat the protection. Now mirrors the defaults
   cap via the new MaxGasAdjustmentAttemptsCap constant.

2. ExecuteTransactionWithMsgs called ProcessTransaction directly, skipping
   the OOG retry wrapper that ExecuteTransaction uses. Public-API split-
   brain — external pkg/lumera SDK consumers using the pre-built-messages
   path would not get OOG escalation. Now wraps in executeWithOOGRetry.
   Sequence-mismatch retry is intentionally NOT added here because the
   caller owns accountInfo (documented in doc comment).

Extracted hard cap (10) into named constant MaxGasAdjustmentAttemptsCap
for clarity and a single source of truth.

Tests:
- TestTxHelper_UpdateConfig_CapsMaxAttempts (9999 → 10)
- TestTxHelper_UpdateConfig_PreservesReasonableMaxAttempts (5 → 5)
- TestExecuteTransactionWithMsgs_AppliesOOGRetry (3 attempts, escalating)
- TestExecuteTransactionWithMsgs_NonOOGBailsImmediately (non-OOG bubbles)

go test ./pkg/... + ./supernode/config/... green. gofmt + go vet clean.
@mateeullahmalik mateeullahmalik merged commit eae3929 into master Apr 22, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants