
fix(x/audit): bridge cascade_kademlia_db_bytes from HostReport to SupernodeMetricsState#140

Merged
mateeullahmalik merged 2 commits into master from fix/audit-host-report-cascade-bytes-bridge
May 12, 2026

@mateeullahmalik
Contributor

Summary

Bridges cascade_kademlia_db_bytes from the audit HostReport (per-epoch) into x/supernode SupernodeMetricsState, which, after LEP-6 §12, is the sole source that Everlight payout and eligibility queries read from.

Without this fix, Everlight distributes zero ulume to anyone on master and 1.12.0 today — confirmed live on devnet. See "Why this is needed" below.

Why this is needed

LEP-6 §12 (PR #122, 2026-04-28):

  • Removed cascade_kademlia_db_bytes (field 6) from audit.HostReport
  • Rewrote x/supernode/v1/keeper/audit_metrics.go::getLatestCascadeBytesFromAudit to read from SupernodeMetricsState instead of from audit epoch reports

But the migration was half-completed: no chain-side writer was added to populate SupernodeMetricsState.CascadeKademliaDbBytes from the audit-submission path. The only remaining writer is the now-operationally-dead legacy MsgReportSupernodeMetrics handler — and supernode@master / v2.5.0-rc has explicitly disabled the legacy reporter (supernode/cmd/start.go:29: "Legacy supernode metrics reporter has been superseded by epoch-scoped audit reporting in x/audit").

Result on chain 1.12.0 / master today:

  1. SuperNode submits epoch report — but post-LEP-6 §12, the audit module doesn't persist cascade_kademlia_db_bytes anywhere
  2. Every payment-period boundary, distributePool iterates all ACTIVE/STORAGE_FULL SNs, calls getLatestCascadeBytesFromAudit → returns found=false for every SN
  3. candidates slice is empty → no SN receives any payout
  4. QuerySNEligibility returns Eligible: false, Reason: "audit report is stale" for every SN
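A minimal Go sketch of steps 2 and 3, assuming a plain map in place of the keeper lookup. MetricsState, candidate and collectCandidates are hypothetical names for illustration, not the x/supernode code; the point is only that a lookup returning found=false for every SN leaves nothing to pay.

```go
package main

import "fmt"

// MetricsState is a hypothetical stand-in for x/supernode SupernodeMetricsState,
// trimmed to the one field the payout path reads.
type MetricsState struct {
	CascadeKademliaDbBytes float64
}

// candidate pairs a SuperNode account with the cascade bytes found for it.
type candidate struct {
	Account string
	Bytes   float64
}

// collectCandidates mirrors the shape of the distributePool loop: an SN only
// becomes a payout candidate if a metrics state is found for it.
func collectCandidates(accounts []string, store map[string]MetricsState) []candidate {
	var out []candidate
	for _, acc := range accounts {
		st, found := store[acc] // stand-in for getLatestCascadeBytesFromAudit
		if !found {
			continue // found=false: the SN is silently skipped
		}
		out = append(out, candidate{Account: acc, Bytes: st.CascadeKademliaDbBytes})
	}
	return out
}

func main() {
	sns := []string{"sn1", "sn2", "sn3"}

	// Pre-fix: no chain-side writer exists, so the store is never populated.
	preFix := map[string]MetricsState{}
	fmt.Println("pre-fix candidates:", len(collectCandidates(sns, preFix)))

	// Post-fix: the audit bridge has written a value for sn2.
	postFix := map[string]MetricsState{"sn2": {CascadeKademliaDbBytes: 4096}}
	fmt.Println("post-fix candidates:", len(collectCandidates(sns, postFix)))
}
```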

This PR closes the gap by making the audit SubmitEpochReport handler the sole writer of SupernodeMetricsState.CascadeKademliaDbBytes, in line with the post-LEP-6 design intent ("only audit-module epoch reports are valid").

Behavior change

| Before | After |
|---|---|
| audit.HostReport has no cascade_kademlia_db_bytes field | audit.HostReport.cascade_kademlia_db_bytes (field 6, double) restored as a metric-courier |
| Audit-side consensus logic does not consume cascade_kademlia_db_bytes | Unchanged: still does not consume it (LEP-6 §12 intent preserved) |
| SupernodeMetricsState.CascadeKademliaDbBytes only written by legacy MsgReportSupernodeMetrics (now dead) | Written by audit's SubmitEpochReport handler after a successful epoch report is persisted |
| Everlight payouts: 0 ulume to anyone | Everlight payouts: proportional to SN-reported cascade DB bytes |
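To illustrate the last row, here is a hypothetical proportional split in Go. splitProportional is not the Everlight implementation: it assumes integer byte counts and floor division, and ignores whatever thresholds, caps, or rounding rules the real distribution applies. It also shows why an empty candidate set (the pre-fix state) pays out nothing.

```go
package main

import (
	"fmt"
	"math/big"
)

// splitProportional divides pool (in ulume) among SNs in proportion to their
// reported cascade bytes. Hypothetical helper, not the Everlight code.
func splitProportional(pool uint64, weights map[string]uint64) map[string]uint64 {
	total := new(big.Int)
	for _, w := range weights {
		total.Add(total, new(big.Int).SetUint64(w))
	}
	shares := make(map[string]uint64, len(weights))
	if total.Sign() == 0 {
		return shares // no candidates (or all-zero bytes): nothing is paid
	}
	p := new(big.Int).SetUint64(pool)
	for acc, w := range weights {
		// big.Int avoids overflow in pool*w; each share is <= pool, so it fits in uint64.
		s := new(big.Int).Mul(p, new(big.Int).SetUint64(w))
		s.Quo(s, total) // floor division; any remainder stays in the pool
		shares[acc] = s.Uint64()
	}
	return shares
}

func main() {
	shares := splitProportional(1000, map[string]uint64{"sn1": 300, "sn2": 100})
	fmt.Println(shares["sn1"], shares["sn2"]) // 750 250
}
```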

Invariant table

| # | Field / Behavior | Contract | Enforcement point | Test coverage |
|---|---|---|---|---|
| I1 | HostReport.cascade_kademlia_db_bytes | Finite, ≥ 0; zero is valid | validateHostMetricFields in msg_submit_epoch_report.go | 1 happy positive, 1 zero, 4 violations (NaN, +Inf, -Inf, negative) |
| I2 | Audit SubmitEpochReport is the SOLE writer of SupernodeMetricsState.CascadeKademliaDbBytes post-LEP-6 §12 | After a successful epoch report, upsert with the correct validator address, current block height, and incremented ReportCount | bridgeCascadeBytesToSupernodeMetrics | Happy-path persistence with field-by-field assertions |
| I3 | Bridge preserves prior non-cascade SupernodeMetrics.* fields | Read-modify-write | Same as I2 | Dedicated preservation test: seeds non-cascade fields → submits → asserts preserved |
| I4 | Read path (getLatestCascadeBytesFromAudit, distribution.go, query_get_reward_eligibility.go) unchanged | No code change | n/a | Existing tests still pass |

Bridge defensively no-ops with an audit_cascade_bytes_bridge_skipped event if the SuperNode record has an empty/invalid ValidatorAddress (a pre-existing x/supernode invariant violation outside audit's scope).
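Invariants I2 and I3, plus the defensive no-op, can be sketched as a read-modify-write over a map. The types and bridgeCascadeBytes are hypothetical; the real bridge goes through supernodeKeeper Get/SetMetricsState and, per the description above, also skips addresses that are invalid rather than merely empty, a parsing step this sketch omits.

```go
package main

import (
	"fmt"
	"strings"
)

// Metrics is a hypothetical subset of the metrics payload; DiskFreeBytes
// stands in for any non-cascade field that must survive the bridge.
type Metrics struct {
	CascadeKademliaDbBytes float64
	DiskFreeBytes          float64
}

// MetricsState is a hypothetical stand-in for SupernodeMetricsState.
type MetricsState struct {
	ValidatorAddress string
	Metrics          Metrics
	Height           int64
	ReportCount      uint64
}

// bridgeCascadeBytes returns a non-empty skip reason instead of writing when
// the validator address is empty (the real bridge emits an event here).
func bridgeCascadeBytes(store map[string]MetricsState, valAddr string, bytes float64, height int64) string {
	if strings.TrimSpace(valAddr) == "" {
		return "empty validator address"
	}
	st, found := store[valAddr] // read
	if !found {
		st = MetricsState{ValidatorAddress: valAddr}
	}
	st.Metrics.CascadeKademliaDbBytes = bytes // modify only the cascade field
	st.Height = height
	st.ReportCount++
	store[valAddr] = st // write back
	return ""
}

func main() {
	store := map[string]MetricsState{
		"val1": {ValidatorAddress: "val1", Metrics: Metrics{DiskFreeBytes: 42}, Height: 10, ReportCount: 3},
	}
	bridgeCascadeBytes(store, "val1", 1024, 20)
	fmt.Printf("%+v\n", store["val1"])
	fmt.Println("skip:", bridgeCascadeBytes(store, "", 5, 21))
}
```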

Files

| File | Change |
|---|---|
| proto/lumera/audit/v1/audit.proto | +9: restore cascade_kademlia_db_bytes = 6 with metric-courier comment |
| x/audit/v1/types/audit.pb.go | Regenerated |
| x/audit/v1/types/errors.go | +1: ErrInvalidHostMetric (code 18) |
| x/audit/v1/keeper/msg_submit_epoch_report.go | +90/-2: validation + bridge helper, wired into the SubmitEpochReport handler after SetReport |
| x/audit/v1/keeper/msg_submit_epoch_report_cascade_bytes_test.go | New: invariant tests (8 cases) |
| x/audit/v1/keeper/msg_submit_epoch_report_storagefull_test.go | +16: GetMetricsState/SetMetricsState AnyTimes() expectations for the 2 storagefull-transition tests whose SN has a real validator address |
| docs/static/openapi.yml | Regenerated |

Risk & rollback

Risk: low.

  • Proto field is purely additive: it re-introduces field number 6, which was previously assigned to this same field and never repurposed.
  • Validation is fail-closed against malformed input (NaN/Inf/negative).
  • Bridge is defensively no-op when val addr is missing.
  • All existing unit + integration tests pass unchanged.

Rollback: revert this commit. Devnet/mainnet falls back to the current broken-Everlight state (no worse than today).

Migration / upgrade impact

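  • Proto schema additive: field 6 is re-added to an existing message, which is backward-compatible at the protobuf wire level (generic decoders skip unknown fields). Chain-side tx decoding is stricter and rejects unknown fields, which is why upgrade ordering matters (next item).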
  • No state migration needed: SupernodeMetricsState schema unchanged. New writers just start populating Metrics.CascadeKademliaDbBytes from the audit path going forward.
  • Upgrade ordering: this PR is a prerequisite for any chain that ships supernode@v2.5.0-rc (or anything master-ish on the supernode side that emits HostReport.cascade_kademlia_db_bytes). Without this PR, those SN binaries' epoch-report Txs fail at proto decode (errUnknownField "*types.HostReport": {TagNum: 6, WireType:"fixed64"}: tx parse error).

Verification

# Unit (CI: make unit-tests)
go test ./x/audit/... ./x/supernode/...
# → ok, all green

# Integration (CI: make integration-tests, -tags=integration,test)
go test -tags=integration,test -p 4 ./tests/integration/...
# → ok, all green including tests/integration/{audit,supernode,everlight}

# Systemtests vet (Anti-pattern 13 mandate — separate Go module)
cd tests/systemtests && go vet -tags system_test ./...
# → clean

# Binary build
go build -o /tmp/lumerad ./cmd/lumera
# → ok

Observability

New event type:

  • audit_cascade_bytes_bridge_skipped — emitted when the bridge no-ops due to empty/invalid ValidatorAddress. Attributes: module, supernode_account, validator_address (when present), reason, error (when present).

Follow-up

The legacy MsgReportSupernodeMetrics handler remains in the codec (still registered, still callable) but no SN binary sends it. Removing it cleanly requires a chain upgrade handler and is out of scope for this surgical fix — to be done as a follow-up cleanup PR.

Cherry-pick plan

After merge to master, cherry-pick onto 1.12.0 release branch via a follow-up PR.

…ernodeMetricsState

LEP-6 §12 (PR #122) removed cascade_kademlia_db_bytes from the audit
HostReport and rewrote x/supernode getLatestCascadeBytesFromAudit to read
from SupernodeMetricsState. The migration was half-completed: no chain-side
writer was added to populate SupernodeMetricsState.CascadeKademliaDbBytes
from the audit epoch-report channel. The only remaining writer is the
now-operationally-dead legacy MsgReportSupernodeMetrics handler.

Result on chain 1.12.0 / master today: every Everlight payout-period
distribution sees getLatestCascadeBytesFromAudit return found=false for
every SuperNode → distributePool skips every candidate → pool is never
disbursed. Confirmed live on devnet.

This change:

1. Restores HostReport.cascade_kademlia_db_bytes (field 6) on the audit
   epoch-report proto purely as a metric-courier. The audit module does
   NOT consume the value for its own consensus logic (LEP-6 §12 intent
   preserved); it only carries the value into the chain on the audit
   submission channel that SuperNodes already use.

2. Adds validation in SubmitEpochReport: cascade_kademlia_db_bytes must be
   a finite number ≥ 0 (NaN, +Inf, -Inf, negative rejected with new
   ErrInvalidHostMetric). Zero is valid (empty Kademlia store).

3. Adds bridgeCascadeBytesToSupernodeMetrics: after the epoch report is
   successfully persisted, the audit handler upserts the reporter's
   SupernodeMetricsState via supernodeKeeper.SetMetricsState — read-
   modify-write so any non-cascade fields previously persisted are
   preserved. Bumps Height to current block and ReportCount.

4. Defensive no-op (with audit_cascade_bytes_bridge_skipped event for
   observability) when the SuperNode record has empty/invalid
   ValidatorAddress — that is a pre-existing x/supernode invariant
   violation outside audit's scope; the bridge surfaces it via event
   but does not fail the epoch report on someone else's data corruption.

This is now the SOLE writer of SupernodeMetricsState.CascadeKademliaDbBytes
post-LEP-6 §12 (legacy MsgReportSupernodeMetrics handler remains in the
codec but no SN sends it; left for a follow-up cleanup PR).

Tests:
- 4 unit tests covering invariant violations (NaN, +Inf, -Inf, negative)
- 1 happy-path test (value bridged into MetricsState with correct
  validator address, height, ReportCount)
- 1 zero-valid test
- 1 read-modify-write preservation test (non-cascade fields preserved)
- 1 defensive no-op test (empty ValidatorAddress emits event, accepts
  report, does not call Get/SetMetricsState)
- Existing 2 storagefull-transition tests updated with Get/SetMetricsState
  AnyTimes expectations.

make integration-tests passes (incl. tests/integration/everlight,
tests/integration/audit, tests/integration/supernode). go vet -tags
system_test ./... clean in tests/systemtests/.

test(systemtests): update Everlight storage-full eligibility reason

After the audit→supernode metrics bridge added in this PR, the
SupernodeMetricsState is written on every accepted epoch report.
The sn-eligibility query now reaches the threshold gate (rawBytes=0)
and returns 'cascade bytes below minimum threshold' instead of the
pre-bridge 'no audit epoch report found' (which was a symptom of
no writer existing).

Substantive outcome — SN ineligible, no payout while storage-full —
is unchanged. Only the rejection reason advances by one step.

Fixes: failing system test TestEverlightSystem_PayoutAndHistoryWhileStorageFull
on PR #140 CI.
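The gate ordering this commit relies on can be sketched as follows. The two reason strings are quoted from the commit message; eligibility, its parameters, and the threshold value are hypothetical and stand in for the actual query logic.

```go
package main

import "fmt"

// eligibility checks gates in order: a missing metrics state fails first,
// then a present value below the (hypothetical) minimum threshold.
func eligibility(found bool, rawBytes, minBytes float64) (bool, string) {
	if !found {
		return false, "no audit epoch report found"
	}
	if rawBytes < minBytes {
		return false, "cascade bytes below minimum threshold"
	}
	return true, ""
}

func main() {
	_, pre := eligibility(false, 0, 1) // pre-bridge: nothing ever wrote the state
	_, post := eligibility(true, 0, 1) // post-bridge: storage-full SN reporting rawBytes=0
	fmt.Println(pre)
	fmt.Println(post)
}
```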
mateeullahmalik added a commit that referenced this pull request May 12, 2026
mateeullahmalik added a commit that referenced this pull request May 12, 2026
…tes from HostReport to SupernodeMetricsState (#141)

mateeullahmalik merged commit ab02243 into master on May 12, 2026
23 of 25 checks passed