Skip to content

test: bump 10s timeouts that flake on Flat DB CI#11380

Merged
LukaszRozmej merged 2 commits into
masterfrom
fix/merge-plugin-test-timeouts
Apr 28, 2026
Merged

test: bump 10s timeouts that flake on Flat DB CI#11380
LukaszRozmej merged 2 commits into
masterfrom
fix/merge-plugin-test-timeouts

Conversation

@LukaszRozmej
Copy link
Copy Markdown
Member

Changes

  • RpcTest.TestSerializedRequest: bump cancellation token from 10s → 30s. Under TEST_USE_FLAT=1 engine_newPayloadV5 + JSON serialization can exceed 10s, throwing TaskCanceledException (e.g. NewPayloadV5_rejects_invalid_BAL_early(...,SurplusReads) 15s).
  • executePayloadV1_accepts_already_known_block: bump [CancelAfter(10000)][CancelAfter(30000)]. Same root cause — flat DB block production is just slower.

Both tests still complete well under 10s when not under CI load. Verified locally with TEST_USE_FLAT=1:

  • 4/4 V1 known_block cases pass (7s 113ms)
  • 32/32 V6 BAL early-reject cases pass (10s 066ms)

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other:

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

RpcTest.TestSerializedRequest hard-cancels after 10s; under
TEST_USE_FLAT=1 engine_newPayloadV5 plus JSON serialization
exceeds that, causing TaskCanceledException in V6 BAL tests.

executePayloadV1_accepts_already_known_block(true) trips its
[CancelAfter(10000)] on the same workflow.

Bump both to 30s so CI has headroom; tests still complete in
under 10s when not under load.
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 28, 2026

Claude finished @LukaszRozmej's task in 1m 3s —— View job


Review

  • Read changed files and diff
  • Review for correctness, edge cases, regressions
  • Check adherence to repo rules
  • Post final review

This is a focused, minimal fix for CI flakiness under TEST_USE_FLAT=1. Both changes are correct and the root cause explanation is clear. No production code is touched.

Finding: Low — TestSerializedRequest timeout is now global (30s affects all 150 callers)

RpcTest.TestSerializedRequest is a shared utility called by ~150 tests. The 10s CTS was a safety net against hangs or runaway serialization in any of those tests. Tripling it globally means a hung/deadlocked test will now run for 30s before failing. In practice this is unlikely to matter much (these are unit tests, not health checks), and the flat DB serialization genuinely needs the headroom — so this is acceptable.

Finding: Low — potentially missed sibling timeout

EngineModuleTests.PayloadProduction.cs:108 has [CancelAfter(10000)] on getPayloadV1_picks_transactions_from_pool_v1. If flat DB block production also slows that test, it could flake for the same reason. Worth keeping an eye on; it's not blocking this PR if the test hasn't flaked yet.


No critical, high, or medium findings. The fix is correct and proportionate to the problem. Mergeable.

Same root cause: TEST_USE_FLAT=1 makes block production / forkchoice
slow enough to clip these tests' cancellation tokens.

- getPayloadV1_picks_transactions_from_pool_v1: 10s -> 30s
- executePayloadV1_on_top_of_terminal_block: 5s -> 30s
- executePayloadV1_on_top_of_not_processed_invalid_terminal_block: 5s -> 30s
- Maintain_correct_pointers_for_beacon_sync_in_archive_sync: 5s -> 30s
@LukaszRozmej LukaszRozmej merged commit 36546bb into master Apr 28, 2026
433 of 434 checks passed
@LukaszRozmej LukaszRozmej deleted the fix/merge-plugin-test-timeouts branch April 28, 2026 11:56
smartprogrammer93 pushed a commit that referenced this pull request May 5, 2026
executePayloadV1_accepts_already_known_block(true) trips
TaskCanceledException at 10s 489ms inside TestBlockchain.Build()
on TEST_USE_FLAT=1 runs. The wait there uses this.CancellationToken,
which is fed by TestTimeout (= DefaultTimeout), not the test's
[CancelAfter(30000)]. PR #11380 bumped per-test [CancelAfter] values
but missed this internal one; aligning the default with that fix.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LukaszRozmej added a commit that referenced this pull request May 25, 2026
#11731)

test: bump executePayloadV1_accepts_already_known_block CancelAfter to 60s

Test has flaked again on Flat DB CI at the 30s cap (originally bumped from
10s in #11380). bestBlockProcessed.WaitAsync exceeds 30s on slow runners.
Aligning with the 60s NewPayloadBlockProcessingTimeout already used in
MergeTestBlockchain.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants