
fix(state-machine-tests): DEFI-2779: Register non-local subnets from routing table in registry #9891

Merged
mbjorkqvist merged 4 commits into master from
mathias/DEFI-2779-fix-golden-state-routing-table-flakiness
Apr 16, 2026


@mbjorkqvist
Contributor

Fix flaky icrc_ledger_suite_integration_golden_state_upgrade_downgrade_test by registering all subnets referenced in the routing table in the registry. Extract add_cup_contents_and_key_record helper to eliminate code duplication.

The golden state tests use create_routing_table (introduced in PR #1530) to route canister IDs outside the local subnet to a non-existent subnet. This ensures that leftover cross-subnet responses in the golden state backup are routed into a remote stream (where they sit harmlessly) rather than triggering a critical error in the stream builder.

This worked until PR #9449 introduced routing table filtering in try_to_populate_network_topology: non-CloudEngine subnets now filter the routing table to only include entries for subnets that have registry records. Since StateMachineBuilder::build() only registered the local subnet, the non-existent subnet's routing entries were silently dropped, leaving cross-subnet responses unroutable and causing mr_stream_builder_response_destination_not_found critical errors in the golden state test.
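The effect of that filtering can be illustrated with a deliberately simplified model. This is a sketch only: `RoutingTable`, `filter_by_registry`, and the integer `SubnetId`/`CanisterId` aliases below are hypothetical stand-ins, not the actual types or the logic in try_to_populate_network_topology.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for the real ID types (assumption for illustration).
type SubnetId = u64;
type CanisterId = u64;

/// Simplified routing table: inclusive canister ID range -> owning subnet.
#[derive(Clone, Debug, PartialEq)]
struct RoutingTable(BTreeMap<(CanisterId, CanisterId), SubnetId>);

impl RoutingTable {
    /// Returns the subnet whose range contains `canister`, if any.
    fn route(&self, canister: CanisterId) -> Option<SubnetId> {
        self.0
            .iter()
            .find(|(range, _)| range.0 <= canister && canister <= range.1)
            .map(|(_, subnet)| *subnet)
    }

    /// Keeps only entries whose subnet has a record in the (modelled) registry.
    fn filter_by_registry(&self, registered: &BTreeSet<SubnetId>) -> RoutingTable {
        RoutingTable(
            self.0
                .iter()
                .filter(|entry| registered.contains(entry.1))
                .map(|(range, subnet)| (*range, *subnet))
                .collect(),
        )
    }
}

/// Builds a table in the spirit of the golden state setup: one local range,
/// everything else routed to a subnet that has no registry records.
fn example_table(local: SubnetId, remote: SubnetId) -> RoutingTable {
    let mut entries = BTreeMap::new();
    entries.insert((0, 99), local);
    entries.insert((100, u64::MAX), remote);
    RoutingTable(entries)
}
```

In this model, if only the local subnet is registered, a canister ID in the remote range routes to the remote subnet before filtering and to nothing afterwards, which corresponds to the unroutable responses described above.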

The test is flaky (rather than always failing) because the golden state snapshot may or may not contain messages destined for other subnets, depending on when it was taken.

StateMachineBuilder::build() now derives the subnet list from the routing table and registers minimal registry records (DKG transcript, subnet record, key record) for each non-local subnet via register_non_local_subnet. This ensures their routing table entries survive the filtering.
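The idea behind the fix can be sketched in the same simplified style. The `Registry` struct, `register_all_routed_subnets`, and `surviving_entries` below are hypothetical names chosen for illustration; the real change writes DKG transcript, subnet, and key records via register_non_local_subnet, which this model collapses into a single set membership.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for the real ID and routing table types.
type SubnetId = u64;
type CanisterId = u64;
type RoutingTable = BTreeMap<(CanisterId, CanisterId), SubnetId>;

/// Modelled registry: a subnet counts as registered once it is in the set.
/// (The real registry holds several records per subnet; this is an assumption
/// made to keep the sketch small.)
#[derive(Debug, Default)]
struct Registry {
    subnets: BTreeSet<SubnetId>,
}

impl Registry {
    fn register_subnet(&mut self, id: SubnetId) {
        self.subnets.insert(id);
    }
}

/// Collects every subnet referenced by the routing table.
fn subnets_in_routing_table(table: &RoutingTable) -> BTreeSet<SubnetId> {
    table.values().copied().collect()
}

/// Registers the local subnet plus every other subnet the table refers to.
/// Returns the non-local subnets that were registered, in ascending order.
fn register_all_routed_subnets(
    registry: &mut Registry,
    table: &RoutingTable,
    local: SubnetId,
) -> Vec<SubnetId> {
    registry.register_subnet(local);
    let mut non_local = Vec::new();
    for subnet in subnets_in_routing_table(table) {
        if subnet != local {
            registry.register_subnet(subnet);
            non_local.push(subnet);
        }
    }
    non_local
}

/// Applies the registry-based filter: entries for unregistered subnets are dropped.
fn surviving_entries(table: &RoutingTable, registry: &Registry) -> RoutingTable {
    table
        .iter()
        .filter(|entry| registry.subnets.contains(entry.1))
        .map(|(range, subnet)| (*range, *subnet))
        .collect()
}
```

Because the set of registered subnets is derived from the routing table itself, the filter cannot drop any entry in this model, whatever subnets the table happens to reference.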

@github-actions github-actions bot added the fix label Apr 15, 2026
@mbjorkqvist
Contributor Author

Details of a successful run with the test (using a "known-bad" golden state snapshot) can be found here.

@mbjorkqvist mbjorkqvist added the CI_ALL_BAZEL_TARGETS Runs all bazel targets label Apr 15, 2026
@mbjorkqvist mbjorkqvist marked this pull request as ready for review April 15, 2026 14:32
@mbjorkqvist mbjorkqvist requested a review from a team as a code owner April 15, 2026 14:32
@mbjorkqvist mbjorkqvist requested a review from mraszyk April 15, 2026 14:37
Contributor

@schneiderstefan schneiderstefan left a comment


Thank you and sorry for making your test flaky.

Comment thread rs/state_machine_tests/src/lib.rs
@mbjorkqvist mbjorkqvist added this pull request to the merge queue Apr 16, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 16, 2026
@mbjorkqvist mbjorkqvist added this pull request to the merge queue Apr 16, 2026
Merged via the queue into master with commit db54619 Apr 16, 2026
37 checks passed
@mbjorkqvist mbjorkqvist deleted the mathias/DEFI-2779-fix-golden-state-routing-table-flakiness branch April 16, 2026 08:57
2 participants