Enhance test execution and timeout configurations #3457
Merged
Extend the wait windows to match: cluster grace period 28800/30000 -> 36000/37800 seconds, and target_testnets SESSION_TIMEOUT 20h -> 24h.
Light tests (no `lock_resources`, only `CLUSTER` in `use_resources`) now iterate cluster instances starting from the last two. Heavy tests start from the head. This keeps earlier instances free for heavy tests, which otherwise struggle to obtain an instance once light tests spread across all of them. Iteration order is computed by the new `_make_instances_order` helper, called outside of the global cluster lock to keep the locked section as short as possible.
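The ordering described above can be sketched roughly as follows. This is an illustration only, not the actual `_make_instances_order` from `cluster_getter.py`: the function names, parameters, the `CLUSTER` constant value, and the choice to wrap around to the head after the tail are all assumptions.

```python
from typing import Iterable, List

# Hypothetical stand-in for the resource name; the real constant may differ.
CLUSTER = "cluster"


def is_light_test(lock_resources: Iterable[str], use_resources: Iterable[str]) -> bool:
    """Assumed classification: nothing locked, and only CLUSTER is used."""
    return not list(lock_resources) and set(use_resources) == {CLUSTER}


def make_instances_order(num_instances: int, light: bool, tail_size: int = 2) -> List[int]:
    """Return the order in which cluster instances are tried.

    Heavy tests start from the head. Light tests start from the last
    `tail_size` instances and then wrap around to the head (assumed behaviour).
    """
    instances = list(range(num_instances))
    if not light or num_instances <= tail_size:
        return instances
    split = num_instances - tail_size
    return instances[split:] + instances[:split]


# Computed before taking the global cluster lock, so the locked section
# only has to walk an already prepared list.
light_order = make_instances_order(5, light=is_light_test([], [CLUSTER]))
heavy_order = make_instances_order(5, light=is_light_test(["some_lock"], [CLUSTER]))
```

With five instances, a light test in this sketch would try instances 3 and 4 first, while a heavy test would try instance 0 first.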
Pull request overview
This PR improves the test execution infrastructure by optimizing how cluster instances are selected under parallel load and by enhancing the pytest-xdist scheduler to prioritize smoke tests, along with extending timeouts to better accommodate long-running sessions.
Changes:
- Increased default testnet session timeout and extended cluster cleanup grace periods to reduce premature termination.
- Added a cluster instance iteration strategy that biases light tests toward “tail” instances to keep “head” instances available for heavy tests.
- Enhanced the custom xdist scheduler with a dedicated smoke-test fast lane when sufficient workers are available.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| runner/run_tests.sh | Increases default SESSION_TIMEOUT for testnets to 24h. |
| cardano_node_tests/pytest_plugins/xdist_scheduler.py | Adds smoke-test routing/dedicated worker lane and updates nodeid tagging logic. |
| cardano_node_tests/cluster_management/cluster_getter.py | Extends grace periods and introduces precomputed instance iteration ordering for improved scheduling. |
When the run has at least SMOKE_DEDICATED_THRESHOLD (10) workers and the collection contains smoke tests, reserve SMOKE_DEDICATED_COUNT (2) of the lowest-id workers as a smoke-only fast lane. Other workers continue to schedule any work, including smoke. This prevents smoke tests from being queued behind long/heavy tests on overloaded workers.
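The reservation rule above might look something like the sketch below. Only the two constants and their values come from the description; the helper name `pick_smoke_workers`, its signature, and the assumption that worker ids follow the usual pytest-xdist `gw<N>` pattern are illustrative and not taken from `xdist_scheduler.py`.

```python
from typing import List

SMOKE_DEDICATED_THRESHOLD = 10  # minimum number of workers before a lane is reserved
SMOKE_DEDICATED_COUNT = 2  # number of workers reserved for smoke tests


def pick_smoke_workers(worker_ids: List[str], has_smoke_tests: bool) -> List[str]:
    """Return the worker ids reserved as a smoke-only fast lane (hypothetical helper).

    An empty list means no lane is reserved and every worker schedules any work.
    """
    if not has_smoke_tests or len(worker_ids) < SMOKE_DEDICATED_THRESHOLD:
        return []
    # Sort numerically so that "gw10" comes after "gw2".
    ordered = sorted(worker_ids, key=lambda wid: int(wid.removeprefix("gw")))
    return ordered[:SMOKE_DEDICATED_COUNT]


workers = [f"gw{i}" for i in range(12)]
smoke_lane = pick_smoke_workers(workers, has_smoke_tests=True)
```

In this sketch the remaining workers are not restricted, which matches the statement that other workers continue to schedule any work, including smoke tests.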
Force-pushed from cc2a150 to 4a94cd5.
This pull request introduces significant improvements to the test cluster scheduling and resource management system, focusing on smarter distribution of test workloads and better prioritization for smoke and heavy tests. The changes include a new scheduling strategy for cluster instances, the introduction of a dedicated fast lane for smoke tests in the xdist scheduler, and adjustments to test timeouts for more robust test execution.
Cluster Instance Scheduling Improvements:
- Added a `_make_instances_order` method to `cluster_getter.py` to prioritize heavy tests for head cluster instances and pack light tests onto tail instances, reducing resource contention and improving scheduling efficiency.
- Updated `get_cluster_instance` to use the new precomputed order, ensuring better allocation of cluster resources.
Test Scheduler Enhancements (xdist):
- Introduced a dedicated fast lane for smoke tests in the xdist scheduler.
Timeout and Configuration Adjustments:
- Increased the default testnet session timeout in `run_tests.sh` to reduce premature test termination.
These updates collectively make the test infrastructure more robust, efficient, and responsive to different test types and workloads.