Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1184.feature - Add 'singlecollect' distribution mode #1184

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

zorhay
Copy link

@zorhay zorhay commented Mar 3, 2025

This adds a new 'singlecollect' distribution mode that only collects tests on the first worker node and skips redundant collection on other nodes. This can significantly improve startup time for large test suites with expensive collection.

Key features:

  • Only the first worker performs test collection
  • Other workers skip collection verification entirely
  • Tests are distributed using the same algorithm as 'load' mode
  • Handles worker failures gracefully, including the collecting worker
  • Solves issues with floating parameters in pytest collection

This adds a new 'singlecollect' distribution mode that only collects tests on
the first worker node and skips redundant collection on other nodes. This can
significantly improve startup time for large test suites with expensive collection.

Key features:
- Only the first worker performs test collection
- Other workers skip collection verification entirely
- Tests are distributed using the same algorithm as 'load' mode
- Handles worker failures gracefully, including the collecting worker
- Solves issues with floating parameters in pytest collection
@zorhay zorhay changed the title Add 'singlecollect' distribution mode 1184.feature - Add 'singlecollect' distribution mode Mar 3, 2025
@zorhay zorhay force-pushed the add-singlecollect-mode branch from 96175cb to 5808348 Compare March 3, 2025 21:19
@bluetech
Copy link
Member

bluetech commented Mar 4, 2025

This adds a new 'singlecollect' distribution mode that only collects tests on the first worker node and skips redundant collection on other nodes.

You say this skips redundant collection on other nodes, but I don't think that's right -- how can a node execute a test (Item) it hasn't collected? Can you be more precise what is being saved here? Is it just the _check_nodes_have_same_collection?

Copy link
Member

@RonnyPfannschmidt RonnyPfannschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm under the impression we should name this different as it's something that applies to most shedulers

@zorhay
Copy link
Author

zorhay commented Mar 15, 2025

@bluetech
When using pytest's parametrization with unordered collections like sets, each worker in pytest-xdist independently collects tests. Since sets don't maintain a consistent iteration order, this results in different workers seeing different test orderings.

For example:

@pytest.mark.parametrize("value", {1, 2, 3, 4, 5})  # Using a set
def test_example(value):
    assert value > 0

Worker 1 might collect: test_example[1], test_example[3], test_example[2]...
Worker 2 might collect: test_example[2], test_example[1], test_example[5]...
This inconsistency triggers pytest-xdist's collection verification, which detects the mismatch between workers and aborts the test run with an error message like:

Different tests were collected between gw0 and gw1

This forces developers to either avoid using unordered collections in parametrization or manually convert them to ordered sequences.

The workers still need to have the test items to execute them. What's happening in the singlecollect mode is:

  1. The first worker performs the full pytest collection process (discovering test files, importing modules, creating test items)
  2. The master node receives this collection and distributes the test items to all workers
  3. Other workers do not perform their own independent collection
  4. The collection verification step (_check_nodes_have_same_collection) is skipped

The key part in the code is:

def add_node_collection(self, node: WorkerController, collection: Sequence[str]) -> None:
    """Only use collection from the first node."""
    # We only care about collection from the first node
    if node == self.first_node:
        self.log(f"Received collection from first node {node.gateway.id}")
        self.collection = list(collection)
        self.collection_done = True
    else:
        # Skip collection verification for other nodes
        self.log(f"Ignoring collection from node {node.gateway.id}")

Other nodes receive tests from the master node during test distribution.

@RonnyPfannschmidt
More problem-focused naming can be something like unordered-params. I chose singlecollect based on the mechanism of tests collection.

@RonnyPfannschmidt
Copy link
Member

If the only difference for this is ignoring different test order on other nodes then it is a completely unacceptable no go as shedulers currently talk in terms of indexes into the collection

@zorhay
Copy link
Author

zorhay commented Mar 15, 2025

I understand your concern about schedulers using collection indexes.
May I ask what the specific benefit is of having each worker collect tests independently?
This PR maintains index-based scheduling while using a single collection source. Workers still receive and execute tests by index, just using the first worker's collection as the reference.
Would a mode with a single collector node be a viable solution? It would solve the unordered parameter issue while preserving the existing scheduling mechanism.

@RonnyPfannschmidt
Copy link
Member

Theres a number of edge cases to be aware of

  • nodeid is not unique as one might think theres numerous usecases where one will actually see duplicate node ids
  • sometimes nodeids differ between nodes due to misstakes in test definition

Come to mind off hand

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants