ci[cartesian]: Thread safe parallel stencil tests by romanc · Pull Request #1849 · GridTools/gt4py

romanc · 2025-02-06T14:10:34Z

Description

To avoid repeating boilerplate code in testing, StencilTestSuite provides a convenient interface to test gtscript stencils.

Within that StencilTestSuite base class, generating the stencil is separated from running & validating the stencil code. Each deriving test class will end up with two tests: one for stencil generation and a second one to test the implementation by running the generated code with defined inputs and expected outputs.

The base class was written such that the implementation test would re-use the generated stencil code from the first test. This introduces an implicit test order dependency. To save time and avoid unnecessary test failure outputs, failing to generate the stencil code would automatically skip the implementation/validation test.
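The order dependency described above can be illustrated with a minimal, standard-library-only sketch. The names `new_test_context`, `run_generation`, `run_validation`, and `Skipped` are hypothetical stand-ins and not the actual `StencilTestSuite` code; only the `"implementations"` key mirrors the test context shown in the diff.

```python
class Skipped(Exception):
    """Stand-in for the exception that pytest.skip() raises."""


def new_test_context():
    # Hypothetical, reduced test context: only the slot for cached stencils.
    return {"implementations": []}


def run_generation(test):
    # Placeholder for the (potentially expensive) stencil compilation.
    test["implementations"].append(object())


def run_validation(test):
    # Old behaviour: skip when no stencil has been cached yet.
    if not test["implementations"]:
        raise Skipped("no valid implementations")


def outcome(order):
    """Run both tests on a fresh context in the given order."""
    test = new_test_context()
    steps = {"generation": run_generation, "validation": run_validation}
    results = {}
    for name in order:
        try:
            steps[name](test)
            results[name] = "passed"
        except Skipped:
            results[name] = "skipped"
    return results


expected_order = outcome(["generation", "validation"])
swapped_order = outcome(["validation", "generation"])
```

In the expected order both tests pass. With the order swapped, validation is reported as skipped rather than failed, which is how the problem went unnoticed.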

Running tests in parallel (with xdist) breaks the expected test execution order (in the default configuration). This leads to automatically skipped validation tests in case the stencil code wasn't generated yet. On the CI, we only run with 2 threads, so usually only a couple of tests were skipped. Locally, I was running with 16 threads and got ~30 skipped validation tests.

This PR proposes to address the issue by setting an xdist_group mark on the generation/implementation tests that belong together. In combination with --dist loadgroup, this will keep the expected order where necessary. Only tests with xdist_group markers are affected by --dist loadgroup. Tests without that marker will be distributed normally as if in --dist load mode (the default so far). By grouping with cls_name and backend, we keep maximal parallelization, grouping only the two tests that depend on each other.
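A minimal sketch of the grouping idea, assuming a plain parametrized test module rather than the `StencilTestSuite` metaclass. The helper `xdist_group_name`, the module-level `_CACHE`, the class name `TestCopy`, and the backend list are illustrative assumptions. `pytest.mark.xdist_group(name=...)` and `pytest.param(..., marks=...)` are the pytest / pytest-xdist features the description refers to.

```python
import pytest

# Example backend names, chosen only for illustration.
BACKENDS = ["numpy", "gt:cpu_ifirst"]

# Hypothetical per-process cache standing in for the test context.
_CACHE = {}


def xdist_group_name(cls_name, backend):
    """Hypothetical helper: one group per (test class, backend) pair."""
    return f"{cls_name}_{backend}"


def grouped_params(cls_name):
    # Attach the same xdist_group mark to every test of a given backend,
    # so that `--dist loadgroup` sends them to the same worker.
    return [
        pytest.param(
            backend,
            marks=pytest.mark.xdist_group(name=xdist_group_name(cls_name, backend)),
        )
        for backend in BACKENDS
    ]


@pytest.mark.parametrize("backend", grouped_params("TestCopy"))
def test_generation(backend):
    # Placeholder for stencil code generation.
    _CACHE[backend] = f"stencil for {backend}"


@pytest.mark.parametrize("backend", grouped_params("TestCopy"))
def test_implementation(backend):
    # Relies on test_generation having run first on the same worker.
    assert backend in _CACHE, "Stencil not yet generated."
```

Such a module would be run with, for example, `pytest -n 4 --dist loadgroup`. Tests that carry no `xdist_group` mark are still distributed freely across workers.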

Further reading: see the --dist section in the pytest-xdist documentation (https://pytest-xdist.readthedocs.io/en/stable/distribution.html).

Requirements

- [x] All fixes and/or new features come with corresponding tests.
  Existing tests are still green. No more skipped tests \o/ Works as expected locally.
- [ ] Important design decisions have been documented in the appropriate ADR inside the docs/development/ADRs/ folder.
  N/A

romanc

some details inline

romanc · 2025-02-06T14:22:43Z

src/gt4py/cartesian/testing/suites.py

                                d, generation_strategy_factories
                            ),
-                            implementations=[],
+                            implementation=None,


This might have been different in the past. The way we cache implementations now, there's only ever max one implementation per test context.

romanc · 2025-02-06T14:24:43Z

src/gt4py/cartesian/testing/suites.py

-        The generated implementations are cached in a :class:`utils.ImplementationsDB`
-        instance, to avoid duplication of (potentially expensive) compilations.
+        The generated implementation is cached in the test context, to avoid duplication
+        of (potentially expensive) compilation.
+        Note: This caching introduces a dependency between tests, which is captured by an
+        `xdist_group` marker in combination with `--dist loadgroup` to ensure safe parallel
+        test execution.


This comment was out of date. There's no utils.ImplementationsDB (anymore).

romanc · 2025-02-06T14:25:08Z

src/gt4py/cartesian/testing/suites.py

-        test["implementations"].append(implementation)
+        assert test["implementation"] is None
+        test["implementation"] = implementation


Assert our assumption that we only ever cache one implementation per test context.

romanc · 2025-02-06T14:26:18Z

src/gt4py/cartesian/testing/suites.py

-        implementation_list = test["implementations"]
-        if not implementation_list:
-            pytest.skip(
-                "Cannot perform validation tests, since there are no valid implementations."
-            )
-        for implementation in implementation_list:
-            if not isinstance(implementation, StencilObject):
-                raise RuntimeError("Wrong function got from implementations_db cache!")
+        implementation = test["implementation"]
+        assert (
+            implementation is not None
+        ), "Stencil not yet generated. Did you attempt to run stencil tests in parallel?"
+        assert isinstance(implementation, StencilObject)

-            cls._run_test_implementation(parameters_dict, implementation)
+        cls._run_test_implementation(parameters_dict, implementation)


Simplified since we don't have an array of implementations (anymore). Assert that the stencil code has been generated. If not, fail instead of skip. This leads to more errors in case code generation fails. Imo the best way to handle this is to get rid of the idea of having two tests (one for codegen and one for validation) per class. We could achieve the same level of parallelization with less glue code if we had just one test per class (codegen and validation inside the same test).
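The single-test alternative suggested above could look roughly like the following sketch. `generate_stencil` and `run_stencil_test` are hypothetical placeholders (the "stencil" here just doubles its inputs) and do not reflect the gt4py API. The point is only that generation and validation share one test body, so nothing is cached across tests.

```python
def generate_stencil(backend):
    """Placeholder for stencil code generation; returns a callable."""

    def stencil(values):
        return [2 * value for value in values]

    return stencil


def run_stencil_test(backend, inputs, expected):
    # Codegen and validation in one test body: no shared cache and
    # therefore no dependency on test execution order.
    implementation = generate_stencil(backend)
    assert callable(implementation), "code generation failed"

    result = implementation(inputs)
    assert result == expected, f"{backend}: {result} != {expected}"
    return result


result = run_stencil_test("numpy", [1, 2, 3], [2, 4, 6])
```

With this layout a code generation failure produces exactly one failing test per class and backend, and no grouping marker is needed.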

romanc · 2025-02-06T14:36:24Z

/cc @egparedes @havogt FYI

FlorianDeconinck

LGTM.

havogt

lgtm

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

romanc force-pushed the romanc/cartesian-thread-safe-parallel-tests branch from 6900575 to 8e8e497 Compare February 6, 2025 14:21

romanc marked this pull request as ready for review February 6, 2025 14:34

romanc requested review from FlorianDeconinck and twicki February 6, 2025 14:34

Fixup: change error message

020d567

The previous one was from when I thought we had to run these tests on one thread only.

FlorianDeconinck approved these changes Feb 6, 2025

havogt approved these changes Feb 7, 2025

romanc mentioned this pull request Feb 7, 2025

gt4py tests: refactor StencilTestSuite base class to avoid test dependencies GEOS-ESM/NDSL#72

Open

FlorianDeconinck merged commit 4b566d7 into GridTools:main Feb 7, 2025
30 checks passed

romanc deleted the romanc/cartesian-thread-safe-parallel-tests branch February 8, 2025 12:17

romanc mentioned this pull request May 8, 2025

ci: add nox utilities for conditional session execution instead of using github actions #1982

Merged
