Remove test_ucx_config_w_env_var flaky condition #5765
|
cc @charlesbluca @jakirkham as per your comments in #5229. |
test_ucx_config_w_env_var flaky condition
Unit Test Results: 18 files ±0, 18 suites ±0, 10h 49m 56s ⏱️ (+1h 38m 0s). For more details on these failures and errors, see this check. Results for commit 3f0f589. ± Comparison against base commit dbaae68. ♻️ This comment has been updated with latest results. |
jakirkham
left a comment
Thanks Peter 🙂
Had a few questions below
@pytest.mark.flaky(
    reruns=10, reruns_delay=5, condition=ucp.get_ucx_version() < (1, 11, 0)
Do we guarantee having 1.11.0+ at this point? Do we document what versions we support anywhere?
I'm dropping that condition to mark the test flaky for all UCX versions. In the next release we'll drop support for UCX < 1.11.1 (rapidsai/ucx-py#829), but today we theoretically support any version.
Got it. Might be worth thinking about how to document UCX compatibility in Dask at some point
That is actually left to UCX-Py, not Dask, so it's going to be whatever UCX-Py supports. I therefore don't think it makes sense to document UCX in Dask, as it's an indirect dependency. This will be more evident soon when we drop support for UCX < 1.11.1, after which there will be no more UCX version checks in the code. I have changes prepared but am still testing multi-GPU to ensure nothing breaks.
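For readers following the thread, here is a rough illustration of what dropping the `condition` argument changes. It is a stdlib-only stand-in, not the rerun plugin's actual implementation: `rerun_if`, `make_flaky_check`, and `fake_ucx_version` are hypothetical names, and the version tuple is made up rather than read from `ucp.get_ucx_version()`.

```python
import functools
import time


def rerun_if(condition=True, reruns=10, reruns_delay=0.0):
    """Hypothetical stand-in for a conditional "flaky" marker.

    When `condition` is false the function is returned unchanged,
    so a single failure is final.
    """
    def decorator(func):
        if not condition:
            return func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # One initial run plus up to `reruns` retries.
            for attempt in range(reruns + 1):
                try:
                    return func(*args, **kwargs)
                except AssertionError:
                    if attempt == reruns:
                        raise
                    time.sleep(reruns_delay)
        return wrapper
    return decorator


def make_flaky_check(failures_before_success):
    """Build a check that fails a fixed number of times, then passes."""
    state = {"calls": 0}

    def check():
        state["calls"] += 1
        assert state["calls"] > failures_before_success
        return state["calls"]

    return check


# Made-up version tuple standing in for ucp.get_ucx_version().
fake_ucx_version = (1, 11, 0)

# Old behaviour: gated on the version, so 1.11.0 and newer get no reruns.
old_style = rerun_if(condition=fake_ucx_version < (1, 11, 0))(make_flaky_check(2))

# New behaviour: no condition, so reruns apply on every version.
new_style = rerun_if()(make_flaky_check(2))
```

With the version-gated form, a check that fails twice before passing raises on its first call when the version is 1.11.0; without the condition, the same check is retried until it passes.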
  port = "13339"
- sched_addr = f"ucx://{HOST}:{port}"
+ sched_addr = f"ucx://127.0.0.1:{port}"
Curious why HOST doesn't work for us
Also should we assign to a local host variable if this needs to change in the future?
It does work; what I noticed is that 127.0.0.1 fails less often. That's the only reason I'm changing it here, and most other Distributed tests seem to rely on 127.0.0.1 as well; see https://github.com/dask/distributed/blob/main/distributed/cli/tests/test_dask_scheduler.py for instance. Honestly, I don't think we need to worry about a host variable here; being straightforward seems good enough.
Ok maybe we should leave a comment about that. Perhaps something like this
- sched_addr = f"ucx://127.0.0.1:{port}"
+ # Use localhost directly (this appears to be less flaky than HOST)
+ sched_addr = f"ucx://127.0.0.1:{port}"
Done in 3f0f589. I also expanded your suggested comment a bit.
|
Plan to merge in later this afternoon unless there are other comments |
|
This is failing on |
|
Thanks Peter! 😄 |
|
Thanks for the reviews, @jakirkham and @quasiben! |
Mark test_ucx_config_w_env_var flaky independent of UCX version. Using 127.0.0.1 greatly reduces the probability of the issue, but it still happens ~1% of the time when I run it locally, therefore marking it flaky for the time being.
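As a back-of-the-envelope check on why reruns are a reasonable stopgap, the sketch below estimates how often the test would still fail after retries. It assumes each attempt fails independently at the ~1% rate quoted above, which may not hold if failures are correlated (for example, if a port stays busy between attempts), and it assumes the `reruns=10` value visible in the diff context is kept. `residual_failure_probability` is a hypothetical helper name.

```python
def residual_failure_probability(p_fail, reruns):
    """Chance that the first run and every rerun all fail.

    Assumes attempts are independent, which is an idealisation.
    """
    return p_fail ** (reruns + 1)


# ~1% per-attempt failure rate from the description; reruns=10 from the diff context.
print(residual_failure_probability(0.01, 10))
```

Under those assumptions the residual failure rate is negligible, though correlated failures would make the real figure higher.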