Optimise scheduler.get_comm_cost set difference #6931

Merged: 1 commit merged into dask:main from wence/get-comm-cost-opt on Aug 23, 2022

Conversation

@wence- (Contributor) commented Aug 22, 2022

Computing set(A).difference(B) is O(max(len(A), len(B))). When
estimating the communication cost of a task's dependencies, the number
of dependencies (A) is usually small while the number of tasks the
worker holds (B) is large. In this case it is better to construct the
set difference manually, by iterating over A and checking whether each
element is in B.
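A minimal sketch of the effect (illustrative only: `has_what` and
`dependencies` below are stand-ins for the scheduler's `ws.has_what`
and `ts.dependencies`, with a dict keys view standing in for the
worker's task mapping):

import timeit

# Stand-ins: a large keys view of tasks held by a worker, and a small
# dependency set that mostly overlaps with it.
has_what = {f"task-{i}": None for i in range(1_000_000)}.keys()
dependencies = {f"task-{i}" for i in range(8)} | {"missing-0", "missing-1"}

# set.difference with a non-set, non-dict argument such as a keys view
# can end up scanning the whole argument: roughly O(len(has_what)).
t_diff = timeit.timeit(lambda: dependencies.difference(has_what), number=10)

# Explicit iteration performs only len(dependencies) hash lookups:
# O(len(dependencies)).
t_iter = timeit.timeit(
    lambda: {d for d in dependencies if d not in has_what}, number=10
)

print(f"difference: {t_diff:.4f}s  iterate: {t_iter:.6f}s")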

Performing a left.merge(right, on="key", how="inner") of a distributed
dataframe across eight workers, with chunks_per_worker * rows_per_chunk
held constant, I observe the following timings using the tcp
communication protocol (a rough reproduction sketch follows the
checklist below):

| chunks_per_worker | rows_per_chunk | before | after |
|-------------------|----------------|--------|-------|
| 100               | 50000          | 75s    | 48s   |
| 10                | 500000         | ~9s    | ~9s   |
| 1                 | 5000000        | ~8s    | ~8s   |
  • Tests added / passed
  • Passes pre-commit run --all-files
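
A rough reproduction sketch of the benchmark (the exact script is not
part of the PR, so the local cluster setup, data shapes, and uniform
key distribution below are assumptions):

import time

import numpy as np
import pandas as pd

import dask.dataframe as dd
from dask.distributed import Client, wait

client = Client(n_workers=8)  # assumption: 8 local workers over tcp

chunks_per_worker, rows_per_chunk = 100, 50_000
npartitions = 8 * chunks_per_worker
nrows = npartitions * rows_per_chunk

def make_frame(seed):
    # Uniformly distributed join keys so the inner merge shuffles data
    # between workers.
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {"key": rng.integers(0, nrows, nrows), "value": rng.random(nrows)}
    )
    return dd.from_pandas(df, npartitions=npartitions)

left, right = make_frame(0), make_frame(1)

start = time.perf_counter()
result = left.merge(right, on="key", how="inner").persist()
wait(result)  # block until all merge tasks have finished
print(f"merge took {time.perf_counter() - start:.1f}s")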

@wence- (Contributor, Author) commented Aug 22, 2022

This is one simple go at addressing #6899.

@github-actions bot commented Aug 22, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0    15 suites ±0    6h 44m 23s ⏱️ −1m 14s
3 041 tests ±0:  2 954 ✔️ passed −1    84 💤 skipped +1    3 failed ±0
22 493 runs ±0:  21 515 ✔️ passed −1    975 💤 skipped +1    3 failed ±0

For more details on these failures, see this check.

Results for commit ce87313, compared against base commit 2a2c3bb.

♻️ This comment has been updated with latest results.

@gjoseph92 (Collaborator) left a comment

This seems like a simple and sensible improvement!

The only test failure is the highly flaky #6896, so I think we could merge this.

Comment on lines +2625 to +2634
if 10 * len(ts.dependencies) < len(ws.has_what):
# In the common case where the number of dependencies is
# much less than the number of tasks that we have,
# construct the set of deps that require communication in
# O(len(dependencies)) rather than O(len(has_what)) time.
# Factor of 10 is a guess at the overhead of explicit
# iteration as opposed to just calling set.difference
deps = {dep for dep in ts.dependencies if dep not in ws.has_what}
else:
deps = ts.dependencies.difference(ws.has_what)
@fjetter (Member) commented Aug 23, 2022

Micro-benchmarking this, I get a factor of ~2 rather than 10:

import time
import uuid
from random import sample

for dict_size in [100, 1_000, 10_000, 100_000, 1_000_000]:
    a_large_dict = {
        f"{ix}-{uuid.uuid4()}": "foo"
        for ix in range(dict_size)
    }

    def timing(func):
        start = time.time_ns()
        iterations = 10
        for iteration in range(iterations):
            func()
        end = time.time_ns()
        return (end - start) / iterations

    for factor in [0.1, 0.4, 0.45, 0.5]:
        # sample() needs a sequence, so materialise the keys first
        small_set = set(sample(list(a_large_dict), int(factor * dict_size)))
        intersect = timing(lambda: small_set.intersection(a_large_dict))
        iterate = timing(lambda: {k for k in small_set if k in a_large_dict})
        if iterate < intersect:
            print(f"Iterating faster for {dict_size=} and {factor=}")
Iterating faster for dict_size=100 and factor=0.1
Iterating faster for dict_size=1000 and factor=0.1
Iterating faster for dict_size=1000 and factor=0.4
Iterating faster for dict_size=1000 and factor=0.5
Iterating faster for dict_size=10000 and factor=0.1
Iterating faster for dict_size=10000 and factor=0.4
Iterating faster for dict_size=10000 and factor=0.45
Iterating faster for dict_size=100000 and factor=0.1
Iterating faster for dict_size=100000 and factor=0.4
Iterating faster for dict_size=100000 and factor=0.45
Iterating faster for dict_size=100000 and factor=0.5
Iterating faster for dict_size=1000000 and factor=0.1

@wence- (Contributor, Author) replied:

Conversely, on my (admittedly slightly antediluvian) Broadwell box, with Python 3.9.13:

Iterating faster for dict_size=100 and factor=0.1
Iterating faster for dict_size=1000 and factor=0.1
Iterating faster for dict_size=1000 and factor=0.4
Iterating faster for dict_size=10000 and factor=0.1
Iterating faster for dict_size=100000 and factor=0.1
Iterating faster for dict_size=1000000 and factor=0.1

@wence- (Contributor, Author) commented Aug 23, 2022

Updated commit message/summary for timings with tcp rather than UCX comms protocol (otherwise no change in the force push).

I can adapt the heuristic for when to select between the two options, but as shown above, the threshold varies depending on hardware.

@wence- (Contributor, Author) commented Aug 23, 2022

> I can adapt the heuristic for when to select between the two options, but as shown above, the threshold varies depending on hardware.

In my benchmarking of the workflow, a factor of 10 or 2 didn't really make a difference, I guess because the dependencies set is really much smaller.
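
To illustrate with hypothetical sizes in the ballpark of the merge
benchmark above (a handful of dependencies per task, hundreds of tasks
held per worker), both constants pick the same branch:

# Hypothetical sizes: few dependencies, many held tasks.
len_deps, len_has_what = 4, 800

for factor in (2, 10):
    # The PR's heuristic: iterate explicitly when the dependency set is
    # much smaller than the worker's task mapping.
    print(f"{factor=}: iterate branch taken: {factor * len_deps < len_has_what}")
# factor=2: iterate branch taken: True
# factor=10: iterate branch taken: True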

@fjetter (Member) commented Aug 23, 2022

Thanks for checking on the factor again. I guess you are right and that's good enough.

@fjetter fjetter merged commit c15a10e into dask:main Aug 23, 2022
@wence- wence- deleted the wence/get-comm-cost-opt branch August 23, 2022 14:58
gjoseph92 pushed a commit to gjoseph92/distributed that referenced this pull request Oct 31, 2022