Add page on optimization to user guide #397
Conversation
This is very helpful for me to understand what optimization options we have, but for a user this would seem like an unmotivated exposure of pretty deep internals. Is this intended as developer docs?
# Optimization

Cubed will automatically optimize the computation graph before running it. This can reduce the number of tasks in the plan and the amount of intermediate IO, both of which speed up the computation.
From this intro it is unclear why I would ever not want to fuse tasks. Are there situations in which fusing would sacrifice memory safety? Or is it merely a trade-off between asking single tasks to read more data and potentially having more IO steps in the pipeline?
> Are there situations in which fusing would sacrifice memory safety?

No, fusing should always respect memory limits.

> Or is it merely a trade-off between asking single tasks to read more data and potentially having more IO steps in the pipeline?

Yes, that's it.
docs/user-guide/optimization.md (Outdated)

![Map fusion optimization](../images/optimization_map_fusion.svg)

Note that with optimization turned on, the array `b` is no longer written as an intermediate output, since it will be computed in the same tasks that compute array `c`. The overall number of tasks is reduced from 10 to 5, and the intermediate data (total nbytes) is reduced too.
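Map fusion can be sketched in plain Python. This is illustrative only, not Cubed's implementation: the point is that fusing two element-wise operations means each task applies both functions to its chunk, so the intermediate array `b` is never materialized.

```python
def negative(chunk):
    return [-v for v in chunk]

def double(chunk):
    return [2 * v for v in chunk]

# Unfused: two operations, one pass over the chunks each, with the
# intermediate result (array `b`) written out between them.
def run_unfused(chunks):
    b = [negative(c) for c in chunks]   # written to intermediate storage
    return [double(c) for c in b]       # read back to compute `c`

# Fused: one task per chunk applies both functions; `b` is never stored.
def run_fused(chunks):
    return [double(negative(c)) for c in chunks]

chunks = [[1, 2], [3, 4]]
assert run_fused(chunks) == run_unfused(chunks)  # same result, fewer tasks
```

The fused version halves the task count and eliminates the intermediate write/read, while each task still only holds one chunk's worth of data at a time.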
Maybe note that this optimization is always beneficial, which is why it's on by default?
docs/user-guide/optimization.md (Outdated)

`max_total_num_input_blocks` specifies the maximum number of input blocks (chunks) allowed in the fused operation.

Again, this limits the number of reads that an individual task must perform. The default is `None`, which means that operations are fused only if they have the same number of tasks. If set to an integer, this restriction is lifted, and operations with differing numbers of tasks will be fused, as long as the total number of input blocks does not exceed the maximum. This setting is useful for reductions, and can be set using `functools.partial`:
My understanding is that both `max_total_source_arrays` and `max_total_num_input_blocks` are related to the "optimal fan-in" parameter described in #331. I therefore think this explanation would be better motivated if it started with a description of that general trade-off first.
I would have structured it like this:
- Description of the trade-off detailed in Optimizing reduction #331
- Therefore an optimal fan-in parameter exists (unique for any particular reduction?)
- Cubed makes a default global choice for you, but you may want to override it
- In which scenarios are you likely to want to override this default?
- Here's how you actually use the API to change the default...
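The fan-in trade-off described above can be illustrated with a toy tree reduction. This is a sketch only (the `fan_in` parameter here is hypothetical and not Cubed's API): a higher fan-in means fewer tasks overall, but each task must read more input blocks.

```python
def tree_reduce(blocks, combine, fan_in):
    """Reduce a list of blocks level by level; each task combines at
    most `fan_in` input blocks, trading task count against reads per task."""
    num_tasks = 0
    while len(blocks) > 1:
        blocks = [combine(blocks[i:i + fan_in])
                  for i in range(0, len(blocks), fan_in)]
        num_tasks += len(blocks)
    return blocks[0], num_tasks

blocks = list(range(16))  # 16 single-element "chunks"

# Higher fan-in: fewer, heavier tasks (2 + 1 = 3 tasks).
result_hi, tasks_hi = tree_reduce(blocks, sum, fan_in=8)

# Lower fan-in: more, lighter tasks (8 + 4 + 2 + 1 = 15 tasks).
result_lo, tasks_lo = tree_reduce(blocks, sum, fan_in=2)

assert result_hi == result_lo == sum(range(16))
```

Somewhere between these extremes lies an optimal fan-in for a given workload, which is why a tunable default makes sense.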
docs/user-guide/optimization.md (Outdated)

The output explains which operations can or can't be fused, and why:

```
DEBUG:cubed.core.optimization:can't fuse op-001 since it is not fusable
```
that's not the most helpful message 😆 - why is it not fusable?
Oops - that is a tautology! In this case, `op-001` has no predecessors, so there's nothing to fuse with.
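The debug messages quoted earlier come from the `cubed.core.optimization` logger (that name appears in the output itself). One way to surface them is with the standard library `logging` module; this is a sketch assuming no other logging configuration is in place.

```python
import logging

# Send log records to stderr at DEBUG level, then make sure the
# optimizer's logger (named in the DEBUG output above) is enabled.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("cubed.core.optimization").setLevel(logging.DEBUG)
```

With this in place, running a computation prints each fusion decision as it is made.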
I wrote it to capture what I'd implemented so far on the optimization work, but you're right that it's not pitched well for users! I hope that we can make the more aggressive optimizations enabled by default, in which case users won't need to care about these settings, although there will likely always be cases where an advanced user wants to change things. But we're not ready to do that quite yet. Might be worth leaving this open until we have more experience with optimizing some example workloads.
To be clear - as a temporary record of the state of possible optimizations, this is great. My criticisms are more about what would be most useful for users in the future.
It's very helpful feedback - thanks!
Force-pushed from 9f8ffa4 to e895434
I've re-worked this page to reflect the new defaults (multiple-input fusion is enabled by default now), and also to move the advanced settings to the end since they should be rarely needed. I think this is generally useful for users and should be OK to go in now @TomNicholas.
Force-pushed from e895434 to e0a594b
There are limits to how many input arrays and input chunk reads are fused together. These are imposed so that the number of reads an individual task must perform is not excessive, which would otherwise result in slow-running tasks.

In some cases you may want to change these limits, which we look at here.
Do we want to explain what some of the cases might be? Or leave that as future work?
It could certainly be expanded, but I'd prefer to leave it for now.
See #381.
This relies on #396 to make it easier to cross-reference operation IDs in the debug output against the visualizations.