
Add page on optimization to user guide #397

Merged
merged 2 commits into main from doc-optimization on Oct 1, 2024

Conversation

tomwhite
Member

See #381.

This relies on #396 to make it easier to cross-refer operation IDs in the debug output to the visualizations.

@tomwhite added the `documentation` label Feb 26, 2024
@TomNicholas left a comment


This is very helpful for me to understand what optimization options we have, but for a user this would seem like an unmotivated exposure of pretty deep internals. Is this intended as developer docs?

@@ -0,0 +1,109 @@
# Optimization

Cubed will automatically optimize the computation graph before running it. This can reduce the number of tasks in the plan, and the amount of intermediate IO, both of which speed up the computation.
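The core idea behind fusion can be sketched in plain Python (a hypothetical illustration of the principle, not Cubed's implementation): two per-chunk functions are composed into one, so the intermediate chunk never needs to be written out.

```python
# Hypothetical sketch of map fusion: two per-chunk operations are
# composed into a single function, so the intermediate result is
# never materialized between them.

def fuse(f, g):
    """Return one per-chunk function equivalent to applying f then g."""
    def fused(chunk):
        return g(f(chunk))
    return fused

double = lambda chunk: [x * 2 for x in chunk]   # op computing array b
add_one = lambda chunk: [x + 1 for x in chunk]  # op computing array c

# Unfused: one task per (op, chunk) pair, with b's chunks written
# in between. Fused: one task per chunk, no intermediate write.
fused_op = fuse(double, add_one)
print(fused_op([1, 2, 3]))  # [3, 5, 7]
```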
Member

From this intro it is unclear why I would ever not want to fuse tasks. Are there situations in which fusing would sacrifice memory safety? Or is it merely a trade-off between asking single tasks to read more data and potentially having more IO steps in the pipeline?

Member Author

Are there situations in which fusing would sacrifice memory safety?

No, fusing should always respect memory limits.

Or is it merely a trade-off between asking single tasks to read more data and potentially having more IO steps in the pipeline?

Yes that's it.


![Map fusion optimization](../images/optimization_map_fusion.svg)

Note that with optimization turned on, the array `b` is no longer written as an intermediate output since it will be computed in the same tasks that compute array `c`. The overall number of tasks is reduced from 10 to 5, and the intermediate data (total nbytes) is reduced too.
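The task-count arithmetic can be made explicit (a sketch assuming 5 chunks and two elementwise operations, consistent with the 10 → 5 reduction described above):

```python
# Sketch: with two elementwise operations over n chunks, the unfused
# plan runs one task per (operation, chunk) pair, while the fused plan
# runs one task per chunk. The chunk count of 5 is an assumption
# matching the 10 -> 5 reduction in the example.
n_chunks = 5
n_ops = 2

unfused_tasks = n_ops * n_chunks  # one task per op per chunk
fused_tasks = n_chunks            # one fused task per chunk

print(unfused_tasks, fused_tasks)  # 10 5
```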
Member

Maybe note that this optimization is always beneficial, which is why it's on by default?


`max_total_num_input_blocks` specifies the maximum number of input blocks (chunks) that are allowed in the fused operation.

Again, this is to limit the number of reads that an individual task must perform. The default is `None`, which means that operations are fused only if they have the same number of tasks. If set to an integer, this restriction is lifted, and operations with differing numbers of tasks will be fused, as long as the total number of input blocks does not exceed the maximum. This setting is useful for reductions, and can be set using `functools.partial`:
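As a generic illustration of the `functools.partial` pattern (the `optimize_dag` name and signature below are stand-ins, not Cubed's real optimizer entry point): `partial` pre-binds the keyword argument, producing a callable that can be passed wherever an optimize function is expected.

```python
from functools import partial

# Illustration of the functools.partial pattern only: `optimize_dag`
# is a hypothetical stand-in, NOT Cubed's actual API.
def optimize_dag(dag, max_total_num_input_blocks=None):
    # A real optimizer would rewrite the graph here; this stand-in
    # just reports which limit it was configured with.
    return dag, max_total_num_input_blocks

# Pre-bind the limit so the resulting callable takes only the graph.
opt = partial(optimize_dag, max_total_num_input_blocks=10)

print(opt("my-dag"))  # ('my-dag', 10)
```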
Member

My understanding is that both the max_total_source_arrays and max_total_num_input_blocks are related to the "optimal fan-in" parameter described in #331. I therefore think this explanation would be better motivated if it started with a description of that general trade-off first.

I would have structured it like this:

  • Description of the trade-off detailed in Optimizing reduction #331
  • Therefore an optimal fan-in parameter exists (unique for any particular reduction?)
  • Cubed makes a default global choice for you, but you may want to override it
  • In which scenarios are you likely to want to override this default?
  • Here's how you actually use the API to change the default...

The output explains which operations can or can't be fused, and why:

```
DEBUG:cubed.core.optimization:can't fuse op-001 since it is not fusable
```
Member

that's not the most helpful message 😆 - why is it not fusable?

Member Author

Oops - that is a tautology! In this case, op-001 has no predecessors so there's nothing to fuse with.
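To see fusion decisions like the one discussed above, DEBUG-level logging can be enabled for the optimizer's logger before computing (a minimal sketch, assuming the `cubed.core.optimization` logger name shown in the output above):

```python
import logging

# Turn on DEBUG logging so the optimizer's fusion decisions are
# printed. The logger name comes from the message shown above.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("cubed.core.optimization").setLevel(logging.DEBUG)
```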

@tomwhite
Member Author

This is very helpful for me to understand what optimization options we have, but for a user this would seem like an unmotivated exposure of pretty deep internals. Is this intended as developer docs?

I wrote it to capture what I'd implemented so far on the optimization work, but you're right that it's not pitched well for users!

I hope that we can make the more aggressive optimizations enabled by default in which case users won't need to care about these settings, although there will likely always be cases where an advanced user wants to change things. But we're not ready to do that quite yet.

Might be worth leaving this open until we have more experience with optimizing some example workloads.

@TomNicholas
Member

To be clear - as a temporary record of the state of possible optimizations, this is great. My criticisms are more looking towards what would be most useful for users in the future.

@tomwhite
Member Author

To be clear - as a temporary record of the state of possible optimizations, this is great. My criticisms are more looking towards what would be most useful for users in the future.

It's very helpful feedback - thanks!

@tomwhite
Member Author

tomwhite commented Oct 1, 2024

I've re-worked this page to reflect the new defaults (multiple-input fusion is enabled by default now), and also to move the advanced settings to the end since they should be rarely needed. I think this is generally useful for users and should be OK to go in now @TomNicholas.


There are limits on how many input arrays and input chunks are read by a single fused operation. These are imposed so that an individual task does not have to perform an excessive number of reads, which would result in slow-running tasks.

In some cases you may want to change these limits, which we look at here.
Member

Do we want to explain what some of the cases might be? Or leave that as future work?

Member Author

It could certainly be expanded, but I'd prefer to leave it for now.

@tomwhite merged commit 2065794 into main Oct 1, 2024
11 checks passed
@tomwhite deleted the doc-optimization branch October 1, 2024 16:13