Add page on optimization to user guide #397
Conversation
This is very helpful for me to understand what optimization options we have, but for a user this would seem like an unmotivated exposure of pretty deep internals. Is this intended as developer docs?
# Optimization

Cubed will automatically optimize the computation graph before running it. This can reduce the number of tasks in the plan and the amount of intermediate IO, both of which speed up the computation.
From this intro it is unclear why I would ever not want to fuse tasks. Are there situations in which fusing would sacrifice memory safety? Or is it merely a trade-off between asking single tasks to read more data and potentially having more IO steps in the pipeline?
> Are there situations in which fusing would sacrifice memory safety?

No, fusing should always respect memory limits.

> Or is it merely a trade-off between asking single tasks to read more data and potentially having more IO steps in the pipeline?

Yes, that's it.
docs/user-guide/optimization.md (Outdated)

![Map fusion optimization](../images/optimization_map_fusion.svg)

Note that with optimization turned on, the array `b` is no longer written as an intermediate output, since it will be computed in the same tasks that compute array `c`. The overall number of tasks is reduced from 10 to 5, and the intermediate data (total nbytes) is reduced too.
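Map fusion can be sketched in plain Python. This is illustrative only, not Cubed's implementation: the point is that fusing two element-wise operations means each task applies both functions to its chunk, so the intermediate array `b` is never materialized.

```python
def negative(chunk):
    return [-v for v in chunk]

def double(chunk):
    return [2 * v for v in chunk]

# Unfused: two operations, one pass over the chunks each, with the
# intermediate result (array `b`) written out between them.
def run_unfused(chunks):
    b = [negative(c) for c in chunks]   # written to intermediate storage
    return [double(c) for c in b]       # read back to compute `c`

# Fused: one task per chunk applies both functions; `b` is never stored.
def run_fused(chunks):
    return [double(negative(c)) for c in chunks]

chunks = [[1, 2], [3, 4]]
assert run_fused(chunks) == run_unfused(chunks)  # same result, fewer tasks
```

The fused version halves the task count and eliminates the intermediate write/read, while each task still only holds one chunk's worth of data at a time.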
Maybe note that this optimization is always beneficial, which is why it's on by default?
docs/user-guide/optimization.md (Outdated)

`max_total_num_input_blocks` specifies the maximum number of input blocks (chunks) allowed in the fused operation.

Again, this limits the number of reads that an individual task must perform. The default is `None`, which means that operations are fused only if they have the same number of tasks. If set to an integer, this restriction is lifted, and operations with differing numbers of tasks will be fused, as long as the total number of input blocks does not exceed the maximum. This setting is useful for reductions, and can be set using `functools.partial`:
My understanding is that both `max_total_source_arrays` and `max_total_num_input_blocks` are related to the "optimal fan-in" parameter described in #331. I therefore think this explanation would be better motivated if it started with a description of that general trade-off first.
I would have structured it like this:
- Description of the trade-off detailed in Optimizing reduction #331
- Therefore an optimal fan-in parameter exists (unique for any particular reduction?)
- Cubed makes a default global choice for you, but you may want to override it
- In which scenarios are you likely to want to override this default?
- Here's how you actually use the API to change the default...
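The fan-in trade-off described above can be illustrated with a toy tree reduction. This is a sketch only (the `fan_in` parameter here is hypothetical and not Cubed's API): a higher fan-in means fewer tasks overall, but each task must read more input blocks.

```python
def tree_reduce(blocks, combine, fan_in):
    """Reduce a list of blocks level by level; each task combines at
    most `fan_in` input blocks, trading task count against reads per task."""
    num_tasks = 0
    while len(blocks) > 1:
        blocks = [combine(blocks[i:i + fan_in])
                  for i in range(0, len(blocks), fan_in)]
        num_tasks += len(blocks)
    return blocks[0], num_tasks

blocks = list(range(16))  # 16 single-element "chunks"

# Higher fan-in: fewer, heavier tasks (2 + 1 = 3 tasks).
result_hi, tasks_hi = tree_reduce(blocks, sum, fan_in=8)

# Lower fan-in: more, lighter tasks (8 + 4 + 2 + 1 = 15 tasks).
result_lo, tasks_lo = tree_reduce(blocks, sum, fan_in=2)

assert result_hi == result_lo == sum(range(16))
```

Somewhere between these extremes lies an optimal fan-in for a given workload, which is why a tunable default makes sense.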
docs/user-guide/optimization.md (Outdated)

The output explains which operations can or can't be fused, and why:

```
DEBUG:cubed.core.optimization:can't fuse op-001 since it is not fusable
```
that's not the most helpful message 😆 - why is it not fusable?
Oops - that is a tautology! In this case, `op-001` has no predecessors, so there's nothing to fuse with.
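The debug messages quoted earlier come from the `cubed.core.optimization` logger (that name appears in the output itself). One way to surface them is with the standard library `logging` module; this is a sketch assuming no other logging configuration is in place.

```python
import logging

# Send log records to stderr at DEBUG level, then make sure the
# optimizer's logger (named in the DEBUG output above) is enabled.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("cubed.core.optimization").setLevel(logging.DEBUG)
```

With this in place, running a computation prints each fusion decision as it is made.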
I wrote it to capture what I'd implemented so far on the optimization work, but you're right that it's not pitched well for users! I hope that we can make the more aggressive optimizations enabled by default, in which case users won't need to care about these settings, although there will likely always be cases where an advanced user wants to change things. But we're not ready to do that quite yet. Might be worth leaving this open until we have more experience with optimizing some example workloads.
To be clear - as a temporary record of the state of possible optimizations, this is great. My criticisms are more about what would be most useful for users in the future.
It's very helpful feedback - thanks!
Force-pushed from 9f8ffa4 to e895434
I've re-worked this page to reflect the new defaults (multiple-input fusion is enabled by default now), and also to move the advanced settings to the end since they should be rarely needed. I think this is generally useful for users and should be OK to go in now @TomNicholas.
Force-pushed from e895434 to e0a594b
There are limits to how many input arrays and input chunk reads are fused together. These are imposed so that the number of reads an individual task must perform is not excessive, which would otherwise result in slow-running tasks.

In some cases you may want to change these limits, which we look at here.
Do we want to explain what some of the cases might be? Or leave that as future work?
It could certainly be expanded, but I'd prefer to leave it for now.
See #381.
This relies on #396 to make it easier to cross-reference operation IDs in the debug output against the visualizations.