Consider a pipeline with three outputs f2, g2, h2. These call Funcs f1, g1, h1 respectively. Everything is compute_root. The realization order f1 f2 g1 g2 h1 h2 is going to use a lot less intermediate memory than the order f1 g1 h1 f2 g2 h2.
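To make the difference concrete, here is a rough sketch that counts peak live intermediate storage for a given realization order. It is not Halide code: `Stage`, `make_order`, and `peak_intermediate_bytes` are hypothetical names, every intermediate is assumed to be one unit in size, output buffers are assumed to be owned by the caller, and a buffer is assumed to be freed right after its last consumer runs.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one compute_root stage (not a Halide type).
struct Stage {
    std::string name;
    std::size_t bytes;                // size of the buffer this stage produces
    std::vector<std::string> inputs;  // names of the stages it reads
    bool is_output;                   // output buffers are not counted as intermediates
};

// Build the example pipeline in the given order. By assumption, names ending
// in '2' are outputs that read the matching '1' stage (f2 reads f1, etc.),
// and every intermediate is one unit in size.
std::vector<Stage> make_order(const std::vector<std::string> &names) {
    std::vector<Stage> order;
    for (const auto &n : names) {
        bool is_output = (n.back() == '2');
        Stage s{n, 1, {}, is_output};
        if (is_output) s.inputs.push_back(n.substr(0, 1) + "1");
        order.push_back(s);
    }
    return order;
}

// Peak size of intermediates live at once. A buffer is live from the point
// its producer runs until its last consumer has run.
std::size_t peak_intermediate_bytes(const std::vector<Stage> &order) {
    std::map<std::string, std::size_t> last_use, size_of;
    for (std::size_t i = 0; i < order.size(); i++) {
        size_of[order[i].name] = order[i].is_output ? 0 : order[i].bytes;
        for (const auto &in : order[i].inputs) last_use[in] = i;
    }
    std::size_t live = 0, peak = 0;
    for (std::size_t i = 0; i < order.size(); i++) {
        live += size_of[order[i].name];
        peak = std::max(peak, live);
        // Release every input whose last consumer is this stage.
        for (const auto &in : order[i].inputs) {
            auto it = last_use.find(in);
            if (it != last_use.end() && it->second == i) {
                live -= size_of[in];
                last_use.erase(it);
            }
        }
    }
    return peak;
}
```

Under those assumptions, `peak_intermediate_bytes(make_order({"f1", "f2", "g1", "g2", "h1", "h2"}))` gives 1, while the order `f1 g1 h1 f2 g2 h2` gives 3, because all three intermediates are live when f2 starts.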
We should reorder the realizations at each loop level in schedule_functions to minimize the number of overlapping lifetimes. This could be done by identifying each loop level used in a compute_at and then coming up with a new realization order for each one. This would have to be done at the level of fused groups, not Funcs.
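As a sketch of what picking a new order for one loop level could look like, the code below brute-forces every valid order of a small set of fused groups and keeps the one with the lowest peak. Nothing here is taken from schedule_functions: `Group`, `peak_for_order`, and `best_order` are hypothetical, sizes are assumed to be known constants, and trying every permutation only scales to a handful of groups, so a real pass would need a heuristic.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a fused group realized at one loop level.
struct Group {
    std::size_t bytes;      // intermediate storage it allocates (0 for outputs)
    std::vector<int> deps;  // indices of the groups whose buffers it reads
};

const std::size_t kInvalid = std::numeric_limits<std::size_t>::max();

// Peak live bytes for `order`, or kInvalid if some group would run before
// one of its dependencies. A buffer is freed after its last consumer runs;
// a buffer with no consumer at this level is freed right after it is made.
std::size_t peak_for_order(const std::vector<Group> &groups,
                           const std::vector<int> &order) {
    const std::size_t n = groups.size();
    std::vector<std::size_t> pos(n), last_use(n);
    for (std::size_t i = 0; i < n; i++) pos[order[i]] = i;
    for (std::size_t g = 0; g < n; g++) last_use[g] = pos[g];
    for (std::size_t g = 0; g < n; g++) {
        for (int d : groups[g].deps) {
            if (pos[d] >= pos[g]) return kInvalid;
            last_use[d] = std::max(last_use[d], pos[g]);
        }
    }
    std::size_t live = 0, peak = 0;
    for (std::size_t i = 0; i < n; i++) {
        live += groups[order[i]].bytes;
        peak = std::max(peak, live);
        for (std::size_t g = 0; g < n; g++) {
            if (last_use[g] == i) live -= groups[g].bytes;
        }
    }
    return peak;
}

// Try every permutation and return the first valid one with the lowest peak.
// Returns an empty vector if no valid order exists (cyclic dependencies).
std::vector<int> best_order(const std::vector<Group> &groups) {
    std::vector<int> order(groups.size()), best;
    std::iota(order.begin(), order.end(), 0);
    std::size_t best_peak = kInvalid;
    do {
        std::size_t p = peak_for_order(groups, order);
        if (p < best_peak) {
            best_peak = p;
            best = order;
        }
    } while (std::next_permutation(order.begin(), order.end()));
    return best;
}
```

With groups indexed f1, g1, h1, f2, g2, h2 as 0 to 5, unit-size intermediates, and each output depending on its matching producer, this search settles on the interleaved order f1 f2 g1 g2 h1 h2.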
abadams changed the title from "We should shuffle around the realization order to minimize peak memory usages" to "We should shuffle around the realization order to minimize peak memory usage" on Mar 12, 2024
It also affects locality, so there might be a trade-off here. Also, if the allocations are all dynamically sized, the peak usage, and thus the best order, will depend on those sizes, so the compiler won't be able to infer it.
You can already sort of schedule it with compute_at(the_func_you_want_to_go_before, Var::outermost()).