@inline column(x, inds...) = x
@inline column(tup::Tuple, inds...) = column_args(tup, inds...)

# Recursively call column() on broadcast arguments in a way that is statically reducible by the optimizer
# see Base.Broadcast.preprocess_args
@inline column_args(args::Tuple, inds...) =
    (column(args[1], inds...), column_args(Base.tail(args), inds...)...)
@inline column_args(args::Tuple{Any}, inds...) = (column(args[1], inds...),)
@inline column_args(args::Tuple{}, inds...) = ()
When using a tuple of closures, evaluation of the diffusive flux divergence for an arbitrary number of closures requires recursing into the diffusive flux operator. Currently, this recursion starts with
Oceananigans.jl/src/TurbulenceClosures/closure_tuples.jl (lines 45 to 47 in 3c86d8f),
which calls itself and terminates at the end points
Oceananigans.jl/src/TurbulenceClosures/closure_tuples.jl (lines 33 to 34 in 3c86d8f)
and
Oceananigans.jl/src/TurbulenceClosures/closure_tuples.jl (lines 36 to 38 in 3c86d8f).
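For readers without the linked lines at hand, the self-recursive shape described above looks roughly like the sketch below. This is not the Oceananigans source: `ScalarDiffusivity` and `flux_div` are hypothetical stand-ins, and the "flux divergence" is reduced to a scalar multiplication so the example runs on its own.

```julia
# Hypothetical stand-in for a closure; not an Oceananigans type.
struct ScalarDiffusivity
    κ::Float64
end

# Hypothetical single-closure "flux divergence": just a scaled value.
@inline flux_div(c::ScalarDiffusivity, x) = c.κ * x

# End points of the recursion: the empty tuple and the one-element tuple.
@inline flux_div(closures::Tuple{}, x) = zero(x)
@inline flux_div(closures::Tuple{<:Any}, x) = flux_div(closures[1], x)

# General case: handle the first closure, then call the same method on the rest.
@inline flux_div(closures::Tuple, x) =
    flux_div(closures[1], x) + flux_div(Base.tail(closures), x)

closures = (ScalarDiffusivity(1.0), ScalarDiffusivity(2.0), ScalarDiffusivity(3.0))
total = flux_div(closures, 2.0)
```

On the CPU this works for tuples of any length; the general-case method calling itself is the part at issue here.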
However, this pattern does not compile on the GPU, which is why we hard-code the 2- and 3-tuple cases to support them there. The reason is a compiler heuristic that aborts inlining when it encounters self-recursion (i.e., a function that calls itself).
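As a rough illustration of what hard-coding the short tuples means, here is a minimal sketch with one method per tuple length. The names `ScalarDiffusivity` and `flux_div` are hypothetical, not the actual Oceananigans definitions.

```julia
# Hypothetical stand-in for a closure; not an Oceananigans type.
struct ScalarDiffusivity
    κ::Float64
end

# Hypothetical single-closure "flux divergence".
@inline flux_div(c::ScalarDiffusivity, x) = c.κ * x

# One method per supported tuple length; no tuple method calls a tuple method.
@inline flux_div(cs::Tuple{<:Any}, x) = flux_div(cs[1], x)
@inline flux_div(cs::Tuple{<:Any, <:Any}, x) = flux_div(cs[1], x) + flux_div(cs[2], x)
@inline flux_div(cs::Tuple{<:Any, <:Any, <:Any}, x) =
    flux_div(cs[1], x) + flux_div(cs[2], x) + flux_div(cs[3], x)

pair = flux_div((ScalarDiffusivity(1.0), ScalarDiffusivity(2.0)), 2.0)
triple = flux_div((ScalarDiffusivity(1.0), ScalarDiffusivity(2.0), ScalarDiffusivity(3.0)), 2.0)
```

The cost is that longer tuples are unsupported: with only these methods, a 4-tuple raises a `MethodError`.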
To avoid this, I think we can use an "outer-inner" form whereby the outer function unpacks one element, calls itself, and handles the rest of the elements with an inner function.
Or, something like that... getting this right might require a little trial and error.
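One possible reading of the outer-inner idea, modeled on the `column` / `column_args` pair, is sketched below. Everything here is an assumption for illustration: `ScalarDiffusivity`, `flux_div`, and `flux_div_args` are hypothetical names, and whether this form actually compiles on the GPU has not been checked.

```julia
# Hypothetical stand-in for a closure; not an Oceananigans type.
struct ScalarDiffusivity
    κ::Float64
end

# Hypothetical single-closure "flux divergence".
@inline flux_div(c::ScalarDiffusivity, x) = c.κ * x

# Outer function: entry point for tuples. It delegates to the inner
# function and sums the per-closure results.
@inline flux_div(closures::Tuple{}, x) = zero(x)
@inline flux_div(closures::Tuple, x) = +(flux_div_args(closures, x)...)

# Inner function: applies the outer function to the first element and
# peels the remaining elements off with Base.tail, building a tuple of results.
@inline flux_div_args(closures::Tuple, x) =
    (flux_div(closures[1], x), flux_div_args(Base.tail(closures), x)...)
@inline flux_div_args(closures::Tuple{Any}, x) = (flux_div(closures[1], x),)
@inline flux_div_args(closures::Tuple{}, x) = ()

closures = (ScalarDiffusivity(1.0), ScalarDiffusivity(2.0), ScalarDiffusivity(3.0))
total = flux_div(closures, 2.0)
```

Each call to `flux_div_args` receives a strictly shorter tuple type, so the chain of calls ends at the `Tuple{Any}` or `Tuple{}` method for any fixed tuple length.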
This is similar to a pattern implemented in ClimaCore.jl.

cc @jakebolewski