
Encourage compiler to inline recursive closure tuple operators with "outer-inner" form #2101

Closed
glwagner opened this issue Dec 7, 2021 · 1 comment
Labels
GPU 👾 Where Oceananigans gets its powers from turbulence closures 🎐


@glwagner
Member

glwagner commented Dec 7, 2021

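When a tuple of closures is used, evaluating the diffusive flux divergence for an arbitrary number of closures requires recursing through the tuple inside the diffusive flux divergence operator. Currently, this recursion starts with a method that splits the tuple into its first two closures and the rest: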

```julia
@inline ∇_dot_qᶜ(i, j, k, grid::AbstractGrid, closures::Tuple, c, iᶜ, clock, Ks, args...) = (
      ∇_dot_qᶜ(i, j, k, grid, closures[1:2],   c, iᶜ, clock, Ks[1:2],   args...)
    + ∇_dot_qᶜ(i, j, k, grid, closures[3:end], c, iᶜ, clock, Ks[3:end], args...))
```

This method calls itself on both pieces, and the recursion terminates at the end points for 1-tuples

```julia
@inline ∇_dot_qᶜ(i, j, k, grid::AbstractGrid, closures::Tuple{<:Any}, c, iᶜ, clock, Ks, args...) =
    ∇_dot_qᶜ(i, j, k, grid, closures[1], c, iᶜ, clock, Ks[1], args...)
```

and 2-tuples:

```julia
@inline ∇_dot_qᶜ(i, j, k, grid::AbstractGrid, closures::Tuple{<:Any, <:Any}, c, iᶜ, clock, Ks, args...) = (
      ∇_dot_qᶜ(i, j, k, grid, closures[1], c, iᶜ, clock, Ks[1], args...)
    + ∇_dot_qᶜ(i, j, k, grid, closures[2], c, iᶜ, clock, Ks[2], args...))
```

However, this pattern does not compile on the GPU, which is why we hard code the 2- and 3-tuple cases to support them there. The reason is a compiler heuristic that aborts inlining when it encounters self-recursion (that is, a function that calls itself).
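As a CPU-only illustration of the structure (not of the GPU failure), the current recursion can be reproduced with a toy closure type. `ToyClosure`, `div_q`, and the index-only signature are hypothetical names made up for this sketch; they are not Oceananigans APIs, and the real operator takes many more arguments.

```julia
# Hypothetical toy closure: contributes κ * c[i] at index i.
# It stands in for a real turbulence closure; all names are illustrative only.
struct ToyClosure
    κ::Float64
end

# Single-closure method: the per-closure "flux divergence".
@inline div_q(i, closure::ToyClosure, c) = closure.κ * c[i]

# General tuple method: split into the first two closures and the rest,
# then call div_q on each piece (self-recursion on Tuple arguments).
@inline div_q(i, closures::Tuple, c) =
    div_q(i, closures[1:2], c) + div_q(i, closures[3:end], c)

# End points: 1-tuples and 2-tuples dispatch to the single-closure method.
@inline div_q(i, closures::Tuple{<:Any}, c) = div_q(i, closures[1], c)
@inline div_q(i, closures::Tuple{<:Any, <:Any}, c) =
    div_q(i, closures[1], c) + div_q(i, closures[2], c)

c = [1.0, 2.0, 3.0]
closures = (ToyClosure(1.0), ToyClosure(2.0), ToyClosure(3.0), ToyClosure(4.0))
div_q(2, closures, c)  # (1 + 2) * 2 + (3 + 4) * 2 = 20.0
```

This runs fine on the CPU. Whether the equivalent Oceananigans method inlines inside a GPU kernel is exactly what this issue is about, and the sketch does not test that.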

To avoid this, I think we can use an "outer-inner" form, in which the outer function

```julia
∇_dot_qᶜ(i, j, k, grid::AbstractGrid, closures::Tuple, c, iᶜ, clock, Ks, args...)
```

unpacks the first element and calls its own single-closure method,

```julia
∇_dot_qᶜ(i, j, k, grid, closures[1], c, iᶜ, clock, Ks[1], args...)
```

then handles the remaining elements with a separate inner function:

```julia
inner_∇_dot_qᶜ(i, j, k, grid, closures[2:end], c, iᶜ, clock, Ks[2:end], args...)
```

Or something like that; getting this right might require a little trial and error.
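A minimal sketch of one way to read the outer-inner proposal, using a hypothetical toy closure (`ToyClosure`, `outer_div_q`, and `inner_div_q` are illustrative names, not Oceananigans code). The tuple method of `outer_div_q` never calls itself with a tuple: it peels off one closure and delegates the rest to `inner_div_q`, which walks the remainder with `Base.tail` in the style of ClimaCore.jl's `column_args`.

```julia
# Hypothetical toy closure standing in for a real turbulence closure.
struct ToyClosure
    κ::Float64
end

# Outer function, single-closure method: the per-closure "flux divergence".
@inline outer_div_q(i, closure::ToyClosure, c) = closure.κ * c[i]

# Outer function, tuple method: unpack the first closure (dispatching to the
# single-closure method above) and hand the rest to a separate inner function.
@inline outer_div_q(i, closures::Tuple, c) =
    outer_div_q(i, closures[1], c) + inner_div_q(i, Base.tail(closures), c)

# Inner function: peel off one closure at a time with Base.tail, so each
# call receives a strictly shorter tuple type.
@inline inner_div_q(i, closures::Tuple, c) =
    outer_div_q(i, closures[1], c) + inner_div_q(i, Base.tail(closures), c)

# Terminal case: an empty tuple of closures contributes zero.
@inline inner_div_q(i, ::Tuple{}, c) = zero(eltype(c))

c = [1.0, 2.0, 3.0]
closures = (ToyClosure(1.0), ToyClosure(2.0), ToyClosure(3.0), ToyClosure(4.0))
outer_div_q(2, closures, c)  # 2 + 4 + 6 + 8 = 20.0
```

Note that `inner_div_q` is still recursive; the expectation (unverified here) is that recursion over strictly shrinking tuple types is something the compiler can unroll. This sketch only checks the arithmetic on the CPU and does not demonstrate GPU inlining.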

This is similar to a pattern implemented in ClimaCore.jl:

```julia
@inline column(x, inds...) = x
@inline column(tup::Tuple, inds...) = column_args(tup, inds...)

# Recursively call column() on broadcast arguments in a way that is statically reducible by the optimizer
# see Base.Broadcast.preprocess_args
@inline column_args(args::Tuple, inds...) =
    (column(args[1], inds...), column_args(Base.tail(args), inds...)...)
@inline column_args(args::Tuple{Any}, inds...) = (column(args[1], inds...),)
@inline column_args(args::Tuple{}, inds...) = ()
```
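A small usage sketch of this pattern, assuming a hypothetical `column(::AbstractMatrix, j)` method that returns a view of column `j` (this extra method is for illustration only and is not ClimaCore's). It shows `column_args` mapping `column` over a nested argument tuple while preserving its shape.

```julia
# The column / column_args pattern quoted above, reproduced so this block runs on its own.
@inline column(x, inds...) = x
@inline column(tup::Tuple, inds...) = column_args(tup, inds...)

# Hypothetical extra method, for illustration only:
# "column j" of a matrix is a view of its j-th column.
@inline column(A::AbstractMatrix, j) = view(A, :, j)

@inline column_args(args::Tuple, inds...) =
    (column(args[1], inds...), column_args(Base.tail(args), inds...)...)
@inline column_args(args::Tuple{Any}, inds...) = (column(args[1], inds...),)
@inline column_args(args::Tuple{}, inds...) = ()

A = [1 2; 3 4]
result = column(((A, 5.0), A), 2)  # ((view of [2, 4], 5.0), view of [2, 4])
```

Arguments that are neither tuples nor matrices, such as `5.0`, pass through unchanged via the fallback `column(x, inds...) = x`.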

cc @jakebolewski

glwagner added GPU 👾 Where Oceananigans gets its powers from turbulence closures 🎐 labels Dec 7, 2021
@glwagner
Member Author

Things seem to be working, so I'm closing this.
