Example GPU kernels #121

Closed
milankl opened this issue Aug 10, 2022 · 5 comments
Labels
gpu 🖼️ Everthing GPU related

Comments


milankl commented Aug 10, 2022

Compute-intensive loops for which we'll define GPU kernels fall into one of four categories (sorted from simple to complex):

  1. for lm in eachharmonic. These kernels loop over the non-zero indices of one or several LowerTriangularMatrix instances but only read/write the $l,m$ harmonic lm on every iteration. There are no cross dependencies on other harmonics. They may include scalar constants. All input arrays are of the same size.

  2. for i,j in eachentry(::Matrix) with vec[j]. These kernels loop over all entries $i,j$ of a matrix (which can be a LowerTriangularMatrix) but also pull data from a vector at index $j$, so at least two different indices are used in the loop. Input matrices are of the same size, but the (precomputed) vectors are smaller.

  3. for l,m in eachharmonic with A[l+1,m] and A[l-1,m]. These kernels loop over the non-zero indices $l,m$ of several LowerTriangularMatrix instances and, for every $l,m$, also access $l+1,m$ and $l-1,m$, meaning there are cross dependencies on other harmonics. These loops usually involve a separate loop for the diagonal (as $l-1,m$ is zero) and for the last row (as $l+1,m$ is out of bounds).

  4. spherical harmonic transforms. Like 2) but with signs depending on odd and even modes, and combined with Fourier transforms.
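
To make the difference between categories 1 and 2 concrete, here is a minimal CPU-only sketch in plain Julia. It does not use SpeedyWeather.jl's actual types or functions: the flat vectors standing in for packed lower-triangular data, the helper `packed_length`, and the function names `scale_harmonics!` and `scale_columns!` are all hypothetical and chosen for illustration only.

```julia
# Hypothetical stand-in: number of entries in the lower triangle of an
# lmax x mmax matrix when it is stored column by column in a flat vector.
packed_length(lmax, mmax) = sum(lmax - m + 1 for m in 1:mmax)

# Category 1: every iteration touches only harmonic lm, no neighbours.
# All arrays have the same length; c is a scalar constant.
function scale_harmonics!(out::Vector, A::Vector, B::Vector, c::Real)
    @assert length(out) == length(A) == length(B)
    for lm in eachindex(out, A, B)
        out[lm] = A[lm] + c * B[lm]
    end
    return out
end

# Category 2: loop over all entries i,j of a matrix, but also read vec[j],
# so two different indices are in play and vec is smaller than A.
function scale_columns!(out::Matrix, A::Matrix, vec::Vector)
    @assert size(out) == size(A)
    @assert length(vec) == size(A, 2)
    for j in axes(A, 2), i in axes(A, 1)
        out[i, j] = A[i, j] * vec[j]
    end
    return out
end
```

Both loops are elementwise, so they could also be written as broadcasts (`out .= A .+ c .* B` and `out .= A .* transpose(vec)`), which is one possible route to a GPU version; whether that fits the real kernels is an open question here.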

Examples

  1. for lm in eachharmonic
    a) The horizontal diffusion
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/diffusion.jl#L12-L21
    b) The leapfrog time integration
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/time_integration.jl#L13-L43

  2. for i,j in eachentry(::Matrix) with vec[j]
    a) The vorticity fluxes (in grid-point space)
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/tendencies_dynamics.jl#L287-L313
    b) The Bernoulli potential (in grid-point space)
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/tendencies_dynamics.jl#L349-L373
    c) The Laplace operator (in spectral space)
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/spectral_gradients.jl#L263-L289

  3. for l,m in eachharmonic with A[l+1,m] and A[l-1,m]
    a) The divergence/curl operator
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/spectral_gradients.jl#L68-L99
    b) $U,V$ from vorticity and divergence (inverse Laplace combined with horizontal gradients)
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/spectral_gradients.jl#L166-L210

  4. spherical harmonic transforms
    a) spectral to grid
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/spectral_transform.jl#L279-L335
    b) grid to spectral
    https://github.com/milankl/SpeedyWeather.jl/blob/e1c1e79fe43cf5c23603a87039b27c4fc59d4250/src/spectral_transform.jl#L384-L435
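
For readers who do not want to open the links, the loop structure of category 3 can be sketched as follows. This is not the code behind 3a or 3b: it uses a dense `Matrix` with a zero upper triangle instead of a LowerTriangularMatrix, and the function `couple_neighbours!` and the coefficient arrays `a` and `b` are hypothetical. It only illustrates why the diagonal and the last row get their own treatment.

```julia
# Hypothetical neighbour-coupling loop on a dense matrix whose upper
# triangle is zero (1-based indices, l >= m):
#   out[l,m] = a[l,m]*A[l-1,m] + b[l,m]*A[l+1,m]
function couple_neighbours!(out::Matrix, A::Matrix, a::Matrix, b::Matrix)
    lmax, mmax = size(A)
    @assert lmax > mmax "sketch assumes more rows than columns"
    for m in 1:mmax
        # diagonal l == m: A[l-1,m] lies in the upper triangle and is zero
        out[m, m] = b[m, m] * A[m+1, m]
        # interior rows: both neighbours exist
        for l in m+1:lmax-1
            out[l, m] = a[l, m] * A[l-1, m] + b[l, m] * A[l+1, m]
        end
        # last row: A[l+1,m] would be out of bounds, so that term is dropped
        out[lmax, m] = a[lmax, m] * A[lmax-1, m]
    end
    return out
end
```

Unlike categories 1 and 2, each output here depends on two other harmonics in the same column, so a GPU version has to read from an input array that is not being overwritten in the same pass.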

@maximilian-gelbrecht

milankl added the gpu 🖼️ Everthing GPU related label Aug 10, 2022

milankl commented Aug 10, 2022

Comments on the examples

  • 1a, 1b, 2c are good places to start as I do not expect these functions to change with anything that's on our general todo list.
  • 2a and 2b might get a slightly different indexing if we move towards more generic grids (octahedral or HEALPix, see A new grid for SpeedyWeather.jl? #112)
  • 3a and 3b currently mix the single index lm and the double index l,m to access the LowerTriangularMatrix; we may want to change that by separating out the loop over the diagonal elements.
  • 4a and 4b still require some reworking as the isodd(l+m) ? -term : term operation isn't optimal and seems to cause performance issues with Float32 (see Float32 vs Float64 performance #106)
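
One possible way to avoid the per-element `isodd(l+m) ? -term : term` branch is sketched below. It assumes the sign flip appears while accumulating a sum over $l$ for fixed $m$, with a second accumulator that needs the terms of odd $l+m$ negated; that assumption, the flat `terms` vector indexed by $l$, and both function names are hypothetical and not taken from 4a or 4b. Since $l+m$ has the same parity as $l-m$, stepping $l$ by 2 from $m$ and from $m+1$ separates the two signs without a branch.

```julia
# Branching version: the sign is chosen per element with a ternary.
function north_south_branch(terms::Vector, m::Int)
    north = zero(eltype(terms))
    south = zero(eltype(terms))
    for l in m:length(terms)
        term = terms[l]
        north += term
        south += isodd(l + m) ? -term : term
    end
    return north, south
end

# Branch-free alternative: accumulate even and odd l+m separately,
# then combine them once at the end.
function north_south_split(terms::Vector, m::Int)
    even = zero(eltype(terms))
    odd = zero(eltype(terms))
    for l in m:2:length(terms)        # l+m even
        even += terms[l]
    end
    for l in m+1:2:length(terms)      # l+m odd
        odd += terms[l]
    end
    return even + odd, even - odd
end
```

The two versions add the terms in a different order, so with floating-point numbers they agree only up to rounding. Whether the split actually helps with the Float32 performance issue would need to be benchmarked.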


maximilian-gelbrecht commented Aug 10, 2022

Thanks Milan!

The time stepping (1b) is something we have to talk about separately, at least for AD compatibility, so we should do that as one of the last things. The options are either something along the lines of using Checkpointing.jl, or checking whether it is possible to use the leapfrog solver from DifferentialEquations.jl.


milankl commented Aug 15, 2022

Sure, happy to move that kernel further down the list despite its simplicity.


milankl commented Aug 16, 2022

@katharinamaetschke quick test whether you get pinged by GitHub if we @ you in


milankl commented Sep 12, 2024

closed in favour of #575

milankl closed this as completed Sep 12, 2024