Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increased interpolate speed #2859

Merged
merged 3 commits into from
Dec 29, 2022
Merged

Increased interpolate speed #2859

merged 3 commits into from
Dec 29, 2022

Conversation

jagoosw
Copy link
Collaborator

@jagoosw jagoosw commented Dec 22, 2022

Hi all,

I noticed that some of the array indexing in interpolate weren't @inbounds'ed so added them, and then also noticed that issorted tooke a good fraction of the time when interpolating fields on irregular grids so I removed that since the funciton is only used internally and is always going to be given sorted vectors.

This results in a ~5x speedup so seems worth it.

@jagoosw jagoosw added the performance 🏍️ So we can get the wrong answer even faster label Dec 22, 2022
@navidcy
Copy link
Collaborator

navidcy commented Dec 26, 2022

Sounds great! Can you post here the code with which you benchmarked it?

@jagoosw
Copy link
Collaborator Author

jagoosw commented Dec 27, 2022

Yep!

using Oceananigans, BenchmarkTools

using Oceananigans.Fields: interpolate

# Regularly spaced grids are interpolated differently
grid = RectilinearGrid(size=(100, 100, 100), x = [i^1.1 for i in 1:101], y = [i^1.2 for i in 1:101], z = [i^2 for i in 1:101])

field = CenterField(grid)

@benchmark interpolate(field, rand(), rand(), -rand())
BenchmarkTools.Trial: 10000 samples with 181 evaluations.
 Range (min … max):  584.481 ns …  26.258 μs  ┊ GC (min … max): 0.00% … 97.60%
 Time  (median):     612.343 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   634.835 ns ± 300.484 ns  ┊ GC (mean ± σ):  0.40% ±  0.98%

  ▆██▄▂▁                                                        ▂
  ██████▇▇▆▆▅▅▅▅▆▅▄▄▅▃▁▄▃▁▁▃▁▃▁▃▁▃▁▄▃▁▁▃▃▁▁▃▃▁▁▃▁▁▁▁▁▁▁▃▄▄▁▁▃▃█ █
  584 ns        Histogram: log(frequency) by time       1.49 μs <

 Memory estimate: 64 bytes, allocs estimate: 4.

Adding the inbounding improves this to:

BenchmarkTools.Trial: 10000 samples with 182 evaluations.
 Range (min … max):  577.841 ns …  28.209 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     605.769 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   625.685 ns ± 444.392 ns  ┊ GC (mean ± σ):  0.42% ± 0.98%

        █▃                                                       
  ▄▄▄▄▄▅██▆▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▂
  578 ns           Histogram: frequency by time          839 ns <

 Memory estimate: 64 bytes, allocs estimate: 4.

And removing the issorted check improves a lot more to:

BenchmarkTools.Trial: 10000 samples with 911 evaluations.
 Range (min … max):  118.231 ns …  11.848 μs  ┊ GC (min … max): 0.00% … 98.73%
 Time  (median):     125.412 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   135.838 ns ± 198.933 ns  ┊ GC (mean ± σ):  2.87% ±  2.40%

   █▅                                                            
  ▆██▆▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▂▁▁▂▂▂▁▁▂▂▁▂▁▂▂▁▂▂▁▂▁▂▁▂▁▁▂▂ ▂
  118 ns           Histogram: frequency by time          324 ns <

 Memory estimate: 64 bytes, allocs estimate: 4.

@jagoosw
Copy link
Collaborator Author

jagoosw commented Dec 27, 2022

For reference, with the same sized grid but regularly spaced the same benchmark gives this:

BenchmarkTools.Trial: 10000 samples with 916 evaluations.
 Range (min … max):  116.721 ns …   6.288 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     123.181 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   131.844 ns ± 171.767 ns  ┊ GC (mean ± σ):  2.51% ± 2.39%

        █                                                        
  ▄▃▃▃▃▃█▅▄▅▇▇▄▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▂▂▁▂▂▂▂ ▂
  117 ns           Histogram: frequency by time          165 ns <

 Memory estimate: 64 bytes, allocs estimate: 4.

@jagoosw
Copy link
Collaborator Author

jagoosw commented Dec 27, 2022

Also the reference link is dead and not archived anywhere, but I'm pretty sure its this code, from this article

Copy link
Collaborator

@navidcy navidcy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually more than what it claims!
Not only it's faster, but now it also works on GPUs! (It wasn't working before).

@navidcy
Copy link
Collaborator

navidcy commented Dec 28, 2022

@jagoosw let's update the link? @glwagner was it you that wrote this? Is the links that @jagoosw points to above correct?

@navidcy navidcy added the GPU 👾 Where Oceananigans gets its powers from label Dec 28, 2022
@navidcy navidcy changed the title Increased interpolate speed Increased interpolate speed + avoid scalar operations Dec 28, 2022
@navidcy
Copy link
Collaborator

navidcy commented Dec 28, 2022

@glwagner was there any thinking behind having issorted there?

@navidcy navidcy changed the title Increased interpolate speed + avoid scalar operations Increased interpolate speed Dec 28, 2022
@navidcy
Copy link
Collaborator

navidcy commented Dec 28, 2022

It's actually more than what it claims!
Not only it's faster, but now it also works on GPUs! (It wasn't working before).

I must have been hallucinating or did something wrong because still interpolate requires scalar operations... :)

@navidcy
Copy link
Collaborator

navidcy commented Dec 28, 2022

@jagoosw I think we should update the link and merge

@glwagner
Copy link
Member

I didn't write it.

interpolate(field, x, y, z)

this is a high-level function; it isn't for use on the GPU. I don't know it's purpose.

@jagoosw
Copy link
Collaborator Author

jagoosw commented Dec 29, 2022

I didn't write it.

interpolate(field, x, y, z)

this is a high-level function; it isn't for use on the GPU. I don't know it's purpose.

I think the only place it is used is in Lagrangian Particle tracking, for all particles it is used to get the velocity:

@inline function update_particle_position!(particles, p, restitution, grid::AbstractGrid{FT, TX, TY, TZ}, Δt, velocities) where {FT, TX, TY, TZ}
    # Advect particles using forward Euler.
    @inbounds u = interpolate(velocities.u, Face(), Center(), Center(), grid, particles.x[p], particles.y[p], particles.z[p]) 
    @inbounds v = interpolate(velocities.v, Center(), Face(), Center(), grid, particles.x[p], particles.y[p], particles.z[p]) 
    @inbounds w = interpolate(velocities.w, Center(), Center(), Face(), grid, particles.x[p], particles.y[p], particles.z[p]) 

And is also used to interpolate tracked fields. So it would be useful if this worked on GPUs.

@jagoosw
Copy link
Collaborator Author

jagoosw commented Dec 29, 2022

Should I merge?

@navidcy
Copy link
Collaborator

navidcy commented Dec 29, 2022

Go for it

@jagoosw jagoosw merged commit 7debded into main Dec 29, 2022
@navidcy navidcy deleted the jsw/faster-interpolation branch December 29, 2022 17:26
@glwagner
Copy link
Member

This function:

interpolate(field, x, y, z)

isn't used for Lagrangian particle tracking.

That's this function:

@inline function interpolate(field, LX, LY, LZ, grid, x, y, z)

which is supposed to be GPU friendly.

@jagoosw
Copy link
Collaborator Author

jagoosw commented Dec 30, 2022

I checked and interpolate(field, x, y, z) isn't used anywhere in the source code but in a few validation experiment. It basically does the same thing, but extracts the fields location and grid. I've tried changing it to just be a wrapper for interpolate(field, LX, LY, LZ, grid, x, y, z) like:

@inline interpolate(field::AbstractField{LX, LY, LZ, G, T, N}, x, y, z) where {LX, LY, LZ, G, T, N} = interpolate(field, LX(), LY(), LZ(), G, x, y, z)

but this fails as a dynamic funciton invocation. I also tried changing it to:

@inline function interpolate(field, x, y, z)
    LX, LY, LZ = location(field)
    grid = field.grid
    return interpolate(field, LX(), LY(), LZ(), grid, x, y, z)
end

but this errors with Reason: unsupported call to an unknown function (call to jl_f_getfield), so I'm not sure its going to be straight forward or worthwhile trying to make the high level version work on GPU.

Also, if we want to test interpolation, it always fails on GPU because of scalar indexing if called directly, but if wrapped in a kernel function is fine:

@kernel function test!(field, grid, res, x, y, z)
    n = @index(Global)
    LX, LY, LZ = location(field)
    @inbounds res[n] = interpolate(field, Center(), Center(), Center(), grid, x[n], y[n], z[n])
end

(If I put grid = field.grid in the kernel function it also fails like above).

@glwagner
Copy link
Member

glwagner commented Jan 3, 2023

but this errors with Reason: unsupported call to an unknown function (call to jl_f_getfield), so I'm not sure its going to be straight forward or worthwhile trying to make the high level version work on GPU.

I think that's because within a kernal on the GPU, field in

@inline function interpolate(field, x, y, z)
    LX, LY, LZ = location(field)
    grid = field.grid
    return interpolate(field, LX(), LY(), LZ(), grid, x, y, z)
end

is an OffsetArray and has no property grid (or a location).

@glwagner
Copy link
Member

glwagner commented Jan 3, 2023

The function signature

@inline function interpolate(field, LX, LY, LZ, grid, x, y, z)

is confusing: field can be either an array (eg data(field::Field)) or field::Field itself. Perhaps this should be documented or even the name changed field -> data or field_data.

@navidcy
Copy link
Collaborator

navidcy commented Jan 3, 2023

Interesting. I see that error a lot in #2782

@navidcy
Copy link
Collaborator

navidcy commented Jan 3, 2023

If it’s an array how will field.grid work?

@glwagner
Copy link
Member

glwagner commented Jan 3, 2023

It wont. We can't write field.grid on the GPU.

@navidcy
Copy link
Collaborator

navidcy commented Jan 3, 2023

I’m confused though. Is the field struct different when it lives on GPU? No, right?

@glwagner
Copy link
Member

glwagner commented Jan 3, 2023

Why do we want interpolate(field, x, y, z) to work on the GPU?

@navidcy
Copy link
Collaborator

navidcy commented Jan 3, 2023

We might not. Where is interpolate used?

@glwagner
Copy link
Member

glwagner commented Jan 3, 2023

Yes, most containers are "adapted" on the GPU. We adapt field to field.data:

Adapt.adapt_structure(to, f::Field) = Adapt.adapt(to, f.data)

And note that field.data will be further adapted. For example, CuArray becomes CuDeviceArray.

@navidcy
Copy link
Collaborator

navidcy commented Jan 3, 2023

Yes, most containers are "adapted" on the GPU. We adapt field to field.data:

Adapt.adapt_structure(to, f::Field) = Adapt.adapt(to, f.data)

And note that field.data will be further adapted. For example, CuArray becomes CuDeviceArray.

GOTHCA!

@navidcy
Copy link
Collaborator

navidcy commented Jan 3, 2023

OK, may I suggest the following:

If interpolate(field, x, y, z) would only work for CPU fields then let's type it, e.g. interpolate(field::Field{CPU}, x, y, z) and also make a note in the docstring! Otherwise people like e.g. myself prior of this conversation, would have no idea.

@glwagner
Copy link
Member

glwagner commented Jan 3, 2023

Another option is to delete it.

I don't know about the type restriction. GPU is fine with @allowscalar. The point is that we can't use it inside a kernel. But if you just want a single value you're ok with scalar operations.

@jagoosw
Copy link
Collaborator Author

jagoosw commented Jan 4, 2023

We might not. Where is interpolate used?

Its not used anywhere in the source code but in quite a few validation scripts

@jagoosw
Copy link
Collaborator Author

jagoosw commented Jan 4, 2023

but this errors with Reason: unsupported call to an unknown function (call to jl_f_getfield), so I'm not sure its going to be straight forward or worthwhile trying to make the high level version work on GPU.

I think that's because within a kernal on the GPU, field in

@inline function interpolate(field, x, y, z)
    LX, LY, LZ = location(field)
    grid = field.grid
    return interpolate(field, LX(), LY(), LZ(), grid, x, y, z)
end

is an OffsetArray and has no property grid (or a location).

Ah this makes sense, thank you

@glwagner
Copy link
Member

glwagner commented Jan 4, 2023

We might not. Where is interpolate used?

Its not used anywhere in the source code but in quite a few validation scripts

Can you point us to a few?

@jagoosw
Copy link
Collaborator Author

jagoosw commented Jan 5, 2023

I think these are the only two instances

function interp_values(xvec, Us, Vs, Ws)
xI = xvec; # forced node is the x argument
uI = interpolate(Us, xI[1], xI[2], xI[3])
vI = interpolate(Vs, xI[1], xI[2], xI[3])
wI = interpolate(Ws, xI[1], xI[2], xI[3])
vels = [uI; vI; wI]
return vels
end

@inline function interpolate_fluxes(old_array, Nx_old, Ny_old, Nx_new, Ny_new)
old_grid = LatitudeLongitudeGrid(size = (Nx_old, Ny_old, 1), latitude = (-80, 80), longitude = (-180, 180), z = (0, 1))
new_grid = LatitudeLongitudeGrid(size = (Nx_new, Ny_new, 1), latitude = (-80, 80), longitude = (-180, 180), z = (0, 1))
old_field = Field{Center, Center, Center}(old_grid)
set!(old_field, old_array)
new_array = zeros(Nx_new, Ny_new)
for i in 1:Nx_new, j in 1:Ny_new
new_array[i, j] = interpolate(old_field, new_grid.λᶜᵃᵃ[i], new_grid.φᵃᶜᵃ[j], old_grid.zᵃᵃᶜ[1])
end
return new_array
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GPU 👾 Where Oceananigans gets its powers from performance 🏍️ So we can get the wrong answer even faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants