-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increased interpolate
speed
#2859
Conversation
Sounds great! Can you post here the code with which you benchmarked it? |
Yep! using Oceananigans, BenchmarkTools
using Oceananigans.Fields: interpolate
# Regularly spaced grids are interpolated differently
grid = RectilinearGrid(size=(100, 100, 100), x = [i^1.1 for i in 1:101], y = [i^1.2 for i in 1:101], z = [i^2 for i in 1:101])
field = CenterField(grid)
@benchmark interpolate(field, rand(), rand(), -rand()) BenchmarkTools.Trial: 10000 samples with 181 evaluations.
Range (min … max): 584.481 ns … 26.258 μs ┊ GC (min … max): 0.00% … 97.60%
Time (median): 612.343 ns ┊ GC (median): 0.00%
Time (mean ± σ): 634.835 ns ± 300.484 ns ┊ GC (mean ± σ): 0.40% ± 0.98%
▆██▄▂▁ ▂
██████▇▇▆▆▅▅▅▅▆▅▄▄▅▃▁▄▃▁▁▃▁▃▁▃▁▃▁▄▃▁▁▃▃▁▁▃▃▁▁▃▁▁▁▁▁▁▁▃▄▄▁▁▃▃█ █
584 ns Histogram: log(frequency) by time 1.49 μs <
Memory estimate: 64 bytes, allocs estimate: 4. Adding the inbounding improves this to: BenchmarkTools.Trial: 10000 samples with 182 evaluations.
Range (min … max): 577.841 ns … 28.209 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 605.769 ns ┊ GC (median): 0.00%
Time (mean ± σ): 625.685 ns ± 444.392 ns ┊ GC (mean ± σ): 0.42% ± 0.98%
█▃
▄▄▄▄▄▅██▆▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▂
578 ns Histogram: frequency by time 839 ns <
Memory estimate: 64 bytes, allocs estimate: 4. And removing the BenchmarkTools.Trial: 10000 samples with 911 evaluations.
Range (min … max): 118.231 ns … 11.848 μs ┊ GC (min … max): 0.00% … 98.73%
Time (median): 125.412 ns ┊ GC (median): 0.00%
Time (mean ± σ): 135.838 ns ± 198.933 ns ┊ GC (mean ± σ): 2.87% ± 2.40%
█▅
▆██▆▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▂▁▁▂▂▂▁▁▂▂▁▂▁▂▂▁▂▂▁▂▁▂▁▂▁▁▂▂ ▂
118 ns Histogram: frequency by time 324 ns <
Memory estimate: 64 bytes, allocs estimate: 4. |
For reference, with the same sized grid but regularly spaced the same benchmark gives this: BenchmarkTools.Trial: 10000 samples with 916 evaluations.
Range (min … max): 116.721 ns … 6.288 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 123.181 ns ┊ GC (median): 0.00%
Time (mean ± σ): 131.844 ns ± 171.767 ns ┊ GC (mean ± σ): 2.51% ± 2.39%
█
▄▃▃▃▃▃█▅▄▅▇▇▄▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▂▂▁▂▂▂▂ ▂
117 ns Histogram: frequency by time 165 ns <
Memory estimate: 64 bytes, allocs estimate: 4. |
Also the reference link is dead and not archived anywhere, but I'm pretty sure its this code, from this article |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually more than what it claims!
Not only it's faster, but now it also works on GPUs! (It wasn't working before).
interpolate
speedinterpolate
speed + avoid scalar operations
@glwagner was there any thinking behind having |
interpolate
speed + avoid scalar operationsinterpolate
speed
I must have been hallucinating or did something wrong because still |
98c464e
to
ae3a6df
Compare
@jagoosw I think we should update the link and merge |
I didn't write it. Oceananigans.jl/src/Fields/interpolate.jl Line 116 in ae3a6df
this is a high-level function; it isn't for use on the GPU. I don't know it's purpose. |
I think the only place it is used is in Lagrangian Particle tracking, for all particles it is used to get the velocity:
And is also used to interpolate tracked fields. So it would be useful if this worked on GPUs. |
Should I merge? |
Go for it |
This function: Oceananigans.jl/src/Fields/interpolate.jl Line 116 in ae3a6df
isn't used for Lagrangian particle tracking. That's this function: Oceananigans.jl/src/Fields/interpolate.jl Line 143 in ae3a6df
which is supposed to be GPU friendly. |
I checked and @inline interpolate(field::AbstractField{LX, LY, LZ, G, T, N}, x, y, z) where {LX, LY, LZ, G, T, N} = interpolate(field, LX(), LY(), LZ(), G, x, y, z) but this fails as a dynamic funciton invocation. I also tried changing it to:
but this errors with Also, if we want to test interpolation, it always fails on GPU because of scalar indexing if called directly, but if wrapped in a kernel function is fine: @kernel function test!(field, grid, res, x, y, z)
n = @index(Global)
LX, LY, LZ = location(field)
@inbounds res[n] = interpolate(field, Center(), Center(), Center(), grid, x[n], y[n], z[n])
end (If I put |
I think that's because within a kernal on the GPU, @inline function interpolate(field, x, y, z)
LX, LY, LZ = location(field)
grid = field.grid
return interpolate(field, LX(), LY(), LZ(), grid, x, y, z)
end is an OffsetArray and has no property |
The function signature @inline function interpolate(field, LX, LY, LZ, grid, x, y, z) is confusing: |
Interesting. I see that error a lot in #2782 |
If it’s an array how will field.grid work? |
It wont. We can't write |
I’m confused though. Is the field struct different when it lives on GPU? No, right? |
Why do we want |
We might not. Where is interpolate used? |
Yes, most containers are "adapted" on the GPU. We adapt Oceananigans.jl/src/Fields/field.jl Line 394 in 7debded
And note that |
GOTHCA! |
OK, may I suggest the following: If |
Another option is to delete it. I don't know about the type restriction. |
Its not used anywhere in the source code but in quite a few validation scripts |
Ah this makes sense, thank you |
Can you point us to a few? |
I think these are the only two instances Oceananigans.jl/validation/immersed_boundaries/cylinder_flow_with_tracer.jl Lines 243 to 250 in 7debded
Oceananigans.jl/validation/near_global_lat_lon/cyclic_interpolate_utils.jl Lines 18 to 30 in 7debded
|
Hi all,
I noticed that some of the array indexing in
interpolate
weren't@inbounds
'ed so added them, and then also noticed thatissorted
tooke a good fraction of the time when interpolating fields on irregular grids so I removed that since the funciton is only used internally and is always going to be given sorted vectors.This results in a ~5x speedup so seems worth it.