`EnsembleGPUKernel `+ Texture Memory Support

@ChrisRackauckas @utkarsh530 

Per our conversation here is an example demonstrating how texture memory interpolation could be use.

I would like to be able to leverage CUDA.jl's texture memory support for interpolation of data in the EOM and/or in a callback. A use case could be dropping a ball in a wind field with ground impact termination for a non-flat terrain. Here, one would want to interpolate the wind field as a function of state in the eom as a forcing term and an elevation map as a function of altitude. 

Below is an initial prototype. This includes a CPU implementation that leverages DataInterpolations.jl to demonstrate the functionality desired using this data [forecast.txt](https://github.com/SciML/DiffEqGPU.jl/files/10469752/forecast.txt) I also included an initial non-working prototype using texture memory. 

# No interpolation 
Working model for CPU and GPU w/o interpolation

```julia
using CUDA, DiffEqGPU, OrdinaryDiffEq, Plots, Serialization, StaticArrays, Distributions, LinearAlgebra
import DataInterpolations
const DI = DataInterpolations

# Ballistic model
function ballistic(u, p, t, weather)
    CdS, mass, g = p
    vel = @view u[4:6]

    wind, ρ = get_weather(weather, u[3])

    airvelocity = vel - wind
    airspeed = norm(airvelocity)
    accel = -(ρ * CdS * airspeed) / (2 * mass) * airvelocity - mass*SVector{3}(0f0, 0f0, g)

    return SVector{6}(vel..., accel...)
end

# constant wind and airdensity.
function get_weather(weather, z)
    wind = SVector{3}(1f0, 0f0, 0f0)
    ρ = 1.27f0
    wind, ρ
end

trajectories = 100
u0 = @SVector [0.0f0, 0.0f0, 10000.0f0, 0f0, 0f0, 0f0]
tspan = (0.0f0, 50.0f0)
saveat = LinRange(tspan..., 100)
p = @SVector [25f0, 225f0, 9.807f0]
CdS_dist = Normal(0f0, 1f0)

eom_nowind = (u, p, t) -> ballistic(u,p,t,nothing)
prob = ODEProblem{false}(eom_nowind, u0, tspan, p)
sol = solve(prob, Tsit5())

prob_func = (prob, i, repeat) -> remake(prob, p = p + SVector{3}(rand(CdS_dist), 0f0, 0f0))
eprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

esol = solve(eprob, Tsit5(); trajectories, saveat)
esol_gpu = solve(eprob, GPUTsit5(), EnsembleGPUKernel(); trajectories, saveat)
```

# DataInterpolations.jl CPU Example
This demonstrates the basic capability I would like to replicate in w/ `EnsembleGPUKernel` using `CUDA.CuTexture`

```julia
# Ballistic w/ Interpolated winds via DataInterpolations.jl
data = deserialize("forecast.txt")
N = length(data.altitude)

weather_sa = map(data.altitude, data.windx, data.windy, data.density) do alt, wx, wy, ρ
    SVector{4}(alt, wx, wy, ρ)
end

interp = DI.LinearInterpolation(weather_sa,data.altitude)

function get_weather(itp::DI.LinearInterpolation, z)
    weather = itp(u[3])
    wind = SVector{3}(weather[2], weather[3], 0f0)
    ρ = weather[4]
    wind, ρ
end

eom_di = (u,p,t) -> ballistic(u,p,t,interp)
prob = ODEProblem{false}(eom_di, u0, tspan, p)
sol = solve(prob, Tsit5())

prob_func = (prob, i, repeat) -> remake(prob, p = p + SVector{3}(rand(dist), 0f0, 0f0))
eprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
esol = solve(eprob, Tsit5(); trajectories, saveat)
```

# GPU Texture Interpolation Validation
Demonstrate usage of CuTexture for interpolation. Note, here I index into the texture memory by broadcasting over a  `CuArray{Float32}` of indices via `dst_gpu .= getindex.(Ref(texture), idx_tlu)`

```julia
weather = map(weather_sa) do w
    (w...,)
end

weather_TA = CuTextureArray(weather)
texture = CuTexture(weather_TA; address_mode = CUDA.ADDRESS_MODE_CLAMP, normalized_coordinates = true, interpolation = CUDA.LinearInterpolation())

## Test Texture interpolation
idx = LinRange(0f0, 1f0, 4000)
idx_gpu = CuArray(idx)
idx_tlu = (1f0-1f0/N)*idx_gpu .+ 0.5f0/N  # normalized table lookup form https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#table-lookup
dst_gpu = CuArray{NTuple{4, Float32}}(undef, size(idx))
dst_gpu .= getindex.(Ref(texture), idx_tlu)  # interpolation ℝ->ℝ⁴

begin
    p1 = scatter(data.altitude, data.altitude, ylabel = "Altitude", label = "Raw", ms = 2, marker = :x)
    plot!(idx*data.altitude[end], getindex.(dst_cpu, 1), label = "Texture",lw = 2)

    p2 = scatter(data.altitude, data.windx, ylabel = "Wind X", label = "Raw", ms = 2, marker = :x, leg = false)
    plot!(idx*data.altitude[end], getindex.(dst_cpu, 2), label = "Texture",lw = 2)

    p3 = scatter(data.altitude, data.windy, ylabel = "Wind Y", label = "Raw", ms = 2, marker = :x, leg = false)
    plot!(idx*data.altitude[end], getindex.(dst_cpu, 3), label = "Texture",lw = 2)

    p4 = scatter(data.altitude, data.density, ylabel = "Density", label = "Raw", ms = 2, marker = :x, leg = false)
    plot!(idx*data.altitude[end], getindex.(dst_cpu, 4), label = "Texture",lw = 2)

    plot(p1, p2, p3, p4, link=:x, xlabel = "Altitude")
end
```

# `EnsembleGPUKernel` + `CuTexture` prototype
Non-working prototype. Note here the `get_weather` function is indexing the texture at a single index for a single trajectory which isn't supported by `CUDA.jl`. Although this is scalar indexing it should actually be occurring for each trajectory in the ensemble. 

```julia
function get_weather(tex::CuTexture, z, zmax = data.altitude[end], N = length(data.altitude))
    idx = (1f0-1f0/N)*z/zmax + 0.5f0/N # normalized input for table lookup based on https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#table-lookup
    tex[idx]
end

eom_tex = (u,p,t) -> ballistic(u,p,t,texture)
prob = ODEProblem{false}(eom_tex, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = p + SVector{3}(rand(dist), 0f0, 0f0))
eprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
esol_gpu = solve(eprob, GPUTsit5(), EnsembleGPUKernel(); trajectories, saveat) #InvalidIR 
```

# `ContinousCallback Prototype`
The above example only does interpolation in the eom. However, interpolation could also occur in evaluating a callback. e.g. something like this

```julia
terrain_texture = CuTexture(...)  # ℝ² -> ℝ, map xy position to ground elevation

condition(u,t,integrator) = u[3] - terrain_texture[@view u[1:2]]. # check height above elevation
cb = ContinuousCallback(condition, terminate!)
```



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

`EnsembleGPUKernel` + Texture Memory Support #224

No interpolation

DataInterpolations.jl CPU Example

GPU Texture Interpolation Validation

`EnsembleGPUKernel` + `CuTexture` prototype

`ContinousCallback Prototype`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

EnsembleGPUKernel + Texture Memory Support #224

Description

No interpolation

DataInterpolations.jl CPU Example

GPU Texture Interpolation Validation

EnsembleGPUKernel + CuTexture prototype

ContinousCallback Prototype

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`EnsembleGPUKernel` + Texture Memory Support #224

`EnsembleGPUKernel` + `CuTexture` prototype

`ContinousCallback Prototype`