Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static GPU compilation of Jacobian #129

Merged
merged 1 commit into from
Feb 16, 2024
Merged

Conversation

avik-pal
Copy link
Member

@avik-pal avik-pal commented Feb 15, 2024

using KernelAbstractions, CUDA, NonlinearSolve, StaticArrays

@kernel function parallel_nonlinearsolve_kernel!(result, @Const(prob), @Const(alg))
    i = @index(Global)
    prob_i = remake(prob; u0 = prob.u0[i])
    @inbounds result[i] = solve(prob_i, alg).u
    return nothing
end

function vectorized_solve(prob, alg)
    backend = get_backend(prob.u0)
    result = KernelAbstractions.zeros(backend, prob.u0)
end

@generated function generalized_rosenbrock(x::SVector{N}, p) where {N}
    vals = ntuple(gensym  string, N)
    expr = []
    push!(expr, :($(vals[1]) = oneunit(x[1]) - x[1]))
    for i in 2:N
        push!(expr, :($(vals[i]) = 10.0 * (x[$i] - x[$i - 1] * x[$i - 1])))
    end
    push!(expr, :(@SVector [$(vals...)]))
    return Expr(:block, expr...)
end

u0 = @SVector [@SVector(rand(10)) for _ in 1:16]
prob = NonlinearProblem(generalized_rosenbrock, u0)

backend = CUDABackend()#  get_backend(u0)
result = KernelAbstractions.zeros(backend, eltype(u0), length(u0))

kernel! = parallel_nonlinearsolve_kernel!(backend)
kernel!(result, prob, SimpleNewtonRaphson(; autodiff = AutoForwardDiff(; chunksize = 10));
    ndrange = length(u0))
kernel!(result, prob, SimpleNewtonRaphson(; autodiff = AutoForwardDiff(; chunksize = 10));
    ndrange = length(u0))

kernel!(result, prob, SimpleDFSane(); ndrange = length(u0))

prob_i = remake(prob; u0 = prob.u0[1])

@profview_allocs for i in 1:100000
    solve(prob_i, SimpleNewtonRaphson(; autodiff = AutoForwardDiff(; chunksize = 10)))
end

using AllocCheck

@check_allocs function noalloc_solve(prob, alg)
    return solve(prob, alg)
end

noalloc_solve(prob_i, SimpleNewtonRaphson(; autodiff = AutoForwardDiff(; chunksize = 10)))

Copy link

codecov bot commented Feb 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4ed9e45) 90.22% compared to head (f2149a9) 90.26%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #129      +/-   ##
==========================================
+ Coverage   90.22%   90.26%   +0.04%     
==========================================
  Files          20       20              
  Lines        1217     1223       +6     
==========================================
+ Hits         1098     1104       +6     
  Misses        119      119              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@avik-pal
Copy link
Member Author

avik-pal commented Feb 15, 2024

using KernelAbstractions, CUDA, AMDGPU, NonlinearSolve, StaticArrays

@kernel function parallel_nonlinearsolve_kernel!(result, @Const(prob), @Const(alg))
    i = @index(Global)
    prob_i = remake(prob; u0 = prob.u0[i])
    sol = solve(prob_i, alg)
    @inbounds result[i] = sol.u
    return nothing
end

function vectorized_solve(prob, alg; backend = CPU())
    result = KernelAbstractions.allocate(backend, eltype(prob.u0), length(prob.u0))

    groupsize = min(length(prob.u0), 1024)

    kernel! = parallel_nonlinearsolve_kernel!(backend, groupsize, length(prob.u0))
    kernel!(result, prob, alg)
    KernelAbstractions.synchronize(backend)

    return result
end

@generated function generalized_rosenbrock(x::SVector{N}, p) where {N}
    vals = ntuple(gensym  string, N)
    expr = []
    push!(expr, :($(vals[1]) = oneunit(x[1]) - x[1]))
    for i in 2:N
        push!(expr, :($(vals[i]) = 10.0 * (x[$i] - x[$i - 1] * x[$i - 1])))
    end
    push!(expr, :(@SVector [$(vals...)]))
    return Expr(:block, expr...)
end

u0 = @SVector [@SVector(rand(10)) for _ in 1:1024]
prob = NonlinearProblem(generalized_rosenbrock, u0)

vectorized_solve(prob, SimpleNewtonRaphson(); backend = CPU())

vectorized_solve(prob, SimpleNewtonRaphson(); backend = ROCBackend())

vectorized_solve(prob, SimpleNewtonRaphson(); backend = CUDABackend())

The CPU one is not working correctly atm, I will test it later.

@avik-pal
Copy link
Member Author

@utkarsh530 can you take a quick look and tell me why the CPU() one is not giving me correct results but all the other ones are?

@avik-pal avik-pal changed the title [WIP] Static GPU compilation of Jacobian Static GPU compilation of Jacobian Feb 15, 2024
@avik-pal
Copy link
Member Author

we can get the example working later but this is good to go.

@ChrisRackauckas ChrisRackauckas merged commit 8995a23 into main Feb 16, 2024
16 of 19 checks passed
@ChrisRackauckas ChrisRackauckas deleted the ap/static_jacobian branch February 16, 2024 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants