Segfault with iteration #1310

Open
jgreener64 opened this issue Feb 27, 2024 · 2 comments
Labels
gc garbage collection

Comments

@jgreener64
Contributor

I am on Enzyme main (e8ede63), StaticArrays 1.9.2, Julia 1.10.1 and Linux. The following works when n_atoms = 200 but segfaults for higher values such as 1000. It could be an OOM error, but ideally it wouldn't segfault. Sometimes, to trigger the segfault, I had to run with julia -i, and it would then crash when I tried to type something into the REPL.

using Enzyme, StaticArrays, LinearAlgebra

const T = Float64

struct StillingerWeber{T}
    A::T
    B::T
    p::Int
    q::Int
    a::T
    λ::T
    γ::T
    σ::T
    ϵ::T
end

Base.zero(::StillingerWeber) = StillingerWeber{T}(0.0, 0.0, 0, 0, 0.0, 0.0, 0.0, 0.0, 0.0)

function Base.:+(sw1::StillingerWeber, sw2::StillingerWeber)
    StillingerWeber(
        sw1.A + sw2.A,
        sw1.B + sw2.B,
        sw1.p + sw2.p,
        sw1.q + sw2.q,
        sw1.a + sw2.a,
        sw1.λ + sw2.λ,
        sw1.γ + sw2.γ,
        sw1.σ + sw2.σ,
        sw1.ϵ + sw2.ϵ,
    )
end

sw = StillingerWeber{T}(7.049556277, 0.6022245584, 4, 0, 1.80, 21.0, 1.20, 0.14, 200.0)

function forces!(fs_chunks, sw, coords, neighbors, n_threads)
    A, B, p, q, a, λ, γ, σ, ϵ = sw.A, sw.B, sw.p, sw.q, sw.a, sw.λ, sw.γ, sw.σ, sw.ϵ
    n_atoms = length(coords)

    n_chunks = n_threads
    Threads.@threads for chunk_i in 1:n_chunks
        @inbounds for ni in chunk_i:n_chunks:length(neighbors)
            i, j = neighbors[ni]
            dr = coords[i] - coords[j]
            rij = norm(dr)
            r = rij / σ
            if r < a
                df_dr = -A * exp(inv(r - a)) * (B * p * inv(r^(p+1)) + (B * inv(r^p) - 1) * inv((r - a)^2))
                f = -ϵ * df_dr * inv(σ)
                fdr = f * dr / rij
                fs_chunks[chunk_i][i] -= fdr
                fs_chunks[chunk_i][j] += fdr
            end

            trip_cutoff = a * σ
            trip_cutoff_2 = 2 * trip_cutoff
            for k in (j + 1):n_atoms
                dr_ik = coords[i] - coords[k]
                norm_dr_ik = norm(dr_ik)
                (norm_dr_ik > trip_cutoff_2) && continue
                dr_jk = coords[j] - coords[k]
                norm_dr_jk = norm(dr_jk)
                ((norm_dr_ik < trip_cutoff) || (norm_dr_jk < trip_cutoff)) || continue
                dr_ij, norm_dr_ij = dr, rij
                ndr_ij, ndr_ik, ndr_jk = dr_ij / norm_dr_ij, dr_ik / norm_dr_ik, dr_jk / norm_dr_jk
                dot_ij_ik, dot_ji_jk, dot_ki_kj = dot(dr_ij, dr_ik), dot(-dr_ij, dr_jk), dot(dr_ik, dr_jk)
                rij_trip = r
                rik_trip = norm_dr_ik / σ
                rjk_trip = norm_dr_jk / σ

                if rij_trip < a && rik_trip < a
                    cos_θ_jik = dot_ij_ik / (norm_dr_ij * norm_dr_ik)
                    exp_term = exp(γ * inv(rij_trip - a) + γ * inv(rik_trip - a))
                    cos_term = cos_θ_jik + T(1/3)
                    dh_term = λ * cos_term^2 * -γ * exp_term
                    dh_drij = inv((rij_trip - a)^2) * dh_term
                    dh_drik = inv((rik_trip - a)^2) * dh_term
                    dh_dcosθjik = 2 * λ * exp_term * cos_term
                    dcosθ_drj = (dr_ik * norm_dr_ij^2 - dr_ij * dot_ij_ik) / (norm_dr_ij^3 * norm_dr_ik)
                    dcosθ_drk = (dr_ij * norm_dr_ik^2 - dr_ik * dot_ij_ik) / (norm_dr_ik^3 * norm_dr_ij)
                    fj = -ϵ * (dh_drij * ndr_ij / σ + dh_dcosθjik * dcosθ_drj)
                    fk = -ϵ * (dh_drik * ndr_ik / σ + dh_dcosθjik * dcosθ_drk)
                    fs_chunks[chunk_i][i] -= fj + fk
                    fs_chunks[chunk_i][j] += fj
                    fs_chunks[chunk_i][k] += fk
                end
                if rij_trip < a && rjk_trip < a
                    cos_θ_ijk = dot_ji_jk / (norm_dr_ij * norm_dr_jk)
                    exp_term = exp(γ * inv(rij_trip - a) + γ * inv(rjk_trip - a))
                    cos_term = cos_θ_ijk + T(1/3)
                    dh_term = λ * cos_term^2 * -γ * exp_term
                    dh_drij = inv((rij_trip - a)^2) * dh_term
                    dh_drjk = inv((rjk_trip - a)^2) * dh_term
                    dh_dcosθijk = 2 * λ * exp_term * cos_term
                    dcosθ_dri = ( dr_jk * norm_dr_ij^2 + dr_ij * dot_ji_jk) / (norm_dr_ij^3 * norm_dr_jk)
                    dcosθ_drk = (-dr_ij * norm_dr_jk^2 - dr_jk * dot_ji_jk) / (norm_dr_jk^3 * norm_dr_ij)
                    fi = -ϵ * (dh_drij * -ndr_ij / σ + dh_dcosθijk * dcosθ_dri)
                    fk = -ϵ * (dh_drjk *  ndr_jk / σ + dh_dcosθijk * dcosθ_drk)
                    fs_chunks[chunk_i][j] -= fi + fk
                    fs_chunks[chunk_i][i] += fi
                    fs_chunks[chunk_i][k] += fk
                end
                if rik_trip < a && rjk_trip < a
                    cos_θ_ikj = dot_ki_kj / (norm_dr_ik * norm_dr_jk)
                    exp_term = exp(γ * inv(rik_trip - a) + γ * inv(rjk_trip - a))
                    cos_term = cos_θ_ikj + T(1/3)
                    dh_term = λ * cos_term^2 * -γ * exp_term
                    dh_drik = inv((rik_trip - a)^2) * dh_term
                    dh_drjk = inv((rjk_trip - a)^2) * dh_term
                    dh_dcosθikj = 2 * λ * exp_term * cos_term
                    dcosθ_dri = (-dr_jk * norm_dr_ik^2 + dr_ik * dot_ki_kj) / (norm_dr_ik^3 * norm_dr_jk)
                    dcosθ_drj = (-dr_ik * norm_dr_jk^2 + dr_jk * dot_ki_kj) / (norm_dr_jk^3 * norm_dr_ik)
                    fi = -ϵ * (dh_drik * -ndr_ik / σ + dh_dcosθikj * dcosθ_dri)
                    fj = -ϵ * (dh_drjk * -ndr_jk / σ + dh_dcosθikj * dcosθ_drj)
                    fs_chunks[chunk_i][k] -= fi + fj
                    fs_chunks[chunk_i][i] += fi
                    fs_chunks[chunk_i][j] += fj
                end
            end
        end
    end

    return nothing
end

n_atoms = 1000
coords = rand(SVector{3, T}, n_atoms) .* T(1.784)
neighbors = Tuple{Int, Int}[]
for i in 1:n_atoms, j in (i+1):n_atoms
    if norm(coords[i] - coords[j]) < T(0.6)
        push!(neighbors, (i, j))
    end
end

function forces(sw, coords, neighbors, n_threads)
    fs_chunks = [zero(coords) for _ in 1:n_threads]
    forces!(fs_chunks, sw, coords, neighbors, n_threads)
    return sum(fs_chunks)
end

n_threads = Threads.nthreads()
println(forces(sw, coords, neighbors, n_threads))
println("Got here")

fs_chunks = [zero(coords) for _ in 1:n_threads]
d_fs = rand(SVector{3, T}, length(coords))
d_fs_chunks = [deepcopy(d_fs) for _ in 1:n_threads]
d_coords = zero(coords)

grads = autodiff(
    Enzyme.Reverse,
    forces!,
    Const,
    Duplicated(fs_chunks, d_fs_chunks),
    Active(sw),
    Duplicated(coords, d_coords),
    Const(neighbors),
    Const(n_threads),
)[1]

[2836832] signal (11.1): Segmentation fault
in expression starting at /home/jgreener/dms/molly_dev/enzyme_err33.jl:154
Allocations: 58379556 (Pool: 58232441; Big: 147115); GC: 69
Segmentation fault (core dumped)

The printall_error.txt is attached.
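
To help judge the OOM hypothesis mentioned above, memory could be sampled just before and after the autodiff call. This is only a minimal sketch: `memory_snapshot` is a hypothetical helper name, not part of Enzyme, and the numbers only hint at memory pressure rather than prove or rule out an OOM.

```julia
# Hypothetical helper (not part of Enzyme or Julia Base): report free and
# total system memory plus live GC-tracked bytes, all in MiB.
function memory_snapshot()
    mib = 1024^2
    return (
        free_mib = Sys.free_memory() / mib,
        total_mib = Sys.total_memory() / mib,
        gc_live_mib = Base.gc_live_bytes() / mib,
    )
end

snap = memory_snapshot()
println(snap)
```

Printing a snapshot right before `autodiff(...)` for n_atoms = 200 and n_atoms = 1000 would show whether free memory is already low when the crash happens.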

jgreener64 changed the title from "Segfault with threading" to "Segfault with iteration" on Feb 28, 2024
@jgreener64
Contributor Author

Actually this segfaults without Threads.@threads too.

@wsmoses
Member

wsmoses commented May 12, 2024




[1455207] signal (11.1): Segmentation fault
in expression starting at REPL[21]:1
gc_setmark_pool_ at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:875 [inlined]
gc_setmark_pool at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:895 [inlined]
gc_setmark at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:902 [inlined]
gc_mark_outrefs at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:2617 [inlined]
gc_mark_loop_serial_ at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:2690
gc_mark_loop_serial at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:2713
gc_mark_loop at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:2908 [inlined]
_jl_gc_collect at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:3241
ijl_gc_collect at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:3538
maybe_collect at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_big_alloc_inner at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:1008 [inlined]
jl_gc_big_alloc_noinline at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:1045 [inlined]
jl_gc_alloc_ at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia_internal.h:481 [inlined]
jl_gc_alloc at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gc.c:3590
macro expansion at /home/wmoses/.julia/packages/StaticArrays/EHHaF/src/linalg.jl:278 [inlined]
_norm at /home/wmoses/.julia/packages/StaticArrays/EHHaF/src/linalg.jl:266 [inlined]
norm at /home/wmoses/.julia/packages/StaticArrays/EHHaF/src/linalg.jl:265 [inlined]
macro expansion at ./REPL[8]:24 [inlined]
#14#threadsfor_fun#1 at ./threadingconstructs.jl:215
#14#threadsfor_fun at ./threadingconstructs.jl:182 [inlined]
referenceCaller at /home/wmoses/git/Enzyme.jl/src/rules/parallelrules.jl:45 [inlined]
augmented_julia_referenceCaller_5398wrap at /home/wmoses/git/Enzyme.jl/src/rules/parallelrules.jl:0
macro expansion at /home/wmoses/git/Enzyme.jl/src/compiler.jl:5703 [inlined]
enzyme_call at /home/wmoses/git/Enzyme.jl/src/compiler.jl:5381 [inlined]
AugmentedForwardThunk at /home/wmoses/git/Enzyme.jl/src/compiler.jl:5271 [inlined]
fwd at /home/wmoses/git/Enzyme.jl/src/rules/parallelrules.jl:79 [inlined]
#1 at ./threadingconstructs.jl:154
unknown function (ip: 0x7f529e983e79)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/task.c:1238
Allocations: 65310385 (Pool: 65202878; Big: 107507); GC: 68
Segmentation fault (core dumped)
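
The trace above ends in the GC mark phase (gc_setmark_pool_, reached through jl_gc_collect), triggered by an allocation inside the differentiated loop. One way to check whether a collection during the Enzyme call is what triggers the crash would be to rerun with the GC temporarily disabled. The following is a minimal sketch under that assumption: `with_gc_disabled` is a hypothetical helper, not part of Enzyme or Julia Base, and the workload shown is a stand-in for the real autodiff call.

```julia
# Hypothetical helper (not part of Julia Base or Enzyme): run `f` with the
# garbage collector switched off, then restore the previous GC state.
function with_gc_disabled(f)
    was_enabled = GC.enable(false)  # GC.enable returns the previous state
    try
        return f()
    finally
        GC.enable(was_enabled)
    end
end

# Stand-in workload that allocates; in the reproducer above this would be
# the `autodiff(Enzyme.Reverse, forces!, ...)` call instead.
result = with_gc_disabled() do
    sum(sum(rand(3)) for _ in 1:1000)
end
```

Memory is never reclaimed while the GC is off, so with n_atoms = 1000 this may run out of memory instead. If the segfault disappears, that would be consistent with a GC rooting problem, though it would not confirm one.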

wsmoses added the gc garbage collection label on May 12, 2024