Skip to content

Allocations with StaticArrays in Julia v1.11 #1282

@kaipartmann

Description

@kaipartmann

I am the main author of the package Peridynamics.jl and recently changed to Julia v1.11.
With the new Julia version, a significant increase in the simulation time can be found when using a certain material.

Peridynamics.jl Benchmark

Please see the following benchmark, which is also described in more detail in the tutorials (Tutorial mode I fracture).

using Peridynamics

function mode_i(mat, npyz)
    l, Δx, δ, a = 1.0, 1/npyz, 3.015/npyz, 0.5
    pos, vol = uniform_box(l, l, 0.1l, Δx)
    ids = sortperm(pos[2,:])
    body = Body(mat, pos[:, ids], vol[ids])
    material!(body; horizon=3.015Δx, E=2.1e5, nu=0.25, rho=8e-6, Gc=2.7)
    point_set!(p -> p[1]  -l/2+a && 0  p[2]  2δ, body, :set_a)
    point_set!(p -> p[1]  -l/2+a && -2δ  p[2] < 0, body, :set_b)
    precrack!(body, :set_a, :set_b)
    point_set!(p -> p[2] > l/2-Δx, body, :set_top)
    point_set!(p -> p[2] < -l/2+Δx, body, :set_bottom)
    velocity_bc!(t -> -30, body, :set_bottom, :y)
    velocity_bc!(t -> 30, body, :set_top, :y)
    vv = VelocityVerlet(steps=2000)
    job = Job(body, vv; path=joinpath(@__DIR__, "results", "mode_i"))
    @time submit(job)
    return nothing
end

mode_i(NOSBMaterial(), 25) # compilation
mode_i(NOSBMaterial(), 50) # work
❯ julia +1.10.5 --project -t 6 mode_i_benchmark.jl
  0.719477 seconds (1.39 M allocations: 450.082 MiB, 2.77% gc time, 74.16% compilation time)
  6.090774 seconds (1.28 M allocations: 1.118 GiB, 1.06% gc time, 2.17% compilation time)

❯ julia +1.11.1 --project -t 6 mode_i_benchmark.jl
  9.574071 seconds (2.15 G allocations: 82.415 GiB, 15.59% gc time, 6.09% compilation time)
177.261652 seconds (40.34 G allocations: 1.501 TiB, 23.15% gc time)

The same simulation takes 29x longer on v1.11.1 than v1.10.5.

When investigating the problem, I narrowed it down to the calculation of the deformation gradient:
https://github.com/kaipartmann/Peridynamics.jl/blob/f9b8fe60e743d828cf96d2366610c6a69d7ac054/src/physics/correspondence.jl#L174-L194

MWE

The same behavior can be reproduced with a small MWE:

using LinearAlgebra, StaticArrays, BenchmarkTools

function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * X * X'
    end
    return K
end

@btime mwe1(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);
❯ julia +1.10.5 --project -t 6 mwe.jl 
  1.070 ms (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl 
  129.890 ms (7000000 allocations: 289.92 MiB)

I think the problem is related to allocations when using += with a SMatrix or SVector.

As a quick fix, rewriting the calculation by hand leads to even slightly better performance on v1.11:

function mwe2(a, X, n)
    K11, K12, K13 = 0.0, 0.0, 0.0
    K21, K22, K23 = 0.0, 0.0, 0.0
    K31, K32, K33 = 0.0, 0.0, 0.0
    X1, X2, X3 = X[1], X[2], X[3]
    for i in 1:n
        k = a * i
        K11 += k * X1 * X1
        K12 += k * X1 * X2
        K13 += k * X1 * X3
        K21 += k * X2 * X1
        K22 += k * X2 * X2
        K23 += k * X2 * X3
        K31 += k * X3 * X1
        K32 += k * X3 * X2
        K33 += k * X3 * X3
    end
    K = SMatrix{3,3}(K11, K21, K31, K12, K22, K32, K13, K23, K33)
    return K
end

@btime mwe2(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);
❯ julia +1.10.5 --project -t 6 mwe.jl 
  1.211 ms (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl 
  1.072 ms (0 allocations: 0 bytes)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions