Added thermal fluid MTK and JuliaSimCompiler benchmark. #939
Conversation
The JuliaHubRegistry can now be added for benchmarks that require it, as of #940.
Does it require released versions of the packages?
My desktop is one of the antiques that features AVX-512 downclocking. Intel hasn't had that issue since Ice Lake, and AMD never did. The Julia code isn't vectorized at all.

julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads),(iTLB-load-misses,iTLB-loads),(cache-misses,cache-references)" begin
foreachf(f_l, 100_000, du, u0, p, 0.0)
end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles 2.85e+09 33.4% # 3.8 cycles per ns
┌ instructions 3.82e+09 33.4% # 1.3 insns per cycle
│ branch-instructions 4.58e+07 33.4% # 1.2% of insns
└ branch-misses 2.97e+05 33.4% # 0.6% of branch insns
┌ task-clock 7.51e+08 100.0% # 750.5 ms
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 0.00e+00 100.0%
┌ L1-dcache-load-misses 2.73e+08 16.7% # 33.3% of dcache loads
│ L1-dcache-loads 8.19e+08 16.7%
└ L1-icache-load-misses 3.76e+08 16.7%
┌ dTLB-load-misses 5.30e+03 16.7% # 0.0% of dTLB loads
└ dTLB-loads 8.44e+08 16.7%
┌ iTLB-load-misses 3.64e+04 33.3% # 26.2% of iTLB loads
└ iTLB-loads 1.39e+05 33.3%
┌ cache-misses 9.63e+04 33.3% # 19.9% of cache refs
└ cache-references 4.83e+05 33.3%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads),(iTLB-load-misses,iTLB-loads),(cache-misses,cache-references)" begin
foreachf(f_j, 100_000, du, u0, p, 0.0)
end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles 2.71e+09 33.3% # 4.5 cycles per ns
┌ instructions 5.37e+09 33.4% # 2.0 insns per cycle
│ branch-instructions 4.60e+08 33.4% # 8.6% of insns
└ branch-misses 4.76e+06 33.4% # 1.0% of branch insns
┌ task-clock 6.02e+08 100.0% # 601.9 ms
│ context-switches 0.00e+00 100.0%
│ cpu-migrations 0.00e+00 100.0%
└ page-faults 0.00e+00 100.0%
┌ L1-dcache-load-misses 1.49e+08 16.7% # 8.4% of dcache loads
│ L1-dcache-loads 1.77e+09 16.7%
└ L1-icache-load-misses 5.62e+08 16.7%
┌ dTLB-load-misses 0.00e+00 16.6% # 0.0% of dTLB loads
└ dTLB-loads 1.73e+09 16.6%
┌ iTLB-load-misses 1.46e+04 33.2% # 0.9% of iTLB loads
└ iTLB-loads 1.71e+06 33.2%
┌ cache-misses 1.67e+03 33.2% # 6.2% of cache refs
└ cache-references 2.67e+04 33.2%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The Julia version requires about 40% more instructions, but also executed at about 50% more instructions per clock cycle, which is another issue that may be worth looking into. The LLVM backend had about 2.7k L1d cache misses per call vs 1.5k for the Julia backend. I'll try to make some improvements (that should be applicable to all backends). Note that a lot of linear algebra code will use AVX-512, so in practice this CPU is going to be downclocked anyway. Newer CPUs do not have this problem (and use less naive clock-speed algorithms anyway).
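For context, `foreachf` is not defined anywhere in this thread; below is a minimal sketch of such a repeat-call helper (the name and exact signature are assumptions), along with the per-call arithmetic behind the cache-miss figures quoted above:

```julia
# Hypothetical sketch of the `foreachf` helper used in the @pstats runs
# (its definition is not shown in the thread): call `f` N times with the
# same arguments so that per-call costs dominate the measurement.
function foreachf(f::F, N::Int, args::Vararg{Any,A}) where {F,A}
    for _ in 1:N
        f(args...)
    end
end

# The per-call figures follow from dividing the hardware counters by the
# 100_000 calls, e.g. L1d cache misses per call for each backend:
l1d_per_call_llvm  = 2.73e8 / 100_000  # ≈ 2.7k (LLVM backend)
l1d_per_call_julia = 1.49e8 / 100_000  # ≈ 1.5k (Julia backend)
```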
Yes, it will.
What's the status here?
I made one PR that really improved the LLVM backend's performance here. Then we can cut a release.
For the last set of 3200 equations, … takes …; the ODEProblem + … In other words, with … the …

Not sure what the relative priorities are, but I'd argue that we shouldn't worry too much about Symbolics until we're serious about ditching it (but if anyone in the open source community wants to contribute to improving it, that'd still be nice). I have a couple more ideas that should improve the …
I meant the status of getting this merged, i.e. the devops part 😅
That requires a JuliaSimCompiler release.
Also, to give any readers an idea of just how bad working with Symbolics.jl is...

julia> irsys = @time @eval IRSystem(testbench);
40.631866 seconds (92.43 M allocations: 4.919 GiB, 2.90% gc time, 0.00% compilation time)
julia> sys_jsir = @time @eval structural_simplify(irsys);
2.939211 seconds (19.91 M allocations: 1.412 GiB, 7.14% gc time, 0.00% compilation time)
It is annoying that Buildkite reports green whether it failed or not.

┌ Info: Instantiating
└ folder = "benchmarks/ModelingToolkit"
Activating project at `/cache/build/exclusive-amdci1-0/julialang/scimlbenchmarks-dot-jl/benchmarks/ModelingToolkit`
Updating registry at `/cache/julia-buildkite-plugin/depots/5b300254-1738-4989-ae0a-f4d2d937f953/registries/General.toml`
Updating registry at `/cache/julia-buildkite-plugin/depots/5b300254-1738-4989-ae0a-f4d2d937f953/registries/JuliaComputingRegistry.toml`
Updating registry at `/cache/julia-buildkite-plugin/depots/5b300254-1738-4989-ae0a-f4d2d937f953/registries/JuliaHubRegistry.toml`
ERROR: expected package `JuliaSimCompilerRuntime [9cbdfd5a]` to be registered
Stacktrace:
[1] pkgerror(msg::String)
@ Pkg.Types /cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/share/julia/stdlib/v1.10/Pkg/src/Types.jl:70
[2] check_registered
@ /cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1288 [inlined]
[3] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}, level::Pkg.Types.UpgradeLevel; skip_writing_project::Bool, preserve::Nothing)
@ Pkg.Operations /cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1537
[4] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; level::Pkg.Types.UpgradeLevel, mode::Pkg.Types.PackageMode, preserve::Nothing, update_registry::Bool, skip_writing_project::Bool, kwargs::@Kwargs{})

https://buildkite.com/julialang/scimlbenchmarks-dot-jl/builds/2366#018f9794-f0cf-475e-b63a-6bab9d60fd51/376-385

Or, at least, it is registered in the internal package server, so that CI using GitHub Actions can install it by setting the …
Buildkite isn't running?
@thazhemadam The saved plot was not uploaded among the artifacts on Buildkite.

(N_x_i, ss_times[i, :], times[i, :]) = (960, [1172.1312520503998, 88.81234407424927], [(50.97079300880432, 1.03e-5), (49.89296197891235, 8.336333333333334e-6), (30.01787805557251, 2.3599e-5), (5.702844142913818, 7.4725e-6)])
CairoMakie.Screen{IMAGE}

On the server, MTK and JuliaSimCompiler's Julia backend both had about 50 s compile times, vs 30 s for the C backend and 6 s for the LLVM backend (which also had the best runtime). Would be nice to actually see the plot, though.
These tests depend on a couple of proprietary repositories: …
@thazhemadam, can you set it up so we can run with these dependencies? Does it take more than adding … to the env in the appropriate workflows?

Additionally, it currently requires the JuliaSimCompiler#cbackendmultifunuse branch, but it will likely be merged shortly. I would also suggest this branch of XSteam, to fix precompilation: hzgzh/XSteam.jl#2
It times and plots structural_simplify + (for JuliaSimCompiler) the IRSystem conversion, and the ODEProblem + a call to f! (timed with @belapsed).

Compile times

structural_simplify is much faster when using JuliaSimCompiler than MTK, and this is by far the slowest step of model building. Waiting over 10 minutes with MTK vs a minute with JuliaSimCompiler has a substantial impact on productivity and iterative model development.
In terms of compiling the simplified model, Julia is substantially slower to compile than C, which is comparable to directly emitting LLVM IR. Regardless, this time is fairly inconsequential compared to structural_simplify.
Runtimes

This code example uses a lot of registered functions. I suspect these hurt the performance of the C backend, as they are handled as function pointers that are passed in as arguments to the C function.
There are many calls to elementary functions. The LLVM backend uses the variants that LLVM links in, equivalent to llvmcall-ing @llvm.pow from Julia, for example, instead of those that come with Julia. On this computer, the @llvm versions tend to be faster in microbenchmarks. I'd have to look into why the LLVM backend falls behind the Julia backend in runtime performance.
The Julia backend using MTK produces slower code than the Julia or LLVM backends of JuliaSimCompiler, but faster than the C backend here (due to the registered functions).
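To make the elementary-function point concrete: in Julia you can call LLVM's pow intrinsic directly via the llvmcall calling convention of ccall, which is roughly what the LLVM backend emits in place of Julia's own ^. A small hedged illustration (llvm_pow is a made-up name for this sketch):

```julia
# Call LLVM's pow intrinsic directly (via the `llvmcall` calling
# convention of ccall), rather than Julia's Base implementation of ^.
llvm_pow(a::Float64, b::Float64) =
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), a, b)

llvm_pow(2.0, 3.0)  # 8.0, matching 2.0^3.0
```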
I could put some of these comments in the document. I didn't mention times explicitly, which may be different on the server vs my desktop.
Note, this takes a long time to build. The largest structural_simplify took MTK more than 11 minutes! We may want to cut off MTK above a certain number of states.
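A minimal sketch of that suggested cutoff (all names and the threshold value here are hypothetical, not part of the benchmark):

```julia
# Hedged sketch: skip MTK for models above a state-count threshold, per
# the suggestion above. `MTK_STATE_CUTOFF` and `backends_for` are
# made-up names, and 3200 is illustrative, not a measured cutoff.
const MTK_STATE_CUTOFF = 3200

backends_for(n_states::Int) =
    n_states > MTK_STATE_CUTOFF ? [:JuliaSimCompiler] : [:MTK, :JuliaSimCompiler]
```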