Open
Labels
fold (sum, maximum, reduce, foldl, etc.), maths (Mathematical functions), performance (Must go faster)
Description
For integer inputs, sum(f, A) performs significantly worse than sum(f.(A)) for certain transcendental functions on x86 (possibly specific to AMD). In the timings below, the slowdown shows up for sin but not for log.
julia> versioninfo()
Julia Version 1.11.0-DEV.237
Commit 958da95647 (2023-08-07 21:48 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × AMD Ryzen 9 3950X 16-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
Threads: 47 on 32 virtual cores
Environment:
LD_PRELOAD = /lib/x86_64-linux-gnu/libc_malloc_debug.so.0
JULIA_NUM_THREADS = 32
JULIA_EDITOR = vim
julia> a=collect(1:1000000);
julia> @btime sum(sin.(a))
8.574 ms (4 allocations: 7.63 MiB)
-0.11710952409815278
julia> @btime sum(sin,a)
12.908 ms (1 allocation: 16 bytes)
-0.11710952409817987
julia> @btime sum(log.(a))
8.066 ms (4 allocations: 7.63 MiB)
1.2815518384658169e7
julia> @btime sum(log,a)
6.302 ms (1 allocation: 16 bytes)
1.281551838465817e7

Different story on Apple Silicon:
julia> versioninfo()
Julia Version 1.11.0-DEV.237
Commit 958da95647 (2023-08-07 21:48 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin23.0.0)
CPU: 8 × Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 on 4 virtual cores
Environment:
JULIA_EDITOR = vim
julia> a=collect(1:1000000);
julia> @btime sum(sin.(a))
6.621 ms (4 allocations: 7.63 MiB)
-0.11710952409819408
julia> @btime sum(sin,a)
5.972 ms (1 allocation: 16 bytes)
-0.11710952409817987
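
For anyone who wants to check another machine without installing BenchmarkTools, the comparison above can be sketched with Base alone. This is only a rough reproduction aid, not the measurement method used in this report: the helper names fused, broadcasted, and best_time are hypothetical, and @elapsed timings are noisier than @btime.

```julia
# Rough reproduction sketch using only Base (no BenchmarkTools).
# fused, broadcasted and best_time are hypothetical helper names.
a = collect(1:1_000_000)

fused(f, x) = sum(f, x)          # applies f while reducing, no temporary array
broadcasted(f, x) = sum(f.(x))   # materializes f.(x) first, then sums it

# Minimum wall-clock time in seconds over a few runs.
function best_time(g, f, x; runs = 5)
    g(f, x)  # warm-up call so compilation is not included in the timing
    return minimum(@elapsed(g(f, x)) for _ in 1:runs)
end

for f in (sin, log)
    t_fused = best_time(fused, f, a)
    t_bcast = best_time(broadcasted, f, a)
    println(f, ": sum(f, a) = ", t_fused, " s, sum(f.(a)) = ", t_bcast, " s")
end
```

The two forms should agree only up to floating-point rounding, as in the outputs above, so any comparison of their results should use isapprox rather than ==.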