@inbounds in copyto! for structured broadcasting #48437
This seems to cut the runtime of `D .* D` in half for `5000x5000` diagonal matrices. Using nightly `v"1.10.0-DEV.450"`:

```julia
julia> using LinearAlgebra, BenchmarkTools

julia> D = Diagonal(rand(5000));

julia> @btime $D * $D;
  3.648 μs (2 allocations: 39.11 KiB)

julia> @btime $D .* $D;
  8.729 μs (2 allocations: 39.11 KiB)

julia> B = Broadcast.instantiate(Broadcast.Broadcasted(*, (D, D)));

julia> @btime Base.copyto!($(copy(D)), $B);
  8.390 μs (0 allocations: 0 bytes)

julia> function copyto2!(dest::Diagonal, bc::Broadcast.Broadcasted{<:LinearAlgebra.StructuredMatrixStyle})
           # fall back to the generic path if the broadcast cannot preserve the structure
           !LinearAlgebra.isstructurepreserving(bc) && !LinearAlgebra.fzeropreserving(bc) && return copyto!(dest, convert(Broadcast.Broadcasted{Nothing}, bc))
           axs = axes(dest)
           # shapes are checked once, before the loop
           axes(bc) == axs || Broadcast.throwdm(axes(bc), axs)
           @inbounds for i in axs[1]
               dest.diag[i] = Broadcast._broadcast_getindex(bc, CartesianIndex(i, i))
           end
           return dest
       end
copyto2! (generic function with 1 method)

julia> @btime copyto2!($(copy(D)), $B);
  4.207 μs (0 allocations: 0 bytes)
```
Since default
Title changed from "@inbounds in diagonal broadcasting" to "@inbounds in copyto! for structured broadcasting".
These seem like very large chunks of code to mark entirely `@inbounds`, since I think I count at least 6 different indexing operations in each of them. Can we mark just the operation that matters?
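The reviewer's suggestion can be illustrated with a simplified, hypothetical helper `diag_map!` (not code from this PR) that combines two `Diagonal` matrices directly instead of going through a `Broadcasted` object. The shape check runs once up front, and `@inbounds` wraps only the two element reads rather than the whole loop body, so every other indexing operation keeps its bounds check. It assumes `f(0, 0) == 0` (as for `*`), so that computing only the diagonal gives the same result as a full elementwise broadcast.

```julia
using LinearAlgebra

# Hypothetical helper, for illustration only: dest.diag[i] = f(A[i,i], B[i,i]).
# Assumes f(0, 0) == 0, so the off-diagonal zeros stay zero.
function diag_map!(f, dest::Diagonal, A::Diagonal, B::Diagonal)
    axs = axes(dest)
    # Validate shapes once, before touching any element.
    axes(A) == axs == axes(B) || throw(DimensionMismatch("axes of A, B and dest must match"))
    for i in axs[1]
        # Only the reads that the check above covers are marked @inbounds;
        # the write into dest.diag keeps its ordinary bounds check.
        x = @inbounds A.diag[i]
        y = @inbounds B.diag[i]
        dest.diag[i] = f(x, y)
    end
    return dest
end
```

Scoping the annotation to single expressions keeps the unchecked region small and easy to audit against the preceding shape check.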
I've updated it to only mark the broadcasted
Good to merge?
This should be ready.
Given that the axes are verified to be identical in the previous line (`axes(bc) == axs || Broadcast.throwdm(axes(bc), axs)`), this seems safe.
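The safety argument can be sketched with a hypothetical `diag_copyto!` (not the code merged here) that uses the public `getindex` on a `Broadcasted` object instead of the internal `Broadcast._broadcast_getindex`. Once `axes(bc) == axes(dest)` holds, every `CartesianIndex(i, i)` with `i in axes(dest, 1)` lies inside `bc`, so skipping the per-element check cannot read out of bounds; a shape mismatch is reported before the loop starts. Unlike the PR code, this sketch omits the structure-preservation fallback and simply assumes the broadcast function maps zeros to zero.

```julia
using LinearAlgebra
using Base.Broadcast: broadcasted, instantiate

# Hypothetical sketch: evaluate a lazy broadcast only on the diagonal.
# Assumes the broadcast preserves diagonal structure (e.g. D .* D).
function diag_copyto!(dest::Diagonal, bc::Broadcast.Broadcasted)
    axs = axes(dest)
    # This single check is what makes the @inbounds below safe.
    axes(bc) == axs || throw(DimensionMismatch("broadcast axes $(axes(bc)) != destination axes $axs"))
    for i in axs[1]
        # getindex on a Broadcasted guards its bounds check with @boundscheck,
        # so @inbounds removes it for this one read only.
        dest.diag[i] = @inbounds bc[CartesianIndex(i, i)]
    end
    return dest
end

D = Diagonal([1.0, 2.0, 3.0])
bc = instantiate(broadcasted(*, D, D))
diag_copyto!(Diagonal(zeros(3)), bc)   # diagonal entries 1.0, 4.0, 9.0
```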