
Updated @muladd and broadcasts #104

Merged (28 commits, Aug 7, 2017)

Conversation

devmotion (Member)

This PR improves the application of @muladd and broadcasts across all algorithms and in the calculation of initial time steps. As a test, I additionally reverted methods that fuse 12+ broadcasts to the so-called "mixed" implementation - it was only necessary for a few methods such as Feagin, Vern, BS5, and Tsit5, and it also works with StaticArrays. The preferred broadcast version is added as a comment. Moreover, I tried to clean up the code a bit by removing unneeded unpacking and packing of parameters and, e.g., merging duplicate initializations of the symplectic caches.

Locally all tests passed.
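For readers unfamiliar with the macro, here is a minimal illustration of what @muladd does to a scalar multiply-add expression (a sketch assuming the macro is in scope; in current versions it is provided by MuladdMacro.jl, and the exact printing may differ by Julia version):

julia> @macroexpand( @muladd a*b + c )
:(muladd(a, b, c))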

@ChrisRackauckas (Member)

Spring cleaning, haha. Thanks for tackling this. This looks really nice.

if typeof(u) <: AbstractArray

Instead of using that in the mixed implementation, can you make it send StaticArrays to the other one?

if typeof(u) <: AbstractArray && !(typeof(u) <: SArray)

The reason is that similar(u) is an MVector, which will actually allocate on the heap and not the stack. If everything is fully optimized, then this will slow down static arrays, since they normally allocate on the stack and are thus "free".
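A minimal illustration of the allocation concern, assuming StaticArrays.jl (not part of this PR):

julia> using StaticArrays

julia> u = SVector(1.0, 2.0, 3.0);

julia> similar(u) isa MVector    # mutable counterpart; may be allocated on the heap
true

julia> u isa SArray              # true, so the guard above sends u down the other path
true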

@ChrisRackauckas (Member)

so-called "mixed" implementation - it was only necessary for a few methods like Feagin, Vern, BS5, and Tsit5

Only necessary there because those were the only ones which hit the broadcast limit?

@devmotion (Member Author)

I changed it according to your suggestion (with some minor additional changes such that residuals can be calculated for StaticArrays). Yes, only those hit the limit.

rtmp = integrator.fsalfirst
A = f[1]
u = expm(dt*A)*(@. uprev + dt*rtmp)
@muladd u = expm(dt*A)*(@. uprev + dt*rtmp)
Member

does @muladd work with matrix multiplication?

Member Author

@muladd does not change or interfere with matrix multiplications. This generates the following expression:

julia> @macroexpand( @muladd u = expm(dt*A)*(@. uprev + dt*rtmp) )
:(u = expm(dt * A) * (muladd).(dt, rtmp, uprev))

So this should work as expected, shouldn't it? By the way, because of matrix multiplications I thought it would be better to apply muladd as a dot call only if both operations are dotted; i.e. expressions like @muladd A*x .+ b stay untouched, whereas @muladd a.*b.+c leads to (muladd).(a,b,c).
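A sketch of the two cases via @macroexpand, using hypothetical variables (exact printing may vary by version):

julia> @macroexpand( @muladd a.*b .+ c )    # both operations dotted
:((muladd).(a, b, c))

julia> @macroexpand( @muladd A*x .+ b )     # mixed dotted/undotted, left untouched
:(A * x .+ b)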

Member

Oh yes, now I remember you made it handle that. Cool. Just making sure it would stay correct 👍

@@ -42,7 +42,7 @@ end
if typeof(cache) <: IIF1ConstantCache
tmp = expm(A*dt)*(uprev)
elseif typeof(cache) <: IIF2ConstantCache
tmp = expm(A*dt)*(@. uprev + 0.5dt*uhold[1]) # This uhold only works for non-adaptive
@muladd tmp = expm(A*dt)*(@. uprev + 0.5dt*uhold[1]) # This uhold only works for non-adaptive
Member

another @muladd on matrix exponential

@@ -224,15 +223,15 @@ end
W = I - dto2*J
else
J = ForwardDiff.derivative(uf,uprev)
W = 1 - dto2*J
W = @. 1 - dto2*J
Member

This shouldn't use @. because it's Number-only. If it weren't a Number but an AbstractArray, this would actually be incorrect, since 1 would have to be the identity matrix. But that's the first case.
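A quick illustration of the difference with a hypothetical J (on current Julia, I comes from LinearAlgebra):

julia> J = [0.5 0.25; 1.5 2.5];

julia> 1 .- J    # broadcasts the scalar 1 over every entry
2×2 Array{Float64,2}:
  0.5   0.75
 -0.5  -1.5

julia> I - J     # identity matrix minus J: only the diagonal gets the 1
2×2 Array{Float64,2}:
  0.5  -0.25
 -1.5  -1.5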

Member Author

Sure, I'll change it 👍

Member Author

Just to be sure: doesn't the same problem occur in https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/integrators/rosenbrock_integrators.jl#L246-L252? Shouldn't we use the identity matrix instead of 1 in line 252 if J is an AbstractArray?

Member

Julia has been throwing a warning since v0.5 when using I like a number there. It sucks, so this is just a workaround to keep it from throwing the warning. The out-of-place Rosenbrock methods should be fixed to use the I route.

Member Author

So I'll change it there as well, such that W is already calculated inside the if-else expression - once with the identity matrix I and once just as a number?

Member

That sounds good.
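A sketch of what that restructuring might look like for the out-of-place path, based on the lines quoted above (hypothetical; the Jacobian call in the array branch is assumed, not taken from this diff):

if typeof(uprev) <: AbstractArray
    J = ForwardDiff.jacobian(uf,uprev)
    W = I - dto2*J    # UniformScaling I plays the role of the identity matrix
else
    J = ForwardDiff.derivative(uf,uprev)
    W = 1 - dto2*J    # plain scalar arithmetic for Number u
end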

@@ -459,7 +456,7 @@ end
W = I - d*dt*J
else
J = ForwardDiff.derivative(uf,uprev)
W = 1 - d*dt*J
W = @. 1 - d*dt*J
Member

Same as above.

@ChrisRackauckas (Member)

Alright cool. Looks great! How do the benchmarks look?

Can we list out all of the methods which are not fully broadcasting? I think it's all of the RK methods with order >4 and that's it, right? That includes Nystrom5. We might want to document that in an issue on broadcasting, since it means that arrays which broadcast but have no indices (I'm looking at you, ArrayPartition of GPUArrays) cannot be used there, and it's something we will want to fix in the future once Julia is fixed.

@devmotion (Member Author)

Which benchmarks? What exactly do you want to compare? I did some benchmarks of the residual calculation, BS3, and BS5, which I put up here. For the residuals and BS3, the results favoured the broadcast implementation in higher-dimensional spaces and were almost identical to the current implementation in one-dimensional spaces; however, since we hit the broadcast limit in BS5, the broadcast implementation leads to a huge regression there. The mixed implementation, on the other hand, clearly outperformed even the current implementation for BS5. Hence I assumed the benchmarks would show similar results for the other methods I haven't tested so far - as long as we don't hit the broadcast limit, performance is fine, and otherwise we get a huge regression which can be circumvented by the mixed implementation.

@ChrisRackauckas (Member)

Could you quickly run something like the first few plots of https://github.com/JuliaDiffEq/DiffEqBenchmarks.jl/blob/master/StiffODE/ROBER.ipynb to make sure that TRBDF2 and Rosenbrock methods don't slow down? That'll make me finally believe in broadcast.

Then other than the two @. 1 - d*dt*J changes, this is ready to merge when tests pass.


@devmotion (Member Author)

Just broadcasting leads to the warning

WARNING: J::UniformScaling - x::Number is deprecated, use J.λ - x instead.

and, even worse, sometimes to incorrect calculations:

julia> @. I - 3*[0.5  0.25; 1.5  2.5]
WARNING: J::UniformScaling - x::Number is deprecated, use J.λ - x instead.
Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:70
 [2] -(::UniformScaling{Int64}, ::Float64) at ./deprecated.jl:57
 [3] macro expansion at ./broadcast.jl:153 [inlined]
 [4] macro expansion at ./simdloop.jl:73 [inlined]
 [5] macro expansion at ./broadcast.jl:147 [inlined]
 [6] _broadcast!(::##7#8, ::Array{Float64,2}, ::Tuple{Tuple{},Tuple{Bool,Bool}}, ::Tuple{Tuple{},Tuple{Int64,Int64}}, ::UniformScaling{Int64}, ::Tuple{Array{Float64,2}}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{2}}) at ./broadcast.jl:139
 [7] broadcast_t(::Function, ::Type{T} where T, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{2}}, ::UniformScaling{Int64}, ::Array{Float64,2}) at ./broadcast.jl:268
 [8] broadcast_c at ./broadcast.jl:314 [inlined]
 [9] broadcast(::Function, ::UniformScaling{Int64}, ::Array{Float64,2}) at ./broadcast.jl:434
 [10] eval(::Module, ::Any) at ./boot.jl:235
 [11] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:66
 [12] macro expansion at ./REPL.jl:97 [inlined]
 [13] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
while loading no file, in expression starting on line 0
2×2 Array{Float64,2}:
 -0.5   0.25
 -3.5  -6.5

Thus the convergence tests also fail after changing these lines to broadcasts. I'm not sure what the best way to deal with this is - my idea would be something like

if typeof(mass_matrix) <: UniformScaling
    W .= mass_matrix - dt*J
else
    @. W = mass_matrix - dt*J
end
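For reference, the plain (non-broadcast) subtraction treats I as the identity matrix and gives the intended result for the example above (display may differ slightly by version):

julia> I - 3*[0.5  0.25; 1.5  2.5]
2×2 Array{Float64,2}:
 -0.5  -0.75
 -4.5  -6.5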

@ChrisRackauckas (Member)

Open a separate issue for that and we'll do that in another PR.

@devmotion (Member Author)

OK, I'll open a separate PR for that issue. I redid the plots of https://github.com/JuliaDiffEq/DiffEqBenchmarks.jl/blob/master/StiffODE/ROBER.ipynb and got:

[Attached: benchmark plots figure_1 through figure_5]

devmotion mentioned this pull request on Aug 4, 2017.
T0 = typeof(d₀)
T1 = typeof(d₁)
if d₀ < T0(1//10^(5)) || d₁ < T1(1//10^(5))
if d₀ < 1//10^(5) || d₁ < 1//10^(5)
Member

does this obey units? Did you run units_tests.jl?

Member Author

Yes, I did run all long tests as well, including units_tests.jl. They all passed.

tmp is unitless, and hence both d₀ and d₁ are unitless as well.
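As a small illustration of why the comparison is unit-safe (a sketch assuming Unitful.jl; not part of units_tests.jl):

julia> using Unitful

julia> u0 = 2.0u"m"; sk = 0.5u"m";

julia> d₀ = u0/sk    # identical units cancel, leaving an ordinary number
4.0

julia> d₀ < 1//10^(5)
false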

src/initdt.jl Outdated
sk = @. abstol+abs(u0)*reltol
d₀ = internalnorm(u0./sk)
d₀ = internalnorm(@. u0/sk)
Member

same with the units here.

Member Author

Same here. u0 and sk both have the same unit, hence u0./sk and thus d₀ are unitless.

@devmotion (Member Author)

It seems the initdt fix introduces a test error in data_array_test.jl...

@ChrisRackauckas (Member)

Does broadcasting a DataArray return a DataArray? I think in the long run we should do something like

https://github.com/JuliaDiffEq/MultiScaleArrays.jl/blob/master/src/math.jl#L9

but we cannot expect most user-defined types to broadcast correctly. I would revert that part and change back to pre-allocating and using the in-place update because that's type-safe.
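A sketch of the "pre-allocate and update in place" pattern being suggested, using the initdt variables from the diff above (hypothetical; the reverted code may differ in detail):

sk = similar(u0)                   # container type is fixed up front by similar
@. sk = abstol + abs(u0)*reltol    # in-place broadcast fills the existing array

The type of sk is then whatever similar returns for u0, independently of what u0's out-of-place broadcast would construct.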

ChrisRackauckas merged commit 5fd16ad into SciML:master on Aug 7, 2017
@ChrisRackauckas (Member)

I reverted to using similar again in initdt and then merged. All of the performance regressions seem to have been weeded out, but the tests take noticeably longer. This may be due in large part to compilation-time increases, which happen when inlining is overly aggressive and there's a lot of broadcasting. That makes #120 important.
