Additional functionalities for exp utility: error estimate and IOP #372

Merged
merged 5 commits on Jun 4, 2018

Conversation

MSeeker1340 (Contributor)

The aim of this PR is to address the second point of #243 (comment).
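
(For context: the corrector reuses the last Arnoldi term as a correction instead of discarding it, along the lines of the classical corrected Krylov scheme (cf. Saad 1992). Schematically, with β = ‖b‖ and (V, H) the extended Arnoldi decomposition, it is presumably of the form

    φ_p(tA) b ≈ β V_m φ_p(tH_m) e_1 + β t h_{m+1,m} [e_mᵀ φ_{p+1}(tH_m) e_1] v_{m+1}

where the added term is exactly the quantity that can double as an a posteriori error estimate.)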

Some numerical results for the corrector:

using OrdinaryDiffEq: phi, phiv
n = 100
K = 4
A = diagm(-2 * ones(n)) + diagm(ones(n - 1), -1) + diagm(ones(n - 1), 1)
b = randn(n);
function error_comparison(K, t, m)
    # Compute exact result using dense phi
    w_exact = zeros(n, K)
    P = phi(t*A, K)
    for i = 1:K
        w_exact[:, i] = P[i] * b
    end
    # Krylov with and without correction
    w = phiv(t, A, b, K; m=m, correct=false)[:, 1:K]
    w_correct = phiv(t, A, b, K; m=m, correct=true)[:, 1:K]
    err = [norm(w[:,i] - w_exact[:,i]) / norm(w_exact[:,i]) for i in 1:K]
    err_correct = [norm(w_correct[:,i] - w_exact[:,i]) / norm(w_exact[:,i]) for i in 1:K]
    for i = 1:K
        println("p = $(i-1), before: $(err[i]), after: $(err_correct[i])")
    end
end;
error_comparison(4, 5.0, 20)
p = 0, before: 1.851580144556005e-8, after: 1.1866137192062755e-8
p = 1, before: 1.1604243216282569e-9, after: 6.930957387640787e-10
p = 2, before: 1.4084870014088947e-10, after: 7.894351673381101e-11
p = 3, before: 2.417853000686249e-11, after: 1.2786553243680304e-11
error_comparison(4, 5.0, 15)
p = 0, before: 1.670064384606024e-5, after: 1.3529002061306636e-5
p = 1, before: 1.523366736059646e-6, after: 1.1210370888360611e-6
p = 2, before: 2.577756909429866e-7, after: 1.7459774282018524e-7
p = 3, before: 5.971387392772451e-8, after: 3.7583355289957666e-8
error_comparison(4, 5.0, 10)
p = 0, before: 0.0039036186519269406, after: 0.004777657784517128
p = 1, before: 0.0005983342152897502, after: 0.0006318794323758182
p = 2, before: 0.00015660771392520995, after: 0.0001468087636175307
p = 3, before: 5.3046898782640144e-5, after: 4.495612453983268e-5
error_comparison(4, 10.0, 20)
p = 0, before: 2.7920782891613483e-5, after: 3.5001350862247234e-5
p = 1, before: 2.3777873965132265e-6, after: 2.740946017107914e-6
p = 2, before: 3.8466149667992783e-7, after: 4.123593087931721e-7
p = 3, before: 8.52677289464572e-8, after: 8.57052414404906e-8
error_comparison(4, 10.0, 15)
p = 0, before: 0.0016615112563970835, after: 0.0025777546011967675
p = 1, before: 0.00021408895792534243, after: 0.00029448496471261217
p = 2, before: 4.9115234177339934e-5, after: 6.131688135215691e-5
p = 3, before: 1.477590119497644e-5, after: 1.6994996067541433e-5
error_comparison(4, 10.0, 10)
p = 0, before: 0.041799641994719756, after: 0.08964522220170576
p = 1, before: 0.008921966901360257, after: 0.016377920187105897
p = 2, before: 0.0030763723943405454, after: 0.005008332379204382
p = 3, before: 0.0013105118114586309, after: 0.0019318170942597493

So the corrector works best when the error is already small, but it actually amplifies the error when the error is large. IMO this makes it not very practical in real use... on the other hand, using the correction term purely as an error estimate should work well.

@MSeeker1340 (Contributor Author)

I'm not very sure about the interface for the error estimate at the moment. Do we just add an output argument err to phiv!? That seems to overcomplicate the interface a bit, and we don't always want the error estimate. How can I add a non-allocating output argument that is optional, without having to maintain two separate versions of the function?
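
One pattern that avoids maintaining two versions (a hypothetical sketch with illustrative names, not necessarily what this PR ends up with) is an opt-in keyword that changes only the return value:

# Hypothetical sketch: compute and return the estimate only when asked for.
# The stand-in bodies mark where the real Krylov computation would go.
function phiv_demo!(w, t, A, b, K; m=30, errest=false)
    fill!(w, 0)                   # stand-in for the in-place Krylov computation
    if errest
        err = norm(b) * eps()     # stand-in for the last-Arnoldi-term estimate
        return w, err
    end
    return w
end

Since the estimate is a scalar it needs no output buffer anyway, so the non-allocating concern only really arises if a per-phi-order estimate is wanted.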

@coveralls

coveralls commented May 30, 2018

Coverage increased (+5.2%) to 83.615% when pulling 3004f66 on MSeeker1340:krylov-error-estimate into 57f1eae on JuliaDiffEq:master.

@codecov

codecov bot commented May 31, 2018

Codecov Report

Merging #372 into master will decrease coverage by 0.02%.
The diff coverage is 65.51%.

@@            Coverage Diff             @@
##           master     #372      +/-   ##
==========================================
- Coverage   83.64%   83.61%   -0.03%     
==========================================
  Files          75       75              
  Lines       21639    21652      +13     
==========================================
+ Hits        18099    18104       +5     
- Misses       3540     3548       +8
Impacted Files Coverage Δ
src/exponential_utils.jl 83.03% <65.51%> (-3.82%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 57f1eae...3004f66.

@MSeeker1340 (Contributor Author)

So I probably overcomplicated things yesterday regarding error estimates... In both phipm and KIOPS, the only error estimate actually needed is the one for the highest-order phi (in the case of KIOPS it's the exp). So there's no need to return a separate error estimate for all the outputs of phiv(!), at least for now.
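
For reference, the classical estimate of this kind (Saad 1992) reads off the last Arnoldi term; a minimal sketch for expv in 0.6-style Julia, assuming H is the extended (m+1)×m Hessenberg matrix and that the estimate for the highest-order phi is analogous:

# Saad-style a posteriori estimate for exp(t*A)*b (sketch; the PR's exact
# formula may differ). beta = norm(b), H = extended Hessenberg from Arnoldi.
function expv_errest(t, H, m, beta)
    expH = expm(t * H[1:m, 1:m])                # small m×m matrix exponential
    return beta * H[m+1, m] * abs(expH[m, 1])   # last-term magnitude as proxy
end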

Error estimates in action:

using OrdinaryDiffEq: phi, phiv
n = 100
m = 15
K = 4
A = diagm(-2 * ones(n)) + diagm(ones(n - 1), -1) + diagm(ones(n - 1), 1)
b = randn(n)
t = 5.0;
P = phi(t*A, K)
w_exact = P[K] * b
w, errest = phiv(t, A, b, K; m=m, errest=true)
err = norm(w[:,K] - w_exact)
println("Error: $err, error estimate: $errest")
Error: 4.9003876156387905e-8, error estimate: 7.513596116146876e-8

@MSeeker1340 MSeeker1340 changed the title [WIP] Error estimate and correction for Krylov phiv Error estimate and correction for Krylov phiv May 31, 2018
@MSeeker1340 MSeeker1340 changed the title Error estimate and correction for Krylov phiv Additional functionalities for exp utility: error estimate and IOP Jun 1, 2018
@MSeeker1340 (Contributor Author)

IOP is easier to implement than I thought, so I'll just append the commit here.

Because I'm not using a dedicated data structure for H (is there one, btw? H here is an upper Hessenberg matrix with a length-p upper band), all I need to modify is adding a bound to the inner arnoldi loop.

It turns out expv and phiv stay just the same, as does the code for the error estimate and correction. IOP does change how adaptation is performed, but that is not the concern of this PR.
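
To make the "bound on the inner loop" concrete, here is a minimal standalone sketch (illustrative names, not the PR's actual arnoldi!; happy breakdown not handled):

# Arnoldi with incomplete orthogonalization (IOP): orthogonalize the new
# Krylov vector against only the last `iop` basis vectors instead of all
# previous ones. Full Arnoldi corresponds to iop >= m.
function arnoldi_iop(A, b, m, iop)
    n = length(b)
    V = zeros(n, m + 1)        # orthonormal(ish) basis
    H = zeros(m + 1, m)        # upper Hessenberg with a length-iop upper band
    V[:, 1] = b / norm(b)
    for j = 1:m
        w = A * V[:, j]
        for i = max(1, j - iop + 1):j   # <-- the IOP bound; full Arnoldi uses 1:j
            H[i, j] = dot(V[:, i], w)
            w -= H[i, j] * V[:, i]
        end
        H[j + 1, j] = norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    end
    return V, H
end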

As a reference, some papers about using IOP to approximate matrix exponential-vector products:

https://www.researchgate.net/profile/Antti_Koskela/publication/273697107_Approximating_the_Matrix_Exponential_of_an_Advection-Diffusion_Operator_Using_the_Incomplete_Orthogonalization_Method/links/550954880cf27e990e0e5045/Approximating-the-Matrix-Exponential-of-an-Advection-Diffusion-Operator-Using-the-Incomplete-Orthogonalization-Method.pdf

https://onlinelibrary.wiley.com/doi/pdf/10.1002/nla.2090 (Algorithm 1 gives a clear definition of IOP).

@MSeeker1340 (Contributor Author)

Sanity checks:

using OrdinaryDiffEq: arnoldi, getH
srand(0)
n = 100
m = 15
A = diagm(-2 * ones(n)) + diagm(ones(n - 1), -1) + diagm(ones(n - 1), 1)
Aperm = A + 1e-5 * randn(n, n)
b = randn(n);

For symmetric matrices, IOP(2) should be almost the same as Lanczos:

Ks = arnoldi(A, b; m=m) # Lanczos
Ksperm = arnoldi(Aperm, b; m=m) # full Arnoldi
Ksperm2 = arnoldi(Aperm, b; m=m, iop=2); # IOP(2)
getH(Ks)
16×15 SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}:
 -2.21741   1.2582    0.0       0.0       …   0.0        0.0        0.0    
  1.2582   -1.84706   1.1293    0.0           0.0        0.0        0.0    
  0.0       1.1293   -2.05088   1.07224       0.0        0.0        0.0    
  0.0       0.0       1.07224  -2.03328       0.0        0.0        0.0    
  0.0       0.0       0.0       0.844917      0.0        0.0        0.0    
  0.0       0.0       0.0       0.0       …   0.0        0.0        0.0    
  0.0       0.0       0.0       0.0           0.0        0.0        0.0    
  0.0       0.0       0.0       0.0           0.0        0.0        0.0    
  0.0       0.0       0.0       0.0           0.0        0.0        0.0    
  0.0       0.0       0.0       0.0           0.0        0.0        0.0    
  0.0       0.0       0.0       0.0       …   0.0        0.0        0.0    
  0.0       0.0       0.0       0.0           0.984855   0.0        0.0    
  0.0       0.0       0.0       0.0          -1.87424    0.946788   0.0    
  0.0       0.0       0.0       0.0           0.946788  -1.89749    0.98828
  0.0       0.0       0.0       0.0           0.0        0.98828   -2.16322
  0.0       0.0       0.0       0.0       …   0.0        0.0        1.00082
getH(Ksperm)
16×15 SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}:
 -2.2174   1.25819  -1.20995e-6   4.74473e-6  …  -1.56411e-5  -1.09724e-5
  1.2582  -1.84709   1.1293      -9.36381e-6     -1.28886e-5   3.87006e-6
  0.0      1.1293   -2.05088      1.07224        -8.84203e-6   1.21384e-5
  0.0      0.0       1.07224     -2.03327         1.99436e-5   9.47316e-6
  0.0      0.0       0.0          0.844914        1.74995e-5   4.67657e-6
  0.0      0.0       0.0          0.0         …   4.02996e-7   1.5533e-5 
  0.0      0.0       0.0          0.0            -1.25157e-5   2.853e-7  
  0.0      0.0       0.0          0.0             2.05337e-5   1.3109e-6 
  0.0      0.0       0.0          0.0            -1.6915e-5    6.61577e-7
  0.0      0.0       0.0          0.0            -4.99333e-6   1.18318e-5
  0.0      0.0       0.0          0.0         …  -2.15769e-6  -7.03456e-6
  0.0      0.0       0.0          0.0             1.90411e-5   1.22223e-6
  0.0      0.0       0.0          0.0             0.946821     7.6984e-6 
  0.0      0.0       0.0          0.0            -1.89743      0.988291  
  0.0      0.0       0.0          0.0             0.988298    -2.16308   
  0.0      0.0       0.0          0.0         …   0.0          1.00084   
getH(Ksperm2)
16×15 SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}:
 -2.2174   1.25819   0.0       0.0       …   0.0        0.0        0.0     
  1.2582  -1.84709   1.1293    0.0           0.0        0.0        0.0     
  0.0      1.1293   -2.05088   1.07224       0.0        0.0        0.0     
  0.0      0.0       1.07224  -2.03327       0.0        0.0        0.0     
  0.0      0.0       0.0       0.844914      0.0        0.0        0.0     
  0.0      0.0       0.0       0.0       …   0.0        0.0        0.0     
  0.0      0.0       0.0       0.0           0.0        0.0        0.0     
  0.0      0.0       0.0       0.0           0.0        0.0        0.0     
  0.0      0.0       0.0       0.0           0.0        0.0        0.0     
  0.0      0.0       0.0       0.0           0.0        0.0        0.0     
  0.0      0.0       0.0       0.0       …   0.0        0.0        0.0     
  0.0      0.0       0.0       0.0           0.984791   0.0        0.0     
  0.0      0.0       0.0       0.0          -1.87428    0.946821   0.0     
  0.0      0.0       0.0       0.0           0.946813  -1.89743    0.988291
  0.0      0.0       0.0       0.0           0.0        0.988298  -2.16308 
  0.0      0.0       0.0       0.0       …   0.0        0.0        1.00084 

I'll do a detailed performance-cost analysis for IOP in the afternoon, with large sparse A in mind.

@MSeeker1340 (Contributor Author)

Some test results for IOP:

The test case is an n = 1000 sparse 1D Laplacian with reflecting boundary conditions. The reason for the reflecting BC is that it makes the Laplacian non-symmetric, so Lanczos does not apply. But since it is "almost symmetric", we should expect good performance from IOP(2).
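
For concreteness, one way to build such an operator (a sketch using the ghost-point discretization; the attached test script may construct it differently):

# Sparse 1D Laplacian with reflecting (Neumann) BC via ghost points
# u_0 = u_2 and u_{n+1} = u_{n-1}. The doubled boundary entries break
# symmetry only slightly, so Lanczos doesn't apply but IOP(2) should do well.
n = 1000
Asparse = spdiagm((ones(n - 1), -2 * ones(n), ones(n - 1)), (-1, 0, 1))
Asparse[1, 2] = 2.0          # boundary row: -2*u_1 + 2*u_2
Asparse[n, n - 1] = 2.0      # boundary row: 2*u_{n-1} - 2*u_n
Adense = full(Asparse)       # dense version for the comparison below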

As a comparison, the Laplacian is also converted to a dense matrix. The two versions differ a lot in their A_mul_B! cost:

@time A_mul_B!(w, Asparse, b);
  0.000009 seconds (4 allocations: 160 bytes)
@time A_mul_B!(w, Adense, b);
  0.001130 seconds (4 allocations: 160 bytes)

This means that arnoldi will be dominated by dot and axpy! in the sparse case and by A_mul_B! in the dense case. Indeed, this is confirmed by the profiler:

[Profiler screenshots omitted: sparse A (dominated by dot/axpy!) and dense A (dominated by A_mul_B!).]

Because IOP reduces the number of dot products and axpy! calls, we expect to see a large performance increase in the sparse case but not in the dense case:

@time profile_arnoldi(Asparse, 0);
  0.000901 seconds (2.61 k allocations: 130.250 KiB)
@time profile_arnoldi(Asparse, 2);
  0.000390 seconds (257 allocations: 20.000 KiB)
@time profile_arnoldi(Adense, 0);
  0.039249 seconds (2.61 k allocations: 138.250 KiB)
@time profile_arnoldi(Adense, 2);
  0.040866 seconds (260 allocations: 28.000 KiB)

(Here an IOP value of 0 indicates full Arnoldi).
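
(For reference, profile_arnoldi above is presumably a thin wrapper along these lines; the real helper is in the attached script:)

# Hypothetical stand-in for the timed helper: one arnoldi call with the
# given IOP length (0 = full Arnoldi), reusing a fixed b and m = 15.
profile_arnoldi(A, iop) = arnoldi(A, b; m=15, iop=iop)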

Finally, the work-precision diagrams for the sparse case (note that I didn't use many trials, so the results jiggle a bit):

[Work-precision diagrams omitted: h5, h20, h100.]

It is safe to say that in this case using IOP is a strictly better choice.

If A is less symmetric, the results may not be as clear-cut and we may need longer IOP lengths. But since most differential operators arising from PDEs fall into this "almost symmetric" category, I believe we'll be fine using IOP(2).

Full test script:

Incomplete Orthogonalization Procedure.zip

@ChrisRackauckas (Member)

Looks great!

 end
 # TODO: switch to overload `getproperty` in v0.7
-getH(Ks::KrylovSubspace) = @view(Ks.H[1:Ks.m, 1:Ks.m])
 getV(Ks::KrylovSubspace) = @view(Ks.V[:, 1:Ks.m])
+getH(Ks::KrylovSubspace) = @view(Ks.H[1:Ks.m + 1, 1:Ks.m])
Member

why is everything +1?

Contributor Author
This is to get the "extended" H and V matrices from Arnoldi. Originally I left out the last terms to make the outputs of getH and getV square, for easier coding.

Since the error estimate & correction now need the extra terms, I added them back and also changed the "happy-breakdown" termination logic of the Arnoldi loop so that they are always calculated.
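
In other words, the +1 restores the extended Arnoldi relation that the estimate and correction read off. A quick illustrative check (reusing A, b, arnoldi, and getH from the sanity-check snippet above, and assuming getV likewise returns the extended n×(m+1) basis):

# Extended Arnoldi relation: A * V[:, 1:m] ≈ V * H, with V of size n×(m+1)
# and H of size (m+1)×m.
Ks = arnoldi(A, b; m=15)
V, H = getV(Ks), getH(Ks)
norm(A * V[:, 1:15] - V * H)   # should be ≈ 0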

@@ -228,6 +238,9 @@ function arnoldi!(Ks::KrylovSubspace{B, T}, A, b::AbstractVector{T}; tol=1e-7,
     end
     V, H = getV(Ks), getH(Ks)
     vtol = tol * norm(A, Inf)
+    if iop == 0
+        iop = m
+    end
Member

In the PR you only show p = 2? What does higher look like?

Contributor Author

For the output of iop=p, the Hessenberg matrix H will have a length-p upper band (i.e. only p non-zero diagonals in the upper part). I checked this before running the performance tests and it's working as intended; see the check below.

Performance- and accuracy-wise I haven't really tested yet, but the idea is that increasing p buys better accuracy at worse performance. For the almost-symmetric operators in the tests above, IOP(2) is already good enough, so there was no need to check higher p. For general operators I would expect a more visible performance-accuracy tradeoff when adjusting p.
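
A quick sanity check of the band structure (illustrative, reusing Aperm and b from the earlier sanity checks):

# For iop = p, the truncated orthogonalization never touches H[i, j]
# with i <= j - p, so those entries should be exactly zero.
p = 3
Ks = arnoldi(Aperm, b; m=15, iop=p)
H = getH(Ks)
all(H[i, j] == 0 for j = 1:15 for i = 1:j-p)   # expect true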

@MSeeker1340 (Contributor Author)

One thing I forgot to mention: from the profiling results it is clear that arnoldi is compute-bound in both the dense and sparse cases, so I don't think there's a need for a dedicated banded matrix type for H, as I originally thought.

At best it would save some space when the KrylovSubspace is first constructed. But that happens only once per integrator, so the difference would be negligible.

OK to merge this PR now?
