
Performance issue with (sparse) backsolve after LU factorization (turn off iterative refinement by default) #122

Closed
briochemc opened this issue Feb 19, 2019 · 13 comments


@briochemc

I noticed there was a performance issue with the backsolve that I thought I should share here.

Coming from MATLAB, I thought the best way to show this performance issue was to benchmark the two on the same machine with the same data, so here it is. I perform a simple backslash (M \ x), a factorization (factorize(M)), and two backsolves (Mf \ x and Mf \ [x 2x 3x 4x]) using a somewhat-random matrix M and vector x. First, I run this test in MATLAB (v2017b) and save M and x in a .mat file. Then I load these in Julia (v1.1.0) using the MAT.jl package and run the same test again. (I ran `] up` to update my packages before this test.)

Here are the results (the code I used is at the end of this post):

  1. MATLAB performance:

    >> tic; Mf = linfactor(M); toc;
    Elapsed time is 45.817367 seconds.
    >> tic; M \ x; toc;
    Elapsed time is 34.809706 seconds.
    >> tic; linfactor(Mf, x); toc;
    Elapsed time is 0.352151 seconds.
    >> tic; linfactor(Mf, [x 2*x 3*x 4*x]); toc;
    Elapsed time is 1.288229 seconds.
  2. Julia performance:

    julia> @btime Mf = factorize(M) ;
      32.151 s (66 allocations: 8.71 GiB)
    
    julia> @btime M \ x ;
      33.191 s (74 allocations: 8.71 GiB)
    
    julia> @btime Mf \ x ;
      1.132 s (8 allocations: 1.22 MiB)
    
    julia> @btime Mf \ [x 2x 3x 4x] ;
      4.523 s (31 allocations: 5.95 MiB)

As you can see, the backsolves in Julia take roughly 3–4 times as long as the MATLAB equivalents, while the factorization takes roughly the same time. Any idea what the issue is there? (Note: using the in-place `ldiv!` does not change these timings.)

(I know I am comparing apples to oranges, and that this could be an artefact of the random M and x, but I seem to see this happen every time, and I think the difference in performance is too significant to be ignored. Please tell me if I am wrong 🙂)


The code I used:

  1. MATLAB code:

    n = 20000;
    M = sprand(n, n, 20/n) + speye(n);
    x = rand(n, 1);
    tic; Mf = linfactor(M); toc;
    tic; M \ x; toc;
    tic; linfactor(Mf, x); toc;
    tic; linfactor(Mf, [x 2*x 3*x 4*x]); toc;
    save test_matrix.mat M x
  2. Julia code

    using LinearAlgebra, SparseArrays, SuiteSparse, BenchmarkTools, MAT
    M = matread("test_matrix.mat")["M"] ;
    x = matread("test_matrix.mat")["x"] ;
    Mf = factorize(M) ;
    @btime Mf = factorize(M) ;
    @btime M \ x ;
    @btime Mf \ x ;
    @btime Mf \ [x 2x 3x 4x] ;
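For completeness, the in-place `ldiv!` variant mentioned above (which did not change the timings) can be sketched as follows. This is a minimal sketch using a smaller stand-in matrix built with the same recipe as the MATLAB code, since the original test_matrix.mat is not included here:

```julia
using LinearAlgebra, SparseArrays, BenchmarkTools

n = 1_000                     # small stand-in for the 20000×20000 test matrix
M = sprand(n, n, 20 / n) + I  # same recipe as the MATLAB sprand + speye code
x = rand(n)

Mf = factorize(M)             # sparse LU via UMFPACK
sol = similar(x)              # preallocated solution buffer
@btime ldiv!($sol, $Mf, $x)   # in-place backsolve; result written into sol
```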
@KristofferC
Member

KristofferC commented Feb 19, 2019

Maybe same issue as JuliaLang/julia#19500. You could try running MATLAB single-threaded and see if things change.

@briochemc
Author

briochemc commented Feb 19, 2019

Maybe same issue as JuliaLang/julia#19500. You could try running MATLAB single-threaded and see if things change.

Knowing whether this is the same issue is out of reach for my uneducated brain. But it seems to me they are different, because JuliaLang/julia#19500 is about multiple RHS, while the performance issue here is the same with both a single RHS and multiple RHS... Could you let me know how to force a single thread? (The machine I used was a single node with 12 cores.)

@KristofferC
Member

while the performance issue herein is the same with both a single RHS and multiple RHS

Yeah, I read a bit quickly and mostly noticed the slowdown for the multiple RHS.

@andreasnoack
Member

andreasnoack commented Feb 19, 2019

JuliaLang/julia#19500 mentions two possible speed-ups, and I think the first one mentioned there might explain the difference from Matlab. I'm getting

julia> @btime Mf \ x ;
  710.918 ms (6 allocations: 1.22 MiB)

julia> @btime Mf \ [x 2x 3x 4x] ;
  3.198 s (32 allocations: 5.95 MiB)

julia> SuiteSparse.UMFPACK.umf_ctrl[8]
2.0

julia> SuiteSparse.UMFPACK.umf_ctrl[8]=0
0

julia> @btime Mf \ x ;
  225.241 ms (6 allocations: 625.19 KiB)

julia> @btime Mf \ [x 2x 3x 4x] ;
  905.906 ms (32 allocations: 3.51 MiB)

i.e. disabling iterative refinement helps a lot. I still don't think we expose this option in a more convenient way, and I'm also wondering if we should disable refinement by default, but notice the accuracy difference:

julia> norm(x - M*sol_refinements)
3.877108630658927e-12

julia> norm(x - M*sol_no_refinements)
5.613799611342331e-8

Could you check the error in Matlab? Does it look like refinements are on by default?
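The `sol_refinements` / `sol_no_refinements` values above were presumably obtained along these lines. A sketch, with the variable names matching the comment, using the global `umf_ctrl` vector as exposed by the SuiteSparse.jl wrapper at the time, and a small stand-in matrix since the original data is not included:

```julia
using LinearAlgebra, SparseArrays, SuiteSparse

n = 1_000                              # small stand-in for the real test matrix
M = sprand(n, n, 20 / n) + I
x = rand(n)
Mf = factorize(M)                      # UMFPACK LU

SuiteSparse.UMFPACK.umf_ctrl[8] = 2.0  # default: up to 2 refinement steps
sol_refinements = Mf \ x

SuiteSparse.UMFPACK.umf_ctrl[8] = 0    # disable iterative refinement
sol_no_refinements = Mf \ x

norm(x - M * sol_refinements), norm(x - M * sol_no_refinements)
```

Note that `umf_ctrl` is a process-global setting, so flipping it affects every subsequent UMFPACK solve, not just this factorization.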

@briochemc
Author

Hey there, sorry for the delayed response, but here it is. It seems you are right: refinement is off in MATLAB when using linfactor (but on when using backslash directly on the matrix).

I used the same data as earlier (loading the same file) and checked a couple of norms to compare the errors in MATLAB and Julia. First, the results (the code I used is at the end of this post): sol1 = Mf \ x is computed after an explicit factorization, while sol2 = M \ x is computed directly from the matrix.

  • Julia:

    julia> norm(x - M * sol1) / norm(x)
    8.793283052561324e-15
    
    julia> norm(x - M * sol1)
    7.167977657716642e-13
    
    julia> norm(x - M * sol2) / norm(x)
    8.793283052561324e-15
    
    julia> norm(x - M * sol2)
    7.167977657716642e-13
  • MATLAB:

    >> norm(x - M * sol1) / norm(x)
       1.7546e-10
    
    >> norm(x - M * sol1)
       1.4303e-08
    
    >> norm(x - M * sol2) / norm(x)
       8.8145e-15
    
    >> norm(x - M * sol2)
       7.1853e-13

(In addition to the absolute error, I added the relative one because I think its magnitude is likely more robust, in case you want to compare results from different machines / runs / randomly created matrices M and vectors x.)


Code used:

  • Julia:

    using LinearAlgebra, SparseArrays, SuiteSparse, BenchmarkTools, MAT
    M = matread("test_matrix.mat")["M"] ;
    x = matread("test_matrix.mat")["x"] ;
    Mf = factorize(M) ;
    sol1 = Mf \ x ;
    sol2 = M \ x ;
    norm(x)
    norm(x - M * sol1) / norm(x)
    norm(x - M * sol1)
    norm(x - M * sol2) / norm(x)
    norm(x - M * sol2)
  • MATLAB:

    load test_matrix.mat M x
    Mf = linfactor(M) ;
    sol1 = linfactor(Mf, x) ;
    sol2 = M \ x ;
    norm(x)
    norm(x - M * sol1) / norm(x)
    norm(x - M * sol1)
    norm(x - M * sol2) / norm(x)
    norm(x - M * sol2)

@briochemc
Author

@andreasnoack Should I close this issue? Or do you want it to remain open as a reminder in case you want to eventually "expose this option [to turn the iterative refinements on or off] in a more convenient way"?

@andreasnoack
Member

I think we should keep this one open until we have made a decision about the iterative refinements.

@ViralBShah
Member

I think iterative refinement should not be the default and should be opt-in.

@ViralBShah ViralBShah transferred this issue from JuliaLang/julia May 19, 2021
@ViralBShah ViralBShah changed the title Performance issue with (sparse) backsolve after LU factorization Performance issue with (sparse) backsolve after LU factorization (turn off iterative refinement by default) May 29, 2021
@rayegun rayegun transferred this issue from JuliaSparse/SuiteSparse.jl Jun 7, 2022
@ViralBShah
Member

@SobhanMP In your workspace sizing code, I noticed the comment that it could be smaller if iterative refinement is off. Would it be a huge gain?

We do want to turn off iterative refinement by default, but have a way to turn it on optionally.
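One possible opt-in shape (purely hypothetical: the helper name and keyword are assumptions, nothing like this existed in the wrapper at the time) would be a small wrapper that flips the global control flag around a solve and restores it afterwards:

```julia
using LinearAlgebra, SparseArrays, SuiteSparse

# Hypothetical helper: solve with a chosen number of iterative-refinement
# steps, restoring the previous global setting afterwards.
# `irstep` is an assumed keyword name, not an existing API.
function solve_with_refinement(Mf, b; irstep::Real = 0)
    old = SuiteSparse.UMFPACK.umf_ctrl[8]
    SuiteSparse.UMFPACK.umf_ctrl[8] = irstep
    try
        return Mf \ b
    finally
        SuiteSparse.UMFPACK.umf_ctrl[8] = old
    end
end
```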

@SobhanMP
Member

SobhanMP commented Jul 12, 2022

@ViralBShah
[screenshot of the workspace-sizing code]
Yeah, 5 times the memory for Float64 matrices, so I'd say huge?

@ViralBShah
Member

That's huge. Would love to get those savings!

@SobhanMP
Member

SobhanMP commented Jul 12, 2022

Do you know why UMFPACK_IRSTEP isn't exported by the wrapper?
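For context, the setting is currently reachable only through the raw control vector: `UMFPACK_IRSTEP` is 7 in umfpack.h's 0-based indexing, which corresponds to index 8 of the 1-based `umf_ctrl` vector used earlier in this thread. A sketch against the wrapper of the time:

```julia
using SuiteSparse

# UMFPACK_IRSTEP is 7 in umfpack.h (0-based); Julia's umf_ctrl is 1-based.
const UMFPACK_IRSTEP = 7
SuiteSparse.UMFPACK.umf_ctrl[UMFPACK_IRSTEP + 1] = 0  # turn off refinement
```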

@ViralBShah
Member

ViralBShah commented Aug 10, 2022

#181 disables iterative refinement. It would be nice if folks could confirm the performance improvement.
