# Meeting June 22

## Preconditioner

[AMG package](https://github.com/JuliaLinearAlgebra/AlgebraicMultigrid.jl): Algebraic Multigrid 


```
N = 2^10
julia> size(A)
(1050625, 1050625)


# Solving directly with LU decomposition

# Factorization
@benchmark A_fac = lu(A)

 memory estimate:  2.07 GiB
  allocs estimate:  65
  --------------
  minimum time:     8.331 s (0.03% GC)
  median time:      8.331 s (0.03% GC)
  mean time:        8.331 s (0.03% GC)
  maximum time:     8.331 s (0.03% GC)
  --------------
  samples:          1
  evals/sample:     1
  
 
@benchmark A_fac \ b
 memory estimate:  64.13 MiB
  allocs estimate:  6
  --------------
  minimum time:     510.915 ms (0.00% GC)
  median time:      513.875 ms (0.00% GC)
  mean time:        516.267 ms (0.00% GC)
  maximum time:     527.832 ms (0.00% GC)
  --------------
  samples:          10
  evals/sample:     1
 

# Construct Ruge Stuben Solver
ml = ruge_stuben(A)

# Using as preconditioner
p = aspreconditioner(ml)

@benchmark ml = ruge_stuben(A)

 memory estimate:  2.18 GiB
  allocs estimate:  938
  --------------
  minimum time:     2.953 s (50.83% GC)
  median time:      3.018 s (52.00% GC)
  mean time:        3.018 s (52.00% GC)
  maximum time:     3.083 s (53.13% GC)
  --------------
  samples:          2
  evals/sample:     1

@benchmark p = aspreconditioner(ml)

memory estimate:  32 bytes
  allocs estimate:  1
  --------------
  minimum time:     21.241 ns (0.00% GC)
  median time:      21.743 ns (0.00% GC)
  mean time:        26.562 ns (16.15% GC)
  maximum time:     7.550 μs (99.33% GC)
  --------------
  samples:          10000
  evals/sample:     998



# Conjugate Gradient without preconditioner: initialization with all-zero vector
@benchmark cg(A,b) 
memory estimate:  40.08 MiB
  allocs estimate:  19
  --------------
  minimum time:     30.714 s (0.00% GC)
  median time:      30.714 s (0.00% GC)
  mean time:        30.714 s (0.00% GC)
  maximum time:     30.714 s (0.00% GC)
  --------------
  samples:          1
  evals/sample:     1


# Conjugate Gradient using ruge_stuben as preconditioner: initialization with all-zero vector

memory estimate:  40.08 MiB
  allocs estimate:  22
  --------------
  minimum time:     1.097 s (0.35% GC)
  median time:      1.137 s (0.31% GC)
  mean time:        1.129 s (0.25% GC)
  maximum time:     1.160 s (0.31% GC)
  --------------
  samples:          5
  evals/sample:     1
  
  
  
# Directly Using Algebraic Multigrid Method

@benchmark solve(ml,b)

  memory estimate:  8.02 MiB
  allocs estimate:  2
  --------------
  minimum time:     1.210 s (0.00% GC)
  median time:      1.276 s (0.00% GC)
  mean time:        1.296 s (0.00% GC)
  maximum time:     1.421 s (0.00% GC)
  --------------
  samples:          4
  evals/sample:     1
  
  ```
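
The timing gap between plain and preconditioned CG above comes from how the preconditioner enters each iteration. The block below is a minimal hand-written sketch of preconditioned CG with an all-zero initial guess; it is not the solver that was benchmarked. The helper `pcg`, the small test matrix, and the Jacobi (diagonal) preconditioner are all assumptions chosen to keep the example self-contained. In the benchmarks, the AMG preconditioner `p` would play the role of `applyM`.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: textbook preconditioned CG with an all-zero initial guess.
# `applyM(r)` must return the preconditioner applied to r (an approximation of A \ r).
function pcg(A, b, applyM; tol = 1e-8, maxiter = 10 * length(b))
    x = zeros(length(b))
    r = copy(b)                 # residual b - A*x for x = 0
    z = applyM(r)
    p = copy(z)
    rz = dot(r, z)
    for k in 1:maxiter
        Ap = A * p
        α = rz / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        norm(r) <= tol * norm(b) && return x, k
        z = applyM(r)
        rz_new = dot(r, z)
        p .= z .+ (rz_new / rz) .* p
        rz = rz_new
    end
    return x, maxiter
end

# Assumed test problem: SPD tridiagonal matrix with a varying diagonal
n = 200
d = collect(range(3.0, stop = 50.0, length = n))
A = spdiagm(-1 => -ones(n - 1), 0 => d, 1 => -ones(n - 1))
b = ones(n)

x_plain, its_plain = pcg(A, b, identity)        # no preconditioner
dinv = 1 ./ d
x_jac, its_jac = pcg(A, b, r -> dinv .* r)      # Jacobi (diagonal) preconditioner
println("iterations: plain = $its_plain, Jacobi = $its_jac")
```

Each iteration costs one product with `A` plus one preconditioner application, so a preconditioner only pays off if it cuts the iteration count by more than its own cost.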

## Not so successful part: [LinearOperators.jl](https://github.com/JuliaSmoothOptimizers/LinearOperators.jl) package

```
A_op = LinearOperator(A)
@benchmark cg(A_op,b)

  memory estimate:  19.72 GiB
  allocs estimate:  5047
  --------------
  minimum time:     37.136 s (5.60% GC)
  median time:      37.136 s (5.60% GC)
  mean time:        37.136 s (5.60% GC)
  maximum time:     37.136 s (5.60% GC)
  --------------
  samples:          1
  evals/sample:     1

```

## Banded Matrix Factorization


[BandedMatrices.jl](https://github.com/JuliaMatrices/BandedMatrices.jl) provides the `BandedMatrix` type. <br>
[LimitedLDLFactorizations](https://github.com/JuliaSmoothOptimizers/LimitedLDLFactorizations.jl) is used to perform the factorization and the solve for a `BandedMatrix`.

Using `lldl` from `LimitedLDLFactorizations.jl` would save a significant amount of memory, and it can perform the factorization under a memory constraint. <br>

```
lldl(A)
```

# Reworking the Hybrid Code

## Separating Matrix Forming and Factorization

Get rid of `threaded_LocalGlobalOperators`. <br>
Decompose `SBPLocalOperator1` into two parts: `SBPLocalOperator1_forming` and `SBPLocalOperator1_factorization`.
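
As a rough illustration of why the split helps, the sketch below forms a set of blocks once and then factorizes them with two different factorization functions. The names `form_blocks` and `factorize_blocks` and the tridiagonal test blocks are hypothetical stand-ins, not the actual SBP operators; the `FTYPE` pattern mirrors the one used in `SBPLocalOperator1_factorization`.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical "forming" step: build one sparse SPD block per element.
function form_blocks(nelems, n)
    return [spdiagm(-1 => -ones(n - 1), 0 => fill(2.0 + e, n), 1 => -ones(n - 1))
            for e in 1:nelems]
end

# "Factorization" step: works with any factorization function.
function factorize_blocks(blocks, factorization)
    # Infer the concrete factorization type from a 1×1 sparse matrix
    FTYPE = typeof(factorization(sparse([1], [1], [1.0])))
    factors = Array{FTYPE,1}(undef, length(blocks))
    for e in eachindex(blocks)
        factors[e] = factorization(blocks[e])
    end
    return factors
end

blocks = form_blocks(4, 10)                                       # formed once
chol_factors = factorize_blocks(blocks, x -> cholesky(Symmetric(x)))
lu_factors = factorize_blocks(blocks, x -> lu(x))                 # reuses the same blocks

rhs = ones(10)
println(chol_factors[1] \ rhs ≈ lu_factors[1] \ rhs)
```

With this layout, switching between Cholesky and LU only touches the factorization call, and the formed matrices can be reused.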




## Errors still occur with Cholesky factorizations

Hybrid method on the mesh with 4 by 4 blocks (`LoadError` with `ArgumentError` at lvl 6)


```
lvl = 5
N: 1024
Time elapsed (assembleλmatrix) for lvl 5 = 305.92599987983704
(lvl, ϵ[lvl]) = (5, 2.5755272693657694e-6)
Time elapsed for the whole code is approximately 325.7149999141693
Time elapsed (reading matrices) for lvl 5 = 9.999275207519531e-5
Time elapsed (linear solve with reading matrices) for lvl 5 = 0.33499999046325685
Time elapsed (All three parts) for lvl 5 = 0.35230002403259275

lvl = 6
N: 2048
ERROR: LoadError: ArgumentError: sparse matrix construction failed for unknown reasons. Please submit a bug report.
Stacktrace:
 [1] SuiteSparse.CHOLMOD.Sparse{Float64}(::Ptr{SuiteSparse.CHOLMOD.C_Sparse{Float64}}) at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SuiteSparse\src\cholmod.jl:264
 [2] Sparse at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SuiteSparse\src\cholmod.jl:283 [inlined]
 [3] spsolve(::Int32, ::SuiteSparse.CHOLMOD.Factor{Float64}, ::SuiteSparse.CHOLMOD.Sparse{Float64}) at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SuiteSparse\src\cholmod.jl:772
 [4] \ at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SuiteSparse\src\cholmod.jl:1720 [inlined]
 [5] \ at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SuiteSparse\src\cholmod.jl:1721 [inlined]
 [6] assembleλmatrix(::Array{Int64,1}, ::Array{Int64,1}, ::Array{Int64,2}, ::Array{Int64,1}, ::Array{SuiteSparse.CHOLMOD.Factor{Float64},1}, ::Array{Float64,1}, ::SparseMatrixCSC{Float64,Int64}) at C:\Users\cheny\OneDrive\Documents\version-control\SBP_Seismology\decomposite\global_curved.jl:850
 [7] top-level scope at C:\Users\cheny\OneDrive\Documents\version-control\SBP_Seismology\decomposite\test_multithreading.jl:257
 [8] include(::String) at .\client.jl:439
 [9] top-level scope at none:0
in expression starting at C:\Users\cheny\OneDrive\Documents\version-control\SBP_Seismology\decomposite\test_multithreading.jl:6
```

Hybrid method on the mesh with 8 by 8 blocks (`LoadError` with `OutOfMemoryError` at lvl 5)

```
lvl = 4
N: 1024
Time elapsed (assembleλmatrix) for lvl 4 = 181.62999987602234
(lvl, ϵ[lvl]) = (4, 3.7967141132278187e-6)
Time elapsed for the whole code is approximately 200.8140001296997
Time elapsed (reading matrices) for lvl 4 = 9.999275207519531e-5
Time elapsed (linear solve with reading matrices) for lvl 4 = 0.24119999408721923
Time elapsed (All three parts) for lvl 4 = 0.24890000820159913

lvl = 5
N: 2048
ERROR: LoadError: OutOfMemoryError()
Stacktrace:
 [1] Array at .\boot.jl:405 [inlined]
 [2] _allocres at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SparseArrays\src\higherorderfns.jl:234 [inlined]
 [3] _noshapecheck_map(::typeof(-), ::SparseMatrixCSC{Float64,Int64}, ::SparseMatrixCSC{Float64,Int64}) at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SparseArrays\src\higherorderfns.jl:164
 [4] _shapecheckbc at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SparseArrays\src\higherorderfns.jl:1025 [inlined]
 [5] _copy at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SparseArrays\src\higherorderfns.jl:1015 [inlined]
 [6] copy at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SparseArrays\src\higherorderfns.jl:1131 [inlined]
 [7] materialize at .\broadcast.jl:820 [inlined]
 [8] broadcast_preserving_zero_d at .\broadcast.jl:809 [inlined]
 [9] -(::SparseMatrixCSC{Float64,Int64}, ::Adjoint{Float64,SparseMatrixCSC{Float64,Int64}}) at .\arraymath.jl:39
 [10] isapprox(::SparseMatrixCSC{Float64,Int64}, ::Adjoint{Float64,SparseMatrixCSC{Float64,Int64}}; atol::Int64, rtol::Float64, nans::Bool, norm::typeof(norm)) at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\LinearAlgebra\src\generic.jl:1588      
 [11] isapprox at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\LinearAlgebra\src\generic.jl:1588 [inlined]
 [12] assembleλmatrix(::Array{Int64,1}, ::Array{Int64,1}, ::Array{Int64,2}, ::Array{Int64,1}, ::Array{SuiteSparse.CHOLMOD.Factor{Float64},1}, ::Array{Float64,1}, ::SparseMatrixCSC{Float64,Int64}) at C:\Users\cheny\OneDrive\Documents\version-control\SBP_Seismology\decomposite\global_curved.jl:871
 [13] top-level scope at C:\Users\cheny\OneDrive\Documents\version-control\SBP_Seismology\decomposite\test_multithreading.jl:262
 [14] include(::String) at .\client.jl:439
 [15] top-level scope at none:0
in expression starting at C:\Users\cheny\OneDrive\Documents\version-control\SBP_Seismology\decomposite\test_multithreading.jl:6
```
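
The second trace shows the `OutOfMemoryError` being raised inside `isapprox(A, A')`, which materializes the sparse difference `A - A'` before taking a norm. One possible way to avoid that allocation, sketched here under assumptions, is an entrywise check that walks the stored entries. The function name `approx_symmetric` and its tolerances are hypothetical, and the entrywise criterion is not identical to the norm-based `isapprox`.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: compare each stored entry A[i, j] with A[j, i]
# without allocating the sparse difference A - A'.
function approx_symmetric(A::SparseMatrixCSC; rtol = sqrt(eps(Float64)), atol = 0.0)
    size(A, 1) == size(A, 2) || return false
    rows = rowvals(A)
    vals = nonzeros(A)
    for j in 1:size(A, 2)
        for k in nzrange(A, j)
            i = rows[k]
            # A[j, i] is a lookup in column i; it returns 0.0 if no entry is stored
            isapprox(vals[k], A[j, i]; rtol = rtol, atol = atol) || return false
        end
    end
    return true
end

S = sparse([4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0])   # symmetric
U = sparse([4.0 1.0 0.0; 0.0 4.0 1.0; 0.0 0.0 4.0])   # upper triangular
println(approx_symmetric(S))   # true
println(approx_symmetric(U))   # false
```

If an exact comparison is enough, `issymmetric(A)` from `LinearAlgebra` is another option to consider.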

In [1]:
function SBPLocalOperator1_factorization(lop,Nr,Ns,factorization)    # New function: contains only the factorization part of the operators
    nelems = length(lop)
    FTYPE = typeof(factorization(sparse([1],[1],[1.0])))    # concrete factorization type, inferred from a 1×1 sparse matrix
    factors = Array{FTYPE, 1}(undef, nelems)
    for e in 1:nelems
        factors[e] = factorization(lop[e].M̃)
    end
    return factors
end

SBPLocalOperator1_factorization (generic function with 1 method)

## Incompatibility with LU factorizations from line 843 in `assembleλmatrix`
```
 for e = 1:nelems
    # println((e, nelems))
    vrng = vstarts[e]:(vstarts[e+1]-1)
    for lf = 1:4
      f = EToF[lf,e]
      if FToB[f] == BC_LOCKED_INTERFACE || FToB[f] >= BC_JUMP_INTERFACE
        λrng = FToλstarts[f]:(FToλstarts[f+1]-1)
        # B = -(Matrix(F[e]' \ Fbar[vrng, λrng])) # This is where backslash happens
        B = -(F[e]' \ Fbar[vrng,λrng])
 ```

In [2]:
lu_fac = (x) -> lu(x)

#5 (generic function with 1 method)

In [3]:
chol_fac = (x) -> cholesky(Symmetric(x))

#7 (generic function with 1 method)

In [5]:
using SparseArrays
using LinearAlgebra
A = rand(3,3)
A_sparse = sparse(A)

3×3 SparseMatrixCSC{Float64,Int64} with 9 stored entries:
  [1, 1]  =  0.542541
  [2, 1]  =  0.814941
  [3, 1]  =  0.821039
  [1, 2]  =  0.468169
  [2, 2]  =  0.898763
  [3, 2]  =  0.169251
  [1, 3]  =  0.978025
  [2, 3]  =  0.408296
  [3, 3]  =  0.506394

In [6]:
lu_fac(A)

LU{Float64,Array{Float64,2}}
L factor:
3×3 Array{Float64,2}:
 1.0       0.0       0.0
 0.992573  1.0       0.0
 0.660798  0.487607  1.0
U factor:
3×3 Array{Float64,2}:
 0.821039  0.169251   0.506394
 0.0       0.730769  -0.0943378
 0.0       0.0        0.6894

In [7]:
chol_fac(A)

PosDefException: PosDefException: matrix is not positive definite; Cholesky factorization failed.

In [8]:
chol_fac(A+2I)

Cholesky{Float64,Array{Float64,2}}
U factor:
3×3 UpperTriangular{Float64,Array{Float64,2}}:
 1.59453  0.293609  0.613361
  ⋅       1.67707   0.136075
  ⋅        ⋅        1.45316

In [28]:
lu_fac(A_sparse)

SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}
L factor:
3×3 SparseMatrixCSC{Float64,Int64} with 6 stored entries:
  [1, 1]  =  1.0
  [2, 1]  =  1.40775
  [3, 1]  =  2.01084
  [2, 2]  =  1.0
  [3, 2]  =  -3.90996
  [3, 3]  =  1.0
U factor:
3×3 SparseMatrixCSC{Float64,Int64} with 6 stored entries:
  [1, 1]  =  0.272807
  [1, 2]  =  0.23541
  [2, 2]  =  0.0921466
  [1, 3]  =  0.491782
  [2, 3]  =  -0.499895
  [3, 3]  =  -2.60512

In [30]:
chol_fac(A_sparse + 2I)

SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  6
nnz:     6
success: true


In [11]:
FTYPE_chol = typeof((chol_fac(sparse([1],[1],[1.0]))))

SuiteSparse.CHOLMOD.Factor{Float64}

In [12]:
FTYPE_lu = typeof((lu_fac(sparse([1],[1],[1.0]))))

SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}

In [14]:
factors_chol = Array{FTYPE_chol,1}(undef,2)

2-element Array{SuiteSparse.CHOLMOD.Factor{Float64},1}:
 #undef
 #undef

In [15]:
factors_lu = Array{FTYPE_lu,1}(undef,2)

2-element Array{SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64},1}:
 #undef
 #undef

In [21]:
factors_chol[1] = cholesky(Symmetric(A_sparse+2I))

SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  6
nnz:     6
success: true


In [22]:
factors_chol[1] = chol_fac(A_sparse+2I)

SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  6
nnz:     6
success: true


In [23]:
factors_lu[1] = lu(A_sparse)

SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}
L factor:
3×3 SparseMatrixCSC{Float64,Int64} with 6 stored entries:
  [1, 1]  =  1.0
  [2, 1]  =  1.40775
  [3, 1]  =  2.01084
  [2, 2]  =  1.0
  [3, 2]  =  -3.90996
  [3, 3]  =  1.0
U factor:
3×3 SparseMatrixCSC{Float64,Int64} with 6 stored entries:
  [1, 1]  =  0.272807
  [1, 2]  =  0.23541
  [2, 2]  =  0.0921466
  [1, 3]  =  0.491782
  [2, 3]  =  -0.499895
  [3, 3]  =  -2.60512

factors_lu[2] = lu_fac(A_sparse)

In [31]:
b = sparse(rand(3,2))  # Here we define b to be a sparse matrix instead of a vector

3×2 SparseMatrixCSC{Float64,Int64} with 6 stored entries:
  [1, 1]  =  0.310122
  [2, 1]  =  0.115528
  [3, 1]  =  0.708056
  [1, 2]  =  0.0899221
  [2, 2]  =  0.735535
  [3, 2]  =  0.399863

In [26]:
factors_chol[1] \ b

3×2 SparseMatrixCSC{Float64,Int64} with 6 stored entries:
  [1, 1]  =  0.0934948
  [2, 1]  =  0.248379
  [3, 1]  =  0.191347
  [1, 2]  =  0.00763141
  [2, 2]  =  0.195049
  [3, 2]  =  0.313064

In [27]:
factors_lu[1] \ b

MethodError: MethodError: no method matching ldiv!(::SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}, ::SparseMatrixCSC{Float64,Int64})
Closest candidates are:
  ldiv!(!Matched::Number, ::AbstractArray) at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\LinearAlgebra\src\generic.jl:252
  ldiv!(!Matched::Diagonal{T,V} where V<:AbstractArray{T,1}, ::SparseArrays.AbstractSparseMatrixCSC{T,Ti} where Ti<:Integer) where T at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\SparseArrays\src\linalg.jl:836
  ldiv!(!Matched::Diagonal{T,V} where V<:AbstractArray{T,1}, ::AbstractArray{T,2}) where T at C:\Users\julia\AppData\Local\Julia-1.4.2\share\julia\stdlib\v1.4\LinearAlgebra\src\diagonal.jl:415
  ...
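
The `MethodError` above arises because the UMFPACK LU factorization is applied to a `SparseMatrixCSC` right-hand side, whereas the CHOLMOD factorization accepts one. A possible workaround, sketched here on a small assumed test matrix, is to convert the right-hand side to a dense `Matrix` before the backslash. For the loop in `assembleλmatrix` this would mean something like `F[e]' \ Matrix(Fbar[vrng, λrng])`, at the cost of a dense intermediate. Variable names below are illustrative only.

```julia
using LinearAlgebra, SparseArrays

A = sparse([4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0])   # small SPD test matrix (assumed)
B = sparse([1.0 0.0; 0.0 2.0; 3.0 0.0])               # sparse right-hand side

F_chol = cholesky(Symmetric(A))
F_lu = lu(A)

X_chol = F_chol \ B            # CHOLMOD accepts a sparse right-hand side
X_lu = F_lu \ Matrix(B)        # UMFPACK: densify the right-hand side first
X_lu_adj = F_lu' \ Matrix(B)   # the adjoint solve also takes a dense right-hand side

# Convert back if a sparse result is needed downstream
X_lu_sparse = sparse(X_lu)
println(Matrix(X_chol) ≈ X_lu)
```

The dense conversion is cheap for small blocks but grows with the block size, so it may not fit in memory at the higher levels that already fail.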

# Tables for Brittany

Table 1 is not complete yet; I still need to rework parts of Jeremy's code. <br>
Overleaf link: https://www.overleaf.com/4588648356xcctvxzmtdnp