# MCMC2.5: Dense and Sparse Matrices

This section is not strictly about Markov chain Monte Carlo itself, but sparse matrices are important for speeding up quantum Monte Carlo simulations.

## Dense matrix

For dense matrices, the `LinearAlgebra` standard library provides wrappers for BLAS/LAPACK.

In [1]:
using LinearAlgebra

I recommend Intel MKL instead of OpenBLAS. You can check whether MKL is in use with the following command (`BLAS.vendor()` was deprecated in Julia 1.7 in favor of `BLAS.get_config()`):

In [2]:
BLAS.vendor()

:mkl

A 1D `Array` is called a `Vector`.

In [3]:
Array{Int64, 1} == Vector{Int64}

true

A 2D `Array` is called a `Matrix`, which I will refer to as a dense matrix from here on.

In [4]:
Array{Float64, 2} == Matrix{Float64}

true

A `Matrix` and an `Array` of `Array`s are different things.

In [5]:
matrix = [1 2
          3 4]
array = [[1, 2], [3, 4]]
matrix == array

false
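If you do end up with a `Vector` of `Vector`s, one way to turn it into a `Matrix` is sketched below. The variable names `rows` and `as_matrix` are only illustrative, and the inner vectors are assumed to be the rows of the matrix.

```julia
rows = [[1, 2], [3, 4]]                      # Vector of Vectors, one inner Vector per row
# hcat places each inner Vector in a column; permutedims then swaps rows and columns
as_matrix = permutedims(reduce(hcat, rows))
as_matrix == [1 2; 3 4]
```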

Julia's support for rank-3 tensors is limited, but we can still define and use them.

In [6]:
tensor = zeros(Float64, 2, 2, 2)

2×2×2 Array{Float64,3}:
[:, :, 1] =
 0.0  0.0
 0.0  0.0

[:, :, 2] =
 0.0  0.0
 0.0  0.0
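As a small sketch of what "use them" can look like, the following fills a rank-3 array with distinct values, reads one element, takes a 2D slice, and sums over the third index. The names `t`, `front` and `contracted` are illustrative only.

```julia
t = reshape(collect(1.0:8.0), 2, 2, 2)             # entries 1..8 in column-major order
elem = t[2, 1, 2]                                  # one element, indexed as [i, j, k]
front = t[:, :, 1]                                 # a 2×2 Matrix slice
contracted = dropdims(sum(t, dims = 3), dims = 3)  # sum over the third index
```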

Be careful: matrices are stored in column-major order, so reading a column is faster than reading a row.

In [7]:
mat = ones(10000, 10000)
mat[:, 1]
mat[1, :]
@time mat[:, 1];
@time mat[1, :];

  0.000025 seconds (7 allocations: 78.375 KiB)
  0.000274 seconds (7 allocations: 78.375 KiB)
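The same consideration applies to hand-written loops. The sketch below (function names are hypothetical) sums a matrix twice: once with the row index varying fastest, which walks through memory contiguously, and once with the column index varying fastest, which jumps by a full column length at every step. Both return the same value; only the memory access pattern differs.

```julia
function sum_column_major(mat)
    total = zero(eltype(mat))
    # the last loop variable is the innermost one, so i (the row index) varies fastest
    for j in axes(mat, 2), i in axes(mat, 1)
        total += mat[i, j]
    end
    return total
end

function sum_row_major(mat)
    total = zero(eltype(mat))
    # here j (the column index) varies fastest, giving strided memory access
    for i in axes(mat, 1), j in axes(mat, 2)
        total += mat[i, j]
    end
    return total
end

small = reshape(collect(1.0:12.0), 3, 4)
sum_column_major(small) == sum_row_major(small)
```

You can time both on a large matrix with `@time`, as in the cell above, to see the difference on your machine.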


Thus, column vectors are the natural choice.

In [8]:
[1.0 2.0; 3.0 4.0] * [5.0; 6.0]

2-element Array{Float64,1}:
 17.0
 39.0

## BLAS/LAPACK

Dense matrices are not memory-efficient, but they support BLAS/LAPACK. The most important BLAS operation in quantum Monte Carlo is the rank-1 update (or other finite-rank updates).

In [9]:
A = rand(Float64, 1000, 1000)
Ainv = inv(A)

1000×1000 Array{Float64,2}:
 -0.216807    -0.182728   -0.313444   -0.00442697  …   0.246234     0.166771 
 -0.106858    -0.0511798  -0.0729319  -0.0827161      -0.0443618    0.259181 
  0.161415     0.347792    0.202863   -0.163116       -0.318149    -0.209124 
 -0.0144296   -0.201034   -0.222693   -0.0436117      -0.0129424   -0.021837 
  0.0357635   -0.0196757   0.0162717  -0.0539516       0.0260143   -0.0321266
 -0.0709083   -0.0539753  -0.0971216   0.0266725   …   0.0084783    0.0754321
  0.0412158   -0.0231214  -0.112432   -0.00943499     -0.0214947    0.0595807
  0.00912263   0.166601    0.277468   -0.0251675      -0.127449     0.073971 
 -0.0925192   -0.123677   -0.261929   -0.059861        0.0381073    0.199565 
  0.139945     0.12504     0.178481    0.143263        0.00231803  -0.226917 
 -0.470312    -0.500863   -0.642289   -0.0394096   …   0.411755     0.320678 
  0.0989903    0.0706353   0.222132    0.16538         0.0440387   -0.310612 
 -0.128899    -0.0213592  -0.171049 

Let's check the Sherman-Morrison formula!
$$\left( A+\vec{u} \vec{v}^T \right)^{-1} = A^{-1} - \frac{A^{-1} \vec{u} \vec{v}^TA^{-1}}{1 + \vec{v}^T A^{-1} \vec{u}}$$
Of course, the vectors are column vectors. I define $B = A+\vec{u} \vec{v}^T$.

In [10]:
u = rand(Float64, 1000)
v = rand(Float64, 1000)
B = copy(A)
BLAS.ger!(1.0, u, v, B)

1000×1000 Array{Float64,2}:
 0.493476   0.287165  0.885528  0.281831  …  1.02257    1.00036   0.188449
 0.53524    0.781394  0.236354  0.705672     0.276558   0.71931   0.722191
 1.05341    0.15486   0.423554  0.836416     0.743585   0.290037  0.247521
 0.954052   1.13119   1.51305   0.402359     1.07643    0.900526  0.567791
 0.803435   0.314087  0.986208  0.941365     0.79413    0.622662  0.930937
 0.422485   0.227933  0.75559   0.396224  …  0.800609   0.815683  1.05606 
 1.03509    0.287621  1.12183   0.705295     0.714422   0.644318  0.216585
 1.32684    0.735915  1.52469   0.454684     1.1374     1.11563   0.741174
 0.373527   0.428652  0.938984  0.750711     0.367198   0.969151  0.130871
 0.389354   0.660456  0.256996  0.728986     0.0664954  0.487686  0.53124 
 0.748909   0.835971  0.528378  0.357133  …  1.26022    0.237799  0.594378
 0.439319   0.602939  0.701492  0.871948     1.40638    0.685876  0.554511
 0.578877   1.26819   0.348994  0.658416     0.570996   0.647767  0.7443

You may need to copy the matrix first because most BLAS operations are destructive. The rank-1 update `BLAS.ger!` is a level-2 BLAS routine that costs $O(N^2)$, so it is much faster than recomputing the inverse, which costs $O(N^3)$. That's why we use the Sherman-Morrison formula to calculate $B^{-1}$ when we already know $A^{-1}$.

In [11]:
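Binv = copy(Ainv)     # ger! overwrites its last argument, so work on a copy
Ainvu = Ainv * u      # A⁻¹u, used in both the numerator and the denominator
vtAinv = Ainv' * v    # (vᵀA⁻¹)ᵀ as a column vector
BLAS.ger!(-1.0 / (1.0 + dot(v, Ainvu)), Ainvu, vtAinv, Binv)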

1000×1000 Array{Float64,2}:
 -0.210722    -0.165846    …  -0.344856    0.249913     0.170303 
 -0.108237    -0.0550062      -0.134173   -0.0451957    0.258381 
  0.128561     0.256637        0.149242   -0.338014    -0.228196 
  0.0532276   -0.0133194       0.214019    0.0279662    0.0174369
  0.0342052   -0.023999       -0.0177607   0.0250721   -0.0330312
 -0.0227284    0.0796995   …   0.284146    0.03761      0.1034   
  0.0398927   -0.0267921      -0.0550646  -0.0222947    0.0588127
 -0.0474696    0.00958601     -0.114276   -0.161667     0.0411201
 -0.0523626   -0.0122633      -0.0177558   0.0623877    0.222875 
  0.132332     0.103917        0.143918   -0.00228513  -0.231336 
 -0.389001    -0.275268    …  -0.322295    0.460919     0.367877 
  0.0987765    0.0700422       0.233175    0.0439095   -0.310736 
 -0.100418     0.0576604       0.11119    -0.040864     0.101177 
  ⋮                        ⋱                                     
 -0.0256326   -0.270892       -0.449368    0.076

Let's check the results.

In [12]:
norm(B * Binv - I)

1.7034651752394562e-10

Iterating rank-1 updates accumulates some error, so sometimes you have to refresh the updated matrix "from scratch." (Be careful because sometimes it is not really from scratch.)
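A minimal sketch of such a refresh policy follows. Everything here is an assumption made for illustration: the helper names, the random updates, the check interval and the tolerance are not taken from any particular simulation code. Note that the residual check itself costs $O(N^3)$, which is why it is done only every `check_every` steps.

```julia
using LinearAlgebra

# Hypothetical helper: Sherman-Morrison update of Minv after M ← M + u vᵀ.
function rank1_update!(Minv, u, v)
    Minvu = Minv * u
    vtMinv = Minv' * v
    BLAS.ger!(-1.0 / (1.0 + dot(v, Minvu)), Minvu, vtMinv, Minv)
    return Minv
end

# Apply `nsteps` random rank-1 updates. Every `check_every` steps, measure the
# residual and recompute the inverse with `inv` if it exceeds `tol`.
function updates_with_refresh!(M, Minv, nsteps; check_every = 10, tol = 1e-10)
    n = size(M, 1)
    nrefresh = 0
    for step in 1:nsteps
        u, v = 0.1 .* randn(n), 0.1 .* randn(n)
        BLAS.ger!(1.0, u, v, M)          # M ← M + u vᵀ
        rank1_update!(Minv, u, v)        # keep Minv in sync
        if step % check_every == 0 && norm(M * Minv - I) > tol
            Minv .= inv(M)               # refresh "from scratch"
            nrefresh += 1
        end
    end
    return nrefresh
end

M = rand(50, 50) + 50I                   # diagonally dominant, so well conditioned
Minv = inv(M)
nrefresh = updates_with_refresh!(M, Minv, 100)
final_residual = norm(M * Minv - I)
```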

As for LAPACK, most of Julia's dense linear algebra functions are thin wrappers around LAPACK routines.

In [13]:
eigvals(A)

1000-element Array{Complex{Float64},1}:
   499.64451884764486 + 0.0im                
    7.326353262732669 + 5.467112951676198im  
    7.326353262732669 - 5.467112951676198im  
    4.511755204557462 + 8.083066699013186im  
    4.511755204557462 - 8.083066699013186im  
    9.028247612822707 + 1.1919623586751555im 
    9.028247612822707 - 1.1919623586751555im 
    7.821192282735549 + 4.4633601285490085im 
    7.821192282735549 - 4.4633601285490085im 
    6.479101407733715 + 6.1454582568154486im 
    6.479101407733715 - 6.1454582568154486im 
    6.669012289428684 + 5.778934893595768im  
    6.669012289428684 - 5.778934893595768im  
                      ⋮                      
 -0.21476999839163097 - 0.8102786144277376im 
   0.9245959366880042 + 0.0im                
   0.9198014818236955 + 0.3774487000128153im 
   0.9198014818236955 - 0.3774487000128153im 
  0.09867037372449497 + 0.6285240450698899im 
  0.09867037372449497 - 0.6285240450698899im 
   0.6358888592001618 + 0.0im           

For a general (non-symmetric) real matrix like this one, `eigvals` is essentially a wrapper for `LAPACK.geevx!`, so you can call `LAPACK.geevx!` directly instead if you wish.
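To illustrate what a direct LAPACK call looks like, here is a sketch for the simpler symmetric case. It deliberately uses `LAPACK.syev!` rather than `geevx!`, because its return value is just the vector of eigenvalues when only eigenvalues are requested. The small matrix `S` is made up for the example.

```julia
using LinearAlgebra

S = [2.0 1.0 0.0;
     1.0 2.0 1.0;
     0.0 1.0 2.0]

high_level = eigvals(Symmetric(S))
# 'N': eigenvalues only; 'U': read the upper triangle.
# syev! overwrites its matrix argument, hence the copy.
low_level = LAPACK.syev!('N', 'U', copy(S))
low_level ≈ high_level
```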

## Sparse matrix

If your program relies heavily on sparse matrices, you may be better off with Python, because Julia's standard library only supports the CSC format. Julia's native support for sparse matrices is not strong, so I do not recommend writing code that mixes multiple sparse matrix formats in Julia.

In [14]:
using SparseArrays

Let's solve a tight-binding model on the 2D square lattice in a poor man's way, i.e., in real space.

In [15]:
const L = 30
iter1D = 1 : L
nnbondx = zip(Iterators.product(iter1D, iter1D), Iterators.product((mod1(i + 1, L) for i in iter1D), iter1D))
nnbondy = zip(Iterators.product(iter1D, iter1D), Iterators.product(iter1D, (mod1(i + 1, L) for i in iter1D)))
collect(nnbondx), collect(nnbondy)

(Tuple{Tuple{Int64,Int64},Tuple{Int64,Int64}}[((1, 1), (2, 1)) ((1, 2), (2, 2)) … ((1, 29), (2, 29)) ((1, 30), (2, 30)); ((2, 1), (3, 1)) ((2, 2), (3, 2)) … ((2, 29), (3, 29)) ((2, 30), (3, 30)); … ; ((29, 1), (30, 1)) ((29, 2), (30, 2)) … ((29, 29), (30, 29)) ((29, 30), (30, 30)); ((30, 1), (1, 1)) ((30, 2), (1, 2)) … ((30, 29), (1, 29)) ((30, 30), (1, 30))], Tuple{Tuple{Int64,Int64},Tuple{Int64,Int64}}[((1, 1), (1, 2)) ((1, 2), (1, 3)) … ((1, 29), (1, 30)) ((1, 30), (1, 1)); ((2, 1), (2, 2)) ((2, 2), (2, 3)) … ((2, 29), (2, 30)) ((2, 30), (2, 1)); … ; ((29, 1), (29, 2)) ((29, 2), (29, 3)) … ((29, 29), (29, 30)) ((29, 30), (29, 1)); ((30, 1), (30, 2)) ((30, 2), (30, 3)) … ((30, 29), (30, 30)) ((30, 30), (30, 1))])

These iterators generate the nearest-neighbor bonds of the 2D square lattice with periodic boundary conditions.

In [16]:
xytoz(nn::Tuple{Tuple{Int64, Int64}, Tuple{Int64, Int64}}) = (nn[1][2] - 1) * L + nn[1][1], (nn[2][2] - 1) * L + nn[2][1]
nnx = Base.Generator(xytoz, nnbondx)
nny = Base.Generator(xytoz, nnbondy)

Base.Generator{Base.Iterators.Zip2{Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}},Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##5#6"))}}}},typeof(xytoz)}(xytoz, Base.Iterators.Zip2{Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}},Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##5#6"))}}}}(Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},UnitRange{Int64}}}((1:30, 1:30)), Base.Iterators.ProductIterator{Tuple{UnitRange{Int64},Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##5#6"))}}}((1:30, Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##5#6"))}(getfield(Main, Symbol("##5#6"))(), 1:30)))))

`Base.Generator(f, iter)` is the same as `(f(x) for x in iter)`, or you can regard it as a lazy version of `map`, as you saw before.

In [17]:
N = L ^ 2
H = spzeros(Float64, N, N)
for (i, j) in Iterators.flatten((nnx, nny))
    H[i, j] = -1.0
    H[j, i] = -1.0
end
H

900×900 SparseMatrixCSC{Float64,Int64} with 3600 stored entries:
  [2  ,   1]  =  -1.0
  [30 ,   1]  =  -1.0
  [31 ,   1]  =  -1.0
  [871,   1]  =  -1.0
  [1  ,   2]  =  -1.0
  [3  ,   2]  =  -1.0
  [32 ,   2]  =  -1.0
  [872,   2]  =  -1.0
  [2  ,   3]  =  -1.0
  [4  ,   3]  =  -1.0
  [33 ,   3]  =  -1.0
  [873,   3]  =  -1.0
  ⋮
  [28 , 898]  =  -1.0
  [868, 898]  =  -1.0
  [897, 898]  =  -1.0
  [899, 898]  =  -1.0
  [29 , 899]  =  -1.0
  [869, 899]  =  -1.0
  [898, 899]  =  -1.0
  [900, 899]  =  -1.0
  [30 , 900]  =  -1.0
  [870, 900]  =  -1.0
  [871, 900]  =  -1.0
  [899, 900]  =  -1.0

Note that you can rewrite this code with `zip`, but `zip` is not efficient here.
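Assigning entries one by one into a CSC matrix can become slow for larger systems, since each new entry may shift the stored data. A common alternative, sketched here, is to collect (row, column, value) triplets and call `sparse` once. The function name `tight_binding_coo` and the argument `side` (playing the role of `L`) are hypothetical. One caveat: `sparse` sums duplicate entries, so for `side <= 2`, where periodic bonds coincide, the result would differ from the loop above.

```julia
using LinearAlgebra, SparseArrays

function tight_binding_coo(side)
    site(x, y) = (y - 1) * side + x        # same site numbering as xytoz
    rows, cols, vals = Int[], Int[], Float64[]
    for y in 1:side, x in 1:side
        i = site(x, y)
        # neighbors in the +x and +y directions, with periodic boundaries
        for j in (site(mod1(x + 1, side), y), site(x, mod1(y + 1, side)))
            push!(rows, i, j)              # store both (i, j) and (j, i)
            push!(cols, j, i)
            push!(vals, -1.0, -1.0)
        end
    end
    return sparse(rows, cols, vals, side^2, side^2)
end

Hcoo = tight_binding_coo(30)
```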

Most operations on sparse matrices are similar to those on dense matrices. However, sparse arrays are far more memory-efficient when most entries of the matrix are zero. In particular, if the matrix is sparse enough, the cost of matrix multiplication drops significantly, from $O(N^3)$ to $O(N)$.

In [18]:
Hdense = Array(H)
H * H
Hdense * Hdense
@time H * H;
@time Hdense * Hdense;

  0.000157 seconds (18 allocations: 268.500 KiB)
  0.028663 seconds (6 allocations: 6.180 MiB)
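To get a feeling for the memory side of this, the sketch below compares the storage of a sparse matrix with its dense counterpart. A 1D chain with open boundaries is used as a stand-in (it is not the 2D Hamiltonian above), and the variable names are illustrative.

```julia
using SparseArrays

n = 900
chain = spdiagm(-1 => fill(-1.0, n - 1), 1 => fill(-1.0, n - 1))  # 1D hopping matrix

density = nnz(chain) / length(chain)            # fraction of stored entries
sparse_bytes = Base.summarysize(chain)          # values + row indices + column pointers
dense_bytes = Base.summarysize(Matrix(chain))   # all n^2 Float64 entries
```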


`eigvals` does not support sparse matrices, so computing the full spectrum still costs $O(N^3)$. I will discuss this problem later in MCMC5.0.

In [19]:
eigvals(Hdense)

900-element Array{Float64,1}:
 -4.000000000000004 
 -3.9562952014676074
 -3.956295201467607 
 -3.9562952014676056
 -3.9562952014676043
 -3.9125904029352245
 -3.912590402935224 
 -3.9125904029352214
 -3.912590402935216 
 -3.827090915285207 
 -3.827090915285198 
 -3.8270909152851953
 -3.827090915285193 
  ⋮                 
  3.8270909152851997
  3.8270909152852073
  3.82709091528521  
  3.9125904029352205
  3.9125904029352228
  3.912590402935223 
  3.912590402935234 
  3.9562952014676016
  3.956295201467605 
  3.956295201467612 
  3.9562952014676163
  3.999999999999999 
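These values can be cross-checked against the analytic dispersion. For hopping amplitude $-1$ and periodic boundaries, plane waves diagonalize the Hamiltonian and give $E(k_x, k_y) = -2(\cos k_x + \cos k_y)$ with $k_{x,y} = 2\pi n / L$, $n = 0, \dots, L-1$. The sketch below checks this on a smaller lattice; the helper `tight_binding_dense` and the name `side` are hypothetical and simply rebuild the same model as a dense matrix.

```julia
using LinearAlgebra

function tight_binding_dense(side)
    H = zeros(side^2, side^2)
    site(x, y) = (y - 1) * side + x
    for y in 1:side, x in 1:side
        i = site(x, y)
        for j in (site(mod1(x + 1, side), y), site(x, mod1(y + 1, side)))
            H[i, j] = H[j, i] = -1.0
        end
    end
    return H
end

side = 6
numerical = eigvals(Symmetric(tight_binding_dense(side)))   # sorted ascending
momenta = 2π .* (0:side-1) ./ side
analytic = sort([-2 * (cos(kx) + cos(ky)) for kx in momenta for ky in momenta])
max_deviation = maximum(abs.(numerical .- analytic))
```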

For the square (or cubic, etc.) lattice, you can directly begin from a dense matrix. Here's a smart implementation.

In [20]:
H4d = zeros(Float64, L, L, L, L)
for ((i, j), (k, l)) in Iterators.flatten((nnbondx, nnbondy))
    H4d[i, j, k, l] = -1.0
    H4d[k, l, i, j] = -1.0
end
H2d = reshape(H4d, N, N)
eigvals(H2d)

900-element Array{Float64,1}:
 -4.000000000000004 
 -3.9562952014676074
 -3.956295201467607 
 -3.9562952014676056
 -3.9562952014676043
 -3.9125904029352245
 -3.912590402935224 
 -3.9125904029352214
 -3.912590402935216 
 -3.827090915285207 
 -3.827090915285198 
 -3.8270909152851953
 -3.827090915285193 
  ⋮                 
  3.8270909152851997
  3.8270909152852073
  3.82709091528521  
  3.9125904029352205
  3.9125904029352228
  3.912590402935223 
  3.912590402935234 
  3.9562952014676016
  3.956295201467605 
  3.956295201467612 
  3.9562952014676163
  3.999999999999999 
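The reason `reshape` gives the right matrix is that Julia flattens the index pair `(i, j)` in column-major order, which is exactly the site numbering `(j - 1) * L + i` used by `xytoz`. A quick way to convince yourself, sketched with `LinearIndices` (the names `side` and `lin` are illustrative):

```julia
side = 30
lin = LinearIndices((side, side))   # maps (x, y) to the column-major linear index

first_pair = (lin[2, 1], lin[1, 2])
matches = all(lin[x, y] == (y - 1) * side + x for x in 1:side, y in 1:side)
```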

## Block checkerboard decomposition/approximation

It is sometimes very useful to approximate a dense matrix by a product of sparse matrices. In physical models such as tight-binding models, the block checkerboard decomposition is a good approximation.
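As a rough sketch of the idea (not a full implementation), consider a 1D ring with hopping $-1$ instead of the 2D lattice. Split the bonds into two sets, A and B, so that no two bonds in a set share a site. Then $e^{-\Delta\tau H} \approx e^{-\Delta\tau H_A} e^{-\Delta\tau H_B}$ up to an error of order $\Delta\tau^2$, and each factor is sparse because it consists of independent $2 \times 2$ blocks with $\cosh \Delta\tau$ on the diagonal and $\sinh \Delta\tau$ off the diagonal. All names below are hypothetical.

```julia
using LinearAlgebra, SparseArrays

# exp(-dt * h) for a set of disjoint bonds with hopping -1 (hypothetical helper).
# Every site is assumed to belong to exactly one bond in `bonds`.
function bond_exponential(bonds, n, dt)
    E = spzeros(Float64, n, n)
    for (i, j) in bonds
        E[i, i] = E[j, j] = cosh(dt)
        E[i, j] = E[j, i] = sinh(dt)
    end
    return E
end

n, dt = 8, 0.01
bondsA = [(i, i + 1) for i in 1:2:n]             # (1,2), (3,4), ...
bondsB = [(i, mod1(i + 1, n)) for i in 2:2:n]    # (2,3), ..., (n,1)

Hring = zeros(n, n)
for (i, j) in vcat(bondsA, bondsB)
    Hring[i, j] = Hring[j, i] = -1.0
end

exact = exp(-dt * Hring)                         # dense matrix exponential
approx = bond_exponential(bondsA, n, dt) * bond_exponential(bondsB, n, dt)
checkerboard_error = norm(exact - approx)        # expected to scale like dt^2
```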

~ under construction ~

## Iterative solvers

Iterative solvers, especially conjugate gradient methods, are important for hybrid Monte Carlo simulations of lattice gauge theories.
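For reference, a textbook conjugate gradient iteration fits in a few lines. This is only a sketch for a symmetric positive-definite matrix; the function name, the tolerance and the test matrix are assumptions for the example, and in practice you would probably reach for an existing package instead.

```julia
using LinearAlgebra, SparseArrays

# Solve A x = b for symmetric positive-definite A (hypothetical helper).
function conjugate_gradient(A, b; tol = 1e-10, maxiter = length(b))
    x = zeros(eltype(b), length(b))
    r = copy(b)                  # residual b - A*x for the initial guess x = 0
    p = copy(r)                  # first search direction
    rs = dot(r, r)
    for _ in 1:maxiter
        Ap = A * p               # the only place A is used: one mat-vec per step
        α = rs / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        rs_new = dot(r, r)
        sqrt(rs_new) <= tol * norm(b) && break
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    return x
end

n = 200
Aspd = spdiagm(0 => fill(4.0, n), -1 => fill(-1.0, n - 1), 1 => fill(-1.0, n - 1))
b = ones(n)
x = conjugate_gradient(Aspd, b)
cg_residual = norm(Aspd * x - b)
```

Because the matrix only enters through `A * p`, the solver benefits directly from sparse storage.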

### Preconditioners

Note that a careful choice of preconditioner is necessary for ill-conditioned matrices, i.e., matrices with a large condition number. I personally recommend the incomplete Cholesky preconditioner in Preconditioners.jl.

In [1]:
using Preconditioners
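To see what a preconditioner does to the condition number, here is a sketch using only `LinearAlgebra`. It uses the much simpler Jacobi (diagonal) preconditioner as a stand-in, not the incomplete Cholesky preconditioner and not the Preconditioners.jl API. The badly scaled test matrix is made up for the example.

```julia
using LinearAlgebra

n = 100
T = SymTridiagonal(fill(4.0, n), fill(-1.0, n - 1))   # well-conditioned SPD matrix
s = 10 .^ range(0, stop = 3, length = n)               # scales spanning three decades
Abad = Diagonal(s) * Matrix(T) * Diagonal(s)           # badly scaled, still SPD

# Jacobi preconditioning: symmetric scaling by the inverse square root of the diagonal
Minvsqrt = Diagonal(1 ./ sqrt.(diag(Abad)))
Ascaled = Minvsqrt * Abad * Minvsqrt

cond_before, cond_after = cond(Abad), cond(Ascaled)
```

Here the diagonal scaling removes the ill-conditioning completely because it was introduced by a diagonal scaling in the first place; realistic matrices usually need a stronger preconditioner such as incomplete Cholesky.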