
In [1]:
versioninfo()

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code


# A list of "easy" linear systems

Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$ with $\mathbf{A} \in \mathbb{R}^{n \times n}$, or the matrix inverse if you need it. $\mathbf{A}$ can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... We should stay alert to the many linear systems that are easy to solve.

Don't blindly use `A \ b` and `inv` in Julia or the `solve` function in R. **Don't waste computing resources on bad choices of algorithms!**

## Diagonal matrix

Diagonal $\mathbf{A}$: $n$ flops. Use the `Diagonal` type in Julia.

In [2]:
using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(280)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);

In [3]:
# should give link: https://github.com/JuliaLang/julia/blob/5b637df34396034b0dd353e603ab3d61322369fb/stdlib/LinearAlgebra/src/generic.jl#L956
@which A \ b

In [4]:
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b

BenchmarkTools.Trial: 
  memory estimate:  15.89 KiB
  allocs estimate:  3
  --------------
  minimum time:     746.903 μs (0.00% GC)
  median time:      769.459 μs (0.00% GC)
  mean time:        798.657 μs (0.19% GC)
  maximum time:     9.535 ms (90.32% GC)
  --------------
  samples:          6197
  evals/sample:     1

In [5]:
# O(n) computation; `Diagonal(A)` copies only the n diagonal entries
@benchmark Diagonal($A) \ $b

BenchmarkTools.Trial: 
  memory estimate:  15.89 KiB
  allocs estimate:  3
  --------------
  minimum time:     3.056 μs (0.00% GC)
  median time:      3.809 μs (0.00% GC)
  mean time:        5.851 μs (30.47% GC)
  maximum time:     5.294 ms (99.88% GC)
  --------------
  samples:          10000
  evals/sample:     9

## Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded $\mathbf{A}$: band LU, band Cholesky, ... take roughly $O(n)$ flops when the bandwidth is fixed.
* Use the [`Bidiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Bidiagonal), [`Tridiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Tridiagonal), and [`SymTridiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.SymTridiagonal) types in Julia.

In [6]:
Random.seed!(280) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)

1000×1000 SymTridiagonal{Float64,Array{Float64,1}}:
 0.126238   0.618244    ⋅        …    ⋅          ⋅          ⋅      
 0.618244  -2.34688    1.10206        ⋅          ⋅          ⋅      
  ⋅         1.10206    1.91661        ⋅          ⋅          ⋅      
  ⋅          ⋅        -0.447244       ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅        …    ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅        …    ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
  ⋅          ⋅          ⋅             ⋅          ⋅          ⋅      
 ⋮                               ⋱                              

In [7]:
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b

BenchmarkTools.Trial: 
  memory estimate:  7.65 MiB
  allocs estimate:  5
  --------------
  minimum time:     9.734 ms (0.00% GC)
  median time:      10.579 ms (0.00% GC)
  mean time:        10.938 ms (6.19% GC)
  maximum time:     47.785 ms (80.36% GC)
  --------------
  samples:          457
  evals/sample:     1

In [8]:
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b

BenchmarkTools.Trial: 
  memory estimate:  23.86 KiB
  allocs estimate:  5
  --------------
  minimum time:     14.103 μs (0.00% GC)
  median time:      16.662 μs (0.00% GC)
  mean time:        24.038 μs (29.14% GC)
  maximum time:     48.492 ms (99.95% GC)
  --------------
  samples:          10000
  evals/sample:     1

## Triangular matrix

Triangular $\mathbf{A}$: $n^2$ flops to solve a linear system.

In [9]:
Random.seed!(280)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b

BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     747.004 μs (0.00% GC)
  median time:      874.891 μs (0.00% GC)
  mean time:        910.269 μs (0.00% GC)
  maximum time:     2.001 ms (0.00% GC)
  --------------
  samples:          5411
  evals/sample:     1

In [10]:
# triangular solve directly; save the cost of istril()
@benchmark LowerTriangular($A) \ $b

BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     294.950 μs (0.00% GC)
  median time:      325.817 μs (0.00% GC)
  mean time:        334.716 μs (0.00% GC)
  maximum time:     818.166 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

## Block diagonal matrix

Block diagonal: Suppose $n = \sum_b n_b$. Solving linear equations costs on the order of $(\sum_b n_b)^3$ flops without using the block diagonal structure vs $\sum_b n_b^3$ flops using it.

Julia has a [`blockdiag`](https://docs.julialang.org/en/v1/stdlib/SparseArrays/#SparseArrays.blockdiag) function that generates a **sparse** matrix. **Anyone interested in writing a `BlockDiagonal.jl` package?**
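
A minimal sketch of exploiting the block structure by hand, assuming dense diagonal blocks kept in a plain vector. The helper `blockdiag_solve` and the block sizes are hypothetical names chosen for illustration; they are not part of any package.

```julia
using LinearAlgebra, Random

Random.seed!(280)
nb     = [3, 4, 2]                          # block sizes n_b (hypothetical)
blocks = [randn(k, k) + k * I for k in nb]  # dense diagonal blocks
b      = randn(sum(nb))

# Solve each diagonal block against its own slice of b: sum_b n_b^3 work
function blockdiag_solve(blocks, b)
    x = similar(b)
    offset = 0
    for Ab in blocks
        idx = (offset + 1):(offset + size(Ab, 1))
        x[idx] = Ab \ b[idx]
        offset += size(Ab, 1)
    end
    return x
end

x = blockdiag_solve(blocks, b)

# dense reference: assemble the full block diagonal matrix and compare
Afull = cat(blocks...; dims = (1, 2))
norm(Afull * x - b)
```

The blocks are independent of each other, so the loop could also be run in parallel.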

In [11]:
using SparseArrays

Random.seed!(280)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)

1000×1000 SparseMatrixCSC{Float64,Int64} with 969 stored entries:
  [31  ,    1]  =  2.07834
  [53  ,    1]  =  -1.11883
  [58  ,    1]  =  -0.66448
  [14  ,    4]  =  1.11793
  [96  ,    5]  =  1.22813
  [81  ,    8]  =  -0.919643
  [48  ,    9]  =  1.0185
  [49  ,    9]  =  -0.996332
  [15  ,   10]  =  1.30841
  [28  ,   10]  =  -0.818757
  [39  ,   11]  =  1.08248
  [82  ,   11]  =  -0.0102294
  ⋮
  [935 ,  987]  =  0.677319
  [956 ,  987]  =  -0.900804
  [967 ,  987]  =  -0.438788
  [971 ,  991]  =  0.176756
  [929 ,  992]  =  -1.17384
  [974 ,  993]  =  1.59235
  [967 ,  994]  =  0.542169
  [994 ,  995]  =  0.627832
  [998 ,  997]  =  0.60382
  [935 ,  998]  =  0.342675
  [947 ,  998]  =  0.482228
  [975 , 1000]  =  0.991598

In [12]:
using UnicodePlots
spy(A)

[Plot: Sparsity Pattern; legend: > 0, < 0]

## Kronecker product

Use
$$
\begin{eqnarray*}
    (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\
    (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\
    \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p}
\end{eqnarray*}    
$$
to avoid forming and doing costly computation on the potentially huge Kronecker $\mathbf{A} \otimes \mathbf{B}$.

**Anyone interested in writing a package?**
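
As a hedged illustration of the second identity, the sketch below solves $(\mathbf{A} \otimes \mathbf{B}) \mathbf{x} = \mathbf{c}$ without forming the Kronecker product. Writing $\mathbf{x} = \text{vec}(\mathbf{X})$ and $\mathbf{c} = \text{vec}(\mathbf{C})$ with $\mathbf{X}, \mathbf{C} \in \mathbb{R}^{p \times m}$, the identity gives $\mathbf{B} \mathbf{X} \mathbf{A}^T = \mathbf{C}$. The function name `kron_solve` and the small sizes are assumptions made for the example; real-valued matrices are assumed.

```julia
using LinearAlgebra, Random

Random.seed!(280)
m, p = 4, 3
A = randn(m, m) + m * I
B = randn(p, p) + p * I
c = randn(m * p)

# (A ⊗ B) vec(X) = vec(B X Aᵀ), so solve B X Aᵀ = C for the p×m matrix X
function kron_solve(A, B, c)
    C = reshape(c, size(B, 1), size(A, 1))
    X = (B \ C) / A'     # two small solves instead of one (mp)×(mp) solve
    return vec(X)
end

x = kron_solve(A, B, c)

# reference: form the mp×mp Kronecker product explicitly
norm(kron(A, B) * x - c)
```

The two small solves cost on the order of $p^3 + m^3 + mp(m + p)$ flops, compared with $(mp)^3$ for a dense solve on the explicit Kronecker product.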

## Sparse matrix

Sparsity: use a sparse matrix decomposition or an iterative method.
* The most easily recognized structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (`Matrix` package), MKL (sparse BLAS), ... as much as possible.

In [13]:
using MatrixDepot

Random.seed!(280)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# random generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # proportion of nonzero entries

include group.jl for user defined matrix generators
verify download of index files...
used remote site is https://sparse.tamu.edu/?per_page=All
populating internal database...


0.001994776158751544

In [14]:
using UnicodePlots
spy(A)

[Plot: Sparsity Pattern; legend: > 0, < 0]

In [15]:
# dense matrix-vector multiplication
@benchmark $Afull * $b

BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     18.410 ms (0.00% GC)
  median time:      19.803 ms (0.00% GC)
  mean time:        19.930 ms (0.00% GC)
  maximum time:     26.712 ms (0.00% GC)
  --------------
  samples:          251
  evals/sample:     1

In [16]:
# sparse matrix-vector multiplication
@benchmark $A * $b

BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     109.226 μs (0.00% GC)
  median time:      147.636 μs (0.00% GC)
  mean time:        163.660 μs (6.25% GC)
  maximum time:     4.575 ms (95.78% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [17]:
# solve via dense Cholesky
xchol = cholesky(Afull) \ b
@benchmark cholesky($Afull) \ $b

BenchmarkTools.Trial: 
  memory estimate:  452.52 MiB
  allocs estimate:  8
  --------------
  minimum time:     1.711 s (1.55% GC)
  median time:      1.769 s (1.50% GC)
  mean time:        1.760 s (2.15% GC)
  maximum time:     1.798 s (4.19% GC)
  --------------
  samples:          3
  evals/sample:     1

In [18]:
# solve via sparse Cholesky
xcholsp = cholesky(A) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($A) \ $b

norm(xchol - xcholsp) = 3.7385578057004605e-15


BenchmarkTools.Trial: 
  memory estimate:  13.45 MiB
  allocs estimate:  55
  --------------
  minimum time:     14.926 ms (0.00% GC)
  median time:      17.012 ms (4.32% GC)
  mean time:        16.666 ms (2.68% GC)
  maximum time:     20.364 ms (3.64% GC)
  --------------
  samples:          300
  evals/sample:     1

In [19]:
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)

norm(xcg - xchol) = 2.1854385431265016e-7


BenchmarkTools.Trial: 
  memory estimate:  302.20 KiB
  allocs estimate:  23
  --------------
  minimum time:     29.848 ms (0.00% GC)
  median time:      33.203 ms (0.00% GC)
  mean time:        33.728 ms (0.11% GC)
  maximum time:     39.849 ms (0.00% GC)
  --------------
  samples:          149
  evals/sample:     1

## Easy plus low rank

Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula:
\begin{eqnarray*}
	(\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\
    \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}).
\end{eqnarray*}

* Keep HW2 Q2 in mind.  
* [`WoodburyMatrices.jl`](https://github.com/timholy/WoodburyMatrices.jl) package can be useful.
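
A hand-rolled sketch of the Woodbury solve using only `LinearAlgebra`, assuming $\mathbf{A}$ is diagonal and $\mathbf{U}$, $\mathbf{V}$ are both $n \times r$. The name `woodbury_solve` is hypothetical, and the choice $\mathbf{V} = \mathbf{U} \mathbf{D}$ with positive diagonal $\mathbf{D}$ is only there to guarantee that the example system is nonsingular.

```julia
using LinearAlgebra, Random

Random.seed!(280)
n, r = 1000, 5
A = Diagonal(rand(n) .+ 1)            # easy part: positive diagonal
U = randn(n, r)
V = U * Diagonal(rand(r) .+ 0.5)      # V = U * D, so A + U * V' is pd
b = randn(n)

# x = A⁻¹b - A⁻¹U (I_r + VᵀA⁻¹U)⁻¹ VᵀA⁻¹b
function woodbury_solve(A, U, V, b)
    Ainvb = A \ b                     # O(n) when A is diagonal
    AinvU = A \ U                     # n×r
    cap   = I + V' * AinvU            # r×r capacitance matrix
    return Ainvb - AinvU * (cap \ (V' * Ainvb))
end

x = woodbury_solve(A, U, V, b)

# residual against the explicitly formed n×n matrix
norm((A + U * V') * x - b)
```

Only an $r \times r$ system is factorized, so the cost is $O(n r^2)$ rather than $O(n^3)$.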

In [20]:
using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(280)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Symmetric(Matrix(W)) # stored as a Matrix{Float64}

1000×1000 Symmetric{Float64,Array{Float64,2}}:
  1.8571     0.513107    0.872146   …   0.764278    -0.241331    0.54921 
  0.513107   4.57505    -0.636972      -1.86465     -1.92237    -1.72569 
  0.872146  -0.636972    4.81387        1.99357      1.99337     3.66327 
 -0.516414  -0.996711   -0.0919924      0.262832     0.612402    0.621834
  0.193686   1.68244    -0.770028      -0.723437    -1.4868     -1.32247 
  1.6567     0.0634435  -0.901968   …  -0.241872    -0.0356772  -0.39826 
  0.553996  -0.274515    2.21265        0.219437     2.20382     2.60902 
  0.402356   1.89288    -1.13032       -0.771441    -1.96862    -1.93483 
 -1.07744   -1.63881     1.78016        0.96551      1.7292      1.91326 
 -2.21617   -2.90695    -2.55971       -0.47867      0.855389   -0.933916
  1.29975    0.779828    4.12459    …   1.87358      0.737112    2.84136 
 -0.80833    1.44882     1.67581       -0.139063    -0.107873    0.818132
 -2.32469   -4.83109    -2.31796       -0.0346402    2.65564     

In [21]:
# compare storage
Base.summarysize(W), Base.summarysize(Wfull)

(48200, 8000056)

In [22]:
# solve via Cholesky
@benchmark cholesky($Wfull) \ $b

BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  7
  --------------
  minimum time:     6.363 ms (0.00% GC)
  median time:      7.741 ms (0.00% GC)
  mean time:        7.891 ms (7.37% GC)
  maximum time:     11.226 ms (16.99% GC)
  --------------
  samples:          633
  evals/sample:     1

In [23]:
# solve using Woodbury formula
@benchmark $W \ $b

BenchmarkTools.Trial: 
  memory estimate:  75.45 KiB
  allocs estimate:  24
  --------------
  minimum time:     21.169 μs (0.00% GC)
  median time:      44.107 μs (0.00% GC)
  mean time:        53.582 μs (16.66% GC)
  maximum time:     3.355 ms (97.85% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [24]:
# multiplication without using Woodbury structure
@benchmark $Wfull * $b

BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     32.940 μs (0.00% GC)
  median time:      36.441 μs (0.00% GC)
  mean time:        42.473 μs (0.00% GC)
  maximum time:     286.755 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [25]:
# multiplication using Woodbury structure
@benchmark $W * $b

BenchmarkTools.Trial: 
  memory estimate:  24.06 KiB
  allocs estimate:  5
  --------------
  minimum time:     3.303 μs (0.00% GC)
  median time:      4.979 μs (0.00% GC)
  mean time:        7.868 μs (31.06% GC)
  maximum time:     676.630 μs (97.59% GC)
  --------------
  samples:          10000
  evals/sample:     8

In [26]:
# determinant without using Woodbury structure
@benchmark det($Wfull)

BenchmarkTools.Trial: 
  memory estimate:  8.13 MiB
  allocs estimate:  7
  --------------
  minimum time:     12.604 ms (0.00% GC)
  median time:      14.318 ms (0.00% GC)
  mean time:        14.982 ms (3.26% GC)
  maximum time:     48.041 ms (0.00% GC)
  --------------
  samples:          334
  evals/sample:     1

In [27]:
# determinant using Woodbury structure
@benchmark det($W)

BenchmarkTools.Trial: 
  memory estimate:  138.05 KiB
  allocs estimate:  28
  --------------
  minimum time:     28.124 μs (0.00% GC)
  median time:      89.737 μs (0.00% GC)
  mean time:        111.692 μs (19.05% GC)
  maximum time:     4.253 ms (97.56% GC)
  --------------
  samples:          10000
  evals/sample:     1

## Easy plus border

Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank,
$$
	\begin{pmatrix}
	\mathbf{A} & \mathbf{V}^T \\
	\mathbf{V} & \mathbf{0}
	\end{pmatrix}^{-1} = \begin{pmatrix}
	\mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\
	(\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1}
	\end{pmatrix}.
$$
**Anyone interested in writing a package?**
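
A sketch of solving the bordered system by block elimination rather than by forming the inverse, assuming a diagonal pd $\mathbf{A}$ and right-hand side $(\mathbf{b}, \mathbf{c})$. Eliminating $\mathbf{x}$ gives $(\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T) \mathbf{y} = \mathbf{V} \mathbf{A}^{-1} \mathbf{b} - \mathbf{c}$ and then $\mathbf{x} = \mathbf{A}^{-1} (\mathbf{b} - \mathbf{V}^T \mathbf{y})$. The function name `bordered_solve` and the sizes are assumptions for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(280)
n, r = 500, 3
A = Diagonal(rand(n) .+ 1)     # pd and easy
V = randn(r, n)                # full row rank (almost surely)
b, c = randn(n), randn(r)

# Solve [A Vᵀ; V 0] [x; y] = [b; c] using only solves with A and an r×r system
function bordered_solve(A, V, b, c)
    Ainvb  = A \ b
    AinvVt = A \ Matrix(V')            # n×r
    S      = V * AinvVt                # r×r matrix V A⁻¹ Vᵀ
    y      = S \ (V * Ainvb - c)
    x      = Ainvb - AinvVt * y
    return x, y
end

x, y = bordered_solve(A, V, b, c)

# reference: the explicit (n + r)×(n + r) bordered matrix
K = [Matrix(A) V'; V zeros(r, r)]
norm(K * [x; y] - [b; c])
```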

## Orthogonal matrix

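Orthogonal $\mathbf{A}$: $n^2$ flops **at most**. Why? Because $\mathbf{A}^{-1} = \mathbf{A}^T$, solving reduces to the matrix-vector product $\mathbf{x} = \mathbf{A}^T \mathbf{b}$. Permutation matrices, Householder matrices, Jacobi matrices, ... take less.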
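
A small sketch of both cases: a dense orthogonal matrix, where the solve is one multiplication by the transpose, and a Householder reflector $\mathbf{H} = \mathbf{I} - 2 \mathbf{u} \mathbf{u}^T$ with $\|\mathbf{u}\|_2 = 1$, where $\mathbf{H}^{-1} = \mathbf{H}$ can be applied in $O(n)$ flops. The variable names are chosen for illustration only.

```julia
using LinearAlgebra, Random

Random.seed!(280)
n = 200
Q = Matrix(qr(randn(n, n)).Q)   # a dense orthogonal matrix
b = randn(n)

# Q⁻¹ = Qᵀ, so solving Q x = b is a single matrix-vector multiply
x = Q' * b

# Householder reflector H = I - 2uuᵀ: H⁻¹ = H, applied without forming H
u  = normalize(randn(n))
xh = b - 2 * u * dot(u, b)      # O(n) flops

# residuals against the explicit matrices
H = I - 2 * u * u'
norm(Q * x - b), norm(H * xh - b)
```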

## Toeplitz matrix

Toeplitz systems:
$$
	\mathbf{T} = \begin{pmatrix}
	r_0 & r_1 & r_2 & r_3 \\
	r_{-1} & r_0 & r_1 & r_2 \\
	r_{-2} & r_{-1} & r_0 & r_1 \\
	r_{-3} & r_{-2} & r_{-1} & r_0
	\end{pmatrix}.
$$
$\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general $\mathbf{b}$), and the Trench algorithm (inverse). These matrices occur in auto-regressive models and econometrics.

* [`ToeplitzMatrices.jl`](https://github.com/JuliaMatrices/ToeplitzMatrices.jl) package can be useful.
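
A sketch of the Durbin recursion for the Yule-Walker system $\mathbf{T}_n \mathbf{y} = -(r_1, \ldots, r_n)^T$, assuming a symmetric pd Toeplitz matrix normalized so that $r_0 = 1$. The function name `durbin` and the example autocorrelations are assumptions for illustration; this does not use `ToeplitzMatrices.jl`.

```julia
using LinearAlgebra

# Durbin recursion: r = (r_1, ..., r_n) with r_0 = 1 implied.
# Returns y with T y = -r, where T[i, j] = r_{|i - j|}, in O(n^2) flops.
function durbin(r::AbstractVector{<:Real})
    n = length(r)
    y = zeros(float(eltype(r)), n)
    y[1] = -r[1]
    β, α = 1.0, -r[1]
    for k in 1:(n - 1)
        β *= 1 - α^2
        α = -(r[k + 1] + dot(view(r, k:-1:1), view(y, 1:k))) / β
        y[1:k] = y[1:k] + α * y[k:-1:1]   # right-hand side is formed before assignment
        y[k + 1] = α
    end
    return y
end

# example: a mixture of two AR(1) autocorrelation sequences (pd, r_0 = 1)
n = 6
r = [0.5 * 0.5^k + 0.5 * 0.8^k for k in 1:n]
y = durbin(r)

# dense reference Toeplitz matrix
T = [i == j ? 1.0 : r[abs(i - j)] for i in 1:n, j in 1:n]
norm(T * y + r)
```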

## Circulant matrix

Circulant systems: Toeplitz matrix with wraparound
$$
	C(\mathbf{z}) = \begin{pmatrix}
	z_0 & z_4 & z_3 & z_2 & z_1 \\
	z_1 & z_0 & z_4 & z_3 & z_2 \\
	z_2 & z_1 & z_0 & z_4 & z_3 \\
	z_3 & z_2 & z_1 & z_0 & z_4 \\
	z_4 & z_3 & z_2 & z_1 & z_0
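	\end{pmatrix}.
$$
Circulant matrices are diagonalized by the discrete Fourier transform, so FFT-type algorithms solve these systems in $O(n \log n)$ flops. Related transforms include the DCT (discrete cosine transform) and DST (discrete sine transform).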
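
A self-contained sketch of a circulant solve through the discrete Fourier transform: since $C(\mathbf{z}) \mathbf{x}$ is the circular convolution of $\mathbf{z}$ and $\mathbf{x}$, the transform of $\mathbf{x}$ is the elementwise quotient of the transforms of $\mathbf{b}$ and $\mathbf{z}$. The radix-2 helper `fft_radix2` is a hypothetical stand-in written for this example and requires the length to be a power of 2; in practice one would presumably use `fft` and `ifft` from the `FFTW.jl` package instead.

```julia
using LinearAlgebra

# Recursive radix-2 FFT; the length of x must be a power of 2.
function fft_radix2(x::Vector{ComplexF64})
    n = length(x)
    n == 1 && return x
    even = fft_radix2(x[1:2:end])
    odd  = fft_radix2(x[2:2:end])
    w = [exp(-2π * im * k / n) for k in 0:(n ÷ 2 - 1)]   # twiddle factors
    return vcat(even .+ w .* odd, even .- w .* odd)
end

# Inverse transform via conjugation: ifft(X) = conj(fft(conj(X))) / n
ifft_radix2(X::Vector{ComplexF64}) = conj.(fft_radix2(conj.(X))) ./ length(X)

# Solve C(z) x = b, where z is the first column of the circulant matrix
function circulant_solve(z::Vector{Float64}, b::Vector{Float64})
    λ = fft_radix2(complex(z))                 # eigenvalues of C(z)
    return real(ifft_radix2(fft_radix2(complex(b)) ./ λ))
end

n = 8
z = [4.0, 1.0, 0.5, 0.2, 0.1, 0.3, 0.6, 0.9]  # diagonally dominant, so C(z) is nonsingular
b = collect(1.0:n)
x = circulant_solve(z, b)

# dense reference: C[i, j] = z[(i - j) mod n], using 0-based indices into z
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]
norm(C * x - b)
```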

## Vandermonde matrix

Vandermonde matrix: arises in interpolation and approximation problems
$$
	\mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix}
	1 & 1 & \cdots & 1 \\
	x_0 & x_1 & \cdots & x_n \\
	\vdots & \vdots & & \vdots \\
	x_0^n & x_1^n & \cdots & x_n^n
	\end{pmatrix}.
$$
$\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
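
A sketch of an $O(n^2)$ solve for the transposed system $\mathbf{V}^T \mathbf{a} = \mathbf{f}$, which is polynomial interpolation: find coefficients $a_0, \ldots, a_n$ with $\sum_j a_j x_i^j = f_i$. It follows the Björck-Pereyra approach (divided differences, then conversion from Newton to monomial form) and assumes distinct nodes; the function name `vandermonde_solve_t` is hypothetical.

```julia
using LinearAlgebra

# Solve Vᵀ a = f for the monomial coefficients a of the interpolating polynomial.
# x holds the distinct nodes x_0, ..., x_n; f holds the values at those nodes.
function vandermonde_solve_t(x::AbstractVector, f::AbstractVector)
    m = length(x)
    a = float.(f)                      # work on a floating-point copy
    # stage 1: divided differences (Newton coefficients)
    for k in 1:(m - 1), i in m:-1:(k + 1)
        a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - k])
    end
    # stage 2: convert from Newton form to monomial coefficients
    for k in (m - 1):-1:1, i in k:(m - 1)
        a[i] -= a[i + 1] * x[k]
    end
    return a
end

xs = [-1.0, 0.0, 0.5, 2.0, 3.0]
fs = [2.0, -1.0, 0.3, 4.0, 1.5]
a  = vandermonde_solve_t(xs, fs)

# dense reference: Vᵀ has rows (1, x_i, x_i^2, ..., x_i^n)
Vt = [xi^j for xi in xs, j in 0:(length(xs) - 1)]
norm(Vt * a - fs)
```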

## Cauchy-like matrix

Cauchy-like matrices:
$$
	\Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T,
$$
where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$ and $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$. $O(n^2)$ flops for LU and QR.
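
To make the displacement equation concrete, the sketch below builds the simplest instance, with rank-one $\mathbf{R} = \mathbf{r}$ and $\mathbf{S} = \mathbf{s}$, so that $a_{ij} = r_i s_j / (\omega_i - \lambda_j)$, and checks the equation numerically. The numbers are arbitrary illustrative values with $\omega_i \ne \lambda_j$.

```julia
using LinearAlgebra

ω = [1.0, 2.0, 3.0]
λ = [-1.0, -2.0, -3.0]       # ω_i ≠ λ_j for all i, j
r = [1.0, 0.5, 2.0]
s = [2.0, 1.0, -1.0]

# Cauchy-like matrix with displacement rank one
A = [r[i] * s[j] / (ω[i] - λ[j]) for i in 1:3, j in 1:3]

# entry (i, j) of Ω A - A Λ is (ω_i - λ_j) a_ij = r_i s_j
displacement = Diagonal(ω) * A - A * Diagonal(λ)
norm(displacement - r * s')
```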

## Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...