# 1. LU Decomposition using Low-Level LAPACK Functions for Band Matrices (```linalg.get_lapack_funcs```)

* ```gbtrf``` LU decomposition $PA = LU$

```python
gbtrf = linalg.get_lapack_funcs("gbtrf", dtype = np.float64)
LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)
```

* ```LU_band``` - $L$ and $U$ are in band forms and stored in a single matrix
* ```A_band_LU``` - the band matrix, stored in a layout that differs from the band forms covered so far (explained below)
* ```lbw``` - lower band width
* ```ubw``` - upper band width
* ```piv``` - 1D array: row interchange information
* ```info``` - info = 0: success; info > 0: singular matrix (a diagonal entry of $U$ is exactly zero); info < 0: invalid input argument
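
The return values above can be checked as in the following minimal sketch. The helper name `check_info` is hypothetical (it is not part of SciPy), and the 4x4 tridiagonal test matrix is only an illustration.

```python
import numpy as np
from scipy import linalg

def check_info(info, routine="gbtrf"):
    """Hypothetical helper: turn a LAPACK info code into a Python exception."""
    if info < 0:
        raise ValueError(f"{routine}: argument {-info} had an illegal value")
    if info > 0:
        # LAPACK reports the (1-based) index of the zero pivot
        raise np.linalg.LinAlgError(f"{routine}: zero pivot at position {info} (1-based)")

# Illustration: 4x4 tridiagonal matrix (lbw = ubw = 1), 2 on the diagonal, -1 off it
lbw, ubw, n = 1, 1, 4
A_band_LU = np.zeros((2 * lbw + ubw + 1, n))
A_band_LU[lbw, 1:] = -1.0               # superdiagonal
A_band_LU[lbw + ubw, :] = 2.0           # main diagonal
A_band_LU[lbw + ubw + 1, :-1] = -1.0    # subdiagonal

gbtrf = linalg.get_lapack_funcs("gbtrf", dtype=np.float64)
LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)
check_info(info)                        # raises only if the factorization failed
print(LU_band.shape, piv.shape, info)
```

Raising an exception instead of silently carrying a nonzero `info` forward is a design choice for this sketch; the raw LAPACK wrappers themselves only return the code.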

Example.

This is the original matrix we are trying to decompose using LU decomposition.

$$
A = 
\begin{bmatrix}
a_{00} & a_{01} & 0 & 0 & 0\\
a_{10} & a_{11} & a_{12} & 0 & 0 \\
a_{20} & a_{21} & a_{22} & a_{23} & 0 \\
0 & a_{31} & a_{32} & a_{33} & a_{34} \\
0 & 0 & a_{42} & a_{43} & a_{44}
\end{bmatrix}
$$

```lbw``` = 2, ```ubw``` = 1

Now, we need to form a band matrix from the original matrix to use as input.

Band matrix we are familiar with:

$$
\tilde{A} = 
\begin{bmatrix}
0 & a_{01} & a_{12} & a_{23} & a_{34} \\
a_{00} & a_{11} & a_{22} & a_{33} & a_{44} \\
a_{10} & a_{21} & a_{32} & a_{43} & 0\\
a_{20} & a_{31} & a_{42} &0    & 0
\end{bmatrix}
$$

$\therefore$ row size of $\tilde{A}$ = ```lbw``` + ```ubw``` + 1

However, ```gbtrf``` expects a different band layout: a dummy 2D array with ```lbw``` rows is stacked on top of $\tilde{A}$. The entries marked $x$ need not be set on input; ```gbtrf``` uses them for the fill-in that row interchanges produce in $U$.

$$
\tilde{A'} = 
\begin{bmatrix}
0 & 0 & 0 & x & x \\
0 & 0 & x & x & x \\
0 & a_{01} & a_{12} & a_{23} & a_{34} \\
a_{00} & a_{11} & a_{22} & a_{33} & a_{44} \\
a_{10} & a_{21} & a_{32} & a_{43} & 0\\
a_{20} & a_{31} & a_{42} &0    & 0
\end{bmatrix}
$$

$\therefore$ row size of $\tilde{A'}$ = ```lbw``` + ```ubw``` + 1 + ```lbw```

In Python, suppose the band matrix in the familiar form $\tilde{A}$ is stored as ```A_band```. Then $\tilde{A'}$, stored as ```A_band_LU```, can be constructed as follows:

```python
dummy_array = np.zeros((lbw, A_band.shape[1]))  # lbw extra rows of zeros
A_band_LU = np.vstack((dummy_array, A_band))    # dummy rows go on top
```

```A_band_LU``` is the input variable for ```gbtrf``` function we have seen above.
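
If only the full matrix is at hand, the same layout can be built directly from it. The following is a sketch under assumptions: the helper name `dense_to_gbtrf_band` is hypothetical, and the test matrix is filled with the values `100 + 10*i + j` so that the last two digits of each entry show its $(i, j)$ index, mirroring the symbolic $\tilde{A'}$ above.

```python
import numpy as np

def dense_to_gbtrf_band(A, lbw, ubw):
    """Hypothetical helper: pack a square dense band matrix into the gbtrf layout."""
    n = A.shape[0]
    ab = np.zeros((2 * lbw + ubw + 1, n), dtype=A.dtype)
    for k in range(-lbw, ubw + 1):      # k > 0: superdiagonal, k < 0: subdiagonal
        d = np.diag(A, k)               # the k-th diagonal of A
        row = lbw + ubw - k             # the lbw dummy rows sit above the band
        if k >= 0:
            ab[row, k:] = d             # superdiagonals are padded on the left
        else:
            ab[row, :n + k] = d         # subdiagonals are padded on the right
    return ab

# 5x5 band matrix with lbw = 2, ubw = 1; entry (i, j) holds 100 + 10*i + j
i, j = np.indices((5, 5))
A = np.where((j - i <= 1) & (i - j <= 2), 100 + 10 * i + j, 0)

ab = dense_to_gbtrf_band(A, lbw=2, ubw=1)
print(ab)
```

The first two rows of the result are the zero-filled dummy rows; the remaining four rows are the familiar band form.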

What does the **output** look like?

$$
\tilde{LU} = 
\begin{bmatrix}
0 & 0 & 0 & u_{03} & u_{14} \\
0 & 0 & u_{02} & u_{13} & u_{24} \\
0 & u_{01} & u_{12} & u_{23} & u_{34} \\
u_{00} & u_{11} & u_{22} & u_{33} & u_{44} \\
l_{10} & l_{21} & l_{32} & l_{43} & 0 \\
l_{20} & l_{31} & l_{42} & 0 & 0
\end{bmatrix}
$$

The entries of $L$ and $U$ are packed into a single matrix, $\tilde{LU}$.

> Remark. SciPy's high-level functions leave the input $A$ intact. In contrast, LAPACK overwrites the input $\tilde{A'}$ with $\tilde{LU}$, so the input must already have room for the extra rows of $U$. That is why we went to the trouble of changing the layout of the input band matrix.

How to reconstruct $L$ and $U$?

$$
\tilde{LU} = 
\begin{bmatrix}
0 & 0 & 0 & u_{03} & u_{14} \\
0 & 0 & u_{02} & u_{13} & u_{24} \\
0 & u_{01} & u_{12} & u_{23} & u_{34} \\
u_{00} & u_{11} & u_{22} & u_{33} & u_{44} \\
l_{10} & l_{21} & l_{32} & l_{43} & 0 \\
l_{20} & l_{31} & l_{42} & 0 & 0
\end{bmatrix} \; \Rightarrow \quad
U = 
\begin{bmatrix}
u_{00} & u_{01} & u_{02} & u_{03} & 0 \\
0 & u_{11} & u_{12} & u_{13} & u_{14} \\
0 & 0 & u_{22} & u_{23} & u_{24} \\
0 & 0 & 0 & u_{33} & u_{34} \\
0 & 0 & 0 & 0 & u_{44}
\end{bmatrix}
$$

So reconstructing $U$ is fairly straightforward: the diagonal row of $\tilde{LU}$ and each row above it hold one diagonal of $U$.
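
As a rough sketch, the extraction of $U$ can be written as a loop over diagonals. It assumes the numeric ```A_band``` used in the example further below (```lbw``` = 2, ```ubw``` = 1); the variable names `row` and `k` are only illustrative.

```python
import numpy as np
from scipy import linalg

lbw, ubw = 2, 1
A_band = np.array([[0, 2, 2, 2, 1],
                   [1, 1, 1, 1, 1],
                   [1, 1, 1, 1, 0],
                   [2, 2, 2, 0, 0]], dtype=np.float64)
A_band_LU = np.vstack((np.zeros((lbw, 5)), A_band))

gbtrf = linalg.get_lapack_funcs("gbtrf", dtype=np.float64)
LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)

n = LU_band.shape[1]
U = np.zeros((n, n))
for k in range(lbw + ubw + 1):      # k-th superdiagonal of U (k = 0 is the main diagonal)
    row = lbw + ubw - k             # row of LU_band that stores this diagonal
    U += np.diag(LU_band[row, k:], k)
print(U)
```

Note that $U$ has ```lbw``` + ```ubw``` superdiagonals, more than the ```ubw``` superdiagonals of $A$, because of the row interchanges.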

The hard part is reconstructing $L$. We assume ```piv = [ 2, 3, 4, 3, 4]```.

$$
L_0 = 
\begin{bmatrix}
- &  &  &  &  \\
l_{10}& - &  &  &  \\
l_{20}& l_{21}  & -  &  &  \\
& l_{31}  & l_{32}  & - &  \\
&   & l_{42}  & l_{43}  & -  \\
\end{bmatrix}
$$

Each row interchange $k \leftrightarrow$ ```piv[k]``` is applied **only to the entries below the diagonal**, that is, only to the columns left of column $k$:

0 <-> 2\
1 <-> 3\
2 <-> 4\
3 <-> 3\
4 <-> 4

(0 <-> 2)
If this row interchange were made, $\begin{bmatrix} l_{20}& l_{21} \end{bmatrix}$ would end up on or above the diagonal. There are no columns left of column 0, so do **not** interchange.

(1 <-> 3) Only the entries left of column 1 take part: $\begin{bmatrix} l_{10} \end{bmatrix}$ moves from row 1 to row 3, while $\begin{bmatrix} l_{31}& l_{32} \end{bmatrix}$ cannot move up to row 1 and stays in place.

$$
L_1 = 
\begin{bmatrix}
- &  &  &  &  \\
& - &  &  &  \\
l_{20}& l_{21}  & -  &  &  \\
l_{10} & l_{31}  & l_{32}  & - &  \\
&   & l_{42}  & l_{43}  & -  \\
\end{bmatrix}
$$

(2 <-> 4) Only the entries left of column 2 take part: $\begin{bmatrix} l_{20} & l_{21} \end{bmatrix}$ moves from row 2 to row 4, while $\begin{bmatrix} l_{42}& l_{43} \end{bmatrix}$ cannot move up to row 2 and stays in place.

$$
L_2 = 
\begin{bmatrix}
- &  &  &  &  \\
& - &  &  &  \\
&   & -  &  &  \\
l_{10} & l_{31}  & l_{32}  & - &  \\
l_{20}& l_{21}   & l_{42}  & l_{43}  & -  \\
\end{bmatrix}
$$

(3 <-> 3), (4 <-> 4) $\rightarrow$ no interchange.

At the end, add diagonal entries of $1$. This is $L$.

$$
L = 
\begin{bmatrix}
1 &  &  &  &  \\
& 1 &  &  &  \\
&   & 1  &  &  \\
l_{10} & l_{31}  & l_{32}  & 1 &  \\
l_{20}& l_{21}   & l_{42}  & l_{43}  & 1  \\
\end{bmatrix}
$$

The original matrix is recovered as $A = P^\top LU$, where the permutation matrix $P$ can be computed from ```piv```.
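
The whole walkthrough can be replayed in code. This is a sketch for illustration only, assuming 0-based entries in ```piv``` (as in ```piv = [2, 3, 4, 3, 4]``` above). The matrix `A` is the dense matrix that corresponds to the numeric ```A_band``` example below, and the names `perm` and `m` are hypothetical.

```python
import numpy as np
from scipy import linalg

lbw, ubw = 2, 1
A = np.array([[1, 2, 0, 0, 0],
              [1, 1, 2, 0, 0],
              [2, 1, 1, 2, 0],
              [0, 2, 1, 1, 1],
              [0, 0, 2, 1, 1]], dtype=np.float64)
n = A.shape[0]

# Pack A into the gbtrf layout: row (lbw + ubw - k) holds the k-th diagonal
A_band_LU = np.zeros((2 * lbw + ubw + 1, n))
for k in range(-lbw, ubw + 1):
    A_band_LU[lbw + ubw - k, max(k, 0):n + min(k, 0)] = np.diag(A, k)

gbtrf = linalg.get_lapack_funcs("gbtrf", dtype=np.float64)
LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)

# U: main diagonal plus (lbw + ubw) superdiagonals
U = sum(np.diag(LU_band[lbw + ubw - k, k:], k) for k in range(lbw + ubw + 1))

# L and P: replay the interchanges in order
L = np.zeros((n, n))
perm = np.arange(n)                   # perm[i] = row of A that ends up in row i of PA
for k in range(n):
    p = piv[k]
    L[[k, p], :k] = L[[p, k], :k]     # swap only the columns left of column k
    perm[[k, p]] = perm[[p, k]]
    m = min(lbw, n - 1 - k)           # number of multipliers stored for column k
    L[k + 1:k + 1 + m, k] = LU_band[lbw + ubw + 1:lbw + ubw + 1 + m, k]
L += np.identity(n)                   # unit diagonal
P = np.identity(n)[perm]              # P @ A == A[perm]

print(np.allclose(P @ A, L @ U), np.allclose(A, P.T @ L @ U))
```

Building dense $L$, $U$ and $P$ costs $O(n^2)$ memory, so this is useful for checking the factorization on small matrices rather than for solving.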

> Note that we walked through the reconstruction for educational purposes only; we do not actually do this when solving matrix equations. Otherwise, it would defeat the whole purpose of going to great lengths to use the LAPACK functions for LU decomposition of a band matrix in the first place.

In [1]:
import numpy as np
from scipy import linalg

Example.

$$
A =
\begin{bmatrix}
1 & 2 & & & \\
1 & 1 & 2 & & \\
2 & 1 & 1 & 2 & \\
 & 2 & 1 & 1 & 1 \\
 &  & 2 & 1 & 1 \\
\end{bmatrix} \; \Rightarrow \;
\tilde{A'} =
\begin{bmatrix}
 &  &  & + & + \\
 &  & + & + & + \\
0 & 2 & 2 & 2 & 1 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 \\
2 & 2 & 2 & 0 & 0 \\
\end{bmatrix}
$$

```lbw``` = 2, ```ubw``` = 1 (the $+$ entries mark the positions in the dummy rows that ```gbtrf``` may fill in)

In [2]:
""" constructing input variable """
lbw = 2
ubw = 1

# Band Matrix we are familiar with
A_band = np.array([
    [0, 2, 2, 2, 1], #ub1
    [1, 1, 1, 1, 1], #diag
    [1, 1, 1, 1, 0], #lb1 
    [2, 2, 2, 0, 0]  #lb2
])

# Dummy array
dummy_array = np.zeros((lbw, A_band.shape[1]), dtype=np.float64)

# Band matrix with dummy array stacked on top -> input variable
A_band_LU = np.vstack((dummy_array, A_band))

In [3]:
# input for gbtrf
print(A_band_LU)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 2. 2. 2. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 0.]
 [2. 2. 2. 0. 0.]]


In [4]:
""" LAPACK function: gbtrf """

gbtrf = linalg.get_lapack_funcs("gbtrf", dtype=np.float64)

""" LU Decomposition with gbtrf """

LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)

In [5]:
""" Output """
# This is the LU-tilde we observed above.
print(LU_band)

[[ 0.     0.     0.     2.     1.   ]
 [ 0.     0.     1.     1.     1.   ]
 [ 0.     1.     1.     1.    -0.875]
 [ 2.     2.     2.    -1.875  0.4  ]
 [ 0.5    0.75   0.625  0.6    0.   ]
 [ 0.5    0.25  -0.625  0.     0.   ]]


# 2. LU Decomposition Solver using Low-Level LAPACK Functions for Band Matrices

* ```gbtrs``` LU decomposition Solver for Band Matrices

```python
gbtrs = linalg.get_lapack_funcs("gbtrs", dtype=np.float64)

# LU_band, piv are the results of "gbtrf"
# info = 0: normal, info < 0: wrong input
x, info = gbtrs(LU_band, lbw, ubw, b, piv)
```

Example

$$
A =
\begin{bmatrix}
1 & 2 & & & \\
1 & 1 & 2 & & \\
2 & 1 & 1 & 2 & \\
 & 2 & 1 & 1 & 1 \\
 &  & 2 & 1 & 1 \\
\end{bmatrix}, \; \;
\mathbf{b} =
\begin{bmatrix}
1  \\
1  \\
1  \\
1  \\
1 \\
\end{bmatrix}
$$

$$
A \mathbf{x} = \mathbf{b}
$$

In [6]:
# original param
# lbw = 2
# ubw = 1

# gbtrf results
# LU_band
# piv

# b
b = np.ones((5, ), dtype=np.float64)

# gbtrs
gbtrs = linalg.get_lapack_funcs("gbtrs", dtype=np.float64)

# Matrix equation solution
x, info = gbtrs(LU_band, lbw, ubw, b, piv)
print(info) # should be 0
print(x)

0
[ 1.   0.   0.  -0.5  1.5]
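
The result can be cross-checked against SciPy's high-level band solver. This is a minimal sketch that rebuilds the same ```A_band``` and ```b``` so it runs on its own; `x_ref` is an illustrative name.

```python
import numpy as np
from scipy import linalg

lbw, ubw = 2, 1
A_band = np.array([[0, 2, 2, 2, 1],
                   [1, 1, 1, 1, 1],
                   [1, 1, 1, 1, 0],
                   [2, 2, 2, 0, 0]], dtype=np.float64)
b = np.ones(5)

# Factor and solve with the low-level LAPACK routines
gbtrf, gbtrs = linalg.get_lapack_funcs(("gbtrf", "gbtrs"), dtype=np.float64)
LU_band, piv, info = gbtrf(np.vstack((np.zeros((lbw, 5)), A_band)), lbw, ubw)
x, info = gbtrs(LU_band, lbw, ubw, b, piv)

# Reference: solve_banded takes the familiar band form (no dummy rows)
x_ref = linalg.solve_banded((lbw, ubw), A_band, b)
print(np.allclose(x, x_ref))
```

The benefit of the two-step ```gbtrf```/```gbtrs``` route is that ```LU_band``` and ```piv``` can be reused for further right-hand sides without factoring again.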


Example: Performance comparison - ```lu_factor```, ```lu_solve``` vs ```gbtrf```, ```gbtrs```

$$
A =
\begin{bmatrix}
5 & 1 &        &        &        \\
1 & 5 & 1     &        &        \\
   & 1 & 5     & \ddots &        \\
   &    & \ddots & \ddots & 1     \\
   &    &        & 1     & 5
\end{bmatrix}_{1000 \times 1000}, \; \;
\mathbf{b} =
\begin{bmatrix}
1  \\
1  \\
\vdots  \\
1 \\
\end{bmatrix}_{1000 \times 1}
$$

In [7]:
""" Full matrix """
off_diag = np.ones((999, ))
A_full = 5*np.identity(1000) + np.diag(off_diag, k=1) + np.diag(off_diag, k=-1)

In [8]:
""" Band matrix """
row0 = np.hstack((np.array([0]), off_diag))
row2 = np.hstack((off_diag, np.array([0])))
A_band = np.vstack((row0, 5*np.ones((1000,)), row2))

""" Band matrix with dummy array for gbtrf """
lbw = 1
ubw = 1
dummy_array = np.zeros((lbw, A_band.shape[1]), dtype=np.float64)
A_band_LU = np.vstack((dummy_array, A_band)) # input

In [9]:
""" define gbtrf & gbtrs"""

gbtrf = linalg.get_lapack_funcs("gbtrf", dtype=np.float64)
gbtrs = linalg.get_lapack_funcs("gbtrs", dtype=np.float64)

In [10]:
import timeit

In [11]:
b = np.ones((1000,), dtype=np.float64)

In [12]:
""" method 1. LAPACK (Band) """

start = timeit.default_timer()
LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)
x_band, info = gbtrs(LU_band, lbw, ubw, b, piv)
end = timeit.default_timer()

time_band = end-start
print(f'time: {time_band: .6f}')

time:  0.041245


In [13]:
""" method 2. lu_factor, lu_solve """

start = timeit.default_timer()
lu, piv = linalg.lu_factor(A_full)
x_full = linalg.lu_solve((lu, piv), b)
end = timeit.default_timer()

time_full = end-start
print(f'time: {time_full: .6f}')

time:  0.041858


In [14]:
# Sanity check
np.allclose(x_band, x_full)

True

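> Result: In this single run the two timings are nearly identical (0.041245 s for the band routines vs 0.041858 s for the full-matrix routines). A one-shot measurement at this size likely includes call overhead, so it understates the advantage of the band format; the gap becomes obvious for the larger matrix in the practice problem below.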
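
A single start/stop measurement is noisy. One possible way to get steadier numbers is to take the best of several repeats with ```timeit.repeat``` after a warm-up call. The sketch below rebuilds the same 1000 x 1000 problem; the wrapper functions `solve_band` and `solve_full` and the repeat counts are arbitrary choices for illustration.

```python
import timeit
import numpy as np
from scipy import linalg

n, lbw, ubw = 1000, 1, 1
off_diag = np.ones(n - 1)
A_full = 5 * np.identity(n) + np.diag(off_diag, k=1) + np.diag(off_diag, k=-1)
A_band_LU = np.vstack((np.zeros(n),                 # dummy row (lbw = 1)
                       np.hstack((0, off_diag)),    # superdiagonal
                       5 * np.ones(n),              # main diagonal
                       np.hstack((off_diag, 0))))   # subdiagonal
b = np.ones(n)
gbtrf, gbtrs = linalg.get_lapack_funcs(("gbtrf", "gbtrs"), dtype=np.float64)

def solve_band():
    LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)
    x, info = gbtrs(LU_band, lbw, ubw, b, piv)
    return x

def solve_full():
    return linalg.lu_solve(linalg.lu_factor(A_full), b)

# Warm-up calls are not timed; then take the best average over 5 batches of 10 calls
solve_band()
solve_full()
time_band = min(timeit.repeat(solve_band, number=10, repeat=5)) / 10
time_full = min(timeit.repeat(solve_full, number=10, repeat=5)) / 10
print(f'band: {time_band:.6f} s, full: {time_full:.6f} s')
```

The actual numbers depend on the machine and the BLAS/LAPACK build, so no expected output is shown here.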

## Practice: Performance evaluation

$$
A =
\begin{bmatrix}
5 & 1 &   j     &        &      &  \\
1 & 5 & 1     &  j      &       & \\
-j   & 1 & 5     & \ddots & \ddots &       \\
   &  \ddots  & \ddots & \ddots & 1  & j   \\
      &    &     -j   & 1     & 5& 1\\
   &    &        & -j     & 1 &5
\end{bmatrix}_{10000 \times 10000}
$$

With the Hermitian matrix, $A$,

1. Use LU decomposition with the full matrix format
2. Use Cholesky decomposition with the full matrix format
3. Use LU decomposition with the band matrix format
4. Use Cholesky decomposition with band matrix format

Compare the computation times for each case.

In [15]:
b = np.ones((10000,), dtype=np.float64) # assumption

In [16]:
""" Construct Full matrix """
off_diag1 = np.ones((9999, ))
off_diag2 = np.ones((9998, ))
A_full = (5*np.identity(10000) + np.diag(off_diag1, k=1) + np.diag(off_diag1, k=-1)
          + np.diag(off_diag2, k=2) * 1j - np.diag(off_diag2, k=-2) * 1j)  # -j below the diagonal (Hermitian)

In [17]:
""" Construct Band matrix """
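row0 = np.hstack((np.zeros((2, )), 1j * off_diag2))  # 2nd superdiagonal: j
row1 = np.hstack((np.zeros((1, )), off_diag1))       # 1st superdiagonal: 1
row2 = np.diag(A_full)                               # main diagonal: 5
A_band = np.vstack((row0, row1, row2))               # upper form for cholesky_banded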

In [18]:
""" Construct Band matrix (gbtrf, gbtrs) """
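lbw = 2
ubw = 2
# gbtrf needs all lbw + ubw + 1 diagonals, not only the upper half stored in A_band
sub1 = np.hstack((off_diag1, np.zeros((1, ))))        # 1st subdiagonal: 1
sub2 = np.hstack((-1j * off_diag2, np.zeros((2, ))))  # 2nd subdiagonal: -j
dummy_array = np.zeros((lbw, A_band.shape[1]), dtype=np.complex128)
A_band_LU = np.vstack((dummy_array, A_band, sub1, sub2)) # input for gbtrf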

In [19]:
""" Define gbtrf, gbtrs """
gbtrf = linalg.get_lapack_funcs("gbtrf", dtype=np.complex128)  # A is complex, so use the complex routines
gbtrs = linalg.get_lapack_funcs("gbtrs", dtype=np.complex128)

In [20]:
""" method 1. LU decomposition with full matrix """

start = timeit.default_timer()

lu, piv = linalg.lu_factor(A_full)

end = timeit.default_timer()

time_lu_decomp = end-start
print(f'time - LU decomp with full matrix: {time_lu_decomp: .6f}')

# --- #

start = timeit.default_timer()

x_full = linalg.lu_solve((lu, piv), b)

end = timeit.default_timer()

time_lu_solve = end-start
print(f'time - LU solve with full matrix: {time_lu_solve: .6f}')

time - LU decomp with full matrix:  14.134139
time - LU solve with full matrix:  0.089523


In [21]:
""" method 2. Cholesky decomposition with full matrix """

start = timeit.default_timer()

U = linalg.cholesky(A_full, lower=False)

end = timeit.default_timer()

time_cho_decomp = end-start
print(f'time - Cholesky decomp with full matrix: {time_cho_decomp: .6f}')

# --- #

start = timeit.default_timer()

x = linalg.cho_solve((U, False), b)

end = timeit.default_timer()

time_cho_solve = end-start
print(f'time - Cholesky solve with full matrix: {time_cho_solve: .6f}')

time - Cholesky decomp with full matrix:  12.449920
time - Cholesky solve with full matrix:  0.910994


In [22]:
""" method 3. LU decomposition with band matrix (LAPACK) """

start = timeit.default_timer()

LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw) #LAPACK # A_band_LU

end = timeit.default_timer()

time_lu_decomp_band = end-start
print(f'time - LU decomp with band matrix: {time_lu_decomp_band: .6f}')

# --- #

start = timeit.default_timer()

x_band, info = gbtrs(LU_band, lbw, ubw, b, piv) # LAPACK

end = timeit.default_timer()

time_lu_solve_band = end-start
print(f'time - LU solve with band matrix: {time_lu_solve_band: .6f}')

time - LU decomp with band matrix:  0.067160
time - LU solve with band matrix:  0.001418



In [23]:
""" method 4. Cholesky decomposition with band matrix """

start = timeit.default_timer()

U_band = linalg.cholesky_banded(A_band, lower=False)

end = timeit.default_timer()

time_cho_decomp_band = end-start
print(f'time - Cholesky decomp with band matrix: {time_cho_decomp_band: .6f}')

# --- #

start = timeit.default_timer()

x = linalg.cho_solve_banded((U_band, False), b)

end = timeit.default_timer()

time_cho_solve_band = end-start
print(f'time - Cholesky solve with band matrix: {time_cho_solve_band: .6f}')

time - Cholesky decomp with band matrix:  0.046657
time - Cholesky solve with band matrix:  0.004490
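
Before comparing timings, it is worth confirming that all four methods return the same solution. The following is a scaled-down sketch ($n = 200$ instead of 10000, an assumption made only to keep it quick) for the Hermitian matrix defined above; the variable names `d1`, `d2`, `x1` to `x4` are illustrative.

```python
import numpy as np
from scipy import linalg

n, lbw, ubw = 200, 2, 2
d1, d2 = np.ones(n - 1), np.ones(n - 2)
A_full = (5 * np.identity(n) + np.diag(d1, 1) + np.diag(d1, -1)
          + 1j * np.diag(d2, 2) - 1j * np.diag(d2, -2))
b = np.ones(n)

# Upper band form for cholesky_banded: 2nd superdiagonal, 1st superdiagonal, diagonal
A_band = np.vstack((np.hstack((np.zeros(2), 1j * d2)),
                    np.hstack((np.zeros(1), d1)),
                    5 * np.ones(n)))
# gbtrf layout: lbw dummy rows, then all five diagonals from top to bottom
A_band_LU = np.vstack((np.zeros((lbw, n)), A_band,
                       np.hstack((d1, np.zeros(1))),
                       np.hstack((-1j * d2, np.zeros(2)))))

# 1. LU, full matrix
x1 = linalg.lu_solve(linalg.lu_factor(A_full), b)
# 2. Cholesky, full matrix
x2 = linalg.cho_solve((linalg.cholesky(A_full, lower=False), False), b)
# 3. LU, band matrix (complex LAPACK routines)
gbtrf, gbtrs = linalg.get_lapack_funcs(("gbtrf", "gbtrs"), dtype=np.complex128)
LU_band, piv, info = gbtrf(A_band_LU, lbw, ubw)
x3, info = gbtrs(LU_band, lbw, ubw, b, piv)
# 4. Cholesky, band matrix
x4 = linalg.cho_solve_banded((linalg.cholesky_banded(A_band, lower=False), False), b)

print(all(np.allclose(x1, x) for x in (x2, x3, x4)))
```

Cholesky applies here because the matrix is Hermitian and strictly diagonally dominant with a positive diagonal, hence positive definite.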


Summary:

1. Decomposition

In [24]:
print(f'time - LU decomp with full matrix: {time_lu_decomp: .6f}')
print(f'time - Cholesky decomp with full matrix: {time_cho_decomp: .6f}')
print(f'time - LU decomp with band matrix: {time_lu_decomp_band: .6f}') #LAPACK
print(f'time - Cholesky decomp with band matrix: {time_cho_decomp_band: .6f}')

time - LU decomp with full matrix:  14.134139
time - Cholesky decomp with full matrix:  12.449920
time - LU decomp with band matrix:  0.067160
time - Cholesky decomp with band matrix:  0.046657


2. Solver

In [25]:
print(f'time - LU solve with full matrix: {time_lu_solve: .6f}')
print(f'time - Cholesky solve with full matrix: {time_cho_solve: .6f}')
print(f'time - LU solve with band matrix: {time_lu_solve_band: .6f}')
print(f'time - Cholesky solve with band matrix: {time_cho_solve_band: .6f}')

time - LU solve with full matrix:  0.089523
time - Cholesky solve with full matrix:  0.910994
time - LU solve with band matrix:  0.001418
time - Cholesky solve with band matrix:  0.004490
