## Updating Cholesky Factorizations

In [1]:
using DataFrames
using LinearAlgebra
using Plots

In [2]:
include("../rollout.jl")
include("../testfns.jl")
include("./CovarianceMatrix.jl")

TestGramacyLee (generic function with 1 method)

Let's concern ourselves with matrices, in particular covariance matrices. Suppose we have a data-generating process that produces $m$ samples without gradient observations and $h+1$ samples with gradient observations, each in $d$ dimensions. Let's put some symbols to this setup.

$$
X^{m+h+1} :=\begin{bmatrix} \textbf{x}^{-m} \;\dots \;\textbf{x}^{-1} | \;\textbf{x}^0 | \; \textbf{x}^1 
\;\dots\; \textbf{x}^h\end{bmatrix} \in \mathbb{R}^{d\times (m+h+1)}
$$

where $\textbf{x}^k \in \mathbb{R}^d$ and $-m \leq k \leq h$. When computing mixed covariances, it proves useful to distinguish several categories of covariance:
- $A$ will denote the covariances among the known function values only.
- $B$ will denote the covariances between fantasized function values and known samples.
- $C$ will denote the covariances between fantasized function values against themselves.
- $D$ will denote the covariances between fantasized gradients and known function values.
- $E$ will denote the covariances between fantasized gradients against fantasized function values.
- $G$ will denote the covariances between fantasized gradients against themselves.

We'll denote the mixed covariance matrix as $K_{all}$, shown below together with its block Cholesky factorization:

$$
K_{all} = \begin{bmatrix}
    A & B^T & D^T \\
    B & C & E^T \\
    D & E & G
\end{bmatrix} = \begin{bmatrix}
    L_{11} & 0 & 0 \\
    L_{21} & L_{22} & 0 \\
    L_{31} & L_{32} & L_{33}
\end{bmatrix}\begin{bmatrix}
    L_{11}^T & L_{21}^T & L_{31}^T \\
    0 & L_{22}^T & L_{32}^T \\
    0 & 0 & L_{33}^T
\end{bmatrix}
$$

$\therefore$ We have the following relations, where $L_{11}$ is the known Cholesky factor of $A$.
$$
A = L_{11}L_{11}^T \\
B^T = L_{11}L_{21}^T \;\land\; B = L_{21}L_{11}^T \\
D^T = L_{11}L_{31}^T \;\land\; D = L_{31}L_{11}^T \\
C = L_{21}L_{21}^T + L_{22}L_{22}^T \\
E^T = L_{21}L_{31}^T + L_{22}L_{32}^T \;\land\; E = L_{31}L_{21}^T + L_{32}L_{22}^T \\
G = L_{31}L_{31}^T + L_{32}L_{32}^T + L_{33}L_{33}^T
$$
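
Read top to bottom, these relations suggest an order in which the unknown blocks can be recovered: $L_{21}$ and $L_{31}$ by triangular solves against $L_{11}$, then $L_{22}$ from a Cholesky factorization, then $L_{32}$ by another triangular solve, and finally $L_{33}$. The sketch below checks that order numerically. It is an illustration only: it uses a synthetic symmetric positive definite matrix rather than a kernel matrix, and the block sizes `p`, `q`, `r` are arbitrary assumptions.

```julia
using LinearAlgebra
using Random

Random.seed!(1)
p, q, r = 4, 3, 5                       # sizes of the A, C, and G blocks (arbitrary)
N = p + q + r
Z = randn(N, N)
K = Z * Z' + N * I                      # synthetic SPD stand-in for K_all

i1, i2, i3 = 1:p, p+1:p+q, p+q+1:N      # index ranges of the three block rows
A, B, C = K[i1, i1], K[i2, i1], K[i2, i2]
D, E, G = K[i3, i1], K[i3, i2], K[i3, i3]

L11 = cholesky(Symmetric(A)).L          # assumed known: A = L11 * L11'
L21 = B / L11'                          # from B = L21 * L11'
L31 = D / L11'                          # from D = L31 * L11'
L22 = cholesky(Symmetric(C - L21 * L21')).L
L32 = (E - L31 * L21') / L22'           # from E = L31 * L21' + L32 * L22'
L33 = cholesky(Symmetric(G - L31 * L31' - L32 * L32')).L

# Assemble the full lower-triangular factor block by block
L = zeros(N, N)
L[i1, i1] .= L11
L[i2, i1] .= L21
L[i2, i2] .= L22
L[i3, i1] .= L31
L[i3, i2] .= L32
L[i3, i3] .= L33

Lref = cholesky(Symmetric(K)).L         # reference factor computed from scratch
```

Wrapping each difference in `Symmetric` is a defensive choice against round-off asymmetry, not something the relations themselves require.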

Let's consider the problem of block updates without gradient observations first.

$$
K_{all} = \begin{bmatrix}
    A & B^T \\
    B & C \\
\end{bmatrix} = \begin{bmatrix}
    L_{11} & 0 \\
    L_{21} & L_{22} \\
\end{bmatrix}\begin{bmatrix}
    L_{11}^T & L_{21}^T \\
    0 & L_{22}^T \\
\end{bmatrix}
$$

$\therefore$ We have the following relations, where $L_{11}$ is the known Cholesky factor of $A$.
$$
A = L_{11}L_{11}^T \\
B^T = L_{11}L_{21}^T \;\land\; B = L_{21}L_{11}^T \\
C = L_{21}L_{21}^T + L_{22}L_{22}^T \\
$$
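
Solving these relations for the two unknown blocks gives the update directly. This is a worked restatement of the equations above, not an additional result:

$$
L_{21} = B L_{11}^{-T} \\
L_{22}L_{22}^T = C - L_{21}L_{21}^T = C - BA^{-1}B^T
$$

So $L_{21}$ comes from a triangular solve against the known factor, and $L_{22}$ is the Cholesky factor of the Schur complement of $A$ in $K_{all}$. With $A \in \mathbb{R}^{m \times m}$ and $C \in \mathbb{R}^{n \times n}$, the update costs $O(m^2 n + m n^2 + n^3)$ operations and avoids the $O(m^3)$ work of refactoring $A$.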

In [241]:
m, n, d = 300, 300, 100   # known samples, new samples, input dimension
θ = [1.]
ψ = kernel_matern52(θ)

X = rand(d, m+n)
A = eval_KXX(ψ, X[:, 1:m])
L11 = cholesky(A).L        # known factor: A = L11 * L11'
Kall = eval_KXX(ψ, X)
Lall = cholesky(Kall).L    # reference factor, computed from scratch

B = @view Kall[m+1:end, 1:m]
C = @view Kall[m+1:end, m+1:end];

# Solve B = L21 * L11' for L21, then factor the Schur complement for L22
L21 = B / L11'
L22 = cholesky(C - L21*L21').L

# Assemble the updated factor block by block
Lupdate = LowerTriangular(zeros(m+n, m+n))
Lupdate[1:m, 1:m] .= L11
Lupdate[m+1:end, 1:m] .= L21
Lupdate[m+1:end, m+1:end] .= L22

Lupdate ≈ Lall # true: the block update reproduces the full factorization

  0.003750 seconds (5 allocations: 2.747 MiB)
  0.007891 seconds (52 allocations: 6.868 MiB)


true
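
The cell above can be packaged as a small reusable function. The following is a minimal sketch under stated assumptions: `block_cholesky_update` is a hypothetical name, not something defined in the included files, and the check uses a synthetic symmetric positive definite matrix so that it runs without `eval_KXX` or `kernel_matern52`.

```julia
using LinearAlgebra
using Random

# Hypothetical helper: extend a known lower Cholesky factor L11 of A
# to the lower Cholesky factor of [A B'; B C].
function block_cholesky_update(L11::LowerTriangular, B::AbstractMatrix, C::AbstractMatrix)
    m, n = size(L11, 1), size(C, 1)
    # B = L21 * L11'  =>  L21 = B / L11'  (triangular solve, no explicit inverse)
    L21 = B / L11'
    # Schur complement C - L21 * L21' = L22 * L22'
    # Symmetric guards against round-off asymmetry
    L22 = cholesky(Symmetric(C - L21 * L21')).L
    L = zeros(m + n, m + n)
    L[1:m, 1:m] .= L11
    L[m+1:end, 1:m] .= L21
    L[m+1:end, m+1:end] .= L22
    return LowerTriangular(L)
end

# Synthetic SPD matrix standing in for a kernel matrix
Random.seed!(0)
m, n = 5, 3
Z = randn(m + n, m + n)
K = Z * Z' + (m + n) * I

A = K[1:m, 1:m]
B = K[m+1:end, 1:m]
C = K[m+1:end, m+1:end]

L11 = cholesky(Symmetric(A)).L
Lnew = block_cholesky_update(L11, B, C)
Lfull = cholesky(Symmetric(K)).L      # reference factor computed from scratch
```

Because the Cholesky factor with positive diagonal is unique, `Lnew` and `Lfull` should agree up to floating-point error, which is the same comparison the notebook cell makes with `Lupdate ≈ Lall`.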