## PDE 1 - 2D


#### Problem Setup

$\phi u + u_{x} + u_{y,y} = f(x,y)$

For the generation of our initial data samples we use:

$\phi = 2$ <br>
$u: \mathbb{R}^2 \rightarrow \mathbb{R}, \; u(x,y) = x^2 + y$ <br>
$f: \mathbb{R}^2 \rightarrow \mathbb{R}, \;f(x,y) = 2(x^2 + x + y)$ <br>
$X_i := (x_i, y_i) \in [0,1] \times [0,1] \subset \mathbb{R}^2$ for $i \in \{1, \dotsc, n\}$

and our known function values will be $\{u(x_i,y_i), f(x_i,y_i)\}_{i \in \{1, \dotsc, n\}}$.

We assume that $u$ can be represented as a Gaussian process with Matérn kernel, where $\nu = 5/2$.

$u \sim \mathcal{GP}(0, k_{uu}(X_i, X_j; \theta))$, where $\theta = \{\sigma, l_x, l_y\}$.

Set the linear operator to:

$\mathcal{L}_X^{\phi} := \phi + \partial_x + \partial_{y,y}$

so that

$\mathcal{L}_X^{\phi} u = f$

Problem at hand: Estimate $\phi$ (we expect $\phi = 2$).
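As a quick sanity check of the setup, the chosen $u$, $f$ and $\phi = 2$ can be substituted into the PDE symbolically. The following is a minimal sketch; the name `residual` is only illustrative and not part of the notebook.

```python
import sympy as sp

x, y = sp.symbols('x y')
phi = 2

u = x**2 + y                # chosen solution
f = 2*(x**2 + x + y)        # chosen right-hand side

# Left-hand side of the PDE minus f; should vanish identically
residual = sp.simplify(phi*u + sp.diff(u, x) + sp.diff(u, y, 2) - f)
print(residual)
```

A zero residual confirms that the simulated data are consistent with $\phi = 2$.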


#### Step 1: Simulate data

In [1]:
import time
import numpy as np
import sympy as sp
from scipy.linalg import solve_triangular
import scipy.optimize as opt

In [2]:
# Global variables: x, y, n, y_u, y_f, s

*Parameters that can be modified:*

In [3]:
# Number of data samples:
n = 20

# Noise of our data:
s = 1e-7

# Circumventing evaluations of kernel-derivatives at zero:
corr_nan = 1e-4

In [4]:
def simulate_data():
    x = np.random.rand(n)
    y = np.random.rand(n)
    y_u = np.multiply(x,x) + y
    y_f = 2*(np.multiply(x,x) + x + y)
    return (x,y,y_u,y_f)
(x,y,y_u,y_f) = simulate_data()

#### Step 2: Evaluate kernels

$k_{uu}(X_i, X_j; \theta) = \sigma \left( 1+ \sqrt{5}r_l + \frac{5}{3}r_l^2 \right) \exp \left( -\sqrt{5}r_l \right)$, where:

$r_l = \sqrt{\frac{1}{l_x}(x_i-x_j)^2 + \frac{1}{l_y}(y_i-y_j)^2}$

(Note that, as in the code, $l_x$ and $l_y$ divide the squared coordinate differences directly.)
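For illustration, the Matérn 5/2 kernel matrix can also be evaluated directly with NumPy broadcasting instead of a double loop. This is a sketch, not the notebook's implementation: the function name `matern52` and the sample inputs are assumptions, and the scaling follows the notebook's code cell, where the squared differences are divided by `l_x` and `l_y`.

```python
import numpy as np

def matern52(x, y, sigma, l_x, l_y):
    """Matern 5/2 kernel matrix for points (x[i], y[i]) (illustrative helper)."""
    dx = x[:, None] - x[None, :]          # pairwise differences in x
    dy = y[:, None] - y[None, :]          # pairwise differences in y
    r = np.sqrt(dx**2 / l_x + dy**2 / l_y)
    return sigma * (1 + np.sqrt(5)*r + 5/3*r**2) * np.exp(-np.sqrt(5)*r)

rng = np.random.default_rng(0)
xs, ys = rng.random(5), rng.random(5)
K = matern52(xs, ys, sigma=1.5, l_x=0.8, l_y=1.2)
print(K.shape)
```

The resulting matrix is symmetric and carries `sigma` on its diagonal, since $r_l = 0$ there.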

Alternatively, a squared-exponential kernel can be used (it is kept as a commented-out line in the code below):

$k_{uu}(X_i, X_j; \theta) = \sigma \exp\left(-\frac{1}{2l_x}(x_i-x_j)^2 - \frac{1}{2l_y}(y_i-y_j)^2\right)$

In [5]:
x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi = sp.symbols('x_i x_j y_i y_j sigma l_x l_y phi')
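# Squared-exponential alternative (unused):
# kuu_sym = sigma*sp.exp(-1/(2*l_x)*((x_i - x_j)**2) - 1/(2*l_y)*((y_i - y_j)**2))
# Matérn 5/2 kernel; l_x and l_y divide the squared coordinate differences directly
r_l = sp.sqrt((x_i - x_j)**2/l_x + (y_i - y_j)**2/l_y)
kuu_sym = sigma*(1 + sp.sqrt(5)*r_l + sp.Rational(5, 3)*r_l**2)*sp.exp(-sp.sqrt(5)*r_l)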
kuu_fn = sp.lambdify((x_i, x_j, y_i, y_j, sigma, l_x, l_y), kuu_sym, "numpy")
def kuu(x, y, sigma, l_x, l_y):
    k = np.zeros((x.size, x.size))
    for i in range(x.size):
        for j in range(x.size):
            k[i,j] = kuu_fn(x[i], x[j], y[i], y[j], sigma, l_x, l_y)
    return k

$k_{ff}(X_i,X_j;\theta,\phi) \\
= \mathcal{L}_{X_i}^{\phi} \mathcal{L}_{X_j}^{\phi} k_{uu}(X_i, X_j; \theta) \\
= \phi^2k_{uu} + \phi \frac{\partial}{\partial x_i}k_{uu} + \phi \frac{\partial^2}{\partial y_i^2}k_{uu} + \phi \frac{\partial}{\partial x_j}k_{uu} + \frac{\partial^2}{\partial x_i \partial x_j}k_{uu} + \frac{\partial^3}{\partial y_i^2 \partial x_j}k_{uu} + \phi \frac{\partial^2}{\partial y_j^2}k_{uu} + \frac{\partial^3}{\partial x_i \partial y_j^2}k_{uu} + \frac{\partial^4}{\partial y_i^2 \partial y_j^2}k_{uu}$

In [6]:
kff_sym = phi**2*kuu_sym \
        + phi*sp.diff(kuu_sym, x_i) \
        + phi*sp.diff(kuu_sym, y_i, y_i) \
        + phi*sp.diff(kuu_sym, x_j) \
        + sp.diff(kuu_sym, x_i, x_j) \
        + sp.diff(kuu_sym, y_i, y_i, x_j) \
        + phi*sp.diff(kuu_sym, y_j, y_j) \
        + sp.diff(kuu_sym, x_i, y_j, y_j) \
        + sp.diff(kuu_sym, y_i, y_i, y_j, y_j)
kff_fn = sp.lambdify((x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi), kff_sym, "numpy")
def kff(x, y, sigma, l_x, l_y, phi):
    k = np.zeros((x.size, x.size))
    for i in range(x.size):
        for j in range(x.size):
            if i == j:
                k[i,j] = kff_fn(x[i], x[j] + corr_nan, y[i], y[j] + corr_nan, sigma, l_x, l_y, phi)
            else:
                k[i,j] = kff_fn(x[i], x[j], y[i], y[j], sigma, l_x, l_y, phi)
    return k
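The nine-term expansion of $k_{ff}$ can be cross-checked by applying the operator twice symbolically and comparing both expressions at a sample point. The sketch below assumes the same Matérn kernel as above; the helper `apply_L` and the test point are hypothetical and chosen only for illustration (the point lies away from $r_l = 0$, so no `corr_nan` shift is needed).

```python
import sympy as sp

x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi = sp.symbols('x_i x_j y_i y_j sigma l_x l_y phi')
r_l = sp.sqrt((x_i - x_j)**2/l_x + (y_i - y_j)**2/l_y)
kuu_sym = sigma*(1 + sp.sqrt(5)*r_l + sp.Rational(5, 3)*r_l**2)*sp.exp(-sp.sqrt(5)*r_l)

def apply_L(expr, xv, yv):
    """Hypothetical helper: apply phi + d/dx + d^2/dy^2 in the variables (xv, yv)."""
    return phi*expr + sp.diff(expr, xv) + sp.diff(expr, yv, 2)

# Operator applied in X_j first, then in X_i
kff_nested = apply_L(apply_L(kuu_sym, x_j, y_j), x_i, y_i)

# Term-by-term expansion as written in the formula above
kff_expanded = (phi**2*kuu_sym
                + phi*sp.diff(kuu_sym, x_i) + phi*sp.diff(kuu_sym, y_i, y_i)
                + phi*sp.diff(kuu_sym, x_j) + sp.diff(kuu_sym, x_i, x_j)
                + sp.diff(kuu_sym, y_i, y_i, x_j) + phi*sp.diff(kuu_sym, y_j, y_j)
                + sp.diff(kuu_sym, x_i, y_j, y_j) + sp.diff(kuu_sym, y_i, y_i, y_j, y_j))

args = (x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi)
point = (0.3, 0.7, 0.2, 0.9, 1.0, 0.5, 0.8, 2.0)   # arbitrary test point
val_nested = float(sp.lambdify(args, kff_nested, "numpy")(*point))
val_expanded = float(sp.lambdify(args, kff_expanded, "numpy")(*point))
difference = abs(val_nested - val_expanded)
print(difference)
```

The two values should agree up to floating-point rounding.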

$k_{fu}(X_i,X_j;\theta,\phi) \\
= \mathcal{L}_{X_i}^{\phi} k_{uu}(X_i, X_j; \theta) \\
= \phi k_{uu} + \frac{\partial}{\partial x_i}k_{uu} + \frac{\partial^2}{\partial y_i^2}k_{uu}$

In [7]:
kfu_sym = phi*kuu_sym \
        + sp.diff(kuu_sym, x_i) \
        + sp.diff(kuu_sym, y_i, y_i)
kfu_fn = sp.lambdify((x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi), kfu_sym, "numpy")
def kfu(x, y, sigma, l_x, l_y, phi):
    k = np.zeros((x.size, x.size))
    for i in range(x.size):
        for j in range(x.size):
            if i == j:
                k[i,j] = kfu_fn(x[i], x[j] + corr_nan, y[i], y[j] + corr_nan, sigma, l_x, l_y, phi)
            else:
                k[i,j] = kfu_fn(x[i], x[j], y[i], y[j], sigma, l_x, l_y, phi)
    return k

In [8]:
def kuf(x, y, sigma, l_x, l_y, phi):
    return kfu(x, y, sigma, l_x, l_y, phi).T
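One way to gain confidence in the symbolic derivatives behind $k_{fu}$ is to compare them with central finite differences of $k_{uu}$. The following sketch rebuilds the symbolic kernel so that it is self-contained; the step size `h`, the hyperparameter values and the evaluation point are arbitrary assumptions.

```python
import sympy as sp

x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi = sp.symbols('x_i x_j y_i y_j sigma l_x l_y phi')
r_l = sp.sqrt((x_i - x_j)**2/l_x + (y_i - y_j)**2/l_y)
kuu_sym = sigma*(1 + sp.sqrt(5)*r_l + sp.Rational(5, 3)*r_l**2)*sp.exp(-sp.sqrt(5)*r_l)
kfu_sym = phi*kuu_sym + sp.diff(kuu_sym, x_i) + sp.diff(kuu_sym, y_i, y_i)

kuu_num = sp.lambdify((x_i, x_j, y_i, y_j, sigma, l_x, l_y), kuu_sym, "numpy")
kfu_num = sp.lambdify((x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi), kfu_sym, "numpy")

theta = (1.0, 0.5, 0.8)      # sigma, l_x, l_y (illustrative values)
phi_val = 2.0
xi0, xj0, yi0, yj0 = 0.3, 0.7, 0.2, 0.9
h = 1e-4

def k(xi, yi):
    """k_uu as a function of the first point only; second point is held fixed."""
    return float(kuu_num(xi, xj0, yi, yj0, *theta))

# Central differences for d/dx_i and d^2/dy_i^2
dk_dx = (k(xi0 + h, yi0) - k(xi0 - h, yi0)) / (2*h)
d2k_dy2 = (k(xi0, yi0 + h) - 2*k(xi0, yi0) + k(xi0, yi0 - h)) / h**2

fd_value = phi_val*k(xi0, yi0) + dk_dx + d2k_dy2
exact_value = float(kfu_num(xi0, xj0, yi0, yj0, *theta, phi_val))
print(abs(fd_value - exact_value))
```

The discrepancy should be small compared with the kernel values themselves; it is limited by the finite-difference step, not by the symbolic expression.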

#### Step 3: Computing the negative log-likelihood (with block matrix inversion, Cholesky decomposition, potentially SVD)

We use the block-inversion technique: Let
$K = \begin{pmatrix} K_{uu} & K_{uf} \\ K_{fu} & K_{ff} \end{pmatrix} = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix}$.

Then $\det(K) = \det(A) \det(C-B^T A^{-1} B)$.

$K^{-1} = \begin{pmatrix} A^{-1} + A^{-1} B(C-B^T A^{-1} B)^{-1}B^T A^{-1} & -A^{-1}B(C-B^T A^{-1} B)^{-1} \\
            -(C - B^T A^{-1}B)^{-1}B^T A^{-1} & (C-B^T A^{-1} B)^{-1} \end{pmatrix}$
            
So it suffices to invert $A$ and $C-B^T A^{-1} B$.

A theorem on Schur complements guarantees that if $K$ is positive definite, then so is $K/A = C-B^T A^{-1} B$, so the Cholesky decomposition can be applied to it as well.
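The block identities above can be checked numerically on a small random symmetric positive-definite matrix. This is only a sketch with made-up data (the matrix `K` here is random, not a kernel matrix), and it uses `np.linalg.inv` for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((2*n, 2*n))
K = M @ M.T + 2*n*np.eye(2*n)           # random SPD test matrix

A, B, C = K[:n, :n], K[:n, n:], K[n:, n:]
A_inv = np.linalg.inv(A)
S = C - B.T @ A_inv @ B                  # Schur complement K/A
S_inv = np.linalg.inv(S)

# Block formula for the inverse
K_inv_blocks = np.block([
    [A_inv + A_inv @ B @ S_inv @ B.T @ A_inv, -A_inv @ B @ S_inv],
    [-S_inv @ B.T @ A_inv,                    S_inv],
])

# log det(K) = log det(A) + log det(K/A)
logdet_K = np.linalg.slogdet(K)[1]
logdet_blocks = np.linalg.slogdet(A)[1] + np.linalg.slogdet(S)[1]

schur_is_pd = bool(np.all(np.linalg.eigvalsh(S) > 0))
print(np.allclose(K_inv_blocks, np.linalg.inv(K)), schur_is_pd)
```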

In [9]:
def nlml(params):
    
    sigma_exp = np.exp(params[0])
    l_x_exp = np.exp(params[1])
    l_y_exp = np.exp(params[2])
    # phi = params[3]
    
    A = kuu(x, y, sigma_exp, l_x_exp, l_y_exp) + s*np.eye(n)
    B = kfu(x, y, sigma_exp, l_x_exp, l_y_exp, params[3]).T
    C = kff(x, y, sigma_exp, l_x_exp, l_y_exp, params[3]) + s*np.eye(n)
    
    # Inversion of A
    A_inv = np.zeros((n, n))
    
    try:
        L = np.linalg.cholesky(A)
        L_inv = solve_triangular(L, np.identity(n), lower=True) # Slight performance boost over np.linalg.inv
        A_inv = L_inv.T @ L_inv
        logdet_A = 2*np.log(np.abs(np.diag(L))).sum()
    except np.linalg.LinAlgError:
        # Inverse of A via SVD (fallback if Cholesky fails)
        u, s_mat, vt = np.linalg.svd(A)
        A_inv = vt.T @ np.linalg.inv(np.diag(s_mat)) @ u.T
        logdet_A = np.log(s_mat).sum()
        
    # Inversion of the Schur complement K/A = C - B^T A^{-1} B
    KA_inv = np.zeros((n, n))
    KA = C - B.T @ A_inv @ B
    
    try:
        L = np.linalg.cholesky(KA)
        L_inv = solve_triangular(L, np.identity(n), lower=True) # Slight performance boost over np.linalg.inv
        KA_inv = L_inv.T @ L_inv
        logdet_KA = 2*np.log(np.abs(np.diag(L))).sum()
    except np.linalg.LinAlgError:
        # Inverse of K/A via SVD (fallback if Cholesky fails)
        u, s_mat, vt = np.linalg.svd(KA)
        KA_inv = vt.T @ np.linalg.inv(np.diag(s_mat)) @ u.T
        logdet_KA = np.log(s_mat).sum()
        
    # Piecing it together
    T = A_inv @ B @ KA_inv
    yKy = y_u @ (A_inv + T @ B.T @ A_inv) @ y_u - 2*y_u @ T @ y_f + y_f @ KA_inv @ y_f
    
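    # yKy + log det(K) equals 2 * NLML up to an additive constant,
    # so the minimizer is the same as for the full negative log-likelihood.
    return yKy + logdet_A + logdet_KA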
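The function above forms explicit inverses. A common alternative is to work with Cholesky solves only, which is sketched below for a generic covariance matrix `K` and data vector `y` (in this notebook, `y` would be the concatenation of `y_u` and `y_f`). The function name `gaussian_nll` and the test data are assumptions; unlike `nlml`, this version includes the factor $1/2$ and the constant term.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_nll(K, y):
    """Negative log-likelihood of y ~ N(0, K) without forming K^{-1} explicitly."""
    c, lower = cho_factor(K, lower=True)          # K = L L^T
    alpha = cho_solve((c, lower), y)              # alpha = K^{-1} y
    logdet = 2.0*np.sum(np.log(np.diag(c)))       # log det(K) from the Cholesky diagonal
    return 0.5*y @ alpha + 0.5*logdet + 0.5*y.size*np.log(2*np.pi)

# Small random SPD example (illustrative data only)
rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
K_test = M @ M.T + 6*np.eye(6)
y_test = rng.standard_normal(6)

nll = gaussian_nll(K_test, y_test)
reference = (0.5*y_test @ np.linalg.solve(K_test, y_test)
             + 0.5*np.linalg.slogdet(K_test)[1]
             + 0.5*y_test.size*np.log(2*np.pi))
print(nll, reference)
```

Both values should coincide; the Cholesky route tends to be more stable numerically, but it has no SVD fallback when the factorization fails.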

#### Step 4: Optimize hyperparameters

**1. Nelder-Mead**

In [None]:
Nfeval = 1
def callbackF(Xi):
    global Nfeval
    print('{0:4d}   {1: 3.6f}   {2: 3.6f}   {3: 3.6f}   {4: 3.6f}'.format(Nfeval, Xi[0], Xi[1], Xi[2], Xi[3]))
    Nfeval += 1

In [None]:
t0 = time.time()
m_n = opt.minimize(nlml, np.random.rand(4), method="Nelder-Mead", callback=callbackF,
                   options={'maxfev': 5000, 'fatol': 0.001, 'xatol': 0.001})
t_Nelder = time.time() - t0

   1    0.116469    0.855215    0.775121    0.477083
   2    0.116469    0.855215    0.775121    0.477083
   3    0.123309    0.828168    0.764355    0.505104
   4    0.116162    0.831066    0.691688    0.539053
   5    0.116162    0.831066    0.691688    0.539053
   6    0.125836    0.912892    0.534578    0.569342
   7    0.128333    0.846944    0.551904    0.588551
   8    0.128333    0.846944    0.551904    0.588551
   9    0.122872    0.904948    0.453878    0.631832
  10    0.122872    0.904948    0.453878    0.631832
  11    0.124872    0.878539    0.611459    0.576872
  12    0.120919    0.791329    0.590358    0.635649
  13    0.120919    0.791329    0.590358    0.635649
  14    0.115670    0.876094    0.519230    0.692109
  15    0.115670    0.876094    0.519230    0.692109
  16    0.114851    0.727819    0.486156    0.811356
  17    0.099321    0.688238    0.675715    0.791815
  18    0.096858    0.731916    0.575727    0.933497
  19    0.096858    0.731916    0.575727    0.

 156   -0.211201    1.064829    2.805962    1.717460
 157   -0.211201    1.064829    2.805962    1.717460
 158   -0.211201    1.064829    2.805962    1.717460
 159   -0.211201    1.064829    2.805962    1.717460
 160   -0.211201    1.064829    2.805962    1.717460
 161   -0.211201    1.064829    2.805962    1.717460
 162   -0.211201    1.064829    2.805962    1.717460
 163   -0.211201    1.064829    2.805962    1.717460


In [None]:
m_n
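Because `nlml` exponentiates its first three parameters, the entries of `m_n.x` have to be mapped back before they can be read as kernel hyperparameters; only the fourth entry is $\phi$ itself. A small sketch, using the last iterate visible in the printed log above as a stand-in for `m_n.x` (the variable names are illustrative):

```python
import numpy as np

# Last iterate shown in the log above (stand-in for m_n.x)
last_iterate = np.array([-0.211201, 1.064829, 2.805962, 1.717460])

# First three entries are log-parameters, the fourth is phi itself
sigma_hat, l_x_hat, l_y_hat = np.exp(last_iterate[:3])
phi_hat = last_iterate[3]

print('sigma = {:.4f}, l_x = {:.4f}, l_y = {:.4f}, phi = {:.4f}'.format(
    sigma_hat, l_x_hat, l_y_hat, phi_hat))
```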