*Instead of manually implementing the gradient of the NLML as in 2D_example.ipynb, this notebook uses symbolic differentiation.*

*Results:*

The algorithm works elegantly with symbolic computation.
Nonetheless, it is very slow: for n = 2 with only 500 epochs, ADAM takes around half an hour.
To get good results we need at least n = 5, and the default number of epochs was 5000.
So we can expect at least 10 hours for this. Since we want to handle many points (certainly n = 100), this method is not really viable.

Also, CG and Nelder-Mead didn't converge at all.

At least this notebook allowed us to double-check the correctness of the gradient in 2D_example.ipynb.

## PDE 1 - 2D with 4 parameters


#### Problem Setup

$\phi u + u_{x} + u_{yy} = f(x,y)$

For the generation of our initial data samples we use:

$\phi = 2$ <br>
$X_i := (x_i, y_i) \in [0,1] \times [0,1]$ for $i \in \{1, \dotsc, n\}$ <br>
$u: \mathbb{R}^2 \rightarrow \mathbb{R}, \; u(x,y) = x^2 + y$ <br>
$f: \mathbb{R}^2 \rightarrow \mathbb{R}, \;f(x,y) = 2(x^2 + x + y)$

and our known function values will be $\{u(x_i,y_i), f(x_i,y_i)\}_{i \in \{1, \dotsc, n\}}$

We assume that $u$ can be represented as a Gaussian process with a squared-exponential (SE) kernel.

$u \sim \mathcal{GP}(0, k_{uu}(X_i, X_j; \theta))$, where $\theta = \{\sigma, l_x, l_y\}$.

And the linear operator:

$\mathcal{L}_X^{\phi} = \phi + \partial_x + \partial_y^2$

so that

$\mathcal{L}_X^{\phi} u = f$

Problem at hand: Estimate $\phi$ (the closer to $\phi = 2$, the better).
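
As a quick sanity check of the setup (a sketch, not part of the original notebook), the chosen $u$ can be pushed through the operator symbolically to confirm that it produces the stated $f$ for $\phi = 2$. The symbol names `x_s` and `y_s` are assumptions, chosen only to avoid clashing with the data arrays `x` and `y` used later.

```python
import sympy as sp

# Hypothetical symbol names, chosen to avoid clashing with the data arrays x, y.
x_s, y_s = sp.symbols('x_s y_s')
phi_true = 2

u_expr = x_s**2 + y_s
# Apply L = phi + d/dx + d^2/dy^2 to u.
f_expr = phi_true*u_expr + sp.diff(u_expr, x_s) + sp.diff(u_expr, y_s, 2)

# Difference to the stated right-hand side f(x, y) = 2(x^2 + x + y).
residual = sp.expand(f_expr - 2*(x_s**2 + x_s + y_s))
print(residual)
```

A residual of zero means the simulated `y_u` and `y_f` are consistent with the PDE.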

#### Step 1: Simulate data

In [1]:
import time
import numpy as np
import sympy as sp
import pdb
from scipy.optimize import minimize

In [2]:
# Global variables: x, y, n, y_u, y_f, s

*Parameters that can be modified:*

In [3]:
# Number of data samples:
n = 2

# Noise of our data:
s = 1e-7

In [4]:
def simulate_data():
    x = np.random.rand(n)
    y = np.random.rand(n)
    y_u = np.multiply(x,x) + y
    y_f = 2*(np.multiply(x,x) + x + y)
    return (x,y,y_u,y_f)
(x,y,y_u,y_f) = simulate_data()

#### Step 2: Evaluate kernels

I am squaring sigma, l_x and l_y right away due to the symbolic computation throughout:

$k_{uu}(X_i, X_j; \theta) = \sigma^2 \exp\left(-\frac{1}{2l_x^2}(x_i-x_j)^2 - \frac{1}{2l_y^2}(y_i-y_j)^2\right)$

In [5]:
x_i, x_j, y_i, y_j, sigma, l_x, l_y, phi = sp.symbols('x_i x_j y_i y_j sigma l_x l_y phi')
kuu_sym = sigma**2*sp.exp(-1/(2*l_x**2)*((x_i - x_j)**2) - 1/(2*l_y**2)*((y_i - y_j)**2))
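def kuu(x, y, sigma, l_x, l_y):
    # Only the data points are substituted; sigma, l_x and l_y stay symbolic
    # in kuu_sym (the arguments of the same name are not used).
    k = sp.zeros(n)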
    for i in range(n):
        for j in range(n):
            k[i,j] = kuu_sym.subs({x_i: x[i], x_j: x[j], y_i: y[i], y_j: y[j]})
    return k

$k_{ff}(X_i,X_j;\theta,\phi)$ <br>
$= \mathcal{L}_{X_i}^{\phi} \mathcal{L}_{X_j}^{\phi} k_{uu}(X_i, X_j; \theta)$ <br>
$= \phi^2k_{uu} + \phi \frac{\partial}{\partial x_i}k_{uu} + \phi \frac{\partial^2}{\partial y_i^2}k_{uu} + \phi \frac{\partial}{\partial x_j}k_{uu} + \frac{\partial^2}{\partial x_i \partial x_j}k_{uu} + \frac{\partial^3}{\partial y_i^2 \partial x_j}k_{uu} + \phi \frac{\partial^2}{\partial y_j^2}k_{uu} + \frac{\partial^3}{\partial x_i \partial y_j^2}k_{uu} + \frac{\partial^4}{\partial y_i^2 \partial y_j^2}k_{uu}$

In [6]:
kff_sym = phi**2*kuu_sym \
        + phi*sp.diff(kuu_sym, x_i) \
        + phi*sp.diff(kuu_sym, y_i, y_i) \
        + phi*sp.diff(kuu_sym, x_j) \
        + sp.diff(kuu_sym, x_i, x_j) \
        + sp.diff(kuu_sym, y_i, y_i, x_j) \
        + phi*sp.diff(kuu_sym, y_j, y_j) \
        + sp.diff(kuu_sym, x_i, y_j, y_j) \
        + sp.diff(kuu_sym, y_i, y_i, y_j, y_j)
def kff(x, y, sigma, l_x, l_y, phi):
    k = sp.zeros(n)
    for i in range(n):
        for j in range(n):
            k[i,j] = kff_sym.subs({x_i: x[i], x_j: x[j], y_i: y[i], y_j: y[j]})
    return k
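
The nine-term expansion of $k_{ff}$ is easy to get wrong by hand, so it can be cross-checked by applying the operator once in each argument and comparing. This is a minimal sketch that redefines the symbols so it runs on its own; the helper `apply_L` and the evaluation point are assumptions made for illustration.

```python
import sympy as sp

x_i, x_j, y_i, y_j = sp.symbols('x_i x_j y_i y_j')
sigma, l_x, l_y, phi = sp.symbols('sigma l_x l_y phi')

k = sigma**2*sp.exp(-(x_i - x_j)**2/(2*l_x**2) - (y_i - y_j)**2/(2*l_y**2))

def apply_L(expr, xv, yv):
    """Hypothetical helper: apply L = phi + d/dx + d^2/dy^2 w.r.t. (xv, yv)."""
    return phi*expr + sp.diff(expr, xv) + sp.diff(expr, yv, 2)

# Compose the operator in both arguments.
kff_composed = apply_L(apply_L(k, x_j, y_j), x_i, y_i)

# The nine-term expansion as written above.
kff_expanded = (phi**2*k
                + phi*sp.diff(k, x_i) + phi*sp.diff(k, y_i, 2)
                + phi*sp.diff(k, x_j) + sp.diff(k, x_i, x_j)
                + sp.diff(k, y_i, 2, x_j)
                + phi*sp.diff(k, y_j, 2) + sp.diff(k, x_i, y_j, 2)
                + sp.diff(k, y_i, 2, y_j, 2))

# Evaluate the difference at an arbitrary numeric point.
point = {x_i: 0.3, x_j: 0.8, y_i: 0.1, y_j: 0.6,
         sigma: 1.2, l_x: 0.7, l_y: 1.5, phi: 2.0}
residual = float((kff_composed - kff_expanded).subs(point))
print(residual)
```

A residual at the level of floating-point round-off indicates that the two forms agree at that point.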

$k_{fu}(X_i,X_j;\theta,\phi)$ <br>
$= \mathcal{L}_{X_i}^{\phi} k_{uu}(X_i, X_j; \theta)$ <br>
$= \phi k_{uu} + \frac{\partial}{\partial x_i}k_{uu} + \frac{\partial^2}{\partial y_i^2}k_{uu}$

In [7]:
kfu_sym = phi*kuu_sym \
        + sp.diff(kuu_sym, x_i) \
        + sp.diff(kuu_sym, y_i, y_i)
def kfu(x, y, sigma, l_x, l_y, phi):
    k = sp.zeros(n)
    for i in range(n):
        for j in range(n):
            k[i,j] = kfu_sym.subs({x_i: x[i], x_j: x[j], y_i: y[i], y_j: y[j]})
    return k

In [8]:
def kuf(x, y, sigma, l_x, l_y, phi):
    return kfu(x, y, sigma, l_x, l_y, phi).T

#### Step 3: Compute NLML

Implementing the covariance matrix K and the (symbolic) nlml function
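
In formulas, the two cells below compute the following, with $I_n$ the $n \times n$ identity and $y = (y_u, y_f)^T$ the stacked observations:

$K = \begin{pmatrix} k_{uu} + s I_n & k_{uf} \\ k_{fu} & k_{ff} + s I_n \end{pmatrix}$

$\mathrm{nlml}(\sigma, l_x, l_y, \phi) = \log \det K + y^T K^{-1} y$

Compared with the usual negative log marginal likelihood $\frac{1}{2} \log \det K + \frac{1}{2} y^T K^{-1} y + n \log(2\pi)$ for the $2n$ stacked observations, this drops the factor $\frac{1}{2}$ and the additive constant, which does not change the minimizer.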

In [9]:
def K(sigma, l_x, l_y, phi, s):
    K_mat = sp.Matrix(sp.BlockMatrix([
        [kuu(x, y, sigma, l_x, l_y) + s*sp.eye(n), kuf(x, y, sigma, l_x, l_y, phi)],
        [kfu(x, y, sigma, l_x, l_y, phi), kff(x, y, sigma, l_x, l_y, phi) + s*sp.eye(n)]
    ]))
    return K_mat

In [10]:
def nlml(sigma, l_x, l_y, phi, s):
    # Build the covariance matrix once and reuse it for both terms.
    K_mat = K(sigma, l_x, l_y, phi, s)
    y_con = sp.Matrix(np.concatenate((y_u, y_f)))
    # Log-determinant term and data-fit term (constant factors are dropped).
    nlml1 = sp.log(sp.det(K_mat))
    nlml2 = y_con.T*K_mat.inv()*y_con
    return nlml1 + nlml2[0,0]

#### Step 4: Optimize hyperparameters

**Nonlinear Conjugate Gradient**

Building the gradient:

In [11]:
nlml_sigma, nlml_l_x, nlml_l_y, nlml_phi = [sp.diff(nlml(sigma, l_x, l_y, phi, s), l) for 
                                            l in [sigma, l_x, l_y, phi]]

In [12]:
def grad_nlml(par):
    nlml_wp_sigma = lambda v: nlml_sigma.subs({sigma:v[0], l_x:v[1], l_y:v[2], phi:v[3]})
    nlml_wp_l_x = lambda v: nlml_l_x.subs({sigma:v[0], l_x:v[1], l_y:v[2], phi:v[3]})
    nlml_wp_l_y = lambda v: nlml_l_y.subs({sigma:v[0], l_x:v[1], l_y:v[2], phi:v[3]})
    nlml_wp_phi = lambda v: nlml_phi.subs({sigma:v[0], l_x:v[1], l_y:v[2], phi:v[3]})
    
    # Minimize (to be exact np.linalg.norm) can't deal with the sympy Float type 
    # (=>'Float' object has no attribute 'sqrt')
    out = np.array([nlml_wp_sigma(par), nlml_wp_l_x(par), nlml_wp_l_y(par), nlml_wp_phi(par)], 
                   dtype=np.float64)
    
    return out

In [13]:
grad_nlml((0.70560699, -1.1791943 ,  1.74529172,  2.53646637))

array([-24.54939927,  -3.50291112,   7.92713189,  -7.17654828])
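
A symbolic gradient like this can be cross-checked against central finite differences. The sketch below does this on a small toy expression rather than the real nlml, so that it runs quickly and on its own; the expression, the helper names (`f_num`, `grad_num`, `grad_fd`) and the step size `h` are all assumptions made for illustration.

```python
import numpy as np
import sympy as sp

# Toy stand-in for the nlml expression (hypothetical, for illustration only).
a, b = sp.symbols('a b')
expr = sp.log(a**2 + 1) + a*sp.exp(-b**2)

grad_sym = [sp.diff(expr, v) for v in (a, b)]

def f_num(p):
    # Evaluate the expression at a numeric point.
    return float(expr.subs({a: p[0], b: p[1]}))

def grad_num(p):
    # Evaluate the symbolic gradient at a numeric point.
    return np.array([g.subs({a: p[0], b: p[1]}) for g in grad_sym], dtype=np.float64)

def grad_fd(f, p, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    p = np.asarray(p, dtype=np.float64)
    out = np.zeros_like(p)
    for k in range(p.size):
        step = np.zeros_like(p)
        step[k] = h
        out[k] = (f(p + step) - f(p - step)) / (2*h)
    return out

p0 = np.array([0.7, -1.2])
max_abs_diff = np.max(np.abs(grad_num(p0) - grad_fd(f_num, p0)))
print(max_abs_diff)
```

The same pattern should apply to `grad_nlml` with a numeric wrapper around the nlml, at the cost of several additional symbolic evaluations per check.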

Run the conjugate gradient method:

In [14]:
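def minimize_restarts_CG(n_restarts=10):
    # Build the symbolic nlml once instead of on every function evaluation.
    nlml_sym = nlml(sigma, l_x, l_y, phi, s)
    nlml_wp = lambda v: float(nlml_sym.subs({sigma:v[0], l_x:v[1], l_y:v[2], phi:v[3]}))
    all_results = []
    for it in range(n_restarts):
        all_results.append(minimize(nlml_wp, np.random.rand(4), jac = grad_nlml,  method="CG"))
    # Keep only runs that terminated successfully (status 0).
    filtered_results = [m for m in all_results if 0==m.status]
    if not filtered_results:
        raise RuntimeError("CG did not converge in any restart")
    return min(filtered_results, key = lambda x: x.fun)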
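
Each function and gradient evaluation above goes through `.subs`, which is probably one contributor to the long runtimes. A possible alternative is to compile the expressions once with `sp.lambdify`. The sketch below shows the idea on a toy expression; the expression and the names `f_fast` and `grad_fast` are assumptions, and whether this would make larger n feasible has not been tested here, since building the symbolic determinant and inverse remains expensive.

```python
import numpy as np
import sympy as sp

# Toy expression standing in for the nlml (hypothetical, for illustration only).
sigma, l_x, l_y, phi = sp.symbols('sigma l_x l_y phi')
expr = sp.log(sigma**2 + l_x**2) + phi**2*sp.exp(-l_y**2)

params = (sigma, l_x, l_y, phi)
# Compile the expression and its gradient once into NumPy-backed callables.
f_fast = sp.lambdify(params, expr, modules='numpy')
grad_fast = sp.lambdify(params, [sp.diff(expr, p) for p in params], modules='numpy')

v = np.array([0.7, 1.1, 1.7, 2.5])
value_fast = f_fast(*v)
grad_values = np.array(grad_fast(*v), dtype=np.float64)

# Reference value via .subs, as used in the cells above.
value_subs = float(expr.subs(dict(zip(params, v))))
print(value_fast, value_subs, grad_values)
```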

In [None]:
t1 = time.time()
m = minimize_restarts_CG(2)
t_CG = time.time() - t1
print(m)

**ADAM**

In [None]:
def adams(grad, init, n_epochs=500, eta=10**-4, gamma=0.9,beta=0.99,epsilon=10**-8,noise_strength=0):
    params=np.array(init)
    param_traj=np.zeros([n_epochs+1,params.size])
    param_traj[0,]=init
    v=0
    grad_sq=0
    for j in range(n_epochs):
        noise=noise_strength*np.random.randn(params.size)
        g=np.array(grad(params))+noise
        # Exponential moving averages of the gradient and the squared gradient
        v=gamma*v+(1-gamma)*g
        grad_sq=beta*grad_sq+(1-beta)*g*g
        # Bias correction depends on the step count j+1
        v_hat=v/(1-gamma**(j+1))
        grad_sq_hat=grad_sq/(1-beta**(j+1))
        params=params-eta*np.divide(v_hat,np.sqrt(grad_sq_hat+epsilon))
        param_traj[j+1,]=params
    return param_traj
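
Since each gradient evaluation of the nlml is expensive, it can help to sanity-check the optimizer on a cheap toy problem first. The sketch below runs a compact Adam loop with the usual step-dependent bias correction on a quadratic with a known minimum. The function `adam_toy`, the target and the larger learning rate `eta=1e-2` are assumptions chosen for illustration, not settings from this notebook.

```python
import numpy as np

def adam_toy(grad, init, n_epochs=2000, eta=1e-2, gamma=0.9, beta=0.99, epsilon=1e-8):
    """Compact Adam loop for a toy problem (hypothetical helper, not the notebook's adams)."""
    params = np.array(init, dtype=np.float64)
    v = np.zeros_like(params)
    grad_sq = np.zeros_like(params)
    for j in range(n_epochs):
        g = np.asarray(grad(params), dtype=np.float64)
        # Moving averages of the gradient and the squared gradient.
        v = gamma*v + (1 - gamma)*g
        grad_sq = beta*grad_sq + (1 - beta)*g*g
        # Bias correction with the step count j + 1.
        v_hat = v/(1 - gamma**(j + 1))
        grad_sq_hat = grad_sq/(1 - beta**(j + 1))
        params = params - eta*v_hat/np.sqrt(grad_sq_hat + epsilon)
    return params

# Quadratic bowl with its minimum at `target`.
target = np.array([1.0, -2.0])
quad_grad = lambda p: 2*(p - target)
p_final = adam_toy(quad_grad, np.zeros(2))
print(p_final)
```

If the final parameters land close to `target`, the update rule itself is behaving as expected, and slow progress on the nlml is more likely due to the step size or the cost per epoch.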

Running ADAM:

In [None]:
def minimize_restarts_adam(n=10):
    all_results = []
    for it in range(0,n):
        all_results.append(adams(grad_nlml, np.random.rand(4)))
    return all_results

In [None]:
t1 = time.time()
m = minimize_restarts_adam(2)
t_adam = time.time() - t1
print(m)

In [None]:
print(t_CG,'\n',t_adam)