# Inversion benchmark - data covariance

The covariance matrix appears in the misfit function,

$$
S(\mathbf{m}) = \frac{1}{2} \left[ (\mathbf{g}(\mathbf{m}) - \mathbf{d})^T \mathbf{C}_d^{-1} (\mathbf{g}(\mathbf{m}) - \mathbf{d}) + (\mathbf{m} - \mathbf{m}_p)^T \mathbf{C}_p^{-1} (\mathbf{m} - \mathbf{m}_p) \right]
$$

where the first term is the misfit between the predicted data $\mathbf{g}(\mathbf{m})$ and the observed data $\mathbf{d}$; the second term is the misfit between the model parameters $\mathbf{m}$ and their priors $\mathbf{m}_p$. If the uncertainties are uncorrelated, the expression simplifies to a weighted $\ell_2$-norm,

$$
S(\mathbf{m}) = \frac{1}{2} \sum_{i} \frac{\vert g_i(\mathbf{m}) - d_i \vert^2}{(\sigma_{d,i})^2} + \frac{1}{2} \sum_{j} \frac{\vert m_j - m_{p,j} \vert^2}{(\sigma_{p,j})^2}
$$

which is identical to the full expression when the off-diagonal components of $\mathbf{C}_d$ and $\mathbf{C}_p$ are zero (in all cases the diagonal contains the variances $\sigma^2$).
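
As a quick numerical check of this equivalence, the sketch below evaluates the data term both ways for a diagonal $\mathbf{C}_d$. The arrays `g_m`, `d` and `sigma_d` hold made-up values chosen purely for illustration; they are not part of the benchmark.

```python
import numpy as np

# hypothetical predicted data g(m), observations d and uncertainties sigma_d
g_m = np.array([1.0, 2.0, 3.5])
d = np.array([1.2, 1.9, 3.0])
sigma_d = np.array([0.1, 0.2, 0.5])

residual = g_m - d

# full quadratic form with a diagonal covariance matrix
C_d = np.diag(sigma_d**2)
S_full = 0.5 * residual.dot(np.linalg.solve(C_d, residual))

# weighted l2-norm form
S_l2 = 0.5 * np.sum(residual**2 / sigma_d**2)

print(np.isclose(S_full, S_l2))
```

Once off-diagonal entries are added to `C_d`, the two values generally differ, and only the quadratic form remains valid.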

Here, we construct the data covariance matrix $\mathbf{C}_d$ row by row, using a KDTree to find the neighbours of each point within a set Euclidean distance. We start with a Gaussian covariance function, but this can be extended to other covariance functions later.
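
The row-by-row construction can be sketched with SciPy alone before bringing in PETSc. This is a minimal illustration under stated assumptions: the coordinates, uncertainties and `maxdist` are hypothetical, and the helper `gaussian_cov` scales each entry by $\sigma_i \sigma_j$ so that the matrix stays symmetric when the uncertainties differ between points. That scaling is a choice made for this sketch and is not necessarily what the cells below do.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

def gaussian_cov(sigma_i, sigma_j, dist, length_scale):
    # Gaussian covariance between points separated by dist
    return sigma_i * sigma_j * np.exp(-dist**2 / (2.0 * length_scale**2))

# hypothetical data locations and uncertainties
coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
sigma = np.array([0.1, 0.2, 0.1])
maxdist = 2.0
length_scale = maxdist / 4
n = len(coords)

tree = cKDTree(coords)
rows, cols, vals = [], [], []

for i in range(n):
    # neighbours of point i within maxdist (this includes i itself)
    idx = tree.query_ball_point(coords[i], maxdist)
    dist = np.linalg.norm(coords[i] - coords[idx], axis=1)
    rows.extend([i] * len(idx))
    cols.extend(idx)
    vals.extend(gaussian_cov(sigma[i], sigma[idx], dist, length_scale))

# assemble the sparse covariance matrix from (row, col, value) triplets
C = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
```

Points farther apart than `maxdist` never appear in a row, so the matrix remains sparse for large data sets whose extent is much greater than `maxdist`.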

In [None]:
import numpy as np
from petsc4py import PETSc
from scipy.spatial import cKDTree
from scipy import sparse
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# generate some data in 3D space
n = 20

x = np.linspace(0, 2*np.pi, n)
y = np.linspace(-1,1, n)
z = np.zeros(n)

data_coords = np.column_stack([x,y,z])

data = np.sin(x)
sigma_data = np.ones_like(data)*0.1

In [None]:
# plot the data locations, coloured by data value
plt.scatter(x, y, c=data)
plt.colorbar(label='data')

In [None]:
def gaussian(sigma, dist, length_scale):
    return sigma**2 * np.exp(-dist**2/(2.0*length_scale**2))


# set up a sparse (AIJ) matrix
mat = PETSc.Mat().create()
mat.setType(mat.Type.AIJ)
mat.setSizes((data.size, data.size))
mat.setPreallocationNNZ((data.size,10))
mat.setFromOptions()

# set up KDTree and maxdist to query
tree = cKDTree(data_coords)
maxdist = 10.0

for i in range(0, data.size):
    # indices of, and distances to, all points within maxdist of point i
    idx = tree.query_ball_point(data_coords[i], maxdist)
    dist = np.linalg.norm(data_coords[i] - data_coords[idx], axis=1)

    row = i
    col = idx
    val = gaussian(sigma_data[idx], dist, maxdist/4)

    mat.setValues(row, col, val)

# assemble only after all values have been set
mat.assemblyBegin()
mat.assemblyEnd()

In [None]:
indptr, indices, values = mat.getValuesCSR()

mat_csr = sparse.csr_matrix((values, indices, indptr), shape=(data.size, data.size))
mat_dense = mat_csr.todense()

---

**Plot of covariance matrix**

In [None]:
fig = plt.figure()
ax1 = fig.add_subplot(111)
im1 = ax1.imshow(mat_dense)
fig.colorbar(im1)

In [None]:
# work with a plain ndarray so that * stays element-wise
C = np.asarray(mat_dense)
res_mult = C.dot(data)
res_linalg = data * np.linalg.solve(C, data)

plt.plot(x, data)
plt.plot(x, res_mult)
# plt.plot(x, res_linalg)
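
The data term of $S(\mathbf{m})$ can be evaluated without forming $\mathbf{C}_d^{-1}$ explicitly. The sketch below does this with a Cholesky factorisation from `scipy.linalg`. All values (`x`, `d`, `g_m`, `sigma`, `length_scale`) are hypothetical. The length scale is deliberately kept small relative to the point spacing, because Gaussian covariance matrices become ill-conditioned when it is large.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# hypothetical 1-D data locations, observations and predictions
x = np.linspace(0.0, 1.0, 5)
d = np.sin(2 * np.pi * x)
g_m = d + 0.05  # constant offset between prediction and data
sigma = 0.1
length_scale = 0.1

# dense Gaussian covariance matrix
dist = np.abs(x[:, None] - x[None, :])
C_d = sigma**2 * np.exp(-dist**2 / (2.0 * length_scale**2))

residual = g_m - d

# solve C_d y = residual via Cholesky instead of forming the inverse
factor = cho_factor(C_d)
S_data = 0.5 * residual.dot(cho_solve(factor, residual))
```

If the factorisation fails or the result is noisy, one option is to add a small value to the diagonal of `C_d` to regularise it, though whether that is appropriate depends on the data.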