# Investigation of geophysical sensor data to inform priors

We don't yet have a good idea of what constitutes a sensible set of priors for real data, so here I try to sort out what is going on using assumptions that I hope are simple but robust.

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import GPy

## Noise and length scale characteristics for gravity and magnetism

We've been running with some set of priors for gravity and magnetism, but in all fairness we have no idea what those should be.  We know they're both linear sensors that integrate over rock properties, with a 3-D sensitivity profile that gets broader with depth.  By fitting a GP to each data set, we get some idea of the noise level and a lower limit on the relevant length scale.  Since the data are on a grid, we could also look at the autocorrelation.
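
The autocorrelation idea could look something like the sketch below. It runs on a synthetic grid, not the Gascoyne data; `grid_autocorr` is a hypothetical helper, and it assumes the values have already been reshaped into a regular 2-D array with no gaps.

```python
import numpy as np

def grid_autocorr(grid, max_lag, axis=0):
    """Sample autocorrelation of a regular 2-D grid along one axis, lags 0..max_lag."""
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()                 # remove the overall mean
    var = np.mean(z ** 2)            # lag-0 covariance
    z = np.moveaxis(z, axis, 0)      # put the lag direction first
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        n = z.shape[0] - lag
        acf[lag] = np.mean(z[:n] * z[lag:]) / var
    return acf

# Synthetic example: a smooth field, with and without added white noise.
rng = np.random.default_rng(0)
rows = np.arange(36)[:, None]
cols = np.arange(36)[None, :]
smooth = np.sin(rows / 10.0) + np.cos(cols / 10.0)
noisy = smooth + 0.3 * rng.standard_normal((36, 36))

acf_smooth = grid_autocorr(smooth, 5)
acf_noisy = grid_autocorr(noisy, 5)
print(acf_smooth)
print(acf_noisy)
```

White noise is uncorrelated between grid cells, so it shows up as a drop between lag 0 and lag 1 that the smooth field on its own does not have.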

This isn't really meant to be a Bayesian analysis.  It's meant to give us some idea of the order of magnitude of the noise, using a model that's flexible enough to respond to changes in the data but that insists on smoothness, so we can pick off the delta-function component of the covariance.
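
Written out, and assuming a Matérn-3/2 kernel plus independent Gaussian noise (the combination fitted in the cells that follow), the covariance has the form below. The symbols $\sigma_f^2$ (signal variance), $\ell$ (length scale) and $\sigma_n^2$ (noise variance) are my notation, not names taken from the code.

$$
k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \left(1 + \frac{\sqrt{3}\,r}{\ell}\right) \exp\left(-\frac{\sqrt{3}\,r}{\ell}\right) + \sigma_n^2\,\delta_{\mathbf{x}\mathbf{x}'},
\qquad r = \lVert \mathbf{x} - \mathbf{x}' \rVert
$$

The last term is the delta-function component: it contributes only when the two points coincide, so its fitted size $\sigma_n^2$ is the noise estimate we are after.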

In [None]:
gravdata = pd.read_csv("/Users/davidkohn/dev/obsidian/data/dataset1/gravity_400m_Gascoyne.txt", header=0)
print(gravdata.Latitude.min(), gravdata.Latitude.max())

In [None]:
gravdata

In [None]:
gravdata = gravdata[np.abs(gravdata.Latitude + 24.85) < 0.05]
gravdata = gravdata[np.abs(gravdata.Longitude - 116.1) < 0.05]
print(gravdata.grid_code.min(), gravdata.grid_code.max())
print(gravdata.shape)

In [None]:
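# Inputs are (latitude, longitude) pairs; the target is the gridded gravity value.
X = np.array([gravdata.Latitude, gravdata.Longitude]).T
Y = np.array([gravdata.grid_code]).T
kernel = GPy.kern.Matern32(2)
print("X.shape =", X.shape)
print("Y.shape =", Y.shape)
# GPRegression adds a Gaussian noise term on top of the kernel.
model = GPy.models.GPRegression(X, Y, kernel)
model.optimize(messages=True)
fig = model.plot()
print(model)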

This seems pretty weird: the fitted gravity model has a very long length scale and no obvious noise, yet we can see from the contours that there is some structure.  Not sure what to make of that.
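
One thing that may help in judging whether the length scale is really "very long": because `X` holds latitude and longitude, the fitted length scale is in degrees. Below is a rough conversion to metres, assuming a spherical Earth of radius 6371 km (an approximation). `degrees_to_metres` is a hypothetical helper, not part of GPy.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)

def degrees_to_metres(d_lat_deg, d_lon_deg, at_lat_deg):
    """Convert small latitude/longitude offsets in degrees to metres (north, east)."""
    m_per_deg = math.pi * EARTH_RADIUS_M / 180.0
    north = d_lat_deg * m_per_deg
    # Longitude degrees shrink with the cosine of the latitude.
    east = d_lon_deg * m_per_deg * math.cos(math.radians(at_lat_deg))
    return north, east

# Half-width of the gravity selection window, at its centre latitude.
north_m, east_m = degrees_to_metres(0.05, 0.05, -24.85)
print("0.05 deg is roughly %.0f m north-south, %.0f m east-west" % (north_m, east_m))
```

The same factors apply to a fitted length scale. Since a degree of longitude is shorter than a degree of latitude here, a kernel that is isotropic in degrees is slightly anisotropic in metres.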

In [None]:
magdata = pd.read_csv("mag_TMI_gascoyne.txt", header=0)
print(magdata.Latitude.min(), magdata.Latitude.max())
# Smaller window than for gravity, centred on the same point.
magdata = magdata[np.abs(magdata.Latitude + 24.85) < 0.015]
magdata = magdata[np.abs(magdata.Longitude - 116.1) < 0.015]
print(magdata.grid_code.min(), magdata.grid_code.max())
print(magdata.shape)

In [None]:
# Same GP setup as for gravity, now on the magnetic data.
X = np.array([magdata.Latitude, magdata.Longitude]).T
Y = np.array([magdata.grid_code]).T
kernel = GPy.kern.Matern32(2)
print("X.shape =", X.shape)
print("Y.shape =", Y.shape)
model = GPy.models.GPRegression(X, Y, kernel)
model.optimize(messages=True)
fig = model.plot()
print(model)

Magnetism, on the other hand, has at least some non-zero Gaussian noise to it.  But surely the length scale is out of whack?  And are there repeated points in there?
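
A quick way to check for repeated locations without assuming a particular grid shape is sketched below. It uses toy coordinates, not the survey data; `count_repeated_rows` is a hypothetical helper, and rounding to 6 decimal places is an assumed tolerance for treating two coordinates as the same point.

```python
import numpy as np

def count_repeated_rows(X, decimals=6):
    """Return (unique_rows, counts) after rounding coordinates to `decimals` places."""
    Xr = np.round(np.asarray(X, dtype=float), decimals)
    unique_rows, counts = np.unique(Xr, axis=0, return_counts=True)
    return unique_rows, counts

# Toy coordinates (not the survey data): the second location appears twice.
X_toy = np.array([[-24.85, 116.10],
                  [-24.85, 116.11],
                  [-24.85, 116.11],
                  [-24.84, 116.10]])
unique_rows, counts = count_repeated_rows(X_toy)
n_repeated = int(np.sum(counts > 1))
print("%d unique locations, %d of them repeated" % (len(unique_rows), n_repeated))
```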

In [None]:
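# Assumes the selected window is a 36 x 36 grid.
# Print each coordinate axis and the spacing between neighbouring values;
# a spacing of zero means a repeated coordinate.
dX0 = X[:,0].reshape(36,36)[:,0]
print(dX0)
print(dX0[1:] - dX0[:-1])
dX1 = X[:,1].reshape(36,36)[1,:]
print(dX1)
print(dX1[1:] - dX1[:-1])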

Oooh, looks like they are.  In a way that's useful if the repeats are real measurements: in principle they give us the noise scale.  But if they aren't real, it's not clear this would have worked.
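
As a minimal sketch of how repeats could give the noise scale: if repeated locations carry independent measurements, the pooled within-location variance estimates the noise variance. This uses toy numbers, not the magnetic data, and `replicate_noise_variance` is a hypothetical helper. If the repeats turn out to be copies with identical values, the estimate is simply zero and tells us nothing about the noise.

```python
import numpy as np

def replicate_noise_variance(X, y, decimals=6):
    """Pooled within-location variance of y over locations that appear more than once."""
    Xr = np.round(np.asarray(X, dtype=float), decimals)
    y = np.asarray(y, dtype=float).ravel()
    _, inverse, counts = np.unique(Xr, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()  # flatten defensively; its shape has varied across NumPy versions
    means = np.bincount(inverse, weights=y) / counts
    # Sum of squared deviations from each location's own mean.
    ss = np.bincount(inverse, weights=(y - means[inverse]) ** 2)
    repeated = counts > 1
    dof = int(np.sum(counts[repeated] - 1))
    if dof == 0:
        return float("nan")  # no repeated locations, so no estimate
    return float(ss[repeated].sum() / dof)

# Toy data: two locations measured twice, one measured once.
X_toy = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y_toy = np.array([1.0, 3.0, 10.0, 14.0, 5.0])
noise_var = replicate_noise_variance(X_toy, y_toy)
print("pooled replicate variance:", noise_var)
```

The square root of this value is directly comparable to the Gaussian noise standard deviation from the GP fit, provided both are in the same units.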