# Geostatistics

<script>
    document.querySelector('head').innerHTML += '<style>.slides { zoom: 1.0 !important; }</style>';
</script>

## 1. Introduction

In [2]:
import json
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
import bokeh
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
import holoviews as hv
from holoviews import opts

In [4]:
hv.extension('bokeh')
output_notebook()

## 1.1 Why observations?

* In many cases we can't observe the Earth exhaustively
* Many processes are too complicated to be calculated deterministically

![image.png](attachment:image.png)

### But again: Why observations? Can't we just implement better theories?

The simple answer is *no*. Theories are built on experiments, and experiments are nothing other than observations in a controlled environment. In environmental science the main challenge is that we cannot always control the environment or replicate experiments. Thus, we rather speak of *campaigns*.

Consider the following example:

We develop the theory that a specific variable, for example a contaminant, increases over time in the soil. This increase will be limited by some physical properties of the soil. A very general model can be derived from the rules of limited growth:

$$
y = L - a \cdot e^{-k \cdot x}
$$

where $L$ is the limit, $k$ is the growth rate and $a$ is a factor controlling the shape.

In [3]:
# Our model
def mod(x,a,k,l):
    if isinstance(x, (np.ndarray, list)):
        return np.fromiter(map(lambda _: mod(_,a,k,l), x), dtype=float)
    return l - a * np.exp(-k*x)   

# build an example
data = pd.read_csv('data/sample_data.txt', sep=' ', index_col=None)

# sample the data and apply the model
_x = [1, 5, 7,10,30,70,250,440]
_y = data['7'].values[_x]
cof, _ = curve_fit(mod, _x, _y)
x = np.linspace(0,500, 100)
y = mod(x, *cof)  # mod already handles arrays, so no explicit map is needed

# create the plots
data = hv.Curve(data, 'index', '7', label='data').opts(color='#4ee7ed', alpha=0.8, muted_alpha=0.01)
samples = hv.Scatter(data, 'index', '7', label='samples')[_x].opts(color='#de60f2', marker='o',size=12)
model = hv.Curve(zip(x,y), label='model').opts(color='#2db75e', line_width=2, muted_alpha=0.01)

# build the full example
def obs_example(w=800, h=450):
    return (
        (samples * model).opts(width=w, height=h, legend_position='bottom_right').relabel('Model') +
        (samples * model * data).opts(width=w, height=h, legend_position='bottom_right', legend_muted=True).relabel('Reality')
    ).opts(tabs=True)

In [4]:
obs_example(w=968, h=450)

Thus, we need a theory of **how** and **when** to observe **what**.

#### Conclusions

* We can't predict without observations
* We can't verify models and theory without observations
* The given example only looked at **one** state variable in **one** dimension (time).
* Real systems are multidimensional and non-stationary

#### Trash in => trash out

The given model will never be able to predict the system.

<hr>

## 1.2 Data Types

We are often faced with system properties that clearly influence system behavior (processes) but cannot be measured on an interval or ratio scale.

#### Scales of measurement

* nominal

* ordinal

* interval

* ratio
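
The four scales differ in which operations are meaningful. The following is a minimal sketch using pandas; the variable names and values (`soil_type`, `wetness`, `temp_c`, `rain_mm`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Nominal: categories without any order (hypothetical soil type labels)
soil_type = pd.Categorical(['A', 'C', 'B', 'A'],
                           categories=['A', 'B', 'C'], ordered=False)

# Ordinal: categories with a meaningful order, but no defined distance
wetness = pd.Categorical(['dry', 'wet', 'moist', 'dry'],
                         categories=['dry', 'moist', 'wet'], ordered=True)

# Interval: differences are meaningful, the zero point is arbitrary (deg C)
temp_c = pd.Series([10.0, 20.0])

# Ratio: a true zero exists, so ratios are meaningful (rainfall in mm)
rain_mm = pd.Series([5.0, 10.0])

nominal_counts = pd.Series(soil_type).value_counts()  # counting is all we can do
ordinal_max = wetness.max()                            # ordering allows min / max
temp_diff = temp_c[1] - temp_c[0]                      # 10 degrees warmer, not "twice as warm"
rain_ratio = rain_mm[1] / rain_mm[0]                   # twice as much rain

print(nominal_counts.to_dict(), ordinal_max, temp_diff, rain_ratio)
```

Calling `max()` on the unordered `soil_type` raises a `TypeError`, which mirrors the fact that nominal data carries no ranking.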

How can we use information like ecosystem types or soil types?

![image.png](attachment:image.png)

In [5]:
from bokeh.transform import factor_cmap
from bokeh.palettes import Set3_10 as colormap

# load sample data
with open('data/soil.geojson', 'r') as f:
    soil = json.load(f)

# convert
xs, ys, types = [], [], []
for f in soil['features']:
    _x, _y = zip(*f['geometry']['coordinates'][0])
    xs.append(_x)
    ys.append(_y)
    types.append(f['properties']['type'])

# build plot data
cmap = factor_cmap('type', palette=colormap, factors=sorted(set(types)))  # sorted: stable color assignment between runs
_soil_data = dict(
    x=xs,
    y=ys,
    type=types
)

# build plot
soilmap = figure(title='Soil types', x_axis_location=None, y_axis_location=None, tooltips=[('Soil Type:', '@type')])
soilmap.grid.grid_line_color=None
soilmap.patches('x', 'y', source=_soil_data, fill_color=cmap, fill_alpha=.7, line_color='white', line_width=2);

In [6]:
show(soilmap)

One approach is to use the soil type as a proxy. We can assume that different soil types have different properties. That means we can assign values on other scales of measurement to the soil types.

In [7]:
proxy = {
    'A': 1, 'B': 1,
    'C': 1.2,
    'D': 1.4,
    'E': 2, 'F': 2, 'G': 2,
    'H': 3, 'I': 3
}

In [8]:
from bokeh.palettes import Viridis6 as palette
from bokeh.models import LogColorMapper

moist = [proxy[t] for t in types]
colormap=LogColorMapper(palette=palette)
moist_data = dict(
    x=xs,
    y=ys,
    type=types,
    moist=moist
)

# build plot
proxymap = figure(title='Soil moisture proxy', x_axis_location=None, y_axis_location=None, tooltips=[('Soil Type:', '@type'), ('Moisture:', '@moist')])
proxymap.grid.grid_line_color = None
proxymap.patches('x', 'y', source=moist_data, fill_color={'field': 'moist', 'transform': colormap}, fill_alpha=.7, line_color='white', line_width=2);

In [9]:
show(proxymap)

## 1.3 Observations and scale

In [10]:
with open('data/field3d/step_100.txt', 'r') as f:
    field = np.loadtxt(f)

In [11]:
from bokeh.palettes import Viridis256 as palette

img = field.copy()
img[img <= 0.9] = np.nan  # np.NaN was removed in NumPy 2.0
img *= 20

coor = np.array([(21, 86), (4, 64), (25,52), (7,34), (24, 16), (40, 8), (48, 22), (65, 9), (85, 18), (97, 30), (86, 51), (97,71), (53, 56), (64,52), (61, 31), (77,68), (75,30)])

rain = figure(tooltips=[('Variable value:', '@image'), ('x:', '$x'), ('y:', '$y')])
_i = rain.image(image=[img], x=0, y=0, dw=100, dh=100, palette=palette, legend_label='rainfall')
_i.visible = False

rain.cross(coor[:,0], coor[:,1], color='darkred', size=14, legend_label='Observations')

rain.x_range.range_padding = rain.y_range.range_padding = 0
rain.legend.location = 'top_left'
rain.legend.click_policy = 'hide'

### Location
How can we decide when and **where** to observe what? And how often?

We want to observe rainfall to develop a hydrological model for a region. We have a number of devices available. We decide to spread the stations randomly over the region of interest. Consider the network shown in the map below.

Is this a sufficient network to observe rainfall?

In [12]:
show(rain)

We need to know about the **spatial pattern** of the variable. 
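
Why the spatial pattern matters can be illustrated with a minimal sketch. The station coordinates are copied from the network above; the rainfall field, however, is an assumption: a synthetic 100 x 100 grid with a single small rain cell (the hypothetical `cell_x`, `cell_y`, `cell_radius`), not the field loaded from file in this notebook.

```python
import numpy as np

# Station coordinates (x, y), copied from the network defined above
coor = np.array([(21, 86), (4, 64), (25, 52), (7, 34), (24, 16), (40, 8),
                 (48, 22), (65, 9), (85, 18), (97, 30), (86, 51), (97, 71),
                 (53, 56), (64, 52), (61, 31), (77, 68), (75, 30)])

# ASSUMPTION: synthetic field with one small rain cell (Gaussian bump)
yy, xx = np.mgrid[0:100, 0:100]
cell_x, cell_y, cell_radius = 40, 40, 5   # hypothetical centre and size
field = 20 * np.exp(-((xx - cell_x) ** 2 + (yy - cell_y) ** 2)
                    / (2 * cell_radius ** 2))

# Sample the field at the station locations (rows are y, columns are x)
observed = field[coor[:, 1], coor[:, 0]]

field_max = field.max()
observed_max = observed.max()
print(f"true maximum: {field_max:.1f}, largest observed value: {observed_max:.3f}")
```

In this constructed case the rain cell is much smaller than the typical distance between stations, so the network barely registers it. A pattern with larger structures would be captured far better by the same stations.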

### Extent
How can we decide **when** and where to observe what? And how often?

We want to observe a variable that changes on different scales. We might have a daily and an annual cycle. But actually we are interested in **long term trends**. Examples for variables like this are:

* Temperature

* Soil Moisture

We conducted a measurement campaign for several months. It looks like there is a **long term trend**:

In [48]:
campaign = (
    data.opts(
        default_tools=['save', 'reset','xwheel_zoom','pan'],active_tools=['xwheel_zoom'], toolbar='above'
    ).relabel('All data') * 
    data[60:120].opts(color="red", toolbar=None).relabel(' measurement campaign')
).opts(   legend_position="bottom_right", xlim=(50,130))

In [14]:
campaign.opts(width=968, height=450, legend_muted=False)

We need to observe at least **one wavelength** to be able to describe the dynamics and **several wavelengths** to describe a trend.
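
The effect of a record that is too short can be checked with a minimal sketch. It assumes a purely periodic signal without any trend (a sine with a hypothetical period of 365 time steps) and fits a straight line to a short and to a long record.

```python
import numpy as np

# ASSUMPTION: a purely periodic signal with no long term trend at all
period = 365.0
t = np.arange(0, 10 * period)            # ten full cycles
signal = np.sin(2 * np.pi * t / period)

def linear_trend(t, y):
    """Slope of a least-squares line fitted to y(t)."""
    slope, _intercept = np.polyfit(t, y, 1)
    return slope

# A campaign that covers only the rising quarter of one cycle ...
short = slice(0, int(period / 4))
# ... versus a record that covers several full wavelengths
full = slice(0, int(10 * period))

slope_short = linear_trend(t[short], signal[short])
slope_full = linear_trend(t[full], signal[full])

print(f"short campaign: {slope_short:.5f} per step")
print(f"ten cycles:     {slope_full:.5f} per step")
```

The short campaign suggests a clear upward trend although none exists, while the slope fitted to ten cycles is orders of magnitude smaller.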

### Certainty

Suppose the measurements were taken often enough, long enough and with sufficient spatial coverage. What did we actually measure?

In many cases we cannot observe the variable of interest directly. Then, we need to observe something correlated and calculate or model the variable of interest. That will make our observation **less certain**.

Combined with errors and the measurement precision of the device used, observations are always affected by uncertainty.

In [55]:
# construct large error bars
d = data[60:90]
error = d.data[['index', ]].copy()
error['pos'] = 0
error['error'] = .3*np.random.random(size=len(error)) + 1.5

errorplot = (
    d.opts(color='red').relabel('campaign') *
    hv.Scatter(d.data, 'index', '7').opts(color='red', size=6) *
    hv.ErrorBars(error, label='uncertainty')
).opts(ylim=(-3, 3))

In [56]:
errorplot.opts(width=968, height=450)

We need to know **what** we observed and **how certain** we are about this observation.

### Support

The example above illustrated that we need to know about measurement techniques to assess uncertainties.

But the way we measure can also affect how we interpret the values.

In [101]:
df = pd.DataFrame({'x': np.arange(0, 30, 0.1), 'y': np.sin(np.arange(0,30, 0.1)) })
bd = [[i - 1.5, i + 1.5, j] for i,j in [(2.7, 0.427), (3.8, -0.612), (5.6, -0.631), (6.3, 0.2), (8.6, 0.734), (9.4, 0.2)]]

bars = [hv.Curve([ [b[0], b[2]], [b[1], b[2]]]).opts(color='#7298aa', line_width=4) for b in bd]
back = hv.Curve(df, 'x', 'y').opts(color='#98D1ED', toolbar='above', tools=['hover'])

support = (back * hv.Overlay(bars))

In [102]:
support.opts(width=968, height=450)

We need to ensure that the measurement **support** is smaller than the scale on which the variable fluctuates (its **volatility**).
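
The influence of the support can be sketched numerically. The example assumes that a measurement returns the average of a sine signal over a window of width `support`; the helper `measure` is hypothetical and not part of the notebook.

```python
import numpy as np

x = np.arange(0, 30, 0.01)
y = np.sin(x)

def measure(center, support):
    """Average the signal over a window of width `support` around `center`."""
    mask = (x >= center - support / 2) & (x <= center + support / 2)
    return y[mask].mean()

peak = np.pi / 2 + 2 * np.pi              # a crest of the sine, true value 1.0

small_support = measure(peak, 0.1)        # close to the true value of 1.0
large_support = measure(peak, 3.0)        # about sin(1.5) / 1.5, roughly 0.66
full_cycle = measure(peak, 2 * np.pi)     # close to 0, variability averaged out

print(f"{small_support:.3f} {large_support:.3f} {full_cycle:.3f}")
```

A support of 3, as used for the bars in the plot above, already damps the crest by about a third. Once the support reaches a full wavelength, the measurement no longer contains any information about the fluctuation.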