# Use the geostatistical toolbox container

This short tutorial shows how you can interact with the container using `toolbox-runner`.
The runner is still under development, and working with the result files is still a bit messy.
Right now, the results are placed into a temporary folder and the result files are extracted by hand. In a stable release,
`toolbox-runner` will have helper functions for that purpose.

In [1]:
from toolbox_runner import list_tools
import io
import tempfile
import json
import pandas as pd
import xarray as xr
import numpy as np
from pprint import pprint

Use `list_tools` to get a list or dict of all tool images present on your system. Additionally, a CSV file with example data is loaded.
If your tool containers are not prefixed with `'tbr_'`, you can pass their prefix to `list_tools`.

In [2]:
tools = list_tools(as_dict=True)
pprint(tools)

{'foobar': foobar: Foo Bar  FROM tbr_octave:latest VERSION: 0.1,
 'kriging': kriging: Kriging interpolation  FROM tbr_skgstat:latest VERSION: 1.0,
 'profile': profile: Dataset Profile  FROM tbr_profile:v1.0.0 VERSION: 0.2,
 'sample': sample: Sample field data  FROM tbr_skgstat:latest VERSION: 1.0,
 'simulation': simulation: Geostatistical simulation  FROM tbr_skgstat:latest VERSION: 1.0,
 'variogram': variogram: Variogram fitting  FROM tbr_skgstat:latest VERSION: 1.1}


In [3]:
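# load the point samples (coords, vals) used by the variogram, simulation and kriging tools
df = pd.read_csv('in/meuse.csv')
coords = df[['x', 'y']].values
vals = df.lead.values
print(df.head())

# load a gridded precipitation field, used by the sampling tool
ds = xr.open_dataset('in/cmip_prec.nc')
ds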

## General info about tools

Although using `toolbox-runner` is not mandatory and the tool containers are self-contained, the runner provides some helpful functions.
For example, you can get structured metadata about each tool:

In [3]:
vario = tools.get('variogram')

print(vario.title)
print(vario.description)
print('\nPARAMETERS\n---------\n')
for key, conf in vario.parameters.items():
    print(f"## {key}")
    print(f"input type:    {conf['type']}")
    print(f"description:   {conf.get('description', 'no description provided')}")
    print()

Variogram fitting
Estimate an empirical variogram and fit a model

PARAMETERS
---------

## coordinates
input type:    file
description:   Pass either a (N, D) shaped numpy array, or a .mat file containing the matrix of observation location coordinates

## values
input type:    file
description:   Pass either a (N, 1) shaped numpy array or a .mat file containing the matrix of observations

## n_lags
input type:    integer
description:   no description provided

## bin_func
input type:    enum
description:   no description provided

## model
input type:    enum
description:   no description provided

## estimator
input type:    enum
description:   no description provided

## maxlag
input type:    string
description:   Can be 'median', 'mean', a number < 1 for a ratio of maximum separating distance or a number > 1 for an absolute distance

## fit_method
input type:    enum
description:   no description provided

## use_nugget
input type:    bool
description:   Enable the nugget parameter
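The parameter metadata can also be used to catch typos before a container is started. `check_arguments` is a hypothetical helper, not part of `toolbox-runner`. It assumes `parameters` is a dict mapping each name to a dict with a `'type'` key, as the loop above suggests, and it only checks the `integer` and `bool` types, because the allowed `enum` values are not shown in this metadata.

```python
from numbers import Integral


def check_arguments(spec, kwargs):
    """Compare keyword arguments against a tool's parameter metadata.

    `spec` is assumed to look like `vario.parameters`: a dict mapping each
    parameter name to a dict with at least a 'type' key.
    Returns a list of problems; an empty list means nothing was flagged.
    """
    problems = []
    for key, value in kwargs.items():
        conf = spec.get(key)
        if conf is None:
            problems.append(f"unknown parameter: {key}")
            continue
        kind = conf.get('type')
        # bool is a subclass of int in Python, so rule it out explicitly
        if kind == 'integer' and (isinstance(value, bool) or not isinstance(value, Integral)):
            problems.append(f"{key}: expected integer, got {type(value).__name__}")
        elif kind == 'bool' and not isinstance(value, bool):
            problems.append(f"{key}: expected bool, got {type(value).__name__}")
        # 'file', 'enum' and 'string' values are passed through unchecked here
    return problems
```

For example, `check_arguments(vario.parameters, dict(maxlag='median', model='exponential'))` would flag a misspelled name such as `max_lag` as an unknown parameter.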

## Variogram

Run the variogram tool to estimate a base variogram, which will then be used by the other tools.
The variogram parameterization is stored in `./out/variogram.json` inside the result tarball. Right now, we have to read
this file ourselves and parse the JSON.

The container also produces HTML and JSON plotly figures, a PDF figure, and a copy of all settings of the `skg.Variogram` instance.
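Until `toolbox-runner` has helpers for this, a result tarball can also be read with the standard library alone. This is a minimal sketch assuming the archive is a (possibly gzipped) tar file and that the JSON lives at `./out/variogram.json`; both function names are hypothetical, and `build_archive` only exists to try the reader without running a container.

```python
import io
import json
import posixpath
import tarfile


def read_json_from_tarball(fileobj, member='./out/variogram.json'):
    """Parse one JSON file stored inside a (possibly gzipped) tar archive.

    `fileobj` is a binary file object, e.g. open('result.tar.gz', 'rb').
    Member names are compared after normalisation, so './out/x.json'
    and 'out/x.json' refer to the same file.
    """
    wanted = posixpath.normpath(member)
    with tarfile.open(fileobj=fileobj, mode='r:*') as tar:
        for info in tar.getmembers():
            if info.isfile() and posixpath.normpath(info.name) == wanted:
                return json.loads(tar.extractfile(info).read().decode('utf-8'))
    raise KeyError(f"{member} not found in archive")


def build_archive(files):
    """Build an in-memory .tar.gz from {name: bytes}; handy for trying the reader."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf
```

With a tarball on disk, the call would be `with open(path, 'rb') as f: params = read_json_from_tarball(f)`.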

In [4]:
vario = tools.get('variogram')
with tempfile.TemporaryDirectory() as tmpdir:
    res = vario.run(result_path=tmpdir, coordinates=coords, values=vals, maxlag='median', model='exponential', bin_func='scott')
    if res.has_errors:
        print(res.errors)
    else:
        # get_file returns the raw bytes of a result file, so decode before parsing
        vario_params = json.loads(res.get_file('./out/variogram.json').decode())
        print(vario_params)

{'estimator': 'matheron', 'model': 'exponential', 'dist_func': 'euclidean', 'bin_func': 'scott', 'normalize': False, 'fit_method': 'trf', 'fit_sigma': None, 'use_nugget': False, 'maxlag': 1372.6660191029719, 'n_lags': 10, 'verbose': False}
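Note that the returned `maxlag` is a number (about 1372.67) although we passed `maxlag='median'`: the tool resolved the rule given in the parameter description, presumably as the median of all pairwise distances. The sketch below shows one way to read these parameters. It resolves `maxlag` following that description and computes a Matheron empirical variogram over `n_lags` bins. It is not the tool's code: the function names are hypothetical, and it uses equal-width bins, whereas the call above used `bin_func='scott'`.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance for every point pair (i < j), plus the pair indices."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)
    return np.linalg.norm(coords[i] - coords[j], axis=1), i, j


def resolve_maxlag(distances, maxlag):
    """Turn a maxlag setting into a distance, following the tool's parameter description."""
    if isinstance(maxlag, str) and maxlag in ('median', 'mean'):
        return float(np.median(distances) if maxlag == 'median' else np.mean(distances))
    value = float(maxlag)  # maxlag is declared as a string, so '0.5' is accepted too
    if value <= 0:
        raise ValueError("maxlag must be positive")
    if value < 1:
        return value * float(np.max(distances))  # ratio of the maximum separating distance
    return value  # absolute distance (exactly 1 is left open by the description; absolute here)


def matheron(coords, values, maxlag='median', n_lags=10):
    """Matheron estimator on n_lags equal-width bins between 0 and maxlag.

    gamma(h) = 1 / (2 N(h)) * sum over the N(h) pairs in the bin of (z_i - z_j)^2
    """
    d, i, j = pairwise_distances(coords)
    z = np.asarray(values, dtype=float).ravel()
    edges = np.linspace(0.0, resolve_maxlag(d, maxlag), n_lags + 1)
    squared = (z[i] - z[j]) ** 2
    gamma = np.full(n_lags, np.nan)  # empty bins stay NaN
    for k in range(n_lags):
        # half-open bins (lower, upper]; pairs at distance 0 are never counted
        in_bin = (d > edges[k]) & (d <= edges[k + 1])
        if in_bin.any():
            gamma[k] = 0.5 * squared[in_bin].mean()
    return edges[1:], gamma
```

Calling `matheron(coords, vals, maxlag='median', n_lags=10)` returns the upper bin edges and the semivariance per bin, which can be plotted against each other.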


## Simulation

Run a geostatistical simulation using the variogram from the last step. The simulation results are returned as `.mat` files.
Despite the extension, these are plain-text matrices, so right now we wrap their decoded content in a file-like object and read it with numpy. This will be improved in the future.

The current version of the tool can only return the mean and standard deviation of the simulated fields. A future version
will optionally export a netCDF file containing every simulation.
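If every realisation were available (as the planned netCDF export would provide), the two summary fields could be reproduced from the stack of simulations. This is a minimal sketch with a hypothetical helper; whether the tool computes the standard deviation with `ddof=0` or `ddof=1` is not documented here, so that choice is an assumption.

```python
import numpy as np


def summarize_realizations(fields, ddof=0):
    """Cell-wise mean and standard deviation over a stack of simulated fields.

    `fields` has shape (n_simulations, ny, nx), e.g. 10 realisations on a 50x50 grid.
    ddof=0 gives the population standard deviation (an assumption, see above).
    """
    stack = np.asarray(fields, dtype=float)
    if stack.ndim != 3:
        raise ValueError("expected an array of shape (n_simulations, ny, nx)")
    return stack.mean(axis=0), stack.std(axis=0, ddof=ddof)
```

The standard deviation map is the more informative of the two: it is large where the realisations disagree, typically far away from the observations.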

In [5]:
simu = tools.get('simulation')
with tempfile.TemporaryDirectory() as tmpdir:
    res = simu.run(result_path=tmpdir, coordinates=coords, values=vals, variogram=vario_params, n_simulations=10, grid='50x50')
    print(res.errors)
    print(res.log)
    print(res.outputs)

    # the .mat outputs are plain text: decode the bytes and parse them with numpy
    mean = np.loadtxt(io.StringIO(res.get_file('./out/simulation_mean.mat').decode()))
    std = np.loadtxt(io.StringIO(res.get_file('./out/simulation_std.mat').decode()))

/bin/sh: 1: docker: not found

Estimating variogram...
exponential Variogram
---------------------
Estimator:         matheron
        
Effective Range:   1372.67
        
Sill:              17740.72
        
Nugget:            0.00
        
Starting 10 iterations seeded 42
[1/10]
[2/10]
[3/10]
[4/10]
[5/10]
[6/10]
[7/10]
[8/10]
[9/10]
[10/10]

['./out/STDERR.log', './out/STDOUT.log', './out/result.json', './out/simulation_mean.mat', './out/simulation_std.mat', './out/variogram.html', './out/variogram.json', './out/variogram.pdf']
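The log reports the fitted model: an exponential variogram with an effective range of 1372.67, a sill of 17740.72 and no nugget. These values are not part of the parameters printed earlier, so they are copied from the log here. Assuming the tool uses scikit-gstat's exponential model (the image is built FROM `tbr_skgstat`), that model is written with the effective range r as gamma(h) = b + c0 * (1 - exp(-3h / r)), so it reaches about 95% of the sill at h = r. `exponential_model` is a hypothetical stand-alone version for evaluating the fitted curve at any lag.

```python
import numpy as np


def exponential_model(h, effective_range, sill, nugget=0.0):
    """Exponential variogram model written with the effective range.

    gamma(h) = nugget + sill * (1 - exp(-3 h / effective_range))
    With nugget = 0 (as in the log), the curve starts at 0 and approaches
    the sill asymptotically, reaching 1 - exp(-3) of it at h = effective_range.
    """
    a = effective_range / 3.0
    h = np.asarray(h, dtype=float)
    return nugget + sill * (1.0 - np.exp(-h / a))
```

For example, `exponential_model([0, 500, 1372.67], effective_range=1372.67, sill=17740.72)` evaluates the fitted curve at three lags.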


#### Plot the simulation result

Unlike the variogram tool, the simulation tool does not yet produce a plot of its result, but we can easily build one using plotly.

In [11]:
import plotly.offline as py
import plotly.graph_objects as go
py.init_notebook_mode(connected=True)

In [12]:
fig = go.Figure()

xx = np.mgrid[coords[:,0].min():coords[:,0].max():50j]
yy = np.mgrid[coords[:,1].min():coords[:,1].max():50j]

fig.add_trace(go.Heatmap(z=mean.T, x=xx, y=yy))
fig.add_trace(go.Scatter(x=coords[:,0], y=coords[:,1], mode='markers', marker=dict(color=vals)))
py.iplot(fig)
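The plot axes above use `np.mgrid` with a complex step: `50j` means "50 evenly spaced points, end point included", which is the same as `np.linspace`. The hypothetical helper `grid_axes` builds both axes that way. It assumes, as the plotting cell does, that the tool's `grid='50x50'` spans the bounding box of the sample coordinates; the tool's actual grid definition is not shown here.

```python
import numpy as np


def grid_axes(coords, shape=(50, 50)):
    """Evenly spaced x/y axes spanning the bounding box of the sample coordinates."""
    coords = np.asarray(coords, dtype=float)
    nx, ny = shape
    xx = np.linspace(coords[:, 0].min(), coords[:, 0].max(), nx)
    yy = np.linspace(coords[:, 1].min(), coords[:, 1].max(), ny)
    return xx, yy
```

`xx, yy = grid_axes(coords)` gives the same axes as the two `np.mgrid` lines and avoids repeating them in every plotting cell.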

## Kriging

We can pass the same variogram to the kriging tool and compare the interpolation with the simulation.

In [9]:
krig = tools.get('kriging')
with tempfile.TemporaryDirectory() as tmpdir:
    res = krig.run(result_path=tmpdir, coordinates=coords, values=vals, variogram=vario_params, algorithm='ordinary', grid='50x50')
    print(res.errors)
    print(res.log)
    print(res.outputs)

    # the .mat outputs are plain text: decode the bytes and parse them with numpy
    krig_field = np.loadtxt(io.StringIO(res.get_file('./out/kriging.mat').decode()))
    sig = np.loadtxt(io.StringIO(res.get_file('./out/sigma.mat').decode()))

/bin/sh: 1: docker: not found

Estimating variogram...
exponential Variogram
---------------------
Estimator:         matheron
        
Effective Range:   1372.67
        
Sill:              17740.72
        
Nugget:            0.00
        
Start interpolation...done. Took 0.07 seconds.

['./out/STDERR.log', './out/STDOUT.log', './out/kriging.mat', './out/result.json', './out/sigma.mat', './out/variogram.html', './out/variogram.json', './out/variogram.pdf']


In [13]:
fig = go.Figure()

xx = np.mgrid[coords[:,0].min():coords[:,0].max():50j]
yy = np.mgrid[coords[:,1].min():coords[:,1].max():50j]

fig.add_trace(go.Heatmap(z=krig_field.T, x=xx, y=yy))
fig.add_trace(go.Scatter(x=coords[:,0], y=coords[:,1], mode='markers', marker=dict(color=vals)))
py.iplot(fig)
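Besides comparing the two heatmaps visually, the kriging field and the simulation mean can be compared numerically, since both were computed on a `'50x50'` grid. This is a minimal sketch with a hypothetical helper, assuming both arrays share the same grid and orientation.

```python
import numpy as np


def compare_fields(a, b):
    """Simple agreement statistics between two fields on the same grid.

    NaN cells are ignored. Returns the mean difference (a - b), the
    root-mean-square difference and the largest absolute difference.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = a - b
    return {
        'bias': float(np.nanmean(diff)),
        'rmse': float(np.sqrt(np.nanmean(diff ** 2))),
        'max_abs': float(np.nanmax(np.abs(diff))),
    }
```

With the results above, this would be called as `compare_fields(krig_field, mean)`.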

## Sampling

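The sampling tool draws point samples from a gridded field. Here, it samples 200 random locations from the CMIP precipitation field `ds.prec` and writes the result tarball to `test/` instead of a temporary directory.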
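Random sampling of a gridded field can be sketched with numpy alone. This is not the tool's implementation and will not reproduce its samples for `seed=42`. `random_sample` is hypothetical and assumes a 2D field (if `ds.prec` has a time dimension, a single time step would have to be selected first), that NaN cells should be skipped, and that coordinates are returned as (row, column) indices.

```python
import numpy as np


def random_sample(field, sample_size, seed=None):
    """Draw distinct random cells from a 2D field, skipping NaN cells.

    Returns float (row, col) coordinates of shape (sample_size, 2)
    and the field values at those cells.
    """
    field = np.asarray(field, dtype=float)
    if field.ndim != 2:
        raise ValueError("expected a 2D field")
    rng = np.random.default_rng(seed)
    valid = np.flatnonzero(np.isfinite(field))  # flat indices of usable cells
    # raises ValueError if sample_size exceeds the number of valid cells
    flat = rng.choice(valid, size=sample_size, replace=False)
    rows, cols = np.unravel_index(flat, field.shape)
    coords = np.column_stack([rows, cols]).astype(float)
    return coords, field[rows, cols]
```

Sampling without replacement guarantees distinct locations, which keeps zero-distance pairs out of a later variogram estimation.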

In [21]:
sample = tools.get('sample')
step = sample.run(result_path='test/', field=ds.prec.values, sample_size=200, method='random', seed=42)
step

test/1667822677_sample.tar.gz

In [23]:
step.outputs

['STDERR.log', 'STDOUT.log', 'coordinates.mat', 'values.mat']

In [26]:
coords = step.get('coordinates.mat')
coords

array([[ 46.,  37.],
       [ 30.,  52.],
       [ 36.,  46.],
       [ 20.,   9.],
       [ 19.,   9.],
       [  5.,  36.],
       [  0.,  52.],
       [ 35.,  67.],
       [ 16.,  24.],
       [ 26.,  62.],
       [ 29.,  58.],
       [ 24.,  95.],
       [ 18.,  57.],
       [  7.,  81.],
       [ 10.,  78.],
       [ 43.,  39.],
       [ 24.,  80.],
       [ 17.,  71.],
       [ 34.,  81.],
       [  4.,  23.],
       [ 41.,  86.],
       [ 10.,  84.],
       [ 40.,   7.],
       [ 23.,  75.],
       [ 30.,   3.],
       [ 41.,  16.],
       [ 20.,  17.],
       [ 38.,  98.],
       [ 31., 103.],
       [ 18.,   9.],
       [  3.,  36.],
       [ 21.,  42.],
       [ 20.,  19.],
       [ 32.,  79.],
       [ 19.,  47.],
       [ 38.,  43.],
       [ 45.,  68.],
       [ 33.,  92.],
       [ 14.,  90.],
       [ 13.,  29.],
       [ 16.,  33.],
       [ 21.,  17.],
       [ 11.,  52.],
       [  0.,  89.],
       [  7.,  13.],
       [ 46.,   6.],
       [ 46.,  71.],
       [  7.,