# Basic model infrastructure and resampling
Uses the sandbox_model.py toolbox.

WARNING! This has not been tested on real data. Please check whether the results are reasonable, especially when reshaping 1D arrays to fit the grid model!

In [1]:
import sys
sys.path.append('../../sandbox')  # make sandbox_model.py importable

import sandbox_model as sm

import numpy as np
import pandas as pd

First, we create an artificial dataset. It stands in for the coordinate and value representation of a parsed model that is based on irregular grid cells. Here we assume that we are given the locations of the cell midpoints.

In [2]:
def create_some_data(size=120*166*105):
    """Random cell midpoints (x, y, z) with an integer 'zone' parameter per cell."""
    x = np.random.random(size=size) * 300
    y = np.random.random(size=size) * 400
    z = np.random.random(size=size) * 100
    
    # Assume a parameter of the grid cells, named zone (integer codes 0..64)
    zone = np.random.randint(0, 64+1, size=size)
    
    return pd.DataFrame.from_dict({'x': x, 'y': y, 'z': z, 'zone': zone})

In [3]:
data = create_some_data()
data.head(3)

|   | x | y | z | zone |
|---|---|---|---|---|
| 0 | 135.931242 | 393.248549 | 17.752643 | 34 |
| 1 | 162.933637 | 73.861392 | 86.891709 | 11 |
| 2 | 20.880435 | 247.728998 | 8.023002 | 44 |


## Grid / model creation

Now, we create a basic grid. We provide the coordinates as 1D NumPy ranges:

In [4]:
gx = np.arange(0,300,1)
gy = np.arange(0,400,1)
gz = np.arange(0,100,1)
grid = sm.Model(gx, gy, gz)

### Some methods
Unsure about the dimensions, i.e. the shape your NumPy arrays should have?

In [5]:
grid.dimensions

(300, 400, 100)

Right now, no attributes are stored. An attribute could be the resampled zones, lithology or simply a mask.

Let's add something :)

In [6]:
density = np.zeros(grid.dimensions)  # placeholder values; np.empty would leave them uninitialised

In [7]:
grid.add_attribute('density', density)

In [8]:
grid.list_attributes()  # also lists the coordinates, because their order can change after reloading

['X', 'Y', 'Z', 'density']

We can also save and load the model, based on the netCDF format. Careful: loading overwrites everything currently stored in the model!

In [9]:
grid.save('grid.nc')

In [10]:
grid.load('grid.nc')

For resampling in particular, we need a flat list of point coordinates, like the one np.meshgrid combined with np.stack creates:

In [11]:
grid.get_coords()

array([[  0,   0,   0],
       [  0,   0,   1],
       [  0,   0,   2],
       ...,
       [299, 399,  97],
       [299, 399,  98],
       [299, 399,  99]], dtype=int32)
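The coordinate list above can be reproduced with plain NumPy. This is a minimal sketch, assuming get_coords() flattens the grid in C order (X slowest, Z fastest), which is what the printed rows suggest; it is not necessarily how sandbox_model builds the list internally. A small stand-in grid keeps the output readable.

```python
import numpy as np

# Small stand-in axes; the notebook uses 0..299, 0..399 and 0..99
gx = np.arange(0, 3)
gy = np.arange(0, 4)
gz = np.arange(0, 2)

# indexing='ij' keeps the (X, Y, Z) axis order, so Z varies fastest after flattening
xx, yy, zz = np.meshgrid(gx, gy, gz, indexing='ij')
coords = np.stack([xx, yy, zz], axis=-1).reshape(-1, 3)

print(coords.shape)  # (24, 3): one row per grid node
print(coords[:3])    # rows [0, 0, 0], [0, 0, 1], [0, 1, 0]
```

With the default indexing='xy', np.meshgrid would swap the first two axes, so the 'ij' choice matters for matching the (X, Y, Z) layout of the dataset.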

To inspect the underlying xarray dataset, just call:

In [12]:
grid.dataset

<xarray.Dataset>
Dimensions:  (X: 300, Y: 400, Z: 100)
Coordinates:
  * Y        (Y) int32 0 1 2 3 4 5 6 7 8 ... 391 392 393 394 395 396 397 398 399
  * X        (X) int32 0 1 2 3 4 5 6 7 8 ... 291 292 293 294 295 296 297 298 299
  * Z        (Z) int32 0 1 2 3 4 5 6 7 8 9 10 ... 90 91 92 93 94 95 96 97 98 99
Data variables:
    density  (X, Y, Z) float64 ...

## Resampling

We create a resampling object and pass it the grid created above:

In [13]:
resampling = sm.Resampling(grid)

This class uses Spotify's Annoy library to find, for each grid cell, the nearest neighbour among the data positions. To do so, a data lookup object (a forest of trees) has to be built first.
Annoy performs an approximate nearest-neighbour search, so not every grid cell is guaranteed to be matched to its true nearest data point! Increasing the number of trees increases accuracy, but also the build time. It is a trade-off between a fast estimate (maybe 10 minutes) and an exact calculation, which could take several days.

_Reference: https://github.com/spotify/annoy_
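For comparison, the "perfect calculation" is an exact brute-force search. The numbers in this notebook show its cost: the grid has 300 × 400 × 100 = 12,000,000 cells and the dataset 120 × 166 × 105 = 2,091,600 points, so an exact search needs roughly 2.5 × 10^13 distance evaluations. Below is a minimal NumPy sketch of such an exact search. The function name exact_nearest is hypothetical and not part of sandbox_model, and treating "within the threshold" as distance <= threshold is an assumption. It is meant for checking Annoy's results on a small subset of grid cells, not for the full model.

```python
import numpy as np

def exact_nearest(data_xyz, query_xyz, chunk_size=256):
    """Brute-force (exact) nearest data point for every query point.

    Returns (indices, distances): the index of the nearest row in data_xyz
    and its Euclidean distance. Memory per chunk is about
    chunk_size * len(data_xyz) floats, so keep chunk_size small for
    millions of data points.
    """
    data_xyz = np.asarray(data_xyz, dtype=float)
    query_xyz = np.asarray(query_xyz, dtype=float)
    data_sq = (data_xyz ** 2).sum(axis=1)  # ||d||^2 for every data point
    indices = np.empty(len(query_xyz), dtype=np.int64)
    distances = np.empty(len(query_xyz))
    for start in range(0, len(query_xyz), chunk_size):
        q = query_xyz[start:start + chunk_size]
        # Squared distances via ||q||^2 - 2 q.d + ||d||^2, shape (len(q), n_data)
        d2 = (q ** 2).sum(axis=1)[:, None] - 2.0 * q @ data_xyz.T + data_sq[None, :]
        np.maximum(d2, 0.0, out=d2)  # clip tiny negative values caused by rounding
        nearest = d2.argmin(axis=1)
        indices[start:start + len(q)] = nearest
        distances[start:start + len(q)] = np.sqrt(d2[np.arange(len(q)), nearest])
    return indices, distances

# Tiny example: 3 data points and 4 query points (e.g. grid-cell centres)
data_xyz = np.array([[0, 0, 0], [10, 0, 0], [0, 20, 0]])
query_xyz = np.array([[1, 0, 0], [9, 1, 0], [0, 18, 0], [50, 50, 50]])
indices, distances = exact_nearest(data_xyz, query_xyz, chunk_size=2)
mask = distances <= 10  # assumption: "within 10 units" means distance <= threshold
print(indices)  # [0 1 2 2]
print(mask)     # [ True  True  True False]
```

Running this on a few thousand randomly chosen grid cells and comparing the result with the corresponding entries of resampling.indices (assuming these follow the order of grid.get_coords()) gives a rough estimate of how many cells the approximate search got wrong. Comparing distances rather than indices avoids counting ties between equally distant data points as errors.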

In [14]:
resampling.build_data_lookup(data[['x', 'y', 'z']].values, tree_number=10)



Building trees... (That can take a while)


Because building the trees takes a while, we can save them and load them again later:

In [20]:
resampling.save_lookup('data_lookup.ann')

In [14]:
resampling.load_lookup('data_lookup.ann')  # Bug! Calculating distances (find_nearest_neighbours) does not work after loading

Let's find the nearest neighbour of each grid cell. All grid cells without a data point within 10 units can later be masked as empty/invalid.

In [15]:
resampling.find_nearest_neighbours(threshold=10)



Creating and applying mask...


True

## What's next?

This creates two important variables:

* **indices:** a 1D array holding, for each grid cell, the index of its nearest data point, used to look up values in the data
* **mask:** a 1D boolean array that is True for grid cells that found a data point within the defined threshold distance

In [17]:
resampling.indices

array([   7263,    7263,    7263, ...,  861012,  312140, 1362224])

In [18]:
resampling.mask

array([ True,  True,  True, ...,  True,  True,  True])

Let's at least add the mask to our grid:

In [19]:
grid.add_attribute('mask', resampling.mask)

Cannot assign array. Model dimensions are: (300, 400, 100)


Yep... the 1D mask has to be reshaped to the grid dimensions first:

In [20]:
mask = resampling.mask.reshape(grid.dimensions)
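This reshape only puts each value into the right cell if the 1D arrays from the resampling are ordered like grid.get_coords() (Z varying fastest) and reshape uses NumPy's default C order, which is exactly what the WARNING at the top asks to check. A minimal sketch of such a check on a small stand-in grid, assuming that ordering:

```python
import numpy as np

dims = (3, 4, 2)  # stand-in for grid.dimensions
xx, yy, zz = np.meshgrid(np.arange(dims[0]), np.arange(dims[1]), np.arange(dims[2]),
                         indexing='ij')
coords = np.stack([xx, yy, zz], axis=-1).reshape(-1, 3)  # stand-in for grid.get_coords()

# A per-cell 1D array (like resampling.mask); here simply the flat cell number
flat = np.arange(len(coords))
cube = flat.reshape(dims)  # default order='C': the last axis (Z) varies fastest

# Every flat entry i must end up at the grid position listed in coords[i]
assert np.array_equal(cube[coords[:, 0], coords[:, 1], coords[:, 2]], flat)

# A Fortran-order reshape would scatter the values to the wrong cells
cube_f = flat.reshape(dims, order='F')
assert not np.array_equal(cube_f[coords[:, 0], coords[:, 1], coords[:, 2]], flat)
```

On the real model, the same check can be run with grid.get_coords() and grid.dimensions, as long as the coordinates equal the cell indices (they start at 0 with unit spacing in this notebook).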

In [21]:
grid.add_attribute('mask', mask)

In [22]:
grid.list_attributes()

['X', 'Y', 'Z', 'density', 'mask']

## Example: Map zones to grid

In [23]:
# Map data values to grid cells
grid_zones = data['zone'].values[resampling.indices]

In [24]:
# Use the mask to set invalid cells (no data point within the threshold) to NaN
grid_zones = np.where(resampling.mask, grid_zones, np.nan)
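Note that np.where promotes the integer zone codes to floating point, because NaN only exists for floats; that is why the zones in the output further down appear as 12., 45. and so on. If integer codes are preferred, a sentinel value or a NumPy masked array keeps the integer dtype. A minimal sketch with made-up values; the sentinel -1 is an assumption that only works because the zones here range from 0 to 64:

```python
import numpy as np

zones = np.array([12, 45, 9, 61])           # integer zone codes looked up per cell
mask = np.array([True, False, True, True])  # True = data point within the threshold

as_float = np.where(mask, zones, np.nan)
print(as_float.dtype)  # float64: NaN forces promotion to floating point

# Alternative 1: keep integers and mark invalid cells with a sentinel
# (assumes -1 is never a real zone code)
as_int = np.where(mask, zones, -1)
print(as_int.dtype)  # an integer dtype, e.g. int64

# Alternative 2: a masked array keeps the integer dtype and the validity mask together
as_masked = np.ma.masked_array(zones, mask=~mask)
print(as_masked)
```

Whichever variant is used, the result still has to be reshaped to grid.dimensions before it can be added to the grid.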

In [26]:
# Add to grid :)
grid.add_attribute('zone', grid_zones.reshape(grid.dimensions))

In [27]:
grid.list_attributes()

['X', 'Y', 'Z', 'density', 'mask', 'zone']

In [29]:
# Get the 3D zone array back from the model, e.g. for visualization
grid.get_attribute('zone')

array([[[12., 12., 12., ..., 48., 45.,  9.],
        [12., 12., 12., ..., 45., 45.,  9.],
        [56., 61., 12., ..., 45., 45.,  9.],
        ...,
        [19., 52., 47., ...,  1.,  1., 17.],
        [ 9., 47., 47., ...,  1.,  1., 17.],
        [ 9.,  9.,  9., ..., 64.,  1., 17.]],

       [[ 9., 61., 11., ..., 64., 46., 46.],
        [61., 61., 11., ..., 46., 46., 46.],
        [61., 13., 13., ..., 64., 64., 63.],
        ...,
        [ 0.,  0.,  0., ...,  7., 61., 61.],
        [ 2., 10.,  6., ..., 62.,  7., 32.],
        [55., 55., 55., ..., 62., 56., 32.]],

       [[55., 55., 55., ..., 42., 16.,  6.],
        [32., 32., 32., ..., 16., 16.,  6.],
        [32., 60., 60., ..., 51., 51., 24.],
        ...,
        [64.,  3.,  3., ...,  4., 61., 61.],
        [ 3.,  3.,  3., ..., 15.,  9.,  9.],
        [33., 33., 63., ..., 23., 23.,  9.]],

       ...,

       [[ 5.,  5., 23., ..., 33., 33., 34.],
        [ 5., 24., 23., ..., 51., 51., 34.],
        [24., 24., 45., ..., 51., 34., 34.