### Disclaimer

The following notebook was compiled for the course 'Geostatistics' at Ghent University (lecturer-in-charge: Prof. Dr. Ellen Van De Vijver; teaching assistant: Pablo De Weerdt). It consists of notebook snippets created by Michael Pyrcz. The code and markdown (text) snippets were edited specifically for this course, using the 'Jura data set' (Goovaerts, 1997) as an example in the practical classes. Some new code snippets are also included to cover topics that are not found in the GeostatsPy package demo books.<br> <br>  **This is a draft notebook.** The concepts are presented, but the actual methodology and results might differ from the SGeMS outcomes.

This notebook is for educational purposes.<br> 

Guidelines for getting started were adapted from the 'Environmental Soil Sensing' course at Ghent University (lecturer-in-charge: Prof. Dr. Philippe De Smedt).<br> 

The Jura data set was taken from: Goovaerts P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press.

**Don't forget to save a copy on your Google Drive before starting.**

You can also 'mount' your Google Drive in Google Colab to directly access your Drive folders (e.g. to access data, previous notebooks, etc.).
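
Mounting can be sketched as follows. This is a minimal example that assumes the conventional mount point `/content/drive`; outside Google Colab the cell simply skips the mount.

```python
import sys

# Mount Google Drive only when running inside Google Colab.
# '/content/drive' is the conventional mount point (an assumption; adjust if needed).
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    print('Drive mounted at /content/drive')
else:
    print('Not running in Google Colab; skipping the Drive mount.')
```

After mounting, your Drive folders should be reachable below the mount point.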

Do not hesitate to contact us with questions, or ask them during the practical sessions.

# Geostatistics: Introduction to geostatistical data analysis with Python

In [None]:
# Import required packages for setup
# -------------------------------------------- #

import sys
import os

In [None]:
#  Clone the repository and add it to the path

if 'google.colab' in sys.modules:

    repo_path = '/content/draft_E_I002454_Geostatistics'
    if not os.path.exists(repo_path):
        # clone into repo_path so that the folder name matches the path used in this notebook
        !git clone https://github.com/SENSE-UGent/E_I002454_Geostatistics.git {repo_path}
    if repo_path not in sys.path:
        sys.path.append(repo_path) #Default location in Google Colab after cloning

else:
    # if you are not using Google Colab, change the path to the location of the repository

    repo_path = r'c:/Users/pdweerdt/Documents/Repos/draft_E_I002454_Geostatistics' # Change this to the location of the repository on your machine
    if repo_path not in sys.path:
        sys.path.append(repo_path) 

# Import the setup function
from Utils.setup import check_and_install_packages

# Read the requirements.txt file

requirements_path = repo_path + '/Utils/requirements.txt'

with open(requirements_path) as f:
    required_packages = f.read().splitlines()

# Check and install packages
check_and_install_packages(required_packages)

#### Load Required libraries

In [None]:
import geostatspy
import geostatspy.GSLIB as GSLIB                              # GSLIB utilities, visualization and wrapper
import geostatspy.geostats as geostats                        # if this raises an error, you might have to check your numba installation   
print('GeostatsPy version: ' + str(geostatspy.__version__))   # these notebooks were tested with GeostatsPy version: 0.0.72

In [None]:
from Utils.func import (read_mod_file, beyond2, ik2d_v2_loc, ik2d_v2, ordrel2, 
    # cova2, 
    calculate_etype_and_conditional_variance)

We will also need some standard packages. These should have been installed.

In [None]:
from tqdm import tqdm                                         # suppress the status bar
from functools import partialmethod

tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)
                                   
import numpy as np                                            # ndarrays for gridded data
                                       
import pandas as pd                                           # DataFrames for tabular data

import matplotlib.pyplot as plt                               # for plotting

from scipy import stats                                       # summary statistics

plt.rc('axes', axisbelow=True)                                # plot all grids below the plot elements

ignore_warnings = True                                        # ignore warnings?
if ignore_warnings:
    import warnings
    warnings.filterwarnings('ignore')

from IPython.utils import io                                  # mute output from simulation

seed = 42                                                     # random number seed

### Optional libraries

These are not required to run the given version of this practical exercise, but might be useful if you want to extend this notebook with more code.

In [None]:
#  import math library
import math

import cmath

In [None]:
from scipy.stats import pearsonr                              # Pearson product moment correlation
from scipy.stats import spearmanr                             # spearman rank correlation    
                                   
import seaborn as sns                                         # advanced plotting

import matplotlib as mpl                                        

from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
from matplotlib.colors import ListedColormap 
import matplotlib.ticker as mtick 
import matplotlib.gridspec as gridspec

### Set the Working Directory

Do this to simplify subsequent reads and writes (avoid including the full address each time). 

##### For use in Google Colab

Run the following cell if you want to get the data automatically from the repository and store it on your Google Colab drive.

In [None]:
# get the current directory and store it as a variable

cd = os.getcwd()
print('Current Working Directory is ', cd)

##### For local use

Only run the following cell if you have the data locally stored.

In [None]:
# set the working directory; prefix the string with r (raw string) so that backslashes are not read as escape characters
os.chdir(r'c:\Users\pdweerdt\Documents\Repos')

# get the current directory and store it as a variable

cd = os.getcwd()
print('Current Working Directory is ', cd)

### Loading Tabular & Gridded Data

This section loads our data file into a pandas DataFrame object.

Let's load and visualize a grid also.

Check the datatype of your gridded data.

In this case it is actually also a .dat file, so we can use the same function to import it. The .grid extension was given to indicate that it is gridded data.
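
To see why one reader works for both files, here is a rough sketch of the simplified GSLIB (Geo-EAS) text layout: a title line, the number of variables, one variable name per line, and then the data rows. The helper `read_geoeas` and the values are made up for illustration; in the notebook itself we keep using `GSLIB.GSLIB2Dataframe`.

```python
import io
import pandas as pd

# Made-up example content in Geo-EAS / GSLIB layout (not the Jura data)
example = """Example data
3
X
Y
Cd
1.0 2.0 0.5
1.5 2.5 1.2
2.0 3.0 0.9
"""

def read_geoeas(handle):
    """Hypothetical helper: parse a Geo-EAS style text stream into a DataFrame."""
    handle.readline()                           # line 1: title (ignored)
    ncol = int(handle.readline().split()[0])    # line 2: number of variables
    names = [handle.readline().strip() for _ in range(ncol)]  # one name per line
    # the remaining lines are whitespace-separated data rows
    return pd.read_csv(handle, sep=r'\s+', header=None, names=names)

df_example = read_geoeas(io.StringIO(example))
print(df_example)
```

The file extension plays no role in this layout, which is why a `.grid` file with the same structure can be read in the same way.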

In [None]:
# Here you can adjust the relative path to the data folder

data_path = cd + '/draft_E_I002454_Geostatistics/Hard_data' 
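
As an alternative to gluing path pieces together with `+`, the path could also be built with `os.path.join`, which inserts the right separator for your operating system. This is only a sketch; the folder names are copied from the cell above and `data_dir` and `example_file` are illustrative names.

```python
import os

base_dir = os.getcwd()                                   # current working directory
data_dir = os.path.join(base_dir, 'draft_E_I002454_Geostatistics', 'Hard_data')
example_file = os.path.join(data_dir, 'Cd_9thresh.dat')

# os.path.exists tells you right away whether the path is correct on your machine
print(example_file, os.path.exists(example_file))
```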

You can actually just import the prediction dataset but let's import the same data as used in SGeMS just to be sure. Note that actual thresholds can be slightly different.

In [None]:
file_name = '//Cd_9thresh.dat'

df = GSLIB.GSLIB2Dataframe(data_path + file_name) # read the data

df.head()

In [None]:
grid_file_name = '//rocktype.grid'

# load the data

df_grid = GSLIB.GSLIB2Dataframe(data_path + grid_file_name)

df_grid.head()

### Define feature of interest

In [None]:
feature = 'Cd'
unit = 'ppm'
dist_unit = 'km'

In [None]:
 # grid plotting parameters
xmin = 0; xmax = np.ceil(df.X.max()) # range of x values
ymin = 0; ymax = np.ceil(df.Y.max()) # range of y values

In [None]:
#  define a colormap

cmap = plt.cm.inferno                                         # color map inferno

cmap_rainb = plt.cm.turbo # similar to what is shown on the slides

## Calculate some statistics

In P1 we calculated some summary statistics; we calculate them again here.

In [None]:
min_feat = round((df[feature].values).min(), 2)                    # calculate the minimum
max_feat = round((df[feature].values).max(), 2)                    # calculate the maximum
mean_feat = round((df[feature].values).mean(), 2)                  # calculate the mean
stdev_feat = round((df[feature].values).std(), 2)                  # calculate the standard deviation
n_feat = df[feature].values.size                                   # calculate the number of data

print('The minimum is ' + str(min_feat) + ' ' + str(unit) + '.')   # print univariate statistics
print('The maximum is ' + str(max_feat) + ' ' + str(unit) + '.')
print('The mean is ' + str(mean_feat) + ' ' + str(unit) + '.')
print('The standard deviation is ' + str(stdev_feat) + ' ' + str(unit) + '.')
print('The number of data is ' + str(n_feat) + '.')
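
A small side note on the standard deviation: NumPy's `.std()` divides by n by default (`ddof=0`), while pandas' `Series.std()` divides by n - 1 (`ddof=1`). The sketch below uses made-up values (not the Jura data) to show the difference.

```python
import numpy as np
import pandas as pd

values = np.array([0.2, 0.5, 0.9, 1.3, 2.1])   # made-up values, not the Jura data

std_population = values.std()            # NumPy default: ddof=0 (divide by n)
std_sample = values.std(ddof=1)          # sample estimate (divide by n - 1)
std_pandas = pd.Series(values).std()     # pandas default: ddof=1

print(round(std_population, 3), round(std_sample, 3), round(std_pandas, 3))
```

For a data set of a few hundred samples the difference is small, but it can explain small mismatches with other software.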


# Indicator Kriging

**DRAFT VERSION**

#### Indicator Kriging for continuous data

To demonstrate indicator kriging, the variogram models are given, rather than calculating experimental variograms and modelling them here.

Let's first set up the basic indicator kriging parameters, starting with the thresholds.

In [None]:
#  often the thresholds are chosen corresponding to the 9 deciles of the data
#  
#  the probabilities corresponding to the deciles 
probabilities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] 

#  the deciles are calculated using the numpy percentile function, 
# this function expects percentages instead of fractions, so we multiply the probabilities by 100
deciles = np.percentile(df[feature].values, [p * 100 for p in probabilities]) # the 9 deciles

print('The deciles are: ' + str(deciles))

ncut = 9                                                     # number of thresholds

thresholds = deciles.copy() # copy the deciles to the thresholds
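
Behind the scenes, indicator kriging works on indicator-coded data: for each threshold z_k, a datum gets the value 1 if z(u) <= z_k and 0 otherwise. The sketch below illustrates this coding on synthetic values (not the Jura data); the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.lognormal(mean=0.0, sigma=0.8, size=200)      # synthetic, skewed values

probs = np.linspace(0.1, 0.9, 9)                      # 0.1 ... 0.9
cuts = np.percentile(z, probs * 100)                  # the 9 deciles

# Indicator coding: i(u; z_k) = 1 if z(u) <= z_k, else 0
indicators = (z[:, None] <= cuts[None, :]).astype(float)   # shape (n_data, n_cuts)

# The mean of each indicator column reproduces the global cdf value at that threshold
print(indicators.mean(axis=0).round(2))
```

Each indicator column is then treated as a separate variable with its own variogram model, which is why one variogram model per threshold is needed.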

The variograms have been modelled before so we can read the parameters from the .mod files

In [None]:
# read the .mod files

# read all .mod files corresponding to the 9 thresholds
# use a for loop to read the .mod files and store them in a list
# the .mod files are stored in the 'varmods_IK' subfolder of data_path,
# with the names 'variogramthr1.mod', 'variogramthr2.mod', etc.
varios = []

for i in range(0, ncut):

    mod_file_name = '//variogramthr' + str(i+1) + '.mod' # name of the .mod file
    mod_file_path = data_path + '//varmods_IK' + mod_file_name # full path of the .mod file
    print('Reading ' + mod_file_path)
    var = read_mod_file(mod_file_path) # read the variogram model parameters
    # append the variogram to the list
    varios.append(var)

varios

Let's have a look at the IK function parameters

In [None]:
help(ik2d_v2_loc)

The code is still a bit slow; it might take around 2 minutes.

In [None]:
# parameters for IK
ivtype = 1                                                    # variable type, 0 - categorical, 1 - continuous
tmin = -999; tmax = 9999
ndmin = 2; ndmax = 15                                         # minimum and maximum data for kriging 
radius = 1                                                  # maximum search distance
ktype = 1                                                     # kriging type, 0 - simple, 1 - ordinary

results = ik2d_v2_loc(df,'X','Y',feature,ivtype,ncut,
                      thresholds,probabilities,tmin, tmax, df_grid, 'x', 'y', 
                      ndmin, ndmax, radius, ktype, varios)

In [None]:
# let's have a look at the results
results.head()
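
Because every threshold is kriged separately, the estimated probabilities at one location are not guaranteed to lie within [0, 1] or to increase with the threshold (order relation problems). A common remedy, used in GSLIB, is to average an upward and a downward correction. The sketch below illustrates that idea on made-up numbers; it is not the implementation of `ordrel2` from `Utils.func`.

```python
import numpy as np

def correct_order_relations(ccdf):
    """Illustrative correction: clip to [0, 1], then average an upward and a downward pass."""
    p = np.clip(np.asarray(ccdf, dtype=float), 0.0, 1.0)
    upward = np.maximum.accumulate(p)                    # running maximum from the left
    downward = np.minimum.accumulate(p[::-1])[::-1]      # running minimum from the right
    return 0.5 * (upward + downward)                     # both passes are non-decreasing

raw = [0.12, 0.08, 0.31, 0.45, 0.41, 0.62, 0.70, 0.95, 1.04]   # made-up estimates
fixed = correct_order_relations(raw)
print(fixed.round(3))
```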

In [None]:
# Plot one of the results
result = 'estimate_thresh_3'

GSLIB.locmap_st(results, 'x', 'y', result, 0, 5.2, ymin, ymax, 
                0, 1, # set the value range for the color map
                (
                    'Location Map Grid points ' 
               #   + str(grid_feature)
                 ), 
                 'X (km)', 'Y (km)',
             'probability', cmap_rainb)
plt.subplots_adjust(left=0.0, bottom=0.0, right=2, top=2.2, wspace=0.25, hspace=0.25); 

In [None]:
# plot all the results

est_cols = results.columns[2:2 + ncut] # the ncut probability columns follow the two coordinate columns

for i, col in enumerate(est_cols):
    plt.subplot(3,3,i+1) # 3 rows, 3 columns, i+1 is the index of the subplot
    GSLIB.locmap_st(results,'x', 'y', col,
                0, 5.2, ymin, ymax, 
                0, 1, # set the value range for the color map
                (
                    'Location Map Grid points ' 
               #   + str(grid_feature)
                 ), 
                 'X (km)', 'Y (km)',
             'probability', cmap_rainb)
    
plt.subplots_adjust(left=0.0, bottom=0.0, right=3, top=3.75, wspace=0.125, hspace=0.125); 
plt.show()

## Post processing

### CDF reconstruction parameters (ccdf)

In [None]:
# ccdf parameters

zmin = 0.0      # lower bound
zmax = np.nan   # unbounded

ltail = 2       # lower tail option: 1 = power, 2 = power with parameter, 3 = linear, 4 = hyperbolic
ltpar = 2.5     # omega (lower tail parameter)
middle = 1      # straight linear interpolation for middle values
mpar = 0        # not applicable for linear interpolation
utail = 4       # upper tail option: 1 = power, 2 = power with parameter, 3 = linear, 4 = hyperbolic
utpar = 1.5     # omega (upper tail parameter)

UNEST = -1.0    # default value for unestimated in the context of the postprocessing functions


### ccdf value estimation

In [None]:
help(beyond2)

In [None]:
# let's map the probability to exceed a threshold of 1.2 ppm

zval = 1.2

# initialise a column for the results
results['prob_exc'] = np.nan

n_pred = results.shape[0]

ccut = thresholds

# loop over the prediction locations and calculate the ccdf value at zval for each of them
# iterate over every row in results
for i in range(n_pred):
    cdfval = UNEST
    # get the cdf values for the current row
    ccdf = results.iloc[i, 2:11]
    
    cdfval = beyond2(1, len(ccut), ccut, ccdf, 9, thresholds, probabilities, 
                            zmin, zmax, ltail, ltpar, middle, mpar, utail, utpar, zval, cdfval)[1]
    
    results.at[i, 'prob_exc'] = 1-cdfval
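
For a value that falls between two thresholds, the middle option used here (linear interpolation) boils down to interpolating the ccdf between the two bracketing thresholds. The sketch below shows this for one location with made-up thresholds and ccdf values; it ignores the tail models that `beyond2` handles.

```python
import numpy as np

cuts = np.array([0.3, 0.6, 0.9, 1.1, 1.3, 1.6, 2.0, 2.5, 3.2])     # made-up thresholds
ccdf_one_cell = np.array([0.05, 0.15, 0.30, 0.42, 0.55, 0.68, 0.80, 0.90, 0.97])  # made-up ccdf

z_query = 1.2   # lies between the 4th and 5th threshold

# Linear interpolation of F(z) between the two bracketing thresholds
cdf_at_query = np.interp(z_query, cuts, ccdf_one_cell)
prob_exceed = 1.0 - cdf_at_query          # probability of exceeding z_query
print(round(prob_exceed, 3))
```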


In [None]:
# plot the resulting probability map

GSLIB.locmap_st(results, 'x', 'y', 'prob_exc',
                0, 5.2, ymin, ymax, 
                0, 1, # set the value range for the color map
                (
                    'Probability to exceed threshold of 1.2 ppm ' 
               #   + str(grid_feature)
                 ), 
                 'X (km)', 'Y (km)',
             'probability', cmap_rainb)
plt.subplots_adjust(left=0.0, bottom=0.0, right=2, top=2.2, wspace=0.25, hspace=0.25);

In [None]:
# def calculate_etype_and_conditional_variance(ccdf, ccut, maxdis, zmin, zmax, ltail, ltpar, middle, mpar, utail, utpar):
#     """
#     Calculate the e-type and conditional variance based on ccdf and ccut arrays using beyond2 for CCDF reconstruction.

#     Parameters:
#         ccdf (list of float): Cumulative distribution function values.
#         ccut (list of float): Cutoff values corresponding to ccdf.
#         maxdis (int): Maximum discretization for calculations.
#         zmin (float): Minimum Z value.
#         zmax (float): Maximum Z value.
#         ltail (int): Option to handle values in the lower tail.
#         ltpar (float): Parameter for the lower tail option.
#         middle (int): Option to handle values in the middle.
#         mpar (float): Parameter for the middle option.
#         utail (int): Option to handle values in the upper tail.
#         utpar (float): Parameter for the upper tail option.

#     Returns:
#         tuple: A tuple containing e-type (float) and conditional variance (float).
#     """
#     dis = 1.0 / maxdis
#     cdfval = -0.5 * dis
#     etype = 0.0
#     ecv = 0.0

#     for _ in range(maxdis):
#         cdfval += dis
#         zval = -1.0

#         # Use beyond2 for CCDF reconstruction
#         zval = beyond2(1, len(ccut), ccut, ccdf, 0, [], [], zmin, zmax, ltail, ltpar, middle, mpar, utail, utpar, zval, cdfval)[0]

#         etype += zval
#         ecv += zval * zval

#     etype /= maxdis
#     ecv = max((ecv / maxdis - etype * etype), 0.0)

#     return etype, ecv

### E-type estimate, conditional variance

In [None]:
# e-type and conditional variance calculation
# let's do postprocessing for every predicted grid cell;
# every grid cell prediction consists of a list of predicted cdf values

# get the number of prediction locations
n_pred = results.shape[0]

ccut = thresholds

# initiate columns for e-type and conditional variance
results['etype'] = np.nan
results['ecv'] = np.nan

# iterate over every row in results
for i in range(n_pred):
    cdfval = -1.0
    zval = -1.0
    # get the cdf values for the current row
    ccdf = results.iloc[i, 2:11]
    # calculate the e-type and conditional variance
    etype, ecv = calculate_etype_and_conditional_variance(ccdf, ccut, 9, zmin, zmax, ltail, ltpar, middle, mpar, utail, utpar)
    # store the results in the DataFrame
    results.at[i, 'etype'] = etype
    results.at[i, 'ecv'] = ecv
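
The idea behind the E-type estimate and the conditional variance can be sketched with plain NumPy: draw equally spaced probabilities, look up the corresponding quantiles of the ccdf, and take their mean and variance. The sketch assumes a bounded distribution with linear interpolation everywhere (including the tails), so it will not reproduce the power and hyperbolic tail options used above; `etype_and_variance` is an illustrative name.

```python
import numpy as np

def etype_and_variance(cuts, ccdf, zmin, zmax, n_quantiles=100):
    """Illustrative E-type mean and conditional variance from a piecewise-linear ccdf."""
    # Close the cdf at the bounds: F(zmin) = 0 and F(zmax) = 1 (linear tails, an assumption)
    z_nodes = np.concatenate(([zmin], cuts, [zmax]))
    f_nodes = np.concatenate(([0.0], ccdf, [1.0]))
    # Mid-point probabilities p = (m + 0.5) / n
    p = (np.arange(n_quantiles) + 0.5) / n_quantiles
    z_q = np.interp(p, f_nodes, z_nodes)        # quantiles z(p) by inverting the cdf
    etype = z_q.mean()
    variance = max((z_q ** 2).mean() - etype ** 2, 0.0)
    return etype, variance

cuts = np.linspace(0.1, 0.9, 9)      # made-up thresholds
ccdf = np.linspace(0.1, 0.9, 9)      # ccdf of a uniform distribution on [0, 1]
etype, variance = etype_and_variance(cuts, ccdf, zmin=0.0, zmax=1.0)
print(round(etype, 3), round(variance, 4))
```

For the uniform test case the result should be close to the theoretical mean 1/2 and variance 1/12, which is a quick sanity check of the discretization.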


In [None]:
# plot the etype 

GSLIB.locmap_st(results, 'x', 'y', 'etype', 0, 5.2, ymin, ymax,
                0, 3, # set the value range for the color map
                (
                    'E-type prediction IK' 
               #   + str(grid_feature)
                 ), 
                 'X (km)', 'Y (km)',
             'e-type (' + str(unit) + ')', cmap_rainb)

plt.subplots_adjust(left=0.0, bottom=0.0, right=2, top=2.2, wspace=0.25, hspace=0.25);

In [None]:
# plot the conditional variance

GSLIB.locmap_st(results, 'x', 'y', 'ecv', 0, 5.2, ymin, ymax,
                0, 8, # set the value range for the color map
                (
                    'Conditional Variance IK' 
               #   + str(grid_feature)
                 ), 
                 'X (km)', 'Y (km)',
             'ecv (' + str(unit) + '^2)', cmap_rainb)

plt.subplots_adjust(left=0.0, bottom=0.0, right=2, top=2.2, wspace=0.25, hspace=0.25);

# Sequential Gaussian simulation

**DRAFT VERSION**

For now, this is only implemented for regular grid simulations.

In [None]:
help(geostats.sgsim)

In [None]:
# Configure grid parameters
# Determine grid dimensions from the grid data
nx = len(df_grid['x'].unique())
ny = len(df_grid['y'].unique())
nz = 1  # 2D grid, so nz=1

# Get grid origin and cell size
xmn = df_grid['x'].min()
ymn = df_grid['y'].min()
zmn = 0.0  # 2D grid
xsiz = 0.05
ysiz = 0.05
zsiz = 0.05  # Not used in 2D

# Block discretization (for point kriging, set to 1)
nxdis, nydis, nzdis = 1, 1, 1

print(f"Grid dimensions: {nx} x {ny}")
print(f"Grid origin: ({xmn}, {ymn})")
print(f"Grid cell size: {xsiz} x {ysiz}")

In [None]:
# :param vcol: name of the variable column
#     :param wcol: name of the weight column, if None assumes equal weighting
#     :param ismooth: if True then use a reference distribution
#     :param dfsmooth: pandas DataFrame required if reference distribution is used
#     :param smcol: reference distribution property (required if reference
#                   distribution is used)
#     :param smwcol: reference distribution weight (required if reference
#                    distribution is used)
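
The parameters listed above appear to describe the options of a normal score transform. Sequential Gaussian simulation works in Gaussian space, so the data are first transformed to normal scores and the simulated values are back-transformed afterwards. A minimal, equal-weight sketch of such a transform on synthetic data (not the GeostatsPy implementation) could look like this:

```python
import numpy as np
from scipy import stats

def normal_score(values):
    """Illustrative equal-weight normal score transform (ties get their average rank)."""
    values = np.asarray(values, dtype=float)
    ranks = stats.rankdata(values)                 # 1 ... n
    p = (ranks - 0.5) / values.size                # cumulative probabilities in (0, 1)
    return stats.norm.ppf(p)                       # standard normal quantiles

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=500)   # synthetic values
scores = normal_score(skewed)
print(round(scores.mean(), 3), round(scores.std(), 3))
```

The transform keeps the ranking of the data and only changes the shape of the histogram.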

The code is still rather slow. Please note that this can take a long time, depending on the number of realizations: around 1 minute per realization.

In [None]:
zmin = 0.00; zmax = np.nan                                  # feature min and max values 

nreal = 50                                                 # number of realizations
ndmin = 0; ndmax = 15                                     # number of data for each kriging system
vario = GSLIB.make_variogram(nug=0.4,nst=1,it1=2,cc1=0.6,azi1=0.0,hmaj1=1.1,hmin1=1.1)
tmin = -999; tmax = 999

sim_sk = geostats.sgsim(df,'X','Y',feature,wcol=-1,scol=-1,tmin=tmin,tmax=tmax,itrans=1,ismooth=0,dftrans=0,tcol=0,
            twtcol=0,zmin=zmin,zmax=zmax,
            ltail=ltail,ltpar=ltpar,utail=utail,utpar=utpar, # as defined earlier!
            nsim=nreal,
            nx=nx,xmn=xmn,xsiz=xsiz,ny=ny,ymn=ymn,ysiz=ysiz, # grid, as defined earlier!
            seed=73073,
            ndmin=ndmin,ndmax=ndmax,nodmax=20,mults=1,nmult=3,noct=-1,ktype=0,colocorr=0.0,sec_map=0,vario=vario)
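
To get a feeling for the variogram model passed to the simulation, the sketch below evaluates a nugget plus exponential structure with the same numbers (`nug=0.4`, `cc1=0.6`, range 1.1). It assumes that `it1=2` selects the exponential model with a practical range, as in the GSLIB convention, and it ignores anisotropy; `exponential_variogram` is an illustrative helper.

```python
import numpy as np

def exponential_variogram(h, nugget, sill_contribution, practical_range):
    """Isotropic nugget + exponential model, GSLIB convention (practical range)."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill_contribution * (1.0 - np.exp(-3.0 * h / practical_range))
    return np.where(h > 0, gamma, 0.0)     # gamma(0) = 0 by definition

lags = np.array([0.0, 0.1, 0.5, 1.1, 3.0])   # lag distances in km
gamma = exponential_variogram(lags, nugget=0.4, sill_contribution=0.6, practical_range=1.1)
print(gamma.round(3))
```

With these settings the model reaches about 95% of the structured sill contribution at the practical range of 1.1 km and approaches a total sill of 1, which fits a simulation in normal score space.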

In [None]:
# plot the first 4 realizations



for isim in range(0,4):
    plt.subplot(2,2,isim+1) # 2 rows, 2 columns, isim+1 is the index of the subplot
    GSLIB.locpix_st(sim_sk[isim],xmin,xmax,ymin,ymax,xsiz,0,3,
                    df,'X','Y',feature,
                    'Sequential Gaussian Simulation w. Simple Kriging #' + str(isim+1),
                    'X (km)','Y (km)', str(feature) + ' (' + str(unit) + ')', cmap_rainb)

plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=2.1, wspace=0., hspace=0.1); plt.show()

### Post processing

In [None]:
threshold = 1.2 # threshold for probability calculation

prob12 = geostats.local_probability_exceedance(realizations = sim_sk, threshold = threshold) # local probability that the feature exceeds the threshold


GSLIB.locpix_st(prob12,xmin,xmax,ymin,ymax,xsiz,0.0,1.,df,'X','Y',
                feature,'PostSIM - Probability Exceed ' + str(threshold) + ' ' + str(unit),
                'X (km)', 'Y (km)','Probability Exceedance',cmap_rainb)

plt.subplots_adjust(left=0.0, bottom=0.0, right=2.5, top=1., wspace=0.2, hspace=0.2); plt.show()

In [None]:
e_type = geostats.local_expectation(sim_sk)             # local expectation map
local_stdev = geostats.local_standard_deviation(sim_sk) # local standard deviation map

plt.subplot(2,2,1)
GSLIB.locpix_st(e_type,xmin,xmax,ymin,ymax,xsiz,
                0.0,3, #min and max for the color map
                df,'X','Y',feature,'PostSIM - e-type Model',
                'X (km)','Y (km)', str(feature) + ' e-type',cmap_rainb)

plt.subplot(2,2,2)
GSLIB.locpix_st(local_stdev,xmin,xmax,ymin,ymax,xsiz,
                0.0,3, #min and max for the color map
                df,'X','Y',feature,'PostSIM - Local Standard Deviation Model',
                'X (km)','Y (km)' ,str(feature) + ' Local Standard Deviation',cmap_rainb)

plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=2.5, wspace=0.2, hspace=0.2)
plt.show()
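
Conceptually, these post-processing maps are cell-wise summaries over the stack of realizations. The sketch below reproduces the idea with plain NumPy on synthetic realizations, assuming the realizations are stacked along the first axis (as the indexing `sim_sk[isim]` suggests). Details such as the degrees of freedom used for the standard deviation may differ from the GeostatsPy functions.

```python
import numpy as np

rng = np.random.default_rng(73073)
fake_realizations = rng.lognormal(mean=0.0, sigma=0.5, size=(50, 4, 5))  # (nreal, ny, nx), synthetic

cutoff = 1.2
etype_map = fake_realizations.mean(axis=0)                 # cell-wise mean over realizations
stdev_map = fake_realizations.std(axis=0)                  # cell-wise standard deviation
prob_map = (fake_realizations > cutoff).mean(axis=0)       # fraction of realizations above the cutoff

print(etype_map.shape, prob_map.min(), prob_map.max())
```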

## Jackknife validation

Repeat the process, but use the validation locations as the grid on which you make predictions.

Note: you can validate the e-type predictions from IK and SGS

**NOT IMPLEMENTED YET, DIY**

In [None]:
file_name = '//validation.dat'

df_val = GSLIB.GSLIB2Dataframe(data_path + file_name) # read the data

df_val.head()

In [None]:
%%capture --no-display  

val_method = 'jk'

max_points = 15
min_points = 2
search_radii = [1,1]

feature = 'Cd'

n_feat = df_val[feature].values.size                                   # calculate the number of data

# Initialize empty lists to add to the results df
val_method_vals = []

MPE_vals = []
MSPE_vals = []
RMSPE_vals = []
MAPE_vals = []
rel_nna_vals = []
Pr_vals = []
Sr_vals = []

results_df_v = pd.DataFrame()

# Initialize the variables for the validation

a_c = 0 # for the cumulative error
a = 0 # for the error
a_c_a = 0 #for the absolute cum error
a_c_s = 0 #for the squared cum error

data_pred = df.copy()
data_val = df_val.copy()

# Perform the prediction at the validation locations
tmin = -999; tmax = 9999

method = '' #define method: OK, IK, ...

results_val = # method function with parameters here

data_val[method + feature] = results_val
data_val[method + feature + '_var'] = results_val_var # if applicable

# Calculate error on test set
data_val['r'] = data_val[method + feature] - data_val[feature] 

# calculate number of residuals without NA values
data_val['r'] = data_val['r'].replace(-99999, np.nan) # replace dummy value with NaN
n_feat = data_val['r'].count() # count the number of residuals without NA values

# print("r-value ", data_val['r'])

data_val['r_s'] = data_val['r']**2

data_val['r_a'] = data_val['r'].abs()

# Calculate cumulative error

a_c = data_val['r'].sum() #cumulative error

a_c_a = data_val['r_a'].sum() #cumulative absolute error

a_c_s = data_val['r_s'].sum() #cumulative squared error

# Round ac and aca
a_c = round(a_c, 2)
a_c_a = round(a_c_a, 2)
a_c_s = round(a_c_s, 2)

#calculate Mean prediction error
MPE = round(a_c/n_feat, 2)

print("Mean Prediction Error:", MPE)    

#Calculate Mean squared prediction error
MSPE = round(a_c_s/n_feat, 2)
print("Mean Squared Prediction Error:", MSPE)

#Calculate Root mean squared prediction error
RMSPE = round(math.sqrt(a_c_s/n_feat), 2)
print("Root Mean Squared Prediction Error:", RMSPE)

#calculate Mean absolute prediction error
MAPE = round(a_c_a/n_feat, 2)
print("Mean Absolute Prediction Error:", MAPE)

#Pearson correlation coefficient
#read in the data, drop na to avoid errors
data_cor = data_val.dropna(subset=[feature, method + feature])

#extract the columns of interest 
x = data_cor[feature]
y = data_cor[method + feature]

#calculate the Pearson's correlation coefficient 
corr_p, _ = pearsonr(x, y)
corr_p = round(corr_p, 2)
print('Pearsons correlation: %.2f' % corr_p)

# Spearman's Correlation:
#calculate the Spearman's correlation coefficient 
corr_s, _ = spearmanr(x, y)
corr_s = round(corr_s, 2)
print('Spearmans correlation: %.2f' % corr_s)

# Store the index values in the respective lists
MPE_vals.append(MPE)
MSPE_vals.append(MSPE)
RMSPE_vals.append(RMSPE)
MAPE_vals.append(MAPE)
Pr_vals.append(corr_p)
Sr_vals.append(corr_s)
val_method_vals.append(val_method)

# Create a new DataFrame to store the results for this variable and parameter settings
results_temp_df = pd.DataFrame()
results_temp_df['ValidationMethod'] = val_method_vals
results_temp_df['MPE'] = MPE_vals
results_temp_df['MSPE'] = MSPE_vals
results_temp_df['RMSPE'] = RMSPE_vals
results_temp_df['MAPE'] = MAPE_vals
results_temp_df['PearsonCorr'] = Pr_vals
results_temp_df['SpearmanCorr'] = Sr_vals

# Append the results for this variable and parameter settings to the main DataFrame
results_df_v_2d = pd.concat([results_df_v, results_temp_df], ignore_index=True)

results_df_v_2d.head()
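
Once you have predictions at the validation locations, the validation indices from the cell above can also be computed in one small function. The sketch below uses a hypothetical helper `validation_metrics` and made-up numbers; here the error is defined as prediction minus observation and MAPE stands for the mean absolute prediction error, as in the cell above.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def validation_metrics(observed, predicted):
    """Hypothetical helper: summary statistics of prediction errors (predicted - observed)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    keep = ~(np.isnan(observed) | np.isnan(predicted))     # drop unestimated locations
    obs, pred = observed[keep], predicted[keep]
    err = pred - obs
    corr_p, _ = pearsonr(obs, pred)
    corr_s, _ = spearmanr(obs, pred)
    return {
        'MPE': err.mean(),
        'MSPE': (err ** 2).mean(),
        'RMSPE': np.sqrt((err ** 2).mean()),
        'MAPE': np.abs(err).mean(),
        'PearsonCorr': corr_p,
        'SpearmanCorr': corr_s,
    }

obs = [0.5, 1.2, 0.9, 2.0, 1.5]          # made-up observations
pred = [0.7, 1.0, 1.1, 1.6, np.nan]      # made-up predictions, one unestimated
metrics = validation_metrics(obs, pred)
print({k: round(float(v), 3) for k, v in metrics.items()})
```

Calling the function once per method (for example for the IK and SGS e-type predictions) gives rows that can be collected in one DataFrame for comparison.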

# Draft Code

**Draft code for IK on a regular grid** under construction

## Regular grid IK

In [None]:
import matplotlib.ticker as mticker                           # custom colorbar ticks

In [None]:
nxdis = 1; nydis = 1                                          # block kriging discretizations, 1 for point kriging
ndmin = 2; ndmax = 15                                         # minimum and maximum data for kriging 
radius = 1                                                  # maximum search distance
ktype = 1                                                     # kriging type, 0 - simple, 1 - ordinary
ivtype = 1                                                    # variable type, 0 - categorical, 1 - continuous
tmin = -999; tmax = 9999

In [None]:
# %%capture --no-display   
UNEST = -999.0

no_trend = np.zeros((1,1))                                    # null ndarray not of correct size so ik2d will not trend
ikmap = ik2d_v2(df,'X','Y',feature,ivtype,0,ncut,thresholds,probabilities,no_trend,tmin,tmax,nx,xmn,xsiz,ny,ymn,ysiz,
                # nxdis,nydis,
                ndmin,ndmax,radius,ktype,varios)

In [None]:
def locpix_colormaps_st(array,xmin,xmax,ymin,ymax,step,vmin,vmax,df,xcol,ycol,vcol,title,xlabel,ylabel,vlabel_loc,vlabel,cmap_loc,cmap):
    xx, yy = np.meshgrid(
        np.arange(xmin, xmax, step), np.arange(ymax, ymin, -1 * step)
    )
    cs = plt.imshow(array,interpolation = None,extent = [xmin,xmax,ymin,ymax], vmin = vmin, vmax = vmax,cmap = cmap)
    plt.scatter(df[xcol],df[ycol],s=None,c=df[vcol],marker=None,cmap=cmap_loc,vmin=vmin,vmax=vmax,alpha=0.8,linewidths=0.8,
        edgecolors="black",)
    plt.title(title); plt.xlabel(xlabel); plt.ylabel(ylabel); plt.xlim(xmin, xmax); plt.ylim(ymin, ymax)
    cbar_loc = plt.colorbar(orientation="vertical",pad=0.08,ticks=[0, 1],
            format=mticker.FixedFormatter(['Shale','Sand'])); cbar_loc.set_label(vlabel_loc, rotation=270,labelpad=20)
    cbar = plt.colorbar(cs,orientation="vertical",pad=0.05); cbar.set_label(vlabel, rotation=270,labelpad=20)
    return cs

In [None]:
for i in range(0, ncut):
    plt.subplot(3,3,i+1) # 3 rows, 3 columns, i+1 is the index of the subplot
    GSLIB.pixelplt_st(ikmap[:,:,i],
                  df_grid['x'].min(),df_grid['x'].max(),df_grid['y'].min(),df_grid['y'].max(), #we have to use the actual min and max values
                  0.05,0,1,'IK','X (km)','Y (km)',
                  (str(feature) + str(i)), cmap_rainb) # no plt.show() inside the loop, otherwise every panel ends up in its own figure

plt.subplots_adjust(left=0.0, bottom=0.0, right=1.0, top=1.1, wspace=0.2, hspace=0.1); 
plt.show()

### Post processing

In [None]:
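UNEST = -1.0
zmin = 0.0
zmax = 5
ltail = 2       # lower tail option
ltpar = 2.5     # omega (lower tail parameter)
middle = 1      # straight linear interpolation
mpar = 0
utail = 4       # upper tail option: 1 = power, 2 = power with parameter, 3 = linear, 4 = hyperbolic
utpar = 1.5     # omega (upper tail parameter)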

In [None]:
# let's do postprocessing for every predicted grid cell;
# every grid cell prediction consists of a list of predicted cdf values

cdfval = 0.15
zval = UNEST

# allocate a grid to store the z values (quantiles) that correspond to cdfval
ikmap_zvals = np.zeros((ny,nx,1))

# iterate over the grid cells and get the z value for each cell
for i in range(0,ny):
    for j in range(0,nx):
        zval = -1
        # print the grid cell value
        # print(f"Grid cell ({i}, {j}): {ikmap[i,j]}")
        # beyond2 returns a (zval, cdfval) pair; keep the z value (index 0)
        zval = beyond2(ivtype,ncut,thresholds,ikmap[i,j],ncut,thresholds,probabilities,
                0,zmax,ltail,ltpar,middle,mpar,utail,utpar,zval,cdfval)[0]
        # store the zval in the grid cell
        ikmap_zvals[i,j] = zval
        print(f"Grid cell ({i}, {j}): zval={zval}")



# zval = beyond2(ivtype,ncut,thresholds,ikmap[3],ng,gcut,probabilities,0,zmax,ltail,ltpar,middle,mpar,utail,utpar,UNEST,0.15)

In [None]:
#  calculate e-type and conditional variance for every grid cell
ikmap_etype = np.zeros((ny,nx,1))
ikmap_ecv = np.zeros((ny,nx,1))

for i in range(0,ny):
    for j in range(0,nx):
        # print the grid cell value
        # print(f"Grid cell ({i}, {j}): {ikmap[i,j]}")
        etype, ecv = calculate_etype_and_conditional_variance(ikmap[i,j], thresholds, 100, zmin, zmax, ltail, ltpar, middle, mpar, utail, utpar)
        # append the zval to the grid cell
        ikmap_etype[i,j] = etype
        ikmap_ecv[i,j] = ecv
        print(f"Grid cell ({i}, {j}): etype={etype}, ecv={ecv}")
        

In [None]:
# plot the etype
# fill unestimated cells with nan first
ikmap_etype[ikmap_etype == -1] = np.nan
ikmap_ecv[ikmap_ecv == UNEST] = np.nan
# get the min and max values of the grid
vmin = np.nanmin(ikmap_etype[:,:,0])
vmax = np.nanmax(ikmap_etype[:,:,0])

print(vmin,vmax)
plt.imshow(ikmap_etype[:,:,0],interpolation = None,extent = [xmin,xmax,ymin,ymax], vmin = 0, vmax = 3,
           cmap = cmap_rainb)
# add colorbar
plt.colorbar(orientation="vertical",pad=0.08,ticks=[vmin, vmax])
