### Disclaimer

The following notebook was compiled for the course 'Geostatistics' at Ghent University (lecturer-in-charge: Prof. Dr. Ellen Van De Vijver; teaching assistant: Pablo De Weerdt). It consists of notebook snippets created by Michael Pyrcz. The code and markdown (text) snippets were edited specifically for this course, using the 'Jura data set' (Goovaerts, 1997) as an example in the practical classes. Some new code snippets are also included to cover topics that were not found in the GeostatsPy package demo books.

This notebook is for educational purposes.<br> 

Guidelines for getting started were adapted from the 'Environmental Soil Sensing' course at Ghent University (lecturer-in-charge: Prof. Dr. Philippe De Smedt).<br> 

The Jura data set was taken from: Goovaerts P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press.

**Don't forget to save a copy on your Google drive before starting**

You can also 'mount' your Google Drive in Google Colab to directly access your Drive folders (e.g. to access data or previous notebooks).

Do not hesitate to contact us with questions, or ask them during the practical sessions.

# Geostatistics: Introduction to geostatistical data analysis with Python

In [1]:
# Import required packages for setup
# -------------------------------------------- #

import sys
import os

In [2]:
# if you are not using Google Colab, change the path to the location of the repository
sys.path.append(r'c:\Users\pdweerdt\Documents\Repos\draft_E_I002454_Geostatistics')

In [3]:
# Clone the repository and add it to the path
if 'google.colab' in sys.modules:
    !git clone https://github.com/SENSE-UGent/E_I002454_Geostatistics.git
    repo_path = '/content/E_I002454_Geostatistics'    # default location in Google Colab after cloning
else:
    # if you are not using Google Colab, change the path to the location of the repository
    repo_path = r'c:\Users\pdweerdt\Documents\Repos\E_I002454_Geostatistics'

sys.path.append(repo_path)

# Import the setup function
from Utils.setup import check_and_install_packages

# Read the requirements.txt file
requirements_path = os.path.join(repo_path, 'Utils', 'requirements.txt')

with open(requirements_path) as f:
    required_packages = f.read().splitlines()

# Check and install packages
check_and_install_packages(required_packages)

#### Load Required libraries

In [4]:
import geostatspy
import geostatspy.GSLIB as GSLIB                              # GSLIB utilities, visualization and wrapper
import geostatspy.geostats as geostats                        # if this raises an error, you might have to check your numba installation
print('GeostatsPy version: ' + str(geostatspy.__version__))   # these notebooks were tested with GeostatsPy version: 0.0.72

GeostatsPy version: 0.0.72


We also need some standard packages. These should already have been installed by the setup cell above.

In [5]:
from tqdm import tqdm                                         # progress (status) bars
from functools import partialmethod

tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)    # suppress all tqdm status bars
                                   
import numpy as np                                            # ndarrays for gridded data
                                       
import pandas as pd                                           # DataFrames for tabular data

import matplotlib.pyplot as plt                               # for plotting

from scipy import stats                                       # summary statistics

plt.rc('axes', axisbelow=True)                                # plot all grids below the plot elements

ignore_warnings = True                                        # ignore warnings?
if ignore_warnings:
    import warnings
    warnings.filterwarnings('ignore')

from IPython.utils import io                                  # mute output from simulation

seed = 42                                                     # random number seed

In [6]:
from ipywidgets import interactive                      # widgets and interactivity
from ipywidgets import widgets                            
from ipywidgets import Layout
from ipywidgets import Label
from ipywidgets import VBox, HBox

### Optional libraries

These are not required to run the given version of this practical exercise, but might be useful if you want to extend this notebook with more code.

In [7]:
#  import math library
import math

import cmath

In [8]:
from scipy.stats import pearsonr                              # Pearson product moment correlation
from scipy.stats import spearmanr                             # spearman rank correlation    
                                   
import seaborn as sns                                         # advanced plotting

import matplotlib as mpl                                        

from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
from matplotlib.colors import ListedColormap 
import matplotlib.ticker as mtick 
import matplotlib.gridspec as gridspec

### Set the Working Directory

Setting the working directory simplifies subsequent reads and writes: files can then be addressed with paths relative to it, instead of the full path each time.

##### For use in Google Colab

Run the following cell if you are working in Google Colab: it changes the working directory to the repository cloned above, which contains the data.

In [9]:
# change the working directory to the cloned repository

os.chdir('E_I002454_Geostatistics')

# get the current directory and store it as a variable

cd = os.getcwd()
print("Current Working Directory is " , cd)


##### For local use

Only run the following cell if you have the data locally stored.

In [10]:
# set the working directory (change this to the folder where you stored the data)
# the r prefix makes this a raw string, so backslashes are not treated as escape characters
os.chdir(r'C:\Users\pdweerdt\OneDrive - UGent\I002454 - Geostatistics\AY 2024-2025\Practicals')

# get the current directory and store it as a variable

cd = os.getcwd()
print("Current Working Directory is " , cd)

Current Working Directory is  C:\Users\pdweerdt\OneDrive - UGent\I002454 - Geostatistics\AY 2024-2025\Practicals
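The two cells above differ only in which folder becomes the working directory. As a minimal sketch, assuming you want one cell that works in both environments, the choice can be made automatically. `LOCAL_DIR` is a hypothetical placeholder that you must replace with your own data folder; the Colab path is the default clone location used in the setup cell.

```python
import os
import sys

# Hypothetical placeholder: replace with the folder that contains your 'Hard_data' folder
LOCAL_DIR = r'C:\path\to\your\Practicals'
# Default location of the cloned repository in Google Colab
COLAB_DIR = '/content/E_I002454_Geostatistics'


def choose_working_dir(in_colab, colab_dir=COLAB_DIR, local_dir=LOCAL_DIR):
    """Return the folder to use as working directory for this environment."""
    return colab_dir if in_colab else local_dir


in_colab = 'google.colab' in sys.modules
wd = choose_working_dir(in_colab)
if os.path.isdir(wd):
    os.chdir(wd)
    print('Current Working Directory is', os.getcwd())
else:
    # do not fail silently: tell the user which path needs adjusting
    print('Directory not found, please adjust LOCAL_DIR:', wd)
```

Checking `os.path.isdir` first avoids the `FileNotFoundError` you get when the Colab cell is run on a local machine.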


### Loading Tabular & Gridded Data

Here we load our data file into a pandas DataFrame.

Let's also load and visualize a grid.

Check the file format of your gridded data first.

In this case the gridded data file is also in the GSLIB .dat format, so we can use the same function to import it. The .grid extension was only given to indicate that it contains gridded data.

In [11]:
# adjust the path to the data folder (relative to the working directory) if needed

data_path = os.path.join(cd, 'Hard_data')

In [12]:
file_name = 'prediction.dat'

df = GSLIB.GSLIB2Dataframe(os.path.join(data_path, file_name)) # read the data

df.head()

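|   | Xloc | Yloc | Landuse | Rock | Cd | Co | Cr | Cu | Ni | Pb | Zn |
|---|------|------|---------|------|----|----|----|----|----|----|----|
| 0 | 2.386 | 3.077 | 3.0 | 3.0 | 1.74 | 9.32 | 38.32 | 25.72 | 21.32 | 77.36 | 92.56 |
| 1 | 2.544 | 1.972 | 2.0 | 2.0 | 1.335 | 10.0 | 40.2 | 24.76 | 29.72 | 77.88 | 73.56 |
| 2 | 2.807 | 3.347 | 2.0 | 3.0 | 1.61 | 10.6 | 47.0 | 8.88 | 21.4 | 30.8 | 64.8 |
| 3 | 4.308 | 1.933 | 3.0 | 2.0 | 2.15 | 11.92 | 43.52 | 22.7 | 29.72 | 56.4 | 90.0 |
| 4 | 4.383 | 1.081 | 3.0 | 5.0 | 1.565 | 16.32 | 38.52 | 34.32 | 26.2 | 66.4 | 88.4 |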
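`GSLIB.GSLIB2Dataframe` reads the GSLIB (Geo-EAS) text format, which is why the .dat and .grid files can be imported with the same function. To show what that format looks like, here is a minimal sketch of a reader. It is an illustration only, not GeostatsPy's implementation, and it assumes whitespace-separated numeric values without missing-value codes. The function name `read_gslib` is hypothetical.

```python
import io

import pandas as pd


def read_gslib(file_obj):
    """Parse GSLIB (Geo-EAS) formatted text into (title, DataFrame).

    Minimal sketch: assumes whitespace-separated numeric values and does
    not handle missing-value codes.
    """
    lines = file_obj.read().splitlines()
    title = lines[0].strip()                              # line 1: free-text title
    nvar = int(lines[1].split()[0])                       # line 2: number of variables
    names = [lines[2 + i].strip() for i in range(nvar)]   # next nvar lines: variable names
    rows = [
        [float(v) for v in line.split()]                  # remaining lines: one record each
        for line in lines[2 + nvar:]
        if line.strip()                                   # skip blank lines
    ]
    return title, pd.DataFrame(rows, columns=names)


# Usage with a tiny in-memory file (values copied from the first two rows above)
example = io.StringIO(
    "Example title\n"
    "3\n"
    "Xloc\n"
    "Yloc\n"
    "Cd\n"
    "2.386 3.077 1.74\n"
    "2.544 1.972 1.335\n"
)
title, df_small = read_gslib(example)
print(title)
print(df_small)
```

The result is stored in `df_small` so that the Jura DataFrame `df` loaded above is not overwritten.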


### Define feature of interest

In [None]:
feature = 'Cd'
unit = 'ppm'
dist_unit = 'km'

In [14]:
#  define a colormap

cmap = plt.cm.inferno                                         # color map inferno

cmap_rainb = plt.cm.turbo # similar to what is shown on the slides

### Experimental Variograms

We can use the location maps to help determine good variogram calculation parameters. For example:

```python
tmin = -9999.; tmax = 9999.
lag_dist = 100.0; lag_tol = 50.0; nlag = 7; bandh = 9999.9; azi = 0.0; atol = 90.0
```
* **tmin**, **tmax** are trimming limits - set to have no impact, as there is no need to filter the data
* **lag_dist**, **lag_tol** are the lag distance and lag tolerance - the lag distance is set based on the common data spacing, and the tolerance as 50% of the lag distance to avoid overlapping bins or missing pairs
* **nlag** is the number of lags - chosen so that **nlag** × **lag_dist** covers the separation distances of interest (a common rule of thumb is about half the extent of the study area)
* **bandh** is the horizontal bandwidth - here set to have no effect
* **azi** is the azimuth - it has no effect since we set **atol**, the azimuth tolerance, to 90.0, which gives an omnidirectional variogram
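The experimental semivariogram for lag $\mathbf{h}$ is half the average squared difference over the $N(\mathbf{h})$ data pairs separated by approximately $\mathbf{h}$: $\gamma^*(\mathbf{h}) = \frac{1}{2N(\mathbf{h})}\sum_{\alpha=1}^{N(\mathbf{h})}\left[z(\mathbf{u}_\alpha) - z(\mathbf{u}_\alpha + \mathbf{h})\right]^2$. The sketch below shows how **lag_dist**, **lag_tol** and **nlag** bin the pairs in the omnidirectional case. It is a simplified illustration, not GeostatsPy's `gamv`, so lag bookkeeping details may differ. The function name `omni_variogram` is hypothetical.

```python
import numpy as np


def omni_variogram(x, y, z, lag_dist, lag_tol, nlag):
    """Omnidirectional experimental semivariogram (minimal sketch).

    For lag k (k = 1..nlag) a pair contributes if its separation distance
    lies within k * lag_dist +/- lag_tol. Returns the mean pair distance,
    the semivariance and the number of pairs per lag.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    i, j = np.triu_indices(len(z), k=1)            # every pair counted once
    d = np.hypot(x[i] - x[j], y[i] - y[j])         # pair separation distances
    sq = (z[i] - z[j]) ** 2                        # squared differences
    lags, gammas, npairs = [], [], []
    for k in range(1, nlag + 1):
        in_bin = np.abs(d - k * lag_dist) <= lag_tol
        n = int(in_bin.sum())
        npairs.append(n)
        if n > 0:
            lags.append(d[in_bin].mean())
            gammas.append(0.5 * sq[in_bin].mean())  # sum / (2 N(h))
        else:                                       # empty bin: no estimate
            lags.append(np.nan)
            gammas.append(np.nan)
    return np.array(lags), np.array(gammas), np.array(npairs)


# Usage: four points on a line with unit spacing and a linear trend
# (expected: lags 1, 2, 3; semivariances 0.5, 2.0, 4.5; pairs 3, 2, 1)
lags_demo, gammas_demo, npairs_demo = omni_variogram(
    [0, 1, 2, 3], [0, 0, 0, 0], [0, 1, 2, 3], lag_dist=1.0, lag_tol=0.5, nlag=3
)
print(lags_demo, gammas_demo, npairs_demo)
```

The outputs use `_demo` names so that the `lags`, `gammas` and `npps` variables used by the dashboards below are not overwritten. On the Jura data, a call such as `omni_variogram(df['Xloc'], df['Yloc'], df[feature], 0.1, 0.05, 20)` would be one (hypothetical) choice of parameters.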

#### Dashboard for Interactive Variogram Calculation

Below we make a dashboard with the ipywidgets and matplotlib Python packages for calculating experimental variograms.

We can set the range of values that we want to explore per variogram parameter.
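The **azimuth**, **azimuth tolerance** and **bandwidth** sliders decide which pairs count as lying "in the direction" being examined. As a minimal sketch, assuming the GSLIB-style convention that azimuth is measured in degrees clockwise from north (the +y axis), a pair is kept if its separation vector lies within the angular tolerance of the azimuth (in either sense) and within the bandwidth of the direction line. This mirrors the idea behind the dashboard, not GeostatsPy's exact code, and `directional_mask` is a hypothetical name.

```python
import numpy as np


def directional_mask(dx, dy, azi, azi_tol, bandwidth):
    """Return True for separation vectors (dx, dy) inside the directional window.

    Assumes azimuth in degrees clockwise from north (+y). A pair is kept if the
    angle between its separation vector and the azimuth (h or -h) is at most
    azi_tol and its perpendicular offset from the direction line is at most
    bandwidth.
    """
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    az = np.radians(azi)
    ux, uy = np.sin(az), np.cos(az)                # unit vector of the azimuth
    h = np.hypot(dx, dy)
    along = dx * ux + dy * uy                      # projection on the direction
    perp = np.abs(dx * uy - dy * ux)               # perpendicular offset
    with np.errstate(invalid='ignore', divide='ignore'):
        cos_angle = np.abs(along) / h              # |cos|: accept both senses
    # small slack so that azi_tol = 90 accepts pairs exactly perpendicular
    keep_angle = cos_angle >= np.cos(np.radians(azi_tol)) - 1e-12
    return keep_angle & (perp <= bandwidth) & (h > 0)


# Usage: separation vectors pointing north, east, south and north-east
dx_demo = [0.0, 1.0, 0.0, 1.0]
dy_demo = [1.0, 0.0, -1.0, 1.0]
print(directional_mask(dx_demo, dy_demo, azi=0.0, azi_tol=22.5, bandwidth=1e9))  # north-south pairs only
print(directional_mask(dx_demo, dy_demo, azi=0.0, azi_tol=90.0, bandwidth=1e9))  # all pairs: omnidirectional
```

With an azimuth tolerance of 90 every pair passes the angle test, which is why the azimuth slider has no effect in that case; a very large bandwidth likewise removes the bandwidth constraint.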

In [15]:
# interactive calculation of the experimental variogram
l = widgets.Text(value='                              Variogram Calculation Interactive Demonstration, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))

# set lag
lag = widgets.FloatSlider(
                        min = 0.1, max = 1, value = 0.1, step = 0.1, # optionally adjust the min, max and step size
                        description = 'lag',orientation='vertical',layout=Layout(width='90px', height='200px'),
                        continuous_update=False
                        )
lag.style.handle_color = 'gray'

# set lag tolerance
lag_tol = widgets.FloatSlider(
                            min = 0.01, max = 1, value = 0.01, step = 0.01, # optionally adjust the min, max and step size
                            description = 'lag tolerance',orientation='vertical',layout=Layout(width='90px', height='200px'),
                            continuous_update=False
                            )
lag_tol.style.handle_color = 'gray'

# set number of lags
nlag = widgets.IntSlider(
                        min = 1, max = 100, value = 1, step = 1, # optionally adjust the min, max and step size
                        description = 'number of lags',orientation='vertical',layout=Layout(width='90px', height='200px'),
                        continuous_update=False
                        )
nlag.style.handle_color = 'gray'

# set azimuth
azi = widgets.FloatSlider(
                        min = 0, max = 360, value = 0, step = 5, # optionally adjust the min, max and step size
                        description = 'azimuth',orientation='vertical',layout=Layout(width='90px', height='200px'),
                        continuous_update=False
                        )
azi.style.handle_color = 'gray'

# set azimuth tolerance
azi_tol = widgets.FloatSlider(
                            min = 10, max = 90, value = 90, step = 5, # optionally adjust the min, max and step size
                            description = 'azimuth tolerance',orientation='vertical',layout=Layout(width='120px', height='200px'),
                            continuous_update=False
                            )
azi_tol.style.handle_color = 'gray'

# set bandwidth
bandwidth = widgets.FloatSlider(
                                min = 0.1, max = 1000, value = 1000, step = 0.5, # optionally adjust the min, max and step size
                                description = 'bandwidth',orientation='vertical',layout=Layout(width='90px', height='200px'),
                                continuous_update=False
                                )
bandwidth.style.handle_color = 'gray'


ui1 = widgets.HBox([lag,lag_tol,nlag,azi,azi_tol,bandwidth],) # basic widget formatting    
ui = widgets.VBox([l,ui1],)

In [None]:
# function to take parameters, calculate variogram and plot

def f_make(lag,lag_tol,nlag,azi,azi_tol,bandwidth):     
    global lags,gammas,npps # define global variables, stored while tweaking the parameters
    tmin = -9999.9; tmax = 9999.9
    lags, gammas, npps = geostats.gamv(df,"Xloc","Yloc",feature,tmin,tmax,lag,lag_tol,nlag,azi,azi_tol,bandwidth, isill=None)
    
    # plot experimental variogram
    scatter = plt.scatter(lags,gammas,color = 'darkorange',edgecolor='black',s = npps*0.05,label = 'Azimuth ' +str(azi))

    plt.xlabel(r'Lag Distance $\mathbf{h}$ (' + dist_unit + ')')
    plt.ylabel(r'$\gamma \bf(h)$ (' + unit + '$^2$)')
    
    if azi_tol < 90.0:
        plt.title('Directional Variogram - Azi ' + str(azi))
    else:
        plt.title('Omnidirectional Variogram ')
    plt.xlim([0,5]); plt.ylim([0,1.8])

    plt.grid(True)
    
    legend = plt.legend(*scatter.legend_elements("sizes", num=6),loc='upper left')
    legend.set_title('Number of Pairs/20')
    
    plt.subplots_adjust(left=0.0, bottom=0.0, right=1.1, top=0.7, wspace=0.3, hspace=0.3)
    plt.show()

In [17]:
# connect the function to make the samples and plot to the widgets    
interactive_plot = widgets.interactive_output(f_make, {'lag':lag,'lag_tol':lag_tol,'nlag':nlag,'azi':azi,'azi_tol':azi_tol,'bandwidth':bandwidth})
interactive_plot.clear_output(wait = True)               # reduce flickering by delaying plot updating

In [18]:
# display the interactive plot
display(ui, interactive_plot)                             # display the interactive plot


### Variogram modelling



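Fit a positive definite variogram model to the experimental variogram using the following parameters:
* **nug**: nugget effect

* **it1**: type of the first structure (spherical, exponential or Gaussian)

* **c1 / c2**: sill contributions of the first / second structure - the total sill is **nug** + **c1** + **c2**; the dashboard below fits a single structure, so only **c1** is used

* **hmaj1 / hmaj2**: range in the major direction

* **hmin1 / hmin2**: range in the minor direction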
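The dashboard builds and evaluates the model with `GSLIB.make_variogram` and `geostats.vmodel`. To make the formulas behind the parameters explicit, here is a minimal sketch of a single-structure model $\gamma(h) = c_0 + c_1 f(h/a)$ for $h > 0$ and $\gamma(0) = 0$. It assumes the GSLIB convention of practical ranges, where the exponential and Gaussian models reach about 95% of $c_1$ at $h = a$. Geometric anisotropy is handled by rescaling the lag vector with the major and minor ranges, again assuming azimuth in degrees clockwise from north. The function names are hypothetical.

```python
import numpy as np


def variogram_model(h, nug, c1, a, model='Spherical'):
    """Single-structure variogram gamma(h) = nug + c1 * f(h / a), gamma(0) = 0.

    Sketch assuming GSLIB-style practical ranges for the exponential and
    Gaussian models.
    """
    h = np.asarray(h, dtype=float)
    hr = h / a                                     # range-normalised distance
    if model == 'Spherical':
        f = np.where(hr < 1.0, 1.5 * hr - 0.5 * hr ** 3, 1.0)
    elif model == 'Exponential':
        f = 1.0 - np.exp(-3.0 * hr)
    elif model == 'Gaussian':
        f = 1.0 - np.exp(-3.0 * hr ** 2)
    else:
        raise ValueError(f'unknown model type: {model}')
    return np.where(h > 0, nug + c1 * f, 0.0)      # nugget applies only for h > 0


def reduced_distance(dx, dy, azi, hmaj, hmin):
    """Geometric anisotropy: rescale the lag vector (dx, dy) so that a lag of
    hmaj along azimuth azi, or hmin perpendicular to it, maps to distance 1."""
    az = np.radians(azi)
    h_maj = dx * np.sin(az) + dy * np.cos(az)      # component along the major axis
    h_min = dx * np.cos(az) - dy * np.sin(az)      # component along the minor axis
    return np.hypot(h_maj / hmaj, h_min / hmin)


# Usage with hypothetical parameter values, for illustration only
h_demo = np.array([0.0, 0.5, 1.0, 2.0])
print(variogram_model(h_demo, nug=0.1, c1=0.6, a=1.0, model='Spherical'))

# Anisotropic case: major range 2.0 along azimuth 0, minor range 1.0;
# both lags below sit exactly at the range, so they get the same value
r_demo = reduced_distance(np.array([0.0, 1.0]), np.array([2.0, 0.0]), azi=0.0, hmaj=2.0, hmin=1.0)
print(variogram_model(r_demo, nug=0.1, c1=0.6, a=1.0, model='Exponential'))
```

Because the reduced distance is already divided by the ranges, the model is evaluated with `a=1.0` in the anisotropic case.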

#### Dashboard for Interactive Variogram Modelling

In [None]:
# interactive variogram modelling: widgets for the model parameters
l = widgets.Text(value='               Variogram Modeling, Michael Pyrcz, Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))

# set the nugget
nug = widgets.FloatSlider(min = 0.01, max = 1.0, value = 0.01, step = 0.01,
                          description = r'c_0',orientation='vertical',
                          layout=Layout(width='60px', height='200px')
                          )
nug.style.handle_color = 'gray'

it1 = widgets.Dropdown(options=['Spherical', 'Exponential', 'Gaussian'],value='Exponential',
    description=r'$Type_1$:',disabled=False,layout=Layout(width='200px', height='30px'))

# set the sill contribution
c1 = widgets.FloatSlider(min=0.001, max = 1.0, value = 0.001, step = 0.01,
                         description = r'c_1',orientation='vertical',
                         layout=Layout(width='60px', height='200px')
                         )
c1.style.handle_color = 'gray'

# set the range in the major direction
hmaj1 = widgets.FloatSlider(min=0.01, max = 6, value = 0.01, step = 0.01,
                            description = r'a_{1,maj}',orientation='vertical',
                            layout=Layout(width='60px', height='200px'))
hmaj1.style.handle_color = 'black'

# set the range in the minor direction
hmin1 = widgets.FloatSlider(min = 0, max = 6, value = 0.01, step = 0.01,
                            description = r'a_{1,min}',orientation='vertical',
                            layout=Layout(width='60px', height='200px'))
hmin1.style.handle_color = 'red'

ui9 = widgets.HBox([nug,it1,c1,hmaj1,hmin1],)                   # basic widget formatting   
ui10 = widgets.VBox([l,ui9],)


In [None]:
# functions to take the widget parameters, build the variogram model and plot it

def convert_type(it):
    if it == 'Spherical':
        return 1
    elif it == 'Exponential':
        return 2
    else: 
        return 3

def f_make_omni_mod(nug,it1,c1, hmaj1,hmin1):                       # build the model and plot it with the experimental variogram
    azimuth = azi.value
    it1 = convert_type(it1)
    nst = 1
    
    vario = GSLIB.make_variogram(nug,nst,it1,c1,0.0,hmaj1,hmin1) # make model object
    nlag = 100000; xlag = 0.0001;           # lags for model plotting (not the same as experimental variogram lags!)
    index_maj,h_maj,gam_maj,cov_maj,ro_maj = geostats.vmodel(nlag,xlag,0.0,vario)   # project the model in the major azimuth

    # plot experimental variogram
    plt.scatter(lags,gammas,color = 'black',s = npps*0.03,label = 'Major Azimuth ' +str(azimuth), alpha = 0.8,zorder=10)
    plt.plot(h_maj,gam_maj,color = 'black',lw=3,zorder=10)

    plt.xlabel(r'Lag Distance $\mathbf{h}$ (' + dist_unit + ')')
    plt.ylabel(r'$\gamma \bf(h)$ (' + unit + '$^2$)')
    
    if azi_tol.value < 90.0:
        plt.title('Major Directional ' + feature + ' Variogram - Azi. ' + str(azimuth))
    else: 
        plt.title('Omni Directional ' + feature + ' Variogram ')

    plt.xlim([0,5]); plt.ylim([0, 1.8])
    plt.legend(loc="upper left")
    plt.grid(True)

    plt.subplots_adjust(left=0.0, bottom=0.0, right=1.1, top=0.7, wspace=0.3, hspace=0.3)
    plt.show()

In [21]:
# connect the function to make the samples and plot to the widgets    
interactive_plot2 = widgets.interactive_output(f_make_omni_mod, {'nug':nug, 'it1':it1, 'c1':c1, 'hmaj1':hmaj1, 'hmin1':hmin1})
interactive_plot2.clear_output(wait = True)               # reduce flickering by delaying plot updating  

In [22]:
display(ui10, interactive_plot2)                           # display the interactive plot


If you change parameters for the experimental variogram