## Spatial Data Analytics 

### Interactive Demonstration of Variogram h-Scatterplots 

#### Reidar B Bratvold, Professor, University of Stavanger

This is a simple demonstration of the variogram h-scatterplot for a **1D dataset** with variable spatial continuity and visualization.

* we will see that the correlogram (equal to the covariance function when the sill, i.e., the variance, is 1.0) is the correlation coefficient of the h-scatterplot.

* there is some deviation due to the edge effect in the variogram calculation, which excludes some of the data from the pairs at each lag (e.g., at large lags only the samples at the edges of the area of interest are included in the pairs)

* we will perform the calculations in 1D for fast run times and ease of visualization.
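
The relationship between the correlogram and the h-scatterplot correlation can be checked numerically. The following is a minimal sketch, not part of the original workflow: it uses a synthetic autocorrelated series (an assumption, chosen only to provide spatial continuity) instead of the porosity data, and the helper names `semivariogram` and `hscatter_correlation` are hypothetical.

```python
import numpy as np

def semivariogram(z, lag):
    """Experimental semivariogram at one lag: half the mean squared pair difference."""
    diffs = z[lag:] - z[:-lag]
    return 0.5 * np.mean(diffs ** 2)

def hscatter_correlation(z, lag):
    """Correlation coefficient of the h-scatterplot (tail values vs. head values)."""
    return np.corrcoef(z[:-lag], z[lag:])[0, 1]

# Synthetic AR(1) series: an assumption used only to obtain spatial continuity
rng = np.random.default_rng(seed=42)
n = 2000
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.9 * x[i - 1] + rng.normal()

z = (x - x.mean()) / x.std()                     # standardize: mean 0, variance 1

lag = 5
one_minus_gamma = 1.0 - semivariogram(z, lag)    # correlogram implied by the variogram
rho = hscatter_correlation(z, lag)               # correlation of the h-scatterplot
print(f"1 - gamma(h) = {one_minus_gamma:.3f}, rho(h) = {rho:.3f}")
```

The two values should agree closely but not exactly, because the pairs at a given lag leave out a few samples at the ends of the series, so the mean and variance of the paired values differ slightly from those of the full dataset.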

#### Load the required libraries

In [None]:
supress_warnings = True
import os
import numpy as np                                          # arrays
import matplotlib.pyplot as plt                             # plotting
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
import pandas as pd                                         # dataframes          
from ipywidgets import interactive                          # plotting widgets and interactivity
from ipywidgets import widgets                            
from ipywidgets import Layout
from ipywidgets import Label
from ipywidgets import VBox, HBox
cmap = plt.cm.inferno                                     # default color bar, no bias and friendly for color vision deficiency
plt.rc('axes', axisbelow=True)                            # grid behind plotting elements
if supress_warnings:
    import warnings                                       # suppress any warnings for this demonstration
    warnings.filterwarnings('ignore')

#### Create function for adding grid to a matplotlib plot.

In [None]:

def add_grid():
    """
    Adds a grid to the current matplotlib plot.

    - Enables both major and minor grids.
    - Sets major grid lines with a linewidth of 1.0.
    - Sets minor grid lines with a linewidth of 0.2.
    - Adjusts tick parameters: 
      - Major ticks have a length of 7.
      - Minor ticks have a length of 4.
    - Enables minor ticks on both x and y axes.

    Example usage:
        plt.plot([1, 2, 3], [4, 5, 6])
        add_grid()
        plt.show()
    """
    plt.gca().grid(True, which='major', linewidth=1.0)  # Add major grid
    plt.gca().grid(True, which='minor', linewidth=0.2)  # Add minor grid
    plt.gca().tick_params(which='major', length=7)  # Set major tick length
    plt.gca().tick_params(which='minor', length=4)  # Set minor tick length
    plt.gca().xaxis.set_minor_locator(AutoMinorLocator())  # Enable minor x-ticks
    plt.gca().yaxis.set_minor_locator(AutoMinorLocator())  # Enable minor y-ticks

#### Load the Dataset

This is a small 1D dataset; we use the `Nporosity` column.


In [None]:
df = pd.read_csv('1D_Porosity.csv')
npor = df['Nporosity']
df.head()

#### Create an affine function

We apply an affine transformation to ensure a dataset has a variance of 1.0.
An affine transformation is of the form:

$X^{\prime} = aX + b$

where:

- $a$  is a scaling factor (used to adjust variance),

- $b$  is a shifting factor (used to adjust mean, often to zero).

To obtain a mean of 0 and a variance of 1.0, we set $a = 1/\sigma$ and $b = -\mu/\sigma$, which gives:


$X^{\prime} = \frac{X - \mu}{\sigma}$


where:

- $\mu = \text{mean}(X)$  (shifts the mean to 0),

- $\sigma = \sqrt{\text{Var}(X)}$  (scales the variance to 1).

This is standardization (z-score normalization), which is a special case of an affine transformation.
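
As a short check, assuming $\sigma > 0$, write the standardization as $X^{\prime} = aX + b$ with $a = 1/\sigma$ and $b = -\mu/\sigma$. The mean and variance of $X^{\prime}$ then follow from the standard rules for linear transformations:

$E[X^{\prime}] = a\,E[X] + b = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0$

$\text{Var}(X^{\prime}) = a^2\,\text{Var}(X) = \frac{\sigma^2}{\sigma^2} = 1$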

Affine Function (Variance = 1, Mean = 0)

In [None]:

def affine(data):
    """
    Applies an affine transformation to ensure the dataset has mean = 0 and variance = 1.

    Parameters:
    data (array-like): Input dataset (1D NumPy array or list).

    Returns:
    numpy.ndarray: Transformed dataset with mean = 0 and variance = 1.
    """
    data = np.array(data)  # Convert input to NumPy array
    mean_x = np.mean(data)  # Compute mean
    var_x = np.var(data, ddof=0)  # Compute variance
    
    if var_x == 0:
        raise ValueError("Variance is zero; affine transformation is not possible.")
    
    a = 1 / np.sqrt(var_x)  # Scaling factor
    b = -mean_x / np.sqrt(var_x)  # Shifting factor to ensure mean = 0
    
    transformed_data = a * data + b  # Apply affine transformation

    return transformed_data


In [None]:
# Apply affine transformation
aff_npor = affine(npor) # ensure variance is 1.0 for results to work below

In [None]:
# Check the results
print("Original Mean:", np.mean(npor), "Variance:", np.var(npor, ddof=0))
print("Transformed Mean:", np.mean(aff_npor), "Variance:", np.var(aff_npor, ddof=0))

In [None]:
# Create figure and axis with specified figsize
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the porosity data
ax.plot(aff_npor, color='red', linestyle='--', alpha=0.2, zorder=1, label="Porosity Trend")
ax.scatter(np.arange(len(aff_npor)), aff_npor, color='red', edgecolor='black', zorder=2, label="Data Points")

# Set labels and title
ax.set_xlabel('Sample Index', fontsize=12)  # x values are sample indices (0 to 39), not depths in meters
ax.set_ylabel(r'Standardized Porosity, $\overline{x} = 0.0$, $s_x = 1.0$', fontsize=12)
ax.set_title('Porosity for a Single Vertical Well', fontsize=14)

# Set limits
ax.set_xlim([0, 39])
ax.set_ylim([-2.5, 2.5])

# Add grid
ax.grid(True, linestyle='--', alpha=0.5)

# Add legend
ax.legend()

# Show the plot
plt.show()

Notice that we ensured that the dataset variance is 1.0 as we assume this to calculate the correlogram below.

#### Interactive Interface

Here's the interactive interface. I calculate the variogram, plot the h-scatterplot and calculate and annotate the correlogram / h-scatterplot correlation coefficient.  

* the user specifies lag to investigate
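
The variogram calculation behind the interface can be sketched without any widgets. The following is a minimal illustration for a regularly spaced 1D series. The function name `experimental_variogram` and the toy values are hypothetical, chosen so the result is easy to verify by hand.

```python
import numpy as np

def experimental_variogram(z, max_lag):
    """Return (gamma, num_pairs) for lags 1..max_lag of a regularly spaced 1D series."""
    z = np.asarray(z, dtype=float)
    gamma = np.zeros(max_lag)
    num_pairs = np.zeros(max_lag, dtype=int)
    for lag in range(1, max_lag + 1):
        diffs = z[lag:] - z[:-lag]              # head minus tail for every pair at this lag
        num_pairs[lag - 1] = diffs.size         # pairs available shrink as the lag grows
        gamma[lag - 1] = 0.5 * np.mean(diffs ** 2)
    return gamma, num_pairs

# Toy series (illustrative only): a straight line, so every pair difference equals the lag
gamma, num_pairs = experimental_variogram([0.0, 1.0, 2.0, 3.0, 4.0], max_lag=3)
print(gamma)        # half the squared lag: 0.5, 2.0, 4.5
print(num_pairs)    # one fewer pair per lag: 4, 3, 2
```

The shrinking pair count is the reason results at large lags rest on only a few samples near the ends of the series.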

### Interactive Variogram h-scatterplot Demonstration 

Change the lag and observe how the variogram value, the number of pairs, and the h-scatterplot correlation coefficient change.

### The Inputs

* **lag** - the lag number to calculate, h = lag $\times$ data spacing
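
To make the lag input concrete, here is a minimal sketch of how the pairs for one lag can be formed with `pandas.Series.shift`. The toy values and the column names `current` and `shifted` are hypothetical. The spacing of 0.25 matches `size` in the code cell below, and its units are assumed to be meters.

```python
import pandas as pd

spacing = 0.25                                      # data spacing (assumed meters)
lag = 2                                             # lag number chosen by the user
z = pd.Series([0.1, 0.4, -0.2, 0.3, 0.8, -0.5])     # toy values, illustrative only

# shift(lag) moves each value down by `lag` rows, so row i holds z[i] and z[i - lag];
# the first `lag` rows have no partner and are dropped
pairs = pd.DataFrame({"current": z, "shifted": z.shift(lag)}).dropna()

h = lag * spacing                                   # lag distance
print(f"h = {h}, number of pairs = {len(pairs)}")
print(pairs)
```

With 6 samples and a lag of 2, only 4 pairs remain. Each row of `pairs` is one point on the h-scatterplot.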

In [None]:

import ipywidgets as widgets
from ipywidgets import Layout
from matplotlib.ticker import AutoMinorLocator

# Load dataset
df = pd.read_csv('https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/1D_Porosity.csv')
npor = pd.Series(affine(df['Nporosity']))  # standardize so the variance is exactly 1.0, as the annotations below assume

# Define function to add grid with minor ticks
def add_grid(ax):
    ax.grid(True, which="major", linewidth=1.0, linestyle="--", alpha=0.5)
    ax.grid(True, which="minor", linewidth=0.5, linestyle=":", alpha=0.3)
    ax.xaxis.set_minor_locator(AutoMinorLocator())
    ax.yaxis.set_minor_locator(AutoMinorLocator())

# Interactive widgets
title_widget = widgets.HTML(
    value="<h3>Variogram h-Scatterplot Demonstration</h3>",
    layout=Layout(width="1000px")
)

lag_slider = widgets.IntSlider(
    min=1, max=39, value=5, step=1,  # lags 0 to 39 are computed in run_plot; lag 40 would index past the end
    description="Lag",
    orientation="horizontal",
    style={"description_width": "initial"},
    layout=Layout(width="1000px"),
    continuous_update=False
)

ui_controls = widgets.VBox([title_widget, lag_slider])

# Function to generate plots
def run_plot(lag):
    size = 0.25
    
    # Compute variogram values
    gamma_all = []
    num_pairs_all = []
    for ilag in range(40):
        valid_diffs = (npor - npor.shift(ilag)).dropna()
        num_pairs_all.append(len(valid_diffs))
        gamma_all.append(np.average(np.square(valid_diffs)) * 0.5)

    gamma = gamma_all[lag]
    
    # Compute correlation coefficient
    aff_npor = pd.Series(npor) 
    aff_npor_shift = aff_npor.shift(lag)
    valid_idx = aff_npor_shift.dropna().index
    correl = np.round(np.corrcoef(aff_npor[valid_idx], aff_npor_shift[valid_idx]), 2)[0, 1]

    # Create figure with 2 side-by-side subplots
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    # **Left Plot: Experimental Variogram**
    ax1 = axes[0]
    scatter = ax1.scatter(np.arange(40) * size, gamma_all, s=num_pairs_all, color="red", edgecolor="black")
    ax1.scatter(lag * size, gamma, color="darkorange", edgecolor="black", s=40, zorder=10)
    
    # Reference lines
    ax1.axvline(lag * size, color="black", linestyle="--", zorder=1)
    ax1.axhline(gamma, color="black", linestyle="--", zorder=1)
    
    # Formatting
    ax1.set_xlim(0, 10)
    ax1.set_ylim(0, 3)
    ax1.set_xlabel(r"$\bf{h}$ Lag Distance")
    ax1.set_ylabel(r"$\gamma(\bf{h})$ Variogram")
    ax1.set_title("Experimental Variogram")
    add_grid(ax1)
    
    # Annotations
    ax1.annotate(r"$\gamma(\bf{h}) =$ " + str(np.round(gamma, 2)), (lag * size + 0.2, gamma / 2))
    ax1.annotate(r"$\bf{h} =$ " + str(lag * size), (lag * size - 0.3, 0.03), rotation=90)
    ax1.annotate(r"$\sigma^2 - \gamma(\bf{h}) =$ " + str(np.round(1.0 - gamma, 2)), (lag * size + 0.2, (gamma + 1.0) / 2), color="red")

    # Legend for number of pairs
    legend = ax1.legend(*scatter.legend_elements("sizes", num=4), loc="upper left")
    legend.set_title("Number of Pairs")

    # **Right Plot: h-Scatter Plot**
    ax2 = axes[1]
    ax2.scatter(npor, aff_npor_shift, color="darkorange", edgecolor="black", s=20, label="Pairs")
    ax2.plot([-3, 3], [-3, 3], color="black", linestyle="--")  # Identity line
    
    # Formatting
    ax2.set_xlim(-3, 3)
    ax2.set_ylim(-3, 3)
    ax2.set_xlabel(r"$Z(\bf{u})$ Tail")
    ax2.set_ylabel(r"$Z(\bf{u}+ \bf{h})$ Head")
    ax2.set_title(r"h-Scatter Plot, lag = " + str(lag) + r", $\bf{h} =$ " + str(lag * size))
    ax2.annotate(r"$\rho_{Z(\bf{u}),Z(\bf{u} + \bf{h})}$ = " + str(correl), (1.0, -2.5), fontsize=12)
    add_grid(ax2)

    plt.tight_layout()
    plt.show()

# Connect function to widgets
interactive_plot = widgets.interactive_output(run_plot, {"lag": lag_slider})
interactive_plot.clear_output(wait=True)


In [None]:

# Display the UI and interactive plot
display(ui_controls, interactive_plot)

# The End