# Auto-covariance, Correlogram and Semi-Variogram

## Basic Variography in Python

### Spatial Continuity

**Spatial Continuity** is the correlation between values over distance.

- No spatial continuity – no correlation between values over distance, random values at each location in space regardless of separation distance.

- A homogeneous phenomenon has perfect spatial continuity: since all values are the same (or very similar), they are correlated at every separation distance.
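As a rough illustration of these two extremes, the sketch below compares the lag-1 correlation of purely random values with that of a smoothed version of the same values. The helper `lag_correlation`, the fixed seed and the window of 10 are illustrative choices, not part of the original notebook. A perfectly constant series is left out on purpose: it has zero variance, so its correlation coefficient is undefined.

```python
import random

def lag_correlation(values, lag):
    """Pearson correlation between values[i] and values[i + lag] (lag >= 1)."""
    head, tail = values[:-lag], values[lag:]
    n = len(head)
    mean_head, mean_tail = sum(head) / n, sum(tail) / n
    cov = sum((a - mean_head) * (b - mean_tail) for a, b in zip(head, tail))
    var_head = sum((a - mean_head) ** 2 for a in head)
    var_tail = sum((b - mean_tail) ** 2 for b in tail)
    return cov / (var_head * var_tail) ** 0.5

random.seed(0)  # fixed seed so the sketch is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Moving average over 10 neighbours: adjacent values share most of their inputs
window = 10
smooth = [sum(noise[i:i + window]) / window for i in range(len(noise) - window)]

r_noise = lag_correlation(noise, 1)
r_smooth = lag_correlation(smooth, 1)
print(f"lag-1 correlation, random values:   {r_noise:.2f}")
print(f"lag-1 correlation, smoothed values: {r_smooth:.2f}")
```

The random series should give a correlation close to zero, while the smoothed series should give a value close to one, because neighbouring values are built from almost the same inputs.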

## Autocovariance

Autocovariance measures how values of a variable co-vary with values of the same variable separated by a given distance or lag. It helps in understanding how data points at different spatial locations relate to each other. The autocovariance for a lag $h$ is defined as:

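$$
C(h) = \frac{1}{N(h)-1} \sum_{i=1}^{N(h)} \big( z(u_i) - \overline{Z}_{original} \big) \big( z(u_{i+h}) - \overline{Z}_{lagged} \big)
$$

where $z(u_i)$ is the value at location $u_i$, $n$ is the total number of observations, $N(h) = n - h$ is the number of pairs separated by lag $h$, $\overline{Z}_{original}$ is the mean of the first $N(h)$ values, and $\overline{Z}_{lagged}$ is the mean of the last $N(h)$ values.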


## Correlogram

A correlogram, or autocorrelation plot, shows the autocorrelation of a time or spatial series as a function of lag. It's a visual tool to identify the presence of spatial autocorrelation.

$$
r(h) = \frac{C(h)}{C(0)} = \frac{C(h)}{\sigma^2}
$$

where $C(0) = \sigma^2$ is the variance of the data.
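A minimal sketch of this ratio for several lags, assuming a regularly spaced series held in a plain Python list. The helper name `autocovariance` and the choice of lags 0 to 3 are illustrative; the porosity values are the ones used in the example dataset of this notebook.

```python
def autocovariance(values, lag):
    """C(h) with separate means for the original and lagged part of each pair."""
    n_pairs = len(values) - lag
    original, lagged = values[:n_pairs], values[lag:]
    mean_original = sum(original) / n_pairs
    mean_lagged = sum(lagged) / n_pairs
    products = ((a - mean_original) * (b - mean_lagged)
                for a, b in zip(original, lagged))
    return sum(products) / (n_pairs - 1)

porosity = [8.25, 9.00, 6.25, 5.00, 5.30, 4.75, 5.00]
variance = autocovariance(porosity, 0)  # C(0): sample variance of all values

# Correlogram: autocovariance at each lag divided by the variance
correlogram = {lag: autocovariance(porosity, lag) / variance for lag in range(4)}
for lag, r in correlogram.items():
    print(f"lag {lag}: r = {r:.3f}")
```

This normalises every lag by the overall variance, as in the formula. The worked example in this notebook instead divides by the standard deviations of the two lagged subsets, so the two approaches can give slightly different values for the same lag.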

## Semi-Variogram

$$
\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [z(u_i) - z(u_{i+h})]^2
$$
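Read term by term, the estimator is half the mean squared difference between all $N(h)$ pairs of values separated by lag $h$. The sketch below is one way to write that down, assuming regularly spaced data in a plain Python list so that the lag is an index offset. The function name `semivariance` is illustrative.

```python
def semivariance(values, lag):
    """Half the mean squared difference of all pairs `lag` steps apart."""
    n_pairs = len(values) - lag  # N(h)
    if lag < 1 or n_pairs < 1:
        raise ValueError("lag must be at least 1 and smaller than len(values)")
    squared_diffs = ((values[i] - values[i + lag]) ** 2 for i in range(n_pairs))
    return sum(squared_diffs) / (2 * n_pairs)

# Porosity values from the example dataset in this notebook
porosity = [8.25, 9.00, 6.25, 5.00, 5.30, 4.75, 5.00]
for lag in range(1, 4):
    print(f"lag {lag}: semivariance = {semivariance(porosity, lag):.4f}")
```

Unlike the autocovariance, no mean is subtracted: only differences between paired values enter the sum.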

## Examples


### Loading and Preprocessing Data

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

Set the code location directory

In [2]:
# os.chdir(r'/Users/reidarbbratvold/Documents/UiS/Undervisning/2024/MOD550/Lectures/Python/code/')

### Calculating Auto-Covariance

To calculate autocovariance, we first need to define the function that computes autocovariance for a given lag.

### Parameters

- `data`: A sequence (list or array) of numerical data points

- `lag`: An integer indicating the time lag between the compared points in the series. A lag of 1 means each point is compared with the next one, a lag of 2 means each point is compared with the one two steps ahead, and so on.

### Process

- `n = len(data) - lag`: Calculates the number of observation pairs that can be formed for the given lag. The last `lag` observations have no partner `lag` steps ahead, so we subtract the lag from the total length of the dataset.

- `mean_original = sum(data[:-lag]) / n`: Computes the mean of the dataset excluding the last `lag` observations. This mean is used for the "original" part of the data pairs being compared. The slice `data[:-lag]` assumes `lag >= 1`; for `lag = 0` it would be empty.

- `mean_lagged = sum(data[lag:]) / n`: Computes the mean of the dataset excluding the first `lag` observations. This mean is used for the "lagged" part of the data pairs.

- `covariance = sum((data[i] - mean_original) * (data[i + lag] - mean_lagged) for i in range(n)) / (n - 1)`: Performs the core calculation of autocovariance. For each pair of observations separated by `lag`, it calculates the product of the deviations of the two observations from their respective means (original and lagged). Summing these products and dividing by $n-1$ (the number of pairs, less one degree of freedom) gives the sample autocovariance.
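Assembled into a single function, the steps above might look like the sketch below. The function name `autocovariance`, the input check and the use of `data[:n]` in place of `data[:-lag]` (so that `lag = 0` also works) are assumptions added here. It expects a plain list or NumPy array; a pandas Series would need `.to_numpy()` first, because the sliced Series keeps its original index labels.

```python
def autocovariance(data, lag):
    """Sample autocovariance of `data` at an integer `lag`."""
    n = len(data) - lag  # number of pairs separated by `lag`
    if lag < 0 or n < 2:
        raise ValueError("need 0 <= lag <= len(data) - 2")

    original = data[:n]   # same as data[:-lag], but also valid for lag = 0
    lagged = data[lag:]

    mean_original = sum(original) / n
    mean_lagged = sum(lagged) / n

    # Products of deviations from the respective means, one per pair
    products = ((original[i] - mean_original) * (lagged[i] - mean_lagged)
                for i in range(n))
    return sum(products) / (n - 1)

# Porosity values from the example dataset in this notebook
porosity = [8.25, 9.00, 6.25, 5.00, 5.30, 4.75, 5.00]
print(autocovariance(porosity, lag=1))
```

For these porosity values, lag 1 gives approximately 2.0745, the same autocovariance reported in the worked example.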

### Example

#### Create a small dataset

In [3]:

# Define data as a dictionary
# Keys are column names, and values are lists of column data
data = {
    'Depth': [2040, 2041, 2042, 2043, 2044, 2045, 2046],
    'Porosity': [8.25, 9.00, 6.25, 5.00, 5.30, 4.75, 5.00]
}

# Create DataFrame
data_7 = pd.DataFrame(data)

# Display the DataFrame
print(data_7)


   Depth  Porosity
0   2040      8.25
1   2041      9.00
2   2042      6.25
3   2043      5.00
4   2044      5.30
5   2045      4.75
6   2046      5.00


In [4]:
array_7 = data_7['Porosity']
array_7

0    8.25
1    9.00
2    6.25
3    5.00
4    5.30
5    4.75
6    5.00
Name: Porosity, dtype: float64

#### Illustrate how to cut and slice the data in the array

In [5]:
array_7[:-1]

0    8.25
1    9.00
2    6.25
3    5.00
4    5.30
5    4.75
Name: Porosity, dtype: float64

In [6]:
array_7[:-2]

0    8.25
1    9.00
2    6.25
3    5.00
4    5.30
Name: Porosity, dtype: float64

In [7]:
array_7[1:]

1    9.00
2    6.25
3    5.00
4    5.30
5    4.75
6    5.00
Name: Porosity, dtype: float64

In [8]:
array_7[2:]

2    6.25
3    5.00
4    5.30
5    4.75
6    5.00
Name: Porosity, dtype: float64
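The two kinds of slice shown above line up element by element, which is what makes them useful for lagged statistics. The sketch below uses a plain list with the same porosity values to show the pairing for a lag of 2. The names `head`, `tail` and `pairs` are illustrative.

```python
porosity = [8.25, 9.00, 6.25, 5.00, 5.30, 4.75, 5.00]
lag = 2

head = porosity[:-lag]  # drops the last `lag` values
tail = porosity[lag:]   # drops the first `lag` values

# zip pairs head[i] with tail[i], i.e. porosity[i] with porosity[i + lag]
pairs = list(zip(head, tail))
for original, lagged in pairs:
    print(original, lagged)
```

Both slices have length `len(porosity) - lag`, which is the number of pairs available at that lag.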

#### Calculate the autocovariance, correlogram and semi-variance for the specified lag

In [9]:

lag = 1
n = len(array_7) - lag  # Adjusted for the actual number of pairs used in calculation

# Calculate means
mean_original = np.mean(array_7[:-lag])
mean_lagged = np.mean(array_7[lag:])

# Calculate covariance for the specified lag
covariance = sum((array_7[i] - mean_original) * (array_7[i + lag] - mean_lagged) for i in range(n)) / (n - 1)

# Calculate variances
variance_original = sum((array_7[i] - mean_original)**2 for i in range(n)) / (n - 1)
variance_lagged = sum((array_7[i + lag] - mean_lagged)**2 for i in range(n)) / (n - 1)

# Calculate correlogram value (correlation coefficient for the specified lag)
correlation = covariance / (np.sqrt(variance_original) * np.sqrt(variance_lagged))

# Calculate semivariogram for the specified lag
semivariance = sum((array_7[i + lag] - array_7[i])**2 for i in range(n)) / (2*n)

print(f"Lag {lag}: Autocovariance = {covariance}")
print(f"Lag {lag}: Correlation = {correlation}")
print(f"Lag {lag}: Semivariance = {semivariance}")


Lag 1: Autocovariance = 2.0745
Lag 1: Correlation = 0.7161880442502433
Lag 1: Semivariance = 0.8452083333333333


![table](../figs/autocovar_example.png)

In [10]:
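def calculate_autocovariance(data, col_name, max_lag):
    autocovariances = {}
    series = data[col_name]
    lags = range(max_lag + 1)
    autocovariances[0] = series.var(ddof=1)  # lag 0 is the sample variance
    # Assumed completion: pair each value with the one `lag` steps ahead, as in In [9]
    for lag in lags[1:]:
        head, tail = series[:-lag].to_numpy(), series[lag:].to_numpy()
        autocovariances[lag] = np.cov(head, tail)[0, 1]  # divides by (pairs - 1)
    return autocovariances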


# The End