# semivariance module

Available classes and functions:

- ArealSemivariance: Class calculates semivariance of areas for Poisson Kriging (area to area and area to point),
- RegularizedSemivariogram: Class performs deconvolution of semivariogram of areal data,
- calculate_covariance: Function calculates covariance of a given set of points,
- calculate_semivariance: Function calculates semivariance of a given set of points,
- calculate_weighted_semivariance: Function calculates weighted semivariance,
- calculate_directional_semivariogram: Function calculates semivariogram within specified ellipse,
- build_variogram_point_cloud: Function creates OrderedDict with lags and variances between points within specific lag,
- show_variogram_cloud: Function shows boxplots of lags and squared differences between points' values within specific lag,
- calc_semivariance_from_pt_cloud: Function calculates semivariogram from the variogram point cloud,
- remove_outliers: Function removes outliers from the variogram point cloud,
- TheoreticalSemivariogram: Class calculates theoretical semivariogram.

## ```ArealSemivariance```

### Class initialization

```python
pyinterpolate.semivariance.ArealSemivariance(
    areal_data,
    areal_step_size,
    max_areal_range,
    areal_points_data,
    weighted_semivariance=False,
    verbose=False)

```

Class calculates semivariance of areas for Poisson Kriging (area to area and area to point).


INITIALIZATION PARAMS:

- **areal_data**: (_numpy array_ / _list_) 

```python
[area_id, area_geometry, centroid x,centroid y, value]
```
- **areal_step_size**: (_float_) step size for search radius,
- **max_areal_range**: (*float*) max distance to perform distance and semivariance calculations,
- **areal_points_data**: (_numpy array_ / _list_)

```python
[
    area_id,
    [point_position_x, point_position_y, value]
]
```
- **weighted_semivariance**: (_bool_) if ```False``` then each distance is treated equally when calculating theoretical semivariance; if ```True``` then semivariances closer to the point of origin have more weight,
- **verbose**: (_bool_) if ```True``` then all messages are printed, otherwise nothing.

### Class public methods:

- **regularize_semivariogram**: Function calculates regularized point support semivariogram,
- **show_semivariograms**: Function shows semivariograms calculated by the class: Empirical semivariogram, Theoretical model, Inblock Semivariance, Within-block semivariogram, Between blocks semivariogram, Regularized output.

---

### ```ArealSemivariance.regularize_semivariogram()```

```python
ArealSemivariance.regularize_semivariogram(
    self,
    within_block_semivariogram=None,
    between_blocks_semivariogram=None,
    empirical_semivariance=None,
    theoretical_semivariance_model=None)
```

Function calculates regularized point support semivariogram in the form given in:

> Goovaerts P., Kriging and Semivariogram Deconvolution in the Presence of Irregular Geographical Units, Mathematical Geology 40(1), 101-128, 2008

Function has the form:

$$\gamma_{v(h)} = \gamma(v, v_h) - \gamma_{h}(v, v)$$

where:

- $\gamma_{v(h)}$ - regularized semivariogram,
- $\gamma(v, v_h)$ - semivariogram value between any two blocks separated by the distance $h$,
- $\gamma_{h}(v, v)$ - arithmetical average of within-block semivariogram.

INPUT:

- **within_block_semivariogram**: (_numpy array_) mean of the within-block semivariances of block pairs separated by the distance $h$:

$$\gamma_{h}(v, v) = \frac{1}{2 N(h)} \sum_{\alpha=1}^{N(h)} [\gamma(v_{\alpha}, v_{\alpha}) + \gamma(v_{\alpha+h}, v_{\alpha+h})]$$

where:

$\gamma(v_{\alpha}, v_{\alpha})$ and $\gamma(v_{\alpha+h}, v_{\alpha+h})$ are the inblock semivariances of block $\alpha$ and block $\alpha+h$ separated by the distance $h$, weighted by the inblock population.

- **between_blocks_semivariogram**: (_numpy array_) semivariance between all blocks calculated from the theoretical model,
- **empirical_semivariance**: (_numpy array_) empirical semivariance between area centroids, ```default=None```, if ```None``` is provided then empirical semivariance is computed by the ```_calculate_empirical_semivariance``` method from area centroids,
- **theoretical_semivariance_model**: (_TheoreticalSemivariogram_) theoretical semivariance model from the ```TheoreticalSemivariogram``` class, default is ```None```, if ```None``` is provided then the theoretical model is derived from area centroids and empirical semivariance.

OUTPUT:

- **semivariance**: (```numpy array```) of lag, semivariance values and number of areas within lag where: 

```semivariance[0] = array of lags```;

```semivariance[1] = array of lag's values```; 

```semivariance[2] = array of number of points in each lag```.
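
The two formulas above can be sketched in plain NumPy. This is an illustration only, not the library's implementation: the helper names `average_within_block` and `regularize`, the input layout, and the sample numbers are all hypothetical, and the population weighting of the inblock semivariances is assumed to have been applied already.

```python
import numpy as np


def average_within_block(inblock, pairs):
    """Arithmetic average of inblock semivariances over block pairs at one lag.

    inblock: 1-D array with the inblock semivariance of each block
             (hypothetical input, assumed to be population-weighted already).
    pairs:   list of (i, j) block-index pairs separated by the distance h.
    """
    pairs = np.asarray(pairs)
    total = inblock[pairs[:, 0]] + inblock[pairs[:, 1]]
    return total.sum() / (2 * len(pairs))


def regularize(between_blocks, within_block):
    """gamma_v(h) = gamma(v, v_h) - gamma_h(v, v), evaluated lag by lag."""
    return np.asarray(between_blocks) - np.asarray(within_block)


# Three blocks; at this lag the pairs (0, 1) and (1, 2) are h apart.
inblock = np.array([0.2, 0.4, 0.6])
within = average_within_block(inblock, [(0, 1), (1, 2)])
regularized = regularize([1.5], [within])
```

Here `within` is the average for a single lag; a full semivariogram would repeat the calculation for every lag before subtracting.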

---

### ```ArealSemivariance.show_semivariograms()```

```python
ArealSemivariance.show_semivariograms(self)
```

Function shows semivariograms calculated by the class: Empirical semivariogram, Theoretical model, Inblock Semivariance, Within-block semivariogram, Between blocks semivariogram, Regularized output.

***

## ```RegularizedSemivariogram```

### Class initialization

```python
pyinterpolate.semivariance.RegularizedSemivariogram(self)

```

Class performs deconvolution of semivariogram of areal data. Whole procedure is based on the iterative process described in: 
    
> Goovaerts P., Kriging and Semivariogram Deconvolution in the Presence of Irregular Geographical Units, Mathematical Geology 40(1), 101-128, 2008

Class works as follows:

- initialize your object (no parameters),
- then use ```fit()``` method to build initial point support model,
- then use ```transform()``` method to perform semivariogram regularization,
- save semivariogram model with ```export_regularized_model()``` method.

### Class public methods:

- **fit** - fits areal data and point support data into a model, initializes experimental semivariogram, theoretical semivariogram model, regularized point support model and deviation,
- **transform** - performs semivariogram regularization, which is an iterative process,
- **export_regularized_model** - Function exports final regularized model parameters into specified csv file.
- **show_baseline_semivariograms** - Function shows experimental semivariogram, initial theoretical semivariogram and initial regularized semivariogram after fit() operation.
- **show_semivariograms** - plots experimental semivariogram of area data, theoretical curve of area data, regularized model values and regularized model theoretical curve.

---

### ```RegularizedSemivariogram.fit()```

```python
RegularizedSemivariogram.fit(self,
                             areal_data,
                             areal_step_size,
                             max_areal_range,
                             point_support_data,
                             weighted_lags=True,
                             store_models=False)
```

Function fits area and point support data to the initial regularized models.

INPUT:

- **areal_data**: (_numpy array_) areal data prepared with the function ```prepare_areal_shapefile()```, where data is a ```numpy array``` in the form: 

```python
[area_id, area_geometry, centroid x, centroid y, value]
```
- **areal_step_size**: (_float_) step size between lags, usually half of the distance between lags,
- **max_areal_range**: (*float*) max distance to perform distance and semivariance calculations,
- **point_support_data**: (_numpy array_) point support data prepared with the function ```get_points_within_area()```, where data is a ```numpy array``` in the form:

```python
[
    area_id,
    [point_position_x, point_position_y, value]
]
```
- **weighted_lags**: (_bool_) lags weighted by number of points; if ```True``` then during semivariogram fitting error of each model is weighted by number of points for each lag. In practice it means that more reliable data (lags) have larger weights and semivariogram is modeled to better fit to those lags,
- **store_models**: (_bool_) if ```True``` then experimental, regularized and theoretical models are stored in lists after each iteration. It is important for a debugging process.

OUTPUT:

None, class is updating its internal parameters. Usually after fitting you should perform regularization with ```transform()``` method.

---

### ```RegularizedSemivariogram.transform()```

```python
RegularizedSemivariogram.transform(self,
                                   max_iters=25,
                                   min_deviation_ratio=0.01,
                                   min_diff_decrease=0.01,
                                   min_diff_decrease_reps=3)
```

Function transforms fitted data and performs the iterative semivariogram regularization procedure.

INPUT:

- **max_iters**: (_int_) maximum number of iterations,
- **min_deviation_ratio**: (_float_) minimum ratio between deviation and initial deviation (D(i) / D(0)) below which the algorithm is stopped,
- **min_diff_decrease**: (_float_) minimum absolute difference between new and optimal deviation divided by optimal deviation: ABS(D(i) - D(opt)) / D(opt). If it is recorded ```n``` times (controlled by the ```min_diff_decrease_reps``` param) then the algorithm is stopped,
- **min_diff_decrease_reps**: (_int_) number of iterations after which the algorithm is stopped if the ```min_diff_decrease``` condition is fulfilled.
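
The three stopping criteria described by these parameters can be sketched as a small pure-Python check. The function `should_stop` and its bookkeeping arguments are hypothetical and only mirror the description above; in particular, counting every small decrease (rather than only consecutive ones) is an assumption, and the real method may track its state differently.

```python
def should_stop(deviations, optimal_deviation, stalled_count,
                max_iters=25, min_deviation_ratio=0.01,
                min_diff_decrease=0.01, min_diff_decrease_reps=3):
    """Return (stop, stalled_count) after the latest iteration.

    deviations:        list [D(0), D(1), ..., D(i)] recorded so far.
    optimal_deviation: D(opt), the best deviation found before D(i).
    stalled_count:     how many times the relative change was too small so far.
    """
    iteration = len(deviations) - 1
    current = deviations[-1]

    # Criterion 1: iteration limit reached.
    if iteration >= max_iters:
        return True, stalled_count

    # Criterion 2: D(i) / D(0) fell below the ratio threshold.
    if current / deviations[0] < min_deviation_ratio:
        return True, stalled_count

    # Criterion 3: |D(i) - D(opt)| / D(opt) was small enough n times
    # (assumption: occurrences are counted, not required to be consecutive).
    relative_change = abs(current - optimal_deviation) / optimal_deviation
    if relative_change <= min_diff_decrease:
        stalled_count += 1
    return stalled_count >= min_diff_decrease_reps, stalled_count
```

A caller would append each new deviation to `deviations`, pass the returned `stalled_count` back in on the next iteration, and leave the loop when the first returned value is `True`.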

OUTPUT:

None, class is updating its internal parameters. Usually after transforming you should export your theoretical model with ```export_regularized_model()``` method.

---

### ```RegularizedSemivariogram.export_regularized_model()```

```python
RegularizedSemivariogram.export_regularized_model(self,
                                                  filename)
```

Function exports final regularized model parameters into specified csv file.

INPUT:

- **filename**: (_str_) filename for model parameters (nugget, sill, range, model type).

OUTPUT:

Method saves regularized model into csv file.

---

### ```RegularizedSemivariogram.show_baseline_semivariograms()```

```python
RegularizedSemivariogram.show_baseline_semivariograms(self)
```

Function shows experimental semivariogram, initial theoretical semivariogram and initial regularized semivariogram after ```fit()``` operation.

---

### ```RegularizedSemivariogram.show_semivariograms()```

```python
RegularizedSemivariogram.show_semivariograms(self)
```

Function shows experimental semivariogram, theoretical semivariogram and regularized semivariogram after semivariogram regularization with ```transform()``` method.

***

## ```calculate_covariance()```

```python
pyinterpolate.semivariance.calculate_covariance(
    data,
    step_size,
    max_range)
```

Function calculates covariance of a given set of points.

Equation for calculation is:

$$covariance = \frac{1}{N} * \sum_{i=1}^{N} [z(x_{i} + h) * z(x_{i})] - u^{2}$$

where:

$N$ - number of observation pairs,

$h$ - distance (lag),

$z(x_{i})$ - value at location $x_{i}$,

$(x_{i} + h)$ - location at a distance $h$ from $x_{i}$,

$u$ - mean of observations at a given lag distance.


INPUT:

- **data**: (_numpy array_) coordinates and their values,
- **step_size**: (_float_) distance between lags within which points are included in the calculations,
- **max_range**: (*float*) maximum range of analysis.


OUTPUT:

- (```numpy array```) covariance - array of pair of lag and covariance values where:

```covariance[0] = array of lags```;

```covariance[1] = array of lag's values```;

```covariance[2] = array of number of points in each lag```.
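
The equation above can be illustrated with a short NumPy sketch. The function name `covariance_sketch` is hypothetical, and the binning rule (a pair belongs to a lag when its distance is within half a step of the lag) is an assumption made for the example; the library may group pairs differently.

```python
import numpy as np


def covariance_sketch(data, step_size, max_range):
    """Covariance per lag for rows of [x, y, value] (illustrative only)."""
    data = np.asarray(data, dtype=float)
    coords, values = data[:, :2], data[:, 2]

    # Pairwise distances; each unordered pair is kept once (i < j).
    i, j = np.triu_indices(len(data), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)

    rows = []
    for lag in np.arange(step_size, max_range + step_size / 2, step_size):
        # Assumed binning: pairs within half a step of the lag centre.
        mask = np.abs(dists - lag) <= step_size / 2
        n = int(mask.sum())
        if n == 0:
            rows.append((lag, np.nan, 0))
            continue
        zi, zj = values[i[mask]], values[j[mask]]
        u = np.concatenate([zi, zj]).mean()  # mean of observations at this lag
        rows.append((lag, np.mean(zi * zj) - u ** 2, n))

    # Rows of the result: lags, covariances, number of pairs.
    return np.array(rows).T


points = [[0, 0, 1], [1, 0, 3], [2, 0, 5]]
covariance = covariance_sketch(points, step_size=1, max_range=2)
```

The returned array follows the layout described in OUTPUT, so `covariance[0]` holds the lags and `covariance[1]` the covariance values.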

***

## ```calculate_semivariance()```

```python
pyinterpolate.semivariance.calculate_semivariance(
    data,
    step_size,
    max_range)
```

Function calculates semivariance of a given set of points.

Equation for calculation is:

$$semivariance = \frac{1}{2N} * \sum_{i=1}^{N} [z(x_{i} + h) - z(x_{i})]^{2}$$

where:

$N$ - number of observation pairs,

$h$ - distance (lag),

$z(x_{i})$ - value at location $x_{i}$,

$(x_{i} + h)$ - location at a distance $h$ from $x_{i}$.


INPUT:

- **data**: (_numpy array_) coordinates and their values,
- **step_size**: (_float_) distance between lags within which points are included in the calculations,
- **max_range**: (*float*) maximum range of analysis.


OUTPUT:

- (```numpy array```) semivariance - array of pair of lag and semivariance values where:

```semivariance[0] = array of lags```;

```semivariance[1] = array of lag's values```;

```semivariance[2] = array of number of points in each lag```.
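
As a rough illustration of the equation above, the following NumPy sketch computes experimental semivariance lag by lag. The name `semivariance_sketch` is hypothetical, and assigning a pair to a lag when its distance is within half a step of that lag is an assumption of this example, not a statement about the library's internals.

```python
import numpy as np


def semivariance_sketch(data, step_size, max_range):
    """Experimental semivariance per lag for rows of [x, y, value]."""
    data = np.asarray(data, dtype=float)
    coords, values = data[:, :2], data[:, 2]

    # Each unordered pair of points is used once (i < j).
    i, j = np.triu_indices(len(data), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2

    rows = []
    for lag in np.arange(step_size, max_range + step_size / 2, step_size):
        # Assumed binning: pairs within half a step of the lag centre.
        mask = np.abs(dists - lag) <= step_size / 2
        n = int(mask.sum())
        gamma = sq_diffs[mask].sum() / (2 * n) if n else np.nan
        rows.append((lag, gamma, n))

    # Rows of the result: lags, semivariances, number of pairs.
    return np.array(rows).T


points = [[0, 0, 1], [1, 0, 3], [2, 0, 5]]
semivariance = semivariance_sketch(points, step_size=1, max_range=2)
```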

***

## ```calculate_weighted_semivariance()```

```python
pyinterpolate.semivariance.calculate_weighted_semivariance(
    data,
    step_size,
    max_range)
```

Function calculates weighted semivariance following _Monestiez et al._:

> A. Monestiez P, Dubroca L, Bonnin E, Durbec JP, Guinet C: Comparison of model based geostatistical methods in ecology: application to fin whale spatial distribution in northwestern Mediterranean Sea. In Geostatistics Banff 2004 Volume 2. Edited by: Leuangthong O, Deutsch CV. Dordrecht, The Netherlands, Kluwer Academic Publishers; 2005:777-786.


> B. Monestiez P, Dubroca L, Bonnin E, Durbec JP, Guinet C: Geostatistical modelling of spatial distribution of Balaenoptera physalus in the northwestern Mediterranean Sea from sparse count data and heterogeneous observation efforts. Ecological Modelling 2006 in press.

Equation for calculation is:

$$s(h) = \frac{1}{2*\sum_{a=1}^{N(h)} c_{a}} * \sum_{a=1}^{N(h)} c_{a}*(z(u_{a}) - z(u_{a} + h))^2 - m'$$

where:

$$c_{a} = \frac{n(u_{a}) * n(u_{a} + h)}{n(u_{a}) + n(u_{a} + h)}$$

where:

$s(h)$ - semivariogram of the risk,

$n(u_{a})$ - size of the population at risk in the unit a,

$z(u_{a})$ - mortality rate at the unit a,

$u_{a} + h$ - area at the distance (h) from the analyzed area,

$m'$ - population weighted mean of rates.

INPUT:

- **data**: (_numpy array_) coordinates and their values and weights:

```python
[coordinate x, coordinate y, value, weight]
```
- **step_size**: (_float_) distance between lags within which points are included in the calculations,
- **max_range**: (*float*) maximum range of analysis.


OUTPUT:

- (```numpy array```) semivariance - array of pair of lag and semivariance values where:

```semivariance[0] = array of lags```;

```semivariance[1] = array of lag's values```;

```semivariance[2] = array of number of points in each lag```.
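
The weighted formula can be sketched in NumPy as follows. This is an illustration under stated assumptions: `weighted_semivariance_sketch` is a hypothetical name, the `weight` column is taken to be the population at risk $n(u_{a})$, $m'$ is computed once over all units, and pairs are binned when their distance is within half a step of the lag.

```python
import numpy as np


def weighted_semivariance_sketch(data, step_size, max_range):
    """Population-weighted semivariance for rows of [x, y, value, weight]."""
    data = np.asarray(data, dtype=float)
    coords, z, n_pop = data[:, :2], data[:, 2], data[:, 3]

    i, j = np.triu_indices(len(data), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)

    # c_a = n(u_a) * n(u_a + h) / (n(u_a) + n(u_a + h)) for every pair.
    c = n_pop[i] * n_pop[j] / (n_pop[i] + n_pop[j])
    sq_diffs = (z[i] - z[j]) ** 2

    # Assumption: m' is the population-weighted mean over all units.
    m_prime = np.sum(n_pop * z) / np.sum(n_pop)

    rows = []
    for lag in np.arange(step_size, max_range + step_size / 2, step_size):
        mask = np.abs(dists - lag) <= step_size / 2  # assumed binning rule
        n = int(mask.sum())
        if n:
            s = np.sum(c[mask] * sq_diffs[mask]) / (2 * np.sum(c[mask])) - m_prime
        else:
            s = np.nan
        rows.append((lag, s, n))
    return np.array(rows).T


areas = [[0, 0, 1, 10], [1, 0, 3, 30]]
weighted = weighted_semivariance_sketch(areas, step_size=1, max_range=1)
```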

***

## ```calculate_directional_semivariogram()```

```python
pyinterpolate.semivariance.calculate_directional_semivariogram(
    data,
    step_size,
    max_range,
    direction=0,
    tolerance=0.1)
```

Function calculates directional semivariogram of points. Semivariance is calculated as:

$$semivariance = \frac{1}{2N} * \sum_{i=1}^{N} [z(x_{i} + h) - z(x_{i})]^{2}$$

where:

- $N$ - number of observation pairs,
- $h$ - distance (lag),
- $z(x_{i})$ - value at location $x_{i}$,
- $(x_{i} + h)$ - location at a distance $h$ from $x_{i}$.

INPUT:

- **data**: (*numpy array*) coordinates and their values,
- **step_size**: (*float*) distance between lags within which points are included in the calculations,
- **max_range**: (*float*) maximum range of analysis,
- **direction**: (*float*) direction of semivariogram, values from 0 to 360 degrees:
    - 0 or 180 is NS direction,
    - 90 or 270 is EW direction,
    - 30 or 210 is NE-SW direction,
    - 120 or 300 is NW-SE direction,
- **tolerance**: (*float*) value in range (0-1) normalized to [0 : 0.5] to select tolerance of semivariogram. If tolerance is 0 then points must be placed on a single line that begins at the origin of the coordinate system, at the angle between the y axis and the direction parameter. If tolerance is greater than 0 then semivariance is estimated from an elliptical area whose major axis has the same direction as the line for 0 tolerance and whose minor axis has a size:

$$(tolerance * step\_size)$$

and major axis (pointed in NS direction):

$$((1 - tolerance) * step\_size)$$

and baseline point at a center of ellipse. Tolerance == 1 (normalized to 0.5) creates omnidirectional semivariogram.

OUTPUT:

- (*numpy array*) **semivariance** - array of pair of lag and semivariance values where:

```semivariance[0] = array of lags```;

```semivariance[1] = array of lag's values```;

```semivariance[2] = array of number of points in each lag```.
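
The elliptical search area described for the `tolerance` parameter can be sketched as a point-in-ellipse test. The helper `in_direction_ellipse` is hypothetical, and several details are assumptions made for the example: the two sizes are treated as semi-axes, the direction is measured clockwise from the y axis, the tolerance is normalized by halving it, and zero tolerance (the single-line case) is not handled.

```python
import numpy as np


def in_direction_ellipse(origin, other, step_size, direction=0, tolerance=0.1):
    """True if `other` lies inside the search ellipse centred on `origin`."""
    if not 0 < tolerance <= 1:
        raise ValueError("this sketch needs 0 < tolerance <= 1")

    t = tolerance / 2                 # assumed normalization of (0, 1] to (0, 0.5]
    major = (1 - t) * step_size       # semi-axis along the direction
    minor = t * step_size             # semi-axis across the direction

    theta = np.radians(direction)     # assumed: clockwise from the y axis (north)
    dx, dy = np.subtract(other, origin)
    along = dx * np.sin(theta) + dy * np.cos(theta)
    across = dx * np.cos(theta) - dy * np.sin(theta)
    return bool((along / major) ** 2 + (across / minor) ** 2 <= 1)


# North-south ellipse: a point to the north is inside, one to the east is not.
north_inside = in_direction_ellipse((0, 0), (0, 0.8), step_size=1, tolerance=0.2)
east_inside = in_direction_ellipse((0, 0), (0.8, 0), step_size=1, tolerance=0.2)
```

With `tolerance=1` both semi-axes equal half of `step_size`, so the ellipse becomes a circle, which matches the omnidirectional case mentioned above.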

***

## ```build_variogram_point_cloud()```

```python
pyinterpolate.semivariance.build_variogram_point_cloud(
    data,
    step_size,
    max_range)
```

Function calculates variogram point cloud of a given set of points for a given set of distances. Variogram is calculated as a squared difference of each point against other point within range specified by step_size parameter. 

INPUT:

- **data**: (*numpy array*) coordinates and their values and weights:

```python
[coordinate x, coordinate y, value, weight]
```
- **step_size**: (*float*) distance between lags within which points are included in the calculations,
- **max_range**: (*float*) maximum range of analysis.


OUTPUT:

- (*OrderedDict*) variogram_cloud - dict with pairs {lag: list of squared differences}.
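
A minimal sketch of how such a point cloud could be assembled is shown below. The name `point_cloud_sketch` and the binning rule (distance within half a step of the lag) are assumptions for illustration; only the first three columns of each row are used here.

```python
from collections import OrderedDict

import numpy as np


def point_cloud_sketch(data, step_size, max_range):
    """Build {lag: [squared differences]} from rows of [x, y, value, ...]."""
    data = np.asarray(data, dtype=float)
    coords, values = data[:, :2], data[:, 2]

    # Each unordered pair of points contributes one squared difference.
    i, j = np.triu_indices(len(data), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2

    cloud = OrderedDict()
    for lag in np.arange(step_size, max_range + step_size / 2, step_size):
        mask = np.abs(dists - lag) <= step_size / 2  # assumed binning rule
        cloud[float(lag)] = sq_diffs[mask].tolist()
    return cloud


points = [[0, 0, 1], [1, 0, 3], [2, 0, 5]]
cloud = point_cloud_sketch(points, step_size=1, max_range=2)
```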

---

## ```show_variogram_cloud()```

```python
pyinterpolate.semivariance.show_variogram_cloud(
    variogram_cloud,
    figsize=None)
```

Function shows boxplots of variogram lags. It is especially useful when you want to check outliers in your dataset.

INPUT:

- **variogram_cloud**: (*OrderedDict*) lags and halved squared differences between points,
- **figsize**: (*tuple*), default is `None`.

---

## ```calc_semivariance_from_pt_cloud()```

```python
pyinterpolate.semivariance.calc_semivariance_from_pt_cloud(
    pt_cloud_dict)
```

Function calculates experimental semivariogram from point cloud variogram.

INPUT:

- **pt_cloud_dict**: (*OrderedDict*) {lag: [values]}.

OUTPUT:

- (*numpy array*) [lag, semivariance, number of points].
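
The reduction from a point cloud to an experimental semivariogram can be sketched as below. The function `semivariance_from_cloud` is hypothetical, and it assumes the dictionary values are plain squared differences, so the semivariance is half of their mean; if the values are already halved, the division by 2 should be dropped.

```python
import numpy as np


def semivariance_from_cloud(pt_cloud_dict):
    """Return rows of [lag, semivariance, number of points] per lag."""
    rows = []
    for lag, values in pt_cloud_dict.items():
        values = np.asarray(values, dtype=float)
        # Assumption: values are squared differences, so halve their mean.
        gamma = values.mean() / 2 if len(values) else np.nan
        rows.append([lag, gamma, len(values)])
    return np.array(rows)


experimental = semivariance_from_cloud({1.0: [4.0, 4.0], 2.0: [16.0]})
```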

---

## ```remove_outliers()```

```python
pyinterpolate.semivariance.remove_outliers(data_dict,
                                           exclude_part='top',
                                           weight=1.5)
```

Function removes outliers from the variogram point cloud for each lag and returns dict without extreme values from the top, bottom or both parts of the variogram point cloud for a given lag. Algorithm uses quartiles to remove outliers:

(1)
$$BottomOutlier < Q1 - w*(Q3-Q1)$$

(2)
$$Q3 + w*(Q3-Q1) < TopOutlier$$

where:

- $Q1$ - first quartile (25%),
- $Q3$ - third quartile (75%),
- $w$ - weight associated with the algorithm; a larger weight means fewer values are treated as outliers.


INPUT:

- **data_dict**: (*OrderedDict*) with {lag: list of values},
- **exclude_part**: (*str*) default = `'top'`, available `'top'`, `'both'` or `'bottom'` - part of the variogram point cloud which is excluded from a given lag.
- **weight**: (*float*) default=1.5, affects number of values which are removed.

OUTPUT:

- (*OrderedDict*) {lag: [variances between point pairs within a given lag]}
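
Rules (1) and (2) can be sketched with `numpy.percentile`. The function `remove_outliers_sketch` is a hypothetical stand-in that follows the description above; quartile interpolation and the handling of empty lags are assumptions and may differ from the library.

```python
from collections import OrderedDict

import numpy as np


def remove_outliers_sketch(data_dict, exclude_part='top', weight=1.5):
    """Drop values outside the quartile fences for every lag."""
    if exclude_part not in ('top', 'bottom', 'both'):
        raise ValueError("exclude_part must be 'top', 'bottom' or 'both'")

    cleaned = OrderedDict()
    for lag, values in data_dict.items():
        values = np.asarray(values, dtype=float)
        if values.size == 0:
            cleaned[lag] = []
            continue

        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        keep = np.ones(values.size, dtype=bool)
        if exclude_part in ('top', 'both'):
            keep &= values <= q3 + weight * iqr   # rule (2)
        if exclude_part in ('bottom', 'both'):
            keep &= values >= q1 - weight * iqr   # rule (1)
        cleaned[lag] = values[keep].tolist()
    return cleaned


top_cleaned = remove_outliers_sketch({1.0: [1, 2, 3, 4, 100]})
bottom_cleaned = remove_outliers_sketch({1.0: [-100, 1, 2, 3, 4]}, exclude_part='bottom')
```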

---

## ```TheoreticalSemivariogram```

### Class initialization

```python
pyinterpolate.semivariance.TheoreticalSemivariogram(
    self,
    points_array=None,
    empirical_semivariance=None,
    verbose=False)

```

Class calculates theoretical semivariogram.

Available theoretical models:

    - spherical_model(distance, nugget, sill, semivar_range)
    - gaussian_model(distance, nugget, sill, semivar_range)
    - exponential_model(distance, nugget, sill, semivar_range)
    - linear_model(distance, nugget, sill, semivar_range)
    - cubic_model(distance, nugget, sill, semivar_range)
    - circular_model(distance, nugget, sill, semivar_range)
    - power_model(distance, nugget, sill, semivar_range)
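
To show what such a model function computes, here is a sketch of the textbook spherical model with the same argument order. It is an assumption that the library uses this exact form, in particular that `sill` is added on top of `nugget` (a partial sill); `spherical_model_sketch` is a hypothetical name.

```python
import numpy as np


def spherical_model_sketch(distance, nugget, sill, semivar_range):
    """Textbook spherical semivariogram; assumes `sill` is a partial sill."""
    distance = np.asarray(distance, dtype=float)
    ratio = distance / semivar_range
    rising = nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)
    # Beyond the range the curve stays flat at nugget + sill.
    return np.where(distance <= semivar_range, rising, nugget + sill)


modeled = spherical_model_sketch([0, 5, 10, 20], nugget=1, sill=4, semivar_range=10)
```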


INITIALIZATION PARAMS:

- **points_array**: (_numpy array_) analysed points where the last column is representing values, typically x, y, value, 
- **empirical_semivariance**: (_numpy array_) semivariance where first row of array represents lags and the second row represents semivariance's values for given lag.

### Class public methods:

- **fit_semivariance**: Method fits experimental points into chosen semivariance model type,
- **find_optimal_model**: Method fits experimental points into all available models and chooses the one with the lowest error,
- **export_model**: Function exports semivariance model to the csv file,
- **import_model**: Function imports semivariance model and updates its parameters,
- **export_semivariance**: Method exports theoretical semivariance and experimental semivariance to csv file,
- **show_experimental_semivariogram**:  Function shows experimental semivariogram of a given model,
- **show_semivariogram**: Function shows experimental and theoretical semivariogram in one plot.

---

### ```TheoreticalSemivariogram.fit_semivariance()```

```python
TheoreticalSemivariogram.fit_semivariance(
    self,
    model_type,
    number_of_ranges=16)
```

Method fits experimental points into chosen semivariance model type.

INPUT:

- **model_type**: (_str_) 'exponential', 'gaussian', 'linear', 'spherical',
- **number_of_ranges**: (_int_) default = 16. Used to create an array of equidistant ranges between minimal range of empirical semivariance and maximum range of empirical semivariance.

OUTPUT:

- (model_type, model parameters)

---

### ```TheoreticalSemivariogram.find_optimal_model()```

```python
TheoreticalSemivariogram.find_optimal_model(
    self,
    weighted=False,
    number_of_ranges=16)
```

Method fits experimental points into all available models and chooses the one with the lowest error.

INPUT:

- **weighted**: (_bool_) default=```False```. If ```True``` then each lag is weighted by:

$$\frac{\sqrt{N(h)}}{\gamma_{experimental}(h)}$$

where:

$N(h)$ - number of point pairs in a given range,

$\gamma_{experimental}(h)$ - value of experimental semivariogram for $h$.
- **number_of_ranges**: (_int_) default=16. Used to create an array of equidistant ranges between minimal range of empirical semivariance and maximum range of empirical semivariance.
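
The lag weights defined above can be sketched as follows. The helper `lag_weights` implements the stated ratio directly; `weighted_rmse` is a hypothetical error measure added only to show how such weights might enter a model comparison, and it is not confirmed to be the error the library minimizes. Lags with zero experimental semivariance would need special handling that this sketch omits.

```python
import numpy as np


def lag_weights(pair_counts, experimental):
    """sqrt(N(h)) / gamma_experimental(h) for every lag."""
    pair_counts = np.asarray(pair_counts, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    return np.sqrt(pair_counts) / experimental


def weighted_rmse(experimental, theoretical, pair_counts):
    """Weighted root mean squared error (assumed error measure)."""
    experimental = np.asarray(experimental, dtype=float)
    theoretical = np.asarray(theoretical, dtype=float)
    w = lag_weights(pair_counts, experimental)
    squared = (theoretical - experimental) ** 2
    return np.sqrt(np.sum(w * squared) / np.sum(w))


weights = lag_weights([4, 16], [1.0, 4.0])
error = weighted_rmse([1.0, 4.0], [2.0, 4.0], [4, 16])
```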

OUTPUT:

- model_type

Function updates class parameters with model properties.

---

### ```TheoreticalSemivariogram.export_model()```

```python
TheoreticalSemivariogram.export_model(self, filename)
```

Function exports semivariance model to the csv file. Columns of csv file are: name, nugget, sill, range, model_error.

---

### ```TheoreticalSemivariogram.import_model()```

```python
TheoreticalSemivariogram.import_model(self, filename)
```

Function imports semivariance model and updates its parameters (model name, nugget, sill, range, model_error).

---

### ```TheoreticalSemivariogram.export_semivariance()```

```python
TheoreticalSemivariogram.export_semivariance(self, filename)
```

Function exports semivariance data into csv file. Exported data has three columns: `lags`, `experimental`, `theoretical` where theoretical values are calculated from the fitted model and lags given by experimental semivariogram.

---

### ```TheoreticalSemivariogram.show_experimental_semivariogram()```

```python
TheoreticalSemivariogram.show_experimental_semivariogram(self)
```

Function shows experimental semivariogram of a given model.

---

### ```TheoreticalSemivariogram.show_semivariogram()```

```python
TheoreticalSemivariogram.show_semivariogram(self)
```

Function shows experimental and theoretical semivariogram in one plot.