# Semivariance module

Available classes and functions:

- ArealSemivariance: Class calculates semivariance of areas for Poisson Kriging (area to area and area to point),
- RegularizedSemivariogram: Class performs deconvolution of semivariogram of areal data,
- calculate_covariance: Function calculates covariance of a given set of points,
- calculate_semivariance: Function calculates semivariance of a given set of points,
- calculate_weighted_semivariance: Function calculates weighted semivariance,
- TheoreticalSemivariogram: Class calculates theoretical semivariogram.

## ArealSemivariance

### Class initialization

```python
pyinterpolate.semivariance.areal_semivariance.areal_semivariance.ArealSemivariance(
    areal_data, areal_lags, areal_step_size, areal_points_data,
    weighted_semivariance=False, verbose=False)

```

Class calculates semivariance of areas for Poisson Kriging (area to area and area to point).


INITIALIZATION PARAMS:

- **areal_data**: (```numpy array``` / ```list```) 

```python
[area_id, area_geometry, centroid x, centroid y, value]
```
- **areal_lags**: (```numpy array``` / ```list```) - array of lags (ranges of search),
- **areal_step_size**: (```float```) step size for search radius,
- **areal_points_data**: (```numpy array``` / ```list```)

```python
[
    area_id,
    [point_position_x, point_position_y, value]
]
```
- **weighted_semivariance**: (```bool```) if ```False``` then each distance is treated equally when calculating theoretical semivariance; if ```True``` then semivariances closer to the point of origin have more weight,
- **verbose**: (```bool```) if ```True``` then all messages are printed, otherwise nothing.

### Class public methods:

- **regularize_semivariogram**: Function calculates regularized point support semivariogram,
- **show_semivariograms**: Function shows semivariograms calculated by the class: Empirical semivariogram, Theoretical model, Inblock Semivariance, Within-block semivariogram, Between blocks semivariogram, Regularized output.

---

### ArealSemivariance.regularize_semivariogram

```python
ArealSemivariance.regularize_semivariogram(self,within_block_semivariogram=None,
                                           between_blocks_semivariogram=None,
                                           empirical_semivariance=None,
                                           theoretical_semivariance_model=None)
```

Function calculates regularized point support semivariogram in the form given in:

> Goovaerts P., Kriging and Semivariogram Deconvolution in the Presence of Irregular Geographical Units, Mathematical Geology 40(1), 101-128, 2008

Function has the form:

$$\gamma_{v(h)} = \gamma(v, v_h) - \gamma_{h}(v, v)$$

where:

- $\gamma_{v(h)}$ - regularized semivariogram,
- $\gamma(v, v_h)$ - semivariogram value between any two blocks separated by the distance $h$,
- $\gamma_{h}(v, v)$ - arithmetical average of within-block semivariogram.
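
The subtraction above can be sketched per lag in plain Python. This is only an illustration: the helper name `regularize` and the numeric values are hypothetical and are not part of the library.

```python
def regularize(between_blocks, within_block):
    """Return gamma_v(h) = gamma(v, v_h) - gamma_h(v, v) for every lag."""
    if len(between_blocks) != len(within_block):
        raise ValueError("Both inputs must cover the same lags.")
    return [b - w for b, w in zip(between_blocks, within_block)]


# Hypothetical values for three lags (not taken from any real dataset).
between_blocks = [4.0, 6.5, 8.0]   # gamma(v, v_h)
within_block = [1.5, 2.0, 2.5]     # gamma_h(v, v)

regularized = regularize(between_blocks, within_block)
print(regularized)  # [2.5, 4.5, 5.5]
```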

INPUT:

- **within_block_semivariogram**: mean of the inblock semivariances of block pairs separated by the distance $h$:

$$\gamma_{h}(v, v) = \frac{1}{2N(h)} \sum^{N(h)}_{\alpha=1} [\gamma(v_{\alpha}, v_{\alpha}) + \gamma(v_{\alpha+h}, v_{\alpha+h})]$$

where:

$\gamma(v_{\alpha}, v_{\alpha})$ and $\gamma(v_{\alpha+h}, v_{\alpha+h})$ are the inblock semivariances of block $\alpha$ and block $\alpha+h$, separated by the distance $h$ and weighted by the inblock population.

- **between_blocks_semivariogram**: semivariance between all blocks calculated from the theoretical model,
- **empirical_semivariance**: (```numpy array```) empirical semivariance between area centroids, ```default=None```, if ```None``` is provided then empirical semivariance is computed by the ```_calculate_empirical_semivariance``` method from area centroids,
- **theoretical_semivariance_model**: (```TheoreticalSemivariogram```) theoretical semivariance model from the ```TheoreticalSemivariogram``` class, default is ```None```, if ```None``` is provided then the theoretical model is derived from area centroids and empirical semivariance.
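
A minimal sketch of the within-block average for a single lag, assuming the inblock semivariance of every block is already known and already weighted by population. The function name, the dictionary layout and the block ids are hypothetical.

```python
def within_block_average(inblock, pairs):
    """Average inblock semivariance over block pairs separated by one lag.

    inblock: dict mapping area_id -> inblock semivariance (assumed precomputed)
    pairs: list of (area_id, neighbour_id) tuples found at the lag h
    """
    n_pairs = len(pairs)
    if n_pairs == 0:
        return float("nan")  # no block pairs at this lag
    total = sum(inblock[a] + inblock[b] for a, b in pairs)
    return total / (2 * n_pairs)


# Hypothetical inblock semivariances of three blocks.
inblock = {"A": 1.0, "B": 3.0, "C": 2.0}
pairs_at_h = [("A", "B"), ("B", "C")]

gamma_h = within_block_average(inblock, pairs_at_h)
print(gamma_h)  # 2.25
```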

OUTPUT:

- **semivariance**: (```numpy arrays```) of lag, semivariance values and number of areas within lag where: 

```semivariance[0] = array of lags```;

```semivariance[1] = array of semivariance values```;

```semivariance[2] = array of number of points in each lag```.

---

### ArealSemivariance.show_semivariograms

```python
ArealSemivariance.show_semivariograms(self)
```

Function shows semivariograms calculated by the class: Empirical semivariogram, Theoretical model, Inblock Semivariance, Within-block semivariogram, Between blocks semivariogram, Regularized output.

***

## RegularizedSemivariogram

### Class initialization

```python
pyinterpolate.semivariance.semivariogram_deconvolution.regularize_semivariogram.RegularizedSemivariogram(self)

```

Class performs deconvolution of semivariogram of areal data. Whole procedure is based on the iterative process described in: 
    
> Goovaerts P., Kriging and Semivariogram Deconvolution in the Presence of Irregular Geographical Units, Mathematical Geology 40(1), 101-128, 2008

The class works as follows:

- initialize your object (no parameters),
- then use fit() method to build initial point support model,
- then use transform() method to perform semivariogram regularization.

### Class public methods:

- **fit** - fits areal data and point support data into a model; initializes the experimental semivariogram, theoretical semivariogram model, regularized point support model and deviation.
- **transform** - performs semivariogram regularization, which is an iterative process,
- **export_regularized_model** - Function exports final regularized model parameters into specified csv file.
- **show_baseline_semivariograms** - Function shows experimental semivariogram, initial theoretical semivariogram and initial regularized semivariogram after fit() operation.
- **show_semivariograms** - plots experimental semivariogram of area data, theoretical curve of area data, regularized model values and regularized model theoretical curve.

---

### RegularizedSemivariogram.fit

```python
RegularizedSemivariogram.fit(self, areal_data, areal_lags, areal_step_size, point_support_data,
                             ranges=16, weighted_lags=True, store_models=False)
```

Function fits area and point support data to the initial regularized models.

INPUT:

- **areal_data**: (```numpy array```) areal data prepared with the function ```prepare_areal_shapefile()```, where data is a ```numpy array``` in the form: 

```python
[area_id, area_geometry, centroid x, centroid y, value]
```
- **areal_lags**: (```list``` / ```numpy array```) lags between each distance between areas,
- **areal_step_size**: (```float```) step size between each lag, usually it is a half of distance between lags,
- **point_support_data**: (```numpy array```) point support data prepared with the function ```get_points_within_area()```, where data is a ```numpy array``` in the form:

```python
[
    area_id,
    [point_position_x, point_position_y, value]
]
```
- **ranges**: (```int```) number of ranges to test during semivariogram fitting. More ranges give more accurate _nugget_ and _range_ estimates, at the cost of longer calculations,
- **weighted_lags**: (```bool```) lags weighted by number of points; if ```True``` then during semivariogram fitting error of each model is weighted by number of points for each lag. In practice it means that more reliable data (lags) have larger weights and semivariogram is modeled to better fit to those lags,
- **store_models**: (```bool```) if ```True``` then experimental, regularized and theoretical models are stored in lists after each iteration. It is important for a debugging process.

OUTPUT:

None. The class updates its internal parameters. After fitting, you should usually perform regularization with the ```transform()``` method.

---

### RegularizedSemivariogram.transform

```python
RegularizedSemivariogram.transform(self, max_iters=25,
                                   min_deviation_ratio=0.01,
                                   min_diff_decrease=0.01,
                                   min_diff_decrease_reps=3)
```

Function transforms fitted data and performs the iterative semivariogram regularization procedure.

INPUT:

- **max_iters**: (```int```) maximum number of iterations,
- **min_deviation_ratio**: (```float```) minimum ratio between the current deviation and the initial deviation (D(i) / D(0)), below which the algorithm is stopped,
- **min_diff_decrease**: (```float```) minimum absolute difference between the new and the optimal deviation, divided by the optimal deviation: ABS(D(i) - D(opt)) / D(opt). If it is recorded ```n``` times (controlled by the ```min_diff_decrease_reps``` param) then the algorithm is stopped,
- **min_diff_decrease_reps**: (```int```) number of iterations after which the algorithm is stopped if the ```min_diff_decrease``` condition is fulfilled.
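
The three stopping rules can be illustrated with a small stand-alone function. This is a sketch under assumptions, not the library's code: `should_stop` is a hypothetical helper, D(opt) is taken to be the smallest deviation seen before the current iteration, the repetitions are assumed to be consecutive, and all deviations are assumed to be positive.

```python
def should_stop(deviations, max_iters=25, min_deviation_ratio=0.01,
                min_diff_decrease=0.01, min_diff_decrease_reps=3):
    """Return the name of the rule that stops the loop, or None to continue.

    deviations: [D(0), D(1), ..., D(i)], where D(0) is the initial deviation.
    """
    iteration = len(deviations) - 1
    if iteration >= max_iters:
        return "max_iters"
    if deviations[-1] / deviations[0] < min_deviation_ratio:
        return "min_deviation_ratio"
    # Count the most recent iterations in a row whose relative change
    # against the best earlier deviation stays below the threshold.
    reps = 0
    for i in range(iteration, 0, -1):
        optimal = min(deviations[:i])
        if abs(deviations[i] - optimal) / optimal < min_diff_decrease:
            reps += 1
        else:
            break
    if reps >= min_diff_decrease_reps:
        return "min_diff_decrease"
    return None


print(should_stop([10.0, 5.0, 2.0]))                  # None
print(should_stop([10.0, 0.05]))                      # min_deviation_ratio
print(should_stop([10.0, 4.0, 4.001, 4.002, 4.003]))  # min_diff_decrease
```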

OUTPUT:

None. The class updates its internal parameters. After transforming, you should usually export your theoretical model with the ```export_regularized_model()``` method.

---

### RegularizedSemivariogram.export_regularized_model

```python
RegularizedSemivariogram.export_regularized_model(self, filename)
```

Function exports final regularized model parameters into specified csv file.

INPUT:

- **filename**: (```str```) filename for model parameters (nugget, sill, range, model type).

OUTPUT:

Method saves regularized model into csv file.

---

### RegularizedSemivariogram.show_baseline_semivariograms

```python
RegularizedSemivariogram.show_baseline_semivariograms(self)
```

Function shows experimental semivariogram, initial theoretical semivariogram and initial regularized semivariogram after ```fit()``` operation.

---

### RegularizedSemivariogram.show_semivariograms

```python
RegularizedSemivariogram.show_semivariograms(self)
```

Function shows experimental semivariogram, theoretical semivariogram and regularized semivariogram after semivariogram regularization with ```transform()``` method.

***

## calculate_covariance

```python
pyinterpolate.semivariance.semivariogram_estimation.calculate_covariance.calculate_covariance(
    data, lags, step_size)
```

Function calculates covariance of a given set of points.

Equation for calculation is:

$$covariance = \frac{1}{N} * \sum_{i=1}^{N} [z(x_{i} + h) * z(x_{i})] - u^{2}$$

where:

$N$ - number of observation pairs,

$h$ - distance (lag),

$z(x_{i})$ - value at location $x_{i}$,

$(x_{i} + h)$ - location at a distance $h$ from $x_{i}$,

$u$ - mean of observations at a given lag distance.
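
The covariance equation can be checked by hand with a pure-Python sketch for one lag. Two details are assumptions of this example and not statements about the library: a pair belongs to a lag when `lag - step_size < distance <= lag + step_size`, and $u$ is the mean of all values that take part in the selected pairs. The function name is hypothetical.

```python
import math


def covariance_at_lag(points, lag, step_size):
    """Covariance of point pairs whose distance is close to `lag`.

    points: list of (x, y, value) tuples.
    Returns (covariance, number_of_pairs).
    """
    products = []
    paired_values = []
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            distance = math.hypot(xi - xj, yi - yj)
            # Assumed binning rule for a lag.
            if lag - step_size < distance <= lag + step_size:
                products.append(zi * zj)
                paired_values.extend([zi, zj])
    if not products:
        return float("nan"), 0
    # Assumed definition of u: mean of the values used in the pairs.
    u = sum(paired_values) / len(paired_values)
    return sum(products) / len(products) - u ** 2, len(products)


# Four hypothetical points on a line, one unit apart.
points = [(0, 0, 1.0), (1, 0, 3.0), (2, 0, 5.0), (3, 0, 7.0)]
cov, n_pairs = covariance_at_lag(points, lag=1, step_size=0.5)
print(cov, n_pairs)
```

For these points three pairs fall into the lag, the mean product is 53/3 and $u = 4$, so the covariance is 5/3.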


INPUT:

- **data**: (```numpy array```) coordinates and their values,
- **lags**: (```numpy array```) lags between points,
- **step_size**: (```float```) distance between lags; points within this distance of a lag are included in the calculations.


OUTPUT:

- (```numpy array```) covariance - array of pair of lag and covariance values where:

```covariance[0] = array of lags```;

```covariance[1] = array of covariance values```;

```covariance[2] = array of number of points in each lag```.

***

## calculate_semivariance

```python
pyinterpolate.semivariance.semivariogram_estimation.calculate_semivariance.calculate_semivariance(
    data, lags, step_size)
```

Function calculates semivariance of a given set of points.

Equation for calculation is:

$$semivariance = \frac{1}{2N} * \sum_{i=1}^{N} [z(x_{i} + h) - z(x_{i})]^{2}$$

where:

$N$ - number of observation pairs,

$h$ - distance (lag),

$z(x_{i})$ - value at location $x_{i}$,

$(x_{i} + h)$ - location at a distance $h$ from $x_{i}$.
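
A pure-Python sketch of the semivariance equation, evaluated for several lags. The binning rule (`lag - step_size < distance <= lag + step_size`) and the function name are assumptions made for this example, and each pair of points is counted once.

```python
import math


def semivariance_at_lag(points, lag, step_size):
    """Semivariance of point pairs whose distance is close to `lag`.

    points: list of (x, y, value) tuples.
    Returns (semivariance, number_of_pairs).
    """
    squared_diffs = []
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            distance = math.hypot(xi - xj, yi - yj)
            # Assumed binning rule for a lag.
            if lag - step_size < distance <= lag + step_size:
                squared_diffs.append((zi - zj) ** 2)
    n_pairs = len(squared_diffs)
    if n_pairs == 0:
        return float("nan"), 0
    return sum(squared_diffs) / (2 * n_pairs), n_pairs


# Four hypothetical points on a line, one unit apart.
points = [(0, 0, 1.0), (1, 0, 3.0), (2, 0, 6.0), (3, 0, 7.0)]
lags = [1, 2, 3]
result = [semivariance_at_lag(points, lag, 0.5) for lag in lags]
print(result[1])  # (10.25, 2)
print(result[2])  # (18.0, 1)
```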


INPUT:

- **data**: (```numpy array```) coordinates and their values,
- **lags**: (```numpy array```) lags between points,
- **step_size**: (```float```) distance between lags; points within this distance of a lag are included in the calculations.


OUTPUT:

- (```numpy array```) semivariance - array of pair of lag and semivariance values where:

```semivariance[0] = array of lags```;

```semivariance[1] = array of semivariance values```;

```semivariance[2] = array of number of points in each lag```.

***

## calculate_weighted_semivariance

```python
pyinterpolate.semivariance.semivariogram_estimation.calculate_semivariance.calculate_weighted_semivariance(data, lags, step_size)
```

Function calculates weighted semivariance following _Monestiez et al._:

> A. Monestiez P, Dubroca L, Bonnin E, Durbec JP, Guinet C: Comparison of model based geostatistical methods in ecology: application to fin whale spatial distribution in northwestern Mediterranean Sea. In Geostatistics Banff 2004 Volume 2. Edited by: Leuangthong O, Deutsch CV. Dordrecht, The Netherlands, Kluwer Academic Publishers; 2005:777-786.


> B. Monestiez P, Dubroca L, Bonnin E, Durbec JP, Guinet C: Geostatistical modelling of spatial distribution of Balenoptera physalus in the northwestern Mediterranean Sea from sparse count data and heterogeneous observation efforts. Ecological Modelling 2006 in press.

Equation for calculation is:

$$s(h) = \frac{1}{2*\sum_{a=1}^{N(h)} c_{a}} * \sum_{a=1}^{N(h)} c_{a}*(z(u_{a}) - z(u_{a} + h))^2 - m'$$

where:

$$c_{a} = \frac{n(u_{a}) * n(u_{a} + h)}{n(u_{a}) + n(u_{a} + h)}$$

where:

$s(h)$ - semivariogram of the risk,

$n(u_{a})$ - size of the population at risk in the unit a,

$z(u_{a})$ - mortality rate at the unit a,

$u_{a} + h$ - area at the distance (h) from the analyzed area,

$m'$ - population weighted mean of rates.
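
The weighted equation can be sketched for one lag as follows. The sketch follows the formula as written above, with $m'$ subtracted once from the weighted term. It additionally assumes that $m'$ is computed over all units in `data` and that a pair belongs to a lag when `lag - step_size < distance <= lag + step_size`. The function name is hypothetical.

```python
import math


def weighted_semivariance_at_lag(data, lag, step_size):
    """Population-weighted semivariance for one lag.

    data: list of (x, y, rate, population) tuples.
    Returns (semivariance, number_of_pairs).
    """
    # Population-weighted mean of rates (assumed to use all units).
    total_population = sum(n for _, _, _, n in data)
    m_prime = sum(z * n for _, _, z, n in data) / total_population

    weight_sum = 0.0
    weighted_sq_diff = 0.0
    n_pairs = 0
    for i, (xi, yi, zi, ni) in enumerate(data):
        for xj, yj, zj, nj in data[i + 1:]:
            distance = math.hypot(xi - xj, yi - yj)
            if lag - step_size < distance <= lag + step_size:
                c = ni * nj / (ni + nj)  # weight c_a of the pair
                weight_sum += c
                weighted_sq_diff += c * (zi - zj) ** 2
                n_pairs += 1
    if n_pairs == 0:
        return float("nan"), 0
    return weighted_sq_diff / (2 * weight_sum) - m_prime, n_pairs


# Three hypothetical units: x, y, rate, population at risk.
data = [(0, 0, 2.0, 10), (1, 0, 4.0, 30), (2, 0, 8.0, 10)]
s_h, n_pairs = weighted_semivariance_at_lag(data, lag=1, step_size=0.5)
print(s_h, n_pairs)
```

Here both pairs have the weight 7.5, the weighted term equals 150 / 30 = 5.0 and $m' = 4.4$, so the result is close to 0.6.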

INPUT:

- **data**: (```numpy array```) coordinates and their values and weights:

```python
[coordinate x, coordinate y, value, weight]
```
- **lags**: (```numpy array```) lags between points,
- **step_size**: (```float```) distance between lags; points within this distance of a lag are included in the calculations.


OUTPUT:

- (```numpy array```) semivariance - array of pair of lag and semivariance values where:

```semivariance[0] = array of lags```;

```semivariance[1] = array of semivariance values```;

```semivariance[2] = array of number of points in each lag```.

***

## TheoreticalSemivariogram

### Class initialization

```python
pyinterpolate.semivariance.semivariogram_fit.fit_semivariance.TheoreticalSemivariogram(
    self, points_array=None, empirical_semivariance=None, verbose=False)

```

Class calculates theoretical semivariogram.

Available theoretical models:

    - spherical_model(distance, nugget, sill, semivar_range)
    - gaussian_model(distance, nugget, sill, semivar_range)
    - exponential_model(distance, nugget, sill, semivar_range)
    - linear_model(distance, nugget, sill, semivar_range)
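
For orientation, the four model families can be written as scalar functions in their common textbook forms. This is a sketch and not the library's code: the exact parameterization (here `sill` is added on top of `nugget`, and the range scales the distance directly) is an assumption and may differ from the implementation.

```python
import math


def spherical_model(distance, nugget, sill, semivar_range):
    # Flat beyond the range, cubic polynomial below it.
    if distance >= semivar_range:
        return nugget + sill
    ratio = distance / semivar_range
    return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)


def exponential_model(distance, nugget, sill, semivar_range):
    return nugget + sill * (1 - math.exp(-distance / semivar_range))


def gaussian_model(distance, nugget, sill, semivar_range):
    return nugget + sill * (1 - math.exp(-(distance / semivar_range) ** 2))


def linear_model(distance, nugget, sill, semivar_range):
    # Grows linearly up to the range, then stays constant.
    return nugget + sill * min(distance / semivar_range, 1.0)


# Hypothetical parameters: nugget=1, sill=4, range=10.
print(spherical_model(5, 1, 4, 10))  # 3.75
print(linear_model(5, 1, 4, 10))     # 3.0
```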


INITIALIZATION PARAMS:

- **points_array**: (```numpy array```) analysed points where the last column is representing values, typically x, y, value, 
- **empirical_semivariance**: (```numpy array```) semivariance where first row of array represents lags and the second row represents semivariance's values for given lag.

### Class public methods:

- **fit_semivariance**: Method fits experimental points into chosen semivariance model type,
- **find_optimal_model**: Method fits experimental points into all available models and chooses the one with the lowest error,
- **export_model**: Function exports semivariance model to the csv file,
- **import_model**: Function imports semivariance model and updates its parameters,
- **show_experimental_semivariogram**:  Function shows experimental semivariogram of a given model,
- **show_semivariogram**: Function shows experimental and theoretical semivariogram in one plot.

---

### TheoreticalSemivariogram.fit_semivariance

```python
TheoreticalSemivariogram.fit_semivariance(self, model_type, number_of_ranges=16)
```

Method fits experimental points into chosen semivariance model type.

INPUT:

- **model_type**: (```str```) 'exponential', 'gaussian', 'linear', 'spherical',
- **number_of_ranges**: (```int```) default = 16. Used to create an array of equidistant ranges between minimal range of empirical semivariance and maximum range of empirical semivariance.
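
The array of candidate ranges could be built as in the sketch below. The helper name `candidate_ranges` is hypothetical, and including both end points in the array is an assumption of this example.

```python
def candidate_ranges(lags, number_of_ranges=16):
    """Equidistant ranges between the smallest and the largest lag."""
    low, high = min(lags), max(lags)
    if number_of_ranges < 2:
        return [low]
    step = (high - low) / (number_of_ranges - 1)
    return [low + i * step for i in range(number_of_ranges)]


# Hypothetical lags of an empirical semivariogram.
lags = [1.0, 2.0, 3.0, 4.0, 5.0]
ranges = candidate_ranges(lags, number_of_ranges=5)
print(ranges)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```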

OUTPUT:

- (model_type, model parameters)

---

### TheoreticalSemivariogram.find_optimal_model

```python
TheoreticalSemivariogram.find_optimal_model(self, weighted=False, number_of_ranges=16)
```

Method fits experimental points into all available models and chooses the one with the lowest error.

INPUT:

- **weighted**: (```bool```) default=```False```. If ```True``` then each lag is weighted by:

$$\frac{\sqrt{N(h)}}{\gamma_{experimental}(h)}$$

where:

$N(h)$ - number of point pairs in a given range,

$\gamma_{experimental}(h)$ - value of experimental semivariogram for $h$.
- **number_of_ranges**: (```int```) default=16. Used to create an array of equidistant ranges between minimal range of empirical semivariance and maximum range of empirical semivariance.
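
The lag weights from the formula above can be computed as shown below. How the weights enter the model error is not specified here, so the weighted mean absolute error in this sketch is only an assumed example. Both function names are hypothetical.

```python
import math


def lag_weights(pair_counts, experimental):
    """Weight of each lag: sqrt(N(h)) / gamma_experimental(h)."""
    return [math.sqrt(n) / g for n, g in zip(pair_counts, experimental)]


def weighted_error(experimental, modelled, weights):
    """Weighted mean absolute error (an assumed error measure)."""
    total = sum(w * abs(e - m)
                for e, m, w in zip(experimental, modelled, weights))
    return total / sum(weights)


# Hypothetical values for three lags.
pair_counts = [16, 25, 4]
experimental = [2.0, 5.0, 4.0]
modelled = [2.5, 4.0, 6.0]

weights = lag_weights(pair_counts, experimental)
error = weighted_error(experimental, modelled, weights)
print(weights)  # [2.0, 1.0, 0.5]
```

Lags with many point pairs and a low experimental semivariance get the largest weights, so a candidate model is penalized most for missing those lags.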

OUTPUT:

- model_type

Function updates class parameters with model properties.

---

### TheoreticalSemivariogram.export_model

```python
TheoreticalSemivariogram.export_model(self, filename)
```

Function exports semivariance model to the csv file. Columns of csv file are: name, nugget, sill, range.
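
A file with these columns can be written and read back with the standard `csv` module. The sketch below is not the library's implementation: the helper names are hypothetical, and the presence of a header row is an assumption.

```python
import csv
import io

COLUMNS = ["name", "nugget", "sill", "range"]


def write_model(stream, name, nugget, sill, semivar_range):
    """Write one model as a csv row with a header (assumed layout)."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerow({"name": name, "nugget": nugget,
                     "sill": sill, "range": semivar_range})


def read_model(stream):
    """Read the first model row and convert numeric columns to float."""
    row = next(csv.DictReader(stream))
    return {"name": row["name"], "nugget": float(row["nugget"]),
            "sill": float(row["sill"]), "range": float(row["range"])}


# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_model(buffer, "spherical", 0.5, 4.0, 12.0)
buffer.seek(0)
model = read_model(buffer)
print(model)
```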

---

### TheoreticalSemivariogram.import_model

```python
TheoreticalSemivariogram.import_model(self, filename)
```

Function imports semivariance model and updates its parameters (model name, nugget, sill, range).

---

### TheoreticalSemivariogram.show_experimental_semivariogram

```python
TheoreticalSemivariogram.show_experimental_semivariogram(self)
```

Function shows experimental semivariogram of a given model.

---

### TheoreticalSemivariogram.show_semivariogram

```python
TheoreticalSemivariogram.show_semivariogram(self)
```

Function shows experimental and theoretical semivariogram in one plot.