<a href="https://colab.research.google.com/github/guiattard/PythonForGeosciences/blob/master/geostatistics-applied-to-hydrogeology-with-scikit-gstat/geostatistics-applied-to-hydrogeology-with-scikit-gstat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@title Copyright 2020 Guillaume Attard { display-mode: "form" }
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Geostatistics applied to hydrogeology with Scikit-GStat

by [Guillaume Attard](https://guillaumeattard.com/) - pythonforgeosciences.com

last update 17-12-2020

Notebook status : *Under construction*

## Introduction

### Context

Spatially continuous data play a significant role in planning, risk assessment and decision making in environmental management [(Li et al. 2011)](https://doi.org/10.1016/j.envsoft.2011.07.004). However, these data are not always available and are often difficult or expensive to acquire. Environmental data such as groundwater temperature, hydraulic head or substance concentrations in soil are usually collected by point sampling. Geoscientists therefore often need spatial interpolation methods to obtain spatially continuous data over a region of interest, and this is where geostatistics comes in.

### What is geostatistics?

Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture ([Wikipedia definition](https://en.wikipedia.org/wiki/Geostatistics)).

The principle of geostatistics is well summarized on the [documentation webpage]():
>The basic idea of geostatistics is to describe and estimate spatial correlations in a set of point data. The typical application of geostatistics is an interpolation. Therefore, although using point data, a basic concept is to understand these point data as a sample of a (spatially) continuous variable that can be described as a random field, or to be more precise, a Gaussian random field in many cases. The most fundamental assumption in geostatistics is that any two values $x_i$ and $x_{i+h}$ are more similar, the smaller $h$ is, which is a separating distance on the random field. In other words: close observation points will show higher covariances than distant points. In case this most fundamental conceptual assumption does not hold for a specific variable, geostatistics will not be the correct tool to analyse and interpolate this variable.
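
The assumption quoted above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a synthetic, smoothly varying field rather than real measurements. The helper `empirical_semivariance` is a hypothetical name introduced here for illustration; it is not part of Scikit-GStat. It computes half the mean squared difference between pairs of values, grouped by separation distance.

```python
import numpy as np

def empirical_semivariance(coords, values, bin_edges):
    """Half the mean squared difference of value pairs, per distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)  # NaN marks an empty bin
    for k in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        if in_bin.any():
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma

# Synthetic field sampled at random locations (illustrative values only).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(200, 2))
values = 0.05 * coords[:, 0] + np.sin(coords[:, 1] / 20.0)

bin_edges = np.linspace(0.0, 50.0, 6)  # five distance classes of width 10
gamma = empirical_semivariance(coords, values, bin_edges)
print(gamma)
```

For a spatially correlated variable like this one, the semivariance in the shortest distance class should be smaller than in the longest one. If the values showed no such pattern, the assumption would not hold for that variable.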

### What we do here

The aim of this notebook is to build a piezometric map using the Python Scikit-GStat library and to compare the results with other standard interpolation techniques. After some setup, we will download a dataset of points giving the hydraulic head of a groundwater body located in the area of Lyon (France), and we will clean this dataset to eliminate all points outside our groundwater body of interest.

In a second part, we will apply two standard interpolation methods (i.e. linear and cubic) given by the *griddata* function (from *scipy.interpolate*) to map the hydraulic head across our area of interest. The limits of both interpolation methods will be discussed.
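
As a rough preview of that step, here is a minimal sketch of how *griddata* can be called with both methods. It assumes NumPy and SciPy are installed. The observation points and head values are synthetic stand-ins, not the Lyon dataset, and the grid extent is an arbitrary choice made for illustration.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic observation points standing in for piezometers (hypothetical data).
rng = np.random.default_rng(42)
points = rng.uniform(0.0, 1000.0, size=(50, 2))  # x, y in metres
head = 170.0 - 0.004 * points[:, 0] + 0.5 * np.sin(points[:, 1] / 200.0)

# Regular grid covering the bounding square of the synthetic study area.
grid_x, grid_y = np.meshgrid(
    np.linspace(0.0, 1000.0, 101),
    np.linspace(0.0, 1000.0, 101),
)

head_linear = griddata(points, head, (grid_x, grid_y), method="linear")
head_cubic = griddata(points, head, (grid_x, grid_y), method="cubic")

# Grid nodes outside the convex hull of the observation points are NaN by default.
print(np.isnan(head_linear).sum(), np.isnan(head_cubic).sum())
```

One limit is already visible in this sketch: neither method extrapolates, so grid nodes outside the convex hull of the observation points are left undefined. Neither method provides an estimate of the interpolation error either.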

Finally, we will explore the ordinary kriging method given by the Scikit-GStat library:
- We will first build a semivariogram, exploring different parameters to better understand the relationship between the variability of measurements and the distance between them.
- Secondly, we will build an ordinary kriging model to interpolate the hydraulic head across our area of interest. 
- We will finally see how to plot the error estimation across our area of interest.
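
To make the three steps above more concrete, the following is a from-scratch sketch of ordinary kriging at a single target location, written with NumPy only. It is not Scikit-GStat's implementation, which provides its own `Variogram` and `OrdinaryKriging` classes. The function names, the exponential variogram model, its sill and range, and the five observation points are all assumptions chosen for illustration.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, effective_range=300.0):
    """Exponential model without nugget (assumed parameters, not fitted)."""
    h = np.asarray(h, dtype=float)
    return sill * (1.0 - np.exp(-3.0 * h / effective_range))

def ordinary_kriging(coords, values, target, variogram):
    """Return (estimate, kriging variance) at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Distances between observations, and from each observation to the target.
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d_tgt = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    # Kriging system: semivariances bordered by the unbiasedness constraint
    # (weights sum to 1), with a Lagrange multiplier mu as last unknown.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_obs)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(d_tgt)
    solution = np.linalg.solve(A, b)
    weights, mu = solution[:n], solution[n]
    estimate = weights @ values
    variance = weights @ b[:n] + mu  # kriging (error) variance
    return estimate, variance

# Hypothetical observations: x, y in metres and hydraulic head in metres.
coords = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0],
                   [100.0, 100.0], [50.0, 40.0]])
head = np.array([170.2, 169.8, 170.9, 170.4, 170.3])

estimate, variance = ordinary_kriging(coords, head, (60.0, 60.0),
                                      exponential_variogram)
print(estimate, variance)
```

Unlike the *griddata* methods, kriging returns a variance alongside each estimate, and this is the quantity behind the error map mentioned in the last step. With a model without nugget, the variance drops to zero at the observation points and grows with distance from them.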

Please note the reference for this library and its full documentation:
-  *Mirko Mälicke, & Helge David Schneider. (2019, November 7). Scikit-GStat 0.2.6: A scipy flavoured geostatistical analysis toolbox written in Python. (Version v0.2.6). Zenodo. http://doi.org/10.5281/zenodo.3531816*,
- the full documentation of this Python library can be downloaded [here](https://mmaelicke.github.io/scikit-gstat/SciKitGStat.pdf).

Finally, please note that kriging is named after Danie Krige, who published the method in 1951:
- *Krige, D. G. (1951). A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Southern African Institute of Mining and Metallurgy, 52(6), 119-139.*

