*The code was developed on Windows 10 and has not been tested on other platforms. Anaconda 5.2.0 was used as the Python distribution.*

This notebook introduces the concept of the variogram and the methodology for generating one.

# Basics of Variograms

> A **variogram** is a measure of dissimilarity as a function of distance. It shows how the values at two data points are correlated spatially, and it provides useful insight when estimating the value at an unsampled location from samples collected at other locations.
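
As a rough illustration of the definition above, one common way to quantify dissimilarity over distance is the classical (Matheron) empirical semivariogram: half the average squared difference between all pairs of samples that are roughly the same distance apart. The sketch below is a minimal version under stated assumptions. The function name `empirical_semivariogram`, its arguments, and the toy data are hypothetical and not part of this notebook, and the estimator used later in the notebook may differ.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Hypothetical helper: classical (Matheron) semivariogram estimate per lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Indices of every unordered pair (i < j) of sample points.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)  # separation distance
    sq_diff = (values[i] - values[j]) ** 2                # squared value difference

    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)      # NaN marks bins that contain no pairs
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts

# Toy usage: three samples on a line, 1 m apart, with values rising by 1 per metre.
coords = [[0, 0], [1, 0], [2, 0]]
values = [0.0, 1.0, 2.0]
gamma, counts = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5])
print(gamma, counts)
```

In this toy case the two pairs that are 1 m apart give a semivariance of 0.5, and the single pair that is 2 m apart gives 2.0, so dissimilarity grows with distance.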

Let's say that you are a spatial data analyst at a gold mining company and want to know the distribution of gold percentage over a 100 m x 100 m mining area. To understand the characteristics of the rock formations, you take 100 random rock samples from the mining area, but these 100 data points are not enough to determine the gold percentage at every location in the area. So you analyze the available data (100 rock samples from random locations) and simulate a full 2D surface plot of gold percentage over the mining area.

This 2D surface simulation from sparse spatial data is a multi-step process that involves many complicated statistical techniques.
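
The sampling scenario described above can be sketched in a few lines. This is only an illustration under stated assumptions: `truth` is a hypothetical stand-in filled with uncorrelated random numbers rather than the notebook's actual data, each grid cell is assumed to be 1 m x 1 m, and the names `rng`, `truth`, and `samples` are made up for this example.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility

# Hypothetical stand-in for the true gold percentage on a 100 x 100 grid.
truth = rng.normal(loc=2.5, scale=0.5, size=(100, 100))

# Pick 100 distinct grid cells at random (sampling without replacement).
flat_idx = rng.choice(truth.size, size=100, replace=False)
rows, cols = np.unravel_index(flat_idx, truth.shape)

# Each sample is (x, y, value); the other 9,900 cells are unknown to the analyst.
samples = np.column_stack([cols, rows, truth[rows, cols]])
print(samples.shape)
```

Only 1% of the grid cells are observed here, which is why the remaining values have to be estimated rather than read off directly.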

In [120]:
import random
import scipy
import matplotlib.pyplot as plt
from matplotlib import cm
import pandas as pd
import numpy as np
%matplotlib notebook

In [128]:
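# Load the 2D grid from the 'truth_1' sheet; the sheet has no header row.
data = pd.read_excel('sample_data/2D_Data.xlsx', sheet_name='truth_1', header=None)
# Shift and scale every value linearly (applied element-wise to each column).
data = data.apply(lambda x: (x + 3.47) * 0.75)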

In [129]:
# Show the grid as a heat map with a labeled color bar.
fig, ax = plt.subplots()
im = ax.imshow(data, cmap=cm.Wistia)
cb = fig.colorbar(im)
#cb.set_ticks([0, 5])
cb.set_label('percentage (%)')
ax.set_xlabel('x (m)')
ax.set_ylabel('y (m)')
ax.set_title('Gold Percentage over a Mining Area')

*[Figure: heat map titled "Gold Percentage over a Mining Area", with axes x (m) and y (m) and a color bar labeled percentage (%).]*