<p align="center">
    <img src="https://github.com/GeostatsGuy/GeostatsPy/blob/master/TCG_color_logo.png?raw=true" width="220" height="240" />

</p>

## Data Analytics 

### Monte Carlo Simulation in Python 


#### Michael Pyrcz, Associate Professor, The University of Texas at Austin 

##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)



Here's a simple demonstration of Monte Carlo simulation for subsurface uncertainty modeling workflows. This should help you get started with building subsurface models that integrate multiple sources of uncertainty.

#### Monte Carlo Simulation

Definition: random sampling from a distribution

Procedure: 

1. Model the representative distribution (CDF)
2. Draw a random value from a uniform [0,1] distribution (p-value)
3. Apply the inverse of the CDF to calculate the associated realization

In practice, Monte Carlo simulation refers to the workflow in which multiple realizations are drawn to build an uncertainty model.

\begin{equation}
X^\ell = F_x^{-1}(p^\ell),  \, \forall \, \ell = 1,\ldots, L
\end{equation}

where $X^\ell$ is the $\ell$-th realization of the variable $X$, calculated by applying the inverse of its CDF, $F_x^{-1}$, to the cumulative probability (p-value), $p^\ell$.
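The three-step procedure can be sketched in a few lines. This is a minimal illustration, not part of the original workflow: the Gaussian distribution, its parameters and the number of realizations are assumed values, and the inverse CDF is taken from `scipy.stats` (where it is called the percent point function, `ppf`).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(73073)           # seeded generator for reproducibility
L = 1000                                     # number of realizations (assumed value)

# Step 1: model the distribution; a Gaussian with mean 20 and st.dev. 5 is an assumed example
dist = stats.norm(loc=20.0, scale=5.0)

# Step 2: draw p-values from a uniform [0,1] distribution
p = rng.uniform(0.0, 1.0, size=L)

# Step 3: apply the inverse of the CDF to calculate the realizations
x = dist.ppf(p)

print(x.mean(), x.std())                     # should be close to the assumed mean and st.dev.
```

Applying the CDF to the realizations, `dist.cdf(x)`, returns the original p-values, which is a quick check that the inverse was applied correctly.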

#### Monte Carlo Simulation for Uncertainty Modeling

It would be trivial to apply Monte Carlo simulation to a single random variable; after many realizations one would simply get back the original distribution. For uncertainty modeling with multiple variables, the general approach is to:

1. Model the distributions for all input variables of interest, $F_{x_1},\ldots,F_{x_m}$.
2. For each realization, $\ell$, draw p-values, $p^\ell_{1},\ldots,p^\ell_{m}$.
3. Apply the inverse of each distribution to calculate a realization of each variable, $X^\ell_j = F_{x_j}^{-1}(p^\ell_j),  \, \forall \, j = 1,\ldots,m$.
4. Apply each set of variable realizations, for realization $\ell$, to the transfer function to calculate the output realization, $Y^\ell = f(X_1^\ell,\ldots,X_m^\ell)$.
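The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the three input distributions, their parameters, and the additive transfer function (named `transfer` here for illustration) are assumed examples, and the variables are drawn independently.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(73073)           # seeded generator for reproducibility
L = 10000                                    # number of realizations (assumed value)

# Step 1: model the input distributions (assumed examples, all spanning about 10 to 30)
dists = [
    stats.uniform(loc=10.0, scale=20.0),         # uniform on [10, 30]
    stats.triang(c=0.5, loc=10.0, scale=20.0),   # symmetric triangular on [10, 30]
    stats.norm(loc=20.0, scale=20.0 / 6.0),      # Gaussian with mean 20
]

# Step 2: draw independent p-values, one column per variable
p = rng.uniform(size=(L, len(dists)))

# Step 3: apply the inverse CDF of each variable to its p-values
x = np.column_stack([d.ppf(p[:, j]) for j, d in enumerate(dists)])

# Step 4: apply the transfer function to each set of realizations
def transfer(x):
    """Assumed transfer function: the sum of the input variables."""
    return x.sum(axis=1)

y = transfer(x)
print(np.percentile(y, [10, 50, 90]))        # P10, P50 and P90 of the output
```

Any function of the inputs could replace `transfer`; the sampling steps stay the same.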

Monte Carlo Simulation (MCS) is extremely powerful

* Possible to easily simulate uncertainty models for complicated systems 
* Simulations are conducted by drawing values at random from specified uncertainty distributions for each variable
* A single realization of each variable, $X_1^\ell, X_2^\ell,\ldots,X_m^\ell$ is applied to the transfer function to calculate the realization of the variable of interest (output, decision criteria):

\begin{equation}
Y^\ell = f(X_1^\ell,\ldots,X_m^\ell), \, \forall \, \ell = 1,\ldots, L
\end{equation}

* The MCS method builds empirical uncertainty models by random sampling

How many realizations, $L$?

The answer is enough! If the MCS computational cost is low then **many** is the right answer. If too few realizations are calculated, then the summary statistics and the entire CDF of the output (the decision criteria) may be inaccurate because of statistical fluctuations from the small sample (see the 'Law of Small Numbers').
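One way to judge whether $L$ is large enough is to repeat the entire simulation several times and see how much a summary statistic fluctuates. The sketch below does this for the mean of a sum of three Gaussian inputs; the inputs, the list of $L$ values, the 200 repeats and the helper name `simulate_mean` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(73073)           # seeded generator for reproducibility

def simulate_mean(L, rng):
    """Mean of y = X1 + X2 + X3 over L realizations, each X Gaussian (assumed inputs)."""
    x = rng.normal(loc=20.0, scale=20.0 / 6.0, size=(L, 3))
    return x.sum(axis=1).mean()

# repeat the whole MCS 200 times for each L and measure the fluctuation of the estimated mean
spread = {}
for L in [10, 100, 1000, 10000]:
    means = np.array([simulate_mean(L, rng) for _ in range(200)])
    spread[L] = means.std()
    print(L, round(spread[L], 3))
```

The fluctuation of the estimated mean shrinks roughly in proportion to $1/\sqrt{L}$, so each additional digit of precision costs about 100 times more realizations.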

The MCS method is very powerful. You can simulate output distributions that could not be calculated analytically.  

#### Limitations

The MCS method above assumes:
1. **representativity** - the distribution is representative
2. **independence** - the variables are independent of each other
3. **stationarity** - all realizations for each variable are from the same distribution 
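To see why the independence assumption matters, the sketch below compares the sum of two Gaussian variables drawn independently with the sum of the same variables drawn with positive correlation. The means, standard deviation, correlation of 0.8 and the helper name `simulate_sum` are assumed values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(73073)           # seeded generator for reproducibility
L = 10000                                    # number of realizations (assumed value)
mean = np.array([20.0, 20.0])                # assumed means of X1 and X2
sd = 20.0 / 6.0                              # assumed st.dev. of X1 and X2

def simulate_sum(rho):
    """Realizations of y = X1 + X2 for bivariate Gaussian inputs with correlation rho."""
    cov = sd**2 * np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(mean, cov, size=L)
    return x.sum(axis=1)

y_indep = simulate_sum(0.0)                  # independence holds
y_corr = simulate_sum(0.8)                   # independence is violated
print(y_indep.std(), y_corr.std())           # the correlated case has the wider output distribution
```

If the real variables are positively correlated, a simulation that assumes independence will understate the uncertainty in their sum.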
  
#### Getting Started

Here are the steps to get set up in Python with the GeostatsPy package:

1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). 
2. From Anaconda Navigator (within Anaconda3 group), go to the environment tab, click on base (root) green arrow and open a terminal. 
3. In the terminal type: pip install geostatspy. 
4. Open Jupyter and, in the top block, copy and paste the code block below from this Jupyter Notebook to start using the geostatspy functionality.

There are examples below with these functions. You can go here to see a list of the available functions, https://git.io/fh4eX, other example workflows and source code. 

In [1]:
import geostatspy.GSLIB as GSLIB          # GSLIB utilities, visualization and wrapper
import geostatspy.geostats as geostats    # GSLIB methods converted to Python

We will also need some standard packages. These should have been installed with Anaconda 3.

In [2]:
import numpy as np                        # ndarrays for gridded data
import pandas as pd                       # DataFrames for tabular data
import os                                 # set working directory, run executables
import matplotlib.pyplot as plt           # for plotting
from scipy import stats                   # summary statistics
import math                               # trig etc.
import random
from ipywidgets import interactive        # widgets and interactivity
from ipywidgets import widgets                            
from ipywidgets import Layout
from ipywidgets import Label
from ipywidgets import VBox, HBox

#### Build the Interactive Dashboard

The code below builds the widgets (number of realizations, operator, and the distribution type, min and max for each variable), draws the realizations and plots the histograms of the inputs and the response.

In [3]:
# interactive calculation of the random sample set (control of source parametric distribution and number of samples)
l = widgets.Text(value='                                      Monte Carlo Simulation Demonstration, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))

operator = widgets.RadioButtons(options=['add', 'mult'],description='Operator:',disabled=False,layout=Layout(width='230px', height='50px'))

L = widgets.IntSlider(min=1, max = 10000, value = 50, description = '$L$:',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
L.style.handle_color = 'gray'

uiL = widgets.VBox([L,operator])

dist1 = widgets.Dropdown(
    options=['Uniform','Triangular','Gaussian'],
    value='Gaussian',
    description='$X_1$:',
    disabled=False,
    layout=Layout(width='200px', height='30px')
)
min1 = widgets.FloatSlider(min=0.0, max = 100.0, value = 10.0, description = 'Min',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
min1.style.handle_color = 'blue'
max1 = widgets.FloatSlider(min=0.0, max = 100.0, value = 30.0, description = 'Max',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
max1.style.handle_color = 'blue'
ui1 = widgets.VBox([dist1,min1,max1],kwargs = {'justify_content':'center'}) 

dist2 = widgets.Dropdown(
    options=['Triangular', 'Uniform', 'Gaussian'],
    value='Gaussian',
    description='$X_2$:',
    disabled=False,
    layout=Layout(width='200px', height='30px')
)
min2 = widgets.FloatSlider(min=0.0, max = 100.0, value = 10.0, description = 'Min',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
min2.style.handle_color = 'red'
max2 = widgets.FloatSlider(min=0.0, max = 100.0, value = 30.0, description = 'Max',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
max2.style.handle_color = 'red'
ui2 = widgets.VBox([dist2,min2,max2],kwargs = {'justify_content':'center'})

dist3 = widgets.Dropdown(
    options=['Triangular', 'Uniform', 'Gaussian'],
    value='Gaussian',
    description='$X_3$:',
    disabled=False,
    layout=Layout(width='200px', height='30px')
)
min3 = widgets.FloatSlider(min=0.0, max = 100.0, value = 10.0, description = 'Min',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
min3.style.handle_color = 'yellow'
max3 = widgets.FloatSlider(min=0.0, max = 100.0, value = 30.0, description = 'Max',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)
max3.style.handle_color = 'yellow'
ui3 = widgets.VBox([dist3,min3,max3],kwargs = {'justify_content':'center'})

ui = widgets.HBox([uiL,ui1,ui2,ui3])
ui2 = widgets.VBox([l,ui],)

def make_dist(dist,zmin,zmax,L):
    if dist == 'Triangular':
        z = np.random.triangular(left=zmin, mode=(zmax+zmin)*0.5, right=zmax, size=L)
        pdf = stats.triang.pdf(np.linspace(0.0,100.0,1000), loc = zmin, c = 0.5, scale = zmax-zmin)* 2 * L 
    if dist == 'Uniform':
        z = np.random.uniform(low=zmin, high=zmax, size=L)
        pdf = stats.uniform.pdf(np.linspace(0.0,100.0,1000), loc = zmin, scale = zmax-zmin) * 2 * L
    if dist == 'Gaussian':
        mean = (zmax + zmin)*0.5; sd = (zmax - zmin)/6.0
        z = np.random.normal(loc = mean, scale = sd, size=L)
        pdf = stats.norm.pdf(np.linspace(0.0,100.0,1000), loc = mean, scale = sd) * 2 * L
    return z, pdf
        
def f_make(L,operator,dist1,min1,max1,dist2,min2,max2,dist3,min3,max3): 
    np.random.seed(seed = 73073)
    x1, pdf1 = make_dist(dist1,min1,max1,L)
    x2, pdf2 = make_dist(dist2,min2,max2,L)
    x3, pdf3 = make_dist(dist3,min3,max3,L)

    xvals = np.linspace(0.0,100.0,1000)
    plt.subplot(241)
    plt.hist(x1,density = False,bins=np.linspace(0,100,50),weights=None,color='blue',alpha=0.7,edgecolor='grey')
    plt.plot(xvals,pdf1,'--',color='black',linewidth = 3)
    plt.xlim(0,100); plt.xlabel("$X_1$"); plt.title("First Predictor Feature, $X_1$"); plt.ylabel('Frequency')
 
    plt.subplot(242)
    plt.hist(x2,density = False,bins=np.linspace(0,100,50),weights=None,color='red',alpha=0.7,edgecolor='grey')
    plt.plot(xvals,pdf2,'--',color='black',linewidth = 3)
    plt.xlim(0,100); plt.xlabel("$X_2$"); plt.title("Second Predictor Feature, $X_2$"); plt.ylabel('Frequency')
 
    plt.subplot(243)
    plt.hist(x3,density = False,bins=np.linspace(0,100,50),weights=None,color='yellow',alpha=0.7,edgecolor='grey')
    plt.plot(xvals,pdf3,'--',color='black',linewidth = 3)
    plt.xlim(0,100); plt.xlabel("$X_3$"); plt.title("Third Predictor Feature, $X_3$"); plt.ylabel('Frequency')
 
    y = np.zeros(L)
    ymin = 0.0
    if operator == "add":
        y = x1 + x2 + x3
        ytitle = "$y = X_1 + X_2 + X_3$"                # title matches the selected operator
    elif operator == "mult":
        y = x1 * x2 * x3
        ytitle = r"$y = X_1 \times X_2 \times X_3$"
        
    ymax = max(round((np.max(y)+50)/100)*100,100) # round up to nearest hundreds to avoid the chart jumping around
    
    plt.subplot(244)
    plt.hist(y,density = False,bins=np.linspace(ymin,ymax,50),weights=None,color='black',alpha=0.5,edgecolor='black')
    plt.xlabel("$Y$"); plt.title("Response Feature, " + ytitle); plt.ylabel('Frequency')
    plt.xlim(ymin,ymax)
    
    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.2, wspace=0.3, hspace=0.2)
    plt.show()    

interactive_plot = widgets.interactive_output(f_make, {'L':L,'operator':operator,'dist1':dist1,'min1':min1,'max1':max1,'dist2':dist2,'min2':min2,'max2':max2,'dist3':dist3,'min3':min3,'max3':max3})
interactive_plot.clear_output(wait = True)                # reduce flickering by delaying plot updating    

### Monte Carlo Simulation Demonstration

* specify the distributions for 3 Random Variables, $X_1$, $X_2$, and $X_3$ and select the operator $y = f(X_1,X_2,X_3)$

* observe the resulting Monte Carlo simulation realizations in the histograms of $X_1^{\ell}$, $X_2^{\ell}$, $X_3^{\ell}$, and $y^{\ell}$


### The Inputs

* **$L$**: number of realizations, **Operator**: addition for $y = X_1 + X_2 + X_3$, multiplication for $y = X_1 \times X_2 \times X_3$

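* **$X_1$, $X_2$, and $X_3$**: distribution type, min and max. The mode (triangular) or mean (Gaussian) is assumed to be centered between min and max, and for the Gaussian distribution min and max are placed 3 standard deviations below and above the mean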
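The Gaussian parameterization from min and max can be checked with a short calculation. This is a small sketch with assumed min and max values of 10 and 30; the names `zmin` and `zmax` follow the dashboard code.

```python
from scipy import stats

zmin, zmax = 10.0, 30.0                      # assumed min and max for illustration
mean = 0.5 * (zmin + zmax)                   # mean centered between min and max
sd = (zmax - zmin) / 6.0                     # min and max at 3 st.dev. from the mean

# probability that a Gaussian realization falls between min and max
inside = stats.norm.cdf(zmax, loc=mean, scale=sd) - stats.norm.cdf(zmin, loc=mean, scale=sd)
print(mean, sd, round(inside, 4))
```

Since the Gaussian distribution is not truncated, about 0.3% of the realizations fall outside the min and max interval, unlike the uniform and triangular cases.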

In [4]:
display(ui2, interactive_plot)                            # display the interactive plot


#### Comments

This was a basic demonstration of Monte Carlo simulation for uncertainty analysis. A lot more could be done, for example, more complicated transfer functions and a combination of non-parametric and parametric distributions. Also, one could integrate relationships between the variables (we assumed independence here).

I have other demonstrations on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations, trend modeling, multivariate analysis and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. 
  
I hope this was helpful,

*Michael*

#### The Author:

### Michael Pyrcz, Associate Professor, University of Texas at Austin 
*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*

With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. 

For more about Michael check out these links:

#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)

#### Want to Work Together?

I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! 

* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

* I can be reached at mpyrcz@austin.utexas.edu.

I'm always happy to discuss,

*Michael*

Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin
