In [1]:
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')

## Sampling a 4-dimensional MultiVariate Normal distribution (MVN) via the ParaMonte library's ParaDRAM routine    

**The logic of Monte Carlo sampling** is very simple:

Suppose we have a mathematical objective function defined on an `ndim`-dimensional domain. Typically, this domain is defined on the space of real numbers. In general, understanding and visualizing the structure of such objective functions is an extremely difficult task, if not impossible. As a better alternative, we can employ Monte Carlo techniques that randomly explore the structure of the objective function and find the function's extrema.  

In the following example, one of the simplest such objective functions, the **Multivariate Normal Distribution (MVN)**, is constructed and sampled using one of the ParaMonte library samplers, the **ParaDRAM** sampler (**Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler**).  
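
To make the idea of randomly exploring an objective function concrete, here is a minimal sketch of a plain random-walk Metropolis sampler. It is not the ParaDRAM algorithm (it has neither delayed rejection nor proposal adaptation), and the names `metropolis`, `log_target`, and `TARGET_MEAN`, as well as the 2-D target itself, are illustrative assumptions rather than anything from the ParaMonte library.

```python
import numpy as np

def metropolis(get_log_func, start, n_steps, step_size, rng):
    """Random-walk Metropolis: return the visited states, shape (n_steps, ndim)."""
    current = np.asarray(start, dtype=float)
    current_log = get_log_func(current)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        # propose a Gaussian step around the current state
        proposal = current + step_size * rng.standard_normal(current.size)
        proposal_log = get_log_func(proposal)
        # accept with probability min(1, f(proposal) / f(current)), evaluated in log space
        if np.log(rng.uniform()) < proposal_log - current_log:
            current, current_log = proposal, proposal_log
        chain[i] = current  # a rejected proposal repeats the current state
    return chain

TARGET_MEAN = np.array([1.0, -1.0])  # hypothetical 2-D target, for illustration only

def log_target(point):
    """Unnormalized log-density of a unit-variance normal centered on TARGET_MEAN."""
    diff = point - TARGET_MEAN
    return -0.5 * np.dot(diff, diff)

rng = np.random.default_rng(3751)
chain = metropolis(log_target, start=TARGET_MEAN, n_steps=20000, step_size=1.0, rng=rng)
sample_mean = chain.mean(axis=0)
print("sample mean:", sample_mean)
```

Note that the sampler only ever needs the logarithm of the objective function, up to an additive constant, which is why the interface below is built around `getLogFunc()`.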

Suppose we want to sample random points from a [MultiVariate Normal (MVN) Probability Density Function (PDF)](https://en.wikipedia.org/wiki/Multivariate_normal_distribution). The PDF equation of MVN is the following,  

$$
\large
\pi (x, \mu, \Sigma, k = 4) = (2\pi)^{-\frac{k}{2}} ~ |\Sigma|^{-\frac{1}{2}} ~ 
\exp
\bigg( 
-\frac{1}{2} (x-\mu)^T ~ \Sigma^{-1} ~ (x-\mu)
\bigg)
$$

The following Python function `getLogFunc()` returns the natural logarithm of the Probability Density Function of the multivariate Normal distribution.  

---  

> Since the mathematical objective functions (e.g., probability density functions) can take extremely small or large values, we often work with their natural logarithms instead. **This is the reason behind the naming convention used in the ParaMonte library for** the user's objective functions: **getLogFunc**, implying that **the user must provide a function that returns the natural logarithm of the target objective function**.  
  
  
See [this Jupyter Notebook](https://nbviewer.jupyter.org/github/cdslaborg/paramontex/blob/master/Python/Jupyter/working_with_logarithm_of_objective_function.ipynb) for an in-depth discussion of why we need to work with the logarithm of mathematical objective functions in optimization and sampling problems.  

---  
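
As a small illustration of the note above, the following sketch evaluates a standard normal density far out in its tail, both directly and through its logarithm. The helper names `normal_pdf` and `normal_logpdf` are hypothetical and chosen only for this example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_logpdf(x, mu=0.0, sigma=1.0):
    """Natural logarithm of the same density, computed without exponentiating."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

x = 40.0
density = normal_pdf(x)         # exp(-800) is below the smallest positive double
log_density = normal_logpdf(x)  # remains an ordinary finite number

print(density)      # the density underflows to 0.0
print(log_density)  # approximately -800.92
```

Once the density has underflowed to zero, all information about how unlikely the point is has been lost, whereas the log-density still distinguishes it from its neighbors.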

In [2]:
# activate interactive plotting in Jupyter environment
%matplotlib notebook

In [3]:
import numpy as np

NDIM = 4                                # number of dimensions of the domain of the MVN PDF
MEAN =  np.double([-10, 15., 20., 0.0]) # This is the mean of the MVN PDF.
COVMAT = np.double( [ [1.0,.45,-.3,0.0] # This is the covariance matrix of the MVN PDF.
                    , [.45,1.0,0.3,-.2]
                    , [-.3,0.3,1.0,0.6]
                    , [0.0,-.2,0.6,1.0]
                    ] )

INVCOV = np.linalg.inv(COVMAT) # This is the inverse of the covariance matrix of the MVN distribution.

# The following is the log of the coefficient used in the definition of the MVN.

MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )

# the logarithm of objective function: log(MVN)

def getLogFunc(point):
    """
    Return the logarithm of the MVN PDF.
    """
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * ( np.dot(normedPoint,np.matmul(INVCOV,normedPoint)) )
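
As a sanity check, the log-density can also be evaluated by a second route and compared against the `getLogFunc()` approach. The sketch below uses `np.linalg.slogdet` and `np.linalg.solve` instead of an explicit inverse. The names `get_log_mvn`, `get_log_func`, and `test_point` are hypothetical and introduced only for this comparison.

```python
import numpy as np

MEAN = np.array([-10.0, 15.0, 20.0, 0.0])
COVMAT = np.array([[1.0, 0.45, -0.3, 0.0],
                   [0.45, 1.0, 0.3, -0.2],
                   [-0.3, 0.3, 1.0, 0.6],
                   [0.0, -0.2, 0.6, 1.0]])

def get_log_mvn(point):
    """Log of the MVN PDF via slogdet and solve (no explicit matrix inverse)."""
    diff = np.asarray(point, dtype=float) - MEAN
    _, logdet = np.linalg.slogdet(COVMAT)
    mahalanobis_sq = diff @ np.linalg.solve(COVMAT, diff)
    return -0.5 * (MEAN.size * np.log(2.0 * np.pi) + logdet + mahalanobis_sq)

# the same quantity, computed the way getLogFunc() computes it
INVCOV = np.linalg.inv(COVMAT)
MVN_COEF = MEAN.size * np.log(1.0 / np.sqrt(2.0 * np.pi)) + np.log(np.sqrt(np.linalg.det(INVCOV)))

def get_log_func(point):
    diff = MEAN - point
    return MVN_COEF - 0.5 * np.dot(diff, INVCOV @ diff)

test_point = np.array([-9.0, 14.0, 21.0, 0.5])  # arbitrary point near the mean
difference = abs(get_log_mvn(test_point) - get_log_func(test_point))
print("absolute difference:", difference)
```

For a well-conditioned 4-by-4 covariance matrix such as this one, both routes should agree to within floating-point round-off. The `slogdet`/`solve` route tends to be the more robust choice when the covariance matrix is large or nearly singular.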

We will sample random points from the objective function by calling the **ParaDRAM** sampler (**Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler**) of the ParaMonte library.  

## Running a ParaDRAM simulation on a single processor

>**To run the sampler in parallel**, you will have to first save the MPI-enabled script as an external file. Visit the [ParaMonte library's documentation website](http://cdslab.org/paramonte/notes/run/python/) for more information, or see [this parallel ParaDRAM simulation Jupyter notebook example](https://nbviewer.jupyter.org/github/cdslaborg/paramontex/blob/master/Python/Jupyter/sampling_multivariate_normal_density_function_via_paradram_parallel.ipynb).  

In [4]:
import paramonte as pm

First, we will check which version of the ParaMonte library you are using,

In [5]:
print( pm.version.kernel.get() )
print( pm.version.interface.get() )

ParaMonte Python Kernel Version 1.2.0
ParaMonte Python Interface Version 2.0.0


We can also **dump** the ParaMonte library interface version,

In [6]:
print( pm.version.kernel.dump() )
print( pm.version.interface.dump() )

1.2.0
2.0.0


We can also verify the existence of the essential components of the current installation of the ParaMonte Python library on the system,

In [7]:
pm.verify()


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::::                                                                                       ::::

       _/_/_/_/                                   _/_/    _/_/
        _/    _/                                  _/_/_/_/_/                     _/
       _/    _/ _/_/_/_/   _/ /_/_/ _/_/_/_/     _/  _/  _/   _/_/   _/_/_/   _/_/_/  _/_/_/
      _/_/_/   _/    _/   _/_/     _/    _/     _/      _/  _/   _/ _/    _/   _/   _/_/_/_/                        
     _/       _/    _/   _/       _/    _/     _/      _/  _/   _/ _/    _/   _/   _/    
  _/_/_/       _/_/_/_/ _/         _/_/_/_/ _/_/_/  _/_/_/  _/_/  _/    _/   _/_/   _/_/_/


                                          ParaMonte
                                   plain powerful parallel
                                     Monte Carlo library
                                        Version 2.0.0

::::                                       

---  
### Setting up the ParaDRAM simulation specifications

Now, define a ParaDRAM sampler instance,  

In [8]:
pmpd = pm.ParaDRAM()

The **simplest scenario** is to **run the simulation with the default specifications** that are appropriately determined by the ParaDRAM sampler. However, for the purpose of clarity and reproducibility, we will specify a few simulation specs,

> For a complete list of all ParaDRAM simulation specifications, visit the [ParaDRAM simulation specifications webpage](https://www.cdslab.org/paramonte/notes/usage/paradram/specifications/).  

### The simulation output file names and storage path   
We will store all output files in a directory with the same name as this Jupyter Notebook's name and with the same prefix.  

> By default, all ParaDRAM simulation output files are named properly (see [this page](https://www.cdslab.org/paramonte/notes/usage/paradram/output/) for a review of ParaDRAM simulation output files). **In general, it is a good habit to specify an output file prefix for any ParaDRAM simulation**. This way, restarting an incomplete simulation, should the need arise, is seamless: simply rerun the simulation without changing any part of the code or the simulation setup.  

In [9]:
pmpd.spec.outputFileName = "./sampling_multivariate_normal_density_function_via_paradram/mvn_serial"

### Ensuring the reproducibility of the results  
We will also initialize the random seed to a fixed number to ensure the reproducibility of the simulation results in the future,  

In [10]:
pmpd.spec.randomSeed = 3751

Although it is not essential, we will also set `chainSize` to 30000, the number of **unique points** to be sampled from the objective function. This is much smaller than the sampler's default of 100,000 unique sample points.  

In [11]:
pmpd.spec.chainSize = 30000

For the sake of illustration, we will also request an ASCII-format restart output file,  

In [12]:
pmpd.spec.restartFileFormat = "ascii"

### Calling the ParaDRAM sampler  
Note that none of the above specification settings are necessary, but it is in general a good habit to define them prior to the simulation.  
We now call the ParaDRAM sampler routine to run the sampling simulation,  

In [13]:
# run the ParaDRAM sampler
pmpd.runSampler ( ndim = 4
                , getLogFunc = getLogFunc
                )


ParaDRAM - NOTE: Running the ParaDRAM sampler in serial mode...
ParaDRAM - NOTE: To run the ParaDRAM sampler in parallel mode visit:
ParaDRAM - NOTE: 
ParaDRAM - NOTE:     https://www.cdslab.org/paramonte
ParaDRAM - NOTE: 
ParaDRAM - NOTE: If you are using Jupyter notebook, check the Jupyter's 
ParaDRAM - NOTE: terminal window for realtime simulation progress and report.


ParaDRAM - NOTE: To read the generated output files, try:
ParaDRAM - NOTE: 
ParaDRAM - NOTE:     pmpd.readReport()      # to read the summary report from the output report file.
ParaDRAM - NOTE:     pmpd.readSample()      # to read the final i.i.d. sample from the output sample file.
ParaDRAM - NOTE:     pmpd.readChain()       # to read the uniquely-accepted points from the output chain file.
ParaDRAM - NOTE:     pmpd.readMarkovChain() # to read the Markov Chain. NOT recommended for very large chains.
ParaDRAM - NOTE:     pmpd.readRestart()     # to read the contents of an ASCII-format output restart file.
ParaDRAM 

This will print the realtime simulation progress information on your **Anaconda prompt window** (not inside your Jupyter notebook). Once the simulation is finished, the ParaDRAM routine generates 5 output files, each of which contains information about certain aspects of the simulation: the four files described below, plus the restart file (here in ASCII format, as requested above via `restartFileFormat`),  

-   [mvn_serial_process_1_report.txt](sampling_multivariate_normal_density_function_via_paradram/mvn_serial_process_1_report.txt)  
    This file contains reports about all aspects of the simulation, from the very beginning of the simulation to the very last step, including the postprocessing of the results and final sample refinement.  

-   [mvn_serial_process_1_progress.txt](sampling_multivariate_normal_density_function_via_paradram/mvn_serial_process_1_progress.txt)  
    This file contains dynamic realtime information about the simulation progress. The frequency of the progress report to this file depends on the value of the simulation specification `progressReportPeriod`.  

-   [mvn_serial_process_1_sample.txt](sampling_multivariate_normal_density_function_via_paradram/mvn_serial_process_1_sample.txt)  
    This file contains the final fully-refined decorrelated i.i.d. random sample from the objective function.  

-   [mvn_serial_process_1_chain.txt](sampling_multivariate_normal_density_function_via_paradram/mvn_serial_process_1_chain.txt)  
    This file contains the Markov chain, generally in either binary or condensed (compact) ASCII format to improve the storage efficiency of the chain on your computer.  



Now, we can read the generated output sample, contained in the file suffixed with `*_sample.txt`.

In [14]:
pmpd.readSample()


ParaDRAM - NOTE: 1 files detected matching the pattern: "./sampling_multivariate_normal_density_function_via_paradram/mvn_serial*_sample.txt"


ParaDRAM - NOTE: processing sample file: D:\Dropbox\Projects\20180101_ParaMonte\fb\paramontex\Python\Jupyter\sampling_multivariate_normal_density_function_via_paradram\mvn_serial_process_1_sample.txt
ParaDRAM - NOTE: reading the file contents... done in 0.02593 seconds.
ParaDRAM - NOTE: ndim = 4, count = 7504
ParaDRAM - NOTE: parsing file contents... 
ParaDRAM - NOTE: computing the sample correlation matrix... 
ParaDRAM - NOTE: adding the correlation graphics tools... 
ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.013963 seconds.
ParaDRAM - NOTE: computing the sample covariance matrix... 
ParaDRAM - NOTE: adding the covariance graphics tools... 
ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.011968 seconds.
ParaDRAM - NOTE: computing the sample autocorrelations... 
ParaDRAM - NOTE: adding 

By default, the sampler generates a highly refined decorrelated final sample. Essentially, **the ParaDRAM sampler keeps refining the Markov chain for as long as there is any residual autocorrelation in any of the individual variables of the Markov chain that has been sampled**. This is an extremely strong and conservative refinement strategy, and it is done so in the ParaDRAM algorithm for a good reason: to **ensure independent and identically distributed (i.i.d.) samples from the objective function**.  

This extreme refinement can be relaxed and controlled by the user, by setting the following ParaDRAM simulation specification variable prior to the simulation run,

In [15]:
pmpd.spec.sampleRefinementCount = 1

If you rerun the simulation with this specification set, you will notice about a **three-fold** increase in the final sample size. Note that the above specification has to be set **before** running the simulation for it to take effect (also, do not forget to specify a new output file prefix for the new simulation, because existing simulation output files cannot be overwritten). A value of `1` will cause the ParaDRAM sampler to refine the Markov chain for only one round, which is what most people in the MCMC community do.
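
To give a feel for what a single round of refinement does, the sketch below thins a synthetic, strongly autocorrelated chain by its estimated integrated autocorrelation time. This is a generic illustration under stated assumptions, not the ParaDRAM implementation: the AR(1) chain, the helper names `autocorr` and `integrated_autocorr_time`, and the rule of truncating the sum at the first negative autocorrelation are all choices made for this example.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / var for k in range(max_lag + 1)])

def integrated_autocorr_time(x, max_lag=200):
    """Estimate 1 + 2 * sum of autocorrelations, truncated at the first negative value."""
    acf = autocorr(x, max_lag)
    negative = np.where(acf < 0)[0]
    cutoff = negative[0] if negative.size else max_lag + 1
    return 1.0 + 2.0 * np.sum(acf[1:cutoff])

# synthetic stand-in for a raw Markov chain: an AR(1) process with unit variance
rng = np.random.default_rng(3751)
rho, n = 0.9, 20000
noise = rng.standard_normal(n)
chain = np.empty(n)
chain[0] = noise[0]
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + np.sqrt(1.0 - rho**2) * noise[i]

# one round of refinement: keep every step-th point
step = int(np.ceil(integrated_autocorr_time(chain)))
refined = chain[::step]

raw_lag1 = autocorr(chain, 1)[1]
refined_lag1 = autocorr(refined, 1)[1]
print("thinning step:", step)
print("lag-1 autocorrelation, raw vs. refined:", raw_lag1, refined_lag1)
```

A single round typically leaves a small residual autocorrelation in the thinned chain. Repeating the procedure on the refined chain removes more of it at the cost of a smaller final sample, which is the trade-off controlled by `sampleRefinementCount`.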

To quickly visualize the generated sample as a histogram, try,

In [16]:
pmpd.sampleList[0].plot.distplot()

ParaDRAM - NOTE: making the distplot plot... 

<IPython.core.display.Javascript object>

done in 0.13361 seconds.


How about adding a target value to this histogram plot? By default, if no `value` is specified for the target object, it will add the state corresponding to the mode of the objective function to the plot, as inferred from the sample file,  

In [17]:
pmpd.sampleList[0].plot.distplot.reset()
pmpd.sampleList[0].plot.distplot()
pmpd.sampleList[0].plot.distplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
pmpd.sampleList[0].plot.distplot.target() # add a line corresponding to the maxLogFunc (mode) of the sampled points.

ParaDRAM - NOTE: making the distplot plot... 

<IPython.core.display.Javascript object>

done in 0.124666 seconds.


However, assigning other values is also possible. For example, let's say that we want to add the mean and 1-sigma limits to the plot, instead of the single mode,  

In [18]:
pmpd.sampleList[0].plot.distplot()
pmpd.sampleList[0].plot.distplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")

avg = pmpd.sampleList[0].df.SampleVariable1.mean()
pmpd.sampleList[0].plot.distplot.target( value = avg )

std = pmpd.sampleList[0].df.SampleVariable1.std()
pmpd.sampleList[0].plot.distplot.target.axvline.kws.linestyle = "--"
pmpd.sampleList[0].plot.distplot.target( value = avg - std )
pmpd.sampleList[0].plot.distplot.target( value = avg + std )

ParaDRAM - NOTE: making the distplot plot... 

<IPython.core.display.Javascript object>

done in 0.109705 seconds.


We can also plot all of the variables at the same time in a single figure,

In [19]:
pmpd.sampleList[0].df.head()

| | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|---|---|---|---|---|---|
| 0 | -7.278832 | -10.609075 | 14.000948 | 21.591041 | 1.234182 |
| 1 | -7.278832 | -10.609075 | 14.000948 | 21.591041 | 1.234182 |
| 2 | -4.72546 | -11.363133 | 13.571979 | 21.002667 | 1.223883 |
| 3 | -4.72546 | -11.363133 | 13.571979 | 21.002667 | 1.223883 |
| 4 | -4.72546 | -11.363133 | 13.571979 | 21.002667 | 1.223883 |


In [20]:
pmpd.sampleList[0].plot.distplot( xcolumns = [1,2,3,4] )

ParaDRAM - NOTE: making the distplot plot... 

<IPython.core.display.Javascript object>

done in 0.12071 seconds.


Perhaps we would also want to add a legend and change the x-axis title,

In [21]:
pmpd.sampleList[0].plot.distplot.reset() # resetting to the default is not necessary, but does not hurt either.
pmpd.sampleList[0].plot.distplot.legend.enabled = True
pmpd.sampleList[0].plot.distplot( xcolumns = [1,2,3,4] )
pmpd.sampleList[0].plot.distplot.currentFig.axes.set_xlabel("Multivariate Normal Variables")

ParaDRAM - NOTE: making the distplot plot... 

<IPython.core.display.Javascript object>

done in 0.132639 seconds.


Text(0.5, 31.07499999999998, 'Multivariate Normal Variables')

To make a trace-plot of the sample, try,  

In [22]:
pmpd.sampleList[0].plot.line()

ParaDRAM - NOTE: making the line plot... 

<IPython.core.display.Javascript object>

done in 0.176526 seconds.


To change the scale of the x-axis, try,  

In [23]:
pmpd.sampleList[0].plot.line()
pmpd.sampleList[0].plot.line.currentFig.axes.set_xscale("log")

ParaDRAM - NOTE: making the line plot... 

<IPython.core.display.Javascript object>

done in 0.213428 seconds.


By default, the color of the line in the trace-plot will represent the value returned by `getLogFunc()` at the given sampled point.

In [24]:
pmpd.sampleList[0].plot.line.ccolumns

'SampleLogFunc'

To turn the color off, we can instead try,

In [25]:
pmpd.sampleList[0].plot.line.ccolumns = None
pmpd.sampleList[0].plot.line()
pmpd.sampleList[0].plot.line.currentFig.axes.set_xscale("log")

ParaDRAM - NOTE: making the line plot... 

<IPython.core.display.Javascript object>

done in 0.113696 seconds.


There are many other properties of the plot that can be set or modified via the attributes of the `pmpd.sampleList[0].plot.line` object. To see them all, **uncomment** the following line to display the documentation of the object,  

In [26]:
# pmpd.sampleList[0].plot.line.helpme()

We could also plot all variables in the same plot, 

In [27]:
pmpd.sampleList[0].plot.line.reset()         # reset to the default settings
pmpd.sampleList[0].plot.line.ccolumns = None # turn off the color-mapping
pmpd.sampleList[0].plot.line.ycolumns = pmpd.sampleList[0].df.columns[1:] # exclude the SampleLogFunc column
pmpd.sampleList[0].plot.line()
pmpd.sampleList[0].plot.line.currentFig.axes.set_xscale("log")

ParaDRAM - NOTE: making the line plot... 

<IPython.core.display.Javascript object>

done in 0.119679 seconds.


We could also make scatter or line-scatter trace-plots of the sampled points. The ParaMonte library has a visualization tool for that purpose with some nice predefined settings that can be changed by the user. For example, let's make scatter and line-scatter trace-plots of the first sampled variable,

In [28]:
pmpd.sampleList[0].plot.scatter()

ParaDRAM - NOTE: making the scatter plot... 

<IPython.core.display.Javascript object>

done in 0.153586 seconds.


In [29]:
pmpd.sampleList[0].plot.lineScatter()
pmpd.sampleList[0].plot.lineScatter.currentFig.axes.set_xscale("log")

ParaDRAM - NOTE: making the lineScatter plot... 

<IPython.core.display.Javascript object>

done in 0.193483 seconds.


Setting or modifying the properties of a scatter or lineScatter plot is identical to the line plot. By default, the sampler makes a scatter plot of only the first sampled variable as a function of the order by which the points appear in the sample file. Alternatively, we may want to plot individual variables against each other. For example, to make scatter plots of all variables against the `logFunc`, 

In [30]:
pmpd.sampleList[0].plot.scatter.reset()
pmpd.sampleList[0].plot.scatter.ycolumns = "SampleLogFunc"
for variable in pmpd.sampleList[0].df.columns[1:]: # exclude the SampleLogFunc column
    pmpd.sampleList[0].plot.scatter.xcolumns = variable
    pmpd.sampleList[0].plot.scatter()

ParaDRAM - NOTE: making the scatter plot... 

<IPython.core.display.Javascript object>

done in 0.141652 seconds.
ParaDRAM - NOTE: making the scatter plot... 

<IPython.core.display.Javascript object>

done in 0.132612 seconds.
ParaDRAM - NOTE: making the scatter plot... 

<IPython.core.display.Javascript object>

done in 0.152593 seconds.
ParaDRAM - NOTE: making the scatter plot... 

<IPython.core.display.Javascript object>

done in 0.134639 seconds.


To make bivariate plots of sampled variables against each other, instead of traceplots, simply specify the columns of data to plot,

In [31]:
pmpd.sampleList[0].plot.lineScatter( xcolumns = 1, ycolumns = 2)

ParaDRAM - NOTE: making the lineScatter plot... 

<IPython.core.display.Javascript object>

done in 0.153588 seconds.


To make 3D plots of data, we can try the ParaMonte 3D visualization tools, such as `line3`, `scatter3`, or `lineScatter3`,

In [32]:
pmpd.sampleList[0].plot.lineScatter3()

ParaDRAM - NOTE: making the lineScatter3 plot... 

<IPython.core.display.Javascript object>

done in 0.066833 seconds.


To make kernel density contour plots of the sampled points, we can try,

In [33]:
pmpd.sampleList[0].plot.contour()

ParaDRAM - NOTE: making the contour plot... 

<IPython.core.display.Javascript object>

done in 0.458773 seconds.


To make kernel density filled contour plots of the sampled points, we can try,

In [34]:
pmpd.sampleList[0].plot.contourf()

ParaDRAM - NOTE: making the contourf plot... 

<IPython.core.display.Javascript object>

done in 0.45477 seconds.


Let's add the mode of the distribution as a target to the plot, too,  

In [35]:
pmpd.sampleList[0].plot.contourf()
pmpd.sampleList[0].plot.contourf.target()

ParaDRAM - NOTE: making the contourf plot... 

<IPython.core.display.Javascript object>

done in 0.459769 seconds.


In [37]:
pmpd.sampleList[0].plot.contourf.reset()
pmpd.sampleList[0].plot.contourf()
pmpd.sampleList[0].plot.contourf.target()

ParaDRAM - NOTE: making the contourf plot... 

<IPython.core.display.Javascript object>

done in 0.433841 seconds.


We could add as many targets as desired. For example, let's add the point corresponding to the maximum of the two variables in the above plot, and let's give it a purple color,    

In [38]:
max1 = pmpd.sampleList[0].df.SampleVariable1.max()
max2 = pmpd.sampleList[0].df.SampleVariable2.max()
pmpd.sampleList[0].plot.contourf.target.axvline.kws.linestyle = "--"
pmpd.sampleList[0].plot.contourf.target.axhline.kws.linestyle = "--"
pmpd.sampleList[0].plot.contourf.target.axvline.kws.color = "purple"
pmpd.sampleList[0].plot.contourf.target.axhline.kws.color = "purple"
pmpd.sampleList[0].plot.contourf.target.scatter.kws.color = "purple"
pmpd.sampleList[0].plot.contourf.target( value  = [max1, max2] )

To make 3D kernel density contour plots of the sampled points, we can try,

In [39]:
pmpd.sampleList[0].plot.contour3()

ParaDRAM - NOTE: making the contour3 plot... 

<IPython.core.display.Javascript object>

done in 0.435833 seconds.


By default, the kernel density plot displays `SampleVariable2` vs. `SampleVariable1`, if no other specifications are made by the user.

There are also the `kdeplot1()` and `kdeplot2()` visualization tools that make calls to the `kdeplot()` function of the seaborn library. However, those function implementations in seaborn are currently too slow computationally, so we will not display them here, though you can try them yourself.

There is also the ParaMonte `jointplot()` visualization tool which implicitly calls the `jointplot()` function of the seaborn library.

In [40]:
pmpd.sampleList[0].plot.jointplot(xcolumns = 3, ycolumns = 4)

ParaDRAM - NOTE: making the jointplot plot... 

<IPython.core.display.Javascript object>

done in 3.54651 seconds.


We can also make a grid plot of all of the variables against each other via,  

In [41]:
pmpd.sampleList[0].plot.grid()

ParaDRAM - NOTE: making the grid plot... 


<IPython.core.display.Javascript object>

generating subplot #1: (0,0) out of 16... done in 0.016705 seconds.
generating subplot #2: (0,1) out of 16... done in 0.730049 seconds.
generating subplot #3: (0,2) out of 16... done in 0.688156 seconds.
generating subplot #4: (0,3) out of 16... done in 0.6732 seconds.
generating subplot #5: (1,0) out of 16... done in 0.855678 seconds.
generating subplot #6: (1,1) out of 16... done in 0.016705 seconds.
generating subplot #7: (1,2) out of 16... done in 0.684202 seconds.
generating subplot #8: (1,3) out of 16... done in 0.696137 seconds.
generating subplot #9: (2,0) out of 16... done in 0.866683 seconds.
generating subplot #10: (2,1) out of 16... done in 0.837731 seconds.
generating subplot #11: (2,2) out of 16... done in 0.016705 seconds.
generating subplot #12: (2,3) out of 16... done in 0.695172 seconds.
generating subplot #13: (3,0) out of 16... done in 0.838722 seconds.
generating subplot #14: (3,1) out of 16... done in 0.838789 seconds.
generating subplot #15: (3,2) out of 16... do

Since the grid plot is rather computationally demanding, by default it adds only the first few columns of data to the grid (here, 4 of the 5 columns available in the sample file),  

In [42]:
pmpd.sampleList[0].plot.grid.columns

Index(['SampleLogFunc', 'SampleVariable1', 'SampleVariable2',
       'SampleVariable3'],
      dtype='object')

This behavior can be overridden by the user by assigning more or fewer columns of the data to the `columns` attribute of the grid plot, like,  

In [43]:
pmpd.sampleList[0].plot.grid.columns = [1,2,3] # indices and column names can be mixed
pmpd.sampleList[0].plot.grid.plotType.lower.value = "contourf"
pmpd.sampleList[0].plot.grid()

ParaDRAM - NOTE: making the grid plot... 


<IPython.core.display.Javascript object>

generating subplot #1: (0,0) out of 9... done in 0.014961 seconds.
generating subplot #2: (0,1) out of 9... done in 0.459723 seconds.
generating subplot #3: (0,2) out of 9... done in 0.442814 seconds.
generating subplot #4: (1,0) out of 9... done in 0.746004 seconds.
generating subplot #5: (1,1) out of 9... done in 0.014961 seconds.
generating subplot #6: (1,2) out of 9... done in 0.446835 seconds.
generating subplot #7: (2,0) out of 9... done in 0.722062 seconds.
generating subplot #8: (2,1) out of 9... done in 0.713097 seconds.
generating subplot #9: (2,2) out of 9... done in 0.014961 seconds.
generating colorbar... done in 0.015957 seconds.


If one corner of the plot is not needed, we can also simply hide it by disabling it,

In [44]:
pmpd.sampleList[0].plot.grid.columns = [1,2,3] # indices and column names can be mixed
pmpd.sampleList[0].plot.grid.plotType.lower.value = "contourf"
pmpd.sampleList[0].plot.grid.plotType.upper.enabled = False # disable the upper triangle
pmpd.sampleList[0].plot.grid()

ParaDRAM - NOTE: making the grid plot... 


<IPython.core.display.Javascript object>

generating subplot #1: (0,0) out of 9... done in 0.015946 seconds.
generating subplot #2: (0,1) out of 9... done in 0.0 seconds.
generating subplot #3: (0,2) out of 9... done in 0.0 seconds.
generating subplot #4: (1,0) out of 9... done in 0.788875 seconds.
generating subplot #5: (1,1) out of 9... done in 0.015946 seconds.
generating subplot #6: (1,2) out of 9... done in 0.0 seconds.
generating subplot #7: (2,0) out of 9... done in 0.642282 seconds.
generating subplot #8: (2,1) out of 9... done in 0.624362 seconds.
generating subplot #9: (2,2) out of 9... done in 0.015946 seconds.
generating colorbar... done in 0.022906 seconds.


You can also hide the upper or the lower triangles or the colorbar or all three within the current figure by using the "hide" method of the grid object,  

In [45]:
pmpd.sampleList[0].plot.grid.reset()
pmpd.sampleList[0].plot.grid()
pmpd.sampleList[0].plot.grid.hide('lower')

ParaDRAM - NOTE: making the grid plot... 


<IPython.core.display.Javascript object>

generating subplot #1: (0,0) out of 16... done in 0.015459 seconds.
generating subplot #2: (0,1) out of 16... done in 0.726057 seconds.
generating subplot #3: (0,2) out of 16... done in 0.723032 seconds.
generating subplot #4: (0,3) out of 16... done in 0.701156 seconds.
generating subplot #5: (1,0) out of 16... done in 0.903581 seconds.
generating subplot #6: (1,1) out of 16... done in 0.015459 seconds.
generating subplot #7: (1,2) out of 16... done in 0.745971 seconds.
generating subplot #8: (1,3) out of 16... done in 0.719109 seconds.
generating subplot #9: (2,0) out of 16... done in 0.912557 seconds.
generating subplot #10: (2,1) out of 16... done in 0.900592 seconds.
generating subplot #11: (2,2) out of 16... done in 0.015459 seconds.
generating subplot #12: (2,3) out of 16... done in 0.739986 seconds.
generating subplot #13: (3,0) out of 16... done in 0.8717 seconds.
generating subplot #14: (3,1) out of 16... done in 0.866647 seconds.
generating subplot #15: (3,2) out of 16... do

To show the triangle again, you can call the `show()` method. To see what input options are available with `hide()` or `show()`, get help like the following,

In [47]:
pmpd.sampleList[0].plot.grid.helpme("hide")



        Hides the requested part of the grid plot.

        **Parameters**

            part

                A string with the following possible values:

                    "lower"
                        hides the lower triangle of the grid plot.

                    "upper"
                        hides the upper triangle of the grid plot.

                    "diag"
                        hides the diagonal of the grid plot.

                    "all"
                        hides all grid plots and the colorbar.

                    "colorbar"
                        hides the colorbar of the grid plot.

                    The string can also be a mix of the above keywords,
                    separated by the ``+`` sign or some other delimiter.
                    For example, ``"lower+upper+colorbar"``

        **Returns**

            None

        


### Ensuring the independence and identical distribution (i.i.d. property) of the resulting sample

We can more formally ensure the i.i.d. property of the resulting sample from the simulation by computing and visualizing the autocorrelation of the sampled points. Whenever a ParaDRAM output sample file is read, the sampler automatically computes the autocorrelation of sampled points along each dimension of the domain of the objective function.

In [48]:
pmpd.sampleList[0].stats.autocorr()
pmpd.sampleList[0].stats.autocorr.plot.line()

ParaDRAM - NOTE: adding the autocrrelation graphics tools... 
ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the line plot... 

<IPython.core.display.Javascript object>

done in 0.311167 seconds.


The above autocorrelation plot is reassuring since the sampled points do not appear to be correlated with each other at all. This is because the ParaDRAM routine, by default, applies as many rounds of aggressive refinement of the raw (verbose) Markov chain as it deems necessary to remove any residual correlations from the final output refined sample that we have plotted in the above.  
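
One way to put a number on "do not appear to be correlated" is to compare the autocorrelations against the band expected for a truly independent sample, roughly ±1.96/√n at the 95% level. The sketch below applies this check to synthetic i.i.d. draws of the same size as the refined sample. The helper `autocorrelation`, the choice of 50 lags, and the synthetic data are assumptions made for illustration; they are not part of the ParaMonte API.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(3751)
n = 7504                              # same count as the refined sample above
iid_sample = rng.standard_normal(n)   # synthetic stand-in for one sampled variable

acf = autocorrelation(iid_sample, max_lag=50)
band = 1.96 / np.sqrt(n)              # approximate 95% band for an independent sample
fraction_inside = np.mean(np.abs(acf[1:]) <= band)

print("95% band half-width:", band)
print("fraction of lags 1..50 inside the band:", fraction_inside)
```

For an independent sample, about 95% of the nonzero lags should fall inside the band. The same function could be applied to a column of `pmpd.sampleList[0].df` to check the refined sample itself.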

We can compare the autocorrelation of the refined sample in the above with the autocorrelation of the raw (verbose) Markov chain,  

In [49]:
pmpd.readMarkovChain()


ParaDRAM - NOTE: 1 files detected matching the pattern: "./sampling_multivariate_normal_density_function_via_paradram/mvn_serial*_chain.txt"


ParaDRAM - NOTE: processing chain file: D:\Dropbox\Projects\20180101_ParaMonte\fb\paramontex\Python\Jupyter\sampling_multivariate_normal_density_function_via_paradram\mvn_serial_process_1_chain.txt
ParaDRAM - NOTE: reading the file contents... done in 3.905302 seconds.
ParaDRAM - NOTE: ndim = 4, count = 104075
ParaDRAM - NOTE: parsing file contents... 
ParaDRAM - NOTE: computing the sample correlation matrix... 
ParaDRAM - NOTE: adding the correlation graphics tools... 
ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.012997 seconds.
ParaDRAM - NOTE: computing the sample covariance matrix... 
ParaDRAM - NOTE: adding the covariance graphics tools... 
ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.013964 seconds.
ParaDRAM - NOTE: computing the sample autocorrelations... 
ParaDRAM - NOTE: adding 

In [50]:
pmpd.markovChainList[0].stats.autocorr()
pmpd.markovChainList[0].stats.autocorr.plot.line()

ParaDRAM - NOTE: adding the autocrrelation graphics tools... 
ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the line plot... 

<IPython.core.display.Javascript object>

done in 0.699096 seconds.


**Compare the above plot to the same plot that we got for the resulting refined sample earlier**. Unlike the refined sample, the Markov chain is significantly correlated with itself along each dimension. The large amount of autocorrelation seen for `SampleLogFunc` is due to the fact that we started the MCMC sampling from a very bad low-probability location, which is also visible in the grid plots in the above.

### Obtaining the summary statistics of the simulation results  

To get the maximum value of the objective function found by the sampler, along with its location, try,

In [56]:
print( "maxLogFunc: {}".format(pmpd.sampleList[0].stats.maxLogFunc.value) )
print( "The location of maxLogFunc: {}".format(pmpd.sampleList[0].stats.maxLogFunc.state.values) )

maxLogFunc: -2.5751582
The location of maxLogFunc: [-10.060528    14.949581    19.994062     0.04765628]


which is again reassuring, since we already know that the maximum of an MVN density occurs at its mean vector, and the ParaDRAM sampler's estimated location of `maxLogFunc` above is very close to the mean of the target MVN (compare it with the sample means in the summary table that follows).
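As a sanity check on where an MVN density peaks, the sketch below evaluates the log-PDF formula from the beginning of this notebook at the mean and at many nearby points. The `mean` and `covmat` values here are assumptions chosen only for illustration; they are not read from the notebook's definition of `getLogFunc()`, and `mvn_logpdf` is a hypothetical helper.

```python
import numpy as np

# Hypothetical MVN parameters, for illustration only.
mean = np.array([-10.0, 15.0, 20.0, 0.0])
covmat = np.array([[1.0, 0.5, 0.0, 0.0],
                   [0.5, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.3, 1.0]])

def mvn_logpdf(x, mean, covmat):
    """Natural logarithm of the MVN density at point x."""
    k = len(mean)
    diff = np.asarray(x, dtype=float) - mean
    _, logdet = np.linalg.slogdet(covmat)
    maha = diff @ np.linalg.solve(covmat, diff)  # squared Mahalanobis distance
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + maha)

log_at_mean = mvn_logpdf(mean, mean, covmat)

# Evaluate the log-density at 1000 random points scattered around the mean.
rng = np.random.default_rng(0)
trial_points = mean + rng.standard_normal((1000, 4))
log_elsewhere = np.array([mvn_logpdf(p, mean, covmat) for p in trial_points])

print("log-PDF at the mean:       ", log_at_mean)
print("largest log-PDF elsewhere: ", log_elsewhere.max())
```

Because the quadratic term vanishes only at the mean, no other point can reach a higher log-density, and the peak value depends only on the dimension and the determinant of the covariance matrix.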

Further statistics of the resulting sample and the raw verbose Markov chain can be computed from the corresponding DataFrame (the `df` component), for example, via,  

In [57]:
pmpd.sampleList[0].df.describe()

| | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|---|---|---|---|---|---|
| count | 7504.0 | 7504.0 | 7504.0 | 7504.0 | 7504.0 |
| mean | -4.573896 | -10.025751 | 14.98144 | 20.012209 | 0.016573 |
| std | 1.417971 | 0.99576 | 1.002555 | 1.003504 | 1.007613 |
| min | -13.43098 | -13.575107 | 11.456387 | 16.745422 | -3.664305 |
| 25% | -5.259487 | -10.695753 | 14.299378 | 19.309565 | -0.666352 |
| 50% | -4.234739 | -10.02113 | 14.98697 | 20.004113 | 0.019996 |
| 75% | -3.532682 | -9.340668 | 15.662927 | 20.690255 | 0.689358 |
| max | -2.575158 | -6.350774 | 18.59221 | 23.749184 | 3.549447 |


Some statistics are also automatically computed and reported in the output report file of every ParaDRAM simulation (which ends with the suffix `*_report.txt`). One can either directly inspect the contents of this file, or use the `readReport()` function of the ParaDRAM sampler object, like,  

In [58]:
pmpd.readReport()


ParaDRAM - NOTE: 1 files detected matching the pattern: "./sampling_multivariate_normal_density_function_via_paradram/mvn_serial*_report.txt"


ParaDRAM - NOTE: processing report file: D:\Dropbox\Projects\20180101_ParaMonte\fb\paramontex\Python\Jupyter\sampling_multivariate_normal_density_function_via_paradram\mvn_serial_process_1_report.txt
ParaDRAM - NOTE: reading the file contents... ParaDRAM - NOTE: parsing the report file contents...
done in 0.007981 seconds.

ParaDRAM - NOTE: The processed report files are now stored in the newly-created
ParaDRAM - NOTE: component `reportList` of the ParaDRAM object as a Python list.
ParaDRAM - NOTE: For example, to access the entire contents of the first (or the only) report file, try:
ParaDRAM - NOTE: 
ParaDRAM - NOTE:     pmpd.reportList[0].contents.print()
ParaDRAM - NOTE: 
ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name.
ParaDRAM - NOTE: To access the simulation statistics and information, examine the

In [59]:
print(pmpd.reportList[0].stats.chain.refined.ess.value)
print(pmpd.reportList[0].stats.chain.refined.ess.description)

7504
This is the estimated Effective (decorrelated) Sample Size (ESS) of the final refined chain.
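To build intuition for what an effective sample size measures, here is a minimal sketch of one common estimator, N / (1 + 2 Σρ_k), with the sum over autocorrelations truncated at the first non-positive lag. This is an assumption-laden illustration and not necessarily the estimator that ParaDRAM uses; the function `effective_sample_size` and the synthetic AR(1) series are hypothetical.

```python
import numpy as np

def effective_sample_size(x):
    """ESS = N / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    norm = np.dot(x, x)
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n // 2):
        rho = np.dot(x[:n - k], x[k:]) / norm
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(7)
n = 20_000

# Independent draws: the ESS should be close to the number of draws.
iid = rng.standard_normal(n)

# Correlated AR(1) series: the ESS should be a small fraction of n.
phi = 0.9
noise = rng.standard_normal(n)
correlated = np.empty(n)
correlated[0] = noise[0]
for i in range(1, n):
    correlated[i] = phi * correlated[i - 1] + noise[i]

ess_iid = effective_sample_size(iid)
ess_correlated = effective_sample_size(correlated)
print("ESS of independent draws:", round(ess_iid))
print("ESS of correlated series:", round(ess_correlated))
```

For an AR(1) series with coefficient 0.9, the theoretical integrated autocorrelation time is (1 + 0.9) / (1 - 0.9) = 19, so roughly one in every 19 points counts as an independent draw.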


### Inspecting the compact chain of uniquely sampled points

Now, to see the effects of the starting point of the MCMC sampler, we will take a look at the full chain (in compact form) of **uniquely sampled points** from the objective function,

In [60]:
pmpd.readChain()


ParaDRAM - NOTE: 1 files detected matching the pattern: "./sampling_multivariate_normal_density_function_via_paradram/mvn_serial*_chain.txt"


ParaDRAM - NOTE: processing chain file: D:\Dropbox\Projects\20180101_ParaMonte\fb\paramontex\Python\Jupyter\sampling_multivariate_normal_density_function_via_paradram\mvn_serial_process_1_chain.txt
ParaDRAM - NOTE: reading the file contents... done in 0.044914 seconds.
ParaDRAM - NOTE: ndim = 4, count = 30000
ParaDRAM - NOTE: parsing file contents... 
ParaDRAM - NOTE: computing the sample correlation matrix... 
ParaDRAM - NOTE: adding the correlation graphics tools... 
ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.011972 seconds.
ParaDRAM - NOTE: computing the sample covariance matrix... 
ParaDRAM - NOTE: adding the covariance graphics tools... 
ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.013958 seconds.
ParaDRAM - NOTE: computing the sample autocorrelations... 
ParaDRAM - NOTE: adding t

Just as with the refined sample, we can make the same types of plots for the compact and verbose (Markov) chains. Let us first inspect the first few rows of the compact chain and then make a grid plot of it,  

In [61]:
pmpd.chainList[0].df.head()

| | ProcessID | DelayedRejectionStage | MeanAcceptanceRate | AdaptationMeasure | BurninLocation | SampleWeight | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1.0 | 0.0 | 1 | 1 | -329.22317 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 1 | 0 | 0.666667 | 0.0 | 1 | 2 | -319.38776 | -1.131467 | -1.01147 | 2.440997 | -1.638004 |
| 2 | 1 | 0 | 0.6 | 0.0 | 2 | 2 | -292.87086 | -2.70869 | -1.74643 | 2.887417 | 0.428331 |
| 3 | 1 | 0 | 0.571429 | 0.0 | 3 | 2 | -290.73338 | -4.913328 | -2.901997 | 2.351047 | -0.017945 |
| 4 | 1 | 0 | 0.625 | 0.0 | 4 | 1 | -246.42733 | -6.31188 | -1.069209 | 3.394714 | 0.746134 |
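The relation between the compact and the verbose chains can be sketched with plain pandas. The example below assumes, as the column name suggests, that `SampleWeight` counts how many consecutive steps the chain spent at each uniquely sampled state. It copies three columns of the five rows shown above; the names `compact` and `verbose` are hypothetical, and this is not how ParaMonte itself builds `markovChainList`.

```python
import numpy as np
import pandas as pd

# Three columns of the first five rows of the compact chain shown above.
compact = pd.DataFrame({
    "SampleWeight":    [1, 2, 2, 2, 1],
    "SampleLogFunc":   [-329.22317, -319.38776, -292.87086, -290.73338, -246.42733],
    "SampleVariable1": [0.0, -1.131467, -2.70869, -4.913328, -6.31188],
})

# Expand to a verbose chain by repeating each state SampleWeight times.
repeats = compact["SampleWeight"].to_numpy()
verbose = compact.loc[compact.index.repeat(repeats)].reset_index(drop=True)

# A weighted average over the compact chain equals the plain average over the verbose chain.
weighted_mean = np.average(compact["SampleVariable1"], weights=compact["SampleWeight"])
plain_mean = verbose["SampleVariable1"].mean()

print(len(compact), "compact rows expand to", len(verbose), "verbose rows")
print(weighted_mean, plain_mean)
```

This is why the compact file is much smaller than the verbose one while carrying the same information: statistics computed from the compact chain only need to be weighted by `SampleWeight`.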


In [62]:
pmpd.chainList[0].plot.grid(columns = [7,8,9,10]) # plot SampleVariable1 to SampleVariable4

ParaDRAM - NOTE: making the grid plot... 


<IPython.core.display.Javascript object>

generating subplot #1: (0,0) out of 16... done in 0.021701 seconds.
generating subplot #2: (0,1) out of 16... done in 0.831742 seconds.
generating subplot #3: (0,2) out of 16... done in 0.864684 seconds.
generating subplot #4: (0,3) out of 16... done in 0.789919 seconds.
generating subplot #5: (1,0) out of 16... done in 0.877617 seconds.
generating subplot #6: (1,1) out of 16... done in 0.021701 seconds.
generating subplot #7: (1,2) out of 16... done in 0.798862 seconds.
generating subplot #8: (1,3) out of 16... done in 0.794872 seconds.
generating subplot #9: (2,0) out of 16... done in 0.862692 seconds.
generating subplot #10: (2,1) out of 16... done in 0.852718 seconds.
generating subplot #11: (2,2) out of 16... done in 0.021701 seconds.
generating subplot #12: (2,3) out of 16... done in 0.750988 seconds.
generating subplot #13: (3,0) out of 16... done in 0.870703 seconds.
generating subplot #14: (3,1) out of 16... done in 0.873663 seconds.
generating subplot #15: (3,2) out of 16... 

We can also add target values to this grid plot via the `addTarget()` method of the `GridPlot` object. A few types of targets are available out of the box, such as `"mode"`, `"mean"`, and `"median"`. The `"mode"` option corresponds to the state where the maximum of the objective function occurs, and it is the default `value` if no `value` is provided to `addTarget()`.

In [63]:
pmpd.chainList[0].plot.grid(columns = [8,7,10,9]) # plot in random order, just for illustration.
pmpd.chainList[0].plot.grid.addTarget()

ParaDRAM - NOTE: making the grid plot... 


<IPython.core.display.Javascript object>

generating subplot #1: (0,0) out of 16... done in 0.020204 seconds.
generating subplot #2: (0,1) out of 16... done in 0.811794 seconds.
generating subplot #3: (0,2) out of 16... done in 0.827818 seconds.
generating subplot #4: (0,3) out of 16... done in 0.774894 seconds.
generating subplot #5: (1,0) out of 16... done in 0.877682 seconds.
generating subplot #6: (1,1) out of 16... done in 0.020204 seconds.
generating subplot #7: (1,2) out of 16... done in 0.80182 seconds.
generating subplot #8: (1,3) out of 16... done in 0.781938 seconds.
generating subplot #9: (2,0) out of 16... done in 0.938488 seconds.
generating subplot #10: (2,1) out of 16... done in 0.942478 seconds.
generating subplot #11: (2,2) out of 16... done in 0.020204 seconds.
generating subplot #12: (2,3) out of 16... done in 0.826788 seconds.
generating subplot #13: (3,0) out of 16... done in 0.91851 seconds.
generating subplot #14: (3,1) out of 16... done in 0.954446 seconds.
generating subplot #15: (3,2) out of 16... do

The color in the scatter plots represents the value of `logFunc` at each point. The blue color of the density plots represents the density of the sampled points at any given location.  

We can also add targets to these plots to mark a specific value of interest, for example, the location of the maximum `logFunc`. This is the default value that is loaded on the targets when the `grid` object is created for the first time.

### Ensuring diminishing adaptation in the adaptive MCMC sampling

By default, the ParaDRAM sampler adapts the proposal distribution of the sampler throughout the entire simulation. This breaks the ergodicity condition of the Markov chain. However, **if** the adaptation continuously and progressively diminishes throughout the entire simulation, then we could potentially trust the Markov chain simulation results. The alternative is to simply turn off the adaptation, but that is rarely a good strategy. Instead, **a better solution is to keep the adaptation on but simulate long enough to ensure that the Markov chain has converged to the target density**.  

**A unique feature of the ParaDRAM sampler** is that it provides a measure of the amount of adaptation of the proposal distribution of the Markov chain by quantifying how much the proposal distribution changes from one update to the next. In essence, the computed quantity, stored in the column **AdaptationMeasure** of the output chain files, represents an upper limit on the **total variation distance** between each updated proposal distribution and the last one used before it.  
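To make the idea of an upper limit on the total variation distance concrete, the sketch below uses a generic textbook bound for two Gaussian distributions: TV ≤ sqrt(1 - BC²), where BC is the Bhattacharyya coefficient. This is an illustration under the assumption of Gaussian proposals and is not claimed to be the exact formula behind **AdaptationMeasure**; the function `tv_upper_bound` and the sequence of proposal scales are hypothetical.

```python
import numpy as np

def tv_upper_bound(mean1, cov1, mean2, cov2):
    """Upper bound sqrt(1 - BC^2) on the total variation distance of two Gaussians."""
    mean1, mean2 = np.asarray(mean1, dtype=float), np.asarray(mean2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov_avg = 0.5 * (cov1 + cov2)
    diff = mean1 - mean2
    _, logdet_avg = np.linalg.slogdet(cov_avg)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    # Bhattacharyya distance between the two Gaussians.
    dist = (0.125 * diff @ np.linalg.solve(cov_avg, diff)
            + 0.5 * (logdet_avg - 0.5 * (logdet1 + logdet2)))
    bc = np.exp(-dist)  # Bhattacharyya coefficient, in (0, 1]
    return np.sqrt(max(0.0, 1.0 - bc**2))

# Hypothetical sequence of proposal covariances that settles down over time.
scales = [4.0, 2.0, 1.5, 1.2, 1.1, 1.05]
zero_mean = np.zeros(2)
bounds = [
    tv_upper_bound(zero_mean, a * np.eye(2), zero_mean, b * np.eye(2))
    for a, b in zip(scales[:-1], scales[1:])
]
print(np.round(bounds, 4))
```

Each entry bounds how different two successive proposals can be. As the covariance updates become smaller, the bound shrinks toward zero, which is the behavior one hopes to see in the adaptation measure of a well-behaved adaptive simulation.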

We can plot the **AdaptationMeasure** for this sampling problem to ensure that the adaptation of the proposal distribution happens progressively less and less throughout the simulation. **If AdaptationMeasure does not diminish monotonically throughout the simulation, it would be a strong indication of the lack of convergence of the MCMC sampler to the target objective function**.

In [64]:
pmpd.markovChainList[0].plot.scatter(ycolumns="AdaptationMeasure",ccolumns=None)
pmpd.markovChainList[0].plot.scatter.currentFig.axes.set_yscale("log")
pmpd.markovChainList[0].plot.scatter.currentFig.axes.set_ylim([1.e-5,1])

ParaDRAM - NOTE: making the scatter plot... 

<IPython.core.display.Javascript object>

done in 0.304213 seconds.


(1e-05, 1)

**The above curve is exactly the kind of diminishing adaptation that we would want to see in a ParaDRAM simulation**: At the beginning, the sampler appears to struggle for a while to find the shape of the target objective function, but around step 100, it appears to have found the peak of the target and starts to quickly converge and adapt the shape of the proposal distribution of the Markov Chain sampler to the shape of the objective function.

### Visualizing the adaptation of the proposal distribution of the ParaDRAM sampler  

We can also parse the contents of the **ASCII**-format restart file generated by the ParaDRAM sampler to gain insight into how the proposal distribution has been updated over the course of the simulation.  

In [65]:
pmpd.readRestart()


ParaDRAM - NOTE: 1 files detected matching the pattern: "./sampling_multivariate_normal_density_function_via_paradram/mvn_serial*_restart.txt"


ParaDRAM - NOTE: processing restart file: D:\Dropbox\Projects\20180101_ParaMonte\fb\paramontex\Python\Jupyter\sampling_multivariate_normal_density_function_via_paradram\mvn_serial_process_1_restart.txt
ParaDRAM - NOTE: reading the file contents... .
done in 0.710959 seconds.
ParaDRAM - NOTE: adding the graphics tools... 
ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.002023 seconds.
ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a covmat2 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a covmat3 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: creating a cormat2 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE

In [66]:
pmpd.restartList[0].plot.cormat2.reset()
pmpd.restartList[0].plot.cormat2()

pmpd.restartList[0].plot.cormat2.reset()
pmpd.restartList[0].plot.cormat2(dimensionPair = [2,3])

ParaDRAM - NOTE: creating a cormat2 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the cormat2 plot... 

<IPython.core.display.Javascript object>

done in 0.381012 seconds.
ParaDRAM - NOTE: creating a cormat2 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the cormat2 plot... 

<IPython.core.display.Javascript object>

done in 0.298239 seconds.


In [67]:
pmpd.restartList[0].plot.covmat2.reset()
pmpd.restartList[0].plot.covmat2()

pmpd.restartList[0].plot.covmat2.reset()
pmpd.restartList[0].plot.covmat2(dimensionPair = [2,3])

ParaDRAM - NOTE: creating a covmat2 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the covmat2 plot... 

<IPython.core.display.Javascript object>

done in 0.316187 seconds.
ParaDRAM - NOTE: creating a covmat2 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the covmat2 plot... 

<IPython.core.display.Javascript object>

done in 0.318182 seconds.


In [68]:
pmpd.restartList[0].plot.covmat3.reset()
pmpd.restartList[0].plot.covmat3()

pmpd.restartList[0].plot.covmat3.reset()
pmpd.restartList[0].plot.covmat3(dimensionPair = [2,3])

ParaDRAM - NOTE: creating a covmat3 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the covmat3 plot... 

<IPython.core.display.Javascript object>

done in 0.268315 seconds.
ParaDRAM - NOTE: creating a covmat3 plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the covmat3 plot... 

<IPython.core.display.Javascript object>

done in 0.274266 seconds.


As seen in the above plots, the proposal adaptations gradually diminish toward the end of the simulation, and the shape of the proposal distribution stabilizes relatively fast.

### Visualizing the covariance and the correlation matrices of the objective function

To construct and visualize the correlation matrix of the sampled points in the chain, we can try, 

In [69]:
pmpd.sampleList[0].stats.cormat.plot.heatmap()

ParaDRAM - NOTE: making the heatmap plot... 

<IPython.core.display.Javascript object>

By default, the covariance and correlation matrices are precomputed at the time of reading the data. In particular, the correlation matrix is computed via **Pearson's method of computing the correlation**. We can easily recompute the correlation matrix using either **Spearman's** or **Kendall's** rank correlation coefficients, like,  

In [70]:
pmpd.sampleList[0].stats.cormat(method="spearman",reself=True).plot.heatmap()

ParaDRAM - NOTE: adding the correlation graphics tools... 
ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.014976 seconds.
ParaDRAM - NOTE: making the heatmap plot... 

<IPython.core.display.Javascript object>

The input argument `reself` requests the function to **re**turn an instance of the **self** object as the function output. To visualize the covariance matrix we can try,  

In [71]:
pmpd.sampleList[0].stats.covmat.plot.heatmap()

ParaDRAM - NOTE: making the heatmap plot... 

<IPython.core.display.Javascript object>
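To see how a rank-based correlation measure differs from Pearson's, the following sketch uses the `corr()` method of a plain pandas DataFrame on synthetic data, independently of ParaMonte. The data and the names `df`, `pearson`, and `spearman` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2024)
x = rng.standard_normal(2000)

# y is a strictly increasing but strongly nonlinear function of x.
df = pd.DataFrame({"x": x, "y": np.exp(x)})

pearson = df.corr(method="pearson")    # measures linear association
spearman = df.corr(method="spearman")  # measures monotonic association via ranks

print("Pearson :", round(pearson.loc["x", "y"], 3))
print("Spearman:", round(spearman.loc["x", "y"], 3))
```

Spearman's coefficient equals one here because the ranks of `x` and `y` coincide, whereas Pearson's coefficient is noticeably smaller because the relation is not linear. For a nearly Gaussian sample, such as the one in this notebook, the two measures are expected to give similar heatmaps.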

Note that the same plots can also be made for the compact and verbose (Markov) chains. For brevity, we do not show them here.  

>**There are many more functionalities and features of the ParaMonte library that were neither explored nor mentioned in this example Jupyter notebook. You can explore them by checking the existing components of each attribute of the ParaDRAM sampler class and by visiting the [ParaMonte library's documentation website](http://cdslab.org/paramonte/)**.