
In [24]:
# Import analysis packages
%matplotlib inline
import pystan as ps
import numpy as np
import pandas as pd
import patsy as pt
import seaborn as sns
import arviz as az
import matplotlib.pyplot as plt
import scipy.stats as ss

# The classic Pearson correlation
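
<font size = "3"> As a point of reference, the classic analysis can be sketched as below. This is a minimal illustration on simulated data (the variables `x` and `y`, the seed and the sample size are assumptions made for the example, not the study data). It computes Pearson's r with `scipy.stats.pearsonr` and checks it against the textbook definition, the covariance divided by the product of the standard deviations.</font>

```python
import numpy as np
import scipy.stats as ss

# Simulated data for illustration only (not the study data)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

# Pearson's r and the p-value for the null hypothesis of zero correlation
r, p_value = ss.pearsonr(x, y)

# The same r from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(r, r_manual, p_value)
```

<font size = "3"> The p-value here comes from the sampling distribution of r under the null hypothesis, which is the part the Bayesian model replaces.</font>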

# Bayesian inference
<font size = "3"> Following the quick description of the classic Pearson correlation above, it is important to keep in mind that all Bayesian inference is derived from the application of Bayes' rule, $P(\theta \mid y) = \large \frac{P(y \mid \theta) \, P(\theta)}{P(y)}$. As such, while the Bayesian model described below is an equivalent of the Pearson correlation, it is fundamentally different because it uses fully probabilistic modelling and its inference is not based on sampling distributions.</font>
    
<font size = "1"> For a fuller description see the Practicing Bayesian statistics markdown file within the GitHub repository.</font>
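
<font size = "3"> To make Bayes' rule concrete, the sketch below applies it on a grid of candidate correlations. It is an illustration only: the data are simulated, the prior is taken to be uniform over the grid, and the likelihood assumes standardized variables (means 0, variances 1). None of these choices come from the analysis in this notebook, which uses Stan rather than a grid.</font>

```python
import numpy as np

# Simulated, standardized bivariate data (illustration only); true correlation 0.6
rng = np.random.default_rng(42)
true_rho = 0.6
cov = np.array([[1.0, true_rho], [true_rho, 1.0]])
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200)
xy = (xy - xy.mean(axis=0)) / xy.std(axis=0, ddof=1)
x, y = xy[:, 0], xy[:, 1]

# Grid of candidate values for theta (here the correlation rho)
rho_grid = np.linspace(-0.99, 0.99, 199)

# log P(theta): uniform prior over the grid
log_prior = np.zeros_like(rho_grid)

# log P(y | theta): standard bivariate normal log-likelihood, constants dropped
one_minus_r2 = 1.0 - rho_grid**2
sxx, syy, sxy = np.sum(x**2), np.sum(y**2), np.sum(x * y)
log_lik = (-0.5 * len(x) * np.log(one_minus_r2)
           - (sxx - 2.0 * rho_grid * sxy + syy) / (2.0 * one_minus_r2))

# Bayes' rule: posterior is proportional to likelihood times prior;
# dividing by the sum over the grid plays the role of P(y)
log_post = log_lik + log_prior
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()

# Most probable grid value
rho_map = rho_grid[np.argmax(posterior)]
print(rho_map)
```

<font size = "3"> Subtracting the maximum before exponentiating only guards against numerical underflow; it cancels out when the posterior is normalized.</font>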

# Steps of Bayesian data analysis

<font size = "3"> Kruschke (2015) offers a step-by-step formulation for how to conduct a Bayesian analysis:

1. Identify the relevant data for the question under investigation.

2. Define the descriptive (mathematical) model for the data.

3. Specify the priors for the model. In scientific research, publication is the goal, so the priors must be acceptable to a skeptical audience. Much of this can be achieved by using prior predictive checks to ascertain whether the priors are reasonable.

4. Using Bayes' rule, estimate the posterior for the parameters of the model from the likelihood and the priors. Then interpret the posterior.

5. Conduct model checks, i.e. posterior predictive checks.</font> 

<font size = "1">This notebook will follow this approach generally.</font> 

In [49]:
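# Load the Dawtry, Sutton and Sibley (2015) data set from the GitHub repository
url = 'https://raw.githubusercontent.com/ebrlab/Statistical-methods-for-research-workers-bayes-for-psychologists-and-neuroscientists/master/Data/Dawtry%20Sutton%20and%20Sibley%202015.csv'

df = pd.read_csv(url)
# dropna returns a new DataFrame; the result is only displayed here, so df itself is unchanged
df.dropna(axis=0)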

Unnamed: 0,PS,PD_15,PD_30,PD_45,PD_60,PD_75,PD_90,PD_105,PD_120,PD_135,...,redist3,redist4,Household_Income,Political_Preference,age,gender,Population_Inequality_Gini_Index,Population_Mean_Income,Social_Circle_Inequality_Gini_Index,Social_Circle_Mean_Income
0,233,27,48,21,0,0,0,0,0,0,...,6,1,,5,40,2,38.782938,29715,28.056738,21150
1,157,39,0,0,0,0,0,0,0,0,...,3,4,20,5,59,2,37.214511,123630,24.323388,65355
2,275,0,0,50,0,0,50,0,0,0,...,5,5,100,5,41,2,20.750000,60000,14.442577,107100
3,111,9,14,17,17,17,8,7,5,2,...,3,4,150,8,59,2,35.379580,59355,26.925900,86640
4,52,68,32,0,0,0,0,0,0,0,...,4,5,500,5,35,1,16.875000,15360,21.401055,56850
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
300,195,5,6,13,5,14,12,10,14,10,...,4,5,275000,4,25,1,29.711712,83250,12.583830,113175
301,101,7,13,19,23,14,11,4,3,3,...,4,4,350000,4,43,1,32.753267,57390,22.719525,139050
302,62,26,22,17,12,9,6,2,2,2,...,6,2,,5,66,1,42.527390,41895,8.557377,91500
303,227,66,20,14,0,0,0,0,0,0,...,3,3,,5,26,2,25.555178,17670,20.927580,24855


# Step 2 - Define the descriptive statistical model

\begin{align*}
y_i &\sim \text{MultivariateNormal}(\mu, \Sigma)
\\ \mu_k &\sim \text{Normal}(0, 1)
\\ \Sigma &= \text{diag}(\sigma) \, \rho \, \text{diag}(\sigma)
\\ \sigma_k &\sim \text{Lognormal}(0, 1)
\\ \rho &\sim \text{LKJ}(1)
\end{align*}
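
<font size = "3"> The model can be read generatively, which also gives a simple prior predictive simulation. The sketch below assumes two variables (K = 2) and reads $\Sigma$ as the correlation matrix scaled on both sides by the standard deviations, which is what Stan's `quad_form_diag(rho, sigma)` computes. For K = 2 the LKJ(1) prior reduces to a uniform distribution on (-1, 1) for the single correlation, so it is drawn that way here. The seed, K and the number of simulated rows are arbitrary choices for illustration.</font>

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2  # two variables, so rho has a single free correlation

# One draw from each prior
mu = rng.normal(0.0, 1.0, size=K)        # mu ~ Normal(0, 1)
sigma = rng.lognormal(0.0, 1.0, size=K)  # sigma ~ Lognormal(0, 1)
r = rng.uniform(-1.0, 1.0)               # for K = 2, LKJ(1) is uniform on (-1, 1)
rho = np.array([[1.0, r], [r, 1.0]])

# Covariance matrix: correlation matrix scaled by the standard deviations
Sigma = np.diag(sigma) @ rho @ np.diag(sigma)

# Simulate data from this prior draw (one prior predictive data set)
y_sim = rng.multivariate_normal(mu, Sigma, size=100)
print(Sigma)
```

<font size = "3"> Repeating the draw many times and inspecting the simulated data is one way to judge whether the priors are reasonable on the standardized scale.</font>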



# Step 4 - Use Bayes' rule

# Stan model for multiple correlation estimation

In [22]:
Correlation_model = """
data {
  int<lower = 0> N;  // number of observations
  int K;             // number of variables
  matrix[N, K] y;    // observed data
}

transformed data {
  // Standardize each column to mean 0 and standard deviation 1
  matrix[N, K] y_std = y;

  for (i in 1:K) {
    y_std[, i] = (y[, i] - mean(y[, i])) / sd(y[, i]);
  }
}

parameters {
  vector[K] mu;
  vector<lower = 0>[K] sigma;

  // Correlation matrix
  corr_matrix[K] rho;
}

model {
  // Covariance matrix
  matrix[K, K] Z;
  Z = quad_form_diag(rho, sigma);

  // Priors
  // Stan's std_normal() is a more efficient implementation of normal(0, 1)
  mu ~ std_normal();
  sigma ~ lognormal(0, 1);

  // Uniform prior over correlation matrices
  rho ~ lkj_corr(1);

  // Likelihood
  for (i in 1:N) {
    y_std[i, ] ~ multi_normal(mu, Z);
  }
}
"""

In [23]:
sm = ps.StanModel(model_code=Correlation_model)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_9b16b85a03d777d0f7a4bd0fbd10ba3d NOW.
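
<font size = "3"> The `transformed data` block of the Stan model standardizes each column of `y` before it enters the likelihood. A NumPy equivalent is sketched below on a small made-up matrix. It assumes Stan's `sd` is the sample standard deviation (divisor N - 1), which `ddof=1` mirrors.</font>

```python
import numpy as np

# Small made-up matrix for illustration: N = 4 rows, K = 2 columns
y = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0],
              [4.0, 40.0]])

# Column-wise z-scores; ddof=1 gives the sample standard deviation (divisor N - 1)
y_std = (y - y.mean(axis=0)) / y.std(axis=0, ddof=1)

# After standardizing, the covariance matrix equals the correlation matrix
print(np.cov(y_std, rowvar=False))
print(np.corrcoef(y, rowvar=False))
```

<font size = "3"> Because the standardized columns have unit variance, the posterior for `sigma` should sit close to 1 and the interest lies in `rho`.</font>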


In [44]:
# Build the data matrix with patsy; the "1 +" term adds an intercept column of ones
x = np.asmatrix(pt.dmatrix("~ 1 + Household_Income + Population_Mean_Income", data=df))

# Data passed to Stan: N rows, K columns and the matrix y
data = {'N': len(x),
        'K': x.shape[1],
        'y': x}
data

{'N': 305,
 'K': 113,
 'y': matrix([[1.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00, 0.0000e+00,
          2.9715e+04],
         [1.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00, 0.0000e+00,
          1.2363e+05],
         [1.0000e+00, 1.0000e+00, 0.0000e+00, ..., 0.0000e+00, 0.0000e+00,
          6.0000e+04],
         ...,
         [1.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00, 0.0000e+00,
          4.1895e+04],
         [1.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00, 0.0000e+00,
          1.7670e+04],
         [1.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00, 0.0000e+00,
          4.9695e+04]])}
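
<font size = "3"> The output above reports K = 113 rather than 2. That suggests `Household_Income` was not read as a numeric column and patsy expanded it into indicator columns. The formula also adds an intercept column of ones, whose standard deviation is zero, so the standardization in the Stan model would divide by zero for that column. A possible alternative is sketched below, assuming the goal is a two-column numeric matrix. The helper `numeric_matrix` and the toy data frame are hypothetical and stand in for `df`; the values are made up.</font>

```python
import numpy as np
import pandas as pd

def numeric_matrix(frame, columns):
    """Return the chosen columns as a float array, dropping rows with missing values."""
    # Coerce anything non-numeric to NaN, then drop the affected rows
    numeric = frame[columns].apply(pd.to_numeric, errors="coerce").dropna()
    return numeric.to_numpy(dtype=float)

# Tiny hypothetical frame standing in for df (values are made up)
toy = pd.DataFrame({
    "Household_Income": ["20", "unknown", "100", "150"],
    "Population_Mean_Income": [123630, 29715, 60000, 59355],
})

y = numeric_matrix(toy, ["Household_Income", "Population_Mean_Income"])

# Data dictionary in the shape the Stan model expects: no intercept column
data = {"N": y.shape[0], "K": y.shape[1], "y": y}
print(data["N"], data["K"])
```

<font size = "3"> With the real data the same call would be `numeric_matrix(df, ["Household_Income", "Population_Mean_Income"])`, provided the non-numeric entries are indeed meant to be treated as missing.</font>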