
# <font color="darkblue">Numeric / Metric Data Model

- Model building from Null model to a possible parsimonious model

- Response variable is numeric and Normal model is assumed

- Data set: **bweight**

## <font color="darkgreen">Variables and their meaning</font>

1. id:	      identity number

1. matage:	  maternal age (years)

1. ht:	      hypertension (1=yes,	0=no)

1. gestwks:	  gestational age (weeks)

1. sex:	      sex of the baby

1. bweight:	  birthweight(g)

1. matagegp:  maternal age into four groups (<30, 30-34, 35-39, 40+)

1. gestcat:	  gestwks into two groups (<37, >=37)

## <font color="darkgreen">Observations

- *matage* is numerical whereas *matagegp* categorical

- *gestwks* is numerical and *gestcat* is categorical

- *ht, sex* are dichotomous

## <font color="darkgreen">Practical Significance

- <font color="red">Cited only to illustrate the practical impact of the problem, the role of data, and white-box modeling. The authenticity or accuracy of the results and analysis methods in these papers is neither endorsed nor recommended.</font>

[Practical 1](https://pubmed.ncbi.nlm.nih.gov/32866126/)

[Practical 2](https://pubmed.ncbi.nlm.nih.gov/28767987/)

[Global Nutrition Targets 2025](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwjSy9mFjfb0AhUHILcAHcswBesQFnoECAsQAQ&url=https%3A%2F%2Fapps.who.int%2Firis%2Frest%2Fbitstreams%2F665595%2Fretrieve&usg=AOvVaw1ZHSLlF_5nhL30e2oxEFbx)

[Indian 1](https://pubmed.ncbi.nlm.nih.gov/33432318/)

[Indian 2](https://epag.springeropen.com/articles/10.1186/s43054-020-00040-0)

[Indian 3](https://www.sciencedirect.com/science/article/pii/S221339842030230X)

[Indian 4](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244562)

## <font color="darkgreen"> Possible Questions Driving Analysis

- Behaviour of the variables

- Gender difference

- Effect of Maternal Age

- Gestational week association

- Effect of Hypertension

In [None]:
import numpy as np
import pandas as pd
import statistics as stat
import scipy
import pystan

In [None]:
#For plots
import arviz as az
import matplotlib.pyplot as plt

# <font color="darkblue"> Loading Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
path = "/content/drive/MyDrive/Data Sets/BirthWt.csv"
brtwt_da = pd.read_csv(path)

In [None]:
brtwt_da.info()

In [None]:
print(brtwt_da.dtypes)

In [None]:
print(np.mean(brtwt_da['bweight']),np.std(brtwt_da['bweight']))

3129.13728549142 652.2732622220477


# <font color="darkblue">Role of Constant only Model (Normal / Linear)

- Estimates the sample mean of the response variable

- Estimates the sample SD of the response variable

- Estimated values do not depend on the predictors included in the data

- This is the unconditional mean $E[Y]$
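
A quick numerical check of the first bullet, using a handful of made-up weights rather than the bweight data: among all constants, the sample mean gives the smallest sum of squared errors, which is why a constant-only Normal model targets it. The values and the helper name `sse` are illustrative assumptions.

```python
import statistics

# Hypothetical birth weights in grams, for illustration only
y = [2410, 2977, 2100, 3270, 2620]

def sse(c, values):
    """Sum of squared errors when every observation is predicted by the constant c."""
    return sum((v - c) ** 2 for v in values)

y_bar = statistics.mean(y)

# Compare the sample mean with a few nearby constants
candidates = [y_bar - 50, y_bar - 1, y_bar, y_bar + 1, y_bar + 50]
best = min(candidates, key=lambda c: sse(c, y))

print(y_bar, best)  # the best candidate is the sample mean itself
```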

# <font color="darkblue"> Model 1. Normal Model without Predictor

## Response Variable: *bweight*

# STAN Code

In [None]:
brtwt_code1 = """
data {
    real a;
    real<lower=0> b;
    real<lower=0> g1;
    real<lower=0> g2;
    int<lower=0> n;
    real y[n];
}

parameters {
    real mu;
     real<lower=0> sig;
}

transformed parameters {
  real<lower=0> tau;
  tau=(1/sig)^2;
}

model {
      y ~ normal(mu, sig);
      mu ~ normal(a, b);
      tau ~ gamma(g1,g2);
}
"""
# posterior
posterior1 = pystan.StanModel(model_code=brtwt_code1)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_7724acc493ed1d41d8cb912abca05633 NOW.


# Input - data and values for prior parameters

In [None]:
brtwt_data = {
             'n': len(brtwt_da),
             'y': brtwt_da['bweight'],
             'a':3000,
             'b':10,
             'g1':3,
             'g2':1,
            }
print(brtwt_data)

# Model Fitting - Sampling

In [None]:
fit_model1= posterior1.sampling(data=brtwt_data,
                  iter=10000,
                  chains=4,
                  seed=1,
                  warmup=3000,
                  thin=1,
                  control={"max_treedepth":15,"adapt_delta" : 0.9999})

# <font color="darkorange"> Condensed Summary Report

In [None]:
summ_mod1=az.summary(fit_model1,round_to=3,hdi_prob=0.95)
summ_mod1

Unnamed: 0,mean,sd,hdi_2.5%,hdi_97.5%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
mu,3016.576,9.324,2998.758,3035.099,0.076,0.054,14962.766,13704.054,1.0
sig,661.426,18.56,624.901,696.917,0.148,0.105,15790.633,14083.477,1.0
tau,0.0,0.0,0.0,0.0,0.0,0.0,15790.633,14083.477,1.0
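
The posterior mean of `mu` (3016.576) sits much closer to the prior mean (3000) than to the sample mean (3129.137). A rough check with the known-variance Normal-Normal update formula shows why. This is only a sketch: it fixes `sig` at its posterior mean, which is an approximation, and the sample size `n = 641` is an assumed placeholder because the notebook does not print it.

```python
import math

# Values taken from the notebook: prior mu ~ Normal(3000, 10),
# sample mean 3129.137, posterior mean of sig about 661.426
prior_mean, prior_sd = 3000.0, 10.0
y_bar, sigma = 3129.137, 661.426

# ASSUMPTION: n is not shown in the notebook; 641 is a placeholder.
# Replace it with len(brtwt_da).
n = 641

prior_prec = 1.0 / prior_sd ** 2   # precision of the prior on mu
data_prec = n / sigma ** 2         # precision contributed by the data

post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * y_bar) / post_prec
post_sd = math.sqrt(1.0 / post_prec)

print(round(post_mean, 1), round(post_sd, 2))
```

With the assumed `n`, the result is close to the `mu` row of the summary (mean 3016.576, sd 9.324). The prior precision is several times larger than the data precision, so the Normal(3000, 10) prior dominates the estimate.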


# <font color="darkorange">More From Posterior Draws

Bayesian methods make it possible to estimate probabilities associated with the quantity of interest (QoI)

For example, in the birth weight example, the interest may be in the posterior probability that the mean birth weight lies between 2500 and 2999 grams, or that it is less than 3000 g

That is

1. $Pr[2500 < (\theta|X) < 2999]$

1. $Pr[(\theta|X) < 3000]$


Samples collected from the MCMC chains can be used to estimate the above (and similar) quantities



In [None]:
# Extract posterior draws of mu and sig (warmup excluded, chains merged and permuted)
sample_extr=fit_model1.extract(pars=["mu","sig"], permuted = True)
sample_extr=pd.DataFrame(sample_extr)
sample_extr

Unnamed: 0,mu,sig
0,3021.554892,651.759261
1,3017.116178,645.353492
2,3006.295682,646.262231
3,3009.509222,658.226200
4,3034.910322,677.983490
...,...,...
27995,3015.341281,654.168391
27996,3003.863133,635.465051
27997,3035.567182,681.273987
27998,3025.677064,661.957567


In [None]:
#Finding mean from the extracted samples
np.mean(sample_extr['mu'])

3016.576090315453

In [None]:
L=2500

U=2999

rand_gen_mu=sample_extr['mu']

prob_est=np.count_nonzero((rand_gen_mu>L) & (rand_gen_mu < U))

round(prob_est/len(sample_extr),4)

#np.count_nonzero(rand_gen_mu < U)/len(sample_extr)


0.0283
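
The counting step above can be wrapped in a small reusable helper. The sketch below uses stand-in draws from a Normal approximation to the reported posterior of `mu` (mean 3016.576, sd 9.324), so that it runs without the fitted model. The helper name `interval_prob` is hypothetical, and with the real fit the draws would come from `sample_extr['mu']`.

```python
import random

def interval_prob(draws, lower=float("-inf"), upper=float("inf")):
    """Fraction of draws strictly inside (lower, upper)."""
    inside = sum(1 for d in draws if lower < d < upper)
    return inside / len(draws)

# Stand-in draws, NOT actual MCMC output: Normal approximation to the
# posterior of mu reported in the summary (mean 3016.576, sd 9.324).
# 28000 = 4 chains x (10000 - 3000) post-warmup iterations.
rng = random.Random(1)
mu_draws = [rng.gauss(3016.576, 9.324) for _ in range(28000)]

p_between = interval_prob(mu_draws, 2500, 2999)  # Pr[2500 < mu < 2999]
p_below = interval_prob(mu_draws, upper=3000)    # Pr[mu < 3000]

print(round(p_between, 4), round(p_below, 4))
```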

# <font color="darkblue"> Model 2. Normal Model with One Predictor

## Mathematical model

$$Y\sim\mathrm{Normal}(\mu,\sigma^2)$$

$$\mu=b_0+b_1X_1$$

$$b_0\sim\mathrm{Normal}(a_i,b_i)$$

$$b_1\sim\mathrm{Normal}(a_{p1},b_{p1})$$

$$\sigma^2\sim\mathrm{Inverse\ Gamma}(g_1,g_2)$$

## <font color="Green"> More about the symbols

**Data: Observed information**

- $Y:$  Response variable

- $X:$ Predictor variable

**Parameters: Model Estimates**

- $\mu:$ Population mean, QoI

- $\sigma^2:$ Population variance, QoI (partially)

  - Reparameterized as $\tau = \frac{1}{\sigma^2}$, the precision

**Data: Values supplied by the modeler**

- $a:$ Mean parameters of priors on regression weights

- $b:$ Variance parameters of priors on regression weights

  - Suffix indicates the respective weights (intercept or predictor)

  - In STAN, Normal distribution has standard deviation as the argument (input), not variance. Symbols in the code are meant in that way and constants are supplied accordingly

- $g:$ Parameters of Gamma prior on $\tau = \frac{1}{\sigma^2}$
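
The Gamma prior on the precision and the Inverse Gamma prior on the variance describe the same assumption. A short change-of-variables sketch, assuming the shape-rate form of the Gamma density (the form STAN uses) and writing the precision as $\tau$:

$$\tau\sim\mathrm{Gamma}(g_1,g_2):\quad p(\tau)=\frac{g_2^{g_1}}{\Gamma(g_1)}\,\tau^{g_1-1}e^{-g_2\tau}$$

$$\sigma^2=\frac{1}{\tau}\ \Rightarrow\ p(\sigma^2)=p(\tau)\left|\frac{d\tau}{d\sigma^2}\right|=\frac{g_2^{g_1}}{\Gamma(g_1)}\,(\sigma^2)^{-(g_1+1)}e^{-g_2/\sigma^2}$$

The last expression is the $\mathrm{Inverse\ Gamma}(g_1,g_2)$ density for $\sigma^2$.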


<font color="red">**The problem of interest is to estimate $E[Y|X]$**

In [None]:
brtwt_code2 = """
data {
    real a_i;
    real<lower=0> b_i;
    real a_p1;
    real<lower=0> b_p1;
    real<lower=0> g1;
    real<lower=0> g2;
    int<lower=0> n;
    real y[n];
    vector[n] x;
}

parameters {
    real b0;
    real b1;
    real<lower=0> sig;
}

transformed parameters {
  vector[n] mu;
  real<lower=0> tau;

  mu=b0+b1*x;
  tau=(1/sig)^2;
}

model {
      y ~ normal(mu, sig);
      b0 ~ normal(a_i, b_i);
      b1 ~ normal(a_p1, b_p1);
      tau ~ gamma(g1,g2);
}
"""
# posterior
posterior2 = pystan.StanModel(model_code=brtwt_code2)

In [None]:
brtwt_data2 = {
             'n': len(brtwt_da),
             'x': brtwt_da['gestwks'],
             'y': brtwt_da['bweight'],
             'a_i':0,
             'b_i':10,
             'a_p1':3000,
             'b_p1':10,
             'g1':3,
             'g2':1,
            }
print(brtwt_data2)

In [None]:
fit_model2= posterior2.sampling(data=brtwt_data2,
                  iter=10000,
                  chains=4,
                  seed=1,
                  warmup=3000,
                  thin=1,
                  control={"max_treedepth":15,"adapt_delta" : 0.9999})

In [None]:
az.plot_trace(fit_model2,var_names=['~mu'], compact=False,legend=True)
plt.show()

In [None]:
# QoI "b0"
az.plot_dist(fit_model2['b0'],quantiles=[.25, .5, .75],kind="hist",figsize=(20, 6))
plt.show()

In [None]:
#QoI "theta=b1"
az.plot_dist(fit_model2['b1'],quantiles=[0.25, 0.5, 0.75],kind="kde",figsize=(20, 6))
plt.show()

# <font color="darkorange"> Condensed Summary Report

In [None]:
summ_mod2=az.summary(fit_model2,var_names=['~mu','~tau'],round_to=3,hdi_prob=0.95)
summ_mod2

Unnamed: 0,mean,sd,hdi_2.5%,hdi_97.5%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
b0,-0.598,10.036,-19.65,19.201,0.07,0.064,20788.92,17589.84,1.001
b1,2977.883,10.013,2958.776,2998.167,0.07,0.049,20476.369,17436.925,1.0
sig,112078.153,3125.144,106059.883,118247.357,21.55,15.262,21089.202,17541.072,1.0


# <font color="darkblue"> Meaning of Estimated Parameters


## <font color="darkred">Model equation is </font>

bweight = -0.598 + 2977.883 * gestwks

## <font color="darkred">Interpretation</font>

<font color="darkgreen">Coefficient of "gestwks":

A one-week increase in gestwks increases the mean "bweight" by 2977.883 g

<font color="blue">Plus sign indicates increment</font>

<font color="darkgreen">Constant / Intercept:</font>

At gestwks = 0 (the start of gestation, or no gestation), the estimated mean bweight is -0.598 g

# <font color="darkblue"> Possible Impracticality in Estimated Parameters

The above estimates are too implausible to be of practical use

There is more than one remedy or alternative modeling approach

- Change the priors for $b_0$ and $b_1$

- Center or scale the numeric variable *gestwks* so that the intercept has a more meaningful interpretation

# Prior for Intercept is $\mathrm{Normal}(0,1)$ (why?)

# Prior for predictor *gestwks* is $\mathrm{Normal}(100,10)$ (why?)

In [None]:
brtwt_data2A = {
             'n': len(brtwt_da),
             'x': brtwt_da['gestwks'],
             'y': brtwt_da['bweight'],
             'a_i':0,
             'b_i':1,
             'a_p1':100,
             'b_p1':10,
             'g1':3,
             'g2':1,
            }
print(brtwt_data2A)

In [None]:
fit_model2A= posterior2.sampling(data=brtwt_data2A,
                  iter=10000,
                  chains=4,
                  seed=1,
                  warmup=3000,
                  thin=1,
                  control={"max_treedepth":15,"adapt_delta" : 0.9999})

In [None]:
summ_mod2A=az.summary(fit_model2A,var_names=['~mu','~tau'],round_to=3,hdi_prob=0.95)
summ_mod2A

Unnamed: 0,mean,sd,hdi_2.5%,hdi_97.5%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
b0,-0.044,1.006,-2.046,1.886,0.007,0.007,22395.997,16209.208,1.001
b1,81.389,0.537,80.35,82.461,0.004,0.003,22461.804,17468.93,1.0
sig,528.405,14.696,500.425,557.737,0.101,0.072,21127.258,16336.094,1.0
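
To see what the two sets of estimates imply in practice, the posterior means from both summary tables can be plugged into $\mu=b_0+b_1X_1$. The gestational age of 38 weeks is a hypothetical value chosen only for illustration, and `mean_bweight` is a hypothetical helper name.

```python
def mean_bweight(b0, b1, gestwks):
    """Linear predictor mu = b0 + b1 * gestwks, in grams."""
    return b0 + b1 * gestwks

weeks = 38  # hypothetical gestational age, for illustration only

# Posterior means copied from the two summary tables
first_fit = mean_bweight(-0.598, 2977.883, weeks)   # prior b1 ~ Normal(3000, 10)
second_fit = mean_bweight(-0.044, 81.389, weeks)    # prior b1 ~ Normal(100, 10)

print(round(first_fit, 1), round(second_fit, 1))
```

The first fit implies a mean birth weight above 100 kg at 38 weeks, while the second gives a value of the same order as the sample mean of about 3129 g.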


# <font color="darkblue"> Centering a Numeric Predictor

For a numeric predictor, it is usually preferable to center or scale the predictor, which has some advantages in the modeling process

Centering a variable helps in interpreting the estimated weights ($\beta$s) in a more reasonable way.

Another advantage is that, in some cases, the initial value (zero) of the predictor may not be a plausible value

In the Birth Weight example, birth weight at the start of gestation (gestwks = 0) is not a meaningful quantity; so, it may not be appropriate to interpret the constant in the model in the usual way, as the mean response when all predictors are held at zero

The centering process is explained in the following description for a numeric predictor $X$

- Consider the sample mean $\bar X$ of $X$

- Let $X_{centered}=X-\bar{X}$

- Use the new variable $X_{centered}$ in the model

- Resultant model equation is $$\hat{E[Y|X]}=\hat{\beta_0}+\hat{\beta_1}X_{centered}$$ $$=\hat{\beta_0}+\hat{\beta_1}(X-\bar X)$$

- Hence, $\beta_0$ is now interpreted as the mean response when $X$ is held at its sample mean $\bar X$
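
The effect of the centering steps above can be checked with an ordinary least-squares fit on a few made-up (gestwks, bweight) pairs. This is a sketch only: the data and the helper `fit_line` are illustrative assumptions, and least squares stands in for the Bayesian fit.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical (gestwks, bweight) pairs, for illustration only
x = [35.0, 37.0, 38.0, 39.0, 40.0, 41.0]
y = [2400.0, 2900.0, 3100.0, 3300.0, 3400.0, 3600.0]

x_centered = [xi - statistics.mean(x) for xi in x]

b0_raw, b1_raw = fit_line(x, y)
b0_cen, b1_cen = fit_line(x_centered, y)

print(b1_raw, b1_cen)               # slope is unchanged by centering
print(b0_raw)                       # raw intercept: value extrapolated to X = 0
print(b0_cen, statistics.mean(y))   # centered intercept equals the mean of y
```

One consequence worth keeping in mind: after centering, the intercept is on the scale of a typical birth weight, so a prior for it is more naturally placed near that value than near zero.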



# <font color="darkblue"> Numeric variable Centered

In [None]:
brtwt_data2B = {
             'n': len(brtwt_da),
             'x': brtwt_da['gestwks']-np.mean(brtwt_da['gestwks']),
             'y': brtwt_da['bweight'],
             'a_i':0,
             'b_i':1,
             'a_p1':30,
             'b_p1':10,
             'g1':3,
             'g2':1,
            }
print(brtwt_data2B)

In [None]:
fit_model2B= posterior2.sampling(data=brtwt_data2B,
                  iter=10000,
                  chains=4,
                  seed=1,
                  warmup=3000,
                  thin=1,
                  control={"max_treedepth":15,"adapt_delta" : 0.9999})

In [None]:
summ_mod2B=az.summary(fit_model2B,var_names=['~mu','~tau'],round_to=3,hdi_prob=0.95)
summ_mod2B

# <font color="darkblue"> Normal Model with one Binary predictor

In this data set, *sex* is a binary variable


In [None]:
brtwt_da=brtwt_da.assign(sex_c=lambda x:x['sex'].apply(lambda y: 1 if y=="male" else 0))
brtwt_da['sex_c'] = brtwt_da['sex_c'].astype('category')

In [None]:
brtwt_code3 = """
data {
    real a_i;
    real<lower=0> b_i;
    real a_p1;
    real<lower=0> b_p1;
    real<lower=0> g1;
    real<lower=0> g2;
    int<lower=0> n;
    real y[n];
    vector[n] x;
}

parameters {
    real b0;
    real b1;
    real<lower=0> sig;
}

transformed parameters {
  vector[n] mu;
  real<lower=0> tau;
  mu=b0+b1*x;
  tau=(1/sig)^2;
}

model {
      y ~ normal(mu, sig);
      b0 ~ normal(a_i, b_i);
      b1 ~ normal(a_p1, b_p1);
      tau ~ gamma(g1,g2);
}
"""
# posterior
posterior3 = pystan.StanModel(model_code=brtwt_code3)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_26ecf12753bb02faa64b0009a4a8e8ff NOW.


# <font color="darkblue"> A word about prior setting

- Intercept (constant, $b_0$) is the mean response value for the base or reference level

- Weight associated with the binary predictor is the difference in the mean of response between two levels

In [None]:
brtwt_data3 = {
             'n': len(brtwt_da),
             'x': brtwt_da['sex_c'],
             'y': brtwt_da['bweight'],
             'a_i':3000,
             'b_i':10,
             'a_p1':100,
             'b_p1':10,
             'g1':3,
             'g2':1,
            }
print(brtwt_data3)

In [None]:
fit_model3= posterior3.sampling(data=brtwt_data3,
                  iter=10000,
                  chains=4,
                  seed=1,
                  warmup=3000,
                  thin=1,
                  control={"max_treedepth":15,"adapt_delta" : 0.9999})

In [None]:
summ_mod3=az.summary(fit_model3,var_names=['~mu','~tau'],round_to=3,hdi_prob=0.95)
summ_mod3

# <font color="darkblue">Inference From the Model

## Categorical (Binary) Predictors

**Model equation is**

bweight = 3009.887 + 107.418	* sexmale

Female is base (reference) level

**More Observations about the estimated values**

Mean bweight of female group: 3009.887

Mean bweight of male group: 3009.887 + 107.418 = 3117.305

<font color="darkgreen">In other words, the estimated weight associated with *sex* denotes the difference in mean *bweight* of males compared to females

Difference:  3117.305 - 3009.887 = 107.418

$b_1$ measures the difference in mean of the response variable (bweight) between the two levels of categorical predictor

<font color="darkred"> The constant (intercept) $b_0$ is the mean of the response (bweight) for the reference level (female)
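
The relationship between the coefficients and the group means can be verified on a tiny made-up data set. The weights below are hypothetical, and least squares is used in place of the Bayesian fit, so this only illustrates the interpretation and not the posterior estimates.

```python
import statistics

# Hypothetical birth weights (g) by sex, for illustration only
female = [2900.0, 3000.0, 3100.0]
male = [3050.0, 3150.0, 3250.0]

y = female + male
x = [0] * len(female) + [1] * len(male)  # sex_c coding: 0 = female (base), 1 = male

# Least-squares fit of y = b0 + b1 * x
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(b0, statistics.mean(female))                          # intercept = base-level mean
print(b1, statistics.mean(male) - statistics.mean(female))  # slope = difference in means
```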


In [None]:
#Average bweight grouped by sex from the Sample

op1=brtwt_da.groupby('sex_c').agg({'bweight': ['mean']})

op2=brtwt_da.groupby('sex').agg({'bweight': ['mean']})

print(op1,op2)

In [None]:
#Four Age groups

brtwt_da.agg({'matagegp': ['max','min']})

# op2=brtwt_da.groupby('sex').agg({'bweight': ['mean']})

# print(op1,op2)

Unnamed: 0,gestcat
max,2
min,1


In [None]:
brtwt_da['matagegp'] = brtwt_da['matagegp'].astype('category')
brtwt_da.groupby('matagegp').agg({'bweight': ['mean']})

Unnamed: 0_level_0,bweight
Unnamed: 0_level_1,mean
matagegp,Unnamed: 1_level_2
1,3102.326087
2,3137.74502
3,3132.883721
4,3112.625
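
With a four-level predictor, dummy coding leaves one level out as the base, and each remaining coefficient is a difference from that base. The sketch below derives those sample-based differences from the group means in the table above, assuming group 1 is the base level. These are not posterior estimates, which also depend on the priors.

```python
# Sample mean bweight (g) per maternal age group, copied from the table above
group_means = {1: 3102.326087, 2: 3137.74502, 3: 3132.883721, 4: 3112.625}

base = 1  # reference level: the group whose dummy column is left out of the model

# Intercept corresponds to the base-level mean
b0 = group_means[base]

# Each remaining coefficient corresponds to (group mean - base-level mean)
contrasts = {g: m - b0 for g, m in group_means.items() if g != base}

print(round(b0, 3))
for g, diff in contrasts.items():
    print(f"matag_{g}: {diff:+.3f}")
```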


In [None]:
brtwt_da=pd.get_dummies(brtwt_da, columns=['matagegp'], prefix = ['matag'])
brtwt_da

In [None]:
brtwt_code4 = """
data {
    real a_i;
    real<lower=0> b_i;
    real a_p1;
    real<lower=0> b_p1;
    real a_p2;
    real<lower=0> b_p2;
    real a_p3;
    real<lower=0> b_p3;
    real<lower=0> g1;
    real<lower=0> g2;
    int<lower=0> n;
    real y[n];
    vector[n] x1;
    vector[n] x2;
    vector[n] x3;
}

parameters {
    real b0;
    real b1;
    real b2;
    real b3;
    real<lower=0> sig;
}

transformed parameters {
  vector[n] mu;
  real<lower=0> tau;
  mu=b0+b1*x1+b2*x2+b3*x3;
  tau=(1/sig)^2;
}

model {
      y ~ normal(mu, sig);
      b0 ~ normal(a_i, b_i);
      b1 ~ normal(a_p1, b_p1);
      b2 ~ normal(a_p2, b_p2);
      b3 ~ normal(a_p3, b_p3);
      tau ~ gamma(g1,g2);
}
"""
# posterior
posterior4 = pystan.StanModel(model_code=brtwt_code4)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_f853e5096ade87630b35883aa61e7823 NOW.


In [None]:
brtwt_data4 = {
             'n': len(brtwt_da),
             'x1': brtwt_da['matag_2'],
             'x2': brtwt_da['matag_3'],
             'x3': brtwt_da['matag_4'],
             'y': brtwt_da['bweight'],
             'a_i':3000,
             'b_i':20,
             'a_p1':100,
             'b_p1':10,
             'a_p2':100,
             'b_p2':10,
             'a_p3':100,
             'b_p3':10,
             'g1':3,
             'g2':1,
            }
print(brtwt_data4)

In [None]:
fit_model4= posterior4.sampling(data=brtwt_data4,
                  iter=10000,
                  chains=4,
                  seed=1,
                  warmup=3000,
                  thin=1,
                  control={"max_treedepth":15,"adapt_delta" : 0.9999})

In [None]:
summ_mod4=az.summary(fit_model4,var_names=['~mu','~tau'],round_to=3,hdi_prob=0.95)
summ_mod4

In [None]:
az.plot_trace(fit_model4,var_names=['~mu'], compact=False,legend=True)
plt.show()

# <font color="darkblue"> Another Model</font>

All predictors are included

<font color="darkgreen">Recall

1. id	      identity number
1. matage	  maternal age (years)
1. ht	      hypertension (1=yes,	0=no)
1. gestwks	  gestational age (weeks)
1. sex	      sex of the baby
1. bweight	  birthweight(g)
1. matagegp  maternal age into four groups (<30, 30-34, 35-39, 40+)
1. gestcat	  gestwks into two groups (<37, >=37)

- Either *matage* (numerical) or *matagegp* (categorical) can be used

- Either *gestwks* (numerical) or *gestcat* (categorical) can be used

- *ht, sex* are dichotomous

- Build a model with all the predictors with the above observations

$$\mathrm{bweight}=\beta_0+\beta_1\,\mathrm{matage}+\beta_2\,\mathrm{ht}+\beta_3\,\mathrm{gestwks}+\beta_4\,\mathrm{sex}$$

- Treat the categorical variables before modeling

# <font color="darkorange">Treatment of Categorical variables

In the model *ht* and *sex* are dichotomous



In [None]:
brtwt_da['ht'] = brtwt_da['ht'].astype('category')
brtwt_da['sex'] = brtwt_da['sex'].astype('category')

In [None]:
brtwt_da.boxplot(column = 'bweight',by = 'sex')
plt.show()

In [None]:
brtwt_da=pd.get_dummies(brtwt_da, columns=['sex'], prefix = ['sex'])
brtwt_da=pd.get_dummies(brtwt_da, columns=['ht'], prefix = ['ht'])

In [None]:
brtwt_da.head()

Unnamed: 0,id,matage,gestwks,bweight,matagegp,gestcat,sex_female,sex_male,ht_no,ht_yes
0,1,33,37.740002,2410,2,2,1,0,1,0
1,2,34,39.150002,2977,2,2,1,0,1,0
2,3,34,35.720001,2100,2,1,1,0,1,0
3,4,30,39.290001,3270,2,2,0,1,1,0
4,5,35,38.380001,2620,3,2,1,0,1,0


In [None]:
brtwt_code5 = """
data {
    real a_i;
    real<lower=0> b_i;
    real a_p1;
    real<lower=0> b_p1;
    real a_p2;
    real<lower=0> b_p2;
    real a_p3;
    real<lower=0> b_p3;
    real a_p4;
    real<lower=0> b_p4;
    real<lower=0> g1;
    real<lower=0> g2;
    int<lower=0> n;
    real y[n];
    vector[n] x1;
    vector[n] x2;
    vector[n] x3;
    vector[n] x4;
}

parameters {
    real b0;
    real b1;
    real b2;
    real b3;
    real b4;
    real<lower=0> sig;
}

transformed parameters {
  vector[n] mu;
  real<lower=0> tau;
  mu=b0+b1*x1+b2*x2+b3*x3+b4*x4;
  tau=(1/sig)^2;
}

model {
      y ~ normal(mu, sig);
      b0 ~ normal(a_i, b_i);
      b1 ~ normal(a_p1, b_p1);
      b2 ~ normal(a_p2, b_p2);
      b3 ~ normal(a_p3, b_p3);
      b4 ~ normal(a_p4, b_p4);
      tau ~ gamma(g1,g2);
}
"""
# posterior
posterior5 = pystan.StanModel(model_code=brtwt_code5)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_220abeb358130518201de4cb8461ccc2 NOW.


In [None]:
brtwt_data5 = {
             'n': len(brtwt_da),
             'x1': brtwt_da['matage'],
             'x2': brtwt_da['ht_yes'],
             'x3': brtwt_da['gestwks'],
             'x4': brtwt_da['sex_male'],
             'y': brtwt_da['bweight'],
             'a_i':3000,
             'b_i':20,
             'a_p1':100,
             'b_p1':10,
             'a_p2':100,
             'b_p2':10,
             'a_p3':100,
             'b_p3':10,
             'a_p4':100,
             'b_p4':10,
             'g1':3,
             'g2':1,
            }
print(brtwt_data5)

In [None]:
fit_model5= posterior5.sampling(data=brtwt_data5,
                  iter=10000,
                  chains=4,
                  seed=1,
                  warmup=3000,
                  thin=1,
                  control={"max_treedepth":15,"adapt_delta" : 0.9999})

In [None]:
summ_mod5=az.summary(fit_model5,var_names=['~mu','~tau'],round_to=3,hdi_prob=0.95)
summ_mod5

Unnamed: 0,mean,sd,hdi_2.5%,hdi_97.5%,mcse_mean,mcse_sd,ess_bulk,ess_tail,r_hat
b0,2975.568,20.019,2936.017,3014.253,0.129,0.092,23915.74,18532.908,1.0
b1,-27.257,4.56,-36.518,-18.601,0.039,0.028,13550.637,15351.864,1.0
b2,88.792,9.837,69.5,108.029,0.062,0.044,25159.205,20036.615,1.0
b3,27.222,4.054,19.21,35.078,0.035,0.025,13650.003,15041.404,1.0
b4,100.048,9.784,81.428,119.84,0.062,0.044,25304.573,18835.627,1.0
sig,624.062,17.664,589.071,658.161,0.116,0.082,23466.353,17259.956,1.0
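
The posterior means in the table can be combined into a plug-in estimate of $E[Y|X]$ for a given profile. The profile below is hypothetical and `expected_bweight` is an illustrative helper. A plug-in value ignores posterior uncertainty, which the extracted draws would capture.

```python
# Posterior means copied from the summary table above
coef = {"b0": 2975.568, "matage": -27.257, "ht_yes": 88.792,
        "gestwks": 27.222, "sex_male": 100.048}

def expected_bweight(matage, ht_yes, gestwks, sex_male):
    """Plug-in estimate of E[bweight | predictors], in grams."""
    return (coef["b0"]
            + coef["matage"] * matage
            + coef["ht_yes"] * ht_yes
            + coef["gestwks"] * gestwks
            + coef["sex_male"] * sex_male)

# Hypothetical profile: 30-year-old mother, no hypertension, 38 weeks, male baby
estimate = expected_bweight(matage=30, ht_yes=0, gestwks=38, sex_male=1)
print(round(estimate, 3))
```

Because *matage* and *gestwks* enter uncentered, the intercept here refers to a mother aged 0 at 0 weeks of gestation, which is why it is not interpretable on its own.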


In [None]:
az.plot_trace(fit_model5,var_names=['~mu'], compact=False,legend=True)
plt.show()

# <font color="darkblue"> Final Thoughts - Normal Model

1. When the QoI is numeric, a Normal model is a natural first choice

1. A linear model can be explored with Normal priors for the weights associated with predictors

1. A Gamma prior can be considered for the precision of the error term

1. **Outcome Analysis**

  1. Intercept / Constant: Mean of the response when there is no predictor (Null model), or mean of the response when the predictors are held at fixed reference values (zero, or the sample mean for a centered predictor)

  1. Weights associated with Predictors

    - If the predictor is numeric, the corresponding $\beta$ estimates the change in the mean response per unit change in the predictor.

    - If the predictor is dichotomous, $\beta$ estimates the difference in the mean response between the base level and the other level

    - If the predictor is polychotomous, each $\beta$ estimates the difference in the mean response between the base level and one of the other levels

1. Bayesian Advantage

  - Posterior probabilities help to assess the characteristics of the $\beta$ related to a predictor

  - The probability $Pr[-\epsilon<\beta<\epsilon]$, where $\epsilon$ is a small positive number, measures the closeness of $\beta$ to zero. A negligible value supports the need for and relevance of the corresponding predictor; a large value suggests the predictor contributes little

  - User-friendly interpretability of weights and their uncertainty in terms of posterior probabilities
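
The closeness-to-zero probability in the list above can be estimated directly from posterior draws by counting. The sketch below uses stand-in Normal draws rather than actual MCMC output: one set mimics the `b1` row of the last summary (mean -27.257, sd 4.56), the other is a hypothetical weak predictor. The tolerance `eps` and the helper name `prob_near_zero` are also assumptions.

```python
import random

def prob_near_zero(draws, eps):
    """Estimate Pr[-eps < beta < eps] as the fraction of draws inside the band."""
    return sum(1 for b in draws if -eps < b < eps) / len(draws)

rng = random.Random(1)

# Stand-in draws (Normal approximations), NOT actual MCMC output.
# With a fitted model, pass the extracted draws of the coefficient instead.
far_from_zero = [rng.gauss(-27.257, 4.56) for _ in range(20000)]  # like b1 of the last model
near_zero = [rng.gauss(0.5, 1.0) for _ in range(20000)]           # hypothetical weak predictor

eps = 2.0  # hypothetical tolerance, in the units of the coefficient

p_far = prob_near_zero(far_from_zero, eps)
p_near = prob_near_zero(near_zero, eps)
print(p_far, p_near)
```

A probability near zero for the first coefficient indicates that it is well separated from zero, while a probability near one for the second suggests that predictor adds little.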