# Model Sensitivity Analysis
Maximizing the ELBO is a non-convex optimization problem, so the parameter estimates are sensitive to their initial values. We therefore refit the model with the chosen set of hyperparameters from 50 random initializations and select the best of the resulting fits.
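
The restart strategy can be illustrated on a toy problem. The sketch below is not the MEM model: `toy_objective` is a hypothetical non-convex stand-in for the ELBO, and `fit_from_seed` and `best_of_restarts` are illustrative names. It only shows why different seeds can end in different local optima and how the best run is kept.

```python
import numpy as np

def toy_objective(x):
    # Hypothetical non-convex stand-in for the ELBO (illustration only).
    return np.sin(3.0 * x) - 0.1 * x ** 2

def toy_gradient(x):
    # Analytic derivative of toy_objective.
    return 3.0 * np.cos(3.0 * x) - 0.2 * x

def fit_from_seed(seed, n_steps=500, step_size=0.01):
    # Gradient ascent from a seed-dependent random starting point.
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0)
    for _ in range(n_steps):
        x += step_size * toy_gradient(x)
    return x, toy_objective(x)

def best_of_restarts(n_restarts=50):
    # Run every restart and keep the one with the largest objective value.
    results = [(seed, *fit_from_seed(seed)) for seed in range(n_restarts)]
    best = max(results, key=lambda r: r[2])
    return best, results

best, results = best_of_restarts()
print("best seed:", best[0], "objective:", round(float(best[2]), 4))
print("distinct optima found:", len({round(float(r[1]), 2) for r in results}))
```

In the actual analysis each restart is a separate run of **model_sensitivity_fit.py**, and the selection criterion is the $LLPD$ rather than the toy objective.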

Stages of the Analysis
 + Python script for variational posterior computation: **model_sensitivity_fit.py**
 + Script to evaluate the model for 50 random initializations: **mem_model_sensitivity**
 + Analysis of the output based on the in-sample $LLPD$
 

#### Script to evaluate the model
We have saved the commands that call the Python script for parameter estimation in the file **mem_model_sensitivity**.

Each line in the file **mem_model_sensitivity** calls the Python script **model_sensitivity_fit.py** for a given choice of the parameters.

*module purge ; module load slurm gcc python3 ; OMP_NUM_THREADS=1 python3 model_sensitivity_fit.py 100.0 50 0.219 0.06503 0.0 50 200 > logfile/50.log 2>&1*
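
A task file with one such line per initialization could be generated along the following lines. This is a sketch under stated assumptions: the source does not say which positional argument is the seed, so the `{seed}` slots in `TEMPLATE` (second argument and log-file name) are a guess, and `build_task_lines` and `write_task_file` are hypothetical helpers. Check the argument order in **model_sensitivity_fit.py** before using it.

```python
# ASSUMPTION: the seed is the second positional argument and also names the
# log file. The remaining arguments are copied verbatim from the example line.
TEMPLATE = (
    "module purge ; module load slurm gcc python3 ; "
    "OMP_NUM_THREADS=1 python3 model_sensitivity_fit.py "
    "100.0 {seed} 0.219 0.06503 0.0 50 200 > logfile/{seed}.log 2>&1"
)

def build_task_lines(seeds, template=TEMPLATE):
    """Return one shell command per seed."""
    return [template.format(seed=seed) for seed in seeds]

def write_task_file(path, seeds):
    """Write the commands to a task file, one command per line."""
    lines = build_task_lines(seeds)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)

task_lines = build_task_lines(range(1, 51))
print(len(task_lines))
print(task_lines[0])
```

Calling `write_task_file('mem_model_sensitivity', range(1, 51))` would then write the 50 commands to the task file.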

#### Parameter estimation 
We run the script on the server using the command:
*sbatch -N [#node] -p [#partition] disBatch.py -t [#task on each node] [script_file]*

Example: *sbatch -N 2 -p ccm disBatch.py -t 25 mem_model_sensitivity*



#### Model output analysis
Suppose the model output is saved in the folder **MMSens**. We load each output file, compute the $LLPD$ on the full data, and select the model with the largest $LLPD$.


In [1]:
# Load modules
import glob
import pickle
import numpy as np
import pandas as pd

# Collect the output files, skipping any whose path contains 'sample'
folname = 'MMSens/'
fname_o = [f for f in glob.glob(folname + '*model_nb_cvtest.pkl')
           if 'sample' not in f]
#fname_o

In [2]:
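# Extract model output
out = np.empty((len(fname_o), 6))
for i, fname in enumerate(fname_o):
    if i % 10 == 0:
        print(i)  # progress indicator
    # Use a context manager so each file is closed after loading
    with open(fname, "rb") as f:
        [holdout_mask, llpd, n_test, l, m_seed, sp_mean,
         sp_var, h_prop, uid, nsample_o,
         Yte_fit, cv_test] = pickle.load(f)
    # Columns: index, rank, lambda, upsilon, mean LLPD, mean log-likelihood
    out[i] = [i, l, sp_mean, sp_var, np.mean(cv_test), np.mean(Yte_fit)]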
    

0
10
20
30
40


In [3]:
# Save the summary array, then reload it for analysis
with open('best_model_selected.pkl', 'wb') as f:
    pickle.dump(out, f)
with open('best_model_selected.pkl', 'rb') as f:
    out = pickle.load(f)
outx = pd.DataFrame(out)
outx.columns = ['index', 'rank', 'lambda', 'upsilon', 'llpd', 'Log-likelihood']
outx.head(10)

| | index | rank | lambda | upsilon | llpd | Log-likelihood |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 200.0 | 0.246 | 0.10063 | -3.261093 | -3.257339 |
| 1 | 1.0 | 200.0 | 0.246 | 0.10063 | -3.261752 | -3.26003 |
| 2 | 2.0 | 200.0 | 0.246 | 0.10063 | -3.258982 | -3.257405 |
| 3 | 3.0 | 200.0 | 0.246 | 0.10063 | -3.263845 | -3.261574 |
| 4 | 4.0 | 200.0 | 0.246 | 0.10063 | -3.262965 | -3.261557 |
| 5 | 5.0 | 200.0 | 0.246 | 0.10063 | -3.263984 | -3.26325 |
| 6 | 6.0 | 200.0 | 0.246 | 0.10063 | -3.2629 | -3.260509 |
| 7 | 7.0 | 200.0 | 0.246 | 0.10063 | -3.265269 | -3.263762 |
| 8 | 8.0 | 200.0 | 0.246 | 0.10063 | -3.262861 | -3.261591 |
| 9 | 9.0 | 200.0 | 0.246 | 0.10063 | -3.264635 | -3.262581 |


In [4]:
# Get the file name and model output from the best model
best_setting = outx[outx['llpd'] == outx['llpd'].max()]
i = int(best_setting['index'].iloc[0])  # take the first row if several runs tie
fname_o[i]

'MMSens/66_model_nb_cvtest.pkl'

In [5]:
best_setting

| | index | rank | lambda | upsilon | llpd | Log-likelihood |
|---|---|---|---|---|---|---|
| 2 | 2.0 | 200.0 | 0.246 | 0.10063 | -3.258982 | -3.257405 |


<font color=blue>**Our analysis suggests that the MEM fit with seed 66 is the most appropriate, as it attains the highest full-data LLPD.** </font>
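
To gauge how much the initialization matters, the spread of the $LLPD$ across runs can be summarized alongside the maximum. The sketch below is a minimal illustration: `summarize_llpd` is a hypothetical helper, and the demo input contains only the ten `llpd` values displayed by `outx.head(10)`, not all 50 runs.

```python
import pandas as pd

def summarize_llpd(llpd):
    """Summarize how much the LLPD varies across initializations."""
    llpd = pd.Series(llpd, dtype=float)
    return {
        "best_index": int(llpd.idxmax()),   # row label of the largest LLPD
        "best": float(llpd.max()),
        "worst": float(llpd.min()),
        "range": float(llpd.max() - llpd.min()),
        "std": float(llpd.std()),           # sample standard deviation
    }

# Only the ten rows shown by outx.head(10); the full run has 50.
llpd_head = [-3.261093, -3.261752, -3.258982, -3.263845, -3.262965,
             -3.263984, -3.262900, -3.265269, -3.262861, -3.264635]
summary = summarize_llpd(llpd_head)
print(summary)
```

On the full table, the same summary is obtained with `summarize_llpd(outx['llpd'])`.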