# Applying Machine Learning to a Hepatitis C Egyptian Cohort Dataset for Predicting the Disease Stage - Experiment 5

### TOC:
## Two-class prediction: onset vs. severe (<font color="red">F2 and F3 excluded; F1 = 1, F4 = 4</font>)
* [Stratified AST: below 74](#first-bullet)
* Stratified AST: between 74 and 148
* Stratified AST: 148 and above (no samples)
* [APRI >= 1](#second-bullet)
* [APRI >= 2](#third-bullet)
* [FIB-4 > 1.45](#fourth-bullet)
* [FIB-4 > 3.25](#fifth-bullet)

In [2]:
import import_ipynb
import warnings
warnings.filterwarnings("ignore")

In [3]:
from CommonUtilsHCV import *

importing Jupyter notebook from CommonUtilsHCV.ipynb


In [4]:
test_connection(conn="Connected")

Connected


#### Dataset downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/00503/

In [86]:
data = pd.read_excel(r"HCV_Egy_data_for_loading.xlsx")

In [87]:
print("number of observations and columns in data:", data.shape)

number of observations and columns in data: (1385, 29)


In [88]:
data["BHS"].value_counts(sort=0)

1    336
2    332
3    355
4    362
Name: BHS, dtype: int64

In [89]:
data_tmp1 = data.loc[data["BHS"] != 3]
data_tmp = data_tmp1.loc[data_tmp1["BHS"] != 2]

### Stratified AST - below 74;  ( <font color="red">F1 vs. F4; no F2 and F3</font>) 

In [9]:
data_tmp = data_tmp.loc[data_tmp["AST1"] < 74]

In [10]:
data_tmp["BHS"].value_counts(sort=0)

1    124
4    156
Name: BHS, dtype: int64

In [11]:
from sklearn.utils import resample

df_majority = data_tmp[data_tmp["BHS"]==4]
df_minority = data_tmp[data_tmp["BHS"]==1]

# Upsample the minority class (BHS == 1) with replacement to the majority class size
df_minority_upsampled = resample(df_minority,
                                 replace=True,
                                 n_samples=156,  # size of the majority class (BHS == 4)
                                 random_state=123)

# Concatenate the majority class with the upsampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

# Display new class counts
df_upsampled.BHS.value_counts()

4    156
1    156
Name: BHS, dtype: int64

In [12]:
data_tmp = df_upsampled.copy()

In [13]:
data_tmp["BHS"].value_counts(sort=0)

1    156
4    156
Name: BHS, dtype: int64
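The cell above upsamples BHS = 1 with replacement *before* `train_test_split`, so copies of the same patient row can land in both the training and the test fold, which can make test scores look better than they would on unseen patients. A common alternative is to split first (stratified by class, as `train_test_split(..., stratify=y)` would do) and then resample only the training rows. The sketch below shows that ordering on row indices using only the standard library; the function name, the 30 % test share and the seed are illustrative assumptions, not part of `CommonUtilsHCV`.

```python
import random
from collections import Counter


def stratified_split_then_upsample(labels, test_size=0.3, seed=123):
    """Return (train_idx, test_idx) row indices.

    Hypothetical helper: a per-class (stratified) hold-out split first,
    then upsampling with replacement of the smaller classes inside the
    training fold only, so no test row is ever copied into training.
    """
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)

    # 1) Stratified hold-out: take the same fraction of every class for testing
    train_by_class, test_idx = {}, []
    for lab, members in by_class.items():
        members = members[:]  # copy before shuffling
        rng.shuffle(members)
        n_test = round(len(members) * test_size)
        test_idx.extend(members[:n_test])
        train_by_class[lab] = members[n_test:]

    # 2) Upsample every training class to the size of the largest one
    target = max(len(m) for m in train_by_class.values())
    train_idx = []
    for members in train_by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        train_idx.extend(members + extra)
    rng.shuffle(train_idx)
    return train_idx, test_idx


# Class sizes of the AST < 74 stratum before upsampling
labels = [1] * 124 + [4] * 156
train_idx, test_idx = stratified_split_then_upsample(labels)
print(Counter(labels[i] for i in train_idx))
print(len(test_idx), set(train_idx) & set(test_idx))  # 84 set()
```

The test fold keeps the original class ratio and contains only untouched rows; the balancing affects training alone.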

In [None]:
# data["FIB4"] = round((data["Age"]*data["AST1"])/((data["Plat"]/1000)*np.sqrt(data["ALT1"])),2)
# data["APRI"] = round(((data["AST1"]/40)/data["Plat"])*100000,2)
# data["AST_ALT"] = round((data["AST1"]/data["ALT1"]),2)
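The commented formulas compute APRI and FIB-4 from the raw columns. The sketch below restates them as plain functions, assuming `Plat` is recorded per microlitre (so `Plat/1000` gives 10^9/L) and using 40 IU/L as the AST upper limit of normal, which is the constant the notebook uses. The names `apri` and `fib4` are illustrative.

```python
import math


def apri(ast, plat_per_ul, ast_uln=40.0):
    """APRI = (AST / ULN) / platelets[10^9/L] * 100.

    Assumes platelets are given per microlitre, as the notebook's /1000 suggests.
    """
    plat_10e9_per_l = plat_per_ul / 1000.0
    return (ast / ast_uln) / plat_10e9_per_l * 100.0


def fib4(age, ast, alt, plat_per_ul):
    """FIB-4 = (age * AST) / (platelets[10^9/L] * sqrt(ALT))."""
    plat_10e9_per_l = plat_per_ul / 1000.0
    return (age * ast) / (plat_10e9_per_l * math.sqrt(alt))


# Algebraically the same as the notebook's pandas expression (before rounding):
#   ((AST1/40)/Plat)*100000 == (AST1/40)/(Plat/1000)*100
print(round(apri(80, 200_000), 2))          # 1.0
print(round(fib4(50, 40, 25, 200_000), 2))  # 2.0
```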

In [14]:
# Split data for training and testing
data_lists = []
X,y = standard_scaler(dataframe=data_tmp)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=20)

data_lists = [X_train, y_train, X_test, y_test]
label_names = y_train.unique()
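`standard_scaler` comes from `CommonUtilsHCV`, which is not shown here. If it fits the scaling statistics on the whole of `data_tmp`, as its call before `train_test_split` suggests, the test rows influence the means and standard deviations used for training. A minimal sketch of the leak-free order, assuming z-score scaling like scikit-learn's `StandardScaler` (population standard deviation, constant columns only centred): fit on the training rows, then apply the same statistics to the test rows. The helper names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standard_scaler(train_rows):
    """Per-column mean and population std, computed on training rows only."""
    cols = list(zip(*train_rows))
    means = [fmean(c) for c in cols]
    # A zero std would divide by zero; use 1.0 so constant columns are only centred
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, means, stds):
    """Apply previously fitted statistics (train or test rows alike)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
test = [[3.0, 20.0]]
means, stds = fit_standard_scaler(train)
print(transform(test, means, stds))  # [[0.0, 10.0]]
```

In scikit-learn the usual equivalent is `make_pipeline(StandardScaler(), model)`, which refits the scaler inside every training fold, including during cross-validated hyperparameter search.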

In [15]:
models_result = get_all_the_best_values(data_lists=data_lists, experiment_name="Experiment_4")

Running LR model




*********** Logistic Regression ********
Accuracy Score: 0.48936170212765956
Confusion Matrix : 
 [[46  0]
 [48  0]]
Accuracy :  0.48936170212765956
Sensitivity :  1.0
Specificity :  0.0
ROC AUC score :  0.5
*********** ****************** ********************
Running NB model
calling fit in NB
Done with prediction
*********** Naive Bayes ********
Accuracy Score: 0.5425531914893617
Confusion Matrix : 
 [[28 18]
 [25 23]]
Accuracy :  0.5425531914893617
Sensitivity :  0.6086956521739131
Specificity :  0.4791666666666667
ROC AUC score :  0.5439311594202899
*********** ****************** ********************
Running DT model
*********** Decision Tree ********
Accuracy Score: 0.776595744680851
Confusion Matrix : 
 [[37  9]
 [12 36]]
Accuracy :  0.776595744680851
Sensitivity :  0.8043478260869565
Specificity :  0.75
ROC AUC score :  0.7771739130434783
*********** ****************** ********************
Running RF model




*********** Random Forest ********
Accuracy Score: 0.7021276595744681
Confusion Matrix : 
 [[35 11]
 [17 31]]
Accuracy :  0.7021276595744681
Sensitivity :  0.7608695652173914
Specificity :  0.6458333333333334
ROC AUC score :  0.7033514492753624
*********** ****************** ********************
Running XGB model




*********** XGBoost ********
Accuracy Score: 0.6595744680851063
Confusion Matrix : 
 [[32 14]
 [18 30]]
Accuracy :  0.6595744680851063
Sensitivity :  0.6956521739130435
Specificity :  0.625
ROC AUC score :  0.6603260869565217
*********** ****************** ********************
Running kNN model




*********** k Nearest Neighbor ********
Accuracy Score: 0.6595744680851063
Confusion Matrix : 
 [[33 13]
 [19 29]]
Accuracy :  0.6595744680851063
Sensitivity :  0.717391304347826
Specificity :  0.6041666666666666
ROC AUC score :  0.6607789855072465
*********** ****************** ********************
Running SVM model




*********** Support Vector Machine ********
Accuracy Score: 0.6702127659574468
Confusion Matrix : 
 [[27 19]
 [12 36]]
Accuracy :  0.6702127659574468
Sensitivity :  0.5869565217391305
Specificity :  0.75
ROC AUC score :  0.6684782608695651
*********** ****************** ********************
Running NN model
*********** Neural Network ********
Accuracy Score: 0.39361702127659576
Confusion Matrix : 
 [[14 32]
 [25 23]]
Accuracy :  0.39361702127659576
Sensitivity :  0.30434782608695654
Specificity :  0.4791666666666667
ROC AUC score :  0.39175724637681164
*********** ****************** ********************
Got the result
saving the model to pickle file
Saved model object
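Some models in this run score at or below the majority-class rate of the test fold. The test fold holds 46 BHS = 1 and 48 BHS = 4 rows (the row sums of each confusion matrix), so always predicting BHS = 4 would already reach about 0.511; Logistic Regression, which predicted BHS = 1 for every row, scores 46/94, and the neural network scores lower still. A small sketch for flagging such cases; both helper names are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(y_true):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)


def is_constant_predictor(y_pred):
    """True if a model predicted the same label for every test row."""
    return len(set(y_pred)) == 1


y_test = [1] * 46 + [4] * 48          # class make-up of the AST < 74 test fold
lr_pred = [1] * 94                    # LR predicted BHS = 1 everywhere
print(round(majority_baseline_accuracy(y_test), 4))  # 0.5106
print(is_constant_predictor(lr_pred))                # True
```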




In [23]:
from sklearn.ensemble import VotingClassifier

# Build a list of (name, estimator) pairs from the tuned models;
# LR, NB and NN are left out of this ensemble
estimators = [
    # ('lr', models_result[0]['best_param']),
    # ('nb', models_result[1]['best_param']),
    ('dt', models_result[2]['best_param']),
    ('rf', models_result[3]['best_param']),
    ('xgb', models_result[4]['best_param']),
    ('knn', models_result[5]['best_param']),
    ('svm', models_result[6]['best_param']),
    # ('nn', models_result[7]['best_param']),
]
# Create a hard-voting (majority vote) classifier from the selected models
ensemble = VotingClassifier(estimators, voting="hard")
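With `voting="hard"`, `VotingClassifier` predicts, for each sample, the label that most member models predict; the scikit-learn user guide states that ties are broken by the ascending sort order of the class labels. The pure-Python sketch below mimics that rule for illustration only (the helper `hard_vote` is hypothetical). With five binary voters, as in this cell, a tie cannot occur.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote across models.

    predictions_per_model: one list of predicted labels per model, all the same length.
    Ties are resolved in favour of the smallest label (ascending sort order).
    """
    n_samples = len(predictions_per_model[0])
    result = []
    for j in range(n_samples):
        counts = Counter(preds[j] for preds in predictions_per_model)
        best = max(counts.values())
        result.append(min(lab for lab, c in counts.items() if c == best))
    return result


# Five models (e.g. dt, rf, xgb, knn, svm) voting on three samples
votes = [[1, 4, 4], [1, 1, 4], [4, 1, 4], [4, 4, 1], [1, 4, 4]]
print(hard_vote(votes))  # [1, 4, 4]
```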

In [24]:
ensemble.fit(data_lists[0], data_lists[1])

VotingClassifier(estimators=[('dt',
                              DecisionTreeClassifier(class_weight=None,
                                                     criterion='entropy',
                                                     max_depth=None,
                                                     max_features=None,
                                                     max_leaf_nodes=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     presort=False,
                                                     random_state=None,
                                                     splitter='random')),
 

In [25]:
pred = ensemble.predict(data_lists[2])

In [26]:
evaluate_metrics_2d(y_pred=pred, y_test=data_lists[3])

Confusion Matrix : 
 [[33 13]
 [17 31]]
Accuracy :  0.6808510638297872
Sensitivity :  0.717391304347826
Specificity :  0.6458333333333334
ROC AUC score :  0.6816123188405798
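The printed values can be reproduced from the 2 × 2 confusion matrix alone. Assuming `evaluate_metrics_2d` uses scikit-learn's layout (rows are true labels, columns are predictions, classes sorted so BHS = 1 comes first) and treats the first row as positive, which matches the blocks printed above, *sensitivity* here is the recall of F1 and *specificity* the recall of F4. Because the models output hard labels rather than probabilities, the ROC AUC reduces to the mean of the two, i.e. balanced accuracy. The helper below is a hypothetical re-implementation, not the notebook's function.

```python
def metrics_from_confusion(cm):
    """cm = [[a, b], [c, d]]: rows = true class, columns = predicted class,
    first row/column = the class treated as positive (BHS == 1, by assumption)."""
    (a, b), (c, d) = cm
    accuracy = (a + d) / (a + b + c + d)
    sensitivity = a / (a + b)   # recall of the first class
    specificity = d / (c + d)   # recall of the second class
    # ROC curve of 0/1 predictions has one interior point, so its area is the mean
    auc_hard = (sensitivity + specificity) / 2
    return accuracy, sensitivity, specificity, auc_hard


# Ensemble result for the AST < 74 stratum
print([round(v, 4) for v in metrics_from_confusion([[33, 13], [17, 31]])])
# [0.6809, 0.7174, 0.6458, 0.6816]
```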


### Stratified AST - between 74 and 148;  ( <font color="red">F1 vs. F4; no F2 and F3</font>) 

In [70]:
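# Reset to the full F1/F4 subset; otherwise this cell would filter the AST < 74 rows left by the previous section
data_tmp1 = data.loc[data["BHS"] != 3]
data_tmp = data_tmp1.loc[data_tmp1["BHS"] != 2]
# Note: with > 74 here and < 74 above, rows with AST1 == 74 exactly fall in neither stratum
data_tmp = data_tmp.loc[data_tmp["AST1"] > 74]
data_tmp = data_tmp.loc[data_tmp["AST1"] < 148]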
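As written, the AST strata use `< 74`, `> 74` / `< 148` and `>= 148`, so a patient with AST1 exactly 74 belongs to none of them. A sketch of gap-free, half-open bins using `bisect`; treating 74 as the lower edge of the middle stratum is an assumption about the intended boundaries. With pandas, `pd.cut(..., right=False)` produces the same left-closed intervals.

```python
from bisect import bisect_right

AST_EDGES = [74, 148]  # cut points used in this notebook
AST_LABELS = ["AST < 74", "74 <= AST < 148", "AST >= 148"]


def ast_stratum(ast):
    """Assign AST to one of the half-open bins (-inf, 74), [74, 148), [148, inf)."""
    return AST_LABELS[bisect_right(AST_EDGES, ast)]


print([ast_stratum(v) for v in (73, 74, 120, 148)])
# ['AST < 74', '74 <= AST < 148', '74 <= AST < 148', 'AST >= 148']
```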

In [71]:
data_tmp["BHS"].value_counts(sort=0)

1    209
4    205
Name: BHS, dtype: int64

In [73]:
# Split data for training and testing
data_lists = []
X,y = standard_scaler(dataframe=data_tmp)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=20)

data_lists = [X_train, y_train, X_test, y_test]
label_names = y_train.unique()

In [74]:
models_result = get_all_the_best_values(data_lists=data_lists, experiment_name="Experiment_5b")

Running LR model
*********** Logistic Regression ********
Accuracy Score: 0.512
Confusion Matrix : 
 [[64  0]
 [61  0]]
Accuracy :  0.512
Sensitivity :  1.0
Specificity :  0.0
ROC AUC score :  0.5
*********** ****************** ********************
Running NB model
calling fit in NB
Done with prediction
*********** Naive Bayes ********
Accuracy Score: 0.504
Confusion Matrix : 
 [[30 34]
 [28 33]]
Accuracy :  0.504
Sensitivity :  0.46875
Specificity :  0.5409836065573771
ROC AUC score :  0.5048668032786885
*********** ****************** ********************
Running DT model
*********** Decision Tree ********
Accuracy Score: 0.576
Confusion Matrix : 
 [[33 31]
 [22 39]]
Accuracy :  0.576
Sensitivity :  0.515625
Specificity :  0.639344262295082
ROC AUC score :  0.577484631147541
*********** ****************** ********************
Running RF model




*********** Random Forest ********
Accuracy Score: 0.56
Confusion Matrix : 
 [[31 33]
 [22 39]]
Accuracy :  0.56
Sensitivity :  0.484375
Specificity :  0.639344262295082
ROC AUC score :  0.561859631147541
*********** ****************** ********************
Running XGB model




*********** XGBoost ********
Accuracy Score: 0.52
Confusion Matrix : 
 [[23 41]
 [19 42]]
Accuracy :  0.52
Sensitivity :  0.359375
Specificity :  0.6885245901639344
ROC AUC score :  0.5239497950819672
*********** ****************** ********************
Running kNN model




*********** k Nearest Neighbor ********
Accuracy Score: 0.576
Confusion Matrix : 
 [[37 27]
 [26 35]]
Accuracy :  0.576
Sensitivity :  0.578125
Specificity :  0.5737704918032787
ROC AUC score :  0.5759477459016393
*********** ****************** ********************
Running SVM model




*********** Support Vector Machine ********
Accuracy Score: 0.512
Confusion Matrix : 
 [[28 36]
 [25 36]]
Accuracy :  0.512
Sensitivity :  0.4375
Specificity :  0.5901639344262295
ROC AUC score :  0.5138319672131147
*********** ****************** ********************
Running NN model
*********** Neural Network ********
Accuracy Score: 0.488
Confusion Matrix : 
 [[48 16]
 [48 13]]
Accuracy :  0.488
Sensitivity :  0.75
Specificity :  0.21311475409836064
ROC AUC score :  0.48155737704918034
*********** ****************** ********************
Got the result
saving the model to pickle file
Saved model object




In [75]:
from sklearn.ensemble import VotingClassifier

# Build a list of (name, estimator) pairs from the tuned models;
# LR, NB and NN are left out of this ensemble
estimators = [
    # ('lr', models_result[0]['best_param']),
    # ('nb', models_result[1]['best_param']),
    ('dt', models_result[2]['best_param']),
    ('rf', models_result[3]['best_param']),
    ('xgb', models_result[4]['best_param']),
    ('knn', models_result[5]['best_param']),
    ('svm', models_result[6]['best_param']),
    # ('nn', models_result[7]['best_param']),
]
# Create a hard-voting (majority vote) classifier from the selected models
ensemble = VotingClassifier(estimators, voting="hard")

In [76]:
ensemble.fit(data_lists[0], data_lists[1])

VotingClassifier(estimators=[('dt',
                              DecisionTreeClassifier(class_weight=None,
                                                     criterion='entropy',
                                                     max_depth=None,
                                                     max_features=None,
                                                     max_leaf_nodes=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     presort=False,
                                                     random_state=None,
                                                     splitter='best')),
   

In [77]:
pred = ensemble.predict(data_lists[2])

In [78]:
evaluate_metrics_2d(y_pred=pred, y_test=data_lists[3])

Confusion Matrix : 
 [[31 33]
 [22 39]]
Accuracy :  0.56
Sensitivity :  0.484375
Specificity :  0.639344262295082
ROC AUC score :  0.561859631147541


### Stratified AST - 148 and above;  ( <font color="red">F1 vs. F4; no F2 and F3</font>) 

In [91]:
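# Reset to the full F1/F4 subset before filtering, as in the APRI and FIB-4 sections
data_tmp1 = data.loc[data["BHS"] != 3]
data_tmp = data_tmp1.loc[data_tmp1["BHS"] != 2]
data_tmp = data_tmp.loc[data_tmp["AST1"] >= 148]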

In [92]:
data_tmp.head()

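Unnamed: 0,Age,Gender,BMI,Fever,NV,Headache,Diarrhea,FGBA,Jaundice,EP,...,ALT36,ALT48,ALT_post24w,RNA_base,RNA4,RNA12,RNAEOT,RNAEF,BHG,BHS

The filtered DataFrame is empty (only its column header is printed): no F1 or F4 patient has AST1 >= 148, so no models are trained for this stratum.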
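Before training on any stratum, a quick count check avoids fitting the full set of models on an empty or one-class subset. A minimal sketch; the helper name and the `min_per_class=10` threshold are arbitrary, hypothetical choices. A pandas Series such as `data_tmp["BHS"]` can be passed directly, because `Counter` iterates over its values.

```python
from collections import Counter


def stratum_is_trainable(labels, classes=(1, 4), min_per_class=10):
    """True only if every class in `classes` appears at least `min_per_class` times."""
    counts = Counter(labels)
    return all(counts[c] >= min_per_class for c in classes)


print(stratum_is_trainable([]))                    # False (AST >= 148 case)
print(stratum_is_trainable([1] * 53 + [4] * 62))   # True  (APRI >= 2 counts)
```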


### APRI >= 1;  ( <font color="red">F1 vs. F4; no F2 and F3</font>) 

In [29]:
data_tmp1 = data.loc[data["BHS"] != 3]
data_tmp = data_tmp1.loc[data_tmp1["BHS"] != 2]

In [30]:
data_tmp["BHS"].value_counts(sort=0)

1    336
4    362
Name: BHS, dtype: int64

In [31]:
data_tmp["APRI"] = round(((data_tmp["AST1"]/40)/data_tmp["Plat"])*100000,2)

In [34]:
data_tmp = data_tmp.loc[data_tmp["APRI"] >= 1]

In [35]:
data_tmp["BHS"].value_counts(sort=0)

1    250
4    252
Name: BHS, dtype: int64

In [36]:
# Split data for training and testing
data_lists = []
X,y = standard_scaler(dataframe=data_tmp)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=20)

data_lists = [X_train, y_train, X_test, y_test]
label_names = y_train.unique()

In [37]:
models_result = get_all_the_best_values(data_lists=data_lists, experiment_name="Experiment_5b")

Running LR model
*********** Logistic Regression ********
Accuracy Score: 0.4503311258278146
Confusion Matrix : 
 [[38 41]
 [42 30]]
Accuracy :  0.4503311258278146
Sensitivity :  0.4810126582278481
Specificity :  0.4166666666666667
ROC AUC score :  0.44883966244725737
*********** ****************** ********************
Running NB model
calling fit in NB
Done with prediction
*********** Naive Bayes ********
Accuracy Score: 0.4900662251655629
Confusion Matrix : 
 [[34 45]
 [32 40]]
Accuracy :  0.4900662251655629
Sensitivity :  0.43037974683544306
Specificity :  0.5555555555555556
ROC AUC score :  0.4929676511954993
*********** ****************** ********************
Running DT model
*********** Decision Tree ********
Accuracy Score: 0.4105960264900662
Confusion Matrix : 
 [[37 42]
 [47 25]]
Accuracy :  0.4105960264900662
Sensitivity :  0.46835443037974683
Specificity :  0.3472222222222222
ROC AUC score :  0.4077883263009846
*********** ****************** ********************
Running RF m



*********** k Nearest Neighbor ********
Accuracy Score: 0.4966887417218543
Confusion Matrix : 
 [[41 38]
 [38 34]]
Accuracy :  0.4966887417218543
Sensitivity :  0.5189873417721519
Specificity :  0.4722222222222222
ROC AUC score :  0.49560478199718705
*********** ****************** ********************
Running SVM model




*********** Support Vector Machine ********
Accuracy Score: 0.5298013245033113
Confusion Matrix : 
 [[39 40]
 [31 41]]
Accuracy :  0.5298013245033113
Sensitivity :  0.4936708860759494
Specificity :  0.5694444444444444
ROC AUC score :  0.5315576652601969
*********** ****************** ********************
Running NN model
*********** Neural Network ********
Accuracy Score: 0.5099337748344371
Confusion Matrix : 
 [[41 38]
 [36 36]]
Accuracy :  0.5099337748344371
Sensitivity :  0.5189873417721519
Specificity :  0.5
ROC AUC score :  0.5094936708860759
*********** ****************** ********************
Got the result
saving the model to pickle file
Saved model object


In [40]:
from sklearn.ensemble import VotingClassifier

# Build a list of (name, estimator) pairs from the tuned models;
# only NB is left out of this ensemble
estimators = [
    ('lr', models_result[0]['best_param']),
    # ('nb', models_result[1]['best_param']),
    ('dt', models_result[2]['best_param']),
    ('rf', models_result[3]['best_param']),
    ('xgb', models_result[4]['best_param']),
    ('knn', models_result[5]['best_param']),
    ('svm', models_result[6]['best_param']),
    ('nn', models_result[7]['best_param']),
]
# Create a hard-voting (majority vote) classifier from the selected models
ensemble = VotingClassifier(estimators, voting="hard")

In [41]:
ensemble.fit(data_lists[0], data_lists[1])

VotingClassifier(estimators=[('lr',
                              LogisticRegression(C=0.09, class_weight=None,
                                                 dual=False, fit_intercept=True,
                                                 intercept_scaling=1,
                                                 l1_ratio=None, max_iter=10,
                                                 multi_class='ovr', n_jobs=None,
                                                 penalty='l1',
                                                 random_state=None,
                                                 solver='liblinear', tol=0.0001,
                                                 verbose=0, warm_start=False)),
                             ('dt',
                              DecisionTreeClassifier(class_weight=None,
                                                     criterion='entropy',
                                                     max_dept...
                                        

In [42]:
pred = ensemble.predict(data_lists[2])

In [43]:
evaluate_metrics_2d(y_pred=pred, y_test=data_lists[3])

Confusion Matrix : 
 [[33 46]
 [30 42]]
Accuracy :  0.4966887417218543
Sensitivity :  0.4177215189873418
Specificity :  0.5833333333333334
ROC AUC score :  0.5005274261603376


### APRI >= 2;  ( <font color="red">F1 vs. F4; no F2 and F3</font>) 

In [48]:
data_tmp1 = data.loc[data["BHS"] != 3]
data_tmp = data_tmp1.loc[data_tmp1["BHS"] != 2]

In [49]:
data_tmp["BHS"].value_counts(sort=0)

1    336
4    362
Name: BHS, dtype: int64

In [50]:
data_tmp["APRI"] = round(((data_tmp["AST1"]/40)/data_tmp["Plat"])*100000,2)

In [51]:
data_tmp = data_tmp.loc[data_tmp["APRI"] >= 2]

In [52]:
data_tmp["BHS"].value_counts(sort=0)

1    53
4    62
Name: BHS, dtype: int64

In [53]:
# Split data for training and testing
data_lists = []
X,y = standard_scaler(dataframe=data_tmp)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=20)

data_lists = [X_train, y_train, X_test, y_test]
label_names = y_train.unique()

In [54]:
models_result = get_all_the_best_values(data_lists=data_lists, experiment_name="Experiment_4")

Running LR model




*********** Logistic Regression ********
Accuracy Score: 0.5142857142857142
Confusion Matrix : 
 [[ 4 14]
 [ 3 14]]
Accuracy :  0.5142857142857142
Sensitivity :  0.2222222222222222
Specificity :  0.8235294117647058
ROC AUC score :  0.522875816993464
*********** ****************** ********************
Running NB model
calling fit in NB
Done with prediction
*********** Naive Bayes ********
Accuracy Score: 0.5428571428571428
Confusion Matrix : 
 [[ 5 13]
 [ 3 14]]
Accuracy :  0.5428571428571428
Sensitivity :  0.2777777777777778
Specificity :  0.8235294117647058
ROC AUC score :  0.5506535947712419
*********** ****************** ********************
Running DT model
*********** Decision Tree ********
Accuracy Score: 0.4857142857142857
Confusion Matrix : 
 [[ 8 10]
 [ 8  9]]
Accuracy :  0.4857142857142857
Sensitivity :  0.4444444444444444
Specificity :  0.5294117647058824
ROC AUC score :  0.48692810457516333
*********** ****************** ********************
Running RF model




*********** Random Forest ********
Accuracy Score: 0.4
Confusion Matrix : 
 [[ 3 15]
 [ 6 11]]
Accuracy :  0.4
Sensitivity :  0.16666666666666666
Specificity :  0.6470588235294118
ROC AUC score :  0.4068627450980392
*********** ****************** ********************
Running XGB model




*********** XGBoost ********
Accuracy Score: 0.42857142857142855
Confusion Matrix : 
 [[ 7 11]
 [ 9  8]]
Accuracy :  0.42857142857142855
Sensitivity :  0.3888888888888889
Specificity :  0.47058823529411764
ROC AUC score :  0.4297385620915033
*********** ****************** ********************
Running kNN model




*********** k Nearest Neighbor ********
Accuracy Score: 0.4857142857142857
Confusion Matrix : 
 [[ 6 12]
 [ 6 11]]
Accuracy :  0.4857142857142857
Sensitivity :  0.3333333333333333
Specificity :  0.6470588235294118
ROC AUC score :  0.4901960784313726
*********** ****************** ********************
Running SVM model




*********** Support Vector Machine ********
Accuracy Score: 0.5714285714285714
Confusion Matrix : 
 [[11  7]
 [ 8  9]]
Accuracy :  0.5714285714285714
Sensitivity :  0.6111111111111112
Specificity :  0.5294117647058824
ROC AUC score :  0.5702614379084967
*********** ****************** ********************
Running NN model
*********** Neural Network ********
Accuracy Score: 0.4
Confusion Matrix : 
 [[ 4 14]
 [ 7 10]]
Accuracy :  0.4
Sensitivity :  0.2222222222222222
Specificity :  0.5882352941176471
ROC AUC score :  0.40522875816993464
*********** ****************** ********************
Got the result
saving the model to pickle file
Saved model object
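The APRI >= 2 stratum leaves only 35 test rows, so a single accuracy figure is imprecise. As a rough check, a Wilson score interval (a standard approximation for a binomial proportion; z = 1.96 for about 95 % coverage) for the best model here, the SVM with 20 of 35 correct, runs from about 0.41 to 0.72 and therefore includes 0.5. The function name is illustrative.

```python
from math import sqrt


def wilson_interval(correct, n, z=1.96):
    """Approximate Wilson score interval for a proportion such as test accuracy."""
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


lo, hi = wilson_interval(20, 35)  # SVM on the APRI >= 2 test fold
print(round(lo, 2), round(hi, 2))  # 0.41 0.72
```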




### FIB-4 - above 1.45;  ( <font color="red">F1 vs. F4; no F2 and F3</font>) 

In [55]:
data_tmp1 = data.loc[data["BHS"] != 3]
data_tmp = data_tmp1.loc[data_tmp1["BHS"] != 2]

In [56]:
data_tmp["FIB4"] = round((data_tmp["Age"]*data_tmp["AST1"])/((data_tmp["Plat"]/1000)*np.sqrt(data_tmp["ALT1"])),2)

In [57]:
data_tmp = data_tmp.loc[data_tmp["FIB4"] > 1.45]

In [58]:
data_tmp["BHS"].value_counts(sort=0)

1    302
4    302
Name: BHS, dtype: int64

In [59]:
# Split data for training and testing
data_lists = []
X,y = standard_scaler(dataframe=data_tmp)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=20)

data_lists = [X_train, y_train, X_test, y_test]
label_names = y_train.unique()

In [60]:
models_result = get_all_the_best_values(data_lists=data_lists, experiment_name="Experiment_4")

Running LR model
*********** Logistic Regression ********
Accuracy Score: 0.4725274725274725
Confusion Matrix : 
 [[86  0]
 [96  0]]
Accuracy :  0.4725274725274725
Sensitivity :  1.0
Specificity :  0.0
ROC AUC score :  0.5
*********** ****************** ********************
Running NB model
calling fit in NB
Done with prediction
*********** Naive Bayes ********
Accuracy Score: 0.45054945054945056
Confusion Matrix : 
 [[39 47]
 [53 43]]
Accuracy :  0.45054945054945056
Sensitivity :  0.45348837209302323
Specificity :  0.4479166666666667
ROC AUC score :  0.45070251937984496
*********** ****************** ********************
Running DT model
*********** Decision Tree ********
Accuracy Score: 0.5494505494505495
Confusion Matrix : 
 [[46 40]
 [42 54]]
Accuracy :  0.5494505494505495
Sensitivity :  0.5348837209302325
Specificity :  0.5625
ROC AUC score :  0.5486918604651163
*********** ****************** ********************
Running RF model




*********** Random Forest ********
Accuracy Score: 0.46703296703296704
Confusion Matrix : 
 [[36 50]
 [47 49]]
Accuracy :  0.46703296703296704
Sensitivity :  0.4186046511627907
Specificity :  0.5104166666666666
ROC AUC score :  0.46451065891472865
*********** ****************** ********************
Running XGB model




*********** XGBoost ********
Accuracy Score: 0.4945054945054945
Confusion Matrix : 
 [[36 50]
 [42 54]]
Accuracy :  0.4945054945054945
Sensitivity :  0.4186046511627907
Specificity :  0.5625
ROC AUC score :  0.49055232558139533
*********** ****************** ********************
Running kNN model




*********** k Nearest Neighbor ********
Accuracy Score: 0.47802197802197804
Confusion Matrix : 
 [[38 48]
 [47 49]]
Accuracy :  0.47802197802197804
Sensitivity :  0.4418604651162791
Specificity :  0.5104166666666666
ROC AUC score :  0.4761385658914728
*********** ****************** ********************
Running SVM model




*********** Support Vector Machine ********
Accuracy Score: 0.5054945054945055
Confusion Matrix : 
 [[49 37]
 [53 43]]
Accuracy :  0.5054945054945055
Sensitivity :  0.5697674418604651
Specificity :  0.4479166666666667
ROC AUC score :  0.5088420542635659
*********** ****************** ********************
Running NN model
*********** Neural Network ********
Accuracy Score: 0.4725274725274725
Confusion Matrix : 
 [[31 55]
 [41 55]]
Accuracy :  0.4725274725274725
Sensitivity :  0.36046511627906974
Specificity :  0.5729166666666666
ROC AUC score :  0.46669089147286813
*********** ****************** ********************
Got the result
saving the model to pickle file
Saved model object




### FIB-4 - above 3.25;  ( <font color="red">F1 vs. F4; no F2 and F3</font>) 

In [61]:
data_tmp1 = data.loc[data["BHS"] != 3]
data_tmp = data_tmp1.loc[data_tmp1["BHS"] != 2]

In [62]:
data_tmp["FIB4"] = round((data_tmp["Age"]*data_tmp["AST1"])/((data_tmp["Plat"]/1000)*np.sqrt(data_tmp["ALT1"])),2)

In [63]:
data_tmp = data_tmp.loc[data_tmp["FIB4"] > 3.25]

In [64]:
data_tmp["BHS"].value_counts(sort=0)

1    114
4    114
Name: BHS, dtype: int64

In [65]:
# Split data for training and testing
data_lists = []
X,y = standard_scaler(dataframe=data_tmp)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=20)

data_lists = [X_train, y_train, X_test, y_test]
label_names = y_train.unique()

In [66]:
models_result = get_all_the_best_values(data_lists=data_lists, experiment_name="Experiment_4")

Running LR model




*********** Logistic Regression ********
Accuracy Score: 0.4782608695652174
Confusion Matrix : 
 [[15 25]
 [11 18]]
Accuracy :  0.4782608695652174
Sensitivity :  0.375
Specificity :  0.6206896551724138
ROC AUC score :  0.49784482758620685
*********** ****************** ********************
Running NB model
calling fit in NB
Done with prediction
*********** Naive Bayes ********
Accuracy Score: 0.43478260869565216
Confusion Matrix : 
 [[11 29]
 [10 19]]
Accuracy :  0.43478260869565216
Sensitivity :  0.275
Specificity :  0.6551724137931034
ROC AUC score :  0.4650862068965517
*********** ****************** ********************
Running DT model
*********** Decision Tree ********
Accuracy Score: 0.5217391304347826
Confusion Matrix : 
 [[19 21]
 [12 17]]
Accuracy :  0.5217391304347826
Sensitivity :  0.475
Specificity :  0.5862068965517241
ROC AUC score :  0.530603448275862
*********** ****************** ********************
Running RF model




*********** Random Forest ********
Accuracy Score: 0.5362318840579711
Confusion Matrix : 
 [[17 23]
 [ 9 20]]
Accuracy :  0.5362318840579711
Sensitivity :  0.425
Specificity :  0.6896551724137931
ROC AUC score :  0.5573275862068966
*********** ****************** ********************
Running XGB model




*********** XGBoost ********
Accuracy Score: 0.5217391304347826
Confusion Matrix : 
 [[18 22]
 [11 18]]
Accuracy :  0.5217391304347826
Sensitivity :  0.45
Specificity :  0.6206896551724138
ROC AUC score :  0.5353448275862068
*********** ****************** ********************
Running kNN model




*********** k Nearest Neighbor ********
Accuracy Score: 0.391304347826087
Confusion Matrix : 
 [[15 25]
 [17 12]]
Accuracy :  0.391304347826087
Sensitivity :  0.375
Specificity :  0.41379310344827586
ROC AUC score :  0.3943965517241379
*********** ****************** ********************
Running SVM model




*********** Support Vector Machine ********
Accuracy Score: 0.42028985507246375
Confusion Matrix : 
 [[ 0 40]
 [ 0 29]]
Accuracy :  0.42028985507246375
Sensitivity :  0.0
Specificity :  1.0
ROC AUC score :  0.5
*********** ****************** ********************
Running NN model
*********** Neural Network ********
Accuracy Score: 0.6086956521739131
Confusion Matrix : 
 [[35  5]
 [22  7]]
Accuracy :  0.6086956521739131
Sensitivity :  0.875
Specificity :  0.2413793103448276
ROC AUC score :  0.5581896551724138
*********** ****************** ********************
Got the result
saving the model to pickle file
Saved model object


