# Modeling Process With Experimental Feature Database

- In this notebook we use our experimental database to identify key features and tune the hyperparameters of several high-performing models. For model selection, we looked for models that best improved precision (fewer false positives) without drastically reducing recall (more false negatives). To evaluate the models we used sklearn metrics and also viewed the confusion matrix to see how each model's predictions performed on the test set.

In [1]:
# Import Packages
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Sklearn Packages
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn import metrics
from sklearn.metrics import mean_squared_error, precision_score, confusion_matrix, accuracy_score
from sklearn.dummy import DummyClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier
from sklearn import set_config
set_config(print_changed_only=False)
from xgboost import XGBClassifier, plot_importance
from sklearn.exceptions import ConvergenceWarning

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=ConvergenceWarning)

pd.set_option('display.max_columns', 300)
%matplotlib inline

plt.style.use('seaborn')



In [2]:
#Read in dataframe
exp_df = pd.read_csv('experiment_features.csv',index_col=0)
exp_df.head()

Unnamed: 0,baseline value,accelerations,fetal_movement,uterine_contractions,light_decelerations,prolongued_decelerations,abnormal_short_term_variability,mean_value_of_short_term_variability,percentage_of_time_with_abnormal_long_term_variability,mean_value_of_long_term_variability,histogram_width,histogram_min,histogram_max,histogram_number_of_peaks,histogram_mode,histogram_mean,histogram_median,histogram_variance,fetal_health,uterine_cont_per_min,total_change,sqrt_total_change,hist_zeros_1.0,hist_zeros_2.0,hist_zeros_3.0,hist_zeros_4.0,hist_zeros_5.0,hist_zeros_7.0,hist_zeros_8.0,hist_zeros_10.0,hist_tendancy_0.0,hist_tendancy_1.0,sev_decel_0.001,quant_acc_1,quant_light_dec_1,quant_hist_mean_1,quant_hist_mean_2,quant_hist_mean_3,quant_hist_mean_4,quant_hist_mean_5,quant_hist_mean_6
0,120.0,0.0,0.0,0.0,0.0,0.0,73.0,0.5,43.0,2.4,64.0,62.0,126.0,2.0,120.0,137.0,121.0,73.0,2.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
1,132.0,0.006,0.0,0.006,0.003,0.0,17.0,2.1,0.0,10.4,130.0,68.0,198.0,6.0,141.0,136.0,140.0,12.0,1.0,0.36,1.98,0.122474,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
2,133.0,0.003,0.0,0.008,0.003,0.0,16.0,2.1,0.0,13.4,130.0,68.0,198.0,5.0,141.0,135.0,138.0,13.0,1.0,0.48,1.862,0.118322,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
3,134.0,0.003,0.0,0.008,0.003,0.0,16.0,2.4,0.0,23.0,117.0,53.0,170.0,11.0,137.0,134.0,137.0,13.0,1.0,0.48,1.876,0.118322,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
4,132.0,0.007,0.0,0.008,0.0,0.0,16.0,2.4,0.0,19.9,117.0,53.0,170.0,9.0,137.0,136.0,138.0,11.0,1.0,0.48,1.98,0.122474,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0


In [3]:
# Evaluation function
def evaluation(y_true, y_pred):
    
    # Print Accuracy, Recall, F1 Score, and Precision metrics.
    # These sklearn scorers default to pos_label=1, so class 1 is scored as the positive class.
    print('Evaluation Metrics:')
    print('Accuracy: ' + str(metrics.accuracy_score(y_true, y_pred)))
    print('Recall: ' + str(metrics.recall_score(y_true, y_pred)))
    print('F1 Score: ' + str(metrics.f1_score(y_true, y_pred)))
    print('Precision: ' + str(metrics.precision_score(y_true, y_pred)))
    
    # Print Confusion Matrix
    # ravel() flattens row by row: rows are true labels, columns are predicted labels,
    # both in sorted label order, so the header below treats the larger label as positive.
    print('\nConfusion Matrix:')
    print(' TN,  FP, FN, TP')
    print(confusion_matrix(y_true, y_pred).ravel())
    
# Function Prints best parameters for GridSearchCV
def print_results(results):
    print('Best Parameters: {}\n'.format(results.best_params_))   

In [4]:
#train test split of data
X = exp_df.drop('fetal_health', axis =1)
y = exp_df.fetal_health

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.25, random_state=1)

In [5]:
#standard scaler for predicting features
scaler = StandardScaler()  
scaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

## Class imbalance

- To handle class imbalance we oversampled the minority class using SMOTE (Synthetic Minority Oversampling Technique). SMOTE balances the classes by taking minority samples, finding their nearest minority-class neighbors, and adding synthetic points between them. We ran each experimental baseline model on both the SMOTE-sampled data and the imbalanced data to compare the effect on the metrics. The SMOTE data performed better on every baseline model, so we chose to use it for our final models.  

In [6]:
#Used smote to oversample minority class
sm = SMOTE(random_state=25)
#fit_resample replaces the older fit_sample method in imbalanced-learn
smX_train, smy_train = sm.fit_resample(X_train, y_train)
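
The interpolation step that SMOTE relies on can be sketched in a few lines. This is a simplified illustration only, not the imbalanced-learn implementation: the function name `smote_like_samples`, the toy points, and the choice of `k` are all assumptions made for the example.

```python
import math
import random

def smote_like_samples(minority, k=2, n_new=4, seed=25):
    """Create synthetic points on segments between minority points and their neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point.
        x = rng.choice(minority)
        # Find its k nearest minority neighbors (excluding the point itself).
        others = [p for p in minority if p is not x]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        # Interpolate: a random point on the segment from x to the chosen neighbor.
        lam = rng.random()
        synthetic.append(tuple(xi + lam * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# Hypothetical 2-D minority-class points.
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.9)]
new_points = smote_like_samples(minority)
for p in new_points:
    print(p)
```

Every synthetic point lies on a line segment between two real minority samples, which is why SMOTE fills in the minority region rather than duplicating existing rows.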

## KNN 

- Here we ran a baseline KNN model on the experimental database and compared its metrics to the same model trained on the SMOTE-treated data.  


In [7]:
#baseline KNN with class imbalance
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=1, p=2,
                     weights='uniform')

In [8]:
y_pred = knn.predict(X_test)

In [9]:
#prediction metrics and confusion matrix of base KNN with class imbalance
evaluation(y_test, y_pred)

Evaluation Metrics:
Accuracy: 0.9172932330827067
Recall: 0.9682151589242054
F1 Score: 0.9473684210526316
Precision: 0.927400468384075

Confusion Matrix:
 TN,  FP, FN, TP
[396  13  31  92]
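
As a sanity check, the scores above can be recovered by hand from the four confusion-matrix counts. The sketch below assumes the two classes are labeled 1 and 2 (as in the `head()` output) and that sklearn's default `pos_label=1` is in effect, so class 1 is the positive class for recall, precision and F1. The variable names are illustrative only.

```python
# Counts from the baseline KNN confusion matrix above, in ravel() order.
# Rows are true labels (1, then 2); columns are predicted labels (1, then 2).
true1_pred1, true1_pred2, true2_pred1, true2_pred2 = 396, 13, 31, 92

total = true1_pred1 + true1_pred2 + true2_pred1 + true2_pred2

# With class 1 as the positive class (sklearn's default pos_label=1):
accuracy = (true1_pred1 + true2_pred2) / total
recall = true1_pred1 / (true1_pred1 + true1_pred2)     # share of true class 1 that was found
precision = true1_pred1 / (true1_pred1 + true2_pred1)  # share of predicted class 1 that was right
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"Precision: {precision:.4f}")
print(f"F1 Score:  {f1:.4f}")
```

The values match the printed metrics, which shows that the reported recall and precision describe class 1, while the `TN, FP, FN, TP` header reads the same counts with class 2 as the positive class.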


In [10]:
#Base KNN model with Smote oversampled class
smknn = KNeighborsClassifier(n_neighbors=1)
smknn.fit(smX_train,smy_train)
y_pred = smknn.predict(X_test)

In [11]:
#Prediction metrics and confusion matrix of smote base KNN
evaluation(y_test, y_pred)

Evaluation Metrics:
Accuracy: 0.9172932330827067
Recall: 0.9584352078239609
F1 Score: 0.9468599033816426
Precision: 0.9355608591885441

Confusion Matrix:
 TN,  FP, FN, TP
[392  17  27  96]


## Logistic Regression Baseline Model

- Here are two logistic regression models comparing the effect of SMOTE on the metric scores.  SMOTE greatly improved the precision of the model (0.906 to 0.968), but recall was greatly reduced (0.939 to 0.880).  While one of the aims of our project was to improve precision as much as we can, too many false negatives are not ideal for the overall fetal mortality rate.  

In [12]:
#Fit Train set with Logistic Regression model
lr = LogisticRegression()
lr.fit(X_train,y_train)
y_pred = lr.predict(X_test)

In [13]:
#Regression model evaluation metrics and confusion matrix
evaluation(y_test, y_pred)

Evaluation Metrics:
Accuracy: 0.8778195488721805
Recall: 0.9388753056234719
F1 Score: 0.921968787515006
Precision: 0.9056603773584906

Confusion Matrix:
 TN,  FP, FN, TP
[384  25  40  83]


In [14]:
smlr = LogisticRegression(solver='liblinear')
smlr.fit(smX_train,smy_train)
y_predsm = smlr.predict(X_test)

In [15]:
evaluation(y_test, y_predsm)

Evaluation Metrics:
Accuracy: 0.8853383458646616
Recall: 0.8801955990220048
F1 Score: 0.9218950064020486
Precision: 0.967741935483871

Confusion Matrix:
 TN,  FP, FN, TP
[360  49  12 111]


- Below is a table of feature coefficients for the logistic regression.  The largest coefficients belong to sqrt_total_change, quant_acc_1, quant_hist_mean_1 through quant_hist_mean_3, and hist_tendancy_1.0.  All of these are features that we engineered.  Looking at what these features represent, for this model the change in the FHR and the average change in that rate seem to be the biggest factors for classifying fetal health.  We wanted to improve the logistic regression so that both recall and precision went up.  To attempt this we used a bagging classifier, which trains the logistic regression on multiple random samples and aggregates the predictions, to see how that affected the evaluation metrics.  

In [16]:
#Create a table of logistic regression coefficients comparing the coefficients 
#of the SMOTE and imbalanced datasets
coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",lr.coef_.transpose())

coef_table_2 = pd.DataFrame(list(smX_train.columns)).copy()
coef_table_2.insert(len(coef_table_2.columns),'sm_Coefs',smlr.coef_.transpose())

smote_vs_coef = pd.concat([coef_table,coef_table_2],axis=1)
smote_vs_coef.columns = ['features','Coefs','del','sm_Coefs']
del smote_vs_coef['del']
smote_vs_coef

Unnamed: 0,features,Coefs,sm_Coefs
0,baseline value,0.117174,0.112113
1,accelerations,-0.004092,-0.224252
2,fetal_movement,0.010843,0.313262
3,uterine_contractions,-0.00532,-0.055485
4,light_decelerations,-0.00222,-0.036389
5,prolongued_decelerations,0.000806,0.044586
6,abnormal_short_term_variability,0.06991,0.08435
7,mean_value_of_short_term_variability,-0.715013,-0.626913
8,percentage_of_time_with_abnormal_long_term_var...,0.037261,0.039607
9,mean_value_of_long_term_variability,0.031932,0.045947
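
One way to read the coefficients in the table: a logistic regression coefficient is the change in log-odds per one-unit increase in the feature, so exponentiating it gives an odds ratio. The sketch below does this for two values copied from the `Coefs` column. It assumes the classes are labeled 1 and 2, in which case sklearn reports binary coefficients for the larger label (class 2); the dictionary names are illustrative.

```python
import math

# Coefficients copied from the Coefs column of the table above (imbalanced data).
coefs = {
    'baseline value': 0.117174,
    'mean_value_of_short_term_variability': -0.715013,
}

# exp(coef) is the multiplicative change in the odds of the modeled class
# for a one-unit increase in the feature, holding the other features fixed.
odds_ratios = {name: math.exp(c) for name, c in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.3f}")
```

Because the features were passed to the model unscaled, coefficient sizes also reflect each feature's units, so magnitudes should be compared across features with care.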


In [17]:
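#Bagging classifier for logistic regression
from sklearn.ensemble import BaggingClassifier  # not included in the imports cell

#The base model is passed positionally because its keyword name differs across
#sklearn versions (base_estimator in older releases, estimator in newer ones)
bag_log = BaggingClassifier(
    LogisticRegression(random_state=1),
    n_estimators=200,
    max_samples=.85,
    max_features=10,
    oob_score=True,
    n_jobs=-1,
    verbose=1)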

In [None]:
#Fit bagging classifier
bag_log.fit(smX_train, smy_train)
y_pred = bag_log.predict(X_test)

In [None]:
#Evaluation Metrics
evaluation(y_test,y_pred)

- The bagging classifier made only minute changes to the evaluation metrics.  We decided to look at some other models and compare the results.
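
The bagging idea used above (fit the same model on many random samples, then aggregate the predictions) can be sketched without sklearn. This is a toy illustration under stated assumptions: the base learner is a simple midpoint threshold standing in for logistic regression, the data are made up, and predictions are combined by majority vote, whereas sklearn's `BaggingClassifier` may combine predicted probabilities instead.

```python
import random
from collections import Counter

def fit_midpoint_classifier(sample):
    """Toy base learner: threshold at the midpoint between the two class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: int(x > threshold)

def bagging_fit(data, n_estimators=25, max_samples=0.85, seed=1):
    rng = random.Random(seed)
    n_draw = int(max_samples * len(data))
    estimators = []
    while len(estimators) < n_estimators:
        # Bootstrap: draw with replacement from the training data.
        sample = [rng.choice(data) for _ in range(n_draw)]
        if len({y for _, y in sample}) < 2:
            continue  # skip degenerate samples that miss a class
        estimators.append(fit_midpoint_classifier(sample))
    return estimators

def bagging_predict(estimators, x):
    # Aggregate by majority vote across the fitted estimators.
    votes = Counter(est(x) for est in estimators)
    return votes.most_common(1)[0][0]

# Hypothetical 1-D training data: (feature, label).
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
ensemble = bagging_fit(data)
print(bagging_predict(ensemble, 1.2), bagging_predict(ensemble, 3.8))  # prints: 0 1
```

Bagging mainly reduces variance, so a stable, low-variance model such as logistic regression tends to gain little from it, which is consistent with the minute changes we observed.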

## Decision Tree

- The next model we tried was a decision tree.  Again we compared the imbalanced and SMOTE data sets on baseline models to observe the effect on the metrics.  With the SMOTE data set, the decision tree produced very good metrics with all default hyperparameters.  The most important features were abnormal_short_term_variability and mean_value_of_short_term_variability.

In [None]:
#Train decision tree with train set and predict on the test set
tree = DecisionTreeClassifier()

tree = tree.fit(X_train,y_train)

y_pred = tree.predict(X_test)

In [None]:
#Evaluation metrics
evaluation(y_test, y_pred)

In [None]:
#Decision tree with smote dataset
smtree = DecisionTreeClassifier()
smtree.fit(smX_train,smy_train)
y_pred = smtree.predict(X_test)

In [None]:
#evaluation Metrics
evaluation(y_test, y_pred)

In [None]:
#Table of decision tree feature importances (imbalanced vs. SMOTE)
coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",tree.feature_importances_.transpose())

coef_table_2 = pd.DataFrame(list(smX_train.columns)).copy()
#Importances from the SMOTE-trained tree (smtree)
coef_table_2.insert(len(coef_table_2.columns),'sm_Coefs',smtree.feature_importances_.transpose())

smote_vs_coef = pd.concat([coef_table,coef_table_2],axis=1)
smote_vs_coef.columns = ['features','importance','del','sm_importance']
del smote_vs_coef['del']
smote_vs_coef

## Random Forest

- As above, we used the SMOTE and imbalanced data, this time to train a random forest classifier.  


In [None]:
#Random Forest classifier using 50 estimators and a max depth of 3
rfc = RandomForestClassifier(random_state =1, n_estimators= 50, max_depth = 3, n_jobs =-1,verbose=1)
rfc.fit(X_train,y_train)
y_pred = rfc.predict(X_test)

In [None]:
#Evaluation metrics for random forest
evaluation(y_test, y_pred)

In [None]:
#Random Forest classifier using 50 estimators and a max depth of 3 using SMOTE dataset
smrfc = RandomForestClassifier(random_state =1, n_estimators= 50, max_depth = 3, n_jobs =-1,verbose=1)
smrfc.fit(smX_train,smy_train)
y_pred = smrfc.predict(X_test)

In [None]:
print(confusion_matrix(y_test, y_pred))
evaluation(y_test, y_pred)

In [None]:
#Create table of feature importances
coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",rfc.feature_importances_.transpose())

coef_table_2 = pd.DataFrame(list(smX_train.columns)).copy()
coef_table_2.insert(len(coef_table_2.columns),'sm_Coefs',smrfc.feature_importances_.transpose())

smote_vs_coef = pd.concat([coef_table,coef_table_2],axis=1)
smote_vs_coef.columns = ['features','importance_baseline','del','sm_importance_baseline']
del smote_vs_coef['del']
smote_vs_coef

## Grid Search Random Forest
- We ran a grid search on the random forest to identify the best hyperparameters for the model.  We checked several values for the number of estimators, max depth and min weight fraction leaf to find the ideal parameters.  

In [None]:
#parameter grid for grid search: lists of estimators, both criteria, a list of max depths, sqrt max features
#and a list of min weight fraction leaf
parameters = {
    'n_estimators': [25,50,100,300,500],
    'criterion' : ['gini','entropy'],
    'max_depth' : [8,9,10,11,12],
    'max_features' : ['sqrt'],
    'min_weight_fraction_leaf' : [0,0.1,0.3,0.5],
}

In [None]:
#Gridsearch with random forest
grid_tree=GridSearchCV(RandomForestClassifier(), parameters, cv=15, scoring='f1', verbose=1, n_jobs=-1)
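
To get a feel for the cost of this search, the number of candidate models and cross-validation fits can be counted directly from the grid. The sketch below repeats the `parameters` dictionary and the `cv=15` setting from the cells above; the names `candidates`, `n_candidates` and `n_fits` are illustrative.

```python
from itertools import product

# Same grid as the `parameters` dictionary above.
parameters = {
    'n_estimators': [25, 50, 100, 300, 500],
    'criterion': ['gini', 'entropy'],
    'max_depth': [8, 9, 10, 11, 12],
    'max_features': ['sqrt'],
    'min_weight_fraction_leaf': [0, 0.1, 0.3, 0.5],
}
cv_folds = 15  # matches cv=15 in the GridSearchCV call

# Every combination of one value per hyperparameter is one candidate model.
candidates = [dict(zip(parameters, combo)) for combo in product(*parameters.values())]
n_candidates = len(candidates)
n_fits = n_candidates * cv_folds

print(n_candidates, "candidates x", cv_folds, "folds =", n_fits, "fits")
# prints: 200 candidates x 15 folds = 3000 fits
```

With the default `refit=True`, GridSearchCV then fits the best candidate once more on the full training set, so widening any list in the grid multiplies the total run time.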

In [None]:
#Fit random forest grid search to SMOTE train set
grid_tree.fit(smX_train, smy_train)

In [None]:
#Find the best score, best parameters and best estimator for grid search
print(grid_tree.best_score_)
print(grid_tree.best_params_)
print(grid_tree.best_estimator_)

In [None]:
#evaluation metrics for random forest gridsearch best parameters
y_pred = grid_tree.best_estimator_.predict(X_test)
evaluation(y_test,y_pred)

**Findings:** 
- The best parameters for the random forest were a max depth of 11, a min weight fraction leaf of 0, and 500 estimators.  The evaluation metrics were our highest scores with the experimental set so far.  Below we compared feature importance for the three random forest models we ran and then graphed the top ten features to visualize their importance for fetal health classification.

In [None]:
#Best feature table for random forest gridsearch best parameters
coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",grid_tree.best_estimator_.feature_importances_.transpose())
coef_table

In [None]:
#table comparing random forest best features between imbalanced data set, smote and smote with gridsearch best features
coef_table.columns = ['features','grid_search_importance']
del coef_table['features']
best_features_rfc = pd.concat([smote_vs_coef,coef_table],axis=1)

In [None]:
best_features_rfc

In [None]:
pd.Series(grid_tree.best_estimator_.feature_importances_, index=X.columns).nlargest(10).plot(kind='barh')

**Findings:**
- In the graph above you can see the top ten most important features for our grid search random forest model.  abnormal_short_term_variability, accelerations and mean_value_of_short_term_variability seem to be key factors in health classification.  This suggests that extended, rapid, extreme changes in fetal heart rate are detrimental to fetal health.

## XGboost

- We wanted to try an XGBoost classifier to attempt a better model than the grid search random forest.  XGBoost applies gradient boosting to decision trees: each new tree is fit to correct the errors of the trees before it.  We first tried an XGBoost model with some parameters we thought would be ideal for the model.  Next we ran a grid search on the XGBoost parameters to produce our best possible model.  
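
The boosting principle can be illustrated with a small pure-Python sketch. This is a deliberately simplified gradient boosting loop with squared-error loss and one-split trees (stumps) on made-up 1-D data. It is not XGBoost's algorithm, which adds regularization and other refinements; the function names `fit_stump` and `boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step in the direction the new stump suggests.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical toy data: a step from 0 to 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = boost(xs, ys)

mse_before = sum((y - sum(ys) / len(ys)) ** 2 for y in ys) / len(ys)
mse_after = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_before, mse_after)
```

The `learning_rate` and `n_rounds` arguments play the same roles as `learning_rate` and `n_estimators` in the XGBoost cells below: smaller steps usually need more trees to reach the same fit.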

In [None]:
XGBClassifier()

In [None]:
xg_clf = XGBClassifier(objective ='binary:logistic', 
                       colsample_bytree = 0.75, 
                       subsample = 0.85,
                       learning_rate = 0.1,
                       max_depth = 11, 
                       reg_alpha = 1,  # L1 regularization (alias: alpha)
                       n_estimators = 1000,
                       verbosity = 1, n_jobs=-1)

In [None]:
xg_clf.fit(smX_train,smy_train)

In [None]:
y_pred = xg_clf.predict(X_test)

In [None]:
evaluation(y_test,y_pred)

In [None]:
coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",xg_clf.feature_importances_.transpose())

In [None]:
coef_table

In [None]:
clf_xgb = XGBClassifier(objective = 'binary:logistic')
param_dist = {'n_estimators': [500,1000,1500],
              'learning_rate': [0.1,0.07,0.05,0.03,0.01],
              'max_depth': [9,10,11,12,13],
              'colsample_bytree': [0.5,0.45,0.4],
              'min_child_weight': [1, 2, 3]
             }

In [None]:
grid_xg = GridSearchCV(estimator=clf_xgb,
                      param_grid= param_dist,
                      scoring='f1',
                      n_jobs=-1,
                      verbose=1,
                      cv=10)

In [None]:
grid_xg.fit(smX_train,smy_train)

In [None]:
grid_xg.best_params_

In [None]:
y_pred = grid_xg.best_estimator_.predict(X_test)

In [None]:
evaluation(y_test,y_pred)

In [None]:
from xgboost import plot_importance
plot_importance(grid_xg.best_estimator_)

In [None]:
plot_importance(grid_xg.best_estimator_,max_num_features=10)

**Findings:**
- The grid search XGBoost produced our overall best model.  We were extremely pleased with the highest recall and accuracy scores.  The precision, which was our target metric, was also very high.  When graphing the feature importance, the most important features were abnormal_short_term_variability and the histogram mean, min, width and mode.  Our engineered features were not as important as some of the unchanged features.  Our best-performing engineered feature was sqrt_total_change, but no engineered feature was in the top ten.  Just as in our random forest model, short-term variability of the FHR is a key feature in classifying fetal health.