# Example Evaluation Code

This notebook is very __similar__ to the code I use to evaluate your results. It is provided for __your convenience__, so that you can evaluate your preprocessing results at any time before your __final submission__.

Please note that the results here will __NOT__ be the same as my evaluation results.

Let's start with loading the required packages.

In [45]:
# import required package for data handling
import pandas as pd
import numpy as np

# import required packages for splitting data
from sklearn import model_selection
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split

# import required packages for evaluating models
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_recall_fscore_support

# import `logistic regression` model
from sklearn.linear_model import LogisticRegression

Next, load __your__ data. In this case, I am using a sample dataset (`./Datasets/A8.csv`) which contains 15 predictors (columns `0` to `14`), an extra `Unnamed: 0` column, and two target variables (`Y1`, `Y2`).

Please make sure you change the data to your __OWN__ dataset when using this code.

__NOTE__:
1. Your dataset may be very different from the sample dataset.
2. Please follow this structure when submitting your dataset.

In [46]:
data = pd.read_csv('./Datasets/A8.csv', header=0)
data.head()

|   | Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | Y1 | Y2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.090778 | 0.404995 | 0.654566 | -0.526217 | 0.126002 | 0.310453 | -0.183236 | 0.138658 | -0.15348 | 1.046536 | -0.0617 | 1.048837 | 1.003035 | -0.347193 | -0.806226 | 0 | 1 |
| 1 | 1 | 1.071518 | -2.469165 | -0.619777 | -0.658606 | 0.187801 | -0.110317 | -1.280911 | 0.775051 | -0.173644 | -0.955533 | 2.318974 | -0.707815 | 1.003035 | -0.347193 | -0.806226 | 1 | 0 |
| 2 | 2 | -0.449619 | 0.404995 | 0.40348 | -0.810392 | 0.074778 | 1.418052 | 3.078262 | -0.984884 | 0.024778 | -0.955533 | -0.26812 | -0.707815 | 1.003035 | -0.347193 | -0.806226 | 1 | 0 |
| 3 | 3 | 0.694691 | 0.404995 | 0.382698 | -0.797383 | -1.195491 | 1.398322 | 2.281277 | -0.626791 | 0.710231 | -0.955533 | -0.298247 | -0.707815 | 1.003035 | -0.347193 | -0.806226 | 1 | 1 |
| 4 | 4 | -0.57454 | 0.404995 | -1.266863 | 0.622161 | -0.597202 | -0.529648 | -0.407551 | 0.429323 | -0.028063 | 1.046536 | -0.117854 | 0.470583 | -0.996974 | -0.347193 | 1.240347 | 0 | 1 |
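The `Unnamed: 0` column above is the name pandas gives a column whose header is empty, which usually means a row index was written into the file by an earlier `to_csv` call. If that is the case for your own file, here is a minimal sketch of reading it back as the index with the `index_col=0` argument of `pd.read_csv`; the tiny inline CSV is made up for illustration:

```python
import io

import pandas as pd

# Hypothetical file contents: the first header field is empty because
# the row index was saved by DataFrame.to_csv()
csv_text = ",0,1,Y1,Y2\n0,0.5,1.2,0,1\n1,-0.3,0.7,1,0\n"

# Default read: the saved index becomes an extra column named 'Unnamed: 0'
default_df = pd.read_csv(io.StringIO(csv_text), header=0)
print(list(default_df.columns))

# index_col=0 uses that first column as the row index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)
print(list(indexed_df.columns))
```

Keep in mind that the cells below select columns by position, so removing `Unnamed: 0` this way would shift those positions by one.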


Checking your data types, and making sure they follow the data dictionary, is an important step; you can do this with the `.dtypes` attribute.

__NOTE__: all __continuous__ features will be of `float64` data type, and all __categorical__ features will be of `int64` data type (assuming you have already encoded them, per __suggested task \#6__ in the competition document).

In [47]:
data.dtypes

Unnamed: 0      int64
0             float64
1             float64
2             float64
3             float64
4             float64
5             float64
6             float64
7             float64
8             float64
9             float64
10            float64
11            float64
12            float64
13            float64
14            float64
Y1              int64
Y2              int64
dtype: object
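A minimal sketch of checking `.dtypes` against a data dictionary automatically, assuming the dictionary can be written as a Python dict from column name to expected dtype. The `expected` mapping and the toy frame are hypothetical; column `9` plays a categorical feature that was accidentally left as `float64`:

```python
import pandas as pd

# Toy frame standing in for your data
df = pd.DataFrame({
    "0": [0.1, -1.2, 0.7],   # continuous feature -> float64
    "9": [1.0, 0.0, 1.0],    # categorical feature accidentally stored as float64
    "Y1": [0, 1, 1],         # coded target -> int64
})

# Hypothetical data dictionary: column name -> expected dtype
expected = {"0": "float64", "9": "int64", "Y1": "int64"}

# Collect every column whose actual dtype differs from the dictionary
mismatches = {
    col: str(df[col].dtype)
    for col, want in expected.items()
    if str(df[col].dtype) != want
}
print(mismatches)  # an empty dict means every column matches
```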

Now you need to specify your targets (the predictors are defined further below). __NOTE__: we have two targets here (`Y1`, `Y2`).

In [48]:
y1 = data.Y1
y2 = data.Y2
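Before modelling, it can help to check how balanced each target is; `value_counts(normalize=True)` returns the share of each class. A sketch on a made-up target (your `y1` or `y2` would take its place):

```python
import pandas as pd

# Made-up target with a 1:3 class ratio, standing in for y2
y = pd.Series([0, 1, 1, 1, 0, 1, 1, 1], name="Y2")

# Share of each class, ordered by class label;
# a large gap between the shares suggests the target is imbalanced
proportions = y.value_counts(normalize=True).sort_index()
print(proportions)
```

In the test split used later in this notebook, `Y2` has 36 rows of class 0 and 96 rows of class 1, which is why the `Y2` cell oversamples the training data.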

Check the shape of the data.

In [49]:
data.shape

(660, 18)

It is quite likely that you will use different sets of predictors for `Y1` and `Y2`, so let's define them separately.

First, let's define the predictors for `Y1`. The cell below lists the column names of `data`, excluding the last three (`14`, `Y1` and `Y2`).

In [50]:
cols = list(data.columns)
# every column except the last three ('14', 'Y1', 'Y2')
cols[:-3]

['Unnamed: 0',
 '0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 '13']

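Use the code below to select the predictors for `Y1`. In this example it drops feature `11` (the column at position 12 of `data`) and then keeps the first 15 remaining columns: `Unnamed: 0` and every numbered feature except `11`, leaving out `Y1` and `Y2`.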

In [51]:
# drop feature '11' (the column at position 12) from the predictors for Y1
datay1 = data.drop(data.columns[12], axis=1)

In [52]:
datay1[datay1.columns[0:15]]

|   | Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.090778 | 0.404995 | 0.654566 | -0.526217 | 0.126002 | 3.104532e-01 | -0.183236 | 0.138658 | -0.153480 | 1.046536 | -0.061700 | 1.003035 | -0.347193 | -0.806226 |
| 1 | 1 | 1.071518 | -2.469165 | -0.619777 | -0.658606 | 0.187801 | -1.103172e-01 | -1.280911 | 0.775051 | -0.173644 | -0.955533 | 2.318974 | 1.003035 | -0.347193 | -0.806226 |
| 2 | 2 | -0.449619 | 0.404995 | 0.403480 | -0.810392 | 0.074778 | 1.418052e+00 | 3.078262 | -0.984884 | 0.024778 | -0.955533 | -0.268120 | 1.003035 | -0.347193 | -0.806226 |
| 3 | 3 | 0.694691 | 0.404995 | 0.382698 | -0.797383 | -1.195491 | 1.398322e+00 | 2.281277 | -0.626791 | 0.710231 | -0.955533 | -0.298247 | 1.003035 | -0.347193 | -0.806226 |
| 4 | 4 | -0.574540 | 0.404995 | -1.266863 | 0.622161 | -0.597202 | -5.296476e-01 | -0.407551 | 0.429323 | -0.028063 | 1.046536 | -0.117854 | -0.996974 | -0.347193 | 1.240347 |
| 5 | 5 | -0.401607 | 0.404995 | -1.947540 | -0.086080 | 0.164674 | -1.011650e+00 | -1.142055 | -1.398879 | -0.492006 | 1.046536 | 0.614532 | -0.996974 | -0.347193 | 1.240347 |
| 6 | 6 | -0.028993 | 0.404995 | -0.095418 | -0.838991 | 2.082413 | -1.156892e+00 | 0.016536 | -0.795056 | -1.105698 | -0.955533 | -0.227843 | 1.003035 | -0.347193 | -0.806226 |
| 7 | 7 | -0.437518 | -2.469165 | 1.001363 | 0.000000 | 3.155918 | -1.022820e-15 | -0.167987 | 0.641294 | -0.176664 | -0.955533 | -0.710853 | 1.003035 | -0.347193 | -0.806226 |
| 8 | 8 | -0.331466 | 0.404995 | -1.099294 | 0.084103 | -1.187231 | 1.438909e+00 | -0.807483 | -0.158197 | 1.527149 | -0.955533 | 0.300962 | -0.996974 | -0.347193 | 1.240347 |
| 9 | 9 | -0.263399 | 0.404995 | -0.967407 | 0.435709 | -1.492313 | -2.788294e-01 | 1.915093 | 0.364004 | 1.620546 | 1.046536 | -0.406296 | 1.003035 | -0.347193 | -0.806226 |


In [53]:

predictors_y1 = datay1[datay1.columns[0:15]]

predictors_y1.head()

|   | Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.090778 | 0.404995 | 0.654566 | -0.526217 | 0.126002 | 0.310453 | -0.183236 | 0.138658 | -0.15348 | 1.046536 | -0.0617 | 1.003035 | -0.347193 | -0.806226 |
| 1 | 1 | 1.071518 | -2.469165 | -0.619777 | -0.658606 | 0.187801 | -0.110317 | -1.280911 | 0.775051 | -0.173644 | -0.955533 | 2.318974 | 1.003035 | -0.347193 | -0.806226 |
| 2 | 2 | -0.449619 | 0.404995 | 0.40348 | -0.810392 | 0.074778 | 1.418052 | 3.078262 | -0.984884 | 0.024778 | -0.955533 | -0.26812 | 1.003035 | -0.347193 | -0.806226 |
| 3 | 3 | 0.694691 | 0.404995 | 0.382698 | -0.797383 | -1.195491 | 1.398322 | 2.281277 | -0.626791 | 0.710231 | -0.955533 | -0.298247 | 1.003035 | -0.347193 | -0.806226 |
| 4 | 4 | -0.57454 | 0.404995 | -1.266863 | 0.622161 | -0.597202 | -0.529648 | -0.407551 | 0.429323 | -0.028063 | 1.046536 | -0.117854 | -0.996974 | -0.347193 | 1.240347 |
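Selecting predictors by position (`data.columns[12]`, `[0:15]`) silently depends on column order. As a variant, here is a sketch of the same kind of selection done by column name, on a hypothetical toy frame that uses the same naming scheme, which makes it explicit which columns are removed:

```python
import pandas as pd

# Toy frame with the same column naming scheme as the sample data
df = pd.DataFrame(
    [[0, 0.1, 0.2, 0.3, 0, 1], [1, 0.4, 0.5, 0.6, 1, 0]],
    columns=["Unnamed: 0", "10", "11", "12", "Y1", "Y2"],
)

# Drop the targets plus any feature not used for Y1 (here '11'), by name
predictors_y1_demo = df.drop(columns=["11", "Y1", "Y2"])
print(list(predictors_y1_demo.columns))
```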


For `Y2`, this example uses all 15 numbered features (`0` to `14`) together with `Unnamed: 0`, i.e. the first 16 columns of `data`. Use similar code (as below) to select your own predictors for `Y2`.

In [54]:
predictors_y2 = data[data.columns[0:16]]
predictors_y2.head()

|   | Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.090778 | 0.404995 | 0.654566 | -0.526217 | 0.126002 | 0.310453 | -0.183236 | 0.138658 | -0.15348 | 1.046536 | -0.0617 | 1.048837 | 1.003035 | -0.347193 | -0.806226 |
| 1 | 1 | 1.071518 | -2.469165 | -0.619777 | -0.658606 | 0.187801 | -0.110317 | -1.280911 | 0.775051 | -0.173644 | -0.955533 | 2.318974 | -0.707815 | 1.003035 | -0.347193 | -0.806226 |
| 2 | 2 | -0.449619 | 0.404995 | 0.40348 | -0.810392 | 0.074778 | 1.418052 | 3.078262 | -0.984884 | 0.024778 | -0.955533 | -0.26812 | -0.707815 | 1.003035 | -0.347193 | -0.806226 |
| 3 | 3 | 0.694691 | 0.404995 | 0.382698 | -0.797383 | -1.195491 | 1.398322 | 2.281277 | -0.626791 | 0.710231 | -0.955533 | -0.298247 | -0.707815 | 1.003035 | -0.347193 | -0.806226 |
| 4 | 4 | -0.57454 | 0.404995 | -1.266863 | 0.622161 | -0.597202 | -0.529648 | -0.407551 | 0.429323 | -0.028063 | 1.046536 | -0.117854 | 0.470583 | -0.996974 | -0.347193 | 1.240347 |


Below is the key part of this notebook, which fits a `logistic regression` model to predict `Y1` (the cell after it does the same for `Y2`).

The code works this way:

1. We create two lists, `f1_score_lst` and `auc_lst`, to store the F1 score and AUC from each of the `10` runs of the model;
2. In each run:
    1. We define a `LogisticRegression()` model;
    
    2. We split the predictors (`predictors_y1`) and target `y1` into training (80%) and testing (20%) sets;
    
    3. We fit the model `clf` to the training data, then use it to predict on the testing data;
    
    4. We also run `10-fold cross validation` on the training data to check whether the model overfits; see [here](https://scikit-learn.org/stable/modules/cross_validation.html) for more info;
    
    5. We append the weighted F1 score and the AUC of the current model to the lists (`f1_score_lst` and `auc_lst`) we defined earlier.
  
3. Print out the average F1 score and AUC over all 10 runs;
4. Print out the test-set accuracy and the average accuracy from cross validation;
5. Print out the confusion matrix and classification report for the __last__ model.

__NOTE__: Step 3 provides the evaluation results we need; steps 4-5 can be used to verify the results from step 3. Because `train_test_split` is called with a fixed `random_state=123`, every run uses the same train/test split, so for `Y1` all 10 runs give identical scores (for `Y2`, only the random oversampling differs between runs).
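With `random_state=123` fixed, each run reuses one split. If you would like the 10 runs to average over *different* train/test splits, one option is to change the seed on every run. This is a sketch of that variant, not the evaluation procedure used in this notebook, and it uses synthetic data from `make_classification` as a stand-in for your predictors and target:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for predictors_y1 / y1
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

f1_scores, aucs = [], []
for seed in range(10):
    # a different seed gives a different 80/20 split on each run
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    clf = LogisticRegression().fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # weighted F1, the same quantity as precision_recall_fscore_support(..., average='weighted')[2]
    f1_scores.append(f1_score(y_test, y_pred, average="weighted"))
    aucs.append(roc_auc_score(y_test, y_pred))

print("F1 {:.4f}; AUC {:.4f}".format(np.mean(f1_scores), np.mean(aucs)))
```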

In [55]:
# lists for f1-score and AUC
f1_score_lst = []
auc_lst = []

# loop to calculate F1 and AUC scores and present averages after 10 runs
for count in range(10):
    # Model building
    clf = LogisticRegression()
    X1_train, X1_test, y1_train, y1_test = train_test_split(predictors_y1, y1, test_size=0.2, random_state=123)
    clf.fit(X1_train, y1_train)

    y1_pred = clf.predict(X1_test)

    # 10-fold cross validation on the training data
    # (KFold does not shuffle by default, so random_state is omitted:
    #  newer scikit-learn versions reject random_state unless shuffle=True)
    kfold = model_selection.KFold(n_splits=10)
    scoring = 'accuracy'
    results = model_selection.cross_val_score(clf, X1_train, y1_train, cv=kfold, scoring=scoring)

    # calculate weighted F1-score and AUC (AUC is computed from predicted labels)
    clf_roc_auc = roc_auc_score(y1_test, y1_pred)
    f1_score_lst.append(precision_recall_fscore_support(y1_test, y1_pred, average='weighted')[2])
    auc_lst.append(clf_roc_auc)

print('F1 {:.4f}; AUC {:.4f} '.format(np.mean(f1_score_lst), np.mean(auc_lst)))

# confusion matrix for the last model
confusion_matrix_y1 = confusion_matrix(y1_test, y1_pred)

print('Accuracy of classifier on test set: {:.2f}'.format(clf.score(X1_test, y1_test)))

print("10-fold cross validation average accuracy of classifier: %.3f" % (results.mean()))

print('Confusion Matrix for Logistic Regression Classifier:')
print(confusion_matrix_y1)

print('Classification Report for Logistic Regression Classifier:')
print(classification_report(y1_test, y1_pred))


F1 0.6319; AUC 0.6346 
Accuracy of classifier on test set: 0.63
10-fold cross validation average accuracy of classifier: 0.589
Confusion Matrix for Logistic Regression Classifier:
[[47 31]
 [18 36]]
Classification Report for Logistic Regression Classifier:
             precision    recall  f1-score   support

          0       0.72      0.60      0.66        78
          1       0.54      0.67      0.60        54

avg / total       0.65      0.63      0.63       132
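Note that `roc_auc_score` is given predicted class labels (`clf.predict`) here rather than probabilities. With only two distinct score values the ROC curve has a single interior point, so this AUC equals balanced accuracy, the mean of the two class recalls. For the last `Y1` model above that is (47/78 + 36/54) / 2 ≈ 0.6346, matching the printed AUC. A small check on made-up labels, with a probability-based AUC for comparison (if you want that variant, you would pass something like `clf.predict_proba(X1_test)[:, 1]`, which would change the numbers reported above):

```python
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Made-up true labels, hard predictions, and predicted probabilities of class 1
y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0]
y_prob = [0.1, 0.3, 0.6, 0.55, 0.8, 0.7, 0.4]

# AUC from hard labels equals the mean of the per-class recalls
auc_labels = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(auc_labels, bal_acc)

# AUC from probabilities ranks every example, so it can differ
auc_probs = roc_auc_score(y_true, y_prob)
print(auc_probs)
```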



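The code below evaluates the model for `Y2`. It is very similar to the code above; the key difference is that `Y2` is imbalanced, so I wrote some code (under `# Begin oversampling`) that randomly resamples the smaller class of the training set, with replacement, until both classes have as many rows as the larger one. The test set is left untouched.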
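The oversampling step can also be written as a small reusable function. This sketch mirrors the notebook's approach (resample every class with replacement up to the size of the largest class) and adds a `random_state` argument so a run can be reproduced; the function name `random_oversample` and the toy data are hypothetical:

```python
import pandas as pd


def random_oversample(X, y, random_state=None):
    """Duplicate random rows of smaller classes until all classes match the largest one.

    Assumes X and y share the same index and y has a name (e.g. 'Y2').
    """
    df = pd.concat([X, y], axis=1)
    target = y.name
    max_size = df[target].value_counts().max()
    parts = [df]
    for _, group in df.groupby(target):
        # sample with replacement; the largest class gets 0 extra rows
        parts.append(group.sample(max_size - len(group), replace=True, random_state=random_state))
    balanced = pd.concat(parts)
    y_bal = balanced.pop(target)  # split the target back out
    return balanced, y_bal


# Toy imbalanced data: 2 rows of class 0, 6 rows of class 1
X = pd.DataFrame({"f": range(8)})
y = pd.Series([0, 0, 1, 1, 1, 1, 1, 1], name="Y2")
X_bal, y_bal = random_oversample(X, y, random_state=0)
print(y_bal.value_counts().sort_index())
```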

In [56]:
# lists for f1-score and AUC
f1_score_lst = []
auc_lst = []

# loop to calculate F1 and AUC scores and present averages after 10 runs
for count in range(10):
    # Model building
    clf1 = LogisticRegression()

    # Splitting data into testing and training
    X2_train, X2_test, y2_train, y2_test = train_test_split(predictors_y2, y2, test_size=0.2, random_state=123)

    # Begin oversampling: resample each class (with replacement) up to the size of the largest class
    oversample = pd.concat([X2_train, y2_train], axis=1)
    max_size = oversample['Y2'].value_counts().max()
    lst = [oversample]
    for class_index, group in oversample.groupby('Y2'):
        lst.append(group.sample(max_size - len(group), replace=True))
    X2_train = pd.concat(lst)
    # split the target back out of the oversampled frame
    y2_train = X2_train.pop('Y2')

    # fitting model on oversampled data
    clf1.fit(X2_train, y2_train)

    y2_pred = clf1.predict(X2_test)

    # 10-fold cross validation on the (oversampled) training data
    # (no random_state: KFold does not shuffle by default)
    kfold = model_selection.KFold(n_splits=10)
    scoring = 'accuracy'
    results = model_selection.cross_val_score(clf1, X2_train, y2_train, cv=kfold, scoring=scoring)

    # calculate weighted F1-score and AUC for this run and store them
    clf1_roc_auc = roc_auc_score(y2_test, y2_pred)
    f1_score_lst.append(precision_recall_fscore_support(y2_test, y2_pred, average='weighted')[2])
    auc_lst.append(clf1_roc_auc)

print('F1 {:.4f}; AUC {:.4f} '.format(np.mean(f1_score_lst), np.mean(auc_lst)))

# confusion matrix for the last model
confusion_matrix_y2 = confusion_matrix(y2_test, y2_pred)

print('Accuracy of classifier on test set: {:.3f}'.format(clf1.score(X2_test, y2_test)))

print("10-fold cross validation average accuracy of clf1: %.3f" % (results.mean()))

print('Confusion Matrix for Classifier:')
print(confusion_matrix_y2)

print('Classification Report for Classifier:')
print(classification_report(y2_test, y2_pred))


F1 0.6379; AUC 0.6177 
Accuracy of classifier on test set: 0.629
10-fold cross validation average accuracy of clf1: 0.574
Confusion Matrix for Classifier:
[[23 13]
 [36 60]]
Classification Report for Classifier:
             precision    recall  f1-score   support

          0       0.39      0.64      0.48        36
          1       0.82      0.62      0.71        96

avg / total       0.70      0.63      0.65       132
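In the `Y2` cell, cross validation runs on data that has already been oversampled, so copies of the same minority-class row can land in both the training and the validation folds; and because `KFold` does not shuffle, the appended duplicates all sit at the end of the frame. One way to avoid both effects is to oversample only the training part of each fold. The sketch below does this, assuming synthetic imbalanced data from `make_classification` and a hypothetical helper `oversample_indices`. It also swaps in `StratifiedKFold` with shuffling, which differs from the notebook's plain `KFold`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def oversample_indices(y, rng):
    """Return row indices in which every class is resampled up to the largest class size."""
    classes, counts = np.unique(y, return_counts=True)
    max_size = counts.max()
    idx = [np.arange(len(y))]
    for cls, n in zip(classes, counts):
        if n < max_size:
            members = np.flatnonzero(y == cls)
            idx.append(rng.choice(members, size=max_size - n, replace=True))
    return np.concatenate(idx)


# Synthetic imbalanced data (about 25% class 0), standing in for predictors_y2 / y2
X, y = make_classification(n_samples=400, n_features=6, weights=[0.25, 0.75], random_state=0)

rng = np.random.default_rng(0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_acc = []
for train_idx, val_idx in cv.split(X, y):
    # oversample the training part of this fold only; the validation part stays untouched
    res = oversample_indices(y[train_idx], rng)
    X_tr, y_tr = X[train_idx][res], y[train_idx][res]
    clf = LogisticRegression().fit(X_tr, y_tr)
    fold_acc.append(clf.score(X[val_idx], y[val_idx]))

print("10-fold CV accuracy with per-fold oversampling: %.3f" % np.mean(fold_acc))
```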

