<img src="images/thro.png" align="right"> 
# A2I2 - Artificial Neural Networks (ANN)

## Lecture

---
## Step 3: Modeling and Evaluation (Cost-Benefit-Matrix)

Let us revisit our Cardiotocography Evaluation project:

### <span style="color:blue">Business Problem: </span> Cardiotocography Evaluation

During pregnancy, many doctors perform "fetal cardiotocograms", recording the heartbeat and other measurements of the fetus in order to assess fetal wellbeing. 

### <span style="color:blue">Understanding the business</span>

**Cardiotocography**
* The recorded data is quite hard to interpret. Ideally, a group of experts does the evaluation, but quite often this is not possible and a single doctor has to interpret the data alone.

### <span style="color:blue">Mapping to Data Science Problems and Methods</span>

* Automatically interpret cardiotocography data to assess fetal health (classification)


### <span style="color:blue">Modeling and Evaluation</span>

How do we evaluate **from a business perspective** whether our (not yet existing) model solves the **business problem**?

You know about accuracy, the confusion matrix and many other classification evaluation techniques. But how do we apply these to a business problem?

### The Cost-Benefit-Matrix

"**Cost–benefit analysis (CBA)**, sometimes also called benefit–cost analysis or benefit costs analysis, is a systematic approach to estimating the strengths and weaknesses of alternatives used to determine options which provide the best approach to achieving benefits while preserving savings (for example, in transactions, activities, and functional business requirements). A CBA may be used to compare completed or potential courses of actions, or to estimate (or evaluate) the value against the cost of a decision, project, or policy. It is commonly used in commercial transactions, business or policy decisions (particularly public policy), and project investments." 
(<a href="https://en.wikipedia.org/wiki/Cost%E2%80%93benefit_analysis">wikipedia</a>)


Remember the confusion matrix:

$$\begin{bmatrix}
 & \text{Predicted-Negative} & \text{Predicted-Positive}\\
\text{True-Condition-Negative} & TN & FP\\
\text{True-Condition-Positive} & FN & TP \\
\end{bmatrix}$$

We can now assign a **cost or benefit** to each classified sample (TN, FP, FN, TP), based on the **business problem**!

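In our example, we have three classes, given by NSP, the fetal state class code (1 = normal, 2 = suspect, 3 = pathologic). In the matrices below, the indices 0, 1 and 2 stand for these three classes in that order (N, S, P), leading to a confusion matrix: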

$$\begin{bmatrix}
 & \text{Predicted-0} & \text{Predicted-1} & \text{Predicted-2}\\
\text{True-Condition-0} & NN & NS & NP \\
\text{True-Condition-1} & SN & SS & SP \\
\text{True-Condition-2} & PN & PS & PP \\
\end{bmatrix}$$

We need to assign a cost or benefit, expressed as a financial value, to each of these 9 possible outcomes. Let us put a cost on each case and afterwards minimize this cost. Let us assume that a pathologic fetus can be treated if it is diagnosed as pathologic. If a pathologic fetus is not diagnosed and therefore not treated, let us assume that the condition will be diagnosed at the next examination but will require a more expensive treatment with an additional cost of 20000. Let us also assume that a suspect or pathologic diagnosis is verified with an additional test costing 1000.

* NN (truly normal, predicted normal): cost 0
* NS (truly normal, predicted suspect): a further (unnecessary) test needs to be conducted, leading to a cost of 1000
* NP (truly normal, predicted pathologic): a further (unnecessary) test needs to be conducted, leading to a cost of 1000

* SN (truly suspect, predicted normal): this is hard to quantify. Let us assume that 50% of the suspect cases are in reality healthy, so by making this error we actually save the 1000 for the test. The other 50% are pathologic and remain untreated, costing 20000. So this error leads to an expected cost of (20000 - 1000)/2 = 9500
* SS (truly suspect, predicted suspect): a further (necessary) test will be conducted, cost 0
* SP (truly suspect, predicted pathologic): a further (necessary) test will be conducted, cost 0

* PN (truly pathologic, predicted normal): will remain untreated, costing 20000
* PS (truly pathologic, predicted suspect): a further (necessary) test will be conducted, cost 0
* PP (truly pathologic, predicted pathologic): a further (necessary) test will be conducted, cost 0
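
The SN entry is the only one that needs arithmetic. As a worked check, under the 50/50 assumption made for the suspect cases, the expected cost of predicting "normal" for a truly suspect case is the probability-weighted average of the two scenarios (a saving counts as a negative cost):

$$\text{cost}_{SN} = 0.5 \cdot (-1000) + 0.5 \cdot 20000 = \frac{20000 - 1000}{2} = 9500$$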

This leads to the following cost-benefit-matrix:
$$\begin{bmatrix}
 & \text{Predicted-0} & \text{Predicted-1} & \text{Predicted-2}\\
\text{True-Condition-0} & 0 & 1000 & 1000 \\
\text{True-Condition-1} & 9500 & 0 & 0 \\
\text{True-Condition-2} & 20000 & 0 & 0 \\
\end{bmatrix}$$

***We can now weight the confusion matrix by the cost-benefit matrix to evaluate a classification model from a business perspective.***
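
Written out, this weighting can be read as follows. The symbols $M$, $C$ and $N$ are introduced here for illustration only: $M_{ij}$ is the number of samples with true class $i$ that were predicted as class $j$ (the confusion matrix), $C_{ij}$ is the cost of that outcome (the cost-benefit matrix), and $N$ is the total number of samples. The product is taken entry by entry, not as a matrix product:

$$\text{expected cost per examination} = \frac{1}{N} \sum_{i} \sum_{j} M_{ij} \, C_{ij}, \qquad N = \sum_{i} \sum_{j} M_{ij}$$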

Let us apply this to our (full, no outlier removal) data set.

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn import metrics
from sklearn.model_selection import train_test_split
import sklearn

In [2]:
ctg_raw = pd.read_csv('data/CTG.csv', delimiter=';', decimal=",")
ctg_raw

Unnamed: 0,LB,AC,FM,UC,DL,DS,DP,ASTV,MSTV,ALTV,...,Min,Max,Nmax,Nzeros,Mode,Mean,Median,Variance,Tendency,NSP
0,120.0,0.00,0.0,0.00,0.0,0.0,0.0,73.0,0.5,43.0,...,62.0,126.0,2.0,0.0,120.0,137.0,121.0,73.0,1.0,2.0
1,132.0,0.01,0.0,0.01,0.0,0.0,0.0,17.0,2.1,0.0,...,68.0,198.0,6.0,1.0,141.0,136.0,140.0,12.0,0.0,1.0
2,133.0,0.00,0.0,0.01,0.0,0.0,0.0,16.0,2.1,0.0,...,68.0,198.0,5.0,1.0,141.0,135.0,138.0,13.0,0.0,1.0
3,134.0,0.00,0.0,0.01,0.0,0.0,0.0,16.0,2.4,0.0,...,53.0,170.0,11.0,0.0,137.0,134.0,137.0,13.0,1.0,1.0
4,132.0,0.01,0.0,0.01,0.0,0.0,0.0,16.0,2.4,0.0,...,53.0,170.0,9.0,0.0,137.0,136.0,138.0,11.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2121,140.0,0.00,0.0,0.01,0.0,0.0,0.0,79.0,0.2,25.0,...,137.0,177.0,4.0,0.0,153.0,150.0,152.0,2.0,0.0,2.0
2122,140.0,0.00,0.0,0.01,0.0,0.0,0.0,78.0,0.4,22.0,...,103.0,169.0,6.0,0.0,152.0,148.0,151.0,3.0,1.0,2.0
2123,140.0,0.00,0.0,0.01,0.0,0.0,0.0,79.0,0.4,20.0,...,103.0,170.0,5.0,0.0,153.0,148.0,152.0,4.0,1.0,2.0
2124,140.0,0.00,0.0,0.01,0.0,0.0,0.0,78.0,0.4,27.0,...,103.0,169.0,6.0,0.0,152.0,147.0,151.0,4.0,1.0,2.0


In [3]:
used_features = ['ALTV', 'ASTV', 'LB', 'MLTV', 'MSTV', 'Max', 'Mean', 
                   'Median', 'Min', 'Mode', 'Nmax', 'Variance', 'Width'] 

In [4]:
X = ctg_raw[used_features]
y = ctg_raw['NSP']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 1)

In [5]:
# let's look at the (plain) accuracy first, it is returned by the 'score' method
classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train, y_train)
score_train = classifier.score(X_train, y_train)
score_test = classifier.score(X_test, y_test)
print('DT: accuracy on train = %f and on test = %f' % (score_train, score_test))

DT: accuracy on train = 0.999328 and on test = 0.913793


In [6]:
y_pred_train = classifier.predict(X_train)
cfm_train = metrics.confusion_matrix(y_train, y_pred_train)
print("Confusion Matrix on Training Data: ")
print(cfm_train)

y_pred_test = classifier.predict(X_test)
cfm_test = metrics.confusion_matrix(y_test, y_pred_test)
print("Confusion Matrix on Test Data: ")
print(cfm_test)


Confusion Matrix on Training Data: 
[[1165    0    0]
 [   1  198    0]
 [   0    0  124]]
Confusion Matrix on Test Data: 
[[471  14   5]
 [ 28  65   3]
 [  3   2  47]]


In [7]:
# define the cost-benefit-matrix
cbm = [[0, 1000, 1000], [9500, 0, 0], [20000, 0, 0]]
cbm

[[0, 1000, 1000], [9500, 0, 0], [20000, 0, 0]]

In [8]:
# multiply the confusion matrix element-wise by the cost-benefit matrix, sum up
# and divide by the number of examinations in the test set
expected_cost = (cfm_test*cbm).sum()/cfm_test.sum()
print('Expected cost on the test data per examination: %.2f€' %(expected_cost))

Expected cost on the test data per examination: 540.75€
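
To see which kinds of error drive this number, the total can be broken down per cell. The following is a minimal sketch that hard-codes the decision tree's test confusion matrix and the cost-benefit matrix from above instead of reusing the notebook variables; the names `cost_per_cell`, `cost_share` and `labels` are introduced here for illustration.

```python
import numpy as np

# Test-set confusion matrix of the decision tree, copied from the output above
cfm_test = np.array([[471, 14, 5],
                     [28, 65, 3],
                     [3, 2, 47]])
# Cost-benefit matrix from the text (rows: true class, columns: predicted class)
cbm = np.array([[0, 1000, 1000],
                [9500, 0, 0],
                [20000, 0, 0]])

# Element-wise product: total cost caused by each of the 9 outcomes
cost_per_cell = cfm_test * cbm
total_cost = cost_per_cell.sum()
expected_cost = total_cost / cfm_test.sum()

# Share of the total cost caused by each outcome
cost_share = cost_per_cell / total_cost

labels = ['N', 'S', 'P']
for i, true_label in enumerate(labels):
    for j, pred_label in enumerate(labels):
        if cost_per_cell[i, j] > 0:
            print('%s%s: %6d€ (%.1f%% of total)'
                  % (true_label, pred_label, cost_per_cell[i, j], 100 * cost_share[i, j]))
print('Expected cost per examination: %.2f€' % expected_cost)
```

For the decision tree, the SN cell (28 suspect cases predicted as normal) accounts for 266000 of the 345000 total, so reducing this error type would lower the cost the most.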


In [9]:
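# let's compare this to a couple of other classifiers
# (note: this helper shadows Python's built-in eval() within the notebook)
def eval(name, classifier, X_train, y_train, X_test, y_test, cbm): 
    """Fit the classifier, then print accuracy, confusion matrices and expected cost."""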
    classifier.fit(X_train, y_train)
    score_train = classifier.score(X_train, y_train)
    score_test = classifier.score(X_test, y_test)
    print('Evaluating %s' % (name))
    print('Accuracy on train = %f and on test = %f' % (score_train, score_test))

    y_pred_train = classifier.predict(X_train)
    cfm_train = metrics.confusion_matrix(y_train, y_pred_train)
    print("Confusion Matrix on Training Data: ")
    print(cfm_train)

    y_pred_test = classifier.predict(X_test)
    cfm_test = metrics.confusion_matrix(y_test, y_pred_test)
    print("Confusion Matrix on Test Data: ")
    print(cfm_test)

    expected_cost = (cfm_test*cbm).sum()/cfm_test.sum()
    print('Expected result on the test data per sample: %.2f€' %(expected_cost))
    print()

In [10]:
# logistic regression (this is a classifier, the name is just badly chosen!)
eval('logistic Regression', LogisticRegression(solver="lbfgs", max_iter=10000), 
     X_train, y_train, X_test, y_test, cbm)
# Support Vector Machine
eval('SVM', SVC(kernel="linear"), 
     X_train, y_train, X_test, y_test, cbm)
# Gradient Boosting
eval('Gradient Boosting', GradientBoostingClassifier(), 
     X_train, y_train, X_test, y_test, cbm)

Evaluating logistic Regression
Accuracy on train = 0.890457 and on test = 0.866771
Confusion Matrix on Training Data: 
[[1121   32   12]
 [  88  105    6]
 [  11   14   99]]
Confusion Matrix on Test Data: 
[[470  16   4]
 [ 44  47   5]
 [  5  11  36]]
Expected result on the test data per sample: 843.26€

Evaluating SVM
Accuracy on train = 0.890457 and on test = 0.873041
Confusion Matrix on Training Data: 
[[1118   38    9]
 [  82  109    8]
 [  13   13   98]]
Confusion Matrix on Test Data: 
[[470  18   2]
 [ 36  50  10]
 [  4  11  37]]
Expected result on the test data per sample: 692.79€

Evaluating Gradient Boosting
Accuracy on train = 0.985887 and on test = 0.942006
Confusion Matrix on Training Data: 
[[1162    3    0]
 [  17  182    0]
 [   1    0  123]]
Confusion Matrix on Test Data: 
[[481   6   3]
 [ 23  71   2]
 [  2   1  49]]
Expected result on the test data per sample: 419.28€
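
The four results can be put side by side by recomputing accuracy and expected cost from the printed test confusion matrices. This is a small sketch that copies the matrices from the outputs above rather than refitting the models; the helper `summarize` and the dictionary `test_cfms` are hypothetical names used only here.

```python
import numpy as np

cbm = np.array([[0, 1000, 1000], [9500, 0, 0], [20000, 0, 0]])

# Test-set confusion matrices copied from the printed outputs above
test_cfms = {
    'Decision Tree': [[471, 14, 5], [28, 65, 3], [3, 2, 47]],
    'Logistic Regression': [[470, 16, 4], [44, 47, 5], [5, 11, 36]],
    'SVM': [[470, 18, 2], [36, 50, 10], [4, 11, 37]],
    'Gradient Boosting': [[481, 6, 3], [23, 71, 2], [2, 1, 49]],
}

def summarize(cfm, cbm):
    """Return (accuracy, expected cost per sample) for one confusion matrix."""
    cfm = np.asarray(cfm)
    accuracy = np.trace(cfm) / cfm.sum()
    expected_cost = (cfm * cbm).sum() / cfm.sum()
    return accuracy, expected_cost

results = {name: summarize(cfm, cbm) for name, cfm in test_cfms.items()}

# Rank the models from cheapest to most expensive
ranking = sorted(results, key=lambda name: results[name][1])
for name in ranking:
    accuracy, cost = results[name]
    print('%-20s accuracy = %.4f, expected cost = %.2f€' % (name, accuracy, cost))
```

On this split the ranking by cost happens to match the ranking by accuracy, but the gaps differ: SVM and logistic regression are less than one percentage point apart in accuracy, yet about 150€ apart in expected cost per sample.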



In [1]:
# Let's try a few neural networks using MLPClassifier.
# Let us use SGD as the solver with an adaptive learning rate (and momentum)
# and a mini-batch size of 64. For SGD, max_iter specifies the number of epochs; 1000 should be plenty.
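
One way these settings could map onto `MLPClassifier` is sketched below. It is an illustration under stated assumptions, not the configuration used in the lecture: the hidden layer size, the `StandardScaler` step and the synthetic data from `make_classification` (used so the example runs without `data/CTG.csv`) are all choices made here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 13 features and 3 classes,
# mirroring the shape of the CTG problem, not its content
X_demo, y_demo = make_classification(n_samples=600, n_features=13, n_informative=6,
                                     n_classes=3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=1)

cbm = np.array([[0, 1000, 1000], [9500, 0, 0], [20000, 0, 0]])

mlp = make_pipeline(
    StandardScaler(),                          # assumption: scale the inputs before SGD training
    MLPClassifier(hidden_layer_sizes=(50,),    # assumption: one hidden layer with 50 units
                  solver='sgd',                # stochastic gradient descent
                  learning_rate='adaptive',    # lower the rate when training stops improving
                  momentum=0.9,                # scikit-learn's default momentum value
                  batch_size=64,               # mini-batch size from the text
                  max_iter=1000,               # number of epochs for the SGD solver
                  random_state=1))
mlp.fit(X_tr, y_tr)

# Same cost-weighted evaluation as for the other classifiers
cfm_demo = confusion_matrix(y_te, mlp.predict(X_te), labels=[0, 1, 2])
cost_demo = (cfm_demo * cbm).sum() / cfm_demo.sum()
print(cfm_demo)
print('Expected cost per sample (synthetic data): %.2f€' % cost_demo)
```

The printed cost refers to the synthetic data only and says nothing about the CTG data; to evaluate on the real data, the same pipeline could be passed to the evaluation helper defined earlier in place of the other classifiers.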
