<img src="images/thro.png" align="right"> 
# A2I2 - Artificial Neural Networks (ANN)

## <span style="color:red">Lecture - Part 3 - Exercise Solution</span>

---
## Step 3: Modeling and Evaluation (Cost-Benefit-Matrix)

## <span style="color:blue">Reminder</span>

### <span style="color:blue">Business Problem: </span> Optimize wine revenue

Wine can have different quality - better wines are more expensive than average or below average wines. 

### <span style="color:blue">Understanding the business</span>

**Wine Making**
* Wine is made from grapes.
* Price of the wine depends on the **expected** quality about 5 to 10 years after production.
* Two options: 
    * keep wine until the quality is known and sell for "correct" price -> lots of storage needed
    * estimate quality and sell immediately  
* Many chemical attributes of the (newly made) wine can be measured easily (e.g. acidity, sugar, pH, ...)

### <span style="color:blue">Mapping to Data Science Problems and Methods</span>

* Predict wine quality based on measured attributes (classification)


### <span style="color:blue">Discussion</span>

Note that the data science problem above is still very soft and not specific enough to easily come up with a cost-benefit matrix. What will our business user do with the predicted quality? Let us assume that the exact price for a given quality cannot be estimated, as it depends on many other factors as well. 

A much better wording of the business problem could be: The winery has decided that the best use of the restricted storage space is to store all batches of wine with a quality of at least 7 and sell them after five years. They expect to earn an additional 30.000 EUR compared to selling the wine immediately. All wines with a quality of 6 and below will be sold immediately, as the expectation is that the price will only increase by 500 EUR per batch. Storing a batch of wine costs 10.000 EUR.

## <span style="color:blue">Your Job</span>

a) Re-Define the data science problem and the method to use

b) Build multiple models and evaluate them

c) Recommend the best model

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn import metrics
from sklearn.model_selection import train_test_split
import sklearn as sklearn

In [2]:
# eval function copied from the code in the lecture
def eval(name, classifier, X_train, y_train, X_test, y_test, cbm): 
    classifier.fit(X_train, y_train)
    score_train = classifier.score(X_train, y_train)
    score_test = classifier.score(X_test, y_test)
    print('Evaluating %s' % (name))
    print('Accuracy on train = %f and on test = %f' % (score_train, score_test))

    y_pred_train = classifier.predict(X_train)
    cfm_train = metrics.confusion_matrix(y_train, y_pred_train)
    print("Confusion Matrix on Training Data: ")
    print(cfm_train)

    y_pred_test = classifier.predict(X_test)
    cfm_test = metrics.confusion_matrix(y_test, y_pred_test)
    print("Confusion Matrix on Test Data: ")
    print(cfm_test)

    # weight each confusion-matrix cell with its benefit and average over all test samples
    expected_result = (cfm_test*cbm).sum()/cfm_test.sum()
    print('Expected result on the test data per sample: %.2f€' %(expected_result))
    print()

**a) Data Science Problem and method**

Predict if a wine is quality 7 or above (the positive case) or quality 6 and below (the negative case) using classification.

**b) CBM, modeling and evaluation**

In [3]:
# read and clean the data
wine_raw = pd.read_csv('data/winequality-white.csv', delimiter=';')
wine_raw['target'] = (wine_raw['quality'] >= 7.0).astype(int)
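# flag values that are more than 5 standard deviations away from their column mean
wine_is_outlier_attribute = (np.abs(wine_raw - wine_raw.mean()) > wine_raw.std(ddof=0)*5)
wine_is_outlier_tupel = wine_is_outlier_attribute.any(axis=1)
# keep only the rows without any outlier (~ negates a boolean Series)
wine = wine_raw[~wine_is_outlier_tupel]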
wine

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.00100 | 3.00 | 0.45 | 8.8 | 6 | 0 |
| 1 | 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.99400 | 3.30 | 0.49 | 9.5 | 6 | 0 |
| 2 | 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30.0 | 97.0 | 0.99510 | 3.26 | 0.44 | 10.1 | 6 | 0 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.99560 | 3.19 | 0.40 | 9.9 | 6 | 0 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.99560 | 3.19 | 0.40 | 9.9 | 6 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4893 | 6.2 | 0.21 | 0.29 | 1.6 | 0.039 | 24.0 | 92.0 | 0.99114 | 3.27 | 0.50 | 11.2 | 6 | 0 |
| 4894 | 6.6 | 0.32 | 0.36 | 8.0 | 0.047 | 57.0 | 168.0 | 0.99490 | 3.15 | 0.46 | 9.6 | 5 | 0 |
| 4895 | 6.5 | 0.24 | 0.19 | 1.2 | 0.041 | 30.0 | 111.0 | 0.99254 | 2.99 | 0.46 | 9.4 | 6 | 0 |
| 4896 | 5.5 | 0.29 | 0.30 | 1.1 | 0.022 | 20.0 | 110.0 | 0.98869 | 3.34 | 0.38 | 12.8 | 7 | 1 |


In [4]:
wine.target.value_counts()

0    3753
1    1059
Name: target, dtype: int64

In [5]:
used_features = list(set(wine.columns) - set(['quality', 'target']))
used_features

['fixed acidity',
 'pH',
 'density',
 'free sulfur dioxide',
 'citric acid',
 'total sulfur dioxide',
 'residual sugar',
 'volatile acidity',
 'chlorides',
 'sulphates',
 'alcohol']

In [6]:
X = wine[used_features]
y = wine['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)

Confusion Matrix / Cost-Benefit-Matrix: 

$$\begin{bmatrix}
 & Predicted-Negative & Predicted-Positive\\
True-Condition-Negative & TN & FP\\
True-Condition-Positive & FN & TP \\
\end{bmatrix}$$

* TN (truly bad quality, predicted bad quality): benefit 500
* FP (truly bad quality, predicted good quality): benefit 500-10.000 = -9.500

* FN (truly good quality, predicted bad quality): benefit 500
* TP (truly good quality, predicted good quality): benefit 30.000 - 10.000 = 20.000

Note that we do not account for "lost opportunities": if we did, we would count these revenues twice. This is very important! Think about a good-quality wine. If one classifier classifies it as bad, we get a benefit of 500 EUR. If another classifier classifies it as good, we get a benefit of 20.000 EUR, so the misclassification costs us 19.500 EUR. If we put -19.500 EUR as the value for FN in the CBM, we would "gain" 39.500 EUR by changing this classification, which clearly is wrong. This is one of the most widespread and dangerous mistakes when setting up a CBM!

This leads to the following cost-benefit-matrix:
$$\begin{bmatrix}
 & Predicted-Negative & Predicted-Positive\\
True-Condition-Negative & 500 & -9.500\\
True-Condition-Positive & 500 & +20.000 \\
\end{bmatrix}$$
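
The double-counting argument can be checked with a few lines of arithmetic. The sketch below compares the matrix from this section with a hypothetical variant (called `cbm_wrong` here, not part of the lecture code) that books the lost opportunity of 19.500 EUR as a negative FN entry. It then computes how much moving one truly good wine from "predicted bad" to "predicted good" appears to be worth under each matrix.

```python
import numpy as np

# Rows: true condition (negative, positive); columns: prediction (negative, positive).
cbm = np.array([[500, -9500],
                [500, 20000]])

# Hypothetical (incorrect) variant: the lost opportunity is booked as a cost for FN.
cbm_wrong = np.array([[500, -9500],
                      [-19500, 20000]])

def gain_from_fixing_fn(matrix):
    """Change in benefit when one truly good wine moves from FN to TP."""
    return matrix[1, 1] - matrix[1, 0]

gain_correct = gain_from_fixing_fn(cbm)
gain_wrong = gain_from_fixing_fn(cbm_wrong)
print('Gain per corrected FN with the CBM used here:       ', gain_correct)
print('Gain per corrected FN with lost opportunity booked: ', gain_wrong)
```

Only the first value matches the real difference between storing and immediately selling a good batch; the second one counts that difference twice.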

In [7]:
# define the cost-benefit-matrix
cbm = [[500, -9500], [500, 20000]]
cbm

[[500, -9500], [500, 20000]]
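
Before comparing models, it can help to know what two trivial policies would yield under this matrix. The sketch below uses the class counts printed by `wine.target.value_counts()` (3753 negatives, 1059 positives) together with the CBM values; the helper name `expected_benefit` is an assumption introduced for illustration, and the counts refer to the full cleaned data set rather than the test split.

```python
import numpy as np

cbm = np.array([[500, -9500], [500, 20000]])
n_neg, n_pos = 3753, 1059  # class counts of the cleaned data set

def expected_benefit(confusion, cbm):
    """Average benefit per batch for a given confusion matrix."""
    confusion = np.asarray(confusion)
    return (confusion * cbm).sum() / confusion.sum()

# Policy 1: sell every batch immediately (everything predicted negative).
sell_all = expected_benefit([[n_neg, 0], [n_pos, 0]], cbm)
# Policy 2: store every batch (everything predicted positive).
store_all = expected_benefit([[0, n_neg], [0, n_pos]], cbm)

print('Sell everything immediately: %.2f EUR per batch' % sell_all)
print('Store everything:            %.2f EUR per batch' % store_all)
```

A model is only worth using if it beats the better of these two policies.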

In [8]:
# decision tree
eval('Decision Tree', DecisionTreeClassifier(random_state=42), 
     X_train, y_train, X_test, y_test, cbm)
# logistic regression (this is a classifier, the name is just badly chosen!)
eval('logistic Regression', LogisticRegression(solver="lbfgs", max_iter=10000), 
     X_train, y_train, X_test, y_test, cbm)
# Support Vector Machine
eval('SVM', SVC(kernel="rbf"), 
     X_train, y_train, X_test, y_test, cbm)
# Gradient Boosting
eval('Gradient Boosting', GradientBoostingClassifier(), 
     X_train, y_train, X_test, y_test, cbm)

Evaluating Decision Tree
Accuracy on train = 1.000000 and on test = 0.822715
Confusion Matrix on Training Data: 
[[2592    0]
 [   0  776]]
Confusion Matrix on Test Data: 
[[1002  159]
 [  97  186]]
Expected result on the test data per sample: 1910.66€

Evaluating logistic Regression
Accuracy on train = 0.792755 and on test = 0.819252
Confusion Matrix on Training Data: 
[[2459  133]
 [ 565  211]]
Confusion Matrix on Test Data: 
[[1105   56]
 [ 205   78]]
Expected result on the test data per sample: 1165.51€

Evaluating SVM
Accuracy on train = 0.769596 and on test = 0.804017
Confusion Matrix on Training Data: 
[[2592    0]
 [ 776    0]]
Confusion Matrix on Test Data: 
[[1161    0]
 [ 283    0]]
Expected result on the test data per sample: 500.00€

Evaluating Gradient Boosting
Accuracy on train = 0.861936 and on test = 0.851801
Confusion Matrix on Training Data: 
[[2474  118]
 [ 347  429]]
Confusion Matrix on Test Data: 
[[1095   66]
 [ 148  135]]
Expected result on the test data per sample: 1866.00€

Interpretation: The Decision Tree is clearly overfitting (accuracy on training data = 100%, accuracy on test data = 82%). Gradient Boosting has the best accuracy on the test data (85%); however, it is outperformed by the (overfitting) Decision Tree when evaluating the cost-benefit. You may get slightly different results, as there is some randomness involved.
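
The per-sample results can be reproduced from the printed test confusion matrices without retraining anything. The sketch below copies those matrices from the evaluation output and applies the same formula that `eval` uses; the names `cfm_tests` and `results` are introduced here for illustration only.

```python
import numpy as np

cbm = np.array([[500, -9500], [500, 20000]])

# Test confusion matrices copied from the evaluation output.
cfm_tests = {
    'Decision Tree': [[1002, 159], [97, 186]],
    'Logistic Regression': [[1105, 56], [205, 78]],
    'SVM': [[1161, 0], [283, 0]],
    'Gradient Boosting': [[1095, 66], [148, 135]],
}

results = {}
for name, cfm in cfm_tests.items():
    cfm = np.asarray(cfm)
    accuracy = np.trace(cfm) / cfm.sum()             # correct predictions / all samples
    benefit = (cfm * cbm).sum() / cfm.sum()          # average benefit per sample
    results[name] = (accuracy, benefit)
    print('%-20s accuracy = %.4f, expected result = %.2f EUR' % (name, accuracy, benefit))

best_by_accuracy = max(results, key=lambda n: results[n][0])
best_by_benefit = max(results, key=lambda n: results[n][1])
print('Best by accuracy:', best_by_accuracy)
print('Best by benefit: ', best_by_benefit)
```

The two rankings disagree, which is the reason for evaluating with a cost-benefit matrix instead of accuracy alone.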

In [1]:
# Let's try a few neural networks using MLPClassifier.
# Use SGD as the solver with an adaptive learning rate (and momentum)
# and a mini-batch size of 64. For SGD, max_iter specifies the number of epochs; 1000 should be plenty.
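
The cell above only describes the intended configuration. A minimal sketch of how it might look with `MLPClassifier` follows. The hidden layer size, the `StandardScaler` step, the random seed, and the synthetic data from `make_classification` are all assumptions made so that the example runs on its own; none of them are settings taken from the lecture.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption): 11 features, imbalanced classes.
X_demo, y_demo = make_classification(n_samples=600, n_features=11, n_informative=6,
                                     weights=[0.78, 0.22], random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)

mlp = MLPClassifier(
    hidden_layer_sizes=(50,),   # assumption: one hidden layer with 50 units
    solver='sgd',               # stochastic gradient descent
    learning_rate='adaptive',   # lower the learning rate when the training loss stalls
    momentum=0.9,               # scikit-learn's default momentum for SGD
    batch_size=64,              # mini-batch size named in the cell above
    max_iter=1000,              # upper bound on the number of epochs
    random_state=42,
)

# Scaling is an assumption; SGD-trained networks are sensitive to feature scale.
model = make_pipeline(StandardScaler(), mlp)
model.fit(Xd_train, yd_train)
print('Accuracy on test = %f' % model.score(Xd_test, yd_test))
```

Because the pipeline exposes `fit`, `score`, and `predict`, it could presumably be passed to the `eval` function from this notebook in the same way as the other classifiers, for example `eval('MLP', model, X_train, y_train, X_test, y_test, cbm)`.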

In [10]:
# --- EOF ---