## Estimating the predictive accuracy of a classifier
- **predictive accuracy of a classifier** = the proportion of a set of unseen instances that it correctly classifies
- It is estimated by measuring the classifier's accuracy on a sample of data that was not used to generate it
- The available data is split into two parts: a _training set_ and a _test set_. 
- The training set is used to construct a classifier (decision tree, naïve Bayes, etc.). 
- The classifier is then used to predict the classification for the instances in the test set. 
- If the test set contains N instances, of which C are correctly classified, the predictive accuracy of the classifier on the test set is p = C/N. 
- This value is used as an estimate of the classifier's performance on unseen data in general.
![](images/train_test_split.png)
source: (Bramer, 2016)  
  
    
- A random division into two parts in proportions such as 1:1, 2:1, 70:30 or 60:40 is customary (the larger part being the training set). 
- The larger the training set, the better the model tends to be, but the smaller the test set and therefore the less reliable the estimated accuracy, and vice versa. 
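The procedure above can be sketched in plain Python. This is a minimal sketch, assuming the data is a list of (feature, label) pairs and that a classifier is any function mapping a feature to a label; the names `train_test_split` and `accuracy`, the toy data set and the threshold "training" step are all hypothetical and chosen only for illustration.

```python
import random

def train_test_split(data, train_fraction=0.7, seed=42):
    """Randomly divide `data` into a training set and a test set.

    train_fraction=0.7 corresponds to a 70:30 split.
    """
    shuffled = list(data)                     # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)     # reproducible random division
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def accuracy(classifier, test_set):
    """Predictive accuracy p = C / N on the test set."""
    n = len(test_set)                                          # N instances
    c = sum(1 for x, y in test_set if classifier(x) == y)      # C correctly classified
    return c / n

# Hypothetical toy data: the label is 1 when the single feature is >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train, test = train_test_split(data, train_fraction=0.7)

# Hypothetical "training": use the smallest feature value labelled 1
# in the training set as the decision threshold.
threshold = min(x for x, y in train if y == 1)

def classifier(x):
    return int(x >= threshold)

print(len(train), len(test))   # 70 30
print(accuracy(classifier, test))
```

Only the training set is used to pick the threshold; the test set is touched only when computing p, which is what makes p an estimate for unseen data.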

## Other estimators for the predictive accuracy

- Accuracy alone is not always a sufficient measure of a classifier's quality
- Suppose you want to predict if a person has cancer based on some features like tumor size, tumor shape and age. 
- If your test set is drawn from the general population, only a very small fraction of the people in it will (fortunately) actually have cancer. 
- So if you always predict “no cancer”, you will probably have an accuracy of over 95%, yet you miss every person who has cancer. 
- Your model should at least be better than this “guessing” method.  
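The following sketch makes this concrete with a hypothetical test set of 1000 people, 20 of whom (2%) have cancer; these numbers are invented for illustration and are not taken from any study.

```python
# Hypothetical test set: 1000 people, 20 of whom have cancer (label 1).
y_true = [1] * 20 + [0] * 980

# Trivial "guessing" baseline: always predict "no cancer" (label 0).
y_pred = [0] * len(y_true)

# Accuracy = correctly classified / total.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Number of actual cancer cases that the baseline detects.
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy = {accuracy:.2f}")                                  # accuracy = 0.98
print(f"cancer cases detected = {detected} of {y_true.count(1)}")   # cancer cases detected = 0 of 20
```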
  
Solution: break down the classifier's performance:
- how frequently instances of class X were correctly classified as class X or misclassified as some other class
- visualized by a **confusion matrix**
- e.g. a (very accurate) e-mail classification model which classifies e-mails (from mailing lists) into 4 categories: ADS, ICT, JOB and NEWS    
  
![](images/confusion_matrix.png)
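A confusion matrix can be built by counting (actual, predicted) pairs. This is a minimal sketch: the helper name `confusion_matrix` and the eight e-mail labels are hypothetical, and it assumes rows are actual classes and columns are predicted classes (scikit-learn's `confusion_matrix` uses the same orientation; other sources, possibly including the figure above, may transpose it).

```python
from collections import Counter

LABELS = ["ADS", "ICT", "JOB", "NEWS"]

def confusion_matrix(y_true, y_pred, labels):
    """Row i = actual class labels[i], column j = predicted class labels[j]."""
    counts = Counter(zip(y_true, y_pred))   # how often each (actual, predicted) pair occurs
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

# Hypothetical actual and predicted categories for 8 e-mails.
y_true = ["ADS", "ADS", "ICT", "ICT", "JOB", "JOB", "NEWS", "NEWS"]
y_pred = ["ADS", "ADS", "ICT", "NEWS", "JOB", "JOB", "NEWS", "ICT"]

matrix = confusion_matrix(y_true, y_pred, LABELS)

print("actual\\pred " + " ".join(f"{name:>5}" for name in LABELS))
for name, row in zip(LABELS, matrix):
    print(f"{name:>11} " + " ".join(f"{n:>5}" for n in row))
```

The diagonal holds the correctly classified instances (here 6 of 8); every off-diagonal cell shows which class was confused with which.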

- For binary classifiers (like cancer/no cancer) the confusion matrix looks like this:  
  
![](images/confusion_matrix_tp_tn.png)  
  
In medicine (as in statistics), false positives are called Type I errors and false negatives Type II errors:  
  
![](images/pregnant.png) 
  
Several measures for binary classifiers are derived from these four counts: 
  
- _false positive rate_ : the proportion of all actual negatives that nevertheless yield a positive test outcome (its complement, 1 - FP-rate, is the _true negative rate_ or specificity)  
    
  <p align="center">FP-rate = $\frac{FP}{FP + TN}$</p>
  
  
- _false negative rate_ : the proportion of all positives which yield negative test outcomes  
    
  <p align="center">FN-rate = $\frac{FN}{TP + FN}$</p>
   
In the cancer case we especially want to avoid false negatives, because each one is a person who really has cancer but is missed; the false negative rate is therefore an important measure here. 
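The four counts and both rates can be computed directly from lists of actual and predicted labels. This is a minimal sketch, assuming labels are encoded as 1 (positive, e.g. cancer) and 0 (negative); the helper names and the ten labels are hypothetical.

```python
def binary_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for labels where 1 = positive and 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def fp_rate(fp, tn):
    return fp / (fp + tn)   # share of actual negatives predicted positive

def fn_rate(fn, tp):
    return fn / (tp + fn)   # share of actual positives predicted negative

# Hypothetical screening results: 1 = cancer, 0 = no cancer.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = binary_counts(y_true, y_pred)
print(tp, fp, fn, tn)                        # 3 1 1 5
print(f"FP-rate = {fp_rate(fp, tn):.3f}")    # FP-rate = 0.167
print(f"FN-rate = {fn_rate(fn, tp):.3f}")    # FN-rate = 0.250
```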
  
- _precision_ : the proportion of all samples that were predicted as positive which are really positive
    
  <p align="center">$P = \frac{TP}{TP + FP}$</p>  
   
   
- _recall_ : the proportion of all positives which are correctly predicted (also called the _true positive rate_ or sensitivity; R = 1 - FN-rate)
    
  <p align="center">$R = \frac{TP}{TP + FN}$</p>  
  
  
- For a classifier that predicts all samples correctly, both precision and recall equal 1. For an algorithm that always predicts 1, recall = 1 (there are no false negatives), but precision only equals the proportion of positives in the dataset, because every negative sample becomes a false positive. It is therefore useful to have a single measure that combines precision and recall: the _F-score_, the harmonic mean of P and R. 
    
  <p align="center">$F = 2\frac{P \cdot R}{P + R}$</p>   
  
   
- Finally, the _accuracy_ used above (p = C/N) can be written in terms of these counts as:  
     
  <p align="center">$\alpha = \frac{TP + TN}{TP + TN + FP + FN}$</p>
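A minimal sketch computing these measures from the four confusion-matrix counts, and reproducing the "always predict 1" argument above on a hypothetical data set with 10 positives out of 100. When nothing is predicted positive, precision is 0/0; returning 0 in that case (and likewise for F) is an assumption of this sketch, a common convention rather than part of the definitions above.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F-score and accuracy from confusion-matrix counts."""
    # Assumption: precision/recall/F are set to 0.0 when their denominator is 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f_score, accuracy

# Hypothetical data set: 100 samples, 10 positive.
# Always predicting 1: every positive is a TP, every negative a FP.
p, r, f, a = metrics(tp=10, fp=90, fn=0, tn=0)
print(f"P={p:.2f} R={r:.2f} F={f:.2f} accuracy={a:.2f}")
# P=0.10 R=1.00 F=0.18 accuracy=0.10

# Always predicting 0 ("no cancer") on the same data: high accuracy, zero recall.
p, r, f, a = metrics(tp=0, fp=0, fn=10, tn=90)
print(f"P={p:.2f} R={r:.2f} F={f:.2f} accuracy={a:.2f}")
# P=0.00 R=0.00 F=0.00 accuracy=0.90
```

Neither trivial classifier gets a good F-score, even though the second one reaches 90% accuracy.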
 
   
  
### Exercise
  
  
An article in the Gazet van Antwerpen of 27 July 2017 read as follows.   
  
In a NIPT test, blood is taken (from a pregnant woman) to find out whether the unborn child has Down syndrome. After two weeks you would supposedly know with 99 percent certainty whether or not your child has the syndrome. But that is not correct, say researchers at Ghent University. It is not at all certain that your child has the syndrome, even if the test indicates so.  
  
“Actually, that depends very much on the risks that exist beforehand. For young women who have no abnormalities on their ultrasound, the chance of nevertheless having a healthy child can be as high as fifty percent,” says Heidi Mertes, a researcher at Ghent University.  
  
According to new calculations by UGent, an average 40-year-old woman has a 93 percent chance that her child really has Down syndrome after an abnormal NIPT result. For a woman of 35 this is 79 percent, at 30 it is 61 percent, and at 25 only 51 percent.  
  

Which of the following statements is then true?
  
   
1.	The true positive rate is 99%
1.	For a woman of 40, the true negative rate is 93%
1.	For a woman of 35, the precision is 79%
1.	For all women, the true negative rate is 50%