# Module 3: Evaluation

## Model Evaluation & Selection
### Learning Objectives 
1. Understand why accuracy only gives a partial picture of a classifier's performance.

2. Understand the motivation and definition of important evaluation metrics in machine learning.

3. Learn how to use a variety of evaluation metrics to evaluate supervised machine learning models.

4. Learn about choosing the right metric for selecting between models or for doing parameter tuning.

### About Evaluation
1. Different applications have very different goals.

2. Accuracy is widely used, but many other metrics are possible, e.g.
    - user satisfaction (web search)
    - amount of revenue (e-commerce)
    - increase in patient survival rates (medical)

3. It's very important to choose evaluation methods that match the goal of your application.

4. Compute your selected evaluation metric for multiple different models.

5. Then select the model with the 'best' value of the evaluation metric.
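
The last two steps can be sketched in a few lines. This is only an illustration under stated assumptions: the synthetic dataset, the two candidate models, and the choice of F1 as the metric are all hypothetical and not taken from the course material.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, mildly imbalanced binary dataset (about 80% negative, 20% positive).
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Hypothetical candidate models to choose between.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# The metric should match the application goal; F1 is just an assumed example.
metric = "f1"

# Compute the same metric for every model using 5-fold cross-validation.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring=metric).mean()
    for name, model in candidates.items()
}

# Select the model with the best (here: highest) mean score.
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: mean {metric} = {score:.3f}")
print("Selected model:", best_name)
```

Changing `metric` to another scorer name (for example `"accuracy"` or `"recall"`) can change which model is selected, which is why the metric has to be chosen before the comparison.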

### Accuracy with Imbalanced Classes
1. Suppose you have two classes:
    - Relevant (R): the positive class
    - Not_Relevant (N): the negative class
2. Out of 1000 randomly selected items, on average:
    - One item is relevant and has an R label
    - The rest of the items (999 of them) are not relevant and labelled N.
3. Recall that:
    - Accuracy = **# correct predictions / # total instances**
4. You build a classifier to predict relevant items and see that its accuracy on a test set is 99.9%. (You may think: wow, this is amazing. But wait...)

5. For comparison, suppose we had a 'dummy' classifier that didn't look at the features at all and always blindly predicted the most frequent class (i.e. the negative N class).

6. Assuming a test set of 1000 instances, what would this dummy classifier's accuracy be?
    - $Accuracy_{DUMMY} = 999/1000 = 99.9\%$
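
The arithmetic above can be checked with a small sketch in plain Python. The label lists are a hypothetical test set built to mirror the example (1 relevant item, 999 not relevant); the `accuracy` helper is written here for illustration and is not a library function.

```python
# Hypothetical test set mirroring the example: 1 relevant (R) item, 999 not relevant (N).
y_true = ["R"] + ["N"] * 999

# A "dummy" rule that ignores the features and always predicts the majority class N.
y_dummy = ["N"] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


dummy_accuracy = accuracy(y_true, y_dummy)

# Count how many relevant items the dummy rule actually found.
relevant_found = sum(t == "R" and p == "R" for t, p in zip(y_true, y_dummy))

print(f"Dummy accuracy: {dummy_accuracy:.1%}")        # Dummy accuracy: 99.9%
print(f"Relevant items found: {relevant_found}")      # Relevant items found: 0
```

The dummy rule reaches 99.9% accuracy while finding none of the relevant items, which is why accuracy alone gives only a partial picture when classes are imbalanced.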

### Dummy classifiers completely ignore the input data
1. Dummy classifiers serve as a sanity check on your classifier's performance

2. They provide a *null metric* (e.g. null accuracy) baseline

3. Dummy classifiers should not be used for real problems

4. Some commonly used settings for the `strategy` parameter of `DummyClassifier` in scikit-learn:
    - most_frequent: predicts the most frequent label in the training set
    - stratified: random predictions based on the training set class distribution
    - uniform: generates predictions uniformly at random
    - constant: always predicts a constant label provided by the user
        - a major motivation for this method is F1-scoring, when the positive class is in the minority
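
As a rough illustration of these strategies, the sketch below fits scikit-learn's `DummyClassifier` with each setting on a hypothetical imbalanced dataset (990 negatives, 10 positives, random-noise features). The data and the choice of `constant=1` are assumptions made for the example, and scoring is done on the same data only to keep the sketch short.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical imbalanced data: 990 negatives (0) and 10 positives (1).
# The features are random noise; dummy classifiers ignore them anyway.
rng = np.random.RandomState(0)
X = rng.rand(1000, 3)
y = np.array([0] * 990 + [1] * 10)

results = {}

# Strategies that need no extra argument.
for strategy in ["most_frequent", "stratified", "uniform"]:
    clf = DummyClassifier(strategy=strategy, random_state=0)
    clf.fit(X, y)
    results[strategy] = clf.score(X, y)  # accuracy on the same data

# The "constant" strategy needs the label to predict; here the minority class.
clf_const = DummyClassifier(strategy="constant", constant=1)
clf_const.fit(X, y)
results["constant"] = clf_const.score(X, y)

for name, acc in results.items():
    print(f"{name:>14}: accuracy = {acc:.3f}")
```

With this data, `most_frequent` scores 0.990 (the null accuracy) and `constant=1` scores 0.010, while the two random strategies vary with the random seed.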
        
### What if my classifier accuracy is close to the null accuracy baseline?
This could be a sign of:
1. Ineffective, erroneous or missing features

2. Poor choice of kernel or hyperparameter

3. Large class imbalance

### Dummy Regressors
*strategy* parameter options:
1. mean: predicts the mean of the training targets

2. median: predicts the median of the training targets

3. quantile: predicts a user-provided quantile of the training targets

4. constant: predicts a constant user-provided value
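
The four options can be tried out with scikit-learn's `DummyRegressor`. The sketch below uses hypothetical training targets with one outlier, and the values `quantile=0.25` and `constant=10.0` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Hypothetical training targets with one outlier; features are placeholders
# because a dummy regressor never looks at them.
X_train = np.zeros((5, 1))
y_train = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

regressors = {
    "mean": DummyRegressor(strategy="mean"),
    "median": DummyRegressor(strategy="median"),
    "quantile": DummyRegressor(strategy="quantile", quantile=0.25),
    "constant": DummyRegressor(strategy="constant", constant=10.0),
}

predictions = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    # Every input receives the same prediction, so a single row is enough.
    predictions[name] = float(reg.predict(np.zeros((1, 1)))[0])
    print(f"{name:>8}: {predictions[name]}")
```

Here the mean baseline predicts 22.0 because the outlier pulls it up, while the median baseline predicts 3.0, so the choice of dummy strategy already affects what counts as a reasonable baseline error.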

### Binary Prediction Outcomes 
<img src="https://img.ceclinux.org/6c/34448493ba9f254a61091449ba6f77c530231f.png">

### Confusion Matrix for Binary Prediction Task
<img src="https://img.ceclinux.org/18/7739eb732a66ee82dd5f6c78d1e33a2ab7828a.png">
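
The four outcomes of a binary prediction (true negative, false positive, false negative, true positive) can be counted with scikit-learn's `confusion_matrix`. The sketch below uses hypothetical labels, with 1 as the positive class and 0 as the negative class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# scikit-learn puts true labels on rows and predicted labels on columns,
# so with labels=[0, 1] the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Accuracy is the fraction of all instances on the diagonal (TN + TP).
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=4 FP=2 FN=1 TP=3
print(f"Accuracy: {accuracy:.2f}")         # Accuracy: 0.70
```

Note that the row and column order follows the `labels` argument, so the position of each cell may differ from a diagram that lists the positive class first.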