# Interpreting models - Classification
https://machinelearningmastery.com/precision-recall-and-f-measure-for-imbalanced-classification/

https://blog.exsilio.com/all/accuracy-precision-recall-f1-score-interpretation-of-performance-measures/

* Step 1: Are you solving a binary or a multi-class classification problem?
    - Binary: you are predicting Yes/No, On/Off, True/False
    - Multi-Class: you are predicting Red/Amber/Green or any 3+ member set

* Step 2: Define how your training data represents "reality"
    - Binary classification
        - If you have 98% True and 2% False, False is the **minority class**
        - The model may struggle to learn the minority class as a result; consider resampling the training data to a 50/50 split
    - Multi-Class classification
        - The same holds true: if you have 3 classes but your minority class is 1%, resample the training data to a 33/33/33 split

Regardless of whether you are doing binary or multi-class classification, if your dataset is "imbalanced", you will likely get better results if you create a balanced copy of it before continuing.
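
One way to build such a balanced copy is random oversampling: keep every row and duplicate minority-class rows until each class matches the largest one. The sketch below uses only the standard library; the function name `oversample_to_balance` and the toy data are hypothetical, and it assumes the resampling is applied to the training split only, so that evaluation still reflects the real class distribution.

```python
import random
from collections import Counter, defaultdict


def oversample_to_balance(rows, labels, seed=0):
    """Return a shuffled copy in which every class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    target = max(len(members) for members in by_class.values())

    balanced = []
    for label, members in by_class.items():
        # Keep every original row, then draw extra rows with replacement.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend((row, label) for row in members + extra)

    rng.shuffle(balanced)
    new_rows = [row for row, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_rows, new_labels


# Hypothetical toy data: 98 True rows and 2 False rows, as in the example above.
rows = list(range(100))
labels = [True] * 98 + [False] * 2

balanced_rows, balanced_labels = oversample_to_balance(rows, labels)
print(Counter(balanced_labels))  # both classes now have 98 rows
```

Undersampling the majority class is the mirror-image option; it discards data, so it tends to suit cases where the majority class is very large.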
   
* Step 3: Define the most important metrics for your balance type
    - Binary (assumes a training set that is evenly split across the target classes, i.e. a 50/50 split):
        - Do use: Accuracy (general "feel" for how good the model is), Recall, F1
        - Don't use: Precision (ratio of true positives to total positive predictions) isn't as helpful in binary with a 50/50 dataset
    - Imbalanced:
        - Precision tells you the "accuracy for the minority class" (when the minority class is the positive class)
        - Do use: Precision or Recall, possibly F1

> Type I Error: False positive (rejection of a true null hypothesis)
>
> Type II Error: False negative (non-rejection of a false null hypothesis)
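
All of the metrics below are built from four counts: true positives (TP), false positives (FP, the Type I errors), false negatives (FN, the Type II errors), and true negatives (TN). A minimal sketch of how those counts could be tallied from paired labels follows; `confusion_counts` and the sample labels are hypothetical names and data chosen for illustration.

```python
def confusion_counts(y_true, y_pred, positive=True):
    """Tally TP, FP, FN and TN for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # Type I error: predicted positive, actually negative
        elif actual == positive:
            fn += 1  # Type II error: predicted negative, actually positive
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels: 3 actual positives, 5 actual negatives.
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, True, False, False, False, False]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
```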
    
### Accuracy 
* Formula: Correct Predictions / Total Predictions, a.k.a. (TP + TN) / (TP + FP + FN + TN)
* Meaning: Ratio of correct to total predictions
* Use when: Probably your go-to for most situations. Use Recall or Precision instead if the cost of one specific kind of error (a false positive or a false negative) is very high
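
As a rough illustration of the formula, the sketch below computes accuracy from the four counts. The helper name `accuracy` and the numbers are hypothetical. The second call shows why accuracy alone can mislead on an imbalanced set: a model that never predicts the rare positive class still scores 98% on a 98/2 split.

```python
def accuracy(tp, fp, fn, tn):
    """(TP + TN) / (TP + FP + FN + TN); returns 0.0 when there are no predictions."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical balanced case: 50 positives, 50 negatives.
print(accuracy(tp=40, fp=5, fn=10, tn=45))  # 0.85

# Hypothetical 98/2 split where the model always predicts "negative":
# every positive is missed, yet accuracy still looks high.
print(accuracy(tp=0, fp=0, fn=2, tn=98))  # 0.98
```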

### Precision
* Formula: TP / (TP + FP)
* Meaning: "What % of the total Positive predictions were right (i.e. True Positive)?"
    - "When the model predicts positive, how often is it correct?"
* ELI5: For an imbalanced data set where the minority class is the positive class, this is the accuracy of the model's minority-class predictions
* Use when: There is a high cost associated with a False Positive
    - Example - email spam detection: use Precision because the cost is high when a user misses an important email that was falsely marked as spam
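
A minimal sketch of the precision formula, using made-up spam-filter counts; the helper name `precision` and the numbers are assumptions for illustration only.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when the model made no positive predictions."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Hypothetical spam filter: 50 emails flagged as spam,
# 45 really were spam (TP) and 5 were legitimate (FP).
print(precision(tp=45, fp=5))  # 0.9
```

Here 1 in 10 flagged emails was legitimate, which is the failure that Precision is meant to expose.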

### Recall
* Formula: TP / (TP + FN)
* Meaning: "What % of the actual Positive cases did the model correctly predict as Positive?"
* Use when: There is a high cost associated with a False Negative
    - Example - detect incoming nuclear missiles: a false negative here kills everyone so use Recall
    - Example - fraud detection: use Recall due to the high cost of predicting "False" when it was really fraud (i.e. "True")
    - Example - sick patient detection: use Recall due to the high cost of predicting "False" when patient was actually sick (i.e. "True")
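
A minimal sketch of the recall formula, using made-up fraud-detection counts; the helper name `recall` and the numbers are assumptions for illustration only.

```python
def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Hypothetical fraud detector: 40 transactions were really fraud,
# the model caught 30 (TP) and missed 10 (FN).
print(recall(tp=30, fn=10))  # 0.75
```

The 10 missed cases are the costly false negatives, which is why Recall is the metric to watch in these examples.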
        
### F1 Score
* Formula: 2 * ((Recall * Precision) / (Recall + Precision))
* Meaning: The harmonic mean of Recall and Precision, so it is only high when both are high.
* ELI5: "A good F1 score means that you have low false positives and low false negatives, so you’re correctly identifying real threats and you are not disturbed by false alarms."
* Use when: you want a balance of Precision and Recall
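
A minimal sketch of the F1 formula; the helper name `f1_score` and the input values are hypothetical. The second call illustrates that F1 is pulled toward the weaker of the two metrics, so one very low value cannot be hidden by a high one.

```python
def f1_score(precision, recall):
    """2 * (precision * recall) / (precision + recall); 0.0 when both are zero."""
    denominator = precision + recall
    return 2 * precision * recall / denominator if denominator else 0.0


# Hypothetical model with fairly even precision and recall.
print(round(f1_score(0.9, 0.75), 3))  # 0.818

# Perfect precision but very poor recall: the plain average would be 0.55,
# while F1 stays close to the weaker value.
print(round(f1_score(1.0, 0.1), 3))  # 0.182
```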