# 1. Accuracy

The simplest way to determine the effectiveness of a classification model is prediction accuracy. Accuracy helps us answer the question:

- **What fraction of the predictions were correct (actual label matched predicted label)?**

Prediction accuracy boils down to the number of labels that were correctly predicted divided by the total number of observations:

$$Accuracy = \dfrac{\text{# of Correctly Predicted}}{\text{# of Observations}}$$
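
As a minimal sketch of this formula in plain Python, the snippet below counts matching labels and divides by the number of observations. The two lists are hypothetical toy data invented for illustration, not values from the admissions dataset.

```python
# Hypothetical toy labels (1 = admitted, 0 = rejected), invented for illustration.
actual_labels = [1, 0, 0, 1, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 0, 0, 1, 1, 0]

# Count the observations where the predicted label matches the actual label.
correct = sum(
    1
    for actual, predicted in zip(actual_labels, predicted_labels)
    if actual == predicted
)

# Accuracy = # of correctly predicted / # of observations.
accuracy = correct / len(actual_labels)
print(accuracy)  # 0.75
```

Here 6 of the 8 toy predictions match the actual labels, so the accuracy is `0.75`.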

In logistic regression, recall that the model's output is a probability between 0 and 1. To decide who gets admitted, we set a threshold and admit every student whose computed probability exceeds it. This threshold is called the **discrimination threshold**, and scikit-learn uses `0.5` by default when predicting labels. If the predicted probability is greater than `0.5`, the label for that observation is `1`. Otherwise, the label is `0`.
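
The thresholding step can be sketched in a few lines of plain Python. The helper name `apply_threshold` and the probabilities are hypothetical, chosen only to illustrate the idea. In practice the probabilities would come from the fitted model.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Turn probabilities into 0/1 labels (hypothetical helper, not a scikit-learn function)."""
    # A probability strictly greater than the threshold becomes 1, everything else 0.
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities of admission for five students.
probabilities = [0.91, 0.35, 0.50, 0.62, 0.08]

print(apply_threshold(probabilities))       # [1, 0, 0, 1, 0]
print(apply_threshold(probabilities, 0.7))  # [1, 0, 0, 0, 0]
```

Raising the threshold from `0.5` to `0.7` admits fewer students, which is why any accuracy figure is tied to a specific discrimination threshold.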

An accuracy of `1.0` means that the model predicted **100%** of admissions correctly for the given discrimination threshold. An accuracy of `0.2` means that the model predicted **20%** of the admissions correctly.

# 2. Binary classification outcomes

Calculating the accuracy of a model on the dataset used for training is a useful initial step just to make sure the model at least beats randomly assigning a label for each observation. However, prediction accuracy doesn't tell us much more.

Accuracy on the training set doesn't tell us how the model performs on data it wasn't trained on. Even a model that scores 100% accuracy on its training set may perform poorly on data it has never seen before. Accuracy also doesn't help us discriminate between the different types of outcomes a binary classification model can produce. In a later mission, we'll learn how to evaluate a model's effectiveness on new, unseen data. In this mission, we'll focus on the principles of evaluating binary classification models by testing our model's effectiveness on the training data.

To start, let's discuss the 4 different outcomes of a binary classification model:

| Prediction       | Observation: **Admitted (1)** | Observation: **Rejected (0)** |
|------------------|-------------------------------|-------------------------------|
| **Admitted (1)** | True Positive (TP)            | False Positive (FP)           |
| **Rejected (0)** | False Negative (FN)           | True Negative (TN)            |

By segmenting a model's predictions into these different outcome categories, we can start to think about other measures of effectiveness that give us more granularity than simple accuracy.

We can define these outcomes as:

- True Positive - The model correctly predicted that the student would be admitted.

    - Said another way, the model predicted that the label would be **Positive**, and that ended up being **True**.
    - In our case, **Positive** refers to being admitted and maps to the label `1` in the dataset.
    - A true positive is whenever `predicted_label` is `1` and `actual_label` is `1`.

- True Negative - The model correctly predicted that the student would be rejected.

    - Said another way, the model predicted that the label would be **Negative**, and that ended up being **True**.
    - In our case, **Negative** refers to being rejected and maps to the label `0` in the dataset.
    - A true negative is whenever `predicted_label` is `0` and `actual_label` is `0`.

- False Positive - The model incorrectly predicted that the student would be admitted even though the student was actually rejected.

    - Said another way, the model predicted that the label would be **Positive**, but that was **False** (the actual label was **Negative**).
    - A false positive is whenever `predicted_label` is `1` but `actual_label` is `0`.

- False Negative - The model incorrectly predicted that the student would be rejected even though the student was actually admitted.

    - Said another way, the model predicted that the label would be **Negative**, but that was **False** (the actual label was **Positive**).
    - A false negative is whenever `predicted_label` is `0` but the `actual_label` is `1`.
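
These four definitions translate directly into counting conditions. The sketch below uses plain Python lists of hypothetical toy labels. The variable names mirror `predicted_label` and `actual_label`, but the values are invented for illustration.

```python
# Hypothetical toy labels (1 = admitted, 0 = rejected), invented for illustration.
actual_labels = [1, 0, 0, 1, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 0, 0, 1, 1, 0]

# Pair each prediction with the corresponding actual label.
pairs = list(zip(predicted_labels, actual_labels))

true_positives = sum(1 for predicted, actual in pairs if predicted == 1 and actual == 1)
true_negatives = sum(1 for predicted, actual in pairs if predicted == 0 and actual == 0)
false_positives = sum(1 for predicted, actual in pairs if predicted == 1 and actual == 0)
false_negatives = sum(1 for predicted, actual in pairs if predicted == 0 and actual == 1)

print(true_positives, true_negatives, false_positives, false_negatives)  # 2 4 1 1
```

Every observation falls into exactly one of the four categories, so the four counts always add up to the total number of observations.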
    
# 3. Sensitivity

Let's now look at a few measures that are much more insightful than simple accuracy. Let's start with **sensitivity**:

- **Sensitivity** (or **True Positive Rate**) - The proportion of applicants who should have been admitted that the model correctly admitted:

$$TPR=\dfrac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

Of all of the students that should have been admitted (True Positives + False Negatives), what fraction did the model correctly admit (True Positives)? More generally, this measure helps us answer the question:

- **How effective is this model at identifying positive outcomes?**

If the **True Positive Rate** is low, it means that the model isn't effective at catching positive cases. For certain problems, high sensitivity is incredibly important. If we're building a model to predict which patients have cancer, every patient that is missed by the model could mean a loss of life. We want a **highly sensitive** model that is able to "catch" all of the positive cases (in this case, the positive case is a patient with cancer).
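
A minimal sketch of the sensitivity calculation is shown below. The function name `sensitivity` and the toy labels are hypothetical and are not taken from the admissions model, so the resulting number is for illustration only.

```python
def sensitivity(actual_labels, predicted_labels):
    """True Positive Rate: TP / (TP + FN). Hypothetical helper for illustration."""
    pairs = list(zip(predicted_labels, actual_labels))
    true_positives = sum(1 for predicted, actual in pairs if predicted == 1 and actual == 1)
    false_negatives = sum(1 for predicted, actual in pairs if predicted == 0 and actual == 1)
    # Assumes at least one actual positive; otherwise the ratio is undefined.
    return true_positives / (true_positives + false_negatives)


# Hypothetical toy labels (1 = positive, 0 = negative), invented for illustration.
actual_labels = [1, 0, 0, 1, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 0, 0, 1, 1, 0]

tpr = sensitivity(actual_labels, predicted_labels)
print(tpr)
```

In this toy data there are 3 actual positives and the model catches 2 of them, so the sensitivity is 2/3.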

# 4. Specificity

The sensitivity of our admissions model is around **12.7%**, which means only about 1 in 8 students who should have been admitted were actually admitted. In the context of predicting student admissions, this probably isn't too much of a problem. Graduate schools can only admit a limited number of students into their programs, so they inevitably end up rejecting many qualified students who would have succeeded.

In the healthcare context, however, low sensitivity could mean a severe loss of life. If a classification model is only catching **12.7%** of positive cases for an illness, then around 7 of 8 people are going undiagnosed (being classified as false negatives). Hopefully you're beginning to acquire a sense for the tradeoffs predictive models make and the importance of understanding the various measures.

Let's now learn about **specificity**:

- **Specificity** (or **True Negative Rate**) - The proportion of applicants who should have been rejected that the model correctly rejected:

$$TNR=\dfrac{\text{True Negatives}}{\text{False Positives} + \text{True Negatives}}$$

This helps us answer the question:

- **How effective is this model at identifying negative outcomes?**

In our case, the specificity tells us the proportion of applicants who should be rejected (`actual_label` equal to `0`, which consists of False Positives + True Negatives) that were correctly rejected (just True Negatives).

A high specificity means that the model is really good at predicting which applicants should be rejected.
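
Specificity can be sketched the same way as sensitivity, with the roles of the two labels swapped. The function name `specificity` and the toy labels below are hypothetical and only illustrate the formula.

```python
def specificity(actual_labels, predicted_labels):
    """True Negative Rate: TN / (FP + TN). Hypothetical helper for illustration."""
    pairs = list(zip(predicted_labels, actual_labels))
    true_negatives = sum(1 for predicted, actual in pairs if predicted == 0 and actual == 0)
    false_positives = sum(1 for predicted, actual in pairs if predicted == 1 and actual == 0)
    # Assumes at least one actual negative; otherwise the ratio is undefined.
    return true_negatives / (false_positives + true_negatives)


# Hypothetical toy labels (1 = admitted, 0 = rejected), invented for illustration.
actual_labels = [1, 0, 0, 1, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 0, 0, 1, 1, 0]

tnr = specificity(actual_labels, predicted_labels)
print(tnr)  # 0.8
```

In this toy data there are 5 actual negatives and the model correctly rejects 4 of them, so the specificity is `0.8`.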