# Accuracy

In [None]:
if (!require("Metrics")) install.packages("Metrics")
library("Metrics")

Suppose we have a machine learning model that predicts a categorical
result such as `yes` or `no` from some predictors.

Here is an example where we are predicting `Purchased` based on `Salary`
and `Age`. Suppose we started with this **actual** data:

#### Actual Observed Data

| Salary | Age | Purchased |
|--------|-----|-----------|
| 53900  | 45  | yes       |
| 50000  | 32  | no        |
| 55900  | 57  | yes       |
| 55600  | 29  | yes       |

This means we know for certain that someone with a salary of `53900`
and an age of `45` did purchase whatever item we are interested in
here.

Suppose we built a model and it made the following predictions for
this data:

#### Model Predictions

| Salary | Age | Prediction |
|--------|-----|------------|
| 53900  | 45  | no         |
| 50000  | 32  | no         |
| 55900  | 57  | yes        |
| 55600  | 29  | yes        |

We are interested in how accurate our model is on this data.

That is, how many of our predictions did we get right, and how many
did we get wrong?

## Accuracy of Predictions

Suppose we have a vector of actual values and a vector of predicted
values to compare.

In [None]:
actuals   <- c("yes", "no", "yes", "yes")
predicted <- c("no", "no", "yes", "yes")
df <- data.frame(actuals, predicted)
print(df)

  actuals predicted
1     yes        no
2      no        no
3     yes       yes
4     yes       yes

### Calculating Accuracy

To find the accuracy from this table, we calculate the proportion of
rows where the prediction agrees with the actual value. This
proportion is called the **accuracy** of the model:

$$accuracy = \frac{\text{number of correct predictions}}{\text{number of all predictions}}$$

In [None]:
accuracy(actuals, predicted)

[1] 0.75
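
As a cross-check, the same proportion of agreement can be computed in base R without the `Metrics` package. This is a minimal sketch using the same vectors as above; the names `correct` and `manual_accuracy` are introduced here for illustration only.

```r
# Same vectors as in the cell above
actuals   <- c("yes", "no", "yes", "yes")
predicted <- c("no", "no", "yes", "yes")

# Element-wise comparison: TRUE where the prediction matches the actual value
correct <- actuals == predicted

# The mean of a logical vector is the proportion of TRUE values
manual_accuracy <- mean(correct)
print(manual_accuracy)
```

The printed value should agree with the result of `accuracy(actuals, predicted)`.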

### Confusion Matrix

Terminology:

The prediction is called **positive** or **negative**:

-   When the **prediction** is **yes**, it is called a **positive**.
-   When the **prediction** is **no**, it is called a **negative**.

The prediction is correct or incorrect:

-   **true** means the prediction was correct
-   **false** means the prediction was incorrect

| Prediction Correct? | Prediction           |
|---------------------|----------------------|
| True or False       | Positive or Negative |

Combining the two gives four possible outcomes: *true positive*,
*false positive*, *true negative*, and *false negative*.

-   $TP$ prediction was yes, actual was yes
-   $FP$ prediction was yes, actual was no
-   $TN$ prediction was no, actual was no
-   $FN$ prediction was no, actual was yes
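
The four definitions above can be turned directly into counts with logical comparisons. The sketch below assumes `"yes"` is the positive class and reuses the first model's vectors; the variable names `tp`, `fp`, `tn`, and `fn` are chosen here for illustration.

```r
actuals   <- c("yes", "no", "yes", "yes")
predicted <- c("no", "no", "yes", "yes")

# Count each outcome; "yes" is treated as the positive class
tp <- sum(predicted == "yes" & actuals == "yes")  # true positives
fp <- sum(predicted == "yes" & actuals == "no")   # false positives
tn <- sum(predicted == "no"  & actuals == "no")   # true negatives
fn <- sum(predicted == "no"  & actuals == "yes")  # false negatives

print(c(TP = tp, FP = fp, TN = tn, FN = fn))
```

Every observation falls into exactly one of the four groups, so the counts always add up to the number of predictions.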

We can print out the confusion matrix like this:

In [None]:
table(actuals, predicted)

       predicted
actuals no yes
    no   1   0
    yes  1   2

## Confusion Matrix

|            |            | **Predicted** |            |
|------------|------------|---------------|------------|
|            |            | *Negative*    | *Positive* |
| **Actual** | *Negative* | TN            | FP         |
|            | *Positive* | FN            | TP         |

Here are the results from the above:

-   $TP$ prediction yes, actual yes - ??? times
-   $FP$ prediction yes, actual no - ??? times
-   $TN$ prediction no, actual no - ??? time
-   $FN$ prediction no, actual yes - ??? time

## Model 2

In [None]:
actuals   <- c("yes", "no", "yes", "yes")
predicted <- c("yes", "yes", "yes", "no")
df <- data.frame(actuals, predicted)
df

  actuals predicted
1     yes       yes
2      no       yes
3     yes       yes
4     yes        no

### Calculating Accuracy

In [None]:
accuracy(actuals, predicted)

[1] 0.5

### Confusion Matrix

In [None]:
table(actuals, predicted)

       predicted
actuals no yes
    no   0   1
    yes  1   2

-   $TP$ prediction yes, actual yes - ??? times
-   $FP$ prediction yes, actual no - ??? times
-   $TN$ prediction no, actual no - ??? time
-   $FN$ prediction no, actual yes - ??? time

Finally, we can write the accuracy in terms of these four counts:

$$
accuracy = \frac{\text{number of correct predictions}}{\text{number of all predictions}} = \frac{TP+TN}{TP+TN+FP+FN}
$$
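
As a rough illustration of this formula, the sketch below reads the four counts out of the confusion matrix for Model 2 and plugs them in. Converting to factors with fixed levels is an assumption added here so that the table stays 2x2 even when one class never appears. The names `lv`, `cm`, and `acc` are illustrative.

```r
actuals   <- c("yes", "no", "yes", "yes")
predicted <- c("yes", "yes", "yes", "no")

# Fixing the factor levels keeps the table 2x2 even if a class never occurs
lv <- c("no", "yes")
cm <- table(actuals   = factor(actuals, levels = lv),
            predicted = factor(predicted, levels = lv))

# Rows are actual values, columns are predicted values
tp <- cm["yes", "yes"]
tn <- cm["no", "no"]
fp <- cm["no", "yes"]   # actual no, predicted yes
fn <- cm["yes", "no"]   # actual yes, predicted no

acc <- (tp + tn) / (tp + tn + fp + fn)
print(acc)
```

The result should match the value returned by `accuracy(actuals, predicted)` for Model 2.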

### Similarities with Jury Trials and Hypothesis Testing

|            |            | **Predicted** |            |
|------------|------------|---------------|------------|
|            |            | *Negative*    | *Positive* |
| **Actual** | *Negative* | TN            | FP         |
|            | *Positive* | FN            | TP         |

Analogies:

### Jury and Trials

|            |            | **Jury**    |                 |
|------------|------------|-------------|-----------------|
|            |            | *Innocent*  | *Guilty*        |
| **Actual** | *Innocent* | ok          | innocent jailed |
|            | *Guilty*   | guilty free | ok              |

### Hypothesis Testing (from Statistics)

|                     |            | **Test of Null Hypothesis** |              |
|---------------------|------------|-----------------------------|--------------|
|                     |            | *fail to reject*            | *reject*     |
| **Null Hypothesis** | *true*     | ok                          | Type I error |
|                     | *not true* | Type II error               | ok           |

### Pregnancy Tests

|            |       | **Pregnancy Test** |            |
|------------|-------|--------------------|------------|
|            |       | *no*               | *yes*      |
| **Actual** | *no*  | test-, no          | test+, no  |
|            | *yes* | test-, yes         | test+, yes |