# Accuracy

The simplest way of reporting the effectiveness of an algorithm is by calculating its accuracy. Accuracy is calculated by finding the total number of correctly classified points and dividing by the total number of points.

Let’s say you’re using a machine learning algorithm to try to predict whether or not you will get above a B on a test. The features of your data could be something like:

* The number of hours you studied this week.
* The number of hours you watched Netflix this week.
* The time you went to bed the night before the test.
* Your average in the class before taking the test.

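Each prediction the algorithm makes falls into one of four categories: true positive, true negative, false positive, or false negative. In terms of those categories, accuracy can be defined as:

**Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)**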

![accuracy.png](attachment:accuracy.png)

Let’s define those terms in the context of our grade example:

* **True Positive**: The algorithm predicted you would get above a B, and you did.
* **True Negative**: The algorithm predicted you would not get above a B, and you didn’t.
* **False Positive**: The algorithm predicted you would get above a B, and you didn’t.
* **False Negative**: The algorithm predicted you would not get above a B, and you did.

In [2]:
labels = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
guesses = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0]

In [2]:
true_positives = 0
true_negatives = 0
false_positives = 0
false_negatives = 0

In [3]:
# Compare each true label with the matching guess and tally the outcome
for label, guess in zip(labels, guesses):
    if label == 1 and guess == 1:
        # True positive: predicted 1, actually 1
        true_positives += 1
    elif label == 0 and guess == 0:
        # True negative: predicted 0, actually 0
        true_negatives += 1
    elif label == 0 and guess == 1:
        # False positive: predicted 1, actually 0
        false_positives += 1
    else:
        # False negative: predicted 0, actually 1
        false_negatives += 1

In [4]:
accuracy = (true_positives + true_negatives) / len(guesses)
accuracy

0.3

# Recall

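Accuracy can be an extremely misleading statistic depending on your data. For example, when the positive class is rare, a classifier that always predicts the negative class scores a high accuracy without ever finding a single positive.

In this situation, the statistic that would be helpful is recall. Recall measures the percentage of relevant items that your classifier found:

**Recall = True Positives / (True Positives + False Negatives)**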

![recall.png](attachment:recall.png)

In essence, recall tells you what proportion of actual positive instances in the dataset were correctly identified by the model.
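To see how accuracy and recall can disagree, here is a minimal sketch on made-up data. The lists `rare_labels` and `always_negative` are hypothetical and chosen only for illustration: one positive point out of 100, and a "classifier" that predicts 0 for everything.

```python
# Hypothetical, illustrative data: 1 positive point out of 100
rare_labels = [1] + [0] * 99
# A "classifier" that always predicts the negative class
always_negative = [0] * 100

# Accuracy: fraction of points where the guess matches the label
correct = sum(1 for label, guess in zip(rare_labels, always_negative) if label == guess)
rare_accuracy = correct / len(rare_labels)

# Recall: fraction of actual positives that were found
tp = sum(1 for label, guess in zip(rare_labels, always_negative) if label == 1 and guess == 1)
fn = sum(1 for label, guess in zip(rare_labels, always_negative) if label == 1 and guess == 0)
rare_recall = tp / (tp + fn)

print(rare_accuracy)  # 0.99
print(rare_recall)    # 0.0
```

The accuracy looks excellent, yet the recall shows that the one positive point was missed.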

A high recall value is desirable when the cost of false negatives is high. For example, in a medical diagnosis system, you would want high recall to ensure that as many positive cases (e.g., patients with a disease) as possible are correctly identified, even if it means some false positives (healthy patients misclassified as having the disease).

Remember that recall should be used in combination with other metrics, like precision and F1 score, to get a more comprehensive evaluation of the model's performance.

In [5]:
recall = true_positives/(true_positives + false_negatives)
recall

0.42857142857142855

# Precision


Precision is another metric used to evaluate the performance of a classification model, especially in binary classification problems. It measures what fraction of the instances the model predicted as positive are actually positive.

**Precision = True Positives / (True Positives + False Positives)**

where:

* True Positives (TP) are the number of correctly predicted positive instances.
* False Positives (FP) are the number of negative instances that were incorrectly classified as positive by the model.

Precision focuses on the positive predictions made by the model, and it tells you what proportion of predicted positive instances were actually correct.

A high precision value is desirable when the cost of false positives is high. For instance, in a spam email detection system, high precision is crucial as you want to minimize the number of legitimate emails (false positives) that are classified as spam.

In [6]:
precision = true_positives/(true_positives + false_positives)
precision

0.5

# F1 Score

The F1 score is a single metric that combines both precision and recall into one performance measure for a classification model, particularly in binary classification problems. It provides a balanced evaluation that takes into account both false positives and false negatives.

The F1 score is calculated as the harmonic mean of precision and recall:

**F1 Score = 2 * (Precision * Recall) / (Precision + Recall)**

where:

* Precision is the true positives divided by the sum of true positives and false positives.
* Recall is the true positives divided by the sum of true positives and false negatives.

The F1 score ranges from 0 to 1. A score of 1 is the best possible and represents perfect precision and recall. A score of 0 is the worst and occurs when the model has no true positives.

The F1 score is especially useful when you want to find a balance between precision and recall. In some scenarios, you may want to achieve both high precision and high recall, but in many cases, there is a trade-off between the two. The F1 score penalizes models that have imbalanced precision and recall, making it a more robust metric for overall performance evaluation.
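The penalty for imbalance comes from using the harmonic mean rather than the ordinary average. The sketch below compares the two on made-up values. The helper `f1_from` and the numbers `lopsided_p` and `lopsided_r` are hypothetical names introduced here for illustration, and returning 0.0 when both inputs are 0 is an assumed convention to avoid dividing by zero.

```python
def f1_from(precision_value, recall_value):
    """Harmonic mean of precision and recall (assumed to be 0.0 when both are 0)."""
    if precision_value + recall_value == 0:
        return 0.0
    return 2 * precision_value * recall_value / (precision_value + recall_value)

# Hypothetical lopsided model: perfect precision, very low recall
lopsided_p, lopsided_r = 1.0, 0.1

arithmetic_mean = (lopsided_p + lopsided_r) / 2
harmonic_mean = f1_from(lopsided_p, lopsided_r)

print(round(arithmetic_mean, 3))  # 0.55
print(round(harmonic_mean, 3))    # 0.182
```

The ordinary average hides the weak recall, while the F1 score stays close to the lower of the two values.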

In [7]:
f_1 = 2*(precision * recall)/(precision + recall)
f_1

0.4615384615384615

* Accuracy measures how many classifications your algorithm got correct out of every classification it made.
* Recall measures the percentage of the relevant items your classifier was able to successfully find.
* Precision measures the percentage of items your classifier found that were actually relevant.
* Precision and recall are tied to each other. In practice, tuning a classifier to raise one often lowers the other.
* F1 score is a combination of precision and recall.
* F1 score will be low if either precision or recall is low.
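The trade-off between precision and recall can be sketched with a classifier that outputs a score and a threshold that turns the score into a 0 or 1 guess. Everything in this example is hypothetical: the data in `example_labels` and `example_scores` is made up, and the helper `precision_recall_at` is a name introduced here, not part of any library. Treating precision or recall as 0.0 when its denominator is 0 is also an assumption.

```python
# Hypothetical classifier scores with their true labels (illustrative data only)
example_labels = [1, 1, 1, 0, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

def precision_recall_at(threshold, y_true, scores):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    tp = sum(1 for y, p in zip(y_true, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, predictions) if y == 1 and p == 0)
    # Assumed convention: report 0.0 when a denominator would be zero
    precision_value = tp / (tp + fp) if (tp + fp) else 0.0
    recall_value = tp / (tp + fn) if (tp + fn) else 0.0
    return precision_value, recall_value

for threshold in (0.25, 0.5, 0.75):
    p, r = precision_recall_at(threshold, example_labels, example_scores)
    print(threshold, round(p, 2), round(r, 2))
# 0.25 0.67 1.0
# 0.5 0.75 0.75
# 0.75 1.0 0.5
```

On this data, raising the threshold makes the classifier more selective, so precision climbs while recall falls.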

The Python library scikit-learn has some functions that will calculate these statistics for you!

In [6]:
!pip install scikit-learn

In [1]:
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

In [3]:
accuracy_score(labels, guesses)

0.3

In [4]:
recall_score(labels, guesses)

0.42857142857142855

In [5]:
precision_score(labels, guesses)

0.5

In [6]:
f1_score(labels, guesses)

0.4615384615384615