# Implement the classification metrics from scratch

In this exercise you will implement precision, recall, and the F1-measure without using scikit-learn or any other library that already implements these metrics.

Your implementation should take as input the predictions made on the test set (y_pred) and the true class labels of that set (y_test).

You will need to find at least the TP, FP, and FN to compute the three metrics.
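
For reference, the standard definitions of the three metrics in terms of these counts are given below, where TP, FP, and FN are the numbers of true positives, false positives, and false negatives for the positive class. The last equality follows from substituting the first two definitions into the harmonic mean.

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{2\,TP}{2\,TP + FP + FN}
$$

Each ratio is undefined when its denominator is zero. A common convention, assumed here, is to report 0 in that case.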

You can use the code below as a starting point, or write your own.

In [2]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import numpy as np

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def precision(y_true, y_pred):
    """
    Calculate the precision score
    :param y_true: True labels
    :param y_pred: Predicted labels
    :return: Precision score
    """
    #Your code here

    # "true_positive" is where both true labels and predicted labels are 1.
    true_positive = np.sum((y_true == 1) & (y_pred == 1))

    # "false_positive" is where true labels are 0 and predicted labels are 1.
    false_positive = np.sum((y_true == 0) & (y_pred == 1))

    # Precision is the number of correctly predicted positives (True Positive) 
    # divided by all predicted positives (True Positive + False Positive).
    # It measures the accuracy of the positive predictions.
    precision = true_positive / (true_positive + false_positive) if (true_positive + false_positive) > 0 else 0
    return precision

def recall(y_true, y_pred):
    """
    Calculate the recall score
    :param y_true: True labels
    :param y_pred: Predicted labels
    :return: Recall score
    """
    # Your code here
    # Count where both true labels and predicted labels are 1
    true_positive = np.sum((y_true == 1) & (y_pred == 1))

    # Count where true labels are 1 (positive class) but predicted labels are 0
    false_negative = np.sum((y_true == 1) & (y_pred == 0))
    
    # Recall is the fraction of actual positives that the model correctly identifies.
    # Like precision, it has the true positives in the numerator, but the denominator
    # is all actual positives (True Positive + False Negative).
    recall = true_positive / (true_positive + false_negative) if (true_positive + false_negative) > 0 else 0
    return recall

def f1_score(y_true, y_pred):
    """
    Calculate the F1 score
    :param y_true: True labels
    :param y_pred: Predicted labels
    :return: F1 score
    """
    # Your code here
    # While useful, neither precision nor recall alone can fully evaluate a machine learning model.
    # Either metric can suggest that the model performs well when it is in fact naive:
    # a model that predicts the positive class for every sample has perfect recall.
    # In other words, a high score on one metric does not necessarily reflect real performance.

    # The F1 score combines the two, giving a more balanced assessment of the model's performance.

    precision_val = precision(y_true, y_pred)
    recall_val = recall(y_true, y_pred)

    # F1-score is calculated as the harmonic mean of precision and recall. 
    # It provides a balance between precision and recall, 
    # making it a useful metric when there is an imbalance between the classes.
    f1 = 2 * (precision_val * recall_val) / (precision_val + recall_val) if (precision_val + recall_val) > 0 else 0
    return f1

# Fit a model and make predictions
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate precision, recall and F1-score
p = precision(y_test, y_pred)
r = recall(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Precision: {:.2f}".format(p))
print("Recall: {:.2f}".format(r))
print("F1-Score: {:.2f}".format(f1))


Precision: 0.97
Recall: 0.94
F1-Score: 0.95


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
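
The functions above rely on NumPy element-wise comparisons, so they expect `y_true` and `y_pred` to be arrays. As a sanity check, the same counts can be computed with plain Python on a toy example small enough to verify by hand. This is only a sketch: the helper names `confusion_counts`, `safe_divide`, and `summarize`, and the toy labels, are hypothetical and not part of the exercise.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN) for the given positive label."""
    tp = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
    return tp, fp, fn


def safe_divide(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator > 0 else 0.0


def summarize(y_true, y_pred):
    """Return (precision, recall, F1) computed from TP, FP, and FN."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    p = safe_divide(tp, tp + fp)
    r = safe_divide(tp, tp + fn)
    f1 = safe_divide(2 * p * r, p + r)
    return p, r, f1


# Toy labels (hypothetical): 4 positives and 4 negatives.
y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 1, 0, 0, 0]

print(confusion_counts(y_true_toy, y_pred_toy))  # (3, 1, 1)
print(summarize(y_true_toy, y_pred_toy))         # (0.75, 0.75, 0.75)

# A naive model that always predicts the positive class:
# recall is 1.0, but precision is only 0.5 and F1 is about 0.67.
y_pred_all_ones = [1] * len(y_true_toy)
print(summarize(y_true_toy, y_pred_all_ones))
```

The second case shows why a single metric can mislead: the always-positive model has perfect recall, and only precision and F1 reveal that half of its positive predictions are wrong.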
