About Dataset

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].

This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32)

Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
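
The field layout described above can be sketched in a few lines of Python. This is an illustrative helper, not part of the dataset's tooling: the name `field_name` is hypothetical, and the sketch assumes that all ten measurements follow the same mean / SE / worst block order that the text gives for radius.

```python
# Base measurements, in the order listed above (a through j).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
# Assumption: each statistic occupies one block of ten consecutive fields.
STATISTICS = ["mean", "SE", "worst"]


def field_name(field_number):
    """Map a 1-based field number (3..32) to a descriptive name."""
    if not 3 <= field_number <= 32:
        raise ValueError("feature fields are numbered 3 through 32")
    offset = field_number - 3              # 0..29
    statistic = STATISTICS[offset // 10]   # which block of ten
    feature = BASE_FEATURES[offset % 10]   # position inside the block
    return f"{statistic} {feature}"


print(field_name(3))   # mean radius
print(field_name(13))  # SE radius
print(field_name(23))  # worst radius
```

Ten measurements times three statistics gives the 30 feature fields, numbered 3 through 32.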

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant

---

# Exercise 1: Evaluation Metrics Implementation

Below is a step-by-step Python implementation that calculates Accuracy, Precision, Recall, and F1-Score from confusion-matrix counts, using the Breast Cancer Wisconsin (Diagnostic) dataset bundled with scikit-learn.


## 1. Import Required Libraries and Load Dataset

In [1]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix 

In [5]:
rseed = 42

In [2]:
# load the breast cancer dataset
data = load_breast_cancer()
X = data.data # Features
y = data.target # Target (0 = malignant, 1 = benign)

In [3]:
# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=rseed)


---

## 2. Train a Logistic Regression Model and Make Predictions


In [6]:
# train a logistic regression model
model = LogisticRegression(max_iter=1000, random_state=rseed)
model.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
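
The warning above suggests scaling the data. A minimal sketch of that suggestion, assuming a `StandardScaler` placed in front of the same classifier, might look like the following. The name `scaled_model` is introduced here for illustration only; the metrics reported later in this notebook come from the unscaled `model`, so results from this variant may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rseed = 42
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=rseed
)

# Standardize each feature, then fit the same classifier on the scaled data.
# The scaler is fitted on the training split only, inside the pipeline.
scaled_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=rseed),
)
scaled_model.fit(X_train, y_train)

# n_iter_ reports how many solver iterations were actually used.
iterations_used = int(scaled_model[-1].n_iter_[0])
print("iterations used:", iterations_used)
print("test accuracy:", scaled_model.score(X_test, y_test))
```

Wrapping the scaler and the classifier in one pipeline keeps the test split out of the scaling statistics, which avoids leaking test information into training.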


In [7]:
# make predictions
y_pred = model.predict(X_test)


---

## 3. Write Functions to Calculate Metrics from Scratch

### Confusion Matrix Components:
- **TP**: True Positives (correctly predicted positive cases)
- **TN**: True Negatives (correctly predicted negative cases)
- **FP**: False Positives (incorrectly predicted positive cases)
- **FN**: False Negatives (incorrectly predicted negative cases)
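
The helper in the next cell obtains these four counts from scikit-learn's `confusion_matrix`. For comparison, a library-free sketch that tallies them pair by pair could look like this. The function name `count_confusion` and the toy labels are illustrative assumptions, not part of the notebook.

```python
def count_confusion(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN by comparing labels pair by pair."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, tn, fp, fn


# Tiny made-up example (illustrative labels, not taken from the dataset).
toy_true = [1, 1, 1, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 1]
print(count_confusion(toy_true, toy_pred))  # (3, 1, 1, 1)
```

The `positive` parameter makes the choice of positive class explicit, which matters for this dataset because label 1 means benign.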

### Metric Functions:

In [8]:
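def calculate_confusion_matrix(y_true, y_pred):
    """
    Calculate TP, TN, FP, FN from the confusion matrix.

    The positive class is label 1, which is "benign" in this dataset.
    """
    # For binary labels {0, 1}, ravel() yields the counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp, tn, fp, fn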

def accuracy(tp, tn, fp, fn):
    """
    Calculate Accuracy.
    """
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """
    Calculate Precision.
    """
    return tp / (tp + fp) if (tp + fp) > 0 else 0

def recall(tp, fn):
    """
    Calculate Recall.
    """
    return tp / (tp + fn) if (tp + fn) > 0 else 0

def f1_score(precision_value, recall_value):
    """
    Calculate F1-Score.
    """
    return 2 * (precision_value * recall_value) / (precision_value + recall_value) if (precision_value + recall_value) > 0 else 0


---

## 4. Evaluate the Model Using Custom Metric Functions


In [9]:
# get confusion matrix components
tp, tn, fp, fn = calculate_confusion_matrix(y_test, y_pred)

In [10]:
# calculate metrics
acc = accuracy(tp, tn, fp, fn)
prec = precision(tp, fp)
rec = recall(tp, fn)
f1 = f1_score(prec, rec)

In [11]:
# print the metrics
print(f"## Confusion matrix: TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"## Accuracy: {acc:.4f}")
print(f"## Precision: {prec:.4f}")
print(f"## Recall: {rec:.4f}")
print(f"## F1-Score: {f1:.4f}")

## Confusion matrix: TP=70, TN=39, FP=4, FN=1
## Accuracy: 0.9561
## Precision: 0.9459
## Recall: 0.9859
## F1-Score: 0.9655
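
The printed values can be checked by hand from the four counts. The sketch below redoes the arithmetic with exact fractions. It uses only the counts shown in the output above, and the variable names are chosen here for illustration.

```python
from fractions import Fraction

tp, tn, fp, fn = 70, 39, 4, 1  # counts from the printed confusion matrix

accuracy_exact = Fraction(tp + tn, tp + tn + fp + fn)  # 109/114
precision_exact = Fraction(tp, tp + fp)                # 70/74 = 35/37
recall_exact = Fraction(tp, tp + fn)                   # 70/71
# F1 simplifies algebraically to 2*TP / (2*TP + FP + FN).
f1_exact = Fraction(2 * tp, 2 * tp + fp + fn)          # 140/145 = 28/29

results = [
    ("Accuracy", accuracy_exact),
    ("Precision", precision_exact),
    ("Recall", recall_exact),
    ("F1-Score", f1_exact),
]
for name, value in results:
    print(f"{name}: {value} = {float(value):.4f}")
```

The four counts sum to 114, which matches a 20% test split of the 569 samples.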



---

## 5. Interpret the Results



### Interpretation:
- **Accuracy**: The model correctly classifies ~95.61% of the test cases (109 of 114), indicating strong overall performance.
- **Precision**: With scikit-learn's encoding (0 = malignant, 1 = benign), the "positive" class here is benign. Of the 74 cases predicted benign, 70 (~94.59%) are actually benign; the 4 false positives are malignant tumors predicted as benign.
- **Recall**: The model identifies 70 of the 71 actual benign cases (~98.59%). The single false negative is a benign case predicted as malignant.
- **F1-Score**: The harmonic mean of precision and recall is ~96.55%, indicating that both metrics are high for the positive (benign) class.

### Observations:
- Because label 1 (benign) is the positive class, the 4 false positives are malignant cases predicted as benign. In a diagnostic setting these missed malignancies are the costlier error, so the ~94.59% precision deserves closer scrutiny than the headline accuracy.
- The high recall means nearly all benign cases are recognized as benign; only one would be flagged as malignant and sent for unnecessary follow-up.
- The F1-score shows that precision and recall are both high, but it describes the benign class. Metrics for the malignant class require treating label 0 as positive.
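
Since scikit-learn encodes benign as 1, the counts reported above treat benign as the positive class. As a rough sketch, the same four counts can be re-read with malignant (label 0) as the positive class by swapping their roles. The `_m` variable names are introduced here for illustration.

```python
# Counts from the notebook's output, where label 1 (benign) is "positive".
tp, tn, fp, fn = 70, 39, 4, 1

# Treating malignant (label 0) as positive swaps the roles of the counts.
tp_m, tn_m = tn, tp   # malignant correctly flagged / benign correctly cleared
fp_m, fn_m = fn, fp   # benign flagged as malignant / malignant missed

precision_m = tp_m / (tp_m + fp_m)
recall_m = tp_m / (tp_m + fn_m)
f1_m = 2 * precision_m * recall_m / (precision_m + recall_m)

print(f"Malignant precision: {precision_m:.4f}")  # 0.9750
print(f"Malignant recall:    {recall_m:.4f}")     # 0.9070
print(f"Malignant F1:        {f1_m:.4f}")         # 0.9398
```

Accuracy is unchanged by the swap, but recall for the malignant class (39 of 43) is noticeably lower than the benign-class recall, which is the figure that matters most when a missed malignancy is the costly error.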

---

## Summary:
This implementation demonstrates how to calculate evaluation metrics from confusion matrix components and interpret them in a binary classification context such as breast cancer diagnosis. On the held-out test set the model reaches ~95.6% accuracy, with high precision and recall for the positive (benign) class. Because false positives and false negatives carry different consequences in diagnosis, these metrics should always be read with the positive-class encoding in mind.
