## Final Project Report (CS634)

Name - Ashot Kirakosyan<br>
NJIT ID - ak2095<br>
Email - ak2995@njit.edu<br>
Date: 11/24/2024<br>
Professor - Yasser Abduallah

### Model Evaluation Report for Breast Cancer Recurrence Prediction

## Abstract

In this report, we evaluate and compare the performance of three machine learning models, namely Random Forest, Decision Tree, and GRU (Gated Recurrent Unit), on the task of predicting breast cancer recurrence. The dataset used for this analysis is the Breast Cancer Recurrence dataset from the UCI Machine Learning Repository. It contains several features of breast cancer patients together with their recurrence status, which is classified as either "no recurrence" or "recurrence".



### Data source

The dataset used in this analysis is the Breast Cancer Recurrence dataset, downloaded from the UCI Machine Learning Repository. It was retrieved using the fetch_ucirepo function from the ucimlrepo package, which provides direct access to UCI datasets for use in machine learning experiments.<br>

Here’s a brief description of the source: <br>

1. Dataset Name: Breast Cancer Recurrence
2. Source: UCI Machine Learning Repository https://archive.ics.uci.edu/dataset/14/breast+cancer
3. ID: 14 (Identifier for the dataset in the UCI repository)
4. The data consists of features related to breast cancer patients and their recurrence status, which is classified as either "no recurrence" or "recurrence". This dataset has been widely used for testing machine learning algorithms in medical data classification tasks.

### Data Preprocessing:

1. The categorical features were one-hot encoded, with the first category of each feature dropped, and the class labels were label-encoded as 0 and 1 for the models. <br>
2. For the GRU model, the numeric target labels were additionally converted to one-hot vectors. <br>
3. The features were also reshaped to fit the input format expected by the GRU model, a 3D tensor of shape (samples, time steps, features), with a single time step per sample.
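
The preprocessing steps listed above can be illustrated without any libraries. The snippet below is a minimal sketch and not the project code: the helper names `one_hot_drop_first` and `add_time_axis` are hypothetical, and the toy values are for illustration only.

```python
def one_hot_drop_first(values):
    """One-hot encode a categorical column, dropping the first (sorted) category."""
    categories = sorted(set(values))
    kept = categories[1:]  # the dropped category is encoded as all zeros
    return [[1.0 if v == c else 0.0 for c in kept] for v in values]


def add_time_axis(rows):
    """Turn (samples, features) rows into (samples, 1, features) for a recurrent layer."""
    return [[row] for row in rows]


# Toy column with three categories (illustrative values only).
menopause = ["premeno", "ge40", "lt40", "premeno"]
encoded = one_hot_drop_first(menopause)  # two columns: "lt40" and "premeno"
sequences = add_time_axis(encoded)

# Label-encode a binary target to 0/1, then one-hot encode it.
labels = ["no-recurrence-events", "recurrence-events", "no-recurrence-events"]
classes = sorted(set(labels))
y_encoded = [classes.index(label) for label in labels]
y_one_hot = [[1.0 if i == y else 0.0 for i in range(len(classes))] for y in y_encoded]
```

Dropping one category per feature avoids a redundant column, since the dropped category is implied when all remaining columns are zero.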

### Download repository

Link to the repository: https://github.com/Ash-K-97/Kirakosyan_Ashot_Final_Project<br>
Download the zip file and extract all files into one folder.<br>
Read the README file and follow the instructions.

### Cross-validation Setup:

For each model, 10-fold stratified cross-validation was performed, so that each fold contains approximately the same proportion of the target classes as the full dataset. Evaluating on held-out folds gives a more reliable estimate of model performance than a single train/test split and makes overfitting easier to detect.
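
To make the idea of stratification concrete, the following is a simplified, dependency-free sketch. It is not scikit-learn's StratifiedKFold algorithm; the function name `stratified_folds` is hypothetical, and the class counts (201 and 85) are the totals implied by the fold tables in this report.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=10, seed=42):
    """Assign each sample index to a fold so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# 201 "no recurrence" (0) and 85 "recurrence" (1) samples.
labels = [0] * 201 + [1] * 85
folds = stratified_folds(labels)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
```

Each test fold then holds 20 or 21 negative and 8 or 9 positive samples, so every fold reflects the roughly 70/30 class ratio of the full dataset.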

### Evaluation Metrics:

1. Confusion Matrix (TP, TN, FP, FN): To measure the number of true positives, true negatives, false positives, and false negatives.
2. Accuracy (ACC): The proportion of correctly classified instances.
3. True Positive Rate (TPR), Specificity (SPC), Positive Predictive Value (PPV), Negative Predictive Value (NPV), and further metrics (FPR, FDR, FNR, TSS, HSS, BACC) that assess the model's ability to predict both positive and negative cases accurately.
4. F1 Score: A balanced measure of precision and recall.
5. ROC AUC: The area under the receiver operating characteristic curve, which is a measure of the model's ability to distinguish between the classes.
6. Brier Score: A measure of the accuracy of probabilistic predictions.
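
As a worked example of the rate-based metrics in this list, the sketch below derives them from the four confusion-matrix counts. The function name `confusion_metrics` is hypothetical; the counts are taken from fold 0 of the Random Forest results in this report.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive common rates from the four confusion-matrix counts."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    tpr = ratio(tp, tp + fn)  # recall / sensitivity
    spc = ratio(tn, tn + fp)  # specificity
    ppv = ratio(tp, tp + fp)  # precision
    npv = ratio(tn, tn + fn)  # negative predictive value
    acc = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * ppv * tpr, ppv + tpr)
    return {"TPR": tpr, "SPC": spc, "PPV": ppv, "NPV": npv, "ACC": acc, "F1": f1}


# Counts from fold 0 of the Random Forest results: TP=3, TN=17, FP=4, FN=5.
metrics = confusion_metrics(tp=3, tn=17, fp=4, fn=5)
print({name: round(value, 4) for name, value in metrics.items()})
```

The values agree with the first row of the Random Forest fold-wise table (for example TPR = 0.375 and ACC = 0.6897).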

### Model 1: Random Forest Classifier

The Random Forest Classifier is an ensemble method that uses multiple decision trees to make predictions. It is known for its robustness and ability to handle high-dimensional data with various feature types.
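
The ensemble idea can be sketched as a majority vote over the predictions of individual trees. This is a simplified illustration with hypothetical helper names and made-up votes; scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than by counting hard votes.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine per-tree class predictions for one sample into a single label."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(per_tree_predictions):
    """per_tree_predictions[t][i] is tree t's label for sample i."""
    per_sample = zip(*per_tree_predictions)
    return [majority_vote(votes) for votes in per_sample]


# Three hypothetical trees voting on four samples.
votes = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
forest_labels = forest_predict(votes)
```

Because each tree is trained on a different bootstrap sample, individual errors tend to be outvoted, which is the source of the robustness mentioned above.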

### Model 2: Decision Tree Classifier

The Decision Tree Classifier is a simpler model compared to Random Forest, but it can still provide insightful results, especially when the data has a clear hierarchical structure.
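
A decision tree grows by choosing, at each node, the split that most reduces class impurity. The sketch below computes the Gini impurity (the default criterion of scikit-learn's DecisionTreeClassifier) before and after a split on a binary feature. The function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def split_impurity(feature, labels):
    """Weighted Gini impurity after splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    total = len(labels)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


# Toy data: the feature separates the classes except for one sample.
feature = [0, 0, 0, 1, 1, 1]
labels = [0, 0, 1, 1, 1, 1]
before = gini(labels)
after = split_impurity(feature, labels)
```

Here the impurity drops from 4/9 to 2/9, so a tree would consider this split useful.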

### Model 3: GRU (Gated Recurrent Unit)

The GRU model is a type of recurrent neural network designed for sequential data. Even though this task is not inherently sequential, each sample is fed to the GRU as a sequence with a single time step, which allows us to test the performance of a deep learning model on the same data.
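
To show what a single time step means for a GRU, here is a simplified scalar sketch of one GRU update. The weights in `params` are hypothetical, and the update convention used (new state = z * previous state + (1 - z) * candidate) is one common formulation; some texts swap the roles of z and 1 - z.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def gru_step(x, h_prev, params):
    """One scalar GRU update; params holds hypothetical weights and biases."""
    z = sigmoid(params["w_z"] * x + params["u_z"] * h_prev + params["b_z"])  # update gate
    r = sigmoid(params["w_r"] * x + params["u_r"] * h_prev + params["b_r"])  # reset gate
    candidate = math.tanh(params["w_h"] * x + params["u_h"] * (r * h_prev) + params["b_h"])
    return z * h_prev + (1.0 - z) * candidate


params = {"w_z": 0.5, "u_z": 0.3, "b_z": 0.0,
          "w_r": 0.4, "u_r": 0.2, "b_r": 0.0,
          "w_h": 0.8, "u_h": 0.6, "b_h": 0.1}

# With a single time step the previous state is the zero initial state,
# so the recurrent weights (u_*) and the reset gate have no effect.
h1 = gru_step(x=1.0, h_prev=0.0, params=params)
feed_forward = (1.0 - sigmoid(0.5)) * math.tanh(0.9)
```

In this setting the layer behaves like a gated feed-forward unit, which is consistent with the observation that the task is not inherently sequential.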

### Software Requirements

1. Python: Version 3.8 or higher<br>
   Ensure that Python is installed on your system. You can download it from python.org.<br>
2. TensorFlow: Version 2.x (For the GRU model)
   TensorFlow is required to build and train the GRU model. Install it from the Command Prompt (Windows) or the Terminal (macOS/Linux) using:

    pip install tensorflow
3. Scikit-learn: Version 1.2 or higher (For the Random Forest, Decision Tree, and metrics)
    Scikit-learn is necessary for machine learning models and evaluation metrics. Version 1.2 or later is needed because the code uses the sparse_output argument of OneHotEncoder. Install it using:

    pip install scikit-learn
4. Pandas: Version 1.x or higher (For data manipulation and processing)
   Pandas is used to handle and process the dataset. Install it using: 

    pip install pandas
5. NumPy: Version 1.19 or higher (For numerical operations)
    NumPy is essential for efficient numerical computations. Install it using:

    pip install numpy

6. UCI ML Repo: To fetch datasets from the UCI Machine Learning Repository
   You need the ucimlrepo library to fetch datasets from the UCI repository:

    pip install ucimlrepo
7. IPython: (For displaying DataFrames in Jupyter notebooks)
    You need the IPython library for displaying data in Jupyter:

   pip install ipython
8. Jupyter Notebook (Optional, but recommended for interactive work)
    Install Jupyter using:

    pip install notebook
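
Before running the project, it can help to confirm which of the packages listed above are installed. The following is a small optional sketch using only the standard library (Python 3.8+); the helper name `installed_versions` is hypothetical and is not part of the project code.

```python
from importlib import metadata

# Distribution names as used with pip in the list above.
REQUIRED = ["tensorflow", "scikit-learn", "pandas", "numpy", "ucimlrepo", "ipython", "notebook"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```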

### Hardware Requirements

1. CPU: Any modern processor should suffice. However, for deep learning (GRU model), it's recommended to use a machine with a GPU.
2. RAM: At least 8 GB of RAM is recommended, particularly for handling large datasets and training deep learning models.
3. GPU: If running the GRU model on a large dataset, having an NVIDIA GPU is highly recommended for faster training. You can use TensorFlow with GPU support for acceleration.

### How to Run the Program

1. After downloading the repository and extracting the files, move the folder to a directory of your choice.<br>
2. In the CLI, navigate to the directory containing the script:<br>
    Example: cd C:\Users\YourName\Documents\Kirakosyan_Ashot_Final_Project<br>
    where YourName is the name of the user.<br>
3. After running this command, you should be in the directory that contains the script.<br>
4. You can check which Python files are in this directory by using the following command: dir (Windows) or ls (macOS/Linux)<br>
5. Once you see the Python file you want to run, you can execute it by typing: python Kirakosyan_Ashot_Final_code.py<br>


### Below is the running code

import os
import sys
import io
import warnings

# Silence TensorFlow's C++ log messages; this must be set before TensorFlow is imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import pandas as pd
import numpy as np
from ucimlrepo import fetch_ucirepo
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score, brier_score_loss
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout, Input
from tensorflow.keras.utils import to_categorical
from IPython.display import display

# Re-wrap stdout/stderr as UTF-8 only when not running inside a Jupyter kernel
if "ipykernel" not in sys.modules:
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')

# Suppress warnings that may arise
warnings.filterwarnings('ignore')

# Fetch the dataset
def fetch_data():
    breast_cancer = fetch_ucirepo(id=14)
    X = pd.DataFrame(breast_cancer.data.features)
    y = breast_cancer.data.targets
    
    # One-hot encode categorical features in X
    encoder = OneHotEncoder(sparse_output=False, drop='first')
    X_encoded = pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out(X.columns))
    
    # Encode the target variable y
    label_encoder = LabelEncoder()
    y_encoded = label_encoder.fit_transform(y.values.ravel())  # Flatten the single-column target, then map labels to 0 and 1
    y_categorical = to_categorical(y_encoded)  # One-hot encoding for GRU model
    
    # Reshape X for GRU (samples, time steps, features)
    X_reshaped = X_encoded.values.reshape((X_encoded.shape[0], 1, X_encoded.shape[1]))
    
    return X_encoded, y_encoded, y_categorical, X_reshaped

X_encoded, y_encoded, y_categorical, X_reshaped = fetch_data()

# Function to calculate various metrics
def calculate_metrics(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    TP, TN, FP, FN = cm[1, 1], cm[0, 0], cm[0, 1], cm[1, 0]
    
    # Calculate performance metrics
    epsilon = 1e-10  # Small epsilon to avoid division by zero
    TPR = TP / (TP + FN) if (TP + FN) != 0 else 0
    SPC = TN / (TN + FP) if (TN + FP) != 0 else 0
    PPV = TP / (TP + FP) if (TP + FP) != 0 else 0
    NPV = TN / (TN + FN) if (TN + FN) != 0 else 0
    FPR = FP / (FP + TN) if (FP + TN) != 0 else 0
    FDR = FP / (FP + TP) if (FP + TP) != 0 else 0
    FNR = FN / (FN + TP) if (FN + TP) != 0 else 0
    ACC = (TP + TN) / (TP + TN + FP + FN) if (TP + TN + FP + FN) != 0 else 0
    F1 = 2 * (PPV * TPR) / (PPV + TPR + epsilon)
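    # Skill scores
    TSS = TPR + SPC - 1
    # Note: this expression equals 2 * ACC - 1; it is not the standard Heidke skill score formula
    HSS = (TP + TN - (FP + FN)) / (TP + TN + FP + FN) if (TP + TN + FP + FN) != 0 else 0
    BACC = (TPR + SPC) / 2 if (TPR + SPC) != 0 else 0
    # Note: BSS is computed exactly like BACC here, so the two columns are always identical
    BSS = (TPR + SPC) / 2 if (TPR + SPC) != 0 else 0
    # Note: y_pred holds hard 0/1 labels, so this value equals the misclassification rate (1 - ACC)
    BS = brier_score_loss(y_true, y_pred)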
    return {
        'TP': TP, 'TN': TN, 'FP': FP, 'FN': FN, 'TPR': TPR, 'SPC': SPC,
        'PPV': PPV, 'NPV': NPV, 'FPR': FPR, 'FDR': FDR, 'FNR': FNR, 'ACC': ACC,
        'F1': F1, 'BS': BS, 'TSS': TSS, 'HSS': HSS, 'BACC' : BACC, 'BSS' : BSS,
    }

# Cross-validation function
def cross_validate_model(model, X, y, reshaped=False):
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    fold_metrics = []
    roc_auc_scores = []
    brier_scores = []

    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), 1):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        
        # Reshape X for GRU if necessary
        if reshaped:
            X_train, X_test = X_train.reshape((X_train.shape[0], 1, X_train.shape[1])), X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))
        
        # Fit model
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        
        # Calculate metrics
        metrics = calculate_metrics(y_test, y_pred)
        fold_metrics.append(metrics)
        
        # ROC AUC
        y_pred_prob = model.predict_proba(X_test)[:, 1]
        roc_auc = roc_auc_score(y_test, y_pred_prob)
        roc_auc_scores.append(roc_auc)
        
        # Brier Score
        brier_score = brier_score_loss(y_test, y_pred_prob)
        brier_scores.append(brier_score)
    
    avg_metrics = {key: np.mean([fold[key] for fold in fold_metrics]) for key in fold_metrics[0].keys()}
    avg_roc_auc = np.mean(roc_auc_scores)
    avg_brier_score = np.mean(brier_scores)
    
    return fold_metrics, avg_metrics, avg_roc_auc, avg_brier_score

# Training and evaluation functions for different models

def train_random_forest():
    rf = RandomForestClassifier(random_state=42)
    fold_metrics_rf, avg_metrics_rf, avg_roc_auc_rf, avg_brier_rf = cross_validate_model(rf, X_encoded.values, y_encoded)
    
    # Display results
    print("Random Forest Fold-wise Metrics:")
    display(pd.DataFrame(fold_metrics_rf))
    print("Average Random Forest Metrics:")
    display(pd.DataFrame([avg_metrics_rf]))
    print(f"Average ROC AUC: {avg_roc_auc_rf:.2f}")
    print(f"Average Brier Score: {avg_brier_rf:.2f}")

def train_decision_tree():
    dt = DecisionTreeClassifier(random_state=42)
    fold_metrics_dt, avg_metrics_dt, avg_roc_auc_dt, avg_brier_dt = cross_validate_model(dt, X_encoded.values, y_encoded)
    
    # Display results
    print("Decision Tree Fold-wise Metrics:")
    display(pd.DataFrame(fold_metrics_dt))
    print("Average Decision Tree Metrics:")
    display(pd.DataFrame([avg_metrics_dt]))
    print(f"Average ROC AUC: {avg_roc_auc_dt:.2f}")
    print(f"Average Brier Score: {avg_brier_dt:.2f}")

def train_gru():
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    fold_metrics_gru = []
    accuracy_scores_gru = []
    roc_auc_scores_gru = []
    brier_scores_gru = []

    for fold, (train_idx, test_idx) in enumerate(skf.split(X_reshaped, y_encoded), 1):
        X_train, X_test = X_reshaped[train_idx], X_reshaped[test_idx]
        y_train, y_test = y_encoded[train_idx], y_encoded[test_idx]
        
        # Convert target to one-hot encoding for GRU
        y_train_categorical = to_categorical(y_train)
        y_test_categorical = to_categorical(y_test)
        
        # Define GRU model
        model_gru = Sequential([
            Input(shape=(X_train.shape[1], X_train.shape[2])),
            GRU(50, return_sequences=True),
            Dropout(0.2),
            GRU(50),
            Dropout(0.2),
            Dense(y_train_categorical.shape[1], activation='softmax')
        ])
        
        model_gru.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        
        # Train GRU model
        model_gru.fit(X_train, y_train_categorical, epochs=10, batch_size=32, validation_data=(X_test, y_test_categorical), verbose=0)
        
        # Evaluate model
        y_pred_prob_gru = model_gru.predict(X_test)
        y_pred_gru = np.argmax(y_pred_prob_gru, axis=1)
        y_test_labels_gru = np.argmax(y_test_categorical, axis=1)
        
        # Calculate metrics
        metrics = calculate_metrics(y_test_labels_gru, y_pred_gru)
        fold_metrics_gru.append(metrics)
        
        # Accuracy, ROC AUC, and Brier Score
        accuracy_scores_gru.append(accuracy_score(y_test_labels_gru, y_pred_gru))
        roc_auc_scores_gru.append(roc_auc_score(y_test_labels_gru, y_pred_prob_gru[:, 1]))
        brier_scores_gru.append(brier_score_loss(y_test_labels_gru, y_pred_prob_gru[:, 1]))

    # Calculate average metrics
    avg_metrics_gru = {key: np.mean([fold[key] for fold in fold_metrics_gru]) for key in fold_metrics_gru[0].keys()}
    avg_accuracy_gru = np.mean(accuracy_scores_gru)
    avg_roc_auc_gru = np.mean(roc_auc_scores_gru)
    avg_brier_score_gru = np.mean(brier_scores_gru)
    
    # Display results
    print("GRU Fold-wise Metrics:")
    display(pd.DataFrame(fold_metrics_gru))
    print("Average GRU Metrics:")
    display(pd.DataFrame([avg_metrics_gru]))
    print(f"Average Accuracy (GRU): {avg_accuracy_gru:.2f}")
    print(f"Average ROC AUC (GRU): {avg_roc_auc_gru:.2f}")
    print(f"Average Brier Score (GRU): {avg_brier_score_gru:.2f}")

# Call functions to train and evaluate all models
train_random_forest()
train_decision_tree()
train_gru()

Random Forest Fold-wise Metrics:


| Fold | TP | TN | FP | FN | TPR | SPC | PPV | NPV | FPR | FDR | FNR | ACC | F1 | BS | TSS | HSS | BACC | BSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 17 | 4 | 5 | 0.375 | 0.809524 | 0.428571 | 0.772727 | 0.190476 | 0.571429 | 0.625 | 0.689655 | 0.4 | 0.310345 | 0.184524 | 0.37931 | 0.592262 | 0.592262 |
| 1 | 2 | 17 | 3 | 7 | 0.222222 | 0.85 | 0.4 | 0.708333 | 0.15 | 0.6 | 0.777778 | 0.655172 | 0.285714 | 0.344828 | 0.072222 | 0.310345 | 0.536111 | 0.536111 |
| 2 | 2 | 17 | 3 | 7 | 0.222222 | 0.85 | 0.4 | 0.708333 | 0.15 | 0.6 | 0.777778 | 0.655172 | 0.285714 | 0.344828 | 0.072222 | 0.310345 | 0.536111 | 0.536111 |
| 3 | 2 | 17 | 3 | 7 | 0.222222 | 0.85 | 0.4 | 0.708333 | 0.15 | 0.6 | 0.777778 | 0.655172 | 0.285714 | 0.344828 | 0.072222 | 0.310345 | 0.536111 | 0.536111 |
| 4 | 0 | 17 | 3 | 9 | 0.0 | 0.85 | 0.0 | 0.653846 | 0.15 | 1.0 | 1.0 | 0.586207 | 0.0 | 0.413793 | -0.15 | 0.172414 | 0.425 | 0.425 |
| 5 | 3 | 18 | 2 | 6 | 0.333333 | 0.9 | 0.6 | 0.75 | 0.1 | 0.4 | 0.666667 | 0.724138 | 0.428571 | 0.275862 | 0.233333 | 0.448276 | 0.616667 | 0.616667 |
| 6 | 5 | 19 | 1 | 3 | 0.625 | 0.95 | 0.833333 | 0.863636 | 0.05 | 0.166667 | 0.375 | 0.857143 | 0.714286 | 0.142857 | 0.575 | 0.714286 | 0.7875 | 0.7875 |
| 7 | 3 | 20 | 0 | 5 | 0.375 | 1.0 | 1.0 | 0.8 | 0.0 | 0.0 | 0.625 | 0.821429 | 0.545455 | 0.178571 | 0.375 | 0.642857 | 0.6875 | 0.6875 |
| 8 | 3 | 18 | 2 | 5 | 0.375 | 0.9 | 0.6 | 0.782609 | 0.1 | 0.4 | 0.625 | 0.75 | 0.461538 | 0.25 | 0.275 | 0.5 | 0.6375 | 0.6375 |
| 9 | 5 | 16 | 4 | 3 | 0.625 | 0.8 | 0.555556 | 0.842105 | 0.2 | 0.444444 | 0.375 | 0.75 | 0.588235 | 0.25 | 0.425 | 0.5 | 0.7125 | 0.7125 |


Average Random Forest Metrics:


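| TP | TN | FP | FN | TPR | SPC | PPV | NPV | FPR | FDR | FNR | ACC | F1 | BS | TSS | HSS | BACC | BSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.8 | 17.6 | 2.5 | 5.7 | 0.3375 | 0.875952 | 0.521746 | 0.758992 | 0.124048 | 0.478254 | 0.6625 | 0.714409 | 0.399523 | 0.285591 | 0.213452 | 0.428818 | 0.606726 | 0.606726 |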


Average ROC AUC: 0.68
Average Brier Score: 0.20
Decision Tree Fold-wise Metrics:


| Fold | TP | TN | FP | FN | TPR | SPC | PPV | NPV | FPR | FDR | FNR | ACC | F1 | BS | TSS | HSS | BACC | BSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 17 | 4 | 5 | 0.375 | 0.809524 | 0.428571 | 0.772727 | 0.190476 | 0.571429 | 0.625 | 0.689655 | 0.4 | 0.310345 | 0.184524 | 0.37931 | 0.592262 | 0.592262 |
| 1 | 4 | 14 | 6 | 5 | 0.444444 | 0.7 | 0.4 | 0.736842 | 0.3 | 0.6 | 0.555556 | 0.62069 | 0.421053 | 0.37931 | 0.144444 | 0.241379 | 0.572222 | 0.572222 |
| 2 | 6 | 15 | 5 | 3 | 0.666667 | 0.75 | 0.545455 | 0.833333 | 0.25 | 0.454545 | 0.333333 | 0.724138 | 0.6 | 0.275862 | 0.416667 | 0.448276 | 0.708333 | 0.708333 |
| 3 | 4 | 14 | 6 | 5 | 0.444444 | 0.7 | 0.4 | 0.736842 | 0.3 | 0.6 | 0.555556 | 0.62069 | 0.421053 | 0.37931 | 0.144444 | 0.241379 | 0.572222 | 0.572222 |
| 4 | 0 | 15 | 5 | 9 | 0.0 | 0.75 | 0.0 | 0.625 | 0.25 | 1.0 | 1.0 | 0.517241 | 0.0 | 0.482759 | -0.25 | 0.034483 | 0.375 | 0.375 |
| 5 | 4 | 14 | 6 | 5 | 0.444444 | 0.7 | 0.4 | 0.736842 | 0.3 | 0.6 | 0.555556 | 0.62069 | 0.421053 | 0.37931 | 0.144444 | 0.241379 | 0.572222 | 0.572222 |
| 6 | 5 | 16 | 4 | 3 | 0.625 | 0.8 | 0.555556 | 0.842105 | 0.2 | 0.444444 | 0.375 | 0.75 | 0.588235 | 0.25 | 0.425 | 0.5 | 0.7125 | 0.7125 |
| 7 | 4 | 17 | 3 | 4 | 0.5 | 0.85 | 0.571429 | 0.809524 | 0.15 | 0.428571 | 0.5 | 0.75 | 0.533333 | 0.25 | 0.35 | 0.5 | 0.675 | 0.675 |
| 8 | 6 | 15 | 5 | 2 | 0.75 | 0.75 | 0.545455 | 0.882353 | 0.25 | 0.454545 | 0.25 | 0.75 | 0.631579 | 0.25 | 0.5 | 0.5 | 0.75 | 0.75 |
| 9 | 5 | 16 | 4 | 3 | 0.625 | 0.8 | 0.555556 | 0.842105 | 0.2 | 0.444444 | 0.375 | 0.75 | 0.588235 | 0.25 | 0.425 | 0.5 | 0.7125 | 0.7125 |


Average Decision Tree Metrics:


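| TP | TN | FP | FN | TPR | SPC | PPV | NPV | FPR | FDR | FNR | ACC | F1 | BS | TSS | HSS | BACC | BSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.1 | 15.3 | 4.8 | 4.4 | 0.4875 | 0.760952 | 0.440202 | 0.781767 | 0.239048 | 0.559798 | 0.5125 | 0.67931 | 0.460454 | 0.32069 | 0.248452 | 0.358621 | 0.624226 | 0.624226 |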


Average ROC AUC: 0.63
Average Brier Score: 0.32
GRU Fold-wise Metrics:


| Fold | TP | TN | FP | FN | TPR | SPC | PPV | NPV | FPR | FDR | FNR | ACC | F1 | BS | TSS | HSS | BACC | BSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20 | 1 | 7 | 0.125 | 0.952381 | 0.5 | 0.740741 | 0.047619 | 0.5 | 0.875 | 0.724138 | 0.2 | 0.275862 | 0.077381 | 0.448276 | 0.53869 | 0.53869 |
| 1 | 0 | 20 | 0 | 9 | 0.0 | 1.0 | 0.0 | 0.689655 | 0.0 | 0.0 | 1.0 | 0.689655 | 0.0 | 0.310345 | 0.0 | 0.37931 | 0.5 | 0.5 |
| 2 | 2 | 19 | 1 | 7 | 0.222222 | 0.95 | 0.666667 | 0.730769 | 0.05 | 0.333333 | 0.777778 | 0.724138 | 0.333333 | 0.275862 | 0.172222 | 0.448276 | 0.586111 | 0.586111 |
| 3 | 3 | 18 | 2 | 6 | 0.333333 | 0.9 | 0.6 | 0.75 | 0.1 | 0.4 | 0.666667 | 0.724138 | 0.428571 | 0.275862 | 0.233333 | 0.448276 | 0.616667 | 0.616667 |
| 4 | 0 | 18 | 2 | 9 | 0.0 | 0.9 | 0.0 | 0.666667 | 0.1 | 1.0 | 1.0 | 0.62069 | 0.0 | 0.37931 | -0.1 | 0.241379 | 0.45 | 0.45 |
| 5 | 3 | 20 | 0 | 6 | 0.333333 | 1.0 | 1.0 | 0.769231 | 0.0 | 0.0 | 0.666667 | 0.793103 | 0.5 | 0.206897 | 0.333333 | 0.586207 | 0.666667 | 0.666667 |
| 6 | 4 | 19 | 1 | 4 | 0.5 | 0.95 | 0.8 | 0.826087 | 0.05 | 0.2 | 0.5 | 0.821429 | 0.615385 | 0.178571 | 0.45 | 0.642857 | 0.725 | 0.725 |
| 7 | 0 | 20 | 0 | 8 | 0.0 | 1.0 | 0.0 | 0.714286 | 0.0 | 0.0 | 1.0 | 0.714286 | 0.0 | 0.285714 | 0.0 | 0.428571 | 0.5 | 0.5 |
| 8 | 2 | 20 | 0 | 6 | 0.25 | 1.0 | 1.0 | 0.769231 | 0.0 | 0.0 | 0.75 | 0.785714 | 0.4 | 0.214286 | 0.25 | 0.571429 | 0.625 | 0.625 |
| 9 | 5 | 16 | 4 | 3 | 0.625 | 0.8 | 0.555556 | 0.842105 | 0.2 | 0.444444 | 0.375 | 0.75 | 0.588235 | 0.25 | 0.425 | 0.5 | 0.7125 | 0.7125 |


Average GRU Metrics:


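| TP | TN | FP | FN | TPR | SPC | PPV | NPV | FPR | FDR | FNR | ACC | F1 | BS | TSS | HSS | BACC | BSS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.0 | 19.0 | 1.1 | 6.5 | 0.238889 | 0.945238 | 0.512222 | 0.749877 | 0.054762 | 0.287778 | 0.761111 | 0.734729 | 0.306552 | 0.265271 | 0.184127 | 0.469458 | 0.592063 | 0.592063 |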


Average Accuracy (GRU): 0.73
Average ROC AUC (GRU): 0.67
Average Brier Score (GRU): 0.19
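
A note on reading the HSS column in the tables above: calculate_metrics computes it as (TP + TN - FP - FN) / N, which equals 2 * ACC - 1. The sketch below contrasts this with the commonly used two-class Heidke skill score formula, using the counts from fold 0 of the Random Forest results. The function names are hypothetical, and the standard formula is shown for comparison only; it is not what produced the reported values.

```python
def hss_as_in_script(tp, tn, fp, fn):
    """HSS column as computed by calculate_metrics: (hits - misses) / total."""
    total = tp + tn + fp + fn
    return (tp + tn - (fp + fn)) / total if total else 0.0


def hss_standard(tp, tn, fp, fn):
    """Two-class Heidke skill score in its common textbook form."""
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return 2 * (tp * tn - fp * fn) / denominator if denominator else 0.0


# Fold 0 of the Random Forest results: TP=3, TN=17, FP=4, FN=5.
counts = dict(tp=3, tn=17, fp=4, fn=5)
accuracy = (counts["tp"] + counts["tn"]) / sum(counts.values())
script_value = hss_as_in_script(**counts)
standard_value = hss_standard(**counts)
print(round(script_value, 5), round(standard_value, 5))
```

In the same tables, the BS column is computed from hard 0/1 predictions and therefore equals 1 - ACC, and the BSS column is computed with the same expression as BACC. The separately printed "Average Brier Score" lines are the ones based on predicted probabilities.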


### Random forest results

1. Accuracy (ACC): 71.44%
2. TPR (Recall): 33.75%
3. SPC (Specificity): 87.60%
4. F1 Score: 0.40
5. ROC AUC: 0.68
6. Brier Score: 0.20<br>

The Random Forest model shows a reasonable performance with 71.44% accuracy. However, it struggles with recall (33.75%), which means it misses roughly two thirds of the actual recurrence events (positive cases). The model does well in identifying negative cases (specificity of 87.6%), but its F1 score of 0.40 is low, reflecting the weak recall relative to precision.
The ROC AUC score of 0.68 indicates moderate ability to discriminate between the two classes, while the Brier score of 0.20 shows that its predicted probabilities are somewhat reliable but not perfect.
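
To make these two probability-based scores concrete, the sketch below computes them from scratch on toy data. The function names and the toy predictions are hypothetical; the last two lines compute a rough reference point for the Brier score, assuming the class totals of this dataset (85 recurrences in 286 samples).

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy predictions (hypothetical values, not model output).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
toy_brier = brier_score(y_true, y_prob)
toy_auc = roc_auc(y_true, y_prob)

# Reference point: always predicting the base rate of 85 recurrences in 286 samples.
base_rate = 85 / 286
baseline_brier = base_rate * (1 - base_rate)
```

A constant base-rate prediction would score a Brier value of about 0.21, so a Brier score of 0.20 is only a modest improvement over that reference. This comparison is approximate, since the reported score is an average over folds.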

### Decision tree model results

1. Accuracy (ACC): 67.93%
2. TPR (Recall): 48.75%
3. SPC (Specificity): 76.10%
4. F1 Score: 0.46
5. ROC AUC: 0.63
6. Brier Score: 0.32 <br>

The Decision Tree model has 67.93% accuracy, which is lower than the Random Forest. However, it has a higher recall (48.75%), meaning it is better at identifying positive cases (recurrence events) compared to Random Forest, though still not perfect. Its F1 score of 0.46 is slightly better than that of Random Forest, showing an improved balance between precision and recall.

The specificity is also lower (76.10%) compared to Random Forest, indicating that the model is not as good at correctly identifying negative cases (no recurrence events). The ROC AUC score of 0.63 suggests that its ability to discriminate between the two classes is weaker than that of Random Forest, and the Brier score of 0.32 indicates that its predicted probabilities are less accurate.

### GRU results

1. Accuracy (ACC): 73.47%
2. TPR (Recall): 23.89%
3. SPC (Specificity): 94.52%
4. F1 Score: 0.31
5. ROC AUC: 0.67
6. Brier Score: 0.19 <br>

The GRU model shows the highest accuracy (73.47%), indicating its overall correct predictions are slightly better than both Random Forest and Decision Tree. However, the recall is still quite low (23.89%), meaning it misses a significant portion of the recurrence events (positive cases). The F1 score of 0.31 reflects this imbalance, with the model favoring precision over recall. Its specificity is the highest (94.52%), meaning it is particularly good at identifying negative cases (no recurrence events).

The ROC AUC score of 0.67 is similar to Random Forest, indicating that the GRU model also has a moderate ability to distinguish between the two classes. The Brier score of 0.19 is the lowest of all three models, suggesting that its predicted probabilities are the most accurate among the three.



### Comparison and Conclusion


##### Best Performing Algorithm: Random Forest <br>

While each model has its strengths and weaknesses, the Random Forest algorithm appears to be the best-performing model overall, particularly in terms of the balance between sensitivity (recall) and specificity. Although it struggles with recall (33.75%), it has:<br>

1. A higher accuracy (71.44%) than Decision Tree (67.93%), and only slightly lower than GRU (73.47%).
2. The highest precision of the three models (PPV = 52.17%), with a recall (TPR = 33.75%) that lies between GRU (23.89%) and Decision Tree (48.75%).
3. An F1 score of 0.40, which is better than GRU's 0.31, though below Decision Tree's 0.46.
4. High specificity (87.6%), well above Decision Tree's 76.10%, which is important in imbalanced datasets.
5. The highest ROC AUC (0.68) of the three models.

While Random Forest is not perfect, it is not the weakest model on any of these metrics and provides a better overall balance for both classes. Decision Tree gains recall at the cost of specificity and less accurate probabilities (Brier score of 0.32), while GRU gains specificity at the cost of very low recall and F1 score.

##### Decision Tree vs. GRU <br>
Decision Tree performs better in recall (48.75%) compared to GRU (23.89%), but at the cost of lower specificity (76.10% vs. GRU's 94.52%). This indicates that the Decision Tree is more biased towards identifying positive cases, but also generates more false positives.
GRU has the highest accuracy (73.47%) but suffers from very low recall and a poor F1 score (0.31). Despite its high specificity, it misses many of the actual recurrence events, making it less effective in identifying positive instances.
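
The trade-offs discussed in this section can be summarized by ranking the models per metric. The snippet below is a small sketch that uses only the average values reported in the results sections of this report; the helper name `best_model` is hypothetical.

```python
# Average metrics as reported in the results sections of this report.
results = {
    "Random Forest": {"ACC": 0.7144, "TPR": 0.3375, "SPC": 0.8760, "F1": 0.40, "ROC AUC": 0.68, "Brier": 0.20},
    "Decision Tree": {"ACC": 0.6793, "TPR": 0.4875, "SPC": 0.7610, "F1": 0.46, "ROC AUC": 0.63, "Brier": 0.32},
    "GRU": {"ACC": 0.7347, "TPR": 0.2389, "SPC": 0.9452, "F1": 0.31, "ROC AUC": 0.67, "Brier": 0.19},
}

LOWER_IS_BETTER = {"Brier"}


def best_model(metric):
    """Return the model name with the best value for the given metric."""
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(results, key=lambda model: results[model][metric])


best = {metric: best_model(metric) for metric in results["Random Forest"]}
for metric, model in best.items():
    print(f"{metric:8s} -> {model}")
```

Decision Tree leads on recall and F1, GRU on accuracy, specificity, and Brier score, and Random Forest on ROC AUC. Random Forest is the only model that is never last on any of these metrics.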
