# Evaluate a Model's Classifications

### Authors: Calvin Howard.

#### Last updated: July 6, 2023

# 00 - Import CSV with All Data
**The CSV is expected to be in this format**
- The ID column and the absolute paths to the NIfTI files are required
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```
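
Because the ID column and the absolute NIfTI paths are required, a quick check before running the pipeline can catch formatting problems early. The sketch below is a hypothetical helper (`validate_input_df` is not part of `calvin_utils`). It assumes the column names shown in the table above (`ID`, `Nifti_File_Path`) and only checks that paths are absolute, not that the files exist.

```python
import os
import pandas as pd

def validate_input_df(df, id_col="ID", path_col="Nifti_File_Path"):
    """Return a list of problems with the ID and NIfTI path columns (empty if none).

    Hypothetical helper for illustration; not part of calvin_utils.
    """
    problems = [f"missing column: {c}" for c in (id_col, path_col) if c not in df.columns]
    if problems:
        return problems
    if df[id_col].duplicated().any():
        problems.append("duplicate IDs")
    # Paths that are not absolute (e.g. 'data/sub1.nii.gz') are flagged
    not_abs = df.loc[~df[path_col].astype(str).map(os.path.isabs), path_col]
    if not not_abs.empty:
        problems.append(f"non-absolute paths: {not_abs.tolist()}")
    return problems

example = pd.DataFrame({"ID": [1, 2],
                        "Nifti_File_Path": ["/path/to/file1.nii.gz", "/path/to/file2.nii.gz"]})
print(validate_input_df(example))
```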

In [None]:
# Specify the path to your CSV file containing NIFTI paths
input_csv_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/collaborations/barotono_disease_classification/metadata/dataset_2/UB_correlations_w_LBD.csv'
sheet = None

In [None]:
# Specify where you want to save your results to
out_dir = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/collaborations/barotono_disease_classification/figure'

In [None]:
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=input_csv_path, output_dir=out_dir, sheet=sheet)
# Read the CSV into a DataFrame and display it
data_df = cal_palm.read_and_display_data()

In [None]:
data_df = data_df.fillna(value=1)
data_df

# 01 - Preprocess Your Data

**Handle NaNs**
- Run the cells below if you would like to remove rows containing NaNs
- Provide a column name or a list of column names (`drop_list`) in which to look for NaNs
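
For reference, dropping rows that have NaNs in selected columns can be written directly in pandas. This is a sketch of the assumed behavior of `cal_palm.drop_nans_from_columns`, not its actual implementation; `drop_rows_with_nans` is a hypothetical name and the data are made up.

```python
import numpy as np
import pandas as pd

def drop_rows_with_nans(df, columns):
    """Return df without the rows that have a NaN in any of the given column(s).

    Hypothetical helper; `columns` may be a single name or a list of names.
    """
    if isinstance(columns, str):
        columns = [columns]
    return df.dropna(subset=columns)

# Toy data: one missing Age, one missing improvement score
df = pd.DataFrame({"Age": [70, np.nan, 65],
                   "Z_Scored_Percent_Cognitive_Improvement": [0.3, -1.2, np.nan]})
cleaned = drop_rows_with_nans(df, ["Age", "Z_Scored_Percent_Cognitive_Improvement"])
print(cleaned)
```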

In [None]:
data_df.columns

In [None]:
drop_list = ['Age', 'Z_Scored_Percent_Cognitive_Improvement']

In [None]:
data_df = cal_palm.drop_nans_from_columns(columns_to_drop_from=drop_list)
display(data_df)

**Drop Rows Based on the Value of a Column**

Define the column, condition, and value for dropping rows
- column = 'your_column_name'
- condition = 'above'  # Options: 'equal', 'above', 'below'
- value = 'your_value'
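
A minimal pandas sketch of the same idea. It assumes that rows meeting the condition are the ones removed, that 'above' and 'below' are strict comparisons, and that the second DataFrame returned by `cal_palm.drop_rows_based_on_value` holds the removed rows; none of these details is confirmed by the source. `split_rows_by_value` is a hypothetical helper.

```python
import operator
import pandas as pd

# Map the condition names used in this notebook to comparison operators (assumed strict)
CONDITIONS = {"equal": operator.eq, "above": operator.gt, "below": operator.lt}

def split_rows_by_value(df, column, condition, value):
    """Return (kept_rows, removed_rows), removing rows where df[column] meets the condition."""
    if condition not in CONDITIONS:
        raise ValueError(f"condition must be one of {list(CONDITIONS)}")
    mask = CONDITIONS[condition](df[column], value)
    return df[~mask], df[mask]

df = pd.DataFrame({"DX": ["PD", "AD", "PD", "DLB"], "Age": [70, 65, 80, 75]})
kept, removed = split_rows_by_value(df, column="DX", condition="equal", value="PD")
print(kept)
```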

In [None]:
data_df.columns

Set the parameters for dropping rows

In [None]:
column = 'DX'  # The column you'd like to evaluate
condition = 'equal'  # The condition to check ('equal', 'above', 'below')
value = 'PD'  # The value to compare against

In [None]:
data_df, other_df = cal_palm.drop_rows_based_on_value(column, condition, value)
display(data_df)

**Standardize Data**
- Enter the columns you do not want to standardize into a list
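
Standardization here most likely means z-scoring. The sketch below assumes that `cal_palm.standardize_columns` subtracts each column's mean and divides by its standard deviation (pandas' default sample SD, ddof=1) for every numeric column not in the exclusion list; the real method may differ. `standardize_except` is a hypothetical name.

```python
import pandas as pd

def standardize_except(df, cols_not_to_standardize):
    """Z-score every numeric column except those listed; other columns are left as-is."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    for col in numeric_cols.difference(cols_not_to_standardize):
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out

df = pd.DataFrame({"Age": [60, 70, 80], "Score": [1.0, 2.0, 3.0], "DX": ["a", "b", "c"]})
print(standardize_except(df, ["Age"]))
```

A constant column has a standard deviation of 0 and would become NaN under this formula.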

In [None]:
# Remove anything you don't want to standardize
cols_not_to_standardize = ['Age']

In [None]:
data_df = cal_palm.standardize_columns(cols_not_to_standardize)
data_df

Descriptive Stats

In [None]:
data_df.describe()

Rename Columns

In [None]:
# Rename columns
data_df.rename(columns={
    'cube_prediction': 'Cube Prediction',
    'infinity_prediction': 'Infinity Prediction',
    'clock_prediction': 'Clock Prediction',
    'cube_actual': 'Cube Actual',
    'infinity_actual': 'Infinity Actual',
    'clock_actual': 'Clock Actual',
    'Cognitive_Status_Code': 'Cognitively Intact',
    'Cognitive_Status': 'Cognitive Status',
    'Question_16': 'RoCA Score'
}, inplace=True)

# 02 - Evaluate Already Administered Classifications

How to Use the Mapping Dictionary
- The mapping dictionary is a crucial component of the BinaryDataMetricsPlotter class. It defines the relationships between columns in your dataframe that you want to compare. Here's how to use it:

- Define the Mapping: Create a dictionary in which each key is the column holding a ground-truth classification and each value is the column holding the experimental classification to compare against it, for example `mapping_dict = {'gold_standard_1': 'classification_1', 'gold_standard_2': 'classification_2'}` (add more pairs as needed).

**The mapping dict must have gold standards as keys**
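
Why the order matters: swapping a gold standard and a classification transposes the confusion matrix, so sensitivity and PPV trade places. The small example below uses made-up labels (not study data) and scikit-learn's `confusion_matrix`; `sens_ppv` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def sens_ppv(y_true, y_pred):
    """Return (sensitivity, PPV) for 0/1 labels, treating 1 as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return float(tp / (tp + fn)), float(tp / (tp + fp))

# Synthetic labels for illustration only
gold_standard = np.array([1, 1, 1, 0, 0, 0, 0, 0])
classification = np.array([1, 1, 0, 1, 1, 0, 0, 0])

print(sens_ppv(gold_standard, classification))  # gold standard as truth: sensitivity 2/3, PPV 1/2
print(sens_ppv(classification, gold_standard))  # roles swapped: sensitivity 1/2, PPV 2/3
```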

In [None]:
data_df.columns

In [None]:
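import numpy as np
# 'Question_16' was renamed to 'RoCA Score' above; code RoCA >= 7 as 1, otherwise 0
data_df['Diagnosis'] = np.where(data_df['RoCA Score'] >= 7, 1, 0)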

In [None]:
mapping_dict = {'Diagnosis': 'Cognitively Intact'}  # 'Cognitive_Status_Code' was renamed to 'Cognitively Intact' above

**Visualize Classification Metrics with Barplots**

- Sensitivity (Recall): TP / (TP + FN). The proportion of actual positives that are correctly identified; it assesses how well the model detects true positives among the positive cases.

- Specificity: TN / (TN + FP). The proportion of actual negatives correctly identified as such; it evaluates the model's ability to recognize true negatives among the negative cases.

- Precision: TP / (TP + FP). The proportion of positive identifications that are actually correct; it focuses on the accuracy of the positive predictions made by the model.

- Recall: TP / (TP + FN). Identical to sensitivity; it emphasizes the model's accuracy in detecting positive cases.

- Accuracy: (TP + TN) / (TP + TN + FP + FN). The ratio of correctly predicted observations to the total observations; it provides an overall measure of the model's performance.

- F1 Score: 2 × Precision × Recall / (Precision + Recall), the harmonic mean of precision and recall. This metric is particularly useful when the balance between precision and recall is important.

- Positive Predictive Value (PPV): TP / (TP + FP). Identical to precision; it indicates the likelihood that a positive test result accurately reflects the underlying condition.

- Negative Predictive Value (NPV): TN / (TN + FN). The proportion of negative test results that are true negatives; it measures the likelihood that a negative test result accurately indicates the absence of the condition.
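
The definitions above translate directly into arithmetic on the four confusion-matrix counts. The sketch below is illustrative (`metrics_from_counts` is a hypothetical helper) and returns 0 for any undefined ratio, the same convention `BinaryDataMetricsPlotter.calculate_metrics` uses for sensitivity, specificity, PPV, and NPV. The worked counts are invented.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute the metrics defined above from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # undefined ratios are reported as 0

    sensitivity = ratio(tp, tp + fn)  # identical to recall
    ppv = ratio(tp, tp + fp)          # identical to precision
    return {
        "Sensitivity": sensitivity,
        "Specificity": ratio(tn, tn + fp),
        "Precision": ppv,
        "PPV": ppv,
        "NPV": ratio(tn, tn + fn),
        "Accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "F1 Score": ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }

# Worked example: 40 TP, 45 TN, 5 FP, 10 FN (100 cases)
m = metrics_from_counts(tp=40, tn=45, fp=5, fn=10)
print(m)
```

With these counts, sensitivity is 40/50 = 0.80, specificity 45/50 = 0.90, PPV 40/45 ≈ 0.89, NPV 45/55 ≈ 0.82, accuracy 85/100 = 0.85, and F1 = 80/95 ≈ 0.84.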

Edit the list with the specific metrics you would like to display
- Valid names: 'Sensitivity', 'Specificity', 'Precision', 'PPV', 'NPV', 'Accuracy', 'F1 Score'
- Set to None to display all metrics

In [None]:
specified_metrics = ['Accuracy', 'Sensitivity',  'NPV']

Notes on Confusion Matrix Normalization
- Normalization by True Labels (Rows):
    - Each entry in a row is divided by the sum of that row.
    - Useful to understand how well each actual class is classified.
    - Gives insights into the recall (sensitivity) for each class.
- Normalization by Predicted Labels (Columns):
    - Each entry in a column is divided by the sum of that column.
    - Useful to understand the precision of each class.
    - Gives insights into how reliable each prediction is.
 
- Normalization by the Total Sum of the Matrix:
    - Each entry is divided by the total number of samples.
    - Provides a proportion of the total data that falls into each category.

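Options:
- 'true': normalize by true labels (rows)
- 'pred': normalize by predicted labels (columns)
- 'all': normalize by the total sum of the matrix
- None: show raw counts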
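
The three normalizations are simple divisions of the count matrix. The sketch below uses made-up labels and checks the manual results against scikit-learn's `confusion_matrix(..., normalize=...)`; in both, rows are true labels and columns are predicted labels.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

counts = confusion_matrix(y_true, y_pred)  # rows = true label, columns = predicted label

by_true = counts / counts.sum(axis=1, keepdims=True)  # 'true': each row sums to 1
by_pred = counts / counts.sum(axis=0, keepdims=True)  # 'pred': each column sums to 1
by_all = counts / counts.sum()                        # 'all': whole matrix sums to 1

# Diagonal of by_true = per-class recall; diagonal of by_pred = per-class precision
print("recall of class 1:", by_true[1, 1])     # 4/6
print("precision of class 1:", by_pred[1, 1])  # 4/5
```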

In [None]:
normalization='pred'

In [None]:
from sklearn.metrics import accuracy_score, auc, roc_curve, confusion_matrix, precision_recall_fscore_support
from calvin_utils.statistical_utils.distribution_statistics import bootstrap_distribution_statistics
from math import pi

import os
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import warnings
from sklearn.utils import resample

class BinaryDataMetricsPlotter:
    def __init__(self, dataframe, mapping_dict, specified_metrics=None, out_dir=None, cm_normalization=None):
        """
        Initialize with a dataframe containing binary data and a dictionary mapping columns.
        """
        self.dataframe = dataframe
        self.mapping_dict = mapping_dict
        self.specified_metrics = specified_metrics
        self.save_dir = out_dir
        self.metrics = self.calculate_metrics()
        self.confusion_matrices = self.get_confusion_matrices(normalize=cm_normalization)
        
    def bootstrap_metrics(self, n_iterations=1000):
        def calculate_metrics(true, pred):
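            # labels=[0, 1] keeps the matrix 2x2 even if a bootstrap sample contains only one class
            tn, fp, fn, tp = confusion_matrix(true, pred, labels=[0, 1]).ravel()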
            
            sensitivity = tp / (tp + fn) if (tp + fn) != 0 else 0
            specificity = tn / (tn + fp) if (tn + fp) != 0 else 0
            ppv = tp / (tp + fp) if (tp + fp) != 0 else 0  # Positive Predictive Value
            npv = tn / (tn + fn) if (tn + fn) != 0 else 0  # Negative Predictive Value
            acc = accuracy_score(true, pred)
            precision, recall, f1, _ = precision_recall_fscore_support(true, pred, average='binary')
            
            return {
                'Sensitivity': sensitivity,
                'Specificity': specificity,
                'Precision': precision,
                'PPV': ppv,
                'NPV': npv,
                'Accuracy': acc,
                'F1 Score': f1,
                'TP': tp,
                'TN': tn,
                'FP': fp,
                'FN': fn
            }
        
        # Suppress warnings
        warnings.filterwarnings("ignore")

        bootstrap_results = {}

        for truth, prediction in self.mapping_dict.items():
            # Observed metrics
            obs_metrics = calculate_metrics(self.dataframe[truth], self.dataframe[prediction])
            
            # Bootstrapping
            bootstrap_metrics = []
            for _ in range(n_iterations):
                boot_df = resample(self.dataframe)
                metrics = calculate_metrics(boot_df[truth], boot_df[prediction])
                bootstrap_metrics.append(metrics)
            
            # Convert list of dictionaries to a dictionary of lists
            bootstrap_metrics = {key: [d[key] for d in bootstrap_metrics] for key in bootstrap_metrics[0]}
            
            # Confidence intervals
            conf_intervals = {key: np.percentile(values, [2.5, 97.5]) for key, values in bootstrap_metrics.items()}
            
            # Store results for this pair of columns
            bootstrap_results[(truth, prediction)] = {
                'observed': obs_metrics,
                'conf_intervals': conf_intervals
            }
        
        return bootstrap_results

    def calculate_metrics(self):
        metrics = {}
        for truth, prediction in self.mapping_dict.items():
            # labels=[0, 1] keeps the matrix 2x2 even if only one class is present
            tn, fp, fn, tp = confusion_matrix(self.dataframe[truth], self.dataframe[prediction], labels=[0, 1]).ravel()
            
            sensitivity = tp / (tp + fn) if (tp + fn) != 0 else 0
            specificity = tn / (tn + fp) if (tn + fp) != 0 else 0
            ppv = tp / (tp + fp) if (tp + fp) != 0 else 0  # Positive Predictive Value
            npv = tn / (tn + fn) if (tn + fn) != 0 else 0  # Negative Predictive Value
            acc = accuracy_score(self.dataframe[truth], self.dataframe[prediction])
            precision, recall, f1, _ = precision_recall_fscore_support(self.dataframe[truth], self.dataframe[prediction], average='binary')

            metrics[(truth, prediction)] = {
                'Sensitivity': sensitivity,
                'Specificity': specificity,
                'Precision': precision,
                'PPV': ppv,
                'NPV': npv,
                'Accuracy': acc,
                'F1 Score': f1,
                'TP': tp,
                'TN': tn,
                'FP': fp,
                'FN': fn
            }
        return metrics
    
    def get_confusion_matrices(self, normalize=None):
        confusion_matrices = {}
        for ground_truth, predicted in self.mapping_dict.items():
            cm = confusion_matrix(self.dataframe[ground_truth], self.dataframe[predicted], normalize=normalize)
            confusion_matrices[(ground_truth, predicted)] = cm
        return confusion_matrices

    def plot_confusion_matrices(self):
        confusion_matrices = self.confusion_matrices
        num_matrices = len(confusion_matrices)
        fig, axes = plt.subplots(1, num_matrices, figsize=(6 * num_matrices, 6))
        
        if num_matrices == 1:
            axes = [axes]
            
        for ax, ((ground_truth, predicted), cm) in zip(axes, confusion_matrices.items()):
            sns.heatmap(cm, annot=True, fmt='.2f', cmap='Blues', ax=ax,
                        xticklabels=['Negative', 'Positive'], yticklabels=['Negative', 'Positive'],
                        annot_kws={"size": 16})  # Set annotation font size
            # confusion_matrix rows are true labels (y-axis) and columns are predicted labels (x-axis)
            ax.set_ylabel(f'Actual: {ground_truth}', fontsize=16)
            ax.set_xlabel(f'Predicted: {predicted}', fontsize=16)
            ax.set_title(f'Confusion Matrix for {ground_truth} vs {predicted}', fontsize=16)
            ax.tick_params(axis='both', which='major', labelsize=16)
            
        # Apply the layout before saving so the saved SVG matches the displayed figure
        plt.tight_layout()
        if self.save_dir is not None:
            subdir = "confusion_matrix"
            os.makedirs(os.path.join(self.save_dir, subdir), exist_ok=True)
            file_name_svg = "conf_matrix.svg"
            path_svg = os.path.join(self.save_dir, subdir, file_name_svg)
            plt.savefig(path_svg, format='svg')
        plt.show()
        
    def plot_radar_charts(self):
        if self.specified_metrics is None:
            self.specified_metrics = ['Accuracy', 'Sensitivity', 'Specificity', 'PPV', 'NPV']
            
        color_map = sns.color_palette("tab10", 10)

        # Get bootstrap metrics
        bootstrap_results = self.bootstrap_metrics()
        
        for idx, ((old_col, new_col), metric_values) in enumerate(self.metrics.items()):
            plt.figure(figsize=(6, 6))
            ax = plt.subplot(111, polar=True)

            categories = self.specified_metrics
            N = len(categories)

            angles = [n / float(N) * 2 * pi for n in range(N)]
            angles += angles[:1]

            ax.set_theta_offset(pi / 2)
            ax.set_theta_direction(-1)

            plt.xticks(angles[:-1], categories)

            ax.set_rlabel_position(0)
            plt.yticks([0.2, 0.4, 0.6, 0.8], ["0.2","0.4","0.6","0.8"], color="black", size=12)
            plt.ylim(0,1)

            values = [metric_values[metric] for metric in self.specified_metrics]
            values += values[:1]
            color = color_map[idx % len(color_map)]  # cycle colors if there are more than 10 mappings
            ax.plot(angles, values, linewidth=1, linestyle='solid', label=f'{old_col} to {new_col}', color=color)
            ax.fill(angles, values, alpha=0.25, color=color)

            # Add error bars to the radar chart
            conf_intervals = bootstrap_results[(old_col, new_col)]['conf_intervals']
            for i, metric in enumerate(self.specified_metrics):
                metric_value = metric_values[metric]
                lower_bound = conf_intervals[metric][0]
                upper_bound = conf_intervals[metric][1]
                # Clip at 0: matplotlib rejects negative error lengths, which occur if the
                # observed value falls outside its bootstrap interval
                error = [[max(metric_value - lower_bound, 0)], [max(upper_bound - metric_value, 0)]]
                ax.errorbar(angles[i], metric_value, yerr=error, fmt='o', color=color, capsize=5)
            
            plt.legend(loc='upper right', bbox_to_anchor=(0.1, 0.1))
            plt.title(f'Metrics for "{old_col} to {new_col}"', size=15, color='black', y=1.1)

            if self.save_dir is not None:
                radar_plots_subdir = "radar_plots"
                os.makedirs(os.path.join(self.save_dir, radar_plots_subdir), exist_ok=True)
                file_name_svg = f"{old_col}_to_{new_col}_radar.svg"
                path_svg = os.path.join(self.save_dir, radar_plots_subdir, file_name_svg)
                plt.savefig(path_svg, format='svg')
            plt.show()

            plt.close()


    def plot_metrics(self):
        if self.specified_metrics is None:
            self.specified_metrics = ['Sensitivity', 'Specificity', 'Precision', 'PPV', 'NPV', 'Accuracy', 'F1 Score']

        plot_data = []
        for (old_col, new_col), metric_values in self.metrics.items():
            for metric_name, metric_value in metric_values.items():
                if metric_name in self.specified_metrics:
                    plot_data.append({
                        'Mapping': f'{old_col} to {new_col}',
                        'Metric': metric_name,
                        'Value': metric_value
                    })

        plot_df = pd.DataFrame(plot_data)

        # Create the plot
        plt.figure(figsize=(10, 6))
        sns.barplot(x="Value", y="Mapping", hue="Metric", data=plot_df)

        plt.xlabel('Metric Value')
        plt.ylabel('Column Mapping')
        plt.title('Performance Metrics for Each Column Mapping')

        plt.legend()
        plt.tight_layout()
        if self.save_dir is not None:
            subdir = "bar_plots"
            os.makedirs(os.path.join(self.save_dir, subdir), exist_ok=True)
            file_name_svg = f"{old_col}_to_{new_col}_bar.svg"
            path_svg = os.path.join(self.save_dir, subdir, file_name_svg)
            plt.savefig(path_svg, format='svg')

        plt.show()
    
    def convert_metrics_to_dataframe(self):
        """
        Converts self.metrics into a pandas DataFrame.

        self.metrics is a dictionary where each key is a (ground_truth, prediction) tuple of
        column names and each value is another dictionary containing the metrics for that pair.

        Returns:
        pandas.DataFrame: A DataFrame with one row per column mapping and one column per metric.
        If save_dir is set, the table is also written to <save_dir>/metrics_df/metrics.csv.
        """

        # Convert the dictionary to a DataFrame
        df = pd.DataFrame(self.metrics).T

        # Setting the names for the multi-index and resetting it to make it part of the DataFrame
        df.columns.name = 'Metric'
        df.index.set_names(['Category', 'Subcategory'], inplace=True)
        df.reset_index(inplace=True)
        
        if self.save_dir is not None:
            subdir = "metrics_df"
            os.makedirs(os.path.join(self.save_dir, subdir), exist_ok=True)
            df.to_csv(os.path.join(self.save_dir, subdir, 'metrics.csv'))

        return df
    
    def lineplot_metrics(self):
        # Convert metrics to DataFrame
        metrics_df = self.convert_metrics_to_dataframe()
        
        # Set up the color palette
        palette = sns.color_palette("tab10", 5)
        
        # Initialize the plot
        plt.figure(figsize=(6*len(metrics_df.index), 6))
        
        # Plot each metric
        sns.lineplot(x='Category', y='Accuracy', data=metrics_df, marker='o', label='Accuracy', color=palette[0])
        sns.lineplot(x='Category', y='Sensitivity', data=metrics_df, marker='o', label='Sensitivity', color=palette[1])
        sns.lineplot(x='Category', y='Specificity', data=metrics_df, marker='o', label='Specificity', color=palette[2])
        sns.lineplot(x='Category', y='PPV', data=metrics_df, marker='o', label='PPV', color=palette[3])
        sns.lineplot(x='Category', y='NPV', data=metrics_df, marker='o', label='NPV', color=palette[4])
        
        # Customize the plot
        plt.ylim(0, 1.05)
        plt.xlabel('Class', fontsize=20)
        plt.ylabel('Classification Metric Score', fontsize=20)
        plt.title('Classification Metrics Across Classes', fontsize=20)
        
        plt.xticks(fontsize=16)
        plt.yticks([0, 0.2, 0.4, 0.6, 0.8, 1.0], fontsize=16)
        plt.legend(fontsize=16)
        
        plt.grid(False)
        sns.despine()
        
        if self.save_dir is not None:
            subdir = "metrics_lineplot"
            os.makedirs(os.path.join(self.save_dir, subdir), exist_ok=True)
            file_name_svg = "lineplot.svg"
            path_svg = os.path.join(self.save_dir, subdir, file_name_svg)
            plt.savefig(path_svg, format='svg')
        # Show the plot
        plt.tight_layout()
        plt.show()
            
    def plot_macro_averages(self):
        # Convert metrics to DataFrame
        metrics_df = self.convert_metrics_to_dataframe()
        
        # Calculate macro-averages and standard deviations
        metric_names = ['Accuracy', 'Sensitivity', 'Specificity', 'PPV', 'NPV']
        averages = metrics_df[metric_names].mean()
        std_devs = metrics_df[metric_names].std()
        print("Macro Averages: ", averages)
        print("Macro Standard Deviations: ", std_devs)

        # Create a DataFrame for plotting
        macro_df = pd.DataFrame({
            'Metric': metric_names,
            'Average': averages,
            'StdDev': std_devs
        })

        # Initialize the plot
        plt.figure(figsize=(18, 6))

        # Create bar plot with error bars
        sns.barplot(x='Metric', y='Average', yerr=macro_df['StdDev'], data=macro_df, palette='tab10', capsize=0.5)

        # Customize the plot
        plt.ylim(0, 1.05)
        plt.xlabel('Metric', fontsize=20)
        plt.ylabel('Macro-Average Score', fontsize=20)
        plt.title('Macro-Average Classification Metrics with Standard Deviations', fontsize=20)
        
        plt.xticks(fontsize=16)
        plt.yticks(fontsize=16)
        
        plt.grid(False)
        sns.despine()
        
        if self.save_dir is not None:
            subdir = "macro_averages"
            os.makedirs(os.path.join(self.save_dir, subdir), exist_ok=True)
            file_name_svg = "macro_averages.svg"
            path_svg = os.path.join(self.save_dir, subdir, file_name_svg)
            plt.savefig(path_svg, format='svg')
        # Show the plot
        plt.tight_layout()
        plt.show()
        
    def run_plotting(self):
        self.plot_metrics()
        self.plot_confusion_matrices()
        self.plot_radar_charts()
        self.lineplot_metrics()
        if len(self.mapping_dict.values()) > 1:
            self.plot_macro_averages()
        

In [None]:
# from calvin_utils.statistical_utils.classification_statistics import BinaryDataMetricsPlotter
plotter = BinaryDataMetricsPlotter(dataframe=data_df, mapping_dict=mapping_dict, specified_metrics=specified_metrics,
                                   out_dir=out_dir, cm_normalization=normalization)
plotter.run_plotting()

In [None]:
plotter.convert_metrics_to_dataframe()

In [None]:
data_df.columns

In [None]:
data_df.loc[:, ['Cognitively Intact', 'Cognitively Impaired', 'RoCA Score', 'Cognitive Status', 'Classification']]

# 03 - Evaluate a Binary Classifier
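
`BinaryClassificationEvaluation.find_optimal_threshold` picks the threshold that maximizes Youden's J statistic, J = sensitivity + specificity - 1 = TPR - FPR, over the points of the ROC curve. A stand-alone sketch of that step follows (`youden_threshold` is a hypothetical name, and the data are toy values). Note that scikit-learn's `roc_curve` counts a score at or above the threshold as a prediction of `pos_label`, whereas `threshold_predictions` codes scores below the threshold as 1; the two conventions appear to line up only when `positive_label=0`, as in the usage later in this section.

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, scores, pos_label=1):
    """Return (threshold, J) for the ROC point that maximizes Youden's J = TPR - FPR.

    roc_curve treats score >= threshold as a prediction of pos_label.
    """
    fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label=pos_label)
    j = tpr - fpr
    best = int(np.argmax(j))  # first index of the maximum if there are ties
    return thresholds[best], j[best]

# Toy data: higher scores belong to the positive class
threshold, j = youden_threshold([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 5, 6])
print(threshold, j)
```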

In [None]:
import os 
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_curve, auc

class BinaryClassificationEvaluation:
    """
    Initializes the BinaryClassificationEvaluation with model results and the true outcomes.

    Args:
        fitted_model: The result object from a fitted statsmodels MNLogit model or similar.
        observation_df: A pandas DataFrame with the true class outcomes.
        normalization: Normalization method for the confusion matrix (None, 'true', 'pred', 'all').
        predictions_df: DataFrame of predicted probabilities or values.
        out_dir: Directory where plots and results will be saved.
        threshold: The threshold to classify probabilities into binary outcomes.
        positive_label: The label to be considered as the positive class. This is critical as the ROC curve calculations and thresholding will be based on this label.
    """
    def __init__(self, fitted_model=None, observation_df=None, normalization=None, predictions_df=None, out_dir=None, threshold=None, positive_label=1):
        self.results = fitted_model
        self.observation_df = observation_df
        self.normalization = normalization
        self.predictions_df = predictions_df
        self.out_dir = out_dir
        self.threshold = threshold
        self.positive_label = positive_label
        
    def find_optimal_threshold(self):
        """
        Calculates the optimal threshold using Youden's J statistic from an ROC curve.

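        Returns:
            tuple: (optimal_threshold, youden_j), where youden_j is the array of TPR - FPR values at each ROC threshold.
            Also sets self.threshold to the optimal threshold if no threshold was supplied.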
        """
        if self.positive_label is None:
            self.positive_label = 1

        
        fpr, tpr, thresholds = roc_curve(y_true=self.observation_df, y_score=self.predictions_df, pos_label=self.positive_label)
        youden_j = tpr - fpr

        max_index = youden_j.argmax()
        optimal_threshold = thresholds[max_index]
        print("Optimal threshold by Youden's J: ", optimal_threshold)
        
        if optimal_threshold not in self.predictions_df.unique():
            print('----- \n WARNING: OPTIMAL THRESHOLD NOT IN OBSERVED SCORES. IF USING DISCRETE THRESHOLDS, THIS IS A PROBLEM AND SUGGESTS ARG:positive_label IS INCORRECT. \n')
        
        if self.threshold is None:
            self.threshold = optimal_threshold
        return optimal_threshold, youden_j


    def threshold_predictions(self, probabilities):
        """
        Converts probabilities into binary predictions based on the threshold.
        Scores strictly below self.threshold are coded 1 and all other scores are coded 0.

        Args:
            probabilities: Array of predicted probabilities.

        Returns:
            Array of binary predictions.
        """
        print(f"Positive cases are being set to those under: {self.threshold}. \n Ensure this is logically sound.")
        return (probabilities < self.threshold).astype(int)

    def get_predictions(self):
        """
        Generates predictions from the fitted model or predictions DataFrame.
        """
        if self.predictions_df is not None:
            self.raw_predictions = self.predictions_df.to_numpy()
        else:
            self.raw_predictions = self.results.predict()
        self.predictions = self.threshold_predictions(self.raw_predictions)

    def get_observations(self):
        """
        Converts the observation DataFrame into a flattened array.
        """
        self.observations = self.observation_df.to_numpy().flatten()

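    def calculate_confusion_matrix(self):
        """
        Calculates the confusion matrix from predictions and observations.
        Raw counts are stored for metric calculation; the (optionally normalized)
        matrix in self.conf_matrix is used for plotting.
        """
        self.count_matrix = confusion_matrix(self.observations, self.predictions)
        self.conf_matrix = confusion_matrix(self.observations, self.predictions, normalize=self.normalization)

    def extract_confusion_components(self):
        """
        Extracts True Positive, True Negative, False Positive, and False Negative counts from the
        unnormalized confusion matrix, so the metrics are computed from counts regardless of the
        normalization chosen for plotting.
        """
        self.TP = self.count_matrix[1, 1]
        self.TN = self.count_matrix[0, 0]
        self.FP = self.count_matrix[0, 1]
        self.FN = self.count_matrix[1, 0]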

    def calculate_metrics(self):
        """
        Calculates accuracy, sensitivity, specificity, PPV, and NPV based on the confusion matrix components.
        """
        self.accuracy = (self.TP + self.TN) / (self.TP + self.TN + self.FP + self.FN)
        self.sensitivity = self.TP / (self.TP + self.FN)
        self.specificity = self.TN / (self.TN + self.FP)
        self.PPV = self.TP / (self.TP + self.FP)
        self.NPV = self.TN / (self.TN + self.FN)

    def display_metrics(self):
        print("Accuracy:", self.accuracy)
        print("Sensitivity:", self.sensitivity)
        print("Specificity:", self.specificity)
        print("PPV:", self.PPV)
        print("NPV:", self.NPV)

    def plot_confusion_matrix(self):
        """
        Plots a heatmap of the confusion matrix.
        """
        plt.figure(figsize=(6, 6))
        sns.heatmap(self.conf_matrix, annot=True, fmt=".2f", cmap="Blues")
        plt.xlabel("Predicted label")
        plt.ylabel("True label")
        plt.title("Confusion Matrix")
        if self.out_dir:
            os.makedirs(self.out_dir, exist_ok=True)
            plt.savefig(os.path.join(self.out_dir, "confusion_matrix.png"))
        plt.show()
        
    def bootstrap_confidence_intervals(self, n_bootstraps=1000, alpha=0.05):
        """
        Calculates confidence intervals for evaluation metrics using bootstrapping.

        Args:
            n_bootstraps: Number of bootstrap samples.
            alpha: Significance level for confidence intervals.

        Returns:
            dict: Confidence intervals for AUC, accuracy, sensitivity, specificity, PPV, and NPV.
        """
        np.random.seed(42)  # For reproducibility
        bootstrapped_metrics = {
            'auc': [],
            'accuracy': [],
            'sensitivity': [],
            'specificity': [],
            'ppv': [],
            'npv': []
        }

        data = pd.DataFrame({'observations': self.observations, 'predictions': self.predictions, 'raw_predictions': self.raw_predictions.flatten()})
        
        for _ in range(n_bootstraps):
            bootstrap_sample = data.sample(n=len(data), replace=True)
            y_true = bootstrap_sample['observations'].to_numpy()
            y_pred = bootstrap_sample['predictions'].to_numpy()
            y_scores = bootstrap_sample['raw_predictions'].to_numpy()
            
            # AUC
            fpr, tpr, _ = roc_curve(y_true, y_scores, pos_label=self.positive_label)
            roc_auc = auc(fpr, tpr)
            bootstrapped_metrics['auc'].append(roc_auc)
            
            # Confusion matrix and derived metrics
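            # labels=[0, 1] keeps the matrix 2x2 even if a bootstrap sample contains only one class
            conf_matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])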
            TP = conf_matrix[1, 1]
            TN = conf_matrix[0, 0]
            FP = conf_matrix[0, 1]
            FN = conf_matrix[1, 0]

            accuracy = (TP + TN) / (TP + TN + FP + FN)
            sensitivity = TP / (TP + FN) if (TP + FN) > 0 else 0
            specificity = TN / (TN + FP) if (TN + FP) > 0 else 0
            ppv = TP / (TP + FP) if (TP + FP) > 0 else 0
            npv = TN / (TN + FN) if (TN + FN) > 0 else 0

            bootstrapped_metrics['accuracy'].append(accuracy)
            bootstrapped_metrics['sensitivity'].append(sensitivity)
            bootstrapped_metrics['specificity'].append(specificity)
            bootstrapped_metrics['ppv'].append(ppv)
            bootstrapped_metrics['npv'].append(npv)
            
        self.bootstrapped_metrics = bootstrapped_metrics
        confidence_intervals = {}
        for metric in bootstrapped_metrics:
            sorted_metrics = np.sort(bootstrapped_metrics[metric])
            lower_bound = np.percentile(sorted_metrics, alpha / 2 * 100)
            upper_bound = np.percentile(sorted_metrics, (1 - alpha / 2) * 100)
            confidence_intervals[metric] = (lower_bound, upper_bound)
        return confidence_intervals
    
    def calculate_p_values(self, random_classifier_metrics):
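        """
        Calculates one-sided p-values for the bootstrapped metrics compared to a random classifier's metrics.
        Requires bootstrap_confidence_intervals() to have been run first, since it populates self.bootstrapped_metrics.

        Args:
            random_classifier_metrics: Dictionary of the random classifier's metric values, keyed like
                self.bootstrapped_metrics ('auc', 'accuracy', 'sensitivity', 'specificity', 'ppv', 'npv').

        Returns:
            dict: P-values for each metric (the fraction of bootstrap replicates below the random value).
        """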
        p_values = {}
        for metric in self.bootstrapped_metrics:
            metric_values = self.bootstrapped_metrics[metric]
            random_value = random_classifier_metrics[metric]
            p_value = np.mean(np.array(metric_values) < random_value)
            p_values[metric] = p_value
        return p_values

    def run(self):
        """
        Orchestrates the calculation and display of all evaluation metrics.
        """
        self.find_optimal_threshold()
        self.get_predictions()
        self.get_observations()
        self.calculate_confusion_matrix()
        self.extract_confusion_components()
        self.calculate_metrics()
        self.display_metrics()
        self.plot_confusion_matrix()


In [None]:
data_df.columns

In [None]:
# Signature: BinaryClassificationEvaluation(fitted_model=None, observation_df=None, normalization=None, predictions_df=None, out_dir=None, threshold=None, positive_label=1)
bce = BinaryClassificationEvaluation(fitted_model=None, normalization='pred', 
                                     predictions_df=data_df['Classification'], observation_df=data_df['Cognitively Intact'],
                                     threshold=None, positive_label=0)
bce.run()

Derive Bootstrap Confidence Intervals for the Classification Metrics
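
`bootstrap_confidence_intervals` uses the percentile bootstrap: resample subjects with replacement, recompute each metric, and take the alpha/2 and 1 - alpha/2 percentiles of the replicates. A generic sketch of that procedure for any metric function follows; `bootstrap_ci` and `accuracy` are hypothetical helpers, and the sketch uses NumPy's `default_rng` rather than the global seed the class sets. The toy data are invented.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric_fn, n_bootstraps=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for metric_fn(y_true, y_pred)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    replicates = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)  # resample subjects (rows) with replacement
        replicates[b] = metric_fn(y_true[idx], y_pred[idx])
    lower, upper = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Toy example: 100 subjects, 90 classified correctly
y_true = np.array([0, 1] * 50)
y_pred = y_true.copy()
y_pred[:10] = 1 - y_pred[:10]
print(bootstrap_ci(y_true, y_pred, accuracy))
```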

In [None]:
# Get confidence intervals
confidence_intervals = bce.bootstrap_confidence_intervals()
print("Confidence Intervals:", confidence_intervals)

Get P-Values
- The random-classifier baselines are computed in the "Evaluate a Random Classifier" section; the keys must match the metric names used by `bce`.
    - {'Sensitivity': 0, 'Specificity': 0.5, 'Precision': 0.0, 'PPV': 0.0, 'NPV': 1.0, 'Accuracy': 0.5}
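`calculate_p_values` reports, for each metric, the fraction of bootstrap replicates that fall below the random-classifier value, which is a one-sided bootstrap test. The toy sketch below reproduces that computation for a single metric on made-up replicate values; `one_sided_bootstrap_p` is a hypothetical name. With B replicates the smallest nonzero p-value is 1/B, so a reported 0 is best read as p < 1/B.

```python
import numpy as np

def one_sided_bootstrap_p(bootstrap_values, baseline):
    """Fraction of bootstrap replicates strictly below a fixed baseline value."""
    return float(np.mean(np.asarray(bootstrap_values) < baseline))

# Hypothetical bootstrap replicates of accuracy, compared to a 0.5 baseline
boot_acc = [0.62, 0.58, 0.71, 0.66, 0.49, 0.69, 0.64, 0.60, 0.55, 0.67]
p = one_sided_bootstrap_p(boot_acc, baseline=0.5)
print(p)
```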

In [None]:
random_baseline = {'Sensitivity': 0, 'Specificity': 0.5, 'Precision': 0.0, 'PPV': 0.0, 'NPV': 1.0, 'Accuracy': 0.5}

In [None]:
# Keys must match the metric names stored in bce.bootstrapped_metrics
random_baseline = {'sensitivity': 0, 'specificity': 0.5,  'ppv': 0.0, 'npv': 0.63, 'accuracy': 0.5, 'auc': 0.5}
bce.calculate_p_values(random_baseline)

# 04 - Evaluate a Multiclass Classifier
- If you have ground-truth labels and a per-class score that can be used for classification (probabilities or dummy-coded predictions), this section helps you visualize how well those scores classify each class.
```
Args:
    fitted_model: The result object from a fitted statsmodels MNLogit model.
    observation_df: A pandas DataFrame with the true class outcomes in one-hot encoded format.
    normalization: Normalization method for the confusion matrix (None, 'true', 'pred', 'all').
    predictions_df (pd.DataFrame, Optional): If no fitted_model is provided, pass a DataFrame with predictions.
        - Can contain probabilities or dummy-coded predictions.
    thresholds (dict, Optional): A dictionary mapping each class index to the probability threshold used to assign that class.
    assign_labels (bool, Optional): scikit-learn's confusion_matrix orders classes by sorted label value.
            If set to False, the confusion matrix keeps scikit-learn's order.
    out_dir (str, Optional): Directory where figures are saved.
```
- The ROC curve sweeps the decision threshold across all possible probabilities, so it shows the trade-off between sensitivity and specificity at every threshold, including the best one achievable.

- The first set of curves is the one-vs-rest ROC: each class against all other classes.
- The second curve (macro average) averages the per-class ROC curves with equal weight per class.
- The third curve (micro average) pools every (sample, class) decision into a single ROC, so larger classes contribute proportionally more.
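The macro and micro summaries described above can be reproduced numerically with scikit-learn's `roc_auc_score` on one-hot labels. A minimal sketch on made-up arrays; it illustrates the two averaging schemes and is not necessarily how `ComprehensiveMulticlassROC` computes its curves.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy one-hot observations (3 classes) and predicted probabilities, one column per class
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.5, 0.2],
                   [0.2, 0.3, 0.5],
                   [0.4, 0.4, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.6, 0.1, 0.3]])

per_class = roc_auc_score(y_true, y_prob, average=None)      # one-vs-rest AUC for each class
macro_auc = roc_auc_score(y_true, y_prob, average='macro')   # unweighted mean of per-class AUCs
micro_auc = roc_auc_score(y_true, y_prob, average='micro')   # single AUC over all (sample, class) pairs
print(per_class, macro_auc, micro_auc)
```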

If your predictions and observations are not already organized as follows:
```
Observations (one column per class; 1 marks the observed class)
    [[0, 1, 0, 0],
     [1, 0, 0, 0]]
Predictions (one column per class; each value is p(class))
    The predictions DataFrame is expected to hold, for each row, an array of class probabilities:
    [[0.2, 0.7, 0.1, 0.0],
     [0.9, 0.05, 0.05, 0.0]]
```
then shape them here:
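If your predictions are stored as a single column of predicted labels rather than probabilities, one way to shape them (the Args above allow dummy-coded predictions) is to one-hot encode both columns and align the prediction columns to the observation columns. A minimal sketch; the `DX_pred` column and the label values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a ground-truth column and a predicted-label column
df = pd.DataFrame({'DX': ['AD', 'PD', 'HC', 'AD'],
                   'DX_pred': ['AD', 'HC', 'HC', 'AD']})

observations_df = pd.get_dummies(df['DX'], dtype=int)
# Align prediction columns with observation columns; a class that is never
# predicted (here 'PD') becomes an all-zero column
predictions_df = (pd.get_dummies(df['DX_pred'], dtype=int)
                  .reindex(columns=observations_df.columns, fill_value=0))
print(predictions_df)
```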

In [None]:
import pandas as pd
def create_observations_df(df, outcome_column):
    # One-hot encode the class labels: one 0/1 column per class, in sorted label order
    observations_df = pd.get_dummies(df[outcome_column], dtype=int)
    return observations_df

In [None]:
data_df.columns

**Get Observations DF Prepared**

Enter the name of the column that contains the ground-truth labels. Labels can be numeric or strings.

In [None]:
classification_column = 'DX'

In [None]:
observations_df = create_observations_df(data_df, classification_column)
observations_df

Option A - Generate a Predictions DF

In [None]:
predictions_df = data_df.loc[:, observations_df.columns] # Assumes data_df has one prediction column per class, named exactly like the columns of observations_df
predictions_df

Option B - Fit a Multinomial Logit

In [None]:
# Fit the model in calvin_utils_project/notebooks/statistical_notebooks/regression_notebooks/logistic_regression_notebook.ipynb, then pass the fitted result to ComprehensiveMulticlassROC as fitted_model.

**Run the Evaluation**

In [None]:
from calvin_utils.statistical_utils.classification_statistics import ComprehensiveMulticlassROC
evaluator = ComprehensiveMulticlassROC(fitted_model=None, observation_df=observations_df, predictions_df=predictions_df, normalization='true',
                                     thresholds=None, out_dir=out_dir)
evaluator.run()

# 05 - Evaluate a Random Classifier
- Set `mapping_dict` so that each key is an outcome column and each value is the label in that column that counts as a positive (a hit).
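Because a random classifier's calls are independent of the true label, its expected metrics have closed forms. With positive-call probability q = 1/n_classes and prevalence prev: sensitivity = q, specificity = 1 - q, PPV = prev, NPV = 1 - prev, and accuracy = q*prev + (1 - q)*(1 - prev). The sketch below derives these from the same expected confusion-matrix proportions that `EvaluateRandomClassifier.calculate_metrics` uses; `random_classifier_expectations` is a hypothetical helper and the prevalence value is made up.

```python
def random_classifier_expectations(prevalence, n_classes):
    """Expected metrics of a classifier that calls 'positive' with probability
    1/n_classes independently of the true label. Assumes 0 < prevalence < 1."""
    q = 1 / n_classes                      # probability of calling a case positive
    tp = q * prevalence                    # cells are proportions of the whole sample
    fn = (1 - q) * prevalence
    fp = q * (1 - prevalence)
    tn = (1 - q) * (1 - prevalence)
    return {
        'Sensitivity': tp / (tp + fn),     # reduces to q
        'Specificity': tn / (tn + fp),     # reduces to 1 - q
        'PPV': tp / (tp + fp),             # reduces to prevalence
        'NPV': tn / (tn + fn),             # reduces to 1 - prevalence
        'Accuracy': tp + tn,
    }

m = random_classifier_expectations(prevalence=0.3, n_classes=4)
print(m)
```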

In [None]:
data_df.columns

In [None]:
mapping_dict = {'Cognitive_Status':'Correct'}

In [None]:
import os
import warnings
import numpy as np
from math import pi
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.utils import resample
from calvin_utils.statistical_utils.classification_statistics import BinaryDataMetricsPlotter

class EvaluateRandomClassifier(BinaryDataMetricsPlotter):
    def __init__(self, dataframe, mapping_dict, out_dir=None, cm_normalization=None, n_classes=2):
        '''
        mapping_dict (dict): a dictionary whose keys are columns with observed classes and whose values are the observation corresponding to a hit.
        n_classes (int): the random classifier calls a case positive with probability 1/n_classes.
        '''
        self.dataframe = dataframe
        self.mapping_dict = mapping_dict
        self.specified_metrics = None
        self.save_dir = out_dir
        self.n_classes=n_classes
        self.metrics = self.calculate_metrics()
        self.confusion_matrices = self.get_confusion_matrices(normalize=cm_normalization)

    def calculate_metrics(self):
        metrics = {}
        for class_col, positive_class in self.mapping_dict.items():
            if len(self.dataframe[class_col].unique()) != 2:
                raise ValueError(f"Column '{class_col}' must contain exactly 2 unique values; columns with more than 2 classes are not supported.")
            # Expected confusion-matrix cells, as proportions of the sample, for a classifier
            # that calls a case positive with probability 1/n_classes independently of its label
            n_total = self.dataframe[class_col].count()
            n_pos = (self.dataframe[class_col] == positive_class).sum()
            n_neg = (self.dataframe[class_col] != positive_class).sum()
            q = 1 / self.n_classes
            tp = q * n_pos / n_total
            tn = (1 - q) * n_neg / n_total
            fp = q * n_neg / n_total
            fn = (1 - q) * n_pos / n_total
            
            sensitivity = tp / (tp + fn) if (tp + fn) != 0 else 0
            specificity = tn / (tn + fp) if (tn + fp) != 0 else 0
            ppv = tp / (tp + fp) if (tp + fp) != 0 else 0  # Positive Predictive Value
            npv = tn / (tn + fn) if (tn + fn) != 0 else 0  # Negative Predictive Value
            acc = (tp + tn) / (tp + tn + fp + fn)
            precision = tp / (tp + fp) if (tp + fp) != 0 else 0
            recall = sensitivity  # Recall is the same as sensitivity
            f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0

            metrics[class_col] = {
                'Sensitivity': sensitivity,
                'Specificity': specificity,
                'Precision': precision,
                'PPV': ppv,
                'NPV': npv,
                'Accuracy': acc,
                'F1 Score': f1,
                'TP': tp,
                'TN': tn,
                'FP': fp,
                'FN': fn
            }
            
        return metrics
    
    def get_confusion_matrices(self, normalize=None):
        confusion_matrices = {}
        for class_col in self.mapping_dict.keys():
            tp = self.metrics[class_col]['TP']
            tn = self.metrics[class_col]['TN']
            fp = self.metrics[class_col]['FP']
            fn = self.metrics[class_col]['FN']
            cm = np.array([[tn, fp], [fn, tp]])
            if normalize == 'true':
                cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
            elif normalize == 'pred':
                cm = cm.astype('float') / cm.sum(axis=0)[np.newaxis, :]
            elif normalize == 'all':
                cm = cm.astype('float') / cm.sum()
            else: 
                pass
            confusion_matrices[class_col] = cm
        return confusion_matrices
    
    def plot_confusion_matrices(self):
        confusion_matrices = self.confusion_matrices
        num_matrices = len(confusion_matrices)
        fig, axes = plt.subplots(1, num_matrices, figsize=(6 * num_matrices, 6))
        
        if num_matrices == 1:
            axes = [axes]
            
        for ax, (class_col, cm) in zip(axes, confusion_matrices.items()):
            sns.heatmap(cm, annot=True, fmt='.2f', cmap='Blues', ax=ax,
                        xticklabels=['Negative', 'Positive'], yticklabels=['Negative', 'Positive'],
                        annot_kws={"size": 16})  # Set annotation font size
            ax.set_xlabel(f'Predicted: {self.mapping_dict[class_col]}', fontsize=16)
            ax.set_ylabel(f'Actual: {class_col}', fontsize=16)
            ax.set_title(f'Confusion Matrix for {class_col} vs {self.mapping_dict[class_col]}', fontsize=16)
            ax.tick_params(axis='both', which='major', labelsize=16)
            
        plt.tight_layout()

        if self.save_dir is not None:
            subdir = "confusion_matrix"
            os.makedirs(os.path.join(self.save_dir, subdir), exist_ok=True)
            file_name_svg = "conf_matrix.svg"
            path_svg = os.path.join(self.save_dir, subdir, file_name_svg)
            plt.savefig(path_svg, format='svg')

        plt.show()
        
    def visualize_probability_influence(self):
        """
        Simulate the random classifier on each outcome column and plot its random scores
        against the actual labels, colored by whether each random call was correct.
        A case is called positive when its uniform random score exceeds 1 - 1/n_classes,
        so the positive rate matches the 1/n_classes rate used in calculate_metrics.
        """
        threshold = 1 - 1 / self.n_classes
        palette = sns.color_palette("tab10")
        for class_col, metrics in self.metrics.items():
            # TP/TN/FP/FN are stored as proportions of the sample, so convert them to expected counts
            n_total = self.dataframe[class_col].count()
            tp, tn, fp, fn = (int(round(metrics[key] * n_total)) for key in ('TP', 'TN', 'FP', 'FN'))

            # Actual labels: TP and FN cases are actual positives, TN and FP cases are actual negatives
            true_labels = np.concatenate((np.ones(tp + fn, dtype=int), np.zeros(tn + fp, dtype=int)))

            # Uniform random scores and the resulting random predictions
            pred_prob = np.random.rand(true_labels.size)
            pred_labels = (pred_prob > threshold).astype(int)
            correct = true_labels == pred_labels

            plt.figure(figsize=(12, 8))

            # One scatter per outcome so the legend labels the colors
            plt.scatter(pred_prob[~correct], true_labels[~correct], color=palette[0], alpha=0.6,
                        edgecolor='w', linewidth=0.5, label='Incorrect')
            plt.scatter(pred_prob[correct], true_labels[correct], color=palette[1], alpha=0.6,
                        edgecolor='w', linewidth=0.5, label='Correct')
            plt.axvline(threshold, color='gray', linestyle=':', label=f'Threshold = {threshold:.2f}')

            # Plot settings
            plt.xlabel("Random Score (Predicted Probability of Positive Class)", fontsize=14)
            plt.ylabel("Actual Class (1 = Positive)", fontsize=14)
            plt.yticks([0, 1])
            plt.title(f"Probability Influence on Random Classifier for {class_col}", fontsize=16)
            plt.legend(title='Classification Result', fontsize=12)
            plt.grid(True, linestyle='--', alpha=0.7)
            plt.tight_layout()

            # Save and show the plot
            if self.save_dir is not None:
                subdir = "probability_influence"
                os.makedirs(os.path.join(self.save_dir, subdir), exist_ok=True)
                file_name_svg = f"{class_col}_probability_influence.svg"
                path_svg = os.path.join(self.save_dir, subdir, file_name_svg)
                plt.savefig(path_svg, format='svg')

            plt.show()

    def plot_radar_charts(self):
        if self.specified_metrics is None:
            self.specified_metrics = ['Accuracy', 'Sensitivity', 'Specificity', 'PPV', 'NPV']
        
        color_map = sns.color_palette("tab10", len(self.mapping_dict))

        for idx, (class_col, metric_values) in enumerate(self.metrics.items()):
            plt.figure(figsize=(6, 6))
            ax = plt.subplot(111, polar=True)

            categories = self.specified_metrics
            N = len(categories)

            angles = [n / float(N) * 2 * pi for n in range(N)]
            angles += angles[:1]

            ax.set_theta_offset(pi / 2)
            ax.set_theta_direction(-1)

            plt.xticks(angles[:-1], categories)

            ax.set_rlabel_position(0)
            plt.yticks([0.2, 0.4, 0.6, 0.8], ["0.2", "0.4", "0.6", "0.8"], color="black", size=12)
            plt.ylim(0, 1)

            values = [metric_values[metric] for metric in self.specified_metrics]
            values += values[:1]
            ax.plot(angles, values, linewidth=1, linestyle='solid', label=f'{class_col}', color=color_map[idx % len(color_map)])
            ax.fill(angles, values, alpha=0.25, color=color_map[idx % len(color_map)])

            plt.legend(loc='upper right', bbox_to_anchor=(0.1, 0.1))
            plt.title(f'Metrics for "{class_col}"', size=15, color='black', y=1.1)

            if self.save_dir is not None:
                radar_plots_subdir = "radar_plots"
                os.makedirs(os.path.join(self.save_dir, radar_plots_subdir), exist_ok=True)
                file_name_svg = f"{class_col}_radar.svg"
                path_svg = os.path.join(self.save_dir, radar_plots_subdir, file_name_svg)
                plt.savefig(path_svg, format='svg')
            plt.show()

            plt.close()

    def run(self):
        self.plot_confusion_matrices()
        self.plot_radar_charts()
        self.visualize_probability_influence()


In [None]:
# n_classes sets the random classifier's positive-call rate to 1/n_classes
random_classifier = EvaluateRandomClassifier(dataframe=data_df, mapping_dict=mapping_dict, out_dir=None, cm_normalization=None,
                                             n_classes=4)
random_classifier.run()
print(random_classifier.metrics)


# 06 - Scatterplot a Continuous Value Against a Binary Classification
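The sigmoid drawn by the plotting helpers in this section is `expit(intercept_ + coef_ * x)` from a fitted scikit-learn `LogisticRegression`, which should coincide with the positive-class column of `predict_proba`. The sketch below checks that on simulated data (the data are made up). Note that `LogisticRegression()` applies an L2 penalty with `C=1.0` by default, which shrinks the slope; `penalty=None` (scikit-learn 1.2 and later) gives the unpenalized fit.

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=200).reshape(-1, 1)
# Simulated binary outcome whose probability rises with x
y = (rng.uniform(size=200) < expit(-3 + 6 * x.ravel())).astype(int)

model = LogisticRegression().fit(x, y)   # default: L2 penalty, C=1.0
b0, b1 = model.intercept_[0], model.coef_[0][0]

x_grid = np.linspace(0, 1, 5).reshape(-1, 1)
manual = expit(b0 + b1 * x_grid.ravel())        # curve as drawn by the helpers
from_sklearn = model.predict_proba(x_grid)[:, 1]  # positive-class probability
print(np.round(manual, 3), np.round(from_sklearn, 3))
```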

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from scipy.special import expit

# Define logistic function
def logistic_function(x, beta_0, beta_1):
    return expit(beta_0 + beta_1 * x)

# Function to create scatter plot and fit sigmoid using logistic regression
def plot_with_logistic_regression(df, dv_col, iv_col):
    # Extract data
    x = df[iv_col].values.reshape(-1, 1)
    y = df[dv_col].values

    # Rescale the dependent variable to [0, 1]; this assumes dv_col is binary (exactly two values)
    y_min, y_max = y.min(), y.max()
    y_normalized = (y - y_min) / (y_max - y_min)

    # Fit logistic regression model
    model = LogisticRegression()
    model.fit(x, y_normalized)

    # Create fitted sigmoid curve data
    x_fit = np.linspace(x.min(), x.max(), 100).reshape(-1, 1)
    y_fit = logistic_function(x_fit, model.intercept_[0], model.coef_[0][0])
    
    # Create a JointGrid for the scatter plot and KDEs
    g = sns.JointGrid(x=df[iv_col], y=y_normalized, space=0, height=8, ratio=5)
    g.plot_joint(sns.scatterplot, color="tab:blue", alpha=0.6)
    # plot_joint only forwards keyword arguments, so draw the fitted curve directly on the joint axes
    g.ax_joint.plot(x_fit, y_fit, color='blue', label='Logistic Fit')
    g.plot_marginals(sns.kdeplot, fill=True, color='tab:blue')

    # Labels and title
    g.set_axis_labels(iv_col, f'Normalized {dv_col}', fontsize=14)
    plt.suptitle(f'Scatter Plot with Logistic Regression Fit and KDE: {iv_col} vs {dv_col}', fontsize=16)
    g.ax_joint.legend()
    plt.tight_layout()
    plt.subplots_adjust(top=0.95)

    # Show the plot
    plt.show()



In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from scipy.special import expit
import os
# Define logistic function
def logistic_function(x, beta_0, beta_1):
    return expit(beta_0 + beta_1 * x)

# Function to create scatter plot and fit sigmoid using logistic regression with KDE plots
def plot_with_logistic_regression_and_kde(df, dv_col, iv_col, xlim=None, ylim=None, out_dir=None):
    # Extract data
    x = df[iv_col].values.reshape(-1, 1)
    y = df[dv_col].values

    # Rescale the dependent variable to [0, 1]; this assumes dv_col is binary (exactly two values)
    y_min, y_max = y.min(), y.max()
    y_normalized = (y - y_min) / (y_max - y_min)

    # Fit logistic regression model
    model = LogisticRegression()
    model.fit(x, y_normalized)

    # Create fitted sigmoid curve data
    x_fit = np.linspace(x.min(), x.max(), 100).reshape(-1, 1)
    y_fit = logistic_function(x_fit, model.intercept_[0], model.coef_[0][0])

    # Create a JointGrid for the scatter plot and KDEs
    g = sns.JointGrid(x=df[iv_col], y=y_normalized, space=0, height=8, ratio=5)
    g.plot_joint(sns.scatterplot, alpha=0.6)
    g.plot_marginals(sns.kdeplot, fill=True, color='tab:blue')
    
    # Plot the logistic fit line separately
    g.ax_joint.plot(x_fit, y_fit, color='blue', label='Logistic Fit')
    g.ax_joint.legend()

    # Labels, title, ticks, and limits (set on the joint axes explicitly)
    g.set_axis_labels(iv_col, f'{dv_col}', fontsize=16)
    plt.suptitle(f'Scatter Plot with Logistic Regression Fit and KDE:\n{iv_col} vs {dv_col}', fontsize=20)
    g.ax_joint.set_yticks([0, 1])
    if xlim:
        g.ax_joint.set_xlim(xlim)
    if ylim:
        g.ax_joint.set_ylim(ylim)
    plt.tight_layout()

    # Save and show the plot
    if out_dir is not None:
        os.makedirs(out_dir, exist_ok=True)
        plt.savefig(os.path.join(out_dir, 'sigmoid_scatter.svg'))
    plt.show()


In [None]:
out_dir = '/Users/cu135/Library/CloudStorage/OneDrive-Personal/OneDrive_Documents/Research/2023/roca/figures/cognitive_classifier/clock'

In [None]:
data_df.columns

In [None]:
plot_with_logistic_regression_and_kde(data_df, dv_col='Cognitively Intact', iv_col='Clock Prediction', ylim=(-0.05, 1.05), xlim=(-0.05,1.05), out_dir=out_dir)

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Function to create jitter plot with KDE plots
def plot_jitter_with_kde(df, category_col, value_col):
    # Create a FacetGrid
    g = sns.FacetGrid(df, col=category_col, col_wrap=4, sharex=False, sharey=False, height=4)
    
    # Map the stripplot (jitter plot) to the FacetGrid
    g.map(sns.stripplot, value_col, jitter=True, alpha=0.6, color="tab:blue")
    
    # Map the KDE plot to the FacetGrid
    g.map(sns.kdeplot, value_col, fill=True, color="tab:blue", alpha=0.6)
    
    # Adjust the layout
    g.set_axis_labels(value_col, '')
    g.set_titles(col_template='{col_name}')
    g.fig.subplots_adjust(top=0.9)
    g.fig.suptitle(f'Jitter Plot with KDE: {value_col} across {category_col}', fontsize=16)

    # Show the plot
    plt.show()

plot_jitter_with_kde(data_df, 'Cognitively Intact', 'Clock Prediction')


Bootstrap Confidence Intervals for Binary Prediction Columns

In [None]:
data_df.columns

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
from sklearn.utils import resample
import warnings

def bootstrap_metrics(data_df, true_col, pred_col, n_iterations=1000):
    """Print observed binary-classification metrics and 95% percentile bootstrap confidence intervals.
    Assumes true_col and pred_col hold 0/1 labels, with 1 as the positive class."""
    def calculate_metrics(true, pred):
        accuracy = accuracy_score(true, pred)
        sensitivity = recall_score(true, pred, zero_division=0)  # Sensitivity is recall of the positive class
        specificity = recall_score(true, pred, pos_label=0, zero_division=0)  # Specificity is recall of the negative class
        precision = precision_score(true, pred, zero_division=0)
        f1 = f1_score(true, pred, zero_division=0)
        return accuracy, sensitivity, specificity, precision, f1

    # Observed metrics
    obs_metrics = calculate_metrics(data_df[true_col], data_df[pred_col])
    
    # Bootstrapping: resample rows with replacement and recompute the metrics
    boot_results = []
    with warnings.catch_warnings():
        # Resamples can contain a single class; silence sklearn warnings only inside this loop
        warnings.simplefilter("ignore")
        for _ in range(n_iterations):
            boot_df = resample(data_df)
            boot_results.append(calculate_metrics(boot_df[true_col], boot_df[pred_col]))
    boot_results = np.array(boot_results)
    
    # 95% percentile confidence intervals
    conf_intervals = np.percentile(boot_results, [2.5, 97.5], axis=0)
    
    # Print observed metrics and confidence intervals
    print(f"Observed Metrics:\nAccuracy: {obs_metrics[0]}, Sensitivity: {obs_metrics[1]}, Specificity: {obs_metrics[2]}, Precision: {obs_metrics[3]}, F1 Score: {obs_metrics[4]}")
    print("\nConfidence Intervals (95%):")
    print(f"Accuracy: {conf_intervals[:, 0]}")
    print(f"Sensitivity: {conf_intervals[:, 1]}")
    print(f"Specificity: {conf_intervals[:, 2]}")
    print(f"Precision: {conf_intervals[:, 3]}")
    print(f"F1 Score: {conf_intervals[:, 4]}")

# Example usage
bootstrap_metrics(data_df, 'cube_actual', 'cube_prediction')
