
# Credit Card Churn Prediction
---  

## Project Description  
In this project, I will develop a machine learning model – specifically, **a custom neural network architecture built with PyTorch** – to predict the probability of a customer canceling their credit card service (*churn*). The model will follow a **supervised learning** approach, using a labeled dataset where:  
- **Customers who left the service** (*churn*) are labeled as **1**.  
- **Active customers** (*non-churn*) are labeled as **0**.  
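
The labeling convention above can be sketched in a few lines. The status strings and the helper names (`CHURN_LABELS`, `encode_churn`, `churn_rate`) are hypothetical, chosen only for illustration; the real dataset's column values may differ.

```python
# Hypothetical status values; the actual dataset's categories may differ
CHURN_LABELS = {"Attrited Customer": 1, "Existing Customer": 0}

def encode_churn(statuses):
    """Map each status string to 1 (churn) or 0 (active); unknown values raise KeyError."""
    return [CHURN_LABELS[status] for status in statuses]

def churn_rate(labels):
    """Share of positive (churn) labels, i.e. the prior of the minority class."""
    return sum(labels) / len(labels)

statuses = ["Existing Customer", "Attrited Customer", "Existing Customer", "Existing Customer"]
labels = encode_churn(statuses)
print(labels)              # [0, 1, 0, 0]
print(churn_rate(labels))  # 0.25
```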

---  

### CRISP-DM Methodology  
The project will follow the CRISP-DM (*Cross-Industry Standard Process for Data Mining*) framework:  

| **Stage** | **Objective** | **Key Actions** |  
|-----------|---------------|------------------|  
| **1. Business Understanding** | Define the impact of churn prediction on customer retention. | - Identify costs of false negatives.<br>- Align metrics with business KPIs. |  
| **2. Data Understanding** | Analyze data structure, quality, and variable relationships. | - Exploratory Data Analysis (EDA).<br>- Outlier and correlation detection. |  
| **3. Data Preparation** | Prepare data for model training. | - Split training and test data.<br>- Remove redundant variables. |  
| **4. Modeling** | Train and compare classical models and neural networks. | - Random Forest/Logistic Regression (baseline).<br>- PyTorch neural network (focus on generalization). |  
| **5. Evaluation** | Validate performance with business-oriented metrics. | - AUC-ROC, confusion matrix.<br>- Simulate financial impact. |  
| **6. Deployment** | Deploy the model for production use. | - Build a final churn prediction model with customer behavior indicators. |  

*This notebook covers the Modeling, Evaluation, and Deployment stages.*  
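
The "cost of false negatives" and "simulate financial impact" actions in the table can be illustrated with a small sketch. The cost values below are made-up placeholders rather than figures from this project, and the helper names (`confusion_counts`, `error_cost`) are assumptions introduced here.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = churn)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def error_cost(counts, cost_false_negative, cost_false_positive):
    """Total cost of the model's mistakes under illustrative per-error costs."""
    return counts["fn"] * cost_false_negative + counts["fp"] * cost_false_positive

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'tp': 1, 'fp': 1, 'tn': 2, 'fn': 2}

# Placeholder costs: a missed churner is assumed far more expensive than a wasted offer
print(error_cost(counts, cost_false_negative=500.0, cost_false_positive=50.0))  # 1050.0
```

With costs this asymmetric, a model with lower accuracy but higher recall may still be the cheaper choice.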

---  


## Installs:


#### In the cluster, go to Libraries > Install new > PyPI and add each package:

- scikit-learn==1.7.0

- torch==2.7.1

- torchmetrics==1.7.3

- tqdm==4.67.1

- ray[tune]==2.47.1

- seaborn==0.13.2

- threadpoolctl==3.6.0

- optuna==4.4.0
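
After installing, it can help to confirm that the cluster actually resolved the pinned versions. The following is a minimal sketch using only the standard library; `PINS`, `installed_version`, and `find_mismatches` are names introduced here, and `ray[tune]` is looked up under its distribution name `ray`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above (the ray[tune] extra is part of the "ray" distribution)
PINS = {
    "scikit-learn": "1.7.0",
    "torch": "2.7.1",
    "torchmetrics": "1.7.3",
    "tqdm": "4.67.1",
    "ray": "2.47.1",
    "seaborn": "0.13.2",
    "threadpoolctl": "3.6.0",
    "optuna": "4.4.0",
}

def installed_version(package):
    """Return the installed version string, or None when the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def find_mismatches(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches

for package, (expected, found) in find_mismatches(PINS).items():
    print(f"{package}: expected {expected}, found {found}")
```

An empty output means every pin matched.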






## Imports:

In [0]:
# Data loading and manipulation:
# pandas API on Spark (pyspark.pandas)
from pyspark import pandas as ps
# Pandas
import pandas as pd
# Numpy
import numpy as np

# Machine learning models:
# Scikit-Learn Preprocessing / Metrics
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, StandardScaler, RobustScaler, MinMaxScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score
# Scikit-Learn Models
from sklearn.tree import DecisionTreeClassifier 
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Torch Metrics
from torchmetrics.classification import BinaryAccuracy, BinaryAUROC, BinaryRecall, BinaryF1Score, BinaryConfusionMatrix, BinaryPrecision, BinaryNegativePredictiveValue, BinaryROC, BinaryPrecisionRecallCurve, BinaryAveragePrecision
from torchmetrics import MetricCollection

# Hyperparameter tuning (PyTorch):
# Ray Tune / Optuna
from ray import tune
from ray.tune import Checkpoint, Tuner, TuneConfig, RunConfig
from ray.tune.search.optuna import OptunaSearch
import ray.cloudpickle as pickle
# Tqdm
from tqdm import tqdm

# Graphics:
# Matplotlib
import matplotlib.pyplot as plt
# Seaborn
import seaborn as sns

# Python:
# Time
import time
# Random
import random
# Partial
from functools import partial
# OS
import os
# Tempfile
import tempfile
# Path
from pathlib import Path
# Warnings
import warnings

## Functions

### Loading Data

In [0]:
class DataSparkPS:
    """
    A helper class for managing data loading using PySpark Pandas (Koalas),
    with automatic conversion to Pandas DataFrame.

    This class facilitates loading CSV or Parquet files into memory using
    `pyspark.pandas` (Koalas) and converts them into standard `pandas.DataFrame`
    for further analysis or modeling.

    Attributes:
        file_location (str): Path to the data file.
        dataframe (Optional[pd.DataFrame]): Loaded dataset in Pandas format.
    """
    # Init
    def __init__(
        self,
        file_location: str,
        dataframe: ps.DataFrame = None,
    ):
        # Always define both attributes so later access never raises AttributeError
        self.file_location = file_location
        self.dataframe = dataframe

    # Load Data
    def load_data(
        self,
        file_type: str = 'parquet',
    ) -> pd.DataFrame:
        """
        Loads data from a file and stores it as a Pandas DataFrame.

        This method uses PySpark Pandas (`pyspark.pandas`, formerly Koalas) to read
        the specified file (CSV or Parquet) and then converts it into a `pandas.DataFrame`.

        Args:
            file_type (str): The file format to load. Options are:
                - 'parquet': Loads a Parquet file.
                - 'csv': Loads a CSV file.

        Returns:
            Optional[pd.DataFrame]: The loaded data in Pandas format, or None if an error occurs.

        Note:
            Errors are caught inside the method, including the ValueError raised for an
            unsupported file_type. They are printed and the method returns None instead
            of propagating the exception.
        """
        try:

            if file_type == 'parquet':
                dataframe = ps.read_parquet(self.file_location)
            
            elif file_type == 'csv':
                dataframe = ps.read_csv(self.file_location)
            
            else:
                raise ValueError(f"Unsupported file type '{file_type}'. Use 'csv' or 'parquet'.")

            print(f'✅ File loaded successfully from: {self.file_location}')
            
            # Convert the pandas-on-Spark DataFrame to a regular pandas DataFrame
            self.dataframe = dataframe.to_pandas()
            return self.dataframe 
        
        except Exception as e:
            print(f"[ERROR]  Error loading file '{self.file_location}': {str(e)}") 
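
The format dispatch inside `load_data` can be illustrated without a Spark session. In the sketch below, `read_parquet_stub` and `read_csv_stub` are hypothetical placeholders for `ps.read_parquet` and `ps.read_csv`, and `pick_reader` is not part of the class. It shows one way to keep the supported formats in a single lookup table and reject anything else before a file is touched.

```python
def read_parquet_stub(path):
    """Placeholder for ps.read_parquet; returns a tag instead of a DataFrame."""
    return f"parquet:{path}"

def read_csv_stub(path):
    """Placeholder for ps.read_csv; returns a tag instead of a DataFrame."""
    return f"csv:{path}"

# One entry per supported format
READERS = {"parquet": read_parquet_stub, "csv": read_csv_stub}

def pick_reader(file_type="parquet"):
    """Return the reader for file_type, or raise ValueError for unsupported formats."""
    try:
        return READERS[file_type]
    except KeyError:
        supported = ", ".join(f"'{name}'" for name in READERS)
        raise ValueError(
            f"Unsupported file type '{file_type}'. Use one of: {supported}."
        ) from None

print(pick_reader("csv")("data.csv"))  # csv:data.csv
```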

### Graphics

In [0]:
class GraphicsData:

    # Init
    def __init__(
        self, 
        data: pd.DataFrame,
        ):

        try:
            # Entry checks
            if data.empty:
                raise ValueError('The provided DataFrame is empty.')

            self.data = data

        except Exception as e:
            print(f'[ERROR] Failed to load DataFrame: {str(e)}')
    

    ###_initializer_subplot_grid Function ###
    def _initializer_subplot_grid(
        self, 
        num_columns, 
        figsize_per_row
    ):
        """
        Initializes and returns a standardized matplotlib subplot grid layout.

        This utility method calculates the required number of rows based on 
        the number of variables in the dataset and the desired number of 
        columns per row. It then creates a grid of subplots accordingly and 
        applies a consistent styling.

        Args:
            num_columns (int): Number of subplots per row.
            figsize_per_row (int): Vertical size (height) per row in the final figure.

        Returns:
            tuple:
                - fig (matplotlib.figure.Figure): The full matplotlib figure object.
                - ax (np.ndarray of matplotlib.axes.Axes): Flattened array of subplot axes.
        """
        num_vars = len(self.data.columns)
        num_rows = (num_vars + num_columns - 1) // num_columns

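        # Apply the seaborn style first: it only affects figures created afterwards
        sns.set(style = 'whitegrid')
        plt.rc('font', size = 12)
        # squeeze = False always returns a 2D array, so flatten() also works for a 1x1 grid
        fig, ax = plt.subplots(num_rows, num_columns, figsize = (30, num_rows * figsize_per_row), squeeze = False)
        ax = ax.flatten()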

        return fig, ax

    ###_finalize_subplot_layout Function ###
    def _finalize_subplot_layout(
        self,
        fig,
        ax,
        i: int,
        title: str = None,
        fontsize: int = 30,
    ):
        """
        Finalizes and displays a matplotlib figure by adjusting layout and removing unused subplots.

        This method is used after plotting multiple subplots to:
        - Remove any unused axes in the grid.
        - Set a central title for the entire figure.
        - Automatically adjust spacing and layout for better readability.
        - Display the resulting plot.

        Args:
            fig (matplotlib.figure.Figure): The matplotlib figure object containing the subplots.
            ax (np.ndarray of matplotlib.axes.Axes): Array of axes (flattened) for all subplots.
            i (int): Index of the last used subplot (all subplots after this will be removed).
            title (str, optional): Title to be displayed at the top of the entire figure.
            fontsize (int, optional): Font size of the overall title. Default is 30.
        """
        for j in range(i + 1, len(ax)):
            fig.delaxes(ax[j])
        
        plt.suptitle(title, fontsize = fontsize, fontweight = 'bold')
        plt.tight_layout(rect = [0, 0, 1, 0.97])
        plt.show()
    
    ###_format_single_ax Function ###
    def _format_single_ax(
        self, 
        ax,
        title: str = None,
        fontsize: int = 20,
        linewidth: float = 0.9
    ):

        """
        Applies standard formatting to a single subplot axis.

        This method configures a single axis by:
        - Setting the title with specified font size and bold style.
        - Hiding the x and y axis labels.
        - Adding dashed grid lines for both axes with configurable line width.

        Args:
            ax (matplotlib.axes.Axes): The axis to be formatted.
            title (str, optional): Title text for the axis. Defaults to None.
            fontsize (int, optional): Font size for the title. Defaults to 20.
            linewidth (float, optional): Width of the dashed grid lines. Defaults to 0.9.
        """
        ax.set_title(title, fontsize = fontsize, fontweight = 'bold')
        ax.set_xlabel(None)
        ax.set_ylabel(None)
        ax.grid(axis = 'y', which = 'major', linestyle = '--', linewidth = linewidth)
        ax.grid(axis = 'x', which = 'major', linestyle = '--', linewidth = linewidth)

    ### Numerical histograms Function ###
    def numerical_histograms(
        self, 
        num_columns: int = 3,
        figsize_per_row: int = 6,
        color: str = '#a2bffe',
        hue: str = None,
        palette: list = ['#b0ff9d', '#db5856'],
        title: str = 'Histograms of Numerical Variables',
    ):
        """
        Plots histograms with KDE (Kernel Density Estimation) for all numerical columns in the dataset.

        Optionally groups the histograms by a categorical target variable using different colors (hue).
        Useful for visualizing the distribution of numerical features and how they differ between groups.

        Args:
            num_columns (int): Number of plots per row in the subplot grid.
            figsize_per_row (int): Height of each row in inches (controls vertical spacing).
            color (str): Default color for histograms when `hue` is not specified.
            hue (str, optional): Name of the column used for grouping (e.g., 'churn_target'). Must be categorical.
            palette (list): List of colors for hue levels. Only used if `hue` is provided.
            title (str): Title of the entire figure layout.

        Raises:
            Exception: If plotting fails due to missing columns, incorrect types, or rendering errors.
        """
        try:
            # Entry checks
            numeric_cols = self.data.select_dtypes(include = 'number').columns.tolist()
            if hue and hue in numeric_cols:
                numeric_cols.remove(hue)

            # Define AX and Fig
            fig, ax = self._initializer_subplot_grid(num_columns, figsize_per_row)

            for i, column in enumerate(numeric_cols):
                sns.histplot(
                    data = self.data,
                    x = column,
                    kde = True,
                    hue = hue,
                    palette = palette if hue else None,
                    edgecolor = 'black',
                    alpha = 0.4 if hue else 0.7,
                    color = None if hue else color,
                    ax = ax[i],
                )
                # Config Ax's
                self._format_single_ax(ax[i], title = f'Histogram of variable: {column}')
                
            # Show Graphics
            self._finalize_subplot_layout(fig, ax, i, title = title)
        except Exception as e:
            print(f'[ERROR] Failed to generate numeric histograms: {str(e)}')

    ### Numerical Boxplots Function ###
    def numerical_boxplots(
        self, 
        hue: str = None, 
        num_columns: int = 3,
        figsize_per_row: int = 6,
        palette: list = ['#b0ff9d', '#db5856'],
        color: str = '#a2bffe',
        showfliers: bool = False,
        title: str = 'Boxplots of Numerical Variables',
        legend: list = []
    ):
        """
        Plots boxplots for each numerical variable in the dataset.

        Optionally groups the boxplots by a categorical hue variable (e.g., churn target), 
        allowing for comparison of distributions between groups. Helps identify outliers, 
        skewness, and variability in each feature.

        Args:
            hue (str, optional): Column name to group the boxplots (e.g., 'churn_target').
                                If None, individual boxplots are created without grouping.
            num_columns (int): Number of plots per row in the subplot grid.
            figsize_per_row (int): Height (in inches) of each row of plots.
            palette (list): Color palette to use when `hue` is provided.
            color (str): Single color to use when `hue` is not specified.
            showfliers (bool): Whether to display outlier points in the boxplots (default: False).
            title (str): Overall title for the subplot grid.
            legend (list): Custom legend labels to replace default tick labels when `hue` is present.

        Raises:
            ValueError: If the hue column is not found in the DataFrame.
            Exception: If plotting fails due to unexpected issues.
        """
        try:
            # Entry checks
            if hue and hue not in self.data.columns:
                raise ValueError(f"Column '{hue}' not in the DataFrame.")

            numeric_cols = self.data.select_dtypes(include = 'number').columns.tolist()
            if hue and hue in numeric_cols:
                numeric_cols.remove(hue)

            # Define AX and Fig
            fig, ax = self._initializer_subplot_grid(num_columns, figsize_per_row)

            for i, column in enumerate(numeric_cols):
                sns.boxplot(
                    data = self.data,
                    x = hue if hue else column,
                    y = column if hue else None,
                    hue = hue if hue else None,
                    palette = palette if hue else None,
                    color = None if hue else color,
                    showfliers = showfliers,
                    legend = False,
                    ax = ax[i]
                )

                # Config Ax's
                if len(legend) > 0:
                    ax[i].set_xticks(list(range(len(legend))))
                    ax[i].set_xticklabels(legend, fontsize = 16, fontweight = 'bold')

                self._format_single_ax(ax[i], f'Box plot of variable: {column}')
                ax[i].set_yticklabels([])
                sns.despine(ax = ax[i], top = True, right = True, left = True, bottom = True)
            
            # Show Graphics
            self._finalize_subplot_layout(fig, ax, i, title = title)
        except Exception as e: 
            print(f'[ERROR] Failed to generate numerical boxplots: {str(e)}')
    
    ### Models Performance Barplots Function ###
    def models_performance_barplots(
        self,
        models_col: str = None,
        palette = None,
        title: str = 'Models Performance Comparison',
        num_columns: int = 1,
        figsize_per_row: int = 9
    ):
        """
        Generates bar plots to compare the performance of multiple models across different metrics.

        Args:
            models_col (str, optional): Column name containing model identifiers.
            palette (list or seaborn color palette, optional): Color palette for the bar plots.
                Defaults to a 'viridis' palette if None.
            title (str, optional): Main title for the figure. Defaults to 'Models Performance Comparison'.
            num_columns (int, optional): Number of subplot columns. Defaults to 1.
            figsize_per_row (int, optional): Height of each subplot row in inches. Defaults to 9.

        Raises:
            Exception: If there is an error generating the bar plots.

        Notes:
            - The method expects `self.data` to contain one column for model names
            and one or more columns with numeric performance metrics.
            - Each subplot will represent a different metric.
        """
        try:

            # Define palette
            if palette is None:
                palette = sns.color_palette('viridis', len(self.data[models_col].unique()))
            
            # Define AX and Fig
            fig, ax = self._initializer_subplot_grid(num_columns, figsize_per_row)
            # `ax` is already flattened by _initializer_subplot_grid

            # Iterate over metrics (excluding 'Model')
            for i, column in enumerate(self.data.drop(columns = models_col).columns):

                barplot = sns.barplot(
                    data = self.data,
                    x = models_col,
                    y = column,
                    hue = models_col,
                    dodge = False,
                    edgecolor = 'white',
                    saturation = 1,
                    palette = palette,
                    ax = ax[i]
                )

                # Formatting axis
                self._format_single_ax(ax[i], title = column, fontsize = 25)
                ax[i].tick_params(axis = 'x', labelsize = 20)
                ax[i].set_yticklabels([])
                sns.set(style = 'whitegrid')

                # Add values on bars
                for v in barplot.patches:
                    barplot.annotate(
                        f'{v.get_height():.4f}',
                        (v.get_x() + v.get_width() / 2., v.get_height() / 1.06),
                        ha = 'center',
                        va = 'top',
                        xytext = (0, 0),
                        textcoords = 'offset points',
                        fontsize = 20,
                        fontweight = 'bold',
                        color = 'white'
                    )

            # Finalize plot
            self._finalize_subplot_layout(fig, ax, i, title = title)     
        
        except Exception as e:
            print(f'[ERROR] Failed to generate model performance barplots: {str(e)}.')
                
    ### Plot KDE Predictions Function ###
    def plot_kde_predictions(
        self,
        palette: list = ['#12e193', '#feb308'],
        predictions: str = None,
        labels: str = None,
        title: str = 'Prediction Probabilities'
    ):
        """
        Plots the probability distributions of predictions using Kernel Density Estimation (KDE).

        Args:
            palette (list, optional): List of colors for each class. Defaults to ['#12e193', '#feb308'].
            predictions (str, optional): Column name containing the predicted probabilities.
            labels (str, optional): Column name containing the true labels.
            title (str, optional): Title for the plot. Defaults to 'Prediction Probabilities'.

        Raises:
            Exception: If there is an error generating the KDE plot.

        Notes:
            - The method expects `self.data` to contain the prediction probabilities and true labels.
            - KDE plots are useful for visualizing class separation in probabilistic predictions.
        """
        try:

            # Creating figures and setting font size
            plt.rc('font', size = 10)
            fig, ax = plt.subplots(figsize = (12, 4))

            sns.kdeplot(
                data = self.data,
                x = predictions,
                hue = labels,
                fill = True,
                alpha = 0.4,
                bw_adjust = 1,
                palette = palette,
                linewidth = 1,
                ax = ax
            )

            # Axis and title adjustments
            ax.set_title(title, fontsize = 14)
            ax.set_xlabel(None)
            ax.set_ylabel(None)
            ax.set_yticklabels([])

            # Grid and style
            ax.grid(axis = 'y', linestyle = '--', linewidth = 0.3)
            ax.grid(axis = 'x', linestyle = '--', linewidth = 0.3)
            sns.set(style = 'whitegrid')
            sns.despine(ax = ax, top = True, right = True, left = True, bottom = False)

            # Show Graphics
            plt.tight_layout()
            plt.show()
        
        except Exception as e:
            print(f'[ERROR] Failed to generate prediction probability graph: {str(e)}.')

    ### Plot ROC and Precision Curves Function ###
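    # Static method: the function takes no `self`, so it can be called on the class or an instance
    @staticmethod
    def plot_roc_pr_curves(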
        preds, 
        labels
    ):
        """
        Plots the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve
        for a binary classification model, along with their respective AUC metrics.

        Args:
            preds (torch.Tensor): Predicted probabilities for the positive class.
            labels (torch.Tensor): True binary labels (0 or 1) as an integer tensor, as TorchMetrics expects.

        Raises:
            Exception: If there is an error generating or plotting the curves.

        Notes:
            - Computes and plots:
                * ROC curve with AUROC (Area Under ROC Curve).
                * Precision-Recall curve with AUPRC (Area Under Precision-Recall Curve).
            - Uses TorchMetrics for metric computation.
            - Useful for evaluating model performance, especially in imbalanced datasets.
        """
        try:
            # Metrics

            # ROC
            fpr, tpr, _ = BinaryROC()(preds, labels)
            auroc_value = BinaryAUROC()(preds, labels).item()

            # Precision-Recall
            precision_vals, recall_vals, _ = BinaryPrecisionRecallCurve()(preds, labels)
            auprc_value = BinaryAveragePrecision()(preds, labels).item()

            # Ax and Fig
            fig, axes = plt.subplots(1, 2, figsize = (12, 5))

            # ROC Curve
            axes[0].plot(fpr, tpr, color = '#1f77b4', lw = 2, label = f'ROC (AUROC = {auroc_value:.3f})')
            axes[0].plot([0, 1], [0, 1], color = 'gray', linestyle = '--', label = 'Random (AUROC = 0.5)')
            axes[0].set_xlabel('False Positive Rate', fontsize = 12)
            axes[0].set_ylabel('True Positive Rate', fontsize = 12)
            axes[0].set_title('ROC Curve', fontsize = 14, fontweight = 'bold')
            axes[0].legend(loc = 'lower right', fontsize = 10)

            # PR Curve
            axes[1].plot(recall_vals, precision_vals, color = 'darkorange', lw = 2, label = f'PR Curve (AUPRC = {auprc_value:.3f})')
            axes[1].set_xlabel('Recall', fontsize = 12)
            axes[1].set_ylabel('Precision', fontsize = 12)
            axes[1].set_title('Precision-Recall Curve', fontsize = 14, fontweight = 'bold')
            axes[1].legend(loc='upper right', fontsize = 10)

            # Grid
            for ax in axes:
                ax.grid(True, linestyle = '--', linewidth = 0.5)

            # Show Graphics
            plt.tight_layout()
            plt.show()

        except Exception as e:
            print(f'[ERROR] Failed to generate ROC and PR curves: {str(e)}.')
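
The grid arithmetic used by `_initializer_subplot_grid` and `_finalize_subplot_layout` can be checked without matplotlib. The sketch below uses hypothetical helper names (`grid_shape`, `unused_axes`) that are not part of the class. It reproduces the ceiling division for the row count and lists the axis indices that would be deleted when fewer plots than grid cells are drawn.

```python
def grid_shape(num_slots, num_columns):
    """Rows needed to fit num_slots plots, using ceiling division as in the class."""
    num_rows = (num_slots + num_columns - 1) // num_columns
    return num_rows, num_columns

def unused_axes(last_used_index, num_slots, num_columns):
    """Indices of the axes removed after the last plotted one."""
    num_rows, num_cols = grid_shape(num_slots, num_columns)
    return list(range(last_used_index + 1, num_rows * num_cols))

# Example: a DataFrame with 14 columns (13 numeric features plus the target),
# plotted 3 per row. The grid is sized from all 14 columns, but only the
# 13 features (indices 0..12) are drawn.
print(grid_shape(14, 3))       # (5, 3)
print(unused_axes(12, 14, 3))  # [13, 14]
```

Because the grid is sized from every column in the DataFrame, not only the plotted ones, a few trailing axes are usually created and then removed.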

### Cross Validation Classic ML

In [0]:
def cross_validation_ml(
    models,
    x_train,
    y_train,
    n_splits: int = 5,
    random_state: int = 33,
    shuffle: bool = True,
    scoring: str = 'roc_auc',  
):
    """
    Performs stratified K-fold cross-validation on a list of machine learning models.

    This function receives multiple models (e.g., pipelines), trains and evaluates them
    using Stratified K-Fold cross-validation, and prints the mean and standard deviation 
    of the chosen scoring metric for each model.

    Args:
        models (list): A list of tuples with (model_name, model_pipeline), where each 
            model_pipeline follows the scikit-learn API (i.e., implements .fit and .predict).
        x_train (array-like): Feature set used for training and validation.
        y_train (array-like): Target labels corresponding to x_train.
        n_splits (int, optional): Number of folds for cross-validation. Defaults to 5.
        random_state (int, optional): Random seed for reproducibility. Defaults to 33.
        shuffle (bool, optional): Whether to shuffle the data before splitting. Defaults to True.
        scoring (str, optional): Scoring metric to use (e.g., 'roc_auc', 'accuracy', 'f1'). 
            Defaults to 'roc_auc'.

    Prints:
        For each model:
            - Name of the model
            - Mean cross-validation score
            - Standard deviation of the score across folds

    Returns:
        None
    """
    try:
        
        # Cross Validation (scikit-learn rejects a random_state when shuffle is False)
        cv = StratifiedKFold(
            n_splits = n_splits,
            random_state = random_state if shuffle else None,
            shuffle = shuffle
        )

        # Metrics Cross validation
        results, names = [], []

        for name, pipeline in models:

            cv_results = cross_val_score(
                pipeline,
                x_train, 
                y_train,
                cv = cv,
                scoring = scoring

            )
            results.append(cv_results)
            names.append(name)

            print('#' * 30)
            print(f'🎯 {name}: ')
            print(f'🔵 Mean {scoring}: {cv_results.mean():.6f} ')
            print(f'🟣 Std {scoring}: {cv_results.std():.6f}\n')

    except Exception as e:
        print(f'[ERROR] Failed to run ml cross validation {str(e)}')
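
Stratification matters here because churners are the minority class: each fold should keep roughly the same churn share as the full training set. The sketch below is a simplified round-robin assignment written for illustration, not scikit-learn's `StratifiedKFold` implementation; `stratified_folds` is a name introduced here.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits=5, seed=33):
    """Assign sample indices to folds so each fold keeps the overall class ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices to the folds in turn
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds

# Toy labels: 20 churners (1) and 80 active customers (0)
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels)
print([sum(labels[i] for i in fold) / len(fold) for fold in folds])
# [0.2, 0.2, 0.2, 0.2, 0.2]
```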

### PyTorch

In [0]:
class PyTorch:

    ### Init ###
    def __init__(
        self, 
        name: str = 'PyTorch_object'
    ):
        self.name = name

    ### Dataset Pytorch Class ###
    class Dataset(torch.utils.data.Dataset):
        """
        Custom PyTorch Dataset class for handling tabular data with separate indices
        for categorical features, numerical features, and labels.

        This class enables flexible extraction of different feature types from
        a preprocessed dataset (e.g., NumPy array) and returns them as PyTorch tensors,
        which is useful for deep learning models that treat categorical and numerical
        features differently (e.g., when using embeddings).

        Attributes:
            data (array-like): The full dataset (e.g., NumPy array) containing all features and labels.
            cat_idx (list): A list with two elements [start, end] indicating the range of categorical columns.
            num_idx (list): A list with two elements [start, end] indicating the range of numerical columns.
            label_idx (list): A list with two elements [start, end] indicating the range of label columns.

        Methods:
            __len__(): Returns the total number of samples in the dataset.
            __getitem__(idx): Returns a tuple (categorical_data, numerical_data, labels) for a given index.
        """
        # Initializing Attributes
        def __init__(
            self, 
            dataset,
            cat_idx: list = [],
            num_idx: list = [],
            label_idx: list = [],
        ):
            try:

                # Loading data
                self.data = dataset
                self.cat_idx = cat_idx
                self.num_idx = num_idx
                self.label_idx = label_idx

            except Exception as e:
                print(f'[ERROR] Failed to load dataset: {str(e)}.')
        
        # Len function Torch
        def __len__(
            self,
        ):
            try:

                return len(self.data)  
                
            except Exception as e:
                print(f'[ERROR] Failed to len data: {str(e)}.')

        # Get item 
        def __getitem__(
            self,
            idx: int,
        ):
            try:

                # Split of categorical, numerical and label variables
                categorical_data = self.data[idx][self.cat_idx[0] : self.cat_idx[1]]
                numerical_data = self.data[idx][self.num_idx[0] : self.num_idx[1]]
                labels = self.data[idx][self.label_idx[0] :  self.label_idx[1]]

                # Transform to tensors
                categorical_data = torch.from_numpy(categorical_data.astype(np.int64))
                numerical_data = torch.from_numpy(numerical_data.astype(np.float32))
                labels = torch.from_numpy(labels.astype(np.float32))

                return categorical_data, numerical_data, labels
            
            except Exception as e:
                print(f'[ERROR] Failed to find the indices and transform the data into tensors: {str(e)}')

    ### Neural Network Class ###
    class Net(nn.Module):
        """
        Neural network for binary classification using both categorical and numerical features.

        This architecture uses embedding layers for categorical features and dense layers 
        for numerical features. Both feature types are combined and passed through a 
        deep neural network with batch normalization, dropout, and LeakyReLU activations.
        """
        # Initializing network parameters and attributes
        def __init__(
            self, 
            l1: int = 256,
            l2: int = 128,
            l3: int = 64,
            dropout_rate: float = 0.5,
            classes_per_cat: list = [2, 4, 7, 6, 4],  # Number of classes per categorical variable
            num_numerical_features: int = 13,
            prior_minoritary_class: float = 0,
            negative_slope: float = 0.01
        ):
            """
            Initializes the network architecture and parameters.

            Args:
                l1 (int): Number of units in the first dense layer for numerical features.
                l2 (int): Number of units in the first combined layer (numerical + categorical).
                l3 (int): Number of units in the second combined layer.
                dropout_rate (float): Dropout rate applied after the first combined layer.
                classes_per_cat (list): List with the number of classes per categorical feature.
                num_numerical_features (int): Number of input numerical features.
                prior_minoritary_class (float): Proportion of the positive (minority) class; used to adjust output layer bias.
                negative_slope (float): Negative slope used in the LeakyReLU activation function.
            """
            try:

                # Running __init__ of the nn.Module class
                super().__init__()
                
                # Embedding dims
                embedding_dims = [int(min(50, np.sqrt(n))) for n in  classes_per_cat]

                # Creating embeddings dynamically
                self.embeddings = nn.ModuleList(
                    [nn.Embedding(num_embeddings, emb_dim) for num_embeddings, emb_dim in zip(classes_per_cat, embedding_dims)]
                )

                # Total dimensions of the embedding layers
                self.total_embedding_dim = sum(embedding_dims)

                # Layer for numeric variables
                self.numerical_layer = nn.Linear(num_numerical_features, l1)
                self.bn_num = nn.BatchNorm1d(l1)
                
                # Combined layers (numerical branch + embeddings)
                # Layer 1
                self.combined_layer_1 = nn.Linear(self.total_embedding_dim + l1, l2)
                self.bn1 = nn.BatchNorm1d(l2)

                # Layer 2
                self.combined_layer_2 = nn.Linear(l2, l3)
                self.bn2 = nn.BatchNorm1d(l3)

                # Dropout Layer
                self.dropout_layer = nn.Dropout(dropout_rate)

                # Output Layer
                self.output_layer = nn.Linear(l3, 1)
                
                # Activation LeakyReLU
                self.leaky_relu = nn.LeakyReLU(negative_slope)

                # Prior of the positive (minority) class, used to initialize the output-layer bias
                self.prior_minoritary_class = prior_minoritary_class

                # Initialization weights
                self._init_weights()
            
            except Exception as e:
                print(f'[ERROR] Failed to load network attributes: {str(e)}')
                # Re-raise so a half-built module is never used silently
                raise
        
        # Initialization Weights
        def _init_weights(
            self,
        ):
            """
            Initializes the weights of the network layers.

            - Dense layers are initialized using Kaiming Normal initialization.
            - Embedding layers are initialized using Xavier Uniform initialization.
            - The output layer bias is set based on the prior probability of the minority class (if provided).
            """
            try:
                # Linear Layers
                for layer in [self.numerical_layer, self.combined_layer_1, self.combined_layer_2]:
                    
                    # Weights Linear Layers
                    if isinstance(layer, nn.Linear):
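                        # Pass the actual LeakyReLU slope so the Kaiming gain matches the activation
                        nn.init.kaiming_normal_(layer.weight, a = self.leaky_relu.negative_slope, mode = 'fan_out', nonlinearity = 'leaky_relu')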
                        # Bias Linear Layers
                        if layer.bias is not None:
                            nn.init.zeros_(layer.bias)
                
                # Embeddings
                for embedding in self.embeddings:
                    nn.init.xavier_uniform_(embedding.weight)
                
                # Special adjustment for output layer
                nn.init.xavier_uniform_(self.output_layer.weight)

                # Initializing the output layer bias for imbalanced data:
                # log-odds of the positive-class prior, so the initial predicted probability matches it
                if self.prior_minoritary_class > 0:
                    eps = 1e-6
                    prior_adjusted = max(min(self.prior_minoritary_class, 1 - eps), eps)
                    nn.init.constant_(
                        self.output_layer.bias, 
                        torch.log(torch.tensor(prior_adjusted / (1 - prior_adjusted), dtype = torch.float))
                    )
                else:
                    nn.init.zeros_(self.output_layer.bias)
            
            except Exception as e:
                print(f'[ERROR] Failed to perform weight initialization: {str(e)}.')

        # Forward
        def forward(
            self,
            cat_data, 
            num_data
        ):
            """
            Forward pass of the neural network.

            Args:
                cat_data (torch.Tensor): Tensor of categorical feature indices.
                    Shape: (batch_size, num_categorical_features)
                num_data (torch.Tensor): Tensor of numerical feature values.
                    Shape: (batch_size, num_numerical_features)

            Returns:
                torch.Tensor: Logits output by the network.
                    Shape: (batch_size, 1)
            """
            try:

                # Processing categorical variables with embeddings
                embedded_features = [embedding(cat_data[:, i]) for i, embedding in enumerate(self.embeddings)]
                combined_embeddings = torch.cat(embedded_features, dim = 1)

                # Processing of numerical variables
                numerical_out = self.numerical_layer(num_data)
                numerical_out = self.bn_num(numerical_out)
                numerical_out = self.leaky_relu(numerical_out)

                # Combining embeddings with numerical data
                combined = torch.cat([numerical_out, combined_embeddings], dim = 1)

                # Passage through the neural network
                # combined_layer_1
                x = self.combined_layer_1(combined)
                x = self.bn1(x)
                x = self.leaky_relu(x)
                x = self.dropout_layer(x)

                # combined_layer_2
                x = self.combined_layer_2(x)
                x = self.bn2(x)
                x = self.leaky_relu(x)

                # Logits
                logits = self.output_layer(x)

                return logits
            
            except Exception as e:
                print(f'[ERROR] Failed to execute neural network forward: {str(e)}')
                # Re-raise instead of returning None, which would fail later in the loss computation
                raise
    
    
    ### Focal Loss Class ###
    class FocalLoss(nn.Module):
        """
        Focal Loss for binary classification tasks.

        This loss function addresses class imbalance by down-weighting easy examples and focusing 
        training on hard (misclassified) examples using a modulating factor.

        Attributes:
            gamma (float): Focusing parameter that reduces the relative loss for well-classified examples.
            reduction (str): Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'.
            alpha (torch.Tensor): Class weights for positive and negative samples.
        """
        # Initializing attributes
        def __init__(
            self,
            alpha: list = [1.0, 1.0],
            gamma: float = 2,
            reduction: str = 'mean'
        ):
            """
            Initializes the FocalLoss module.

            Args:
                alpha (list): Class weighting factors [alpha_negative, alpha_positive].
                gamma (float): Focusing parameter to reduce the impact of easy examples.
                reduction (str): Specifies the reduction to apply to the output:
                    - 'none': no reduction
                    - 'mean': average of the loss
                    - 'sum': sum of the loss
            """
            try:
                # Running __init__ of the nn.Module class
                super().__init__()
                self.gamma = gamma
                self.reduction = reduction
                self.register_buffer('alpha', torch.tensor(alpha, dtype = torch.float))

            except Exception as e:
                print(f'[ERROR] Failed to load FocalLoss attributes: {str(e)}')

        # Forward Focal Loss
        def forward(
            self,
            inputs,
            targets
        ):
            """
            Computes the focal loss between the predictions and targets.

            Args:
                inputs (torch.Tensor): Raw model outputs (logits) of shape (batch_size,).
                targets (torch.Tensor): Ground truth binary labels of shape (batch_size,).

            Returns:
                torch.Tensor: The computed loss. The shape depends on the reduction method:
                    - 'none': returns loss per sample
                    - 'mean': returns scalar average loss
                    - 'sum': returns scalar summed loss
            """
            try:
                # Predicted probabilities  
                probs = torch.sigmoid(inputs)
                probs = torch.clamp(probs, min = 1e-7, max = 1-1e-7)

                # Targets
                targets = targets.float()

                # BCE Loss
                bce_loss = F.binary_cross_entropy(probs, targets, reduction = 'none')
                
                # Focusing factor (1 - p_t)^γ
                focal_factor = (1 - probs).pow(self.gamma) * targets + probs.pow(self.gamma) * (1 - targets)
                
                # Adjust the alpha weight for each class
                alpha_factor = self.alpha[1] * targets + self.alpha[0] * (1 - targets)
                
                # Loss
                loss = alpha_factor * focal_factor * bce_loss

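                # Reduction
                if self.reduction == 'mean':
                    return loss.mean()

                elif self.reduction == 'sum':
                    return loss.sum()
                
                elif self.reduction == 'none':
                    return loss

                # Unknown reduction: fail loudly instead of silently returning None
                raise ValueError(f"Invalid reduction '{self.reduction}'; expected 'none', 'mean' or 'sum'.")
            
            except Exception as e:
                print(f'[ERROR] Failed to execute forward in Focal Loss function: {str(e)}')
                raise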
    
    ### Early Stopping Class ###
    class EarlyStopping:
        
        """
        Implements early stopping to terminate training when a monitored metric stops improving.

        Attributes:
            patience (int): Number of epochs with no improvement after which training will be stopped.
            min_delta (float): Minimum change in the monitored score to qualify as an improvement.
            mode (str): One of ['min', 'max']. In 'min' mode, a lower score counts as an improvement;
                in 'max' mode, a higher score counts as an improvement.
            save_path (str, Path, or temporary-file object): Where to save the best model.
            tempfile_save (bool): If True, `save_path` is treated as a temporary-file object
                and the model is written to `save_path.name`.
            verbose (bool): If True, prints messages during training.
            best_score (float): Best score observed so far.
            counter (int): Number of epochs since the last improvement.
            early_stop (bool): Whether early stopping was triggered.
            best_epoch (int): Epoch number at which the best score was achieved.
        """
        # Initializing attributes
        def __init__(
            self,
            patience: int = 15,
            min_delta: float = 1e-4,
            mode: str = 'max',
            save_path = None,
            tempfile_save = False,
            verbose: bool = True,
        ):
            """
            Initializes the EarlyStopping object.

            Args:
                patience (int): Number of epochs to wait for improvement before stopping.
                min_delta (float): Minimum score improvement to reset the patience counter.
                mode (str): 'max' for maximizing the metric, 'min' for minimizing.
                save_path (str, Path, or temporary-file object): Where to save the best model.
                tempfile_save (bool): If True, `save_path` is treated as a temporary-file object
                    and the model is written to `save_path.name`.
                verbose (bool): If True, logs progress and stopping messages.
            """
            try:
                self.patience = patience
                self.min_delta = min_delta
                self.mode = mode
                self.save_path = save_path
                self.verbose = verbose
                self.tempfile_save = tempfile_save
                self.best_score = None
                self.counter = 0
                self.early_stop = False
                self.best_epoch = None
            except Exception as e:
                print(f'[ERROR] Failed to initialize attributes of Early Stopping class: {str(e)}.')
        
        # Call
        def __call__(
            self,
            epoch: int,
            score: float, 
            model = None,
        ):  
            """
            Evaluates whether the model has improved and saves the best model if applicable.

            Args:
                epoch (int): Current epoch number.
                score (float): The value of the monitored metric at the current epoch.
                model (torch.nn.Module, optional): The model to save if improvement is detected.
            """
            try:
                # Whether the model has improved
                improved = False

                # First score
                if self.best_score is None:
                    improved = True

                # Max Mode
                elif self.mode == 'max' and score > self.best_score + self.min_delta:
                    improved = True
                
                # Min Mode
                elif self.mode == 'min' and score < self.best_score - self.min_delta:
                    improved = True
                
                # Improved Score
                if improved:
                    
                    self.best_score = score
                    self.counter = 0
                    self.early_stop = False
                    self.best_epoch = epoch if epoch is not None else 0

                    # Saving the model to the configured path
                    if model is not None and self.save_path:

                        # Unwrap DataParallel so the saved state_dict has no 'module.' prefix
                        model_to_save = model.module if isinstance(model, nn.DataParallel) else model
                        # With tempfile_save, save_path is a temporary-file object; use its path
                        target_path = self.save_path.name if self.tempfile_save else self.save_path
                        torch.save(model_to_save.state_dict(), target_path)
                        
                        if self.verbose:
                            print(f'✅ Model Improvement (Epoch: {self.best_epoch}, Score: {self.best_score:.5f})')
                
                # No Improvement
                else:
                    # Counter
                    self.counter += 1
                    if self.verbose:
                        print(f'⏳ EarlyStopping: {self.counter}/{self.patience} no improvement (Current Score: {score:.5f})')
                    
                    # Early Stopping
                    if self.counter >= self.patience:
                        self.early_stop = True
                        if self.verbose:
                            print(f'🛑 Stopping training by early stopping (no improvement after: {self.patience} epochs.)')
                            print(f"✅ Best Model saved in: '{self.save_path}' (Epoch: {self.best_epoch}, Score: {self.best_score:.5f}).")
            
            except Exception as e:
                print(f'[ERROR] Failed to execute Early stopping: {str(e)}')

        # Reset Attributes
        def reset(
            self
        ):
            """
            Resets the internal state of the EarlyStopping instance.
            """
            try:

                self.best_score = None
                self.counter = 0
                self.early_stop = False
                self.best_epoch = 0
            
            except Exception as e:
                print(f'[ERROR] Failed to reset Early Stopping class attributes: {str(e)}.')
    
    ### PyTorch Flow Class ###
    class PyTorchFlow():

        """
        Wrapper class for managing the training pipeline of a PyTorch model,
        including data loading, architecture parameters, training configuration,
        early stopping, and evaluation with k-fold cross-validation.

        Attributes:
            trainset (Dataset): Training dataset.
            testset (Dataset): Test dataset.
            l1 (int): Number of units in the first hidden layer.
            l2 (int): Number of units in the second hidden layer.
            l3 (int): Number of units in the third hidden layer.
            dropout_rate (float): Dropout rate for regularization.
            num_workers (int): Number of subprocesses for data loading.
            batch_size (int): Number of samples per batch.
            lr (float): Learning rate for the optimizer.
            weight_decay (float): Weight decay (L2 penalty) for regularization.
            max_epochs (int): Maximum number of training epochs.
            early_stopping_p (int): Patience for early stopping.
            early_stopping_mode (str): Mode for early stopping, either 'min' or 'max'.
            save_path_model (str): File path to save the best model.
            seed (int): Random seed for reproducibility.
            k_fold (int): Number of folds for cross-validation.
            target_score (str): Metric used to determine model performance;
                one of 'accuracy', 'recall', 'roc', 'precision', or 'npv'.
        """
        # Initialize Attributes
        def __init__(
            self,
            trainset = None, 
            testset = None,
            l1: int = 256,
            l2: int = 128,
            l3: int = 64,
            dropout_rate: float = 0.5,
            num_workers: int = 2,
            batch_size: int = 128,
            lr: float = 1e-3,
            weight_decay: float = 1e-5,
            max_epochs: int = 200,
            early_stopping_p: int = 15,
            early_stopping_mode: str = 'max',
            save_path_model: str = '/best_model.pt',
            seed: int = 33,
            k_fold: int = 5,
            target_score: str = 'roc',
            
        ):
            try:

                self.trainset = trainset
                self.testset = testset
                self.l1 = l1
                self.l2 = l2
                self.l3 = l3
                self.dropout_rate = dropout_rate
                self.num_workers = num_workers
                self.batch_size = batch_size
                self.lr = lr
                self.weight_decay = weight_decay
                self.max_epochs = max_epochs
                self.early_stopping_p = early_stopping_p
                self.early_stopping_mode = early_stopping_mode
                self.save_path_model = save_path_model
                self.seed = seed
                self.k_fold = k_fold
                self.target_score = target_score

            except Exception as e:
                print(f'[ERROR] Failed to load PyTorchFlow class attributes: {str(e)}.')
        
        ### Device Function ###
        def _device(
            self,
            net,
            device: str = None,
        ):
            """
            Moves the given PyTorch model to the appropriate device (CPU or GPU).
            If multiple GPUs are available, wraps the model using `nn.DataParallel`.

            Args:
                net (nn.Module): The PyTorch model to be moved.
                device (str, optional): Device identifier (e.g., 'cpu', 'cuda:0'). 
                    If None, it will be automatically selected based on availability.

            Returns:
                Tuple[nn.Module, str]: A tuple containing the model moved to the device 
                    and the device identifier string.

            Raises:
                Exception: If the model fails to move to the specified device.
            """
            try:

                # Automatically detect device if not provided
                if device is None:
                    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
                
                # If more than one GPU is available, apply DataParallel
                if torch.cuda.device_count() > 1 and device.startswith('cuda'):
                    net = nn.DataParallel(net)
                
                # Move the model to the device
                net.to(device)

                return net, device
            except Exception as e:
                print(f'[ERROR] Failed to move network to device: {str(e)}.')
                # Re-raise as documented; returning None would break tuple unpacking in the caller
                raise

        ### Create Kfolds Function ###
        def _kfolds(
            self
        ):  
            """
            Splits the training dataset into k folds for cross-validation.

            This method creates `k_fold` random subsets (folds) of the training dataset, 
            ensuring reproducibility using a fixed random seed. The final fold may be slightly 
            larger if the dataset size is not perfectly divisible by `k_fold`.

            Returns:
                List[Subset]: A list of PyTorch `Subset` objects representing each fold.

            Raises:
                Exception: If the dataset fails to split into folds.
            """
            try:
                # Splitting folds for cross-validation
                fold_size = len(self.trainset) // self.k_fold
                fold_sizes = [fold_size] * (self.k_fold - 1) + [len(self.trainset) - fold_size * (self.k_fold - 1)]
                folds = torch.utils.data.random_split(self.trainset, fold_sizes, generator = torch.Generator().manual_seed(self.seed))
                
                return folds

            except Exception as e:
                print(f'[ERROR] Failed to create k folds of trainset: {str(e)}.')
                # Re-raise as documented instead of returning None
                raise
        
        ### OverSampling Function ###
        def _oversampling(
            self, 
            trainset,
        ):  
            """
            Applies oversampling to balance class distribution in the training dataset.

            This method calculates class weights based on the frequency of each class and creates 
            a `WeightedRandomSampler` to allow oversampling of the minority class during training. 
            It also returns the proportion of the positive class in the dataset.

            Args:
                trainset (Dataset): The training dataset, where each item is a 3-tuple 
                    whose last element is the label. The label must be convertible 
                    to `torch.int64`.

            Returns:
                Tuple[WeightedRandomSampler, float]: 
                    - A PyTorch `WeightedRandomSampler` to be used in a DataLoader.
                    - The proportion of the positive class (class 1) in the dataset.

            Raises:
                Exception: If there is an error while computing class weights or creating the sampler.
            """
            try:
                
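                # Compute class weights for imbalance handling
                eps = 1e-6
                # reshape(-1) lets scalar (0-dim) and 1-dim labels be concatenated alike
                all_class = torch.cat([torch.as_tensor(labels).reshape(-1) for _, _, labels in trainset]).to(torch.int64)
                # minlength = 2 keeps index 1 valid even if a fold has no positive samples
                class_counts = torch.bincount(all_class, minlength = 2)
                total_samples = len(all_class)
                class_weights = total_samples / (class_counts.float() + eps)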

                # Create Weighted Sampler
                sample_weights = class_weights[all_class]
                sampler = torch.utils.data.WeightedRandomSampler(
                    weights = sample_weights, 
                    num_samples = len(sample_weights), 
                    replacement = True
                )
                # Calculate minority class proportion
                prop_class_positive = class_counts[1].item() / total_samples

                return sampler, prop_class_positive
            
            except Exception as e:
                print(f'[ERROR] Failed to create sampler and calculate class distributions: {str(e)}.')
                # Re-raise as documented; returning None would break tuple unpacking in the caller
                raise
        
        ### Plot Metrics Function ###
        def _plot_metrics(
            self,
            avg_loss_t: list = None,
            avg_loss_v: list = None,
            avg_accuracy_t: list = None,
            avg_accuracy_v: list = None,
            avg_precision_t: list = None, 
            avg_precision_v: list = None,
            avg_npv_t: list = None, 
            avg_npv_v: list = None,
            avg_recall_t: list = None, 
            avg_recall_v: list = None,
            avg_auc_t: list = None, 
            avg_auc_v: list = None
        ):
            """
            Plots training and validation metrics over epochs for model convergence visualization.

            Args:
                avg_loss_t (list, optional): List of average training loss values per epoch.
                avg_loss_v (list, optional): List of average validation loss values per epoch.
                avg_accuracy_t (list, optional): List of average training accuracy values per epoch.
                avg_accuracy_v (list, optional): List of average validation accuracy values per epoch.
                avg_precision_t (list, optional): List of average training precision values per epoch.
                avg_precision_v (list, optional): List of average validation precision values per epoch.
                avg_npv_t (list, optional): List of average training negative predictive value per epoch.
                avg_npv_v (list, optional): List of average validation negative predictive value per epoch.
                avg_recall_t (list, optional): List of average training recall values per epoch.
                avg_recall_v (list, optional): List of average validation recall values per epoch.
                avg_auc_t (list, optional): List of average training AUC-ROC values per epoch.
                avg_auc_v (list, optional): List of average validation AUC-ROC values per epoch.

            Raises:
                Exception: If an error occurs during plotting.

            """
            try:
                # List of metrics
                metrics, train_metrics, val_metrics = [], [], []

                # Dynamically grouping metrics
                if avg_loss_t and avg_loss_v:
                    metrics.append('Loss')
                    train_metrics.append(avg_loss_t)
                    val_metrics.append(avg_loss_v)

                if avg_accuracy_t and avg_accuracy_v:
                    metrics.append('Accuracy')
                    train_metrics.append(avg_accuracy_t)
                    val_metrics.append(avg_accuracy_v)

                if avg_precision_t and avg_precision_v :
                    metrics.append('Precision')
                    train_metrics.append(avg_precision_t)
                    val_metrics.append(avg_precision_v)

                if avg_npv_t and avg_npv_v:
                    metrics.append('NPV')
                    train_metrics.append(avg_npv_t)
                    val_metrics.append(avg_npv_v)

                if avg_recall_t and avg_recall_v:
                    metrics.append('Recall')
                    train_metrics.append(avg_recall_t)
                    val_metrics.append(avg_recall_v)

                if avg_auc_t and avg_auc_v:
                    metrics.append('AUC-ROC')
                    train_metrics.append(avg_auc_t)
                    val_metrics.append(avg_auc_v)

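                # Total number of metrics
                num_metrics = len(train_metrics)

                # Nothing to plot if no complete training/validation pair was provided
                if num_metrics == 0:
                    return

                num_cols = 3
                num_rows = (num_metrics + num_cols - 1) // num_cols

                plt.rc('font', size = 10)
                # squeeze = False always returns a 2D array of axes, so flatten() is safe for any grid size
                fig, axes = plt.subplots(num_rows, num_cols, figsize = (6 * num_cols, 4 * num_rows), squeeze = False)
                axes = axes.flatten()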

                for i in range(num_metrics):
                    ax = axes[i]
                    ax.plot(train_metrics[i], label = 'Training')
                    ax.plot(val_metrics[i], label = 'Validation')
                    ax.set_title(f'Model Convergence - {metrics[i]}')
                    ax.set_xlabel('Epochs')
                    ax.set_ylabel(metrics[i])
                    ax.legend()
                    ax.grid(True, alpha = 0.6, linestyle = 'dotted')

                # Remove extra subplots, if any
                for j in range(num_metrics, len(axes)):
                    fig.delaxes(axes[j])

                plt.tight_layout()
                plt.show()
            
            except Exception as e:
                print(f'[ERROR] Failed to generate graphs of metrics during epochs: {str(e)}.')

        ### Confusion Matrix Function ###
        def _confusion_matrix(
            self,
            preds,
            labels,
            title: str = 'Confusion Matrix',
        ):
            """
            Plots the confusion matrix for binary classification predictions.

            Args:
                preds (Tensor or array-like): Predicted labels or probabilities from the model.
                labels (Tensor or array-like): True labels corresponding to the predictions.
                title (str, optional): Title for the confusion matrix plot. Default is 'Confusion Matrix'.

            Raises:
                Exception: If an error occurs during the plotting process.

            """
            try:

                # Confusion Matrix
                plt.rc('font', size = 10)
                fig, ax= plt.subplots(figsize = (8, 4))
                ax.grid(False)
                
                metric = BinaryConfusionMatrix()
                metric(preds, labels)
                fig_, ax_ = metric.plot(cmap = 'viridis', ax = ax)
                ax_.set_title(title)
                plt.show()
    
            except Exception as e:
                print(f'[ERROR] Failed to plot confusion matrix: {str(e)}.')

        ### Flow Cross Validation Function ###
        def CrossValidation(
            self,
        ):
            """
            Perform k-fold cross-validation training and evaluation of the PyTorch model.

            This method performs the following steps for each fold:
            - Splits the training data into training and validation sets.
            - Applies oversampling to handle class imbalance.
            - Creates data loaders for training and validation.
            - Initializes the neural network, loss function, optimizer, and learning rate scheduler.
            - Trains the model for a maximum number of epochs or until early stopping triggers.
            - Evaluates the model on validation data after training.
            - Collects and prints various metrics (Loss, Accuracy, Precision, NPV, Recall, AUC-ROC).
            - Plots training and validation metric curves and the confusion matrix.
            - Aggregates metrics across folds and prints mean and standard deviation.

            Attributes used:
                self.trainset: Dataset used for training and cross-validation splits.
                self.k_fold (int): Number of folds for cross-validation.
                self.batch_size (int): Batch size for data loaders.
                self.num_workers (int): Number of workers for data loading.
                self.l1, self.l2, self.l3 (int): Neural network layer sizes.
                self.dropout_rate (float): Dropout rate in the network.
                self.lr (float): Learning rate for optimizer.
                self.weight_decay (float): Weight decay for optimizer.
                self.max_epochs (int): Maximum number of epochs for training.
                self.early_stopping_p (int): Patience parameter for early stopping.
                self.early_stopping_mode (str): Mode ('max' or 'min') for early stopping.
                self.save_path_model (str): File path to save the best model.
                self.target_score (str): Metric used for early stopping ('roc', 'accuracy', etc.).

            Raises:
                Exception: Prints error message if any exception occurs during the cross-validation process.

            """
            try:

                folds = self._kfolds()
                # Defining list to store fold metrics
                accuracy_kfolds, recall_kfolds, auc_kfolds, loss_kfolds, precision_kfolds, npv_kfolds = [], [], [], [], [], []

                # Initialize Metrics for binary classification

                # Training
                accuracy_train = BinaryAccuracy()
                recall_train = BinaryRecall()
                auc_train = BinaryAUROC(thresholds = None)
                precision_train = BinaryPrecision()
                npv_train = BinaryNegativePredictiveValue()

                # Validation
                accuracy_val = BinaryAccuracy()
                recall_val = BinaryRecall()
                auc_val = BinaryAUROC(thresholds = None)
                precision_val = BinaryPrecision()
                npv_val = BinaryNegativePredictiveValue()

                # Score target
                if self.target_score == 'accuracy':
                    target_score = BinaryAccuracy()
                
                elif self.target_score == 'recall':
                    target_score = BinaryRecall()
                
                elif self.target_score == 'roc':
                    target_score = BinaryAUROC(thresholds = None)
                
                elif self.target_score == 'precision':
                    target_score = BinaryPrecision()

                elif self.target_score == 'npv':
                    target_score = BinaryNegativePredictiveValue()

                else:
                    raise ValueError(f"Unknown target_score '{self.target_score}'; expected 'accuracy', 'recall', 'roc', 'precision' or 'npv'.")

                for i in tqdm(range(self.k_fold), desc = '\nCross - Validation Progress', leave = False):

                    # Separating training and validation data
                    # Fold for validation
                    val_set = folds[i] 

                    # All training folds except validation fold
                    train_sets = [folds[j] for j in range(self.k_fold) if j != i]
                    train_set = torch.utils.data.ConcatDataset(train_sets)

                    sampler, prop_class_positive =  self._oversampling(trainset = train_set)
                    
                    # Train Loader
                    trainloader = torch.utils.data.DataLoader(
                        train_set, 
                        batch_size = self.batch_size, 
                        sampler = sampler, 
                        num_workers = self.num_workers,
                        drop_last = True,
                    )

                    # Val Loader
                    valloader = torch.utils.data.DataLoader(
                        val_set, 
                        batch_size = self.batch_size, 
                        shuffle = False, 
                        num_workers = self.num_workers, 
                        drop_last = False,
                    )
                    

            
                    # Loading Net
                    net = PyTorch.Net(
                        l1 = self.l1,
                        l2 = self.l2, 
                        l3 = self.l3, 
                        dropout_rate = self.dropout_rate,
                        prior_minoritary_class = prop_class_positive,  
                    )
                    # Moving the network to the device
                    net, device = self._device(net)

                    # Criterion
                    criterion = PyTorch().FocalLoss().to(device)

                    # Optimizer
                    optimizer = optim.AdamW(net.parameters(), lr = self.lr, weight_decay = self.weight_decay) 

                    # Scheduler
                    # Warmup: linear ramp from 0.01 * lr to lr over the first 10 epochs
                    warmup_scheduler = torch.optim.lr_scheduler.LinearLR(
                        optimizer,
                        start_factor = 0.01,
                        total_iters = 10,
                    )
                    # Cosine Annealing after warmup
                    cosine_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
                        optimizer,
                        T_max = self.max_epochs - 10,
                        eta_min = 1e-6,
                    )
                    # Composition: 10 warmup epochs, then cosine annealing for the remaining (max_epochs - 10) epochs
                    scheduler = torch.optim.lr_scheduler.SequentialLR(
                        optimizer, 
                        schedulers = [warmup_scheduler, cosine_scheduler],
                        milestones = [10],
                    )

                    # Silence the UserWarning emitted by the scheduler (note: this filter hides all UserWarnings)
                    warnings.filterwarnings('ignore', category = UserWarning)

                    # Info Cross-Validation
                    print('\n\nTraining fold sample set:')
                    print(f'################# [ K-Fold {i+1} ] #################')
                    
                    # Early Stopping
                    early_stopping = PyTorch().EarlyStopping(
                        patience = self.early_stopping_p, 
                        mode = self.early_stopping_mode, 
                        save_path = self.save_path_model
                    )
                    
                    # Metrics for epochs
                    avg_loss_t, avg_accuracy_t, avg_recall_t, avg_auc_t, avg_precision_t, avg_npv_t = [], [], [], [], [], []
                    avg_loss_v, avg_accuracy_v, avg_recall_v, avg_auc_v, avg_precision_v, avg_npv_v = [], [], [], [], [], []
                    # Epochs
                    for epoch in range(0, self.max_epochs):

                    # Checking the learning rate according to the epochs:
                    #print(f'\nEpoch {epoch + 1} -- Kfold: {i + 1}\n--------------------------------------------------------------')
                    #print(f"Epoch {epoch + 1} -- Current Learning Rate: {optimizer.param_groups[0]['lr']:.6f}")

                        # Metrics Training
                        train_loss = 0.0 
                        train_steps = 0 
                        accuracy_train.reset()
                        recall_train.reset()
                        auc_train.reset()
                        precision_train.reset()
                        npv_train.reset()

                        # Metrics Validation
                        val_loss = 0.0
                        val_steps = 0
                        accuracy_val.reset()
                        recall_val.reset()
                        auc_val.reset()
                        precision_val.reset()
                        npv_val.reset()

                        # Target Score
                        target_score.reset()

                        # Save preds and labels
                        preds_val, labels_val = [], []
                        
                        # Training
                        net.train()

                        for cat_input, num_input, labels in (trainloader):
                
                            # Inputs + Labels to(device)    
                            cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)
                            
                            # Zero the parameter gradients
                            optimizer.zero_grad()

                            # Forward pass
                            outputs = net(cat_input, num_input)
                            loss = criterion(outputs, labels)

                            # Backward + optimize
                            loss.backward()
                            # Optimizer
                            optimizer.step()

                            # Accumulating Loss
                            train_loss += loss.item()
                            train_steps += 1

                            # Updating metrics (sigmoid computed once per batch)
                            probs = torch.sigmoid(outputs).detach()
                            accuracy_train.update(probs, labels.int())
                            recall_train.update(probs, labels.int())
                            auc_train.update(probs, labels.int())
                            precision_train.update(probs, labels.int())
                            npv_train.update(probs, labels.int())

                        # Evaluation
                        net.eval()

                        # Disabling gradient calculations
                        with torch.no_grad():
                            # Each batch is a (cat_input, num_input, labels) tuple
                            for cat_input, num_input, labels in valloader:

                                # Inputs + Labels to(device)
                                cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)

                                # Eval net
                                outputs = net(cat_input, num_input)
                                loss = criterion(outputs, labels)
                                probs = torch.sigmoid(outputs)

                                # Accumulating Loss
                                val_loss += loss.item()
                                val_steps += 1

                                # Updating metrics
                                accuracy_val.update(probs, labels.int())
                                recall_val.update(probs, labels.int())
                                auc_val.update(probs, labels.int())
                                precision_val.update(probs, labels.int())
                                npv_val.update(probs, labels.int())

                                # Target Score
                                target_score.update(probs, labels.int())

                        # Saving metrics by epoch
                        # Train
                        avg_accuracy_t.append(accuracy_train.compute().item())
                        avg_recall_t.append(recall_train.compute().item())
                        avg_auc_t.append(auc_train.compute().item())
                        avg_precision_t.append(precision_train.compute().item())
                        avg_npv_t.append(npv_train.compute().item())
                        avg_loss_t.append(train_loss / train_steps)
                        # Validation
                        avg_accuracy_v.append(accuracy_val.compute().item())
                        avg_recall_v.append(recall_val.compute().item())
                        avg_auc_v.append(auc_val.compute().item())
                        avg_precision_v.append(precision_val.compute().item())
                        avg_npv_v.append(npv_val.compute().item())
                        avg_loss_v.append(val_loss / val_steps)
                                    
                        # Scheduler step
                        scheduler.step()

                        # Early_stopping step
                        early_stopping(score = target_score.compute().item(), model = net, epoch = epoch)
                        # Stop this fold once patience is exhausted
                        if early_stopping.early_stop:
                            break

                    print(f'>>>>>>> Finished Training K-Fold {i+1}.')

                    # Final validation of the best model.
                    # Running this after the epoch loop also covers folds that reach
                    # max_epochs without triggering early stopping.
                    # Load the best model
                    net.load_state_dict(torch.load(self.save_path_model))

                    # Metrics Validation
                    val_loss = 0.0
                    val_steps = 0
                    accuracy_val.reset()
                    recall_val.reset()
                    auc_val.reset()
                    precision_val.reset()
                    npv_val.reset()

                    # Predictions and labels for the confusion matrix
                    preds_val, labels_val = [], []

                    # Evaluation
                    net.eval()
                    # Disabling gradient calculations
                    with torch.no_grad():
                        # Each batch is a (cat_input, num_input, labels) tuple
                        for cat_input, num_input, labels in valloader:

                            # Inputs + Labels to(device)
                            cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)

                            # Eval net
                            outputs = net(cat_input, num_input)
                            loss = criterion(outputs, labels)
                            probs = torch.sigmoid(outputs)

                            # Accumulating Loss
                            val_loss += loss.item()
                            val_steps += 1

                            # Updating metrics
                            accuracy_val.update(probs, labels.int())
                            recall_val.update(probs, labels.int())
                            auc_val.update(probs, labels.int())
                            precision_val.update(probs, labels.int())
                            npv_val.update(probs, labels.int())

                            # Accumulating Predictions
                            preds_val.append(probs.detach().cpu())
                            labels_val.append(labels.detach().cpu())

                    # Concatenates all batches
                    preds_val = torch.cat(preds_val).float()
                    labels_val = torch.cat(labels_val).long()
                    
                    # Metrics out
                    print('\nTrain Metrics:') 
                    print(f'Loss: {train_loss / train_steps:.3f}')   
                    print(f'Accuracy: {accuracy_train.compute().item() * 100:> 0.1f}%') 
                    print(f'Precision: {precision_train.compute().item() * 100:> 0.1f}%')
                    print(f'NPV: {npv_train.compute().item() * 100:> 0.1f}%')
                    print(f'Recall: {recall_train.compute().item() *100:> 0.1f}%') 
                    print(f'AUC-ROC: {auc_train.compute().item() *100:> 0.1f}%') 

                    print('\nValidation Metrics:')
                    print(f'Loss: {val_loss / val_steps:.3f}')
                    print(f'Accuracy: {accuracy_val.compute().item() * 100:> 0.1f}%')
                    print(f'Precision: {precision_val.compute().item() * 100:> 0.1f}%')
                    print(f'NPV: {npv_val.compute().item() * 100:> 0.1f}%')
                    print(f'Recall: {recall_val.compute().item() * 100:> 0.1f}%')
                    print(f'AUC-ROC: {auc_val.compute().item() * 100:> 0.1f}%')
                    
                    # Graphics for Training and Validation
            
                    self._plot_metrics(
                        avg_loss_t = avg_loss_t, 
                        avg_loss_v = avg_loss_v,
                        avg_accuracy_t = avg_accuracy_t, 
                        avg_accuracy_v = avg_accuracy_v,
                        avg_precision_t = avg_precision_t, 
                        avg_precision_v = avg_precision_v,
                        avg_npv_t = avg_npv_t, 
                        avg_npv_v = avg_npv_v,
                        avg_recall_t = avg_recall_t, 
                        avg_recall_v = avg_recall_v,
                        avg_auc_t = avg_auc_t, 
                        avg_auc_v = avg_auc_v
                    )

                    self._confusion_matrix(
                        preds = preds_val,
                        labels = labels_val,
                        title = f'Confusion Matrix - K-Fold: {i + 1}'
                    )

                    # Time sleep
                    time.sleep(5)

                    # Calculating metrics per fold
                    accuracy_kfolds.append(accuracy_val.compute().item())
                    recall_kfolds.append(recall_val.compute().item())
                    auc_kfolds.append(auc_val.compute().item())
                    precision_kfolds.append(precision_val.compute().item())
                    npv_kfolds.append(npv_val.compute().item())
                    loss_kfolds.append(val_loss / val_steps)

                print('\n\n✅### Cross validation Metrics ### :')

                print(f'\n🔴 Loss: {np.mean(loss_kfolds):.3f}') 
                print(f'☑️ Standard Deviation - Loss: {np.std(loss_kfolds):.6f}')

                print(f'\n🟠 Accuracy: {(np.mean(accuracy_kfolds) * 100):.2f}%')
                print(f'☑️ Standard Deviation - Accuracy: {np.std(accuracy_kfolds):.6f}')

                print(f'\n🔵 Precision: {(np.mean(precision_kfolds) * 100):.2f}%')
                print(f'☑️ Standard Deviation - Precision: {np.std(precision_kfolds):.6f}')

                print(f'\n🔵 NPV: {(np.mean(npv_kfolds) * 100):.2f}%')
                print(f'☑️ Standard Deviation - NPV: {np.std(npv_kfolds):.6f}')

                print(f'\n⚠️ Recall: {(np.mean(recall_kfolds) * 100):.2f}%')
                print(f'☑️ Standard Deviation - Recall: {np.std(recall_kfolds):.6f}')

                print(f'\n🎯 AUC-ROC: {(np.mean(auc_kfolds) * 100):.2f}%')
                print(f'☑️ Standard Deviation - AUC-ROC: {np.std(auc_kfolds):.6f}')

            except Exception as e:
                print(f'[ERROR] Cross validation flow execution failed: {str(e)}.')

        ### Flow Hyper Tunning ###
        ### Cross Tunning Function ###
        def _cross_tunning(
            self,
            config,
        ):
            """
            Perform hyperparameter tuning with k-fold cross-validation for a PyTorch model.

            This method integrates k-fold cross-validation with oversampling, early stopping, 
            and learning rate scheduling to evaluate model performance across different 
            hyperparameter configurations. It is designed to work with Ray Tune for distributed 
            hyperparameter optimization.

            The process includes:
                - Splitting the dataset into k folds.
                - Using each fold once for validation and the remaining folds for training.
                - Applying oversampling to balance class distribution in training data.
                - Initializing the neural network architecture from the given configuration.
                - Setting up the optimizer, loss function, and composite learning rate scheduler.
                - Training with early stopping to avoid overfitting.
                - Evaluating performance metrics on the validation set.
                - Aggregating results and reporting them to Ray Tune along with a model checkpoint.

            Args:
                config (dict):
                    A dictionary containing hyperparameters for the training process. Keys include:
                        - 'batch_size' (int): Batch size for training and validation.
                        - 'l1', 'l2', 'l3' (int): Number of units in each layer of the network.
                        - 'lr' (float): Learning rate for the optimizer.
                        - 'weight_decay' (float): Weight decay (L2 regularization) for the optimizer.

            Attributes Used:
                self.trainset: The training dataset for k-fold splitting.
                self.k_fold (int): Number of folds for cross-validation.
                self.num_workers (int): Number of workers for data loading.
                self.dropout_rate (float): Dropout probability for the network.
                self.target_score (str): Metric used for early stopping ('accuracy', 'recall', 'roc', 'precision', or 'npv').
                self.max_epochs (int): Maximum number of training epochs.
                self.early_stopping_p (int): Patience value for early stopping.
                self.early_stopping_mode (str): Mode for early stopping ('max' or 'min').

            Metrics Computed per Fold:
                - Accuracy
                - Recall
                - AUC-ROC
                - Precision
                - Negative Predictive Value (NPV)
                - Loss

            Ray Tune Reports:
                - Mean loss across folds
                - Mean accuracy, precision, NPV, recall, and AUC-ROC
                - Standard deviation of AUC-ROC
                - Checkpoint containing model and optimizer states

            Raises:
                Exception: Logs an error message if any step in the tuning process fails.
            """
            try:

                folds = self._kfolds()
                # Defining list to store fold metrics
                accuracy_kfolds, recall_kfolds, auc_kfolds, loss_kfolds, precision_kfolds, npv_kfolds = [], [], [], [], [], []

                # Initialize Metrics for binary classification

                # Validation
                accuracy_val = BinaryAccuracy()
                recall_val = BinaryRecall()
                auc_val = BinaryAUROC(thresholds = None)
                precision_val = BinaryPrecision()
                npv_val = BinaryNegativePredictiveValue()

                # Score target
                if self.target_score == 'accuracy':
                    target_score = BinaryAccuracy()
                
                elif self.target_score == 'recall':
                    target_score = BinaryRecall()
                
                elif self.target_score == 'roc':
                    target_score = BinaryAUROC(thresholds = None)
                
                elif self.target_score == 'precision':
                    target_score = BinaryPrecision()

                elif self.target_score == 'npv':
                    target_score = BinaryNegativePredictiveValue()

                else:
                    raise ValueError(
                        f"Unsupported target_score '{self.target_score}'. "
                        "Expected 'accuracy', 'recall', 'roc', 'precision' or 'npv'."
                    )

                for i in range(self.k_fold):

                    # Separating training and validation data
                    # Fold for validation
                    val_set = folds[i] 

                    # All training folds except validation fold
                    train_sets = [folds[j] for j in range(self.k_fold) if j != i]
                    train_set = torch.utils.data.ConcatDataset(train_sets)

                    sampler, prop_class_positive =  self._oversampling(trainset = train_set)
                    
                    # Train Loader
                    trainloader = torch.utils.data.DataLoader(
                        train_set, 
                        batch_size = int(config['batch_size']), 
                        sampler = sampler, 
                        num_workers = self.num_workers,
                        drop_last = True,
                    )

                    # Val Loader
                    valloader = torch.utils.data.DataLoader(
                        val_set, 
                        batch_size = int(config['batch_size']), 
                        shuffle = False, 
                        num_workers = self.num_workers, 
                        drop_last = False,
                    )
                    

            
                    # Loading Net
                    net = PyTorch.Net(
                        l1 = config['l1'],
                        l2 = config['l2'], 
                        l3 = config['l3'], 
                        dropout_rate = self.dropout_rate,
                        prior_minoritary_class = prop_class_positive,  
                    )
                    # Moving the network to the device
                    net, device = self._device(net)

                    # Criterion
                    criterion = PyTorch().FocalLoss().to(device)

                    # Optimizer
                    optimizer = optim.AdamW(net.parameters(), lr = config['lr'], weight_decay = config['weight_decay']) 

                    # Scheduler
                    # Warmup: linear ramp from 0.01 * lr to lr over the first 10 epochs
                    warmup_scheduler = torch.optim.lr_scheduler.LinearLR(
                        optimizer,
                        start_factor = 0.01,
                        total_iters = 10,
                    )
                    # Cosine Annealing after warmup
                    cosine_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
                        optimizer,
                        T_max = self.max_epochs - 10,
                        eta_min = 1e-6,
                    )
                    # Composition: 10 warmup epochs, then cosine annealing for the remaining (max_epochs - 10) epochs
                    scheduler = torch.optim.lr_scheduler.SequentialLR(
                        optimizer, 
                        schedulers = [warmup_scheduler, cosine_scheduler],
                        milestones = [10],
                    )

                    # Silence the UserWarning emitted by the scheduler (note: this filter hides all UserWarnings)
                    warnings.filterwarnings('ignore', category = UserWarning)

                    # Saving the model temporarily with early stopping
                    with tempfile.NamedTemporaryFile(delete = False) as temp_model_file:  
                        
                        # Early Stopping
                        early_stopping = PyTorch().EarlyStopping(
                            patience = self.early_stopping_p, 
                            mode = self.early_stopping_mode, 
                            save_path = temp_model_file,
                            tempfile_save = True,
                            verbose = False
                        )
                        
                        # Epochs
                        for epoch in range(0, self.max_epochs):

                            # Target Score
                            target_score.reset()
                            
                            # Training
                            net.train()

                            for cat_input, num_input, labels in (trainloader):
                    
                                # Inputs + Labels to(device)    
                                cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)
                                
                                # Zero the parameter gradients
                                optimizer.zero_grad()

                                # Forward pass
                                outputs = net(cat_input, num_input)
                                loss = criterion(outputs, labels)

                                # Backward + optimize
                                loss.backward()
                                # Optimizer
                                optimizer.step()

                            # Evaluation
                            net.eval()

                            # Disabling gradient calculations
                            with torch.no_grad():
                                # Each batch is a (cat_input, num_input, labels) tuple
                                for cat_input, num_input, labels in (valloader):
                                        
                                        # Inputs + Labels to(device)
                                        cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)
                                        
                                        # Eval net
                                        outputs = net(cat_input, num_input)
                                        loss = criterion(outputs, labels)
                                        
                                        # Target Score
                                        target_score.update(torch.sigmoid(outputs), labels.int())

                            # Scheduler step
                            scheduler.step()

                            # Early_stopping step
                            early_stopping(score = target_score.compute().item(), model = net, epoch = epoch)
                            # Stop this fold once patience is exhausted
                            if early_stopping.early_stop:
                                break

                        # Final validation of the best model.
                        # Running this after the epoch loop also covers folds that reach
                        # max_epochs without triggering early stopping.
                        # Load the best model
                        net.load_state_dict(torch.load(temp_model_file.name))
                        os.remove(temp_model_file.name)

                        # Metrics Validation
                        val_loss = 0.0
                        val_steps = 0
                        accuracy_val.reset()
                        recall_val.reset()
                        auc_val.reset()
                        precision_val.reset()
                        npv_val.reset()

                        # Evaluation
                        net.eval()
                        # Disabling gradient calculations
                        with torch.no_grad():
                            # Each batch is a (cat_input, num_input, labels) tuple
                            for cat_input, num_input, labels in valloader:

                                # Inputs + Labels to(device)
                                cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)

                                # Eval net
                                outputs = net(cat_input, num_input)
                                loss = criterion(outputs, labels)
                                probs = torch.sigmoid(outputs)

                                # Accumulating Loss
                                val_loss += loss.item()
                                val_steps += 1

                                # Updating metrics
                                accuracy_val.update(probs, labels.int())
                                recall_val.update(probs, labels.int())
                                auc_val.update(probs, labels.int())
                                precision_val.update(probs, labels.int())
                                npv_val.update(probs, labels.int())

                            
                    # Calculating metrics per fold
                    accuracy_kfolds.append(accuracy_val.compute().item())
                    recall_kfolds.append(recall_val.compute().item())
                    auc_kfolds.append(auc_val.compute().item())
                    precision_kfolds.append(precision_val.compute().item())
                    npv_kfolds.append(npv_val.compute().item())
                    loss_kfolds.append(val_loss / val_steps)
                
                # Checkpoint
                checkpoint_data = {
                    'net_state_dict': net.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                }
                with tempfile.TemporaryDirectory() as checkpoint_dir:
                    data_path = Path(checkpoint_dir) / 'data.pkl'
                    with open(data_path, 'wb') as fp:
                        pickle.dump(checkpoint_data, fp)

                    # Build the checkpoint only after the file is closed, so data.pkl is fully written
                    checkpoint = Checkpoint.from_directory(checkpoint_dir)

                    # Report for Ray
                    tune.report(
                        {
                            'loss_mean': np.mean(loss_kfolds),
                            'accuracy_mean': np.mean(accuracy_kfolds),
                            'precision_mean': np.mean(precision_kfolds),
                            'npv_mean': np.mean(npv_kfolds),
                            'recall_mean': np.mean(recall_kfolds),
                            'auc_roc_mean': np.mean(auc_kfolds),
                            'auc_roc_std': np.std(auc_kfolds),
                        },
                        checkpoint = checkpoint
                    )

                print('✅Finished Training')

            except Exception as e:
                print(f'[ERROR] Cross-validation tuning flow execution failed: {str(e)}.')

        ### HyperTunning Function ###
        def HyperTunning(
            self,
            n_samples: int = 10,
        ):
            """
            Run hyperparameter search using Ray Tune and Optuna.

            This method configures and executes a hyperparameter tuning experiment for 
            the neural network model using Ray Tune's `Tuner` class. It searches across 
            a predefined parameter space, evaluating configurations with k-fold cross-validation 
            implemented in the `_cross_tunning` method.

            The tuning process uses `OptunaSearch` as the search algorithm and optimizes 
            for the highest mean AUC-ROC score across folds.

            Args:
                n_samples (int, optional):
                    Number of hyperparameter configurations to evaluate. 
                    Defaults to 10.

            Parameter Space:
                - l1 (int): Number of units in the first hidden layer. Choices: powers of 2 from 1 to 256.
                - l2 (int): Number of units in the second hidden layer. Choices: powers of 2 from 1 to 256.
                - l3 (int): Number of units in the third hidden layer. Choices: powers of 2 from 1 to 256.
                - lr (float): Learning rate for the optimizer. Log-uniform range: [1e-3, 1e-2].
                - batch_size (int): Batch size for training and validation. Choices: [128, 256, 512].
                - weight_decay (float): Weight decay (L2 regularization). Choices: [5e-4, 1e-5].

            Tuning Configuration:
                - Metric: 'auc_roc_mean'
                - Mode: 'max'
                - Search Algorithm: OptunaSearch
                - Number of Samples: n_samples
                - Experiment Name: 'Cross Hyper Tunning'

            Output:
                Prints the configuration and metrics of the best trial, including:
                    - Loss mean
                    - Accuracy mean
                    - Precision mean
                    - Negative Predictive Value (NPV) mean
                    - Recall mean
                    - AUC-ROC mean
                    - Standard deviation of AUC-ROC

            Raises:
                Exception: If the tuning process fails, an error message is printed.
            """
            try:

                tunner = Tuner(
                    self._cross_tunning,
                    param_space = {
                        'l1': tune.choice([2 ** i for i in range(9)]),
                        'l2': tune.choice([2 ** i for i in range(9)]),
                        'l3': tune.choice([2 ** i for i in range(9)]),
                        'lr': tune.loguniform(1e-3, 1e-2),
                        'batch_size': tune.choice([128, 256, 512]),
                        'weight_decay': tune.choice([5e-4, 1e-5]),
                    },

                    tune_config = TuneConfig(
                        metric = 'auc_roc_mean',
                        mode = 'max', 
                        num_samples = n_samples,
                        search_alg = OptunaSearch(),
                    ),
                    run_config = RunConfig(name = 'Cross Hyper Tunning')
                )

                results = tunner.fit()

                # Printing results
                best_result = results.get_best_result(metric= 'auc_roc_mean', mode = 'max')
                print(f'\n✅ Best trial config: \n{best_result.config}')
                print(f'\n✅ Best Trial Final Validation Metrics:')
                print(f"🔴 Loss: {best_result.metrics['loss_mean']}")
                print(f"🟠 Accuracy: {best_result.metrics['accuracy_mean']}")
                print(f"🔵 Precision: {best_result.metrics['precision_mean']}")
                print(f"🔵 NPV: {best_result.metrics['npv_mean']}")
                print(f"⚠️ Recall: {best_result.metrics['recall_mean']}")
                print(f"🎯 AUC-ROC: {best_result.metrics['auc_roc_mean']}")
                print(f"☑️ Standard Deviation AUC-ROC: {best_result.metrics['auc_roc_std']}")
            
            except Exception as e:
                print(f'[ERROR] Failed to run hypertuning tests in config space: {str(e)}.')

        ### Final Training Function ###
        def FinalTraining(
            self,
        ):
            """
            Executes the final training routine for the neural network, including 
            data preparation, training, validation, early stopping, and metric 
            reporting.

            This method:
            - Splits the dataset into training and validation sets.
            - Performs oversampling to handle class imbalance.
            - Initializes the model, loss function, optimizer, and learning rate schedulers.
            - Trains the model with early stopping based on a selected target score.
            - Evaluates the model on a validation set.
            - Plots metrics and the final confusion matrix.

            The training loop collects accuracy, precision, recall, NPV, AUC, and loss
            for both training and validation sets across epochs. Learning rate scheduling 
            uses a warm-up phase followed by cosine annealing.

            Args:
                self: 
                    An instance of the training class containing:
                        - trainset (Dataset): The dataset used for training and validation split.
                        - target_score (str): The metric used for early stopping ('accuracy', 'recall', 'roc', 'precision', or 'npv').
                        - batch_size (int): Batch size for training and validation data loaders.
                        - num_workers (int): Number of worker threads for data loading.
                        - seed (int): Random seed for reproducibility in splitting.
                        - l1, l2, l3 (int): Sizes of the hidden layers in the network.
                        - dropout_rate (float): Dropout probability for regularization.
                        - lr (float): Initial learning rate for the optimizer.
                        - weight_decay (float): Weight decay (L2 regularization) for the optimizer.
                        - max_epochs (int): Maximum number of training epochs.
                        - early_stopping_p (int): Patience for early stopping.
                        - early_stopping_mode (str): Mode for early stopping ('min' or 'max').
                        - save_path_model (str): File path to save the best model weights.

            Raises:
                Exception: If any part of the training process fails, an error message 
                will be printed containing the exception details.

            Prints:
                - Class imbalance ratio in the training set.
                - Final metrics for training and validation sets (Loss, Accuracy, Precision, 
                NPV, Recall, AUC-ROC).

            Side Effects:
                - Saves the best-performing model to disk.
                - Generates and displays plots for metrics and the final confusion matrix.
            """
            try:

                # Initialize Metrics for binary classification

                # Training
                accuracy_train = BinaryAccuracy()
                recall_train = BinaryRecall()
                auc_train = BinaryAUROC(thresholds = None)
                precision_train = BinaryPrecision()
                npv_train = BinaryNegativePredictiveValue()

                # Validation
                accuracy_val = BinaryAccuracy()
                recall_val = BinaryRecall()
                auc_val = BinaryAUROC(thresholds = None)
                precision_val = BinaryPrecision()
                npv_val = BinaryNegativePredictiveValue()

                # Score target
                if self.target_score == 'accuracy':
                    target_score = BinaryAccuracy()

                elif self.target_score == 'recall':
                    target_score = BinaryRecall()

                elif self.target_score == 'roc':
                    target_score = BinaryAUROC(thresholds = None)

                elif self.target_score == 'precision':
                    target_score = BinaryPrecision()

                elif self.target_score == 'npv':
                    target_score = BinaryNegativePredictiveValue()

                else:
                    # Fail early instead of hitting a NameError inside the epoch loop
                    raise ValueError(f"Unsupported target_score: '{self.target_score}'.")
                

                # Splitting the data into training (80%) and validation (20%) sets

                train_size = int(len(self.trainset) * 0.8)
                train_set, val_set = torch.utils.data.random_split(
                    self.trainset, [train_size, len(self.trainset) - train_size],
                    generator = torch.Generator().manual_seed(self.seed)
                )

                sampler, prop_class_positive =  self._oversampling(trainset = train_set)
                
                # Train Loader
                trainloader = torch.utils.data.DataLoader(
                    train_set, 
                    batch_size = self.batch_size, 
                    sampler = sampler, 
                    num_workers = self.num_workers,
                    drop_last = True,
                )

                # Val Loader
                valloader = torch.utils.data.DataLoader(
                    val_set, 
                    batch_size = self.batch_size, 
                    shuffle = False, 
                    num_workers = self.num_workers, 
                    drop_last = False,
                )
                

        
                # Loading Net
                net = PyTorch.Net(
                    l1 = self.l1,
                    l2 = self.l2, 
                    l3 = self.l3, 
                    dropout_rate = self.dropout_rate,
                    prior_minoritary_class = prop_class_positive,  
                )
                # Moving the network to the device
                net, device = self._device(net)

                # Criterion
                criterion = PyTorch.FocalLoss().to(device)

                # Optimizer
                optimizer = optim.AdamW(net.parameters(), lr = self.lr, weight_decay = self.weight_decay) 

                # Scheduler
                # Warmup: linear ramp from 0.01 * lr up to lr over the first 10 epochs
                warmup_scheduler = torch.optim.lr_scheduler.LinearLR(
                    optimizer,
                    start_factor = 0.01,
                    total_iters = 10,
                )
                # Cosine Annealing after warmup
                cosine_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
                    optimizer,
                    T_max = self.max_epochs - 10,
                    eta_min = 1e-6,
                )
                # Composition: 10 warmup epochs, then (max_epochs - 10) cosine-annealing epochs
                scheduler = torch.optim.lr_scheduler.SequentialLR(
                    optimizer, 
                    schedulers = [warmup_scheduler, cosine_scheduler],
                    milestones = [10],
                )

                # Suppress the UserWarning emitted by the chained schedulers (this silences all UserWarnings)
                warnings.filterwarnings('ignore', category = UserWarning)
                
                # Early Stopping
                early_stopping = PyTorch().EarlyStopping(
                    patience = self.early_stopping_p, 
                    mode = self.early_stopping_mode, 
                    save_path = self.save_path_model
                )
                
                # Metrics for epochs
                avg_loss_t, avg_accuracy_t, avg_recall_t, avg_auc_t, avg_precision_t, avg_npv_t = [], [], [], [], [], []
                avg_loss_v, avg_accuracy_v, avg_recall_v, avg_auc_v, avg_precision_v, avg_npv_v = [], [], [], [], [], []
                
                # Epochs
                for epoch in range(0, self.max_epochs):

                    # Metrics Training
                    train_loss = 0.0 
                    train_steps = 0 
                    accuracy_train.reset()
                    recall_train.reset()
                    auc_train.reset()
                    precision_train.reset()
                    npv_train.reset()

                    # Metrics Validation
                    val_loss = 0.0
                    val_steps = 0
                    accuracy_val.reset()
                    recall_val.reset()
                    auc_val.reset()
                    precision_val.reset()
                    npv_val.reset()

                    # Target Score
                    target_score.reset()

                    # Save preds and labels
                    preds_val, labels_val = [], []
                    
                    # Training
                    net.train()

                    for cat_input, num_input, labels in (trainloader):
            
                        # Inputs + Labels to(device)    
                        cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)
                        
                        # Zero the parameter gradients
                        optimizer.zero_grad()

                        # Forward Pass
                        outputs = net(cat_input, num_input)
                        loss = criterion(outputs, labels)

                        # Backward + optimize
                        loss.backward()
                        # Optimizer
                        optimizer.step()

                        # Accumulating Loss
                        train_loss += loss.item()
                        train_steps += 1

                        # Updating metrics
                        accuracy_train.update(torch.sigmoid(outputs), labels.int())
                        recall_train.update(torch.sigmoid(outputs), labels.int())
                        auc_train.update(torch.sigmoid(outputs), labels.int())
                        precision_train.update(torch.sigmoid(outputs), labels.int())
                        npv_train.update(torch.sigmoid(outputs), labels.int())

                    # Evaluation
                    net.eval()

                    # Disabling gradient calculations
                    with torch.no_grad():
                        # Each validation batch is a tuple of (cat_input, num_input, labels)
                        for cat_input, num_input, labels in (valloader):
                                
                                # Inputs + Labels to(device)
                                cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)
                                
                                # Eval net
                                outputs = net(cat_input, num_input)
                                loss = criterion(outputs, labels)
                                
                                # Accumulating Loss
                                val_loss += loss.item()
                                val_steps += 1

                                # Updating metrics
                                accuracy_val.update(torch.sigmoid(outputs), labels.int())
                                recall_val.update(torch.sigmoid(outputs), labels.int())
                                auc_val.update(torch.sigmoid(outputs), labels.int())
                                precision_val.update(torch.sigmoid(outputs), labels.int())
                                npv_val.update(torch.sigmoid(outputs), labels.int())
                                
                                # Target Score
                                target_score.update(torch.sigmoid(outputs), labels.int())

                    # Saving metrics by epoch
                    # Train
                    avg_accuracy_t.append(accuracy_train.compute().item())
                    avg_recall_t.append(recall_train.compute().item())
                    avg_auc_t.append(auc_train.compute().item())
                    avg_precision_t.append(precision_train.compute().item())
                    avg_npv_t.append(npv_train.compute().item())
                    avg_loss_t.append(train_loss / train_steps)
                    # Validation
                    avg_accuracy_v.append(accuracy_val.compute().item())
                    avg_recall_v.append(recall_val.compute().item())
                    avg_auc_v.append(auc_val.compute().item())
                    avg_precision_v.append(precision_val.compute().item())
                    avg_npv_v.append(npv_val.compute().item())
                    avg_loss_v.append(val_loss / val_steps)
                                
                    # Scheduler step
                    scheduler.step()

                    # Early_stopping step
                    early_stopping(score = target_score.compute().item(), model = net, epoch = epoch)
                    # Stopping 
                    if early_stopping.early_stop:
                        print('>>>>>>> Finished Training.')
                        break

                # Final Validation Score Model
                # Runs whether training stopped early or reached max_epochs.
                # Load the Best Model (assumes EarlyStopping saved the best weights to save_path_model)
                net.load_state_dict(torch.load(self.save_path_model))

                # Metrics Validation
                val_loss = 0.0
                val_steps = 0
                accuracy_val.reset()
                recall_val.reset()
                auc_val.reset()
                precision_val.reset()
                npv_val.reset()

                # Save preds and labels
                preds_val, labels_val = [], []

                # Evaluation
                net.eval()
                # Disabling gradient calculations
                with torch.no_grad():
                    # Each validation batch is a tuple of (cat_input, num_input, labels)
                    for cat_input, num_input, labels in valloader:

                        # Inputs + Labels to(device)
                        cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)

                        # Eval net
                        outputs = net(cat_input, num_input)
                        loss = criterion(outputs, labels)

                        # Accumulating Loss
                        val_loss += loss.item()
                        val_steps += 1

                        # Updating metrics
                        accuracy_val.update(torch.sigmoid(outputs), labels.int())
                        recall_val.update(torch.sigmoid(outputs), labels.int())
                        auc_val.update(torch.sigmoid(outputs), labels.int())
                        precision_val.update(torch.sigmoid(outputs), labels.int())
                        npv_val.update(torch.sigmoid(outputs), labels.int())

                        # Accumulating Predictions
                        preds_val.append(torch.sigmoid(outputs).detach().cpu())
                        labels_val.append(labels.detach().cpu())

                # Concatenates all batches
                preds_val = torch.cat(preds_val).float()
                labels_val = torch.cat(labels_val).long()

                # Distribution of the minority class
                print(f'\n⚖️ The Distribution of the minority class (prop_class_positive): {prop_class_positive}')

                # Metrics out
                print('\n✅ Train Metrics:') 
                print(f'Loss: {train_loss / train_steps:.3f}')   
                print(f'Accuracy: {accuracy_train.compute().item() * 100:> 0.1f}%') 
                print(f'Precision: {precision_train.compute().item() * 100:> 0.1f}%')
                print(f'NPV: {npv_train.compute().item() * 100:> 0.1f}%')
                print(f'Recall: {recall_train.compute().item() *100:> 0.1f}%') 
                print(f'AUC-ROC: {auc_train.compute().item() *100:> 0.1f}%') 

                print('\n☑️ Validation Metrics:')
                print(f'Loss: {val_loss / val_steps:.3f}')
                print(f'Accuracy: {accuracy_val.compute().item() * 100:> 0.1f}%')
                print(f'Precision: {precision_val.compute().item() * 100:> 0.1f}%')
                print(f'NPV: {npv_val.compute().item() * 100:> 0.1f}%')
                print(f'Recall: {recall_val.compute().item() * 100:> 0.1f}%')
                print(f'AUC-ROC: {auc_val.compute().item() * 100:> 0.1f}%')
                
                # Graphics for Training and Validation
                self._plot_metrics(
                    avg_loss_t = avg_loss_t, 
                    avg_loss_v = avg_loss_v,
                    avg_accuracy_t = avg_accuracy_t, 
                    avg_accuracy_v = avg_accuracy_v,
                    avg_precision_t = avg_precision_t, 
                    avg_precision_v = avg_precision_v,
                    avg_npv_t = avg_npv_t, 
                    avg_npv_v = avg_npv_v,
                    avg_recall_t = avg_recall_t, 
                    avg_recall_v = avg_recall_v,
                    avg_auc_t = avg_auc_t, 
                    avg_auc_v = avg_auc_v
                )

                self._confusion_matrix(
                    preds = preds_val,
                    labels = labels_val,
                    title = 'Confusion Matrix Final Validation'
                )

            except Exception as e:
                print(f'[ERROR] Failed to execute final training flow: {str(e)}.')
        
        ### Final Test Function ###
        def FinalTest(
            self,
            net,
        ):
            """
            Executes the final evaluation of a trained neural network model on the test dataset,
            computing multiple performance metrics and displaying a confusion matrix.

            This method:
                - Loads the trained model onto the appropriate device (CPU or GPU).
                - Iterates over the test dataset without gradient computation.
                - Calculates the loss and multiple binary classification metrics.
                - Stores predictions and ground truth labels for later analysis.
                - Displays the final evaluation results and a confusion matrix.

            Args:
                net (torch.nn.Module):
                    The trained PyTorch neural network model to be evaluated.

            Returns:
                tuple:
                    - preds_test (torch.Tensor): Tensor containing all predicted probabilities for the test dataset.
                    - labels_test (torch.Tensor): Tensor containing all ground truth labels for the test dataset.

            Raises:
                Exception:
                    If any error occurs during the evaluation process.

            Notes:
                - This method uses a `FocalLoss` criterion for evaluation.
                - Metrics computed:
                    * Binary Accuracy
                    * Binary Precision
                    * Binary Negative Predictive Value (NPV)
                    * Binary Recall
                    * Binary Area Under the ROC Curve (AUC-ROC)
                - Predictions are probability scores obtained from `torch.sigmoid(outputs)`.
                - The confusion matrix is generated via the `_confusion_matrix` method.
            """
            try:
                
                # Net Loading
                net, device = self._device(net)

                # Test Loader
                testloader = torch.utils.data.DataLoader(
                    self.testset, batch_size = 4, shuffle = False, num_workers = 2, 
                    drop_last = False,
                )
                
                # Criterion
                criterion = PyTorch.FocalLoss().to(device)

                # Evaluation
                net.eval()

                # Metrics Testing
                test_loss = 0.0 
                test_steps = 0 
                
                accuracy_test = BinaryAccuracy()
                precision_test = BinaryPrecision()
                npv_test = BinaryNegativePredictiveValue()
                recall_test = BinaryRecall()
                auc_test = BinaryAUROC(thresholds = None)
                
                # Save preds and labels
                preds_test, labels_test = [], []

                # Disabling gradient calculations
                with torch.no_grad():
                    # Each test batch is a tuple of (cat_input, num_input, labels)
                    for cat_input, num_input, labels in (testloader):
                            
                            # Inputs + Labels to(device)
                            cat_input, num_input, labels = cat_input.to(device), num_input.to(device), labels.to(device)
                            
                            # Eval net
                            outputs = net(cat_input, num_input)
                            loss = criterion(outputs, labels)
                            
                            # Updating Metrics
                            accuracy_test.update(torch.sigmoid(outputs), labels.int())
                            precision_test.update(torch.sigmoid(outputs), labels.int())
                            npv_test.update(torch.sigmoid(outputs), labels.int())
                            recall_test.update(torch.sigmoid(outputs), labels.int())
                            auc_test.update(torch.sigmoid(outputs), labels.int())

                            # Accumulating Loss
                            test_loss += loss.item()
                            test_steps += 1
                            # Accumulating Predictions
                            preds_test.append(torch.sigmoid(outputs).detach().cpu())
                            labels_test.append(labels.detach().cpu())

                # Concatenates all batches
                preds_test = torch.cat(preds_test).float()
                labels_test = torch.cat(labels_test).long()
                
                # Metrics Out
                print('\n✅ Test Metrics:')
                print(f'Loss: {test_loss / test_steps:.3f}')
                print(f'Accuracy: {accuracy_test.compute().item() * 100:> 0.1f}%')
                print(f'Precision: {precision_test.compute().item() * 100:> 0.1f}%')
                print(f'NPV: {npv_test.compute().item() * 100:> 0.1f}%')
                print(f'Recall: {recall_test.compute().item() * 100:> 0.1f}%')
                print(f'AUC-ROC: {auc_test.compute().item() * 100:> 0.1f}%')

                self._confusion_matrix(
                    preds = preds_test,
                    labels = labels_test,
                    title = 'Confusion Matrix Final Test'
                )
                # Return Preds
                return preds_test, labels_test

            except Exception as e:
                print(f'[ERROR] Failed to execute test flow: {str(e)}.')

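The learning-rate schedule configured in `FinalTraining` (a 10-epoch linear warm-up starting at 1% of the base rate, followed by cosine annealing down to `eta_min = 1e-6`) can be traced by hand with a closed-form sketch. The function name `lr_at_epoch` and the example values `base_lr = 1e-3` and `max_epochs = 100` are assumptions chosen for illustration; the sketch approximates the intended shape of the schedule and does not reproduce PyTorch's scheduler internals step for step.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, max_epochs=100, warmup_epochs=10,
                start_factor=0.01, eta_min=1e-6):
    """Closed-form learning rate: linear warm-up, then cosine annealing."""
    if epoch < warmup_epochs:
        # Linear ramp from start_factor * base_lr up towards base_lr
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_epochs
        return base_lr * factor
    # Cosine decay from base_lr down to eta_min over the remaining epochs
    t = epoch - warmup_epochs
    t_max = max_epochs - warmup_epochs
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2

# Hypothetical values: base_lr = 1e-3 and max_epochs = 100
for epoch in (0, 5, 10, 55, 100):
    print(f'epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}')
```

The rate starts at `0.01 * base_lr`, reaches `base_lr` at the end of the warm-up, and then decays smoothly, which tends to avoid large unstable updates in the first epochs.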
## Configs:

In [0]:
# Pandas:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)


## 4. Modeling
---  
In this stage, **I will test classical machine learning models** to evaluate their performance on the training data. The approach will be **intentionally simple** (no complex hyperparameter tuning or advanced preprocessing), since algorithms such as **Random Forest, Logistic Regression, and SVM** typically perform well with straightforward data transformations.  

---  

After this initial analysis, **I will prioritize the project’s main model**: a **neural network developed in PyTorch**. This architecture was chosen due to its:  

- **Ability to identify complex patterns** in non-linear data.  
- **Flexibility to adapt to class imbalances** (e.g., the observed 84%-16% class distribution).  
- **Generalization capability** (strong performance on unseen data when properly regularized).  

However, neural networks require **specific preprocessing**, particularly to address:  
1. **High-cardinality categorical variables** (e.g., unique identifiers).  
2. **Asymmetric distributions** (identified during the EDA phase).  
3. **Data noise** (such as outliers in numerical variables).  

To address these, I will apply:  
- **Embedding layers** for categorical variables.  
- **Cross-validation** to check that performance holds across different data partitions.  
- **Regularization techniques** (e.g., *dropout*) to prevent *overfitting*.  

---  

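To illustrate how cross-validation can keep the class ratio comparable across partitions, here is a minimal stratified k-fold sketch in pure Python. The helper `stratified_folds` and the toy labels (84 active customers and 16 churned customers, mirroring the 84%-16% distribution mentioned above) are assumptions for illustration and are not the splitting code used by this project.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=4, seed=33):
    """Assign each sample index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a similar share of each class
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Toy labels: 84 active customers (0) and 16 churned customers (1)
labels = [0] * 84 + [1] * 16
folds = stratified_folds(labels, k=4)

for number, fold in enumerate(folds):
    churn_rate = sum(labels[i] for i in fold) / len(fold)
    print(f'fold {number}: size = {len(fold)}, churn rate = {churn_rate:.2f}')
```

Because each class is distributed separately, every fold validates on roughly the same churn rate as the full dataset, which makes the per-fold metrics comparable.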
### Modeling Split into Two Phases 
#### **Phase 1: Classical Machine Learning Models**  
| **Objective** | **Tools** | **Metric** |  
|---------------|------------|-------------|  
| Establish a performance baseline for future comparison. | Scikit-learn (Decision Trees, SVM, Logistic Regression). | AUC-ROC. |  

#### **Phase 2: PyTorch Neural Network**  
| **Objective** | **Tools** | **Metric** |  
|---------------|------------|-------------|  
| Achieve better generalization on unseen data. | PyTorch, Torchmetrics, Ray Tune. | AUC-ROC, Recall, Accuracy. |  

---  

### Evaluation Metric Choice: AUC-ROC 
#### Why AUC-ROC?  
| **Criterion** | **Explanation** | **Business Impact** |  
|---------------|------------------|----------------------|  
| **Class imbalance** | Balances *recall* (capturing churning customers) and *specificity* (avoiding unnecessary actions on loyal customers). | Reduces operational costs by prioritizing high-risk customers. |  
| **Asymmetric cost sensitivity** | False negatives (missing churn) are more critical than false positives. | Improves retention campaign efficacy (e.g., personalized offers). |  
| **Universal interpretability** | As a rule of thumb, scores above **0.85** indicate strong predictive power for binary classification. | Simplifies communication with non-technical stakeholders. |  

---  

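One way to read AUC-ROC is as the probability that a randomly chosen churner receives a higher score than a randomly chosen active customer, which is why it does not depend on a decision threshold. The sketch below computes it by direct pairwise comparison; the function `auc_roc` and the toy scores are assumptions for illustration, and library implementations (such as the `BinaryAUROC` metric used in this notebook) rely on faster sort-based algorithms.

```python
def auc_roc(scores, labels):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError('AUC-ROC needs at least one sample of each class.')

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # churner ranked above the active customer
            elif p == n:
                correct += 0.5   # ties count as half a correct pair
    return correct / (len(positives) * len(negatives))

# Toy predicted churn probabilities and true labels (1 = churn)
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(f'AUC-ROC: {auc_roc(scores, labels):.3f}')
```

In this toy case 8 of the 9 churner/active pairs are ordered correctly; the single misordered pair is the churner scored 0.35 against the active customer scored 0.6.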

#### Loading the training and test data

In [0]:
# File location and file type -- train
file_location  = 'dbfs:/FileStore/DS_Credit-Card_Churn_Analysis/Datasets/Gold/train'
train = DataSparkPS(file_location = file_location).load_data(file_type = 'parquet')

In [0]:
# File location and file type -- test
file_location  = 'dbfs:/FileStore/DS_Credit-Card_Churn_Analysis/Datasets/Gold/test'
test = DataSparkPS(file_location = file_location).load_data(file_type = 'parquet')

In [0]:
train.head()

In [0]:
test.head()

#### Separating features and labels

In [0]:
# Train
X_train = train.drop(columns = 'churn_target')
y_train = train['churn_target'].copy()
# Test 
X_test = test.drop(columns = 'churn_target')
y_test = test['churn_target'].copy()

# Checking the dimensions of the training and test data
print(f'The Train features dataset shape: {X_train.shape}')
print(f'The Train labels dataset shape: {y_train.shape}')
print(f'\nThe Test features dataset shape: {X_test.shape}')
print(f'The Test labels dataset shape: {y_test.shape}')

#### Checking numeric columns

In [0]:
# Creating lists of numeric variables
continuos_numerical = X_train.select_dtypes('float64').columns.tolist()
discrete_numerical = X_train.select_dtypes('int32').columns.tolist()

# Concatenate into a new list so continuos_numerical is not mutated
numerical_features = continuos_numerical + discrete_numerical

# Printing the quantity and numeric columns
print(f'There are {len(numerical_features)} numerical features.')
print('\nThey are:')
numerical_features

#### Checking categorical columns

In [0]:
# Creating lists of categorical variables
categorical_features = X_train.select_dtypes('object').columns.tolist()

# Printing the quantity and categorical columns
print(f'There are {len(categorical_features)} categorical features.')
print('\nThey are:')
categorical_features

In [0]:
for feature in categorical_features:
  print(feature)
  print('-' * 40)
  print(f'There are {X_train[feature].nunique()} unique values: ')
  print(X_train[feature].value_counts(normalize = True))
  print()

#### Checking labels

In [0]:
print('churn_target')
print('-' * 40)
print(f'There are {y_train.nunique()} unique values for churn_target: ')
print(y_train.value_counts(normalize = True))

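The class proportions printed above are what the `class_weight = 'balanced'` option used by the scikit-learn baselines compensates for. Its documented heuristic is `n_samples / (n_classes * class_count)`, which the sketch below reproduces in pure Python. The helper `balanced_class_weights` and the toy target are assumptions for illustration; the toy split simply mirrors the 84%-16% distribution mentioned earlier.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy target with an 84%-16% split (0 = active, 1 = churn)
y_toy = [0] * 84 + [1] * 16
weights = balanced_class_weights(y_toy)
print(weights)
```

With this weighting, each churned customer contributes several times more to the loss than an active one, so the minority class is not ignored during fitting.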

### Training Classical Models 


#### Preprocessing

In [0]:
# Standard Scaler Features
std_scaler_features = [
    'credit_limit', 'total_amt_chng_q4_q1', 'total_ct_chng_q4_q1',
    'avg_utilization_ratio', 'customer_age', 'dependent_count',
    'months_on_book', 'total_relationship_count', 'months_inactive_12_mon',
    'contacts_count_12_mon', 'total_revolving_bal', 'total_trans_amt',
    'total_trans_ct',
]

# One Hot Encoder Features
one_hot_features = [
    'gender', 'marital_status',
]

# Ordinal Features
ordinal_features = ['education_level', 'income_category', 'card_category']
ordinal_features_order = {
    'education_level': ['Unknown', 'Uneducated', 'High School', 'College', 'Graduate', 'Post-Graduate', 'Doctorate'],
    'income_category': ['Unknown', 'Less than $40K', '$40K - $60K', '$60K - $80K', '$80K - $120K', '$120K +'],
    'card_category': ['Blue', 'Silver', 'Gold', 'Platinum']
}

# Pipeline Ordinal Features
ordinal_pipeline = Pipeline(
    steps = [
        ('ordinal_encoder', OrdinalEncoder(categories = [
            ordinal_features_order['education_level'], 
            ordinal_features_order['income_category'],
            ordinal_features_order['card_category']
        ])),
        ('standard_scaler', StandardScaler()),
    ]
)


classic_ml_preprocessor = ColumnTransformer(
    transformers = [
        ('one_hot_features', OneHotEncoder(), one_hot_features),
        ('ordinal_features', ordinal_pipeline, ordinal_features),
        ('std_scaler_features', StandardScaler(), std_scaler_features),
        
    ], 
    remainder = 'passthrough'
)


#### Checking the preprocessed training data

In [0]:
X_train_preprocessed = classic_ml_preprocessor.fit_transform(X_train)
pd.DataFrame(X_train_preprocessed, columns = classic_ml_preprocessor.get_feature_names_out(X_train.columns)).head()

Shape of the preprocessed training data

In [0]:
X_train_preprocessed.shape

#### Applying Cross Validation
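
The `cross_validation_ml` helper is defined elsewhere in the project, so its internals are not shown here. As a rough point of comparison, the sketch below runs a 5-fold AUC-ROC evaluation with scikit-learn on synthetic data. The toy dataset, the stratified splitter, and `max_iter` are assumptions for illustration. Placing the scaler inside the pipeline means it is refit on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced toy data (not the churn dataset)
X_toy, y_toy = make_classification(
    n_samples=500, n_features=8, weights=[0.84, 0.16], random_state=33
)

pipe = Pipeline([
    ('scaler', StandardScaler()),  # refit inside each fold, so validation rows never inform the scaling
    ('model', LogisticRegression(class_weight='balanced', random_state=33, max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=33)
scores = cross_val_score(pipe, X_toy, y_toy, cv=cv, scoring='roc_auc')
print(f'AUC-ROC: {scores.mean():.4f} ± {scores.std():.4f}')
```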

In [0]:
models = [

    ('Logistic Regression', 
     Pipeline([('model', LogisticRegression(
         random_state = 33, 
         class_weight = 'balanced',
    ))])),

    ('Decision Tree Classifier', 
     Pipeline([('model', DecisionTreeClassifier(
         random_state = 33, 
         class_weight = 'balanced',
         max_depth = 5,
         criterion = 'gini',
         min_impurity_decrease = 0.001,
    ))])),


    ('Random Forest Classifier', 
     Pipeline([('model', RandomForestClassifier(
        random_state = 33,
        class_weight = 'balanced',
        n_estimators = 100,
        max_depth = 10,
        min_samples_split = 2,
        
    ))])),
     
    ('KNeighbors Classifier', 
     Pipeline([('model', KNeighborsClassifier(
        n_neighbors = 5,
        weights = 'distance',
        metric = 'minkowski'
    ))])),

    ('Support Vector Machine Classifier', 
     Pipeline([('model', SVC(
        random_state = 33,
        class_weight = 'balanced',
        C = 1.0,
        kernel = 'rbf',
        gamma = 'scale',
        probability = True, 
    ))])),

    ('Gradient Boosting Classifier', 
     Pipeline([('model', GradientBoostingClassifier(
        random_state = 33,
        n_estimators = 200,
        max_depth = 3,
        learning_rate = 0.1,
        subsample = 0.7
    ))])),

]

In [0]:
cross_validation_ml(
    models = models,
    x_train = X_train_preprocessed,
    y_train = y_train,
)


### Training PyTorch Model

#### Data Preprocessing for PyTorch 
Neural networks benefit from **transformations and scaling that preserve the natural distribution of the data**, ensuring stability during training. For this, I adopted the following strategy:  

#### 1. Numerical Variables 
| **Technique** | **Applied Variables** | **Justification** |  
|-------------|--------------------------|-------------------|  
| **MinMaxScaler** | Most variables (e.g., age, number of transactions). | Rescales each variable to the [0, 1] interval while preserving the shape of its distribution. |  
| **RobustScaler** | `credit_limit`, `total_trans_amt`. | Minimizes the impact of **outliers** and wide value ranges (e.g., values between $500 and $50,000) by centering on the median and scaling by the interquartile range. |  

**Prior Checks**:  
- Distribution analysis (histograms and boxplots).  
- Initial model performance tests with RobustScaler.  
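
To make the difference between the two scalers concrete, here is a minimal sketch on a made-up column with one extreme value. The numbers are illustrative only and are not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Hypothetical credit-limit-like values with one extreme observation
values = np.array([[500.0], [1500.0], [2500.0], [3500.0], [50000.0]])

minmax_scaled = MinMaxScaler().fit_transform(values)
robust_scaled = RobustScaler().fit_transform(values)

# MinMaxScaler maps min -> 0 and max -> 1, so the outlier squeezes the other values near 0
print(minmax_scaled.ravel().round(3))
# RobustScaler centers on the median and divides by the IQR, so typical values keep their spread
print(robust_scaled.ravel().round(3))
```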

---  

#### 2. Categorical Variables  
For categorical variables, I used **embedding layers** in PyTorch because:  
- **Learning specific patterns**: Dense representations capture hierarchical relationships (e.g., "Blue" → "Gold" → "Platinum").  
- **Handling high cardinality**: Reduces dimensionality of variables.  
- **Direct network integration**: Avoids one-hot encoding, which increases sparsity and computational cost.  
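
As a rough illustration of the embedding approach, the sketch below builds one embedding table per categorical column and concatenates the results with the numeric inputs. The class name, the embedding size, and the cardinalities assumed for `gender` and `marital_status` are hypothetical; the actual `PyTorch.Net` may be organized differently.

```python
import torch
import torch.nn as nn

class TabularEmbedding(nn.Module):
    """One embedding table per categorical column, concatenated with numeric inputs."""

    def __init__(self, cardinalities, emb_dim=4):
        super().__init__()
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n_categories, emb_dim) for n_categories in cardinalities]
        )

    def forward(self, cat_x, num_x):
        # cat_x: (batch, n_cat) integer codes; num_x: (batch, n_num) floats
        embedded = [emb(cat_x[:, i]) for i, emb in enumerate(self.embeddings)]
        return torch.cat(embedded + [num_x], dim=1)

# Order: gender, marital_status (both assumed), education_level, income_category, card_category
cardinalities = [2, 4, 7, 6, 4]
layer = TabularEmbedding(cardinalities)

cat_x = torch.tensor([[1, 0, 3, 2, 0], [0, 3, 6, 5, 3]])
num_x = torch.rand(2, 13)
out = layer(cat_x, num_x)
print(out.shape)  # 5 embeddings * 4 dims + 13 numeric columns = 33 features per row
```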

---  

In [0]:
# Data collect
data_ax = train[numerical_features]
# Initialize Graphics
numerical_graphics = GraphicsData(data_ax)
# Numerical Histograms
numerical_graphics.numerical_histograms()
# Numerical Boxplots
numerical_graphics.numerical_boxplots(showfliers = True)

In [0]:
# Nominal features
nominal_features = [
    'gender', 'marital_status'
]


# Ordinal features
ordinal_features = ['education_level', 'income_category', 'card_category']
ordinal_features_order = {
    'education_level': ['Unknown', 'Uneducated', 'High School', 'College', 'Graduate', 'Post-Graduate', 'Doctorate'],
    'income_category': ['Unknown', 'Less than $40K', '$40K - $60K', '$60K - $80K', '$80K - $120K', '$120K +'],
    'card_category': ['Blue', 'Silver', 'Gold', 'Platinum']
}

# Pipeline ordinal features
ordinal_pipeline = Pipeline(
    steps = [
        ('ordinal_encoder', OrdinalEncoder(categories = [
            ordinal_features_order['education_level'], 
            ordinal_features_order['income_category'],
            ordinal_features_order['card_category']
        ])),
        
    ]
) 

# Robust features
robust_scaler_features = [
    'credit_limit', 'total_trans_amt', 
]

# Standard Scaler features
std_scaler_features = []

# MinMax features
minmax_scaler_features = [
    'months_on_book', 'customer_age', 'dependent_count',
    'total_relationship_count', 'months_inactive_12_mon',
    'contacts_count_12_mon', 'total_revolving_bal', 'avg_utilization_ratio', 
    'total_amt_chng_q4_q1', 'total_ct_chng_q4_q1', 'total_trans_ct', 
]

# Final Preprocessor
pytorch_preprocessor = ColumnTransformer(
    transformers = [
        ('nominal_features', OrdinalEncoder(), nominal_features),
        ('ordinal_features', ordinal_pipeline, ordinal_features),
        ('minmax_scaler_features', MinMaxScaler(), minmax_scaler_features),
        ('robust_scaler_features', RobustScaler(), robust_scaler_features),        
    ], 
    remainder = 'passthrough'
) 

In [0]:
preprocessed_train = pytorch_preprocessor.fit_transform(train)

In [0]:
pd.DataFrame(preprocessed_train, columns = pytorch_preprocessor.get_feature_names_out(train.columns)).head(10)

In [0]:
preprocessed_train.shape

#### PyTorch Dataset
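
`PyTorch.Dataset` is a project-specific class whose code is not shown in this notebook. The sketch below is a hypothetical stand-in that shows one way the `[start, stop)` index pairs could split the preprocessed matrix into categorical, numerical, and label tensors.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class SlicedTabularDataset(Dataset):
    """Hypothetical stand-in: splits columns by [start, stop) index pairs."""

    def __init__(self, dataset, cat_idx, num_idx, label_idx):
        data = np.asarray(dataset, dtype=np.float64)
        # Integer codes for embedding lookups, float32 for numeric inputs and labels
        self.cat = torch.from_numpy(data[:, cat_idx[0]:cat_idx[1]].astype(np.int64))
        self.num = torch.from_numpy(data[:, num_idx[0]:num_idx[1]].astype(np.float32))
        self.label = torch.from_numpy(data[:, label_idx[0]:label_idx[1]].astype(np.float32))

    def __len__(self):
        return len(self.label)

    def __getitem__(self, idx):
        return self.cat[idx], self.num[idx], self.label[idx]

# Toy matrix with the same 19-column layout: 5 encoded categoricals, 13 numerics, 1 label
toy = np.zeros((4, 19))
toy_set = SlicedTabularDataset(toy, cat_idx=[0, 5], num_idx=[5, 18], label_idx=[18, 19])
toy_cat, toy_num, toy_label = toy_set[0]
print(toy_cat.shape, toy_num.shape, toy_label.shape)
```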

In [0]:
train_set = PyTorch.Dataset(
    dataset = preprocessed_train,
    cat_idx = [0, 5],
    num_idx = [5, 18],
    label_idx = [18, 19],
)

#### Checking the transformations of data into tensors and their dimensions

In [0]:
categorical, numerical, label = train_set[0]
print(f'The Categorical tensor {categorical}')
print(f'\nThe Numerical tensor {numerical}')
print(f'\nThe Labels {label}')

In [0]:
for cat, num, label in train_set:
    print(f'Shape of categorical train: {cat.shape} {cat.dtype}')
    print(f'Shape of numerical train: {num.shape} {num.dtype}')
    print(f'Shape of label train: {label.shape} {label.dtype}')
    break

#### The Neural Network

In [0]:
example_model = net = PyTorch.Net()
example_model

#### Focal Loss
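
For reference, a minimal sketch of a binary focal loss on raw logits is given below. It is not the project's `PyTorch.FocalLoss`; the class name and the `alpha` weighting are assumptions, while `gamma = 2.0` matches the value mentioned in this notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryFocalLoss(nn.Module):
    """Binary focal loss on raw logits: FL = alpha_t * (1 - p_t)^gamma * (-log p_t)."""

    def __init__(self, gamma=2.0, alpha=0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha  # assumed class-balancing weight for the positive class

    def forward(self, logits, targets):
        # Per-sample binary cross-entropy equals -log(p_t)
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
        p_t = torch.exp(-bce)
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** self.gamma * bce).mean()

loss_fn = BinaryFocalLoss(gamma=2.0)
targets = torch.tensor([1.0, 1.0])
easy = loss_fn(torch.tensor([4.0, 4.0]), targets)    # confident and correct
hard = loss_fn(torch.tensor([-4.0, -4.0]), targets)  # confident and wrong
print(easy.item(), hard.item())
```

The easy examples contribute far less to the loss than the hard ones, which is the behavior the focal term is designed to produce.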

In [0]:
criterion = PyTorch.FocalLoss()
criterion

#### Early Stopping
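
The project's `PyTorch.EarlyStopping` is not shown here. As an illustration only, the hypothetical class below implements the usual patience rule: training stops once the monitored score has failed to improve for `patience` consecutive epochs.

```python
class SimpleEarlyStopping:
    """Stops when the monitored score has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=15, mode='max', min_delta=0.0):
        self.patience = patience
        self.mode = mode
        self.min_delta = min_delta
        self.best = None
        self.counter = 0

    def step(self, score):
        """Returns True when training should stop."""
        if self.best is None:
            improved = True
        elif self.mode == 'max':
            improved = score > self.best + self.min_delta
        else:
            improved = score < self.best - self.min_delta

        if improved:
            self.best = score
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience

stopper = SimpleEarlyStopping(patience=3, mode='max')
history = [0.90, 0.95, 0.94, 0.95, 0.93, 0.96]  # made-up validation AUC-ROC values
stopped_at = None
for epoch, auc in enumerate(history):
    if stopper.step(auc):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```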

In [0]:
early_stopping = PyTorch.EarlyStopping()
early_stopping

#### Cross Validation Function

#### Training

In [0]:
PyTorch.PyTorchFlow(
    trainset = train_set, 
    l1 = 256,
    l2 = 128,
    l3 = 64,
    dropout_rate = 0.5,
    num_workers = 2,
    batch_size = 128,
    lr = 1e-3,
    weight_decay = 1e-5,
    max_epochs = 200,
    early_stopping_p = 15,
    early_stopping_mode = 'max',
    save_path_model = '/best_model.pt',
    k_fold = 5,
    target_score = 'roc',
    seed = 33
).CrossValidation()

#### PyTorch Training Architecture
The training process was adapted to the project’s specific characteristics (limited and imbalanced data) using the following strategies:  

#### 1. Cross-Validation for Limited Data  
| **Strategy** | **Details** | **Benefit** |  
|----------------|--------------|----------------|  
| **5-Fold Cross-Validation** | - Training data split into **5 folds**.<br>- Each fold: **4 parts for training** + **1 for validation**. | Maximizes the use of **8120 available training records**, reducing overfitting and evaluation bias. |  
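
A minimal sketch of how 5 folds can be wired into PyTorch data loaders follows. It assumes a stratified split (the notebook only states 5-fold) and uses a toy `TensorDataset` in place of `train_set`.

```python
import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy data: 160 churners and 840 non-churners (16% / 84%)
y_int = np.array([1] * 160 + [0] * 840)
X = np.random.default_rng(33).normal(size=(1000, 3)).astype(np.float32)
full_set = TensorDataset(torch.from_numpy(X), torch.from_numpy(y_int.astype(np.float32)))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=33)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y_int)):
    # 4 parts for training, 1 part for validation
    train_loader = DataLoader(Subset(full_set, train_idx.tolist()), batch_size=128, shuffle=True)
    val_loader = DataLoader(Subset(full_set, val_idx.tolist()), batch_size=128)
    fold_sizes.append((len(train_loader.dataset), len(val_loader.dataset)))
    # A fresh model would be trained on train_loader and scored on val_loader here

print(fold_sizes)
```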

---  

#### 2. Handling Class Imbalance  
| **Technique** | **Implementation** | **Impact** |  
|-------------|--------------------|--------------|  
| **WeightedRandomSampler** | - Adjusts batch sampling to prioritize the minority class (*churn=1*). | Balances class contributions during training. |  
| **Bias-Weight Initialization** | - Output layer bias adjusted according to class distribution (16% *churn* vs. 84% *non-churn*). | Reduces initial bias toward the majority class. |  
| **Focal Loss** (γ=2.0) | - Down-weights easy, well-classified examples so the loss concentrates on *hard samples*, which are often minority-class cases. | Focuses on complex *churn* patterns, aiming to improve recall with limited loss of precision. |  
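
The first two rows of the table can be sketched as follows. This is an illustration under assumptions: the toy labels, the inverse-frequency weights, and the 64-unit output layer are not taken from the project code.

```python
import math
import torch
from torch.utils.data import WeightedRandomSampler

# Toy labels: 16 churners out of 100, mirroring the 16% / 84% split
labels = torch.tensor([1] * 16 + [0] * 84)

# Inverse-frequency weight per sample, so both classes are drawn about equally often
class_counts = torch.bincount(labels)
sample_weights = (1.0 / class_counts.float())[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

# Output-bias initialization: choose the bias so that sigmoid(bias) equals the prior churn rate
prior = 0.16
bias_init = math.log(prior / (1 - prior))
output_layer = torch.nn.Linear(64, 1)
with torch.no_grad():
    output_layer.bias.fill_(bias_init)

print(round(bias_init, 4), torch.sigmoid(output_layer.bias).item())
```

The sampler would be passed to a `DataLoader` through its `sampler` argument, in place of `shuffle=True`.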

---  

#### 3. Regularization and Training Stability 
| **Component** | **Configuration** | **Objective** |  
|----------------|-------------------|---------------|  
| **BatchNorm** | - Applied to intermediate dense layers.<br>- Batch size ≥ 64. | Reduces scale dependency and accelerates convergence for asymmetric variables (e.g., `total_trans_amt`). |  
| **Dropout** (p = 0.5) | - Hidden layers after BatchNorm. | Forces the network to learn redundant patterns, preventing overfitting. |  
| **AdamW** | - *Weight decay* = 1e-5 (the value passed in the training calls). | Decouples weight decay from the adaptive gradient update, improving generalization. |  
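
A minimal sketch of these components is shown below. The layer widths follow the `l1`/`l2`/`l3` values and the `lr` and `weight_decay` passed in the training call; the input width of 33 and the ReLU activation are assumptions, and the real `PyTorch.Net` may differ.

```python
import torch
import torch.nn as nn

def dense_block(in_features, out_features, dropout_rate=0.5):
    # Linear -> BatchNorm -> ReLU -> Dropout (dropout placed after BatchNorm)
    return nn.Sequential(
        nn.Linear(in_features, out_features),
        nn.BatchNorm1d(out_features),
        nn.ReLU(),
        nn.Dropout(p=dropout_rate),
    )

model = nn.Sequential(
    dense_block(33, 256),   # 33 input features is an assumed width
    dense_block(256, 128),
    dense_block(128, 64),
    nn.Linear(64, 1),       # single logit for the churn class
)

# AdamW applies weight decay directly to the weights, separately from the adaptive update
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)

model.train()
logits = model(torch.rand(128, 33))
print(logits.shape)
```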

---  

#### 4. Learning Rate Management  
| **Scheduler** | **Behavior** | **Advantage** |  
|---------------|--------------------|---------------|  
| **LinearLR** | - Increases learning rate from 1% → 100% over **100 iterations**. | Smooths the start of training, avoiding abrupt oscillations. |  
| **ReduceLROnPlateau** | - Halves the learning rate after **5 epochs without validation AUC-ROC improvement**. | Dynamically adapts the learning rate to performance plateaus. |  
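
The two schedulers can be configured roughly as follows. This is a sketch on a dummy parameter: the base learning rate of 1e-3 comes from the training call, and the constant validation score is made up to trigger the plateau rule.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, ReduceLROnPlateau

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1e-3)

# Warm-up: 1% -> 100% of the base learning rate over 100 iterations
warmup = LinearLR(optimizer, start_factor=0.01, end_factor=1.0, total_iters=100)
# Plateau: halve the learning rate once validation AUC-ROC stalls (patience of 5 epochs)
plateau = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=5)

lr_start = optimizer.param_groups[0]['lr']
for _ in range(100):
    optimizer.step()
    warmup.step()  # stepped once per iteration (batch)
lr_after_warmup = optimizer.param_groups[0]['lr']

for _ in range(7):
    plateau.step(0.95)  # stepped once per epoch; the score never improves after the first call
lr_after_plateau = optimizer.param_groups[0]['lr']

print(lr_start, lr_after_warmup, lr_after_plateau)
```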


## 5 - Evaluation  
In this stage, I analyze the comparative performance of **classical machine learning models** and the **PyTorch neural model**, focusing on technical and business criteria. I will evaluate and compare the top 3 models based on their performance during **cross-validation**.  

---

### Model Performance  
| **Model**             | **AUC-ROC** | **Standard Deviation (AUC-ROC)** | **Highlight** |  
|-----------------------|-------------|----------------------------------|---------------|  
| **Random Forest**      | 98.24%      | ±0.001276                        | Good generalization and satisfactory AUC-ROC. |  
| **Gradient Boosting**  | 99.13%      | ±0.002317                        | Best AUC-ROC, but higher variance compared to PyTorch. |  
| **Neural Network (PyTorch)** | 98.96% | **±0.001296**                    | Superior generalization and lower sensitivity to data partitions. |  

---

### Why Prioritize the PyTorch Model?  
1. **Robust Generalization**:  
   - A lower standard deviation than Gradient Boosting (±0.001296 vs. ±0.002317) indicates **consistency across data partitions**.  
   - Techniques like *dropout (p=0.5)* and *L2 regularization* reduce dependency on specific variables.  

2. **Adaptability to New Data**:  
   - The neural architecture is expected to handle **unseen data** well (e.g., new clients with unusual patterns), thanks to its ability to learn complex non-linear relationships.  

3. **Business Costs**:  
   - **False negatives** (failing to identify *churn*) cost **5-7x more** than false positives (estimated cost: R$ 2,000 per lost client vs. R$ 400 in unnecessary offers).  
   - PyTorch allows adjustments to *thresholds* and loss functions (e.g., Focal Loss) to prioritize **recall** (capturing a larger share of *churners*).  
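
To show how the cost estimates can inform the decision threshold, here is a small sketch. It assumes calibrated probabilities and treats the two quoted costs as the only costs involved; the scores and labels are made up.

```python
import numpy as np

COST_FN = 2000.0  # R$ per missed churner (estimate quoted above)
COST_FP = 400.0   # R$ per unnecessary retention offer (estimate quoted above)

def total_cost(probs, labels, threshold):
    """Business cost of classifying with a given probability threshold."""
    preds = probs >= threshold
    false_neg = np.sum(~preds & (labels == 1))
    false_pos = np.sum(preds & (labels == 0))
    return false_neg * COST_FN + false_pos * COST_FP

# Acting on a customer pays off when p * COST_FN > (1 - p) * COST_FP,
# i.e. when p > COST_FP / (COST_FP + COST_FN)
break_even = COST_FP / (COST_FP + COST_FN)

# Made-up scores for illustration only
probs = np.array([0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.80, 0.95])
labels = np.array([0, 0, 1, 0, 1, 0, 1, 1])

cost_default = total_cost(probs, labels, 0.5)
cost_break_even = total_cost(probs, labels, break_even)
print(round(break_even, 4), cost_default, cost_break_even)
```

With these estimates the break-even threshold is well below 0.5, which is why lowering the threshold favors recall.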

---

### Next Optimization Steps  
1. **Hyperparameter Tuning**:  
   - Adjust hyperparameters using cross-validation and test these adjustments with unseen test data to verify the model’s generalization.  
   - Goal: low bias and low variance.
2. **Recall Prioritization**:  
   - Adjust classification thresholds to maximize *churner* detection.  
3. **Production Validation**:  
   - Monitor business metrics:  
     - *Churn* reduction (>15%).  
     - Retention campaign ROI (>200%).  

---

### Conclusion  
The choice of PyTorch is justified not only by technical performance but also by **operational flexibility**:  

- **Scalability**: Adapts to new variables (e.g., real-time transaction data) without full retraining.  
- **Robustness**: The slight difference in AUC-ROC (98.96% vs. 99.13%) is offset by the neural model’s robustness in dynamic scenarios, aligning with the strategic goal of reducing customer acquisition costs.  

---  



In [0]:
data_scores = {
    'Model': [
        'Logistic Regression', 'Decision Tree', 'Random Forest', 
        'KNeighbors', 'Support Vector Machine', 'Gradient Boosting', 'PyTorch'
    ], 
    'AUC-ROC Validation': [0.9231, 0.94562, 0.9824, 0.8980, 0.9643, 0.9913, 0.9902], 
    'Standard Deviation': [0.012404, 0.005276, 0.001276, 0.012466,0.003249, 0.002317, 0.002035], 
}
df_scores = pd.DataFrame(data_scores)
df_scores.sort_values('AUC-ROC Validation')

In [0]:
GraphicsData(df_scores.sort_values('AUC-ROC Validation')).models_performance_barplots(models_col = 'Model')

### Hyperparameter Tuning

In [0]:
PyTorch.PyTorchFlow(
    trainset = train_set, 
    dropout_rate = 0.5,
    num_workers = 2,
    max_epochs = 200,
    early_stopping_p = 15,
    early_stopping_mode = 'max',
    k_fold = 5,
    target_score = 'roc',
    seed = 33
).HyperTunning(n_samples = 30)

#### Final Training

In [0]:
PyTorch.PyTorchFlow(
    trainset = train_set, 
    l1 = 128,
    l2 = 128,
    l3 = 64,
    dropout_rate = 0.5,
    num_workers = 2,
    batch_size = 256,
    lr = 0.009832484484875212,
    weight_decay = 1e-5,
    max_epochs = 200,
    early_stopping_p = 15,
    early_stopping_mode = 'max',
    save_path_model = '/main_final_model.pt',
    target_score = 'roc',
    seed = 33
).FinalTraining()

#### Final Testing

In [0]:
# Loading Net
net = PyTorch.Net(l1 = 128, l2 = 128, l3 = 64, prior_minoritary_class = 0.16071428571428573)
net.load_state_dict(torch.load('/main_final_model.pt'))

In [0]:
# Saving the model
torch.save(net.state_dict(), '/main_final_model.pth')
print('Saved PyTorch model state to /main_final_model.pth')

In [0]:
# Preprocessing Test data
preprocessed_test = pytorch_preprocessor.transform(test)

In [0]:
# Saving the preprocessing pipeline
with open('/pytorch_preprocessor.pkl', 'wb') as fp:
    pickle.dump(pytorch_preprocessor, fp)

In [0]:
# PyTorch Dataset
test_set = PyTorch.Dataset(
    dataset = preprocessed_test,
    cat_idx = [0, 5],
    num_idx = [5, 18],
    label_idx = [18, 19],
)

In [0]:
categorical, numerical, label = test_set[0]
print(f'The Categorical tensor {categorical}')
print(f'\nThe Numerical tensor {numerical}')
print(f'\nThe Labels {label}')

In [0]:
preds, labels = PyTorch.PyTorchFlow(
    testset = test_set, 
).FinalTest(net = net)

In [0]:
metric_collection = MetricCollection([
    BinaryAUROC(), 
    BinaryAccuracy(),
    BinaryRecall(), 
    BinaryF1Score(),
    BinaryPrecision(),
])
metric_collection(preds, labels)
plt.rc('font', size = 12)
fig, ax= plt.subplots(figsize = (10, 4))
fig_ax_ = metric_collection.plot(together=True, ax = ax)


In [0]:
list_preds = preds.flatten().tolist()
list_labels = labels.flatten().tolist()
data_preds = {
    'Predictions': list_preds,
    'Labels': list_labels,
}
df_preds = pd.DataFrame(data_preds)
df_preds.head()

In [0]:
# Random sample of the predictions DataFrame built in the previous cell
df_preds.sample(10)


In [0]:
# Organizing data and renaming labels (copy so df_preds keeps its numeric labels)
data_ax = df_preds.copy()
data_ax['Labels'] = data_ax['Labels'].map({0: 'No Churn', 1: 'Churn'})

In [0]:
GraphicsData(data_ax).plot_kde_predictions(
    predictions = 'Predictions',
    labels = 'Labels',
    title = 'Prediction Probabilities - Churn and Non-Churn'
)

In [0]:
GraphicsData.plot_roc_pr_curves(preds = preds, labels = labels)

### Auxiliary Models

#### Precision Model

In [0]:
PyTorch.PyTorchFlow(
    trainset = train_set, 
    l1 = 128,
    l2 = 128,
    l3 = 64,
    dropout_rate = 0.5,
    num_workers = 2,
    batch_size = 256,
    lr = 0.009832484484875212,
    weight_decay = 1e-5,
    max_epochs = 200,
    early_stopping_p = 15,
    early_stopping_mode = 'max',
    save_path_model = '/precision_final_model.pt',  # assumed file name; avoids overwriting the main model checkpoint
    target_score = 'precision',
    seed = 33
).FinalTraining()

#### NPV Model

In [0]:
PyTorch.PyTorchFlow(
    trainset = train_set, 
    l1 = 128,
    l2 = 128,
    l3 = 64,
    dropout_rate = 0.5,
    num_workers = 2,
    batch_size = 256,
    lr = 0.009832484484875212,
    weight_decay = 1e-5,
    max_epochs = 200,
    early_stopping_p = 15,
    early_stopping_mode = 'max',
    save_path_model = '/npv_final_model.pt',  # assumed file name; avoids overwriting the main model checkpoint
    target_score = 'npv',
    seed = 33
).FinalTraining()

## 6 - Deployment  

- In this step, I develop a classifier informed by the statistics and analyses from the EDA. It returns the customer's churn probability together with behavior indicators, helping the user decide how to retain that customer.
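
The indicator cut-offs used later in the classifier are hard-coded. Assuming they correspond to churner-group means like the ones computed in the cells below, a sketch of deriving them programmatically could look like this. The toy frame and the helper name `flag_indicators` are hypothetical.

```python
import pandas as pd

# Toy frame standing in for `train`; values are invented
toy_train = pd.DataFrame({
    'churn_target': [1, 1, 0, 0, 0],
    'avg_utilization_ratio': [0.10, 0.20, 0.30, 0.25, 0.35],
    'total_trans_ct': [40, 50, 70, 65, 75],
})
indicator_cols = ['avg_utilization_ratio', 'total_trans_ct']

# Mean of each indicator among churners, used as a reference level
churner_means = toy_train.groupby('churn_target')[indicator_cols].mean().loc[1]

def flag_indicators(client_row, reference):
    """Returns {indicator: True} when the client is at or below the churner mean."""
    return {col: bool(client_row[col] <= reference[col]) for col in reference.index}

client = pd.Series({'avg_utilization_ratio': 0.12, 'total_trans_ct': 60})
flags = flag_indicators(client, churner_means)
print(churner_means.to_dict(), flags)
```

Computing the reference levels from the training data keeps them in sync if the data changes.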

In [0]:
train.head()

#### Avg Utilization Mean

In [0]:
train.groupby('churn_target')['avg_utilization_ratio'].mean()

#### Total Revolving Bal Mean

In [0]:
train.groupby('churn_target')['total_revolving_bal'].mean().round()

#### Total Relationship Mean

In [0]:
train.groupby('churn_target')['total_relationship_count'].mean().round()

#### Contacts Count 12 Months Mean

In [0]:
train.groupby('churn_target')['contacts_count_12_mon'].mean().round()


#### Total Transaction Amount Mean

In [0]:
train.groupby('churn_target')['total_trans_amt'].mean().round()


#### Total Transaction Count Mean

In [0]:
train.groupby('churn_target')['total_trans_ct'].mean().round()


In [0]:
test.head(11)

In [0]:
client_to_classify = test.drop(columns = 'churn_target')[10:11]
client_to_classify

## -- Final Classifier --


In [0]:
def final_classifier(client_to_classify, device = 'cpu'):
    """Print the churn probability and retention indicators for a single-row client DataFrame."""

    # Secure copy of data (avoids altering the original)
    client_data = client_to_classify.copy()

    # Add target column just for pipeline compatibility (will not be used)
    if 'churn_target' not in client_data.columns:
        client_data['churn_target'] = 0  # dummy numeric value so the transformed array keeps a numeric dtype

    # Loading preprocessing pipeline
    with open('/pytorch_preprocessor.pkl', 'rb') as fp:
        preprocessor_loaded = pickle.load(fp)

    # Preprocessing
    data_preprocessed = preprocessor_loaded.transform(client_data)

    # Separation of categorical and numerical data
    cat_input = torch.from_numpy(data_preprocessed[:, 0:5].astype(np.int64)).to(device)
    num_input = torch.from_numpy(data_preprocessed[:, 5:18].astype(np.float32)).to(device)

    # Loading the trained network
    net = PyTorch.Net(l1 = 128, l2 = 128, l3 = 64, prior_minoritary_class = 0.16071428571428573).to(device)
    net.load_state_dict(torch.load('/main_final_model.pt', map_location = device))
    net.eval()

    # Prediction
    with torch.no_grad():
        pred = net(cat_input, num_input)
        prob = torch.sigmoid(pred).item()

    # Decision threshold
    threshold = 0.5
    binary_pred = int(prob >= threshold)

    # Classification Output
    if binary_pred == 1:
        print('\nThis customer has been classified as a:\n⛔ Potential Churner.')
    else:
        print('\nThis customer has been classified as a:\n✅ Non-Churner.')

    print(f'📊 Estimated probability of churning: {prob * 100:.2f}%.')

    print('\n🎯 We have some indicators for this customer that we can work on to prevent churn:')

    # Interpretable indicators and analyses
    avg_utilization_ratio = client_data['avg_utilization_ratio'].item()
    print(f'\nAverage credit card usage in the last 12 months: {avg_utilization_ratio * 100:.2f}%')
    if avg_utilization_ratio <= 0.162929:
        print('🔴 This customer has very low credit card usage. Recommend increasing above 16.29%.')
    else:
        print('🟢 Good credit card usage (GREATER THAN 16.29%).')

    total_revolving_bal = client_data['total_revolving_bal'].item()
    print(f'\nRevolving balance: {total_revolving_bal}')
    if total_revolving_bal <= 684.0:
        print('🔴 Low revolving balance. Recommend increasing above 684.')
    else:
        print('🟢 Good revolving balance (GREATER THAN 684).')

    total_relationship_count = client_data['total_relationship_count'].item()
    print(f'\nTotal products/services: {total_relationship_count}')
    if total_relationship_count <= 3:
        print('🔴 Fewer than 4 products/services. Recommend cross-sell to increase engagement.')
    else:
        print('🟢 Customer has more than 3 services. Positive indicator.')

    total_trans_amt = client_data['total_trans_amt'].item()
    print(f'\nTotal transaction amount (12 months): {total_trans_amt}')
    if total_trans_amt <= 3116.0:
        print('🔴 Low transaction volume. Recommend incentive campaigns.')
    else:
        print('🟢 Healthy transaction amount (GREATER THAN 3116).')

    total_trans_ct = client_data['total_trans_ct'].item()
    print(f'\nNumber of transactions (12 months): {total_trans_ct}')
    if total_trans_ct <= 45:
        print('🔴 Low activity. Recommend engaging offers to increase usage.')
    else:
        print('🟢 Active customer (GREATER THAN 45 transactions).')

In [0]:
final_classifier(client_to_classify)