# Phishing URL Tree-Based Model Experiments

This notebook explores various tree-based models using the Kaggle phishing URL dataset.

For the tree-based models, we will be experimenting with:

1. Random Forest
2. XGBoost
3. LightGBM
4. CatBoost

## Setup and Imports

In [1]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Scikit-learn
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix,
                             classification_report)
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import hstack

# Tree-based models
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Import ModelSaver
import sys
import os
sys.path.append(os.path.abspath('.'))
from save_model import ModelSaver

# Display settings
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

In [2]:
# Configuration
SAVE_MODELS = True
SEED = 42
np.random.seed(SEED)

# Check for Google Drive (if running in Colab)
use_drive = False
try:
    from google.colab import drive
    drive.mount('/content/drive')
    use_drive = True
    drive_root = '/content/drive/MyDrive/fraud-grp-proj/'
except ImportError:
    pass

In [3]:
# Load train and test datasets
train_df = pd.read_csv('dataset/train.csv')
test_df = pd.read_csv('dataset/test.csv')

train_w_features_df = pd.read_csv('dataset/df_train_feature_engineered.csv')
test_w_features_df = pd.read_csv('dataset/df_test_feature_engineered.csv')

print(f"Train shape: {train_df.shape}")
print(f"Test shape: {test_df.shape}")

print(f"Train with features shape: {train_w_features_df.shape}")
print(f"Test with features shape: {test_w_features_df.shape}")

# Prepare text data for TF-IDF
X_text = train_df['url'].values
X_text_test = test_df['url'].values

Train shape: (9143, 2)
Test shape: (2286, 2)
Train with features shape: (9143, 78)
Test with features shape: (2286, 78)


Because tree-based models are largely insensitive to feature scaling and tolerate redundant features, we use the full feature set here, including both the original and the transformed features, unlike our approach for the linear and neural network models.
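
One way to see why keeping both original and transformed features is harmless for trees: a split depends only on the ordering of a feature's values, so a monotonic transform such as a logarithm leads to the same partition of the samples. The sketch below illustrates this with a single Gini-based split in plain Python. The feature values, labels, and helper names are hypothetical and are not taken from the dataset or from any library.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split_left_indices(x, y):
    """Return the sample indices sent left by the best single Gini split on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_left = float("inf"), None
    for k in range(1, len(order)):
        # No threshold can separate two equal neighbouring values
        if x[order[k - 1]] == x[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(x)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left

# Hypothetical raw feature (e.g. a URL length) and binary labels
url_length = [12, 85, 30, 140, 22, 97, 18, 64]
is_phish = [0, 1, 0, 1, 0, 1, 0, 1]
log_length = [math.log(v) for v in url_length]  # monotonic transform

left_raw = best_split_left_indices(url_length, is_phish)
left_log = best_split_left_indices(log_length, is_phish)
print(left_raw == left_log)  # True: both versions split the samples identically
```

The threshold value itself differs between the raw and the log-transformed feature, but the resulting groups of samples do not, which is why the transformed copy adds little for a tree while doing no harm.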

In [4]:
# Prepare X and y for numeric features
non_text_cols = train_w_features_df.select_dtypes(exclude=[object]).columns.tolist()
if 'target' in non_text_cols:
    non_text_cols.remove('target')

X_numeric = train_w_features_df[non_text_cols].values
y = train_w_features_df['target'].values

X_numeric_test = test_w_features_df[non_text_cols].values
if 'target' in test_w_features_df.columns:
    y_test = test_w_features_df['target'].values
else:
    y_test = np.zeros(len(test_w_features_df))

print(f"Numeric features used: {len(non_text_cols)}")

# Prepare TF-IDF + SVD features
tfidf_vectorizer = TfidfVectorizer(max_features=5000, analyzer='char', ngram_range=(3, 5))
X_text_tfidf = tfidf_vectorizer.fit_transform(X_text)
X_text_tfidf_test = tfidf_vectorizer.transform(X_text_test)

svd = TruncatedSVD(n_components=100, random_state=SEED)
X_text_svd = svd.fit_transform(X_text_tfidf)
X_text_svd_test = svd.transform(X_text_tfidf_test)

print(f"TF-IDF + SVD features shape: {X_text_svd.shape}")

# Prepare combined features (TF-IDF + SVD + Numeric)
X_combined = np.hstack([X_text_svd, X_numeric])
X_combined_test = np.hstack([X_text_svd_test, X_numeric_test])

print(f"Combined features shape: {X_combined.shape}")

Numeric features used: 72
TF-IDF + SVD features shape: (9143, 100)
Combined features shape: (9143, 172)
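
To make the shapes above easier to follow, here is a minimal sketch of the same TF-IDF, SVD, and stacking steps on a handful of made-up URLs. The URLs, the two toy numeric features, and the small `n_components=3` are assumptions chosen for illustration; only the `TfidfVectorizer`, `TruncatedSVD`, and `np.hstack` calls mirror the cell above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy URLs (not from the dataset)
urls = [
    "http://example.com/login",
    "http://secure-paypa1.example.net/verify",
    "https://example.org/docs/index.html",
    "http://192.0.2.10/update/account",
    "https://example.com/about",
    "http://example.net/free-gift/claim",
]

def toy_numeric_features(url_list):
    """Two illustrative numeric features: URL length and digit count."""
    return np.array([[len(u), sum(c.isdigit() for c in u)] for u in url_list])

# Character n-gram TF-IDF, fitted on the training URLs only
tfidf = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
X_tfidf = tfidf.fit_transform(urls)          # sparse, one column per n-gram

# Truncated SVD compresses the sparse matrix into a few dense components
svd = TruncatedSVD(n_components=3, random_state=42)
X_svd = svd.fit_transform(X_tfidf)           # dense, shape (6, 3)

# SVD components first, then numeric columns (same order as X_combined)
X_combined = np.hstack([X_svd, toy_numeric_features(urls)])
print(X_tfidf.shape, X_svd.shape, X_combined.shape)

# Unseen URLs reuse the fitted vectorizer and SVD via transform()
new_urls = ["http://example.com/reset-password"]
X_new = np.hstack([svd.transform(tfidf.transform(new_urls)),
                   toy_numeric_features(new_urls)])
print(X_new.shape)
```

Fitting the vectorizer and the SVD on the training URLs and only calling `transform` on new ones keeps test information out of the learned representation, which is the same pattern the notebook follows for `X_text_test`.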


## Training Models

Now let's move on to training the models. We use the `ModelSaver` utility to standardize how metrics and models are stored for later evaluation.

Since we found that combined features worked best for linear models, we will focus on combined features (TF-IDF + Numeric) for the tree-based models as well. To help with performance, we first reduce the dimensionality of the TF-IDF features with truncated SVD (from up to 5,000 character n-gram columns down to 100 components) and then combine the result with the numeric features.

Ultimately, we will be experimenting with:
1. Numeric features only
2. Combined features (TF-IDF + SVD + Numeric)

In [5]:
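def run_tree_experiment(model_class, model_name, model_params, experiment_name,
                        X_train, y_train, X_test, feature_names,
                        vectorizer_name, vectorizer_params, save_model=True, **kwargs):
    """Train `model_class` with 5-fold stratified CV and optionally save each fold.

    Extra keyword arguments are forwarded to `ModelSaver.finalize_experiment`.
    Returns the model fitted on the last fold only.
    """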
    print(f"\n=== Running Experiment: {experiment_name} ({model_name}) ===")
    print(f"Saving Model: {save_model}")

    saver = None
    if save_model:
        if use_drive:
            base_path = drive_root + "experiments"
        else:
            base_path = "experiments"
        saver = ModelSaver(base_path=base_path)
        saver.start_experiment(
            experiment_name=experiment_name,
            model_type=model_name,
            vectorizer=vectorizer_name,
            vectorizer_params=vectorizer_params,
            model_params=model_params,
            n_folds=5,
            save_format="pickle"
        )

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    fold_test_preds = []
    
    for fold, (train_idx, val_idx) in enumerate(skf.split(X_train, y_train), start=1):
        print(f"\n--- Fold {fold}/5 ---")
        
        X_train_fold, X_val_fold = X_train[train_idx], X_train[val_idx]
        y_train_fold, y_val_fold = y_train[train_idx], y_train[val_idx]

        # Initialize and train model
        model = model_class(**model_params)
        model.fit(X_train_fold, y_train_fold)

        # Validation predictions
        val_probs = model.predict_proba(X_val_fold)[:, 1]
        val_preds = (val_probs > 0.5).astype(int)

        # Calculate metrics
        tn, fp, fn, tp = confusion_matrix(y_val_fold, val_preds).ravel()
        
        metrics = {
            'fold': fold,
            'accuracy': accuracy_score(y_val_fold, val_preds),
            'precision': precision_score(y_val_fold, val_preds, zero_division=0),
            'recall': recall_score(y_val_fold, val_preds, zero_division=0),
            'f1': f1_score(y_val_fold, val_preds, zero_division=0),
            'roc_auc': roc_auc_score(y_val_fold, val_probs),
            'TP': int(tp),
            'FP': int(fp),
            'TN': int(tn),
            'FN': int(fn),
            'train_size': len(train_idx),
            'val_size': len(val_idx)
        }
        
        print(f"Fold {fold} Val AUC: {metrics['roc_auc']:.4f}")

        # Test predictions
        test_probs = model.predict_proba(X_test)[:, 1]
        fold_test_preds.append(test_probs)

        if save_model and saver:
            saver.add_fold(
                fold_model=model,
                fold_metric=metrics,
                test_predictions=test_probs,
                feature_names=feature_names
            )

    if save_model and saver:
        saver.finalize_experiment(**kwargs)
        print(f"Experiment saved to {saver._exp_dir}")

    return model
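
Each fold's `metrics` dictionary stores the raw confusion-matrix counts alongside the scikit-learn scores. As a rough sketch of how the two relate, the helper below recomputes accuracy, precision, recall, and F1 from the four counts, returning 0.0 when a denominator is zero (mirroring `zero_division=0`). The function name and the example counts are hypothetical.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Recompute threshold-based metrics from confusion-matrix counts."""
    def safe_div(numerator, denominator):
        # Mirror zero_division=0: an undefined ratio is reported as 0.0
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)   # share of predicted phishing that is phishing
    recall = safe_div(tp, tp + fn)      # share of actual phishing that was caught
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for one validation fold
fold_scores = metrics_from_counts(tp=80, fp=20, tn=880, fn=20)
print(fold_scores)

# Edge case: no positive predictions and no true positives
empty_scores = metrics_from_counts(tp=0, fp=0, tn=10, fn=5)
print(empty_scores)
```

These four metrics depend on the 0.5 threshold applied to `val_probs`, whereas ROC AUC is computed from the probabilities directly and does not depend on a threshold.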

### 1. Random Forest

#### 1.1. Numeric Features

In [6]:
# Default params
rf_params = {
    'random_state': SEED,
    'verbose': 0
}

run_tree_experiment(
    RandomForestClassifier, 
    "RandomForest", 
    rf_params, 
    "exp_2_random_forest_numeric", 
    X_train=X_numeric, 
    y_train=y, 
    X_test=X_numeric_test, 
    feature_names=non_text_cols, 
    vectorizer_name="NumericFeatures", 
    vectorizer_params={}, 
    save_model=SAVE_MODELS
)


=== Running Experiment: exp_2_random_forest_numeric (RandomForest) ===
Saving Model: True
Experiment 'exp_2_random_forest_numeric' initialized at: experiments/exp_2_random_forest_numeric
Mode: Incremental saving (5 folds)

--- Fold 1/5 ---
Fold 1 Val AUC: 0.9739
  Fold 1/5 saved | ROC AUC: 0.9739

--- Fold 2/5 ---
Fold 2 Val AUC: 0.9716
  Fold 2/5 saved | ROC AUC: 0.9716

--- Fold 3/5 ---
Fold 3 Val AUC: 0.9705
  Fold 3/5 saved | ROC AUC: 0.9705

--- Fold 4/5 ---
Fold 4 Val AUC: 0.9700
  Fold 4/5 saved | ROC AUC: 0.9700

--- Fold 5/5 ---
Fold 5 Val AUC: 0.9707
  Fold 5/5 saved | ROC AUC: 0.9707

Finalizing experiment...
  Predictions saved to experiments/exp_2_random_forest_numeric/exp_

| Parameter | Value |
|---|---|
| n_estimators | 100 |
| criterion | 'gini' |
| max_depth | |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 'sqrt' |
| max_leaf_nodes | |
| min_impurity_decrease | 0.0 |
| bootstrap | True |


#### 1.2. Combined Features (TF-IDF + SVD + Numeric)

In [17]:
# Feature names to track
# Build feature names in the exact order used in X_combined: SVD components first, then numeric columns
n_svd = X_text_svd.shape[1]
# sanity check
assert X_combined.shape[1] == n_svd + len(non_text_cols), "Feature count mismatch between SVD + numeric and X_combined"
combined_feature_names = [f"svd_{i}" for i in range(n_svd)] + non_text_cols

run_tree_experiment(
    RandomForestClassifier, 
    "RandomForest", 
    rf_params, 
    "exp_2_random_forest_all", 
    X_train=X_combined, 
    y_train=y, 
    X_test=X_combined_test, 
    feature_names=combined_feature_names, 
    vectorizer_name="CombinedFeatures",
    vectorizer_params={'tfidf': {'max_features': 5000, 'analyzer': 'char', 'ngram_range': (3, 5)}, 'svd': {'n_components': 100}}, 
    save_model=SAVE_MODELS
)


=== Running Experiment: exp_2_random_forest_all (RandomForest) ===
Saving Model: True
Experiment 'exp_2_random_forest_all' initialized at: experiments/exp_2_random_forest_all
Mode: Incremental saving (5 folds)

--- Fold 1/5 ---
Fold 1 Val AUC: 0.9819
  Fold 1/5 saved | ROC AUC: 0.9819

--- Fold 2/5 ---
Fold 2 Val AUC: 0.9809
  Fold 2/5 saved | ROC AUC: 0.9809

--- Fold 3/5 ---
Fold 3 Val AUC: 0.9790
  Fold 3/5 saved | ROC AUC: 0.9790

--- Fold 4/5 ---
Fold 4 Val AUC: 0.9841
  Fold 4/5 saved | ROC AUC: 0.9841

--- Fold 5/5 ---
Fold 5 Val AUC: 0.9796
  Fold 5/5 saved | ROC AUC: 0.9796

Finalizing experiment...
  Predictions saved to experiments/exp_2_random_forest_all/exp_2_random_forest_all_prediction.csv

✓ Experiment 'exp_2_random_forest_all' finalized!
  Location: experiments/exp_2_random_forest_all
  Folds completed: 5
  Best fold: 4 (ROC AUC: 0.9841)
  Average ROC AUC: 0.9811 ± 0.0018
Experiment saved to experiments/exp_2_random_forest_all
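
The summary line can be reproduced from the five fold scores printed above. The sketch below assumes the reported spread is the population standard deviation (the default of `np.std`). This is inferred from the printed numbers, not from the `ModelSaver` source.

```python
from statistics import mean, pstdev

# Fold ROC AUC values as printed for exp_2_random_forest_all
fold_auc = [0.9819, 0.9809, 0.9790, 0.9841, 0.9796]

avg = mean(fold_auc)
spread = pstdev(fold_auc)  # population standard deviation (divides by n, not n - 1)

print(f"Average ROC AUC: {avg:.4f} ± {spread:.4f}")
# Average ROC AUC: 0.9811 ± 0.0018
```

With only five folds, the sample standard deviation (`statistics.stdev`) would be noticeably larger, at about 0.0020 here, so it is worth knowing which convention a reported "±" uses when comparing models.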


| Parameter | Value |
|---|---|
| n_estimators | 100 |
| criterion | 'gini' |
| max_depth | |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 'sqrt' |
| max_leaf_nodes | |
| min_impurity_decrease | 0.0 |
| bootstrap | True |


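Since the combined features outperformed the numeric-only features for our baseline random forest (average ROC AUC of 0.9811 versus about 0.971 across the five folds), we will use the combined features for the rest of the tree-based models as well.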

### 2. XGBoost

In [18]:
# Default XGBoost params
xgb_params = {
    'random_state': SEED,
    'verbosity': 0
}

run_tree_experiment(
    XGBClassifier, 
    "XGBoost", 
    xgb_params, 
    "exp_2_xgboost_all", 
    X_train=X_combined, 
    y_train=y, 
    X_test=X_combined_test, 
    feature_names=combined_feature_names,  # SVD components followed by the numeric columns
    vectorizer_name="CombinedFeatures",
    vectorizer_params={'tfidf': {'max_features': 5000, 'analyzer': 'char', 'ngram_range': (3, 5)}, 'svd': {'n_components': 100}}, 
    save_model=SAVE_MODELS
)


=== Running Experiment: exp_2_xgboost_all (XGBoost) ===
Saving Model: True
Experiment 'exp_2_xgboost_all' initialized at: experiments/exp_2_xgboost_all
Mode: Incremental saving (5 folds)

--- Fold 1/5 ---
Fold 1 Val AUC: 0.9862
  Fold 1/5 saved | ROC AUC: 0.9862

--- Fold 2/5 ---
Fold 2 Val AUC: 0.9824
  Fold 2/5 saved | ROC AUC: 0.9824

--- Fold 3/5 ---
Fold 3 Val AUC: 0.9820
  Fold 3/5 saved | ROC AUC: 0.9820

--- Fold 4/5 ---
Fold 4 Val AUC: 0.9875
  Fold 4/5 saved | ROC AUC: 0.9875

--- Fold 5/5 ---
Fold 5 Val AUC: 0.9808
  Fold 5/5 saved | ROC AUC: 0.9808

Finalizing experiment...
  Predictions saved to experiments/exp_2_xgboost_all/exp_2_xgboost_all_prediction.csv

✓ Experiment 'exp_2_xgboost_all' finalized!
  Location: experiments/exp_2_xgboost_all
  Folds completed: 5
  Best fold: 4 (ROC AUC: 0.9875)
  Average ROC AUC: 0.9838 ± 0.0026
Experiment saved to experiments/exp_2_xgboost_all


| Parameter | Value |
|---|---|
| objective | 'binary:logistic' |
| base_score | |
| booster | |
| callbacks | |
| colsample_bylevel | |
| colsample_bynode | |
| colsample_bytree | |
| device | |
| early_stopping_rounds | |
| enable_categorical | False |


### 3. LightGBM

In [19]:
# Default LightGBM params
lgbm_params = {
    'random_state': SEED,
    'verbose': -1
}

run_tree_experiment(
    LGBMClassifier, 
    "LightGBM", 
    lgbm_params, 
    "exp_2_lgbm_all", 
    X_train=X_combined, 
    y_train=y, 
    X_test=X_combined_test, 
    feature_names=combined_feature_names,  # SVD components followed by the numeric columns
    vectorizer_name="CombinedFeatures",
    vectorizer_params={'tfidf': {'max_features': 5000, 'analyzer': 'char', 'ngram_range': (3, 5)}, 'svd': {'n_components': 100}}, 
    save_model=SAVE_MODELS
)


=== Running Experiment: exp_2_lgbm_all (LightGBM) ===
Saving Model: True
Experiment 'exp_2_lgbm_all' initialized at: experiments/exp_2_lgbm_all
Mode: Incremental saving (5 folds)

--- Fold 1/5 ---
Fold 1 Val AUC: 0.9854
  Fold 1/5 saved | ROC AUC: 0.9854

--- Fold 2/5 ---
Fold 2 Val AUC: 0.9843
  Fold 2/5 saved | ROC AUC: 0.9843

--- Fold 3/5 ---
Fold 3 Val AUC: 0.9830
  Fold 3/5 saved | ROC AUC: 0.9830

--- Fold 4/5 ---
Fold 4 Val AUC: 0.9864
  Fold 4/5 saved | ROC AUC: 0.9864

--- Fold 5/5 ---
Fold 5 Val AUC: 0.9825
  Fold 5/5 saved | ROC AUC: 0.9825

Finalizing experiment...
  Predictions saved to experiments/exp_2_lgbm_all/exp_2_lgbm_all_prediction.csv

✓ Experiment 'exp_2_lgbm_all' finalized!
  Location: experiments/exp_2_lgbm_all
  Folds completed: 5
  Best fold: 4 (ROC AUC: 0.9864)
  Average ROC AUC: 0.9843 ± 0.0015
Experiment saved to experiments/exp_2_lgbm_all


| Parameter | Value |
|---|---|
| boosting_type | 'gbdt' |
| num_leaves | 31 |
| max_depth | -1 |
| learning_rate | 0.1 |
| n_estimators | 100 |
| subsample_for_bin | 200000 |
| objective | |
| class_weight | |
| min_split_gain | 0.0 |
| min_child_weight | 0.001 |


### 4. CatBoost

In [20]:
# Default CatBoost params
catboost_params = {
    'random_state': SEED,
    'verbose': 0
}

# Feature names to track
# Build feature names in the exact order used in X_combined: SVD components first, then numeric columns
n_svd = X_text_svd.shape[1]
# sanity check
assert X_combined.shape[1] == n_svd + len(non_text_cols), "Feature count mismatch between SVD + numeric and X_combined"
combined_feature_names = [f"svd_{i}" for i in range(n_svd)] + non_text_cols

run_tree_experiment(
    CatBoostClassifier, 
    "CatBoost", 
    catboost_params, 
    "exp_2_catboost_all", 
    X_train=X_combined, 
    y_train=y, 
    X_test=X_combined_test, 
    feature_names=combined_feature_names,  # built above: SVD components, then numeric columns
    vectorizer_name="CombinedFeatures",
    vectorizer_params={'tfidf': {'max_features': 5000, 'analyzer': 'char', 'ngram_range': (3, 5)}, 'svd': {'n_components': 100}}, 
    save_model=SAVE_MODELS
)


=== Running Experiment: exp_2_catboost_all (CatBoost) ===
Saving Model: True
Experiment 'exp_2_catboost_all' initialized at: experiments/exp_2_catboost_all
Mode: Incremental saving (5 folds)

--- Fold 1/5 ---
Fold 1 Val AUC: 0.9867
  Fold 1/5 saved | ROC AUC: 0.9867

--- Fold 2/5 ---
Fold 2 Val AUC: 0.9840
  Fold 2/5 saved | ROC AUC: 0.9840

--- Fold 3/5 ---
Fold 3 Val AUC: 0.9846
  Fold 3/5 saved | ROC AUC: 0.9846

--- Fold 4/5 ---
Fold 4 Val AUC: 0.9890
  Fold 4/5 saved | ROC AUC: 0.9890

--- Fold 5/5 ---
Fold 5 Val AUC: 0.9830
  Fold 5/5 saved | ROC AUC: 0.9830

Finalizing experiment...
  Predictions saved to experiments/exp_2_catboost_all/exp_2_catboost_all_prediction.csv

✓ Experiment 'exp_2_catboost_all' finalized!
  Location: experiments/exp_2_catboost_all
  Folds completed: 5
  Best fold: 4 (ROC AUC: 0.9890)
  Average ROC AUC: 0.9855 ± 0.0021
Experiment saved to experiments/exp_2_catboost_all


<catboost.core.CatBoostClassifier at 0x1461eade0>

## Optuna Hyperparameter Tuning

Now we can use Optuna to tune the hyperparameters of the best-performing tree-based model so far, CatBoost (average ROC AUC of 0.9855), on the combined features.

In [11]:
import optuna
from optuna.samplers import TPESampler

print(f"Optuna version: {optuna.__version__}")

Optuna version: 4.6.0


In [14]:
def objective(trial):
    params = {
        'iterations': trial.suggest_int('iterations', 100, 1000),
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 0.1, log=True),
        'depth': trial.suggest_int('depth', 4, 10),
        'l2_leaf_reg': trial.suggest_float('l2_leaf_reg', 1e-8, 100.0, log=True),
        'bootstrap_type': trial.suggest_categorical('bootstrap_type', ['Bayesian']),
        'random_strength': trial.suggest_float('random_strength', 1e-8, 10.0, log=True),
        'bagging_temperature': trial.suggest_float('bagging_temperature', 0.0, 10.0),
        'od_type': trial.suggest_categorical('od_type', ['IncToDec', 'Iter']),
        'od_wait': trial.suggest_int('od_wait', 10, 50),
        'verbose': 0,
        'random_seed': SEED,
        'task_type': 'CPU'
    }
    
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    cv_scores = []
    
    for train_idx, val_idx in skf.split(X_combined, y):
        X_train, X_val = X_combined[train_idx], X_combined[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]
        
        model = CatBoostClassifier(**params)
        model.fit(X_train, y_train)
        
        val_probs = model.predict_proba(X_val)[:, 1]
        roc_auc = roc_auc_score(y_val, val_probs)
        cv_scores.append(roc_auc)
        
    return np.mean(cv_scores)
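
Several of the ranges in `objective` use `log=True`, which tells Optuna to sample the parameter in the log domain. The sketch below illustrates what that scale means for the `learning_rate` range using plain random draws. It is only an illustration of the scale: the TPE sampler is adaptive and does not sample purely at random, and the helper name is hypothetical.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
low, high = 1e-3, 0.1  # same bounds as the learning_rate range above
draws = [sample_log_uniform(low, high, rng) for _ in range(10_000)]

# The midpoint in log space is the geometric mean of the bounds
geometric_mid = math.sqrt(low * high)
share_below = sum(d < geometric_mid for d in draws) / len(draws)

print(f"geometric midpoint: {geometric_mid:.3f}")
print(f"share of draws below it: {share_below:.2f}")
```

On a linear scale, only about 9% of draws would fall below 0.01 for this range. On a log scale roughly half do, so small learning rates and small regularization strengths get explored as thoroughly as large ones.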

In [15]:
study = optuna.create_study(direction='maximize', sampler=TPESampler(seed=SEED))
study.optimize(objective, n_trials=20)

print('Number of finished trials:', len(study.trials))
print('Best trial:', study.best_trial.params)

[I 2025-11-21 11:35:33,633] A new study created in memory with name: no-name-bfb73575-1261-475f-8373-142e1efe0ac5
[I 2025-11-21 11:36:18,345] Trial 0 finished with value: 0.9825276949249316 and parameters: {'iterations': 437, 'learning_rate': 0.07969454818643933, 'depth': 9, 'l2_leaf_reg': 0.009695826644515229, 'bootstrap_type': 'Bayesian', 'random_strength': 2.5361081166471375e-07, 'bagging_temperature': 1.5599452033620265, 'od_type': 'Iter', 'od_wait': 34}. Best is trial 0 with value: 0.9825276949249316.
[I 2025-11-21 11:38:25,273] Trial 1 finished with value: 0.9682878154418413 and parameters: {'iterations': 737, 'learning_rate': 0.0010994335574766201, 'depth': 10, 'l2_leaf_reg': 2.1106995036049607, 'bootstrap_type': 'Bayesian', 'random_strength': 8.148018307012941e-07, 'bagging_temperature': 1.8182496720710062, 'od_type': 'Iter', 'od_wait': 31}. Best is trial 0 with value: 0.9825276949249316.
[I 2025-11-21 11:38:54,041] Trial 2 finished with value: 0.9660646632373581 and parameters

Number of finished trials: 20
Best trial: {'iterations': 364, 'learning_rate': 0.04072484575897832, 'depth': 7, 'l2_leaf_reg': 1.1625745466565562e-08, 'bootstrap_type': 'Bayesian', 'random_strength': 0.6037688129097698, 'bagging_temperature': 0.2997065444120519, 'od_type': 'Iter', 'od_wait': 35}


In [21]:
best_params = study.best_params
best_params['random_seed'] = SEED
best_params['verbose'] = 0
best_params['task_type'] = 'CPU'

optuna_info = {
    "n_trials": 20,
    "best_params": study.best_params,
    "best_value": study.best_value,
    "study_path": "optuna_study.pkl"
}

print("Running final experiment with best parameters...")
run_tree_experiment(
    CatBoostClassifier, 
    "CatBoost_Optuna", 
    best_params, 
    "exp_2_catboost_optuna", 
    X_train=X_combined, 
    y_train=y, 
    X_test=X_combined_test, 
    feature_names=combined_feature_names, 
    vectorizer_name="CombinedFeatures", 
    vectorizer_params={}, 
    save_model=SAVE_MODELS,
    optuna_study=study,
    optuna_params=optuna_info
)

Running final experiment with best parameters...

=== Running Experiment: exp_2_catboost_optuna (CatBoost_Optuna) ===
Saving Model: True
Experiment 'exp_2_catboost_optuna' initialized at: experiments/exp_2_catboost_optuna
Mode: Incremental saving (5 folds)

--- Fold 1/5 ---
Fold 1 Val AUC: 0.9847
  Fold 1/5 saved | ROC AUC: 0.9847

--- Fold 2/5 ---
Fold 2 Val AUC: 0.9833
  Fold 2/5 saved | ROC AUC: 0.9833

--- Fold 3/5 ---
Fold 3 Val AUC: 0.9806
  Fold 3/5 saved | ROC AUC: 0.9806

--- Fold 4/5 ---
Fold 4 Val AUC: 0.9875
  Fold 4/5 saved | ROC AUC: 0.9875

--- Fold 5/5 ---
Fold 5 Val AUC: 0.9809
  Fold 5/5 saved | ROC AUC: 0.9809

Finalizing experiment...
  Optuna plots saved to experiments/exp_2_catboost_optuna/optuna_plots
  Predictions saved to experiments/exp_2_catboost_optuna/exp_2_catboost_optuna_prediction.csv

✓ Experiment 'exp_2_catboost_optuna' finalized!
  Location: experiments/exp_2_catboost_optuna
  Folds completed: 5
  Best fold: 4 (ROC AUC: 0.9875)
  Average ROC AUC: 0.98

<catboost.core.CatBoostClassifier at 0x1461eacc0>