### Pipeline

In [15]:
"""
┌─────────────────────────────────────────────────────────────────┐
│                 Data Loading and Initialization                 │
│  - Read the training dataset                                    │
│  - Convert column names to strings                              │
│  - Separate features (X) and target (y)                         │
└───────────────────────────────┬─────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                         Data Splitting                          │
│  - Split into train and test sets with train_test_split         │
│  - Use stratified sampling (stratify=y)                         │
│  - Hold out 20% as the test set                                 │
└───────────────────────────────┬─────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                      Feature Preprocessing                      │
├─────────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────┐      │
│  │         Feature Transformation (Yeo-Johnson)          │      │
│  │  - Transform skewed features                          │      │
│  │  - Exclude binary features                            │      │
│  │  - Compute and store lambda parameters                │      │
│  └─────────────────────────┬─────────────────────────────┘      │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────┐      │
│  │           Feature Standardization (Z-score)           │      │
│  │  - Standardize numeric features                       │      │
│  │  - Create standardized columns ({feature}_std)        │      │
│  └─────────────────────────┬─────────────────────────────┘      │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────┐      │
│  │            Feature Normalization (Min-Max)            │      │
│  │  - Used for the neural network model                  │      │
│  │  - Scale feature values to the 0-1 range              │      │
│  │  - Create normalized columns ({feature}_norm)         │      │
│  └─────────────────────────┬─────────────────────────────┘      │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────┐      │
│  │          Dimensionality Reduction (optional)          │      │
│  │  - PCA (little benefit here)                          │      │
│  │  - LDA tried but not suitable                         │      │
│  └───────────────────────────────────────────────────────┘      │
└───────────────────────────────┬─────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                    Class Imbalance Handling                     │
├─────────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────┐      │
│  │       SMOTE Oversampling (little benefit here)        │      │
│  │  - Generate synthetic minority-class samples          │      │
│  └─────────────────────────┬─────────────────────────────┘      │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────┐      │
│  │               Class Weight Computation                │      │
│  │  - Weights inversely proportional to class frequency  │      │
│  │  - Prepare weight formats for each model              │      │
│  └───────────────────────────────────────────────────────┘      │
└───────────────────────────────┬─────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                         Model Training                          │
├─────────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────┐      │
│  │                     XGBoost Model                     │      │
│  │  - Bayesian hyperparameter optimization               │      │
│  │  - Early stopping                                     │      │
│  │  - Feature importance analysis                        │      │
│  └───────────────────────────────────────────────────────┘      │
│  ┌───────────────────────────────────────────────────────┐      │
│  │                    LightGBM Model                     │      │
│  │  - Class-weight balancing                             │      │
│  │  - Parallel training                                  │      │
│  │  - Feature importance analysis                        │      │
│  └───────────────────────────────────────────────────────┘      │
│  ┌───────────────────────────────────────────────────────┐      │
│  │                 Neural Network Model                  │      │
│  │  - Multi-layer architecture                           │      │
│  │  - Regularization                                     │      │
│  │  - Early stopping and learning-rate scheduling        │      │
│  └─────────────────────────┬─────────────────────────────┘      │
│                            ↓                                    │
│  ┌───────────────────────────────────────────────────────┐      │
│  │                Stacking Ensemble Model                │      │
│  │  - 5-fold cross-validation                            │      │
│  │  - Generate meta-features                             │      │
│  │  - Logistic regression meta-model                     │      │
│  └───────────────────────────────────────────────────────┘      │
└───────────────────────────────┬─────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                        Model Evaluation                         │
├─────────────────────────────────────────────────────────────────┤
│  - Compute accuracy, balanced accuracy, and F1 score            │
│  - Generate the classification report                           │
│  - Plot the confusion-matrix heatmap                            │
│  - Analyze feature importance                                   │
│  - Select the best model by macro-averaged F1                   │
└───────────────────────────────┬─────────────────────────────────┘
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                  Model Saving and Application                   │
├─────────────────────────────────────────────────────────────────┤
│  - Save the full model with joblib                              │
│  - Includes the preprocessing steps and trained model           │
│  - Apply the same preprocessing to new data                     │
│  - Produce predictions and probabilities                        │
│  - Evaluate model performance on new data                       │
└─────────────────────────────────────────────────────────────────┘
"""


### Pipeline walkthrough



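### 1. Data loading and initialization
This is the starting point of the pipeline, and its main job is to prepare the data for the later steps. All column names are converted to strings for consistency, which avoids errors caused by mixed column-name types further down the pipeline.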
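A minimal sketch of why the column names are converted, assuming a small toy array in place of the real training data. A DataFrame built from a NumPy array gets integer column names, and derived columns such as `{feature}_std` are strings, so without the conversion the frame would mix both types.

```python
import numpy as np
import pandas as pd

# Hypothetical toy array standing in for the loaded training data.
data = np.arange(6, dtype=float).reshape(3, 2)
df = pd.DataFrame(data)  # column names are the integers 0 and 1

# Convert the names to strings before adding derived, string-named columns.
df.columns = df.columns.astype(str)
df["0_std"] = (df["0"] - df["0"].mean()) / df["0"].std()

# Every column name now has the same type.
column_types = {type(name) for name in df.columns}
print(list(df.columns), column_types)
```

With uniformly typed names, lookups such as `X[f"{feature}_std"]` behave the same way for original and derived columns.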

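### 2. Data splitting
Stratified sampling keeps the class distribution the same in the training and test sets, which matters most for imbalanced datasets. A fixed random seed (random_state=42) makes the split reproducible.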
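The split can be sketched as follows, assuming synthetic data with a 90/10 class ratio (the real dataset and its class ratio are not shown in this section). The parameters `test_size=0.2`, `stratify=y`, and `random_state=42` are the ones named in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data: 900 samples of class 0, 100 of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([0] * 900 + [1] * 100)

# Stratified 80/20 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The minority-class share is preserved in both subsets.
train_minority_share = y_train.mean()
test_minority_share = y_test.mean()
print(train_minority_share, test_minority_share)
```

Without `stratify=y`, a random split of a small minority class can leave the test set with noticeably more or fewer minority samples than the training set.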

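### 3. Feature preprocessing
This is a multi-step process in which each step targets a specific property of the data:

- **Yeo-Johnson transformation**: Handles skewed data by making it closer to a normal distribution, which helps many models. Binary features are excluded because the transformation is meaningless for them.

- **Z-score standardization**: Rescales features to zero mean and unit variance so that features on different scales become comparable. This matters most for distance-based algorithms and neural networks.

- **Min-Max normalization**: Intended for the neural network model. It scales features to the 0-1 range, which helps the network converge faster.

- **Dimensionality reduction**: PCA and LDA were both tried, with little benefit. This suggests that the original features already carry the information needed for classification, and that reducing dimensionality discards some of it.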
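The first three steps can be sketched on a single feature, assuming synthetic right-skewed data. The point of the sketch is that the lambda and the scaler statistics are estimated on the training data only and then reused unchanged on new data.

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic right-skewed feature standing in for a real column.
rng = np.random.default_rng(0)
train = rng.exponential(scale=2.0, size=500)
test = rng.exponential(scale=2.0, size=100)

# Yeo-Johnson: estimate lambda on the training data, reuse it on the test data.
train_yj, lmbda = stats.yeojohnson(train)
test_yj = stats.yeojohnson(test, lmbda=lmbda)

skew_before = stats.skew(train)
skew_after = stats.skew(train_yj)

# Z-score standardization, fitted on the training data only.
std_scaler = StandardScaler().fit(train_yj.reshape(-1, 1))
train_std = std_scaler.transform(train_yj.reshape(-1, 1))
test_std = std_scaler.transform(test_yj.reshape(-1, 1))

# Min-Max normalization, fitted on the training data only.
# Test values can fall slightly outside [0, 1] if they exceed the training range.
minmax_scaler = MinMaxScaler(feature_range=(0, 1)).fit(train_yj.reshape(-1, 1))
train_norm = minmax_scaler.transform(train_yj.reshape(-1, 1))
test_norm = minmax_scaler.transform(test_yj.reshape(-1, 1))

print(f"lambda={lmbda:.4f}, skew before={skew_before:.3f}, after={skew_after:.3f}")
```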

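### 4. Class imbalance handling
Two approaches to the imbalanced dataset were tried:

- **SMOTE oversampling**: Generates synthetic minority-class samples. It brought little benefit here, possibly because the synthetic samples do not capture the complexity of the real data.

- **Class weight computation**: Assigns a different weight to each class so that the model pays more attention to the minority classes. This worked better in practice, possibly because it leaves the original data distribution untouched.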
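A minimal sketch of the class-weight step, assuming toy labels with a 90/10 ratio. With `'balanced'`, scikit-learn computes each weight as `n_samples / (n_classes * class_count)`, so weights are inversely proportional to class frequency. The dictionary and per-sample forms shown are two common formats; which one each model in the pipeline received is an assumption here.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Toy labels: 900 samples of class 0, 100 samples of class 1.
y = np.array([0] * 900 + [1] * 100)
classes = np.unique(y)

# Balanced weights: n_samples / (n_classes * class_count).
weights = compute_class_weight("balanced", classes=classes, y=y)

# Dictionary form, as accepted by a class_weight argument.
class_weight_dict = {int(c): float(w) for c, w in zip(classes, weights)}

# Per-sample form, as accepted by a sample_weight argument of fit().
sample_weights = compute_sample_weight("balanced", y)

print(class_weight_dict)
```

Here class 0 gets 1000 / (2 * 900) ≈ 0.556 and class 1 gets 1000 / (2 * 100) = 5.0.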

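### 5. Model training
Three base models were trained and then combined into a stacking ensemble:

- **XGBoost model**: Bayesian optimization searches for the best hyperparameters, which is more efficient than a traditional grid search. Early stopping guards against overfitting, and feature importance analysis makes the model easier to interpret.

- **LightGBM model**: Another gradient-boosted tree implementation, often faster to train than XGBoost. It uses class weights to handle the imbalance and parallel training to speed up computation.

- **Neural network model**: A multi-layer architecture captures complex patterns, several regularization techniques guard against overfitting, and early stopping with learning-rate scheduling tunes the training process.

- **Stacking ensemble model**: The predictions of the base models become new features (meta-features), and a logistic regression meta-model makes the final prediction. Logistic regression was chosen to keep the meta-model simple and avoid overfitting. The meta-features come from 5-fold cross-validation, so each one is predicted by a model that never saw that sample during training.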
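The stacking idea can be sketched with scikit-learn's `cross_val_predict`, assuming two lightweight stand-in base models (a decision tree and Gaussian naive Bayes) instead of XGBoost, LightGBM, and the neural network, and synthetic three-class data. This is an illustration of out-of-fold meta-features, not the notebook's own `StackingClassifier`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the real training set.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=42
)

# Stand-in base models (assumption: chosen only to keep the sketch fast).
base_models = {
    "tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "nb": GaussianNB(),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Out-of-fold class probabilities: each row is predicted by a model
# that did not see that row during training.
oof_blocks = [
    cross_val_predict(model, X, y, cv=cv, method="predict_proba")
    for model in base_models.values()
]
meta_X = np.hstack(oof_blocks)

# A simple logistic regression meta-model is trained on the meta-features.
meta_model = LogisticRegression(max_iter=1000).fit(meta_X, y)

# For inference, refit each base model on all data and stack its predictions.
for model in base_models.values():
    model.fit(X, y)
new_meta_X = np.hstack([m.predict_proba(X[:5]) for m in base_models.values()])
stacked_preds = meta_model.predict(new_meta_X)
print(meta_X.shape, stacked_preds)
```

Each base model contributes one probability column per class, so two base models and three classes give six meta-features.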

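### 6. Model evaluation
Several metrics are used to assess model performance from different angles:

- Accuracy, balanced accuracy, and F1 score each measure performance from a different perspective
- The classification report lists precision, recall, and F1 score for every class
- The confusion-matrix heatmap shows how the prediction errors are distributed
- Feature importance analysis shows which features contribute most to the predictions
- The best model is selected by macro-averaged F1, a reasonable choice for imbalanced datasets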
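A small worked example of why macro-averaged F1 is a reasonable selection metric, assuming toy labels with a 90/10 ratio and a degenerate classifier that always predicts the majority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Toy ground truth: 90 samples of class 0, 10 samples of class 1.
y_true = np.array([0] * 90 + [1] * 10)

# A degenerate classifier that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
balanced_accuracy = balanced_accuracy_score(y_true, y_pred)
# zero_division=0 scores the never-predicted class as 0 without a warning.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy={accuracy:.3f}")
print(f"balanced accuracy={balanced_accuracy:.3f}")
print(f"macro F1={macro_f1:.3f}")
```

Accuracy is 0.900 even though the minority class is never detected. Balanced accuracy drops to 0.500, and macro F1 to about 0.474, because the minority class contributes an F1 of 0 to the average.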

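### 7. Model saving and application
The full model, including the preprocessing steps, is saved as a single file so that exactly the same processing is applied at prediction time. This avoids prediction errors caused by inconsistent preprocessing and is a widely recommended practice for deploying machine learning models.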
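A minimal sketch of saving preprocessing and model together, assuming a small scikit-learn `Pipeline` with a scaler and a logistic regression in place of the full pipeline. The file name `full_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Preprocessing and classifier live in one object.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "full_model.joblib")  # hypothetical name
    joblib.dump(model, model_path)      # one file holds scaler and classifier
    restored = joblib.load(model_path)

# The restored pipeline applies the same fitted scaling before predicting.
X_new = rng.normal(size=(10, 4))
same_predictions = np.array_equal(model.predict(X_new), restored.predict(X_new))
print(same_predictions)
```

Because the scaler's fitted statistics travel with the classifier, new data does not need to be scaled by hand before calling `predict`.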

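### Key findings and optimizations

1. **Importance of preprocessing**: The full preprocessing pipeline improved model performance, although the gain may not have been large.

2. **Limits of dimensionality reduction and oversampling**: PCA and SMOTE brought little benefit on this dataset, which shows that commonly used techniques do not suit every problem.

3. **Effectiveness of class weights**: Class weighting handled the imbalance more effectively than oversampling.

4. **Advantage of the stacking ensemble**: Combining the strengths of different models improved overall predictive performance.

5. **Choice of a simple meta-model**: Logistic regression was used as the meta-model to avoid overfitting. In the meta-feature space, a simple model is usually sufficient.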




In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.decomposition import PCA
import warnings
warnings.filterwarnings('ignore')

path1 = '/Users/mouyasushi/Desktop/vici holdings/Test/data/train_data.npy'
train_data = np.load(path1)

path2 = '/Users/mouyasushi/Desktop/vici holdings/Test/data/train_labels.npy'
train_labels = np.load(path2)

# Convert to a DataFrame
train_df = pd.DataFrame(train_data)
train_df['label'] = train_labels  # Add the label column


In [8]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

from sklearn.model_selection import train_test_split
from sklearn.metrics import (confusion_matrix, classification_report, accuracy_score, 
                             f1_score, balanced_accuracy_score, roc_auc_score)
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from tqdm import tqdm
import lightgbm as lgb
import xgboost as xgb

# Import libraries for neural networks
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, Callback
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

# Import tools for handling imbalanced data
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.utils.class_weight import compute_class_weight


class FeatureNormalizer(BaseEstimator, TransformerMixin):
    def __init__(self, features_to_normalize=None, range=(0, 1)):
        self.features_to_normalize = features_to_normalize
        self.range = range
        self.normalizers_ = {}  # Store individual normalizers for each feature
        self.normalized_feature_names_ = []  # Store names of normalized features
    
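    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)

        # Reset fitted state so that refitting does not accumulate stale entries
        self.normalizers_ = {}
        self.normalized_feature_names_ = []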
        
        # If no features are specified, use all numeric features
        if self.features_to_normalize is None:
            self.features_to_normalize = X.select_dtypes(include=np.number).columns.tolist()
        
        # Add progress bar
        for feature in tqdm(self.features_to_normalize, desc="Fitting normalization"):
            if feature in X.columns and np.issubdtype(X[feature].dtype, np.number):
                # Create an individual normalizer for each feature
                normalizer = MinMaxScaler(feature_range=self.range)
                normalizer.fit(X[[feature]])
                self.normalizers_[feature] = normalizer
                self.normalized_feature_names_.append(f"{feature}_norm")
                print(f"Normalization for {feature} fitted, range: [{self.range[0]}, {self.range[1]}]")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X.copy())
        
        # Add progress bar
        for feature, normalizer in tqdm(self.normalizers_.items(), desc="Applying normalization"):
            if feature in X.columns:
                # Apply saved normalizer for normalization
                X[f"{feature}_norm"] = normalizer.transform(X[[feature]])
        
        return X
    
    def get_feature_names_out(self):
        # Return normalized feature names
        return self.normalized_feature_names_


class StackingClassifier:
    
    def __init__(self, base_models, meta_model, n_folds=5, use_proba=True, random_state=42):
        """
        Initialize Stacking Classifier
        
        Parameters:
        base_models: Dictionary of base models {name: model}
        meta_model: Meta model
        n_folds: Number of cross-validation folds
        use_proba: Whether to use probability predictions as meta features
        random_state: Random seed
        """
        self.base_models = base_models
        self.meta_model = meta_model
        self.n_folds = n_folds
        self.use_proba = use_proba
        self.random_state = random_state
        self.base_model_preds = {}  # Store predictions from base models
        
    def fit(self, X, y):
        """
        Train the stacking model
        
        Parameters:
        X: Training features
        y: Training labels
        """
        from sklearn.model_selection import StratifiedKFold
        
        # Create cross-validation object
        kf = StratifiedKFold(n_splits=self.n_folds, shuffle=True, random_state=self.random_state)
        
        # Store meta features
        meta_features = {}
        
        # Perform cross-validation predictions for each base model
        for name, model in self.base_models.items():
            print(f"\nTraining base model: {name}")
            
            # Initialize an array to store predictions
            if self.use_proba:
                n_classes = len(np.unique(y))
                train_meta_preds = np.zeros((X.shape[0], n_classes))
            else:
                train_meta_preds = np.zeros(X.shape[0])
            
            # Perform K-fold cross-validation
            for i, (train_idx, val_idx) in enumerate(kf.split(X, y)):
                print(f"  Fold {i+1}/{self.n_folds}")
                
                if isinstance(X, pd.DataFrame):
                    X_train_fold, X_val_fold = X.iloc[train_idx], X.iloc[val_idx]
                else:
                    X_train_fold, X_val_fold = X[train_idx], X[val_idx]

                if isinstance(y, pd.Series):
                    y_train_fold, y_val_fold = y.iloc[train_idx], y.iloc[val_idx]
                else:
                    y_train_fold, y_val_fold = y[train_idx], y[val_idx]
                
                # Train the model
                model.fit(X_train_fold, y_train_fold)
                
                # Generate out-of-fold predictions and write them into the rows
                # of the validation fold with one vectorized assignment
                # (this also avoids shadowing the fold counter `i`)
                if self.use_proba:
                    fold_preds = model.predict_proba(X_val_fold)
                else:
                    fold_preds = model.predict(X_val_fold)
                train_meta_preds[val_idx] = fold_preds
            
            meta_features[name] = train_meta_preds
            
            # Retrain the model on the entire dataset
            print(f"  Training {name} on the entire data")
            model.fit(X, y)
        
        # Prepare training data for meta model
        meta_X = self._prepare_meta_features(meta_features)
        
        # Train the meta model
        print("\nTraining meta model...")
        self.meta_model.fit(meta_X, y)
        
        # Save base models' predictions
        self.base_model_preds = meta_features
        
        return self
    
    def predict(self, X):
        """
        Use the trained stacking model to predict
        
        Parameters:
        X: Test features
        
        Returns:
        Predicted labels
        """
        meta_features = {}
        
        # Generate predictions using each base model
        for name, model in self.base_models.items():
            if self.use_proba:
                preds = model.predict_proba(X)
            else:
                preds = model.predict(X)
            meta_features[name] = preds
        
        # Prepare meta features for meta model
        meta_X = self._prepare_meta_features(meta_features)
        
        # Final prediction using meta model
        return self.meta_model.predict(meta_X)
    
    def predict_proba(self, X):
        """
        Use the trained stacking model to predict probabilities
        
        Parameters:
        X: Test features
        
        Returns:
        Predicted probabilities
        """
        meta_features = {}
        
        # Generate predictions using each base model
        for name, model in self.base_models.items():
            if self.use_proba:
                preds = model.predict_proba(X)
            else:
                preds = model.predict(X)
            meta_features[name] = preds
        
        # Prepare meta features for meta model
        meta_X = self._prepare_meta_features(meta_features)
        
        # Final probability prediction using meta model
        return self.meta_model.predict_proba(meta_X)
    
    def _prepare_meta_features(self, meta_features):
        """
        Prepare meta features
        
        Parameters:
        meta_features: Dictionary of predictions from base models
        
        Returns:
        Combined meta features array
        """
        # Combine predictions from all base models into one array
        all_features = []
        
        for name, preds in meta_features.items():
            # Ensure predictions are 2D arrays
            if preds.ndim == 1:
                preds = preds.reshape(-1, 1)
            all_features.append(preds)
        
        # Concatenate all features horizontally
        return np.hstack(all_features)


# Custom transformer: Feature transformation (Yeo-Johnson)
class FeatureTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, features_to_transform=None, binary_features=None, method='yeo-johnson'):
        self.features_to_transform = features_to_transform
        self.binary_features = binary_features if binary_features is not None else []
        self.method = method
        self.lambdas_ = {}  # Store lambda values for each feature
        self.transformed_feature_names_ = []  # Store transformed feature names
    
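    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)

        # Reset fitted state so that refitting does not accumulate stale entries
        self.lambdas_ = {}
        self.transformed_feature_names_ = []

        # If no features specified, use all numeric features
        if self.features_to_transform is None:
            self.features_to_transform = X.select_dtypes(include=np.number).columns.tolist()

        # Exclude binary features while preserving the original feature order
        # (a set difference would make the processing order nondeterministic)
        binary = set(self.binary_features)
        features_to_process = [f for f in self.features_to_transform if f not in binary]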
        
        # Add progress bar
        for feature in tqdm(features_to_process, desc="Fitting transformations"):
            if feature not in X.columns:
                continue
                
            if not np.issubdtype(X[feature].dtype, np.number):
                continue
            
            # For Yeo-Johnson transformation, save lambda values
            if self.method == 'yeo-johnson':
                try:
                    # Fit transformer and save lambda value
                    _, lmbda = stats.yeojohnson(X[feature])
                    self.lambdas_[feature] = lmbda
                    self.transformed_feature_names_.append(f"{feature}_yeojohnson")
                    print(f"Fitted yeo-johnson transformation for {feature} with lambda={lmbda:.4f}")
                except Exception as e:
                    print(f"Error fitting {self.method} transformation to {feature}: {str(e)}")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X.copy())
        
        # Add progress bar
        for feature, lmbda in tqdm(self.lambdas_.items(), desc="Applying transformations"):
            if feature in X.columns:
                try:
                    # Apply transformation using saved lambda value
                    X[f"{feature}_yeojohnson"] = stats.yeojohnson(X[feature], lmbda=lmbda)
                except Exception as e:
                    print(f"Error applying {self.method} transformation to {feature}: {str(e)}")
        
        return X
    
    def get_feature_names_out(self):
        # Return transformed feature names
        return self.transformed_feature_names_


# Custom transformer: Feature standardization
class FeatureStandardizer(BaseEstimator, TransformerMixin):
    def __init__(self, features_to_standardize=None):
        self.features_to_standardize = features_to_standardize
        self.scalers_ = {}  # Store a separate scaler for each feature
        self.standardized_feature_names_ = []  # Store standardized feature names
    
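    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)

        # Reset fitted state so that refitting does not accumulate stale entries
        self.scalers_ = {}
        self.standardized_feature_names_ = []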
        
        # If no features specified, use all numeric features
        if self.features_to_standardize is None:
            self.features_to_standardize = X.select_dtypes(include=np.number).columns.tolist()
        
        # Add progress bar
        for feature in tqdm(self.features_to_standardize, desc="Fitting standardization"):
            if feature in X.columns and np.issubdtype(X[feature].dtype, np.number):
                # Create a separate scaler for each feature
                scaler = StandardScaler()
                scaler.fit(X[[feature]])
                self.scalers_[feature] = scaler
                self.standardized_feature_names_.append(f"{feature}_std")
                print(f"Fitted standardization for {feature} with mean={scaler.mean_[0]:.4f}, scale={scaler.scale_[0]:.4f}")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X.copy())
        
        # Add progress bar
        for feature, scaler in tqdm(self.scalers_.items(), desc="Applying standardization"):
            if feature in X.columns:
                # Apply standardization using saved scaler
                X[f"{feature}_std"] = scaler.transform(X[[feature]])
        
        return X
    
    def get_feature_names_out(self):
        # Return standardized feature names
        return self.standardized_feature_names_


# Feature selector: Select specified feature columns
class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_names=None):
        self.feature_names = feature_names
    
    def fit(self, X, y=None):
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Select specified features
        if self.feature_names:
            # Check which features exist in the data
            available_features = [f for f in self.feature_names if f in X.columns]
            if len(available_features) < len(self.feature_names):
                missing = set(self.feature_names) - set(available_features)
                print(f"Warning: Some features are missing: {missing}")
            
            return X[available_features]
        else:
            return X


def evaluate_model(y_true, y_pred, dataset_name=""):
    """
    Evaluate model performance with confusion matrix and metrics
    
    Parameters:
    y_true: True labels
    y_pred: Predicted labels
    dataset_name: Name of the dataset for display purposes
    """
    # Calculate accuracy
    accuracy = accuracy_score(y_true, y_pred)
    
    # Print results
    print(f"\nModel Evaluation on {dataset_name}:")
    print(f"Accuracy: {accuracy:.4f}")
    
    # Get unique classes
    classes = np.unique(np.concatenate([y_true, y_pred]))
    n_classes = len(classes)
    
    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred, labels=classes)
    
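    # Generate classification report. `report` stays None if generation fails,
    # so the return statement below never references an undefined name.
    report = None
    try: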
        report = classification_report(y_true, y_pred, labels=classes, output_dict=False)
        print("\nClassification Report:")
        print(report)
    except Exception as e:
        print(f"Error generating classification report: {str(e)}")
    
    # Attempt to plot confusion matrix
    try:
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                    xticklabels=classes, 
                    yticklabels=classes)
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.title(f'Confusion Matrix - {dataset_name}')
        plt.tight_layout()
        plt.savefig(f'confusion_matrix_{dataset_name.replace(" ", "_")}.png')
        print(f"Confusion matrix saved to confusion_matrix_{dataset_name.replace(' ', '_')}.png")
        plt.close()  # Close the plot to avoid display issues
    except Exception as e:
        print(f"Error plotting confusion matrix: {str(e)}")
        # Print confusion matrix as fallback
        print("\nConfusion Matrix:")
        print(cm)
    
    return accuracy, cm, report


class PCATransformer(BaseEstimator, TransformerMixin):
    def __init__(self, n_components=None, variance_threshold=0.95):
        self.n_components = n_components
        self.variance_threshold = variance_threshold
        self.pca_ = None
        self.feature_names_ = []
    
    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # If n_components is None, use variance threshold
        if self.n_components is None:
            # First, fit a full PCA
            temp_pca = PCA()
            temp_pca.fit(X)
            
            # Determine how many components are needed to reach the variance threshold
            cumulative_variance = np.cumsum(temp_pca.explained_variance_ratio_)
            self.n_components = np.argmax(cumulative_variance >= self.variance_threshold) + 1
            print(f"Selected {self.n_components} components to explain {self.variance_threshold*100:.1f}% of variance")
        
        # Create and fit PCA
        self.pca_ = PCA(n_components=self.n_components)
        self.pca_.fit(X)
        
        # Create feature names
        self.feature_names_ = [f"PC{i+1}" for i in range(self.n_components)]
        
        # Print explained variance ratios
        explained_variance = self.pca_.explained_variance_ratio_
        print(f"Top 5 components explain: {explained_variance[:5].sum()*100:.2f}% of variance")
        print(f"All {self.n_components} components explain: {explained_variance.sum()*100:.2f}% of variance")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Apply PCA transformation
        X_pca = self.pca_.transform(X)
        
        # Convert to DataFrame and add column names
        X_pca_df = pd.DataFrame(X_pca, columns=self.feature_names_, index=X.index)
        
        return X_pca_df
    
    def get_feature_names_out(self):
        return self.feature_names_


class TrainingVisualizationCallback(Callback):
    def __init__(self, validation_data=None):
        super(TrainingVisualizationCallback, self).__init__()
        self.validation_data = validation_data
        self.train_loss = []
        self.train_acc = []
        self.val_loss = []
        self.val_acc = []
        
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.train_loss.append(logs.get('loss'))
        self.train_acc.append(logs.get('accuracy'))
        
        if self.validation_data:
            val_logs = self.model.evaluate(
                self.validation_data[0], self.validation_data[1], 
                verbose=0
            )
            self.val_loss.append(val_logs[0])
            self.val_acc.append(val_logs[1])
    
    def plot_metrics(self):
        epochs = range(1, len(self.train_loss) + 1)
        
        # Create a 2x1 subplot layout
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
        
        # Plot loss curves
        ax1.plot(epochs, self.train_loss, 'r-', label='Training')
        if self.validation_data:
            ax1.plot(epochs, self.val_loss, 'b-', label='Validation')
        ax1.set_title('Model Loss')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Loss')
        ax1.legend()
        ax1.grid(True)
        
        # Plot accuracy curves
        ax2.plot(epochs, self.train_acc, 'r-', label='Training')
        if self.validation_data:
            ax2.plot(epochs, self.val_acc, 'b-', label='Validation')
        ax2.set_title('Model Accuracy')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Accuracy')
        ax2.legend()
        ax2.grid(True)
        
        plt.tight_layout()
        plt.savefig('training_metrics.png')
        plt.close()
        print("Training metrics plot saved to training_metrics.png")
        
        # Calculate error rate and plot
        train_error = [1 - acc for acc in self.train_acc]
        if self.validation_data:
            val_error = [1 - acc for acc in self.val_acc]
            
            plt.figure(figsize=(12, 5))
            plt.plot(epochs, train_error, 'r-', label='Training')
            plt.plot(epochs, val_error, 'b-', label='Validation')
            plt.title('Model Error Rate')
            plt.xlabel('Epoch')
            plt.ylabel('Error Rate')
            plt.legend()
            plt.grid(True)
            plt.tight_layout()
            plt.savefig('error_rate.png')
            plt.close()
            print("Error rate plot saved to error_rate.png")

class NeuralNetworkClassifier(BaseEstimator, TransformerMixin):
    def __init__(self, input_dim=200, epochs=15, batch_size=64, learning_rate=0.001):
        self.input_dim = input_dim
        self.epochs = epochs
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.model = None
        self.label_encoder = LabelEncoder()
        self.history = None
        self.visualization_callback = None
        
    def build_model(self, input_shape, n_classes):
        model = Sequential([
            # First layer
            Dense(128, activation='relu', input_shape=(input_shape,),
                  kernel_regularizer=tf.keras.regularizers.l2(0.001)),
            BatchNormalization(),
            Dropout(0.3),
            
            # Second layer
            Dense(64, activation='relu',
                  kernel_regularizer=tf.keras.regularizers.l2(0.001)),
            BatchNormalization(),
            Dropout(0.3),
            
            # Third layer
            Dense(32, activation='relu',
                  kernel_regularizer=tf.keras.regularizers.l2(0.001)),
            BatchNormalization(),
            Dropout(0.3),
            
            # Output layer
            Dense(n_classes, activation='softmax')
        ])
        
        model.compile(
            optimizer=Adam(learning_rate=self.learning_rate),
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )
        
        return model
    
    def fit(self, X, y, X_val=None, y_val=None):
        # Ensure X is a numpy array
        if isinstance(X, pd.DataFrame):
            X = X.values
        
        # Encode labels as integers 0..n_classes-1
        y_encoded = self.label_encoder.fit_transform(y)
        n_classes = len(self.label_encoder.classes_)

        # Convert to one-hot encoding; fixing num_classes keeps the training and
        # validation targets the same width even if a class is absent from one of them
        y_onehot = to_categorical(y_encoded, num_classes=n_classes)

        # Prepare validation data
        validation_data = None
        if X_val is not None and y_val is not None:
            if isinstance(X_val, pd.DataFrame):
                X_val = X_val.values
            y_val_encoded = self.label_encoder.transform(y_val)
            y_val_onehot = to_categorical(y_val_encoded, num_classes=n_classes)
            validation_data = (X_val, y_val_onehot)

        # Compute balanced class weights, keyed by encoded class index
        class_weights = compute_class_weight('balanced', classes=np.arange(n_classes), y=y_encoded)
        class_weight_dict = {i: float(weight) for i, weight in enumerate(class_weights)}
        
        # Build model
        self.model = self.build_model(X.shape[1], n_classes)
        
        # Create visualization callback
        self.visualization_callback = TrainingVisualizationCallback(validation_data)
        
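        # Monitor validation loss when validation data is provided;
        # otherwise fall back to the training loss
        monitor_metric = 'val_loss' if validation_data is not None else 'loss'

        # Add callbacks
        callbacks = [
            # Early stopping
            tf.keras.callbacks.EarlyStopping(
                monitor=monitor_metric,
                patience=5,
                restore_best_weights=True
            ),
            # Learning rate scheduler
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor=monitor_metric,
                factor=0.5,
                patience=3,
                min_lr=1e-6
            ),
            # Visualization callback
            self.visualization_callback
        ]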
        
        # Train model
        self.history = self.model.fit(
            X, y_onehot,
            epochs=self.epochs,
            batch_size=self.batch_size,
            verbose=1,
            callbacks=callbacks,
            class_weight=class_weight_dict,
            validation_data=validation_data
        )
        
        # Plot training metrics
        self.visualization_callback.plot_metrics()
        
        # Plot training history
        self.plot_training_history()
        
        return self
    
    def plot_training_history(self):
        """Plot training history curves"""
        history = self.history
        
        # Create a 2x1 subplot layout
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
        
        # Plot loss curve
        ax1.plot(history.history['loss'], 'r-', label='Training Loss')
        if 'val_loss' in history.history:
            ax1.plot(history.history['val_loss'], 'b-', label='Validation Loss')
        ax1.set_title('Model Loss')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Loss')
        ax1.legend()
        ax1.grid(True)
        
        # Plot accuracy curve
        ax2.plot(history.history['accuracy'], 'r-', label='Training Accuracy')
        if 'val_accuracy' in history.history:
            ax2.plot(history.history['val_accuracy'], 'b-', label='Validation Accuracy')
        ax2.set_title('Model Accuracy')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Accuracy')
        ax2.legend()
        ax2.grid(True)
        
        plt.tight_layout()
        plt.savefig('nn_training_history.png')
        plt.close()
        print("Neural network training history saved to nn_training_history.png")
    
    def predict(self, X):
        # Ensure X is a numpy array
        if isinstance(X, pd.DataFrame):
            X = X.values
        
        # Predict probabilities
        y_pred_probs = self.model.predict(X, verbose=0)
        
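        # Scale the class-1 probability by 1.15 before taking the argmax. This biases
        # predictions toward class 1 (higher recall, lower precision), so predict()
        # can disagree with the argmax of predict_proba().
        adjusted_probs = y_pred_probs.copy()
        adjusted_probs[:, 1] = adjusted_probs[:, 1] * 1.15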
        
        # Convert to class labels
        y_pred = np.argmax(adjusted_probs, axis=1)
        
        # Convert back to original labels
        return self.label_encoder.inverse_transform(y_pred)
    
    def predict_proba(self, X):
        # Ensure X is a numpy array
        if isinstance(X, pd.DataFrame):
            X = X.values
        
        # Predict probabilities
        return self.model.predict(X, verbose=0)        
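

# --- Illustrative sketch (not part of the original pipeline) ---
# predict() above multiplies the class-1 probability by 1.15 before taking the
# argmax. The helper below reproduces that rule on a plain array so its effect
# can be checked in isolation. The function name and the `boost` and
# `boosted_class` parameters are assumptions made for this example.
import numpy as np


def boosted_argmax(probs, boost=1.15, boosted_class=1):
    """Return argmax labels after scaling one class's probability by `boost`."""
    adjusted = np.asarray(probs, dtype=float).copy()
    adjusted[:, boosted_class] *= boost
    return np.argmax(adjusted, axis=1)


# Usage: a row with probabilities [0.50, 0.45, 0.05] has plain argmax 0, but
# 0.45 * 1.15 = 0.5175 exceeds 0.50, so boosted_argmax assigns it to class 1.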


def train_advanced_models(train_df, target_column, skewed_features=None, binary_features=None, 
                          high_kurtosis_features=None, use_pca=False, use_smote=True,
                          n_pca_components=None, pca_variance=0.95, use_bayes_opt=False,
                          use_stacking=False):
    """
    Train advanced models and evaluate performance
    
    Parameters:
    train_df: Training data DataFrame
    target_column: Target variable column name
    skewed_features: List of skewed features
    binary_features: List of binary features
    high_kurtosis_features: List of high kurtosis features
    use_pca: Whether to use PCA
    use_smote: Whether to use SMOTE oversampling
    n_pca_components: Number of PCA components
    pca_variance: PCA variance threshold
    use_bayes_opt: Whether to use Bayesian Optimization for XGBoost parameters
    use_stacking: Whether to use Stacking ensemble method
    
    Returns:
    Best model, results dictionary, preprocessor
    """
    train_df = train_df.copy()
    train_df.columns = train_df.columns.astype(str)
    
    # Convert features to string if needed
    if skewed_features is not None:
        skewed_features = [str(f) for f in skewed_features]
    
    if binary_features is not None:
        binary_features = [str(f) for f in binary_features]
    
    if high_kurtosis_features is not None:
        high_kurtosis_features = [str(f) for f in high_kurtosis_features]
    
    
    print("=" * 50)
    print("ADVANCED MODEL TRAINING")
    print("=" * 50)
    print(f"Training data shape: {train_df.shape}")
    print(f"Target column: {target_column}")
    print(f"Using PCA: {use_pca}")
    print(f"Using SMOTE: {use_smote}")
    print(f"Using Bayes Optimization: {use_bayes_opt}")
    print(f"Using Stacking: {use_stacking}")
    
    
    X = train_df.drop(columns=[target_column])
    y = train_df[target_column]
    
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    print(f"Training set shape: {X_train.shape}, Test set shape: {X_test.shape}")
    print(f"Class distribution in training set: {np.bincount(y_train)}")
    print(f"Class distribution in test set: {np.bincount(y_test)}")
    
    
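    if skewed_features is not None:
        skewed_in_data = [f for f in skewed_features if f in X.columns]
        binary_set = set(binary_features or [])
        # Preserve the input order (a set difference would not), so the order of
        # the transformed columns is reproducible across runs
        features_to_transform = list(dict.fromkeys(f for f in skewed_in_data if f not in binary_set))
    else:
        features_to_transform = X.select_dtypes(include=np.number).columns.tolist()
    
    # High-kurtosis features present in the data.
    # Note: this list is not passed to the pipelines below, which standardize every numeric feature.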
    if high_kurtosis_features is not None:
        kurt_in_data = [f for f in high_kurtosis_features if f in X.columns]
        features_to_standardize = kurt_in_data.copy()
    else:
        features_to_standardize = []
    
    # Create preprocessing steps
    preprocessing_steps = [
        ('transformer', FeatureTransformer(features_to_transform=features_to_transform, 
                                             binary_features=binary_features)),
        ('standardizer', FeatureStandardizer(features_to_standardize=None))  # Standardizes all features
    ]
    
    if use_pca:
        preprocessing_steps.append(
            ('pca', PCATransformer(n_components=n_pca_components, variance_threshold=pca_variance))
        )
    
    # Create preprocessor pipeline
    preprocessing = Pipeline(preprocessing_steps)
    
    # Create preprocessing pipeline for neural network (first standardize then normalize)
    nn_preprocessing_steps = [
        ('transformer', FeatureTransformer(features_to_transform=features_to_transform, 
                                             binary_features=binary_features)),
        ('standardizer', FeatureStandardizer(features_to_standardize=None)),  # Standardize first
        ('normalizer', FeatureNormalizer())  # Then normalize
    ]
    
    if use_pca:
        nn_preprocessing_steps.append(
            ('pca', PCATransformer(n_components=n_pca_components, variance_threshold=pca_variance))
        )
    
    nn_preprocessing = Pipeline(nn_preprocessing_steps)
    
    # Fit preprocessor
    print("\nFitting regular preprocessor pipeline...")
    X_train_processed = preprocessing.fit_transform(X_train)
    X_test_processed = preprocessing.transform(X_test)
    
    print("\nFitting neural network preprocessor pipeline...")
    X_train_nn_processed = nn_preprocessing.fit_transform(X_train)
    X_test_nn_processed = nn_preprocessing.transform(X_test)
    
    print(f"Shape of training data after regular processing: {X_train_processed.shape}")
    print(f"Shape of test data after regular processing: {X_test_processed.shape}")
    print(f"Shape of training data after neural network processing: {X_train_nn_processed.shape}")
    print(f"Shape of test data after neural network processing: {X_test_nn_processed.shape}")
    
    # Apply SMOTE oversampling if enabled
    if use_smote:
        print("\nApplying SMOTE oversampling...")
        smote = SMOTE(random_state=42)
        X_train_resampled, y_train_resampled = smote.fit_resample(X_train_processed, y_train)
        X_train_nn_resampled, y_train_nn_resampled = smote.fit_resample(X_train_nn_processed, y_train)
        print(f"Shape of training data after resampling: {X_train_resampled.shape}")
        print(f"Class distribution after resampling: {np.bincount(y_train_resampled)}")
    else:
        X_train_resampled, y_train_resampled = X_train_processed, y_train
        X_train_nn_resampled, y_train_nn_resampled = X_train_nn_processed, y_train
    
    # Compute balanced class weights, keyed by the actual class labels
    # (identical to positional keys when the labels are 0..n_classes-1)
    unique_classes = np.unique(y_train)
    class_weights = compute_class_weight('balanced', classes=unique_classes, y=y_train)
    weight_dict = dict(zip(unique_classes.tolist(), class_weights.tolist()))
    print(f"Class weights: {weight_dict}")
    
    # Class weights for the neural network (same mapping)
    nn_class_weight = dict(weight_dict)
    
    # If using Bayesian Optimization, optimize XGBoost parameters
    if use_bayes_opt:
        print("\nUsing Bayesian Optimization to tune XGBoost parameters...")
        # Split training set into training and validation sets
        X_train_opt, X_val_opt, y_train_opt, y_val_opt = train_test_split(
            X_train_resampled, y_train_resampled, test_size=0.2, random_state=42, stratify=y_train_resampled
        )
        
        # Perform Bayesian Optimization
        best_params, _ = optimize_xgboost_params(
            X_train_opt, y_train_opt, X_val_opt, y_val_opt, n_iter=30
        )
        
        # Create XGBoost model using best parameters
        xgb_model = xgb.XGBClassifier(
            **best_params,
            objective='multi:softprob',
            num_class=len(np.unique(y)),
            random_state=42,
            n_jobs=-1,
            eval_metric=['mlogloss', 'merror']
        )
    else:
        # Use default parameters
        xgb_model = xgb.XGBClassifier(
            n_estimators=500,
            max_depth=6,
            learning_rate=0.01,
            gamma=0.1,
            subsample=0.8,
            colsample_bytree=0.8,
            colsample_bylevel=0.8,
            objective='multi:softprob',
            num_class=len(np.unique(y)),
            random_state=42,
            n_jobs=-1,
            reg_alpha=0.2,
            reg_lambda=1.5,
            # Note: scale_pos_weight should be a scalar value. For multiclass problems, you may omit this parameter or use class weights.
            tree_method='hist',
            eval_metric=['mlogloss', 'merror']
        )
    
    # Define models
    models = {
        "XGBoost": xgb_model,
        
        "LightGBM": lgb.LGBMClassifier(
            n_estimators=200, max_depth=8, learning_rate=0.05, num_leaves=31,
            subsample=0.8, colsample_bytree=0.8, reg_alpha=0.1, reg_lambda=0.1,
            objective='multiclass', num_class=len(np.unique(y)), 
            class_weight='balanced',
            random_state=42, n_jobs=-1
        ),
        
        "NeuralNetwork": NeuralNetworkClassifier(
            input_dim=X_train_nn_processed.shape[1],
            learning_rate=0.001,
            batch_size=128,
            epochs=30
        )
    }
    
    # Compare results
    results = {}
    
    # If using Stacking, create Stacking model
    if use_stacking:
        print("\nPreparing Stacking model...")
        
        # Prepare base models
        base_models = {
            "XGBoost": models["XGBoost"],
            "LightGBM": models["LightGBM"]
        }
        
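        # Create meta model (Logistic Regression). With the lbfgs solver, scikit-learn
        # fits a multinomial model for multiclass targets by default, so the
        # deprecated multi_class argument is omitted.
        from sklearn.linear_model import LogisticRegression
        meta_model = LogisticRegression(
            solver='lbfgs',
            max_iter=1000,
            C=1.0,
            random_state=42
        )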
        
        # Create Stacking Classifier
        stacking_model = StackingClassifier(
            base_models=base_models,
            meta_model=meta_model,
            n_folds=5,
            use_proba=True,
            random_state=42
        )
        


class FeatureNormalizer(BaseEstimator, TransformerMixin):
    def __init__(self, features_to_normalize=None, range=(0, 1)):
        self.features_to_normalize = features_to_normalize
        self.range = range
        self.normalizers_ = {}  # Store individual normalizers for each feature
        self.normalized_feature_names_ = []  # Store names of normalized features
    
    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Reset fitted state so that refitting does not accumulate stale entries
        self.normalizers_ = {}
        self.normalized_feature_names_ = []
        
        # If no features are specified, use all numeric features
        # (resolved locally so the constructor parameter is left untouched)
        features = self.features_to_normalize
        if features is None:
            features = X.select_dtypes(include=np.number).columns.tolist()
        
        # Fit one scaler per feature, with a progress bar
        for feature in tqdm(features, desc="Fitting normalization"):
            if feature in X.columns and np.issubdtype(X[feature].dtype, np.number):
                # Create an individual normalizer for each feature
                normalizer = MinMaxScaler(feature_range=self.range)
                normalizer.fit(X[[feature]])
                self.normalizers_[feature] = normalizer
                self.normalized_feature_names_.append(f"{feature}_norm")
                print(f"Normalization for {feature} fitted, range: [{self.range[0]}, {self.range[1]}]")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X.copy())
        
        # Add progress bar
        for feature, normalizer in tqdm(self.normalizers_.items(), desc="Applying normalization"):
            if feature in X.columns:
                # Apply saved normalizer for normalization
                X[f"{feature}_norm"] = normalizer.transform(X[[feature]])
        
        return X
    
    def get_feature_names_out(self):
        # Return normalized feature names
        return self.normalized_feature_names_
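

# --- Illustrative sketch (not part of the original pipeline) ---
# FeatureNormalizer above delegates to MinMaxScaler. The arithmetic behind it is
# x' = (x - min) / (max - min) * (hi - lo) + lo, written out below with NumPy.
# The function name is an assumption made for this example; a constant feature
# is mapped to the lower bound, which is also what MinMaxScaler produces.
import numpy as np


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale `values` linearly so that min -> lo and max -> hi."""
    values = np.asarray(values, dtype=float)
    lo, hi = feature_range
    v_min, v_max = values.min(), values.max()
    span = v_max - v_min
    if span == 0:
        # Constant feature: avoid dividing by zero
        return np.full_like(values, lo)
    return (values - v_min) / span * (hi - lo) + lo


# Usage: min_max_scale([2, 4, 6]) maps the values to 0.0, 0.5 and 1.0.
# In the pipeline the min and max come from the training split only, so test
# values can fall outside the target range.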


class StackingClassifier:
    
    def __init__(self, base_models, meta_model, n_folds=5, use_proba=True, random_state=42):
        """
        Initialize Stacking Classifier
        
        Parameters:
        base_models: Dictionary of base models {name: model}
        meta_model: Meta model
        n_folds: Number of cross-validation folds
        use_proba: Whether to use probability predictions as meta features
        random_state: Random seed
        """
        self.base_models = base_models
        self.meta_model = meta_model
        self.n_folds = n_folds
        self.use_proba = use_proba
        self.random_state = random_state
        self.base_model_preds = {}  # Store predictions from base models
        
    def fit(self, X, y):
        """
        Train the stacking model
        
        Parameters:
        X: Training features
        y: Training labels
        """
        from sklearn.model_selection import StratifiedKFold
        
        # Create cross-validation object
        kf = StratifiedKFold(n_splits=self.n_folds, shuffle=True, random_state=self.random_state)
        
        # Store meta features
        meta_features = {}
        
        # Perform cross-validation predictions for each base model
        for name, model in self.base_models.items():
            print(f"\nTraining base model: {name}")
            
            # Initialize an array to store predictions
            if self.use_proba:
                n_classes = len(np.unique(y))
                train_meta_preds = np.zeros((X.shape[0], n_classes))
            else:
                train_meta_preds = np.zeros(X.shape[0])
            
            # Perform K-fold cross-validation
            for i, (train_idx, val_idx) in enumerate(kf.split(X, y)):
                print(f"  Fold {i+1}/{self.n_folds}")
                
                if isinstance(X, pd.DataFrame):
                    X_train_fold, X_val_fold = X.iloc[train_idx], X.iloc[val_idx]
                else:
                    X_train_fold, X_val_fold = X[train_idx], X[val_idx]

                if isinstance(y, pd.Series):
                    y_train_fold, y_val_fold = y.iloc[train_idx], y.iloc[val_idx]
                else:
                    y_train_fold, y_val_fold = y[train_idx], y[val_idx]
                
                # Train the model
                model.fit(X_train_fold, y_train_fold)
                
                # Generate out-of-fold predictions and store them at the validation rows
                if self.use_proba:
                    fold_preds = model.predict_proba(X_val_fold)
                else:
                    fold_preds = model.predict(X_val_fold)
                train_meta_preds[val_idx] = fold_preds
            
            meta_features[name] = train_meta_preds
            
            # Retrain the model on the entire dataset
            print(f"  Training {name} on the entire data")
            model.fit(X, y)
        
        # Prepare training data for meta model
        meta_X = self._prepare_meta_features(meta_features)
        
        # Train the meta model
        print("\nTraining meta model...")
        self.meta_model.fit(meta_X, y)
        
        # Save base models' predictions
        self.base_model_preds = meta_features
        
        return self
    
    def predict(self, X):
        """
        Use the trained stacking model to predict
        
        Parameters:
        X: Test features
        
        Returns:
        Predicted labels
        """
        meta_features = {}
        
        # Generate predictions using each base model
        for name, model in self.base_models.items():
            if self.use_proba:
                preds = model.predict_proba(X)
            else:
                preds = model.predict(X)
            meta_features[name] = preds
        
        # Prepare meta features for meta model
        meta_X = self._prepare_meta_features(meta_features)
        
        # Final prediction using meta model
        return self.meta_model.predict(meta_X)
    
    def predict_proba(self, X):
        """
        Use the trained stacking model to predict probabilities
        
        Parameters:
        X: Test features
        
        Returns:
        Predicted probabilities
        """
        meta_features = {}
        
        # Generate predictions using each base model
        for name, model in self.base_models.items():
            if self.use_proba:
                preds = model.predict_proba(X)
            else:
                preds = model.predict(X)
            meta_features[name] = preds
        
        # Prepare meta features for meta model
        meta_X = self._prepare_meta_features(meta_features)
        
        # Final probability prediction using meta model
        return self.meta_model.predict_proba(meta_X)
    
    def _prepare_meta_features(self, meta_features):
        """
        Prepare meta features
        
        Parameters:
        meta_features: Dictionary of predictions from base models
        
        Returns:
        Combined meta features array
        """
        # Combine predictions from all base models into one array
        all_features = []
        
        for name, preds in meta_features.items():
            # Ensure predictions are 2D arrays
            if preds.ndim == 1:
                preds = preds.reshape(-1, 1)
            all_features.append(preds)
        
        # Concatenate all features horizontally
        return np.hstack(all_features)
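

# --- Illustrative sketch (not part of the original pipeline) ---
# StackingClassifier.fit above trains the meta model on out-of-fold predictions:
# every row is predicted by a model that never saw that row during training.
# The helper below shows the same idea with NumPy only. It uses plain contiguous
# folds instead of StratifiedKFold, and the `fit_predict` callable and function
# name are assumptions made for this example.
import numpy as np


def out_of_fold_predictions(fit_predict, X, y, n_folds=3):
    """fit_predict(X_train, y_train, X_val) must return predictions for X_val."""
    X = np.asarray(X)
    y = np.asarray(y)
    indices = np.arange(len(X))
    oof = np.empty(len(X), dtype=float)
    for val_idx in np.array_split(indices, n_folds):
        train_idx = np.setdiff1d(indices, val_idx)
        # Train on the other folds, predict only the held-out rows
        oof[val_idx] = fit_predict(X[train_idx], y[train_idx], X[val_idx])
    return oof


def mean_predictor(X_train, y_train, X_val):
    """Toy model: predict the training-fold mean of y for every validation row."""
    return np.full(len(X_val), y_train.mean())


# Usage: out_of_fold_predictions(mean_predictor, X, y) returns one prediction per
# row, each computed without access to that row's own label.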


# Custom transformer: Feature transformation (Yeo-Johnson)
class FeatureTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, features_to_transform=None, binary_features=None, method='yeo-johnson'):
        self.features_to_transform = features_to_transform
        self.binary_features = binary_features if binary_features is not None else []
        self.method = method
        self.lambdas_ = {}  # Store lambda values for each feature
        self.transformed_feature_names_ = []  # Store transformed feature names
    
    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Reset fitted state so that refitting does not accumulate stale entries
        self.lambdas_ = {}
        self.transformed_feature_names_ = []
        
        # If no features specified, use all numeric features
        # (resolved locally so the constructor parameter is left untouched)
        features = self.features_to_transform
        if features is None:
            features = X.select_dtypes(include=np.number).columns.tolist()
        
        # Exclude binary features, preserving order so that the order of the
        # transformed columns is reproducible across runs
        binary_set = set(self.binary_features or [])
        features_to_process = list(dict.fromkeys(f for f in features if f not in binary_set))
        
        # Add progress bar
        for feature in tqdm(features_to_process, desc="Fitting transformations"):
            if feature not in X.columns:
                continue
                
            if not np.issubdtype(X[feature].dtype, np.number):
                continue
            
            # For Yeo-Johnson transformation, save lambda values
            if self.method == 'yeo-johnson':
                try:
                    # Estimate lambda by maximum likelihood and save it for transform()
                    _, lmbda = stats.yeojohnson(X[feature])
                    self.lambdas_[feature] = lmbda
                    self.transformed_feature_names_.append(f"{feature}_yeojohnson")
                    print(f"Fitted yeo-johnson transformation for {feature} with lambda={lmbda:.4f}")
                except Exception as e:
                    print(f"Error fitting {self.method} transformation to {feature}: {str(e)}")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X.copy())
        
        # Add progress bar
        for feature, lmbda in tqdm(self.lambdas_.items(), desc="Applying transformations"):
            if feature in X.columns:
                try:
                    # Apply transformation using saved lambda value
                    X[f"{feature}_yeojohnson"] = stats.yeojohnson(X[feature], lmbda=lmbda)
                except Exception as e:
                    print(f"Error applying {self.method} transformation to {feature}: {str(e)}")
        
        return X
    
    def get_feature_names_out(self):
        # Return transformed feature names
        return self.transformed_feature_names_
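

# --- Illustrative sketch (not part of the original pipeline) ---
# FeatureTransformer above stores one lambda per feature and reapplies it with
# scipy.stats.yeojohnson. For reference, the Yeo-Johnson transform for a fixed
# lambda can be written directly in NumPy as below. The function name is an
# assumption made for this example; the pipeline itself keeps using SciPy.
import numpy as np


def yeo_johnson_fixed(x, lmbda):
    """Apply the Yeo-Johnson transform with a given lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative branch: Box-Cox of (x + 1)
    if abs(lmbda) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    # Negative branch: mirrored Box-Cox of (1 - x) with parameter 2 - lambda
    if abs(lmbda - 2.0) > 1e-12:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda))
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


# Usage: with lmbda=1 the transform is the identity, which is a quick sanity
# check; other lambdas compress or stretch the tails to reduce skew.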


# Custom transformer: Feature standardization
class FeatureStandardizer(BaseEstimator, TransformerMixin):
    def __init__(self, features_to_standardize=None):
        self.features_to_standardize = features_to_standardize
        self.scalers_ = {}  # Store a separate scaler for each feature
        self.standardized_feature_names_ = []  # Store standardized feature names
    
    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Reset fitted state so that refitting does not accumulate stale entries
        self.scalers_ = {}
        self.standardized_feature_names_ = []
        
        # If no features specified, use all numeric features
        # (resolved locally so the constructor parameter is left untouched)
        features = self.features_to_standardize
        if features is None:
            features = X.select_dtypes(include=np.number).columns.tolist()
        
        # Fit one scaler per feature, with a progress bar
        for feature in tqdm(features, desc="Fitting standardization"):
            if feature in X.columns and np.issubdtype(X[feature].dtype, np.number):
                # Create a separate scaler for each feature
                scaler = StandardScaler()
                scaler.fit(X[[feature]])
                self.scalers_[feature] = scaler
                self.standardized_feature_names_.append(f"{feature}_std")
                print(f"Fitted standardization for {feature} with mean={scaler.mean_[0]:.4f}, scale={scaler.scale_[0]:.4f}")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X.copy())
        
        # Add progress bar
        for feature, scaler in tqdm(self.scalers_.items(), desc="Applying standardization"):
            if feature in X.columns:
                # Apply standardization using saved scaler
                X[f"{feature}_std"] = scaler.transform(X[[feature]])
        
        return X
    
    def get_feature_names_out(self):
        # Return standardized feature names
        return self.standardized_feature_names_
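

# --- Illustrative sketch (not part of the original pipeline) ---
# FeatureStandardizer above fits a StandardScaler on the training split and
# reuses it on the test split. The two helpers below spell out that fit/apply
# separation with NumPy. Their names are assumptions made for this example.
# The population standard deviation (ddof=0) is used, as StandardScaler does,
# and a zero-variance feature keeps a scale of 1.
import numpy as np


def fit_z_score(train_values):
    """Return (mean, scale) estimated from the training values only."""
    train_values = np.asarray(train_values, dtype=float)
    mean = train_values.mean()
    scale = train_values.std()
    if scale == 0:
        scale = 1.0
    return mean, scale


def apply_z_score(values, mean, scale):
    """Standardize `values` with previously fitted statistics."""
    return (np.asarray(values, dtype=float) - mean) / scale


# Usage: mean, scale = fit_z_score(train_column), then
# apply_z_score(test_column, mean, scale), so no test statistics leak into
# the transformation.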


# Feature selector: Select specified feature columns
class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, feature_names=None):
        self.feature_names = feature_names
    
    def fit(self, X, y=None):
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Select specified features
        if self.feature_names:
            # Check which features exist in the data
            available_features = [f for f in self.feature_names if f in X.columns]
            if len(available_features) < len(self.feature_names):
                missing = set(self.feature_names) - set(available_features)
                print(f"Warning: Some features are missing: {missing}")
            
            return X[available_features]
        else:
            return X


def evaluate_model(y_true, y_pred, dataset_name=""):
    """
    Evaluate model performance with confusion matrix and metrics
    
    Parameters:
    y_true: True labels
    y_pred: Predicted labels
    dataset_name: Name of the dataset for display purposes
    """
    # Calculate accuracy
    accuracy = accuracy_score(y_true, y_pred)
    
    # Print results
    print(f"\nModel Evaluation on {dataset_name}:")
    print(f"Accuracy: {accuracy:.4f}")
    
    # Get unique classes
    classes = np.unique(np.concatenate([y_true, y_pred]))
    
    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred, labels=classes)
    
    # Generate classification report; it stays None if generation fails,
    # so the return statement below never references an undefined name
    report = None
    try:
        report = classification_report(y_true, y_pred, labels=classes, output_dict=False)
        print("\nClassification Report:")
        print(report)
    except Exception as e:
        print(f"Error generating classification report: {str(e)}")
    # Attempt to plot confusion matrix
    try:
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                    xticklabels=classes, 
                    yticklabels=classes)
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.title(f'Confusion Matrix - {dataset_name}')
        plt.tight_layout()
        plt.savefig(f'confusion_matrix_{dataset_name.replace(" ", "_")}.png')
        print(f"Confusion matrix saved to confusion_matrix_{dataset_name.replace(' ', '_')}.png")
        plt.close()  # Close the plot to avoid display issues
    except Exception as e:
        print(f"Error plotting confusion matrix: {str(e)}")
        # Print confusion matrix as fallback
        print("\nConfusion Matrix:")
        print(cm)
    
    return accuracy, cm, report


class PCATransformer(BaseEstimator, TransformerMixin):
    def __init__(self, n_components=None, variance_threshold=0.95):
        self.n_components = n_components
        self.variance_threshold = variance_threshold
        self.pca_ = None
        self.feature_names_ = []
    
    def fit(self, X, y=None):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Resolve the number of components without overwriting the constructor
        # parameter, so that refitting re-applies the variance threshold
        n_components = self.n_components
        if n_components is None:
            # First, fit a full PCA
            temp_pca = PCA()
            temp_pca.fit(X)
            
            # Smallest number of components whose cumulative explained variance
            # reaches the threshold, capped at the number available in case
            # rounding keeps the cumulative sum just below the threshold
            cumulative_variance = np.cumsum(temp_pca.explained_variance_ratio_)
            n_components = int(np.searchsorted(cumulative_variance, self.variance_threshold) + 1)
            n_components = min(n_components, len(cumulative_variance))
            print(f"Selected {n_components} components to explain {self.variance_threshold*100:.1f}% of variance")
        
        # Create and fit PCA
        self.pca_ = PCA(n_components=n_components)
        self.pca_.fit(X)
        
        # Create feature names from the number of components actually fitted
        self.feature_names_ = [f"PC{i+1}" for i in range(self.pca_.n_components_)]
        
        # Print explained variance ratios
        explained_variance = self.pca_.explained_variance_ratio_
        print(f"Top 5 components explain: {explained_variance[:5].sum()*100:.2f}% of variance")
        print(f"All {self.pca_.n_components_} components explain: {explained_variance.sum()*100:.2f}% of variance")
        
        return self
    
    def transform(self, X):
        # Ensure X is a DataFrame
        X = pd.DataFrame(X)
        
        # Apply PCA transformation
        X_pca = self.pca_.transform(X)
        
        # Convert to DataFrame and add column names
        X_pca_df = pd.DataFrame(X_pca, columns=self.feature_names_, index=X.index)
        
        return X_pca_df
    
    def get_feature_names_out(self):
        return self.feature_names_
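

# --- Illustrative sketch (not part of the original pipeline) ---
# PCATransformer.fit above keeps the smallest number of components whose
# cumulative explained variance reaches `variance_threshold`. The helper below
# isolates that selection rule so it can be tested on a hand-written variance
# profile. The function name is an assumption made for this example, and
# np.searchsorted is one way to express the rule, with a cap for thresholds
# that are never reached.
import numpy as np


def components_for_variance(explained_variance_ratio, threshold=0.95):
    """Return how many leading components are needed to reach `threshold`."""
    cumulative = np.cumsum(explained_variance_ratio)
    # Index of the first cumulative value >= threshold, converted to a count
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(cumulative))


# Usage: for ratios [0.5, 0.25, 0.125, 0.125] the cumulative sums are
# 0.5, 0.75, 0.875 and 1.0, so a threshold of 0.8 needs three components.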


class TrainingVisualizationCallback(Callback):
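    def __init__(self, validation_data=None):
        super().__init__()
        self.validation_data = validation_data
        self.train_loss = []
        self.train_acc = []
        self.val_loss = []
        self.val_acc = []
        
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.train_loss.append(logs.get('loss'))
        self.train_acc.append(logs.get('accuracy'))
        
        if self.validation_data:
            if 'val_loss' in logs and 'val_accuracy' in logs:
                # Reuse the validation metrics Keras already computed for the
                # validation data passed to fit(), avoiding a second evaluation
                self.val_loss.append(logs['val_loss'])
                self.val_acc.append(logs['val_accuracy'])
            else:
                # Fall back to an explicit evaluation on the stored validation data
                val_logs = self.model.evaluate(
                    self.validation_data[0], self.validation_data[1], 
                    verbose=0
                )
                self.val_loss.append(val_logs[0])
                self.val_acc.append(val_logs[1])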
    
    def plot_metrics(self):
        epochs = range(1, len(self.train_loss) + 1)
        
        # Create a 2x1 subplot layout
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
        
        # Plot loss curves
        ax1.plot(epochs, self.train_loss, 'r-', label='Training')
        if self.validation_data:
            ax1.plot(epochs, self.val_loss, 'b-', label='Validation')
        ax1.set_title('Model Loss')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Loss')
        ax1.legend()
        ax1.grid(True)
        
        # Plot accuracy curves
        ax2.plot(epochs, self.train_acc, 'r-', label='Training')
        if self.validation_data:
            ax2.plot(epochs, self.val_acc, 'b-', label='Validation')
        ax2.set_title('Model Accuracy')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Accuracy')
        ax2.legend()
        ax2.grid(True)
        
        plt.tight_layout()
        plt.savefig('training_metrics.png')
        plt.close()
        print("Training metrics plot saved to training_metrics.png")
        
        # Calculate error rate and plot
        train_error = [1 - acc for acc in self.train_acc]
        if self.validation_data:
            val_error = [1 - acc for acc in self.val_acc]
            
            plt.figure(figsize=(12, 5))
            plt.plot(epochs, train_error, 'r-', label='Training')
            plt.plot(epochs, val_error, 'b-', label='Validation')
            plt.title('Model Error Rate')
            plt.xlabel('Epoch')
            plt.ylabel('Error Rate')
            plt.legend()
            plt.grid(True)
            plt.tight_layout()
            plt.savefig('error_rate.png')
            plt.close()
            print("Error rate plot saved to error_rate.png")

class NeuralNetworkClassifier(BaseEstimator, TransformerMixin):
    def __init__(self, input_dim=200, epochs=15, batch_size=64, learning_rate=0.001):
        self.input_dim = input_dim
        self.epochs = epochs
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.model = None
        self.label_encoder = LabelEncoder()
        self.history = None
        self.visualization_callback = None
        
    def build_model(self, input_shape, n_classes):
        model = Sequential([
            # First layer
            Dense(128, activation='relu', input_shape=(input_shape,),
                  kernel_regularizer=tf.keras.regularizers.l2(0.001)),
            BatchNormalization(),
            Dropout(0.3),
            
            # Second layer
            Dense(64, activation='relu',
                  kernel_regularizer=tf.keras.regularizers.l2(0.001)),
            BatchNormalization(),
            Dropout(0.3),
            
            # Third layer
            Dense(32, activation='relu',
                  kernel_regularizer=tf.keras.regularizers.l2(0.001)),
            BatchNormalization(),
            Dropout(0.3),
            
            # Output layer
            Dense(n_classes, activation='softmax')
        ])
        
        model.compile(
            optimizer=Adam(learning_rate=self.learning_rate),
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )
        
        return model
    
    def fit(self, X, y, X_val=None, y_val=None):
        # Ensure X is a numpy array
        if isinstance(X, pd.DataFrame):
            X = X.values
        
        # Encode labels
        y_encoded = self.label_encoder.fit_transform(y)
        n_classes = len(np.unique(y_encoded))
        
        # Convert to one-hot encoding (fix the width so every class gets a column)
        y_onehot = to_categorical(y_encoded, num_classes=n_classes)
        
        # Prepare validation data
        validation_data = None
        if X_val is not None and y_val is not None:
            if isinstance(X_val, pd.DataFrame):
                X_val = X_val.values
            y_val_encoded = self.label_encoder.transform(y_val)
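            # Use the training class count so the shape matches the output layer
            # even if the validation split is missing the highest class
            y_val_onehot = to_categorical(y_val_encoded, num_classes=n_classes)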
            validation_data = (X_val, y_val_onehot)
        
        # Compute class weights
        y_integers = np.argmax(y_onehot, axis=1)
        class_weights = compute_class_weight('balanced', classes=np.unique(y_integers), y=y_integers)
        class_weight_dict = {i: weight for i, weight in enumerate(class_weights)}
        
        # Build model
        self.model = self.build_model(X.shape[1], n_classes)
        
        # Create visualization callback
        self.visualization_callback = TrainingVisualizationCallback(validation_data)
        
        # Add callbacks
        callbacks = [
            # Early stopping
            tf.keras.callbacks.EarlyStopping(
                monitor='loss',
                patience=5,
                restore_best_weights=True
            ),
            # Learning rate scheduler
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='loss',
                factor=0.5,
                patience=3,
                min_lr=1e-6
            ),
            # Visualization callback
            self.visualization_callback
        ]
        
        # Train model
        self.history = self.model.fit(
            X, y_onehot,
            epochs=self.epochs,
            batch_size=self.batch_size,
            verbose=1,
            callbacks=callbacks,
            class_weight=class_weight_dict,
            validation_data=validation_data
        )
        
        # Plot training metrics
        self.visualization_callback.plot_metrics()
        
        # Plot training history
        self.plot_training_history()
        
        return self
    
    def plot_training_history(self):
        """Plot training history curves"""
        history = self.history
        
        # Create a 2x1 subplot layout
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
        
        # Plot loss curve
        ax1.plot(history.history['loss'], 'r-', label='Training Loss')
        if 'val_loss' in history.history:
            ax1.plot(history.history['val_loss'], 'b-', label='Validation Loss')
        ax1.set_title('Model Loss')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Loss')
        ax1.legend()
        ax1.grid(True)
        
        # Plot accuracy curve
        ax2.plot(history.history['accuracy'], 'r-', label='Training Accuracy')
        if 'val_accuracy' in history.history:
            ax2.plot(history.history['val_accuracy'], 'b-', label='Validation Accuracy')
        ax2.set_title('Model Accuracy')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Accuracy')
        ax2.legend()
        ax2.grid(True)
        
        plt.tight_layout()
        plt.savefig('nn_training_history.png')
        plt.close()
        print("Neural network training history saved to nn_training_history.png")
    
    def predict(self, X):
        # Ensure X is a numpy array
        if isinstance(X, pd.DataFrame):
            X = X.values
        
        # Predict probabilities
        y_pred_probs = self.model.predict(X, verbose=0)
        
        # Adjust the threshold for class 1 to improve recall
        adjusted_probs = y_pred_probs.copy()
        adjusted_probs[:, 1] = adjusted_probs[:, 1] * 1.15  # Increase probability for class 1
        
        # Convert to class labels
        y_pred = np.argmax(adjusted_probs, axis=1)
        
        # Convert back to original labels
        return self.label_encoder.inverse_transform(y_pred)
    
    def predict_proba(self, X):
        # Ensure X is a numpy array
        if isinstance(X, pd.DataFrame):
            X = X.values
        
        # Predict probabilities
        return self.model.predict(X, verbose=0)        


def train_advanced_models(train_df, target_column, skewed_features=None, binary_features=None, 
                          high_kurtosis_features=None, use_pca=False, use_smote=True,
                          n_pca_components=None, pca_variance=0.95, use_bayes_opt=False,
                          use_stacking=False):
    """
    Train advanced models and evaluate performance
    
    Parameters:
    train_df: Training data DataFrame
    target_column: Target variable column name
    skewed_features: List of skewed features
    binary_features: List of binary features
    high_kurtosis_features: List of high kurtosis features
    use_pca: Whether to use PCA
    use_smote: Whether to use SMOTE oversampling
    n_pca_components: Number of PCA components
    pca_variance: PCA variance threshold
    use_bayes_opt: Whether to use Bayesian Optimization for XGBoost parameters
    use_stacking: Whether to use Stacking ensemble method
    
    Returns:
    Best model, results dictionary, preprocessor
    """
    train_df = train_df.copy()
    train_df.columns = train_df.columns.astype(str)
    
    # Convert features to string if needed
    if skewed_features is not None:
        skewed_features = [str(f) for f in skewed_features]
    
    if binary_features is not None:
        binary_features = [str(f) for f in binary_features]
    
    if high_kurtosis_features is not None:
        high_kurtosis_features = [str(f) for f in high_kurtosis_features]
    
    
    print("=" * 50)
    print("ADVANCED MODEL TRAINING")
    print("=" * 50)
    print(f"Training data shape: {train_df.shape}")
    print(f"Target column: {target_column}")
    print(f"Using PCA: {use_pca}")
    print(f"Using SMOTE: {use_smote}")
    print(f"Using Bayes Optimization: {use_bayes_opt}")
    print(f"Using Stacking: {use_stacking}")
    
    
    X = train_df.drop(columns=[target_column])
    y = train_df[target_column]
    
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    print(f"Training set shape: {X_train.shape}, Test set shape: {X_test.shape}")
    print(f"Class distribution in training set: {np.bincount(y_train)}")
    print(f"Class distribution in test set: {np.bincount(y_test)}")
    
    
    if skewed_features is not None:
        skewed_in_data = [f for f in skewed_features if f in X.columns]
        features_to_transform = list(set(skewed_in_data) - set(binary_features or []))
    else:
        features_to_transform = X.select_dtypes(include=np.number).columns.tolist()
    
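    # Features to standardize (high-kurtosis subset).
    # Note: this list is not passed on; both pipelines below use
    # features_to_standardize=None, which standardizes all features.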
    if high_kurtosis_features is not None:
        kurt_in_data = [f for f in high_kurtosis_features if f in X.columns]
        features_to_standardize = kurt_in_data.copy()
    else:
        features_to_standardize = []
    
    # Create preprocessing steps
    preprocessing_steps = [
        ('transformer', FeatureTransformer(features_to_transform=features_to_transform, 
                                             binary_features=binary_features)),
        ('standardizer', FeatureStandardizer(features_to_standardize=None))  # Standardizes all features
    ]
    
    if use_pca:
        preprocessing_steps.append(
            ('pca', PCATransformer(n_components=n_pca_components, variance_threshold=pca_variance))
        )
    
    # Create preprocessor pipeline
    preprocessing = Pipeline(preprocessing_steps)
    
    # Create preprocessing pipeline for neural network (first standardize then normalize)
    nn_preprocessing_steps = [
        ('transformer', FeatureTransformer(features_to_transform=features_to_transform, 
                                             binary_features=binary_features)),
        ('standardizer', FeatureStandardizer(features_to_standardize=None)),  # Standardize first
        ('normalizer', FeatureNormalizer())  # Then normalize
    ]
    
    if use_pca:
        nn_preprocessing_steps.append(
            ('pca', PCATransformer(n_components=n_pca_components, variance_threshold=pca_variance))
        )
    
    nn_preprocessing = Pipeline(nn_preprocessing_steps)
    
    # Fit preprocessor
    print("\nFitting regular preprocessor pipeline...")
    X_train_processed = preprocessing.fit_transform(X_train)
    X_test_processed = preprocessing.transform(X_test)
    
    print("\nFitting neural network preprocessor pipeline...")
    X_train_nn_processed = nn_preprocessing.fit_transform(X_train)
    X_test_nn_processed = nn_preprocessing.transform(X_test)
    
    print(f"Shape of training data after regular processing: {X_train_processed.shape}")
    print(f"Shape of test data after regular processing: {X_test_processed.shape}")
    print(f"Shape of training data after neural network processing: {X_train_nn_processed.shape}")
    print(f"Shape of test data after neural network processing: {X_test_nn_processed.shape}")
    
    # Apply SMOTE oversampling if enabled
    if use_smote:
        print("\nApplying SMOTE oversampling...")
        smote = SMOTE(random_state=42)
        X_train_resampled, y_train_resampled = smote.fit_resample(X_train_processed, y_train)
        X_train_nn_resampled, y_train_nn_resampled = smote.fit_resample(X_train_nn_processed, y_train)
        print(f"Shape of training data after resampling: {X_train_resampled.shape}")
        print(f"Class distribution after resampling: {np.bincount(y_train_resampled)}")
    else:
        X_train_resampled, y_train_resampled = X_train_processed, y_train
        X_train_nn_resampled, y_train_nn_resampled = X_train_nn_processed, y_train
    
    # Compute class weights
    class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
    weight_dict = {i: weight for i, weight in enumerate(class_weights)}
    print(f"Class weights: {weight_dict}")
    
    # Prepare class weights for neural network
    nn_class_weight = {i: weight for i, weight in enumerate(class_weights)}
    
    # If using Bayesian Optimization, optimize XGBoost parameters
    if use_bayes_opt:
        print("\nUsing Bayesian Optimization to tune XGBoost parameters...")
        # Split training set into training and validation sets
        X_train_opt, X_val_opt, y_train_opt, y_val_opt = train_test_split(
            X_train_resampled, y_train_resampled, test_size=0.2, random_state=42, stratify=y_train_resampled
        )
        
        # Perform Bayesian Optimization
        best_params, _ = optimize_xgboost_params(
            X_train_opt, y_train_opt, X_val_opt, y_val_opt, n_iter=30
        )
        
        # Create XGBoost model using best parameters
        xgb_model = xgb.XGBClassifier(
            **best_params,
            objective='multi:softprob',
            num_class=len(np.unique(y)),
            random_state=42,
            n_jobs=-1,
            eval_metric=['mlogloss', 'merror']
        )
    else:
        # Use default parameters
        xgb_model = xgb.XGBClassifier(
            n_estimators=500,
            max_depth=6,
            learning_rate=0.01,
            gamma=0.1,
            subsample=0.8,
            colsample_bytree=0.8,
            colsample_bylevel=0.8,
            objective='multi:softprob',
            num_class=len(np.unique(y)),
            random_state=42,
            n_jobs=-1,
            reg_alpha=0.2,
            reg_lambda=1.5,
            # Note: scale_pos_weight should be a scalar value. For multiclass problems, you may omit this parameter or use class weights.
            tree_method='hist',
            eval_metric=['mlogloss', 'merror']
        )
    
    # Define models
    models = {
        "XGBoost": xgb_model,
        
        "LightGBM": lgb.LGBMClassifier(
            n_estimators=200, max_depth=8, learning_rate=0.05, num_leaves=31,
            subsample=0.8, colsample_bytree=0.8, reg_alpha=0.1, reg_lambda=0.1,
            objective='multiclass', num_class=len(np.unique(y)), 
            class_weight='balanced',
            random_state=42, n_jobs=-1
        ),
        
        "NeuralNetwork": NeuralNetworkClassifier(
            input_dim=X_train_nn_processed.shape[1],
            learning_rate=0.001,
            batch_size=128,
            epochs=30
        )
    }
    
    # Compare results
    results = {}
    
    # If using Stacking, create Stacking model
    if use_stacking:
        print("\nPreparing Stacking model...")
        
        # Prepare base models
        base_models = {
            "XGBoost": models["XGBoost"],
            "LightGBM": models["LightGBM"]
        }
        
        # Create meta model (using Logistic Regression)
        from sklearn.linear_model import LogisticRegression
        meta_model = LogisticRegression(
            multi_class='multinomial',
            solver='lbfgs',
            max_iter=1000,
            C=1.0,
            random_state=42
        )
        
        # Create Stacking Classifier
        stacking_model = StackingClassifier(
            base_models=base_models,
            meta_model=meta_model,
            n_folds=5,
            use_proba=True,
            random_state=42
        )
        
        # Add to models dictionary
        models["Stacking"] = stacking_model
    
    # Train and evaluate models
    for name, model in models.items():
        print(f"\nTraining {name}...")
        
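        # For XGBoost, add early stopping and evaluation set
        if name == "XGBoost":
            # Pair the resampled features with the resampled labels: after SMOTE,
            # X_train_processed and y_train_resampled no longer have the same length.
            eval_set = [(X_train_resampled, y_train_resampled), (X_test_processed, y_test)]
            # Note: newer XGBoost releases expect early_stopping_rounds in the
            # XGBClassifier constructor instead of fit().
            model.fit(
                X_train_resampled, y_train_resampled,
                eval_set=eval_set,
                early_stopping_rounds=20,
                verbose=True
            )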
            
            # Plot XGBoost training process
            results_df = pd.DataFrame({
                'train_mlogloss': model.evals_result()['validation_0']['mlogloss'],
                'test_mlogloss': model.evals_result()['validation_1']['mlogloss'],
                'train_merror': model.evals_result()['validation_0']['merror'],
                'test_merror': model.evals_result()['validation_1']['merror']
            })
            
            # Plot loss curves
            plt.figure(figsize=(12, 5))
            plt.subplot(1, 2, 1)
            plt.plot(results_df['train_mlogloss'], 'r-', label='Training')
            plt.plot(results_df['test_mlogloss'], 'b-', label='Test')
            plt.title('XGBoost Log Loss')
            plt.xlabel('Boosting Round')
            plt.ylabel('Log Loss')
            plt.legend()
            plt.grid(True)
            
            # Plot error rate curves
            plt.subplot(1, 2, 2)
            plt.plot(results_df['train_merror'], 'r-', label='Training')
            plt.plot(results_df['test_merror'], 'b-', label='Test')
            plt.title('XGBoost Error Rate')
            plt.xlabel('Boosting Round')
            plt.ylabel('Error Rate')
            plt.legend()
            plt.grid(True)
            
            plt.tight_layout()
            plt.savefig('xgboost_training_metrics.png')
            plt.close()
            print("XGBoost training metrics saved to xgboost_training_metrics.png")
            
            # Plot feature importance
            if not use_pca:  # Only plot feature importance if PCA is not used
                feature_names = X_train_processed.columns.tolist()
                importance_df = plot_feature_importance(model, feature_names, top_n=20, model_name="XGBoost")
                print("Top 20 XGBoost important features:")
                print(importance_df.head(20))
            
        elif name == "LightGBM":
            model.fit(X_train_resampled, y_train_resampled)
            
            # Plot feature importance
            if not use_pca:  # Only plot feature importance if PCA is not used
                feature_names = X_train_processed.columns.tolist()
                importance_df = plot_feature_importance(model, feature_names, top_n=20, model_name="LightGBM")
                print("Top 20 LightGBM important features:")
                print(importance_df.head(20))
                
        elif name == "NeuralNetwork":
            # Train neural network using specially preprocessed data
            model.fit(X_train_nn_resampled, y_train_nn_resampled, X_test_nn_processed, y_test)
        
        elif name == "Stacking":
            # Reset pandas indices so rows and labels stay positionally aligned
            if isinstance(X_train_resampled, pd.DataFrame):
                X_train_resampled = X_train_resampled.reset_index(drop=True)
            if isinstance(y_train_resampled, pd.Series):
                y_train_resampled = y_train_resampled.reset_index(drop=True)

            # Fit on a plain numpy feature array
            X_train_np = X_train_resampled.values if isinstance(X_train_resampled, pd.DataFrame) else X_train_resampled
            model.fit(X_train_np, y_train_resampled)
        
        # Evaluate on training set - using corresponding preprocessed data
        if name == "NeuralNetwork":
            train_preds = model.predict(X_train_nn_processed)
            test_preds = model.predict(X_test_nn_processed)
        else:
            train_preds = model.predict(X_train_processed)
            test_preds = model.predict(X_test_processed)
            
        train_acc = accuracy_score(y_train, train_preds)
        train_balanced_acc = balanced_accuracy_score(y_train, train_preds)
        train_f1_macro = f1_score(y_train, train_preds, average='macro')
        
        # Evaluate on test set
        test_acc = accuracy_score(y_test, test_preds)
        test_balanced_acc = balanced_accuracy_score(y_test, test_preds)
        test_f1_macro = f1_score(y_test, test_preds, average='macro')
        
        # Save results
        results[name] = {
            'train_acc': train_acc,
            'train_balanced_acc': train_balanced_acc,
            'train_f1_macro': train_f1_macro,
            'test_acc': test_acc,
            'test_balanced_acc': test_balanced_acc,
            'test_f1_macro': test_f1_macro,
            'model': model,
            'test_report': classification_report(y_test, test_preds)
        }
        
        # Print results
        print(f"{name} Results:")
        print(f"  Training Accuracy: {train_acc:.4f}")
        print(f"  Training Balanced Accuracy: {train_balanced_acc:.4f}")
        print(f"  Training F1 (macro): {train_f1_macro:.4f}")
        print(f"  Test Accuracy: {test_acc:.4f}")
        print(f"  Test Balanced Accuracy: {test_balanced_acc:.4f}")
        print(f"  Test F1 (macro): {test_f1_macro:.4f}")
        print("\nClassification Report:")
        print(results[name]['test_report'])
        
        # Plot confusion matrix
        cm = confusion_matrix(y_test, test_preds)
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                    xticklabels=np.unique(y_test), 
                    yticklabels=np.unique(y_test))
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.title(f'Confusion Matrix - {name}')
        plt.tight_layout()
        plt.savefig(f'confusion_matrix_{name}.png')
        plt.close()
        
        # Save model
        if name == "NeuralNetwork":
            full_model = {
                'preprocessing': nn_preprocessing,
                'model': model
            }
        else:
            full_model = {
                'preprocessing': preprocessing,
                'model': model
            }
        joblib.dump(full_model, f'{name.lower()}_model.joblib')
        print(f"Model saved to {name.lower()}_model.joblib")
    
    # Select best model
    best_model_name = max(results.keys(), key=lambda k: results[k]['test_f1_macro'])
    best_model = results[best_model_name]['model']
    print(f"\nBest model: {best_model_name} with Test F1 (macro): {results[best_model_name]['test_f1_macro']:.4f}")
    
    # Return the preprocessor that matches the best model
    # (only the neural network uses its own pipeline)
    if best_model_name == "NeuralNetwork":
        return best_model, results, nn_preprocessing
    return best_model, results, preprocessing


def plot_feature_importance(model, feature_names, top_n=20, model_name="Model"):
    """
    Plot feature importance
    
    Parameters:
    model: Trained model
    feature_names: List of feature names
    top_n: Show top N important features
    model_name: Model name
    
    Returns:
    importance_df: DataFrame of feature importances
    """
    # Get feature importance
    if hasattr(model, 'feature_importances_'):
        importances = model.feature_importances_
    elif model_name == "LightGBM":
        importances = model.booster_.feature_importance(importance_type='gain')
    else:
        print(f"Model {model_name} does not support feature importance extraction")
        return pd.DataFrame()
    
    # Create feature importance DataFrame
    importance_df = pd.DataFrame({
        'feature': feature_names,
        'importance': importances
    }).sort_values('importance', ascending=False)
    
    # Plot feature importance
    plt.figure(figsize=(12, 8))
    sns.barplot(x='importance', y='feature', data=importance_df.head(top_n))
    plt.title(f'{model_name} Feature Importance (Top {top_n})')
    plt.tight_layout()
    plt.savefig(f'{model_name.lower()}_feature_importance.png')
    plt.close()
    print(f"{model_name} feature importance plot saved to {model_name.lower()}_feature_importance.png")
    
    return importance_df


def apply_advanced_model(new_df, target_column=None, model_path=None):
    """
    Apply the saved advanced model to new data
    
    Parameters:
    new_df: New data DataFrame
    target_column: Target variable column name (if any)
    model_path: Model path
    
    Returns:
    Predictions and probabilities
    """
    print("=" * 50)
    print("APPLYING ADVANCED MODEL")
    print("=" * 50)
    print(f"New data shape: {new_df.shape}")
    
    # Convert all column names to strings
    new_df = new_df.copy()
    new_df.columns = new_df.columns.astype(str)
    
    # Prepare data
    if target_column and target_column in new_df.columns:
        X_new = new_df.drop(columns=[target_column])
        y_true = new_df[target_column]
        has_target = True
        print(f"Target column '{target_column}' found in data.")
    else:
        X_new = new_df
        has_target = False
        print("No target column provided or not found in data.")
    
    # Load model
    if model_path is None:
        raise ValueError("model_path must point to a saved model file")
    print(f"Loading model from {model_path}...")
    full_model = joblib.load(model_path)
    preprocessing = full_model['preprocessing']
    model = full_model['model']
    print(f"Model loaded: {type(model).__name__}")
    
    # Apply preprocessing
    print("Applying preprocessing...")
    X_new_processed = preprocessing.transform(X_new)
    print(f"Processed data shape: {X_new_processed.shape}")
    
    # Generate predictions
    print("Generating predictions...")
    predictions = model.predict(X_new_processed)
    
    # Get probabilities (if supported)
    try:
        probabilities = model.predict_proba(X_new_processed)
        has_probabilities = True
    except Exception:
        # Avoid a bare except, which would also swallow KeyboardInterrupt/SystemExit
        probabilities = None
        has_probabilities = False
    
    print("Prediction complete")
    print(f"Predictions shape: {predictions.shape}")
    if has_probabilities:
        print(f"Probabilities shape: {probabilities.shape}")
    
    # If target is provided, evaluate predictions
    if has_target:
        print("\nEVALUATING MODEL ON NEW DATA:")
        acc = accuracy_score(y_true, predictions)
        balanced_acc = balanced_accuracy_score(y_true, predictions)
        f1_macro = f1_score(y_true, predictions, average='macro')
        
        print(f"Accuracy: {acc:.4f}")
        print(f"Balanced Accuracy: {balanced_acc:.4f}")
        print(f"F1 (macro): {f1_macro:.4f}")
        
        print("\nClassification Report:")
        print(classification_report(y_true, predictions))
        
        # Plot confusion matrix
        cm = confusion_matrix(y_true, predictions)
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                    xticklabels=np.unique(y_true), 
                    yticklabels=np.unique(y_true))
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.title('Confusion Matrix - New Data')
        plt.tight_layout()
        plt.savefig('confusion_matrix_new_data.png')
        plt.close()
    
    print("=" * 50)
    return predictions, probabilities

    
    
    

def optimize_xgboost_params(X_train, y_train, X_val, y_val, n_iter=10):
    """
    Find the best XGBoost parameters using Bayesian Optimization
    
    Parameters:
    X_train: Training features
    y_train: Training labels
    X_val: Validation features
    y_val: Validation labels
    n_iter: Number of optimization iterations
    
    Returns:
    best_params: Best parameters dictionary
    best_score: Best score
    """
    from sklearn.metrics import f1_score
    from skopt import BayesSearchCV
    from skopt.space import Real, Integer
    
    # Use a smaller sample for optimization if necessary
    if X_train.shape[0] > 10000:
        from sklearn.model_selection import train_test_split
        X_sample, _, y_sample, _ = train_test_split(
            X_train, y_train, train_size=10000, random_state=42, stratify=y_train
        )
    else:
        X_sample, y_sample = X_train, y_train
    
    # Define the hyperparameter search space
    param_space = {
        'n_estimators': Integer(300, 800),
        'max_depth': Integer(3, 10),
        'learning_rate': Real(0.001, 0.2),
        'subsample': Real(0.5, 1.0),
        'colsample_bytree': Real(0.5, 1.0),
        'gamma': Real(0, 4),
        'reg_alpha': Real(0, 4),
        'reg_lambda': Real(0.5, 10),
        'min_child_weight': Integer(1, 10)
    }
    
    # Create XGBoost model
    model = xgb.XGBClassifier(
        objective='multi:softprob',
        num_class=len(np.unique(y_train)),
        tree_method='hist',  # Use histogram method for speed
        random_state=42,
        n_jobs=-1
    )
    
    # Create Bayesian search object
    bayes_search = BayesSearchCV(
        model,
        param_space,
        n_iter=n_iter,
        cv=3,
        scoring='f1_macro',
        n_jobs=-1,
        verbose=1,
        random_state=42
    )
    
    # Start Bayesian Optimization
    print("Starting Bayesian Optimization...")
    bayes_search.fit(X_sample, y_sample)
    
    # Get best parameters and score
    best_params = bayes_search.best_params_
    best_score = bayes_search.best_score_
    
    print(f"Best cross-validated F1 (macro): {best_score:.4f}")
    print("Best parameters:")
    for param, value in best_params.items():
        print(f"  {param}: {value}")
    
    # Evaluate on validation set using best model
    best_model = bayes_search.best_estimator_
    y_pred = best_model.predict(X_val)
    val_f1 = f1_score(y_val, y_pred, average='macro')
    print(f"Validation F1 score: {val_f1:.4f}")
    
    return best_params, best_score

def plot_feature_importance(model, feature_names, top_n=20, model_name="Model"):
    """
    Plot feature importance
    
    Parameters:
    model: Trained model
    feature_names: List of feature names
    top_n: Show top N important features
    model_name: Model name
    
    Returns:
    importance_df: DataFrame of feature importances
    """
    # Get feature importance
    if hasattr(model, 'feature_importances_'):
        importances = model.feature_importances_
    elif model_name == "LightGBM":
        importances = model.booster_.feature_importance(importance_type='gain')
    else:
        print(f"Model {model_name} does not support feature importance extraction")
        return pd.DataFrame()
    
    # Create feature importance DataFrame
    importance_df = pd.DataFrame({
        'feature': feature_names,
        'importance': importances
    }).sort_values('importance', ascending=False)
    
    # Plot feature importance
    plt.figure(figsize=(12, 8))
    sns.barplot(x='importance', y='feature', data=importance_df.head(top_n))
    plt.title(f'{model_name} Feature Importance (Top {top_n})')
    plt.tight_layout()
    plt.savefig(f'{model_name.lower()}_feature_importance.png')
    plt.close()
    print(f"{model_name} feature importance plot saved to {model_name.lower()}_feature_importance.png")
    
    return importance_df


def apply_advanced_model(new_df, target_column=None, model_path=None):
    """
    Apply the saved advanced model to new data
    
    Parameters:
    new_df: New data DataFrame
    target_column: Target variable column name (if any)
    model_path: Model path
    
    Returns:
    Predictions and probabilities
    """
    print("=" * 50)
    print("APPLYING ADVANCED MODEL")
    print("=" * 50)
    print(f"New data shape: {new_df.shape}")
    
    # Convert all column names to strings
    new_df = new_df.copy()
    new_df.columns = new_df.columns.astype(str)
    
    # Prepare data
    if target_column and target_column in new_df.columns:
        X_new = new_df.drop(columns=[target_column])
        y_true = new_df[target_column]
        has_target = True
        print(f"Target column '{target_column}' found in data.")
    else:
        X_new = new_df
        has_target = False
        print("No target column provided or not found in data.")
    
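    # Load model (model_path defaults to None, so fail early with a clear message)
    if model_path is None:
        raise ValueError("model_path must be provided")
    print(f"Loading model from {model_path}...")
    full_model = joblib.load(model_path)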
    preprocessing = full_model['preprocessing']
    model = full_model['model']
    print(f"Model loaded: {type(model).__name__}")
    
    # Apply preprocessing
    print("Applying preprocessing...")
    X_new_processed = preprocessing.transform(X_new)
    print(f"Processed data shape: {X_new_processed.shape}")
    
    # Generate predictions
    print("Generating predictions...")
    predictions = model.predict(X_new_processed)
    
    # Get probabilities (if supported)
    try:
        probabilities = model.predict_proba(X_new_processed)
        has_probabilities = True
    except Exception:
        # Avoid a bare except, which would also swallow KeyboardInterrupt/SystemExit
        probabilities = None
        has_probabilities = False
    
    print("Prediction complete")
    print(f"Predictions shape: {predictions.shape}")
    if has_probabilities:
        print(f"Probabilities shape: {probabilities.shape}")
    
    # If target is provided, evaluate predictions
    if has_target:
        print("\nEVALUATING MODEL ON NEW DATA:")
        acc = accuracy_score(y_true, predictions)
        balanced_acc = balanced_accuracy_score(y_true, predictions)
        f1_macro = f1_score(y_true, predictions, average='macro')
        
        print(f"Accuracy: {acc:.4f}")
        print(f"Balanced Accuracy: {balanced_acc:.4f}")
        print(f"F1 (macro): {f1_macro:.4f}")
        
        print("\nClassification Report:")
        print(classification_report(y_true, predictions))
        
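        # Plot confusion matrix; use the union of true and predicted labels
        # so the tick labels always line up with the matrix axes
        labels = np.unique(np.concatenate([np.asarray(y_true), np.asarray(predictions)]))
        cm = confusion_matrix(y_true, predictions, labels=labels)
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                    xticklabels=labels,
                    yticklabels=labels)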
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.title('Confusion Matrix - New Data')
        plt.tight_layout()
        plt.savefig('confusion_matrix_new_data.png')
        plt.close()
    
    print("=" * 50)
    return predictions, probabilities
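
The tuning function above caps the search at 10,000 rows by calling `train_test_split(..., train_size=10000, stratify=y_train)`. The sketch below illustrates what stratification means using only the standard library. `stratified_sample` is a hypothetical helper written for this illustration; it is not scikit-learn's implementation, and rounding each class quota is a simplifying assumption.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n_samples, seed=42):
    """Return sorted row indices of a subsample that keeps class proportions.

    Hypothetical helper for illustration only; scikit-learn's
    train_test_split(stratify=...) uses its own allocation scheme.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    chosen = []
    for label, indices in by_class.items():
        # Each class gets a quota proportional to its share of the data.
        quota = round(n_samples * len(indices) / total)
        chosen.extend(rng.sample(indices, min(quota, len(indices))))
    return sorted(chosen)


# Toy labels with roughly the imbalance seen in the training log (54% / 22% / 24%).
labels = [0] * 540 + [1] * 220 + [2] * 240
subset = stratified_sample(labels, n_samples=100)
counts = Counter(labels[i] for i in subset)
print(counts)
```

Because every class keeps its share of rows, the 3-fold cross-validation inside the search sees the same class balance as the full training set.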


In [11]:
path3 = '/Users/mouyasushi/Desktop/vici holdings/Test/data/eval_data.npy'
test_data = np.load(path3)

path4 = '/Users/mouyasushi/Desktop/vici holdings/Test/data/eval_labels.npy'
test_labels = np.load(path4)



test_df = pd.DataFrame(test_data)
test_df['label'] = test_labels  


In [12]:
# Train multiple advanced models and compare them
best_model, results, preprocessing = train_advanced_models(
    train_df=train_df,
    target_column='label',
    skewed_features=[52, 67, 65, 61, 62, 64, 63, 66, 69, 5, 49, 3, 68, 53, 57, 56, 54, 55, 51, 15, 31, 34, 41, 43, 39, 16, 14, 44, 42, 40, 4, 2],
    binary_features=[56, 57, 58, 59, 60],
    high_kurtosis_features=[52, 67, 65, 62, 61, 64, 63, 66, 55, 49, 69, 4, 2, 5, 3, 17, 15, 68, 53, 54, 40, 14, 16, 57, 56, 42, 44, 13],
    use_pca=False,
    use_smote=False,
    use_bayes_opt=True,
    use_stacking=True
)



ADVANCED MODEL TRAINING
Training data shape: (1174461, 71)
Target column: label
Using PCA: False
Using SMOTE: False
Using Bayes Optimization: True
Using Stacking: True
Training set shape: (939568, 70), Test set shape: (234893, 70)
Class distribution in training set: [509635 205557 224376]
Class distribution in test set: [127409  51389  56095]

Fitting regular preprocessor pipeline...


Fitted yeo-johnson transformation for 68 with lambda=-9.5392
Fitted yeo-johnson transformation for 2 with lambda=1.3149
Fitted yeo-johnson transformation for 43 with lambda=2.0264
Fitted yeo-johnson transformation for 5 with lambda=0.5698
Fitted yeo-johnson transformation for 67 with lambda=-13.0012
Fitted yeo-johnson transformation for 44 with lambda=2.1400
Fitted yeo-johnson transformation for 64 with lambda=0.1129
Fitted yeo-johnson transformation for 66 with lambda=-0.9896
Fitted yeo-johnson transformation for 3 with lambda=0.5652
Fitted yeo-johnson transformation for 53 with lambda=-1.9793
Fitted yeo-johnson transformation for 49 with lambda=0.7245
Fitted yeo-johnson transformation for 15 with lambda=1.1072
Fitted yeo-johnson transformation for 55 with lambda=0.8889
Fitted yeo-johnson transformation for 39 with lambda=2.1344
Fitted yeo-johnson transformation for 34 with lambda=1.7605
Fitted yeo-johnson transformation for 69 with lambda=-32.4964
Fitted yeo-johnson transformation for 14 with lambda=1.3551
Fitted yeo-johnson transformation for 63 with lambda=-0.3036
Fitted yeo-johnson transformation for 62 with lambda=0.1496
Fitted yeo-johnson transformation for 31 with lambda=1.6285
Fitted yeo-johnson transformation for 41 with lambda=2.0680
Fitted yeo-johnson transformation for 40 with lambda=2.8365
Fitted yeo-johnson transformation for 54 with lambda=-0.6452
Fitted yeo-johnson transformation for 42 with lambda=2.2412
Fitted yeo-johnson transformation for 4 with lambda=1.2657
Fitted yeo-johnson transformation for 51 with lambda=0.3618
Fitted yeo-johnson transformation for 52 with lambda=-0.8524
Fitted yeo-johnson transformation for 61 with lambda=0.2342
Fitted yeo-johnson transformation for 16 with lambda=1.3351
Fitted yeo-johnson transformation for 65 with lambda=0.2489
Fitting transformations: 100%|██████████| 30/30 [00:12<00:00,  2.32it/s]



Applying transformations: 100%|██████████| 30/30 [00:00<00:00, 45.19it/s]
Fitted standardization for 0 with mean=-0.0000, scale=0.9999
Fitted standardization for 1 with mean=-0.0002, scale=0.9999
Fitted standardization for 2 with mean=0.0000, scale=1.0012
Fitted standardization for 3 with mean=-0.0000, scale=1.0009
Fitted standardization for 4 with mean=0.0008, scale=0.9979
Fitted standardization for 5 with mean=-0.0011, scale=0.9974
Fitted standardization for 6 with mean=-0.0003, scale=1.0000
Fitted standardization for 7 with mean=0.0004, scale=0.9998
Fitted standardization for 8 with mean=-0.0004, scale=1.0001
Fitted standardization for 9 with mean=0.0001, scale=1.0002
Fitted standardization for 10 with mean=0.0003, scale=0.9998
Fitted standardization for 11 with mean=0.0006, scale=0.9999
Fitted standardization for 12 with mean=-0.0002, scale=0.9999
Fitted standardization for 13 with mean=-0.0004, scale=1.0001
Fitted standardization for 14 with mean=-0.0000, scale=1.0005
Fitted standardization for 15 with mean=-0.0003, scale=0.9996
...
Fitted standardization for 45 with mean=0.0002, scale=1.0002
Fitted standardization for 46 with mean=-0.0001, scale=1.0001
Fitted standardization for 47 with mean=-0.0002, scale=1.0001
Fitted standardization for 48 with mean=0.0006, scale=1.0003
Fitted standardization for 49 with mean=0.0003, scale=0.9973
Fitted standardization for 50 with mean=-0.0002, scale=0.9997
Fitted standardization for 51 with mean=0.0004, scale=1.0003
Fitted standardization for 52 with mean=1.3461, scale=8.9012
Fitted standardization for 53 with mean=1.1896, scale=1.4834
Fitted standardization for 54 with mean=0.9096, scale=0.7265
Fitted standardization for 55 with mean=0.0041, scale=0.1430
Fitted standardization for 56 with mean=0.1097, scale=0.3125
Fitted standardization for 57 with mean=0.1054, scale=0.3071
Fitted standardization for 58 with mean=0.6684, scale=0.4708
Fitted standardization for 59 with mean=0.3457, scale=0.4756
Fitted standardization for 60 with mean=0.3605, scale=0.4801
...

Fitting standardization: 100%|██████████| 100/100 [00:00<00:00, 223.30it/s]


Fitted standardization for 4_yeojohnson with mean=0.0543, scale=0.9044
Fitted standardization for 51_yeojohnson with mean=-0.2049, scale=0.9191
Fitted standardization for 52_yeojohnson with mean=0.5240, scale=0.1683
Fitted standardization for 61_yeojohnson with mean=-0.7230, scale=7.1096
Fitted standardization for 16_yeojohnson with mean=0.0892, scale=0.9387
Fitted standardization for 65_yeojohnson with mean=-3.0672, scale=9.6174


Applying standardization: 100%|██████████| 100/100 [00:00<00:00, 301.16it/s]
Applying transformations: 100%|██████████| 30/30 [00:00<00:00, 202.84it/s]
Applying standardization: 100%|██████████| 100/100 [00:00<00:00, 1210.99it/s]



Fitting neural network preprocessor pipeline...


Fitted yeo-johnson transformation for 68 with lambda=-9.5392
Fitted yeo-johnson transformation for 2 with lambda=1.3149
Fitted yeo-johnson transformation for 43 with lambda=2.0264
Fitted yeo-johnson transformation for 5 with lambda=0.5698
Fitted yeo-johnson transformation for 67 with lambda=-13.0012
Fitted yeo-johnson transformation for 44 with lambda=2.1400
Fitted yeo-johnson transformation for 64 with lambda=0.1129
Fitted yeo-johnson transformation for 66 with lambda=-0.9896
Fitted yeo-johnson transformation for 3 with lambda=0.5652
Fitted yeo-johnson transformation for 53 with lambda=-1.9793
Fitted yeo-johnson transformation for 49 with lambda=0.7245
Fitted yeo-johnson transformation for 15 with lambda=1.1072
Fitted yeo-johnson transformation for 55 with lambda=0.8889
Fitted yeo-johnson transformation for 39 with lambda=2.1344
Fitted yeo-johnson transformation for 34 with lambda=1.7605
Fitted yeo-johnson transformation for 69 with lambda=-32.4964
Fitted yeo-johnson transformation for 14 with lambda=1.3551
Fitted yeo-johnson transformation for 63 with lambda=-0.3036
Fitted yeo-johnson transformation for 62 with lambda=0.1496
Fitted yeo-johnson transformation for 31 with lambda=1.6285
Fitted yeo-johnson transformation for 41 with lambda=2.0680
Fitted yeo-johnson transformation for 40 with lambda=2.8365
Fitted yeo-johnson transformation for 54 with lambda=-0.6452
Fitted yeo-johnson transformation for 42 with lambda=2.2412
Fitted yeo-johnson transformation for 4 with lambda=1.2657
Fitted yeo-johnson transformation for 51 with lambda=0.3618
Fitted yeo-johnson transformation for 52 with lambda=-0.8524
Fitted yeo-johnson transformation for 61 with lambda=0.2342
Fitted yeo-johnson transformation for 16 with lambda=1.3351
Fitted yeo-johnson transformation for 65 with lambda=0.2489
Fitting transformations: 100%|██████████| 30/30 [00:12<00:00,  2.33it/s]



Applying transformations: 100%|██████████| 30/30 [00:00<00:00, 39.88it/s]
Fitted standardization for 0 with mean=-0.0000, scale=0.9999
Fitted standardization for 1 with mean=-0.0002, scale=0.9999
Fitted standardization for 2 with mean=0.0000, scale=1.0012
Fitted standardization for 3 with mean=-0.0000, scale=1.0009
Fitted standardization for 4 with mean=0.0008, scale=0.9979
Fitted standardization for 5 with mean=-0.0011, scale=0.9974
Fitted standardization for 6 with mean=-0.0003, scale=1.0000
Fitted standardization for 7 with mean=0.0004, scale=0.9998
Fitted standardization for 8 with mean=-0.0004, scale=1.0001
Fitted standardization for 9 with mean=0.0001, scale=1.0002
Fitted standardization for 10 with mean=0.0003, scale=0.9998
Fitted standardization for 11 with mean=0.0006, scale=0.9999
Fitted standardization for 12 with mean=-0.0002, scale=0.9999
Fitted standardization for 13 with mean=-0.0004, scale=1.0001
Fitted standardization for 14 with mean=-0.0000, scale=1.0005
Fitted standardization for 15 with mean=-0.0003, scale=0.9996
...
Fitted standardization for 48 with mean=0.0006, scale=1.0003
Fitted standardization for 49 with mean=0.0003, scale=0.9973
Fitted standardization for 50 with mean=-0.0002, scale=0.9997
Fitted standardization for 51 with mean=0.0004, scale=1.0003
Fitted standardization for 52 with mean=1.3461, scale=8.9012
Fitted standardization for 53 with mean=1.1896, scale=1.4834
Fitted standardization for 54 with mean=0.9096, scale=0.7265
Fitted standardization for 55 with mean=0.0041, scale=0.1430
Fitted standardization for 56 with mean=0.1097, scale=0.3125
Fitted standardization for 57 with mean=0.1054, scale=0.3071
Fitted standardization for 58 with mean=0.6684, scale=0.4708
Fitted standardization for 59 with mean=0.3457, scale=0.4756
Fitted standardization for 60 with mean=0.3605, scale=0.4801
Fitted standardization for 61 with mean=751.4038, scale=31937.9306
Fitted standardization for 62 with mean=743.0172, scale=31550.7630
Fitted standardization for 63 with mean=82.9839, scale=2667.3007
...

Fitting standardization: 100%|██████████| 100/100 [00:00<00:00, 219.95it/s]


Fitted standardization for 4_yeojohnson with mean=0.0543, scale=0.9044
Fitted standardization for 51_yeojohnson with mean=-0.2049, scale=0.9191
Fitted standardization for 52_yeojohnson with mean=0.5240, scale=0.1683
Fitted standardization for 61_yeojohnson with mean=-0.7230, scale=7.1096
Fitted standardization for 16_yeojohnson with mean=0.0892, scale=0.9387
Fitted standardization for 65_yeojohnson with mean=-3.0672, scale=9.6174


Applying standardization: 100%|██████████| 100/100 [00:00<00:00, 284.76it/s]
Normalization for 0 fitted, range: [0, 1]
Normalization for 1 fitted, range: [0, 1]
Normalization for 2 fitted, range: [0, 1]
Normalization for 3 fitted, range: [0, 1]
Normalization for 4 fitted, range: [0, 1]
Normalization for 5 fitted, range: [0, 1]


Normalization for 6 fitted, range: [0, 1]
Normalization for 7 fitted, range: [0, 1]
Normalization for 8 fitted, range: [0, 1]
Normalization for 9 fitted, range: [0, 1]
Normalization for 10 fitted, range: [0, 1]
Normalization for 11 fitted, range: [0, 1]
Normalization for 12 fitted, range: [0, 1]
Normalization for 13 fitted, range: [0, 1]
Normalization for 14 fitted, range: [0, 1]
Normalization for 15 fitted, range: [0, 1]


Normalization for 16 fitted, range: [0, 1]
Normalization for 17 fitted, range: [0, 1]
Normalization for 18 fitted, range: [0, 1]
Normalization for 19 fitted, range: [0, 1]
Normalization for 20 fitted, range: [0, 1]
Normalization for 21 fitted, range: [0, 1]
Normalization for 22 fitted, range: [0, 1]
Normalization for 23 fitted, range: [0, 1]
Normalization for 24 fitted, range: [0, 1]
Normalization for 25 fitted, range: [0, 1]


Normalization for 26 fitted, range: [0, 1]
Normalization for 27 fitted, range: [0, 1]
Normalization for 28 fitted, range: [0, 1]
Normalization for 29 fitted, range: [0, 1]
Normalization for 30 fitted, range: [0, 1]
Normalization for 31 fitted, range: [0, 1]
Normalization for 32 fitted, range: [0, 1]
Normalization for 33 fitted, range: [0, 1]
Normalization for 34 fitted, range: [0, 1]
Normalization for 35 fitted, range: [0, 1]
Normalization for 36 fitted, range: [0, 1]


Normalization for 37 fitted, range: [0, 1]
Normalization for 38 fitted, range: [0, 1]
Normalization for 39 fitted, range: [0, 1]
Normalization for 40 fitted, range: [0, 1]
Normalization for 41 fitted, range: [0, 1]
Normalization for 42 fitted, range: [0, 1]
Normalization for 43 fitted, range: [0, 1]
Normalization for 44 fitted, range: [0, 1]
Normalization for 45 fitted, range: [0, 1]
Normalization for 46 fitted, range: [0, 1]
Normalization for 47 fitted, range: [0, 1]


Normalization for 48 fitted, range: [0, 1]
Normalization for 49 fitted, range: [0, 1]
Normalization for 50 fitted, range: [0, 1]
Normalization for 51 fitted, range: [0, 1]
Normalization for 52 fitted, range: [0, 1]
Normalization for 53 fitted, range: [0, 1]
Normalization for 54 fitted, range: [0, 1]
Normalization for 55 fitted, range: [0, 1]
Normalization for 56 fitted, range: [0, 1]
Normalization for 57 fitted, range: [0, 1]
Normalization for 58 fitted, range: [0, 1]
Normalization for 59 fitted, range: [0, 1]
Normalization for 60 fitted, range: [0, 1]
Normalization for 61 fitted, range: [0, 1]
Normalization for 62 fitted, range: [0, 1]
Normalization for 63 fitted, range: [0, 1]
Normalization for 64 fitted, range: [0, 1]
Normalization for 65 fitted, range: [0, 1]
Normalization for 66 fitted, range: [0, 1]
Normalization for 67 fitted, range: [0, 1]
Normalization for 68 fitted, range: [0, 1]
Normalization for 69 fitted, range: [0, 1]
Normalization for 68_yeojohnson fitted, range: [0, 1]


...
Normalization for 66_yeojohnson fitted, range: [0, 1]
Normalization for 3_yeojohnson fitted, range: [0, 1]
Normalization for 53_yeojohnson fitted, range: [0, 1]
Normalization for 49_yeojohnson fitted, range: [0, 1]
Normalization for 15_yeojohnson fitted, range: [0, 1]
Normalization for 55_yeojohnson fitted, range: [0, 1]
Normalization for 39_yeojohnson fitted, range: [0, 1]
Normalization for 34_yeojohnson fitted, range: [0, 1]
Normalization for 69_yeojohnson fitted, range: [0, 1]
Normalization for 14_yeojohnson fitted, range: [0, 1]
Normalization for 63_yeojohnson fitted, range: [0, 1]
Normalization for 62_yeojohnson fitted, range: [0, 1]
Normalization for 31_yeojohnson fitted, range: [0, 1]
Normalization for 41_yeojohnson fitted, range: [0, 1]
Normalization for 40_yeojohnson fitted, range: [0, 1]
Normalization for 54_yeojohnson fitted, range: [0, 1]
Normalization for 42_yeojohnson fitted, range: [0, 1]
Normalization for 4_yeojohnson fitted, range: [0, 1]
...
Normalization for 4_std fitted, range: [0, 1]
Normalization for 5_std fitted, range: [0, 1]
Normalization for 6_std fitted, range: [0, 1]
Normalization for 7_std fitted, range: [0, 1]
Normalization for 8_std fitted, range: [0, 1]
Normalization for 9_std fitted, range: [0, 1]
Normalization for 10_std fitted, range: [0, 1]
Normalization for 11_std fitted, range: [0, 1]
Normalization for 12_std fitted, range: [0, 1]
Normalization for 13_std fitted, range: [0, 1]
Normalization for 14_std fitted, range: [0, 1]
Normalization for 15_std fitted, range: [0, 1]
Normalization for 16_std fitted, range: [0, 1]


Normalization for 17_std fitted, range: [0, 1]
Normalization for 18_std fitted, range: [0, 1]
Normalization for 19_std fitted, range: [0, 1]
Normalization for 20_std fitted, range: [0, 1]
Normalization for 21_std fitted, range: [0, 1]
Normalization for 22_std fitted, range: [0, 1]
Normalization for 23_std fitted, range: [0, 1]
Normalization for 24_std fitted, range: [0, 1]
Normalization for 25_std fitted, range: [0, 1]
Normalization for 26_std fitted, range: [0, 1]
Normalization for 27_std fitted, range: [0, 1]
Normalization for 28_std fitted, range: [0, 1]
Normalization for 29_std fitted, range: [0, 1]
Normalization for 30_std fitted, range: [0, 1]
Normalization for 31_std fitted, range: [0, 1]
Normalization for 32_std fitted, range: [0, 1]
Normalization for 33_std fitted, range: [0, 1]
Normalization for 34_std fitted, range: [0, 1]
Normalization for 35_std fitted, range: [0, 1]
Normalization for 36_std fitted, range: [0, 1]
Normalization for 37_std fitted, range: [0, 1]
...
Normalization for 41_std fitted, range: [0, 1]
Normalization for 42_std fitted, range: [0, 1]
Normalization for 43_std fitted, range: [0, 1]
Normalization for 44_std fitted, range: [0, 1]
Normalization for 45_std fitted, range: [0, 1]
Normalization for 46_std fitted, range: [0, 1]
Normalization for 47_std fitted, range: [0, 1]
Normalization for 48_std fitted, range: [0, 1]
Normalization for 49_std fitted, range: [0, 1]
Normalization for 50_std fitted, range: [0, 1]
Normalization for 51_std fitted, range: [0, 1]


Normalization for 52_std fitted, range: [0, 1]
Normalization for 53_std fitted, range: [0, 1]
Normalization for 54_std fitted, range: [0, 1]
Normalization for 55_std fitted, range: [0, 1]
Normalization for 56_std fitted, range: [0, 1]
Normalization for 57_std fitted, range: [0, 1]
Normalization for 58_std fitted, range: [0, 1]
Normalization for 59_std fitted, range: [0, 1]
Normalization for 60_std fitted, range: [0, 1]
Normalization for 61_std fitted, range: [0, 1]
Normalization for 62_std fitted, range: [0, 1]
Normalization for 63_std fitted, range: [0, 1]
Normalization for 64_std fitted, range: [0, 1]
Normalization for 65_std fitted, range: [0, 1]


Normalization for 66_std fitted, range: [0, 1]
Normalization for 67_std fitted, range: [0, 1]
Normalization for 68_std fitted, range: [0, 1]
Normalization for 69_std fitted, range: [0, 1]
Normalization for 68_yeojohnson_std fitted, range: [0, 1]
Normalization for 2_yeojohnson_std fitted, range: [0, 1]
Normalization for 43_yeojohnson_std fitted, range: [0, 1]
Normalization for 5_yeojohnson_std fitted, range: [0, 1]
Normalization for 67_yeojohnson_std fitted, range: [0, 1]
Normalization for 44_yeojohnson_std fitted, range: [0, 1]
Normalization for 64_yeojohnson_std fitted, range: [0, 1]


Normalization for 66_yeojohnson_std fitted, range: [0, 1]
Normalization for 3_yeojohnson_std fitted, range: [0, 1]
Normalization for 53_yeojohnson_std fitted, range: [0, 1]
Normalization for 49_yeojohnson_std fitted, range: [0, 1]
Normalization for 15_yeojohnson_std fitted, range: [0, 1]
Normalization for 55_yeojohnson_std fitted, range: [0, 1]
Normalization for 39_yeojohnson_std fitted, range: [0, 1]
Normalization for 34_yeojohnson_std fitted, range: [0, 1]
Normalization for 69_yeojohnson_std fitted, range: [0, 1]
Normalization for 14_yeojohnson_std fitted, range: [0, 1]
Normalization for 63_yeojohnson_std fitted, range: [0, 1]


Normalization for 62_yeojohnson_std fitted, range: [0, 1]
Normalization for 31_yeojohnson_std fitted, range: [0, 1]
Normalization for 41_yeojohnson_std fitted, range: [0, 1]
Normalization for 40_yeojohnson_std fitted, range: [0, 1]
Normalization for 54_yeojohnson_std fitted, range: [0, 1]
Normalization for 42_yeojohnson_std fitted, range: [0, 1]
Normalization for 4_yeojohnson_std fitted, range: [0, 1]
Normalization for 51_yeojohnson_std fitted, range: [0, 1]
Normalization for 52_yeojohnson_std fitted, range: [0, 1]
Normalization for 61_yeojohnson_std fitted, range: [0, 1]


Fitting normalization: 100%|██████████| 200/200 [00:03<00:00, 64.62it/s]


Normalization for 16_yeojohnson_std fitted, range: [0, 1]
Normalization for 65_yeojohnson_std fitted, range: [0, 1]


Applying normalization: 100%|██████████| 200/200 [00:01<00:00, 144.55it/s]
Applying transformations: 100%|██████████| 30/30 [00:00<00:00, 115.12it/s]
Applying standardization: 100%|██████████| 100/100 [00:00<00:00, 427.58it/s]
Applying normalization: 100%|██████████| 200/200 [00:00<00:00, 352.58it/s]


Shape of training data after regular processing: (939568, 200)
Shape of test data after regular processing: (234893, 200)
Shape of training data after neural network processing: (939568, 400)
Shape of test data after neural network processing: (234893, 400)
Class weights: {0: 0.6145365473983014, 1: 1.5236130773135108, 2: 1.3958236769232597}

Using Bayesian Optimization to tune XGBoost parameters...
Starting Bayesian Optimization...
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Fitting 3 folds for each of 1 can
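
The `Class weights` line in the log above is consistent with the "balanced" heuristic, weight = n_samples / (n_classes * class_count), applied to the training-set class distribution printed earlier. Whether `train_advanced_models` computes the weights this way is an assumption, since its body is not shown here; the sketch only checks that the arithmetic reproduces the logged values.

```python
# Class counts copied from the "Class distribution in training set" log line.
class_counts = {0: 509635, 1: 205557, 2: 224376}

n_samples = sum(class_counts.values())  # 939568 training rows
n_classes = len(class_counts)

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}

for label, weight in class_weights.items():
    print(f"class {label}: weight = {weight:.4f}")
```

The majority class 0 ends up with a weight below 1, while the two minority classes are weighted up by roughly 1.4 to 1.5.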

In [13]:
# Apply the best model to new data
predictions, probabilities = apply_advanced_model(
    new_df=test_df,
    target_column='label',
    model_path='stacking_model.joblib'
)

APPLYING ADVANCED MODEL
New data shape: (1175302, 71)
Target column 'label' found in data.
Loading model from stacking_model.joblib...
Model loaded: StackingClassifier
Applying preprocessing...


Applying transformations: 100%|██████████| 30/30 [00:00<00:00, 39.14it/s]
Applying standardization: 100%|██████████| 100/100 [00:00<00:00, 165.21it/s]


Processed data shape: (1175302, 200)
Generating predictions...
Prediction complete
Predictions shape: (1175302,)
Probabilities shape: (1175302, 3)

EVALUATING MODEL ON NEW DATA:
Accuracy: 0.6011
Balanced Accuracy: 0.5470
F1 (macro): 0.5636

Classification Report:
              precision    recall  f1-score   support

           0       0.61      0.72      0.66    635393
           1       0.59      0.45      0.51    249039
           2       0.58      0.47      0.52    290870

    accuracy                           0.60   1175302
   macro avg       0.59      0.55      0.56   1175302
weighted avg       0.60      0.60      0.59   1175302
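
The summary metrics can be cross-checked against the per-class rows of the classification report: balanced accuracy is the unweighted mean of per-class recall, macro F1 is the unweighted mean of per-class F1, and accuracy equals the support-weighted mean of recall. The sketch below uses the two-decimal values printed in the report, so it only approximately reproduces the four-decimal figures above.

```python
# Per-class rows copied from the classification report (rounded to 2 decimals).
report = {
    0: {"recall": 0.72, "f1": 0.66, "support": 635393},
    1: {"recall": 0.45, "f1": 0.51, "support": 249039},
    2: {"recall": 0.47, "f1": 0.52, "support": 290870},
}

n_classes = len(report)
total_support = sum(row["support"] for row in report.values())

# Unweighted means treat every class equally, regardless of its size.
balanced_accuracy = sum(row["recall"] for row in report.values()) / n_classes
macro_f1 = sum(row["f1"] for row in report.values()) / n_classes

# Accuracy weights each class's recall by how many samples it has.
accuracy = sum(row["recall"] * row["support"] for row in report.values()) / total_support

print(f"Balanced accuracy ~ {balanced_accuracy:.3f}")
print(f"Macro F1          ~ {macro_f1:.3f}")
print(f"Accuracy          ~ {accuracy:.3f}")
```

The gap between accuracy (about 0.60) and balanced accuracy (about 0.55) reflects that the majority class 0 is recalled far better than classes 1 and 2.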

