# BIRCH-AE: BIRCH Autoencoder Ensemble Framework

**A scalable hierarchical ensemble clustering framework for e-commerce user segmentation**

This framework integrates:
- Deep autoencoder-based dimensionality reduction for handling high-dimensional correlated features
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) with multiple configurations
- Hierarchical ensemble consensus methods (Majority Voting, Weighted Voting, AASC, BOHC/CSPA)
- Dynamic ensemble selection with multi-criteria evaluation
- Incremental learning support for streaming data

## Key Innovations:
1. **Hierarchical Ensemble Architecture**: Multiple BIRCH configurations with varying threshold values
2. **Autoencoder Feature Learning**: Non-linear dimensionality reduction optimized for BIRCH clustering
3. **BIRCH-Optimized Consensus**: Hierarchical consensus methods (BOHC/CSPA) designed for BIRCH
4. **Dynamic Selection**: Automatic strategy selection using Silhouette, Calinski-Harabasz, Davies-Bouldin
5. **Memory Efficiency**: Leverages BIRCH's CF Tree structure for large-scale datasets
6. **Incremental Learning**: Supports streaming data with real-time segment updates

## Installation

In [None]:
# Install required packages
# !pip install tensorflow==2.18.0
# !pip install numpy==2.0.2
# !pip install scikit-learn
# !pip install pandas
# !pip install scipy

## Import Libraries

In [None]:
import numpy as np
import pandas as pd
import warnings
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.cluster import Birch, SpectralClustering
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score
from scipy.stats import mode
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization, LeakyReLU, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

warnings.filterwarnings("ignore")

## 1. Data Preprocessing Module

In [None]:
def load_and_preprocess_data(filepath, user_id_col='user_id', sample_size=None, test_size=0.3):
    """
    Load and preprocess data with automatic handling of numeric and categorical features.
    
    Parameters:
    -----------
    filepath : str
        Path to the CSV file
    user_id_col : str
        Name of the user ID column (will be used as index)
    sample_size : int, optional
        Number of samples to use (for large datasets)
    test_size : float
        Proportion of data to use for testing (default: 0.3)
    
    Returns:
    --------
    train_processed : ndarray
        Preprocessed training data
    test_processed : ndarray
        Preprocessed test data
    train_ids : ndarray
        Training user IDs
    test_ids : ndarray
        Test user IDs
    preprocessor : ColumnTransformer
        Fitted preprocessing pipeline
    """
    # Load data
    data = pd.read_csv(filepath)
    
    # Use index as user_id if column doesn't exist
    if user_id_col not in data.columns:
        print(f"Warning: '{user_id_col}' not found. Using dataset index as user_id.")
        data[user_id_col] = data.index
    
    # Set user_id as index
    data.set_index(user_id_col, inplace=True)
    
    # Downsample if needed
    if sample_size and len(data) > sample_size:
        data = data.sample(n=sample_size, random_state=42)
        print(f"Downsampled to {sample_size} samples")
    
    # Replace infinite values with NaN
    data.replace([np.inf, -np.inf], np.nan, inplace=True)
    
    # Identify valid features
    numeric_features = [col for col in data.select_dtypes(include=[np.number]).columns 
                       if data[col].notna().sum() > 0]
    categorical_features = [col for col in data.select_dtypes(include=[object, 'category']).columns 
                           if data[col].notna().sum() > 0]
    
    if not numeric_features and not categorical_features:
        raise ValueError("No valid features found in dataset!")
    
    print(f"Found {len(numeric_features)} numeric and {len(categorical_features)} categorical features")
    
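    # Convert categorical values to strings but keep NaN as NaN, so the
    # most-frequent imputer can fill it (a plain astype(str) would turn
    # missing values into the literal category 'nan')
    for col in categorical_features:
        as_object = data[col].astype(object)
        data[col] = as_object.where(as_object.isna(), as_object.astype(str))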
    
    # Define transformers
    numeric_transformer = Pipeline(steps=[
        ('imputer', KNNImputer(n_neighbors=5)),
        ('scaler', StandardScaler())
    ])
    
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))
    ])
    
    # Create column transformer
    transformers = []
    if numeric_features:
        transformers.append(('num', numeric_transformer, numeric_features))
    if categorical_features:
        transformers.append(('cat', categorical_transformer, categorical_features))
    
    preprocessor = ColumnTransformer(transformers=transformers)
    
    # Split data
    train_df, test_df = train_test_split(data, test_size=test_size, random_state=42)
    train_ids = train_df.index.values
    test_ids = test_df.index.values
    
    # Fit and transform
    preprocessor.fit(train_df)
    train_processed = preprocessor.transform(train_df)
    test_processed = preprocessor.transform(test_df)
    
    # Convert to numpy array
    train_processed = np.array(train_processed)
    test_processed = np.array(test_processed)
    
    # Clean data
    train_processed = np.nan_to_num(train_processed, nan=0.0, posinf=0.0, neginf=0.0)
    test_processed = np.nan_to_num(test_processed, nan=0.0, posinf=0.0, neginf=0.0)
    
    print(f"Preprocessing complete: Train shape {train_processed.shape}, Test shape {test_processed.shape}")
    
    return train_processed, test_processed, train_ids, test_ids, preprocessor

## 2. Autoencoder Feature Learning Module

In [None]:
def build_autoencoder(input_dim, latent_dim=14, dropout_rate=0.3):
    """
    Build a deep autoencoder for dimensionality reduction.
    Architecture: 512 -> 256 -> 128 -> latent_dim -> 128 -> 256 -> input_dim
    
    Parameters:
    -----------
    input_dim : int
        Input feature dimension
    latent_dim : int
        Latent space dimension (default: 14)
    dropout_rate : float
        Dropout rate for regularization (default: 0.3)
    
    Returns:
    --------
    model : keras.Model
        Compiled autoencoder model
    """
    # Encoder
    input_layer = Input(shape=(input_dim,))
    x = Dense(512)(input_layer)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.01)(x)
    x = Dropout(0.4)(x)  # fixed rate for the first block; later blocks use dropout_rate
    
    x = Dense(256)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.01)(x)
    x = Dropout(dropout_rate)(x)
    
    x = Dense(128)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.01)(x)
    x = Dropout(dropout_rate)(x)
    
    # Latent space
    encoded = Dense(latent_dim, activation='relu', name='encoded_layer')(x)
    
    # Decoder
    x = Dense(128)(encoded)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.01)(x)
    x = Dropout(dropout_rate)(x)
    
    x = Dense(256)(x)
    x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.01)(x)
    x = Dropout(dropout_rate)(x)
    
    output_layer = Dense(input_dim, activation='linear')(x)
    
    # Compile model
    model = Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer=Adam(learning_rate=0.0001), loss='mse')
    
    return model


def train_autoencoder(model, data, epochs=100, batch_size=64, patience=15):
    """
    Train the autoencoder with early stopping.
    Uses 10% of data for validation monitoring.
    """
    early_stopping = EarlyStopping(
        monitor='val_loss', 
        patience=patience, 
        restore_best_weights=True
    )
    
    history = model.fit(
        data, data,
        epochs=epochs,
        batch_size=batch_size,
        validation_split=0.1,
        verbose=1,
        callbacks=[early_stopping]
    )
    
    return model, history


def encode_data(autoencoder, data):
    """
    Extract encoded representations from trained autoencoder.
    Returns the compact latent-space representations that are passed to BIRCH.
    """
    encoder = Model(
        inputs=autoencoder.input, 
        outputs=autoencoder.get_layer('encoded_layer').output
    )
    return encoder.predict(data, verbose=0)


def reduce_dimensions_pca(data, n_components=14):
    """
    Alternative: Reduce dimensions using PCA (faster, linear method).
    """
    pca = PCA(n_components=n_components)
    reduced_data = pca.fit_transform(data)
    print(f"PCA explained variance ratio: {sum(pca.explained_variance_ratio_):.3f}")
    return reduced_data, pca

## 3. BIRCH Ensemble Module

In [None]:
def apply_birch_clustering(data, threshold=0.5, branching_factor=50, n_clusters=5):
    """
    Apply BIRCH clustering with specific parameters.
    
    Parameters:
    -----------
    data : ndarray
        Input data (typically encoded representations)
    threshold : float
        Radius limit for leaf subclusters: a sample is merged into its closest
        subcluster only if the merged radius stays below this value (controls granularity)
    branching_factor : int
        Maximum number of CF subclusters in each node (controls tree breadth)
    n_clusters : int
        Number of clusters for global clustering phase
    
    Returns:
    --------
    results : dict
        Clustering results with labels and metrics
    """
    # Initialize BIRCH with parameters
    birch = Birch(threshold=threshold, branching_factor=branching_factor, n_clusters=n_clusters)
    
    # Fit and predict
    labels = birch.fit_predict(data)
    
    # Calculate metrics
    results = {
        'labels': labels,
        'silhouette': silhouette_score(data, labels),
        'calinski_harabasz': calinski_harabasz_score(data, labels),
        'davies_bouldin': davies_bouldin_score(data, labels),
        'threshold': threshold,
        'branching_factor': branching_factor
    }
    
    return results


def run_birch_ensemble(data, n_clusters_range=(5, 10, 15, 20)):
    """
    Run BIRCH ensemble with multiple parameter configurations.
    
    Creates ensemble by varying:
    - Threshold values (fine-grained 0.3, balanced 0.5, coarse-grained 0.8)
    - Number of clusters
    
    This captures diverse clustering perspectives at multiple granularities.
    
    Returns:
    --------
    results_df : DataFrame
        Results for all BIRCH configurations
    labels_dict : dict
        Dictionary of labels for each configuration
    """
    # Define BIRCH parameter configurations
    birch_configs = [
        {'name': 'Fine-Grained', 'threshold': 0.3, 'branching_factor': 50},
        {'name': 'Balanced', 'threshold': 0.5, 'branching_factor': 50},
        {'name': 'Coarse-Grained', 'threshold': 0.8, 'branching_factor': 50},
    ]
    
    results = []
    labels_dict = {}
    
    for n_clusters in n_clusters_range:
        for config in birch_configs:
            try:
                result = apply_birch_clustering(
                    data, 
                    threshold=config['threshold'],
                    branching_factor=config['branching_factor'],
                    n_clusters=n_clusters
                )
                
                # Store results
                model_key = f"BIRCH_{config['name']}_{n_clusters}"
                labels_dict[model_key] = result['labels']
                
                results.append({
                    'Model': f"BIRCH-{config['name']}",
                    'Clusters': n_clusters,
                    'Threshold': config['threshold'],
                    'Silhouette': result['silhouette'],
                    'Calinski-Harabasz': result['calinski_harabasz'],
                    'Davies-Bouldin': result['davies_bouldin']
                })
                
                print(f"✓ {model_key}: Silhouette={result['silhouette']:.3f}, T={config['threshold']}")
                
            except Exception as e:
                print(f"✗ Failed {config['name']} BIRCH with {n_clusters} clusters: {e}")
    
    results_df = pd.DataFrame(results)
    return results_df, labels_dict

## 4. Ensemble Consensus Strategies Module

In [None]:
# Ensemble Method 1: Majority Voting (MV)
def majority_voting_ensemble(labels_list):
    """
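    Simple majority voting across multiple BIRCH clustering results.
    Assigns each user to the cluster label that appears most frequently.
    Note: votes compare raw label ids, so the result is only meaningful when
    an id refers to the same cluster in every member; independent BIRCH runs
    do not guarantee this.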
    """
    labels_array = np.column_stack(labels_list)
    ensemble_labels, _ = mode(labels_array, axis=1, keepdims=False)
    return ensemble_labels.flatten()


# Ensemble Method 2: Weighted Voting (WV)
def weighted_voting_ensemble(labels_list, weights):
    """
    Weighted voting based on clustering quality (silhouette scores).
    Higher quality clusterings have more influence on the final consensus.
    Weight formula: w_m = exp(β * S_m) / Σ exp(β * S_j)
    """
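    n_samples = len(labels_list[0])
    # Size the vote table by the largest label id rather than the number of
    # distinct labels, so non-contiguous ids cannot index out of bounds
    n_clusters = max(int(np.max(labels)) for labels in labels_list) + 1
    weighted_labels = np.zeros((n_samples, n_clusters))
    rows = np.arange(n_samples)
    
    for weight, labels in zip(weights, labels_list):
        # Each sample adds this member's weight to the cluster it was assigned to
        weighted_labels[rows, np.asarray(labels)] += weight
    
    final_labels = np.argmax(weighted_labels, axis=1)
    return final_labels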


# Ensemble Method 3: Advanced Affinity-based Spectral Clustering (AASC)
def aasc_ensemble(labels_list, data, n_clusters):
    """
    AASC: Constructs co-association matrix and applies spectral clustering.
    Captures agreement across ensemble members through affinity representation.
    
    Process:
    1. Build co-association matrix: A_ij = (1/M) * Σ I[C_m(i) = C_m(j)]
    2. Apply spectral clustering to the affinity matrix
    """
    n_samples = data.shape[0]
    aggregated_affinity = np.zeros((n_samples, n_samples))
    
    # Build consensus affinity matrix (vectorized pairwise label comparison)
    for labels in labels_list:
        labels = np.asarray(labels)
        aggregated_affinity += (labels[:, None] == labels[None, :])
    
    aggregated_affinity /= len(labels_list)
    aggregated_affinity += 1e-5  # Numerical stability
    
    # Apply spectral clustering
    spectral = SpectralClustering(n_clusters=n_clusters, affinity='precomputed', random_state=42)
    final_labels = spectral.fit_predict(aggregated_affinity)
    
    return final_labels


# Ensemble Method 4: BIRCH-Optimized Hierarchical Consensus (BOHC/CSPA)
def bohc_ensemble(labels_list, n_clusters):
    """
    BOHC (also known as CSPA - Cluster-based Similarity Partitioning Algorithm).
    Specifically designed for hierarchical clustering results from BIRCH.
    
    Preserves hierarchical structure information by building symmetric
    co-association matrix and applying spectral clustering.
    """
    n_samples = len(labels_list[0])
    co_assoc_matrix = np.zeros((n_samples, n_samples))
    
    # Build co-association matrix (symmetric, vectorized pairwise comparison)
    for labels in labels_list:
        labels = np.asarray(labels)
        co_assoc_matrix += (labels[:, None] == labels[None, :])
    # Only off-diagonal pairs are counted, so reset the diagonal
    np.fill_diagonal(co_assoc_matrix, 0.0)
    
    co_assoc_matrix /= len(labels_list)
    co_assoc_matrix += 1e-5  # Numerical stability
    
    # Apply spectral clustering
    spectral = SpectralClustering(n_clusters=n_clusters, affinity='precomputed', random_state=42)
    final_labels = spectral.fit_predict(co_assoc_matrix)
    
    return final_labels


def calculate_ensemble_weights(data, labels_list, beta=5.0):
    """
    Calculate weights for ensemble members based on silhouette scores.
    Used for weighted voting ensemble.
    
    Weight formula: w_m = exp(β * S_m) / Σ exp(β * S_j)
    where S_m is the silhouette score and β is an inverse-temperature (sharpness)
    parameter: larger β concentrates weight on the best members (default: 5.0)
    """
    weights = []
    
    for labels in labels_list:
        try:
            sil_score = silhouette_score(data, labels)
            weights.append(np.exp(beta * sil_score))
        except Exception:
            # Typically a ValueError from silhouette_score on a degenerate labeling
            weights.append(0.0)
    
    # Normalize weights
    total = sum(weights)
    if total > 0:
        weights = [w / total for w in weights]
    else:
        weights = [1.0 / len(weights)] * len(weights)
    
    return weights


def run_ensemble_consensus(data, birch_labels_dict, n_clusters_range=(5, 10, 15, 20)):
    """
    Run all ensemble consensus strategies on BIRCH results.
    
    Applies four consensus methods:
    1. Majority Voting (MV) - simple democratic voting
    2. Weighted Voting (WV) - quality-weighted voting
    3. AASC - affinity-based spectral clustering
    4. BOHC/CSPA - hierarchical consensus optimized for BIRCH
    
    Parameters:
    -----------
    data : ndarray
        Input data for evaluation
    birch_labels_dict : dict
        Dictionary of BIRCH clustering labels
    n_clusters_range : list
        List of cluster counts to evaluate
    
    Returns:
    --------
    ensemble_df : DataFrame
        Results for all ensemble methods
    ensemble_labels_dict : dict
        Dictionary of ensemble labels
    """
    ensemble_results = []
    ensemble_labels_dict = {}
    
    # Group BIRCH labels by cluster count
    labels_by_n_clusters = {n: [] for n in n_clusters_range}
    for key, labels in birch_labels_dict.items():
        n = int(key.split('_')[-1])
        if n in n_clusters_range:
            labels_by_n_clusters[n].append(labels)
    
    # Run ensemble methods for each cluster count
    for n_clusters in n_clusters_range:
        birch_labels = labels_by_n_clusters[n_clusters]
        
        if len(birch_labels) < 2:
            print(f"⚠ Skipping n={n_clusters}: insufficient BIRCH models")
            continue
        
        # Calculate weights for weighted voting
        weights = calculate_ensemble_weights(data, birch_labels)
        
        # Define ensemble methods lazily so that a failing strategy is caught
        # by the try/except below instead of aborting the whole run
        ensemble_methods = {
            'Majority_Voting': lambda: majority_voting_ensemble(birch_labels),
            'Weighted_Voting': lambda: weighted_voting_ensemble(birch_labels, weights),
            'AASC': lambda: aasc_ensemble(birch_labels, data, n_clusters),
            'BOHC': lambda: bohc_ensemble(birch_labels, n_clusters)
        }
        
        # Run and evaluate each ensemble method
        for method_name, run_method in ensemble_methods.items():
            try:
                labels = run_method()
                model_key = f"{method_name}_{n_clusters}"
                ensemble_labels_dict[model_key] = labels
                
                ensemble_results.append({
                    'Model': method_name,
                    'Clusters': n_clusters,
                    'Silhouette': silhouette_score(data, labels),
                    'Calinski-Harabasz': calinski_harabasz_score(data, labels),
                    'Davies-Bouldin': davies_bouldin_score(data, labels)
                })
                
                print(f"✓ {model_key}: Silhouette={ensemble_results[-1]['Silhouette']:.3f}")
                
            except Exception as e:
                print(f"✗ Failed {method_name} with {n_clusters} clusters: {e}")
    
    ensemble_df = pd.DataFrame(ensemble_results)
    return ensemble_df, ensemble_labels_dict
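
Majority and weighted voting compare raw label ids, but two clusterings can describe the same partition with different ids. One common remedy is to relabel each member against a reference clustering before voting. The sketch below does this with the Hungarian algorithm from `scipy.optimize.linear_sum_assignment`. The helper name `align_labels` is hypothetical and not part of the framework above, and it assumes non-negative integer labels.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(reference, labels):
    """Relabel `labels` so its cluster ids overlap `reference` as much as possible."""
    reference = np.asarray(reference)
    labels = np.asarray(labels)
    k = int(max(reference.max(), labels.max())) + 1

    # contingency[a, b] = number of samples with labels == a and reference == b
    contingency = np.zeros((k, k), dtype=np.int64)
    np.add.at(contingency, (labels, reference), 1)

    # Hungarian algorithm: maximize total overlap by minimizing its negative
    rows, cols = linear_sum_assignment(-contingency)
    mapping = np.empty(k, dtype=np.int64)
    mapping[rows] = cols
    return mapping[labels]


reference = np.array([0, 0, 1, 1, 2, 2])
permuted = np.array([2, 2, 0, 0, 1, 1])  # same partition, different ids
aligned = align_labels(reference, permuted)
print(aligned)
```

Applied to every member except the reference, this makes the per-sample vote counts comparable. Which member serves as the reference remains a design choice.
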

## 5. Dynamic Selection and Evaluation Module

In [None]:
def select_best_model(results_df, criteria='Silhouette'):
    """
    Select the best model based on clustering quality metrics.
    
    Parameters:
    -----------
    results_df : DataFrame
        Results with clustering metrics
    criteria : str
        Metric to optimize:
        - 'Silhouette': Higher is better (range: -1 to 1)
        - 'Calinski-Harabasz': Higher is better
        - 'Davies-Bouldin': Lower is better
    
    Returns:
    --------
    best_model : dict
        Best model configuration and metrics
    """
    if criteria == 'Davies-Bouldin':
        # Lower is better for Davies-Bouldin
        best_idx = results_df[criteria].idxmin()
    else:
        # Higher is better for Silhouette and Calinski-Harabasz
        best_idx = results_df[criteria].idxmax()
    
    best_model = results_df.loc[best_idx].to_dict()
    return best_model


def calculate_improvement(best_ensemble, best_birch):
    """
    Calculate improvement percentage from BIRCH to ensemble.
    """
    if best_birch['Silhouette'] == 0:
        return 0.0
    improvement = ((best_ensemble['Silhouette'] - best_birch['Silhouette']) / 
                   abs(best_birch['Silhouette']) * 100)
    return improvement
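
`select_best_model` optimizes one metric at a time. If a combined criterion is wanted, one option is to rank the candidates on each metric and pick the best average rank. The sketch below is an illustrative choice, not something the framework defines: the helper name `select_best_model_multi` and the toy values are hypothetical.

```python
import pandas as pd


def select_best_model_multi(results_df):
    """Pick the row with the best (lowest) average rank across the three metrics."""
    ranks = pd.DataFrame({
        # Rank 1 is best: higher is better for the first two, lower for the third
        'Silhouette': results_df['Silhouette'].rank(ascending=False),
        'Calinski-Harabasz': results_df['Calinski-Harabasz'].rank(ascending=False),
        'Davies-Bouldin': results_df['Davies-Bouldin'].rank(ascending=True),
    })
    mean_rank = ranks.mean(axis=1)
    return results_df.loc[mean_rank.idxmin()].to_dict()


# Toy metrics table with made-up numbers
toy = pd.DataFrame({
    'Model': ['A', 'B', 'C'],
    'Silhouette': [0.40, 0.35, 0.20],
    'Calinski-Harabasz': [100.0, 300.0, 50.0],
    'Davies-Bouldin': [1.2, 0.8, 1.5],
})
best = select_best_model_multi(toy)
print(best['Model'])
```

Rank averaging avoids having to put the three metrics on a common scale, but it ignores how large the differences between candidates are.
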

## 6. Complete BIRCH-AE Pipeline

In [None]:
class BIRCHAE:
    """
    BIRCH-AE: Complete framework for hierarchical ensemble user segmentation.
    
    This class implements the full BIRCH-AE pipeline:
    1. Data preprocessing
    2. Autoencoder feature learning
    3. BIRCH ensemble clustering
    4. Consensus strategies (MV, WV, AASC, BOHC)
    5. Dynamic model selection
    """
    
    def __init__(self, latent_dim=14, n_clusters_range=(5, 10, 15, 20)):
        """
        Initialize BIRCH-AE framework.
        
        Parameters:
        -----------
        latent_dim : int
            Dimensionality of autoencoder latent space (default: 14)
        n_clusters_range : list
            Range of cluster counts to evaluate (default: [5, 10, 15, 20])
        """
        self.latent_dim = latent_dim
        self.n_clusters_range = n_clusters_range
        self.autoencoder = None
        self.preprocessor = None
        self.reducer = None
        self.birch_results = None
        self.ensemble_results = None
        self.all_labels = {}
    
    def fit(self, filepath, user_id_col='user_id', reduction_method='autoencoder', 
            sample_size=None, use_autoencoder=True):
        """
        Fit the complete BIRCH-AE pipeline.
        
        Parameters:
        -----------
        filepath : str
            Path to input CSV file
        user_id_col : str
            Name of user ID column
        reduction_method : str
            'autoencoder' or 'pca'
        sample_size : int, optional
            Sample size for large datasets
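        use_autoencoder : bool
            Whether to use autoencoder for feature learning. The autoencoder runs
            only when this is True and reduction_method='autoencoder'; if neither
            that branch nor 'pca' applies, clustering uses the unreduced features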
        """
        print("=" * 60)
        print("BIRCH-AE Framework - Starting Pipeline")
        print("=" * 60)
        
        # Step 1: Load and preprocess data
        print("\n[1/5] Loading and preprocessing data...")
        train_data, test_data, train_ids, test_ids, self.preprocessor = \
            load_and_preprocess_data(filepath, user_id_col, sample_size)
        
        # Combine train and test
        full_data = np.vstack([train_data, test_data])
        self.user_ids = np.concatenate([train_ids, test_ids])
        
        # Step 2: Dimensionality reduction
        print(f"\n[2/5] Applying dimensionality reduction: {reduction_method}...")
        
        if use_autoencoder and reduction_method == 'autoencoder':
            # Build and train autoencoder
            print("Building autoencoder architecture...")
            self.autoencoder = build_autoencoder(train_data.shape[1], self.latent_dim)
            print(f"Training autoencoder on {train_data.shape[0]} samples...")
            self.autoencoder, _ = train_autoencoder(self.autoencoder, train_data)
            
            # Encode data
            print("Encoding data to latent space...")
            reduced_data = encode_data(self.autoencoder, full_data)
            
        elif reduction_method == 'pca':
            train_reduced, self.reducer = reduce_dimensions_pca(train_data, self.latent_dim)
            test_reduced = self.reducer.transform(test_data)
            reduced_data = np.vstack([train_reduced, test_reduced])
        
        else:
            reduced_data = full_data
        
        print(f"Final data shape: {reduced_data.shape}")
        
        # Step 3: BIRCH ensemble clustering
        print(f"\n[3/5] Running BIRCH ensemble with threshold variations...")
        birch_results_df, birch_labels = run_birch_ensemble(reduced_data, self.n_clusters_range)
        self.birch_results = birch_results_df
        self.all_labels.update(birch_labels)
        
        # Step 4: Ensemble consensus strategies
        print(f"\n[4/5] Running ensemble consensus strategies (MV, WV, AASC, BOHC)...")
        ensemble_results_df, ensemble_labels = run_ensemble_consensus(
            reduced_data, birch_labels, self.n_clusters_range
        )
        self.ensemble_results = ensemble_results_df
        self.all_labels.update(ensemble_labels)
        
        # Step 5: Evaluation and selection
        print(f"\n[5/5] Evaluating results and selecting best model...")
        self._display_results()
        
        return self
    
    def _display_results(self):
        """Display comprehensive results summary."""
        print("\n" + "=" * 60)
        print("BIRCH Ensemble Results:")
        print("=" * 60)
        print(self.birch_results.to_string(index=False))
        
        print("\n" + "=" * 60)
        print("Ensemble Consensus Results:")
        print("=" * 60)
        print(self.ensemble_results.to_string(index=False))
        
        # Best models
        best_birch = select_best_model(self.birch_results)
        best_ensemble = select_best_model(self.ensemble_results)
        
        print("\n" + "=" * 60)
        print("Best Models:")
        print("=" * 60)
        print(f"Best BIRCH Configuration: {best_birch['Model']} ({best_birch['Clusters']} clusters)")
        print(f"  Threshold: {best_birch.get('Threshold', 'N/A')}")
        print(f"  Silhouette: {best_birch['Silhouette']:.4f}")
        print(f"  Calinski-Harabasz: {best_birch['Calinski-Harabasz']:.2f}")
        print(f"  Davies-Bouldin: {best_birch['Davies-Bouldin']:.4f}")
        
        print(f"\nBest Ensemble Method: {best_ensemble['Model']} ({best_ensemble['Clusters']} clusters)")
        print(f"  Silhouette: {best_ensemble['Silhouette']:.4f}")
        print(f"  Calinski-Harabasz: {best_ensemble['Calinski-Harabasz']:.2f}")
        print(f"  Davies-Bouldin: {best_ensemble['Davies-Bouldin']:.4f}")
        
        improvement = calculate_improvement(best_ensemble, best_birch)
        print(f"\n📊 Ensemble Improvement: {improvement:.2f}%")
        print("=" * 60)
    
    def get_labels(self, model_key):
        """
        Get cluster labels for a specific model.
        
        Example: get_labels('BOHC_10') returns BOHC consensus with 10 clusters
        """
        return self.all_labels.get(model_key)
    
    def get_user_segments(self, model_key):
        """
        Get user-segment mapping for a specific model.
        Returns DataFrame with user_id and segment columns.
        """
        labels = self.get_labels(model_key)
        if labels is None:
            print(f"Model '{model_key}' not found. Available models:")
            print(list(self.all_labels.keys()))
            return None
        
        return pd.DataFrame({
            'user_id': self.user_ids,
            'segment': labels
        })
    
    def save_results(self, output_path):
        """
        Save all results to CSV files.
        Creates two files:
        - {output_path}_metrics.csv: All clustering metrics
        - {output_path}_segments.csv: User segments from best model
        """
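        # Create the parent directory if it is missing (e.g. 'output/' for 'output/birch_ae_results')
        import os
        os.makedirs(os.path.dirname(output_path) or '.', exist_ok=True)
        
        # Save clustering metrics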
        all_results = pd.concat(
            [self.birch_results.assign(Type='BIRCH'), 
             self.ensemble_results.assign(Type='Ensemble')],
            ignore_index=True
        )
        all_results.to_csv(f"{output_path}_metrics.csv", index=False)
        
        # Save best model labels
        best_model = select_best_model(self.ensemble_results)
        best_key = f"{best_model['Model']}_{int(best_model['Clusters'])}"
        segments = self.get_user_segments(best_key)
        
        if segments is not None:
            segments.to_csv(f"{output_path}_segments.csv", index=False)
            print(f"\n✓ Results saved:")
            print(f"  - {output_path}_metrics.csv")
            print(f"  - {output_path}_segments.csv")
        else:
            print(f"\n✗ Failed to save segments: model {best_key} not found")

---
## Example Usage

Below is a complete example showing how to use the BIRCH-AE framework.

In [None]:
# # Example: Running the complete BIRCH-AE pipeline
# 
# # Initialize the BIRCH-AE framework
# birch_ae = BIRCHAE(
#     latent_dim=14,                    # Dimensionality of latent space
#     n_clusters_range=[5, 10, 15, 20]  # Range of cluster counts to evaluate
# )
# 
# # Fit the framework on e-commerce user data
# birch_ae.fit(
#     filepath='ecommerce_user_metrics.csv',  # Path to your CSV file
#     user_id_col='visitorid',                # Name of user ID column
#     reduction_method='autoencoder',         # Use 'autoencoder' or 'pca'
#     sample_size=50000,                      # Optional: sample size for large datasets
#     use_autoencoder=True                    # Enable autoencoder feature learning
# )
# 
# # Get user segments for a specific model, e.g., 'BOHC_10' for BOHC with 10 clusters
# # (save_results below writes the segments of the best ensemble model automatically)
# segments_df = birch_ae.get_user_segments('BOHC_10')
# print("\nSegment Distribution:")
# print(segments_df['segment'].value_counts())
# 
# # Save all results to files
# birch_ae.save_results('output/birch_ae_results')

---
## Advanced Usage Examples

In [None]:
# # Example 1: Using PCA instead of Autoencoder (faster, linear)
# birch_ae_pca = BIRCHAE(latent_dim=14)
# birch_ae_pca.fit(
#     filepath='retail_rocket_user_metrics.csv',
#     user_id_col='visitorid',
#     reduction_method='pca',
#     use_autoencoder=False  # Skip autoencoder training
# )
# 
# # Example 2: Large-Scale Dataset (millions of users)
# birch_ae_large = BIRCHAE(n_clusters_range=[5, 10, 15, 20, 25])
# birch_ae_large.fit(
#     filepath='large_ecommerce_dataset.csv',
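#     # AASC/BOHC build a dense n x n float64 matrix (8 * n**2 bytes, about 80 GB at n=100000),
#     # so choose sample_size to fit the available memory
#     sample_size=100000,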
#     reduction_method='autoencoder'
# )
# 
# # Example 3: Comparing Multiple Ensemble Methods
# ensemble_methods = ['Majority_Voting_10', 'Weighted_Voting_10', 'AASC_10', 'BOHC_10']
# 
# print("\nComparing Ensemble Methods (10 clusters):")
# for method in ensemble_methods:
#     segments = birch_ae.get_user_segments(method)
#     if segments is not None:
#         print(f"\n{method}:")
#         print(segments['segment'].value_counts().sort_index())
# 
# # Example 4: Analyzing BIRCH Threshold Impact
# print("\nBIRCH Threshold Analysis:")
# threshold_configs = ['BIRCH_Fine-Grained_10', 'BIRCH_Balanced_10', 'BIRCH_Coarse-Grained_10']
# 
# for config in threshold_configs:
#     labels = birch_ae.get_labels(config)
#     if labels is not None:
#         print(f"\n{config}:")
#         print(f"  Unique segments: {len(np.unique(labels))}")
#         print(f"  Distribution: {np.bincount(labels)}")

---
## Framework Notes and Best Practices

### BIRCH-AE Architecture:
The framework follows a hierarchical approach:
1. **Preprocessing**: KNN imputation and StandardScaler for numeric features; most-frequent imputation and OneHotEncoder for categorical features
2. **Feature Learning**: Deep autoencoder (512→256→128→latent→128→256→output)
3. **BIRCH Ensemble**: Multiple thresholds (0.3, 0.5, 0.8) capture different granularities
4. **Consensus**: Four strategies (MV, WV, AASC, BOHC) aggregate base clusterings
5. **Dynamic Selection**: Best model picked by a chosen criterion (Silhouette by default in `select_best_model`), with Calinski-Harabasz and Davies-Bouldin reported alongside
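
For the weighted-voting (WV) strategy in step 4, member weights follow w_m = exp(β * S_m) / Σ exp(β * S_j). The sketch below works through that formula on made-up silhouette scores. The function name `softmax_weights` is hypothetical, and subtracting the maximum is a standard numerical-stability step that is not present in `calculate_ensemble_weights`.

```python
import numpy as np


def softmax_weights(scores, beta=5.0):
    """Return w_m = exp(beta * S_m) / sum_j exp(beta * S_j)."""
    scaled = beta * np.asarray(scores, dtype=float)
    scaled -= scaled.max()  # shifting by a constant leaves the ratios unchanged
    exp_scaled = np.exp(scaled)
    return exp_scaled / exp_scaled.sum()


silhouettes = [0.2, 0.4, 0.6]  # illustrative values, not measured results
weights = softmax_weights(silhouettes)
print(np.round(weights, 3))
```

With β = 5, a silhouette gap of 0.2 between two members translates into a weight ratio of e ≈ 2.72, so the best member dominates the vote quickly as β grows.
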

### Parameter Recommendations:
- **latent_dim**: 10-20 works well (compression ratio = input_dim / latent_dim; 2:1 to 4:1 corresponds to about 40 input features)
- **n_clusters_range**: Test 5-25 clusters for e-commerce
- **BIRCH thresholds**:
  - 0.3 (fine-grained): Many small, homogeneous clusters
  - 0.5 (balanced): Moderate granularity (recommended starting point)
  - 0.8 (coarse-grained): Fewer, broader clusters
- **branching_factor**: 50 (default) balances tree depth and memory
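
To see how the threshold controls granularity, the sketch below counts the raw CF subclusters BIRCH produces at the three thresholds listed above. It runs on synthetic Gaussian data standing in for encoded user features, so the counts are illustrative only and will differ on real data.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(42)
# Synthetic stand-in for encoded user features: 600 points in 2-D
data = rng.normal(size=(600, 2))

subcluster_counts = {}
for threshold in (0.3, 0.5, 0.8):
    # n_clusters=None skips the global clustering step and exposes the raw CF subclusters
    model = Birch(threshold=threshold, branching_factor=50, n_clusters=None)
    model.fit(data)
    subcluster_counts[threshold] = len(model.subcluster_centers_)

print(subcluster_counts)
```

A smaller threshold yields more and tighter subclusters, which gives the global clustering step finer building blocks at the cost of a larger tree.
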

### Evaluation Metrics:
- **Silhouette Score** (range: -1 to 1, higher is better):
  - > 0.5: Strong, well-separated clusters
  - 0.25-0.5: Reasonable structure
  - < 0.25: Weak or overlapping clusters
- **Calinski-Harabasz Index**: Higher is better (no fixed range)
- **Davies-Bouldin Index**: Lower is better (0 to ∞)
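
As a sanity check on these ranges, the sketch below computes all three metrics for two well-separated synthetic blobs and then shows that a single-cluster labeling is rejected. The data is made up for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs, 100 points each
blob_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(100, 2))
data = np.vstack([blob_a, blob_b])
labels = np.array([0] * 100 + [1] * 100)

sil = silhouette_score(data, labels)
ch = calinski_harabasz_score(data, labels)
db = davies_bouldin_score(data, labels)
print(f"Silhouette={sil:.3f}, Calinski-Harabasz={ch:.1f}, Davies-Bouldin={db:.3f}")

# The metrics need at least two clusters; a single-cluster labeling raises ValueError
try:
    silhouette_score(data, np.zeros(len(data), dtype=int))
    single_cluster_ok = True
except ValueError:
    single_cluster_ok = False
print(single_cluster_ok)
```

The `ValueError` on degenerate labelings is one reason to keep metric calls inside `try`/`except` when sweeping many configurations.
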

### Computational Considerations:
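- **Memory Efficiency**: BIRCH's CF Tree stores subcluster summaries rather than raw points, so its size grows with the number of subclusters (at most O(n))
- **Time Complexity**: 
  - BIRCH: O(n log n) for tree construction
  - Autoencoder: O(epochs × n × W), where W is the number of network weights
  - Consensus: O(n² × M) for affinity-based methods (AASC, BOHC), plus a dense n × n float64 matrix (8n² bytes, about 20 GB at n = 50,000)
- **Scalability**: The consensus step, not feature learning, is the bottleneck. For >100k users, sample so that the n × n matrix fits in memory; switching to PCA only shortens the feature-learning step
- **Incremental Learning**: scikit-learn's `Birch` supports online updates through `partial_fit`; the `BIRCHAE` class above calls `fit_predict` only, so streaming updates have to be added separately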
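
A minimal sketch of the incremental route, assuming encoded user vectors arrive in batches. It uses scikit-learn's `Birch.partial_fit` directly and is not wired into the `BIRCHAE` class; the batches are synthetic.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(7)
# Three synthetic batches of encoded user vectors arriving over time
batches = [rng.normal(loc=center, scale=0.5, size=(200, 2)) for center in (0.0, 4.0, 8.0)]

model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
for batch in batches:
    # partial_fit inserts the batch into the existing CF tree instead of rebuilding it
    model.partial_fit(batch)

# Assign segments to new users without refitting
new_users = np.array([[0.1, -0.2], [7.9, 8.2]])
segments = model.predict(new_users)
print(segments)
```

In a streaming setting the autoencoder would also have to stay fixed, or be retrained together with the tree, since the CF tree is only valid for one latent space.
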

### When to Use BIRCH-AE:
✓ E-commerce user segmentation with high-dimensional behavioral features
✓ Large-scale datasets (millions of users, clustered on a sample via `sample_size`)
✓ Correlated features requiring non-linear dimensionality reduction
✓ Need for hierarchical multi-granularity segmentation
✓ Streaming data with incremental updates

### Citation:
If you use this framework in your research, please cite:
```
Li, C. et al. (2025). BIRCH-AE: A Scalable Hierarchical Ensemble Framework
for E-Commerce User Segmentation with Autoencoder Feature Learning.
IEEE Access. [Under Review]
```