# Model Evaluation (Weighted) – Notebook Guide

This notebook evaluates models with class/observation weights applied.

## What this notebook does
- Compute weighted metrics (e.g., weighted AUC, threshold metrics)
- Plot diagnostic figures considering weights
- Summarize results per model/run and export

## Inputs
- Predictions/scores, ground-truth labels, and weights per observation
- Optional: CV fold info or test set indicators

## Workflow
1. Load predictions, labels, and weights
2. Validate alignment and handle missing values
3. Compute weighted metrics across thresholds/folds
4. Plot weighted ROC/curves and summaries
5. Save metrics tables and figures
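Step 2 (validate alignment and handle missing values) can be sketched as a small cleaning helper. This is a minimal NumPy sketch, assuming scores, labels, and weights arrive as equal-length 1-D arrays in the same row order. The function name `clean_eval_inputs` is hypothetical and not part of this notebook.

```python
import numpy as np


def clean_eval_inputs(scores, labels, weights):
    """Hypothetical helper: check alignment and clean evaluation arrays.

    Drops rows where the score, label, or weight is NaN, then checks that the
    remaining labels are 0/1 and the weights are non-negative.
    Returns the cleaned arrays plus the number of dropped rows.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not (scores.shape == labels.shape == weights.shape):
        raise ValueError("scores, labels and weights must have the same length")

    # Keep only rows where all three values are present
    keep = ~(np.isnan(scores) | np.isnan(labels) | np.isnan(weights))
    scores, labels, weights = scores[keep], labels[keep], weights[keep]

    if not np.isin(labels, [0, 1]).all():
        raise ValueError("labels must be 0 (absence) or 1 (presence)")
    if (weights < 0).any():
        raise ValueError("weights must be non-negative")
    return scores, labels.astype(int), weights, int((~keep).sum())


# Row 1 has a NaN score and row 3 a NaN weight, so both are dropped
s, y, w, n_dropped = clean_eval_inputs([0.9, np.nan, 0.2, 0.6], [1, 0, 0, 1], [1.0, 2.0, 0.5, np.nan])
print(n_dropped, y.tolist(), w.tolist())
```

Reporting `n_dropped` rather than silently imputing makes it visible how many observations the metrics no longer cover.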

## Outputs
- Weighted per-model/per-fold metrics tables
- Plots reflecting weights
- CSV/JSON exports for downstream use

## Notes
- Ensure weights are normalized or on the intended scale
- Use the same preprocessing as in training
- Fix random seeds for reproducibility where applicable
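On weight scale: multiplying every weight by the same constant leaves ratio-based metrics (ROC-AUC, PR-AUC, sensitivity, specificity, precision) unchanged, so normalization mainly matters for interpreting weighted counts. Relative weights between groups are what matter: ROC-AUC ignores a separate rescaling of presences versus absences, because every presence/absence pair weight is multiplied by the same factor, but precision and PR-AUC do not. The sketch below illustrates this, assuming a pairwise (Mann-Whitney style) weighted AUC that counts ties as one half. `weighted_roc_auc` and `normalize_weights` are hypothetical helper names.

```python
import numpy as np


def weighted_roc_auc(y, scores, w):
    """Pairwise weighted ROC-AUC: weighted share of (presence, absence) pairs
    in which the presence scores higher; ties count one half."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(w, dtype=float)
    pos, neg = y == 1, y == 0
    diff = scores[pos][:, None] - scores[neg][None, :]   # all presence-absence score differences
    wins = (diff > 0) + 0.5 * (diff == 0)                # 1 for a correct ranking, 0.5 for a tie
    pair_w = w[pos][:, None] * w[neg][None, :]           # each pair weighted by w_i * w_j
    return float((wins * pair_w).sum() / pair_w.sum())


def normalize_weights(w):
    """Rescale weights to mean 1 (so they sum to the number of samples)."""
    w = np.asarray(w, dtype=float)
    return w * (len(w) / w.sum())


y = np.array([1, 1, 0, 0])
scores = np.array([0.9, 0.4, 0.6, 0.1])
w = np.array([1.0, 2.0, 1.0, 3.0])

auc_raw = weighted_roc_auc(y, scores, w)
auc_norm = weighted_roc_auc(y, scores, normalize_weights(w))
auc_class_scaled = weighted_roc_auc(y, scores, np.where(y == 1, 5 * w, w))  # presences x5 only
print(auc_raw, auc_norm, auc_class_scaled)
```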


# Notebook Overview

This notebook evaluates weighted SDMs with metrics and plots, mirroring standard evaluation but accounting for weights in analysis where relevant.

- Key steps: load weighted-model predictions, compute metrics, plot ROC/PR curves, examine thresholds, and report results
- Inputs: weighted model predictions and labels
- Outputs: evaluation tables and plots
- Run order: after weighted model training


# Weighted MaxEnt Model Evaluation and Performance Assessment

This notebook provides comprehensive evaluation of **weighted MaxEnt species distribution models**, focusing on performance assessment that accounts for sample weights and data quality differences. Unlike standard model evaluation, this version incorporates **weighted metrics** to properly assess model performance when training data has been weighted.

## Key Features of Weighted Model Evaluation:

### 1. **Weighted Performance Metrics**:
- **Weighted AUC**: Area Under ROC Curve accounting for sample weights
- **Weighted PR-AUC**: Precision-Recall AUC with weight integration
- **Weighted Sensitivity/Specificity**: Performance metrics adjusted for data quality
- **Weighted Precision/Recall**: Classification metrics incorporating sample weights

### 2. **Advanced Evaluation Approaches**:
- **Cross-Validation**: K-fold validation with weighted samples
- **Spatial Validation**: Geographic partitioning with weight consideration
- **Temporal Validation**: Time-based splits accounting for temporal weights
- **Bootstrap Validation**: Resampling with weight preservation
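For bootstrap validation, one way to preserve weights is to resample rows with replacement and let each drawn row keep its own weight. The sketch below computes a percentile interval for weighted ROC-AUC under that assumption. The helper names, `n_boot=200`, and the 95% level are illustrative choices, not settings from this notebook.

```python
import numpy as np


def weighted_auc(y, s, w):
    """Pairwise weighted ROC-AUC (ties count one half)."""
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    pair_w = w[y == 1][:, None] * w[y == 0][None, :]
    return float((((diff > 0) + 0.5 * (diff == 0)) * pair_w).sum() / pair_w.sum())


def bootstrap_weighted_auc(y, s, w, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval for weighted AUC.

    Rows are resampled with replacement; each drawn row carries its original weight.
    Resamples containing only one class are skipped because AUC is undefined there.
    """
    y, s, w = np.asarray(y), np.asarray(s, dtype=float), np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))
        if np.unique(y[idx]).size < 2:
            continue
        stats.append(weighted_auc(y[idx], s[idx], w[idx]))
    alpha = (1 - level) / 2
    return np.quantile(stats, [alpha, 1 - alpha]), len(stats)


rng = np.random.default_rng(1)
y = np.r_[np.zeros(30, dtype=int), np.ones(30, dtype=int)]
s = np.clip(0.3 + 0.4 * y + rng.normal(0, 0.2, size=60), 0, 1)   # noisy but informative scores
w = rng.uniform(0.5, 2.0, size=60)                                 # illustrative weights
(lo, hi), n_used = bootstrap_weighted_auc(y, s, w)
print(f"weighted AUC {weighted_auc(y, s, w):.3f}, 95% interval [{lo:.3f}, {hi:.3f}] from {n_used} resamples")
```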

### 3. **Bias Assessment**:
- **Spatial Bias Analysis**: Evaluate model performance across different regions
- **Temporal Bias Assessment**: Performance across different time periods
- **Source Bias Evaluation**: Performance across different data sources
- **Quality Bias Analysis**: Performance across different data quality levels
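Each of these bias checks computes the same weighted metric within each subgroup (region, time period, data source, or quality class). A minimal sketch, assuming a `groups` array aligned row-by-row with the labels. The group labels and helper names here are hypothetical.

```python
import numpy as np


def weighted_auc(y, s, w):
    """Pairwise weighted ROC-AUC (ties count one half)."""
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    pair_w = w[y == 1][:, None] * w[y == 0][None, :]
    return float((((diff > 0) + 0.5 * (diff == 0)) * pair_w).sum() / pair_w.sum())


def weighted_auc_by_group(y, s, w, groups):
    """Weighted AUC per subgroup; NaN where a group lacks presences or absences."""
    y, s, w, groups = map(np.asarray, (y, s, w, groups))
    out = {}
    for g in np.unique(groups):
        m = groups == g
        if np.unique(y[m]).size < 2:
            out[str(g)] = float("nan")   # AUC is undefined with a single class
        else:
            out[str(g)] = weighted_auc(y[m], s[m], w[m])
    return out


y = np.array([1, 0, 1, 0, 1, 1])
s = np.array([0.8, 0.2, 0.3, 0.7, 0.6, 0.9])
w = np.array([1.0, 1.0, 2.0, 1.0, 1.0, 1.0])
groups = np.array(["north", "north", "south", "south", "coast", "coast"])
per_group = weighted_auc_by_group(y, s, w, groups)
print(per_group)
```

Groups that report NaN or rest on very few samples are a reminder that subgroup estimates can be unstable.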

## Applications:
- **Model Validation**: Comprehensive assessment of weighted model performance
- **Bias Detection**: Identify remaining biases after weighting
- **Performance Comparison**: Compare weighted vs. unweighted models
- **Quality Control**: Validate that weighting improves model reliability

In [None]:
############### WEIGHTED MODEL EVALUATION CONFIGURATION - MODIFY AS NEEDED ###############

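# Species and region settings for weighted model evaluation.
# These are expected to be defined before this notebook runs (presumably injected by a calling
# notebook), together with model_prefix, topo, ndvi, iteration, Future, and models used below.
# Uncomment these (and define the others) to run this notebook standalone.
#specie = 'leptocybe-invasa'  # Target species: 'leptocybe-invasa' or 'thaumastocoris-peregrinus'
#pseudoabsence = 'random'  # Background point strategy: 'random', 'biased', 'biased-land-cover'
#training = 'east-asia'  # Training region: 'sea', 'australia', 'east-asia', etc.
#interest = 'south-east-asia'  # Test region: can be same as training or different
#savefig = True  # Save generated evaluation plots and metrics

# Environmental variable configuration
bio = 'bio1'  # Bioclimatic variable identifier (a string; used in file names)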

# Evaluation settings (specific to weighted model evaluation)
# evaluation_method = 'cross_validation'  # 'cross_validation', 'spatial_validation', 'temporal_validation'
# n_folds = 5  # Number of folds for cross-validation
# spatial_buffer = 100  # Buffer distance (km) for spatial validation
# temporal_split = 0.7  # Proportion of data for training in temporal validation

# Weighted metrics configuration
# include_weighted_metrics = True  # Calculate weighted performance metrics
# include_unweighted_metrics = True  # Calculate standard metrics for comparison
# weight_threshold = 0.1  # Minimum weight threshold for sample inclusion

###########################################################

In [None]:
# =============================================================================
# IMPORT REQUIRED LIBRARIES
# =============================================================================

import os  # File system operations

import numpy as np  # Numerical computing
import xarray as xr  # Multi-dimensional labeled arrays (raster data)
import pandas as pd  # Data manipulation and analysis
import geopandas as gpd  # Geospatial data handling

import elapid as ela  # Species distribution modeling library

from shapely import wkt  # Well-Known Text (WKT) geometry parsing
from elapid import utils  # Utility functions for elapid
from sklearn import metrics, inspection  # Machine learning metrics and model inspection

import matplotlib.pyplot as plt  # Plotting and visualization

import warnings
warnings.filterwarnings("ignore")  # Suppress warning messages for cleaner output

# Configure matplotlib for publication-quality plots
params = {'legend.fontsize': 'x-large',
         'axes.labelsize': 'x-large',
         'axes.titlesize':'x-large',
         'xtick.labelsize':'x-large',
         'ytick.labelsize':'x-large'}
plt.rcParams.update(params)

In [None]:
def subplot_layout(nplots):
    """
    Calculate optimal subplot layout for given number of plots
    
    Parameters:
    -----------
    nplots : int
        Number of plots to arrange
    
    Returns:
    --------
    ncols, nrows : tuple
        Number of columns and rows for subplot layout
    """
    
    # Calculate square root and round up for balanced layout
    ncols = min(int(np.ceil(np.sqrt(nplots))), 4)  # Max 4 columns
    nrows = int(np.ceil(nplots / ncols))  # Calculate rows needed
    
    return ncols, nrows

In [None]:
# =============================================================================
# SET UP FILE PATHS
# =============================================================================
# Define directory structure for organizing weighted model evaluation outputs

docs_path = os.path.join(os.path.dirname(os.getcwd()), 'docs')  # Documentation directory
out_path = os.path.join(os.path.dirname(os.getcwd()), 'out', specie)  # Species-specific output directory
figs_path = os.path.join(os.path.dirname(os.getcwd()), 'figs')  # Figures directory
output_path = os.path.join(out_path, 'output')  # Model output directory

## 1. Weighted Training Model Performance Assessment

This section evaluates the performance of the weighted MaxEnt model on the training data. Key aspects include:

### **Weighted vs. Unweighted Metrics**:
- **Standard Metrics**: Traditional AUC, PR-AUC, sensitivity, specificity
- **Weighted Metrics**: Performance metrics accounting for sample weights
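- **Comparison Analysis**: See how much the weights change each metric for the same model (measuring the benefit of weighting itself requires comparing against a separately trained unweighted model)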

### **Performance Indicators**:
- **ROC-AUC**: Area Under Receiver Operating Characteristic curve
- **PR-AUC**: Area Under Precision-Recall curve (important for imbalanced data)
- **Sensitivity**: True Positive Rate (ability to detect presences)
- **Specificity**: True Negative Rate (ability to detect absences)
- **Precision**: Positive Predictive Value
- **F1-Score**: Harmonic mean of precision and recall
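At a fixed threshold, the weighted versions of these indicators replace each confusion-matrix count with a sum of sample weights. A minimal sketch, assuming a presence is predicted when the score is at or above the threshold. The function name and the 0.5 threshold are illustrative.

```python
import numpy as np


def weighted_threshold_metrics(y, scores, w, threshold=0.5):
    """Weighted sensitivity, specificity, precision and F1 at one threshold."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(w, dtype=float)
    pred = scores >= threshold

    # Weighted confusion-matrix cells
    tp = w[(y == 1) & pred].sum()
    fn = w[(y == 1) & ~pred].sum()
    fp = w[(y == 0) & pred].sum()
    tn = w[(y == 0) & ~pred].sum()

    return {
        "sensitivity": tp / (tp + fn),                                   # weighted TPR / recall
        "specificity": tn / (tn + fp),                                   # weighted TNR
        "precision": tp / (tp + fp) if tp + fp > 0 else float("nan"),
        "f1": 2 * tp / (2 * tp + fp + fn),                               # equals 2PR / (P + R)
    }


m = weighted_threshold_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], [1.0, 2.0, 1.0, 3.0])
print(m)
```

Writing F1 as `2TP / (2TP + FP + FN)` gives the same value as the harmonic mean while avoiding a division by zero when precision is undefined.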

### **Weighted Evaluation Benefits**:
- **Quality-Aware Assessment**: Metrics reflect data quality differences
- **Bias-Corrected Performance**: Reduced influence of low-quality samples
- **Consistent Validation**: Performance estimates that use the same weighting as model training

## References for Species Distribution Model Evaluation

### **Model Output Interpretation**:
- [SDM Model Outputs Interpretation](https://support.ecocommons.org.au/support/solutions/articles/6000256107-interpretation-of-sdm-model-outputs)
- [Presence-Only Prediction in GIS](https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/how-presence-only-prediction-works.htm)
- [MaxEnt 101: Species Distribution Modeling](https://www.esri.com/arcgis-blog/products/arcgis-pro/analytics/presence-only-prediction-maxent-101-using-gis-to-model-species-distribution/)

### **Performance Metrics**:
- [ROC Curves Demystified](https://towardsdatascience.com/receiver-operating-characteristic-curves-demystified-in-python-bd531a4364d0)
- [Precision-Recall AUC Guide](https://www.aporia.com/learn/ultimate-guide-to-precision-recall-auc-understanding-calculating-using-pr-auc-in-ml/)
- [F1-Score, Accuracy, ROC-AUC, and PR-AUC Metrics](https://deepchecks.com/f1-score-accuracy-roc-auc-and-pr-auc-metrics-for-models/)

### **Weighted Model Evaluation**:
- **Sample Weighting**: How to properly evaluate models trained with sample weights
- **Bias Correction**: Assessing the effectiveness of weighting strategies
- **Quality Integration**: Incorporating data quality into performance assessment

In [None]:
# =============================================================================
# LOAD WEIGHTED MODEL AND TRAINING DATA
# =============================================================================
# Load the trained weighted MaxEnt model and associated training data for evaluation

# Build experiment directory name (keeps runs organized by config)
# Alternate naming (older): 'exp_%s_%s_%s' % (pseudoabsence, training, interest)
experiment_name = 'exp_%s_%s_%s_%s_%s' % (model_prefix, pseudoabsence, training, topo, ndvi)
exp_path = os.path.join(output_path, experiment_name)  # Path to experiment directory

# Construct expected filenames produced during training for this run
train_input_data_name = '%s_model-train_input-data_%s_%s_%s_%s_%s.csv' % (model_prefix, specie, pseudoabsence, training, bio, iteration)
run_name = '%s_model-train_%s_%s_%s_%s_%s.ela' % (model_prefix, specie, pseudoabsence, training, bio, iteration)
nc_name = '%s_model-train_%s_%s_%s_%s_%s.nc' % (model_prefix, specie, pseudoabsence, training, bio, iteration)

In [None]:
# =============================================================================
# LOAD TRAINING DATA WITH SAMPLE WEIGHTS
# =============================================================================
# Load training data including sample weights for weighted model evaluation

# Load training data from CSV file (index_col=0 to drop old index column)
df = pd.read_csv(os.path.join(exp_path, train_input_data_name), index_col=0)
# Parse WKT strings into shapely geometries
df['geometry'] = df['geometry'].apply(wkt.loads)
# Wrap as GeoDataFrame with WGS84 CRS
train = gpd.GeoDataFrame(df, crs='EPSG:4326')

# Split predictors/labels/weights for weighted evaluation
x_train = train.drop(columns=['class', 'SampleWeight', 'geometry'])  # Environmental variables only
y_train = train['class']  # Presence/absence labels (0/1)
sample_weight_train = train['SampleWeight']  # Sample weights aligned with rows

# Load fitted weighted MaxEnt model
model_train = utils.load_object(os.path.join(exp_path, run_name))

# Predict probabilities on training set (for curves/metrics)
y_train_predict = model_train.predict(x_train)
# Optional: impute NaN probabilities to 0.5 (neutral)
# y_train_predict = np.nan_to_num(y_train_predict, nan=0.5)

In [None]:
# Model training performance metrics

# ROC curve and AUC (unweighted vs weighted)
# fpr/tpr below are the unweighted curve; the weighted AUC counts each sample in proportion to its weight
fpr_train, tpr_train, thresholds = metrics.roc_curve(y_train, y_train_predict)
auc_train = metrics.roc_auc_score(y_train, y_train_predict)
auc_train_weighted = metrics.roc_auc_score(y_train, y_train_predict, sample_weight=sample_weight_train)

# Precision-Recall curve and PR-AUC (more informative on class imbalance)
precision_train, recall_train, _ = metrics.precision_recall_curve(y_train, y_train_predict)
pr_auc_train = metrics.auc(recall_train, precision_train)
# Weighted PR curve uses sample weights to compute precision/recall
precision_train_w, recall_train_w, _ = metrics.precision_recall_curve(y_train, y_train_predict, sample_weight=sample_weight_train)
pr_auc_train_weighted = metrics.auc(recall_train_w, precision_train_w)

# Report metrics
print(f"Training ROC-AUC score: {auc_train:0.3f}")
print(f"Training ROC-AUC weighted score: {auc_train_weighted:0.3f}")
print(f"Training PR-AUC score: {pr_auc_train:0.3f}")
print(f"Training PR-AUC weighted score: {pr_auc_train_weighted:0.3f}")
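`metrics.auc(recall, precision)` integrates the PR curve with the trapezoidal rule, which scikit-learn's documentation notes can be too optimistic. `metrics.average_precision_score(y, p, sample_weight=w)` uses a step-wise sum instead. The NumPy sketch below shows that weighted step-wise sum, $AP = \sum_n (R_n - R_{n-1}) P_n$, assuming thresholds at each distinct score. It illustrates the formula and is not meant to replace the scikit-learn call.

```python
import numpy as np


def weighted_average_precision(y, scores, w):
    """Step-wise weighted average precision: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(w, dtype=float)

    # Sort by descending score; a threshold sits at the last row of each distinct score
    order = np.argsort(-scores, kind="mergesort")
    y, scores, w = y[order], scores[order], w[order]
    last_of_score = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]

    tp = np.cumsum(w * (y == 1))[last_of_score]   # weighted true positives per threshold
    fp = np.cumsum(w * (y == 0))[last_of_score]   # weighted false positives per threshold
    precision = tp / (tp + fp)
    recall = tp / tp[-1]
    recall_prev = np.r_[0.0, recall[:-1]]
    return float(np.sum((recall - recall_prev) * precision))


ap_unweighted = weighted_average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], np.ones(4))
ap_weighted = weighted_average_precision([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], [1.0, 2.0, 1.0, 3.0])
print(ap_unweighted, ap_weighted)
```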

|  |  | Species actually present | Species actually absent |
| ------ | :-------: | :------: | :-------: |
| **Model prediction** | **+** | True Positive (TP) | False Positive (FP) |
| | **−** | False Negative (FN) | True Negative (TN) |
| | **Total** | **All actual presences (TP + FN)** | **All actual absences (FP + TN)** |


$$TPR = \frac{TP}{TP + FN}$$
$$FPR = \frac{FP}{FP + TN}$$
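With observation weights $w_i$, each confusion-matrix count becomes a sum of weights over the corresponding samples, where $\hat{y}_i = 1$ when the predicted score is at or above the threshold $t$:

$$TP_w = \sum_{i:\,y_i=1,\ \hat{y}_i=1} w_i, \quad FN_w = \sum_{i:\,y_i=1,\ \hat{y}_i=0} w_i, \quad FP_w = \sum_{i:\,y_i=0,\ \hat{y}_i=1} w_i, \quad TN_w = \sum_{i:\,y_i=0,\ \hat{y}_i=0} w_i$$
$$TPR_w = \frac{TP_w}{TP_w + FN_w}, \qquad FPR_w = \frac{FP_w}{FP_w + TN_w}$$

With all $w_i = 1$ these reduce to the unweighted $TPR$ and $FPR$ above. Sweeping $t$ over the distinct scores traces the weighted ROC curve returned by `metrics.roc_curve(..., sample_weight=...)`.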

In [None]:
# Visualize training distributions and curves
fig, ax = plt.subplots(ncols=3, figsize=(18, 6), constrained_layout=True)

# Left: Predicted probability distributions for presence vs pseudo-absence
# Bin count scales with sample size (at least one bin)
ax[0].hist(y_train_predict[y_train == 0], bins=np.linspace(0, 1, max(int((y_train == 0).sum() / 100) + 1, 2)),
           density=True, color='tab:red', alpha=0.7, label='pseudo-absence')
ax[0].hist(y_train_predict[y_train == 1], bins=np.linspace(0, 1, max(int((y_train == 1).sum() / 10) + 1, 2)),
           density=True, color='tab:green', alpha=0.7, label='presence')
ax[0].set_xlabel('Relative Occurrence Probability')
ax[0].set_ylabel('Density')  # density=True normalizes each histogram, so these are not counts
ax[0].set_title('Probability Distribution')
ax[0].legend(loc='upper right')

# Middle: ROC curve (random vs perfect baselines + model)
ax[1].plot([0, 1], [0, 1], '--', label='AUC score: 0.5 (No Skill)', color='gray')
ax[1].text(0.4, 0.4, 'random classifier', fontsize=12, color='gray', rotation=45, rotation_mode='anchor',
           horizontalalignment='left', verticalalignment='bottom', transform=ax[1].transAxes)
ax[1].plot([0, 0, 1], [0, 1, 1], '--', label='AUC score: 1 (Ideal Model)', color='tab:blue', zorder=-1)
ax[1].text(0, 1, '  perfect classifier', fontsize=12, color='tab:blue', horizontalalignment='left', verticalalignment='bottom')
ax[1].scatter(0, 1, marker='*', s=100, color='tab:blue')
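# Overlay model ROC curves; the weighted curve is recomputed with sample weights so it can differ
fpr_train_w, tpr_train_w, _ = metrics.roc_curve(y_train, y_train_predict, sample_weight=sample_weight_train)
ax[1].plot(fpr_train, tpr_train, label=f'AUC score: {auc_train:0.3f}', color='tab:orange')
ax[1].plot(fpr_train_w, tpr_train_w, label=f'AUC Weighted score: {auc_train_weighted:0.3f}', color='tab:cyan', linestyle='-.')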
ax[1].axis('equal')
ax[1].set_xlabel('False Positive Rate')
ax[1].set_ylabel('True Positive Rate')
ax[1].set_title('MaxEnt ROC Curve')
ax[1].legend(loc='lower right')

# Right: Precision-Recall curve (random/perfect baselines + model)
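# No-skill PR baseline: precision equals the share of presences, which is 0.5 only for balanced data
prevalence_train = y_train.mean()
ax[2].plot([0, 1], [prevalence_train, prevalence_train], '--', color='gray', label=f'AUC score: {prevalence_train:0.3f} (No Skill)')
ax[2].text(0.5, prevalence_train + 0.02, 'random classifier', fontsize=12, color='gray', horizontalalignment='center', verticalalignment='center')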
ax[2].plot([0, 1, 1], [1, 1, 0], '--', label='AUC score: 1 (Ideal Model)', color='tab:blue', zorder=-1)
ax[2].text(1, 1, 'perfect classifier  ', fontsize=12, color='tab:blue', horizontalalignment='right', verticalalignment='bottom')
ax[2].scatter(1, 1, marker='*', s=100, color='tab:blue')
# Overlay model PR curves (unweighted and weighted AUC labels)
ax[2].plot(recall_train, precision_train, label=f'AUC score: {pr_auc_train:0.3f}', color='tab:orange')
ax[2].plot(recall_train_w, precision_train_w, label=f"AUC Weighted score: {pr_auc_train_weighted:0.3f}", color='tab:cyan', linestyle='-.')
ax[2].axis('equal')
ax[2].set_xlabel('Recall')
ax[2].set_ylabel('Precision')
ax[2].set_title('MaxEnt PR Curve')
ax[2].legend(loc='lower left')
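Passing `sample_weight` to `metrics.roc_curve` changes the curve itself, not just its area: each sample moves the curve by its weight instead of by one. The sketch below reproduces that cumulative-weight construction in NumPy, assuming thresholds at each distinct score with a presence predicted when score ≥ threshold, and integrates it with the trapezoidal rule. scikit-learn also drops collinear points by default, which does not change the area. The helper names and toy data are hypothetical.

```python
import numpy as np


def weighted_roc_curve(y, scores, w):
    """Weighted ROC curve from cumulative sums of weights (thresholds at distinct scores)."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y, scores, w = y[order], scores[order], w[order]
    last_of_score = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]

    tps = np.cumsum(w * (y == 1))[last_of_score]   # weighted TP as the threshold is lowered
    fps = np.cumsum(w * (y == 0))[last_of_score]   # weighted FP as the threshold is lowered
    tpr = np.r_[0.0, tps / tps[-1]]                # start the curve at (0, 0)
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr, scores[last_of_score]


def trapezoid_area(x, y):
    """Area under a piecewise-linear curve."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))


y = np.array([1, 1, 0, 0])
scores = np.array([0.9, 0.4, 0.6, 0.1])
w = np.array([1.0, 2.0, 1.0, 3.0])

fpr_u, tpr_u, _ = weighted_roc_curve(y, scores, np.ones_like(w))
fpr_w, tpr_w, _ = weighted_roc_curve(y, scores, w)
print(trapezoid_area(fpr_u, tpr_u), trapezoid_area(fpr_w, tpr_w))
```

On this toy data the unweighted area is 0.75 and the weighted area is about 0.833, so plotting the unweighted curve under a weighted-AUC label would misrepresent the weighted result.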

In [None]:
# Save the training figure if requested.
# File name: 06_roc-pr-auc_<specie>_<training>_<bio>[_<model_prefix>]_<iteration>[_future].png
# `Future` and `models` must be defined earlier in the session; `models` gates inclusion of the model prefix.
if savefig:
    name_parts = [specie, training, bio] + ([model_prefix] if models else []) + [iteration]
    suffix = '_future' if Future else ''
    file_path = os.path.join(figs_path, '06_roc-pr-auc_%s%s.png' % ('_'.join(map(str, name_parts)), suffix))
    fig.savefig(file_path, transparent=True, bbox_inches='tight')


## 2. Test model performance

In [None]:
test_input_data_name = '%s_model-test_input-data_%s_%s_%s_%s_%s.csv' %(model_prefix, specie, pseudoabsence, interest, bio, iteration)

In [None]:
# Load held-out test dataset for evaluation
# Note: index_col=0 drops the old index saved during export
df = pd.read_csv(os.path.join(exp_path, test_input_data_name), index_col=0)
# Convert WKT geometry back to shapely objects
df['geometry'] = df['geometry'].apply(wkt.loads)
# Wrap as GeoDataFrame (WGS84 CRS)
test = gpd.GeoDataFrame(df, crs='EPSG:4326')

In [None]:
# Split predictors/labels/weights for test set
x_test = test.drop(columns=['class', 'SampleWeight', 'geometry'])
y_test = test['class']
sample_weight_test = test['SampleWeight']

# Predict probabilities on the test set using the trained model
y_test_predict = model_train.predict(x_test)
# Optional: impute NaN probabilities to 0.5 if present
# y_test_predict = np.nan_to_num(y_test_predict, nan=0.5)

In [None]:
# Test set metrics: ROC/PR curves and AUCs (unweighted vs weighted)
# ROC
fpr_test, tpr_test, _ = metrics.roc_curve(y_test, y_test_predict)
auc_test = metrics.roc_auc_score(y_test, y_test_predict)
auc_test_weighted = metrics.roc_auc_score(y_test, y_test_predict, sample_weight=sample_weight_test)

# Precision-Recall (PR)
precision_test, recall_test, _ = metrics.precision_recall_curve(y_test, y_test_predict)
pr_auc_test = metrics.auc(recall_test, precision_test)
precision_test_w, recall_test_w, _ = metrics.precision_recall_curve(y_test, y_test_predict, sample_weight=sample_weight_test)
pr_auc_test_weighted = metrics.auc(recall_test_w, precision_test_w)

# Print summary of training vs test for quick comparison
print(f"Training ROC-AUC score: {auc_train:0.3f}")
print(f"Training ROC-AUC Weighted score: {auc_train_weighted:0.3f}")
print(f"Test ROC-AUC score: {auc_test:0.3f}")
print(f"Test ROC-AUC Weighted score: {auc_test_weighted:0.3f}")

print(f"Training PR-AUC Score: {pr_auc_train:0.3f}")
print(f"Training PR-AUC Weighted Score: {pr_auc_train_weighted:0.3f}")
print(f"Test PR-AUC Score: {pr_auc_test:0.3f}")
print(f"Test PR-AUC Weighted Score: {pr_auc_test_weighted:0.3f}")

In [None]:
# Visualize test distributions and curves alongside training for comparison
fig, ax = plt.subplots(ncols=3, figsize=(18, 6), constrained_layout=True)

# Left: Predicted probability distributions on test set
# Bin count scales with sample size (at least one bin)
ax[0].hist(y_test_predict[y_test == 0], bins=np.linspace(0, 1, max(int((y_test == 0).sum() / 100) + 1, 2)),
           density=True, color='tab:red', alpha=0.7, label='pseudo-absence')
ax[0].hist(y_test_predict[y_test == 1], bins=np.linspace(0, 1, max(int((y_test == 1).sum() / 10) + 1, 2)),
           density=True, color='tab:green', alpha=0.7, label='presence')
ax[0].set_xlabel('Relative Occurrence Probability')
ax[0].set_ylabel('Density')  # density=True normalizes each histogram, so these are not counts
ax[0].set_title('Probability Distribution')
ax[0].legend(loc='upper right')

# Middle: ROC curves (train vs test, with weighted variants labeled)
ax[1].plot([0, 1], [0, 1], '--', label='AUC score: 0.5 (No Skill)', color='gray')
ax[1].text(0.4, 0.4, 'random classifier', fontsize=12, color='gray', rotation=45, rotation_mode='anchor',
           horizontalalignment='left', verticalalignment='bottom', transform=ax[1].transAxes)
ax[1].plot([0, 0, 1], [0, 1, 1], '--', label='AUC score: 1 (Ideal Model)', color='tab:blue', zorder=-1)
ax[1].text(0, 1, '  perfect classifier', fontsize=12, color='tab:blue', horizontalalignment='left', verticalalignment='bottom')
ax[1].scatter(0, 1, marker='*', s=100, color='tab:blue')
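# Weighted curves are recomputed with sample weights so they can differ from the unweighted ones
fpr_train_w, tpr_train_w, _ = metrics.roc_curve(y_train, y_train_predict, sample_weight=sample_weight_train)
fpr_test_w, tpr_test_w, _ = metrics.roc_curve(y_test, y_test_predict, sample_weight=sample_weight_test)
ax[1].plot(fpr_train, tpr_train, label=f'AUC train score: {auc_train:0.3f}', color='tab:orange')
ax[1].plot(fpr_train_w, tpr_train_w, label=f'AUC Weighted train score: {auc_train_weighted:0.3f}', color='tab:cyan', linestyle='-.')
ax[1].plot(fpr_test, tpr_test, label=f'AUC test score: {auc_test:0.3f}', color='tab:green')
ax[1].plot(fpr_test_w, tpr_test_w, label=f'AUC Weighted test score: {auc_test_weighted:0.3f}', color='tab:olive', linestyle='-.')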
ax[1].axis('equal')
ax[1].set_xlabel('False Positive Rate')
ax[1].set_ylabel('True Positive Rate')
ax[1].set_title('MaxEnt ROC Curve')
ax[1].legend(loc='lower right')

# Right: PR curves (train vs test)
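# No-skill PR baseline: precision equals the share of presences (test set shown), 0.5 only for balanced data
prevalence_test = y_test.mean()
ax[2].plot([0, 1], [prevalence_test, prevalence_test], '--', color='gray', label=f'AUC score: {prevalence_test:0.3f} (No Skill, test)')
ax[2].text(0.5, prevalence_test + 0.02, 'random classifier', fontsize=12, color='gray', horizontalalignment='center', verticalalignment='center')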
ax[2].plot([0, 1, 1], [1, 1, 0], '--', label='AUC score: 1 (Ideal Model)', color='tab:blue', zorder=-1)
ax[2].text(1, 1, 'perfect classifier  ', fontsize=12, color='tab:blue', horizontalalignment='right', verticalalignment='bottom')
ax[2].scatter(1, 1, marker='*', s=100, color='tab:blue')
ax[2].plot(recall_train, precision_train, label=f'AUC train score: {pr_auc_train:0.3f}', color='tab:orange')
ax[2].plot(recall_train_w, precision_train_w, label=f"AUC train Weighted score: {pr_auc_train_weighted:0.3f}", color='tab:cyan', linestyle='-.')
ax[2].plot(recall_test, precision_test, label=f'AUC test score: {pr_auc_test:0.3f}', color='tab:green')
ax[2].plot(recall_test_w, precision_test_w, label=f'AUC test Weighted score: {pr_auc_test_weighted:0.3f}', color='tab:olive', linestyle='-.')
ax[2].axis('equal')
ax[2].set_xlabel('Recall')
ax[2].set_ylabel('Precision')
ax[2].set_title('MaxEnt PR Curve')
ax[2].legend(loc='lower left')

In [None]:
# Save the test figure if requested (same naming scheme as training, with `interest` in place of `training`).
# `models` gates inclusion of the model prefix in both current and future names.
# Note: when interest == training, this overwrites the training figure saved above.
if savefig:
    name_parts = [specie, interest, bio] + ([model_prefix] if models else []) + [iteration]
    suffix = '_future' if Future else ''
    file_path = os.path.join(figs_path, '06_roc-pr-auc_%s%s.png' % ('_'.join(map(str, name_parts)), suffix))
    fig.savefig(file_path, transparent=True, bbox_inches='tight')

## 3. Evaluate model

### 3.2 Partial Dependence Plots / Response Curves

In [None]:
# Response curves for each training predictor (uncomment to run)
# fig, ax = model_train.partial_dependence_plot(x_train, labels=list(x_train.columns), dpi=100, n_bins=30)
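A partial dependence (response) curve for one variable is the average prediction when that variable is set to each value on a grid while every other variable keeps its observed values. Below is a minimal NumPy sketch of that definition, assuming `predict` maps a 2-D array to one score per row. The grid range (2.5th to 97.5th percentile) and `n_bins=30` are illustrative, and elapid's own `partial_dependence_plot` may build its grid differently.

```python
import numpy as np


def partial_dependence_1d(predict, X, feature, n_bins=30, percentiles=(2.5, 97.5)):
    """Mean prediction as column `feature` is swept over a grid; other columns keep observed values."""
    X = np.asarray(X, dtype=float)
    lo, hi = np.percentile(X[:, feature], percentiles)
    grid = np.linspace(lo, hi, n_bins)
    mean_pred = np.empty(n_bins)
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value          # overwrite the feature for every row
        mean_pred[k] = np.mean(predict(X_mod))
    return grid, mean_pred


# Hypothetical logistic "model": increases with column 0, decreases with column 1
def toy_predict(X):
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid0, pd0 = partial_dependence_1d(toy_predict, X, feature=0)
grid1, pd1 = partial_dependence_1d(toy_predict, X, feature=1)
```

The curve for column 0 rises and the curve for column 1 falls, matching the signs in the toy model.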

## 4. Comprehensive Variable Importance Analysis

This section performs a thorough analysis of variable importance by:

1. **Initial Analysis**: Running the model with all predictor variables in `x_train` (e.g., the 19 bioclimatic variables) to establish baseline importance
2. **Iterative Removal**: Systematically removing the least important variables until we reach ~5 most important variables
3. **Performance Tracking**: Monitoring model performance as variables are removed
4. **Final Recommendations**: Identifying the optimal subset of variables for the species distribution model

### Methodology:
- **Permutation Importance**: Measures the drop in model performance when each variable is randomly shuffled
- **Iterative Backward Elimination**: Removes least important variables one at a time
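- **Performance Monitoring**: Tracks weighted and unweighted ROC-AUC and PR-AUC on the training and test sets throughout the process
- **Repeated Permutations**: Each variable is shuffled 10 times (`n_repeats=10`) and the resulting drops are averaged; importance is computed on the training data rather than by cross-validation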
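Weighted permutation importance can be sketched directly: score the model with a weighted metric, shuffle one column, re-score, and record the drop, averaged over several shuffles. scikit-learn's `inspection.permutation_importance` follows this idea and forwards `sample_weight` to the scorer. The sketch below assumes weighted ROC-AUC as the score. `toy_predict` and the data are hypothetical.

```python
import numpy as np


def weighted_auc(y, s, w):
    """Pairwise weighted ROC-AUC (ties count one half)."""
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    pair_w = w[y == 1][:, None] * w[y == 0][None, :]
    return float((((diff > 0) + 0.5 * (diff == 0)) * pair_w).sum() / pair_w.sum())


def weighted_permutation_importance(predict, X, y, w, n_repeats=10, seed=0):
    """Mean and std of the drop in weighted AUC when each column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = weighted_auc(y, predict(X), w)
    drops = np.empty((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the link between column j and y
            drops[j, r] = baseline - weighted_auc(y, predict(X_perm), w)
    return drops.mean(axis=1), drops.std(axis=1)


# Hypothetical model that only uses column 0
def toy_predict(X):
    return X[:, 0]


rng = np.random.default_rng(42)
X = rng.normal(size=(150, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=150) > 0).astype(int)
w = rng.uniform(0.5, 2.0, size=150)
mean_drop, std_drop = weighted_permutation_importance(toy_predict, X, y, w)
print(np.round(mean_drop, 3))
```

Columns the model ignores get an importance of exactly zero here, while the informative column shows a clear drop. Real models usually produce small nonzero values for weak variables, so the standard deviation helps judge whether a difference between two low-ranked variables is meaningful.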


In [None]:
# =============================================================================
# COMPREHENSIVE VARIABLE IMPORTANCE ANALYSIS
# =============================================================================

import time  # Timing the elimination run

# Initialize storage for results
importance_results = {}
performance_history = {}
variable_subsets = {}

# Get current variable names from training data
current_variables = list(x_train.columns)
print(f"Starting with {len(current_variables)} variables:")
print(f"Variables: {current_variables}")

# Store initial performance metrics
initial_metrics = {
    'train_auc': auc_train,
    'train_auc_weighted': auc_train_weighted,
    'train_pr_auc': pr_auc_train,
    'train_pr_auc_weighted': pr_auc_train_weighted,
    'test_auc': auc_test,
    'test_auc_weighted': auc_test_weighted,
    'test_pr_auc': pr_auc_test,
    'test_pr_auc_weighted': pr_auc_test_weighted
}

performance_history['all_variables'] = initial_metrics
variable_subsets['all_variables'] = current_variables.copy()

print(f"\nInitial Performance (All {len(current_variables)} variables):")
print(f"Training AUC: {auc_train:.3f} (weighted: {auc_train_weighted:.3f})")
print(f"Training PR-AUC: {pr_auc_train:.3f} (weighted: {pr_auc_train_weighted:.3f})")
print(f"Test AUC: {auc_test:.3f} (weighted: {auc_test_weighted:.3f})")
print(f"Test PR-AUC: {pr_auc_test:.3f} (weighted: {pr_auc_test_weighted:.3f})")


In [None]:
# =============================================================================
# ITERATIVE VARIABLE REMOVAL FUNCTION
# =============================================================================

def iterative_variable_removal(x_train, y_train, sample_weight_train, x_test, y_test, sample_weight_test,
                               target_variables=5, min_variables=3, n_repeats=10, random_state=42):
    """
    Iteratively remove the least important variable until max(target_variables, min_variables) remain.

    Each iteration refits a weighted MaxEnt model on the current variables, records its
    train/test performance, ranks variables by permutation importance on the training data,
    and drops the lowest-ranked one. Refits use default `ela.MaxentModel()` settings, which
    may differ from those used to train `model_train`.

    Parameters:
    -----------
    x_train, y_train, sample_weight_train : training data
    x_test, y_test, sample_weight_test : test data
    target_variables : int, target number of variables to keep
    min_variables : int, minimum number of variables to keep
    n_repeats : int, number of shuffles per variable for permutation importance
    random_state : int, seed for the permutation shuffles (reproducibility)

    Returns:
    --------
    results : dict with 'importance_rankings' and 'performance_history' (one entry per
        iteration, measured before that iteration's removal), 'removed_variables',
        'final_variables', and 'final_performance' (a refit on the final variable set)
    """

    def _fit_and_score(cols):
        # Fit a weighted MaxEnt model on `cols` and compute (weighted) ROC-AUC and PR-AUC
        model = ela.MaxentModel()
        model.fit(x_train[cols], y_train, sample_weight=sample_weight_train)
        scores = {'n_variables': len(cols)}
        for split, x, y, w in (('train', x_train, y_train, sample_weight_train),
                               ('test', x_test, y_test, sample_weight_test)):
            y_pred = model.predict(x[cols])
            scores[f'{split}_auc'] = metrics.roc_auc_score(y, y_pred)
            scores[f'{split}_auc_weighted'] = metrics.roc_auc_score(y, y_pred, sample_weight=w)
            precision, recall, _ = metrics.precision_recall_curve(y, y_pred)
            scores[f'{split}_pr_auc'] = metrics.auc(recall, precision)
            precision_w, recall_w, _ = metrics.precision_recall_curve(y, y_pred, sample_weight=w)
            scores[f'{split}_pr_auc_weighted'] = metrics.auc(recall_w, precision_w)
        return model, scores

    results = {
        'importance_rankings': {},
        'performance_history': {},
        'removed_variables': [],
        'final_variables': [],
        'final_performance': {}
    }

    current_vars = list(x_train.columns)
    n_keep = max(target_variables, min_variables)
    iteration = 0

    print(f"Starting iterative removal from {len(current_vars)} to {n_keep} variables...")

    while len(current_vars) > n_keep:
        iteration += 1
        print(f"\n--- Iteration {iteration}: {len(current_vars)} variables remaining ---")

        # Refit on the current variables and record performance
        model_iter, perf = _fit_and_score(current_vars)
        results['performance_history'][f'iteration_{iteration}'] = perf

        # Weighted permutation importance on the training data (seeded for reproducibility)
        pi = inspection.permutation_importance(
            model_iter, x_train[current_vars], y_train,
            sample_weight=sample_weight_train, n_repeats=n_repeats, random_state=random_state
        )
        var_importance = dict(zip(current_vars, pi.importances_mean))
        sorted_vars = sorted(var_importance.items(), key=lambda kv: kv[1], reverse=True)
        results['importance_rankings'][f'iteration_{iteration}'] = {
            'variables': current_vars.copy(),
            'importance_scores': var_importance,
            'sorted_ranking': sorted_vars
        }

        print(f"Performance with {len(current_vars)} variables:")
        print(f"  Train AUC: {perf['train_auc']:.3f} (weighted: {perf['train_auc_weighted']:.3f})")
        print(f"  Test AUC: {perf['test_auc']:.3f} (weighted: {perf['test_auc_weighted']:.3f})")
        print(f"  Train PR-AUC: {perf['train_pr_auc']:.3f} (weighted: {perf['train_pr_auc_weighted']:.3f})")
        print(f"  Test PR-AUC: {perf['test_pr_auc']:.3f} (weighted: {perf['test_pr_auc_weighted']:.3f})")

        # Drop the least important variable (last entry of the descending ranking)
        least_important_var, least_important_score = sorted_vars[-1]
        print(f"Least important variable: {least_important_var} (importance: {least_important_score:.4f})")
        current_vars.remove(least_important_var)
        results['removed_variables'].append(least_important_var)
        print(f"Removed {least_important_var}. Variables remaining: {current_vars}")

    # The loop never evaluates the final subset, so refit and score it once here
    _, results['final_performance'] = _fit_and_score(current_vars)
    results['final_variables'] = current_vars.copy()
    final_perf = results['final_performance']
    print(f"\nFinal variable set ({len(current_vars)} variables): {current_vars}")
    print(f"  Test AUC: {final_perf['test_auc']:.3f} (weighted: {final_perf['test_auc_weighted']:.3f})")

    return results


In [None]:
# =============================================================================
# RUN ITERATIVE VARIABLE REMOVAL ANALYSIS
# =============================================================================

print("="*80)
print("COMPREHENSIVE VARIABLE IMPORTANCE ANALYSIS")
print("="*80)

# Run the iterative removal process
start_time = time.time()

# Set target to 5 variables (can be adjusted)
target_vars = 5
min_vars = 3

# Run iterative removal
removal_results = iterative_variable_removal(
    x_train, y_train, sample_weight_train,
    x_test, y_test, sample_weight_test,
    target_variables=target_vars,
    min_variables=min_vars
)

end_time = time.time()
print(f"\nAnalysis completed in {end_time - start_time:.1f} seconds")

# Store results for later analysis
importance_results['iterative_removal'] = removal_results


In [None]:
# =============================================================================
# ANALYZE AND VISUALIZE RESULTS
# =============================================================================

# Extract performance trends (one entry per removal iteration)
history = removal_results['performance_history']
iterations = list(history.keys())
n_vars = [history[it]['n_variables'] for it in iterations]
train_aucs = [history[it]['train_auc'] for it in iterations]
test_aucs = [history[it]['test_auc'] for it in iterations]
train_aucs_weighted = [history[it]['train_auc_weighted'] for it in iterations]
test_aucs_weighted = [history[it]['test_auc_weighted'] for it in iterations]

# Add initial performance (all variables)
n_vars.insert(0, len(x_train.columns))
train_aucs.insert(0, auc_train)
test_aucs.insert(0, auc_test)
train_aucs_weighted.insert(0, auc_train_weighted)
test_aucs_weighted.insert(0, auc_test_weighted)

print("Performance Summary:")
print("="*60)
print(f"{'Variables':<12} {'Train AUC':<10} {'Test AUC':<10} {'Train AUC-W':<12} {'Test AUC-W':<12}")
print("-"*60)
for i, n_var in enumerate(n_vars):
    print(f"{n_var:<12} {train_aucs[i]:<10.3f} {test_aucs[i]:<10.3f} {train_aucs_weighted[i]:<12.3f} {test_aucs_weighted[i]:<12.3f}")

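# Get final variable ranking. The last recorded ranking is computed before that
# iteration's removal, so keep only the variables that survived.
final_iteration = f"iteration_{len(iterations)}"
final_ranking = [
    (var, imp)
    for var, imp in removal_results['importance_rankings'][final_iteration]['sorted_ranking']
    if var in removal_results['final_variables']
]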

print(f"\nFinal Variable Ranking (Top {len(removal_results['final_variables'])} variables):")
print("="*60)
for i, (var, importance) in enumerate(final_ranking, 1):
    print(f"{i:2d}. {var:<15} (importance: {importance:.4f})")

print(f"\nRemoved Variables (in order of removal):")
print("="*40)
for i, var in enumerate(removal_results['removed_variables'], 1):
    print(f"{i:2d}. {var}")
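The table puts unweighted and weighted AUC side by side. One common definition of weighted ROC AUC, sketched here under that assumption rather than as the notebook's own implementation, is the weighted share of (presence, absence) pairs in which the presence scores higher. Each pair is weighted by the product of the two observation weights, and ties count as half. This pairwise form should agree with the area under the weighted ROC curve from scikit-learn's `roc_auc_score(..., sample_weight=...)`, but it is O(n_pos * n_neg) and meant only for illustration.

```python
from typing import Sequence


def weighted_auc(y_true: Sequence[int], scores: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Weighted ROC AUC as a weighted Mann-Whitney statistic.

    Each (presence, absence) pair contributes w_pos * w_neg: fully when the
    presence scores higher, half when the two scores tie.
    """
    num = 0.0
    den = 0.0
    for yi, si, wi in zip(y_true, scores, weights):
        if yi != 1:
            continue
        for yj, sj, wj in zip(y_true, scores, weights):
            if yj != 0:
                continue
            pair_w = wi * wj
            den += pair_w
            if si > sj:
                num += pair_w
            elif si == sj:
                num += 0.5 * pair_w
    if den == 0:
        raise ValueError("need at least one presence and one absence with non-zero weight")
    return num / den


labels_toy = [1, 1, 0, 0]
scores_toy = [0.9, 0.4, 0.6, 0.1]
auc_equal = weighted_auc(labels_toy, scores_toy, [1, 1, 1, 1])
auc_up = weighted_auc(labels_toy, scores_toy, [1, 3, 1, 1])
print(auc_equal)  # 0.75
print(auc_up)     # 0.625
```

Raising the weight of the presence that is ranked below an absence (score 0.4 against 0.6) pulls the AUC from 0.75 down to 0.625, which is how up- or down-weighted records can move the weighted columns away from the unweighted ones.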


In [None]:
# =============================================================================
# CREATE COMPREHENSIVE VISUALIZATION
# =============================================================================

# Create a comprehensive figure showing the analysis results
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Comprehensive Variable Importance Analysis', fontsize=16, fontweight='bold')

# 1. Performance vs Number of Variables
ax1 = axes[0, 0]
ax1.plot(n_vars, train_aucs, 'o-', label='Train AUC', color='tab:blue', linewidth=2)
ax1.plot(n_vars, test_aucs, 's-', label='Test AUC', color='tab:orange', linewidth=2)
ax1.plot(n_vars, train_aucs_weighted, 'o--', label='Train AUC (Weighted)', color='tab:blue', alpha=0.7)
ax1.plot(n_vars, test_aucs_weighted, 's--', label='Test AUC (Weighted)', color='tab:orange', alpha=0.7)
ax1.set_xlabel('Number of Variables')
ax1.set_ylabel('AUC Score')
ax1.set_title('Model Performance vs Number of Variables')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.invert_xaxis()  # Show decreasing variables

# 2. Final variable importance (up to 10 variables from the last ranking)
ax2 = axes[0, 1]
top_vars = final_ranking[:10]
var_names = [var for var, _ in top_vars]
var_importance = [imp for _, imp in top_vars]

bars = ax2.barh(range(len(var_names)), var_importance, color='tab:green', alpha=0.7)
ax2.set_yticks(range(len(var_names)))
ax2.set_yticklabels(var_names)
ax2.invert_yaxis()  # most important variable at the top
ax2.set_xlabel('Permutation Importance')
ax2.set_title(f'Top {len(var_names)} Most Important Variables')
ax2.grid(True, alpha=0.3, axis='x')

# Add value labels next to each bar
for i, val in enumerate(var_importance):
    ax2.text(val + 0.001, i, f'{val:.3f}', va='center', fontsize=9)

# 3. Variable Removal Timeline
ax3 = axes[1, 0]
removed_vars = removal_results['removed_variables']
removal_order = list(range(1, len(removed_vars) + 1))
ax3.bar(removal_order, [1] * len(removed_vars), color='tab:red', alpha=0.7)
ax3.set_xlabel('Removal Order')
ax3.set_ylabel('Variables Removed')
ax3.set_title('Variable Removal Timeline')
ax3.set_xticks(removal_order)
ax3.set_xticklabels([f'#{i}' for i in removal_order])

# Add variable names as text
for i, var in enumerate(removed_vars):
    ax3.text(i + 1, 0.5, var, rotation=90, ha='center', va='center', fontsize=8)

# 4. Performance Degradation Analysis
ax4 = axes[1, 1]
# Calculate performance drop from initial
initial_test_auc = test_aucs[0]
initial_train_auc = train_aucs[0]
test_drop = [(initial_test_auc - auc) / initial_test_auc * 100 for auc in test_aucs]
train_drop = [(initial_train_auc - auc) / initial_train_auc * 100 for auc in train_aucs]

ax4.plot(n_vars, test_drop, 'o-', label='Test AUC Drop %', color='tab:red', linewidth=2)
ax4.plot(n_vars, train_drop, 's-', label='Train AUC Drop %', color='tab:purple', linewidth=2)
ax4.set_xlabel('Number of Variables')
ax4.set_ylabel('Performance Drop (%)')
ax4.set_title('Performance Degradation with Variable Removal')
ax4.legend()
ax4.grid(True, alpha=0.3)
ax4.invert_xaxis()

plt.tight_layout()
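Panel 4 expresses each AUC as a relative drop from its all-variables value. Writing $\mathrm{AUC}_0$ for the first entry of `test_aucs` (or `train_aucs`) and $\mathrm{AUC}_k$ for the $k$-th entry, the plotted quantity is:

```latex
\Delta_k = 100 \times \frac{\mathrm{AUC}_0 - \mathrm{AUC}_k}{\mathrm{AUC}_0}
```

For illustrative values $\mathrm{AUC}_0 = 0.900$ and $\mathrm{AUC}_k = 0.873$, this gives $\Delta_k = 100 \times 0.027 / 0.900 = 3.0\%$. Negative values mean the reduced model scored higher than the full one.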


In [None]:
# Save the comprehensive analysis figure
if savefig:
    if Future:
        if models:
            file_path = os.path.join(
                figs_path,
                '06_comprehensive_var-importance_%s_%s_%s_%s_%s_future.png' % (specie, training, bio, model_prefix, iteration)
            )
        else:
            file_path = os.path.join(
                figs_path,
                '06_comprehensive_var-importance_%s_%s_%s_%s_future.png' % (specie, training, bio, iteration)
            )
        fig.savefig(file_path, transparent=True, bbox_inches='tight', dpi=300)
    else:
        if models:
            file_path = os.path.join(
                figs_path,
                '06_comprehensive_var-importance_%s_%s_%s_%s_%s.png' % (specie, training, bio, model_prefix, iteration)
            )
        else:
            file_path = os.path.join(
                figs_path,
                '06_comprehensive_var-importance_%s_%s_%s_%s.png' % (specie, training, bio, iteration)
            )
        fig.savefig(file_path, transparent=True, bbox_inches='tight', dpi=300)
    
    print(f"Comprehensive analysis figure saved to: {file_path}")
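The save cells in this section (this one, the response-curve cell and the variable-importance cell) repeat the same `Future` / `models` branching to build file names. A hypothetical helper, not defined anywhere in the notebook, that reproduces those names in one place could look like this; the caller is assumed to pass `model_prefix` only when `models` is set.

```python
import os
from typing import Optional, Union


def figure_path(figs_path: str, stem: str, specie: str, training: str, bio: str,
                iteration: Union[int, str], model_prefix: Optional[str] = None,
                future: bool = False, ext: str = "png") -> str:
    """Return figs_path/<stem>_<specie>_<training>_<bio>[_<model_prefix>]_<iteration>[_future].<ext>."""
    parts = [stem, specie, training, bio]
    if model_prefix:  # pass model_prefix only when `models` is set
        parts.append(model_prefix)
    parts.append(str(iteration))
    suffix = "_future" if future else ""
    return os.path.join(figs_path, "_".join(parts) + suffix + "." + ext)


# In the notebook this would be called as, e.g.:
# figure_path(figs_path, '06_resp-curves', specie, training, bio, iteration,
#             model_prefix=model_prefix if models else None, future=Future)
p_fig = figure_path("figs", "06_comprehensive_var-importance", "sp", "eu", "bio1", 3,
                    model_prefix="maxent", future=True)
p_csv = figure_path("figs", "06_variable_importance_analysis", "sp", "eu", "bio1", 3, ext="csv")
```

Keeping the naming in one function means the PNG and CSV outputs of a run cannot drift apart in how they encode the model and scenario.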


In [None]:
# =============================================================================
# EXPORT RESULTS TO CSV FOR FURTHER ANALYSIS
# =============================================================================

# Create summary DataFrame for export
summary_data = []

# Add initial performance (all variables)
summary_data.append({
    'iteration': 0,
    'n_variables': len(x_train.columns),
    'variables_removed': 'none',
    'train_auc': auc_train,
    'train_auc_weighted': auc_train_weighted,
    'test_auc': auc_test,
    'test_auc_weighted': auc_test_weighted,
    'train_pr_auc': pr_auc_train,
    'train_pr_auc_weighted': pr_auc_train_weighted,
    'test_pr_auc': pr_auc_test,
    'test_pr_auc_weighted': pr_auc_test_weighted
})

# Add iterative removal results
for i, iter_key in enumerate(iterations, 1):
    perf = removal_results['performance_history'][iter_key]
    removed_var = removal_results['removed_variables'][i-1] if i-1 < len(removal_results['removed_variables']) else 'none'
    
    summary_data.append({
        'iteration': i,
        'n_variables': perf['n_variables'],
        'variables_removed': removed_var,
        'train_auc': perf['train_auc'],
        'train_auc_weighted': perf['train_auc_weighted'],
        'test_auc': perf['test_auc'],
        'test_auc_weighted': perf['test_auc_weighted'],
        'train_pr_auc': perf['train_pr_auc'],
        'train_pr_auc_weighted': perf['train_pr_auc_weighted'],
        'test_pr_auc': perf['test_pr_auc'],
        'test_pr_auc_weighted': perf['test_pr_auc_weighted']
    })

# Create DataFrame
summary_df = pd.DataFrame(summary_data)

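# Save to CSV (the name includes the model prefix and scenario, like the figures,
# so different runs do not overwrite each other)
if savefig:
    model_part = f'_{model_prefix}' if models else ''
    future_part = '_future' if Future else ''
    csv_filename = (f'06_variable_importance_analysis_{specie}_{training}_{bio}'
                    f'{model_part}_{iteration}{future_part}.csv')
    csv_path = os.path.join(figs_path, csv_filename)
    summary_df.to_csv(csv_path, index=False)
    print(f"Analysis summary saved to: {csv_path}")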

# Display summary
print("\n" + "="*80)
print("FINAL ANALYSIS SUMMARY")
print("="*80)
print(f"Species: {specie}")
print(f"Training Region: {training}")
print(f"Test Region: {interest}")
print(f"Initial Variables: {len(x_train.columns)}")
print(f"Final Variables: {len(removal_results['final_variables'])}")
print(f"Variables Removed: {len(removal_results['removed_variables'])}")

print(f"\nFinal Variable Set:")
for i, var in enumerate(removal_results['final_variables'], 1):
    print(f"  {i}. {var}")

print(f"\nPerformance Comparison:")
print(f"  Initial Test AUC: {test_aucs[0]:.3f}")
print(f"  Final Test AUC: {test_aucs[-1]:.3f}")
print(f"  Performance Drop: {((test_aucs[0] - test_aucs[-1]) / test_aucs[0] * 100):.1f}%")

print(f"\nTop 5 Most Important Variables:")
for i, (var, importance) in enumerate(final_ranking[:5], 1):
    print(f"  {i}. {var} (importance: {importance:.4f})")


## 5. Recommendations and Next Steps

### Key Findings:

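1. **Most Important Variables**: The analysis ranks the retained bioclimatic variables by permutation importance; with `target_vars = 5`, removal stops once five variables remain.

2. **Performance Impact**: The performance table and the summary figure track train and test AUC (unweighted and weighted) as the least important variable is dropped at each iteration.

3. **Final Variable Set**: The final variable set is a simpler candidate model. Whether it balances complexity and performance well depends on the test AUC drop reported in the summary above: a small drop supports using it, a large one does not.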

### Recommendations:

1. **Use the Final Variable Set**: If the test AUC drop is acceptable, consider using the retained variables (`target_vars`, 5 by default) for future modeling to reduce complexity while maintaining performance.

2. **Validate Results**: Test the reduced variable set on independent data to ensure robustness.

3. **Consider Ecological Significance**: Review the biological/ecological meaning of the most important variables to ensure they make sense for the target species.

4. **Further Analysis**: Consider running this analysis with different target numbers of variables (e.g., 3, 7, 10) to find the optimal balance.
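To pick a good number of variables, one simple rule (an assumption, not the notebook's method) is to keep the smallest variable set whose test AUC stays within a chosen tolerance of the best test AUC observed. The sketch works on lists shaped like `n_vars` and `test_aucs` from the performance summary; the numbers in the usage line are illustrative, not results.

```python
from typing import Sequence


def smallest_within_tolerance(n_vars: Sequence[int], test_aucs: Sequence[float],
                              tolerance: float = 0.01) -> int:
    """Fewest variables whose test AUC is within `tolerance` of the best test AUC."""
    if len(n_vars) != len(test_aucs) or not n_vars:
        raise ValueError("n_vars and test_aucs must be non-empty and the same length")
    best = max(test_aucs)
    candidates = [n for n, auc in zip(n_vars, test_aucs) if best - auc <= tolerance]
    return min(candidates)


# Illustrative numbers only
example_n = [10, 9, 8, 7, 6, 5]
example_auc = [0.880, 0.884, 0.882, 0.879, 0.872, 0.860]
chosen = smallest_within_tolerance(example_n, example_auc, tolerance=0.01)
print(chosen)  # 7
```

The tolerance encodes how much test AUC one is willing to give up for a simpler model; a looser tolerance (here 0.03) would select the 5-variable set instead.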

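### Files Generated (when `savefig` is True):
- `06_comprehensive_var-importance_*.png`: the four-panel summary figure
- `06_variable_importance_analysis_*.csv`: train/test AUC and PR-AUC (unweighted and weighted) for each iteration, with the variable removed at that step
- The final ranking and removal order are printed in the notebook output; the removal order is also stored in the CSV's `variables_removed` column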


In [None]:
# Prepare labels and open training output NetCDF for metadata
labels = train.drop(columns=['class', 'geometry', 'SampleWeight']).columns.values
training_output = xr.open_dataset(os.path.join(exp_path, nc_name))
# display(labels)
# display(training_output)

In [None]:
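# Compute partial dependence (ICE) curves for every feature
# - percentiles: each feature's grid spans its 2.5th to 97.5th training percentiles,
#   so the curves are not driven by a few extreme values
# - nbins: number of grid points per curve (passed as grid_resolution)
# - assumes `labels` lists the features in the same column order as `x_train`
percentiles = (0.025, 0.975)
nbins = 100

mean = {}
stdv = {}
bins = {}

for idx, label in enumerate(labels):
    # kind="individual" returns one curve per training sample (ICE curves),
    # which are summarized below as a mean curve and a spread
    pda = inspection.partial_dependence(
        model_train,
        x_train,
        [idx],
        percentiles=percentiles,
        grid_resolution=nbins,
        kind="individual",
    )

    mean[label] = pda["individual"][0].mean(axis=0)  # average response (the PDP)
    stdv[label] = pda["individual"][0].std(axis=0)   # spread of ICE curves across samples
    bins[label] = pda["grid_values"][0]              # feature grid ("grid_values" needs scikit-learn >= 1.3; older versions use "values")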
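`partial_dependence(kind="individual")` returns ICE curves: for each grid value, the feature is set to that value in every row and the model predicts on the modified data. A NumPy sketch of that idea follows, where `predict` is a hypothetical callable returning one score per row (for a scikit-learn-style classifier, something like `lambda X: model.predict_proba(X)[:, 1]`). The grid here is a plain linspace between `np.quantile` endpoints; scikit-learn's own grid construction may differ in details (for example, percentile interpolation, or using a feature's unique values when it has fewer than `grid_resolution` of them).

```python
import numpy as np


def ice_curves(predict, X: np.ndarray, feature: int,
               percentiles=(0.025, 0.975), grid_resolution: int = 100):
    """ICE curves for one column of X.

    Returns (grid, curves) with curves[i, k] = prediction for row i after
    overwriting column `feature` with grid[k].
    """
    lo, hi = np.quantile(X[:, feature], percentiles)
    grid = np.linspace(lo, hi, grid_resolution)
    curves = np.empty((X.shape[0], grid_resolution))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # every row gets the same feature value
        curves[:, k] = predict(X_mod)
    return grid, curves


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Toy additive model: the ICE curves for feature 0 are parallel lines
grid, curves = ice_curves(lambda A: A[:, 0] + A[:, 1], X, feature=0, grid_resolution=25)
mean_curve = curves.mean(axis=0)  # the partial dependence curve
spread = curves.std(axis=0)       # what the shaded band shows
```

With an additive toy model every ICE curve is the same line shifted by the other feature, so the band has constant width. A band whose width changes along the grid points to interactions between that feature and other predictors.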

In [None]:
#display(pda)


In [None]:
# Plot PDPs with uncertainty bands for each predictor
ncols, nrows = subplot_layout(len(labels))
fig, axs = plt.subplots(nrows=nrows, ncols=ncols, figsize=(ncols * 6, nrows * 6))

# Normalize axes list for consistent indexing
if (nrows, ncols) == (1, 1):
    ax = [axs]
else:
    ax = axs.ravel()

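xlabels = training_output.data_vars
for iax, label in enumerate(labels):
    ax[iax].set_title(label)
    try:
        ax[iax].set_xlabel(xlabels[label].long_name)
    except (KeyError, ValueError, AttributeError):
        # Variable missing from the NetCDF, or it has no long_name attribute
        ax[iax].set_xlabel('No variable long_name')

    # Band: mean ± 1 std of the individual (ICE) curves. It shows how much the
    # response varies across samples; it is not a confidence interval.
    ax[iax].fill_between(bins[label], mean[label] - stdv[label], mean[label] + stdv[label], alpha=0.25)
    ax[iax].plot(bins[label], mean[label])

# Style the used axes and hide any empty grid cells
for axi in ax[:len(labels)]:
    axi.set_ylim([0, 1])
    axi.set_ylabel('probability of occurrence')
for axi in ax[len(labels):]:
    axi.set_visible(False)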

fig.tight_layout()
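`subplot_layout` is defined elsewhere in the notebook and its rule is not shown in this section. A hypothetical near-square stand-in, named differently so it does not shadow the real helper, returns `(ncols, nrows)` in the same order as the call above. Any such grid can hold more cells than there are predictors (five predictors get a 3 × 2 grid here), which leaves some axes empty.

```python
import math
from typing import Tuple


def near_square_layout(n_panels: int) -> Tuple[int, int]:
    """Return (ncols, nrows) for a near-square grid with room for n_panels axes."""
    if n_panels < 1:
        raise ValueError("n_panels must be at least 1")
    ncols = math.ceil(math.sqrt(n_panels))
    nrows = math.ceil(n_panels / ncols)
    return ncols, nrows


print(near_square_layout(1))   # (1, 1)
print(near_square_layout(5))   # (3, 2)
print(near_square_layout(19))  # (5, 4)
```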

In [None]:
# Save response curve figures if requested
if savefig:
    if Future:
        if models:
            file_path = os.path.join(
                figs_path,
                '06_resp-curves_%s_%s_%s_%s_%s_future.png' % (specie, training, bio, model_prefix, iteration),
            )
        else:
            file_path = os.path.join(
                figs_path,
                '06_resp-curves_%s_%s_%s_%s_future.png' % (specie, training, bio, iteration),
            )
        fig.savefig(file_path, transparent=True, bbox_inches='tight')

    else:
        if models:
            file_path = os.path.join(
                figs_path,
                '06_resp-curves_%s_%s_%s_%s_%s.png' % (specie, training, bio, model_prefix, iteration),
            )
        else:
            file_path = os.path.join(
                figs_path,
                '06_resp-curves_%s_%s_%s_%s.png' % (specie, training, bio, iteration),
            )
        fig.savefig(file_path, transparent=True, bbox_inches='tight')

### 3.3 Variable importance plot

In [None]:
# fig, ax = model_train.permutation_importance_plot(x,y)

In [None]:
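# Permutation importance: drop in the model's default score when one feature is
# shuffled; a larger drop means a more important feature.
# random_state fixes the shuffles so reruns give the same importances; passing
# sample_weight=sample_weight_train would weight the score by observation weights.
pi = inspection.permutation_importance(model_train, x_train, y_train, n_repeats=10, random_state=0)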
importance = pi.importances
rank_order = importance.mean(axis=-1).argsort()
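scikit-learn's `inspection.permutation_importance` returns an `(n_features, n_repeats)` array of baseline-minus-permuted scores and also accepts `sample_weight` and `random_state`. The NumPy sketch below shows the same idea with a weighted score. `score_fn`, the toy rule and the weights are hypothetical, and the library's scorer handling is not reproduced.

```python
import numpy as np


def permutation_importance_sketch(score_fn, X, y, sample_weight=None,
                                  n_repeats=10, seed=0):
    """Return an (n_features, n_repeats) array of score drops.

    score_fn(X, y, sample_weight) must return a "higher is better" score.
    """
    rng = np.random.default_rng(seed)
    baseline = score_fn(X, y, sample_weight)
    drops = np.empty((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with y but keeps its distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j, r] = baseline - score_fn(X_perm, y, sample_weight)
    return drops


def weighted_accuracy(X, y, w):
    pred = (X[:, 0] > 0).astype(int)  # fixed toy "model" that only uses column 0
    return float(np.average((pred == y).astype(float), weights=w))


data_rng = np.random.default_rng(42)
X_toy = data_rng.normal(size=(500, 2))
y_toy = (X_toy[:, 0] > 0).astype(int)        # only column 0 carries signal
w_toy = data_rng.uniform(0.5, 2.0, size=500)  # hypothetical observation weights

drops = permutation_importance_sketch(weighted_accuracy, X_toy, y_toy, sample_weight=w_toy)
toy_rank_order = drops.mean(axis=-1).argsort()  # least to most important, as in the notebook
```

Because the toy model ignores column 1, shuffling it costs nothing, so it comes first in `toy_rank_order` (least important), matching how `rank_order` orders the boxplot.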

In [None]:
# Visualize permutation importances as horizontal boxplots (distribution over repeats)
labels_ranked = [labels[idx] for idx in rank_order]

fig, ax = plt.subplots()
box = ax.boxplot(importance[rank_order].T, vert=False, labels=labels_ranked)
# Decorate legend labels for key boxplot elements
box['fliers'][0].set_label('outlier')
box['medians'][0].set_label('median')
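for icap, cap in enumerate(box['caps']):
    if icap == 0:
        # Caps mark the whisker ends (most extreme points within 1.5 IQR of the box),
        # not the data min and max; points beyond them are drawn as outliers
        cap.set_label('whisker ends (1.5 IQR)')
    cap.set_color('k')
    cap.set_linewidth(2)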
for ibx, bx in enumerate(box['boxes']):
    if ibx == 0:
        bx.set_label('25-75%')
    bx.set_color('gray')

ax.set_xlabel('Importance')
ax.legend(loc='lower right')
fig.tight_layout()
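In a default Matplotlib boxplot (`whis=1.5`) the caps sit at the whisker ends, which are the most extreme data points still within 1.5 IQR of the box, not the minimum and maximum. A small NumPy sketch of that rule, written for illustration rather than taken from Matplotlib's source:

```python
import numpy as np


def whisker_ends(data, whis: float = 1.5):
    """Whisker positions for a boxplot using the Q1 - whis*IQR / Q3 + whis*IQR rule."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    low = data[data >= q1 - whis * iqr].min()   # lowest point inside the lower fence
    high = data[data <= q3 + whis * iqr].max()  # highest point inside the upper fence
    return float(low), float(high)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(whisker_ends(sample))  # (1.0, 9.0): the value 100 is a flier, not the upper cap
```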

In [None]:
if savefig:
    if Future:
        if models:
            # Model-specific run: add model_prefix to the filename
            file_path = os.path.join(figs_path, '06_var-importance_%s_%s_%s_%s_%s_future.png' % (specie, training, bio, model_prefix, iteration))
        else:
            # No model specified: omit model_prefix
            file_path = os.path.join(figs_path, '06_var-importance_%s_%s_%s_%s_future.png' % (specie, training, bio, iteration))

        fig.savefig(file_path, transparent=True, bbox_inches='tight')

    else:
        if models:
            # Model-specific run: add model_prefix to the filename
            file_path = os.path.join(figs_path, '06_var-importance_%s_%s_%s_%s_%s.png' % (specie, training, bio, model_prefix, iteration))
        else:
            # No model specified: omit model_prefix
            file_path = os.path.join(figs_path, '06_var-importance_%s_%s_%s_%s.png' % (specie, training, bio, iteration))

        fig.savefig(file_path, transparent=True, bbox_inches='tight')