<img src="../Images/DSC_Logo.png" style="width: 400px;">

# Introduction: Geoscience Meets Machine Learning and Explainable AI (XAI)

This notebook briefly introduces core concepts and techniques of statistical learning, that is, machine learning (ML) methods ranging from simple linear regression to complex artificial neural networks, through the lens of geoscientific problems.

Geoscientific modeling has traditionally relied on **process-based models** (PBMs), which encode established physical laws and empirical relationships to simulate Earth system processes. These models are grounded in theory, offering interpretability and physical consistency, and can extrapolate beyond observed conditions by design. However, they often struggle to fully exploit today’s big data and can stagnate when faced with unknown or poorly understood processes. Developing and calibrating PBMs is labor-intensive, and missing process knowledge or coarse parameterizations can lead to persistent biases. In contrast, purely data-driven ML approaches excel at detecting complex patterns from large datasets and have demonstrated state-of-the-art predictive skill in many Earth science applications. ML models are highly flexible and can combine diverse data to find unexpected correlations. Yet, pure ML has its own limitations: it typically requires vast training data, can struggle with noisy or biased observations, and is not as interpretable as PBMs. Many ML models function as **black boxes**: systems where data go in and results come out, with limited insight into what happens in between. Several strategies exist for using ML models in support of scientific claims: white-box models, in which the ML model itself is explainable; methods that explain either the data or the fitted model; and hybrid approaches that integrate ML with PBMs or physical knowledge (Reichstein et al. 2019; Irrgang et al. 2021; Jiang et al. 2024).

Explainable ML, hereafter referred to by the common term **explainable AI (XAI)**, has huge potential for advancing scientific predictions and understanding in geosciences. The most obvious reason for seeking explanations for model outputs is to justify predictions (knowing why a model made a particular decision). Beyond that, XAI can uncover relationships between variables, highlighting which factors are most influential, how they interact, and whether their effects are linear, nonlinear, or conditional. XAI also enables the generation of new scientific hypotheses by revealing unexpected patterns in large, high-dimensional datasets, sparking fresh questions or challenging existing assumptions. Finally, XAI can support the evaluation of process-based models by comparing learned patterns to real-world data, identifying where traditional models may fall short or deviate from observed behavior (Jiang et al. 2024; Dramsch et al. 2025).

## 1. Characteristics of Geoscientific Data 

## 1.1 Data Sources

Geoscience data primarily come from three sources based on how the data are originally obtained (Karpatne et al. 2019): 

- **Remote Sensing Data:** Acquired by Earth-observing satellites and airborne platforms (e.g., drones, aircraft), remote sensing data offer global, continuous coverage of variables like surface temperature, humidity, and atmospheric composition. Agencies such as NASA, ESA, and JAXA, as well as private companies, contribute to this growing data pool. These data are typically available as spatially and temporally gridded rasters and may span decades (e.g., Landsat from the 1970s). They can be used for both large-scale monitoring and localized studies (such as with image data collected with drones or airplanes).

- **Sensor (In-Situ) Data:** Collected by ground-based (e.g. river gauges), airborne (e.g. weather balloons), or ocean-based instruments (e.g. on ships), sensor data are some of the most direct and reliable sources of Earth observations. These datasets include meteorological, hydrological, and geophysical measurements, and can also include paleoclimate proxies. The data are typically non-uniform in spatial coverage and irregular in time, and are represented as geostatistical point reference data.

- **Model Simulation Data:** Generated by PBMs including numerical models that simulate Earth system processes using physical laws. These models require initial conditions and parameters to produce data. Examples include climate models and terrestrial models. Simulation data provide information about both current states and future projections of geoscientific variables.
 
## 1.2 Data Properties and Limitations

Working with geoscientific data means inference under uncertainty. Much of a geoscientist's work relies on indirect observations and on models built on simplifying assumptions to understand subsurface structures, hydrologic systems, or Earth surface processes. ML is data-driven and therefore highly dependent on the quality, structure, and availability of data. In geosciences, it faces a set of challenges arising from the inherently complex, structured, and often imperfect nature of Earth system data (Karpatne et al. 2019; Reichstein et al. 2019):

- **Spatio-temporal structure:** Geoscience data exhibit strong spatial and temporal autocorrelation due to the nature of Earth processes. Local similarity (e.g., neighboring land cover types) follows Tobler’s First Law of Geography, while long-range dependencies (e.g., effects of the El Niño–Southern Oscillation, ENSO) highlight complex global teleconnections and memory in time.

- **Heterogeneity in space and time:** Variability in geography, climate, and natural cycles leads to high heterogeneity across regions and time. This includes seasonal patterns, decadal oscillations, and long-term geological or climate changes, which make modeling across all space-time points challenging.

- **High dimensionality:** The Earth system involves numerous interacting variables across various depths (subsurface, atmosphere, oceans) and scales. High-resolution data further increase dimensionality, sometimes involving a very large number of variables.

- **Lack of concise object definitions:** Geoscience entities like storms or ocean eddies have vague, dynamic spatial and temporal boundaries. Unlike discrete entities in other domains (e.g. products in a retail store), their complex, continuous behavior complicates detection and tracking.

- **Big volume but small sample sizes and rare data:** Earth system data volumes are growing quasi-exponentially. Despite high data volume, certain key applications have limited samples. For example, satellite data only span ~50 years, and paleo-climate data are sparse both spatially and temporally. Labeled, gold-standard data are scarce because ground truth acquisition (e.g., field surveys, aircraft measurements) is expensive and limited. Some processes (e.g., subsurface flows) lack observable ground truth entirely, limiting supervised learning. In addition, many impactful phenomena (e.g., hurricanes, floods, deforestation) are rare yet critical. Detecting these rare events is vital but difficult due to their infrequency and atypical signatures.

- **Multi-source, multi-resolution data:** Geoscience data come from varied sources (see Sect. 1.1) with differing spatial and temporal resolutions. For instance, aircraft imagery may be high-resolution but infrequent, while satellites offer lower resolution but regular coverage. Integrating these complementary datasets is essential for analyzing processes across scales, often requiring methods like interpolation or reanalysis techniques.

- **Poor data quality:** Geoscience data often suffer from noise, gaps, and inconsistencies due to sensor failures, environmental interference (e.g., clouds, snow), and changes in measurement systems (e.g. reference point for water table depth measurements). Model-based data also carry uncertainty due to imperfect initial conditions and approximations.
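
The effect of temporal autocorrelation on the amount of independent information can be illustrated with a minimal sketch. It uses a synthetic first-order autoregressive, AR(1), series as a stand-in for a geoscientific time series; the persistence parameter `phi = 0.8` and the series length are assumptions chosen for illustration, not values from any dataset. The effective sample size formula used here is the common approximation for the mean of an AR(1) process.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic AR(1) series standing in for, e.g., daily temperature anomalies.
# phi = 0.8 is an assumed persistence parameter, not a value from any dataset.
n, phi = 5000, 0.8
noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + noise[t]


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))


r1 = lag1_autocorrelation(series)

# For an AR(1) process, the number of effectively independent samples
# (for estimating the mean) is roughly n * (1 - r1) / (1 + r1).
n_eff = n * (1 - r1) / (1 + r1)

print(f"lag-1 autocorrelation: {r1:.2f}")
print(f"nominal samples: {n}, effective samples: {n_eff:.0f}")
```

With strong persistence, the effective number of samples is far smaller than the nominal one, which is one reason why large geoscientific data volumes can still behave like small samples.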

## 2. What is Machine Learning?

***"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."*** (Mitchell 1997)

In general, machine learning is all about **making predictions**. Fig. 1 below illustrates that we can fit a blue line to the data points to show the trend, but we can also use the line to make predictions. Machine learning builds on the principle of learning from data to improve task performance through experience. At its core, it follows a **trial-and-error approach**, where models are refined based on the errors they make when predicting outcomes. This process involves three key components:

1. **Data** consisting of features (and labels in supervised learning).

2. A **model** space containing candidate functions or hypotheses.

3. A **loss** function that quantifies how well a model's predictions match the observed data.
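
The three components listed above, and the trial-and-error refinement that connects them, can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming a straight-line model, a mean squared error loss, and plain gradient descent with a hand-picked learning rate; none of these choices is prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# 1. Data: one feature x and a label y (synthetic; true slope 2, intercept 1).
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)


# 2. Model space: all straight lines y_hat = w * x + b.
def predict(w, b, x):
    return w * x + b


# 3. Loss: mean squared error between predictions and observations.
def mse(w, b):
    return float(np.mean((predict(w, b, x) - y) ** 2))


# Trial and error: start from a poor guess and repeatedly adjust the
# parameters in the direction that reduces the loss (gradient descent).
w, b = 0.0, 0.0
learning_rate = 0.5
initial_loss = mse(w, b)
for _ in range(2000):
    error = predict(w, b, x) - y
    w -= learning_rate * 2.0 * np.mean(error * x)
    b -= learning_rate * 2.0 * np.mean(error)
final_loss = mse(w, b)

print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}, w = {w:.2f}, b = {b:.2f}")
```

Each pass through the loop is one "trial": the model predicts, the loss measures the error, and the parameters are nudged to reduce it.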

The original data used to fit a model is called **training data**. In Fig. 1, a green squiggle was fitted to the training data in addition to the blue line. The squiggle fits the training data much better, but the goal of ML is to make predictions. So we need a way to decide whether the line or the squiggle is better, using new data that the model has not seen: **testing data**. If we compare the sum of **distances (errors)** between the predictions and actual values on the testing data, we may find that the blue line actually performs better on new data. There are many ML methods available, including highly complex techniques like deep learning. However, what matters most is not how complex or fancy a method is, but whether it best fits our needs, and this is something we can evaluate using testing data.

<figure style="text-align: center;">
  <img src="../Images/ML1.png" alt="Regression line and squiggle on training data" style="width: 300px;">
  <figcaption style="font-size: 14px; margin-top: 8px;">
    Fig. 1 Data points with fitted regression line and squiggle using training data, modified from 
    <a href="https://www.youtube.com/watch?v=Gv9_4yMHFhI&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF" target="_blank">
      Josh Starmer (YouTube)
    </a>.
  </figcaption>
</figure>
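
The comparison in Fig. 1 can be reproduced in spirit with a minimal sketch. The data are synthetic and drawn from an assumed linear trend with noise; a degree-1 polynomial plays the role of the blue line and a degree-6 polynomial the role of the green squiggle. The polynomial degrees, sample sizes, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)


def make_data(n):
    """Noisy samples from an assumed linear trend (synthetic, for illustration)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = 1.5 * x + 0.5 + rng.normal(scale=0.3, size=n)
    return x, y


x_train, y_train = make_data(10)
x_test, y_test = make_data(200)


def sum_squared_error(coeffs, x, y):
    """Sum of squared distances between polynomial predictions and observations."""
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))


line = np.polyfit(x_train, y_train, deg=1)      # the "blue line"
squiggle = np.polyfit(x_train, y_train, deg=6)  # the "green squiggle"

for name, coeffs in [("line", line), ("squiggle", squiggle)]:
    train_sse = sum_squared_error(coeffs, x_train, y_train)
    test_mse = sum_squared_error(coeffs, x_test, y_test) / x_test.size
    print(f"{name:9s} train SSE = {train_sse:.3f}, test MSE = {test_mse:.3f}")
```

The squiggle can never fit the training data worse than the line, because the line is a special case of the more flexible polynomial. Whether it also predicts the testing data better is exactly what the second column is meant to reveal.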

## 3. Geoscientific Statistical Learning

The properties and limitations of geoscientific data present both challenges and opportunities for ML. This section offers a basic introduction to statistical learning, illustrated through selected applications in the geosciences.

## 3.1 Machine Learning Tasks

ML tasks include the following:

- **Detecting** objects and events: Identifying dynamic geoscience phenomena from spatio-temporal data is critical for understanding environmental processes. ML offers the potential for automated, data-driven detection. Examples: Finding extreme weather patterns; Land-use and change detection.

- **Estimating** geoscience variables: Many physical variables are difficult to measure directly (e.g., methane levels, groundwater flow) or not measured at all (e.g. river runoff in ungauged catchments). ML can be used to estimate such variables from observable data like satellite imagery, enabling broader environmental monitoring.

- Long-term **forecasting**: Forecasting future values of geoscience variables (e.g., temperature, GHG concentrations) supports climate modeling and planning. ML can model temporal trends using historical data, complementing physics-based simulations.

- Mining **relationships** in data: Understanding how different Earth system components influence each other is a central scientific goal. ML can help discover such relationships from large-scale geoscience datasets.

## 3.2 Learning Types and Algorithms

While geoscientific problems vary widely in scope and complexity, they can often be mapped onto standard ML paradigms depending on the availability of labeled data. We here focus on supervised and unsupervised learning that serve as foundational tools for extracting knowledge from geoscientific datasets. Each method leverages different aspects of the data: supervised learning uses historical observations with known outcomes, while unsupervised learning uncovers patterns and structures from unlabeled data.

- **Supervised learning:** When labeled data is available, supervised learning can be used to predict or classify geoscientific phenomena based on past observations. Formally, it assumes an unknown functional relationship between the input and output spaces, which is learned from a dataset consisting of paired input-output instances. The learned function can then be used to predict outputs for new, unseen inputs. Supervised learning tasks fall into two main categories:

    - **Classification**: When the outputs are categorical (e.g., classifying an area as forest, water, or urban). Examples:
        - Logistic Regression
        - Support Vector Machines (SVM)
        - k-Nearest Neighbors (k-NN)
        - Tree-based methods (Random Forest, Gradient Boosting)
        - Neural Networks (MLPs, CNNs, RNNs).
        <p>
    - **Regression**: When the outputs are numerical or continuous (e.g., estimating air pollution concentration or river discharge). Examples:
        - Linear Regression
        - Ridge / Lasso Regression
        - Support Vector Regression (SVR)
        - Tree-based methods (Random Forest, Gradient Boosting)
        - Neural Networks (MLPs, CNNs, RNNs).
<p>
    
- **Unsupervised learning:** In the absence of labels, unsupervised learning can be used to discover hidden patterns, structures, or groupings within complex datasets. Hence, unsupervised learning is especially useful for exploratory data analysis. It involves techniques such as:

    - **Clustering**: The model groups similar data points together (e.g., identifying regions with similar climate patterns). Examples: K-means; Hierarchical Clustering
    - **Dimensionality reduction**: The aim is to find lower-dimensional representations of high-dimensional data (e.g., summarizing multivariate satellite time series). Examples: Principal Component Analysis (PCA); t-Distributed Stochastic Neighbor Embedding (t-SNE).
    - **Anomaly detection and density estimation**: The aim is to identify unusual observations or estimate the underlying data distribution (e.g. identifying areas with unusual mineral concentrations). Examples: Isolation Forest; One-Class SVM.
<p>

Note: 
- Many ML algorithms are versatile and can be adapted for different tasks depending on how the problem is framed. The type of learning task (supervised or unsupervised) and the type of output (categorical or continuous) guide the configuration and use of algorithms.
- ML methods can support both data preparation and problem solving. For example, clustering can help group similar land use regions before modeling (preparation), while a neural network might be used to predict drought conditions (solution).
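
As a small illustration of dimensionality reduction, the following sketch performs PCA with NumPy's singular value decomposition rather than a dedicated ML library. The data are synthetic: three observed variables driven by one hidden factor plus noise, with loadings that are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic "multivariate" data: three observed variables driven by one
# hidden factor plus small noise (loadings are assumed for illustration).
n_samples = 300
latent = rng.normal(size=n_samples)
loadings = np.array([1.0, 0.8, -0.5])
X = np.outer(latent, loadings) + rng.normal(scale=0.1, size=(n_samples, 3))

# PCA via singular value decomposition of the centered data matrix.
X_centered = X - X.mean(axis=0)
U, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each principal component.
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

scores = X_centered @ Vt.T   # data expressed in principal components
X_reduced = scores[:, :1]    # keep only the first component

print("explained variance ratio:", np.round(explained_variance_ratio, 3))
print("reduced shape:", X_reduced.shape)
```

Because the three variables share one underlying driver, a single component retains almost all of the variance, so the three-dimensional data can be summarized in one dimension with little loss.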

## 3.3 What Lies Between Raw Data and the Model?
The interface between geoscientific data and ML involves several key components and processes that transform raw data into meaningful predictions or insights:

- **Data integration**
    - Combining multi-source data (e.g., satellite and in situ) with varying spatial and temporal resolutions.
    - Harmonizing formats, projections, and units.
<p>
    
- **Data preprocessing** 
    - Cleaning: Handling missing values, noise, and inconsistencies in the data.
    - Transformation: Scaling, normalization, log transforms, etc.
    - Augmentation: Creating synthetic data to increase sample size and variability (e.g., adding noise, flipping images, generating new samples). Especially useful when labeled data is limited.
<p>

- **Feature engineering and labeling** 
    - Feature engineering: Extracting or deriving meaningful variables (e.g., temporal trends, spatial gradients).
    - For supervised learning: assigning labels (e.g., event occurrence, land cover class), often limited by paucity of ground truth.
<p>

- **Feature selection and dimensionality reduction**
    - Identifying the most relevant variables for learning. Missing important features may lead to incomplete explanations. Too many features might produce issues related to the **curse of dimensionality** (when high-dimensional data becomes sparse and harder to model).
    - Apply feature selection to reduce redundancy (e.g., strong correlation) among features.
    - Apply dimensionality reduction techniques to reduce complexity (e.g. PCA).
<p>
    
- **Supervised learning: data splitting**
    - Typically, data is split into **training and testing sets** to evaluate model generalization (see Sect. 3.4).
    - A third **validation set** is often used as well, especially when performing **hyperparameter tuning** without **cross-validation**.

Note: 
- Not all techniques for preparing raw data for ML are suitable or necessary for every use case or model. Their applicability depends on the nature of the data, the selected ML algorithm, and the specific modeling objectives.
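
Several of the steps above (cleaning, transformation, and data splitting) can be sketched for a single time series. The discharge record below is hypothetical and randomly generated; the 80/20 split ratio, linear gap filling, and log transform are illustrative assumptions rather than recommendations for every dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical daily discharge record (strictly positive, skewed) with gaps.
n_days = 365
discharge = np.exp(rng.normal(loc=2.0, scale=0.7, size=n_days))
discharge[rng.choice(n_days, size=20, replace=False)] = np.nan

# Cleaning: fill gaps by linear interpolation along the time axis.
days = np.arange(n_days)
valid = ~np.isnan(discharge)
filled = np.interp(days, days[valid], discharge[valid])

# Transformation: log transform to reduce skewness.
log_q = np.log(filled)

# Splitting: keep temporal order (first 80 % for training, rest for testing)
# so that autocorrelated neighbours do not leak across the split.
split = int(0.8 * n_days)
train, test = log_q[:split], log_q[split:]

# Scaling: compute mean and standard deviation on the training part only,
# then apply the same transformation to both parts.
mu, sigma = train.mean(), train.std()
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma

print(f"gaps filled: {np.isnan(discharge).sum()}, sizes: {train.size}/{test.size}")
print(f"train mean = {train_scaled.mean():.2f}, train std = {train_scaled.std():.2f}")
```

Estimating the scaling parameters from the training part only keeps information from the testing period out of the model fitting, which is the same reasoning that motivates a time-ordered split.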

## 3.4 Geoscientific Data is Used to Train, Test, and Apply ML Models. What Does This Mean?

1. **Training** involves **learning** a mathematical mapping from input to output using a dataset. In supervised learning, this means learning from labeled examples by minimizing a **loss function** that quantifies prediction error; in unsupervised learning, the model seeks to uncover patterns or structures by optimizing an objective function that captures internal consistency (e.g., cluster compactness). Training may also involve **feature extraction or selection**, **hyperparameter tuning**, and **choosing a model** from a family of candidate models. When training a model, we try to avoid both **overfitting** and **underfitting**. Overfitting occurs when the model performs well on the training data but poorly on unseen data (e.g., test data), because it has essentially memorized the training examples rather than learning general patterns. Underfitting happens when a model is too simple to capture the underlying patterns in the data.
    
    These behaviors are related to the **bias–variance tradeoff**:
    - Underfitting is typically caused by **high bias**: the model is too simple.
    - Overfitting is caused by **high variance**: the model is too sensitive to fluctuations in the training data. Let's have another look at Fig. 1: because the squiggle fits the training data better, it has little bias. However, the straight line fits the testing data set better. This difference in fits between the training and testing data sets is called variance. In machine learning, we try to find a model that has low bias AND low variance. <p>

2. **Testing** refers to **evaluating** how well the trained model generalizes to **new, unseen data**. This phase uses a testing set, which was not used during training, to estimate the performance of the model. For supervised tasks, this generally involves calculating **performance metrics** such as accuracy (for classification) and mean absolute error (MAE) or root mean squared error (RMSE) (for regression). For unsupervised tasks, evaluating might involve, for example, calculating internal clustering indices (e.g. silhouette score). Testing is critical for model validation, and poor performance often leads to revisiting the design process in order to adjust features, algorithms, or data preparation strategies.

3. **Application** refers to deploying the trained model on new, operational data to make predictions, support decisions, or analyze structure. The model now operates independently of the training data, and must **generalize** well to maintain utility.
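
The performance metrics named in the testing step can be computed directly from their definitions. The observed and predicted values below are hypothetical numbers chosen so that the arithmetic is easy to follow by hand.

```python
import numpy as np

# Regression: hypothetical observed and predicted values (e.g., discharge).
y_true = np.array([2.0, 3.0, 5.0, 4.0])
y_pred = np.array([2.5, 2.0, 5.0, 6.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))        # mean absolute error
rmse = np.sqrt(np.mean(errors**2))   # root mean squared error

# Classification: hypothetical land-cover labels.
labels_true = np.array(["forest", "water", "urban", "forest", "water"])
labels_pred = np.array(["forest", "water", "forest", "forest", "urban"])
accuracy = np.mean(labels_true == labels_pred)  # share of correct labels

print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}, accuracy = {accuracy:.2f}")
# prints: MAE = 0.875, RMSE = 1.146, accuracy = 0.60
```

RMSE exceeds MAE here because squaring gives the single large error (2.0) more weight, which is why the two metrics are often reported together.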

## 4. XAI

We can interpret ML models when we understand or can follow how they learn from data. This is the core motivation of XAI. Interpretability is important because it helps us trust, debug, and improve models. In science, decisions based on ML predictions need to be transparent and justifiable. Without interpretability, we risk using models that are accurate for the wrong reasons, sensitive to biases, or incapable of generalizing beyond the training data. XAI has therefore been receiving much attention across the geoscience domains including atmospheric sciences (Eyring et al. 2024; Yang et al. 2024), hydrology (Başağaoğlu et al. 2022; Maier et al. 2024), and remote sensing (Roscher et al. 2020; Höhl et al. 2024; Tuia et al. 2024).

## 4.1 Learning Mechanisms Impact Interpretability

ML models learn and "think" in different ways. Some build human-readable rules, while others rely on detecting hidden patterns. Here's a simple comparison using the example of predicting landslide risk with supervised methods:

| **Model Type**        | **How it Learns**                                          | **What it “Remembers”**                    | **Interpretability**                    |
|-----------------------|----------------------------------------------------------------------------------|---------------------------------------------|------------------------------------------|
| **Linear Model**      | “Every 1° in slope adds 3% landslide risk. Rainfall adds 5% per 10mm.”          | A weight for each feature                   | 🟢 Simple formula; weights are readable.  |
| **Decision Tree**     | “If slope > 35° and rainfall > 100mm, then landslide likely.”                   | A flowchart of IF-THEN rules                | 🟢 Easy to trace step-by-step decisions.  |
| **k-Nearest Neighbors** | “This slope is like 3 others that slid last year.”                              | All training examples + similarity rule     | 🟡 Can show neighbors, not rules.         |
| **Support Vector Machine** | “I draw the cleanest possible line between safe and risky slopes.”              | A few support vectors + the separating line | 🟡 Easy to visualize in 2D, but hard to scale/explain in many dimensions. |
| **Random Forest / XGBoost** | “Ask 100 different small trees and combine their votes.”                       | A large group of decision trees             | 🔴 Hard to trace the final vote reason.   |
| **Neural Network**    | “I learned deep, layered patterns between rain, slope, and soil.”              | Hidden layers and weights              | 🔴 Not human-readable without tools.      |

Unsupervised learning often tries to discover hidden structure in data (e.g. clusters, dimensions) without labeled outcomes. Without a target to validate against, it is not always obvious whether the discovered structures are meaningful. Here are some examples of unsupervised models for the use case of landslide pattern discovery:

| **Model Type**              | **How It Learns**                                                                   | **What It “Remembers”**                                         | **Interpretability**                             |
|----------------------------|----------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|--------------------------------------------------|
| **K-Means Clustering**      | “I group locations into K zones based on similar features (e.g., rainfall, soil, slope).”                | Coordinates of cluster centers + point assignments              | 🟡 Cluster centers can be interpreted if K is low.|
| **Hierarchical Clustering** | “I build a tree of location similarities and cut it at a level that separates risk zones.”               | A dendrogram linking similar data points                        | 🟡 Tree is readable for small areas; gets complex.|
| **Autoencoders (Unsupervised)**| “Compress and reconstruct input features like slope and rainfall using neural networks.”             | Weights and activations in hidden layers                     | 🔴 Not human-readable without tools.                             |
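
To make the K-means row of the table concrete, here is a minimal sketch of Lloyd's algorithm in NumPy. The two groups of locations, their slope and rainfall values, and the fixed number of iterations are assumptions for illustration; in practice features would usually be scaled first and a library implementation would be used.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Two synthetic groups of locations described by (slope in degrees,
# rainfall in mm); the group centers are assumed for illustration.
gentle_dry = rng.normal(loc=[10.0, 50.0], scale=[2.0, 5.0], size=(60, 2))
steep_wet = rng.normal(loc=[35.0, 120.0], scale=[2.0, 5.0], size=(60, 2))
X = np.vstack([gentle_dry, steep_wet])


def kmeans(X, k, n_iter=20):
    """Plain Lloyd's algorithm; initial centers are k randomly chosen points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        # Move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        centers = np.array([
            X[assignment == j].mean(axis=0) if np.any(assignment == j) else centers[j]
            for j in range(k)
        ])
    return centers, assignment


centers, assignment = kmeans(X, k=2)
sorted_centers = centers[np.argsort(centers[:, 0])]

print("cluster centers (slope, rainfall):")
print(np.round(sorted_centers, 1))
```

What the model "remembers" is just the two center coordinates, which can be read directly as a typical gentle, dry location and a typical steep, wet one.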

## 4.2 Types of Interpretability in Machine Learning

Models have different inherent levels of interpretability. Some naturally offer insights into how they make decisions, while others function more like black boxes. Because of this, additional interpretability methods are often used to help us better understand and trust complex models.

Interpretability can be split into two types: 
1. **By design:** Interpretability by design involves using models that are **inherently understandable**, like linear regression. By design means that the learning algorithm restricts its search to models that are interpretable (see Notebook 2). This can also include **hybrid models** that integrate physical process knowledge or conceptual frameworks with ML components (see Notebook 5).
   
2. **Post-hoc:** Post-hoc interpretability applies **explanation methods** after training a model. In this sense, we use XAI methods to interpret models. Explanation methods can be:
<p>

- **Model-specific:** These methods rely on the internal structure of the model, i.e. they can be used only for a single algorithm class (see e.g. Random Forest feature importance in Notebook 3).
  
- **Model-agnostic:** Explains the model by observing how outputs change with input changes, without looking inside the model (see Notebook 4). These methods further divide into:

    - **Local:** Focuses on individual predictions, explaining why a model acted a certain way for a specific instance.
    - **Global:** Explains the overall model behavior.

<div style="text-align: center;">
  <img src="../Images/XAI.png" style="width: 400px;">
  <div style="font-size: 14px; margin-top: 8px;">Fig. 2 Overview of interpretability methods in machine learning, modified from Molnar (2025)</div>
</div>
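
A global, model-agnostic explanation can be sketched with permutation feature importance, which only needs the model's inputs and outputs. In the sketch below, a least-squares linear model stands in for an arbitrary fitted model; the synthetic data, the coefficients, and the use of the training data for evaluation are simplifying assumptions (held-out data would normally be preferred).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data: the target depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2 (coefficients are assumed).
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Any fitted model with a predict function would do; here a least-squares
# linear model stands in for the "black box".
design = np.column_stack([X, np.ones(n)])
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict(X):
    return np.column_stack([X, np.ones(len(X))]) @ coefficients


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


baseline = mse(y, predict(X))

# Permutation importance: shuffle one feature at a time and record how much
# the error increases. Only model inputs and outputs are used.
importance = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    X_permuted = X.copy()
    X_permuted[:, j] = rng.permutation(X[:, j])
    importance[j] = mse(y, predict(X_permuted)) - baseline

for j, value in enumerate(importance):
    print(f"feature {j}: increase in MSE = {value:.3f}")
```

Shuffling a feature breaks its link to the target, so a large increase in error marks a feature the model relies on, while a value near zero marks one it largely ignores.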

## 4.3 XAI Best Practices

- **Best-practice XAI begins with best-practices for data and ML:** XAI relies fundamentally on sound modeling foundations such as robust data preparation, appropriate feature selection, rigorous validation aligned with data structure (e.g., spatial or temporal), and uncertainty analysis in ML. 

- **Choosing XAI methods:** No single interpretation technique works best for every model or task. In fact, different methods can often produce inconsistent results. For this reason, it’s recommended to apply multiple XAI approaches when possible, to better assess the reliability and robustness of the insights. In addition, starting with simple models and increasing complexity gradually, or considering hybrid models that are somewhat physics-informed or physics-embedded, is generally considered best practice (Jiang et al. 2024; Zhao et al. 2024).

- **Understanding what interpretations can and cannot tell us:** A model explanation should be considered a hypothesis rather than a definitive truth about the data or causal relations. XAI methods reveal how models make predictions, but not necessarily why the underlying phenomena occur. A comprehensive understanding of a model’s interpretation should incorporate both data-driven perspectives (e.g., correlations or biases in the input data) and model-driven perspectives (e.g., architectural assumptions, constraints, or learned representations), in order to examine all potential relationships that contribute to the model’s behavior.

- **Pay close attention to implementation details in code libraries:** XAI methods often have multiple implementations across programming languages and (Python) packages, each with configurable parameters and default settings that can significantly affect the explanations produced and their fit to the data. It's essential to review the code documentation, understand default behaviors, and align parameter settings with expectations. For instance, Generalized Additive Models (see Notebook 2) are implemented differently with different options in R (e.g., `mgcv`) and Python (e.g., `pyGAM`). Similarly, SHAP in Python (see Notebook 4) offers multiple explainers (`TreeExplainer`, `DeepExplainer`, `KernelExplainer`, ...), each tailored to specific model types and with different assumptions and possible settings. These variations can lead to different outputs even for the same model and the same data.

## 5. Access to ML Models & XAI Methods

Modern tools written in Python, R, and MATLAB have laid the foundation for robust, scalable, and accessible ML research and applications. In Python, libraries such as **scikit-learn**, **PyTorch**, and **TensorFlow** have become essential building blocks. These libraries offer efficient implementations of core algorithms, streamlined workflows for model development and evaluation, and strong support from both academic and industry communities.

Beyond general-purpose libraries, domain-specific packages (e.g. NeuralHydrology for hydrological forecasting; or Biopython for bioinformatics) extend these foundations to specialized fields, offering tailored solutions. Post-hoc XAI methods such as SHAP (SHapley Additive exPlanations) are often implemented as separate libraries that operate alongside core model libraries. 

## References and Further Learning

Başağaoğlu, H., Chakraborty, D., Lago, C. D., Gutierrez, L., Şahinli, M. A., Giacomoni, M., Furl, C., Mirchi, A., Moriasi, D., and Şengör, S. S.: A review on interpretable and explainable artificial intelligence in hydroclimatic applications, Water, 14, 1230, doi:10.3390/w14081230, 2022.

Dwivedi, R., Dave, D., Naik, H., Singhal, S., Omer, R., Patel, P., Qian, B., Wen, Z., Shah, T., and Morgan, G.: Explainable AI (XAI): Core ideas, techniques, and solutions, ACM Computing Surveys, 55, 1–33, doi:10.1145/3561048,  2023.

Eyring, V., Collins, W. D., Gentine, P., Barnes, E. A., Barreiro, M., Beucler, T., Bocquet, M., Bretherton, C. S., Christensen, H. M., and Dagon, K.: Pushing the frontiers in climate modelling and analysis with machine learning, Nat. Clim. Chang., 14, 916–928, doi:10.1038/s41558-024-02095-y,  2024.

Irrgang, C., Boers, N., Sonnewald, M., Barnes, E. A., Kadow, C., Staneva, J., and Saynisch-Wagner, J.: Towards neural Earth system modelling by integrating artificial intelligence in Earth system science, Nat. Mach. Intell., 3, 667–674, doi:10.1038/s42256-021-00374-3, 2021.

Jiang, S., Sweet, L., Blougouras, G., Brenning, A., Li, W., Reichstein, M., Denzler, J., Shangguan, W., Yu, G., Huang, F., and Zscheischler, J.: How Interpretable Machine Learning Can Benefit Process Understanding in the Geosciences, Earth's Future, 12, doi:10.1029/2024EF004540,  2024.

Höhl, A., Obadic, I., Fernández-Torres, M.-Á., Najjar, H., Oliveira, D. A. B., Akata, Z., Dengel, A., and Zhu, X. X.: Opening the Black Box: A systematic review on explainable artificial intelligence in remote sensing, IEEE Geoscience and Remote Sensing Magazine, doi:10.1109/MGRS.2024.3467001,  2024.

Karpatne, A., Ebert-Uphoff, I., Ravela, S., Babaie, H. A., and Kumar, V.: Machine Learning for the Geosciences: Challenges and Opportunities, IEEE Trans. Knowl. Data Eng., 31, 1544–1554, doi:10.1109/TKDE.2018.2861006,  2019.

Maier, H. R., Taghikhah, F. R., Nabavi, E., Razavi, S., Gupta, H., Wu, W., Radford, D. A. G., and Huang, J.: How much X is in XAI: Responsible use of “Explainable” artificial intelligence in hydrology and water resources, Journal of Hydrology X, 100185, doi:10.1016/j.hydroa.2024.100185, 2024.

Mitchell, T. M.: Machine Learning, McGraw-Hill, New York, 1997.

Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (3rd ed.). Retrieved from christophm.github.io/interpretable-ml-book/, 2025.

Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., and Prabhat: Deep learning and process understanding for data-driven Earth system science, Nature, 566, 195–204, doi:10.1038/s41586-019-0912-1, 2019.

Roscher, R., Bohn, B., Duarte, M. F., and Garcke, J.: Explain it to me–facing remote sensing challenges in the bio-and geosciences with explainable machine learning, ISPRS annals of the photogrammetry, remote sensing and spatial information sciences, 3, 817–824, doi:10.5194/isprs-annals-V-3-2020-817-2020,  2020.

[StatQuest with Josh Starmer](https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw) on YouTube.

Tuia, D., Schindler, K., Demir, B., Zhu, X. X., Kochupillai, M., Džeroski, S., van Rijn, J. N., Hoos, H. H., Del Frate, F., and Datcu, M.: Artificial Intelligence to Advance Earth Observation: A review of models, recent trends, and pathways forward, IEEE Geoscience and Remote Sensing Magazine, doi:10.1109/MGRS.2024.3425961,   2024.

Yang, R., Hu, J., Li, Z., Mu, J., Yu, T., Xia, J., Li, X., Dasgupta, A., and Xiong, H.: Interpretable machine learning for weather and climate prediction: A review, Atmospheric Environment, 120797, doi:10.1016/j.atmosenv.2024.120797, 2024.

Zhao, T., Wang, S., Ouyang, C., Chen, M., Liu, C., Zhang, J., Yu, L., Wang, F., Xie, Y., Li, J., Wang, F., Grunwald, S., Wong, B. M., Zhang, F., Qian, Z., Xu, Y., Yu, C., Han, W., Sun, T., Shao, Z., Qian, T., Chen, Z., Zeng, J., Zhang, H., Letu, H., Zhang, B., Wang, L., Luo, L., Shi, C., Su, H., Zhang, H., Yin, S., Huang, N., Zhao, W., Li, N., Zheng, C., Zhou, Y., Huang, C., Feng, D., Xu, Q., Wu, Y., Hong, D., Wang, Z., Lin, Y., Zhang, T., Kumar, P., Plaza, A., Chanussot, J., Zhang, J., Shi, J., and Wang, L.: Artificial intelligence for geoscience: Progress, challenges, and perspectives, Innovation (Cambridge (Mass.)), 5, 100691, doi:10.1016/j.xinn.2024.100691,       2024.

Zollanvari, A.: Machine Learning with Python, Springer International Publishing, Cham, 457 pp., 2023.