# Introduction to Satellite Imagery Preprocessing, Model Training, and Prediction

This notebook provides a comprehensive workflow for processing satellite imagery and predicting water bodies using a combination of remote sensing data and exogenous factors. The model was trained on a project area in the riverbed of the Isar in the Alps. The main objectives of this notebook are:

1. **Satellite Imagery Retrieval and Preprocessing**:
   - Retrieve Sentinel-2 satellite imagery from Google Earth Engine (GEE).
   - Calculate the Normalized Difference Water Index (NDWI) to identify water bodies.
   - Mask snow/ice regions to improve the accuracy of water detection.

2. **Water Mask Creation**:
   - Generate binary water masks based on per-image NDWI thresholds.
   - Save the processed water masks as GeoTIFF files for further analysis.

3. **Input Data Preparation and Integration of Exogenous Factors**:
   - Load exogenous factors (e.g., precipitation, discharge) from external datasets.
   - Combine these factors with satellite imagery to enhance predictive modeling.

4. **Model Building and Training**:
   - Train a CNN-LSTM model to predict future water body extents based on historical satellite imagery and exogenous factors. 
   - Evaluate the model's performance using metrics such as accuracy and Intersection over Union (IoU).

5. **Visualization**:
   - Visualize the predicted water masks and compare them with the ground truth.
   - Analyze the model's accuracy and identify areas for improvement.

This notebook serves as a step-by-step guide for researchers and practitioners working on hydrological modeling, water resource management, or environmental monitoring. By combining satellite imagery with machine learning techniques, it demonstrates how to derive actionable insights from remote sensing data.

In [None]:
#Import necessary libraries

import ee
import geemap
import os

import pandas as pd
import numpy as np
from osgeo import gdal
import matplotlib.pyplot as plt

import rasterio
import glob

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import load_model


## Step 1: Satellite Imagery Retrieval and Preprocessing
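
The NDWI used in this step is the normalized difference of the green band (B3) and the near-infrared band (B8). The following is a minimal NumPy sketch of that formula on made-up reflectance values. The function name `ndwi_from_bands` and the sample values are illustrative assumptions; the notebook itself computes NDWI server-side with `normalizedDifference`.

```python
import numpy as np

def ndwi_from_bands(green, nir):
    """NDWI = (green - nir) / (green + nir); returns 0 where the sum is 0."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    total = green + nir
    out = np.zeros_like(total)
    # Divide only where the denominator is non-zero
    np.divide(green - nir, total, out=out, where=total != 0)
    return out

# Made-up reflectances: a water-like pixel, a vegetation-like pixel, an empty pixel
green = np.array([0.08, 0.05, 0.0])
nir = np.array([0.02, 0.30, 0.0])
ndwi = ndwi_from_bands(green, nir)
```

Water tends to reflect more green than near-infrared light, so water-like pixels get positive values, while vegetation-like pixels get negative ones.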


In [None]:
# Step 1: Get satellite imagery from Google Earth Engine

# Authenticate on first use, then initialize the Earth Engine API
# ee.Authenticate()  # Uncomment this line if you need to authenticate

ee.Initialize()


# Define the bounding box of your project area as ((xmin, ymin), (xmax, ymax)).
# Coordinates are in EPSG:4326.
# Example: a bounding box around Vorderriß, Germany
bbox_coords = ((11.437948183612978, 47.55891195678332), (11.484581611702358, 47.56320074444652))
geometry = ee.Geometry.Polygon([
    [
        [bbox_coords[0][0], bbox_coords[0][1]],  # (xmin, ymin)
        [bbox_coords[0][0], bbox_coords[1][1]],  # (xmin, ymax)
        [bbox_coords[1][0], bbox_coords[1][1]],  # (xmax, ymax)
        [bbox_coords[1][0], bbox_coords[0][1]],  # (xmax, ymin)
        [bbox_coords[0][0], bbox_coords[0][1]],  # Closing the polygon
    ]
])

# Define a function to calculate the snow fraction of each image. Snow has similar reflectance properties to water.
def add_snow_fraction(image):
    scl = image.select("SCL")
    snow_mask = scl.eq(11)  # SCL = 11 means snow/ice

    snow_fraction = snow_mask.reduceRegion(
        reducer=ee.Reducer.mean(),
        geometry=geometry,
        scale=20,
        maxPixels=1e8
    ).get('SCL')  # result is a number between 0 and 1

    return image.set('snow_fraction', snow_fraction)


# Define NDWI calculation function (normalized difference using B3 and B8)
def calculateNDWI(image):
    ndwi = image.normalizedDifference(["B3", "B8"]).rename("NDWI")
    return image.addBands(ndwi)

# Define a function to clip the image to your geometry
def clip_image(image):
    return image.clip(geometry)

# Create a function to assign a chronological image_id based on the collection index
def create_feature_with_index(image, index):
    # Use the numeric index as the image ID
    image_id = ee.Number(index)
    timestamp = ee.Date(image.get('system:time_start')).format('YYYY-MM-dd')  # Format timestamp as YYYY-MM-DD
    
    # Create a feature with the image ID and timestamp
    feature = ee.Feature(None, {
        'image_id': image_id,
        'timestamp': timestamp
    })
    return feature

# Function to assign a chronological index to each image in the collection
def add_index_to_collection(image_collection):
    # Create a list of features with image_id as index
    def assign_index(image, index):
        return create_feature_with_index(image, index)
    
    # Map the function over the collection and add indices
    feature_collection = image_collection.map(lambda image: assign_index(image, image_collection.toList(image_collection.size()).indexOf(image)))
    return feature_collection


# Step 1.1: Load the Sentinel-2 image collection. Filter by date, bounds, and cloud cover.
# Note: Earth Engine marks 'COPERNICUS/S2_SR' as deprecated in favor of 'COPERNICUS/S2_SR_HARMONIZED'.
sentinel = ee.ImageCollection('COPERNICUS/S2_SR') \
    .filterDate('2015-06-27', '2025-03-31') \
    .filterBounds(geometry) \
    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 30))

# Add snow fraction to each image
sentinel_snow_filtered = sentinel.map(add_snow_fraction)

# Keep only images with less than 3% snow cover
sentinel_clean = sentinel_snow_filtered.filter(ee.Filter.lt('snow_fraction', 0.03))

processed_images = sentinel_clean.map(clip_image).map(calculateNDWI)


# Map the function over the processed image collection to create a FeatureCollection with chronological image_ids
features = add_index_to_collection(processed_images)

# Step 1.2: Export images and metadata to Google Drive. If you authenticated with your Google account in Google Earth Engine, the images will be saved in your Google Drive.

# Export the FeatureCollection as a CSV to Google Drive
export_task = ee.batch.Export.table.toDrive(
    collection=features,
    description='Image_Metadata',
    fileFormat='CSV',
    folder='Satellite_metadata',  # Folder in your Google Drive
    fileNamePrefix='satellite_metadata_all',
)

# Start the export task
export_task.start()

print("Export task for image metadata started. Monitor Earth Engine for task completion.")


# Export each image in the collection
def export_image(image, idx):  
    # Export the image to a GeoTIFF using Earth Engine's export method
    export_task = ee.batch.Export.image.toDrive(
        image=image.select("NDWI"),
        description=f"NDWI_{idx}",  # Task name shown in the Earth Engine task list
        folder="GEE_Images_all",
        fileNamePrefix=f"NDWI_{idx}",
        region=geometry,
        scale=10,
        crs='EPSG:4326',
        maxPixels=1e8  # Increase the max pixels if necessary
    )
    export_task.start()  # Start the export task
    print(f"Exporting {idx}")

# Convert the image collection to a server-side list. Its size is only fetched when getInfo() is called below.
image_list = processed_images.toList(processed_images.size())

# Loop through the image collection and export each image
for idx in range(image_list.size().getInfo()):
    ee_image = ee.Image(image_list.get(idx))  # Get the image by its index in the list
    export_image(ee_image, idx)

print("Export tasks started. Monitor Earth Engine for task completion.")


In [None]:
# Step 1.3: Add an index column to the CSV file exported from Earth Engine

metadata_path = ''  # Path to the metadata CSV exported from Earth Engine
metadata_df = pd.read_csv(metadata_path)

metadata_df['Index'] = range(len(metadata_df))  # Index column with ascending values

output_path = ''  # Path of the new CSV file
metadata_df.to_csv(output_path, index=False)  # Save the modified DataFrame

print(f"Added new column 'Index' and saved it in: {output_path}")


## Step 2: Water Mask Creation
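
The water masks in this step are derived from a per-image percentile of the NDWI values. The sketch below illustrates that idea on synthetic data. The function name `percentile_water_mask` and the random NDWI array are assumptions for illustration only; it leaves out the NaN fallback and the range check that the notebook code applies.

```python
import numpy as np

def percentile_water_mask(ndwi, percentile=93.0):
    """Mark pixels above the given NDWI percentile as water (1), all others as 0."""
    threshold = np.percentile(ndwi, percentile)
    return (ndwi > threshold).astype(np.uint8), threshold

# Synthetic NDWI values with the same shape as the project images
rng = np.random.default_rng(0)
ndwi = rng.uniform(-0.5, 0.5, size=(49, 520))

mask, threshold = percentile_water_mask(ndwi)
water_fraction = mask.mean()  # share of pixels classified as water
```

With the 93rd percentile, roughly 7% of the pixels end up in the water class regardless of the absolute NDWI level of the image.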

In [None]:
# Define a function to extract the ID from the filename
# This function assumes the filename format is "NDWI_<ID>.tif"
def extract_id_from_filename(filename):
    return int(filename.split('_')[1].split('.')[0]) 

# Define a function to look up the timestamp for an image ID. It returns None if the ID is not in the new CSV file.
def get_timestamp_from_id(image_id):
    timestamp_data = pd.read_csv('')  # Path to the CSV file with the 'Index' column
    row = timestamp_data[timestamp_data['Index'] == image_id]
    if not row.empty:
        return row['timestamp'].values[0]
    else:
        return None  # If the ID is not found, return None

# Define a function to process NDWI images and create water masks
def process_images_and_create_water_masks(folder_path):
    output_dir_water_masks = '/'  # folder for saving water masks

    # Make sure that the folder exists
    if not os.path.exists(output_dir_water_masks):
        os.makedirs(output_dir_water_masks)

    # Order by numeric ID
    image_files =[f for f in os.listdir(folder_path) if f.lower().endswith(('.tif', '.tiff'))]
    image_files.sort(key=extract_id_from_filename)  

    # Loop through each image file
    for filename in image_files:
        if filename.lower().endswith(('.tif', '.tiff')):
            image_id = extract_id_from_filename(filename)
            timestamp = get_timestamp_from_id(image_id)
            if timestamp is None:
                print(f"No timestamp for {filename}. Skipping.")
                continue
            
            # Open the NDWI image using GDAL
            file_path = os.path.join(folder_path, filename)
            dataset = gdal.Open(file_path)
            if dataset is None:
                print(f"Could not open {filename}. Skipping.")
                continue
            
            # Read the NDWI band and calculate a per-image threshold.
            # For the project area, the expected fraction of water in the image is between 5.8% and 6.5%.
            # The 93rd percentile marks roughly the top 7% of pixels as water.
            # Using a single fixed threshold for all images is not recommended.
            band = dataset.GetRasterBand(1)
            ndwi = band.ReadAsArray()
            
            threshold = np.percentile(ndwi, 93)
            if np.isnan(threshold):
                threshold = 0.05
                print(f"Threshold for {filename} is NaN. Set to {threshold}.")
            # Check whether the threshold is within the expected range [-0.1, 0.1]. Otherwise, adjust it.
            elif threshold < -0.1 or threshold > 0.1:
                if threshold < -0.1:
                    # Below the range: the adjusted value stays below -0.1, so the clip below sets it to -0.1
                    threshold = -0.1 + (threshold + 0.1) * 0.7
                elif threshold > 0.1:
                    # Above the range: move back below 0.1 by 70% of the overshoot
                    threshold = 0.1 - (threshold - 0.1) * 0.7

                threshold = np.clip(threshold, -0.1, 0.1)
                
                print(f"Threshold for {filename} was out of range: set to {threshold}.")
            else:
                print(f"Valid threshold for {filename} is {threshold}.")

            # Create a binary water mask based on the threshold and save it as a new GeoTIFF 
            water_mask = ndwi > threshold
        
            water_mask_filename = os.path.join(output_dir_water_masks, f"water_mask_{os.path.splitext(filename)[0]}.tif")
            driver = gdal.GetDriverByName('GTiff')
            mask_dataset = driver.Create(
                water_mask_filename,
                dataset.RasterXSize,
                dataset.RasterYSize,
                1,
                gdal.GDT_Byte
            )
            mask_dataset.SetGeoTransform(dataset.GetGeoTransform())
            mask_dataset.SetProjection(dataset.GetProjection())
            mask_band = mask_dataset.GetRasterBand(1)
            mask_band.WriteArray(water_mask.astype(np.uint8)) 
            mask_dataset = None
            print(f"Saved water mask: {water_mask_filename}")

    print("All water masks created and saved.")
    return


folder_path = ''  # Path to the folder with NDWI images

# Call the function to process images and create water masks
process_images_and_create_water_masks(folder_path)


## Step 3: Prepare the Training Data Set
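
The model in this notebook predicts one image from the `time_steps` images that precede it. The sketch below enumerates how such input windows and their targets line up by index. The helper `make_windows` is a hypothetical name used only here; the notebook's generator samples start indices at random instead of enumerating them.

```python
def make_windows(n_samples, time_steps):
    """Return (input_indices, target_index) pairs for every valid window."""
    windows = []
    for start in range(n_samples - time_steps):
        input_idx = list(range(start, start + time_steps))  # consecutive inputs
        target_idx = start + time_steps                      # image right after them
        windows.append((input_idx, target_idx))
    return windows

# Eight images and five time steps give three windows
windows = make_windows(n_samples=8, time_steps=5)
for input_idx, target_idx in windows:
    print(input_idx, "->", target_idx)
```

In this sketch the target index also marks the row of exogenous factors that describes the predicted time step, as in the test-sequence function later in the notebook.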

In [None]:
# Step 3.1: Load the water masks from the folder and add a channel dimension, because the model expects images of shape (height, width, channels)

def extract_id_from_filename(filename):
    return int(filename.split('_')[-1].split('.')[0]) 

# Define a function to load all TIF images from a directory in sorted order
def load_all_tif_images(directory):
    image_files = [f for f in os.listdir(directory) if f.lower().endswith('.tif')]
    image_files.sort(key=extract_id_from_filename)
    
    images = []
    for filename in image_files:
        file_path = os.path.join(directory, filename)
        with rasterio.open(file_path) as src:
            image = src.read(1)
            image = np.expand_dims(image, axis=-1)  # Add a channel dimension: (height, width) -> (height, width, 1)
            images.append(image)
    return np.array(images)

images_directory = "/" # Path to the folder with water masks
images = load_all_tif_images(images_directory)

print(f"Number of loaded images: {len(images)}")

In [None]:
# Step 3.2: Prepare exogenous factors for the training data set

exo_factors_df = pd.read_csv("")  # Path to the CSV file with exogenous factors

# Convert the 'timestamp' column to datetime format
exo_factors_df['timestamp'] = pd.to_datetime(exo_factors_df['timestamp'])

# Calculate the time difference in days between consecutive timestamps
exo_factors_df['time_diff_days'] = exo_factors_df['timestamp'].diff().dt.days

# Extract the month from the timestamp, one-hot encode it, and save the column names.
# dtype=float keeps the dummy columns numeric so that the final feature array is not of object dtype.
exo_factors_df['month'] = exo_factors_df['timestamp'].dt.month
month_one_hot = pd.get_dummies(exo_factors_df['month'], prefix='month', dtype=float)
month_one_hot_columns = list(month_one_hot.columns)

# Add the one-hot encoded month columns to the DataFrame
exo_factors_df = pd.concat([exo_factors_df, month_one_hot], axis=1)

# Drop the original month column
exo_factors_df.drop(columns=['month'], inplace=True)

# Convert other important columns of the DataFrame. Save only the columns that are needed for the model.
exo_factors_df['discharge'] = pd.to_numeric(exo_factors_df['delta_discharge'], errors='coerce')
exo_factors_df['precipitation'] = pd.to_numeric(exo_factors_df['delta_precipitation'], errors='coerce')
exo_factors_df['prec_extreme'] = pd.to_numeric(exo_factors_df['extreme_precipitation'], errors='coerce')
exo_factors_df['disc_extreme'] = pd.to_numeric(exo_factors_df['extreme_discharge'], errors='coerce')

exo_factors_df = exo_factors_df[['timestamp', 'image_id', 'discharge', 'precipitation', 'prec_extreme', 'disc_extreme', 'time_diff_days'] + list(month_one_hot.columns)]
exo_factors_df.fillna(0, inplace=True)
print(exo_factors_df.head())

# Convert the feature columns to a NumPy array of shape (n_samples, n_features)
exo_factors = exo_factors_df[['discharge', 'precipitation', 'prec_extreme', 'disc_extreme', 'time_diff_days'] + list(month_one_hot_columns)].values
print(exo_factors.shape)


In [None]:
# Step 3.3: Split the data into training, validation, and test sets (e.g. 70% train, 15% val, 15% test)

train_images = images[:173]  
train_exo = exo_factors[:173]  

val_images = images[173:210]  
val_exo = exo_factors[173:210]

test_images = images[210:]  
test_exo = exo_factors[210:]

In [None]:
# Step 3.4: Create a data generator for the training set
# This generator will yield batches of images and exogenous factors for training

def data_generator_with_future_exo(images, exo_factors, batch_size, time_steps=5):
    while True:  
        indices = np.random.choice(len(images) - time_steps - 1, batch_size)

        X_images_batch = []  
        exogenous_batch = []  
        future_exo_batch = []  
        Y_image_batch = []  

        for idx in indices:
            # Input sequence: `time_steps` consecutive images
            X_images_batch.append(images[idx:idx + time_steps])

            # Exogenous factors for the same `time_steps` time steps
            exogenous_batch.append(exo_factors[idx:idx + time_steps])

            # Exogenous factors at the target time step (same index as the target image,
            # consistent with the test sequences created later)
            future_exo_batch.append(exo_factors[idx + time_steps])

            # The target is the image that follows the input sequence
            Y_image_batch.append(images[idx + time_steps])

        # Convert the lists to NumPy arrays
        X_images_batch = np.array(X_images_batch)
        exogenous_batch = np.array(exogenous_batch)
        future_exo_batch = np.array(future_exo_batch)
        Y_image_batch = np.array(Y_image_batch)

        # Convert the batches to TensorFlow tensors
        yield (
            (tf.convert_to_tensor(X_images_batch, dtype=tf.float32),
             tf.convert_to_tensor(exogenous_batch, dtype=tf.float32),
             tf.convert_to_tensor(future_exo_batch, dtype=tf.float32)),
            tf.convert_to_tensor(Y_image_batch, dtype=tf.float32)
        )

In [None]:
# Define the Intersection over Union (IoU) evaluation metric for the model.
# IoU measures the overlap between the predicted and true regions of the positive class.
# Both arrays are binarized first: values greater than the threshold count as 1 (water), all others as 0 (no water).
# The threshold of 0.93 mirrors the 93rd percentile used for the NDWI water masks. Note that here it is applied
# to the model's sigmoid outputs, not to NDWI values.

# IoU should be interpreted carefully in cases where the positive class occupies a small fraction of the image.

def iou_metric(y_true, y_pred, threshold=0.93):
    y_pred = tf.cast(y_pred > threshold, tf.float32)
    y_true = tf.cast(y_true > threshold, tf.float32)

    intersection = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - intersection

    # Avoid division by zero
    iou = tf.math.divide_no_nan(intersection, union)
    return iou

## Step 4: Model Building and Training
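
Because the model uses 'valid' padding, every convolution and pooling layer shrinks the feature maps. The sketch below traces the (time, height, width) shape through two Conv3D and MaxPooling3D stages with the kernel and pool sizes used in this notebook. The helper names are assumptions for illustration; `model.summary()` remains the authoritative source for the real shapes.

```python
def conv_valid(size, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_valid(size, pool):
    """Output length of 'valid' max pooling with stride equal to the pool size."""
    return size // pool

def trace_shape(time_steps, height, width):
    """Follow (time, height, width) through conv -> pool -> conv -> pool."""
    shape = (time_steps, height, width)
    for _ in range(2):
        # Conv3D with kernel (3, 3, 3) shrinks every axis by 2
        shape = tuple(conv_valid(s, 3) for s in shape)
        # MaxPooling3D with pool size (1, 2, 2) halves height and width only
        shape = (shape[0], pool_valid(shape[1], 2), pool_valid(shape[2], 2))
    return shape

final_shape = trace_shape(time_steps=5, height=49, width=520)
# 16 filters in the second Conv3D layer
flattened = final_shape[0] * final_shape[1] * final_shape[2] * 16
print(final_shape, flattened)
```

The time axis shrinks from 5 to 1 after the two convolutions, so with this layer stack `time_steps` cannot be smaller than 5.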

In [None]:
def build_cnn_lstm_model_with_future_exo(image_shape, exo_input_shape, future_exo_shape, time_steps):
    # The model uses a CNN-LSTM architecture with small pooling sizes and 'valid' padding.
    # Smaller pooling sizes help retain more spatial details, which is crucial for detecting small features like rivers.
    # 'Valid' padding ensures no artificial padding is added, preserving the integrity of the input data.
    # This design is tailored to the project area to improve the model's ability to predict water bodies in satellite imagery.
    cnn_input = layers.Input(shape=(time_steps, *image_shape))  
    x = layers.Conv3D(8, (3, 3, 3), activation='relu', padding='valid')(cnn_input)  
    x = layers.MaxPooling3D((1, 2, 2), padding='valid')(x)  
    x = layers.Conv3D(16, (3, 3, 3), activation='relu', padding='valid')(x)  
    x = layers.MaxPooling3D((1, 2, 2), padding='valid')(x)  
    x = layers.Flatten()(x)  # Flatten the output for the Dense layer
    
    # LSTM Layer for exogenous factors
    exo_input = layers.Input(shape=(time_steps, exo_input_shape))  
    exo_lstm = layers.LSTM(16)(exo_input)  

    future_exo_input = layers.Input(shape=(future_exo_shape,))  
    future_exo_dense = layers.Dense(4, activation='relu')(future_exo_input)

    combined = layers.concatenate([x, exo_lstm, future_exo_dense])

    # Dense Layer for final prediction of the next image
    output = layers.Dense(image_shape[0] * image_shape[1], activation='sigmoid')(combined)
    output = layers.Reshape((image_shape[0], image_shape[1], 1))(output)

    # Define the model
    model = models.Model(inputs=[cnn_input, exo_input, future_exo_input], outputs=output)

    return model

# The shapes of the input data must be defined
image_shape = (49, 520, 1)  # Shape of the images, e.g. (height, width, channels)
exo_input_shape = exo_factors.shape[1]   # Number of features in the exogenous factors
future_exo_shape = exo_factors.shape[1]  # Same features, for the target time step
time_steps = 5  # Number of past images per input sequence

# Create model
model = build_cnn_lstm_model_with_future_exo(image_shape, exo_input_shape, future_exo_shape, time_steps)

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy', iou_metric])

# Summarise the model
model.summary()


In [None]:
# The training generator is a Python generator that yields batches of TensorFlow tensors for training the model.
train_generator = data_generator_with_future_exo(train_images, train_exo, batch_size=32, time_steps=time_steps)

# The validation generator yields batches in the same format for validating the model.
val_generator = data_generator_with_future_exo(val_images, val_exo, batch_size=32, time_steps=time_steps)

# Train the model 
model.fit(
    train_generator,
    steps_per_epoch=len(train_images) // 32,
    epochs=10,
    validation_data=val_generator,
    validation_steps=len(val_images) // 32  
)

In [None]:
# Save the entire model (architecture, weights, and training configuration)
model.save('')


In [None]:
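# Load the model and check the summary, especially the shape of the input layers.
# The custom IoU metric must be passed via custom_objects so that Keras can restore it.
model = load_model('', custom_objects={'iou_metric': iou_metric})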
model.summary()

In [None]:
def create_test_sequences_with_future_exo(images_test, exo_factors_test, time_steps=5, return_y=False):
    """
    Create sequences of images and exogenous factors for testing. The shape of the input data must be the same as in the training data.
    Args:
        images_test (np.array): Array of test images.
        exo_factors_test (np.array): Array of test exogenous factors.
        time_steps (int): Number of time steps for the sequence.
        return_y (bool): Whether to return the target variable Y.
    Returns:
        tuple: Tuple containing the sequences of images, exogenous factors, future exogenous factors, and target variable Y (if return_y is True).  
    """
    X_image_seqs = []
    X_exo_seqs = []
    future_exo_seqs = []
    Y_image_seqs = []

    max_start = len(images_test) - time_steps - 1  

    for i in range(max_start):
        img_seq = images_test[i:i + time_steps]  
        exo_seq = exo_factors_test[i:i + time_steps]  
        future_exo = exo_factors_test[i + time_steps]  
        
        X_image_seqs.append(img_seq)
        X_exo_seqs.append(exo_seq)
        future_exo_seqs.append(future_exo)


        if return_y:
            y_img = images_test[i + time_steps]  
            Y_image_seqs.append(y_img)

    if return_y:
        return (
            np.array(X_image_seqs), 
            np.array(X_exo_seqs),    
            np.array(future_exo_seqs), 
            np.array(Y_image_seqs)   
        )
    else:
        return (
            np.array(X_image_seqs), 
            np.array(X_exo_seqs), 
            np.array(future_exo_seqs)
        )

In [None]:
# Create test sequences with future exogenous factors
X_img_seq, X_exo_seq, future_exo_seq, Y_true = create_test_sequences_with_future_exo(test_images, test_exo, time_steps=5, return_y=True)

# Convert the sequences to float32
X_img_seq = X_img_seq.astype('float32')
X_exo_seq = X_exo_seq.astype('float32')
future_exo_seq = future_exo_seq.astype('float32')

# Predict the images using the model
predicted_images = model.predict([X_img_seq, X_exo_seq, future_exo_seq])

## Step 5: Visualization
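
Before plotting, it can help to put a number on how well a predicted mask matches the ground truth. The sketch below is a NumPy counterpart of the `iou_metric` defined earlier, applied to two tiny made-up arrays. The function name `iou_numpy` and the sample values are assumptions for illustration.

```python
import numpy as np

def iou_numpy(y_true, y_pred, threshold=0.93):
    """IoU of the positive class after binarizing both arrays at `threshold`."""
    true_bin = np.asarray(y_true) > threshold
    pred_bin = np.asarray(y_pred) > threshold
    intersection = np.logical_and(true_bin, pred_bin).sum()
    union = np.logical_or(true_bin, pred_bin).sum()
    # Return 0 when neither array contains a positive pixel
    return float(intersection / union) if union > 0 else 0.0

# Made-up ground truth (binary) and prediction (probabilities)
y_true = np.array([[1, 1, 0, 0],
                   [0, 1, 0, 0]], dtype=np.float32)
y_pred = np.array([[0.99, 0.40, 0.95, 0.10],
                   [0.05, 0.97, 0.20, 0.01]], dtype=np.float32)
score = iou_numpy(y_true, y_pred)
```

Here two of the three true water pixels are predicted and one background pixel is falsely predicted as water, so the intersection is 2 and the union is 4.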

In [None]:
# Create a heatmap of the river shift by comparing the predicted image with the last actual image

i = 1  # Index of the test sequence to visualize (assumed here; any index >= 1 works)
last_actual_image = Y_true[i-1].squeeze()
predicted_image = predicted_images[i].squeeze()

# Calculate the difference to visualize the river shift and create a mask to highlight significant changes
river_shift = predicted_image - last_actual_image
alpha_mask = np.where((river_shift < -0.15) | (river_shift > 0.15), 1.0, 0.0)

# Create a heatmap of the river shift
plt.figure(figsize=(8, 6))
plt.title("Heatmap of River Shift")
plt.imshow(river_shift, cmap='coolwarm', vmin=-1, vmax=1, alpha=alpha_mask)  
plt.colorbar(label="Shift Intensity")
plt.show()


### Possible Improvements
AI-powered modeling is a powerful tool to address complex hydrological tasks. The notebook could be improved by using the Dice Loss function instead of binary cross-entropy or a combination of both. The Dice Loss function is more suitable for imbalanced datasets, as it focuses on the overlap between predicted and actual regions, which is critical for detecting water bodies.
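
To illustrate why Dice Loss reacts more strongly to a missed water body than pixel-wise measures do, here is a minimal NumPy sketch of a soft Dice loss. The function name `dice_loss`, the smoothing term `smooth`, and the toy mask are assumptions for illustration; a loss used for training would have to be written with TensorFlow operations instead.

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + smooth) / (sum_true + sum_pred + smooth)."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return 1.0 - dice

# Toy mask: a thin "river" covering 20% of the image
mask = np.zeros((10, 10))
mask[4:6, :] = 1.0

perfect = dice_loss(mask, mask)                        # prediction equals the mask
all_background = dice_loss(mask, np.zeros_like(mask))  # prediction misses the river
pixel_accuracy = np.mean(mask == np.zeros_like(mask))  # accuracy of the same prediction
```

In this toy case, predicting background everywhere still reaches a pixel accuracy of 0.8, while the Dice loss is close to its maximum of 1.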

Additionally, visually inspecting the generated water masks and removing images with poor results could enhance the dataset quality and improve model performance. The integration of Digital Elevation Models (DEMs) to identify preferential river paths and refine water masks did not succeed in this case. However, exploring alternative methods to incorporate DEM data, such as using it as an additional input feature or applying terrain-based corrections, could still provide valuable insights and improve predictions. Post-processing techniques such as morphological operations (e.g., dilation or erosion) also deserve a closer look, as they could help to refine the predicted water masks and remove noise.
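
As a rough illustration of such post-processing, the sketch below implements a morphological opening (erosion followed by dilation) with a 3x3 window in plain NumPy. The helper names and the toy mask are assumptions for illustration; libraries such as `scipy.ndimage` provide equivalent operations.

```python
import numpy as np

def _neighbourhoods(mask):
    """Stack the 3x3 neighbourhood of every pixel (border padded with False)."""
    padded = np.pad(mask.astype(bool), 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def erode(mask):
    """A pixel stays set only if its whole 3x3 neighbourhood is set."""
    return _neighbourhoods(mask).all(axis=0)

def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighbourhood is set."""
    return _neighbourhoods(mask).any(axis=0)

def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(mask))

mask = np.zeros((9, 12), dtype=bool)
mask[3:6, 1:11] = True  # a 3-pixel-wide "river"
mask[0, 0] = True       # an isolated noise pixel
cleaned = opening(mask)
```

The isolated pixel disappears while the 3-pixel-wide river is restored. A channel narrower than the window would be removed as well, which matters for narrow rivers at 10 m resolution.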

Another key limitation is the resolution of freely available satellite imagery, which is currently limited to 10 meters for Sentinel-2. Higher-resolution imagery, if available in the future, could significantly enhance the ability to extract water bodies more accurately, especially in narrow or fragmented river systems.

Further improvements could include:
* Multi-Sensor Data Fusion: Combining data from multiple satellite sensors (e.g., Sentinel-1 SAR data for flood detection) could provide complementary information and improve water body detection under challenging conditions, such as cloud cover. Unlike optical sensors (e.g., Sentinel-2), SAR can penetrate clouds and operate in all weather conditions, making it ideal for monitoring water bodies during cloudy or rainy periods. SAR is sensitive to surface roughness and moisture, which can help distinguish between water and other land cover types, even in challenging conditions like flooded areas or wetlands.
* Advanced Architectures: Exploring advanced deep learning architectures, such as U-Net or Transformer-based models, could improve spatial feature extraction and temporal modeling.
* Uncertainty Quantification: Implementing methods to quantify prediction uncertainty could help identify areas where the model is less confident, guiding further data collection or model refinement.
* Improve the Integration of Hydrological Components: Include cumulative precipitation or discharge over specific seasons (e.g., spring or summer) to capture broader hydrological trends. Add lagged versions of exogenous factors (e.g., precipitation or discharge from the previous month) to model delayed hydrological responses.
* Explainability: Adding explainability techniques, such as saliency maps, could help understand which features or regions in the input data contribute most to the predictions, aiding model interpretability.
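
The lagged and cumulative features mentioned in the list above could be derived with pandas roughly as follows. This is a sketch on made-up values. The column names `precipitation_lag1` and `precipitation_sum3` are hypothetical, and the lag is counted in acquisitions rather than in calendar months.

```python
import pandas as pd

# Made-up exogenous data, one row per satellite acquisition
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-06", "2024-01-11", "2024-01-16"]),
    "precipitation": [2.0, 0.0, 5.0, 1.0],
})

# Lagged feature: precipitation at the previous acquisition (0 for the first row)
df["precipitation_lag1"] = df["precipitation"].shift(1).fillna(0.0)

# Cumulative feature: sum over the current and the two previous acquisitions
df["precipitation_sum3"] = df["precipitation"].rolling(window=3, min_periods=1).sum()

print(df)
```

Columns like these would then be appended to the feature list before the DataFrame is converted to the NumPy array that feeds the model.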