In this notebook we test CatBoost inference through openEO.

First we create a helper function that builds a dummy 'master cube' with enough input bands (65) to perform inference on. We get to 65 bands by computing 5 quantiles for each of 13 S2 bands (13 x 5 = 65).

In [2]:
##%% Dummy function for fake cube generation

def compute_quantiles(base_features, quantiles=[0.10, 0.25, 0.50, 0.75, 0.90]):
    """
    Computes specified quantiles (default P10, P25, P50, P75, P90) 
    for each band in the base_features along the time dimension.
    
    Args:
        base_features: An openEO data cube with a time dimension 't' and a
                       'bands' dimension whose band names are known in its metadata.
        quantiles: A list of quantiles to compute (default: [0.10, 0.25, 0.50, 0.75, 0.90])
    
    Returns:
        An openEO data cube without a time dimension, whose bands are renamed
        to '<band>_<quantile label>' (e.g. 'B01_P10').
    """
    
    def compute_stats(timeseries, quantiles):
        return timeseries.quantiles(probabilities=quantiles)
    
    # Compute the quantiles for each band along the time dimension ('t')
    stats = base_features.apply_dimension(
        dimension="t", target_dimension="bands", process=lambda ts: compute_stats(ts, quantiles)
    )
    
    # Generate band names by appending the quantile labels (P10, P25, etc.) to each band
    quantile_labels = [f"P{int(q*100)}" for q in quantiles]
    all_bands = [
        f"{band}_{label}"
        for band in base_features.metadata.band_names
        for label in quantile_labels
    ]
    
    return stats.rename_labels("bands", all_bands)
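
The band naming used in compute_quantiles can be checked without an openEO connection. The sketch below reproduces only the name generation in plain Python, using the 13 bands that are loaded later in this notebook. It uses round() instead of int() for the label, which is an assumption of this sketch rather than what the helper does; for the default quantiles both give the same labels.

```python
# Band list copied from the load_collection call further down in this notebook.
bands = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
         "B08", "B8A", "B09", "B11", "B12", "SCL"]
quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]

# round() guards against floating-point truncation for probabilities
# whose product with 100 lands just below an integer.
quantile_labels = [f"P{round(q * 100)}" for q in quantiles]

# Same nesting order as in compute_quantiles: all quantiles of the
# first band, then all quantiles of the second band, and so on.
all_bands = [f"{band}_{label}" for band in bands for label in quantile_labels]

print(len(all_bands))
print(all_bands[:5])
```

The length of all_bands is what has to match the input size expected by the test model. Note that SCL is a classification layer, so its quantiles only serve as filler for this dummy cube.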

Next we define our input cube. As explained above, we take an S2 cube with 13 bands and expand it to 65 bands, so that the input is large enough for our test CatBoost model, which expects a 65-element vector per pixel as input.

In [8]:
import openeo
from eo_processing.utils import getUDFpath

#TODO replace with mastercube

connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()

#input parameters
spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19}
temporal_extent=["2021-02-01", "2021-03-01"]
max_cloud_cover = 90


#get input cube
s2 = connection.load_collection(
        "SENTINEL2_L2A",
        spatial_extent=spatial_extent,
        temporal_extent=temporal_extent,
        bands=["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B09", "B11", "B12", "SCL"],
        max_cloud_cover=max_cloud_cover)

s2_expanded = compute_quantiles(s2)

import os
print(os.getcwd())

Authenticated using refresh token.
c:\Git_projects\eo_processing\notebooks


We load the UDF for the CatBoost inference. Because the UDF predicts each pixel independently, we use the openEO apply operation.

In [9]:
# Load the inference UDF from resources
udf = openeo.UDF.from_file(getUDFpath('udf_catboost_inference.py'))

# Apply the UDF to the data cube.
catboost_classification = s2_expanded.apply(
    process=udf)

# Rename the output bands: the predicted label followed by one probability per class
output = catboost_classification.rename_labels(dimension="bands", target=[
 'predicted_label', 'prob_class_30000', 'prob_class_40000',
 'prob_class_50000', 'prob_class_60000', 'prob_class_70000',
 'prob_class_80000', 'prob_class_90000', 'prob_class_100000',
 'prob_class_110000'])
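
The band names above imply an output of 10 values per pixel: one predicted label followed by 9 class probabilities. The UDF itself is not shown in this notebook, so the following is only a hypothetical illustration of that layout. The function assemble_output and the rule that the label is the class with the highest probability are assumptions of this sketch, not the UDF's confirmed behaviour.

```python
# Class codes taken from the band names used in rename_labels.
CLASS_CODES = [30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000]

band_names = ["predicted_label"] + [f"prob_class_{code}" for code in CLASS_CODES]


def assemble_output(probabilities):
    """Hypothetical helper: build one output pixel from per-class probabilities."""
    if len(probabilities) != len(CLASS_CODES):
        raise ValueError(f"expected {len(CLASS_CODES)} probabilities, got {len(probabilities)}")
    # Assumption: the predicted label is the class with the highest probability.
    best = max(range(len(CLASS_CODES)), key=lambda i: probabilities[i])
    return [float(CLASS_CODES[best])] + [float(p) for p in probabilities]


example = assemble_output([0.05, 0.10, 0.50, 0.05, 0.05, 0.05, 0.10, 0.05, 0.05])
print(dict(zip(band_names, example)))
```

Whatever the UDF does internally, the number of bands it returns has to equal the number of labels passed to rename_labels.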



We provide public URLs to zip files containing the required model and its dependencies and pass these through the job options. Afterwards we execute the UDF as a batch job and download the output.

In [10]:
#set job dependencies (URL to zipped model)
DEPENDENCY_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies/onnx_dependencies_1.16.3.zip"
MODEL_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies/WEED_test_catboost.zip"

job_options = {}
job_options["udf-dependency-archives"] = [
            f"{DEPENDENCY_URL}#onnx_deps",
            f"{MODEL_URL}#onnx_models",
        ]

output.execute_batch("output.nc", job_options=job_options)

0:00:00 Job 'j-24091859ceeb486da4e50a64453841a3': send 'start'
0:00:19 Job 'j-24091859ceeb486da4e50a64453841a3': created (progress 0%)
0:00:24 Job 'j-24091859ceeb486da4e50a64453841a3': created (progress 0%)
0:00:31 Job 'j-24091859ceeb486da4e50a64453841a3': created (progress 0%)
0:00:39 Job 'j-24091859ceeb486da4e50a64453841a3': created (progress 0%)
0:00:49 Job 'j-24091859ceeb486da4e50a64453841a3': created (progress 0%)
0:01:01 Job 'j-24091859ceeb486da4e50a64453841a3': queued (progress 0%)
0:01:17 Job 'j-24091859ceeb486da4e50a64453841a3': queued (progress 0%)
0:01:36 Job 'j-24091859ceeb486da4e50a64453841a3': queued (progress 0%)
0:02:02 Job 'j-24091859ceeb486da4e50a64453841a3': queued (progress 0%)
0:02:32 Job 'j-24091859ceeb486da4e50a64453841a3': queued (progress 0%)
0:03:10 Job 'j-24091859ceeb486da4e50a64453841a3': queued (progress 0%)
0:03:57 Job 'j-24091859ceeb486da4e50a64453841a3': queued (progress 0%)
0:04:55 Job 'j-24091859ceeb486da4e50a64453841a3': running (progress N/A)
0:05:56
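
Each entry in udf-dependency-archives has the form URL#folder. To our understanding the part after the # names the folder into which the backend extracts the archive so that the UDF can find it there; treat this as an assumption and check the backend documentation. The sketch below only splits the entries with the standard library to show which folders the UDF would then have to look in. The helper split_archive_entry is hypothetical.

```python
from urllib.parse import urldefrag

DEPENDENCY_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies/onnx_dependencies_1.16.3.zip"
MODEL_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies/WEED_test_catboost.zip"

archives = [f"{DEPENDENCY_URL}#onnx_deps", f"{MODEL_URL}#onnx_models"]


def split_archive_entry(entry):
    """Hypothetical helper: split 'URL#folder' into (URL, folder)."""
    url, folder = urldefrag(entry)
    if not folder:
        raise ValueError(f"no target folder given in {entry!r}")
    return url, folder


# Map each target folder to the archive that is expected to end up in it.
targets = {folder: url for url, folder in map(split_archive_entry, archives)}
for folder, url in targets.items():
    print(f"{folder} <- {url}")
```

The folder names have to match whatever paths the UDF uses to import the dependencies and load the model.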

Inspect the output

In [2]:
import xarray as xr

xr.open_dataset('output.nc')