# WEED inference
In this notebook we showcase how EO processing is coupled with ONNX ML inference within WEED.

The workflow first lazily loads a data cube that contains every enabled training feature as a band.
Next, we read the features the model was trained on from the model itself, which is stored on a storage site accessible to openEO.

It is important that users add this information to the stored models. The module onnx_model_utilities provides code that shows how this can be done: it reads the relevant information from a JSON file and adds it to the ONNX metadata. As the project continues, this approach will change, since model training will also be streamlined within the WEED workflow.
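As a minimal sketch of the idea, the feature lists can be serialized into string key/value pairs, since ONNX model metadata only stores strings. The helper names below are hypothetical and are not the functions in onnx_model_utilities; the JSON encoding is an assumption, and only the keys `input_features` and `output_features` are taken from this notebook.

```python
import json


def encode_feature_metadata(input_features, output_features):
    """Serialize the feature lists into string key/value pairs."""
    return {
        "input_features": json.dumps(list(input_features)),
        "output_features": json.dumps(list(output_features)),
    }


def decode_feature_metadata(props):
    """Turn the string key/value pairs back into Python lists."""
    return {key: json.loads(value) for key, value in props.items()}


# Hypothetical feature names, for illustration only.
props = encode_feature_metadata(["NDVI", "B04"], ["classification"])
restored = decode_feature_metadata(props)
print(restored["input_features"])
```

With the onnx package, each pair could then be attached to a loaded model through `model.metadata_props.add()` (setting `.key` and `.value`) before saving it again; whether onnx_model_utilities does exactly this is not confirmed here.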

Note: it is important to store your model in a zip file as shown here, because openEO has a built-in mechanism for unzipping dependency archives.
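The dependency archive used later in this notebook is passed as a URL with a `#onnx_deps` suffix. The sketch below only shows how such an entry splits into the download URL and the fragment. It assumes that the backend uses the fragment as the name of the folder into which the archive is extracted; check the backend documentation to confirm this.

```python
from urllib.parse import urldefrag

DEPENDENCY_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies/onnx_dependencies_1.16.3.zip"

# Entry as it is written into job_options["udf-dependency-archives"].
archive_entry = f"{DEPENDENCY_URL}#onnx_deps"

# Split the entry into the archive URL and the fragment after '#'.
url, folder = urldefrag(archive_entry)
print(url)
print(folder)
```

Under that assumption, a UDF would make the unzipped packages importable by adding the `onnx_deps` folder to `sys.path` before importing them.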

In [9]:
import os
import sys
import openeo

sys.path.append(os.path.abspath('C:/Git_projects/eo_processing/src'))

from eo_processing.utils.helper import init_connection, getUDFpath
from eo_processing.utils.onnx_model_utilities import get_training_features_from_model

from eo_processing.openeo.processing import generate_master_feature_cube
from eo_processing.config import get_job_options, get_collection_options,  get_standard_processing_options

Connect to the openEO processing backend

In [10]:
backend = 'cdse' 
# establish the connection to the selected backend
connection = init_connection(backend)

job_options = get_job_options(provider=backend)
collection_options = get_collection_options(provider=backend)

Authenticated using refresh token.


### Specify space & time context

In [11]:
# the time context is given by start and end date
start = '2021-01-01'
end = '2021-02-01'   # the end is always exclusive
AOI = {'east': 4832000, 'south': 2818000, 'west': 4831000, 'north': 2819000, 'crs': 'EPSG:3035'}
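A quick sanity check of this context, as a sketch: because the end date is exclusive, the interval covers January only, and because EPSG:3035 coordinates are in metres, the extent can be converted to pixels. The 10 m resolution is an assumption taken from the standard processing options printed further down.

```python
from datetime import date

start = date.fromisoformat("2021-01-01")
end = date.fromisoformat("2021-02-01")  # exclusive

# Half-open interval [start, end): number of days covered.
n_days = (end - start).days

AOI = {'east': 4832000, 'south': 2818000, 'west': 4831000, 'north': 2819000, 'crs': 'EPSG:3035'}
RESOLUTION = 10.0  # metres per pixel (assumed from the processing options)

# Extent in pixels along each axis.
width_px = int((AOI['east'] - AOI['west']) / RESOLUTION)
height_px = int((AOI['north'] - AOI['south']) / RESOLUTION)

print(n_days, width_px, height_px)
```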

Below we set up the job settings. Note that these have not yet been optimized for the UDF inference workflow.

We also read the metadata of the provided model to extract the features it was trained on.

In [12]:
# URLs of the UDF dependency archive and of the model to use
DEPENDENCY_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies/onnx_dependencies_1.16.3.zip"
MODEL_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/weed/catboost_models/model_1.onnx"


# we call again the standard processing options for feature generation
processing_options = get_standard_processing_options(provider=backend, task='feature_generation')

# add the UDF dependencies to the job options
job_options["udf-dependency-archives"] = [f"{DEPENDENCY_URL}#onnx_deps"]

# get the input and output feature lists from the ONNX model metadata
metadata = get_training_features_from_model(MODEL_URL)
INPUT_BANDS = metadata['input_features']
OUTPUT_BANDS = metadata['output_features']

# just print for an overview
print(f'job_options: {job_options}')
print(f'collection_options: {collection_options}')
print(f'processing_options: {processing_options}')
print(f'ML INPUT features: {INPUT_BANDS}')
print(f'ML OUTPUT features: {OUTPUT_BANDS}')



job_options: {'driver-memory': '8G', 'driver-memoryOverhead': '5G', 'driver-cores': '1', 'executor-memory': '1500m', 'executor-memoryOverhead': '2500m', 'executor-cores': '1', 'max-executors': '25', 'soft-errors': 'true', 'executor-request-cores': '800m', 'executor-threads-jvm': '7', 'logging-threshold': 'info', 'udf-dependency-archives': ['https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies/onnx_dependencies_1.16.3.zip#onnx_deps']}
collection_options: {'S2_collection': 'SENTINEL2_L2A', 'S1_collection': 'SENTINEL1_GRD'}
processing_options: {'provider': 'cdse', 's1_orbitdirection': 'DESCENDING', 'target_crs': 3035, 'resolution': 10.0, 'time_interpolation': False, 'ts_interval': 'dekad', 'SLC_masking_algo': 'mask_scl_dilation', 'S2_bands': ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12'], 'optical_vi_list': ['NDVI', 'AVI', 'CIRE', 'NIRv', 'NDMI', 'NDWI', 'BLFEI', 'MNDWI', 'NDVIMNDWI', 'S2WI', 'S2REP', 'IRECI'], 'radar_vi_list': ['VHVVD', 'VHVVR', 'RVI'], 

Here we build the 'master cube', which currently contains 237 bands that could have been used as training features. This cube is 'lazy loaded' in the sense that it is not fully loaded into memory.

With the filter bands operation, we keep only the 65 bands on which our ML model was trained.

In [None]:
# define the openEO pipeline to use
data_cube = generate_master_feature_cube(connection,
                                   AOI,
                                   start,
                                   end,
                                   **collection_options,
                                   **processing_options)

data_cube = data_cube.filter_bands(INPUT_BANDS)   



In [15]:
getUDFpath('udf_catboost_inference.py')

'C:\\Git_projects\\eo_processing\\src\\eo_processing\\resources\\udf_catboost_inference.py'

Verify that the bands of the input cube match with those of our training dataset
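One way to make this check explicit is sketched below. `check_bands` is a hypothetical helper, not part of eo_processing. The band names of the cube could, for example, be taken from `data_cube.metadata.band_names`; that source is an assumption here, and the band names in the example call are illustrative.

```python
def check_bands(cube_bands, model_bands):
    """Raise ValueError if the cube bands do not match the model features."""
    missing = [band for band in model_bands if band not in cube_bands]
    if missing:
        raise ValueError(f"Bands missing from the cube: {missing}")
    # The model expects its features in the order used during training.
    if list(cube_bands) != list(model_bands):
        raise ValueError("Cube bands are not in the order the model expects")


# Illustrative call with made-up band lists; no error means they match.
check_bands(["B02", "NDVI"], ["B02", "NDVI"])
```

In the notebook this would be called as `check_bands(cube_band_names, INPUT_BANDS)` before the UDF is applied, so that a mismatch surfaces locally instead of inside the batch job.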

In [18]:
# source: https://github.com/clausmichele/openEO_photovoltaic/blob/main/udf_inference/openeo_pv_farms_inference_udf.ipynb

# pass the model URL to the UDF as context information
udf = openeo.UDF.from_file(
    getUDFpath('udf_catboost_inference.py'),
    context={"model_url": MODEL_URL},
)

# apply the UDF to the data cube
catboost_classification = data_cube.apply(process=udf)

# label the output bands with the model's output feature names
output = catboost_classification.rename_labels(dimension="bands", target=OUTPUT_BANDS)

# run the inference as a batch job and download the result
output.execute_batch("output.nc", job_options=job_options)

0:00:00 Job 'j-241121a187634356a2c200cbb2e46c29': send 'start'
0:00:15 Job 'j-241121a187634356a2c200cbb2e46c29': created (progress 0%)
0:00:21 Job 'j-241121a187634356a2c200cbb2e46c29': created (progress 0%)
0:00:27 Job 'j-241121a187634356a2c200cbb2e46c29': created (progress 0%)
0:00:35 Job 'j-241121a187634356a2c200cbb2e46c29': created (progress 0%)
0:00:45 Job 'j-241121a187634356a2c200cbb2e46c29': created (progress 0%)
0:00:58 Job 'j-241121a187634356a2c200cbb2e46c29': created (progress 0%)
0:01:13 Job 'j-241121a187634356a2c200cbb2e46c29': queued (progress 0%)
0:01:33 Job 'j-241121a187634356a2c200cbb2e46c29': queued (progress 0%)
0:01:57 Job 'j-241121a187634356a2c200cbb2e46c29': queued (progress 0%)
0:02:27 Job 'j-241121a187634356a2c200cbb2e46c29': queued (progress 0%)
0:03:05 Job 'j-241121a187634356a2c200cbb2e46c29': queued (progress 0%)
0:03:52 Job 'j-241121a187634356a2c200cbb2e46c29': queued (progress 0%)
0:04:51 Job 'j-241121a187634356a2c200cbb2e46c29': queued (progress 0%)
0:05:52 

JobFailedException: Batch job 'j-241121a187634356a2c200cbb2e46c29' didn't finish successfully. Status: error (after 0:09:55).
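The exception above does not state why the job failed; the reason is usually found in the batch job logs. The sketch below filters error-level entries from a list of log entries. It assumes the entries are dict-like with `level` and `message` keys, as returned by the openeo client's `connection.job(job_id).logs()`. The sample entries are hypothetical placeholders, not the logs of the job above.

```python
def error_messages(log_entries):
    """Return the messages of all error-level log entries."""
    return [
        entry.get("message", "")
        for entry in log_entries
        if str(entry.get("level", "")).lower() == "error"
    ]


# Hypothetical placeholder entries, for illustration only.
sample_logs = [
    {"level": "info", "message": "placeholder info entry"},
    {"level": "error", "message": "placeholder error entry"},
]

for message in error_messages(sample_logs):
    print(message)
```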