# ML API Specification for EO Data Cubes
### Brian Pondi and Rolf Simoes

This notebook demonstrates the ML API processes for the openEO ecosystem, structured around three pillars: initialization of model architectures, execution of training and prediction tasks, and management of model artifacts. By decoupling ML logic from backend implementations, the API enables portable workflows that can be reused across infrastructures.

![ML API Architecture](./eo-ml.png)

## Design Principles

The core principles that guided the API development include:

- **Modularity**
- **Consistency**
- **Backend-Agnosticism**
- **Default Parameters**
- **Extensibility**

## Model Initialization

Processes prefixed with `mlm_` handle model initialization: each one creates an untrained model definition object. Process names follow the pattern:

```
mlm_<type>_<model>
```

where `<type>` is an abbreviated ML action category (e.g., `class` for classification, `regr` for regression, `segm` for segmentation, `gen` for generative) and `<model>` specifies the particular algorithm (e.g., `random_forest`, `svm`, `xgboost`, `tempcnn`, `tae`).

| Process Name              | Description                                                               |
| ------------------------- | ------------------------------------------------------------------------- |
| `mlm_class_catboost`      | Initializes a CatBoost classification model                               |
| `mlm_class_mlp`           | Initializes a Multi-Layer Perceptron (MLP) classification model           |
| `mlm_class_random_forest` | Initializes a Random Forest classification model                          |
| `mlm_class_svm`           | Initializes a Support Vector Machine (SVM) classification model           |
| `mlm_class_xgboost`       | Initializes an XGBoost classification model                               |
| `mlm_class_tempcnn`       | Initializes a Temporal Convolutional Neural Network (TempCNN) model       |
| `mlm_class_tae`           | Initializes a Temporal Attention Encoder (TAE) model                      |
| `mlm_class_lighttae`      | Initializes a lightweight version of the Temporal Attention Encoder model |
| `mlm_regr_svm`            | Initializes a Support Vector Machine regression model                     |
| `mlm_regr_random_forest`  | Initializes a Random Forest regression model                              |
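
The naming pattern can be checked mechanically. The following is a minimal sketch that splits a process name into its `<type>` and `<model>` parts. The helper `parse_mlm_name` and the `TYPE_NAMES` mapping are hypothetical illustrations and are not part of the ML API or the openEO Python client.

```python
# Hypothetical helper for illustration only; not part of the ML API.
TYPE_NAMES = {
    "class": "classification",
    "regr": "regression",
    "segm": "segmentation",
    "gen": "generative",
}


def parse_mlm_name(process_id):
    """Split 'mlm_<type>_<model>' into (type description, model name)."""
    prefix, _, rest = process_id.partition("_")
    if prefix != "mlm":
        raise ValueError(f"not a model initialization process: {process_id!r}")
    type_abbr, _, model = rest.partition("_")
    if type_abbr not in TYPE_NAMES or not model:
        raise ValueError(f"unexpected name layout: {process_id!r}")
    return TYPE_NAMES[type_abbr], model


print(parse_mlm_name("mlm_class_random_forest"))  # ('classification', 'random_forest')
print(parse_mlm_name("mlm_regr_svm"))             # ('regression', 'svm')
```

Because the model name is everything after the second underscore, multi-word algorithm names such as `random_forest` stay intact.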

## Model Actions

Prefixed with `ml_`, model action processes are responsible for executing key ML workflows on model artifacts and EO data cubes. These actions include model training, prediction, uncertainty estimation, and post-processing.

| Process Name               | Description                                                                                |
| -------------------------- | ------------------------------------------------------------------------------------------ |
| `ml_fit`                   | Fits a machine learning model to a data cube of input features and target values           |
| `ml_predict`               | Applies a trained machine learning model to a data cube of input features                  |
| `ml_predict_probabilities` | Applies a model to input features and returns predicted class probabilities                |
| `ml_uncertainty_class`     | Estimates classification uncertainty using methods like margin, ratio, or least-confidence |
| `ml_smooth_class`          | Applies spatial smoothing to classification probability results using Bayesian inference   |
| `ml_label_class`           | Converts a probability data cube to a labeled data cube                                    |
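
To make the uncertainty and labeling steps concrete, here is a minimal sketch for a single pixel. It uses common textbook definitions of least-confidence, margin, and ratio; these formulas, the function names, and the class names are assumptions for illustration, and a backend implementing `ml_uncertainty_class` or `ml_label_class` may define or scale them differently.

```python
# Textbook definitions assumed for illustration; a backend may use
# different formulas or scaling.
def least_confidence(probs):
    """1 minus the highest probability: larger means more uncertain."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most likely classes: smaller means more uncertain."""
    best, second = sorted(probs, reverse=True)[:2]
    return best - second


def ratio(probs):
    """Second-best over best probability: closer to 1 means more uncertain."""
    best, second = sorted(probs, reverse=True)[:2]
    return second / best


def label_from_probabilities(probs, classes):
    """Pick the class with the highest probability (first one on ties)."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return classes[best_index]


classes = ["forest", "deforestation", "water"]  # hypothetical class names
pixel_probs = [0.5, 0.25, 0.25]                 # made-up probabilities

print(least_confidence(pixel_probs))                    # 0.5
print(margin(pixel_probs))                              # 0.25
print(ratio(pixel_probs))                               # 0.5
print(label_from_probabilities(pixel_probs, classes))   # forest
```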

## Model Management

Model management processes handle the storage and retrieval of ML artifacts. `export_ml_model` and `import_ml_model` move preliminary or private models to and from the user's workspace, while `save_ml_model` and `load_ml_model` store and load models described in compliance with the STAC MLM (Machine Learning Model) extension.

| Process Name      | Description                                                                              |
| ----------------- | ---------------------------------------------------------------------------------------- |
| `import_ml_model` | Imports a previously exported machine learning model from a specified workspace location |
| `export_ml_model` | Exports a machine learning model to a specified workspace location                       |
| `load_ml_model`   | Loads a machine learning model from a STAC MLM Item into the current session             |
| `save_ml_model`   | Saves a machine learning model with STAC MLM Extension compliance                        |

# Temporal CNN Example

This section demonstrates how to train a Temporal CNN (TempCNN) model using openEO processes. The example uses deforestation data from Rondonia to train a deep learning model for time series classification.

## Setup and Connection

First, we load the required libraries and connect to the openEO backend.

```python
import openeo  # type: ignore
from rpy2 import robjects
from rpy2.robjects.packages import importr

jsonlite = importr("jsonlite")
```

```python
# Local development backend; replace the URL and credentials with your own
connection = openeo.connect(
    url="http://127.0.0.1:8000",
    auth_type="basic",
    auth_options={"username": "brian", "password": "123456"},
)
```

```python
print(connection.list_collections())
```

```python
print(connection.list_collection_ids())
```

## Explore Available Processes

Let's list the processes available on the backend. The ML-related ones are those prefixed with `mlm_` and `ml_`, plus the four model management processes.


```python
process_ids = [process["id"] for process in connection.list_processes()]
print("Available processes on this backend:")
for process_id in process_ids:
    print(f"- {process_id}")
```
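
On a backend with many processes, it can help to pick out only the ML-related ones. The sketch below groups process IDs by the prefixes described earlier. It runs on a hard-coded sample list so that it works without a backend; the helper `group_ml_processes` and the suffix rule used for the management processes are assumptions for illustration.

```python
# Sample IDs taken from the tables in this notebook; a real backend
# returns many more via connection.list_processes().
sample_ids = [
    "load_collection",
    "mlm_class_tempcnn",
    "mlm_regr_svm",
    "ml_fit",
    "ml_predict",
    "save_ml_model",
    "load_ml_model",
    "save_result",
]


def group_ml_processes(ids):
    """Group process IDs into the three ML API categories by name."""
    groups = {"initialization": [], "actions": [], "management": []}
    for pid in ids:
        if pid.startswith("mlm_"):
            groups["initialization"].append(pid)
        elif pid.startswith("ml_"):
            groups["actions"].append(pid)
        elif pid.endswith("_ml_model"):
            # Assumed rule: all management processes end in "_ml_model"
            groups["management"].append(pid)
    return groups


ml_groups = group_ml_processes(sample_ids)
for group, ids in ml_groups.items():
    print(f"{group}: {', '.join(ids)}")
```

Passing the `process_ids` list obtained from the backend instead of `sample_ids` applies the same grouping to the live process list.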

## Examine ML Process Details

Let's look at the details of some ML processes to understand their parameters and requirements.


```python
connection.describe_process("mlm_class_random_forest")
```

```python
connection.describe_process("mlm_class_tempcnn")
```

## Load and Prepare Data

We'll load Sentinel-2 L2A data for the study area covering the year 2022, then regularize it to monthly time steps.

```python
# Load a data cube
datacube = connection.load_collection(
    collection_id="mpc-sentinel-2-l2a",
    spatial_extent={"west": -63.9, "south": -9.14, "east": -62.9, "north": -8.14},
    temporal_extent=["2022-01-01", "2022-12-31"],
)
```

```python
datacube = datacube.process(
    process_id="cube_regularize",
    arguments={
        "data": datacube,
        "period": "P1M",  # Monthly regularization
        "resolution": 320,
    },
)
```
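
As a rough illustration of what a monthly period means for this temporal extent, the sketch below enumerates the calendar months between the start and end dates. It assumes that `cube_regularize` produces one time step per `P1M` interval aligned to calendar months; the actual alignment and aggregation are defined by the backend, and the helper `monthly_starts` is hypothetical.

```python
from datetime import date


def monthly_starts(start, end):
    """Return the first day of every calendar month overlapping [start, end]."""
    starts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        starts.append(date(year, month, 1))
        # Advance by one month, rolling over to January of the next year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return starts


steps = monthly_starts(date(2022, 1, 1), date(2022, 12, 31))
print(len(steps))            # 12
print(steps[0], steps[-1])   # 2022-01-01 2022-12-01
```

Under that assumption, the regularized cube for 2022 would carry 12 time steps per pixel.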

## Load Training Data

We'll load the pre-processed training data for deforestation in Rondonia.


```python
# Load the RDS file using rpy2
readRDS = robjects.r["readRDS"]
data_deforestation_rondonia = readRDS("./monthly_rondonia_data.rds")
```

```python
data_deforestation_rondonia
```

```python
# Serialize the data using jsonlite::serializeJSON
serializeJSON = robjects.r["serializeJSON"]
serialized_data = serializeJSON(data_deforestation_rondonia)
```

```python
serialized_data
```

## Initialize and Train the Model

Now we'll initialize the Temporal CNN model and train it with our data.

```python
tempcnn_model_init = connection.mlm_class_tempcnn(
    optimizer="adam",
    learning_rate=0.0005,
    seed=42,
)
```

```python
# Fit the model using the training dataset
tempcnn_model = tempcnn_model_init.fit(
    training_set=serialized_data,
    target="label",
)
```

## Make Predictions

Apply the trained model to make predictions on new data.


```python
datacube = tempcnn_model.predict(
    data=datacube,
    model=tempcnn_model,
)
```

## Save the Model

Save the trained model for future use.

```python
tempcnn_model.save_ml_model(
    name="tempcnn_rondonia",
    tasks=["classification"],
    # options={},
)
```

## Save and Execute Results

Finally, we'll save the prediction results and execute the job.


```python
result = datacube.save_result(
    format="GTiff"
)
```

```python
job = result.create_job(
    title="Deforestation Prediction in Rondonia",
    description="Using TempCNN model to predict deforestation in Rondonia",
)
job
```

```python
job.start_and_wait()
job.get_results().download_files("output")
```

## Conclusion

This example demonstrated how to:

1. Connect to an openEO backend
2. Load and regularize a Sentinel-2 data cube
3. Load and serialize the training data
4. Define a Temporal CNN model architecture
5. Train the model
6. Make predictions on the data cube
7. Save the model and the prediction results

The trained model can now be used to make predictions on new time series data.