# ML API Specification for EO Data Cubes
### Brian Pondi and Rolf Simoes

## Setup and Connection

First, we import the `openeo` client and connect to the openEO backend.
This notebook needs the variant of the `openeo` package from this GitHub repository: https://github.com/PondiB/openeo-python-client

In [2]:
import openeo # type: ignore

In [3]:
connection = openeo.connect(url="http://127.0.0.1:8000")

In [4]:
connection.authenticate_basic("brian", "123456")

<Connection to 'http://127.0.0.1:8000/' with BasicBearerAuth>
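The cell above passes the username and password as literals. A minimal sketch of one alternative, reading them from environment variables, is shown here. The variable names `OPENEO_USER` and `OPENEO_PASSWORD` and the helper `read_credentials` are hypothetical and not part of the client.

```python
import os


def read_credentials(user_var="OPENEO_USER", password_var="OPENEO_PASSWORD"):
    """Return (username, password) read from environment variables.

    The variable names are illustrative assumptions; pick whatever
    names fit your deployment.
    """
    username = os.environ.get(user_var)
    password = os.environ.get(password_var)
    if not username or not password:
        raise RuntimeError(f"Set {user_var} and {password_var} before connecting")
    return username, password


# Intended use with the connection created above (not executed here):
# username, password = read_credentials()
# connection.authenticate_basic(username, password)
```

This keeps credentials out of the notebook file itself, which matters once the notebook is shared or committed.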

In [5]:
connection.list_collection_ids()

['mpc-landsat-c2-l2',
 'mpc-sentinel-2-l2a',
 'mpc-sentinel-1-grd',
 'mpc-sentinel-1-rtc']

In [6]:
connection.describe_collection("mpc-sentinel-2-l2a")

## Explore Available Processes

Let's examine the available processes on the backend, particularly focusing on ML-related ones.


In [7]:
process_ids = [process["id"] for process in connection.list_processes()]
print("Available processes on this backend:")
for process_id in process_ids:
    print(f"- {process_id}")

Available processes on this backend:
- load_collection
- mlm_class_random_forest
- mlm_class_svm
- mlm_class_xgboost
- ml_class_mlp
- mlm_class_tempcnn
- mlm_class_tae
- mlm_class_lighttae
- ml_fit
- ml_predict
- ml_predict_probabilities
- ml_uncertainty_class
- ml_smooth_class
- ml_label_class
- cube_regularize
- ndvi
- save_result
- load_result
- export_cube
- import_cube
- export_ml_model
- import_ml_model
- save_ml_model
- load_ml_model
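Since the focus is on the ML-related processes, the listing can be narrowed down with plain Python. The sketch below works on a copy of the ids printed above, so it runs without a backend; the grouping rule (an `ml` or `mlm` token in the id, and a `ml_class_` or `mlm_class_` prefix for model definitions) is an assumption based only on the naming visible in that output.

```python
# Process ids copied from the listing above.
process_ids = [
    "load_collection", "mlm_class_random_forest", "mlm_class_svm",
    "mlm_class_xgboost", "ml_class_mlp", "mlm_class_tempcnn",
    "mlm_class_tae", "mlm_class_lighttae", "ml_fit", "ml_predict",
    "ml_predict_probabilities", "ml_uncertainty_class", "ml_smooth_class",
    "ml_label_class", "cube_regularize", "ndvi", "save_result",
    "load_result", "export_cube", "import_cube", "export_ml_model",
    "import_ml_model", "save_ml_model", "load_ml_model",
]


def is_ml_process(process_id):
    """True if the id contains 'ml' or 'mlm' as an underscore-separated token."""
    tokens = process_id.split("_")
    return "ml" in tokens or "mlm" in tokens


ml_related = [pid for pid in process_ids if is_ml_process(pid)]

# Model definitions share a '*_class_<algorithm>' prefix in this listing.
model_definitions = [
    pid for pid in ml_related if pid.startswith(("ml_class_", "mlm_class_"))
]

print("Model definitions:", model_definitions)
print("Other ML processes:", [p for p in ml_related if p not in model_definitions])
```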


## Examine ML Process Details

Let's look at the details of some ML processes to understand their parameters and requirements.


In [8]:
connection.describe_process("mlm_class_random_forest")

In [9]:
connection.describe_process("mlm_class_tempcnn")

## Load and Prepare Data

We'll load Sentinel-2 data and prepare it for our analysis.

In [10]:
datacube = connection.load_collection(
    collection_id="mpc-sentinel-2-l2a",
    spatial_extent={"west": -63.6078, "south": -8.95630, "east": -63.25790, "north": -8.72290},
    temporal_extent=["2022-01-01", "2022-12-31"]
)
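Before submitting a job it can help to check that the bounding box is well-formed and to get a feel for its size. The following is a rough sketch using a spherical-Earth approximation; `bbox_size_km` is a hypothetical helper, not part of the client, and it does not handle boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def bbox_size_km(extent):
    """Return a rough (width_km, height_km) for a west/south/east/north box in degrees."""
    if not (extent["west"] < extent["east"] and extent["south"] < extent["north"]):
        raise ValueError("expected west < east and south < north")
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    # Longitude degrees shrink with latitude, so scale by cos(mean latitude).
    mean_lat = math.radians((extent["south"] + extent["north"]) / 2.0)
    width_km = (extent["east"] - extent["west"]) * km_per_degree * math.cos(mean_lat)
    height_km = (extent["north"] - extent["south"]) * km_per_degree
    return width_km, height_km


# Same extent as in the load_collection call above.
extent = {"west": -63.6078, "south": -8.95630, "east": -63.25790, "north": -8.72290}
width_km, height_km = bbox_size_km(extent)
print(f"approx. {width_km:.1f} km x {height_km:.1f} km")
```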

In [11]:
datacube = datacube.process(
    process_id="cube_regularize",
    arguments={
        "data": datacube,
        "period": "P1M",  # Monthly regularization
        "resolution": 30
    }
)
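The `period` argument is an ISO 8601 duration, and `"P1M"` means one month. As a sketch of what that implies for the time axis, the code below lists the calendar months that overlap the temporal extent used above. It assumes that periods align with calendar months; how `cube_regularize` actually bins observations is defined by the backend and is not stated in this notebook.

```python
from datetime import date


def monthly_period_starts(start, end):
    """List the first day of every calendar month that overlaps [start, end]."""
    starts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        starts.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return starts


# Temporal extent from the load_collection call above.
steps = monthly_period_starts(date(2022, 1, 1), date(2022, 12, 31))
print(len(steps), steps[0], steps[-1])  # 12 2022-01-01 2022-12-01
```

Under that assumption, the regularized cube would carry twelve time steps per pixel for 2022.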

## Load Training Data

We'll load the pre-processed training data for deforestation in Rondonia.


In [12]:
serialized_data = connection.readRDS("./data/monthly_rondonia_data.rds")

## Initialize and Train the Model

Now we'll initialize the Temporal CNN model and train it with our data.

In [13]:
tempcnn_model_init = connection.mlm_class_tempcnn(
    optimizer="adam",
    learning_rate=0.0005,
    seed=42
)

Fit the model using the training dataset.

In [14]:
tempcnn_model = tempcnn_model_init.fit(
    training_set=serialized_data,
    target="label"
)

## Make Predictions

Apply the trained model to the regularized data cube to make predictions.


In [15]:
datacube = tempcnn_model.predict(datacube)

## Save the Model

Save the trained model for future use.

In [16]:
tempcnn_model.save_ml_model(
    name="tempcnn_rondonia",
    tasks=["classification"],
    # options={"mlm:accelerator": "macos-arm", "mlm:framework": "Torch"},
)

## Save and Execute Results

Finally, we'll save the prediction results and execute the job.


In [17]:
result = datacube.save_result(
    format="GeoTiff"
)

In [18]:
result

In [19]:
job = result.create_job(
    title="Deforestation Prediction in Rondonia",
    description="Using TempCNN model to predict deforestation in Rondonia"
)

In [None]:
job.start_and_wait()
results = job.get_results()

0:00:00 Job '170690b6707deff13a9d97b6ddd22baf': send 'start'
0:00:12 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:00:17 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:00:24 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:00:31 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:00:41 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:00:53 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:01:09 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:01:28 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:01:52 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:02:22 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:02:59 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:03:46 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:04:44 Job '170690b6707deff13a9d97b6ddd22baf': running (progress N/A)
0:05:44 Job '170
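In the log above, the gap between status checks grows from a few seconds to about a minute. A polling loop with that shape can be sketched as follows. This is not the client's implementation: `wait_for_job`, its default values, and the growth factor are hypothetical, chosen only to illustrate the pattern. The terminal states `finished`, `error`, and `canceled` are the batch job statuses defined by the openEO API.

```python
import time

TERMINAL_STATES = ("finished", "error", "canceled")


def wait_for_job(get_status, initial_interval=5.0, growth=1.25,
                 max_interval=60.0, sleep=time.sleep):
    """Poll get_status() until a terminal state is reached.

    The wait between polls grows geometrically and is capped at
    max_interval. Returns (final_status, number_of_polls).
    """
    interval = initial_interval
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL_STATES:
            return status, polls
        sleep(interval)
        interval = min(interval * growth, max_interval)


# Demonstration with a fake status source and a fake sleep, so nothing blocks.
fake_statuses = iter(["queued", "running", "running", "finished"])
waits = []
final_status, polls = wait_for_job(lambda: next(fake_statuses), sleep=waits.append)
print(final_status, polls, waits)  # finished 4 [5.0, 6.25, 7.8125]
```

With a real job, `get_status` could be the job object's status method, assuming the client variant used here exposes one.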

In [None]:
results.download_files("data/output")
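A job can produce metadata files next to the raster output, so it can be useful to pick out only the GeoTIFFs after downloading. The sketch below filters a list of paths by suffix; `select_geotiffs` is a hypothetical helper and the file names are made up for illustration, since the actual names depend on the backend.

```python
from pathlib import Path


def select_geotiffs(paths):
    """Keep only paths ending in .tif or .tiff (case-insensitive), sorted."""
    return sorted(
        p for p in map(Path, paths) if p.suffix.lower() in {".tif", ".tiff"}
    )


# Hypothetical file names, for illustration only.
downloaded = ["data/output/prediction.tif", "data/output/job-results.json"]
for path in select_geotiffs(downloaded):
    print(path)
```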

## Conclusion

This example demonstrated how to:
1. Connect to an openEO backend
2. Load and regularize a Sentinel-2 data cube
3. Load the training data
4. Define a Temporal CNN model
5. Train the model
6. Make predictions on the data cube
7. Save the model and the prediction results

The saved model can now be reused to make predictions on new time series data.