# OpenEO limitations

In [1]:
import openeo
from openeo.rest.connection import Connection

To use OpenEO, we import the *openeo* package and the *Connection* class from *openeo.rest.connection*. They give access to the OpenEO API and handle communication with the various backends. With these tools, we can connect to a backend, process data, manage collections, save results in different formats, and much more.

In [2]:
connection = openeo.connect(url="openeo.dataspace.copernicus.eu/")
connection.authenticate_oidc()

Authenticated using refresh token.


<Connection to 'https://openeo.dataspace.copernicus.eu/openeo/1.2/' with OidcBearerAuth>

The dependencies listed below are included in the *requirements.txt* file on the [OpenEO GitHub repository](https://github.com/Open-EO/openeo-python-client), but no specific library versions are pinned. This can lead to incompatibilities between library versions and cause errors during code execution. Library updates can also make the code unstable or difficult to reproduce.

- numpy
- pandas
- matplotlib
- matplotlib-scalebar
- GDAL
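
Because the versions are not pinned, one way to keep a run reproducible is to record the versions actually installed in the current environment. The sketch below uses only the standard library (`importlib.metadata`). The helper names `installed_versions` and `as_pins` are ours, and the distribution names passed to them are assumed to match the names used at install time.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_pins(versions):
    """Format the versions that were found as 'name==version' lines."""
    return [f"{name}=={version}" for name, version in versions.items() if version is not None]
```

Calling `as_pins(installed_versions(["numpy", "pandas", "matplotlib", "matplotlib-scalebar", "GDAL"]))` would give lines that can be pasted into a pinned requirements file.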

Here, we use the OpenEO Copernicus Data Space Ecosystem backend, which provides access to Sentinel data collections and other datasets from the Copernicus program.

### Difficulty estimating credit usage

Estimating resource usage in OpenEO before running a job is not straightforward, especially when working with larger datasets or running the same job at fixed times. Resource consumption depends on the combination of processes in the workflow, and credits are deducted based on several factors, including CPU usage, memory usage, storage, data access, third-party services, synchronous requests, and batch jobs. A practical approach is to start with a small dataset to estimate resource requirements. Once the test is done, you can extrapolate its cost to a larger area. You can also monitor your current credit balance on [OpenEO Platform](https://docs.openeo.cloud/join/free_trial.html#connect-with-egi-check-in) and [Terrascope](https://terrascope.be/fr).
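
The extrapolation from a small test can be sketched as simple arithmetic. The helper below assumes that credits scale linearly with the processed area, which is only a first approximation because the actual cost also depends on the factors listed above. The function name and the figures are hypothetical.

```python
def extrapolate_credits(test_credits, test_area_km2, target_area_km2):
    """Scale the credits of a small test job linearly with the processed area."""
    if test_area_km2 <= 0:
        raise ValueError("test_area_km2 must be positive")
    return test_credits * (target_area_km2 / test_area_km2)


# Hypothetical figures: a 10 km2 test that cost 2 credits, scaled to 500 km2.
estimate = extrapolate_credits(test_credits=2.0, test_area_km2=10.0, target_area_km2=500.0)
print(f"Estimated credits: {estimate:.0f}")
```

A longer temporal extent or additional bands would need their own scaling factor, so the result is best treated as an order of magnitude.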

### Lack of documentation and examples for specific functions

Some openEO functions lack documentation and examples, which makes it challenging for users to understand and implement them properly. For instance, the documentation of functions like *if_* and *and_* is hard to find. The documentation for more complex processes, such as *fit_class_random_forest()*, provides very few examples and shows only one way of parameterizing the random forest algorithm. For example, there is no clear documentation on how to modify parameters like the number of trees. This lack of resources can make it difficult for users to take full advantage of the platform's capabilities.
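
When the online documentation is thin, the process metadata served by the backend can sometimes fill the gap. The openEO API describes each process as a JSON object with a `parameters` list, and the Python client returns it as a dict through `connection.describe_process(...)`. The helper below, whose name is ours, summarizes such a dict. The sample description is made up for illustration and is not the real metadata of any process.

```python
def summarize_parameters(process):
    """Return one line per parameter of an openEO process description (a dict)."""
    lines = []
    for param in process.get("parameters", []):
        if param.get("optional", False):
            kind = f"optional, default={param.get('default')!r}"
        else:
            kind = "required"
        lines.append(f"{param['name']} ({kind}): {param.get('description', '')}")
    return lines


# Made-up description for illustration only; it is NOT real backend metadata.
sample = {
    "id": "some_process",
    "parameters": [
        {"name": "data", "description": "Input data."},
        {"name": "num_items", "description": "How many items.", "optional": True, "default": 10},
    ],
}
for line in summarize_parameters(sample):
    print(line)
```

With a live connection, `summarize_parameters(connection.describe_process("fit_class_random_forest"))` would print what the backend advertises, assuming the backend exposes that process.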

### The number of dates loaded for the requested parameters is unclear

It can be unclear which dates are actually loaded for the requested temporal extent because, in many cases, images are not available for every date within the specified range. OpenEO does not provide an easy way to visualize or check the available dates within a datacube directly through the platform. To access this information, users must save the datacube locally and open it with a tool like xarray to display the available dates.

In the following example, we request data for a two-month period and show how to access the dates through xarray. Only 8 dates are available for this period.

In [3]:
s2_timeseries = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=("2023-05-01", "2023-07-01"),
    spatial_extent={
        "west":  2.213649,
        "south": 43.450702,
        "east": 2.251248,
        "north":  43.472631,
        "crs": "EPSG:4326",
    },
    bands=["B02", "B03", "B04", "B08", "SCL"],
    max_cloud_cover=50,
);

In [4]:
# Download the data in NetCDF format
s2_timeseries.download("output/s2_timeseries.nc")

In [5]:
import xarray
# Load the Sentinel-2 time series dataset from a NetCDF file
s2_datarray = xarray.load_dataset("output/s2_timeseries.nc")

In [6]:
# Display the dates of each image
s2_datarray["t"]
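
Once the time coordinate is available locally, a small helper can make the irregular spacing of the acquisitions explicit. The sketch below uses only the standard library and works on ISO-formatted date strings. The helper name is ours and the timestamps are hypothetical, not the ones returned by the request above.

```python
from datetime import date


def acquisition_gaps(iso_dates):
    """Return the sorted unique dates and the gaps (in days) between consecutive ones."""
    days = sorted({date.fromisoformat(d[:10]) for d in iso_dates})
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    return days, gaps


# Hypothetical timestamps, for illustration only.
example = ["2023-05-03T10:50:00", "2023-05-13T10:50:00", "2023-05-18T10:50:00"]
days, gaps = acquisition_gaps(example)
print(len(days), "dates, gaps in days:", gaps)
```

With the dataset loaded above, the input strings could be obtained with something like `s2_datarray["t"].dt.strftime("%Y-%m-%d").values.tolist()`, assuming the time coordinate is decoded as datetimes.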

### Impossible to define a UDF that takes two datacubes as arguments

A User-Defined Function (UDF) allows you to execute custom Python code directly on the backend. UDFs can be particularly useful for identifying the libraries, and their versions, deployed on a backend.
Currently, it is not possible to define a UDF in OpenEO that takes two DataCubes as arguments, as mentioned in the following [OpenEO discussion](https://discuss.eodc.eu/t/creating-a-udf-from-a-python-module-that-operates-on-multiple-datacubes/1019/4). This limitation prevents users from performing operations that require multiple DataCubes as inputs within a single UDF.

For example, when we try below to create a UDF that takes two XarrayDataCube objects, the following error is raised: *openeo.udf.OpenEoUdfException: No UDF found.*

In [7]:
s2_scl = s2_timeseries.band("SCL")
udf = openeo.UDF(
"""
from openeo.udf import XarrayDataCube
def apply_datacube(cube1: XarrayDataCube, cube2: XarrayDataCube, context: dict) -> XarrayDataCube:
    return cube1
"""
)
out = s2_scl.apply(process=udf)
out.download("2_cubes.nc")

OpenEoApiError: [500] Internal: Server error: UDF exception while evaluating processing graph. Please check your user defined functions.   File "/opt/openeo/lib/python3.8/site-packages/openeo/udf/run_code.py", line 235, in run_udf_code
    raise OpenEoUdfException("No UDF found.")
openeo.udf.OpenEoUdfException: No UDF found. (ref: r-25010730ea7f496c8732b5708d12e1f0)
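
The traceback points at `run_udf_code`, which raises "No UDF found" when it does not find a function it knows how to call. The sketch below is a simplified model of such a signature-based lookup, not the actual openeo implementation: `XarrayDataCube` is a local stand-in class and `looks_like_single_cube_udf` is a hypothetical helper. It only illustrates why a function with two cube parameters would not be recognized.

```python
import inspect


class XarrayDataCube:
    """Local stand-in for openeo.udf.XarrayDataCube, so the sketch runs without openeo."""


def looks_like_single_cube_udf(func):
    """Hypothetical check: the parameters are exactly (cube: XarrayDataCube, context: dict)."""
    params = list(inspect.signature(func).parameters.values())
    return (
        len(params) == 2
        and params[0].annotation is XarrayDataCube
        and params[1].annotation is dict
    )


def one_cube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    return cube


def two_cubes(cube1: XarrayDataCube, cube2: XarrayDataCube, context: dict) -> XarrayDataCube:
    return cube1


print(looks_like_single_cube_udf(one_cube))   # True
print(looks_like_single_cube_udf(two_cubes))  # False
```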

### SAFE format is not supported by openEO

The SAFE format, which is commonly used for Sentinel-1 and Sentinel-2 satellite data, is currently not supported by OpenEO. This limitation is highlighted in the following [OpenEO discussion](https://discuss.eodc.eu/t/need-help-downloading-raw-images-as-safe-files-using-openeo/677) and prevents users from downloading data processed by openEO in the SAFE format.

### Unclear error messages

Error messages returned by OpenEO backends can be vague, making it difficult to understand and resolve issues. OpenEO returns standardized error codes such as 400 (Bad Request) or 500 (Internal Server Error). However, these codes do not always provide enough information to diagnose the problem effectively.

For our example, we use a Sentinel-2 datacube.

In [8]:
s2_timeseries = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=("2023-05-01", "2023-06-16"),
    spatial_extent={
        "west":  2.213649,
        "south": 43.450702,
        "east": 2.251248,
        "north":  43.472631,
        "crs": "EPSG:4326",
    },
    bands=["B02", "B03", "B04", "B08", "SCL"],
    max_cloud_cover=50,
);

Suppose the user makes a mistake when calling the *reduce_dimension()* function and provides an incorrect reducer. No error occurs while the workflow is being built.

In [9]:
s2_red = s2_timeseries.reduce_dimension(dimension="t", reducer="bands")

When we then try to download the datacube with the *.download()* method, the following error appears:

In [10]:
s2_red.download("output/s2_red.nc")

OpenEoApiError: [400] BadRequest: java.lang.IllegalArgumentException: Unsupported operation: bands (arguments: [data, context]) (ref: r-25010719415f47c284f905a115223fa6)

This error indicates that the *bands* operation is not supported. However, it does not help identify the specific process causing the error. The error message should specify that the issue arises from the *reduce_dimension()* function and the incorrect reducer argument.
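
Until backend messages become more specific, a client-side check can catch this kind of mistake before the job is submitted. The sketch below is a hypothetical helper, not part of the openeo client: it compares the reducer name with a set of known process ids and raises an error that names the calling function. The set of ids shown is an illustrative subset.

```python
def check_reducer(reducer, known_process_ids):
    """Raise a descriptive error if `reducer` is not a known process id."""
    if reducer not in known_process_ids:
        raise ValueError(
            f"reduce_dimension(): reducer {reducer!r} is not a known process; "
            f"expected one of {sorted(known_process_ids)}"
        )
    return reducer


# Illustrative subset of process ids; a backend advertises many more.
known = {"mean", "median", "min", "max"}
check_reducer("mean", known)  # passes silently
try:
    check_reducer("bands", known)  # the incorrect reducer used in this section
except ValueError as err:
    print(err)
```

With a live connection, the set could be built from the backend itself, for example `known = {p["id"] for p in connection.list_processes()}`.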

### Additional limitations

For more limitations of the OpenEO Python client, you may check the [OpenEO Platform Forum](https://discuss.eodc.eu/tag/python) and the [OpenEO GitHub Issues](https://github.com/Open-EO/openeo-geopyspark-driver/issues).

### Identification of libraries deployed on a backend

It is essential to know the libraries available on a backend, especially when developing a UDF that relies on specific libraries. This ensures that the library versions used during the execution of your UDF are compatible with your function.

The following UDF addresses this need: it lists the installed packages with *pkg_resources* and writes them to the job logs with the *inspect* function from *openeo.udf.debug*.

In [None]:
udf = openeo.UDF(
    """
from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect
import pkg_resources
def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    inspect(message="Hello UDF logging")
    # List every installed package as "name==version" and write the list to the job logs
    installed_packages = pkg_resources.working_set
    installed_packages_list = sorted(["%s==%s" % (i.key, i.version) for i in installed_packages])
    pkg_list = " ".join(installed_packages_list)
    inspect(message=pkg_list)
    inspect(data=[1,2,3], message="Hello UDF logging with data")
    # Rescale the values in place and return the cube
    array = cube.get_array()
    array.values = 0.0001 * array.values
    return cube
"""
)
s2_timeseries = s2_timeseries.apply(process=udf)
job = s2_timeseries.execute_batch()

print(job.logs())
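
The UDF above logs the packages as a single space-separated string of `name==version` entries. A small helper can turn that string into a dictionary that is easier to query. This is a minimal sketch with a hypothetical helper name and a made-up log message; real names and versions depend on the backend.

```python
def parse_package_list(message):
    """Turn 'name==version name==version ...' into a {name: version} dict."""
    packages = {}
    for entry in message.split():
        name, sep, version = entry.partition("==")
        if sep:  # skip tokens that are not of the form name==version
            packages[name] = version
    return packages


# Hypothetical log message, for illustration only.
logged = "numpy==1.0.0 xarray==2.0.0"
print(parse_package_list(logged).get("numpy"))
```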