<img src="https://raw.githubusercontent.com/EO-College/cubes-and-clouds/main/icons/cnc_3icons_process_circle.svg"
     alt="Cubes & Clouds logo"
     style="float: center; margin-right: 10px;" />

# 2.3 Data Access
This is a basic introduction to accessing EO data from a cloud provider through an Application Programming Interface (API).

The concepts presented here use openEO as the API, together with the openEO Python client library.

We will interact with cloud providers offering free and open data through STAC catalogs.

The local processing functionality of the openEO Python client (`openeo.local`) lets us create openEO workflows with the same syntax as when connected to an openEO back-end, but runs them on local computing resources.

The main concepts presented in this notebook are:
- Lazy data loading
- Filter operators
- Reduce operators
- Apply operators
- Aggregate operators
- Create a simple EO workflow: load, filter, apply a function, extract information


In [1]:
import openeo
from openeo.local import LocalConnection

# Connection that executes openEO process graphs on the local machine
local_conn = LocalConnection('')



## Filter Operators

When working with large data collections, keep in mind that it is not possible to load everything!

We therefore always have to define our requirements in advance and apply them to the data using filter operators.

Let's start again with the same sample data from the Sentinel-2 STAC Collection:

In [2]:
url = "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs"
spatial_extent = {"west": 11.1, "east": 11.5, "south": 46.1, "north": 46.5}

datacube = local_conn.load_stac(url=url,
                    spatial_extent=spatial_extent)
datacube.execute()

  complain("No cube:dimensions metadata")
Deserialised process graph into nested structure
Running process load_stac
kwargs: {'spatial_extent': <class 'openeo_pg_parser_networkx.pg_schema.BoundingBox'>, 'url': <class 'str'>}
--------------------------------------------------------------------------------


Walking node root-f78dcced-6a08-4695-bad0-d6f6149f1eec


| | Array | Chunk |
|---|---|---|
| Bytes | 1.35 TiB | 8.00 MiB |
| Shape | (751, 17, 4534, 3209) | (1, 1, 1024, 1024) |
| Dask graph | 383010 chunks in 4 graph layers | 383010 chunks in 4 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |
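
The size reported for the full datacube follows directly from its shape and data type, which is a quick way to see why the data has to stay lazy. A minimal sketch, assuming 8 bytes per `float64` value and a dimension order of time, bands, y, x; the helpers `estimate_nbytes` and `to_binary_units` are illustrative and not part of openEO:

```python
from math import prod

def estimate_nbytes(shape, itemsize=8):
    """Return the size in bytes of a dense array (float64 -> 8 bytes per value)."""
    return prod(shape) * itemsize

def to_binary_units(nbytes):
    """Format a byte count with binary prefixes (KiB, MiB, GiB, TiB)."""
    value = float(nbytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} TiB"

# Shape of the unfiltered datacube (assumed order: time, bands, y, x)
full_cube = estimate_nbytes((751, 17, 4534, 3209))
# Shape of a single Dask chunk
one_chunk = estimate_nbytes((1, 1, 1024, 1024))

print(to_binary_units(full_cube))  # 1.35 TiB
print(to_binary_units(one_chunk))  # 8.00 MiB
```

The 1.35 TiB figure is the size the array would have if it were fully loaded, not memory currently in use. At this point only the structure of the datacube is known, and the pixel values are still to be fetched chunk by chunk.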


### Temporal filter

To slice the data collection along the time dimension with openEO, we can use the `filter_temporal` process.

In [3]:
temporal_extent = ['2022-06-10','2022-06-30']

temporal_slice = datacube.filter_temporal(temporal_extent)
temporal_slice.execute()

Deserialised process graph into nested structure
Running process load_stac
kwargs: {'spatial_extent': <class 'openeo_pg_parser_networkx.pg_schema.BoundingBox'>, 'url': <class 'str'>}
--------------------------------------------------------------------------------


Walking node root-57d482cb-887e-47f8-a3c7-139f4cc17644
Walking node loadstac1-57d482cb-887e-47f8-a3c7-139f4cc17644


Running process filter_temporal
kwargs: {'data': <class 'xarray.core.dataarray.DataArray'>, 'extent': <class 'openeo_pg_parser_networkx.pg_schema.TemporalInterval'>}
--------------------------------------------------------------------------------


| | Array | Chunk |
|---|---|---|
| Bytes | 14.74 GiB | 8.00 MiB |
| Shape | (8, 17, 4534, 3209) | (1, 1, 1024, 1024) |
| Dask graph | 4080 chunks in 5 graph layers | 4080 chunks in 5 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


After running the previous cell, the result has fewer elements (or labels) in the temporal dimension `time`: 8 instead of 751.

The size of the selected data also dropped considerably, from 1.35 TiB to 14.74 GiB.
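
In the openEO process specification, the temporal extent passed to `filter_temporal` is a left-closed interval: the start is included and the end is excluded. The sketch below mimics that rule with plain `datetime` objects. The acquisition dates are made up for illustration, and `in_interval` is a hypothetical helper, not the code that runs behind `filter_temporal`:

```python
from datetime import datetime

def in_interval(timestamp, start, end):
    """Left-closed interval check: start is included, end is excluded."""
    return start <= timestamp < end

start = datetime.fromisoformat("2022-06-10")
end = datetime.fromisoformat("2022-06-30")

# Made-up acquisition dates, only to illustrate the interval rule
acquisitions = [
    datetime.fromisoformat("2022-06-09"),
    datetime.fromisoformat("2022-06-10"),
    datetime.fromisoformat("2022-06-29"),
    datetime.fromisoformat("2022-06-30"),
]

selected = [t for t in acquisitions if in_interval(t, start, end)]
print([t.date().isoformat() for t in selected])  # ['2022-06-10', '2022-06-29']
```

Under this rule, an acquisition on 2022-06-30 is only included if the end of the extent is set to a later date, for example `'2022-07-01'`.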

### Spatial filter

In [4]:
## TODO: implementation missing in openeo-processes-dask -> PR here https://github.com/Open-EO/openeo-processes-dask/pull/118

In [5]:
spatial_extent = {"west": 11.259613, "east": 11.406212, "south": 46.461019, "north": 46.522237}

spatial_slice = datacube.filter_bbox(spatial_extent)
spatial_slice.execute()

Deserialised process graph into nested structure
Running process load_stac
kwargs: {'spatial_extent': <class 'openeo_pg_parser_networkx.pg_schema.BoundingBox'>, 'url': <class 'str'>}
--------------------------------------------------------------------------------


Walking node root-704cdbda-020b-40b0-b5db-1860b6467ee0
Walking node loadstac1-704cdbda-020b-40b0-b5db-1860b6467ee0


Running process filter_bbox
kwargs: {'data': <class 'xarray.core.dataarray.DataArray'>, 'extent': <class 'openeo_pg_parser_networkx.pg_schema.BoundingBox'>}
--------------------------------------------------------------------------------


| | Array | Chunk |
|---|---|---|
| Bytes | 53.26 GiB | 2.51 MiB |
| Shape | (751, 17, 489, 1145) | (1, 1, 382, 860) |
| Dask graph | 51068 chunks in 5 graph layers | 51068 chunks in 5 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |
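
Note that the `north` value requested in this cell (46.522237) lies above the `north` edge used in `load_stac` (46.5), so the filter can presumably return at most the overlap of the two boxes. A minimal sketch of that overlap in plain Python, assuming both boxes use the same geographic coordinates; `intersect_bbox` is an illustrative helper, not an openEO function:

```python
def intersect_bbox(a, b):
    """Return the overlap of two west/east/south/north boxes, or None if they do not overlap."""
    west, east = max(a["west"], b["west"]), min(a["east"], b["east"])
    south, north = max(a["south"], b["south"]), min(a["north"], b["north"])
    if west >= east or south >= north:
        return None
    return {"west": west, "east": east, "south": south, "north": north}

# Extent used when loading the datacube
loaded = {"west": 11.1, "east": 11.5, "south": 46.1, "north": 46.5}
# Extent passed to filter_bbox
requested = {"west": 11.259613, "east": 11.406212, "south": 46.461019, "north": 46.522237}

overlap = intersect_bbox(loaded, requested)
print(overlap)
```

Here the overlap keeps the requested west, east and south edges, while the north edge is capped at 46.5.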


### Bands filter

In [6]:
bands = ["B04","B08"]
# TODO: fix datacube metadata from stac
bands_slice = datacube.filter_bands(bands)
bands_slice.execute()

ValueError: Invalid band name/index 'B04'. Valid names: ['unknown']
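
The error is consistent with the earlier `No cube:dimensions metadata` warning: band names are checked against the datacube metadata, and here the only known band name is the placeholder `'unknown'`. The sketch below is a simplified stand-in for such a check, not the actual openEO client code; `check_band` is a hypothetical helper:

```python
def check_band(band, known_bands):
    """Return the index of a band, or raise ValueError if its name is not known."""
    if band not in known_bands:
        raise ValueError(f"Invalid band name/index {band!r}. Valid names: {known_bands!r}")
    return known_bands.index(band)

# With incomplete metadata, only a placeholder band name is known
known_bands = ["unknown"]

try:
    check_band("B04", known_bands)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

With complete band metadata, for example `["B04", "B08"]`, the same check would accept `"B04"`. This is why the TODO in the cell points at fixing the metadata derived from STAC.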