<img src="https://raw.githubusercontent.com/EO-College/cubes-and-clouds/main/icons/cnc_3icons_process_circle.svg"
     alt="Cubes & Clouds logo"
     style="float: center; margin-right: 10px;" />

# 2.3 Data Access and Basic Processing

## Filter Operators

When working with large data collections, keep in mind that it is not possible to load everything!

Therefore, we always have to define our requirements (area, time range, bands) in advance and apply them to the data using filter operators.

Let's start again with the same sample data from the Sentinel-2 STAC Collection:

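```python
import openeo
from openeo.local import LocalConnection

# Local (client-side) processing: no openEO back end is required
local_conn = LocalConnection('')

# Sentinel-2 L2A collection from the Element 84 Earth Search STAC API
url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
spatial_extent = {"west": 11.1, "east": 11.5, "south": 46.1, "north": 46.5}

# Load the collection; only the spatial extent is restricted here
datacube = local_conn.load_stac(url=url,
                                spatial_extent=spatial_extent)
datacube.execute()
```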



|            | Array                            | Chunk                            |
|------------|----------------------------------|----------------------------------|
| Bytes      | 3.55 TiB                         | 8.00 MiB                         |
| Shape      | (1047, 32, 4534, 3209)           | (1, 1, 1024, 1024)               |
| Dask graph | 1005120 chunks in 4 graph layers | 1005120 chunks in 4 graph layers |
| Data type  | float64 numpy.ndarray            | float64 numpy.ndarray            |
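The reported sizes follow directly from the array shape and the 8-byte `float64` data type, which makes it easy to estimate in advance whether a selection can fit in memory. A minimal standard-library sketch of that arithmetic; `cube_nbytes` and `human` are hypothetical helper names, not part of openEO:

```python
from math import prod


def cube_nbytes(shape, itemsize=8):
    """Uncompressed size in bytes of an array with the given shape.

    itemsize=8 corresponds to float64, the data type reported above.
    """
    return prod(shape) * itemsize


def human(nbytes):
    """Format a byte count with binary (1024-based) units, as in the table."""
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if nbytes < 1024 or unit == "TiB":
            return f"{nbytes:.2f} {unit}"
        nbytes /= 1024


# Full cube and a single chunk, using the shapes from the table above
print(human(cube_nbytes((1047, 32, 4534, 3209))))  # 3.55 TiB
print(human(cube_nbytes((1, 1, 1024, 1024))))      # 8.00 MiB
```

The same arithmetic applies to every filtered cube in this section: filtering only changes the shape, and the size shrinks proportionally.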


### Temporal filter

To slice the data collection along the time dimension with openEO, we can use the `filter_temporal` process.

```python
# Keep only the acquisitions from 10 May up to 30 June 2022
temporal_extent = ["2022-05-10T00:00:00Z", "2022-06-30T00:00:00Z"]
temporal_slice = datacube.filter_temporal(temporal_extent)
temporal_slice.execute()
```



|            | Array                          | Chunk                          |
|------------|--------------------------------|--------------------------------|
| Bytes      | 72.85 GiB                      | 8.00 MiB                       |
| Shape      | (21, 32, 4534, 3209)           | (1, 1, 1024, 1024)             |
| Dask graph | 20160 chunks in 9 graph layers | 20160 chunks in 9 graph layers |
| Data type  | float64 numpy.ndarray          | float64 numpy.ndarray          |


After running the previous cell, the result has fewer elements (labels) along the temporal dimension `time`: 21 instead of 1047.

The size of the selected data is also much smaller: 72.85 GiB instead of 3.55 TiB.
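Conceptually, `filter_temporal` keeps only the time labels that fall inside the given interval. In the openEO process definition the start of the interval is included and the end is excluded. The pure-Python sketch below mimics that behaviour on a list of timestamps; the acquisition times are made up for illustration, and `filter_temporal_labels` is a hypothetical helper, not the openEO implementation:

```python
from datetime import datetime


def parse(ts):
    """Parse an RFC 3339 timestamp such as '2022-05-10T00:00:00Z'."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def filter_temporal_labels(labels, extent):
    """Keep labels t with start <= t < end (left-closed interval)."""
    start, end = (parse(t) for t in extent)
    return [t for t in labels if start <= t < end]


# Hypothetical acquisition times, not taken from the real collection
labels = [parse(t) for t in [
    "2022-05-05T10:15:59Z",
    "2022-05-10T00:00:00Z",  # equal to the start: kept
    "2022-06-14T10:15:59Z",
    "2022-06-30T00:00:00Z",  # equal to the end: dropped
    "2022-07-04T10:15:59Z",
]]

temporal_extent = ["2022-05-10T00:00:00Z", "2022-06-30T00:00:00Z"]
kept = filter_temporal_labels(labels, temporal_extent)
print([t.date().isoformat() for t in kept])  # ['2022-05-10', '2022-06-14']
```

Because the end is excluded, an acquisition exactly at `2022-06-30T00:00:00Z` would not be part of `temporal_slice`; extend the end by one day if that date must be included.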

### Spatial filter

To slice the data collection along the spatial dimensions with openEO, we can use the `filter_bbox` or `filter_spatial` process.

The `filter_bbox` process is used with a set of coordinates:

```python
# Bounding box in longitude/latitude degrees (WGS 84)
spatial_extent = {"west": 11.259613, "east": 11.406212, "south": 46.461019, "north": 46.522237}
spatial_slice = datacube.filter_bbox(spatial_extent)
spatial_slice.execute()
```

|            | Array                           | Chunk                           |
|------------|---------------------------------|---------------------------------|
| Bytes      | 134.29 GiB                      | 2.51 MiB                        |
| Shape      | (1006, 32, 489, 1145)           | (1, 1, 382, 860)                |
| Dask graph | 128768 chunks in 5 graph layers | 128768 chunks in 5 graph layers |
| Data type  | float64 numpy.ndarray           | float64 numpy.ndarray           |
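The openEO process description states that `filter_bbox` retains a pixel when its centre lies inside the bounding box, and that coordinates are WGS 84 (EPSG:4326) longitude/latitude unless a `crs` is given. The sketch below applies that rule to the coordinate labels of a hypothetical coarse lon/lat grid covering the extent loaded in the first cell. It assumes the grid and the box share the same coordinate system; the real Sentinel-2 data is stored in a UTM projection, and the reprojection is left to openEO. `filter_bbox_labels` is a hypothetical helper:

```python
def filter_bbox_labels(xs, ys, extent):
    """Keep x (longitude) and y (latitude) labels of pixel centres inside the box.

    Boundaries are inclusive; xs and ys are assumed to be in the same CRS as
    the extent (lon/lat degrees here), so no reprojection is done.
    """
    x_keep = [x for x in xs if extent["west"] <= x <= extent["east"]]
    y_keep = [y for y in ys if extent["south"] <= y <= extent["north"]]
    return x_keep, y_keep


# Hypothetical 0.05-degree grid of pixel centres over the loaded extent
xs = [round(11.1 + 0.05 * i, 2) for i in range(9)]  # 11.1 ... 11.5
ys = [round(46.1 + 0.05 * i, 2) for i in range(9)]  # 46.1 ... 46.5

spatial_extent = {"west": 11.259613, "east": 11.406212,
                  "south": 46.461019, "north": 46.522237}
x_keep, y_keep = filter_bbox_labels(xs, ys, spatial_extent)
print(x_keep)  # [11.3, 11.35, 11.4]
print(y_keep)  # [46.5]
```

Note that the requested north edge (46.522237) lies beyond the extent loaded in the first cell (46.5), so a filter can only return what was loaded: nothing north of 46.5 is available.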


The `filter_spatial` process accepts the geometries used for filtering, given either as GeoJSON or as a vector data cube.
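A GeoJSON geometry is plain JSON, so it can be built with the standard library. The sketch below turns the bounding box used with `filter_bbox` into a GeoJSON Polygon of the kind that could serve as filter geometry; `bbox_to_geojson` is a hypothetical helper. Following the GeoJSON specification (RFC 7946), positions are `[longitude, latitude]`, the ring is closed by repeating its first position, and the exterior ring runs counterclockwise:

```python
import json


def bbox_to_geojson(extent):
    """Return a GeoJSON Polygon covering a west/east/south/north extent."""
    w, e, s, n = extent["west"], extent["east"], extent["south"], extent["north"]
    # Exterior ring SW -> SE -> NE -> NW (counterclockwise), closed at SW
    ring = [[w, s], [e, s], [e, n], [w, n], [w, s]]
    return {"type": "Polygon", "coordinates": [ring]}


spatial_extent = {"west": 11.259613, "east": 11.406212,
                  "south": 46.461019, "north": 46.522237}
geometry = bbox_to_geojson(spatial_extent)
print(json.dumps(geometry))
```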

The raster data cube is cropped to the minimum bounding box around the provided geometries, so pixels outside this bounding box are no longer available after filtering. All pixels inside the bounding box that are not retained (i.e. that do not fall inside any of the geometries) are set to null (no data).
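The cropping-and-masking behaviour described above can be sketched in plain Python on a small grid. This is an illustration, not openEO's implementation: it assumes a pixel is retained when its centre lies inside the polygon (tested with the classic ray-casting algorithm) and uses `None` for no data. `point_in_polygon` and `filter_spatial_grid` are hypothetical names:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if point (x, y) lies inside polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_spatial_grid(grid, xs, ys, polygon):
    """Crop grid[i][j] (row i at ys[i], column j at xs[j]) to the polygon's
    bounding box, then set pixels whose centre is outside it to None."""
    px = [p[0] for p in polygon]
    py = [p[1] for p in polygon]
    cols = [j for j, x in enumerate(xs) if min(px) <= x <= max(px)]
    rows = [i for i, y in enumerate(ys) if min(py) <= y <= max(py)]
    return [
        [grid[i][j] if point_in_polygon(xs[j], ys[i], polygon) else None
         for j in cols]
        for i in rows
    ]


# 5 x 5 toy grid, pixel centres at 0.5, 1.5, ..., 4.5, values 0..24
xs = ys = [0.5 + k for k in range(5)]
grid = [[5 * i + j for j in range(5)] for i in range(5)]
triangle = [(1.0, 1.0), (4.2, 1.0), (1.0, 4.2)]

for row in filter_spatial_grid(grid, xs, ys, triangle):
    print(row)
# [6, 7, 8]
# [11, 12, None]
# [16, None, None]
```

The 3 x 3 result is the triangle's bounding box (rows ordered by increasing y); the three `None` entries are pixels inside that box whose centres fall outside the triangle.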

An openEO `filter_spatial` example is not yet available here.

### Bands filter

To slice along the bands dimension, keeping only the necessary bands, we can use the `filter_bands` process.

```python
# Keep only the visible bands; the list order is the band order in the result
bands = ["red", "green", "blue"]
bands_slice = datacube.filter_bands(bands)
bands_slice.execute()
```

|            | Array                          | Chunk                          |
|------------|--------------------------------|--------------------------------|
| Bytes      | 327.16 GiB                     | 8.00 MiB                       |
| Shape      | (1006, 3, 4534, 3209)          | (1, 1, 1024, 1024)             |
| Dask graph | 90540 chunks in 5 graph layers | 90540 chunks in 5 graph layers |
| Data type  | float64 numpy.ndarray          | float64 numpy.ndarray          |
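`filter_bands` selects labels along the bands dimension, which shrinks that dimension from 32 to 3 labels here. In openEO, the order of the requested list also defines the order of the bands in the result. A hypothetical stdlib helper mimicking this on a short, made-up list of band labels (loosely modelled on the Sentinel-2 asset names used above):

```python
def filter_bands_indices(band_labels, bands):
    """Return the positions of the requested bands, in the requested order."""
    missing = [b for b in bands if b not in band_labels]
    if missing:
        # Design choice of this sketch: fail loudly on unknown labels
        raise ValueError(f"Band(s) not in the data cube: {missing}")
    return [band_labels.index(b) for b in bands]


# Made-up subset of band labels, not the full list of the real collection
band_labels = ["coastal", "blue", "green", "red", "rededge1", "nir", "scl"]

print(filter_bands_indices(band_labels, ["red", "green", "blue"]))  # [3, 2, 1]
```

Requesting `["red", "green", "blue"]` therefore yields an RGB ordering, which is convenient for true-colour visualisation.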
