<img src="https://github.com/EO-College/cubes-and-clouds/blob/main/icons/cnc_3icons_process_circle.svg"
     alt="Cubes & Clouds logo"
     style="float: center; margin-right: 10px;" />

# 2.3 Data Access
This is a basic introduction to accessing EO data from a cloud provider through an Application Programming Interface (API).

The main concepts presented here use openEO as the API, together with the openEO Python Client library.

This exercise does not require any interaction with a cloud provider.

Instead, some sample public data will be downloaded and used with the [client side processing functionality of the openEO Python Client](https://open-eo.github.io/openeo-python-client/cookbook/localprocessing.html).

This functionality lets you create openEO workflows with the same syntax used when connected to an openEO back-end, but it works with local data and local computing resources.

The main concepts presented in this notebook are:
- Lazy data loading
- Filter operators
- Reduce operators
- Apply operators
- Aggregate operators
- Create a simple EO workflow: load, filter, apply a function, extract information


## Sample Datasets

Clone the repository containing sample datasets:

In [4]:
import os
if not os.path.exists('./openeo-localprocessing-data'):  # If the directory does not exist, clone the repository containing the sample data
    !git clone https://github.com/Open-EO/openeo-localprocessing-data.git

Inspect the metadata of a sample collection:

In [5]:
from openeo.local import LocalConnection
local_conn = LocalConnection(['./openeo-localprocessing-data'])
local_conn.describe_collection('openeo-localprocessing-data/sample_netcdf/S2_L2A_sample.nc')



## Lazy data loading

When accessing data using an API, most of the time the data is **lazily** loaded.

This means that only the metadata is loaded at first, which is enough to know the data dimensions and their extents (spatial and temporal), the available bands and other additional information. The previous cell shows this: it displays the STAC metadata of a local data collection.
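The idea of lazy loading can be illustrated without openEO at all. The following is a minimal sketch in plain Python, not the client's actual implementation: the hypothetical `LazyCube` class only records the requested operations, and the (simulated) expensive read happens when `execute()` is called.

```python
class LazyCube:
    """Toy stand-in for a lazily evaluated data cube (hypothetical, not openEO code)."""

    def __init__(self, source, steps=()):
        self.source = source        # callable that would read the data
        self.steps = tuple(steps)   # operations recorded so far, not yet run

    def add_step(self, name, func):
        # Return a new cube with one more recorded step; nothing is computed here.
        return LazyCube(self.source, self.steps + ((name, func),))

    def execute(self):
        # Only now is the data read and the recorded steps applied in order.
        data = list(self.source())
        for _name, func in self.steps:
            data = func(data)
        return data


reads = []  # records every time the data source is actually accessed


def read_values():
    reads.append("read")
    return [1, 2, 3, 4]


cube = LazyCube(read_values).add_step("double", lambda xs: [2 * x for x in xs])
reads_before_execute = len(reads)   # still zero: building the workflow reads nothing
result = cube.execute()
print(reads_before_execute, result, len(reads))
```

Building the workflow costs nothing; the single read happens inside `execute()`, which mirrors how the openEO workflow in the next cells is only evaluated on request.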

Let's start with a call to the openEO process [load_collection](https://processes.openeo.org/#load_collection) for loading the data:

In [11]:
datacube = local_conn.load_collection('openeo-localprocessing-data/sample_netcdf/S2_L2A_sample.nc')

Calling the `.execute()` method lazily loads the data and returns an [xarray](https://xarray.pydata.org/) object.

Running the next cell will show the local data collection content with the dimension names and their extent:

In [12]:
datacube.execute()

Deserialised process graph into nested structure
Running process load_local_collection
kwargs: {'id': <class 'str'>, 'spatial_extent': <class 'NoneType'>, 'temporal_extent': <class 'NoneType'>}
--------------------------------------------------------------------------------


Walking node root-354b6544-2ba5-4e8f-ae3d-0015180ce5c2


| | Array | Chunk |
|---|---|---|
| Bytes | 150.87 MiB | 30.17 MiB |
| Shape | (5, 12, 705, 935) | (1, 12, 705, 935) |
| Dask graph | 5 chunks in 11 graph layers | 5 chunks in 11 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
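The size reported above follows directly from the shape and the data type in the metadata, which is why it can be shown without reading any pixel values. A small check in plain Python, assuming 4 bytes per `float32` value:

```python
from math import prod

BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def size_mib(shape, bytes_per_value=BYTES_PER_FLOAT32):
    """Size in MiB of a dense array with the given shape."""
    return prod(shape) * bytes_per_value / MIB


array_mib = size_mib((5, 12, 705, 935))   # the full array
chunk_mib = size_mib((1, 12, 705, 935))   # one chunk along the first dimension
print(f"{array_mib:.2f} MiB, {chunk_mib:.2f} MiB")
```

The same arithmetic applies to the filtered and reduced cubes later in this notebook: a smaller shape means proportionally fewer bytes to load.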


## Filter Operators

When interacting with large data collections, it is necessary to keep in mind that it's not possible to load everything!

Therefore, we always have to define our requirements in advance and apply them to the data using filter operators:

### Temporal filter

To slice the data collection along time with openEO, we can use the [filter_temporal](https://processes.openeo.org/#filter_temporal) process.

After running the next cell, you can see that the result has fewer elements (or labels) in the temporal dimension `t`.

In [15]:
datacube = local_conn.load_collection('openeo-localprocessing-data/sample_netcdf/S2_L2A_sample.nc')

datacube_temp_slice = datacube.filter_temporal(['2022-06-10','2022-06-20'])
datacube_temp_slice.execute()

Deserialised process graph into nested structure
Running process load_local_collection
kwargs: {'id': <class 'str'>, 'spatial_extent': <class 'NoneType'>, 'temporal_extent': <class 'NoneType'>}
--------------------------------------------------------------------------------
Running process filter_temporal
kwargs: {'data': <class 'xarray.core.dataarray.DataArray'>, 'extent': <class 'openeo_pg_parser_networkx.pg_schema.TemporalInterval'>}
--------------------------------------------------------------------------------


Walking node root-2139668d-a557-4030-9eab-f5c165c51153
Walking node loadcollection1-2139668d-a557-4030-9eab-f5c165c51153


| | Array | Chunk |
|---|---|---|
| Bytes | 50.29 MiB | 10.06 MiB |
| Shape | (5, 4, 705, 935) | (1, 4, 705, 935) |
| Dask graph | 5 chunks in 12 graph layers | 5 chunks in 12 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
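In the openEO specification, the temporal interval passed to `filter_temporal` is left-closed: the start is included and the end is excluded. A minimal sketch of that rule in plain Python, using made-up time labels (the actual timestamps of the sample file are not listed here):

```python
from datetime import date


def filter_temporal(dates, start, end):
    """Keep the dates d with start <= d < end (left-closed interval)."""
    return [d for d in dates if start <= d < end]


# Hypothetical time labels, for illustration only.
labels = [date(2022, 6, 5), date(2022, 6, 10), date(2022, 6, 15), date(2022, 6, 20)]

kept = filter_temporal(labels, date(2022, 6, 10), date(2022, 6, 20))
print(kept)
```

With these labels, 10 June is kept while 20 June is dropped, so an end date has to be chosen one step beyond the last observation that should be included.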


### Spatial filter

In [17]:
## TODO: implementation missing in openeo-processes-dask
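Until the local implementation is available, the idea behind a spatial filter can still be sketched in plain Python: keep only the pixels whose coordinates fall inside a bounding box. The coordinates, the helper names `indices_in_range` and `crop`, and the inclusive bounds are illustrative assumptions, not the behaviour of any openEO back-end.

```python
def indices_in_range(coords, lower, upper):
    """Indices of the coordinate values with lower <= value <= upper."""
    return [i for i, c in enumerate(coords) if lower <= c <= upper]


def crop(grid, x_coords, y_coords, west, south, east, north):
    """Crop a 2D grid (list of rows, one row per y coordinate) to a bounding box."""
    cols = indices_in_range(x_coords, west, east)
    rows = indices_in_range(y_coords, south, north)
    return [[grid[r][c] for c in cols] for r in rows]


# Hypothetical 3 x 4 raster with made-up coordinates.
x_coords = [10.0, 10.5, 11.0, 11.5]
y_coords = [46.5, 46.0, 45.5]   # ordered from north to south
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

subset = crop(grid, x_coords, y_coords, west=10.4, south=45.9, east=11.1, north=46.6)
print(subset)
```

As with the temporal filter, the result keeps all other dimensions untouched and only shortens the spatial ones.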

### Bands filter

To slice the data collection along the bands with openEO, we can use the [filter_bands](https://processes.openeo.org/#filter_bands) process.

After running the next cell, you can see that the result contains only the selected elements (or labels) in the bands dimension `bands`.

In [16]:
datacube = local_conn.load_collection('openeo-localprocessing-data/sample_netcdf/S2_L2A_sample.nc')

datacube_bands_slice = datacube.filter_bands(['B04','SCL'])
datacube_bands_slice.execute()

Deserialised process graph into nested structure
Running process load_local_collection
kwargs: {'id': <class 'str'>, 'spatial_extent': <class 'NoneType'>, 'temporal_extent': <class 'NoneType'>}
--------------------------------------------------------------------------------
Running process filter_bands
kwargs: {'bands': <class 'list'>, 'data': <class 'xarray.core.dataarray.DataArray'>}
--------------------------------------------------------------------------------


Walking node root-41c024d5-0d59-4997-8f40-5888d2dd0036
Walking node loadcollection1-41c024d5-0d59-4997-8f40-5888d2dd0036


| | Array | Chunk |
|---|---|---|
| Bytes | 60.35 MiB | 30.17 MiB |
| Shape | (2, 12, 705, 935) | (1, 12, 705, 935) |
| Dask graph | 2 chunks in 12 graph layers | 2 chunks in 12 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


## Reduce operators

When computing statistics over time or indices based on multiple bands, it is possible to use reduce operators.

In openEO we can use the [reduce_dimension](https://processes.openeo.org/#reduce_dimension) process, which applies a reducer to a data cube dimension by collapsing all the values along the specified dimension into an output value computed by the reducer.

### Reduce the temporal dimension

The reducer can be a single process like [mean](https://processes.openeo.org/#mean), which computes the average over time. Notice that the temporal dimension `t` disappears from the output, because it was reduced.

In [18]:
datacube = local_conn.load_collection('openeo-localprocessing-data/sample_netcdf/S2_L2A_sample.nc')

datacube_temp_mean = datacube.reduce_dimension(reducer='mean',dimension='t')
datacube_temp_mean.execute()

Deserialised process graph into nested structure
Running process load_local_collection
kwargs: {'id': <class 'str'>, 'spatial_extent': <class 'NoneType'>, 'temporal_extent': <class 'NoneType'>}
--------------------------------------------------------------------------------
Running process reduce_dimension
kwargs: {'data': <class 'xarray.core.dataarray.DataArray'>, 'dimension': <class 'str'>, 'reducer': <class 'functools.partial'>}
--------------------------------------------------------------------------------
Running process mean
kwargs: {'data': <class 'dask.array.core.Array'>, 'axis': <class 'int'>}
--------------------------------------------------------------------------------


Walking node root-88f2478f-9a8d-4ebd-8db5-5b77b08a6d77
Walking node mean1-156c16a4-9c3c-40da-b7c3-78de4c725382
Walking node loadcollection1-88f2478f-9a8d-4ebd-8db5-5b77b08a6d77


| | Array | Chunk |
|---|---|---|
| Bytes | 12.57 MiB | 2.51 MiB |
| Shape | (5, 705, 935) | (1, 705, 935) |
| Dask graph | 5 chunks in 13 graph layers | 5 chunks in 13 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
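What `reduce_dimension` does to the shape can be reproduced on a tiny example. The sketch below uses plain Python lists and made-up values rather than the openEO implementation: each band holds three time steps of a two-pixel image, and averaging over time leaves one image per band.

```python
from statistics import fmean

# Hypothetical cube indexed as cube[band][t][pixel]: 2 bands, 3 time steps, 2 pixels.
cube = {
    "B04": [[0.10, 0.20], [0.30, 0.40], [0.20, 0.30]],
    "B08": [[0.50, 0.60], [0.70, 0.80], [0.60, 0.70]],
}


def reduce_time(series, reducer):
    """Collapse the time axis: apply `reducer` to each pixel's values over time."""
    return [reducer(values) for values in zip(*series)]


temporal_mean = {band: reduce_time(series, fmean) for band, series in cube.items()}
print(temporal_mean)
```

Any function that maps a sequence of values to a single value could be passed as the reducer here, which is the same role `mean` plays in the cell above.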


### Reduce the bands dimension

The reducer could again be a single process, but spectral indices like NDVI, NDSI, etc. are computed with an arithmetic formula instead.

For instance, the [NDVI](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index) formula can be expressed using a `reduce_dimension` process over the `bands` dimension:

$$ NDVI = {{NIR - RED} \over {NIR + RED}} $$
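As a quick numerical check of the formula, here is a plain-Python sketch with made-up reflectance values (they are not taken from the sample file). Vegetation reflects strongly in the near infrared and weakly in the red, so it yields a high NDVI, while bare soil gives a value near zero.

```python
def ndvi(nir, red):
    """Normalized difference of near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")   # undefined when both bands are zero
    return (nir - red) / total


vegetation = ndvi(nir=0.50, red=0.10)   # hypothetical vegetated pixel
bare_soil = ndvi(nir=0.30, red=0.25)    # hypothetical bare-soil pixel
print(round(vegetation, 3), round(bare_soil, 3))
```

The zero-denominator guard is an addition for this standalone sketch; how missing or zero values are handled in a real workflow depends on the back-end.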

In [31]:
## TODO: add support for labels: https://github.com/Open-EO/openeo-processes-dask/issues/73

In [None]:
datacube = local_conn.load_collection('openeo-localprocessing-data/sample_netcdf/S2_L2A_sample.nc')

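def NDVI(data):
    # `data` holds the values along the `bands` dimension; select them by label
    red = data.array_element(label='B04')  # red band
    nir = data.array_element(label='B08')  # near-infrared band
    ndvi = (nir - red)/(nir + red)
    return ndvi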

ndvi = datacube.reduce_dimension(reducer=NDVI,dimension='bands')
ndvi.execute()