<img src="https://raw.githubusercontent.com/EO-College/cubes-and-clouds/main/icons/cnc_3icons_process_circle.svg"
     alt="Cubes & Clouds logo"
     style="float: center; margin-right: 10px;" />

# 2.3 Data Access
This is a basic introduction to accessing EO data from a cloud provider through an Application Programming Interface (API).

The concepts presented here use openEO as the API, accessed through the openEO Python Client library.

We will interact with cloud providers offering free and open data through STAC catalogs.

To do so, we use the client's local processing functionality, which lets you create openEO workflows with the same syntax as when connected to an openEO back-end, but runs them on local computing resources.

The main concepts presented in this notebook are:
- Lazy data loading
- Filter operators
- Reduce operators
- Apply operators
- Aggregate operators
- Create a simple EO workflow: load, filter, apply a function, extract information


In [1]:
import openeo
from openeo.local import LocalConnection
local_conn = LocalConnection('')



## Lazy data loading

When accessing data using an API, most of the time the data is **lazily** loaded.

This means that only the metadata is loaded, which is enough to know the data dimensions and their extents (spatial and temporal), the available bands and other additional information.
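The idea can be illustrated with a small, self-contained sketch. The `LazyArray` class and the `fake_download` function below are hypothetical names invented for this illustration; they are not part of openEO or xarray. They only mimic the behaviour described above: metadata is available immediately, while the values are read on first use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LazyArray:
    """Toy stand-in for a lazily loaded data cube (hypothetical, not an openEO class)."""
    dims: Tuple[str, ...]
    shape: Tuple[int, ...]
    loader: Callable[[], List[float]]  # called only when values are needed
    _values: Optional[List[float]] = field(default=None, repr=False)

    @property
    def is_loaded(self) -> bool:
        return self._values is not None

    def compute(self) -> List[float]:
        # The first call triggers the expensive read; later calls reuse the result.
        if self._values is None:
            self._values = self.loader()
        return self._values


load_calls = []


def fake_download() -> List[float]:
    # Pretend this reads pixel values from remote files.
    load_calls.append("download")
    return [0.1, 0.2, 0.3, 0.4]


cube = LazyArray(dims=("y", "x"), shape=(2, 2), loader=fake_download)

# Metadata is available immediately, without touching the pixel values.
print(cube.dims, cube.shape, cube.is_loaded)

values = cube.compute()  # only now is the loader called
print(values, cube.is_loaded)
```

Real lazy arrays work on the same principle, but split the data into chunks so that only the chunks needed for a computation are read.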

Let's start with a call to the openEO process `load_stac` for lazily loading some Sentinel-2 data from a public STAC Collection.

We need to specify an Area Of Interest (AOI) to get only part of the Collection, otherwise our code would try to load the metadata of all Sentinel-2 tiles available in the world!
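As a side note, STAC APIs express a bounding box as a list ordered `[west, south, east, north]`, whereas openEO uses a dictionary with named edges. The sketch below shows the correspondence and adds a few sanity checks. The helper name `to_stac_bbox` is hypothetical, and the checks assume longitude/latitude coordinates in degrees (WGS84) and an AOI that does not cross the antimeridian.

```python
from typing import Dict, List


def to_stac_bbox(spatial_extent: Dict[str, float]) -> List[float]:
    """Convert an openEO-style extent dict to a STAC bbox [west, south, east, north].

    Hypothetical helper for illustration; assumes lon/lat degrees (WGS84).
    """
    west, south = spatial_extent["west"], spatial_extent["south"]
    east, north = spatial_extent["east"], spatial_extent["north"]
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    if west >= east:
        # Boxes crossing the antimeridian are not handled in this sketch.
        raise ValueError("west must be smaller than east")
    return [west, south, east, north]


spatial_extent = {"west": 11.1, "east": 11.5, "south": 46.1, "north": 46.5}
bbox = to_stac_bbox(spatial_extent)
print(bbox)  # [11.1, 46.1, 11.5, 46.5]
```

A check like this catches swapped edges (for example `south` larger than `north`) before any request is sent.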

In [2]:
url = "https://earth-search.aws.element84.com/v0/collections/sentinel-s2-l2a-cogs"
spatial_extent = {"west": 11.1, "east": 11.5, "south": 46.1, "north": 46.5}

datacube = local_conn.load_stac(url=url,
                    spatial_extent=spatial_extent)
datacube

  complain("No cube:dimensions metadata")


Calling the `.execute()` method lazily loads the data and returns an `xarray` object.

Running the next cell will show the selected data content with the dimension names and their extent:

In [3]:
datacube.execute()

Deserialised process graph into nested structure
Running process load_stac
kwargs: {'spatial_extent': <class 'openeo_pg_parser_networkx.pg_schema.BoundingBox'>, 'url': <class 'str'>}
--------------------------------------------------------------------------------


Walking node root-08978391-9e75-467b-a321-2e545469e2ce


| | Array | Chunk |
|---|---|---|
| Bytes | 1.35 TiB | 8.00 MiB |
| Shape | (751, 17, 4534, 3209) | (1, 1, 1024, 1024) |
| Dask graph | 383010 chunks in 4 graph layers | 383010 chunks in 4 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


From the output of the previous cell you can notice something really interesting: **the size of the selected data is 1.35 TiB!**

Yet the cell finished far too quickly to have downloaded that much data.

This is what lazy loading allows: getting all the information about the data quickly, without having to access and download all the available files.
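The reported size can be checked by hand from the metadata alone. The sketch below takes the array shape, chunk shape and data type (`float64`, 8 bytes per value) from the output above and multiplies them out; no data is read. It also shows the difference between binary (TiB) and decimal (TB) units.

```python
import math

# Values copied from the output of the previous cell.
shape = (751, 17, 4534, 3209)
chunk_shape = (1, 1, 1024, 1024)
bytes_per_value = 8  # float64

total_bytes = math.prod(shape) * bytes_per_value
chunk_bytes = math.prod(chunk_shape) * bytes_per_value

total_tib = total_bytes / 2**40   # tebibytes (binary unit)
total_tb = total_bytes / 10**12   # terabytes (decimal unit)
chunk_mib = chunk_bytes / 2**20   # mebibytes

print(f"Total: {total_tib:.2f} TiB ({total_tb:.2f} TB)")  # Total: 1.35 TiB (1.49 TB)
print(f"Chunk: {chunk_mib:.2f} MiB")                      # Chunk: 8.00 MiB
```

Only the shape and data type are needed for this calculation, which is why the size is known before a single pixel has been downloaded.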