<center><img src="https://raw.githubusercontent.com/EO-College/cubes-and-clouds/main/icons/cnc_3icons_process_circle.svg"
     alt="Cubes & Clouds logo"
     style="float: center; margin-right: 10px; margin-left: 10px; max-height: 250px;" /></center>

# 2.3 Data Access and Basic Processing

<img src="https://raw.githubusercontent.com/pangeo-data/pangeo.io/refs/heads/main/public/Pangeo-assets/pangeo_logo.png"
     alt="Pangeo logo"
     style="float: center; margin-right: 10px; max-height: 80px;"/>

## Aggregate Operators with Pangeo

### `resample`: temporal aggregation with predefined intervals

We start by copying the data file needed for the exercise (`region.geojson`) to the home directory with the following shell command:

In [None]:
!cp -r ${DATA_PATH%/*/*}/notebooks/cubes-and-clouds/lectures/2.3_data_access/exercises/pangeo/region.geojson $HOME/

Start by importing the necessary libraries.

In [None]:
import pystac_client
import stackstac
import rioxarray

Define the parameters for the query: the spatial extent, the temporal extent, the bands to load, and a cloud cover filter.

In [None]:
spatial_extent = [11.4, 45.5, 11.42, 45.52]
temporal_extent = ["2020-01-01", "2020-12-31"]
bands = ["red","green","blue"]
properties = {"eo:cloud_cover": dict(lt=15)}
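As a quick sanity check on these parameters, the sketch below unpacks the bounding box and spells out the cloud cover filter. It assumes the usual STAC ordering of a bounding box (west, south, east, north, in degrees of longitude and latitude); the names `west`, `south`, `east`, `north`, and `properties_literal` are introduced here for illustration only.

```python
# Same values as in the cell above
spatial_extent = [11.4, 45.5, 11.42, 45.52]
properties = {"eo:cloud_cover": dict(lt=15)}

# Assumed ordering: [west, south, east, north] in degrees (lon, lat, lon, lat)
west, south, east, north = spatial_extent
if not (west < east and south < north):
    raise ValueError("Bounding box must be ordered as [west, south, east, north]")

# dict(lt=15) is just another way of writing {"lt": 15}:
# keep only items whose cloud cover is lower than 15 %
properties_literal = {"eo:cloud_cover": {"lt": 15}}
print(properties == properties_literal)
```

A swapped latitude and longitude is a common reason for an empty search result, so a check like this can save some debugging time.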

Query the STAC catalog to get the STAC Items for Sentinel-2 that match these parameters:

In [None]:
URL = "https://earth-search.aws.element84.com/v1"
catalog = pystac_client.Client.open(URL)
items = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=spatial_extent,
    datetime=temporal_extent,
    query=properties
).item_collection()

Create the starting Sentinel-2 datacube:

In [None]:
s2_cube = stackstac.stack(items,
                     bounds_latlon=spatial_extent,
                     assets=bands
)
s2_cube

We might want to aggregate our data over periods such as a week, a month, or a year, and choose which operation combines the values that fall within each period.

`resample` takes a sampling frequency (e.g. `'1MS'`, one-month intervals labelled by their start date) and is followed by the reduction to apply within each interval, here the minimum:

In [None]:
s2_monthly_min = s2_cube.resample(time="1MS").min(dim="time")

Check what happened to the datacube by inspecting the resulting Xarray object. The `time` dimension now has one label per month: 12 in total, provided the search returned scenes from January through December.

In [None]:
s2_monthly_min
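To see what `resample` does on a small scale, the following sketch applies the same call to a synthetic three-value series. The dates and values are made up for illustration and are not part of the exercise data. It relies on the assumption that Xarray keeps empty intervals between the first and last timestamp and fills them with NaN, so a month without any acquisition still appears in the result.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic, irregular acquisitions: two in January, none in February, one in March
times = pd.to_datetime(["2020-01-03", "2020-01-18", "2020-03-05"])
toy = xr.DataArray([5.0, 2.0, 7.0], coords={"time": times}, dims="time")

# "1MS" groups samples by calendar month and labels each group with the month start
toy_monthly_min = toy.resample(time="1MS").min()

# Labels are the first day of each month; February has no samples and is NaN
print(toy_monthly_min.time.values)
print(toy_monthly_min.values)
```

With real Sentinel-2 data, NaN months like this can be removed with `dropna(dim="time", how="all")` if they are not wanted.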

### Spatial aggregation over an Area of Interest
One of the basic operations in GIS is clipping data with a vector geometry. Xarray cannot handle vector data directly, but rioxarray makes this easy: it extends Xarray with most of the features that Rasterio (GDAL) brings.

Let's first define the area of interest (AOI). It is stored in a GeoJSON file, which we can read with GeoPandas.

In [None]:
import geopandas as gpd

In [None]:
AOI = gpd.read_file('region.geojson')

In [None]:
AOI.geometry

In [None]:
AOI.plot()

In [None]:
epsg = s2_cube["proj:epsg"].values
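A coordinate that has the same value for every item, as `proj:epsg` is assumed to have here, is stored as a scalar, so `.values` returns a zero-dimensional NumPy array rather than a plain integer. The sketch below reproduces this on a synthetic array (the `toy` object and the EPSG code 32632 are illustrative assumptions) and shows how `.item()` yields a Python `int`, in case a library expects one.

```python
import numpy as np
import xarray as xr

# Synthetic 2x2 array with a scalar "proj:epsg" coordinate (illustrative value)
toy = xr.DataArray(
    np.zeros((2, 2)),
    dims=("y", "x"),
    coords={"proj:epsg": 32632},
)

epsg_array = toy["proj:epsg"].values      # zero-dimensional NumPy array
epsg_int = int(toy["proj:epsg"].item())   # plain Python integer

print(epsg_array.ndim, type(epsg_int).__name__, epsg_int)
```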

We reproject our AOI to the same Coordinate Reference System (CRS) as our Sentinel-2 datacube, whose EPSG code we read in the previous cell, and clip the data with the polygon loaded with GeoPandas above.

In [None]:
s2_clipped = s2_cube.rio.clip(AOI.to_crs(epsg=epsg).geometry, crs=epsg)
s2_clipped

In [None]:
s2_clipped.isel(time=0).isel(band=0).plot()

We finally perform the spatial aggregation, taking the average over the remaining pixels:

In [None]:
region_mean_xr = s2_clipped.mean(("x", "y"))
region_mean_xr
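The averaging step can be checked by hand on a tiny synthetic cube. The sketch below assumes that pixels outside the polygon are set to NaN by the clip; the `toy_cube` values are made up for illustration. By default, `mean` skips NaN values in floating-point data, so only pixels inside the region contribute.

```python
import numpy as np
import xarray as xr

# Two time steps of a 2x2 image; NaN marks pixels outside the (hypothetical) AOI
toy_cube = xr.DataArray(
    np.array(
        [
            [[1.0, 2.0], [3.0, np.nan]],
            [[10.0, np.nan], [np.nan, 30.0]],
        ]
    ),
    dims=("time", "y", "x"),
)

# Average over both spatial dimensions; NaN pixels are ignored
toy_region_mean = toy_cube.mean(("x", "y"))

# First step: (1 + 2 + 3) / 3 = 2.0, second step: (10 + 30) / 2 = 20.0
print(toy_region_mean.values)
```

The result keeps only the `time` dimension, which is what turns the clipped cube into a time series.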

We can compute the result and plot the resulting time series of values for a sample band:

In [None]:
region_mean_xr = region_mean_xr.loc[dict(band="red")].compute()
region_mean_xr.plot()
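Label-based selection with `.loc[dict(band="red")]` can also be written as `.sel(band="red")`, which many readers find easier to scan. The sketch below compares the two on a synthetic array; the `toy_series` values are illustrative and not taken from the Sentinel-2 data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic regional means for two dates and three bands
toy_series = xr.DataArray(
    np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
    dims=("time", "band"),
    coords={
        "time": pd.to_datetime(["2020-01-05", "2020-02-04"]),
        "band": ["red", "green", "blue"],
    },
)

red_loc = toy_series.loc[dict(band="red")]  # dictionary-style indexing
red_sel = toy_series.sel(band="red")        # keyword-style indexing

# Both select the same values along the "band" dimension
print(red_loc.equals(red_sel))
```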