# Use case: Sentinel-1 ARD on a small area 

Earth Data Hub (EDH) offers an efficient way to access data.

This notebook shows how to best use the service in one of the simplest use cases.

## Set up the environment

**If you haven't done it already, follow the [Getting started notebook](./00-getting-started.ipynb) to set up your environment and DestinE credentials.**

## Download 1 year of Sentinel-1 radiometrically corrected images on a small area

Our use case is to compute various statistics of Sentinel-1 radiometrically corrected images over a period of time, e.g. over 1 year or 31 acquisition dates. We will use the Sentinel-1 ARD dataset over Rome, Italy, acquired on relative orbit 117 and projected in UTM zone 33 North.

The best practice for downloading data from the Earth Data Hub comprises the following steps:
1. open the dataset using the code snippet found on the page [Sentinel-1 ARD for relative orbit 117 on UTM 33N](https://earthdatahub.destine.eu/collections/sentinel-1-ard/datasets/IW-UTM-RO-italy-33N-117)
2. select the variable
3. select the area of interest (optionally aligning it on chunk boundaries)
4. select the time interval of interest
5. download the data to memory (with `.persist()` or `.compute()`)
6. save the data or compute the result in memory

### Open the dataset

The following assumes you set up the EDH Personal Access Token in your _netrc_ file.

In [None]:
import xarray as xr

s1_33n_117 = xr.open_dataset(
    "https://data.earthdatahub.destine.eu/sentinel-1-ard/IW-UTM-RO-italy-v0/Z33N/O117",
    storage_options={"client_kwargs": {"trust_env": True}},
    chunks={},
    engine="zarr",
    decode_coords="all",
)
s1_33n_117
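If the dataset cannot be opened, one thing worth checking is whether your netrc file contains an entry for the data host. The sketch below uses Python's standard `netrc` module for that. The host name is taken from the dataset URL above; the helper name `has_netrc_entry`, the login and the token value are hypothetical placeholders, and the demo writes a throwaway file rather than touching your real `~/.netrc`.

```python
import netrc
import os
import tempfile

EDH_HOST = "data.earthdatahub.destine.eu"  # host of the dataset URL used above


def has_netrc_entry(host, path=None):
    """Return True if the netrc file at `path` (default: ~/.netrc) has credentials for `host`."""
    try:
        entries = netrc.netrc(path)
    except (FileNotFoundError, netrc.NetrcParseError):
        return False
    return entries.authenticators(host) is not None


# Demo on a temporary file with placeholder credentials (not a real token).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "netrc")
    with open(demo_path, "w") as f:
        f.write(
            f"machine {EDH_HOST}\n"
            "  login placeholder-user\n"
            "  password placeholder-token\n"
        )
    found = has_netrc_entry(EDH_HOST, demo_path)
    missing = has_netrc_entry("example.invalid", demo_path)

print(found, missing)
```

To check your own configuration, call `has_netrc_entry(EDH_HOST)` without a path.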

### Select and prepare the variable

Note that all operations are lazy: they are not applied to the whole 500GB of data, but are only recorded for later use. Downloads and computations are only done when requested.

In [None]:
nrb_vv_full = s1_33n_117.nrb.sel(polarization="VV")
nrb_vv_full

### Select the area of interest

For example, we are interested in the area of Rome.

This is a very convenient example because:
* it is a large enough area to be more interesting than a time-series of a single point,
* it is small enough to fit a single Zarr chunk, so the notebook can be run even with a slow internet connection,
* when plotted it is easy to identify even without adding coastlines and country borders, which would make the notebook more complex.

In [None]:
aoi_selection = {
    "x": slice(265_000, 325_000),
    "y": slice(4_615_000, 4_665_000),
}
nrb_vv_aoi = nrb_vv_full.sel(aoi_selection)
nrb_vv_aoi

Note that the data size is now much smaller: we went from 3TB for the full data to about 3.5GB for our small area of interest.
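Label-based selection with `slice` objects works on coordinate values, not on array positions, and both ends of the slice are included. A minimal sketch on a small synthetic grid (the coordinate values and spacing are made up for illustration and are not taken from the dataset):

```python
import numpy as np
import xarray as xr

# Synthetic 2D grid with made-up coordinates spaced 10 units apart.
x = np.arange(0, 100, 10)  # 0, 10, ..., 90
y = np.arange(0, 50, 10)   # 0, 10, ..., 40
demo = xr.DataArray(
    np.zeros((y.size, x.size), dtype="float32"),
    coords={"y": y, "x": x},
    dims=("y", "x"),
)

# Select by coordinate value; both slice ends are inclusive.
demo_aoi = demo.sel({"x": slice(20, 50), "y": slice(10, 30)})

print(dict(demo_aoi.sizes))  # number of pixels kept along each dimension
print(demo_aoi.nbytes)       # in-memory size in bytes
```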

We plot the image at one time to double check that the selection is correct:

In [None]:
nrb_vv_aoi.isel(date=-1).coarsen(x=10, y=10).mean().plot(vmax=1.5)

### Optional: align the area of interest on chunk boundaries

This is a pro move and you can skip it.

When accessing data in Zarr you always download whole chunks, even if you are only interested in part of them. In the case of the Sentinel-1 ARD datasets the spatial chunks are `(5000, 5000)` in size, so even though you can read above that your DataArray is only 3.5GB of data, you are most probably downloading more data and then throwing part of it away.

You can use the following (not very nice) code to grow your area of interest to the boundaries of the Zarr chunks, so that you use all the data you download. Note that the representation now also shows the size of the data that would be downloaded if it were not compressed (the data is in fact compressed, so the actual download is smaller).

In [None]:
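def align_indexer(step, indexer):
    """Grow a positional slice so that it starts and stops on multiples of `step`."""
    if not isinstance(indexer, slice):
        return indexer
    assert indexer.step is None
    # Round the start down to the previous chunk boundary.
    start = indexer.start // step * step if indexer.start else indexer.start
    # Move the stop up to the next chunk boundary.
    stop = (indexer.stop // step + 1) * step if indexer.stop else indexer.stop
    return slice(start, stop)


# Translate the label-based selection into positional (integer) slices.
# Note: `map_index_queries` lives in xarray's internal `core` namespace.
query_results = xr.core.indexing.map_index_queries(
    nrb_vv_full, indexers=aoi_selection, method=None, tolerance=None
)
print(query_results.dim_indexers)

# Align each positional slice on the 5000-pixel chunk grid.
aoi_iselection = {
    dim: align_indexer(5000, query_results.dim_indexers[dim])
    for dim in query_results.dim_indexers
}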
print(aoi_iselection)

nrb_vv_aoi = nrb_vv_full.isel(aoi_iselection)
nrb_vv_aoi

So, the full time-series for a whole chunk is really 11.5GB, not 3.5GB.
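To see what the alignment does, here is a small sketch in plain Python on integer pixel indices. The helper names and the index values are hypothetical; only the 5000-pixel chunk size comes from the text above. Unlike `align_indexer`, this sketch uses ceiling division, so a stop that already sits on a chunk boundary is left unchanged.

```python
CHUNK = 5000  # spatial chunk size quoted above


def chunks_touched(start, stop, chunk=CHUNK):
    """Indices of the chunks overlapped by the half-open pixel range [start, stop)."""
    first = start // chunk
    last = (stop - 1) // chunk
    return list(range(first, last + 1))


def aligned_range(start, stop, chunk=CHUNK):
    """Smallest chunk-aligned pixel range that contains [start, stop)."""
    aligned_start = start // chunk * chunk
    aligned_stop = -(-stop // chunk) * chunk  # ceiling division
    return aligned_start, aligned_stop


# A made-up 6000-pixel-wide selection that straddles one chunk boundary.
start, stop = 2500, 8500
print(chunks_touched(start, stop))  # [0, 1]
print(aligned_range(start, stop))   # (0, 10000)
```

In this made-up case 6000 pixels are requested along the axis, but 10000 pixels are downloaded because two whole chunks are needed.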

Let's have a look at the area of the whole chunk.

In [None]:
nrb_vv_aoi.isel(date=-1).coarsen(x=10, y=10).mean().plot(vmax=1.5)

### Select the time interval of interest

Since this dataset covers only one year, we take all the dates available, but you can select along `date` if you are interested in a shorter time interval. With this we finally have a definition of the data we want to download.

In [None]:
nrb_vv_aoi_toi = nrb_vv_aoi.sel(date="2024")
nrb_vv_aoi_toi

From the representation above we learn a few things:
1. the uncompressed size of the data to be downloaded, e.g. 11.5GB (the compressed size depends on the dataset and the variable)
2. the number of chunks to be downloaded, e.g. 124
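
These numbers can be cross-checked with simple arithmetic. The sketch below assumes values that are not stated explicitly in this notebook: a 10000 x 10000 pixel aligned area (2 x 2 spatial chunks of 5000 pixels), 31 dates with one date per chunk, and 4-byte `float32` values. Under those assumptions the result matches the 11.5GB and 124 chunks quoted above; check the actual shape, dtype and chunking in the representation of your own DataArray.

```python
import math

# Assumed values (see the text above): check them against your DataArray repr.
shape = {"date": 31, "y": 10_000, "x": 10_000}
chunk_shape = {"date": 1, "y": 5_000, "x": 5_000}
bytes_per_value = 4  # float32 (assumption)

n_values = math.prod(shape.values())
size_gib = n_values * bytes_per_value / 2**30
n_chunks = math.prod(math.ceil(shape[d] / chunk_shape[d]) for d in shape)

print(f"{size_gib:.1f} GiB in {n_chunks} chunks")
```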

### Download the data to memory

Finally we are ready to download into memory only the data that we are interested in.

The best practice is to call the `.compute()` method to load the data into a NumPy array in memory.

**This operation is the slow one. It takes up to 20 minutes on an 8 Mbps connection.**

The download time depends on the speed of your internet connection and on the load on the Earth Data Hub. The closer you are to the data the faster it is, and this is one of the reasons the EDH is best suited to be used from within the DestinE platform.

In [None]:
%%time

nrb_vv_aoi_toi_data = nrb_vv_aoi_toi.compute()
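
For a rough idea of how long a download might take, divide the compressed size by the connection speed. The sketch below is back-of-the-envelope arithmetic only: the helper name and the compressed size are hypothetical, since the actual compressed size depends on the dataset and is not stated in this notebook.

```python
def estimate_download_minutes(compressed_bytes, megabits_per_second):
    """Idealized transfer time in minutes, ignoring latency and server load."""
    bits = compressed_bytes * 8
    seconds = bits / (megabits_per_second * 1_000_000)
    return seconds / 60


# Hypothetical example: 1.2 GB of compressed data over an 8 Mbps connection.
minutes = estimate_download_minutes(1.2e9, 8)
print(f"about {minutes:.0f} minutes")
```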

### Perform any computation

#### High variance areas south of Rome

We select a slightly smaller area so that computations and plots are faster.

In [None]:
nrb_rome = nrb_vv_aoi_toi_data.sel(
    x=slice(290_000, 310_000), y=slice(4_625_000, 4_645_000)
)
nrb_rome

#### Plot one image and the average over a year

First we plot the last image of 2024. Zooming in, you may notice the typical grainy texture of SAR images.

In [None]:
nrb_rome.isel(date=-1).plot(vmax=1.5, figsize=(10, 8))

Then we plot the average over the 31 images of 2024. Zooming in, you may notice that features are much sharper and the texture is much smoother.

In [None]:
nrb_rome.mean("date").plot(vmax=1.5, figsize=(10, 8))
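
The smoothing effect of temporal averaging can be illustrated with synthetic data. The sketch below is not based on the Sentinel-1 images: it assumes a constant scene and models speckle as independent, exponentially distributed multiplicative noise, a common simplification for single-look SAR intensity. Under that assumption, averaging 31 images reduces the relative standard deviation by roughly the square root of 31.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_dates, ny, nx = 31, 200, 200
true_backscatter = 0.3  # made-up constant scene

# Simulated stack: constant scene times unit-mean exponential speckle.
stack = true_backscatter * rng.exponential(scale=1.0, size=(n_dates, ny, nx))

single = stack[-1]            # one acquisition
average = stack.mean(axis=0)  # temporal mean over all dates

rel_std_single = single.std() / single.mean()
rel_std_average = average.std() / average.mean()

print(f"single image: {rel_std_single:.2f}")
print(f"31-date mean: {rel_std_average:.2f}")
```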

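#### Identify areas that change a lot

We compute the relative standard deviation, that is the standard deviation over time divided by the mean over time. High values mark pixels that change a lot during the year.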

In [None]:
relative_std_rome = nrb_rome.std("date") / nrb_rome.mean("date")
relative_std_rome.plot(vmin=0.5, vmax=2, figsize=(10, 8))
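
The relative standard deviation (also called coefficient of variation) highlights pixels whose variability is large compared with their typical brightness. A tiny sketch with made-up values for two pixels, one stable and one changing a lot:

```python
import numpy as np
import xarray as xr

# Two made-up pixels observed on four dates: one stable, one changing a lot.
demo = xr.DataArray(
    np.array([[1.0, 0.1], [1.0, 0.1], [1.0, 1.9], [1.0, 1.9]]),
    dims=("date", "x"),
    coords={"x": [0, 1]},
)

# Same formula as above: standard deviation over time divided by the mean over time.
relative_std_demo = demo.std("date") / demo.mean("date")
print(relative_std_demo.values)
```

Both pixels have the same mean, but only the second one gets a high relative standard deviation.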

In [None]:
x_0, y_0 = 306_100, 4_644_150
width = 250
aoi = dict(x=slice(x_0 - width, x_0 + width + 1), y=slice(y_0 - width, y_0 + width + 1))

nrb_area = nrb_rome.sel(**aoi)
relative_std_area = relative_std_rome.sel(**aoi)

relative_std_area.plot(vmin=0.5, vmax=2, figsize=(10, 8))

In [None]:
nrb_area_anomaly = nrb_area - nrb_area.mean("date")
nrb_area_anomaly

In [None]:
nrb_area_anomaly.mean(["x", "y"]).plot()

In [None]:
nrb_area.resample(date="MS").mean().plot(col="date", col_wrap=3, vmax=1.5)