{{title_s1_3}}

Now that we have read in and organized the stack of Sentinel-1 RTC images, let's take a look at the data.

::::{tab-set}
:::{tab-item} Outline

(content.section_A)=
**[A. Read and prepare data](#a-read-and-prepare-data)**  
- {{a1_s1_nb3}}  

(content.section_B)=
**[B. Initial data visualization](#b-initial-visualization)**  
- {{b1_s1_nb3}}  

(content.section_C)=
**[C. Orbital direction](#c-orbital-direction)**  
- {{c1_s1_nb3}}
- {{c2_s1_nb3}}

(content.section_D)=
**[D. Examine backscatter variability](#d-examine-backscatter-variability)**  
- {{d1_s1_nb3}}

(content.section_E)=
**[E. Handle duplicate time steps](#e-handling-duplicate-time-steps)**
- {{e1_s1_nb3}}
- {{e2_s1_nb3}}
- {{e3_s1_nb3}}

:::
:::{tab-item} Learning goals  
{{concepts}}
- Spatial joins of raster and vector data.  
- Visualize raster data.  
- Use raster metadata to aid interpretation of backscatter imagery.  
- Examine data quality using provided layover-shadow maps.  
- Identify and remove duplicate time step observations.  

{{techniques}}
- Clip raster data cube using vector data with [`rioxarray.clip()`](https://corteva.github.io/rioxarray/html/examples/clip_geom.html).  
- Using `xr.groupby()` for [grouped statistics](https://docs.xarray.dev/en/stable/user-guide/groupby.html).  
- Reorganizing data with `xr.Dataset.reindex()`.  
- Visualizing multiple facets of the data using `FacetGrid`.


:::
::::

:::{admonition} ASF Data Access
You can download the RTC-processed backscatter time series [here](https://zenodo.org/record/7236413#.Y1rNi37MJ-0). For more detail, see [tutorial data](../background/tutorial_data.md#sentinel-1-rtc-datasets) and the [notebook](1_read_asf_data.ipynb) on reading ASF Sentinel-1 RTC data into memory.
:::

In [None]:
%xmode minimal
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pathlib
import rioxarray as rio
import warnings
import xarray as xr

import s1_tools

warnings.filterwarnings("ignore", category=UserWarning)

In [70]:
cwd = pathlib.Path.cwd()
tutorial2_dir = pathlib.Path(cwd).parent

## A. Read and prepare data

We'll go through the steps shown in [metadata wrangling](2_wrangle_metadata.ipynb), but this time,  combined into one function from `s1_tools`. 

In [71]:
vv_vrt_path = "../data/vrt_files/s1_stackVV.vrt"
vh_vrt_path = "../data/vrt_files/s1_stackVH.vrt"
ls_vrt_path = "../data/vrt_files/s1_stackLS.vrt"

In [72]:
asf_data_cube = s1_tools.metadata_processor(
    vv_path=vv_vrt_path, vh_path=vh_vrt_path, ls_path=ls_vrt_path
)

In [None]:
asf_data_cube

### {{a1_s1_nb3}}

Until now, we've kept the full spatial extent of the dataset. This hasn't been a problem because all of our operations have been lazy. Now, we'd like to visualize the dataset in ways that require eager rather than lazy computation, so we subset the data cube to a smaller area of interest to make those operations less computationally intensive.

Later notebooks use a different Sentinel-1 RTC dataset that is accessed for a smaller area of interest. Clip the current data cube to that spatial footprint:

In [74]:
# Read vector data
pc_aoi = gpd.read_file(
    "https://github.com/e-marshall/sentinel1_rtc/raw/main/hma_rtc_aoi.geojson"
)

Visualize the location:

In [None]:
pc_aoi.explore()

Check the CRS and ensure it matches that of the raster data cube:

In [76]:
assert asf_data_cube.rio.crs == pc_aoi.crs, (
    f"Expected: {asf_data_cube.rio.crs}, received: {pc_aoi.crs}"
)

Clip the raster data cube by the extent of the vector:

In [None]:
clipped_cube = asf_data_cube.rio.clip(pc_aoi.geometry, pc_aoi.crs)
clipped_cube

Persist the dataset in memory so that later operations will be more efficient.

In [78]:
clipped_cube = clipped_cube.persist()

Great, we've gone from an object where each 3-d variable is ~90 GB to one where each 3-d variable is ~45 MB. This will be much easier to work with.

## B. Initial data visualization

Define a function to visualize backscatter imagery for VV and VH polarizations and the layover-shadow at a given point in time.

In [79]:
def plot_timestep(input_arr: xr.Dataset, time_step_int: int):
    """Plot the layover-shadow map and VV and VH backscatter (in dB) at a given time step."""
    date = input_arr.isel(acq_date=time_step_int).acq_date.dt.date.data
    fig, axs = plt.subplots(ncols=3, figsize=(24, 7))

    input_arr.isel(acq_date=time_step_int).ls.plot(ax=axs[0])
    s1_tools.power_to_db(input_arr.isel(acq_date=time_step_int).vv).plot(
        ax=axs[1], cmap=plt.cm.Greys_r
    )
    s1_tools.power_to_db(input_arr.isel(acq_date=time_step_int).vh).plot(
        ax=axs[2], cmap=plt.cm.Greys_r
    )
    fig.suptitle(
        f"Layover-shadow mask (L), VV (C) and VH (R) backscatter {str(input_arr.isel(acq_date=time_step_int).acq_date.data)[:-19]}"
    )
    axs[0].set_title(f"{date} layover-shadow map")
    axs[1].set_title(f"{date} VV backscatter")
    axs[2].set_title(f"{date} VH backscatter")
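
The function above converts backscatter from power to decibels with `s1_tools.power_to_db` before plotting. As a rough sketch of what such a conversion does, assuming it follows the standard relation dB = 10 · log10(power), the hypothetical helper below (`power_to_db_sketch`, not part of `s1_tools`) shows the arithmetic. It is not necessarily identical to the tutorial's implementation.

```python
import numpy as np


def power_to_db_sketch(power):
    """Convert linear power values to decibels: dB = 10 * log10(power).

    Hypothetical stand-in for illustration; the real s1_tools.power_to_db
    may handle edge cases differently.
    """
    power = np.asarray(power, dtype=float)
    # Zero or negative power has no logarithm, so mask it as NaN
    # rather than letting log10 return -inf or emit a warning.
    safe = np.where(power > 0, power, np.nan)
    return 10 * np.log10(safe)


# Each factor of 10 in power corresponds to a 10 dB step.
print(power_to_db_sketch([1.0, 0.1, 0.01, 0.0]))
```

Working in decibels compresses the large dynamic range of backscatter power, which is why the grayscale images are easier to interpret after conversion.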

### {{b1_s1_nb3}}

Each Sentinel-1 RTC scene directory contains a GeoTIFF file containing a layover-shadow mask. This can help to understand missing data you might see in the imagery files. The layover shadow masks are coded to represent a number of different types of pixels:

The following information is copied from the README file that accompanies each scene: 

```{**Layover-shadow mask**}

The layover/shadow mask indicates which pixels in the RTC image have been affected by layover and shadow. This layer is tagged with _ls_map.tif

The pixel values are generated by adding the following values together to indicate which layover and shadow effects are impacting each pixel:
0.  Pixel not tested for layover or shadow
1.  Pixel tested for layover or shadow
2.  Pixel has a look angle less than the slope angle
4.  Pixel is in an area affected by layover
8.  Pixel has a look angle less than the opposite of the slope angle
16. Pixel is in an area affected by shadow

There are 17 possible different pixel values, indicating the layover, shadow, and slope conditions present added together for any given pixel._

The values in each cell can range from 0 to 31:
0.  Not tested for layover or shadow
1.  Not affected by either layover or shadow
3.  Look angle < slope angle
5.  Affected by layover
7.  Affected by layover; look angle < slope angle
9.  Look angle < opposite slope angle
11. Look angle < slope and opposite slope angle
13. Affected by layover; look angle < opposite slope angle
15. Affected by layover; look angle < slope and opposite slope angle
17. Affected by shadow
19. Affected by shadow; look angle < slope angle
21. Affected by layover and shadow
23. Affected by layover and shadow; look angle < slope angle
25. Affected by shadow; look angle < opposite slope angle
27. Affected by shadow; look angle < slope and opposite slope angle
29. Affected by shadow and layover; look angle < opposite slope angle
31. Affected by shadow and layover; look angle < slope and opposite slope angle

```

The ASF RTC image [product guide](https://hyp3-docs.asf.alaska.edu/guides/rtc_product_guide/) has detailed descriptions of how the data is processed and what is included in the processed dataset.
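
Because each pixel value is a sum of the flag values listed in the README excerpt, it can be decoded with bitwise tests. The sketch below is one way to do that; the function name `decode_ls_value` and the label strings are hypothetical and chosen only for illustration.

```python
# Flag values as listed in the README excerpt above.
LS_FLAGS = {
    1: "tested for layover or shadow",
    2: "look angle < slope angle",
    4: "affected by layover",
    8: "look angle < opposite slope angle",
    16: "affected by shadow",
}


def decode_ls_value(value):
    """Return the description of every flag that is set in a pixel value."""
    if not 0 <= value <= 31:
        raise ValueError(f"expected a value between 0 and 31, got {value}")
    if value == 0:
        return ["not tested for layover or shadow"]
    # A flag is set when its bit is present in the summed value.
    return [label for bit, label in LS_FLAGS.items() if value & bit]


for v in (1, 5, 17, 21):
    print(v, decode_ls_value(v))
```

For example, 21 = 1 + 4 + 16, so the pixel was tested and is affected by both layover and shadow.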

In [None]:
plot_timestep(clipped_cube, 11)

The `layover-shadow` variable provides categorical information. We can use a more appropriate colormap for this purpose:

In [None]:
cat_cmap = plt.colormaps["gist_ncar"].resampled(20)

fig, axs = plt.subplots(ncols=2, figsize=(18, 9))

clipped_cube.isel(acq_date=10).ls.plot(ax=axs[0], vmax=22, vmin=1, cmap=cat_cmap)
clipped_cube.isel(acq_date=11).ls.plot(ax=axs[1], vmax=22, vmin=1, cmap=cat_cmap);

It looks like different areas are affected by different types of distortion on different dates. For example, in the lower left quadrant, there is a region that is blue (5: affected by layover) on 6/7/2021, but much of that area appears to be in radar shadow on 6/10/2021. This pattern is present throughout much of the scene: areas affected by layover in one acquisition are in shadow in the next. This is due to the viewing geometries of different orbital passes: one of the above scenes was collected during an ascending pass of the satellite and the other during a descending pass.

## C. Orbital direction

Sentinel-1 is a right-looking sensor that images Earth's surface both when it is moving north to south (a descending orbit) and when it is moving south to north (an ascending orbit). It images the same footprint on both passes but from different directions. The data coverage map below, available from [ASF](https://asf.alaska.edu/daac/sentinel-1-acquisition-maps/), illustrates these directional passes.

```{image} ../imgs/slc_coverage_asf.png
:align: center
```
ASF Sentinel-1 Cumulative coverage map.

In areas of high-relief topography such as the area we're observing, there can be strong terrain distortion effects such as layover and shadow. These are some of the distortions that RTC processing corrects, but sometimes it is not possible to reliably extract backscatter in the presence of strong distortions. The above image shows the layover-shadow map for an ascending and a descending image side-by-side, which is why different areas are affected by layover (5) and shadow (17) in each. 

Thanks to all the setup work we did in the previous notebook, we can quickly confirm that all of the observations were taken at two times of day, corresponding to ascending and descending passes of the satellite, and that the time steps shown above were taken at different times of day.

:::{note}
The acquisition time of Sentinel-1 images is not in local time.
:::

In [None]:
print(
    "Hour of day of acquisition 10: ",
    clipped_cube.isel(acq_date=10).acq_date.dt.hour.data,
)
print(
    "Hour of day of acquisition 11: ",
    clipped_cube.isel(acq_date=11).acq_date.dt.hour.data,
)
clipped_cube.acq_date.dt.hour

### {{c1_s1_nb3}}

While it is simple to distinguish one pass from another, it is not always straightforward to know whether a pass is ascending or descending. The timing of these passes depends on the location of the image on Earth.

In the location covered by this dataset, ascending passes correspond to an acquisition time of roughly 0:00 UTC and descending passes to approximately 12:00 UTC.

### {{c2_s1_nb3}}
This is another example of time-varying metadata, so it should be stored as a coordinate variable. Use [`xr.where()`](https://docs.xarray.dev/en/stable/generated/xarray.where.html) to assign the correct orbital direction value depending on an observation's acquisition time and then assign it as a coordinate variable to the clipped raster data cube.

In [83]:
clipped_cube.coords["orbital_dir"] = (
    "acq_date",
    xr.where(clipped_cube.acq_date.dt.hour.data == 0, "asc", "desc"),
)
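
Once orbital direction is a coordinate along `acq_date`, it can be used for selection and grouping like any other coordinate. The sketch below uses a small made-up dataset (the timestamps and values are hypothetical, chosen only so that acquisitions alternate between hour 0 and hour 12) to show the pattern; the hour-to-direction rule is the one stated above for this study area and would not transfer to other locations.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical acquisitions alternating between ~0:00 and ~12:00 UTC.
times = pd.to_datetime(
    ["2021-06-07 00:03", "2021-06-10 12:05", "2021-06-19 00:03", "2021-06-22 12:05"]
)
toy = xr.Dataset(
    {"vv": (("acq_date", "y", "x"), np.arange(16, dtype=float).reshape(4, 2, 2))},
    coords={"acq_date": times},
)

# Same rule as in the cell above: hour 0 -> ascending, otherwise descending.
toy.coords["orbital_dir"] = (
    "acq_date",
    np.where(toy.acq_date.dt.hour.data == 0, "asc", "desc"),
)

# Keep only ascending acquisitions.
asc_only = toy.where(toy.orbital_dir == "asc", drop=True)

# Temporal mean per orbital direction, then a spatial mean for a single summary value.
mean_by_dir = toy.vv.groupby("orbital_dir").mean("acq_date").mean(("y", "x"))

print(asc_only.sizes["acq_date"])
print(mean_by_dir.to_series())
```

Grouping by `orbital_dir` keeps ascending and descending scenes separate, which matters here because the two viewing geometries produce different layover and shadow patterns.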

## D. Examine backscatter variability

Let's look at how VV and VH backscatter vary over time. Make a new Xarray object that holds the mean backscatter of VV and VH to visualize this more easily. 

In [84]:
# Mean backscatter of both polarizations
mean_pol = xr.Dataset(
    {
        "vv": clipped_cube["vv"].mean("acq_date"),
        "vh": clipped_cube["vh"].mean("acq_date"),
    }
)

Convert the power measurements provided in the raster data cube to decibels and then use Xarray [FacetGrid](https://docs.xarray.dev/en/latest/generated/xarray.plot.FacetGrid.html) plotting to make a two-column plot of mean backscatter for VV and VH. 

In [None]:
a = s1_tools.power_to_db(mean_pol).to_array("pol").plot(col="pol", 
                                                        cmap=plt.cm.Greys_r,
                                                        cbar_kwargs={"label":"dB"},
                                                        )
for i in range(len(a.axs[0])):
    a.axs[0][i].set_xlabel(None)
    a.axs[0][i].tick_params(axis='x', labelrotation=45)
a.axs[0][0].set_ylabel(None)
a.fig.suptitle('Mean backscatter over time of co-pol (VV) and cross-pol (VH) imagery',
               y=1.02)
a.fig.supylabel('Y-coordinate of projection (m)')
a.fig.supxlabel('X-coordinate of projection (m)', y=-0.15)
;

The area that we're looking at is in the mountainous region on the border between the Sikkim region of India and China. There are four north-facing glaciers visible in the image, each with a lake at the toe. Bodies of water like lakes tend to appear dark in C-band SAR images because water is smooth with respect to the wavelength of the signal, meaning that most of the emitted signal is scattered away from the sensor. Where surfaces are rough at the scale of the C-band wavelength, more signal is returned to the sensor and the backscatter image is brighter. For much more detail on interpreting SAR imagery, see the resources linked in the Sentinel-1 section of the [tutorial data](../../background/tutorial_data.md) page.

### {{d1_s1_nb3}}

Now let's look at how backscatter may vary seasonally for a single polarization (for more on time-related GroupBy operations, see the [Xarray User Guide](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)). This is an example of a 'split-apply-combine' operation, where a dataset is split into groups (in this case, time steps are split into seasons), an operation is applied (in this case, the mean is calculated) and then the groups are combined into a new object.

In [86]:
clipped_cube_gb = clipped_cube.groupby("acq_date.season").mean()

The temporal dimension of the new object has an element for each season rather than an element for each time step.

In [None]:
clipped_cube_gb

In [None]:
# order seasons correctly
clipped_cube_gb = clipped_cube_gb.reindex({"season": ["DJF", "MAM", "JJA", "SON"]})
clipped_cube_gb
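
To make the split-apply-combine and reindexing steps concrete, here is a minimal sketch on a made-up monthly series (hypothetical values: each month's value is simply its month number). The grouped result is reordered explicitly because the season labels are not guaranteed to come back in calendar order.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy series: one value per month of 2021, equal to the month number.
months = pd.date_range("2021-01-01", periods=12, freq="MS")
toy = xr.DataArray(
    np.arange(1, 13, dtype=float),
    coords={"acq_date": months},
    dims="acq_date",
)

# Split by season, apply the mean, combine into a new object with a "season" dimension.
seasonal = toy.groupby("acq_date.season").mean()

# Put the seasons in calendar order.
ordered = seasonal.reindex(season=["DJF", "MAM", "JJA", "SON"])

print(ordered.to_series())
```

In this toy case DJF averages months 12, 1 and 2, so its mean is 5.0, which illustrates that the winter group combines December with the January and February of the same calendar year.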

Visualize mean backscatter in each season:

In [None]:
fg_ = s1_tools.power_to_db(clipped_cube_gb.vv).plot(col="season", cmap=plt.cm.Greys_r);

The glacier surfaces appear much darker during the summer months than in other seasons, especially in the lower reaches of the glaciers. Like the lake surfaces above, this suggests largely specular scattering, where little of the signal returns toward the sensor. It could be that during the summer months, enough liquid water is present at the glacier surface to produce this scattering. 

## E. Handling duplicate time steps

If we take a closer look at the ASF dataset, we can see that there are a few scenes from identical acquisitions (this is apparent in `acq_date` and more specifically in `product_id`). Let's examine these and see what's going on. 

First we'll extract the `data_take_ID` from the Sentinel-1 granule ID: 

In [None]:
clipped_cube.data_take_ID.data

Let's look at the number of unique elements using [`np.unique()`](https://numpy.org/doc/stable/reference/generated/numpy.unique.html).

In [None]:
data_take_ids_ls = clipped_cube.data_take_ID.data.tolist()
data_take_id_set = np.unique(clipped_cube.data_take_ID)
len(data_take_id_set)

### {{e1_s1_nb3}}

Interesting - it looks like there are only 96 unique elements. Let's figure out which are duplicates:

In [92]:
def duplicate(input_ls):
    return list(set([x for x in input_ls if input_ls.count(x) > 1]))

In [None]:
duplicate_ls = duplicate(data_take_ids_ls)
duplicate_ls

These are the data take IDs that are duplicated in the dataset. We now want to subset the xarray object to only include these data take IDs: 

In [None]:
asf_duplicate_cond = clipped_cube.data_take_ID.isin(duplicate_ls)
asf_duplicate_cond

In [None]:
duplicates_cube = clipped_cube.where(asf_duplicate_cond == True, drop=True)
duplicates_cube
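
The duplicate search and mask above can also be expressed with NumPy alone, which avoids the repeated `list.count()` calls. This is a sketch on made-up IDs (the values in `data_take_ids` are hypothetical), not a replacement for the cells above.

```python
import numpy as np

# Hypothetical data take IDs; "B2" and "D4" each appear more than once.
data_take_ids = np.array(["A1", "B2", "B2", "C3", "D4", "D4", "D4"])

# Count how many times each unique ID occurs.
values, counts = np.unique(data_take_ids, return_counts=True)

# IDs that occur more than once are the duplicates.
duplicated_ids = values[counts > 1]

# Boolean mask marking every element that belongs to a duplicated ID.
is_duplicate = np.isin(data_take_ids, duplicated_ids)

print(duplicated_ids)
print(is_duplicate)
```

The boolean mask plays the same role as `asf_duplicate_cond`: it is `True` for every time step whose data take ID occurs more than once.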

### {{e2_s1_nb3}}

Great, now we have a 12-time step Xarray object that contains only the duplicate data takes. Let's see what it looks like. We can use `xr.FacetGrid` objects to plot all of the arrays at once.

Before we make a FacetGrid plot, we need to make a change to the dataset. FacetGrid takes a column and expands the levels of the provided dimension into individual sub-plots (a small multiples plot). We're looking at the duplicate time steps, meaning the elements of the `acq_date` dimension are non-unique. FacetGrid expects unique values along the specified coordinate array. If we were to directly call: 
```python
fg = duplicates_cube.vv.plot(col="acq_date", col_wrap=4)
``` 
We would receive the following error: 
```
ValueError: Coordinates used for faceting cannot contain repeated (nonunique) values.
```


Renaming the dimensions of `duplicates_cube` with [`xr.Dataset.rename_dims()`](https://docs.xarray.dev/en/latest/generated/xarray.Dataset.rename_dims.html) demotes the `acq_date` coordinate array to a non-dimensional coordinate and replaces it with `step`, a dimension indexed by integer position. Because these positions are unique, we can make a FacetGrid plot with the `step` dimension.

In [None]:
duplicates_cube.rename_dims({'acq_date':'step'})
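
The effect of `rename_dims()` is easier to see on a small example. The sketch below builds a made-up dataset with repeated timestamps (all values are hypothetical) and checks that the timestamps survive as a coordinate along the new `step` dimension.

```python
import numpy as np
import xarray as xr

# Hypothetical toy cube: four time steps, with each timestamp appearing twice.
times = np.array(
    ["2021-06-07", "2021-06-07", "2021-06-10", "2021-06-10"],
    dtype="datetime64[ns]",
)
toy = xr.Dataset(
    {"vv": (("acq_date", "x"), np.arange(8, dtype=float).reshape(4, 2))},
    coords={"acq_date": times},
)

# Rename only the dimension; the acq_date values stay attached as a coordinate.
renamed = toy.rename_dims({"acq_date": "step"})

print(renamed.vv.dims)           # the data variable now lies along "step"
print(renamed["acq_date"].dims)  # the timestamps are still there, along "step"
```

Nothing is lost in the process: each position along `step` can still be traced back to its acquisition time through `renamed["acq_date"]`.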

In [None]:
fg = duplicates_cube.rename_dims({"acq_date": "step"}).vv.plot(col="step", col_wrap=4)

Interesting, it looks like there's only really data for the 0, 2, 4, 7 and 9 elements of the list of duplicates. It could be that the processing of these files was interrupted and then restarted, producing extra empty arrays.

### {{e3_s1_nb3}}

To drop these arrays, extract the product ID (the only variable that is unique among the duplicates) of each array we'd like to remove.

In [98]:
drop_ls = [1, 3, 5, 6, 8, 10, 11]

We can use xarray's `.isel()` method, `xr.DataArray.isin()`, and `xr.Dataset.where()` to efficiently subset the time steps we want to keep: 

In [None]:
drop_product_id_ls = duplicates_cube.isel(acq_date=drop_ls).product_id.data
drop_product_id_ls

Using this list, we want to drop all of the elements of `clipped_cube` where the product ID is one of the values in the list.

In [100]:
duplicate_cond = ~clipped_cube.product_id.isin(drop_product_id_ls)

In [None]:
clipped_cube = clipped_cube.where(duplicate_cond == True, drop=True)
clipped_cube
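
The same drop-by-ID pattern is shown below on a small made-up dataset (the product IDs and values are hypothetical). It also includes an alternative based on boolean indexing with `.isel()`, which is an assumption about what may suit some workflows rather than the approach used in this notebook.

```python
import numpy as np
import xarray as xr

# Hypothetical toy cube with two scenes per acquisition date.
times = np.array(
    ["2021-06-07", "2021-06-07", "2021-06-10", "2021-06-10"],
    dtype="datetime64[ns]",
)
toy = xr.Dataset(
    {"vv": (("acq_date", "x"), np.arange(8, dtype=float).reshape(4, 2))},
    coords={
        "acq_date": times,
        "product_id": ("acq_date", ["AAAA", "BBBB", "CCCC", "DDDD"]),
    },
)

# Product IDs of the scenes we want to remove.
drop_ids = ["BBBB", "DDDD"]

# True for every time step we want to keep.
keep = ~toy.product_id.isin(drop_ids)

# Approach used in the notebook: mask, then drop the masked time steps.
cleaned_where = toy.where(keep, drop=True)

# Alternative: index directly with the boolean mask.
cleaned_isel = toy.isel(acq_date=keep.values)

print(cleaned_isel.product_id.values)
```

Both approaches keep the same time steps here. One difference to be aware of is that `where()` is built around filling masked values with NaN, so it can promote integer variables to float, while `.isel()` leaves dtypes untouched.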

## Conclusion

In this notebook, we demonstrated how to use the data cube that we assembled in the previous notebooks. We saw various ways that having metadata accessible and attached to the correct dimensions of the data cube made learning about the dataset much smoother and more efficient than it would otherwise be. 

In the next notebook, we'll work with a different Sentinel-1 RTC dataset. We'll write the clipped data cube from this notebook to disk in order to use it in the final notebook of the tutorial, a comparison of the two datasets. 

In [None]:
clipped_cube.to_zarr("../data/s1_asf_clipped_cube.zarr", mode="w")