# Load, Subset, and Reproject NISAR GCOV Data with `xarray`

## Load GCOV data with `xarray` using [utility functions](../util/load_gcov.py) included in this Cookbook

Loading the data as an `xarray.Dataset` provides access to `xarray`'s full toolset and makes it easy to work with lazily loaded, Dask-backed arrays.

### Find the paths to the GCOV data for the time series

In [None]:
from pathlib import Path

gcov_paths = list(Path("/home/jovyan/NISAR_GCOV_Cookbook/notebooks/time_series_example/data").glob("*.h5"))
gcov_paths

### Load GCOV data into an xarray.Dataset with `load_gcov_ts_xr`

If you pass a single GCOV path to `load_gcov_ts_xr`, instead of a list of paths, it will create a time series Dataset with a single time step.

`load_gcov_ts_xr` lazily loads all raster data into xarray data structures with delayed HDF5 reads, so the data are not stored in memory until computed.

#### Load a single GCOV dataset

In [None]:
import sys
from pathlib import Path

util_dir = Path.cwd().parent / "util"
sys.path.insert(0, str(util_dir))

from load_gcov import load_gcov_ts_xr

ds = load_gcov_ts_xr(gcov_paths[0])
ds

#### Access the backscatter raster


In [None]:
vvvv = ds["VVVV"].isel(time=0, frequency=0)
vvvv

#### With the data in `xarray`, we can access its {abbr}`T (transpose)` attribute without first loading it into memory

In [None]:
# vvvv transpose
vvvv.T

#### Load a GCOV time series

Pass a list of GCOV paths instead of a single path.

In [None]:
ds = load_gcov_ts_xr(gcov_paths)
ds

#### Access a single time step of the VVVV data

In [None]:
vvvv = ds["VVVV"].sel(time="2025-10-31T04:44:09", frequency="B")
vvvv
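Label-based selection with `.sel` can be tried out on synthetic data. The following is a minimal sketch, not part of the Cookbook utilities: the array `series` and all but one of its timestamps are hypothetical. It also shows `method="nearest"`, which may help when the exact acquisition timestamp is not known. The frequency is selected in a separate call because `method` applies to every indexer passed to the same `.sel` call.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical acquisition times; only the middle one appears in this notebook
times = pd.to_datetime(
    ["2025-10-19T04:44:09", "2025-10-31T04:44:09", "2025-11-12T04:44:09"]
)

# Synthetic (time, frequency) array with placeholder values 0..5
series = xr.DataArray(
    np.arange(6).reshape(3, 2),
    coords={"time": times, "frequency": ["A", "B"]},
    dims=("time", "frequency"),
)

# Exact label match on both coordinates
exact = series.sel(time="2025-10-31T04:44:09", frequency="B")

# Nearest-neighbor match on time only, after selecting the frequency
nearest = series.sel(frequency="B").sel(time="2025-11-01", method="nearest")

print(exact.item(), nearest.item())
```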

#### Call the `DataArray.mean` method

Notice that the result does not show a value. This is because the data are lazily loaded, so the mean has not yet been computed or stored in memory.

In [None]:
vvvv_mean = vvvv.mean()
vvvv_mean

#### We can force the value to be computed by casting it to an appropriate datatype or calling `.compute()`

Note that computing this way does not cache the result, so `vvvv_mean` itself remains lazy.

In [None]:
# cast to float
float(vvvv_mean)

In [None]:
# call compute
vvvv_mean.compute()
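The lazy behavior can be reproduced without any GCOV files. The following is a minimal sketch that assumes `dask` is installed and uses a synthetic, Dask-backed array (`demo`, a hypothetical stand-in for `vvvv`) rather than real backscatter data.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for a lazily loaded raster (not real GCOV data)
demo = xr.DataArray(
    da.ones((400, 400), chunks=(100, 100)),
    dims=("y", "x"),
    name="demo",
)

# The reduction only builds a task graph; no values exist yet
demo_mean = demo.mean()
is_lazy = isinstance(demo_mean.data, da.Array)

# compute() returns a new in-memory object and leaves demo_mean untouched
computed = demo_mean.compute()
still_lazy = isinstance(demo_mean.data, da.Array)

# Casting to float also triggers the computation
mean_value = float(demo_mean)

print(is_lazy, still_lazy, mean_value)
```

If the same values are needed repeatedly, `.load()` or `.persist()` may be worth considering, since both keep the computed data available instead of recomputing it on each call.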

### Subset the data at load-time

#### Load the time series of only the VVVV data for a single frequency in a subset spatial {abbr}`AOI (Area of Interest)`

In [None]:
ds = load_gcov_ts_xr(
    gcov_paths, 
    vars_to_load=["VVVV"],
    freqs=["B"],
    y_slice=slice(1900000, 1800000),
    x_slice=slice(600000, 700000))
ds
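The `y_slice` above runs from the larger value to the smaller one. A likely reason, stated here as an assumption rather than something confirmed by `load_gcov_ts_xr`, is that the y coordinates are stored in descending order (north to south), and label-based slices in `xarray` follow the order of the coordinate. The sketch below illustrates this with a synthetic grid (`grid` is hypothetical) and plain `.sel` calls.

```python
import numpy as np
import xarray as xr

# Synthetic grid with y descending and x ascending; values are placeholders
y = np.arange(2_000_000, 1_700_000, -10_000)  # 2,000,000 down to 1,710,000
x = np.arange(500_000, 800_000, 10_000)       # 500,000 up to 790,000
grid = xr.DataArray(
    np.zeros((y.size, x.size)),
    coords={"y": y, "x": x},
    dims=("y", "x"),
)

# Slice bounds follow the coordinate order: high-to-low for y, low-to-high for x
subset = grid.sel(y=slice(1_900_000, 1_800_000), x=slice(600_000, 700_000))

# Reversing the y bounds on a descending axis selects nothing
empty = grid.sel(y=slice(1_800_000, 1_900_000), x=slice(600_000, 700_000))

print(dict(subset.sizes), dict(empty.sizes))
```

Label-based slices include both endpoints, so each bound is part of the subset when it matches a coordinate value exactly.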

### Plot data with `xarray.plot`

#### Notice that due to outliers, most of the data are squeezed into a narrow portion of the colormap.

In [None]:
# Plot a single VVVV image
vvvv = ds["VVVV"].isel(time=0, frequency=0)
vvvv.plot()

#### Set `vmin` and `vmax` to better scale the data across the colormap.

In [None]:
vvvv = ds["VVVV"].isel(time=0, frequency=0)
vvvv.plot(vmin=0, vmax=1.9e-9)
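Rather than choosing `vmin` and `vmax` by hand, the limits can be derived from percentiles of the data. The following is a minimal sketch on a synthetic raster (`raster` and its values are hypothetical); the 2nd and 98th percentiles are an illustrative choice, not a recommendation specific to GCOV data.

```python
import numpy as np
import xarray as xr

# Synthetic raster: mostly small values plus a few extreme outliers
rng = np.random.default_rng(0)
values = rng.uniform(0.0, 1.0, size=(100, 100))
values[0, :5] = 1000.0  # outliers that would dominate the colormap
raster = xr.DataArray(values, dims=("y", "x"))

# Use the 2nd and 98th percentiles as color limits
vmin, vmax = raster.quantile([0.02, 0.98]).values

# The limits could then be passed to the plot call, for example:
# raster.plot(vmin=vmin, vmax=vmax)
print(vmin, vmax)
```

`xarray` plotting also accepts `robust=True`, which applies 2nd and 98th percentile limits when `vmin` and `vmax` are not given.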

### Reproject the data


:::{note} A note about reprojecting with `rioxarray`
`rioxarray` makes it easy to reproject data in an `xarray.Dataset` or `xarray.DataArray`, but it only works with 2D and 3D data. The time series dataset has four dimensions (time, frequency, y, x).

To reproject, we must select data in 2 or 3 dimensions:
- Select a single frequency and time-step
- Select all (or multiple) time-steps for a single frequency
- Select both frequencies for a single time-step
:::
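The three selection options listed in the note can be checked on a synthetic array before any reprojection is attempted. The following is a minimal sketch in which `cube` is a hypothetical stand-in for one variable of the time series; it only inspects the remaining dimensions and does not call `rioxarray`.

```python
import numpy as np
import xarray as xr

# Synthetic 4D array shaped like the time series (values are placeholders)
cube = xr.DataArray(
    np.zeros((3, 2, 4, 5)),
    coords={"frequency": ["A", "B"]},
    dims=("time", "frequency", "y", "x"),
)

# Each selection drops one or two dimensions, leaving 2D or 3D data
single_date_freq = cube.isel(time=0, frequency=0)  # (y, x)
single_freq = cube.sel(frequency="B")              # (time, y, x)
single_date = cube.isel(time=0)                    # (frequency, y, x)

for name, arr in [
    ("single_date_freq", single_date_freq),
    ("single_freq", single_freq),
    ("single_date", single_date),
]:
    print(name, arr.dims)
```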

#### Reproject a single frequency and time-step

In [None]:
# select first time-step and frequency
ds_single_date_freq = ds.isel(time=0, frequency=0)

# reproject to EPSG 4326
ds_single_date_freq = ds_single_date_freq.rio.reproject("EPSG:4326")

ds_single_date_freq 

#### Reproject all (or multiple) time-steps for a single frequency

In [None]:
# select frequency B for all time-steps
ds_single_freq = ds.sel(frequency="B")

# reproject to EPSG 4326
ds_single_freq = ds_single_freq.rio.reproject("EPSG:4326")

ds_single_freq

#### Reproject both frequencies for a single time-step

In [None]:
# select all frequencies for a single time-step
ds_single_date = ds.isel(time=0)

# reproject to EPSG 4326
ds_single_date = ds_single_date.rio.reproject("EPSG:4326")

ds_single_date
