# Adding Spatial Metadata to AORC Forcing

**Authors**:  
  - Tony Castronova <acastronova@cuahsi.org>   
  - Irene Garousi-Nejad <igarousi@cuahsi.org>    
  
**Last Updated**: 04.04.2023  

**Description**:  

This notebook demonstrates how to add spatial metadata to the AORC v1.0 forcing data that is stored on [HydroShare's THREDDS](https://thredds.hydroshare.org/thredds/catalog/aorc/data/16/catalog.html). The original AORC v1.0 data contains only the `west_east` and `south_north` index dimensions, which allow us to slice the gridded data by `x` and `y` indices for simple visualization and data analysis. However, adding spatial metadata (e.g. a coordinate reference system) to datasets that only contain `x` and `y` indices can significantly enhance their utility for a wide range of spatial analysis and modeling applications, such as:

- Perform spatial queries: Spatial information allows us to perform location-based queries, which can help identify patterns or trends in the data that may be specific to certain areas.
- Conduct spatial analysis: Spatial information enables spatial analysis, such as interpolation, zoning, or overlaying data layers.
- Visualize data on maps: Spatial information allows you to display data on maps, making it easier to understand spatial patterns and relationships in the data.

This notebook demonstrates one method for doing this.

**Software Requirements**

This notebook was developed using the following software and operating system versions.

OS: MacOS Ventura 13.0.1  
> Python: 3.10.0  \
> re: <>  \
> numpy: 1.24.1  \
> pyproj: 3.4.1  \
> xarray: 0.17.0  \
> rioxarray: 0.13.3  \
> cartopy: 0.21.1  \
> netCDF4: 1.6.1  \
> owslib: <>  \
> matplotlib: <>  

OS: Microsoft Windows 11 Pro version 10.0.22621
> Conda: 22.9.0  \
> Python: 3.9.16  \
> re: 2.2.1  \
> numpy: 1.23.5  \
> pyproj: 3.5.0  \
> xarray: 2023.3.0<span style="font-size:12px"><sup>1</sup></span>  \
> rioxarray: 0.14.0<span style="font-size:12px"><sup>2</sup></span>  \
> cartopy: 0.21.1<span style="font-size:12px"><sup>3</sup></span>  \
> netCDF4: 1.6.3<span style="font-size:12px"><sup>4</sup></span>  \
> owslib: 0.24.1<span style="font-size:12px"><sup>5</sup></span>  \
> matplotlib: 3.7.1  

<span style="font-size:12px"> <sup>1</sup> If not yet installed, use `conda install xarray -c conda-forge`. </span>   
<span style="font-size:12px"> <sup>2</sup>If not yet installed, use `conda install rioxarray -c conda-forge`. </span>        
<span style="font-size:12px"> <sup>3</sup>If not yet installed, use `conda install cartopy`. Note that `-c conda-forge` is omitted here because it produced a DLL failure message in our setup. </span>       
<span style="font-size:12px"> <sup>4</sup>If not yet installed, use `conda install netcdf4`. Note that `-c conda-forge` is omitted here because it produced a DLL failure message in our setup. </span>      
<span style="font-size:12px"> <sup>5</sup>If not yet installed, use `conda install owslib`. </span>   


---

In [None]:
import re
import numpy
import pyproj
import xarray
import rioxarray 
import cartopy.crs as ccrs
from pyproj import Transformer
import matplotlib.pyplot as plt
from owslib.wms import WebMapService

## Load the AORC v1.0 Data and Check its Dimensions

The AORC v1.0 data stored in HydroShare's THREDDS catalog covers the Great Basin watershed from 2010-2019. The dataset is divided into 120 netCDF files, each containing hourly values of meteorological variables for one month. To load a single month of the AORC v1.0 data from HydroShare's THREDDS, use the `open_dataset` function from the `xarray` package. To learn more about the `chunks` and `decode_coords` arguments used when loading the data, please refer to our [Query AORC Forcing Data via HydroShare THREDDS](https://github.com/CUAHSI/notebook-examples/blob/main/thredds/query-aorc-thredds.ipynb) Jupyter notebook.

In [None]:
# load a single month of data
ds = xarray.open_dataset('http://thredds.hydroshare.org/thredds/dodsC/aorc/data/16/201001.nc',
                         chunks={'Time': 10, 'west_east': 285, 'south_north':275},
                         decode_coords="all" )
ds

Notice that the `south_north` and `west_east` dimensions contain indices, without corresponding coordinate values.

In [None]:
ds.south_north

## Load GeoSpatial Metadata for the National Water Model

Using the spatial information provided in the [GeoSpatial Metadata for NWM v2.0](https://www.hydroshare.org/resource/2a8a3566e1c84b8eb3871f30841a3855/) that is stored in HydroShare, we can determine the corresponding coordinate values for the `south_north` and `west_east` indices and add them to the AORC v1.0 data. This information can be found in the `WRF_Hydro_NWM_geospatial_data_template_land_GIS.nc` file, which is part of the NWM v2.0 domain dataset and contains the necessary spatial metadata. We can also access this file through HydroShare's THREDDS server.

In [None]:
ds_meta = xarray.open_dataset('http://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/2a8a3566e1c84b8eb3871f30841a3855/data/contents/WRF_Hydro_NWM_geospatial_data_template_land_GIS.nc')
ds_meta

Check the coordinates as well as the coordinate reference system (CRS) from the geospatial metadata file.

In [None]:
ds_meta.coords

In [None]:
ds_meta.crs.attrs

## Add GeoSpatial Metadata to the AORC Dataset

The AORC v1.0 data we use here (`ds`) covers the Great Basin, whereas the geospatial metadata (`ds_meta`) encompasses the entire CONUS. The indices of the `south_north` and `west_east` dimensions start from 0, which makes it difficult to relate the smaller `ds` domain to the larger `ds_meta` domain, as there is no explicit spatial information. This complicates the task of assigning the corresponding coordinates to `ds` from `ds_meta`. To establish a spatial linkage between the `ds` and `ds_meta` datasets, we use the offsets defined in the AORC v1.0 `history` attribute. These offsets help us subset the `ds_meta` coordinates to the same area as the `ds` domain. The following function simplifies the lookup of these offsets.

In [None]:
def pattern_lookup(pattern, text):

    """
    Searches for a specified pattern in a string and returns the matched text.

    Args: 
        pattern (str): A regular expression pattern to search for.
        text (str): The string in which the pattern will be searched for.

    Returns:
        str: The full matched substring, e.g. 'west_east,<start>,<end>'.

    Raises:
        ValueError: If the pattern is not found in the string.
    """
    
    # use the re.search() function to search for the pattern in the string
    match = re.search(pattern, text)

    # fail loudly if no match was found, rather than silently returning None
    if match is None:
        raise ValueError(f'No match found for pattern: {pattern}')

    # group(0) is the entire matched substring
    return match.group(0)

Define the regular expression patterns for both indices along the `x` and `y` coordinates.

In [None]:
pattern_we = r'west_east,(\d+),(\d+)'
pattern_sn = r'south_north,(\d+),(\d+)'

Execute the `pattern_lookup` function to search for the previously defined patterns in the `history` attribute of `ds`. Then, print the outcomes (i.e., the x and y indices representing the smaller region relative to the NWM domain).

In [None]:
GSL_westeast = pattern_lookup(pattern_we, ds.attrs['history'])
GSL_southnorth = pattern_lookup(pattern_sn, ds.attrs['history'])

y_index = GSL_southnorth.split(',')[1:]
x_index = GSL_westeast.split(',')[1:]

print('\n', 'y indices: ', y_index, '\n', 'x indices: ', x_index)
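The lookup above returns the whole matched substring and then splits it on commas. As a minimal sketch of an alternative, the capture groups in the same patterns can be read directly and converted to integers. The `history` string below is a made-up example of what such an attribute might look like, not the actual AORC value, and `parse_offsets` is a hypothetical helper.

```python
import re

# Hypothetical history string; the real attribute in `ds` may be formatted differently.
history = 'ncks -d west_east,1000,1284 -d south_north,1500,1774 input.nc output.nc'

def parse_offsets(dim_name, text):
    """Return the (start, end) indices recorded for `dim_name` as integers."""
    match = re.search(rf'{dim_name},(\d+),(\d+)', text)
    if match is None:
        raise ValueError(f'No offsets found for {dim_name}')
    # group(1) and group(2) are the two captured index values
    return int(match.group(1)), int(match.group(2))

x_bounds = parse_offsets('west_east', history)
y_bounds = parse_offsets('south_north', history)
print('x indices:', x_bounds, 'y indices:', y_bounds)
```

Returning integers up front avoids the repeated `int(...)` conversions when the indices are later used for slicing.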

To extract the `x` and `y` values from `ds_meta` that correspond to the subset indices obtained above, we can use Python's array indexing operator (`[]`). Since the `x` values increase as we move from west to east, we can simply use the subset indices obtained above to select the desired chunk. However, for the `y` selection, we need to subtract the subset indices obtained above from the length of the `y` array in `ds_meta`. This is because the projected coordinates in the `y` array decrease as their indices increase, which is opposite to the usual direction of array indices. Thus, we can use the following code to select the desired chunks of `x` and `y`:

In [None]:
leny = len(ds_meta.y)
x = ds_meta.x[int(x_index[0]) : int(x_index[1]) + 1].values
y = ds_meta.y[leny - int(y_index[1]) - 1 : leny - int(y_index[0])].values
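To see why the `y` slice is reversed, the sketch below applies the same index arithmetic to a small made-up coordinate array. The values and the `subset_descending` helper are illustrative assumptions; only the slicing expression mirrors the cell above.

```python
# Toy y coordinates stored north-to-south (descending), as described above.
y_full = [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]

def subset_descending(values, start, end):
    """Select the entries for south-north indices start..end (inclusive)
    from a sequence whose values are stored in descending order."""
    n = len(values)
    return values[n - end - 1 : n - start]

# south-north indices count up from the southern edge: index 0 -> 0, index 2 -> 20, ...
y_subset = subset_descending(y_full, 2, 4)
print(y_subset)  # [40, 30, 20]

# the inclusive index range start..end always yields end - start + 1 values
assert len(y_subset) == 4 - 2 + 1
```

The same length check can be applied to the real arrays: `len(x)` and `len(y)` should match the `west_east` and `south_north` dimension sizes of `ds`.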

The next step is to rename the `ds` dimensions `south_north`, `west_east`, and `Time` to follow the [NetCDF Climate and Forecast (CF) Metadata Conventions](http://cfconventions.org/#:~:text=The%20CF%20conventions%20are%20increasingly%20gaining%20acceptance%20and,spatial%20and%20temporal%20properties%20of%20the%20data.%20). The CF Conventions provide a standard way to describe data and metadata, which makes it easier to search across datasets based on their attributes.

In [None]:
ds = ds.rename_dims(south_north='y', west_east='x', Time='time')

In order to enhance visualization capabilities, it would be useful to have geographical information in addition to the projected coordinates (`x` and `y` values). With latitude and longitude coordinates available, it would be easier to generate maps and plot data onto them. To create these coordinates in the **WGS84** standard, the following code can be used.

We start by creating a mesh grid of the `x` and `y` projected coordinates so that the `X` and `Y` arrays have the same shape when passed to the `pyproj` package. Then, using `pyproj`, we define the Lambert Conformal Conic projection that was used for the NWM datasets, based on the `ds_meta` attributes. We also specify the desired output coordinate system. Finally, we use the `Transformer` class from the `pyproj` package to perform the coordinate transformation and obtain the geographic coordinates for the given projected inputs.

In [None]:
X, Y = numpy.meshgrid(x, y)

# define the input crs
wrf_proj = pyproj.Proj(proj='lcc',
                       lat_1=30.,
                       lat_2=60., 
                       lat_0=40.0000076293945, lon_0=-97., # Center point
                       a=6370000, b=6370000) 

# define the output crs
wgs_proj = pyproj.Proj(proj='latlong', datum='WGS84')

# transform X, Y into lon, lat (always_xy=True makes the x/lon-first axis order explicit)
transformer = Transformer.from_crs(wrf_proj.crs, wgs_proj.crs, always_xy=True)
lon, lat = transformer.transform(X, Y)

Add both geographical and projected coordinate values to the AORC v1.0 dataset (`ds`) by creating Dataset coordinates. Note that the `lat` and `lon` arrays have two dimensions `(y, x)` but the `x` and `y` arrays only have one dimension.

In this case, each element of the `x` array is the projected coordinate of a grid column and each element of the `y` array is the projected coordinate of a grid row, while each element of the `lat` and `lon` arrays holds the latitude or longitude of a single grid cell. The `lat` and `lon` arrays are thus two-dimensional, with dimensions `(y, x)`, because on a projected grid both latitude and longitude vary along each axis.

This arrangement is common in geospatial data analysis, where regularly spaced grids in a projected coordinate system are paired with geographic coordinates for each cell.

In [None]:
ds = ds.assign_coords(lon = (['y', 'x'], lon))
ds = ds.assign_coords(lat = (['y', 'x'], lat))
ds = ds.assign_coords(x = x)
ds = ds.assign_coords(y = y)
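The shape relationship between the one- and two-dimensional coordinates can be checked on a toy grid. This is a minimal sketch with made-up coordinate values; it only illustrates how `numpy.meshgrid` orders its output.

```python
import numpy

# Toy projected coordinates: 4 columns (x) and 3 rows (y); the values are made up.
x_demo = numpy.array([0., 1000., 2000., 3000.])
y_demo = numpy.array([2000., 1000., 0.])

X_demo, Y_demo = numpy.meshgrid(x_demo, y_demo)

# With the default 'xy' indexing, the outputs have shape (len(y), len(x)).
print(X_demo.shape)  # (3, 4)

# Every row of X_demo repeats x_demo; every column of Y_demo repeats y_demo.
assert (X_demo[0] == x_demo).all()
assert (Y_demo[:, 0] == y_demo).all()
```

Because the transformed `lon` and `lat` arrays inherit this shape, they are assigned with the dimension order `['y', 'x']`.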

Follow CF conventions and add metadata for these coordinates. Note that some of this information is obtained from the variable attributes in the `ds_meta` dataset.

In [None]:
ds.x.attrs['axis'] = 'X'
ds.x.attrs['standard_name'] = 'projection_x_coordinate'
ds.x.attrs['long_name'] = 'x-coordinate in projected coordinate system'
ds.x.attrs['resolution'] = 1000.  # cell size
ds.x.attrs['units'] = 'm'

ds.y.attrs['axis'] = 'Y' 
ds.y.attrs['standard_name'] = 'projection_y_coordinate'
ds.y.attrs['long_name'] = 'y-coordinate in projected coordinate system'
ds.y.attrs['resolution'] = 1000.  # cell size
ds.y.attrs['units'] = 'm'

ds.lon.attrs['units'] = 'degrees_east'
ds.lon.attrs['standard_name'] = 'longitude' 
ds.lon.attrs['long_name'] = 'longitude'

ds.lat.attrs['units'] = 'degrees_north'
ds.lat.attrs['standard_name'] = 'latitude' 
ds.lat.attrs['long_name'] = 'latitude'

Add the WRF-Hydro Coordinate Reference System (CRS) that can be obtained from `ds_meta` attributes to the AORC v1.0 dataset. This `WKT` (well-known text representation of coordinate reference systems) string can be found within the WRF-Hydro `geo_em.d01_1km.nc` file.

In [None]:
# add crs to netcdf file
ds.rio.write_crs(ds_meta.crs.attrs['spatial_ref'], inplace=True
                ).rio.set_spatial_dims(x_dim="x",
                                       y_dim="y",
                                       inplace=True,
                                       ).rio.write_coordinate_system(inplace=True)

## Visualization

In order to validate our workflow for integrating geospatial information into the AORC v1.0 datasets, we have selected the incoming downward longwave radiation (`LWDOWN`) estimates for the initial time step from `ds`. Additionally, we have included the Great Basin watersheds shapefile using the `OWSLib` package to verify and ensure that the AORC v1.0 is correctly located in the appropriate geospatial context. The `OWSLib` package is a Python library utilized for working with Open Geospatial Consortium (OGC) web services, such as Web Map Service (WMS). It offers a range of classes and methods that enable the querying, manipulation, and visualization of geospatial data from OGC web services.

To facilitate the creation of maps and geospatial visualizations in Python, we have utilized the `cartopy` package, which provides a simple means of accomplishing this task.

In [None]:
plt.figure(figsize=(14, 14))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()

# WMS for GB shapefile
gb_wms = 'https://geoserver.hydroshare.org/geoserver/HS-965eab1801c342a58a463f386c9f3e9b/wms'
ax.add_wms(wms=gb_wms,
          layers=['GB_shapefile'],
          zorder=10)

# plot LWDOWN at the first timestep (index 0)
ds.isel(time=0).LWDOWN.plot(
               ax=ax, transform=ccrs.PlateCarree(), x="lon", y="lat",
               zorder=2,
               cmap='Reds')

gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
                  linewidth=2, color='gray', alpha=0.5, linestyle='--')

ax.set_ylim([30, 45])
ax.set_xlim([-125, -105])
ax.set_aspect('equal')
ax.coastlines()

plt.show()