# UGRID Conventions

The UGRID conventions describe how to store unstructured (flexible) mesh model data in Unidata Network Common Data Form (NetCDF) files. See the [documentation](https://ugrid-conventions.github.io/ugrid-conventions/) and a [working example](https://github.com/ugrid-conventions/ugrid-conventions/issues/48) of formatting Python data with these conventions.

In this notebook, we read unstructured mesh model output from an HDF5 file into an xarray `Dataset` that follows the UGRID conventions.

In [1]:
import pandas as pd
import numpy as np
import h5py
import xarray as xr
import os
import sys
import datetime

# Open Results File with h5py library

[h5py](https://docs.h5py.org) is a widely used Python library for working with HDF5 files. It offers a low-level API that closely follows the HDF5 C API and a high-level API that exposes the main features of HDF5 through an interface modelled on dictionaries and NumPy arrays.

- Demo: https://nbviewer.jupyter.org/github/jackdbd/hdf5-pydata-munich/blob/master/hdf5_in_python.ipynb
- Docs: 
  - https://docs.h5py.org
  - File Objects: https://docs.h5py.org/en/stable/high/file.html#opening-creating-files
  - Groups: https://docs.h5py.org/en/stable/high/group.html
  - Datasets: https://docs.h5py.org/en/stable/high/dataset.html
- Refs:
  - File Objects: https://docs.h5py.org/en/stable/high/file.html#reference

In [2]:
print(os.listdir())
fpath = '../tests/input_files/Muncie.p04.hdf'

h5py_file = h5py.File(fpath,
                      mode='r',  # Readonly, file must exist (default)
                     )

['.virtual_documents', 'File_Conversion.ipynb', 'HDF_Exploration.ipynb', 'HDF_Plotting.ipynb', 'model', 'Sparse_Matrix_Framework.ipynb', 'ugrid-example.nc']


## Read Geometry Data 

In [3]:
def get_project_name(inp_file):
    return inp_file['Geometry/2D Flow Areas/Attributes'][()][0][0].decode('UTF-8')

In [4]:
project_name = get_project_name(h5py_file)
print(project_name)
h5py_file.close()

2D Interior Area


In [21]:
# src_path = os.path.join('..', 'src', 'riverine', 'ras2d')
# sys.path.insert(0, src_path)
# import RAS2D

class RAS_HDF5:
    '''
    Read HEC-RAS 2D geometry and variables and return as a dictionary
    '''

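    def __init__(self, hdf5_file_path: str, variables: list = None):

        # Names of additional result datasets to read; None means no extras
        self.variables = list(variables) if variables is not None else []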
        self.hdf5_file_path = hdf5_file_path
        self.results = {}
        self.geometry = {}



    def read(self):
        with h5py.File(self.hdf5_file_path, 'r') as infile:
            '''
            Read the Geometry data
            '''

            project_name = infile['Geometry/2D Flow Areas/Attributes'][()][0][0].decode('UTF-8')

            # For the Muncie data set: max value: 5773, shape(5765, 7)
            self.geometry['elements_array'] = infile[f'Geometry/2D Flow Areas/{project_name}/Cells FacePoint Indexes'][()]
            # For the Muncie data set: shape(5774, 2)
            self.geometry['nodes_array'] = infile[f'Geometry/2D Flow Areas/{project_name}/FacePoints Coordinate'][()]
            self.geometry['faces_cell_indexes'] = infile[f'Geometry/2D Flow Areas/{project_name}/Faces Cell Indexes'][()]
            self.geometry['cells_surface_area'] = infile[f'Geometry/2D Flow Areas/{project_name}/Cells Surface Area'][()]
            self.geometry['faces_normal_unit_vector_and_length'] = infile[f'Geometry/2D Flow Areas/{project_name}/Faces NormalUnitVector and Length'][()]
            self.geometry['cells_center_coordinate'] = infile[f'Geometry/2D Flow Areas/{project_name}/Cells Center Coordinate'][()]
            # faces_area_elevation_values = infile['Geometry/2D Flow Areas/2D Interior Area/Faces Area Elevation Values'][()]

            self.geometry['face_length'] = self.geometry['faces_normal_unit_vector_and_length'][:,2]
            
            self.geometry['face_facepoint_connectivity'] = infile[f'Geometry/2D Flow Areas/{project_name}/Faces FacePoint Indexes'][()]


            '''
            Read the Results data
            '''
            self.results['depth'] = infile[f'Results/Unsteady/Output/Output Blocks/Base Output/Unsteady Time Series/2D Flow Areas/{project_name}/Depth'][()]

            '''
            NOTE:
            The node velocities (Node X Vel and Node Y Vel) are not written to the HDF output file by default.
            The user has to opt in to writing them to HDF: https://www.hec.usace.army.mil/software/hec-ras/documentation/HEC-RAS%205.0%202D%20Modeling%20Users%20Manual.pdf
            Open question: how should missing node velocities be handled (try/except?), and do we need them?
            '''

            self.results['node_x_velocity'] = infile[f'Results/Unsteady/Output/Output Blocks/Base Output/Unsteady Time Series/2D Flow Areas/{project_name}/Node X Vel'][()]
            self.results['node_y_velocity'] = infile[f'Results/Unsteady/Output/Output Blocks/Base Output/Unsteady Time Series/2D Flow Areas/{project_name}/Node Y Vel'][()]
            self.results['face_velocity'] = infile[f'Results/Unsteady/Output/Output Blocks/Base Output/Unsteady Time Series/2D Flow Areas/{project_name}/Face Velocity'][()]
            self.results['face_q'] = infile[f'Results/Unsteady/Output/Output Blocks/Base Output/Unsteady Time Series/2D Flow Areas/{project_name}/Face Q'][()]
            self.results['node_speed'] = np.sqrt(self.results['node_x_velocity']**2 + self.results['node_y_velocity']**2)

            
            time_stamps_binary = infile['Results/Unsteady/Output/Output Blocks/Base Output/Unsteady Time Series/Time Date Stamp'][()]

            # Read the specified variables, if any
            for variable in self.variables:
                data_path = f'Results/Unsteady/Output/Output Blocks/Base Output/Unsteady Time Series/2D Flow Areas/{project_name}/{variable}'
                # Key by the variable name and read into memory before the file closes
                self.results[variable] = infile[data_path][()]

        # Convert from binary strings to utf8 strings
        time_stamps = [x.decode("utf8") for x in time_stamps_binary]
        self.results['dates'] = [datetime.datetime.strptime(x, '%d%b%Y %H:%M:%S') for x in time_stamps] # '02JAN1900 22:55:00'

        # Convert all lists to numpy arrays
        for key, value in self.geometry.items():
            self.geometry[key] = np.array(value)
        for key, value in self.results.items():
            self.results[key] = np.array(value)
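
The note inside `read()` asks how to handle node velocities that may be absent from the file. One possible approach, sketched here under stated assumptions, is a small helper that returns a default when a dataset path is missing. `read_optional` is a hypothetical name, not part of h5py. It relies only on the `in` and `[...]` operations, which h5py `File` and `Group` objects share with plain dictionaries, so the demonstration uses a dictionary with made-up values in place of an open HDF5 file.

```python
import numpy as np

def read_optional(container, path, default=None):
    """Return the full array stored at `path`, or `default` if the path is absent."""
    if path in container:
        return container[path][()]  # [()] reads the whole dataset into memory
    return default

# Stand-in for an open h5py.File: a dict keyed by dataset path (illustrative values only)
fake_file = {
    "Results/Depth": np.array([[0.0, 1.5], [0.2, 1.7]]),
}

depth = read_optional(fake_file, "Results/Depth")
node_x_vel = read_optional(fake_file, "Results/Node X Vel")

print(depth.shape)  # (2, 2)
print(node_x_vel)   # None
```

With such a helper, derived quantities like `node_speed` would only be computed when both velocity components are present.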

In [6]:
%%time
ras2d_data = RAS_HDF5(fpath, variables=[])
ras2d_data.read()

CPU times: total: 344 ms
Wall time: 471 ms


In [7]:
ras2d_data.geometry.keys()

dict_keys(['elements_array', 'nodes_array', 'faces_cell_indexes', 'cells_surface_area', 'faces_normal_unit_vector_and_length', 'cells_center_coordinate', 'face_length', 'face_facepoint_connectivity'])

In [8]:
ras2d_data.results['face_velocity'].min()

-5.692632

## Geometry Data to Xarray
Some notes on naming conventions:

| RAS Output  | UGRID       |
| ----------- | ----------- |
| Facepoint   | Node        |
| Face        | Edge        |
| Cell        | Face        |

This is easy to mix up: a RAS "face" is a UGRID "edge", while a UGRID "face" is a RAS "cell".

This is a 2D flexible mesh (mixed triangles, quadrilaterals, etc.). Its topology variable can carry the following attributes:

In [9]:
out = xr.Dataset()

out["mesh2d"] = xr.DataArray(
    data=0,
    attrs={
        # required topology attributes
        'cf_role': 'mesh_topology',
        'long_name': 'Topology data of 2D mesh',
        'topology_dimension': 2,
        'node_coordinates': 'node_x node_y',
        'face_node_connectivity': 'face_nodes',
        # optionally required attributes
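        'face_dimension': 'nface',  # must match the dimension name used for face data below
        'edge_node_connectivity': 'edge_nodes',
        'edge_dimension': 'nedge',  # must match the dimension name used for edge data below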
        # optional attributes 
        'face_edge_connectivity': 'face_edges',
        'face_face_connectivity': 'face_face_connectivity',
        'edge_face_connectivity': 'edge_face_connectivity',
        'boundary_node_connectivity': 'boundary_node_connectivity',
        'face_coordinates': 'face_x face_y',
        'edge_coordinates': 'edge_x edge_y',
    }
)
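
Because the topology attributes refer to other variables by name, a misspelled name or a variable that is never created leaves a dangling reference. The sketch below shows one way to check for this with plain Python. `dangling_references` is a hypothetical helper, the list of attributes it inspects is an assumption based on the attributes used above, and the sample dictionary contains a deliberately misspelled value for illustration.

```python
def dangling_references(topology_attrs, available_names):
    """Return {attribute: [missing names]} for attributes that name other variables."""
    reference_attrs = (
        "node_coordinates", "face_coordinates", "edge_coordinates",
        "face_node_connectivity", "edge_node_connectivity",
        "face_edge_connectivity", "face_face_connectivity",
        "edge_face_connectivity", "boundary_node_connectivity",
    )
    available = set(available_names)
    missing = {}
    for attr in reference_attrs:
        if attr not in topology_attrs:
            continue
        # Attribute values are space-separated variable names
        names = topology_attrs[attr].split()
        absent = [name for name in names if name not in available]
        if absent:
            missing[attr] = absent
    return missing

sample_attrs = {
    "cf_role": "mesh_topology",
    "node_coordinates": "node_x node_y",
    "face_node_connectivity": "face_nodes",
    "face_coordinates": "face x face_y",  # deliberate typo for the demonstration
}
sample_names = ["node_x", "node_y", "face_nodes", "face_x", "face_y"]

result = dangling_references(sample_attrs, sample_names)
print(result)  # {'face_coordinates': ['face', 'x']}
```

Once the dataset is fully populated, the same check could be run as `dangling_references(out["mesh2d"].attrs, out.variables)`.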

Start by filling in the coordinates of the facepoints (aka nodes in UGRID).

In [10]:
nodes = ras2d_data.geometry['nodes_array']

out = out.assign_coords(
    node_x=xr.DataArray(data=nodes[:, 0], dims=("node",)),
    node_y=xr.DataArray(data=nodes[:, 1], dims=("node",)),
    time=xr.DataArray(data=ras2d_data.results['dates'], dims=("time",)),
)


The attribute `face_node_connectivity` points to an index variable identifying, for every face, the indices of its corner nodes. The corner nodes should be specified in anticlockwise direction as viewed from above (consistent with the CF convention for bounds of p-sided cells). The connectivity array is a matrix of size nFaces x MaxNumNodesPerFace; if a face has fewer corner nodes than MaxNumNodesPerFace, the trailing node indices shall be equal to `_FillValue` (which should obviously be larger than the number of nodes in the mesh).

In [11]:
out["face_nodes"] = xr.DataArray(
    data=ras2d_data.geometry['elements_array'],
    coords={
        "face_x": ("nface", [f[0] for f in ras2d_data.geometry['cells_center_coordinate']]),
        "face_y": ("nface", [f[1] for f in ras2d_data.geometry['cells_center_coordinate']]),
    },
    dims=("nface", "nmax_face"),
    attrs={
        'cf_role': 'face_node_connectivity',
        'long_name': 'Vertex nodes of mesh faces (counterclockwise)',
        'start_index': 0, 
        '_FillValue': -1
    })
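
To illustrate the padded layout described above, here is a minimal sketch with a made-up mesh of one quadrilateral and one triangle. It assumes a fill value of -1, matching the `_FillValue` attribute set in the cell above; the node indices are invented for the example.

```python
import numpy as np

FILL = -1  # assumed fill value, same as the _FillValue attribute of face_nodes

# Hypothetical mesh: face 0 is a quadrilateral, face 1 is a triangle
face_nodes = np.array([
    [0, 1, 4, 3],
    [1, 2, 4, FILL],
])

# Count the real corner nodes of each face by ignoring the padding
valid = face_nodes != FILL
nodes_per_face = valid.sum(axis=1)
print(nodes_per_face.tolist())  # [4, 3]

# A masked array keeps fill entries out of reductions and lookups
masked = np.ma.masked_equal(face_nodes, FILL)
print(int(masked.max()))  # 4
```

Masking matters with a negative fill value because NumPy interprets -1 as "last element" when it is used as an index, so unmasked padding would silently select a real node.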

The `edge_node_connectivity` attribute maps edges to nodes. Although the face-to-node mapping implicitly defines the location of the edges, it does not specify their global numbering. The indexing convention of `edge_node_connectivity` should be specified with the `start_index` attribute of the index variable (`edge_nodes` in the cell below); 0-based indexing is the default. Because `boundary_node_connectivity` does not apply to edges globally, specifying it does not by itself require `edge_node_connectivity` to be specified as well.

In [12]:
out["edge_nodes"] = xr.DataArray(
    data=ras2d_data.geometry['face_facepoint_connectivity'],
    dims=("nedge", '2'),
    attrs={
        'cf_role': 'edge_node_connectivity',
        'long_name': 'Vertex nodes of mesh edges',
        'start_index': 0
    })
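
The statement that the face-to-node mapping implicitly defines the edges, but not their numbering, can be sketched as follows. This example is an illustration on a made-up two-face mesh, not the method used by RAS: `edges_from_faces` is a hypothetical helper and the -1 fill value is an assumption.

```python
import numpy as np

FILL = -1  # assumed fill value for padded face-node connectivity

# Hypothetical mesh: a quadrilateral and a triangle sharing the edge (1, 4)
face_nodes = np.array([
    [0, 1, 4, 3],
    [1, 2, 4, FILL],
])

def edges_from_faces(face_nodes, fill=FILL):
    """Collect unique undirected edges (node pairs) from padded face-node connectivity."""
    pairs = []
    for face in face_nodes:
        nodes = face[face != fill]        # drop padding
        following = np.roll(nodes, -1)    # next node around the polygon
        pairs.append(np.stack([nodes, following], axis=1))
    pairs = np.concatenate(pairs)
    pairs = np.sort(pairs, axis=1)        # (a, b) and (b, a) are the same edge
    return np.unique(pairs, axis=0)       # rows come back in sorted order

edge_nodes = edges_from_faces(face_nodes)
print(edge_nodes.shape)  # (6, 2)
```

The derived edges come out in sorted order, which is an arbitrary numbering. Data defined on edges (such as face velocity in the RAS output) follows the model's own edge numbering, which is why the connectivity stored in the file is used instead of a derived one.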

`edge_face_connectivity` points to an index variable identifying all faces that share the same edge, i.e., the neighbors of an edge. The connectivity array is therefore a matrix of size (number of edges) x 2. It is intended to be used in combination with data defined on edges. The `start_index` attribute should be used to specify the indexing convention; 0-based indexing is the default. The `_FillValue` attribute must be present: missing neighbor faces are expressed using `_FillValue`, e.g., for edges at the boundary with only one neighbor face. For details, see the definition of the variable `Mesh2_edge_face_links` in the UGRID documentation. **Note:** the variable defined in the next cell does not yet set a `_FillValue`.

In [13]:
out["edge_face_connectivity"] = xr.DataArray(
    data=ras2d_data.geometry['faces_cell_indexes'],
    dims=("nedge", '2'),
    attrs={
        'cf_role': 'edge_face_connectivity',
        'long_name': 'neighbor faces for edges',
        'start_index': 0
    })
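
As a small illustration of how a fill value marks missing neighbors, the sketch below flags boundary edges in a made-up connectivity array. The -1 fill value and the index values are assumptions for the example, not values read from the RAS file.

```python
import numpy as np

FILL = -1  # assumed fill value for a missing neighbor face

# Hypothetical edge_face_connectivity for four edges
edge_faces = np.array([
    [0, 1],      # interior edge between faces 0 and 1
    [0, FILL],   # boundary edge of face 0
    [1, FILL],   # boundary edge of face 1
    [1, 2],      # interior edge between faces 1 and 2
])

# An edge is on the boundary when either neighbor slot holds the fill value
is_boundary = (edge_faces == FILL).any(axis=1)
boundary_edges = np.flatnonzero(is_boundary)

print(is_boundary.tolist())     # [False, True, True, False]
print(boundary_edges.tolist())  # [1, 2]
```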

## Store results

For discussion: which vocabulary to use for variable names.

In [32]:
out["depth"] = xr.DataArray(
    data=ras2d_data.results['depth'],
    dims=("time", "nface"),
    attrs={
        'units': 'feet'  # will need to update units based on prj file
    })

out["faces_surface_area"] = xr.DataArray(
    data=ras2d_data.geometry['cells_surface_area'],
    dims=("nface",),
    attrs={
        'units': 'square feet'  # will need to update units based on prj file
    })

out["edge_length"] = xr.DataArray(
    data=ras2d_data.geometry['face_length'],
    dims=("nedge",),
    attrs={
        'units': 'feet'  # will need to update units based on prj file
    })

out["edge_velocity"] = xr.DataArray(
    data=ras2d_data.results['face_velocity'],
    dims=("time", "nedge"),
    attrs={
        'units': 'feet per second'  # will need to update units based on prj file
    })



In [61]:
import math

In [62]:
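# Loop version: distance between the centers of the two faces that share each edge
edge_faces = out['edge_face_connectivity'].values
face_x = out['face_x'].values
face_y = out['face_y'].values

d = np.zeros(len(edge_faces))
for i, (f1, f2) in enumerate(edge_faces):
    # Store the result per edge instead of overwriting the whole array
    d[i] = math.dist((face_x[f1], face_y[f1]), (face_x[f2], face_y[f2]))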




In [90]:
f1 = out['edge_face_connectivity'].T[0]
f2 = out['edge_face_connectivity'].T[1]

x1_coords = out['face_x'][f1]
y1_coords = out['face_y'][f1]
x2_coords = out['face_x'][f2]
y2_coords = out['face_y'][f2]

dist_data = np.sqrt((x1_coords - x2_coords)**2 + (y1_coords - y2_coords)**2)



In [102]:
out["face_to_face_dist"] = xr.DataArray(
    data = dist_data,
    dims = ("nedge"), 
    attrs={
        'units': 'feet' # will need to update units based on prj file
})
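
The vectorized distance above indexes the face coordinates directly with the connectivity values. If the connectivity ever contains a fill value for a missing neighbor, that lookup needs a guard. The following is a minimal sketch under stated assumptions: a -1 fill value and made-up coordinates chosen so that the distances are easy to verify by hand.

```python
import numpy as np

FILL = -1  # assumed fill value for a missing neighbor face

# Hypothetical face centers and edge-to-face connectivity
face_x = np.array([0.0, 3.0, 3.0])
face_y = np.array([0.0, 4.0, 0.0])
edge_faces = np.array([[0, 1], [1, 2], [2, FILL]])

f1, f2 = edge_faces[:, 0], edge_faces[:, 1]
has_both = (f1 != FILL) & (f2 != FILL)

# Start from NaN so edges without two neighbors stay undefined
dist = np.full(len(edge_faces), np.nan)
dist[has_both] = np.hypot(
    face_x[f1[has_both]] - face_x[f2[has_both]],
    face_y[f1[has_both]] - face_y[f2[has_both]],
)
print(dist.tolist())  # [5.0, 4.0, nan]
```

Without the mask, an index of -1 would pick the last face center and produce a plausible but wrong distance instead of an error.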

Units are stored in the `.prj` file, so we will need to read it. This will let us set the `units` attribute of each variable correctly (feet and feet per second, or meters and meters per second): https://www.kleinschmidtgroup.com/ras-post/hec-ras-file-types/
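
Once the unit system is known, the hard-coded unit strings could be replaced by a lookup. The sketch below assumes the unit system has already been determined elsewhere; it does not parse the `.prj` file. `UNIT_LABELS`, `unit_attrs`, and the keys `"english"` and `"si"` are hypothetical names chosen for this example.

```python
# Hypothetical lookup table: unit labels per unit system and quantity
UNIT_LABELS = {
    "english": {"length": "feet", "area": "square feet", "velocity": "feet per second"},
    "si": {"length": "meters", "area": "square meters", "velocity": "meters per second"},
}

def unit_attrs(unit_system, quantity):
    """Build an attrs dict holding the unit label for a quantity."""
    try:
        return {"units": UNIT_LABELS[unit_system][quantity]}
    except KeyError:
        raise ValueError(
            f"unknown unit system or quantity: {unit_system!r}, {quantity!r}"
        ) from None

print(unit_attrs("english", "velocity"))  # {'units': 'feet per second'}
print(unit_attrs("si", "length"))         # {'units': 'meters'}
```

The returned dictionary can be passed as `attrs=` when creating each `xr.DataArray`, which keeps the unit labels in one place.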

## Save to NetCDF

In [103]:
out.attrs = {'Conventions': 'CF-1.8 UGRID-1.0 Deltares-0.10'}
out.to_netcdf("ugrid-example.nc")

## Save to Zarr

[Zarr](http://zarr.readthedocs.io/) is a chunked, compressed, N-dimensional data array file format designed for performance in the cloud:
- Zarr is preferred by Pangeo for cloud applications: https://pangeo.io/data.html#data-in-the-cloud
- Guide:
  - http://xarray.pydata.org/en/stable/user-guide/io.html#zarr
- Refs:
  - http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_zarr.html#xarray.Dataset.to_zarr

In [104]:
%%time

# Save to Zarr
out.to_zarr('ugrid-example.zarr',
           mode='w',
           consolidated=True,  # http://xarray.pydata.org/en/stable/user-guide/io.html#consolidated-metadata
          )

CPU times: total: 359 ms
Wall time: 681 ms


<xarray.backends.zarr.ZarrStore at 0x278379bfc80>