# Data Structures


In [1]:
import uxarray as ux
import xarray as xr



UXarray provides three data structures for interacting with unstructured grids and the data variables that reside on them:

1. **ux.Grid**: Stores the grid representation (coordinates, connectivity information, etc.).
2. **ux.UxDataset**: One or more data variables that reside on a grid.
3. **ux.UxDataArray**: A single data variable that resides on a grid.


## Grid and Data Files

When working with unstructured grid datasets, the grid definition is typically stored separately from any data variables. 

For example, the dataset used in this section is made up of two files: one grid definition file and one data file.

```
quad-hexagon
│   grid.nc
│   data.nc
```

In [23]:
grid_path = "../../test/meshfiles/ugrid/quad-hexagon/grid.nc"
data_path = "../../test/meshfiles/ugrid/quad-hexagon/data.nc"

Additionally, there may be multiple data files that are mapped to the same unstructured grid (think climate model history files). Using our sample dataset, this may look something like this:

```
quad-hexagon
│   grid.nc
│   data1.nc
│   data2.nc
│   data3.nc
│   data4.nc
```

We can store these paths as a list. Here we simply repeat the original data file to imitate having four separate data files.

In [24]:
multiple_data_paths = [data_path for _ in range(4)]
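
When the data files really are separate files on disk, the list of paths can be gathered rather than typed out. The following is a minimal sketch using only the standard library; the helper name `collect_data_paths` and the `data*.nc` naming pattern are assumptions made for illustration and are not part of UXarray.

```python
from pathlib import Path


def collect_data_paths(directory, pattern="data*.nc"):
    """Return the files in `directory` that match `pattern`, sorted by name."""
    # Sorting keeps the order deterministic, which matters when the files are
    # later combined in list order (for example along a time dimension).
    return sorted(str(path) for path in Path(directory).glob(pattern))
```

For the layout shown above, `collect_data_paths("quad-hexagon")` would return the four data files and skip `grid.nc`. Note that the sort is lexicographic, so `data10.nc` would come before `data2.nc` unless the file names are zero-padded.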

## Grid

UXarray's `Grid` class is used for representing unstructured grids in terms of their coordinates and connectivity information.

### Creating a Grid

The recommended way to construct a `Grid` is the `ux.open_grid()` method, which takes a grid file path, detects the input grid format, and parses and encodes the provided coordinates and connectivity according to the UGRID conventions. Details on the supported grid formats and on which variables are parsed can be found in other parts of this user guide.

In [4]:
uxgrid = ux.open_grid(grid_path)
uxgrid

<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_face: 4
  * n_edge: 19
  * two: 2
  * n_max_face_nodes: 6
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)

### Grid Attributes

Each `Grid` contains multiple dimensions, coordinates, connectivity, and descriptor variables that represent an unstructured grid. All currently available variables can be viewed by printing a `Grid`.


In [5]:
uxgrid

<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_face: 4
  * n_edge: 19
  * two: 2
  * n_max_face_nodes: 6
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)

These variables can be accessed as attributes of a `Grid` instance.

In [6]:
uxgrid.n_node

16

In [7]:
uxgrid.face_node_connectivity
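
The shape `(4, 6)` reported for `face_node_connectivity` corresponds to `(n_face, n_max_face_nodes)`. Because a grid can mix faces with different numbers of nodes, rows for smaller faces are padded with a fill value. The sketch below illustrates that layout with plain Python lists. The node indices are made up and do not come from the sample grid file, and the fill value `-1` is an arbitrary choice for illustration rather than the value UXarray uses.

```python
FILL_VALUE = -1  # illustrative padding value only (an assumption for this sketch)

# Hypothetical mesh with one quadrilateral and one hexagon (made-up node indices).
faces = [
    [0, 1, 2, 3],        # quadrilateral: 4 nodes
    [1, 4, 5, 6, 7, 2],  # hexagon: 6 nodes
]

# The widest face determines the number of columns.
n_max_face_nodes = max(len(face) for face in faces)

# Pad every face to the same length so the result is rectangular.
face_node_connectivity = [
    face + [FILL_VALUE] * (n_max_face_nodes - len(face)) for face in faces
]

# The true node count of each face is the number of non-fill entries in its row.
n_nodes_per_face = [
    sum(1 for node in row if node != FILL_VALUE) for row in face_node_connectivity
]

print(face_node_connectivity)  # [[0, 1, 2, 3, -1, -1], [1, 4, 5, 6, 7, 2]]
print(n_nodes_per_face)        # [4, 6]
```

Storing the connectivity as a rectangular array keeps indexing simple, at the cost of having to mask out the fill entries when iterating over the nodes of a face.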

## UxDataset

Up to this point, we've exclusively looked at the unstructured grid without any data variables mapped to it. When working with one or more data variables, they are linked to a grid through a `UxDataset` instance.

When pairing data variables with an unstructured grid file, both can be opened together using the `ux.open_dataset()` method, which returns a `UxDataset`.


### Opening a Single Data File

We can load a pair of grid and data files using the `ux.open_dataset()` method.



In [8]:
uxds = ux.open_dataset(grid_path, data_path)
uxds

### Opening Multiple Data Files

When working with multiple data paths, we can open them using the `ux.open_mfdataset()` method. 

In [9]:
uxds_multi = ux.open_mfdataset(grid_path, multiple_data_paths, combine='nested', concat_dim='time')
uxds_multi

|            | Array                       | Chunk                       |
|------------|-----------------------------|-----------------------------|
| Bytes      | 160 B                       | 16 B                        |
| Shape      | (10, 4)                     | (1, 4)                      |
| Dask graph | 10 chunks in 4 graph layers | 10 chunks in 4 graph layers |
| Data type  | float32 numpy.ndarray       | float32 numpy.ndarray       |
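
Conceptually, `combine='nested'` together with `concat_dim='time'` joins the files in the order they appear in the list along the `time` dimension. The sketch below mimics that behavior with plain Python lists. It assumes each file already holds one or more time steps, and all values and the helper `concat_along_time` are hypothetical, not taken from the sample data or from UXarray.

```python
# Hypothetical contents of three data files: each file is a list of time steps,
# and every time step holds one value per face (4 faces here).
file_a = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
file_b = [[3.0, 3.0, 3.0, 3.0]]
file_c = [[4.0, 4.0, 4.0, 4.0], [5.0, 5.0, 5.0, 5.0]]


def concat_along_time(files):
    """Join per-file time steps in the order given, as a nested combine would."""
    combined = []
    for steps in files:
        combined.extend(steps)
    return combined


combined = concat_along_time([file_a, file_b, file_c])
shape = (len(combined), len(combined[0]))
print(shape)  # (5, 4): five time steps, four faces
```

Because the order of the result follows the order of the input list, the list of paths should be arranged in the intended time order before it is passed in.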


## Grid Accessor

Each `UxDataset` (and, as shown in the next section, each `UxDataArray`) is linked to a `Grid` instance, which contains the unstructured grid information.

In [10]:
uxds.uxgrid

<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_face: 4
  * n_edge: 19
  * two: 2
  * n_max_face_nodes: 6
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)

## UxDataArray

While a `UxDataset` represents one or more data variables linked to an unstructured grid, a `UxDataArray` represents a single data variable. Alternatively, one can think of a `UxDataset` as a collection of one or more `UxDataArray` instances.

In our sample dataset, we have a variable called `t2m`, which can be used to index our `UxDataset`.


In [11]:
uxds['t2m']

As mentioned before, each `UxDataArray` is linked to a `Grid` instance.

In [12]:
uxds['t2m'].uxgrid

<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_face: 4
  * n_edge: 19
  * two: 2
  * n_max_face_nodes: 6
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)

This `Grid` is identical to the one linked to the `UxDataset`. There is a single `Grid` that is shared by all data variables.

In [13]:
uxds['t2m'].uxgrid == uxds.uxgrid

True

Just like with Xarray, we can perform various operations on our data variable.

In [14]:
uxds['t2m'].min()

In [15]:
uxds['t2m'].mean()

UXarray also provides custom data analysis operators, which are explored in later sections of this user guide.

In [16]:
uxds['t2m'].gradient()

## Inheritance from Xarray

For those familiar with Xarray, the naming of the methods and data structures will look familiar. UXarray aims to provide an Xarray-like experience by subclassing `xr.Dataset` and `xr.DataArray` and linking each instance to a `Grid` to provide grid-aware implementations.

We can observe this inheritance by checking for subclassing.

In [17]:
issubclass(ux.UxDataset, xr.Dataset)

True

In [18]:
issubclass(ux.UxDataArray, xr.DataArray)

True
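
The general pattern of subclassing an array type and attaching a shared grid can be sketched with plain Python classes. This is an illustration of the idea only, not UXarray's or Xarray's source: every class below is a hypothetical stand-in, and the `t2m` values are made up.

```python
class Grid:
    """Stand-in for a grid object holding coordinates and connectivity."""

    def __init__(self, n_face):
        self.n_face = n_face


class DataArray:
    """Stand-in for a generic labeled array (plays the role of xr.DataArray)."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    def mean(self):
        return sum(self.values) / len(self.values)


class GridAwareDataArray(DataArray):
    """Subclass that adds a link to a grid (plays the role of ux.UxDataArray)."""

    def __init__(self, values, name=None, uxgrid=None):
        super().__init__(values, name=name)
        self.uxgrid = uxgrid


class GridAwareDataset:
    """Collection of variables that all share a single grid object."""

    def __init__(self, variables, uxgrid):
        self.uxgrid = uxgrid
        self._variables = dict(variables)

    def __getitem__(self, name):
        # Every variable handed out carries a reference to the same grid object.
        return GridAwareDataArray(self._variables[name], name=name, uxgrid=self.uxgrid)


grid = Grid(n_face=4)
ds = GridAwareDataset({"t2m": [1.0, 2.0, 3.0, 4.0]}, uxgrid=grid)
da = ds["t2m"]

print(issubclass(GridAwareDataArray, DataArray))  # True
print(da.mean())                                  # 2.5, via the inherited method
print(da.uxgrid is ds.uxgrid)                     # True, the grid is shared
```

Passing the grid by reference rather than copying it means that every variable extracted from the dataset sees the same grid object, which mirrors the shared `Grid` described earlier.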

## Overloaded Methods

With subclassing, all Xarray methods are inherited by default. Most of them will execute, but their output may not be what you expect on an unstructured grid. We have re-implemented some of these methods and added many new ones to provide grid-aware counterparts.

These are discussed in detail in the next sections, but one notable example is plotting: the plotting methods are all custom implementations that support unstructured grid visualization.


In [22]:
uxds['t2m'].plot(fig_size=150, colorbar=False, backend='matplotlib')