# Data Structures


In [1]:
import uxarray as ux
import xarray as xr



UXarray provides three data structures for interacting with unstructured grids and the data variables that reside on them:

1. **[`uxarray.Grid`](https://uxarray.readthedocs.io/en/latest/user_api/generated/uxarray.Grid.html)**: Stores the grid representation (coordinates, connectivity information, etc.).
2. **[`uxarray.UxDataset`](https://uxarray.readthedocs.io/en/latest/user_api/generated/uxarray.UxDataset.html)**: Stores one or more data variables that reside on a grid.
3. **[`uxarray.UxDataArray`](https://uxarray.readthedocs.io/en/latest/user_api/generated/uxarray.UxDataArray.html)**: Stores a single data variable that resides on a grid.


## Grid and Data Files

When working with unstructured grid datasets, the grid definition is typically stored separately from any data variables. 

For example, the dataset used in this example is made up of two files: a single grid definition and a single data file.


```
quad-hexagon
│   grid.nc
│   data.nc
```

In [2]:
grid_path = "../../test/meshfiles/ugrid/quad-hexagon/grid.nc"
data_path = "../../test/meshfiles/ugrid/quad-hexagon/data.nc"

Additionally, there may be multiple data files that are mapped to the same unstructured grid (think climate model history files). Using our sample dataset, this may look something like this:

```
quad-hexagon
│   grid.nc
│   data1.nc
│   data2.nc
│   data3.nc
```

We can store these paths as a list. In this case, we simply repeat the original data file three times to imitate having three separate data files.

In [3]:
multiple_data_paths = [data_path for i in range(3)]
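When the data files really are separate files on disk, the list of paths can be gathered from the directory rather than written by hand. The following is a minimal sketch using only the standard library. The helper name `collect_data_paths` and the `data*.nc` filename pattern are assumptions made for illustration and are not part of UXarray.

```python
from pathlib import Path
import tempfile


def collect_data_paths(directory, pattern="data*.nc"):
    """Return the sorted data file paths in `directory` that match `pattern`."""
    return sorted(str(p) for p in Path(directory).glob(pattern))


# Demonstrate on a throwaway directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("grid.nc", "data1.nc", "data2.nc", "data3.nc"):
        (Path(tmp) / name).touch()
    # Keep only the file names so the result is independent of the temp path.
    found = [Path(p).name for p in collect_data_paths(tmp)]
```

Sorting the matches keeps the file order deterministic, which matters when the files are later concatenated along a dimension such as time.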

## Grid

The `Grid` class is used for storing variables associated with an unstructured grid's topology. This includes dimensions, coordinates, and connectivity variables.

### Creating a Grid

The recommended way to construct a `Grid` is with the `ux.open_grid()` function, which takes a grid file path, detects the input grid format, and parses and encodes the provided coordinates and connectivity in the UGRID conventions. Details on supported grid formats and which variables are parsed can be found in other parts of this user guide.

In [4]:
uxgrid = ux.open_grid(grid_path)
uxgrid

<uxarray.Grid>
Original Grid Type: UGRID
Grid Dimensions:
  * n_node: 16
  * n_edge: 19
  * n_face: 4
  * n_max_face_nodes: 6
  * two: 2
  * n_nodes_per_face: (4,)
Grid Coordinates (Spherical):
  * node_lon: (16,)
  * node_lat: (16,)
  * edge_lon: (19,)
  * edge_lat: (19,)
  * face_lon: (4,)
  * face_lat: (4,)
Grid Coordinates (Cartesian):
Grid Connectivity Variables:
  * face_node_connectivity: (4, 6)
  * edge_node_connectivity: (19, 2)
Grid Descriptor Variables:

### Accessing Variables

As we saw above when printing out a `Grid` instance, there are many variables associated with a single grid. In addition to the general repr, we can obtain the stored dimensions, coordinates, and connectivity variables through the following attributes.



In [5]:
uxgrid.dims

{'n_edge', 'n_face', 'n_max_face_nodes', 'n_node', 'two'}

In [6]:
uxgrid.sizes

{'n_node': 16, 'n_face': 4, 'n_edge': 19, 'n_max_face_nodes': 6, 'two': 2}

In [7]:
uxgrid.coordinates

{'edge_lat', 'edge_lon', 'face_lat', 'face_lon', 'node_lat', 'node_lon'}

In [8]:
uxgrid.connectivity

{'edge_node_connectivity', 'face_node_connectivity'}

We can access any desired quantity either by calling an attribute of the same name or by indexing a `Grid` like a dictionary:

In [12]:
uxgrid.node_lon

In [13]:
uxgrid['node_lon']

### Constructing Additional Variables

Looking at `Grid.connectivity` one more time, we can see that there are two available variables. 

In [11]:
uxgrid.connectivity

{'edge_node_connectivity', 'face_node_connectivity'}

These are the variables that could be parsed from the input grid file and encoded in the UGRID conventions.

In addition to parsing variables, we can construct additional variables by calling the attribute or indexing the Grid with the desired name. For example, if we wanted to construct the `face_edge_connectivity`, we would do the following:

In [22]:
uxgrid.face_edge_connectivity

Now if we look at our `Grid.connectivity`, we can see that it now contains our new connectivity variable.

In [23]:
uxgrid.connectivity

{'edge_node_connectivity', 'face_edge_connectivity', 'face_node_connectivity'}

Grid variables are exposed as Python properties. When an attribute is accessed (in the above example, `uxgrid.face_edge_connectivity`), the `Grid` checks whether the variable is already present. If it is available, it is returned directly; otherwise, it is constructed, stored, and then returned.
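The construct-on-first-access behavior can be illustrated with a small pure-Python sketch. The `ToyGrid` class below is hypothetical and is not UXarray's implementation; it only assumes that each face's edges are the consecutive node pairs of that face, and it shows how a property can build `face_edge_connectivity` from the two parsed connectivity variables the first time it is requested.

```python
class ToyGrid:
    """Hypothetical stand-in for a grid object; not UXarray's implementation."""

    def __init__(self, face_node_connectivity, edge_node_connectivity):
        # Variables parsed from the input file live in a plain dict.
        self._vars = {
            "face_node_connectivity": face_node_connectivity,
            "edge_node_connectivity": edge_node_connectivity,
        }

    @property
    def connectivity(self):
        # Names of the connectivity variables currently stored.
        return set(self._vars)

    @property
    def face_edge_connectivity(self):
        # Construct on first access, then reuse the stored result.
        if "face_edge_connectivity" not in self._vars:
            self._vars["face_edge_connectivity"] = self._build_face_edges()
        return self._vars["face_edge_connectivity"]

    def __getitem__(self, name):
        # Dictionary-style access routes through the same properties.
        return getattr(self, name)

    def _build_face_edges(self):
        # Map each undirected edge (as a sorted node pair) to its index.
        edge_index = {
            tuple(sorted(edge)): i
            for i, edge in enumerate(self._vars["edge_node_connectivity"])
        }
        face_edges = []
        for nodes in self._vars["face_node_connectivity"]:
            # Consecutive node pairs, wrapping around, form the face's edges.
            pairs = zip(nodes, nodes[1:] + nodes[:1])
            face_edges.append([edge_index[tuple(sorted(p))] for p in pairs])
        return face_edges


# Two triangles sharing the edge between nodes 1 and 2.
grid = ToyGrid(
    face_node_connectivity=[[0, 1, 2], [1, 3, 2]],
    edge_node_connectivity=[[0, 1], [1, 2], [2, 0], [1, 3], [3, 2]],
)
before = grid.connectivity                 # only the two parsed variables
face_edges = grid.face_edge_connectivity   # triggers construction
after = grid.connectivity                  # now includes the new variable
```

Because the result is stored after the first access, repeated lookups through either `grid.face_edge_connectivity` or `grid["face_edge_connectivity"]` return the same object without rebuilding it.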


## UxDataset

Up to this point, we've exclusively looked at the unstructured grid without any data variables mapped to it. Working with a standalone `Grid` has its applications, such as grid debugging and analysis, but more commonly an unstructured grid is paired with data variables that are mapped to it.

The `UxDataset` class is used for pairing one or more data variables with an unstructured grid. 


### Opening a Single Data File

We can load a pair of grid and data files using the `ux.open_dataset()` function.

```{note}
UXarray's Plotting API is built around the [Holoviews](https://holoviews.org/) package. For details about customization and accepted parameters, please refer to their documentation.
```



In [None]:
uxds = ux.open_dataset(grid_path, data_path)
uxds

### Opening Multiple Data Files

When working with multiple data paths, we can open them using the `ux.open_mfdataset()` function.

In [None]:
uxds_multi = ux.open_mfdataset(grid_path, multiple_data_paths, combine='nested', concat_dim='time')
uxds_multi

## Grid Accessor

Each `UxDataset` (and, in the next section, each `UxDataArray`) is linked to a `Grid` instance, which contains the unstructured grid information.

In [None]:
uxds.uxgrid

## UxDataArray

While a `UxDataset` represents one or more data variables linked to some unstructured grid, a `UxDataArray` represents a single data variable. Alternatively, one can think of a `UxDataset` as a collection of one or more `UxDataArray` instances.

In our sample dataset, we have a variable called `t2m`, which can be used to index our `UxDataset`:


In [None]:
uxds['t2m']

As mentioned before, each `UxDataArray` is linked to a `Grid` instance.

In [None]:
uxds['t2m'].uxgrid

This Grid is identical to the one linked to the `UxDataset`. There is a single `Grid` that is shared by all data variables.

In [None]:
uxds['t2m'].uxgrid == uxds.uxgrid
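One way to picture a single grid being shared by every data variable is a container that hands its own grid object to each variable it returns. The sketch below is plain Python with hypothetical `ToyGrid`, `ToyDataArray`, and `ToyDataset` classes; it illustrates the idea only and makes no claim about how UXarray stores the link internally.

```python
class ToyGrid:
    """Hypothetical grid holding only a face count."""

    def __init__(self, n_face):
        self.n_face = n_face


class ToyDataArray:
    """Hypothetical single variable linked to a grid."""

    def __init__(self, values, uxgrid):
        self.values = values
        self.uxgrid = uxgrid


class ToyDataset:
    """Hypothetical collection of variables that all share one grid."""

    def __init__(self, variables, uxgrid):
        self._variables = variables
        self.uxgrid = uxgrid

    def __getitem__(self, name):
        # Wrap the values and pass along the dataset's own grid object,
        # so no copy of the grid is made per variable.
        return ToyDataArray(self._variables[name], self.uxgrid)


grid = ToyGrid(n_face=4)
ds = ToyDataset(
    {"t2m": [290.0, 291.0, 292.0, 293.0], "other": [1.0, 2.0, 3.0, 4.0]},
    grid,
)
shared_with_dataset = ds["t2m"].uxgrid is ds.uxgrid
shared_between_variables = ds["t2m"].uxgrid is ds["other"].uxgrid
```

Sharing one object means that any variable constructed on the grid later is visible from every data variable linked to it.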

### Functionality

Just like with Xarray, we can perform various operations on our data variable.


In [None]:
uxds['t2m'].min()

In [None]:
uxds['t2m'].mean()

UXarray also provides custom data analysis operators which are explored in further sections of this user guide.

In [None]:
uxds['t2m'].gradient()

## Inheritance from Xarray

For those who are familiar with Xarray, the naming of the methods and data structures will look familiar. UXarray aims to provide an experience similar to Xarray's by inheriting from `xr.Dataset` and `xr.DataArray` and linking them to an instance of the `Grid` class to provide grid-aware implementations.

We can observe this inheritance by checking for subclassing.

In [None]:
issubclass(ux.UxDataset, xr.Dataset)

In [None]:
issubclass(ux.UxDataArray, xr.DataArray)

## Overloaded Methods

With subclassing, all methods are inherited by default. This means that while most will execute, their output may not be as expected, because the inherited implementations are not aware of the unstructured grid. We have re-implemented some of these methods and added many new ones to provide grid-aware counterparts to the Xarray originals.

These are discussed in detail in the next sections, but one notable example is plotting: all of the plotting methods are custom implementations that support unstructured grid visualization.


In [None]:
uxds['t2m'].plot(fig_size=150, colorbar=False, backend='matplotlib')
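The combination of inherited and overridden methods described in this section can be sketched in plain Python. The classes `BaseArray` and `GridAwareArray` below are hypothetical stand-ins, not Xarray or UXarray classes, and the returned strings only mark which implementation ran.

```python
class BaseArray:
    """Hypothetical stand-in for a structured-data array class."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        # Inherited unchanged by the subclass below.
        return sum(self.values) / len(self.values)

    def plot(self):
        # Assumes the values sit on a regular grid.
        return f"regular-grid plot of {len(self.values)} values"


class GridAwareArray(BaseArray):
    """Hypothetical subclass that carries a reference to a grid."""

    def __init__(self, values, grid):
        super().__init__(values)
        self.grid = grid

    def plot(self):
        # Overridden: uses the linked grid instead of assuming regular spacing.
        n_face = self.grid["n_face"]
        return f"unstructured plot of {len(self.values)} values on {n_face} faces"


grid = {"n_face": 4}  # toy grid description
arr = GridAwareArray([290.0, 291.0, 292.0, 293.0], grid)
mean_value = arr.mean()   # runs the inherited implementation
plot_label = arr.plot()   # runs the grid-aware override
```

Methods that do not depend on the grid layout, such as `mean()` here, can be inherited as they are, while methods that do depend on it are the ones that need a grid-aware override.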