# End-to-End Analysis and Visualization of E3SM Data using UXarray and xCDAT

E3SM Tutorial Workshop 2024

05/07/2024

Authors: [Tom Vo](https://github.com/tomvothecoder) and [Stephen Po-Chedley](https://github.com/pochedls)


## Overview

This exercise notebook walks through an example end-to-end analysis workflow for E3SM
native format data using UXarray and xCDAT. It explores the capabilities of UXarray and xCDAT
at a high level, including grid analysis and computational operations. Towards the end,
there will be brief coverage of parallelizing Xarray-based operations using Dask.


1. Open E3SM Data with Grid Files using UXarray
2. Run some UXarray operations (e.g., view the grid topology, calculate the grid area, subset the grid)
3. Transform the data to make it compatible with xCDAT
4. Use xCDAT for post-processing operations
5. Visualize data
6. Advanced users: parallelizing operations (Dask Schedulers)


## Setup


In [7]:
import glob
import warnings

import numpy as np
import xarray as xr
import xcdat as xc
import uxarray as ux

# The data directory containing the NetCDF files.
# TODO: Update to perlmutter directory
data_dir = "/p/user_pub/work/E3SM/1_0/1950-Control-21yrContHiVol-HR/0_25deg_atm_18-6km_ocean/atmos/native/model-output/mon/ens1/v1/"
# The absolute paths to each NetCDF file in the data directory, sorted so the
# files are listed in a deterministic order (glob order is arbitrary).
data_paths = sorted(glob.glob(data_dir + "*.nc"))

# The path to the grid file.
grid_path = "/p/user_pub/e3sm/grids_maps/grids/ne120.g"

## I/O and Computations with UXarray

UXarray offers support for loading and representing unstructured grids by providing Xarray-like functionality paired with new routines that are specifically written for operating on unstructured grids.

Source: https://uxarray.readthedocs.io/en/latest/examples/001-working-with-unstructured-grids.html#


### Exercise 1: Open E3SM Dataset with Grid Files using UXarray

When working with unstructured grids, the grid definition and the data variables are often stored in separate files, so several files must be read and linked together to represent the full dataset.

A `ux.UxDataset` object is an `xarray.Dataset`-like, multi-dimensional, in-memory array database. It inherits from `xarray.Dataset` and adds unstructured-grid-aware operators and attributes through its `uxgrid` attribute.

Source: https://uxarray.readthedocs.io/en/latest/getting-started/overview.html
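To make the grid/data split concrete, here is a minimal NumPy sketch of what each file contributes. The names `node_lon`, `node_lat`, and `face_node_connectivity` follow the UGRID-style naming UXarray uses on `uxgrid`, but the two-triangle grid and the values are made up (not E3SM data), and `link_data_to_grid` is a hypothetical helper, not a UXarray function.

```python
import numpy as np

# "Grid file": node coordinates (degrees) and, for each face,
# the indices of the nodes that form its corners. Made-up values.
node_lon = np.array([0.0, 90.0, 0.0, 45.0])
node_lat = np.array([0.0, 0.0, 90.0, -45.0])
face_node_connectivity = np.array([
    [0, 1, 2],  # face 0
    [0, 3, 1],  # face 1
])
n_face = face_node_connectivity.shape[0]

# "Data file": one value per face, with no geometry of its own.
ts = np.array([288.5, 291.2])


def link_data_to_grid(values, n_face):
    """Hypothetical helper: pair a face-centered variable with a grid,
    checking that the variable has exactly one value per face."""
    if values.shape[-1] != n_face:
        raise ValueError(f"expected {n_face} face values, got {values.shape[-1]}")
    return {"n_face": n_face, "values": values}


linked = link_data_to_grid(ts, n_face)
print(linked["n_face"], linked["values"])
```

The size check is the essential part of "linking": the data file only makes sense when its element dimension matches the grid it is paired with.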


#### 💻 Your turn:

Use `ux.open_mfdataset()` to open the grid file and the NetCDF files as a `ux.UxDataset` object.

Hint: Use `grid_path` and `data_paths` as function arguments.


In [8]:
# Your code here. When ready, click on the three dots below for the solution.

In [11]:
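# Only the first NetCDF file is opened here (data_paths[0:1]);
# pass the full data_paths list to open every file.
uxds = ux.open_mfdataset(grid_path, data_paths[0:1])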

### Exercise 2: Visualize the Grid Topology

https://uxarray.readthedocs.io/en/latest/examples/006-plot-api-topology.html


#### 💻 Your turn:

Extract the grid topology from the `uxds` and plot it.

Hint: Use the `.uxgrid` attribute and call `.plot()`


In [None]:
# Your code here. When ready, click on the three dots below for the solution.

In [None]:
grid = uxds.uxgrid
grid.plot(title="Default Grid Plot Method", height=350, width=700)
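A topology plot like the one above draws the grid's edges. As a rough sketch of where those edges come from (not UXarray's implementation), the unique edges can be derived from face-node connectivity: each consecutive pair of corner nodes in a face is an edge, and an edge shared by two faces is counted once. The octahedron used here is a made-up test grid with 6 nodes, 8 triangular faces, and 12 edges.

```python
import numpy as np


def unique_edges(face_node_connectivity):
    """Return the sorted (node_a, node_b) pairs for every distinct edge."""
    edges = set()
    for face in face_node_connectivity:
        n = len(face)
        for i in range(n):
            a, b = int(face[i]), int(face[(i + 1) % n])
            edges.add((min(a, b), max(a, b)))  # orientation-independent key
    return np.array(sorted(edges))


# Made-up octahedron: nodes 0-5 are +x, -x, +y, -y, +z, -z.
faces = np.array([
    [0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
    [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5],
])
edges = unique_edges(faces)
print(len(edges))  # 12
```

A quick consistency check is Euler's formula for a closed spherical mesh: nodes - edges + faces = 6 - 12 + 8 = 2.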

### Exercise 3: Grid Calculations

TODO: https://uxarray.readthedocs.io/en/latest/examples/003-area-calc.html


#### 💻 Your turn:

Calculate the total face area of the grid.

Hint: Use `.calculate_total_face_area()`.


In [None]:
# Your code here. When ready, click on the three dots below for the solution.

In [15]:
# Sum of the areas of all faces in the grid.
t4_area = grid.calculate_total_face_area()
t4_area
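On a global grid the face areas should add up to the area of the whole sphere, which makes a handy sanity check for `calculate_total_face_area()`. The sketch below assumes areas on a unit sphere, so the expected total is 4π ≈ 12.566 (an assumption; check the UXarray area-calculation docs linked above for the units your version reports). It uses the Van Oosterom and Strackee formula for spherical triangles, which is not necessarily the method UXarray uses, on a made-up octahedron grid.

```python
import numpy as np


def lonlat_to_xyz(lon_deg, lat_deg):
    """Convert degrees to unit-sphere Cartesian coordinates."""
    lon, lat = np.deg2rad(lon_deg), np.deg2rad(lat_deg)
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )


def spherical_triangle_area(a, b, c):
    """Area (solid angle) of the spherical triangle with unit-vector corners
    a, b, c, using the Van Oosterom and Strackee formula."""
    numerator = abs(np.dot(a, np.cross(b, c)))
    denominator = 1.0 + np.dot(a, b) + np.dot(b, c) + np.dot(c, a)
    return 2.0 * np.arctan2(numerator, denominator)


# Made-up octahedron nodes (+x, -x, +y, -y, +z, -z) as lon/lat in degrees.
node_lon = np.array([0.0, 180.0, 90.0, -90.0, 0.0, 0.0])
node_lat = np.array([0.0, 0.0, 0.0, 0.0, 90.0, -90.0])
faces = np.array([
    [0, 2, 4], [2, 1, 4], [1, 3, 4], [3, 0, 4],
    [2, 0, 5], [1, 2, 5], [3, 1, 5], [0, 3, 5],
])

xyz = lonlat_to_xyz(node_lon, node_lat)
face_areas = np.array([spherical_triangle_area(*xyz[f]) for f in faces])
total_area = face_areas.sum()
print(total_area, 4 * np.pi)  # both approximately 12.566
```

Each octahedron face covers one eighth of the sphere (π/2), so a total far from 4π would point to missing or overlapping faces.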

### Exercise 4: Subsetting an Unstructured Grid

TODO: https://uxarray.readthedocs.io/en/latest/examples/009-subsetting.html#


In [23]:
uxds.TS.subset

<uxarray.UxDataArray.subset>
Supported Methods:
  * nearest_neighbor(center_coord, k, element, **kwargs)
  * bounding_circle(center_coord, r, element, **kwargs)
  * bounding_box(lon_bounds, lat_bounds, element, method, **kwargs)
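Conceptually, `bounding_box(lon_bounds, lat_bounds, ...)` keeps the elements that fall inside a lon/lat box; with UXarray itself the call would be along the lines of `uxds.TS.subset.bounding_box(lon_bounds, lat_bounds)` with the bounds given as pairs, per the signature listed above. The NumPy sketch below only illustrates the idea for face centers. The coordinates and the `bounding_box_mask` helper are made up, it is not UXarray's implementation, and it ignores boxes that cross the antimeridian.

```python
import numpy as np


def bounding_box_mask(face_lon, face_lat, lon_bounds, lat_bounds):
    """Hypothetical helper: True where a face center lies inside the box.
    Assumes lon_bounds[0] <= lon_bounds[1] (no antimeridian crossing) and
    that face_lon uses the same convention (-180..180 here) as the bounds."""
    lon_min, lon_max = lon_bounds
    lat_min, lat_max = lat_bounds
    return (
        (face_lon >= lon_min) & (face_lon <= lon_max)
        & (face_lat >= lat_min) & (face_lat <= lat_max)
    )


# Made-up face centers (degrees) and a face-centered variable.
face_lon = np.array([-120.0, -100.0, -80.0, 10.0, 150.0])
face_lat = np.array([35.0, 40.0, 45.0, -20.0, 60.0])
ts = np.array([290.0, 285.0, 280.0, 300.0, 270.0])

mask = bounding_box_mask(face_lon, face_lat, (-125.0, -65.0), (25.0, 50.0))
print(np.flatnonzero(mask), ts[mask])  # [0 1 2] [290. 285. 280.]
```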

In [24]:
uxds.TS.uxgrid.subset

<uxarray.Grid.subset>
Supported Methods:
  * nearest_neighbor(center_coord, k, element, **kwargs)
  * bounding_circle(center_coord, r, element, **kwargs)
  * bounding_box(lon_bounds, lat_bounds, element, method, **kwargs)
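`nearest_neighbor(center_coord, k, ...)` returns the k elements closest to a point. A minimal brute-force sketch of that idea follows, assuming distance is the great-circle (haversine) angle between face centers. The helper `k_nearest_faces` and the coordinates are made up, and UXarray's own implementation may use a spatial tree instead of comparing every face.

```python
import numpy as np


def great_circle_angle(lon1, lat1, lon2, lat2):
    """Central angle in radians between points given in degrees (haversine)."""
    lon1, lat1, lon2, lat2 = map(np.deg2rad, (lon1, lat1, lon2, lat2))
    h = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


def k_nearest_faces(face_lon, face_lat, center_coord, k):
    """Hypothetical helper: indices of the k face centers nearest center_coord,
    where center_coord is a (lon, lat) pair in degrees."""
    dist = great_circle_angle(face_lon, face_lat, *center_coord)
    return np.argsort(dist)[:k]


# Made-up face centers (degrees).
face_lon = np.array([-120.0, -100.0, -80.0, 10.0, 150.0])
face_lat = np.array([35.0, 40.0, 45.0, -20.0, 60.0])

nearest = k_nearest_faces(face_lon, face_lat, center_coord=(-98.0, 38.0), k=2)
print(nearest)  # [1 2]
```

`bounding_circle(center_coord, r, ...)` follows the same pattern, keeping every face whose distance is at most `r` instead of the `k` smallest.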

### Exercise 5: Working with MPAS Grids

TODO: https://uxarray.readthedocs.io/en/latest/examples/004-working-with-mpas-grids.html


## Computations with xCDAT

TODO:


## Check the CF attributes

xCDAT requires CF attributes to be set on axes so that its APIs can map to them
for operations. For example, the time coordinate should have either `axis="T"` or
`standard_name="time"`.
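As a sketch of what that means in practice, the snippet below builds a tiny made-up stand-in for native E3SM output in which `lat` and `lon` are plain data variables along `ncol` (an assumption about the file layout that would be consistent with `X, Y: n/a` in the `.cf` output below), then promotes them to coordinates with CF attributes. Whether xCDAT's spatial APIs can then operate on an unstructured `ncol` dimension is a separate question.

```python
import numpy as np
import xarray as xr

# Made-up stand-in for native E3SM output: lat/lon are data variables on ncol.
ds = xr.Dataset(
    data_vars={
        "TS": (("time", "ncol"), np.full((2, 4), 288.0)),
        "lat": ("ncol", np.array([-45.0, -15.0, 15.0, 45.0])),
        "lon": ("ncol", np.array([0.0, 90.0, 180.0, 270.0])),
    },
    coords={"time": ("time", np.array([0.0, 31.0]))},
)

# Promote lat/lon to coordinates and attach the CF metadata that tools such
# as xCDAT and cf_xarray look for when mapping axes.
ds = ds.set_coords(["lat", "lon"])
ds["lat"].attrs.update(axis="Y", standard_name="latitude", units="degrees_north")
ds["lon"].attrs.update(axis="X", standard_name="longitude", units="degrees_east")
ds["time"].attrs.update(axis="T", standard_name="time")

print(list(ds.coords))
```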


In [6]:
uxds.cf

Coordinates:
             CF Axes: * Z: ['ilev', 'lev']
                      * T: ['time']
                        X, Y: n/a

      CF Coordinates: * vertical: ['ilev', 'lev']
                      * time: ['time']
                        longitude, latitude: n/a

       Cell Measures:   area, volume: n/a

      Standard Names: * atmosphere_hybrid_sigma_pressure_coordinate: ['ilev', 'lev']

              Bounds:   n/a

       Grid Mappings:   n/a

Data Variables:
       Cell Measures:   area, volume: n/a

      Standard Names:   n/a

              Bounds:   T: ['time_bnds']
                        cosp_ht: ['cosp_ht_bnds']
                        cosp_htmisr: ['cosp_htmisr_bnds']
                        cosp_prs: ['cosp_prs_bnds']
                        cosp_sr: ['cosp_sr_bnds']
                        cosp_tau: ['cosp_tau_bnds']
                        cosp_tau_modis: ['cosp_tau_modis_bnds']
                        time: ['time_bnds']

       Grid Mappings:   n/a

## Decode the time coordinates

- `ux.open_dataset()` and `ux.open_mfdataset()` set `decode_times=False`.
- An extra call to `xr.decode_cf()` is required to decode time coordinates before
  using xCDAT temporal averaging APIs.

> NOTICE: This will convert a `uxarray.UxDataset` to an `xarray.Dataset`
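Decoding turns raw offsets (e.g. `31.0`) plus a `units` string into calendar dates. E3SM output typically uses a 365-day `noleap` calendar, and pandas-backed `datetime64[ns]` values only support the proleptic Gregorian calendar for roughly the years 1677 to 2262, which is why `use_cftime=True` is passed in the cell below. The pure-Python sketch decodes offsets by hand, assuming units of "days since 0001-01-01" on a noleap calendar; both the units string and the helper name are assumptions for illustration, and in practice `xr.decode_cf()` and cftime do this for you.

```python
# Days in each month of a 365-day ("noleap") calendar.
NOLEAP_MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def decode_noleap_days(offset_days, ref_year=1):
    """Hypothetical helper: convert a non-negative offset in
    'days since <ref_year>-01-01' on a noleap calendar to (year, month, day)."""
    whole_days = int(offset_days)
    year = ref_year + whole_days // 365
    day_of_year = whole_days % 365
    for month, length in enumerate(NOLEAP_MONTH_LENGTHS, start=1):
        if day_of_year < length:
            return (year, month, day_of_year + 1)
        day_of_year -= length
    raise AssertionError("unreachable: day_of_year is always < 365")


for raw in [0.0, 31.0, 59.0, 365.0]:
    print(raw, decode_noleap_days(raw))
```

Offset `59.0` decodes to March 1 because February always has 28 days on this calendar.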


In [53]:
uxds_cf = xr.decode_cf(uxds, decode_times=True, use_cftime=True)

In [24]:
uxds_cf["T"].time

### Exercise 1: Temporal Averaging


In [25]:
uxds_cf.temporal.average("T")["T"]

| | Array | Chunk |
|---|---|---|
| Bytes | 427.15 MiB | 427.15 MiB |
| Shape | (72, 777602) | (72, 777602) |
| Dask graph | 1 chunks in 19 graph layers | 1 chunks in 19 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |

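As we understand the xCDAT docs, `temporal.average()` is weighted by default: each time step counts in proportion to the length of its time bounds (here `time_bnds`), so a 31-day month counts more than a 28-day month. A NumPy sketch of that weighting, with made-up monthly values and noleap bounds standing in for the real `time_bnds`:

```python
import numpy as np

# Made-up monthly means (Jan-Mar) and their time bounds in days on a
# noleap calendar, standing in for the dataset's time_bnds variable.
values = np.array([280.0, 282.0, 290.0])
time_bnds = np.array([[0.0, 31.0], [31.0, 59.0], [59.0, 90.0]])

# Weight each step by the length of its bounds (31, 28, 31 days).
bound_days = np.diff(time_bnds, axis=1).ravel()
weights = bound_days / bound_days.sum()

weighted_mean = float(np.sum(values * weights))
unweighted_mean = float(values.mean())
print(round(weighted_mean, 4), unweighted_mean)  # 284.0667 284.0
```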

In [26]:
uxds_cf.temporal.group_average("T", freq="day")["T"]

| | Array | Chunk |
|---|---|---|
| Bytes | 854.30 MiB | 427.15 MiB |
| Shape | (2, 72, 777602) | (1, 72, 777602) |
| Dask graph | 2 chunks in 36 graph layers | 2 chunks in 36 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |

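`group_average(..., freq="day")` averages all time steps that fall on the same calendar date and returns one value per date. A rough sketch of that grouping with made-up 6-hourly samples follows. It uses equal weights within each day for simplicity; xCDAT weights by time-bound length by default, which matches equal weights only when every step in a day has the same bound length.

```python
from collections import defaultdict

import numpy as np

# Made-up 6-hourly samples: ((year, month, day), hour) and their values.
stamps = [((1, 1, 1), 0), ((1, 1, 1), 6), ((1, 1, 1), 12), ((1, 1, 1), 18),
          ((1, 1, 2), 0), ((1, 1, 2), 6)]
values = np.array([280.0, 282.0, 286.0, 284.0, 279.0, 281.0])

# Collect values by calendar date, then average within each group.
groups = defaultdict(list)
for (ymd, _hour), value in zip(stamps, values):
    groups[ymd].append(value)

daily_means = {ymd: float(np.mean(vals)) for ymd, vals in sorted(groups.items())}
print(daily_means)  # {(1, 1, 1): 283.0, (1, 1, 2): 280.0}
```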

In [27]:
uxds_cf.temporal.climatology("T", freq="day")["T"]

| | Array | Chunk |
|---|---|---|
| Bytes | 427.15 MiB | 427.15 MiB |
| Shape | (1, 72, 777602) | (1, 72, 777602) |
| Dask graph | 1 chunks in 26 graph layers | 1 chunks in 26 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |

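`climatology(..., freq="day")` differs from `group_average()` in that it collapses the years: every time step is grouped by its month and day only, so the same calendar day in different years lands in one group. A sketch with made-up daily samples from two years, using equal weights for simplicity:

```python
from collections import defaultdict

import numpy as np

# Made-up daily samples from two years: ((year, month, day), value).
samples = [((1, 1, 1), 280.0), ((1, 1, 2), 281.0),
           ((2, 1, 1), 284.0), ((2, 1, 2), 279.0)]

# Group on (month, day) only, ignoring the year.
buckets = defaultdict(list)
for (_year, month, day), value in samples:
    buckets[(month, day)].append(value)

climatology = {md: float(np.mean(v)) for md, v in sorted(buckets.items())}
print(climatology)  # {(1, 1): 282.0, (1, 2): 280.0}
```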

In [28]:
uxds_cf.temporal.departures("T", freq="day")["T"]

| | Array | Chunk |
|---|---|---|
| Bytes | 854.30 MiB | 427.15 MiB |
| Shape | (2, 72, 777602) | (1, 72, 777602) |
| Dask graph | 2 chunks in 60 graph layers | 2 chunks in 60 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |

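`departures()` produces anomalies: the data grouped at `freq` minus the matching climatology. The sketch below redefines the same made-up daily samples so it runs on its own, builds a daily climatology with equal weights, and subtracts it from each sample.

```python
from collections import defaultdict

import numpy as np

# Made-up daily samples from two years: ((year, month, day), value).
samples = [((1, 1, 1), 280.0), ((1, 1, 2), 281.0),
           ((2, 1, 1), 284.0), ((2, 1, 2), 279.0)]

# Daily climatology: mean over years for each (month, day).
buckets = defaultdict(list)
for (_year, month, day), value in samples:
    buckets[(month, day)].append(value)
climatology = {md: np.mean(v) for md, v in buckets.items()}

# Departure (anomaly) = value minus the climatology for its (month, day).
departures = [
    (ymd, float(value - climatology[ymd[1:]])) for ymd, value in samples
]
print(departures)
```

By construction, the departures for each (month, day) sum to zero across years.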

### Convert UXarray unstructured dataset to Xarray structured dataset for xCDAT spatial averaging

TODO:

Related issue: https://github.com/xCDAT/xcdat/issues/89


In [29]:
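# NOTE: These arrays do not line up. "lat" and "lon" have one value per grid
# column, "time" has one value per time step, and the flattened TREFHT has
# (time steps x columns) values, so building the DataFrame in the next cell
# raises a ValueError.
series = {
    "lat": uxds["lat"],
    "lon": uxds["lon"],
    "time": uxds["time"],
    "TREFHT": uxds["TREFHT"].values.flatten(),
}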

In [30]:
import pandas as pd

pd.DataFrame(series).set_index(["lat", "lon", "time"]).to_xarray()

ValueError: All arrays must be of the same length

In [36]:
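# NOTE: set_index() expects the name of an existing coordinate, not a
# DataArray, which is why this raises the TypeError below.
uxds["TREFHT"].set_index(lat=uxds["lat"])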

TypeError: unhashable type: 'UxDataArray'
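The `ValueError` above comes from pairing per-column `lat`/`lon`, per-step `time`, and a flattened `(time, column)` array of a different length. The sketch below, with made-up values for 2 time steps and 3 columns, shows one way to broadcast the coordinates so every row lines up, assuming the variable's dimension order is `(time, column)`. Even with matching lengths, `set_index([...]).to_xarray()` builds the full Cartesian product of the unique index values; on an unstructured grid the `lat`/`lon` pairs do not form a regular lattice, so the result would be mostly NaN and very large, which is why regridding is a more likely route to a structured dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins: 2 time steps, 3 unstructured columns.
time = np.array([0.0, 31.0])
lat = np.array([-30.0, 10.0, 50.0])
lon = np.array([20.0, 140.0, 260.0])
trefht = np.array([[290.0, 295.0, 280.0],
                   [291.0, 296.0, 281.0]])  # assumed dims: (time, column)

ntime, ncol = trefht.shape
series = {
    # Repeat the column coordinates once per time step...
    "lat": np.tile(lat, ntime),
    "lon": np.tile(lon, ntime),
    # ...and each time value once per column, matching C-order flattening.
    "time": np.repeat(time, ncol),
    "TREFHT": trefht.flatten(),
}
df = pd.DataFrame(series).set_index(["lat", "lon", "time"])

# Size of the dense array to_xarray() would create: unique lat x lon x time.
full_size = np.prod([len(level) for level in df.index.levels])
print(len(df), full_size)  # 6 18
```

Here only 6 of the 18 dense cells would hold data; on a 777602-column grid the gap would be far larger.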