# CubedSphereCommunicator

Note: Some features of this notebook, such as `gather`, may require moving the notebook to the `notebooks/` directory.

## Getting Started

This notebook is a little different from the others, because we're going to run it using MPI through ipyparallel. The `ipcluster` command will start a cluster of MPI processes. We will use `%autopx` to enable a parallel mode where the code you run in each cell runs on all ranks in the cluster, instead of being run on the notebook process (which is not in the cluster). At the end, we will use `%autopx` again to switch back to the notebook process and shut down the cluster.

While running this notebook, you might encounter a deadlock or get into an otherwise inconsistent state. If that happens, go to the bottom two cells of this notebook starting with `%autopx` and run them, then start the notebook again from the top. If this doesn't work, you should shut down this notebook, use the console to kill the `mpiexec` process, and then restart the notebook. Alternatively, you can cancel the Orion job hosting the notebook and submit a new job with a new notebook server instance.

So without further ado, let's start the ipcluster. Make sure the workshop directory corresponds to where you've checked out the workshop repository.

In [None]:
%%bash
if [ "${DOCKER}" != "True" ] ; then
    source ~/workshop/venv/bin/activate
fi
ipcluster start --profile=mpi -n 6 --daemonize
sleep 10  # command is asynchronous, so let's wait to avoid an error in the next cell

In [None]:
import ipyparallel as ipp
rc = ipp.Client(profile='mpi', targets='all', block=True)
dv = rc[:]
dv.activate()
dv.block = True
print("Running IPython Parallel on {0} MPI engines".format(len(rc.ids)))
print("Commands in the following cells will be executed in parallel (disable with %autopx)")
%autopx

The last line of output above should say `%autopx enabled`. If it says `%autopx disabled`, run a new cell with only `%autopx`.

Let's confirm MPI is working and running on 6 ranks.

In [None]:
from mpi4py import MPI

comm = MPI.COMM_WORLD
mpi_size = comm.Get_size()
mpi_rank = comm.Get_rank()

print(f"Number of ranks is {mpi_size}.")
print(f"I am rank {mpi_rank}.")

Now we're ready to import the symbols we're going to use from `fv3gfs.util`, as well as other packages.

In [None]:
from fv3gfs.util import (
    TilePartitioner, CubedSpherePartitioner, CubedSphereCommunicator, Quantity,
    X_DIM, Y_DIM, Z_DIM, X_INTERFACE_DIM, Y_INTERFACE_DIM
)
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

# we will add scatter and gather to CubedSphereCommunicator in the future,
# for now this function is a limited implementation for this workshop.
# This lets us do some plotting.
from tools import gather

Then we'll initialize a partitioner and communicator for a 1-by-1 tile layout (i.e. one rank for each tile face). We're using a reduced layout because it is easier to read the output from 6 ranks than 24.

In [None]:
# careful changing this, we only initialized 6 ranks!
# if you do explore different layouts, start a new ipcluster
# layouts bigger than (2, 2) may not work (too many processes on one node)
layout = (1, 1)
cube = CubedSphereCommunicator(
    MPI.COMM_WORLD,
    CubedSpherePartitioner(
        TilePartitioner(layout)
    ),
)
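
The warning in the comments above comes down to simple arithmetic: a cubed sphere has six tiles, and each tile is split into `layout[0] * layout[1]` ranks. Here is a minimal sketch of that count. The helper name `total_ranks` is ours for illustration and is not part of `fv3gfs.util`.

```python
def total_ranks(layout):
    """Number of MPI ranks needed for a cubed sphere with this per-tile layout."""
    n_tiles = 6  # a cube has six faces
    return n_tiles * layout[0] * layout[1]


for layout in [(1, 1), (2, 2)]:
    print(layout, "needs", total_ranks(layout), "ranks")
```

If you change `layout`, the number of engines passed to `ipcluster start -n` has to match this count.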

## A Quick Look

The focus of this session has been to get to the point where we can do a halo update. So let's finally do one!

We'll start by initializing a quantity on each rank filled with the rank's value. When we halo update it, we can see which halos are updated from which nearby ranks!

In [None]:
quantity = Quantity(
    np.full((6, 6), cube.rank, dtype=np.float64),
    origin=(1, 1),
    extent=(4, 4),
    dims=[X_DIM, Y_DIM],
    units="",
    gt4py_backend="numpy",
)
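
To make `origin` and `extent` concrete, here is a small sketch in plain Python (no `fv3gfs.util` needed) that labels each point of a 6-by-6 array as halo or compute domain. It assumes the compute domain starts at `origin` and spans `extent` points along each axis. The helper `domain_mask` is hypothetical and only for illustration.

```python
def domain_mask(shape, origin, extent):
    """Return a nested list with 'C' for compute-domain points and 'h' for halo points."""
    mask = []
    for i in range(shape[0]):
        row = []
        for j in range(shape[1]):
            in_x = origin[0] <= i < origin[0] + extent[0]
            in_y = origin[1] <= j < origin[1] + extent[1]
            row.append("C" if in_x and in_y else "h")
        mask.append(row)
    return mask


mask = domain_mask(shape=(6, 6), origin=(1, 1), extent=(4, 4))
for row in mask:
    print(" ".join(row))
```

The ring of `h` points is one point wide: this is the single halo point the quantity was allocated with.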

First let's plot the initial state of each quantity, using the same colorbar on each rank. This should show each rank's data is constant.

In [None]:
plt.figure()
# transpose is needed so the plot shows the first axis as the x axis
plt.pcolormesh(quantity.data[:].T, vmin=0, vmax=5)

Now let's do a halo update! This part is up to you. Remember that in the allocation above, we only gave the quantity one halo point. When you plot the data afterwards, you should see the halos filled with data from adjacent tiles.

In [None]:
# answer: cube.halo_update(quantity, n_points=1)

In [None]:
plt.figure()
# transpose is needed so the plot shows the first axis as the x axis
plt.pcolormesh(quantity.data[:].T, vmin=0, vmax=5)

We see (hopefully) that the corner data is unchanged (and that the rest of the halo is updated). When we're using one rank per tile face, why do we expect this to be the case?

Next, let's plot the data from all ranks together, so we can better see the relationship between ranks. First, we need to gather all the data onto the root rank. Notice that the other ranks do not receive data.

In [None]:
global_quantity = gather(quantity, cube)
print(global_quantity)

Now that we have all the data on one rank, you could use any approach you like to plot it on the globe. Cartopy projections can work well for this. Since Cartopy is hard to install, we've implemented a simple "flattened cube" projection below.

To plot on a cube with pcolormesh, each tile's 2D data needs to be plotted in a separate command. This also requires a value for vmin and vmax, so that the same colormap bounds are used for every tile!

First, we'll generate the x and y positions on the plot for each rank.

In [None]:
def get_X_Y(shape):
    X = np.zeros([shape[0], shape[1] + 1, shape[2] + 1]) + np.arange(0, shape[1] + 1)[None, :, None]
    Y = np.zeros([shape[0], shape[1] + 1, shape[2] + 1]) + np.arange(0, shape[2] + 1)[None, None, :]
    # offset and rotate the data for each rank, with zero at the "center"
    for tile, shift_x, shift_y, n_rotations in [
        (1, 1, 0, 0), (2, 0, 1, -1), (3, 2, 0, 1), (4, -1, 0, 1), (5, 0, -1, 0)
    ]:
        X[tile, :, :] += shift_x * shape[1]
        Y[tile, :, :] += shift_y * shape[2]
        X[tile, :, :] = np.rot90(X[tile, :, :], n_rotations)
        Y[tile, :, :] = np.rot90(Y[tile, :, :], n_rotations)
    return X, Y
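
To see where each tile lands in the flattened cube, here is a rough sketch that uses only the shifts from `get_X_Y` and ignores the rotations (rotating a tile changes its orientation, not its position on the plot). The tile numbers are the zero-based indices along the first axis of the gathered array. The names `TILE_SHIFTS` and `flattened_layout` are ours for illustration.

```python
# (tile, shift_x, shift_y) copied from get_X_Y; tile 0 stays at the origin.
TILE_SHIFTS = [(0, 0, 0), (1, 1, 0), (2, 0, 1), (3, 2, 0), (4, -1, 0), (5, 0, -1)]


def flattened_layout(shifts):
    """Return text rows (top row first) showing where each tile lands."""
    position = {(sx, sy): tile for tile, sx, sy in shifts}
    xs = [sx for _, sx, _ in shifts]
    ys = [sy for _, _, sy in shifts]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):  # plot y increases upward
        cells = [str(position.get((x, y), ".")) for x in range(min(xs), max(xs) + 1)]
        rows.append(" ".join(cells))
    return rows


layout_rows = flattened_layout(TILE_SHIFTS)
print("\n".join(layout_rows))
```

Four tiles form a horizontal strip, with one tile above and one below the tile at the origin.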

In [None]:
if global_quantity is not None:  # only true on the first rank
    X, Y = get_X_Y(global_quantity.extent)

Then we can use these positions with pcolormesh to plot the cube. Including an `if` statement ensures the plotting only happens on the first rank.

In [None]:
if global_quantity is not None:
    plt.figure(figsize=(9, 5.5))
    for tile in range(global_quantity.extent[0]):
        im = plt.pcolormesh(X[tile, :, :], Y[tile, :, :], global_quantity.view[tile, :, :], vmin=0, vmax=5)
    plt.colorbar(im)
    plt.show()

Let's look again at the cube image, keeping in mind rank 0 is on tile 1 (we numbered the tiles based on FV3GFS's restart file indices). Do the positions of the ranks appear as expected in your plots above?

<img src="files/images/cube.png">

Here's the plotting code above wrapped in a function so it can be reused.

In [None]:
def plot_global(quantity, vmin, vmax):
    # gather is written for this workshop to collapse any third
    # dimension (e.g. vertical) by selecting only the first index
    global_quantity = gather(quantity, cube)
    if global_quantity is not None:  # only on first rank
        X, Y = get_X_Y(global_quantity.extent)
        plt.figure(figsize=(9, 5.5))
        for tile in range(global_quantity.extent[0]):
            im = plt.pcolormesh(
                X[tile, :, :],
                Y[tile, :, :],
                global_quantity.view[tile, :, :],
                vmin=vmin,
                vmax=vmax,
            )
        plt.colorbar(im)
        # we don't plt.show() here in case you want to run more commands after plot_global

Now that we can perform halo updates, we'll look at some more interesting data. Here we provide a routine to load latitude and longitude data from a C48 data file. You can use sizes that evenly divide 48, like 6, 8, or 12.

These routines run in each process and load only the data needed for the current rank, which is the same as a tile for this notebook.
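
The coarsening works by subsampling: `get_lat_lon_bounds` below keeps every `48 / nx`-th cell corner out of the 49 corners along each edge of a C48 tile. Here is a sketch of that index selection in plain Python, assuming `n_cells` divides 48. The helper name `corner_indices` is ours for illustration.

```python
def corner_indices(n_cells, n_orig=48):
    """Indices of the original cell corners kept when coarsening to n_cells cells."""
    if n_orig % n_cells != 0:
        raise ValueError(f"{n_orig} must be divisible by n_cells")
    stride = n_orig // n_cells
    return [i * stride for i in range(n_cells + 1)]


print(corner_indices(8))
```

For a size that divides 48, these should be the same indices that `np.linspace(0, 48, n_cells + 1, dtype=int)` produces.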

In [None]:
grid = xr.open_zarr("data/c48.zarr")

In [None]:
from typing import Tuple

def get_lat_lon_bounds(nx: int, ny: int, tile: int) -> Tuple[np.ndarray, np.ndarray]:
    """Return latitude and longitude on the corners of grid cells.
    
    Takes in the number of grid cells and the tile index.
    """
    assert 48 % nx == 0, "48 must be divisible by nx"
    assert 48 % ny == 0, "48 must be divisible by ny"
    lat = np.empty([nx+1, ny+1])
    lon = np.empty([nx+1, ny+1])
    for ix_new, ix_orig in enumerate(np.linspace(0, 48, nx+1, dtype=int)):
        for iy_new, iy_orig in enumerate(np.linspace(0, 48, ny+1, dtype=int)):
            lat[ix_new, iy_new] = grid["latb"][tile, iy_orig, ix_orig]
            lon[ix_new, iy_new] = grid["lonb"][tile, iy_orig, ix_orig]
    return lat, lon

def mean_of_corners(array: np.ndarray) -> np.ndarray:
    """Given values on cell corners, return the average for each cell.
    
    Assumes input arrays have dimensions [x, y] or [y, x].
    """
    return 0.25 * (array[1:, 1:] + array[:-1, 1:] + array[1:, :-1] + array[:-1, :-1])

def get_lat_lon(nx: int, ny: int, cube: CubedSphereCommunicator) -> Tuple[Quantity, Quantity]:
    lat_bounds, lon_bounds = get_lat_lon_bounds(nx, ny, cube.rank)
    lat_array = mean_of_corners(lat_bounds)  # not strictly accurate, but good enough
    lon_array = mean_of_corners(lon_bounds)
    lat = Quantity(  # no halo, so we don't need origin and extent
        lat_array,
        dims=[X_DIM, Y_DIM],
        units="degrees_north",  # latitude is measured in degrees north
        gt4py_backend="numpy",
    )
    lon = Quantity(
        lon_array,
        dims=[X_DIM, Y_DIM],
        units="degrees_east",  # longitude is measured in degrees east
        gt4py_backend="numpy",
    )
    return lat, lon
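
As a quick check on `mean_of_corners`, here is the same four-corner average written with plain Python lists, so it can be run without NumPy. The function name `mean_of_corners_lists` is ours for illustration.

```python
def mean_of_corners_lists(corners):
    """Average the four corner values around each cell of a 2D nested list."""
    n_x = len(corners) - 1
    n_y = len(corners[0]) - 1
    return [
        [
            0.25 * (
                corners[i][j] + corners[i + 1][j]
                + corners[i][j + 1] + corners[i + 1][j + 1]
            )
            for j in range(n_y)
        ]
        for i in range(n_x)
    ]


# 3x3 corner values describe a 2x2 grid of cells
corners = [
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
]
centers = mean_of_corners_lists(corners)
print(centers)
```

An array of corner values with shape `(n + 1, m + 1)` always yields cell values with shape `(n, m)`, which is why `get_lat_lon_bounds` allocates `nx + 1` by `ny + 1` arrays.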

Let's see what the latitude and longitude look like in our flattened cube plot! Keep in mind you need to update X and Y for plotting whenever your data changes shape.

In [None]:
nx = ny = 8
lat, lon = get_lat_lon(nx, ny, cube)

In [None]:
plot_global(lat, vmin=-90, vmax=90)
plot_global(lon, vmin=0, vmax=360)
plt.show()

Exercise: We invite you to compute your own favorite function of latitude and longitude on each rank, gather it onto a single rank, and plot it! For example, for a Gaussian centered on GFDL (40.35 degrees north, 74.67 degrees west, which is 285.33 on the 0 to 360 longitude range plotted above), you could try `gaussian = np.exp(- ((lat.view[:] - 40.35)**2 + (lon.view[:] - 285.33)**2)/(2*30**2))`.

In [None]:
# Solution
gaussian = Quantity(
    np.exp(- ((lat.view[:] - 40.35)**2 + (lon.view[:] - 285.33)**2)/(2*30**2)),
    dims=[X_DIM, Y_DIM],
    units=""
)

plot_global(gaussian, vmin=0, vmax=1)
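
To sanity-check the Gaussian formula at individual points, here is a scalar version in plain Python. It treats latitude and longitude as flat Cartesian coordinates in degrees, as the expression above does, and does not handle longitude wrapping. The function name and the center used in the example are arbitrary choices for illustration.

```python
import math


def gaussian(lat, lon, lat0, lon0, sigma):
    """Gaussian bump in latitude/longitude space, all arguments in degrees."""
    return math.exp(-((lat - lat0) ** 2 + (lon - lon0) ** 2) / (2 * sigma ** 2))


# value at the center, and one standard deviation away in latitude
peak = gaussian(40.0, 100.0, lat0=40.0, lon0=100.0, sigma=30.0)
one_sigma = gaussian(70.0, 100.0, lat0=40.0, lon0=100.0, sigma=30.0)
print(peak, one_sigma)
```

The value never exceeds 1, which is why `vmin=0, vmax=1` is a sensible color range for the plot.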

## Moving Forward

What we've done so far is the tip of the iceberg. Now we have all the tools needed to run parallel gt4py code on a cubed sphere with halo updates. Before we start, let's import gt4py and set a configuration option needed to prevent hangs from filesystem collisions.

In [None]:
from gt4py import gtscript
import gt4py

# need to use a different gt cache on each rank, for now
# without this, your notebook and ipcluster will hang and need to be restarted
gt4py.config.cache_settings["dir_name"] = f".gt_cache_{cube.rank:0>4d}"
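
The format spec `0>4d` right-aligns the rank in a field four characters wide and pads it with zeros, so each rank gets its own directory name. Here is a small sketch of the names it produces. The helper `cache_dir_name` is ours for illustration.

```python
def cache_dir_name(rank):
    """Per-rank cache directory name, using the same format string as above."""
    return f".gt_cache_{rank:0>4d}"


for rank in range(6):
    print(cache_dir_name(rank))
```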

Now, use what you learned in the rest of the workshop to implement a routine of your choice. You can copy a stencil from earlier in this workshop such as diffusion (e.g. `laplacian` in notebook Session-A1.2) or hyperdiffusion, or you can implement a new routine altogether.

Some points to keep in mind in this section:

- We have shown mainly 2D quantities so far, but gt4py stencils currently require 3D inputs. We suggest using a `[X_DIM, Y_DIM, Z_DIM]` quantity with a length of 1 on the z-dimension. While Quantity supports arbitrary dimension ordering, gt4py does not yet do so (coming soon).
- If you implement a truly 3D stencil with more than 1 vertical level, the gather routine we have written for this workshop and are using for plotting will gather only the first vertical level (index 0).
- You can call multiple stencils in sequence if you wish; just remember to halo update as needed.
- Ensure you use the same backend and dtype for the stencil and the Quantity.
- The starter code after this cell assumes you write a stencil called `update`, which takes in one input and updates it to the next timestep. You are free to change this and update the later cells.
- For the purposes of this exercise, you can make the approximation that the gridcell width is constant, even though it's not.

In [None]:
# Solution

backend = "numpy"
dtype = np.float64

@gtscript.function
def d2x(u):
    return (-2 * u[0, 0, 0] + u[-1, 0, 0] + u[1, 0, 0])

@gtscript.function
def d2y(u):
    return (-2 * u[0, 0, 0] + u[0, -1, 0] + u[0, 1, 0])

@gtscript.function
def lap_cube_cells(u):
    result = d2x(u) + d2y(u)
    return result

@gtscript.stencil(backend=backend)
def diffuse(u: gtscript.Field[dtype], coeff: float):
    with computation(PARALLEL), interval(...):
        du = coeff * lap_cube_cells(u)
        u = u + du
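
If you want to check the arithmetic of the stencil without gt4py, here is a plain-Python reference for the same five-point update on a 2D nested list. It is only a sketch: it assumes unit grid spacing, as the stencil above does, and it treats the outermost ring of points as a halo that is read but not updated. The function name `diffuse_step` is ours for illustration.

```python
def diffuse_step(u, coeff):
    """Apply one explicit diffusion step to the interior of a 2D nested list.

    The outermost ring of points plays the role of the halo and is left unchanged.
    """
    n_x, n_y = len(u), len(u[0])
    new = [row[:] for row in u]  # copy so every update reads the old values
    for i in range(1, n_x - 1):
        for j in range(1, n_y - 1):
            d2x = -2 * u[i][j] + u[i - 1][j] + u[i + 1][j]
            d2y = -2 * u[i][j] + u[i][j - 1] + u[i][j + 1]
            new[i][j] = u[i][j] + coeff * (d2x + d2y)
    return new


# a single spike in the middle of a 5x5 field
field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0
stepped = diffuse_step(field, coeff=0.1)
print(stepped[2][2], stepped[1][2])
```

The spike loses `4 * coeff` of its value and each of its four neighbors gains `coeff`, so the total is conserved away from the boundary.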

In [None]:
# starter code to initialize a uniform random Quantity
N = 12
dx = 1.0 / (N-1)
nhalo = 2
shape = (N + 2 * nhalo, N + 2 * nhalo, 1)

random = np.random.RandomState(cube.rank)
# you can keep these separately to reset to the start values
# with quantity.data[:] = start_values
start_values = random.uniform(0, 1, size=shape).astype(dtype)
quantity = Quantity(
    start_values.copy(),
    origin=(nhalo, nhalo, 0),
    extent=(N, N, 1),
    dims=[X_DIM, Y_DIM, Z_DIM],
    units="",
    gt4py_backend=backend,
)
plot_global(quantity, vmin=0, vmax=1)

In [None]:
# Solution

# reset to initial values for easy re-execution of this cell
quantity.data[:] = start_values

n_steps = 10
for i in range(n_steps):
    # halos must be up to date before every stencil call, including the first timestep
    cube.halo_update(quantity, n_points=1)
    diffuse(quantity.storage, coeff=0.1)
    plot_global(quantity, vmin=0, vmax=1)
    if cube.rank == 0:
        plt.title(f"timestep {i}")

If you notice sharp boundaries at tile edges, make sure that you are halo updating at the right time. If you notice issues at corners, think hard about any special handling the corners need and whether it has been implemented correctly with regions.

In [None]:
%autopx

In [None]:
rc.shutdown(hub=True)