# GT4Py Tutorial: Stencil Basics

### Introduction

This notebook will show how to create a simple GT4Py stencil that copies data from one variable to another.

### Notebook Requirements

- Python v3.11.x to v3.12.x
- [NOAA/NASA Domain Specific Language Middleware](https://github.com/NOAA-GFDL/NDSL)
- `ipykernel==6.1.0`
- [`ipython_genutils`](https://pypi.org/project/ipython_genutils/)

### Quick GT4Py (Cartesian version) Overview

GT4Py enables a developer to write code using a Domain Specific Language (DSL) implemented using Python syntax.  Performance is achieved when the GT4Py Python code is translated and compiled into a lower-level language such as C++ or CUDA, enabling the codebase to execute on a multitude of architectures.  In this notebook, we will cover the basics of GT4Py and teach the developer the intricacies of the DSL. Additional information about GT4Py can be found at the [GT4Py site](https://gridtools.github.io/gt4py/latest/index.html).

### GT4Py Parallel/Execution Model

Within a 3-dimensional domain, GT4Py considers computations in two parts.  If we assume an `(I,J,K)` coordinate system as a reference, GT4Py separates computations into the Horizontal (`IJ`) spatial plane and the Vertical (`K`) spatial interval.  In the Horizontal spatial plane, computations are implicitly executed in parallel, which also means that there is no assumed order to the calculations within the plane.  In the Vertical spatial interval, computations are specified by an iteration policy that will be discussed through examples.

Note also that the statements in a stencil are executed sequentially, in the order they appear in the code.
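
As a mental model only (this is not the code GT4Py generates), the `PARALLEL` case can be pictured as plain Python loops over a nested-list field. The helper `run_parallel` and the two sample statements are hypothetical names made up for this sketch. The point is that statements run one after another, while the order of points within a statement is deliberately left arbitrary.

```python
import random
from itertools import product


def run_parallel(statements, field):
    """Apply pointwise statements, in program order, to field[i][j][k] (nested lists)."""
    ni, nj, nk = len(field), len(field[0]), len(field[0][0])
    points = list(product(range(ni), range(nj), range(nk)))
    for statement in statements:      # statements execute in the order written
        random.shuffle(points)        # no order may be assumed among points
        for i, j, k in points:
            field[i][j][k] = statement(field[i][j][k])
    return field


# Two pointwise statements: add one, then double.
field = [[[float(i + j) for k in range(2)] for j in range(3)] for i in range(3)]
result = run_parallel([lambda v: v + 1.0, lambda v: v * 2.0], field)
print(result[2][1][0])  # (2 + 1 + 1) * 2 = 8.0
```

Because each statement here only touches its own point, shuffling the visiting order cannot change the result, which is the property that makes parallel execution safe.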

### Copy Stencil example

To demonstrate how to implement a GT4Py stencil, we'll step through an example that copies the values of one array into another array.  First, we import several packages. 

In [None]:
from gt4py.cartesian.gtscript import PARALLEL, computation, interval, stencil
from ndsl.dsl.typing import FloatField
import gt4py.storage as gt_storage
import numpy as np
from boilerplate import plot_field_at_k0, plot_field_at_kN

As we walk through the example, we'll highlight the relevant terms from the imports.  Let's first define, in GT4Py terms, two arrays of size 5 by 5 by 2 (dimensionally `I` by `J` by `K`).  These arrays are defined using the `gt_storage` allocator, which creates NumPy-like arrays and provides similar routines such as `ones`, `zeros`, `full`, and `empty`.  The two arrays in this example are created with two different functions: `.zeros`, which returns an array filled with zeros, and `.from_array`, which copies the data of an existing `numpy` array into a `gt_storage` array.

The `gt_storage` functions can take several parameters:

- `backend` tells GT4Py how to optimally lay out the array for a particular architecture.  In this example, we will use the `numpy` backend since it's the easiest for debugging and testing purposes.  

- `data` contains the numpy array that gets passed into a `gt_storage`

- `dtype` is the data type

- `shape` contains the array shape that is passed as a tuple.

In [None]:
backend = 'numpy'

nx = 5
ny = 5
nz = 2

shape = (nx, ny, nz)

qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

arr = np.indices(shape).sum(axis=0)  # Value of each entry is the sum of its I, J, and K indices
qty_in = gt_storage.from_array(
    data=arr,
    backend=backend,
    dtype=float,
)
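
To make the contents of the input array explicit, the following plain-Python sketch builds the same values without NumPy. The name `index_sum` is made up for this illustration; the sizes match the cell above.

```python
nx, ny, nz = 5, 5, 2

# Plain-Python equivalent of np.indices((nx, ny, nz)).sum(axis=0):
# every entry holds i + j + k for its own position.
index_sum = [[[i + j + k for k in range(nz)] for j in range(ny)] for i in range(nx)]

print(index_sum[0][0][0])  # 0
print(index_sum[4][4][1])  # 9
print([index_sum[i][0][1] for i in range(nx)])  # [1, 2, 3, 4, 5]
```

Note that the `K` index contributes to the sum, so the `K = 1` plane is the `K = 0` plane shifted up by one.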

We will next create a simple stencil that copies values from one `gt_storage` to another.  A GT4Py stencil looks like a Python function, except that it uses GT4Py-specific constructs.

In [None]:
@stencil(backend=backend)
def copy_stencil(input_field: FloatField,
                 output_field: FloatField):
    with computation(PARALLEL), interval(...):
        output_field = input_field

We see that this stencil does not contain any iterative loops typically associated with copying arrays.  As mentioned above in the notebook, GT4Py has a particular computation policy that implicitly executes in parallel within an `IJ` plane and is user defined in the `K` interval.  This execution policy in the `K` interval is dictated by the `computation` and `interval` keywords, and in this example, the line `with computation(PARALLEL), interval(...)` defines the `K` interval execution.  

- `with computation(PARALLEL)` means that there's no order preference to executing the `K` levels, which means that multiple `K` levels can be computed in parallel to potentially gain performance if computational resources are available.

- `interval(...)` means that the entire `K` interval is executed.  Instead of `(...)`, a more specific interval can be given as a start index and an exclusive end index.  For example:

    - `interval(0,2)` : The interval `K` = 0 to 1 is executed 
    - `interval(0,-1)` : The interval `K` = 0 to N-2 (where N is the size of `K`) is executed
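
The bounds in these examples behave like a Python slice over the `K` levels, which gives a quick way to check which levels an interval covers. The helper `levels_in_interval` below is a hypothetical illustration written in plain Python, not part of GT4Py.

```python
def levels_in_interval(nk, start=None, stop=None):
    """Return the K levels selected by slice-like bounds on a column of nk levels."""
    return list(range(nk))[start:stop]


nk = 5
print(levels_in_interval(nk))         # whole column, like interval(...): [0, 1, 2, 3, 4]
print(levels_in_interval(nk, 0, 2))   # like interval(0, 2): [0, 1]
print(levels_in_interval(nk, 0, -1))  # like interval(0, -1): [0, 1, 2, 3]
```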

The decorator `@stencil(backend=backend)` (Note: `stencil` comes from the package `gt4py.cartesian.gtscript`) compiles `copy_stencil` into "executable" code for the specified `backend`.

Note that the input and output parameters to `copy_stencil` are of type `FloatField`, which is essentially a 3-dimensional `gt_storage` array of `float` values.

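`plot_field_at_kN` plots the values of the `IJ` plane at a given `K` level, which is passed as an optional second argument; when it is omitted, as in the first calls below, the plot is taken at `K = 0`.  As we can see in the plots below, `copy_stencil` copies the values from `qty_in` into `qty_out`.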

In [None]:
print("Plotting values of qty_in")
plot_field_at_kN(qty_in)
print("Plotting values of qty_out")
plot_field_at_kN(qty_out)
print("Executing `copy_stencil`")
copy_stencil(qty_in, qty_out)
print("Plotting qty_out from `copy_stencil`")
plot_field_at_kN(qty_out)
plot_field_at_kN(qty_out,1)

### Choosing subsets (or offsets) to perform stencil calculations

GT4Py also allows a subset of the IJ plane to be executed in a fashion similar to the effect of `interval(...)` in the K interval.  This is done by setting the `origin` and `domain`.

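- `origin` : This specifies the "starting" coordinate, given as an `(I, J, K)` tuple, from which the stencil will start its calculation.  

- `domain` : This specifies the extent of the stencil computation, given as the number of points in `I`, `J`, and `K` counted from `origin`.  The `K` extent also limits the levels that `interval(...)` covers, as the `mult_upward` call later in this notebook shows by updating a single level.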

If these two parameters are not set, the stencil by default will iterate over the entire input domain.
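
One way to picture these two parameters is as the start and length of an index range in each dimension. The helper `compute_region` below is a hypothetical plain-Python sketch (not a GT4Py function) that lists which indices would be written for a given `origin` and `domain`. It assumes that an omitted `domain` extends from `origin` to the edge of the array, and that an omitted `origin` is `(0, 0, 0)`.

```python
def compute_region(shape, origin=(0, 0, 0), domain=None):
    """Return the index range touched along I, J, and K."""
    if domain is None:
        # Assumption: with no domain given, the region runs from origin to the array edge.
        domain = tuple(n - o for n, o in zip(shape, origin))
    return tuple(range(o, o + d) for o, d in zip(origin, domain))


shape = (5, 5, 2)
i_range, j_range, k_range = compute_region(shape, origin=(1, 0, 0))
print(list(i_range), list(j_range), list(k_range))  # [1, 2, 3, 4] [0, 1, 2, 3, 4] [0, 1]

i_range, j_range, k_range = compute_region(shape, domain=(2, 2, 2))
print(list(i_range), list(j_range), list(k_range))  # [0, 1] [0, 1] [0, 1]
```

Under these assumptions, `origin=(1,0,0)` leaves the first row in `I` untouched, while `domain=(2,2,nz)` restricts the copy to a 2 by 2 corner of each `K` level.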

In [None]:
import matplotlib.pyplot as plt

qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

print("Plotting values of qty_in")
plot_field_at_kN(qty_in)
print("Plotting values of qty_out")
plot_field_at_kN(qty_out)
print("Executing `copy_stencil`")
copy_stencil(qty_in, qty_out,origin=(1,0,0))
print("Plotting qty_out from `copy_stencil` with origin=(1,0,0)")
plot_field_at_kN(qty_out)

qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

print("Resetting qty_out to zero...")
print("Plotting values of qty_out")
plot_field_at_kN(qty_out)
print("Executing `copy_stencil`")
copy_stencil(qty_in, qty_out,origin=(0,1,0))
print("Plotting qty_out from `copy_stencil` with origin=(0,1,0)")
plot_field_at_kN(qty_out)

qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

print("Resetting qty_out to zero...")
print("Plotting values of qty_out")
plot_field_at_kN(qty_out)
print("Executing `copy_stencil`")
copy_stencil(qty_in, qty_out,origin=(0,0,1))
print("Plotting qty_out from `copy_stencil` with origin=(0,0,1)")
plot_field_at_kN(qty_out)

qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

print("Plotting values of qty_in")
plot_field_at_kN(qty_in)
print("Plotting values of qty_out")
plot_field_at_kN(qty_out)
print("Executing `copy_stencil`")
copy_stencil(qty_in, qty_out, domain=(2,2,nz))
print("Plotting qty_out from `copy_stencil` with domain = (2,2,:)")
plot_field_at_kN(qty_out)

### `FORWARD` and `BACKWARD` `computation` keywords

Besides `PARALLEL`, the developer can specify `FORWARD` or `BACKWARD` as the iteration policy in `K`.  Essentially, the `FORWARD` policy has `K` iterating consecutively starting from the lowest vertical index to the highest, while the `BACKWARD` policy performs the reverse.  The following examples demonstrate the use of these two iteration policies.
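
The order matters as soon as a level reads a neighboring level that the same sweep has already updated. The plain-Python sketch below illustrates this on a single column; it is not GT4Py code, and the names `forward_double` and `backward_copy` are made up for this illustration. The update rules mirror the `mult_upward` and `copy_downward` stencils defined in this section, with each sweep started one level in from the boundary so that the neighbor read stays in bounds.

```python
def forward_double(column):
    """FORWARD-style sweep: each level becomes twice the level below it."""
    col = list(column)
    for k in range(1, len(col)):           # lowest index first
        col[k] = col[k - 1] * 2.0          # sees the value just written at k - 1
    return col


def backward_copy(column):
    """BACKWARD-style sweep: each level takes the value of the level above it."""
    col = list(column)
    for k in range(len(col) - 2, -1, -1):  # highest index first
        col[k] = col[k + 1]                # sees the value just written at k + 1
    return col


print(forward_double([1.0, 0.0, 0.0, 0.0]))  # [1.0, 2.0, 4.0, 8.0]
print(backward_copy([0.0, 1.0, 2.0, 3.0]))   # [3.0, 3.0, 3.0, 3.0]
```

In both sweeps, the value at one end of the column propagates through every level, which would not be guaranteed if the levels were visited in an arbitrary order.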

In [None]:
from gt4py.cartesian.gtscript import FORWARD, BACKWARD

nx = 5
ny = 5
nz = 5
nhalo = 1
backend="numpy"

shape = (nx + 2 * nhalo, ny + 2 * nhalo, nz)

qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

arr = np.indices(shape).sum(axis=0)  # Value of each entry is the sum of its I, J, and K indices
qty_in = gt_storage.from_array(
    data=arr,
    backend=backend,
    dtype=float,
)

plot_field_at_kN(qty_in,0)
plot_field_at_kN(qty_out,0)

copy_stencil(qty_in,qty_out,origin=(nhalo,nhalo,0),domain=(nx,ny,5))
plot_field_at_kN(qty_out,0)
plot_field_at_kN(qty_out,1)

@stencil(backend=backend)
def mult_upward(qty_in: FloatField):
    with computation(FORWARD), interval(...):
        qty_in = qty_in[0,0,-1] * 2.0

mult_upward(qty_out, origin=(nhalo,nhalo,1), domain=(nx,ny,1))
plot_field_at_kN(qty_out,0)
plot_field_at_kN(qty_out,1)
plot_field_at_kN(qty_out,2)

@stencil(backend=backend)
def copy_downward(qty_in: FloatField):
    with computation(BACKWARD), interval(...):
        qty_in = qty_in[0,0,1]

copy_stencil(qty_out, qty_in)

print("***")
plot_field_at_kN(qty_out,0)
plot_field_at_kN(qty_out,1)
plot_field_at_kN(qty_out,2)
plot_field_at_kN(qty_out,3)
plot_field_at_kN(qty_out,4)

copy_downward(qty_in, origin=(1,1,0), domain=(nx,ny,nz-1))
print("***")
plot_field_at_kN(qty_in,0)
plot_field_at_kN(qty_in,1)
plot_field_at_kN(qty_in,2)

### `if/else` statements

In [None]:
qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

arr = np.indices(shape).sum(axis=0)  # Value of each entry is the sum of its I, J, and K indices
qty_in = gt_storage.from_array(
    data=arr,
    backend=backend,
    dtype=float,
)

plot_field_at_kN(qty_in,0)
plot_field_at_kN(qty_out,0)

copy_stencil(qty_in,qty_out,origin=(nhalo,nhalo,0),domain=(nx,ny,5))
plot_field_at_kN(qty_out,0)
plot_field_at_kN(qty_out,1)

@stencil(backend=backend)
def stencil_if_zero(in_out_field: FloatField):
    with computation(PARALLEL), interval(...):
        if in_out_field == 0.0:
            in_out_field = 30
        else:
            in_out_field = 10

stencil_if_zero(qty_out)
plot_field_at_kN(qty_out,0)
plot_field_at_kN(qty_out,1)
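
The branch in `stencil_if_zero` is evaluated independently at every grid point, so neighboring points can take different branches. A plain-Python sketch of the same rule on a small 2D list follows; the name `if_zero` is made up for this illustration and is not GT4Py code.

```python
def if_zero(plane):
    """Return a new plane with 30.0 where the input is zero and 10.0 elsewhere."""
    return [[30.0 if value == 0.0 else 10.0 for value in row] for row in plane]


# A zero halo around a nonzero interior, similar to qty_out after the halo-aware copy.
plane = [
    [0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
]
print(if_zero(plane))  # [[30.0, 30.0, 30.0], [30.0, 10.0, 30.0], [30.0, 30.0, 30.0]]
```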

### `while` statements

### Function calls

GT4Py also lets the developer create functions in order to better organize code.  The main difference between a function and a GT4Py stencil is that a function cannot contain calls to `computation` and `interval`.  However, arrays are indexed the same way as in a stencil.

Functions in GT4Py can be created by using the decorator `function` (Note: `function` originates from the package `gt4py.cartesian.gtscript`).

In [None]:
from gt4py.cartesian.gtscript import function

@function
def plus_one(field: FloatField):
    return field[0, 0, 0] + 1

@stencil(backend=backend)
def field_plus_one(source: FloatField, target: FloatField):
    with computation(PARALLEL), interval(...):
        target = plus_one(source)

nx = 5
ny = 5
nz = 5
nhalo = 1
backend="numpy"

shape = (nx + 2 * nhalo, ny + 2 * nhalo, nz)

qty_out = gt_storage.zeros(
    backend=backend,
    dtype=float,
    shape=shape,
)

arr = np.indices(shape).sum(axis=0)  # Value of each entry is the sum of its I, J, and K indices
qty_in = gt_storage.from_array(
    data=arr,
    backend=backend,
    dtype=float,
)

plot_field_at_kN(qty_in)
plot_field_at_kN(qty_out)

field_plus_one(qty_in, qty_out)

plot_field_at_kN(qty_out)