# Examples of Using the gt4py Storages and Calling Stencils with Custom Data Containers 

## State of the Implementation

In its current state, this branch is not yet a full implementation of GDP-3. Notable gaps are:

* Storages are still assumed to be in IJK order when passed to stencils, since the generated code cannot yet handle nonstandard layouts flexibly, so any support for other dimension orders would be very limited.
* As a consequence of the previous point, the xarray `__gt_data_interface__` property has not been implemented yet: without the `"dims"` key, it would severely restrict how the DataArray can be used.
* The raising of errors for incompatible or inconsistent parameters is largely untested. Given valid parameters, however, everything should work.


In [1]:
from types import SimpleNamespace

import numpy as np
import cupy as cp

from gt4py.gtscript import stencil, Field 
import gt4py.storage as gt_storage

In [2]:
def copy_stencil(field_a: Field[np.float64], field_b: Field[np.float64]):
    with computation(PARALLEL), interval(...):
        field_b = field_a[0, 0, 0]
        
copy_stencil_debug = stencil(definition=copy_stencil, backend="debug")
copy_stencil_gtcuda = stencil(definition=copy_stencil, backend="gtcuda")

## Creating Storages

Storages can be constructed through the functions `empty`, `zeros`, `ones`, and `full`, with the parameters specified in GDP-3.

In [3]:
in_f = gt_storage.full((3, 3, 3), 3.0, defaults="gtmc")
out_f = gt_storage.full((5, 5, 5), 3.0, defaults="gtmc", halo=(1, 1, 1))

A storage that wraps an existing buffer without copying it can be created through `gt_storage.as_storage`. Writes to the storage are then visible in the original array:

In [4]:
array = np.zeros((3,3,3), dtype=np.int32)
storage = gt_storage.as_storage(array)
storage[1,1,1] = 1
print(array)

[[[0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 1 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]]]


Creating a storage from an existing buffer by copying remains possible through `gt_storage.storage`. In this case, writes to the storage leave the original array unchanged:

In [5]:
array = np.zeros((3,3,3), dtype=np.int32)
storage = gt_storage.storage(array)
storage[1,1,1] = 2
print(array)

[[[0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]]]
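
The two behaviours above mirror NumPy's own distinction between views and copies. The following sketch uses only NumPy (not gt4py) to show how one might check whether two arrays share memory. Here `np.asarray` stands in for a wrapping constructor and `np.array` for a copying one; this is an analogy, not a description of how the storages are implemented.

```python
import numpy as np

array = np.zeros((3, 3, 3), dtype=np.int32)

# np.asarray returns the input itself when it is already an ndarray of the
# requested dtype, so writes are visible through both names.
wrapped = np.asarray(array)
wrapped[1, 1, 1] = 1

# np.array copies by default, so writes to the copy leave the original alone.
copied = np.array(array)
copied[1, 1, 1] = 2

print(np.shares_memory(array, wrapped))  # True
print(np.shares_memory(array, copied))   # False
print(array[1, 1, 1])                    # 1
```

`np.shares_memory` is a convenient check whenever it is unclear whether a conversion returned a view or a copy.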


## Buffers as `ndarray`

The storages do not support ufuncs, i.e. additions, comparisons, reductions, etc. To perform these operations, the storages need to be converted, e.g. to NumPy or CuPy arrays. There are a number of ways to achieve this.

If no copy is desired, either `np.asarray` (`cp.asarray`) or the method `as_numpy` (`as_cupy`) will do the trick, provided that one of the buffers in the storage can be used on the CPU (GPU).

In [6]:
cpu_only_storage = gt_storage.ones((3, 3, 3), device="cpu", managed=False)
gpu_only_storage = gt_storage.ones((3, 3, 3), device="gpu", managed=False)

cp_array = cp.asarray(gpu_only_storage)
np_array = np.asarray(cpu_only_storage)

cpu_only_storage[1,1,1] = 2
gpu_only_storage[1,1,1] = 3

print(cp_array)
print(np_array)

[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 3. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 2. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]


Be wary of using `np.asarray` on a storage without a CPU buffer, since it will return a 0-d ndarray with "object" dtype, which is likely not what you want. (The other direction has a similar pitfall: `cp.asarray` on a storage without a GPU buffer triggers a copy.)

Depending on the use case, a platform-portable option may be `cp.asnumpy` (`cp.asarray`), which only copies when moving from GPU to CPU (CPU to GPU). However, if the storage does not itself contain a CPU (GPU) buffer, changes to the resulting buffer will not be applied to the storage.

In [7]:
# Probably not what you want:
np.asarray(gpu_only_storage)  # results in a 0-d "object" array

# Possibly what you want, depending on context:
cp.asnumpy(gpu_only_storage)  # triggers a copy
cp.asarray(cpu_only_storage)  # triggers a copy


array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 2., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
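
The 0-d "object" result is generic NumPy behaviour for any object that it cannot interpret as an array. A minimal sketch, using a hypothetical `Opaque` class as a stand-in for a storage without a CPU buffer (it is not part of gt4py), shows how one might detect the situation. The helper `is_real_conversion` is likewise an illustrative name, not an existing API.

```python
import numpy as np


class Opaque:
    """Hypothetical stand-in: exposes no array protocol that NumPy understands."""


result = np.asarray(Opaque())

# NumPy wraps the Python object itself instead of converting any data.
print(result.shape)  # ()
print(result.dtype)  # object


def is_real_conversion(arr: np.ndarray) -> bool:
    """Return False for the 0-d object array produced by a failed conversion."""
    return not (arr.ndim == 0 and arr.dtype == object)


print(is_real_conversion(result))  # False
```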

If a copy is acceptable or desired, either `np.array` (`cp.array`) or the method `to_numpy` (`to_cupy`) will do the trick. `np.array` (`cp.array`) requires that one of the buffers in the storage can be used on the CPU (GPU); again, be wary of using `np.array` on a storage without a CPU buffer. The method `to_numpy` (`to_cupy`) always performs a copy and works even if the storage does not hold a buffer on the respective device.

In [8]:
cpu_only_storage = gt_storage.ones((3, 3, 3), device="cpu", managed=False)
gpu_only_storage = gt_storage.ones((3, 3, 3), device="gpu", managed=False)

cp_array = gpu_only_storage.to_cupy()
np_array = cpu_only_storage.to_numpy()

cpu_only_storage[1,1,1] = 2
gpu_only_storage[1,1,1] = 3

print(cp_array)
print(np_array)

[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]


## Passing non-storage objects to stencil computations

Objects that support the `__array_interface__` (`__cuda_array_interface__`) can be passed to stencils. If the object has no buffer on the device where the computation runs, a copy is issued, so that it is possible, for example, to pass GPU buffers to the `"debug"` backend.

In [9]:
cp_in = cp.ones((3,3,3), dtype=np.float64)
np_out = np.zeros((3,3,3), dtype=np.float64)

copy_stencil_debug(cp_in, np_out)
np.testing.assert_equal(cp.asnumpy(cp_in), np_out)
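
For reference, an object supports the `__array_interface__` simply by exposing an attribute of that name. The sketch below uses a hypothetical `Container` class (not part of gt4py) that forwards the interface of an internal NumPy array. Only NumPy is used, so it illustrates the protocol itself rather than how stencils consume it.

```python
import numpy as np


class Container:
    """Hypothetical custom data container backed by a NumPy array."""

    def __init__(self, shape, dtype=np.float64):
        self._data = np.zeros(shape, dtype=dtype)

    @property
    def __array_interface__(self):
        # Forward the buffer description (shape, typestr, data pointer, ...).
        return self._data.__array_interface__


container = Container((3, 3, 3))

# Consumers of the protocol obtain a view on the container's buffer, no copy.
view = np.asarray(container)
view[1, 1, 1] = 4.0

print(view.shape)                                # (3, 3, 3)
print(container._data[1, 1, 1])                  # 4.0
print(np.shares_memory(view, container._data))   # True
```

Because the consumer receives a view, results written by a computation on the same device would land directly in the container's buffer.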

Alternatively, objects can implement the `__gt_data_interface__`. Note that the `"dims"` key is not supported yet.

In [10]:
in_storage = gt_storage.ones((3,3,3), defaults="gtcuda")
out_storage = gt_storage.zeros_like(in_storage, defaults="gtcuda")

in_obj = SimpleNamespace(__gt_data_interface__=in_storage.__gt_data_interface__)
out_obj = SimpleNamespace(__gt_data_interface__=out_storage.__gt_data_interface__)

copy_stencil_gtcuda(in_obj, out_obj)
np.testing.assert_equal(in_storage.to_numpy(), out_storage.to_numpy())