# Workshop on Domain-Specific Languages for Performance-Portable Weather and Climate Models

## Session A3: Serializing from Fortran with Serialbox

This notebook will walk through an example to get familiar with the basic Serialbox features to save data from Fortran and load the data into Python.

### Inserting `!$ser` Directives

Take a look at the simple Fortran code below. The cell magic `%%writefile serialBox_tutorial.F90` will cause the contents of the cell to be written to an actual file on disk when the cell is executed.

When working through the example, feel free to refer back to the slides of the presentation. As a quick reference, here is a summary of the most important high-level API features of Serialbox:

* `!$ser init directory=<path> prefix=<string>` - Initialize Serialbox
    * `unique_id=.true.` will add a unique identifier to each savepoint, useful when data collisions are likely
    * `mpi_rank=<rank>` will make sure that each MPI rank will write to a different file

* `!$ser savepoint "<string>"` - Create a savepoint to which data is associated

* `!$ser data <name>=<variable>` - Save a Fortran scalar or array `<variable>` under the name `<name>`

* `!$ser verbatim <code>` - Execute `<code>` only if `-DSERIALIZE` is set

* `!$ser <on/off>` - Turn serialization on/off. While serialization is off, subsequent `!$ser data` statements are ignored

* `!$ser mode write` - Activate serialization mode (`read` would indicate de-serialization)

* `!$ser cleanup` - Shut down Serialbox

<div class="alert alert-block alert-info">
    <b> Now it's your turn: </b><br>
    (Hint: Make sure that when you modify code you retain the original code by commenting it out so that you can undo any of your modifications.)
    <ol>
        <li style="margin-bottom: 10px">Initialize Serialbox, have the serialized data written to a directory called <code>./data</code>, and set the Serialbox file prefix to <code>example</code>.</li>
        <li style="margin-bottom: 10px">Create a Serialbox savepoint called <code>input_data</code> that contains all the serialized data.</li>
        <li style="margin-bottom: 10px">Create an integer scalar that has a value equal to <code>7</code> and serialize it with the name <code>int0</code>.</li>
        <li style="margin-bottom: 10px">Create a real scalar that has a value equal to <code>8.9</code> and serialize it with the name <code>real0</code>.</li>
        <li style="margin-bottom: 10px">Create a 2D double precision array of size (10,11) where the value at the <code>(i,j)</code> index is computed as <code>(j-1) + i + 0.1</code> when looping through <code>i</code> and <code>j</code>. Save the array using a Serialbox variable called <code>dp_arr0</code>.</li>
        <li style="margin-bottom: 10px">Create a Fortran derived data type that contains an integer, a real, and a 2D double precision array that are set to the same values as indicated in the 3 previous bullets (i.e. integer equal to <code>7</code>, real equal to <code>8.9</code>, etc.) and serialize the three values.  When serializing the data from the derived data type, write the integer into a Serialbox variable called <code>ddt_int0</code>, write the real into a Serialbox variable called <code>ddt_real0</code>, and write the 2D double precision array into a variable called <code>ddt_arr0</code>.</li>
    </ol>
</div>
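
As a sanity check for item 5, the expected array values can be sketched in NumPy. This is only an illustration and not part of the exercise solution: the helper name `expected_dp_arr0` is hypothetical, and the sketch assumes that the Fortran indices `i` and `j` start at 1, so the 0-based NumPy index `[i0, j0]` corresponds to `i = i0 + 1` and `j = j0 + 1`.

```python
import numpy as np


def expected_dp_arr0(ni=10, nj=11):
    """Return the values (j-1) + i + 0.1 for 1-based Fortran indices i, j.

    Hypothetical helper for illustration; not part of Serialbox.
    """
    # Build 1-based index grids: i runs over 1..ni, j over 1..nj.
    i = np.arange(1, ni + 1).reshape(ni, 1)
    j = np.arange(1, nj + 1).reshape(1, nj)
    # Broadcasting yields an (ni, nj) array of float64 values.
    return (j - 1) + i + 0.1


arr = expected_dp_arr0()
print(arr.shape)   # (10, 11)
print(arr[0, 0])   # element (i=1, j=1): (1-1) + 1 + 0.1
```

With 0-based indices the same formula reads `j0 + (i0 + 1) + 0.1`, which is the form a Python reference loop would use.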

In [1]:
%%writefile serialBox_tutorial.F90

module mod0
    implicit none

    type der_data_type
        integer :: int_dd
        real    :: real_dd
        double precision, dimension(:,:), allocatable :: dp_arr_dd     
    end type der_data_type

end module mod0

program serialBox_tutorial
    use mod0
    implicit none

    integer          :: int0, ii, jj
    real             :: real0

    double precision, dimension(:,:), allocatable :: dp_arr0

    type(der_data_type) dd_Type
            
    ! Initialize Serialbox   
    ! Set up the data as indicated in the above cell
    ! Serialize the data as indicated in the above cell using Serialbox
   
end program

Writing serialBox_tutorial.F90


### Generating Serialized Data

Once the Fortran file is written, we need to run the pre-processor provided by Serialbox, `pp_ser.py`, to generate the Fortran code that contains the low-level Serialbox API calls. In the example below, we generate a Fortran file `s_serialBox_tutorial.F90`. After that, the code is compiled using a Fortran compiler (e.g., `gfortran`) and executed to generate the serialized data.
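
The pre-processor invocation used in the next cell can also be assembled from Python, which may be convenient when several source files need to be processed. The sketch below is a minimal illustration: the helper name `pp_ser_command` is hypothetical, and it assumes the same installation layout (`${SERIALBOX_ROOT}/python/pp_ser/pp_ser.py`) and the same flags as the bash cell.

```python
import os
from pathlib import Path


def pp_ser_command(source, root):
    """Build the pre-processor command line used in this notebook.

    Hypothetical helper; the flags mirror the bash cell (-s -v --output=...).
    """
    script = Path(root) / "python" / "pp_ser" / "pp_ser.py"
    # The generated file gets an "s_" prefix, as in the bash cell.
    output = "s_" + Path(source).name
    return ["python3", str(script), "-s", "-v", f"--output={output}", str(source)]


# Fall back to a placeholder path when SERIALBOX_ROOT is not set.
root = os.environ.get("SERIALBOX_ROOT", "/path/to/serialbox")
cmd = pp_ser_command("serialBox_tutorial.F90", root)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` to actually run the pre-processor.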

In [None]:
%%bash

# remove old stuff
[ -f tutorial_run ] && rm tutorial_run
[ -f s_serialBox_tutorial.F90 ] && rm s_serialBox_tutorial.F90
rm -rf ./data

# run the Serialbox directives pre-processor
python3 ${SERIALBOX_ROOT}/python/pp_ser/pp_ser.py -s -v --output=s_serialBox_tutorial.F90 serialBox_tutorial.F90
pygmentize s_serialBox_tutorial.F90

# compile generated source file
gfortran -O3 -cpp -DSERIALIZE \
    -o tutorial_run s_serialBox_tutorial.F90 \
    -I${SERIALBOX_ROOT}/include \
    ${SERIALBOX_ROOT}/lib/libSerialboxFortran.a \
    ${SERIALBOX_ROOT}/lib/libSerialboxC.a \
    ${SERIALBOX_ROOT}/lib/libSerialboxCore.a \
    -lpthread -lstdc++ -lstdc++fs

# run executable and generate serialization data
./tutorial_run

### Reading Serialized Data and Writing a Python Unit-Test

Once the binary has executed and written out the data, run the following cell, which contains a Python script that verifies whether the data was serialized correctly. The script assumes that you have set up the data exactly as specified in the numbered list above. If the only message printed is `All tests passed successfully!`, all the tests have passed.
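
One detail worth spelling out is the precision of the reference values. A default Fortran `real` is typically single precision with `gfortran`, so the stored value is the nearest 32-bit float to 8.9 rather than the double-precision value. The sketch below uses only NumPy, does not touch Serialbox, and illustrates why a reference value for `real0` is wrapped in `np.float32`.

```python
import numpy as np

stored = np.float32(8.9)   # what a single-precision variable holds

# Exact comparison against a double-precision 8.9 fails, because the
# float32 and float64 roundings of 8.9 are different numbers.
exact_vs_double = bool(stored == np.float64(8.9))

# Comparing against a float32 reference succeeds.
exact_vs_single = bool(stored == np.float32(8.9))

# A tolerance-based check is robust to the precision difference.
close_enough = bool(np.isclose(stored, np.float64(8.9), rtol=1e-6))

print(exact_vs_double, exact_vs_single, close_enough)
```

Exact equality is therefore only appropriate when the reference has the same precision as the serialized variable; otherwise a tolerance-based comparison such as `np.isclose` or `np.allclose` is the safer choice.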

In [4]:
#!/usr/bin/env python3

import numpy as np
import sys
import os
sys.path.append(os.environ.get('SERIALBOX_ROOT') + '/python')
import serialbox as ser

serializer = ser.Serializer(ser.OpenModeKind.Read,'./data', 'example')

sp = serializer.get_savepoint('input_data')

int0  = serializer.read('int0',  sp[0])[0]
real0 = serializer.read('real0', sp[0])[0]

dp_arr0   = serializer.read('dp_arr0',   sp[0])

ddt_int0    = serializer.read('ddt_int0', sp[0])
ddt_real0   = serializer.read('ddt_real0', sp[0])
ddt_dp_arr0 = serializer.read('ddt_arr0', sp[0])

int0_ref = 7
real0_ref = np.float32(8.9)

dp_arr0_ref = np.zeros((10,11), dtype=np.float32)

for j in range(11):
    for i in range(10):
        dp_arr0_ref[i,j] = j + i+1 + 0.1

try:
    assert int0_ref == int0, "int0 does not match!"
    assert real0_ref == real0, "real0 does not match!"
    assert np.array_equal(dp_arr0_ref, dp_arr0), "dp_arr0 does not match!"
    assert np.allclose(int0_ref, ddt_int0), "ddt_int0 does not match!"
    assert np.allclose(real0_ref, ddt_real0), "ddt_real0 does not match!"
    assert np.allclose(dp_arr0_ref, ddt_dp_arr0), "ddt_arr0 does not match!"
except AssertionError as msg:
    print(msg)
else:
    # Only report success when no assertion failed
    print("All tests passed successfully!")

All tests passed successfully!


## Moving Forward (Optional)

Setting a savepoint in frequently executed code can produce an excessive amount of data. If data volume becomes an issue, or the repeated data does not give you any additional unique tests, be strategic about when you save data, either by wrapping savepoints in `!$ser verbatim if (condition)` statements or by using `!$ser on` and `!$ser off`.

* Initialize Serialbox, have the serialized data written to a directory called `./data2`, and set the Serialbox file prefix to `example2`.
* Create a Serialbox savepoint called `loop_data` that contains the array `arr` and the index `n`.
* Save `arr` only for `n = 1` and `n = 50` by using a `!$ser verbatim if ...` statement. Either save `n` as well, or change the Python comparison code so that it does not rely on it.
* Note that the `!$ser init` statement can include `unique_id=.true.`. This allows you to save the same variable multiple times.
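
To get a feeling for the data volume involved, the following back-of-the-envelope sketch estimates the raw size of the `arr` array used in the example below. It assumes a 100 x 100 x 100 double precision array and 100 loop iterations, and it ignores any metadata that Serialbox writes alongside the data.

```python
import numpy as np

shape = (100, 100, 100)                            # dimensions of `arr`
bytes_per_value = np.dtype(np.float64).itemsize    # 8 bytes per double

bytes_per_save = int(np.prod(shape)) * bytes_per_value
all_iterations = 100 * bytes_per_save   # saving at every n = 1..100
selected_only = 2 * bytes_per_save      # saving only at n = 1 and n = 50

print(f"per save:       {bytes_per_save / 1e6:.0f} MB")
print(f"all iterations: {all_iterations / 1e6:.0f} MB")
print(f"n = 1 and 50:   {selected_only / 1e6:.0f} MB")
```

Under these assumptions, restricting the savepoint to two iterations reduces the raw data volume by a factor of 50.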

In [6]:
%%writefile serialBox_tutorial2.F90

subroutine compute_arr(arr, n)
    double precision, dimension(100,100,100), intent(INOUT) :: arr
    integer, intent(IN) :: n
    integer          :: ii, jj, kk
    do kk=1,100
        do jj=2,100
            do ii=2,100
                arr(ii,jj,kk) = arr(ii, jj,kk) + (arr(ii-1,jj,kk) + arr(ii, jj - 1,kk) + 2 * arr(ii, jj,kk)) / (4 * n)
            enddo
        enddo
    enddo
    ! Add serialization statements
    !$ser verbatim if (n == 1 .or. n == 50) then
    !$ser savepoint 'loop_data'
    !$ser data arr=arr n=n
    !$ser verbatim endif
end subroutine 

program serialBox_tutorial
    implicit none

    integer          :: n
    double precision, dimension(100,100,100) :: arr

    ! Initialize Serialbox
    !$ser init directory='./data2' prefix='example2' unique_id=.true.
    !$ser mode write
    !$ser on
    ! Start from zeros so the result is well defined and matches the Python reference
    arr = 0.0d0
    do n=1,100
       call compute_arr(arr, n)
    enddo
    !$ser cleanup
    
end program

Writing serialBox_tutorial2.F90


In [None]:
%%bash

[ -f tutorial_run2 ] && rm tutorial_run2
[ -f s_serialBox_tutorial2.F90 ] && rm s_serialBox_tutorial2.F90

python3 ${SERIALBOX_ROOT}/python/pp_ser/pp_ser.py -s -v --output=s_serialBox_tutorial2.F90 serialBox_tutorial2.F90
cat s_serialBox_tutorial2.F90
gfortran -O3 -cpp -DSERIALIZE \
    -o tutorial_run2 s_serialBox_tutorial2.F90 \
    -I${SERIALBOX_ROOT}/include \
    ${SERIALBOX_ROOT}/lib/libSerialboxFortran.a \
    ${SERIALBOX_ROOT}/lib/libSerialboxC.a \
    ${SERIALBOX_ROOT}/lib/libSerialboxCore.a \
    -lpthread -lstdc++ -lstdc++fs 
rm -rf ./data2
./tutorial_run2

In [None]:
#!/usr/bin/env python3

import numpy as np
import sys
import os
sys.path.append(os.environ.get('SERIALBOX_ROOT') + '/python')
import serialbox as ser
import gt4py.gtscript as gtscript
import gt4py.storage as gt_storage
backend="numpy"
@gtscript.stencil(backend=backend)
def update_arr(arr: gtscript.Field[np.float64], n: int):
    with computation(PARALLEL), interval(...):
        arr = arr + (arr[-1, 0, 0] + arr[0, -1, 0] + 2 * arr) / (4 * n)

data_path = './data2'        
if not os.path.isdir(data_path):
    raise Exception('Data directory does not exist', data_path)
serializer = ser.Serializer(ser.OpenModeKind.Read, data_path, 'example2')
saved_arrs = {}
for savepoint in serializer.savepoint_list():
    if savepoint.name == 'loop_data':
        n = serializer.read('n',   savepoint)[0]
        saved_arrs[n] = serializer.read('arr',   savepoint)
shape = saved_arrs[1].shape

arr0_ref = gt_storage.from_array(data=np.zeros(shape, dtype=np.float64), backend=backend,
                                 dtype=np.float64, default_origin=(0, 0, 0), shape=shape)
for n in range(1, 101):
    update_arr(arr0_ref, n, origin=(1, 1, 0), domain=(shape[0] - 1, shape[1] - 1, shape[2]))
    if n == 1 or n == 50:
        try:
            print('Checking n =', n)
            assert np.array_equal(arr0_ref, saved_arrs[n]), 'arr0 does not match!'
        except AssertionError as msg:
            print(msg)


print("Finished running comparison tests!")