## **Serialbox Tutorial : Serializing Fortran Data**

This notebook covers the basics of extracting data from a Fortran program using [Serialbox](https://gridtools.github.io/serialbox/).

### **Notebook Requirements**

- Python v3.11.x to v3.12.x
- [NOAA/NASA Domain Specific Language Middleware](https://github.com/NOAA-GFDL/NDSL)
- `ipykernel==6.1.0`
- [`ipython_genutils`](https://pypi.org/project/ipython_genutils/)
- The Fortran compiler that was used to build Serialbox in the `NDSL` middleware (Note: The default build instructions for `NDSL` build Serialbox so that it writes binary data files from Fortran.  Serialbox also has build-time options that enable it to write netCDF files.)

### **Brief Serialbox Overview**

[Serialbox](https://gridtools.github.io/serialbox/) is a library that extracts data from Fortran programs for use in code porting and verification.  It uses directive-based statements that are later translated into actual Serialbox library calls, which makes it approachable.  Extracting data from a Fortran program with Serialbox essentially follows these steps:

1) Initialize Serialbox
2) Create a savepoint
3) Save the data of interest
4) "Clean up" the savepoint

These four steps correspond to the following Serialbox directives:

1) `!$ser init directory='<Directory Path to store Serialbox data>' prefix='<Name of Data Group>'`
2) `!$ser savepoint <Name of Savepoint>`
3) `!$ser data <Serialbox Variable Name>=<Fortran Variable Name>`
4) `!$ser cleanup`

Note that in step 3, multiple variables can be specified in a single directive (e.g. `!$ser data serialA=fortranA serialB=fortranB serialC=fortranC`).
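As a quick way to see the four steps in a source file, the sketch below scans Fortran text for `!$ser` lines and checks that they follow the init, savepoint, data, cleanup pattern. The helper names `list_ser_directives` and `check_directive_order` are hypothetical and not part of Serialbox; the check covers only the pattern described in this overview.

```python
import re

# Matches a Serialbox directive line such as "!$ser savepoint NAME".
SER_DIRECTIVE = re.compile(r"^\s*!\$ser\s+(\w+)\s*(.*)$", re.IGNORECASE)


def list_ser_directives(source):
    """Return (line_number, keyword, arguments) for every !$ser line."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SER_DIRECTIVE.match(line)
        if match:
            found.append((lineno, match.group(1).lower(), match.group(2).strip()))
    return found


def check_directive_order(directives):
    """Report deviations from the init -> savepoint -> data -> cleanup pattern."""
    keywords = [keyword for _, keyword, _ in directives]
    problems = []
    if not keywords or keywords[0] != "init":
        problems.append("first directive should be 'init'")
    if keywords and keywords[-1] != "cleanup":
        problems.append("last directive should be 'cleanup'")
    seen_savepoint = False
    for lineno, keyword, _ in directives:
        if keyword == "savepoint":
            seen_savepoint = True
        elif keyword == "data" and not seen_savepoint:
            problems.append(f"line {lineno}: 'data' appears before any 'savepoint'")
    return problems


# Hypothetical four-line example that mirrors the steps above.
example = """\
!$ser init directory='.' prefix='Demo'
!$ser savepoint demo-in
!$ser data a=A b=B
!$ser cleanup
"""

directives = list_ser_directives(example)
problems = check_directive_order(directives)
for lineno, keyword, arguments in directives:
    print(lineno, keyword, arguments)
print("problems:", problems)
```

The same two functions can be pointed at the contents of a real `.F90` file before it is handed to the preprocessing step.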

### **Serialbox Example 1**

We'll step through a basic example that extracts data from a Fortran code using Serialbox.

The following cell sets the environment variables `SERIALBOX_EXAMPLE_PATH`, `SERIALBOX_INSTALL_PATH`, and `SAVEPOINT_NAME` (the name of the directory that will later hold the preprocessed code and the serialized data).  Afterwards, a Bash script creates a `Fortran` directory within `SERIALBOX_EXAMPLE_PATH` that will store the Fortran code used to demonstrate Serialbox commands.  Be sure to change `SERIALBOX_EXAMPLE_PATH` and `SERIALBOX_INSTALL_PATH` to paths that are appropriate for your machine.

In [8]:
# Change SERIALBOX_EXAMPLE_PATH and SERIALBOX_INSTALL_PATH to appropriate paths
%env SERIALBOX_EXAMPLE_PATH=/home/mad/work/ndsl/test_data/example_data/
%env SERIALBOX_INSTALL_PATH=/home/mad/work/serialbox/install/
%env SAVEPOINT_NAME=FILLQ2ZERO1

env: SERIALBOX_EXAMPLE_PATH=/home/mad/work/ndsl/test_data/example_data/
env: SERIALBOX_INSTALL_PATH=/home/mad/work/serialbox/install/


In [9]:
%%bash

cd $SERIALBOX_EXAMPLE_PATH

if [ ! -d "./Fortran" ]; then
    mkdir Fortran
else
    rm -rf Fortran
    mkdir Fortran
fi

#### **Serialbox directive calls in Fortran code**

Next, we'll write the file `testSerialBox.F90` and move it to the previously created `Fortran` directory. This file contains the Fortran program `testSerialBox`, which allocates three arrays, fills two of them (`Qin_out`, `MASS`) with random numbers, flips the sign of the `Qin_out` entries below 0.1 so that some values are negative, and passes the arrays to the subroutine `FILLQ2ZERO1`.

In [29]:
%%writefile testSerialBox.F90

program testSerialBox

  implicit none

  real, dimension(:,:,:), allocatable :: Qin_out, MASS
  real, dimension(:,:),   allocatable :: FILLQ_out

  integer :: N = 5

  allocate(Qin_out(N,N,N), MASS(N,N,N), FILLQ_out(N,N))

  call random_number(Qin_out)
  call random_number(MASS)

  where(Qin_out < 0.1) Qin_out = -Qin_out

  print*, 'sum(Qin_out) = ', sum(Qin_out)
  print*, 'sum(MASS) = ', sum(MASS)


!$ser init directory='.' prefix='FILLQ2ZERO_SerData'
!$ser mode write
!$ser savepoint FILLQ2ZERO1-In
!$ser data q=Qin_out mass=MASS fq=FILLQ_out

  call FILLQ2ZERO1(Qin_out, MASS, FILLQ_out)

!$ser savepoint FILLQ2ZERO1-Out
!$ser data q=Qin_out mass=MASS fq=FILLQ_out
!$ser cleanup

  print*, 'sum(Qin_out) = ', sum(Qin_out)
  print*, 'sum(FILLQ_out) = ', sum(FILLQ_out)

   contains

  subroutine FILLQ2ZERO1( Q, MASS, FILLQ  )
    real, dimension(:,:,:),   intent(inout)  :: Q
    real, dimension(:,:,:),   intent(in)     :: MASS
    real, dimension(:,:),     intent(  out)  :: FILLQ
    integer                                  :: IM,JM,LM
    integer                                  :: I,J,K,L
    real                                     :: TPW, NEGTPW
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! Fills in negative q values in a mass conserving way.
    ! Conservation of TPW was checked.
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    IM = SIZE( Q, 1 )
    JM = SIZE( Q, 2 )
    LM = SIZE( Q, 3 )
    do j=1,JM
       do i=1,IM
          TPW = SUM( Q(i,j,:)*MASS(i,j,:) )
          NEGTPW = 0.
          do l=1,LM
             if ( Q(i,j,l) < 0.0 ) then
                NEGTPW   = NEGTPW + ( Q(i,j,l)*MASS( i,j,l ) )
                Q(i,j,l) = 0.0
             endif
          enddo
          do l=1,LM
             if ( Q(i,j,l) >= 0.0 ) then
                Q(i,j,l) = Q(i,j,l)*( 1.0+NEGTPW/(TPW-NEGTPW) )
             endif
          enddo
          FILLQ(i,j) = -NEGTPW
       end do
    end do
  end subroutine FILLQ2ZERO1
end program

Writing testSerialBox.F90


In [30]:
%%bash

mv testSerialBox.F90 $SERIALBOX_EXAMPLE_PATH/Fortran

Assuming that we are interested in porting the subroutine `FILLQ2ZERO1`, we need the array data before and after calling `FILLQ2ZERO1`, which will let us set the initial data state in our ported code appropriately and have output data for comparison purposes.  To get this data, there are directive-based Serialbox commands inserted before and after the call to `FILLQ2ZERO1` that follow the steps presented in the [Serialbox overview](#brief-serialbox-overview).  Let's quickly examine the Serialbox commands before the call to `FILLQ2ZERO1`.

- `!$ser init directory='.' prefix='FILLQ2ZERO_SerData'` : Initializes Serialbox and specifies that the extracted data will be written to the directory from which the code is executed.  The data files will be grouped and named with the prefix `FILLQ2ZERO_SerData`.

- `!$ser mode write` : Sets Serialbox's operation mode to writing data files.  This is believed to be the default mode, so the directive mainly makes the intent explicit.  Other modes include `read`.

- `!$ser savepoint FILLQ2ZERO1-In` : Creates a savepoint with the name `FILLQ2ZERO1-In`.

- `!$ser data q=Qin_out mass=MASS fq=FILLQ_out` : Writes the arrays to data files under the current savepoint.  The name on the left side of `=` is the name that Serialbox will use, and the name on the right side of `=` is the Fortran variable.

After the `FILLQ2ZERO1` call, a second savepoint, `FILLQ2ZERO1-Out`, is created, and the `!$ser data ...` directive records the resulting output arrays from `FILLQ2ZERO1`.  `!$ser cleanup` indicates that we're done writing data and finalizes the files.
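To make the porting goal concrete, here is a minimal sketch of what a Python transcription of `FILLQ2ZERO1` could look like. It uses plain nested lists so that it runs without dependencies; it is not the NDSL port, and the function name `fillq2zero1` and the single-column test values are made up for illustration. The In and Out savepoints would supply the real inputs and the reference outputs for such a port.

```python
def fillq2zero1(q, mass):
    """Plain-Python transcription of the Fortran FILLQ2ZERO1 loop nest.

    q and mass are nested lists indexed [i][j][l]. q is modified in place
    and the 2D fill amount is returned.
    """
    im, jm, lm = len(q), len(q[0]), len(q[0][0])
    fillq = [[0.0] * jm for _ in range(im)]
    for i in range(im):
        for j in range(jm):
            # Column total before any correction
            tpw = sum(q[i][j][l] * mass[i][j][l] for l in range(lm))
            negtpw = 0.0
            for l in range(lm):
                if q[i][j][l] < 0.0:
                    negtpw += q[i][j][l] * mass[i][j][l]
                    q[i][j][l] = 0.0
            # Rescale the remaining values so the column total is preserved
            for l in range(lm):
                if q[i][j][l] >= 0.0:
                    q[i][j][l] *= 1.0 + negtpw / (tpw - negtpw)
            fillq[i][j] = -negtpw
    return fillq


# Hypothetical single column (IM = JM = 1, LM = 3) with one negative value
q = [[[0.5, -0.05, 0.25]]]
mass = [[[1.0, 2.0, 4.0]]]

tpw_before = sum(a * b for a, b in zip(q[0][0], mass[0][0]))
fillq = fillq2zero1(q, mass)
tpw_after = sum(a * b for a, b in zip(q[0][0], mass[0][0]))

print("q after fill:", q[0][0])
print("fill amount :", fillq[0][0])
print("column total before/after:", tpw_before, tpw_after)
```

The column total is the same before and after the call (up to rounding), which is the mass-conserving property mentioned in the Fortran comment.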

#### **Translating Serialbox directive calls into actual library calls**

While we've expressed the Serialbox commands using directives, these directives need to be mapped to the appropriate Serialbox library calls. To do this, we run the Python script `pp_ser.py` (shipped with Serialbox), which replaces the `!$ser` directive statements with the appropriate Fortran Serialbox calls and writes a new `testSerialBox.F90` file.  The following Bash commands create a directory named after `SAVEPOINT_NAME` (here `FILLQ2ZERO1`) within the `Fortran` directory and execute the `pp_ser.py` script.

In [36]:
%%bash

cd $SERIALBOX_EXAMPLE_PATH/Fortran
if [ ! -d "./$SAVEPOINT_NAME" ]; then
    mkdir $SAVEPOINT_NAME
else
    rm -rf $SAVEPOINT_NAME
    mkdir $SAVEPOINT_NAME
fi

# Adjust the path to pp_ser.py for your machine
python /home/mad/work/serialbox/src/serialbox-python/pp_ser/pp_ser.py --output-dir=./$SAVEPOINT_NAME testSerialBox.F90

Processing file testSerialBox.F90


Note that we specified the option `--output-dir=./$SAVEPOINT_NAME` when running `pp_ser.py`, which sets the location where we want the resulting Fortran code with the Serialbox directives replaced by library calls.  If we did not specify the output directory, executing `pp_ser.py` would simply print the Fortran code to the terminal.  In the `FILLQ2ZERO1` directory, we'll find a `testSerialBox.F90` file that contains the appropriate Serialbox calls.
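The `pp_ser.py` path used in this notebook is machine-specific. As a rough sketch, the script can be searched for relative to `SERIALBOX_INSTALL_PATH`. The two candidate locations below are inferred from the paths that appear in this notebook and are an assumption; other installations may place the script elsewhere. `find_pp_ser` is a hypothetical helper, not part of Serialbox.

```python
import os
from pathlib import Path

# Relative locations inferred from the pp_ser.py paths used in this notebook
# (assumption: your installation may use a different layout).
CANDIDATE_LOCATIONS = (
    Path("python") / "pp_ser" / "pp_ser.py",
    Path("..") / "src" / "serialbox-python" / "pp_ser" / "pp_ser.py",
)


def find_pp_ser(install_path):
    """Return the first existing pp_ser.py candidate, or None if none exists."""
    root = Path(install_path)
    for relative in CANDIDATE_LOCATIONS:
        candidate = root / relative
        if candidate.is_file():
            return candidate.resolve()
    return None


install_path = os.environ.get("SERIALBOX_INSTALL_PATH")
pp_ser = find_pp_ser(install_path) if install_path else None
print("pp_ser.py:", pp_ser if pp_ser else "not found, set the path by hand")
```

If the script is found, its path can replace the hard-coded one in the Bash cell.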

In [39]:
%%bash

cd $SERIALBOX_EXAMPLE_PATH/Fortran/FILLQ2ZERO1
ls -al

total 16
drwxr-xr-x 2 mad mad 4096 May 15 14:32 .
drwxr-xr-x 3 mad mad 4096 May 15 14:32 ..
-rw-r--r-- 1 mad mad 5099 May 15 14:32 testSerialBox.F90


#### **Building and Running Fortran code with Serialbox library**

Compiling the Fortran code with Serialbox requires the following during compilation:

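- References to the following Serialbox static libraries (assuming that we want the resulting binary to be statically linked against Serialbox), plus the C++ standard library
    - `libSerialboxFortran.a`
    - `libSerialboxC.a`
    - `libSerialboxCore.a`
    - `-lstdc++` (the C++ standard library, which the Serialbox libraries depend on)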
    
- The `-DSERIALIZE` macro to activate the Serialbox codepath within the Fortran code.  Note that not having this macro during compilation will result in a binary without Serialbox calls.

- The `include` path from the Serialbox installation
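Before compiling, it can help to confirm that these files exist under `SERIALBOX_INSTALL_PATH`. The sketch below assumes the `lib/` and `include/` layout used by the compile command in this notebook; some systems install libraries under `lib64/` instead, in which case the check needs adjusting. `missing_serialbox_files` is a hypothetical helper name.

```python
import os
from pathlib import Path

# Static libraries listed in the requirements above
REQUIRED_LIBS = ("libSerialboxFortran.a", "libSerialboxC.a", "libSerialboxCore.a")


def missing_serialbox_files(install_path):
    """Return the expected library and include paths that do not exist."""
    root = Path(install_path)
    expected = [root / "lib" / name for name in REQUIRED_LIBS]
    expected.append(root / "include")
    return [str(path) for path in expected if not path.exists()]


install_path = os.environ.get("SERIALBOX_INSTALL_PATH", "")
if install_path:
    missing = missing_serialbox_files(install_path)
else:
    missing = ["SERIALBOX_INSTALL_PATH is not set"]

for item in missing:
    print("missing:", item)
if not missing:
    print("All expected Serialbox files were found.")
```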

The compilation line can look as follows.

In [40]:
%%bash

cd $SERIALBOX_EXAMPLE_PATH/Fortran/FILLQ2ZERO1

# Note: Adjust the libraries and include paths appropriately

gfortran testSerialBox.F90  \
    $SERIALBOX_INSTALL_PATH/lib/libSerialboxFortran.a \
    $SERIALBOX_INSTALL_PATH/lib/libSerialboxC.a \
    $SERIALBOX_INSTALL_PATH/lib/libSerialboxCore.a \
    -lstdc++ \
    -DSERIALIZE \
    -I$SERIALBOX_INSTALL_PATH/include \
    -o testSerialBox.bin


After successful compilation, we can execute the code.  Note that whenever Serialbox is running, the code displays `WARNING: SERIALIZATION IS ON` in the terminal.

In [41]:
%%bash

cd $SERIALBOX_EXAMPLE_PATH/Fortran/FILLQ2ZERO1
./testSerialBox.bin

 sum(Qin_out) =    55.8682556    
 sum(MASS) =    60.2736511    
 >>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
 >>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<
 sum(Qin_out) =    55.8816376    
 sum(FILLQ_out) =   0.210932091    


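After the code executes, you will see several `.json` and `.dat` files whose names are based on the variable names given to Serialbox and the `prefix` specified during Serialbox's initialization.  (The listing below appears to have been captured from a run that used the prefix `FILLQ2ZERO_InOut`; with the `prefix` in the code above, the file names would contain `FILLQ2ZERO_SerData` instead.)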

In [42]:
%%bash

cd $SERIALBOX_EXAMPLE_PATH/Fortran/FILLQ2ZERO1
ls -al

total 1008
drwxr-xr-x 2 mad mad   4096 May 15 14:33 .
drwxr-xr-x 3 mad mad   4096 May 15 14:32 ..
-rw-r--r-- 1 mad mad    703 May 15 14:33 ArchiveMetaData-FILLQ2ZERO_InOut.json
-rw-r--r-- 1 mad mad    200 May 15 14:33 FILLQ2ZERO_InOut_fq.dat
-rw-r--r-- 1 mad mad    500 May 15 14:33 FILLQ2ZERO_InOut_mass.dat
-rw-r--r-- 1 mad mad   1000 May 15 14:33 FILLQ2ZERO_InOut_q.dat
-rw-r--r-- 1 mad mad   3860 May 15 14:33 MetaData-FILLQ2ZERO_InOut.json
-rw-r--r-- 1 mad mad   5099 May 15 14:32 testSerialBox.F90
-rwxr-xr-x 1 mad mad 993872 May 15 14:33 testSerialBox.bin
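The `.dat` sizes in the listing above can be sanity-checked with a little arithmetic. The sketch assumes 4-byte default `real` values (the gfortran default when no promotion flags are used) and that the files hold raw array data only. The sizes are copied from the listing.

```python
import math

BYTES_PER_REAL = 4  # assumption: default gfortran `real`, no promotion flags


def field_bytes(shape):
    """Size in bytes of one raw copy of a field with the given shape."""
    return math.prod(shape) * BYTES_PER_REAL


n = 5
one_copy = {
    "q": field_bytes((n, n, n)),     # 5*5*5 reals
    "mass": field_bytes((n, n, n)),  # 5*5*5 reals
    "fq": field_bytes((n, n)),       # 5*5 reals
}

# File sizes in bytes, taken from the directory listing above
listed = {"q": 1000, "mass": 500, "fq": 200}

copies = {name: listed[name] // one_copy[name] for name in listed}
for name, count in copies.items():
    print(f"{name}: {one_copy[name]} bytes per copy, {count} copy/copies in file")
```

Under these assumptions, `q` and `fq` each hold two copies (one per savepoint), while `mass` holds one. Since `MASS` is not modified by `FILLQ2ZERO1`, this is consistent with unchanged data not being stored a second time, although that is an inference from the listing rather than something verified here.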


### **Serialbox Example 2 : Looping Region**

There may be cases where a function or subroutine is called within a loop, and we want to check its data at each iteration.  Serialbox supports this by attaching metadata to the `!$ser savepoint` directive.  In general, it looks like this:

- `!$ser savepoint <Savepoint Name> <Metadata variable>=<Fortran variable (Usually the timestep)>`

For example, if there's a timestep looping region that increments the variable `currTS`, we can use that variable to create separate savepoints within that looping region.

- `!$ser savepoint sp timestep=currTS`
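One way to picture the effect of the metadata is as a list of (name, metadata) pairs in which the name repeats and the `timestep` value tells the entries apart. The sketch below only models that idea; it does not use the Serialbox API, and `N_TS` and `select` are hypothetical names.

```python
N_TS = 3  # hypothetical number of loop iterations, kept small for display

# Each savepoint is modeled as (name, metadata). The name "sp" repeats on
# every iteration; the timestep metadata makes each entry distinct.
savepoints = [("sp", {"timestep": curr_ts}) for curr_ts in range(1, N_TS + 1)]


def select(savepoints, name, **metadata):
    """Return the savepoints matching a name and all given metadata values."""
    return [
        (sp_name, meta)
        for sp_name, meta in savepoints
        if sp_name == name
        and all(meta.get(key) == value for key, value in metadata.items())
    ]


for entry in savepoints:
    print(entry)
print("timestep 2 only:", select(savepoints, "sp", timestep=2))
```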

In the example below, we'll use Serialbox to create multiple savepoints within a looping region.

In [None]:
%%bash

cd $SERIALBOX_EXAMPLE_PATH

if [ ! -d "./Fortran_ts" ]; then
    mkdir Fortran_ts
else
    rm -rf Fortran_ts
    mkdir Fortran_ts
fi

In [None]:
%%writefile testSerialBox_ts.F90

program testSerialBox_ts

  implicit none

  real, dimension(:,:,:), allocatable :: Qin_out, MASS
  real, dimension(:,:),   allocatable :: FILLQ_out

  integer :: N = 5, N_ts = 10, t

  allocate(Qin_out(N,N,N), MASS(N,N,N), FILLQ_out(N,N))

!$ser init directory='.' prefix='FILLQ2ZERO_SerData'

  do t = 1, N_ts

    call random_number(Qin_out)
    call random_number(MASS)

    where(Qin_out < 0.1) Qin_out = -Qin_out

    print*, 'sum(Qin_out) = ', sum(Qin_out)
    print*, 'sum(MASS) = ', sum(MASS)


!$ser savepoint FILLQ2ZERO1-In timestep=t
!$ser data q=Qin_out mass=MASS fq=FILLQ_out

    call FILLQ2ZERO1(Qin_out, MASS, FILLQ_out)

!$ser savepoint FILLQ2ZERO1-Out timestep=t
!$ser data q=Qin_out mass=MASS fq=FILLQ_out

!   print*, 'sum(Qin_out) = ', sum(Qin_out)
!   print*, 'sum(FILLQ_out) = ', sum(FILLQ_out)

  enddo
  
!$ser cleanup
   contains

  subroutine FILLQ2ZERO1( Q, MASS, FILLQ  )
    real, dimension(:,:,:),   intent(inout)  :: Q
    real, dimension(:,:,:),   intent(in)     :: MASS
    real, dimension(:,:),     intent(  out)  :: FILLQ
    integer                                  :: IM,JM,LM
    integer                                  :: I,J,K,L
    real                                     :: TPW, NEGTPW
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! Fills in negative q values in a mass conserving way.
    ! Conservation of TPW was checked.
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    IM = SIZE( Q, 1 )
    JM = SIZE( Q, 2 )
    LM = SIZE( Q, 3 )
    do j=1,JM
       do i=1,IM
          TPW = SUM( Q(i,j,:)*MASS(i,j,:) )
          NEGTPW = 0.
          do l=1,LM
             if ( Q(i,j,l) < 0.0 ) then
                NEGTPW   = NEGTPW + ( Q(i,j,l)*MASS( i,j,l ) )
                Q(i,j,l) = 0.0
             endif
          enddo
          do l=1,LM
             if ( Q(i,j,l) >= 0.0 ) then
                Q(i,j,l) = Q(i,j,l)*( 1.0+NEGTPW/(TPW-NEGTPW) )
             endif
          enddo
          FILLQ(i,j) = -NEGTPW
       end do
    end do
  end subroutine FILLQ2ZERO1
end program

In [None]:
%%bash

mv testSerialBox_ts.F90 $SERIALBOX_EXAMPLE_PATH/Fortran_ts

cd $SERIALBOX_EXAMPLE_PATH/Fortran_ts
if [ ! -d "./$SAVEPOINT_NAME" ]; then
    mkdir $SAVEPOINT_NAME
else
    rm -rf $SAVEPOINT_NAME
    mkdir $SAVEPOINT_NAME
fi

# Adjust the path to pp_ser.py for your machine
python /home/ckung/Documents/Code/SMT-Nebulae/sw_stack/discover/sles15/src/2024.03.00/install/serialbox/python/pp_ser/pp_ser.py --output-dir=./$SAVEPOINT_NAME testSerialBox_ts.F90

cd $SERIALBOX_EXAMPLE_PATH/Fortran_ts/$SAVEPOINT_NAME

gfortran testSerialBox_ts.F90  \
    $SERIALBOX_INSTALL_PATH/lib/libSerialboxFortran.a \
    $SERIALBOX_INSTALL_PATH/lib/libSerialboxC.a \
    $SERIALBOX_INSTALL_PATH/lib/libSerialboxCore.a \
    -lstdc++ \
    -DSERIALIZE \
    -I$SERIALBOX_INSTALL_PATH/include \
    -o testSerialBox_ts.bin

./testSerialBox_ts.bin

ls -al