# Tutorial: Caching fieldset files in temporary, high-speed storage locations

Parcels simulations can be very time-consuming operations. The runtime of each simulation depends on several parameters:

* Number of particles
* Number of attributes per particle (class)
* Lifetime modality of particles: fixed, continuous release, conditional removal, fully dynamic
* Number of fields
* Grid resolution of fields
* Temporal spacing of fields
* Computational density of the kernels (i.e. how much code the kernels contain)
* Computational complexity of the kernels (i.e. how efficiently the kernels are written)
* Particle file output period (i.e. the _dt_ at which particles are written)
* Integration time step (i.e. the computational _dt_ of the trajectory integration)
* etc.

Several of these parameters can easily be steered (e.g. the number of particles), although they depend strongly on other parameters.
Some, such as the computational complexity of the kernels, are directly controlled by the programming skills of the oceanographer.
That said, computation-related parameters often have less impact on the actual simulation runtime.
Most of the time, the cost of a Lagrangian particle simulation in Parcels is dominated by _input/output (I/O)_ operations,
that is, by the fields that are requested and their grid properties, which the Lagrangian modeller cannot control.

We have identified and accepted this issue in Lagrangian simulations. As a consequence, the most promising avenue to speed up simulations
is to reduce the time spent reading field data from files. Parcels offers two options to address this issue:

1. Chunking field files via _Dask_ to reduce the amount of data actually read
2. Pre-buffering field files in cache locations to speed up the actual reading of a file

The use of chunking is demonstrated in the [_example_dask_chunk_OCMs.py_](https://github.com/OceanParcels/parcels/blob/master/parcels/examples/example_dask_chunk_OCMs.py) file. Here, we detail how to use file cache locations to
speed up simulations.

## Status quo and problem

We assume here a 3D simulation with the NEMO data. We can measure the runtime of the simulation execution as follows:

In [1]:
from parcels import FieldSet, ParticleSet, JITParticle, AdvectionRK4_3D
from time import process_time as compute_ptimer_hr
from datetime import timedelta as delta
from glob import glob
import gc

from parcels.tools import logger
# from parcels.tools.loggers import XarrayDecodedFilter
# logger.addFilter(XarrayDecodedFilter())  # Add a filter for the xarray decoding warning

# data_path = 'NemoNorthSeaORCA025-N006_data/'
data_path = "/data/NEMO-MEDUSA/NORTHSEA_ORCA025-N006/"
ufiles = sorted(glob(data_path+'ORCA*U.nc'))
vfiles = sorted(glob(data_path+'ORCA*V.nc'))
wfiles = sorted(glob(data_path+'ORCA*W.nc'))
mesh_mask = data_path + 'coordinates.nc'

filenames = {'U': {'lon': mesh_mask, 'lat': mesh_mask, 'depth': wfiles[0], 'data': ufiles},
             'V': {'lon': mesh_mask, 'lat': mesh_mask, 'depth': wfiles[0], 'data': vfiles},
             'W': {'lon': mesh_mask, 'lat': mesh_mask, 'depth': wfiles[0], 'data': wfiles}}

variables = {'U': 'uo',
             'V': 'vo',
             'W': 'wo'}
dimensions = {'U': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'},
              'V': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'},
              'W': {'lon': 'glamf', 'lat': 'gphif', 'depth': 'depthw', 'time': 'time_counter'}}

fieldset = FieldSet.from_nemo(filenames, variables, dimensions, time_periodic=delta(days=183).total_seconds(), do_cache=False)

pset = ParticleSet.from_line(fieldset=fieldset, pclass=JITParticle,
                             size=10,
                             start=(1.9, 52.5),
                             finish=(3.4, 51.6),
                             depth=1)

kernels = pset.Kernel(AdvectionRK4_3D)
stime = compute_ptimer_hr()
pset.execute(kernels, runtime=delta(days=91), dt=delta(hours=6))
etime = compute_ptimer_hr()
delta_runtime = etime-stime
print("Runtime: {} sec.".format(delta_runtime))
del kernels
del pset
del fieldset

fatal: not a git repository (or any of the parent directories): .git
         It will be opened with no decoding. Filling values might be wrongly parsed.
sh: 1: None: not found
INFO: Compiled ArrayJITParticleAdvectionRK4_3D ==> /tmp/parcels-1000/lib75526eba08dc72fd498cd3a9c2742678_0.so


Runtime: 2.0957419089999996 sec.
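
The cell above times the run with `time.process_time`, which reports the CPU time of the process and does not include time elapsed during sleep. Since the argument here is about I/O, it may be worth recording wall-clock time as well. The following is a minimal sketch using only the standard library; the `stopwatch` helper and its field names are hypothetical and not part of Parcels, and the `waiting` value is only a rough indicator of time spent off the CPU.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label):
    """Record wall-clock and CPU time spent inside the ``with`` block."""
    record = {"label": label}
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield record
    finally:
        record["wall"] = time.perf_counter() - wall_start
        record["cpu"] = time.process_time() - cpu_start
        # Time not spent on the CPU, e.g. sleeping or blocked on a file read.
        record["waiting"] = max(record["wall"] - record["cpu"], 0.0)


# time.sleep stands in for a blocking read from slow storage.
with stopwatch("blocking read (simulated)") as rec:
    time.sleep(0.2)
print("{label}: wall={wall:.3f}s cpu={cpu:.3f}s waiting={waiting:.3f}s".format(**rec))
```

In the notebook, the body of the `with` block would be the `pset.execute(...)` call; comparing `wall` with `cpu` across the cached and uncached runs gives a first impression of how much time is spent waiting rather than computing.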


## Simply using fieldset buffers - the synchronous option

The majority of the runtime here is spent in reading and loading the data from file. We can now simply activate buffering (i.e. _caching_) as follows.

In [2]:
fieldset_ca_st = FieldSet.from_nemo(filenames, variables, dimensions, time_periodic=delta(days=183).total_seconds(), do_cache=True)
pset_ca_st = ParticleSet.from_line(fieldset=fieldset_ca_st, pclass=JITParticle,
                                   size=10,
                                   start=(1.9, 52.5),
                                   finish=(3.4, 51.6),
                                   depth=1)
kernels_ca_st = pset_ca_st.Kernel(AdvectionRK4_3D)
stime = compute_ptimer_hr()
pset_ca_st.execute(kernels_ca_st, runtime=delta(days=91), dt=delta(hours=6))
etime = compute_ptimer_hr()
delta_runtime = etime-stime
print("Runtime: {} sec.".format(delta_runtime))
fieldset_ca_st.stop_caching()
del kernels_ca_st
del pset_ca_st
del fieldset_ca_st
gc.collect()

sh: 1: None: not found
INFO: Compiled ArrayJITParticleAdvectionRK4_3D ==> /tmp/parcels-1000/lib1035c7154af8cab6dc09da1af2864136_0.so


Runtime: 1.7454694660000003 sec.


2624

### Defining one's individual buffer location

Where did the simulation store those data? For some specific infrastructures in the Netherlands, Parcels knows ideal caching paths.
As an external user, however, you need to define the path of your high-speed cache drive (e.g. a _solid-state drive (SSD)_)
when constructing the fieldset. Let us assume your cache drive is located at */data/ssd_drive*.

In [3]:
fieldset_ca_st_manual = FieldSet.from_nemo(filenames, variables, dimensions, time_periodic=delta(days=183).total_seconds(), do_cache=True, cache_dir="/data/ssd_drive")
pset_ca_st_manual = ParticleSet.from_line(fieldset=fieldset_ca_st_manual, pclass=JITParticle, size=10, start=(1.9, 52.5), finish=(3.4, 51.6), depth=1)
kernels_ca_st_manual = pset_ca_st_manual.Kernel(AdvectionRK4_3D)
stime = compute_ptimer_hr()
pset_ca_st_manual.execute(kernels_ca_st_manual, runtime=delta(days=91), dt=delta(hours=6))
etime = compute_ptimer_hr()
delta_runtime = etime-stime
print("Runtime: {} sec.".format(delta_runtime))
fieldset_ca_st_manual.stop_caching()
del kernels_ca_st_manual
del pset_ca_st_manual
del fieldset_ca_st_manual
gc.collect()

sh: 1: None: not found
INFO: Compiled ArrayJITParticleAdvectionRK4_3D ==> /tmp/parcels-1000/lib5131fb3f767e120afca7e7879fb14c41_0.so


Runtime: 1.7731092340000014 sec.


1833
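
Before passing a directory as the cache location, it can help to verify that it exists, is writable and has room for the files to be buffered. The helper below is a hypothetical sketch based on the standard library only (`check_cache_dir` is not a Parcels function); a temporary directory stands in for the SSD mount point.

```python
import os
import shutil
import tempfile


def check_cache_dir(path, required_bytes):
    """Return (ok, message) for a candidate cache directory."""
    if not os.path.isdir(path):
        return False, "{} is not an existing directory".format(path)
    if not os.access(path, os.W_OK | os.X_OK):
        return False, "{} is not writable".format(path)
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        return False, "{}: only {} bytes free, {} requested".format(path, free, required_bytes)
    return True, "{}: {} bytes free".format(path, free)


# A temporary directory stands in for the real cache drive here.
with tempfile.TemporaryDirectory() as candidate:
    ok, message = check_cache_dir(candidate, required_bytes=1024)
    print(ok, message)
```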

Technically, with each new timestep Parcels moves the field files of the simulation from the low-throughput
storage to the high-throughput buffer location. In a synchronous, single-threaded setup as outlined above, the gain of this operation
is minimal; it only pays off with either (a) very large field files with many attributes, or (b) a very large throughput difference between
the low- and high-throughput locations.
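
The synchronous transfer described above can be pictured with a small, self-contained sketch. It is not the Parcels implementation: `fetch_to_cache` is a hypothetical helper, it copies rather than moves the file (so the source stays in place), and two temporary directories stand in for the slow and fast storage.

```python
import os
import shutil
import tempfile


def fetch_to_cache(source_path, cache_dir):
    """Copy source_path into cache_dir unless already present; return the cached path."""
    cached_path = os.path.join(cache_dir, os.path.basename(source_path))
    if not os.path.exists(cached_path):
        # The slow read happens once, here; later requests reuse the copy.
        shutil.copy2(source_path, cached_path)
    return cached_path


with tempfile.TemporaryDirectory() as slow_dir, tempfile.TemporaryDirectory() as fast_dir:
    source = os.path.join(slow_dir, "field_0001.nc")
    with open(source, "wb") as f:
        f.write(b"\x00" * 1024)  # dummy content instead of real field data
    first = fetch_to_cache(source, fast_dir)   # triggers the copy
    second = fetch_to_cache(source, fast_dir)  # served from the cache
    print(first == second, os.path.getsize(first))
```

Because the simulation thread itself performs the copy in this setup, it still has to wait for the slow storage once per file, which is why the synchronous variant gains little.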

## Speeding up fieldset buffers - the asynchronous option

To gain more from the buffering, the buffer management ideally runs in the background. For that, Parcels uses a multi-threaded setup,
in which a background thread populates the file pool that the simulation thread eventually consumes. We can activate this option with the argument `use_threads=True`.

In [4]:
fieldset_ca_mt = FieldSet.from_nemo(filenames, variables, dimensions, time_periodic=delta(days=183).total_seconds(), do_cache=True, use_threads=True, cache_dir="/data/ssd_drive")
pset_ca_mt = ParticleSet.from_line(fieldset=fieldset_ca_mt, pclass=JITParticle,
                                   size=10,
                                   start=(1.9, 52.5),
                                   finish=(3.4, 51.6),
                                   depth=1)
kernels_ca_mt = pset_ca_mt.Kernel(AdvectionRK4_3D)
stime = compute_ptimer_hr()
pset_ca_mt.execute(kernels_ca_mt, runtime=delta(days=91), dt=delta(hours=6))
etime = compute_ptimer_hr()
delta_runtime = etime-stime
print("Runtime: {} sec.".format(delta_runtime))
fieldset_ca_mt.stop_caching()
del kernels_ca_mt
del pset_ca_mt
del fieldset_ca_mt
gc.collect()

sh: 1: None: not found
INFO: Compiled ArrayJITParticleAdvectionRK4_3D ==> /tmp/parcels-1000/lib0cf1115e84ec405b1433908e91a088d9_0.so


Runtime: 1.7177553899999989 sec.


1833
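
The idea behind the threaded option, a background thread filling a pool that the simulation thread consumes, is a classic producer/consumer pattern. The sketch below illustrates it with the standard library; all names (`prefetch`, `slow_fetch`, `run`, `max_ahead`) are hypothetical, and the `time.sleep` calls merely stand in for file transfer and particle advection. It does not reproduce how Parcels implements its cache.

```python
import queue
import threading
import time


def slow_fetch(name):
    time.sleep(0.05)  # stand-in for copying one field file to fast storage
    return name + " (cached)"


def prefetch(names, fetch, out_queue):
    """Producer: fetch each item in order and hand it to the consumer."""
    for name in names:
        out_queue.put(fetch(name))  # blocks while the queue is full
    out_queue.put(None)  # sentinel: nothing more to come


def run(names, max_ahead=2):
    """Consumer: process items while the producer fetches the next ones."""
    ready = queue.Queue(maxsize=max_ahead)  # bounds how far the producer runs ahead
    worker = threading.Thread(target=prefetch, args=(names, slow_fetch, ready), daemon=True)
    worker.start()
    consumed = []
    while True:
        item = ready.get()
        if item is None:
            break
        time.sleep(0.05)  # stand-in for advecting particles over one field step
        consumed.append(item)
    worker.join()
    return consumed


start = time.perf_counter()
result = run(["t0", "t1", "t2", "t3"])
print(result, "{:.2f} s".format(time.perf_counter() - start))
```

With fetching and consuming overlapped, the total time approaches that of the slower of the two stages instead of their sum, which is the gain the asynchronous option aims for.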

## Advanced buffer options

The buffer itself is controlled by two variables: a lower and an upper caching cap. The buffer manager removes old fields (those no longer in use) until the buffer size
lies between the lower and the upper cap, and adds new fields to the buffer as long as it stays below the upper cap. For simulations with many large fields,
those caps can be changed. On clusters and other shared compute environments, please consult your ICT manager for the maximum individual quota of personal
scratch locations. To change those caps, the `FieldFileCache` needs to be created explicitly, as the upper cap is part of the advanced-level functions.

In [5]:
from parcels.tools import FieldFileCache

use_threads = False  # or: True (choice)
lower_cache_limit = int(0.5*(2**30))
upper_cache_limit = int(2.0*(2**30))
cache_obj = FieldFileCache(cache_lower_limit=lower_cache_limit, cache_upper_limit=upper_cache_limit, use_thread=use_threads, cache_top_dir="/data/ssd_drive")
fieldset_ca_mt_manual = FieldSet.from_nemo(filenames, variables, dimensions, time_periodic=delta(days=183).total_seconds(), do_cache=True, cache=cache_obj)
pset_ca_mt_manual = ParticleSet.from_line(fieldset=fieldset_ca_mt_manual, pclass=JITParticle, size=10, start=(1.9, 52.5), finish=(3.4, 51.6), depth=1)
kernels_ca_mt_manual = pset_ca_mt_manual.Kernel(AdvectionRK4_3D)
stime = compute_ptimer_hr()
pset_ca_mt_manual.execute(kernels_ca_mt_manual, runtime=delta(days=91), dt=delta(hours=6))
etime = compute_ptimer_hr()
delta_runtime = etime-stime
print("Runtime: {} sec.".format(delta_runtime))
fieldset_ca_mt_manual.stop_caching()
del kernels_ca_mt_manual
del pset_ca_mt_manual
del cache_obj
del fieldset_ca_mt_manual
gc.collect()

sh: 1: None: not found
INFO: Compiled ArrayJITParticleAdvectionRK4_3D ==> /tmp/parcels-1000/lib8fadb8fe2e486406b70881d61d2f11f8_0.so


Runtime: 1.734731759999999 sec.


1833
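
To make the role of the two caps more tangible, here is a toy model of a size-capped buffer. It reflects one possible reading of the policy described in this section: files are admitted only while the total stays at or below the upper cap, and eviction drops the oldest unused files until the total reaches the lower cap or no unused file is left. The class `TwoCapBuffer` and its methods are hypothetical and unrelated to the actual `FieldFileCache` internals.

```python
from collections import OrderedDict


class TwoCapBuffer:
    """Toy model of a size-capped file buffer (hypothetical, not the Parcels class)."""

    def __init__(self, lower_cap, upper_cap):
        if not 0 <= lower_cap <= upper_cap:
            raise ValueError("need 0 <= lower_cap <= upper_cap")
        self.lower_cap = lower_cap
        self.upper_cap = upper_cap
        self.files = OrderedDict()  # name -> size in bytes, oldest first

    @property
    def used(self):
        return sum(self.files.values())

    def try_add(self, name, size):
        """Add a file only if the buffer stays at or below the upper cap."""
        if self.used + size > self.upper_cap:
            return False
        self.files[name] = size
        return True

    def evict(self, in_use=()):
        """Drop the oldest unused files while the buffer exceeds the lower cap."""
        removed = []
        for name in list(self.files):
            if self.used <= self.lower_cap:
                break
            if name not in in_use:
                del self.files[name]
                removed.append(name)
        return removed


# Same caps as in the cell above: 0.5 GiB and 2 GiB; each dummy file is 600 MiB.
buf = TwoCapBuffer(lower_cap=int(0.5 * 2**30), upper_cap=int(2.0 * 2**30))
for step in range(4):
    print("f{}".format(step), buf.try_add("f{}".format(step), 600 * 2**20))
print(buf.evict(in_use={"f2"}), buf.try_add("f3", 600 * 2**20))
```

In this toy run the fourth file is refused until the two oldest, unused files have been evicted.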

This construct gives access to a few additional options. Besides setting both the lower and the upper cache limit,
it is possible to constrain the pre-buffer to a maximum of _m_ future timesteps via the `cache_step_limit` property. This can be useful in
long simulations with very small files, to avoid a large load overhead during the initial buffer population. Furthermore, the `debug` parameter
of the `FieldFileCache` constructor prints extensive debug information, which helps when a simulation does not work as intended with caching.
Writing a new path to `cache_top_dir` circumvents the auto-determination of paths on the Dutch infrastructure. One can also call
`enable_named_copy` or `disable_named_copy` (default) to control the renaming of files of field variables and the splitting of files in which _U, V and W_ are written together
into multiple files in the cache. This is advisable when combining chunking via _Dask_ with the field buffer, as it allows auxiliary variables
(e.g. _TPP3_, _CO2_ and others) in the same file to be chunked independently. Lastly, the advanced construction allows one to call `enable_threading` or `disable_threading` after
the `FieldFileCache` construction. Those capabilities are illustrated in the final example.

In [6]:
use_threads = False
lower_cache_limit = int(0.5*(2**30))
upper_cache_limit = int(2.0*(2**30))
cache_obj_adv = FieldFileCache(cache_lower_limit=lower_cache_limit, cache_upper_limit=upper_cache_limit, use_thread=use_threads, cache_top_dir="/data/ssd_drive")
cache_obj_adv.cache_step_limit = 3
cache_obj_adv.enable_named_copy()
cache_obj_adv.enable_threading()

fieldset_ca_mt_adv_manual = FieldSet.from_nemo(filenames, variables, dimensions, time_periodic=delta(days=183).total_seconds(), do_cache=True, cache=cache_obj_adv)
pset_ca_mt_adv_manual = ParticleSet.from_line(fieldset=fieldset_ca_mt_adv_manual, pclass=JITParticle, size=10, start=(1.9, 52.5), finish=(3.4, 51.6), depth=1)
kernels_ca_mt_adv_manual = pset_ca_mt_adv_manual.Kernel(AdvectionRK4_3D)
stime = compute_ptimer_hr()
pset_ca_mt_adv_manual.execute(kernels_ca_mt_adv_manual, runtime=delta(days=91), dt=delta(hours=6))
etime = compute_ptimer_hr()
delta_runtime = etime-stime
print("Runtime: {} sec.".format(delta_runtime))
fieldset_ca_mt_adv_manual.stop_caching()
del kernels_ca_mt_adv_manual
del pset_ca_mt_adv_manual
del fieldset_ca_mt_adv_manual
gc.collect()


sh: 1: None: not found
INFO: Compiled ArrayJITParticleAdvectionRK4_3D ==> /tmp/parcels-1000/lib9e68b1f5786a189fa1ed51782759bae0_0.so


Runtime: 1.7673386569999998 sec.


2098
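
The effect of a step limit such as `cache_step_limit = 3` can be pictured as a sliding window over the time steps that are eligible for pre-buffering. The function below is a hypothetical illustration and not taken from Parcels; in particular, the `periodic` flag, which wraps the window around the end of the series, is an assumption motivated by the `time_periodic` setting used in the cells above.

```python
def steps_to_prefetch(current_step, n_steps, step_limit, periodic=False):
    """Return the indices of at most `step_limit` time steps after `current_step`."""
    if step_limit < 0:
        raise ValueError("step_limit must be non-negative")
    if n_steps <= 0:
        return []
    if periodic:
        # Wrap around the end of the series, but never list a step twice.
        count = min(step_limit, n_steps - 1)
        return [(current_step + 1 + k) % n_steps for k in range(count)]
    window = range(current_step + 1, current_step + 1 + step_limit)
    return [i for i in window if i < n_steps]


print(steps_to_prefetch(0, 10, 3))
print(steps_to_prefetch(8, 10, 3))
print(steps_to_prefetch(8, 10, 3, periodic=True))
```

A small window keeps the initial buffer population short, at the price of the background thread having less lead over the simulation.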