# Working with Parcels output


This tutorial covers the format of the trajectory output exported by Parcels. **Parcels does not include advanced analysis or plotting functionality**; users are encouraged to write their own to suit their research goals. Here we provide some starting points for exploring the Parcels output files yourself.

- [**Reading the output file**](#reading-the-output-file)
- [**Trajectory data structure**](#trajectory-data-structure)
- [**Analysis**](#analysis)
- [**Plotting**](#plotting)
- [**Animations**](#animations)

For more advanced reading and tutorials on the analysis of Lagrangian trajectories, we recommend checking out the [Lagrangian Diagnostics Analysis Cookbook](https://lagrangian-diags.readthedocs.io/en/latest/tutorials.html) and the project in general. The [TrajAn package](https://opendrift.github.io/trajan/index.html) can be used to read and plot datasets of Lagrangian trajectories.

In [1]:
from datetime import datetime, timedelta

import numpy as np
import xarray as xr

import parcels


First we need to create some Parcels output to analyze. We simulate a set of particles using the setup described in the [Delay start tutorial](https://docs.oceanparcels.org/en/latest/examples/tutorial_delaystart.html). We will also add some user-defined metadata to the output file.

In [None]:
# Load the CopernicusMarine data in the Agulhas region from the example_datasets
example_dataset_folder = parcels.download_example_dataset(
    "CopernicusMarine_data_for_Argo_tutorial"
)

ds_fields = xr.open_mfdataset(f"{example_dataset_folder}/*.nc", combine="by_coords")
ds_fields.load()  # load the dataset into memory

fieldset = parcels.FieldSet.from_copernicusmarine(ds_fields)

In [None]:
# Particle locations and initial time
npart = 10  # number of particles to be released
lon = 32 * np.ones(npart)
lat = np.linspace(-32.5, -30.5, npart, dtype=np.float32)
time = ds_fields.time.values[0] + np.arange(0, npart) * np.timedelta64(2, "h")
z = np.repeat(ds_fields.depth.values[0], npart)

pset = parcels.ParticleSet(
    fieldset=fieldset, pclass=parcels.Particle, lon=lon, lat=lat, time=time, z=z
)

output_file = parcels.ParticleFile("output.zarr", outputdt=np.timedelta64(2, "h"))

Parcels saves some metadata in the output file with every simulation (Parcels version, CF convention information, etc.). This metadata is a plain dictionary that is stored in the `.metadata` attribute and propagated to `xr.Dataset(attrs=...)`. You are free to manipulate this dictionary to add any custom, xarray-compatible metadata relevant to your simulation. Here we add a custom metadata field `date_created` to the output file.

In [None]:
output_file.metadata["date_created"] = datetime.now().isoformat()
output_file.metadata
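
Since zarr stores attributes as JSON, one way to read "xarray-compatible" is "serialisable to JSON". The sketch below checks this for a plain dictionary. The `metadata` dictionary and its `example_key` entry are hypothetical stand-ins for `output_file.metadata`, and `is_json_serialisable` is a helper introduced only for this illustration.

```python
import json
from datetime import datetime

# Hypothetical stand-in for `output_file.metadata` (keys are illustrative only)
metadata = {"example_key": "example value"}
metadata["date_created"] = datetime.now().isoformat()  # stored as a string


def is_json_serialisable(value):
    """Return True if `value` can be written as a JSON attribute."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


# Every entry added above is a string, so all checks pass
checks = {key: is_json_serialisable(value) for key, value in metadata.items()}
print(checks)

# A raw datetime object is not JSON-serialisable, hence the .isoformat() call
raw_datetime_ok = is_json_serialisable(datetime.now())
print(raw_datetime_ok)
```

This is presumably why `date_created` is written with `.isoformat()` rather than as a `datetime` object.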

To write the metadata to the output file, we need to add it before calling `pset.execute()`, which writes the particle set, including the metadata, to the output file.

In [None]:
pset.execute(
    parcels.kernels.AdvectionRK4,
    runtime=np.timedelta64(48, "h"),
    dt=np.timedelta64(5, "m"),
    output_file=output_file,
    verbose_progress=False,
)

## Reading the output file

Parcels exports output trajectories in `zarr` [format](https://zarr.readthedocs.io/en/stable/). Files in `zarr` are typically _much_ smaller than their NetCDF equivalents, although they may be slightly more challenging to handle (`xarray` does have a fairly seamless `open_zarr()` method). Note that when we display the dataset, we can see our custom metadata field `date_created`.


In [None]:
ds_particles = xr.open_zarr("output.zarr")

print(ds_particles)

Note that if you are running Parcels on multiple processors with `mpirun`, you will need to concatenate the files of each processor, see also the [MPI documentation](https://docs.oceanparcels.org/en/latest/examples/documentation_MPI.html#Reading-in-the-ParticleFile-data-in-zarr-format).

Also, once you have loaded the data as an `xarray` Dataset using `xr.open_zarr()`, you can always save it to NetCDF with the `.to_netcdf()` method if you prefer.


## Trajectory data structure

The data in the zarr file are organised according to the [CF-convention for trajectory data](http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_multidimensional_array_representation_of_trajectories) as implemented in the [NCEI trajectory template](https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/trajectoryIncomplete.cdl). The data are stored in a **two-dimensional array** with the dimensions `traj` and `obs`. Each particle trajectory is essentially stored as a time series in which the coordinate data (`lon`, `lat`, `time`) are a function of the observation (`obs`).

The output dataset used here contains **10 particles** and **13 observations**. Not every particle has 13 observations however; since we released particles at different times some particle trajectories are shorter than others.


In [None]:
np.set_printoptions(linewidth=160)
one_hour = np.timedelta64(1, "h")  # Define timedelta object to help with conversion
time_from_start = ds_particles["time"].values - fieldset.time_interval.left

print(time_from_start / one_hour)  # timedelta / timedelta -> float number of hours

Note how the first observation occurs at a different time for each trajectory. So remember that `obs != time`.
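
The layout described above can be reproduced on a small synthetic array. This is a minimal sketch with made-up values (three particles released two hours apart, output every two hours), assuming that shorter trajectories are padded with missing values (`NaT`) at the end. The names `start`, `n_traj`, and `n_obs` are introduced only for this illustration.

```python
import numpy as np

start = np.datetime64("2024-01-01T00:00", "ns")  # arbitrary start of the simulation
two_hours = np.timedelta64(2, "h")
n_traj, n_obs = 3, 5

# 2D array (traj, obs), initially filled with NaT ("not a time")
time = np.full((n_traj, n_obs), np.datetime64("NaT"), dtype="datetime64[ns]")
for p in range(n_traj):
    n_valid = n_obs - p  # later releases have fewer observations
    # particle p is released p * 2 hours after the start
    time[p, :n_valid] = start + (p + np.arange(n_valid)) * two_hours

# timedelta / timedelta -> float hours; NaT becomes NaN
hours = (time - start) / np.timedelta64(1, "h")
print(hours)
```

Column 0 (`obs = 0`) holds a different time for each row, so the same `obs` index does not refer to the same moment across trajectories.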


## Analysis

Sometimes, trajectories are analyzed as they are stored: as individual time series. If we want to study the distance travelled as a function of time, the time we are interested in is the time relative to the start of each particular trajectory. The array operations are then simple, since each trajectory is analyzed as a function of `obs`. The time variable is only needed to express the results in the correct units.


In [None]:
import matplotlib.pyplot as plt

x = ds_particles["lon"].values
y = ds_particles["lat"].values
distance = np.cumsum(
    np.sqrt(np.square(np.diff(x)) + np.square(np.diff(y))), axis=1
)  # d = (dx^2 + dy^2)^(1/2)

real_time = time_from_start / one_hour  # convert time to hours
time_since_release = (
    real_time.transpose() - real_time[:, 0]
)  # subtract the initial time from each timeseries

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)

ax1.set_ylabel("Distance travelled [degrees]")
ax1.set_xlabel("observation", weight="bold")
d_plot = ax1.plot(distance.transpose())

ax2.set_ylabel("Distance travelled [degrees]")
ax2.set_xlabel("time since release [hours]", weight="bold")
d_plot_t = ax2.plot(time_since_release[1:], distance.transpose())
plt.show()

The two panels above show the same curves. Time is not needed to create the first panel. In the second panel, the time variable minus the first value of each trajectory gives the x-axis the correct units.

We can also plot the distance travelled as a function of the absolute time easily, since the `time` variable matches up with the data for each individual trajectory.


In [None]:
plt.figure()
ax = plt.axes()
ax.set_ylabel("Distance travelled [degrees]")
ax.set_xlabel("time [hours]", weight="bold")
d_plot_t = ax.plot(real_time.T[1:], distance.transpose())
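
The distance above is accumulated from differences in `lon` and `lat`, so it is expressed in degrees. If a distance in metres is needed, one option (not used elsewhere in this tutorial) is the haversine formula on a spherical Earth. The sketch below is one possible implementation. The function name `cumulative_haversine` and the radius constant are assumptions made for this illustration, and the input arrays are synthetic.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def cumulative_haversine(lon, lat):
    """Cumulative great-circle distance [m] along axis 1 of (traj, obs) arrays in degrees."""
    lon_r, lat_r = np.radians(lon), np.radians(lat)
    dlon = np.diff(lon_r, axis=1)
    dlat = np.diff(lat_r, axis=1)
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat_r[:, :-1]) * np.cos(lat_r[:, 1:]) * np.sin(dlon / 2) ** 2
    )
    step = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))  # distance of each segment
    return np.cumsum(step, axis=1)  # NaN padding propagates, as with np.diff


# Two synthetic trajectories along the equator; the second one is shorter (NaN-padded)
lon = np.array([[0.0, 1.0, 2.0], [0.0, 1.0, np.nan]])
lat = np.zeros((2, 3))
distance_m = cumulative_haversine(lon, lat)
print(distance_m)
```

With real output, `ds_particles["lon"].values` and `ds_particles["lat"].values` would take the place of the synthetic arrays.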

### Conditional selection

In other cases, however, the processing of the data depends on the absolute time at which the observations were made, e.g. when studying seasonal phenomena. In that case the array structure is not as simple: the data must be selected by their `time` value, which also requires the trajectory data to be aligned in time. Here we show how the mean location of the particles evolves through time. The data are selected using `xr.DataArray.where()`, which compares the time variable to a specific time. Selecting data with a condition such as `ds_particles['time']==time` is a powerful tool for analyzing trajectory data.


In [None]:
# Using xarray
mean_lon_x = []
mean_lat_x = []

timerange = np.arange(
    np.nanmin(ds_particles["time"].values),
    np.nanmax(ds_particles["time"].values) + np.timedelta64(timedelta(hours=2)),
    timedelta(hours=2),
)  # timerange in nanoseconds

for time in timerange:
    # if all trajectories share an observation at time
    if np.all(np.any(ds_particles["time"] == time, axis=1)):
        # find the data that share the time
        mean_lon_x += [
            np.nanmean(ds_particles["lon"].where(ds_particles["time"] == time).values)
        ]
        mean_lat_x += [
            np.nanmean(ds_particles["lat"].where(ds_particles["time"] == time).values)
        ]

In [None]:
plt.figure()
ax = plt.axes()
ax.set_ylabel("Mean latitude [degrees N]")
ax.set_xlabel("Mean longitude [degrees E]")
ax.grid()
ax.scatter(mean_lon_x, mean_lat_x, marker="^", s=80)
plt.show()
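
The conditional selection used above can be tried out on a small synthetic array with plain NumPy. This is a sketch with made-up values, not the tutorial's dataset. `np.where(mask, lon, np.nan)` is used here to mimic what `xr.DataArray.where()` does: keep values where the condition holds and set the rest to NaN.

```python
import numpy as np

t0 = np.datetime64("2024-01-01T00:00", "ns")
h = np.timedelta64(1, "h")
nat = np.datetime64("NaT")

# Two trajectories released two hours apart; the second one is NaT/NaN-padded
time = np.array(
    [[t0, t0 + 2 * h, t0 + 4 * h], [t0 + 2 * h, t0 + 4 * h, nat]],
    dtype="datetime64[ns]",
)
lon = np.array([[30.0, 30.5, 31.0], [32.0, 32.5, np.nan]])

target = t0 + 2 * h
mask = time == target  # comparisons with NaT are False

# True only if every trajectory has an observation at the target time
shared_at_target = bool(np.all(np.any(mask, axis=1)))
shared_at_start = bool(np.all(np.any(time == t0, axis=1)))

# Keep longitudes observed at the target time, set everything else to NaN
mean_lon = np.nanmean(np.where(mask, lon, np.nan))
print(shared_at_target, shared_at_start, mean_lon)
```

At the target time, the selected values sit in different `obs` columns (column 1 for the first trajectory, column 0 for the second), which is why selecting by condition is needed instead of slicing by `obs`.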

## Plotting

Parcels output consists of particle trajectories through time and space. An important way to explore patterns in this information is to draw the trajectories in space. The [**trajan**](https://opendrift.github.io/trajan/index.html) package can be used to quickly plot Parcels results, but users are encouraged to create their own figures, for example by using the comprehensive [**matplotlib**](https://matplotlib.org/) library. Here we show a basic setup for processing the Parcels output into trajectory plots and animations.

Some other packages to help you make beautiful figures are:

- [**cartopy**](https://scitools.org.uk/cartopy/docs/latest/), a map-drawing tool especially compatible with matplotlib
- [**trajan**](https://opendrift.github.io/trajan/index.html), a package to quickly plot trajectories
- [**cmocean**](https://matplotlib.org/cmocean/), a set of ocean-relevant colormaps


When drawing trajectory data in space, it is usually informative to draw points at the observed coordinates, which shows the resolution of the output, and to draw a line through them, which distinguishes the different trajectories. The coordinates to draw are in `lon` and `lat` and can be passed to either `matplotlib.pyplot.plot` or `matplotlib.pyplot.scatter`. Note, however, that by default matplotlib treats each column of a 2D array as a separate dataset. In the Parcels 2D output, the columns correspond to the `obs` dimension, so to draw one line per trajectory we need to transpose the 2D array using `.T`.


In [None]:
fig, (ax1, ax2, ax3, ax4) = plt.subplots(
    1, 4, figsize=(16, 3.5), constrained_layout=True
)

###-Points-###
ax1.set_title("Points")
ax1.scatter(ds_particles["lon"].T, ds_particles["lat"].T)
###-Lines-###
ax2.set_title("Lines")
ax2.plot(ds_particles["lon"].T, ds_particles["lat"].T)
###-Points + Lines-###
ax3.set_title("Points + Lines")
ax3.plot(ds_particles["lon"].T, ds_particles["lat"].T, marker="o")
###-Not Transposed-###
ax4.set_title("Not transposed")
ax4.plot(ds_particles["lon"], ds_particles["lat"], marker="o")

plt.show()
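
The effect of the transpose can also be checked by counting the line objects that `plot` returns. This is a minimal sketch on synthetic `(traj, obs)` arrays standing in for `ds_particles["lon"]` and `ds_particles["lat"]`. The `Agg` backend is selected only so that the snippet runs as a standalone script without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; only needed outside a notebook
import matplotlib.pyplot as plt
import numpy as np

n_traj, n_obs = 3, 5
# Synthetic positions with shape (traj, obs)
lon = 30.0 + 0.1 * np.arange(n_obs)[np.newaxis, :] + np.zeros((n_traj, 1))
lat = -32.0 + 0.5 * np.arange(n_traj)[:, np.newaxis] + np.zeros((1, n_obs))

fig, (ax_t, ax_raw) = plt.subplots(1, 2)
lines_transposed = ax_t.plot(lon.T, lat.T)  # one line per trajectory
lines_raw = ax_raw.plot(lon, lat)  # one line per observation index
print(len(lines_transposed), len(lines_raw))
plt.close(fig)
```

Without the transpose, each line connects the positions of all particles at one `obs` index, which is what the "Not transposed" panel shows.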

### Animations


Trajectory plots like the ones above can become very cluttered for large sets of particles. To better see patterns, it's a good idea to create an animation in time and space. To do this, matplotlib offers an [animation package](https://matplotlib.org/stable/api/animation_api.html). Here we show how to use the [**FuncAnimation**](https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.animation.FuncAnimation.html#matplotlib.animation.FuncAnimation) class to animate Parcels trajectory data, based on [this visualisation tutorial](https://github.com/Parcels-code/10year-anniversary-session5/blob/eaf7ac35f43c222280fa5577858be81dc346c06b/animations_tutorial.ipynb) from the Parcels 10-year anniversary.

To correctly reveal the patterns in time we must remember that the `obs` dimension does not necessarily correspond to the `time` variable ([see the section of Trajectory data structure above](#trajectory-data-structure)). In the animation of the particles, we usually want to draw the points at each consecutive moment in time, not necessarily at each moment since the start of the trajectory. To do this we must [select the correct data](#conditional-selection) in each rendering.


In [None]:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib
from matplotlib.animation import FuncAnimation

In [None]:
# for interactive display of animation
plt.rcParams["animation.html"] = "jshtml"

In [None]:
# Number of timesteps to animate
nframes = 25  # use fewer frames for testing purposes
nreducedtrails = 1  # every nth particle will have a trail (if 1, all particles have trails; increase for faster performance)


# Set up the colors and associated trajectories:
# get release times for each particle (first valid obs for each trajectory)
release_times = ds_particles["time"].min(dim="obs", skipna=True).values

# get unique release times and assign colors
unique_release_times = np.unique(release_times[~np.isnat(release_times)])
n_release_times = len(unique_release_times)
print(f"Number of unique release times: {n_release_times}")

# choose a colormap (tab20b is a qualitative colormap with 20 colors)
colormap = matplotlib.colormaps["tab20b"]

# set up a unique color for each release time
release_time_to_color = {}
for i, release_time in enumerate(unique_release_times):
    release_time_to_color[release_time] = colormap(i / max(n_release_times - 1, 1))


# --> Store data for all timeframes (this is needed for faster performance)
print("Pre-computing all particle positions...")
all_particles_data = []
for i, target_time in enumerate(timerange):
    time_id = np.where(ds_particles["time"] == target_time)
    lons = ds_particles["lon"].values[time_id]
    lats = ds_particles["lat"].values[time_id]
    particle_indices = time_id[0]
    valid = ~np.isnan(lons) & ~np.isnan(lats)

    all_particles_data.append(
        {
            "lons": lons[valid],
            "lats": lats[valid],
            "particle_indices": particle_indices[valid],
            "valid_count": np.sum(valid),
        }
    )


# figure setup
fig, ax = plt.subplots(figsize=(6, 5), subplot_kw={"projection": ccrs.PlateCarree()})
ax.set_xlim(30, 33)
ax.set_xticks(np.arange(30, 33.5, 0.5))
ax.set_xlabel("Longitude (deg E)")
ax.set_ylim(-33, -30)
ax.set_yticks(ticks=np.arange(-33, -29.5, 0.5))
ax.set_yticklabels(np.arange(33, 29.5, -0.5).astype(str))
ax.set_ylabel("Latitude (deg S)")
ax.coastlines(color="saddlebrown")
ax.add_feature(cfeature.LAND, alpha=0.5, facecolor="saddlebrown")

# --> Use pre-computed data for initial setup
initial_data = all_particles_data[0]
initial_colors = []
for particle_idx in initial_data["particle_indices"]:
    rt = release_times[particle_idx]
    if rt in release_time_to_color:
        initial_colors.append(release_time_to_color[rt])
    else:
        initial_colors.append("blue")

# --> plot first timestep
scatter = ax.scatter(initial_data["lons"], initial_data["lats"], s=10, c=initial_colors)

# --> initialize trails
trail_plot = []

# Set initial title
t_str = str(timerange[0])[:19]  # Format datetime nicely
title = ax.set_title(f"Particles at t = {t_str}")


# loop over for animation
def animate(i):
    print(f"Animating frame {i + 1}/{len(timerange)} at time {timerange[i]}")
    t_str = str(timerange[i])[:19]
    title.set_text(f"Particles at t = {t_str}")

    # Find particles at current time
    current_data = all_particles_data[i]

    if current_data["valid_count"] > 0:
        current_colors = []
        for particle_idx in current_data["particle_indices"]:
            rt = release_times[particle_idx]
            current_colors.append(release_time_to_color[rt])

        scatter.set_offsets(np.c_[current_data["lons"], current_data["lats"]])
        scatter.set_color(current_colors)

        # --> add trails

        for trail in trail_plot:
            trail.remove()
        trail_plot.clear()

        trail_length = min(10, i)  # trails will have max length of 10 time steps

        if trail_length > 0:
            sampled_particles = current_data["particle_indices"][
                ::nreducedtrails
            ]  # use all or sample if you want faster computation

            for particle_idx in sampled_particles:
                trail_lons = []
                trail_lats = []
                for j in range(i - trail_length, i + 1):
                    past_data = all_particles_data[j]
                    if particle_idx in past_data["particle_indices"]:
                        idx = np.where(past_data["particle_indices"] == particle_idx)[
                            0
                        ][0]
                        trail_lons.append(past_data["lons"][idx])
                        trail_lats.append(past_data["lats"][idx])
                if len(trail_lons) > 1:
                    rt = release_times[particle_idx]
                    color = release_time_to_color[rt]
                    (trail,) = ax.plot(
                        trail_lons, trail_lats, color=color, linewidth=0.6, alpha=0.6
                    )
                    trail_plot.append(trail)

    else:
        scatter.set_offsets(np.empty((0, 2)))


# Create animation
anim = FuncAnimation(fig, animate, frames=nframes, interval=100)
anim
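
The animation above assigns one color per release time using a dictionary. As an alternative sketch (not the approach used in the cell above), the same grouping can be computed in a vectorised way with NumPy. The `hours` array below is synthetic and expressed in hours since the start of the simulation, and the names `group` and `fraction` are introduced only for this illustration.

```python
import numpy as np

# Hours since the start of the simulation, shape (traj, obs); NaN pads shorter trajectories
hours = np.array(
    [
        [0.0, 2.0, 4.0],
        [0.0, 2.0, 4.0],
        [2.0, 4.0, np.nan],
        [4.0, np.nan, np.nan],
    ]
)

release = np.nanmin(hours, axis=1)  # first valid time of each trajectory
unique_release = np.unique(release)  # sorted unique release times
group = np.searchsorted(unique_release, release)  # release group index per trajectory

# Value in [0, 1] per trajectory, which could be passed to a colormap
fraction = group / max(len(unique_release) - 1, 1)
print(group, fraction)
```

Passing `fraction` to a colormap object would then give one color per trajectory, shared by all particles released at the same time.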