### Example 2 - Animation of Argo BGC fleet over time (Poseidon version)

This example shows how to read and manipulate Argo data stored in parquet format. The data are stored across multiple files: we will load into memory only the columns we need, and we will create an animation showing the growing number of Argo BGC floats.

##### Note on parquet files
There are several ways to load parquet files into a dataframe in Python, and a few are illustrated in Examples 1 and 2. This notebook uses dask, which is designed to work efficiently with larger-than-memory data.

#### Getting started

If you haven't already, install the required packages by running `pip install .` at the root of the repository.

We also need the dataset! In this example we use the Argo BGC dataset: you can uncomment and run the cell below, or copy-paste the command (without the leading `!`) into your terminal.

In [None]:
# !download_db -d Argo -t BGC --noqc

We then import the necessary modules and set up the path to the dataset (update the `parquet_dir` variable below if you have specified a different location in the previous cell or have moved the dataset).

In [None]:
# dask
import dask.dataframe as dd

# modules for visualizations
import cartopy.crs
import cartopy.feature
import matplotlib.animation
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pyarrow.parquet as pq

# Path to Argo BGC
parquet_dir = './CrocoLake/1010_BGC_ARGO-CLOUD-DEV/'
# Read the dataset schema from its _common_metadata file
schema = pq.read_schema(parquet_dir+"_common_metadata")

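As we want to visualize the growth of Argo's BGC fleet, all we need for the animation are the float IDs (`PLATFORM_NUMBER`) and the geographical and time coordinates (`LATITUDE`, `LONGITUDE`, `JULD`).

We select these through the `columns` argument of `read_parquet()`. A filter could also be passed through the `filters` argument to load only a subset of the rows (e.g. by time and region) and to discard `NaN` values; the commented-out `filter_coords_time_doxy` below is such an example, but it is not applied here. Note that pyarrow's filter syntax cannot explicitly exclude `NaN` values, so the workaround is to require the value to lie within a very large range, which `NaN` never satisfies.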
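To see why the range workaround discards `NaN` values: under IEEE 754 floating-point semantics every comparison involving `NaN` is false, so a `NaN` can never satisfy both `>= -1e30` and `<= +1e30`. The minimal sketch below checks this in pandas on a small made-up table, mimicking how a flat list of `(column, op, value)` tuples is combined with AND. The data, the `OPS` dictionary and the `apply_and_filter()` helper are hypothetical and only for illustration; we assume pyarrow's row filtering follows the same comparison semantics.

```python
import operator

import numpy as np
import pandas as pd

# Made-up toy data: the second row has a NaN oxygen value (illustrative only)
toy_doxy = pd.DataFrame({
    "PLATFORM_NUMBER": [1901, 1901, 1902],
    "DOXY_ADJUSTED": [210.5, np.nan, 245.0],
})

# Same bounds as the DOXY_ADJUSTED conditions in the commented-out filter;
# a flat list of conditions is interpreted as their AND
doxy_filter = [("DOXY_ADJUSTED", ">=", -1e30), ("DOXY_ADJUSTED", "<=", +1e30)]

# Hypothetical mapping from the operator strings to Python comparisons
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def apply_and_filter(frame, conditions):
    """Keep only the rows that satisfy every (column, op, value) condition."""
    mask = pd.Series(True, index=frame.index)
    for col, op, value in conditions:
        # Any comparison with NaN yields False, so NaN rows are dropped here
        mask &= OPS[op](frame[col], value)
    return frame[mask]

filtered = apply_and_filter(toy_doxy, doxy_filter)
print(filtered)
```

Every finite value passes both bounds, so the only rows removed are those with `NaN`.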

In [None]:
%%time
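cols = ["PLATFORM_NUMBER", "LATITUDE", "LONGITUDE", "JULD"]

# Example filter, NOT applied here: passed as filters=filter_coords_time_doxy to
# read_parquet(), it would keep only recent data inside a lat/lon box and drop NaN
# oxygen values (reference_time, lat0, lat1, lon0 and lon1 must be defined first)
# filter_coords_time_doxy = [
#     ("JULD",">",reference_time),
#     ("LATITUDE",">",lat0), ("LATITUDE","<",lat1),
#     ("LONGITUDE",">",lon0), ("LONGITUDE","<",lon1),
#     ("DOXY_ADJUSTED",">=",-1e30),("DOXY_ADJUSTED","<=",+1e30)
# ]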

ddf = dd.read_parquet(
    parquet_dir,
    engine="pyarrow",
    schema=schema,
    columns=cols
)

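The cell above took around 5 seconds to run. Note that dask evaluates lazily: `read_parquet()` only reads the metadata and builds a task graph, and the data are actually loaded into memory when `.compute()` is called in the next cell.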

We can now make an animation of the Argo BGC float positions over time, from the first available measurement onward, which shows the growth of the Argo BGC fleet. Each frame shows the floats that have measurements in the corresponding time window.

In [None]:
plt.rcParams["animation.html"] = "jshtml"
plt.rcParams["figure.dpi"] = 150
plt.ioff()  # do not display the static figure, only the animation

# Group by platform number and time, averaging the lat/lon coordinates of each group.
# .compute() triggers the actual read and returns a pandas dataframe ordered by
# platform number first and time second, with the average coordinates of each entry.
df_grouped = ddf.groupby(['PLATFORM_NUMBER', 'JULD']).agg({
    'LATITUDE': 'mean',
    'LONGITUDE': 'mean'
}).reset_index().compute()

# Start and end of animation
start_t = df_grouped['JULD'].min()
end_t = df_grouped['JULD'].max()

# Number of frames in the animation
nb_frames = 40

# nb_frames + 1 equally spaced time edges, i.e. one time window per frame
datetime_values = pd.date_range(start=start_t, end=end_t, periods=nb_frames+1).values

# Setting up plot
proj = cartopy.crs.PlateCarree()
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection=proj, frameon=False)

# FuncAnimation() passes the current frame number to update()
def update(frame):
    # Clear the previous frame and redraw the map background
    ax.cla()
    ax.add_feature(cartopy.feature.OCEAN)
    ax.add_feature(cartopy.feature.LAND, edgecolor='black')
    ax.add_feature(cartopy.feature.LAKES, edgecolor='black')
    # Select the measurements that fall in this frame's time window
    t_i = datetime_values[frame]
    t_ip1 = datetime_values[frame+1]
    timestep_df = df_grouped[(df_grouped['JULD'] >= t_i) & (df_grouped['JULD'] <= t_ip1)]
    # One point per float: its average position within the window
    timestep_df = timestep_df.groupby(['PLATFORM_NUMBER']).agg({
        'LATITUDE': 'mean',
        'LONGITUDE': 'mean'
    }).reset_index()
    ax.scatter(
        timestep_df['LONGITUDE'], timestep_df['LATITUDE'],
        c="yellow",
        s=5,
        transform=proj
    )
    ax.set_global()  # whole globe: longitude -180..180, latitude -90..90
    ax.set_title("Argo BGC fleet on " + np.datetime_as_string(t_ip1, unit='D'))

matplotlib.animation.FuncAnimation(fig, update, frames=nb_frames)
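The frame logic of `update()` can be checked without plotting anything. The sketch below reproduces it on a tiny synthetic table (platform numbers, dates and positions are invented): `pd.date_range(..., periods=n + 1)` returns `n + 1` edges, hence `n` time windows, and within each window the positions of each float are averaged into a single point. The names `toy_df`, `n_frames`, `edges` and `floats_per_frame()` are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
import pandas as pd

# Synthetic stand-in for df_grouped (made-up values, for illustration only)
toy_df = pd.DataFrame({
    "PLATFORM_NUMBER": [1, 1, 2, 2, 3],
    "JULD": pd.to_datetime(["2010-01-01", "2010-01-05", "2010-01-03",
                            "2010-01-20", "2010-01-31"]),
    "LATITUDE": [10.0, 12.0, -5.0, -4.0, 30.0],
    "LONGITUDE": [100.0, 102.0, -20.0, -22.0, 150.0],
})

n_frames = 3
# n_frames + 1 equally spaced edges define n_frames time windows
edges = pd.date_range(start=toy_df["JULD"].min(),
                      end=toy_df["JULD"].max(),
                      periods=n_frames + 1).values

def floats_per_frame(df, edges, frame):
    """Average position of each float with data in [edges[frame], edges[frame + 1]]."""
    t_i, t_ip1 = edges[frame], edges[frame + 1]
    in_window = df[(df["JULD"] >= t_i) & (df["JULD"] <= t_ip1)]
    return (in_window.groupby("PLATFORM_NUMBER")
            .agg({"LATITUDE": "mean", "LONGITUDE": "mean"})
            .reset_index())

for frame in range(n_frames):
    positions = floats_per_frame(toy_df, edges, frame)
    print(frame, positions["PLATFORM_NUMBER"].tolist())
```

Because both window bounds are inclusive, as in the notebook, a measurement that falls exactly on an edge is drawn in two consecutive frames; using `<` for the upper bound would avoid this, at the cost of dropping a point that lies exactly on the last edge.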