# 1.03 1000 m Memory Time Origin 

---

Author : Riley X. Brady

Date : 11/18/2020

This notebook computes the statistical origin of DIC for each particle's last 1000 m crossing, based on the $e$-folding memory time. For every particle it records the x, y, and z location at the origin, the tracers at that location (DIC, temperature, salinity, alkalinity, PO4, and SiO3), and the length of the memory time.

In [1]:
%load_ext lab_black
%load_ext autoreload
%autoreload 2
import figutils
import numpy as np
import xarray as xr

from dask.distributed import Client

In [2]:
print(f"numpy: {np.__version__}")
print(f"xarray: {xr.__version__}")

numpy: 1.19.4
xarray: 0.16.1


In [3]:
# This is my TCP client from the `launch_cluster` notebook. I use it
# for distributed computing with `dask` on NCAR's machine, Casper.
client = Client("tcp://...")

Load in all particles that were pre-filtered to those that cross 1000 m within the ACC.

**Note**: I loaded the netCDF file, chunked it, and saved it back out as a `zarr` store, which makes `dask` run much more efficiently. E.g.,

```python
ds = xr.open_dataset('../data/southern_ocean_deep_upwelling_particles.nc')
ds = ds.chunk({'time': -1, 'nParticles': 'auto'})
ds.to_zarr('../data/southern_ocean_deep_upwelling_particles.zarr', consolidated=True)
```

You could probably chunk the particles into slightly smaller chunks for even faster performance.

In [4]:
# Load in the `zarr` file, which is pre-chunked and already has been
# filtered from the original 1,000,000 particles to the 19,002 that
# upwell last across 1000 m S of 45S and outside of the annual sea ice
# edge.
filepath = "../data/southern_ocean_deep_upwelling_particles.zarr/"
ds = xr.open_zarr(filepath, consolidated=True)

In [5]:
# Load in 1000 m crossing locations
crossings = xr.open_dataset("../data/postproc/1000m.crossing.locations.nc")

Define functions to compute the memory time.
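
For reference, the lag-$k$ autocorrelation computed by `autocorr_by_hand` in the next cell can be written as follows. Here $x_1, \dots, x_N$ is the DIC series leading up to the last 1000 m crossing, $y^{(1)} = (x_1, \dots, x_{N-k})$ and $y^{(2)} = (x_{k+1}, \dots, x_N)$ are the two lagged subseries, and $\bar{y}$ and $\sigma_y$ are a subseries' mean and standard deviation (as returned by `np.mean` and `np.std`). The symbols $N$, $k$, $y^{(1)}$, $y^{(2)}$, and $k_e$ are notation introduced here for illustration, not names from the code.

$$
r(k) = \frac{\sum_{t=1}^{N-k} \left(y^{(1)}_t - \bar{y}^{(1)}\right)\left(y^{(2)}_t - \bar{y}^{(2)}\right)}{(N-k)\,\sigma_{y^{(1)}}\,\sigma_{y^{(2)}}}
$$

The $e$-folding lag is the first lag at which the autocorrelation reaches $1/e$ or less,

$$
k_e = \min \{\, k : r(k) \le 1/e \,\},
$$

and the code converts it to an approximate memory time in days as $2 k_e$ (`e_folding * 2`).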

In [6]:
def autocorr_by_hand(x, lag):
    """Computes the autocorrelation coefficient.

    See:
    https://stackoverflow.com/questions/36038927/
    whats-the-difference-between-pandas-acf-and-statsmodel-acf

    x : numpy time series (here it'll be particleDIC)
    lag : int of the lag for which to autocorrelate.
    """
    # Slice the relevant subseries based on the lag
    y1 = x[: (len(x) - lag)]
    y2 = x[lag:]
    # Subtract the subseries means
    sum_product = np.sum((y1 - np.mean(y1)) * (y2 - np.mean(y2)))
    # Normalize with the subseries stds
    return sum_product / ((len(x) - lag) * np.std(y1) * np.std(y2))


def compute_idx_of_last_1000m_crossing(z):
    """Find index of final time particle upwells across 1000 m.

    z : zLevelParticle (m)
    """
    currentDepth = z
    previousDepth = np.roll(z, 1)
    # np.roll wraps the last depth to index 0; overwrite it with a sentinel
    # so the first time step can never register as a crossing.
    previousDepth[0] = 999
    cond = (currentDepth >= -1000) & (previousDepth < -1000)
    idx = (
        len(cond) - np.flip(cond).argmax() - 1
    )  # Finds last location that condition is true.
    return idx


def deep_tracers_origin(x, y, z, dic, T, S, alk, po4, sio3):
    """Finds x, y, z statistical origin and content of tracer before its
    last 1000 m crossing.

    Memory time is decided by the e-folding time.

        x (ndarray): lonParticle
        y (ndarray): latParticle
        z (ndarray): zLevelParticle
        dic, T, S, alk, po4, sio3 (ndarray): tracer time series, e.g.,
            particleDIC. The memory time is computed from `dic` only.
    """
    # Find last crossing to 1000m and trim to time series leading up
    # to this point.
    idx = compute_idx_of_last_1000m_crossing(z)
    x_subset = x[0:idx]
    y_subset = y[0:idx]
    z_subset = z[0:idx]
    dic_subset = dic[0:idx]
    T_subset = T[0:idx]
    S_subset = S[0:idx]
    alk_subset = alk[0:idx]
    po4_subset = po4[0:idx]
    sio3_subset = sio3[0:idx]

    # Compute autocorrelation function based on DIC for all lags.
    auto = np.asarray([autocorr_by_hand(dic_subset, i) for i in range(len(dic_subset))])
    # Find the time step at which the autocorrelation drops below 1/e.
    e_folding = (auto <= (1 / np.e)).argmax()

    # Go backward this many steps.
    subset_idx = len(z_subset) - e_folding - 1

    # If the ACF never drops below 1/e, argmax returns exactly zero. A true
    # e-folding lag of zero is impossible, since ACF == 1 at lag zero by definition.
    if e_folding == 0:
        x_origin = np.nan
        y_origin = np.nan
        z_origin = np.nan
        dic_origin = np.nan
        T_origin = np.nan
        S_origin = np.nan
        alk_origin = np.nan
        po4_origin = np.nan
        sio3_origin = np.nan
        memory_time = np.nan
    else:
        x_origin = x_subset[subset_idx]
        y_origin = y_subset[subset_idx]
        z_origin = z_subset[subset_idx]
        dic_origin = dic_subset[subset_idx]
        T_origin = T_subset[subset_idx]
        S_origin = S_subset[subset_idx]
        alk_origin = alk_subset[subset_idx]
        po4_origin = po4_subset[subset_idx]
        sio3_origin = sio3_subset[subset_idx]
        memory_time = e_folding * 2  # approximate memory time in days.
    return np.array(
        [
            x_origin,
            y_origin,
            z_origin,
            dic_origin,
            T_origin,
            S_origin,
            alk_origin,
            po4_origin,
            sio3_origin,
            memory_time,
        ]
    )
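
As a rough illustration of the two building blocks above, the sketch below runs them on synthetic data. The helper names (`last_upward_crossing`, `autocorr`), the toy depth series, and the AR(1) series standing in for `particleDIC` are all assumptions made for this example. The helpers mirror the logic of `compute_idx_of_last_1000m_crossing` and `autocorr_by_hand` but are redefined so the block runs on its own.

```python
import numpy as np


def last_upward_crossing(z, depth=-1000.0):
    """Index of the last step where z rises from below `depth` to at or above it."""
    previous = np.roll(z, 1)
    previous[0] = 999  # sentinel: the wrapped-around value never counts as "below"
    cond = (z >= depth) & (previous < depth)
    return len(cond) - np.flip(cond).argmax() - 1


def autocorr(x, lag):
    """Lagged autocorrelation, same formula as `autocorr_by_hand`."""
    y1 = x[: len(x) - lag]
    y2 = x[lag:]
    sum_product = np.sum((y1 - np.mean(y1)) * (y2 - np.mean(y2)))
    return sum_product / ((len(x) - lag) * np.std(y1) * np.std(y2))


# Toy depth series (m): the particle rises across 1000 m at steps 2 and 4.
z_toy = np.array([-1500.0, -1200.0, -900.0, -1100.0, -950.0, -800.0])
print(last_upward_crossing(z_toy))  # -> 4

# Hypothetical red-noise (AR(1)) series standing in for particleDIC.
rng = np.random.default_rng(0)
phi = 0.9  # assumed lag-1 persistence
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# The first 100 lags are enough to find the 1/e crossing for this series.
acf = np.array([autocorr(x, k) for k in range(100)])
e_folding = int((acf <= 1 / np.e).argmax())

print(f"sample e-folding lag: {e_folding} steps")
print(f"theoretical AR(1) e-folding lag: {-1 / np.log(phi):.1f} steps")
```

For an AR(1) process the theoretical autocorrelation is `phi ** k`, so the sample e-folding lag should land near `-1 / ln(phi)`, give or take sampling noise.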

Derive the memory time origin for each topographic region, and then for the remaining particles that upwell outside those regions.

In [7]:
def compute_locations_and_save_dataset(ensemble, region):
    """Computes the memory time origin (x, y, z, and tracers) for a
    particle ensemble and saves it out as a netCDF.

    ensemble : xarray object with the particle trajectories for the
               given ensemble.
    region : str of the region ['drake', 'kerguelan', ...]
    """
    result = xr.apply_ufunc(
        deep_tracers_origin,
        ensemble.lonParticle,
        ensemble.latParticle,
        ensemble.zLevelParticle,
        ensemble.particleDIC,
        ensemble.particleTemperature,
        ensemble.particleSalinity,
        ensemble.particleALK,
        ensemble.particlePO4,
        ensemble.particleSiO3,
        input_core_dims=[
            ["time"],
            ["time"],
            ["time"],
            ["time"],
            ["time"],
            ["time"],
            ["time"],
            ["time"],
            ["time"],
        ],
        output_core_dims=[["coordinate"]],
        dask_gufunc_kwargs={"output_sizes": {"coordinate": 10}},
        output_dtypes=[float],
        vectorize=True,
        dask="parallelized",
    )
    del result["coordinate"]

    origin = xr.Dataset()
    origin["x"] = np.rad2deg(result.isel(coordinate=0))
    origin["y"] = np.rad2deg(result.isel(coordinate=1))
    origin["z"] = result.isel(coordinate=2)
    origin["DIC"] = result.isel(coordinate=3)
    origin["T"] = result.isel(coordinate=4)
    origin["S"] = result.isel(coordinate=5)
    origin["ALK"] = result.isel(coordinate=6)
    origin["PO4"] = result.isel(coordinate=7)
    origin["SiO3"] = result.isel(coordinate=8)
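    # Keep memory_time as float: casting the NaN fill value (particles with
    # no e-folding lag) to int would produce meaningless integers.
    origin["memory_time"] = result.isel(coordinate=9)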
    origin.attrs[
        "description"
    ] = f"tracers origin and memory time based on DIC for the {region} ensemble."
    origin.attrs[
        "ensemble"
    ] = f"particles that last upwelled across 1000 m in the {region} region. Filtered to occur S of 45S and outside of the 75% sea ice zone, and to ultimately reach 200 m."
    print("Computing and saving to netCDF...")
    origin.to_netcdf(f"../data/postproc/{region}.1000m.tracer.origin.nc")
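
The `xr.apply_ufunc` call above relies on `vectorize=True`, which xarray documents as wrapping the function with `numpy.vectorize` so that it is applied once per particle along the core `time` dimension. Below is a minimal NumPy-only sketch of that pattern. The toy function `first_and_last` and the arrays are made up for illustration and stand in for `deep_tracers_origin` and the real trajectories.

```python
import numpy as np


def first_and_last(a, b):
    """Toy per-particle function: takes two 1-D series, returns two numbers."""
    return np.array([a[0], b[-1]])


# "(t),(t)->(k)": both inputs share a core dimension t (time), and the
# output has a new core dimension k (here of length 2).
per_particle = np.vectorize(first_and_last, signature="(t),(t)->(k)")

lon = np.arange(12.0).reshape(3, 4)  # 3 particles x 4 time steps
lat = -lon

out = per_particle(lon, lat)
print(out.shape)  # (3, 2)
print(out)
```

In the notebook, the ten-element array returned by `deep_tracers_origin` plays the role of the `k` dimension, which is why `output_sizes` sets `coordinate` to 10.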

## Compute Statistical Source of DIC for Each Region

Now we subset the particles into ensembles based on whether their crossing location falls within a given geographic box. We then use the functions defined above to create a dataset that records the origin of the DIC that upwells across 1000 m.
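
The box selection can be sketched with plain NumPy as follows. The bounds and crossing locations here are hypothetical. The real bounds come from `figutils.BOUNDS`, whose values are not shown in this notebook. The example also assumes the crossing longitudes are on a 0-360 scale, and uses `% 360` as a variant of the `+= 360` shift applied to the Drake bounds below.

```python
import numpy as np

# Hypothetical bounds (lon_min, lon_max, lat_min, lat_max) on a -180..180 scale.
x0, x1, y0, y1 = -70.0, -50.0, -65.0, -55.0

# Shift negative longitudes onto the 0-360 scale; positive values are unchanged.
x0, x1 = x0 % 360, x1 % 360

# Made-up crossing locations for four particles.
lon_cross = np.array([295.0, 300.0, 40.0, 305.0])
lat_cross = np.array([-60.0, -50.0, -60.0, -58.0])

in_box = (lon_cross > x0) & (lon_cross < x1) & (lat_cross > y0) & (lat_cross < y1)
print(in_box)  # [ True False False  True]
```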

In [8]:
xCross, yCross = crossings["lon_crossing"], crossings["lat_crossing"]

In [9]:
region = "drake"

x0, x1, y0, y1 = figutils.BOUNDS[region]

# Want on a 0-360 longitude scale.
x0 += 360
x1 += 360

drake_conditions = (xCross > x0) & (xCross < x1) & (yCross > y0) & (yCross < y1)
drake_ensemble = ds.where(drake_conditions, drop=True)
drake_ensemble = drake_ensemble.chunk({"time": -1, "nParticles": 100}).persist()

%time compute_locations_and_save_dataset(drake_ensemble, region)

Computing and saving to netCDF...
CPU times: user 713 ms, sys: 46.4 ms, total: 759 ms
Wall time: 1min 17s


In [10]:
region = "crozet"

x0, x1, y0, y1 = figutils.BOUNDS[region]

crozet_conditions = (xCross > x0) & (xCross < x1) & (yCross > y0) & (yCross < y1)
crozet_ensemble = ds.where(crozet_conditions, drop=True)
crozet_ensemble = crozet_ensemble.chunk({"time": -1, "nParticles": 100}).persist()
%time compute_locations_and_save_dataset(crozet_ensemble, region)

Computing and saving to netCDF...
CPU times: user 272 ms, sys: 19.1 ms, total: 291 ms
Wall time: 48.7 s


In [11]:
region = "kerguelan"

x0, x1, y0, y1 = figutils.BOUNDS[region]

kerguelan_conditions = (xCross > x0) & (xCross < x1) & (yCross > y0) & (yCross < y1)
kerguelan_ensemble = ds.where(kerguelan_conditions, drop=True)
kerguelan_ensemble = kerguelan_ensemble.chunk({"time": -1, "nParticles": 100}).persist()
%time compute_locations_and_save_dataset(kerguelan_ensemble, region)

Computing and saving to netCDF...
CPU times: user 355 ms, sys: 8.94 ms, total: 364 ms
Wall time: 53.4 s


In [12]:
region = "campbell"

x0, x1, y0, y1 = figutils.BOUNDS[region]

campbell_conditions = (xCross > x0) & (xCross < x1) & (yCross > y0) & (yCross < y1)
campbell_ensemble = ds.where(campbell_conditions, drop=True)
campbell_ensemble = campbell_ensemble.chunk({"time": -1, "nParticles": 100}).persist()
%time compute_locations_and_save_dataset(campbell_ensemble, region)

Computing and saving to netCDF...
CPU times: user 439 ms, sys: 17.5 ms, total: 457 ms
Wall time: 49.5 s


In [13]:
region = "non_topographic"
# Simply everywhere that is not the other conditions.
non_topo_conditions = ~(
    drake_conditions | crozet_conditions | kerguelan_conditions | campbell_conditions
)
non_topo_ensemble = ds.where(non_topo_conditions, drop=True)
non_topo_ensemble = non_topo_ensemble.chunk({"time": -1, "nParticles": 500}).persist()
%time compute_locations_and_save_dataset(non_topo_ensemble, region)

Computing and saving to netCDF...
CPU times: user 323 ms, sys: 18.3 ms, total: 341 ms
Wall time: 2min 39s
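
One quick sanity check for masks like these is to confirm that the regional masks and their complement assign every particle to exactly one ensemble. The sketch below uses two hypothetical boolean masks over five particles rather than the real conditions.

```python
import numpy as np

# Hypothetical boolean masks for two regions over five particles.
region_a = np.array([True, False, False, False, True])
region_b = np.array([False, True, False, False, False])

# `|` is the explicit logical OR; `+` on boolean arrays gives the same result.
in_any_region = region_a | region_b
remainder = ~in_any_region
assert np.array_equal(in_any_region, region_a + region_b)

# Each particle should fall in exactly one of region_a, region_b, or remainder.
counts = region_a.astype(int) + region_b.astype(int) + remainder.astype(int)
print(counts)  # [1 1 1 1 1]
```

If any count exceeded 1, two regional boxes would overlap and a particle would be saved in more than one ensemble.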
