# missing data interpolation

statistics is the answer to everything

Use this notebook to gap-fill a saved NetCDF file.

### potential shenanigans

"Several techniques have been used to fill the gaps in either the UWLS or OI derived total vector maps.

These are implemented using covariance derived from normal mode analysis (Lipphardt et al. 2000), open-boundary modal analysis (OMA) (Kaplan and Lekien 2007), and empirical orthogonal function (EOF) analysis (Beckers and Rixen 2003; Alvera-Azcárate et al. 2005); and using idealized or smoothed observed covariance (Davis 1985)."

- normal mode analysis
- open-boundary modal analysis (OMA)
- empirical orthogonal function analysis (EOF)
- use idealized/smoothed observed covariance

---

### other ideas

DINEOF (could only find an implementation in R)
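
For reference, the core idea behind DINEOF can be sketched in a few lines of NumPy: start the gaps at zero, reconstruct the field from its leading EOFs (a truncated SVD), copy the reconstruction into the gaps only, and repeat until the gap values stop changing. This is a simplified illustration, not the DINEOF algorithm itself and not part of pyplume. As far as I know, the full method also works on anomalies (mean removed) and picks the number of modes by cross-validation, both of which are skipped here. The function name `eof_gapfill` and its parameters are made up for this sketch.

```python
import numpy as np


def eof_gapfill(data, n_modes=1, n_iter=200, tol=1e-8):
    """Fill NaNs in a (time, space) matrix by iterated truncated-SVD reconstruction.

    Hypothetical helper for illustration only; n_modes is fixed by the caller
    instead of being chosen by cross-validation.
    """
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    # First guess: zeros in the gaps, observations everywhere else.
    filled = np.where(missing, 0.0, data)
    if not missing.any():
        return filled
    for _ in range(n_iter):
        # Reconstruct the current estimate from its leading n_modes EOFs.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        recon = (u[:, :n_modes] * s[:n_modes]) @ vt[:n_modes]
        change = np.max(np.abs(recon[missing] - filled[missing]))
        # Only the gaps are updated; observed values are never altered.
        filled[missing] = recon[missing]
        if change < tol:
            break
    return filled


# Synthetic rank-1 field (one temporal mode times one spatial mode) with ~10% gaps.
t = np.linspace(0.0, 2.0 * np.pi, 40)
x = np.linspace(0.0, 1.0, 15)
truth = np.outer(np.sin(t), 1.0 + x)
rng = np.random.default_rng(0)
gappy = truth.copy()
gappy[rng.random(truth.shape) < 0.1] = np.nan

result = eof_gapfill(gappy, n_modes=1)
```

On real current fields the data would first have to be reshaped to (time, space) with land points dropped, and a single mode would rarely be enough.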

In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
from pathlib import Path
import numpy as np

import pyplume.utils as utils
from pyplume.dataloaders import dataset_to_fieldset, SurfaceGrid, DataLoader
from pyplume.constants import *
from pyplume.gapfilling import InterpolationStep, SmoothnStep, Gapfiller

### target and interp_references

#### Change these variables

`target` is the data you are interpolating.

`interp_references` is a list of reference datasets to interpolate from. Requirements:
- order the list from most accurate to least accurate (highest to lowest resolution)
- the time domain must be identical to or larger than the target's
- the lat and lon domains should be larger than the target's to prevent out-of-bounds problems

`mask_nc` must have exactly the same lat and lon dimensions as the target.
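
The domain requirements above can be checked before running anything. The sketch below uses plain NumPy coordinate arrays with made-up values; in practice the coordinates would come from the loaded datasets. The helper `covers` is hypothetical and not part of pyplume.

```python
import numpy as np


def covers(ref_coord, target_coord, strict=False):
    """Return True if the reference coordinate range contains the target's.

    Hypothetical helper; strict=True requires the reference range to extend
    beyond the target on both sides.
    """
    ref_min, ref_max = np.min(ref_coord), np.max(ref_coord)
    tgt_min, tgt_max = np.min(target_coord), np.max(target_coord)
    if strict:
        return bool(ref_min < tgt_min and ref_max > tgt_max)
    return bool(ref_min <= tgt_min and ref_max >= tgt_max)


# Made-up coordinates standing in for a target and one reference dataset.
target_lat = np.linspace(32.4, 32.7, 31)
target_time = np.arange("2022-09-01", "2022-09-03", dtype="datetime64[h]")
ref_lat = np.linspace(32.0, 33.0, 51)
ref_time = np.arange("2022-08-31", "2022-09-04", dtype="datetime64[h]")

lat_ok = covers(ref_lat, target_lat, strict=True)  # lat/lon: bigger than the target
time_ok = covers(ref_time, target_time)            # time: identical or bigger
```

The same check applies to lon. Running it for every entry in the reference list catches a too-small reference before it causes out-of-bounds trouble.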

In [None]:
target_path = "data/field_netcdfs/tj_plume_1km_2022-09.nc"
target = DataLoader(target_path).dataset

In [None]:
gapfiller = Gapfiller()
# ADD GAPFILLING STEPS HERE
gapfiller.add_steps(
    InterpolationStep([
        "data/field_netcdfs/tj_plume_2km_2022-09.nc",
        "data/field_netcdfs/tj_plume_6km_2022-09.nc",
    ]),
    # SmoothnStep(mask="data/field_netcdfs/tj_plume_1km_2022-09_mask.npy")
)
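
To see why the order of the reference list matters, here is a minimal sketch of a priority-ordered fill: each gap takes its value from the first reference that has data there. It assumes every reference has already been regridded onto the target grid, which a real interpolation step would have to do itself. This is not pyplume's `InterpolationStep` implementation, and `fill_from_references` is a hypothetical name.

```python
import numpy as np


def fill_from_references(target, references):
    """Fill NaNs in target from same-shaped reference arrays, in priority order."""
    filled = np.array(target, dtype=float)  # copy so the input stays untouched
    for ref in references:
        gaps = np.isnan(filled)
        if not gaps.any():
            break
        # Take values from this reference only where a gap remains.
        filled[gaps] = np.asarray(ref, dtype=float)[gaps]
    return filled


# Toy 2x3 field with three gaps; the finer reference has a gap of its own.
target_u = np.array([[0.1, np.nan, 0.3], [np.nan, np.nan, 0.6]])
ref_fine = np.array([[9.0, 0.2, 9.0], [np.nan, 0.5, 9.0]])
ref_coarse = np.full((2, 3), 0.4)

filled_u = fill_from_references(target_u, [ref_fine, ref_coarse])
```

Observed target values are never overwritten, and the coarse reference is only consulted where the finer one is also missing.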

### formatting and saving

In [None]:
target_interped_ds = gapfiller.execute(target)

In [None]:
# write next to the input file, replacing only the final ".nc" suffix
save_path = str(Path(target_path).with_name(Path(target_path).stem + "_interped.nc"))
target_interped_ds.to_netcdf(save_path)
print(f"saved to {save_path}")

### display field to see if interpolation worked

In [None]:
fs = dataset_to_fieldset(target)
fs_interp = dataset_to_fieldset(target_interped_ds)
fs.U.show()  # uninterpolated
fs_interp.U.show()  # gapfilled (smoothed only if SmoothnStep is enabled above)