Skip to content

Commit

Permalink
Merge pull request #430 from creare-com/feature/enhance_interpolation
Browse files Browse the repository at this point in the history
Feature/enhance interpolation
  • Loading branch information
mpu-creare committed Nov 24, 2020
2 parents 887e67d + d1ac848 commit 015d0eb
Show file tree
Hide file tree
Showing 65 changed files with 4,657 additions and 3,059 deletions.
105 changes: 104 additions & 1 deletion doc/source/interpolation.md
Expand Up @@ -55,8 +55,111 @@ interpolation = [
* **Descripition**: The dimensions listed in the `'dims'` list will used the specified method. These dictionaries can also specify the same field shown in the previous section.
* **Details**: PODPAC loops through the `interpolation` list, using the settings specified for each dimension independently.

**NOTE! Specifying the interpolation as a list also control the ORDER of interpolation.**
The first item in the list will be interpolated first. In this case, `lat`/`lon` will be bilinearly interpolated BEFORE `time` is nearest-neighbor interpolated.

## Interpolators
The list of available interpolators are as follows:
* `NearestNeighbor`: A custom implementation based on `scipy.cKDtree`, which handles nearly any combination of source and destination coordinates
* `XarrayInterpolator`: A light-weight wrapper around `xarray`'s `DataArray.interp` method, which is itself a wrapper around `scipy` interpolation functions, but with a clean `xarray` interface
* `Rasterio`: A wrapper around `rasterio`'s interpolation/reprojection routines. Appropriate for grid-to-grid interpolation.
* `ScipyGrid`: An optimized implementation for `grid` sources that uses `scipy`'s `RegularGridInterpolator`, or `RectBivariateSplit` interpolator depending on the method.
* `ScipyPoint`: An implementation based on `scipy.KDtree` capable of `nearest` interpolation for `point` sources
* `NearestPreview`: An approximate nearest-neighbor interpolator useful for rapidly viewing large files

The default order for these interpolators can be found in `podpac.data.INTERPOLATORS`.

### NearestNeighbor
Since this is the most general of the interpolators, this section deals with the available parameters and settings for the `NearestNeighbor` interpolator.

#### Parameters
The following parameters can be set by specifying the interpolation as a dictionary or a list, as described above.

* `respect_bounds` : `bool`
* Default is `True`. If `True`, any requested dimension OUTSIDE of the bounds will be interpolated as `nan`.
Otherwise, any point outside the bounds will have the value of the nearest neighboring point
* `remove_nan` : `bool`
* Default is `False`. If `True`, `nan`'s in the source dataset will NOT be interpolated. This can be used if a value for the function
is needed at every point of the request. It is not helpful when computing statistics, where `nan` values will be explicitly
ignored. In that case, if `remove_nan` is `True`, `nan` values will take on the values of neighbors, skewing the statistical result.
* `*_tolerance` : `float`, where `*` in ["spatial", "time", "alt"]
* Default is `inf`. Maximum distance to the nearest coordinate to be interpolated.
Corresponds to the unit of the `*` dimension.
* `*_scale` : `float`, where `*` in ["spatial", "time", "alt"]
* Default is `1`. This only applies when the source has stacked dimensions with different units. The `*_scale`
defines the factor that the coordinates will be scaled by (coordinates are divided by `*_scale`) to output
a valid distance for the combined set of dimensions.
For example, when "lat, lon, and alt" dimensions are stacked, ["lat", "lon"] are in degrees
and "alt" is in feet, the `*_scale` parameters should be set so that
`|| [dlat / spatial_scale, dlon / spatial_scale, dalt / alt_scale] ||` results in a reasonable distance.
* `use_selector` : `bool`
* Default is `True`. If `True`, a subset of the coordinates will be selected BEFORE the data of a dataset is retrieved. This
reduces the number of data retrievals needed for large datasets. In cases where `remove_nan` = `True`, the selector may select
only `nan` points, in which case the interpolation fails to produce non-`nan` data. This usually happens when requesting a single
point from a dataset that contains `nan`s. As such, in these cases set `use_selector` = `False` to get a non-`nan` value.

#### Advanced NearestNeighbor Interpolation Examples
* Only interpolate points that are within `1` of the source data lat/lon locations
```python
interpolation={"method": "nearest", "params": {"spatial_tolerance": 1}},
```
* When interpolating with mixed time/space, use `1` day as equivalent to `1` degree for determining the distance
```python
interpolation={
"method": "nearest",
"params": {
"spatial_scale": 1,
"time_scale": "1,D",
"alt_scale": 10,
}
}
```
* Remove nan values in the source datasource -- in some cases a `nan` may still be interpolated
```python
interpolation={
"method": "nearest",
"params": {
"remove_nan": True,
}
}
```
* Remove nan values in the source datasource in all cases, even for single point requests located directly at `nan`-values in the source.
```python
interpolation={
"method": "nearest",
"params": {
"remove_nan": True,
"use_selector": False,
}
}
```
* Do nearest-neighbor extrapolation outside of the bounds of the source dataset
```python
interpolation={
"method": "nearest",
"params": {
"respect_bounds": False,
}
}
```
* Do nearest-neighbor interpolation of time with `nan` removal followed by spatial interpolation
```python
interpolation = [
{
"method": "nearest",
"params": {
"remove_nan": True,
},
"dims": ["time"]
},
{
"method": "nearest",
"dims": ["lat", "lon", "alt"]
},
]
```
## Notes and Caveats
While the API is well developed, all conceivable functionality is not. For example, while we can interpolate gridded data to point data, point data to grid data interpolation is not as well supported, and there may be errors or unexpected results. Advanced users can develop their own interpolators, but this is not currently well-documented.

**Gotcha**: Parameters for a specific interpolator may silently be ignored if a different interpolator is automatically selected.
**Gotcha**: Parameters for a specific interpolator may be ignored if a different interpolator is automatically selected. These ignored parameters are logged as warnings.

1 change: 1 addition & 0 deletions podpac/compositor.py
Expand Up @@ -6,3 +6,4 @@

from podpac.core.compositor.ordered_compositor import OrderedCompositor
from podpac.core.compositor.tile_compositor import UniformTileCompositor, UniformTileMixin, TileCompositor
from podpac.core.compositor.data_compositor import DataCompositor
2 changes: 1 addition & 1 deletion podpac/coordinates.py
Expand Up @@ -8,6 +8,6 @@
from podpac.core.coordinates import Coordinates
from podpac.core.coordinates import crange, clinspace
from podpac.core.coordinates import Coordinates1d, ArrayCoordinates1d, UniformCoordinates1d
from podpac.core.coordinates import StackedCoordinates, DependentCoordinates, RotatedCoordinates
from podpac.core.coordinates import StackedCoordinates, RotatedCoordinates
from podpac.core.coordinates import merge_dims, concat, union
from podpac.core.coordinates import GroupCoordinates
4 changes: 2 additions & 2 deletions podpac/core/algorithm/coord_select.py
Expand Up @@ -85,9 +85,9 @@ def _eval(self, coordinates, output=None, _selector=None):
dims = outputs["source"].dims
coords = self._requested_coordinates
extra_dims = [d for d in coords.dims if d not in dims]
coords = coords.drop(extra_dims).coords
coords = coords.drop(extra_dims)

outputs["source"] = outputs["source"].assign_coords(**coords)
outputs["source"] = outputs["source"].assign_coords(**coords.xcoords)

if output is None:
output = outputs["source"]
Expand Down
4 changes: 1 addition & 3 deletions podpac/core/algorithm/test/test_algorithm.py
Expand Up @@ -157,9 +157,7 @@ def algorithm(self, inputs):
class DataArrayAlgorithm(Algorithm):
def algorithm(self, inputs):
data = np.ones(self._requested_coordinates.shape)
return xr.DataArray(
data, dims=self._requested_coordinates.dims, coords=self._requested_coordinates.coords
)
return self.create_output_array(self._requested_coordinates, data=data)

node = DataArrayAlgorithm()
result = node.eval(coords)
Expand Down
70 changes: 36 additions & 34 deletions podpac/core/algorithm/test/test_coord_select.py
Expand Up @@ -10,10 +10,12 @@
from podpac.core.algorithm.utility import Arange
from podpac.core.algorithm.coord_select import ExpandCoordinates, SelectCoordinates, YearSubstituteCoordinates

# TODO move to test setup
coords = podpac.Coordinates(
["2017-09-01", podpac.clinspace(45, 66, 4), podpac.clinspace(-80, -70, 5)], dims=["time", "lat", "lon"]
)

def setup_module(module):
global COORDS
COORDS = podpac.Coordinates(
["2017-09-01", podpac.clinspace(45, 66, 4), podpac.clinspace(-80, -70, 5)], dims=["time", "lat", "lon"]
)


class MyDataSource(DataSource):
Expand All @@ -35,80 +37,80 @@ def get_data(self, coordinates, slc):
class TestExpandCoordinates(object):
def test_no_expansion(self):
node = ExpandCoordinates(source=Arange())
o = node.eval(coords)
o = node.eval(COORDS)

def test_time_expansion(self):
node = ExpandCoordinates(source=Arange(), time=("-5,D", "0,D", "1,D"))
o = node.eval(coords)
o = node.eval(COORDS)

def test_spatial_expansion(self):
node = ExpandCoordinates(source=Arange(), lat=(-1, 1, 0.1))
o = node.eval(coords)
o = node.eval(COORDS)

def test_time_expansion_implicit_coordinates(self):
node = ExpandCoordinates(source=MyDataSource(), time=("-15,D", "0,D"))
o = node.eval(coords)
o = node.eval(COORDS)

node = ExpandCoordinates(source=MyDataSource(), time=("-15,Y", "0,D", "1,Y"))
o = node.eval(coords)
o = node.eval(COORDS)

node = ExpandCoordinates(source=MyDataSource(), time=("-5,M", "0,D", "1,M"))
o = node.eval(coords)
o = node.eval(COORDS)

# Behaviour a little strange on these?
node = ExpandCoordinates(source=MyDataSource(), time=("-15,Y", "0,D", "4,Y"))
o = node.eval(coords)
o = node.eval(COORDS)

node = ExpandCoordinates(source=MyDataSource(), time=("-15,Y", "0,D", "13,M"))
o = node.eval(coords)
o = node.eval(COORDS)

node = ExpandCoordinates(source=MyDataSource(), time=("-144,M", "0,D", "13,M"))
o = node.eval(coords)
o = node.eval(COORDS)

def test_spatial_expansion_ultiple_outputs(self):
multi = Array(source=np.random.random(coords.shape + (2,)), coordinates=coords, outputs=["a", "b"])
multi = Array(source=np.random.random(COORDS.shape + (2,)), coordinates=COORDS, outputs=["a", "b"])
node = ExpandCoordinates(source=multi, lat=(-1, 1, 0.1))
o = node.eval(coords)
o = node.eval(COORDS)


class TestSelectCoordinates(object):
def test_no_expansion(self):
node = SelectCoordinates(source=Arange())
o = node.eval(coords)
o = node.eval(COORDS)

def test_time_selection(self):
node = SelectCoordinates(source=Arange(), time=("2017-08-01", "2017-09-30", "1,D"))
o = node.eval(coords)
o = node.eval(COORDS)

def test_spatial_selection(self):
node = SelectCoordinates(source=Arange(), lat=(46, 56, 1))
o = node.eval(coords)
o = node.eval(COORDS)

def test_time_selection_implicit_coordinates(self):
node = SelectCoordinates(source=MyDataSource(), time=("2011-01-01", "2011-02-01"))
o = node.eval(coords)
o = node.eval(COORDS)

node = SelectCoordinates(source=MyDataSource(), time=("2011-01-01", "2017-01-01", "1,Y"))
o = node.eval(coords)
o = node.eval(COORDS)

def test_spatial_selection_multiple_outputs(self):
multi = Array(source=np.random.random(coords.shape + (2,)), coordinates=coords, outputs=["a", "b"])
multi = Array(source=np.random.random(COORDS.shape + (2,)), coordinates=COORDS, outputs=["a", "b"])
node = SelectCoordinates(source=multi, lat=(46, 56, 1))
o = node.eval(coords)
o = node.eval(COORDS)


class TestYearSubstituteCoordinates(object):
def test_year_substitution(self):
node = YearSubstituteCoordinates(source=Arange(), year="2018")
o = node.eval(coords)
o = node.eval(COORDS)
assert o.time.dt.year.data[0] == 2018
assert o["time"].data != xr.DataArray(coords.coords["time"]).data
assert not np.array_equal(o["time"], COORDS["time"].coordinates)

def test_year_substitution_orig_coords(self):
node = YearSubstituteCoordinates(source=Arange(), year="2018", substitute_eval_coords=True)
o = node.eval(coords)
assert o.time.dt.year.data[0] == xr.DataArray(coords.coords["time"]).dt.year.data[0]
assert o["time"].data == xr.DataArray(coords.coords["time"]).data
o = node.eval(COORDS)
assert o.time.dt.year.data[0] == xr.DataArray(COORDS["time"].coordinates).dt.year.data[0]
np.testing.assert_array_equal(o["time"], COORDS["time"].coordinates)

def test_year_substitution_missing_coords(self):
source = Array(
Expand All @@ -118,9 +120,9 @@ def test_year_substitution_missing_coords(self):
),
)
node = YearSubstituteCoordinates(source=source, year="2018")
o = node.eval(coords)
o = node.eval(COORDS)
assert o.time.dt.year.data[0] == 2018
assert o["time"].data != xr.DataArray(coords.coords["time"]).data
assert o["time"].data != xr.DataArray(COORDS["time"].coordinates).data

def test_year_substitution_missing_coords_orig_coords(self):
source = Array(
Expand All @@ -130,13 +132,13 @@ def test_year_substitution_missing_coords_orig_coords(self):
),
)
node = YearSubstituteCoordinates(source=source, year="2018", substitute_eval_coords=True)
o = node.eval(coords)
o = node.eval(COORDS)
assert o.time.dt.year.data[0] == 2017
assert o["time"].data == xr.DataArray(coords.coords["time"]).data
np.testing.assert_array_equal(o["time"], COORDS["time"].coordinates)

def test_year_substitution_multiple_outputs(self):
multi = Array(source=np.random.random(coords.shape + (2,)), coordinates=coords, outputs=["a", "b"])
multi = Array(source=np.random.random(COORDS.shape + (2,)), coordinates=COORDS, outputs=["a", "b"])
node = YearSubstituteCoordinates(source=multi, year="2018")
o = node.eval(coords)
o = node.eval(COORDS)
assert o.time.dt.year.data[0] == 2018
assert o["time"].data != xr.DataArray(coords.coords["time"]).data
assert not np.array_equal(o["time"], COORDS["time"].coordinates)
4 changes: 2 additions & 2 deletions podpac/core/algorithm/test/test_utility.py
Expand Up @@ -20,10 +20,10 @@ def test_CoordData(self):
coords = podpac.Coordinates([[0, 1, 2], [0, 1, 2, 3, 4]], dims=["lat", "lon"])

node = CoordData(coord_name="lat")
np.testing.assert_array_equal(node.eval(coords), coords.coords["lat"])
np.testing.assert_array_equal(node.eval(coords), coords["lat"].coordinates)

node = CoordData(coord_name="lon")
np.testing.assert_array_equal(node.eval(coords), coords.coords["lon"])
np.testing.assert_array_equal(node.eval(coords), coords["lon"].coordinates)

def test_invalid_dimension(self):
coords = podpac.Coordinates([[0, 1, 2], [0, 1, 2, 3, 4]], dims=["lat", "lon"])
Expand Down
4 changes: 2 additions & 2 deletions podpac/core/compositor/compositor.py
Expand Up @@ -140,10 +140,10 @@ def select_sources(self, coordinates):
sources = self.sources
else:
try:
_, I = self.source_coordinates.intersect(coordinates, outer=True, return_indices=True)
_, I = self.source_coordinates.intersect(coordinates, outer=True, return_index=True)
except:
# Likely non-monotonic coordinates
_, I = self.source_coordinates.intersect(coordinates, outer=False, return_indices=True)
_, I = self.source_coordinates.intersect(coordinates, outer=False, return_index=True)
i = I[0]
sources = np.array(self.sources)[i].tolist()

Expand Down
9 changes: 8 additions & 1 deletion podpac/core/compositor/data_compositor.py
Expand Up @@ -7,6 +7,8 @@
from podpac.core.utils import common_doc
from podpac.core.compositor.compositor import COMMON_COMPOSITOR_DOC, BaseCompositor
from podpac.core.units import UnitsDataArray
from podpac.core.interpolation.interpolation import InterpolationMixin
from podpac.core.coordinates import Coordinates


@common_doc(COMMON_COMPOSITOR_DOC)
Expand Down Expand Up @@ -46,8 +48,13 @@ def composite(self, coordinates, data_arrays, result=None):
for arr in data_arrays:
res = res.combine_first(arr)
res = UnitsDataArray(res)

coords = Coordinates.from_xarray(res.coords)
res.attrs["bounds"] = coords.bounds
if result is not None:
result.data[:] = res.transponse(*result.dims).data
return result
return res


class InterpDataCompositor(InterpolationMixin, DataCompositor):
pass
5 changes: 1 addition & 4 deletions podpac/core/compositor/test/test_base_compositor.py
Expand Up @@ -78,10 +78,7 @@ def test_source_coordinates(self):
)

def test_select_sources_default(self):
node = BaseCompositor(
sources=[DataSource(), DataSource(interpolation="nearest_preview"), podpac.algorithm.Arange()],
interpolation="bilinear",
)
node = BaseCompositor(sources=[DataSource(), DataSource(), podpac.algorithm.Arange()], interpolation="bilinear")
sources = node.select_sources(podpac.Coordinates([[0, 10]], ["time"]))

assert isinstance(sources, list)
Expand Down
13 changes: 13 additions & 0 deletions podpac/core/compositor/test/test_data_compositor.py
@@ -0,0 +1,13 @@
import pytest
import numpy as np

import podpac
from podpac.core.data.datasource import DataSource
from podpac.core.data.array_source import Array, ArrayBase
from podpac.core.compositor.data_compositor import DataCompositor


class TestDataCompositor(object):
def test_basic_composition(self):
# TODO
pass

0 comments on commit 015d0eb

Please sign in to comment.