New API for Canvas.raster (#556)
As mentioned in holoviz/holoviews#1909, the ``Canvas.raster()`` API has not matched the rest of datashader, because that code originated in an external project (gridtools). These differences made it difficult for external tools like HoloViews to provide a consistent interface for datashading across object types. Fully rewriting this code would be a lot of work, so this PR instead rewrites only the top-level API to be more similar to the other Canvas glyph types. The changes should be almost fully backwards compatible for now, but the previous calling conventions are now deprecated and will be removed in a later release.

API Changes:

- Renamed ``Canvas.raster(downsample_method=X)`` to ``Canvas.raster(agg=X)``: What gridtools calls "downsampling" is precisely the same concept that datashader calls aggregation everywhere else. "Reduction" is perhaps even more accurate, but such arguments are called ``agg`` elsewhere, so I've adopted that convention here as well. ``downsample_method`` is still accepted as an alias for now, if ``agg`` is not present.
- Renamed ``Canvas.raster(upsample_method=X)`` to ``Canvas.raster(interpolate=X)``: "interpolate" seems like a better complement to "agg" than "upsample_method".  ``upsample_method`` is still accepted as an alias for now, if ``interpolate`` is not present.
- The ``agg`` argument of other calls accepts any object of type ``.reductions.Reduction``, but ``raster(downsample_method=...)`` accepted only string arguments. It now accepts ``Reduction`` objects like ``rd.mean()``, extracting the "column" name, if any, and comparing it to the DataArray's name (signaling an error if a "column" name is specified but doesn't match). Of course, it's not really a "column", but it's the same idea. The column name need not be provided to the agg, but if it is, it must match the DataArray's name. String names are still accepted for backwards compatibility, but will eventually be removed.
- Reduction has been changed to allow instantiation without any argument, for use with unnamed DataArrays.  This change may make some messages for user errors more confusing, but additional checks have been added to alleviate that.
- Stub reduction classes have been added for three aggregations supported by ``Canvas.raster`` but not previously available in ``.reductions``: ``mode``, ``first``, and ``last``. All three are designed for use with categorical data, where numerical averaging is not appropriate and an actual existing value must be returned. For now, these work only with raster, but at least ``first`` and ``last`` should be easy to implement for other glyph types. (``mode`` is more complicated because it would require unbounded buffers per pixel to hold all distinct values encountered.)
- ``Canvas.raster`` now accepts xarray Datasets (collections of aligned DataArrays), with the column argument to each reduction selecting the appropriate DataArray from the Dataset.
- Made the interpolation support in ``Canvas.trimesh`` match that of ``Canvas.raster``, using a string argument ``interpolate`` instead of a Boolean ``interp``.  The Boolean is still accepted for now, but will be deleted before release once HoloViews and GeoViews master have been updated.
- The ``layer`` argument of Canvas.raster was previously, confusingly, a *1-based* integer index, but it is now an xarray coordinate value. Xarray coordinates support 0-based, 1-based, or arbitrary floating-point indexing depending on how the DataArray was declared, so the behavior should be unchanged for arrays explicitly declared with 1-based indexing (such as multi-band Landsat images indexed with integers); in other cases, the appropriate coordinate value must now be supplied.
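The argument handling described in the list above can be sketched in plain Python. This is a simplified illustration, not datashader's code: the `Reduction` classes and the helpers `resolve_raster_args` and `resolve_trimesh_interp` are hypothetical stand-ins, `source_name` plays the role of a DataArray's `.name`, and Dataset handling is omitted. The order of checks follows the new `raster()` and `trimesh()` bodies in the diff.

```python
# Hypothetical stand-ins for datashader.reductions classes; the real ones
# also subclass Expr and implement the aggregation itself.
class Reduction(object):
    def __init__(self, column=None):
        self.column = column

class mean(Reduction): pass
class first(Reduction): pass
class last(Reduction): pass
class mode(Reduction): pass

UPSAMPLE_METHODS = ['nearest', 'linear']

# Strings and Reduction classes both map to a method name (a subset of the
# mapping in the diff, which also covers var, std, min and max).
DOWNSAMPLE_METHODS = {'mean': 'mean', mean: 'mean',
                      'first': 'first', first: 'first',
                      'last': 'last', last: 'last',
                      'mode': 'mode', mode: 'mode'}


def resolve_raster_args(agg=None, interpolate=None, source_name=None,
                        downsample_method=mean(), upsample_method='linear'):
    """Return (method_name, column, interpolate) for a raster() call."""
    # The new names win; the deprecated aliases apply only when they are absent.
    if agg is None:
        agg = downsample_method
    if interpolate is None:
        interpolate = upsample_method
    if interpolate not in UPSAMPLE_METHODS:
        raise ValueError('Invalid interpolate method: %r' % (interpolate,))

    column = None
    if isinstance(agg, Reduction):
        # Unwrap an instance such as mode('landcover') into class and column.
        agg, column = type(agg), agg.column
        # As in the diff, a named reduction must match the array's name
        # (so an unnamed array fails if a column is given).
        if column is not None and source_name != column:
            raise ValueError('DataArray name %r does not match %s(%r)'
                             % (source_name, agg.__name__, column))

    if agg not in DOWNSAMPLE_METHODS:
        raise ValueError('Invalid aggregation method: %r' % (agg,))
    return DOWNSAMPLE_METHODS[agg], column, interpolate


def resolve_trimesh_interp(interp=True, interpolate=None):
    """Map trimesh's new string `interpolate` onto the old Boolean `interp`."""
    if interpolate is None:
        return interp
    if interpolate == 'linear':
        return True
    if interpolate == 'nearest':
        return False
    raise ValueError('Invalid interpolate method: %r' % (interpolate,))


print(resolve_raster_args(agg=mode('landcover'), source_name='landcover'))
# prints ('mode', 'landcover', 'linear')
```

Keying the lookup table by both strings and Reduction classes is what lets a single membership test accept old-style `'mean'` and new-style `rd.mean()` arguments alike.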
jbednar committed Jan 26, 2018
1 parent 686eac6 commit 37aa852
Showing 7 changed files with 327 additions and 91 deletions.
datashader/core.py: 155 changes (97 additions, 58 deletions)
@@ -4,29 +4,14 @@
 import pandas as pd
 import dask.dataframe as dd
 from dask.array import Array
-from xarray import DataArray
+from xarray import DataArray, Dataset
 from collections import OrderedDict

 from .utils import Dispatcher, ngjit, calc_res, calc_bbox, orient_array, compute_coords, get_indices, dshape_from_pandas, dshape_from_dask, categorical_in_dtypes
-from .resampling import (resample_2d, US_NEAREST, US_LINEAR, DS_FIRST, DS_LAST,
-                         DS_MEAN, DS_MODE, DS_VAR, DS_STD, DS_MIN, DS_MAX)
+from .resampling import resample_2d
+from .utils import Expr # noqa (API import)

-
-class Expr(object):
-    """Base class for expression-like objects.
-
-    Implements hashing and equality checks. Subclasses should implement an
-    ``inputs`` attribute/property, containing a tuple of everything that fully
-    defines that expression.
-    """
-    def __hash__(self):
-        return hash((type(self), self.inputs))
-
-    def __eq__(self, other):
-        return type(self) is type(other) and self.inputs == other.inputs
-
-    def __ne__(self, other):
-        return not self == other
+from . import reductions as rd


 class Axis(object):
@@ -201,7 +186,7 @@ def line(self, source, x, y, agg=None):
             agg = any_rdn()
         return bypixel(source, self, Line(x, y), agg)

-    def trimesh(self, vertices, simplices, mesh=None, agg=None, interp=True):
+    def trimesh(self, vertices, simplices, mesh=None, agg=None, interp=True, interpolate=None):
         """Compute a reduction by pixel, mapping data to pixels as a triangle.

         >>> import datashader as ds
@@ -240,17 +225,29 @@ def trimesh(self, vertices, simplices, mesh=None, agg=None, interp=True):
             purposes. This dataframe is expected to have come from
             ``datashader.utils.mesh()``. If this argument is not None, the first
             two arguments are ignored.
-        interp : boolean, optional
-            Specify whether to do bilinear interpolation of the pixels within each
-            triangle. This can be thought of as a "weighted average" of the vertex
-            values. Defaults to True.
+        interpolate : str, optional default=linear
+            Method to use for interpolation between specified values. ``nearest``
+            means to use a single value for the whole triangle, and ``linear``
+            means to do bilinear interpolation of the pixels within each
+            triangle (a weighted average of the vertex values). For
+            backwards compatibility, also accepts ``interp=True`` for ``linear``
+            and ``interp=False`` for ``nearest``.
         """
         from .glyphs import Triangles
         from .reductions import mean as mean_rdn
         from .utils import mesh as create_mesh

         source = mesh

+        # 'interp' argument is deprecated as of datashader=0.6.4
+        if interpolate is not None:
+            if interpolate == 'linear':
+                interp = True
+            elif interpolate == 'nearest':
+                interp = False
+            else:
+                raise ValueError('Invalid interpolate method: options include {}'.format(['linear','nearest']))
+
         # Validation is done inside the [pd]d_mesh utility functions
         if source is None:
             source = create_mesh(vertices, simplices)
@@ -274,9 +271,11 @@ def trimesh(self, vertices, simplices, mesh=None, agg=None, interp=True):
     def raster(self,
                source,
                layer=None,
-               upsample_method='linear',
-               downsample_method='mean',
-               nan_value=None):
+               upsample_method='linear', # Deprecated as of datashader=0.6.4
+               downsample_method=rd.mean(), # Deprecated as of datashader=0.6.4
+               nan_value=None,
+               agg=None,
+               interpolate=None):
         """Sample a raster dataset by canvas size and bounds.

         Handles 2D or 3D xarray DataArrays, assuming that the last two
@@ -291,16 +290,18 @@ def raster(self,

         Parameters
         ----------
-        source : xarray.DataArray
-            input datasource most likely obtain from `xr.open_rasterio()`.
-        layer : int
-            source layer number : optional default=None
-        upsample_method : str, optional default=linear
-            resample mode when upsampling raster.
+        source : xarray.DataArray or xr.Dataset
+            2D or 3D labelled array (if Dataset, the agg reduction must
+            define the data variable).
+        layer : float
+            For a 3D array, value along the z dimension : optional default=None
+        interpolate : str, optional default=linear
+            Resampling mode when upsampling raster.
             options include: nearest, linear.
-        downsample_method : str, optional default=mean
-            resample mode when downsampling raster.
-            options include: first, last, mean, mode, var, std
+        agg : Reduction, optional default=mean()
+            Resampling mode when downsampling raster.
+            options include: first, last, mean, mode, var, std, min, max
+            Also accepts string names, for backwards compatibility.
         nan_value : int or float, optional
             Optional nan_value which will be masked out when applying
             the resampling.
@@ -310,28 +311,66 @@ def raster(self,
         data : xarray.Dataset
         """

-        upsample_methods = dict(nearest=US_NEAREST,
-                                linear=US_LINEAR)
-
-        downsample_methods = dict(first=DS_FIRST,
-                                  last=DS_LAST,
-                                  mean=DS_MEAN,
-                                  mode=DS_MODE,
-                                  var=DS_VAR,
-                                  std=DS_STD,
-                                  min=DS_MIN,
-                                  max=DS_MAX)
-
-        if upsample_method not in upsample_methods.keys():
-            raise ValueError('Invalid upsample method: options include {}'.format(list(upsample_methods.keys())))
-        if downsample_method not in downsample_methods.keys():
-            raise ValueError('Invalid downsample method: options include {}'.format(list(downsample_methods.keys())))
+        # For backwards compatibility
+        if agg is None: agg=downsample_method
+        if interpolate is None: interpolate=upsample_method
+
+        upsample_methods = ['nearest','linear']
+
+        downsample_methods = {'first':'first', rd.first:'first',
+                              'last':'last', rd.last:'last',
+                              'mode':'mode', rd.mode:'mode',
+                              'mean':'mean', rd.mean:'mean',
+                              'var':'var', rd.var:'var',
+                              'std':'std', rd.std:'std',
+                              'min':'min', rd.min:'min',
+                              'max':'max', rd.max:'max'}
+
+        if interpolate not in upsample_methods:
+            raise ValueError('Invalid interpolate method: options include {}'.format(upsample_methods))
+
+        if not isinstance(source, (DataArray, Dataset)):
+            raise ValueError('Expected xarray DataArray or Dataset as '
+                             'the data source, found %s.'
+                             % type(source).__name__)
+
+        column = None
+        if isinstance(agg, rd.Reduction):
+            agg, column = type(agg), agg.column
+            if (isinstance(source, DataArray) and column is not None
+                and source.name != column):
+                agg_repr = '%s(%r)' % (agg.__name__, column)
+                raise ValueError('DataArray name %r does not match '
+                                 'supplied reduction %s.' %
+                                 (source.name, agg_repr))
+
+        if isinstance(source, Dataset):
+            data_vars = list(source.data_vars)
+            if column is None:
+                raise ValueError('When supplying a Dataset the agg reduction '
+                                 'must specify the variable to aggregate. '
+                                 'Available data_vars include: %r.' % data_vars)
+            elif column not in source.data_vars:
+                raise KeyError('Supplied reduction column %r not found '
+                               'in Dataset, expected one of the following '
+                               'data variables: %r.' % (column, data_vars))
+            source = source[column]
+
+        if agg not in downsample_methods.keys():
+            raise ValueError('Invalid aggregation method: options include {}'.format(list(downsample_methods.keys())))
+        ds_method = downsample_methods[agg]

+        if source.ndim not in [2, 3]:
+            raise ValueError('Raster aggregation expects a 2D or 3D '
+                             'DataArray, found %s dimensions' % source.ndim)
+
         res = calc_res(source)
         ydim, xdim = source.dims[-2:]
         xvals, yvals = source[xdim].values, source[ydim].values
         left, bottom, right, top = calc_bbox(xvals, yvals, res)
-        array = orient_array(source, res, layer)
+        if layer is not None:
+            source=source.sel(**{source.dims[0]: layer})
+        array = orient_array(source, res)
         dtype = array.dtype

         if nan_value is not None:
@@ -354,26 +393,26 @@ def raster(self,
         height_ratio = (ymax - ymin) / (self.y_range[1] - self.y_range[0])

         if np.isclose(width_ratio, 0) or np.isclose(height_ratio, 0):
-            raise ValueError('Canvas x_range or y_range values do not match closely-enough with the data source to be able to accurately rasterize. Please provide ranges that are more accurate.')
+            raise ValueError('Canvas x_range or y_range values do not match closely enough with the data source to be able to accurately rasterize. Please provide ranges that are more accurate.')

         w = int(np.ceil(self.plot_width * width_ratio))
         h = int(np.ceil(self.plot_height * height_ratio))
         cmin, cmax = get_indices(xmin, xmax, xvals, res[0])
         rmin, rmax = get_indices(ymin, ymax, yvals, res[1])

-        kwargs = dict(w=w, h=h, ds_method=downsample_methods[downsample_method],
-                      us_method=upsample_methods[upsample_method], fill_value=fill_value)
+        kwargs = dict(w=w, h=h, ds_method=ds_method,
+                      us_method=interpolate, fill_value=fill_value)
         if array.ndim == 2:
             source_window = array[rmin:rmax+1, cmin:cmax+1]
             if isinstance(source_window, Array):
                 source_window = source_window.compute()
-            if downsample_method in ['var', 'std']:
+            if ds_method in ['var', 'std']:
                 source_window = source_window.astype('f')
             data = resample_2d(source_window, **kwargs)
             layers = 1
         else:
             source_window = array[:, rmin:rmax+1, cmin:cmax+1]
-            if downsample_method in ['var', 'std']:
+            if ds_method in ['var', 'std']:
                 source_window = source_window.astype('f')
             arrays = []
             for arr in source_window:
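The change to ``layer`` (the removed ``orient_array(source, res, layer)`` call versus the new ``source.sel(...)`` lookup) can be illustrated without xarray. In this hypothetical sketch, a 3D raster is a list of bands with a parallel list of coordinate values along the first dimension. The old helper assumes the 1-based positional lookup the commit message describes, and the new one does an exact label lookup, which is what ``DataArray.sel`` does by default. Neither helper exists in datashader.

```python
def select_layer_old(bands, layer):
    """Old behaviour (assumed): `layer` is a 1-based position along the first axis."""
    return bands[layer - 1]


def select_layer_new(bands, coords, layer):
    """New behaviour: `layer` is a coordinate label, as with an exact-match .sel()."""
    # list.index raises ValueError for a label that is not present; xarray's
    # exact-match .sel() likewise fails for a missing label.
    return bands[coords.index(layer)]


bands = ['red', 'green', 'blue']

# Coordinates declared 1-based (e.g. band numbers 1, 2, 3): both agree.
print(select_layer_old(bands, 2), select_layer_new(bands, [1, 2, 3], 2))  # green green

# 0-based or floating-point coordinates: the same value picks a different band,
# or a wavelength-like label must be supplied instead.
print(select_layer_new(bands, [0, 1, 2], 2))                  # blue
print(select_layer_new(bands, [470.0, 550.0, 660.0], 550.0))  # green
```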
datashader/glyphs.py: 3 changes (1 addition, 2 deletions)
@@ -3,8 +3,7 @@
 from toolz import memoize
 import numpy as np

-from .core import Expr
-from .utils import ngjit, isreal
+from .utils import ngjit, isreal, Expr


 class Glyph(Expr):
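``Expr``, which this commit moves from ``core.py`` to ``utils.py``, defines equality and hashing purely from an object's type and its ``inputs`` tuple. The sketch below copies the three methods from the removed ``core.py`` lines and pairs them with a simplified, hypothetical ``Reduction`` whose ``inputs`` is just the column name (the real classes wrap it in ``extract(...)``). Its ``inputs`` property uses the ``is not None`` test that the diff introduces in ``reductions.py``, which matters now that ``column`` defaults to ``None``: a valid but falsy column name such as ``0`` still counts as an input.

```python
class Expr(object):
    """Hashing and equality based on type and ``inputs`` (as in the diff)."""
    def __hash__(self):
        return hash((type(self), self.inputs))

    def __eq__(self, other):
        return type(self) is type(other) and self.inputs == other.inputs

    def __ne__(self, other):
        return not self == other


class Reduction(Expr):
    """Hypothetical, simplified reduction: inputs is just the column name."""
    def __init__(self, column=None):
        self.column = column

    @property
    def inputs(self):
        # `is not None` rather than truthiness, so a column named 0 is kept.
        return (self.column,) if self.column is not None else ()


class mean(Reduction): pass
class first(Reduction): pass


print(mean('x') == mean('x'))                   # True: same type, same inputs
print(mean('x') == first('x'))                  # False: different types
print(len({mean('x'), mean('x'), first('x')}))  # 2: equal objects hash alike
print(mean(0).inputs, mean().inputs)            # (0,) ()
```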
datashader/reductions.py: 114 changes (110 additions, 4 deletions)
@@ -6,8 +6,7 @@
 from toolz import concat, unique
 import xarray as xr

-from .core import Expr
-from .utils import ngjit
+from .utils import Expr, ngjit


 class Preprocess(Expr):
@@ -34,10 +33,12 @@ def apply(self, df):

 class Reduction(Expr):
     """Base class for per-bin reductions."""
-    def __init__(self, column):
+    def __init__(self, column=None):
         self.column = column

     def validate(self, in_dshape):
+        if not self.column in in_dshape.dict:
+            raise ValueError("specified column not found")
         if not isnumeric(in_dshape.measure[self.column]):
             raise ValueError("input must be numeric")

@@ -76,7 +77,7 @@ def __init__(self, column=None):

     @property
     def inputs(self):
-        return (extract(self.column),) if self.column else ()
+        return (extract(self.column),) if self.column is not None else ()

     def validate(self, in_dshape):
         pass
@@ -382,6 +383,111 @@ def _finalize(bases, **kwargs):
         return xr.DataArray(x, **kwargs)


+class first(Reduction):
+    """First value encountered in ``column``.
+
+    Useful for categorical data where an actual value must always be returned,
+    not an average or other numerical calculation.
+
+    Currently only supported for rasters, externally to this class.
+
+    Parameters
+    ----------
+    column : str
+        Name of the column to aggregate over. If the data type is floating point,
+        ``NaN`` values in the column are skipped.
+    """
+    _dshape = dshape(Option(ct.float64))
+
+    @staticmethod
+    def _append(x, y, agg):
+        raise NotImplementedError("first is currently implemented only for rasters")
+
+    @staticmethod
+    def _create(shape):
+        raise NotImplementedError("first is currently implemented only for rasters")
+
+    @staticmethod
+    def _combine(aggs):
+        raise NotImplementedError("first is currently implemented only for rasters")
+
+    @staticmethod
+    def _finalize(bases, **kwargs):
+        raise NotImplementedError("first is currently implemented only for rasters")
+
+
+
+class last(Reduction):
+    """Last value encountered in ``column``.
+
+    Useful for categorical data where an actual value must always be returned,
+    not an average or other numerical calculation.
+
+    Currently only supported for rasters, externally to this class.
+
+    Parameters
+    ----------
+    column : str
+        Name of the column to aggregate over. If the data type is floating point,
+        ``NaN`` values in the column are skipped.
+    """
+    _dshape = dshape(Option(ct.float64))
+
+    @staticmethod
+    def _append(x, y, agg):
+        raise NotImplementedError("last is currently implemented only for rasters")
+
+    @staticmethod
+    def _create(shape):
+        raise NotImplementedError("last is currently implemented only for rasters")
+
+    @staticmethod
+    def _combine(aggs):
+        raise NotImplementedError("last is currently implemented only for rasters")
+
+    @staticmethod
+    def _finalize(bases, **kwargs):
+        raise NotImplementedError("last is currently implemented only for rasters")
+
+
+
+class mode(Reduction):
+    """Mode (most common value) of all the values encountered in ``column``.
+
+    Useful for categorical data where an actual value must always be returned,
+    not an average or other numerical calculation.
+
+    Currently only supported for rasters, externally to this class.
+    Implementing it for other glyph types would be difficult due to potentially
+    unbounded data storage requirements to store indefinite point or line
+    data per pixel.
+
+    Parameters
+    ----------
+    column : str
+        Name of the column to aggregate over. If the data type is floating point,
+        ``NaN`` values in the column are skipped.
+    """
+    _dshape = dshape(Option(ct.float64))
+
+    @staticmethod
+    def _append(x, y, agg):
+        raise NotImplementedError("mode is currently implemented only for rasters")
+
+    @staticmethod
+    def _create(shape):
+        raise NotImplementedError("mode is currently implemented only for rasters")
+
+    @staticmethod
+    def _combine(aggs):
+        raise NotImplementedError("mode is currently implemented only for rasters")
+
+    @staticmethod
+    def _finalize(bases, **kwargs):
+        raise NotImplementedError("mode is currently implemented only for rasters")
+
+
+
 class summary(Expr):
     """A collection of named reductions.
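To make the semantics of the new ``first``, ``last`` and ``mode`` stubs concrete, here is a pure-Python sketch of block downsampling with each rule. It is not datashader's ``resample_2d``, which handles arbitrary output sizes. The helper names are hypothetical. The sketch assumes non-overlapping square blocks and row-major scan order for "first" and "last", and it skips ``NaN`` values as the docstrings describe. Unlike ``mean``, the three categorical rules always return a value that actually occurs in the block.

```python
import math
from collections import Counter


def _valid(values):
    # Skip NaN values, as the first/last/mode docstrings describe.
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


def reduce_block(values, how):
    vals = _valid(values)
    if not vals:
        return float('nan')
    if how == 'first':
        return vals[0]
    if how == 'last':
        return vals[-1]
    if how == 'mode':
        # most_common() orders ties by first occurrence (Python 3.7+).
        return Counter(vals).most_common(1)[0][0]
    if how == 'mean':
        return sum(vals) / len(vals)
    raise ValueError('unknown method: %r' % (how,))


def downsample(grid, factor, how):
    """Reduce each factor x factor block of a 2D list to a single value."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            out_row.append(reduce_block(block, how))
        out.append(out_row)
    return out


# Integer land-cover codes: averaging them produces meaningless categories.
landcover = [[1, 1, 3, 3],
             [2, 1, 3, 4],
             [7, 5, 6, 6],
             [5, 5, 6, 8]]

for how in ('first', 'last', 'mode', 'mean'):
    print(how, downsample(landcover, 2, how))
# first [[1, 3], [7, 6]]
# last [[1, 4], [5, 8]]
# mode [[1, 3], [5, 6]]
# mean [[1.25, 3.25], [5.5, 6.5]]
```

The ``mode`` rule here collects every value in a block before counting, which is cheap for a fixed raster block but hints at why the commit message calls ``mode`` harder to support for glyphs with an unbounded number of values per pixel.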