Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats.rm_poly tests fail when dask array passed #100

Closed
lukegre opened this issue Nov 6, 2020 · 4 comments
Closed

stats.rm_poly tests fail when dask array passed #100

lukegre opened this issue Nov 6, 2020 · 4 comments

Comments

@lukegre
Copy link
Contributor

lukegre commented Nov 6, 2020

Overview

Tests for stats.rm_poly fails when x and y are given and input array when y is a dask array. Encountered on the PR #99 resulting in 12 failed tests.

The test conditions below result in failure:
esmtools/tests/test_stats_poly.py::test_rm_poly_against_time_dask[<any>-<any>-False]
For more info see: https://travis-ci.com/github/bradyrx/esmtools/builds/198414251

The tests pass if y.load() is forced. So somehow dask is the culprit, though not sure how to feasibly solve the problem.

Traceback

Traceback (most recent call last):
1568  File "/home/travis/build/bradyrx/esmtools/esmtools/tests/test_stats_poly.py", line 197, in test_polyfit_against_time_dask
1569    dt = rm_poly(y, order=2)
1570  File "/home/travis/build/bradyrx/esmtools/esmtools/checks.py", line 79, in wrapper
1571    return func(*args, **kwargs)
1572  File "/home/travis/build/bradyrx/esmtools/esmtools/stats.py", line 541, in rm_poly
1573    output_dtypes=['float64'],
1574  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 1110, in apply_ufunc
1575    keep_attrs=keep_attrs,
1576  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 262, in apply_dataarray_vfunc
1577    result_var = func(*data_vars)
1578  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 700, in apply_variable_ufunc
1579    result_data = func(*input_data)
1580  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 674, in func
1581    **dask_gufunc_kwargs,
1582  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/dask/array/gufunc.py", line 431, in apply_gufunc
1583    func, loop_output_dims, *arginds, concatenate=True, **kwargs
1584  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/dask/array/blockwise.py", line 250, in blockwise
1585    meta = compute_meta(func, dtype, *args[::2], **kwargs)
1586  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/dask/array/utils.py", line 121, in compute_meta
1587    meta = func(*args_meta)
1588  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2108, in __call__
1589    return self._vectorize_call(func=func, args=vargs)
1590  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2182, in _vectorize_call
1591    res = self._vectorize_call_with_signature(func, args)
1592  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2223, in _vectorize_call_with_signature
1593    results = func(*(arg[index] for arg in args))
1594  File "/home/travis/build/bradyrx/esmtools/esmtools/stats.py", line 528, in _rm_poly
1595    fit = _polyfit(x, y, order, nan_policy)
1596  File "/home/travis/build/bradyrx/esmtools/esmtools/stats.py", line 149, in _polyfit
1597    coefs = poly.polyfit(x_mod, y_mod, order)
1598  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/polynomial/polynomial.py", line 1350, in polyfit
1599    return pu._fit(polyvander, x, y, deg, rcond, full, w)
1600  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/polynomial/polyutils.py", line 664, in _fit
1601    raise TypeError("expected non-empty vector for x")
1602TypeError: expected non-empty vector for x
@bradyrx
Copy link
Owner

bradyrx commented Nov 13, 2020

I'll give this a look now and see what the deal is...

@bradyrx
Copy link
Owner

bradyrx commented Nov 13, 2020

Okay, just putting some notes here as I work through this.

  • Fails happen on the dask-aware rm_poly and polyfit tests. It doesn't matter if they're gridded or what the time type is. This is solely related to it breaking with 2 arguments instead of just 1.

@bradyrx
Copy link
Owner

bradyrx commented Nov 13, 2020

Replicating a broken test.... this breaks it:

def gridded_da_float():
    """Mock data of gridded time series in float time."""
    # Wrapper so fixture can be called multiple times.
    # https://alysivji.github.io/pytest-fixures-with-function-arguments.html
    data = np.random.rand(60, 3, 3)
    da = xr.DataArray(data, dims=["time", "lat", "lon"])
    # Annual resolution time axis for 60 years.
    da["time"] = np.arange(1900, 1960)
    return da

# false - float - false
data = gridded_da_float().chunk()
x = data["time"]
y = data.isel(lat=0, lon=0)

X, _ = esmtools.stats._convert_time_and_return_slope_factor(x, "time")
Y = y

def _rm_poly(x, y, order, nan_policy):
    print(x)
    fit = esmtools.stats._polyfit(x, y, order, nan_policy)
    return y - fit

dim = "time"
test = xr.apply_ufunc(
    _rm_poly,
    X,
    Y,
    2,
    "none",
    vectorize=True,
    dask="parallelized",
    input_core_dims=[[dim], [dim], [], []],
    output_core_dims=[[dim]],
    output_dtypes=[float],
)

dask='allowed' fixes it but causes other tests to break. The weird thing is that with dask='parallelized' here and dask arrays, it passes empty arrays to the _rm_poly func. You see with the print statement in _rm_poly an empty list for parallelized and a full list for allowed. I haven't seen this behavior before. I don't know why it's doing this.

@bradyrx
Copy link
Owner

bradyrx commented Nov 13, 2020

See #102. This should fix it. Looks like there was some dask or xarray update that stays true to vectorize. So if you pass a single time series in with input_core_dims=['time'] and vectorize=True it doesn't know how to split it up.

With a gridded product, it passes in each grid cell separately. This just detects whether it's gridded. This seems like a kludge but should be fine.

@bradyrx bradyrx closed this as completed Nov 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants