Problem with parallel post fit wrapper (ParallelPostfitWrapper)--Can't drop an axis with more than 1 block. Please use atop instead #376

firstkingofrome opened this issue Sep 29, 2018 · 4 comments


@firstkingofrome firstkingofrome commented Sep 29, 2018

I am trying to use scikit-learn estimators, fit with standard scikit-learn, to produce larger-than-memory array outputs. This code illustrates the problem:

from dask.base import tokenize
import numpy as np
import dask.array as da
from dask.array import Array
from sklearn.linear_model import LinearRegression
from dask_ml.wrappers import ParallelPostFit
# sample of problem
x = np.linspace(0,100,100,dtype=np.int32)
y = np.linspace(0,100,100,dtype=np.int32)
z = np.linspace(0,100,100,dtype=np.int32)

Y = np.random.normal(size=(100,))
X = np.stack([x,y,z],axis=1)
reg = LinearRegression().fit(X,Y)

#now try to compute on dask arrays over the whole space
x= da.linspace(0,100,100,chunks=(10,)).astype(np.int32)
y= da.linspace(0,100,100,chunks=(10,)).astype(np.int32)
z= da.linspace(0,100,100,chunks=(10,)).astype(np.int32)
x,y,z = da.meshgrid(x,y,z,sparse=False,indexing='ij')
stacked = da.stack([x.flatten(),y.flatten(),z.flatten()],axis=1)
clf = ParallelPostFit(estimator=reg)

Executing clf.predict(stacked) raises a ValueError: Can't drop an axis with more than 1 block. Please use atop instead.

I don't understand how to correct this. Thank you for any help.

@mrocklin mrocklin commented Sep 29, 2018

I suspect that ParallelPostFit assumes that the data has a single block in the columns direction. @TomAugspurger thoughts about providing an informative error here pointing users to X.rechunk({1: -1}) or doing this for them automatically?
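
A minimal sketch of what a single block in the columns direction means, using only dask.array. The small `da.arange` columns and the names `cols` and `rechunked` are illustrative assumptions, not taken from the report above. Stacking 1-D arrays along axis=1 yields one block per input column, and `rechunk({1: -1})` merges that axis into one block while leaving the row chunks alone:

```python
import numpy as np
import dask.array as da

# Three 1-D feature columns (illustrative data), each split into row chunks of 4.
cols = [da.arange(12, chunks=4) + offset for offset in (0, 100, 200)]

# Stacking along axis=1 creates a new column axis with one block per input array.
stacked = da.stack(cols, axis=1)
print(stacked.chunks)    # chunk sizes per axis: rows first, then columns

# Merge the column axis into a single block; -1 means "the full length of this axis".
# Axes not named in the dict keep their existing chunking.
rechunked = stacked.rechunk({1: -1})
print(rechunked.chunks)
```

If the suspicion above is right, passing an array rechunked this way to `clf.predict` should avoid the error, since each block then holds every feature column for its rows.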

@TomAugspurger TomAugspurger commented Sep 30, 2018

TomAugspurger added a commit to TomAugspurger/dask-ml that referenced this issue Oct 1, 2018

@firstkingofrome firstkingofrome commented Oct 1, 2018

I just wanted to thank you both for how quickly you helped resolve this and for dask.

@TomAugspurger TomAugspurger commented Oct 1, 2018

This will be auto-closed once #377 is merged. Just need to sort out one issue there first.

@TomAugspurger TomAugspurger reopened this Oct 1, 2018
TomAugspurger added a commit that referenced this issue Oct 3, 2018
Closes #376