
Problem with ParallelPostFit wrapper: "Can't drop an axis with more than 1 block. Please use atop instead" #376

Closed
firstkingofrome opened this Issue Sep 29, 2018 · 4 comments

3 participants

firstkingofrome commented Sep 29, 2018

I'm trying to use some scikit-learn predictors that I fit with standard scikit-learn to produce array outputs that are larger than memory. This code illustrates the problem:

```python
import numpy as np
import dask.array as da
from sklearn.linear_model import LinearRegression
from dask_ml.wrappers import ParallelPostFit

# Sample of the problem: fit a regular scikit-learn model on in-memory data
x = np.linspace(0, 100, 100, dtype=np.int32)
y = np.linspace(0, 100, 100, dtype=np.int32)
z = np.linspace(0, 100, 100, dtype=np.int32)

Y = np.random.normal(size=(100,))
X = np.stack([x, y, z], axis=1)
reg = LinearRegression().fit(X, Y)

# Now try to predict with dask arrays over the whole space
x = da.linspace(0, 100, 100, chunks=(10,)).astype(np.int32)
y = da.linspace(0, 100, 100, chunks=(10,)).astype(np.int32)
z = da.linspace(0, 100, 100, chunks=(10,)).astype(np.int32)
x, y, z = da.meshgrid(x, y, z, sparse=False, indexing='ij')
# Shape (1000000, 3); da.stack puts each of the three columns in its own block
stacked = da.stack([x.flatten(), y.flatten(), z.flatten()], axis=1)
clf = ParallelPostFit(estimator=reg)
clf.predict(stacked)  # fails with the ValueError quoted below
```

Calling clf.predict raises a ValueError: Can't drop an axis with more than 1 block. Please use atop instead.

I don't understand how to correct this. Thank you for any help.
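A quick way to see what ParallelPostFit receives is to inspect the chunk structure of `stacked`. The sketch below rebuilds it exactly as in the reproducer, except on a smaller 20 x 20 x 20 grid (an assumption, only to keep it cheap to run), and prints its shape and chunking. Because `da.stack` creates the new axis with one block per stacked input, the column axis ends up split into three blocks of width 1.

```python
import numpy as np
import dask.array as da

# Same construction as the reproducer, on a smaller grid (20 points per axis)
x = da.linspace(0, 100, 20, chunks=(10,)).astype(np.int32)
y = da.linspace(0, 100, 20, chunks=(10,)).astype(np.int32)
z = da.linspace(0, 100, 20, chunks=(10,)).astype(np.int32)
x, y, z = da.meshgrid(x, y, z, sparse=False, indexing="ij")
stacked = da.stack([x.flatten(), y.flatten(), z.flatten()], axis=1)

print(stacked.shape)          # (8000, 3)
print(stacked.chunks[1])      # (1, 1, 1): one block per feature column
print(stacked.numblocks[1])   # 3
```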


Member

mrocklin commented Sep 29, 2018

I suspect that ParallelPostFit assumes that the data has a single block in the columns direction. @TomAugspurger any thoughts on providing an informative error here that points users to X.rechunk({1: -1}), or on doing this for them automatically?


Member

TomAugspurger commented Sep 30, 2018

TomAugspurger added a commit to TomAugspurger/dask-ml that referenced this issue Oct 1, 2018


firstkingofrome commented Oct 1, 2018

I just wanted to thank you both for how quickly you helped resolve this and for dask.


Member

TomAugspurger commented Oct 1, 2018

This will be auto-closed once #377 is merged. Just need to sort out one issue there first.

TomAugspurger reopened this Oct 1, 2018

TomAugspurger added a commit that referenced this issue Oct 3, 2018
