# Problem with parallel post fit wrapper (ParallelPostfitWrapper)--Can't drop an axis with more than 1 block. Please use atop instead #376

Closed
opened this issue Sep 29, 2018 · 4 comments

### firstkingofrome commented Sep 29, 2018

 Basically, I am trying to evaluate some sklearn predictors, which I fit using standard sklearn, to produce larger-than-memory array outputs. This code illustrates the problem:

```python
from dask.base import tokenize
import numpy as np
import dask.array as da
from dask.array import Array
from sklearn.linear_model import LinearRegression
from dask_ml.wrappers import ParallelPostFit

""" sample of problem """
x = np.linspace(0, 100, 100, dtype=np.int32)
y = np.linspace(0, 100, 100, dtype=np.int32)
z = np.linspace(0, 100, 100, dtype=np.int32)
Y = np.random.normal(size=(100,))
X = np.stack([x, y, z], axis=1)
reg = LinearRegression().fit(X, Y)

# now try to compute on dask arrays over the whole space
x = da.linspace(0, 100, 100, chunks=(10,)).astype(np.int32)
y = da.linspace(0, 100, 100, chunks=(10,)).astype(np.int32)
z = da.linspace(0, 100, 100, chunks=(10,)).astype(np.int32)
x, y, z = da.meshgrid(x, y, z, sparse=False, indexing='ij')
stacked = da.stack([x.flatten(), y.flatten(), z.flatten()], axis=1)
clf = ParallelPostFit(estimator=reg)
clf.predict(stacked)
```

 Executing `clf.predict` throws a `ValueError`: `Can't drop an axis with more than 1 block. Please use atop instead.` I don't understand how to correct this. Thank you for any help.

### mrocklin commented Sep 29, 2018

 I suspect that `ParallelPostFit` assumes that the data has a single block in the columns direction. @TomAugspurger thoughts about providing an informative error here pointing users to `X.reshape({1: -1})` or doing this for them automatically?
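 The single-block requirement in the columns direction can be checked, and worked around, by hand. The sketch below uses small made-up arrays rather than the arrays from the report. Note that the dask array method that accepts an `{axis: chunk size}` mapping is `rechunk` (`reshape` does not take one), so `rechunk` is what is assumed here.

```python
import dask.array as da

# Three 1-D columns, each split into row chunks (sizes are arbitrary).
cols = [da.arange(12, chunks=4) for _ in range(3)]

# Stacking along axis=1 leaves every column in its own block.
X = da.stack(cols, axis=1)
print(X.chunks)  # ((4, 4, 4), (1, 1, 1))

# Merge the column blocks into one; -1 means "the full extent of this axis".
X_single = X.rechunk({1: -1})
print(X_single.chunks)  # ((4, 4, 4), (3,))
```

 In the reported example, the same call on `stacked` before `clf.predict(stacked)` would presumably give the wrapper the layout it expects.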

### TomAugspurger commented Sep 30, 2018

 That’s correct. We should at least be calling check_array, which validates this. In principle a rechunk with “auto” and -1 should do what we want.
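 A rough sketch of what a rechunk with "auto" and -1 could look like. `single_column_block` is a hypothetical helper written for illustration, not the code that was added to dask-ml; it only assumes that `rechunk` accepts `"auto"` and `-1` as per-axis values.

```python
import dask.array as da
import numpy as np


def single_column_block(X):
    """Hypothetical helper: return X with all columns in one block."""
    if X.ndim == 2 and X.numblocks[1] > 1:
        # "auto" lets dask choose the row chunk size; -1 spans every column.
        X = X.rechunk({0: "auto", 1: -1})
    return X


# Small made-up input with one block per column.
X = da.stack([da.arange(12, chunks=4) for _ in range(3)], axis=1)
fixed = single_column_block(X)

print(X.numblocks[1])      # 3
print(fixed.numblocks[1])  # 1
```

 Rechunking changes only how the array is partitioned, so the values and shape that reach the estimator are unchanged.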
added a commit to TomAugspurger/dask-ml that referenced this issue Oct 1, 2018
``` Auto-rechunk input arrays ```
``` ce31505 ```
`Closes dask#376`
mentioned this issue Oct 1, 2018

### firstkingofrome commented Oct 1, 2018

 I just wanted to thank you both for how quickly you helped resolve this and for dask.

### TomAugspurger commented Oct 1, 2018

 This will be auto-closed once #377 is merged. Just need to sort out one issue there first.
reopened this Oct 1, 2018
added a commit that referenced this issue Oct 3, 2018
``` Auto-rechunk input arrays (#377) ```
``` 3260747 ```
`Closes #376`