AttributeError when using GridSearchCV with XGBClassifier #31
Can you try with master? Older versions didn't properly handle pandas / numpy objects passed to train, but I think that's fixed now. Will try to get a release out soon.
Does our GridSearchCV even handle dask-ml estimators? I thought that it was mostly optimized for parameter searches on scikit-learn estimators.
On Tue, Nov 6, 2018 at 8:15 AM, Tom Augspurger wrote:
> Can you try with master? Older versions didn't properly handle pandas / numpy objects passed to train, but I think that's fixed now. Will try to get a release out soon.
I assume by "dask-ml estimators" you mean dask data objects? dask_ml.model_selection.GridSearchCV should work fine with either, but the underlying estimator being searched over must support whatever is passed to it (and must not blow up memory). When dask_xgboost encounters a pandas or NumPy object, it just trains the Booster locally. I wonder if that should be done on a worker instead, in case you have resources like a GPU you want to use.
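A minimal sketch of the dispatch described above (dask inputs are trained in a distributed way, pandas/NumPy inputs locally). It is not dask_xgboost's actual code: `fit_booster`, `train_local`, and `train_distributed` are hypothetical names, and the training routines are passed in as plain callables. Dask collections are recognized by the `__dask_graph__` protocol method, which is roughly what `dask.is_dask_collection` checks:

```python
from typing import Any, Callable, TypeVar

Model = TypeVar("Model")


def is_dask_collection(obj: Any) -> bool:
    """Return True if obj implements the dask collection protocol.

    Dask collections (dask arrays, dask DataFrames, ...) define
    __dask_graph__(); in-memory NumPy arrays and pandas objects do not.
    """
    graph_method = getattr(obj, "__dask_graph__", None)
    if graph_method is None:
        return False
    try:
        return graph_method() is not None
    except TypeError:
        # e.g. obj is a class rather than an instance
        return False


def fit_booster(
    X: Any,
    y: Any,
    train_local: Callable[[Any, Any], Model],
    train_distributed: Callable[[Any, Any], Model],
) -> Model:
    """Hypothetical dispatcher: distributed training for dask inputs,
    in-process training for in-memory inputs."""
    x_is_dask = is_dask_collection(X)
    y_is_dask = is_dask_collection(y)
    if x_is_dask != y_is_dask:
        # Mixing a dask X with an in-memory y (or vice versa) is ambiguous.
        raise TypeError("X and y must both be dask collections or both in-memory")
    if x_is_dask:
        return train_distributed(X, y)
    # In-memory data: train the booster in the current process.
    return train_local(X, y)


if __name__ == "__main__":
    class FakeDaskCollection:
        """Stand-in for a dask collection; only the protocol method matters."""

        def __dask_graph__(self):
            return {"x": 1}

    def train_local(X, y):
        return "local"

    def train_distributed(X, y):
        return "distributed"

    print(fit_booster([[1.0]], [0], train_local, train_distributed))  # prints "local"
    lazy = FakeDaskCollection()
    print(fit_booster(lazy, lazy, train_local, train_distributed))  # prints "distributed"
```

Requiring X and y to be of the same kind keeps a dask X paired with a pandas y from silently falling back to local training. Moving the local branch onto a worker (for example, to reach a GPU) would only change what train_local does, not the dispatch itself.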
Thanks for the response.
Okay, I've tried with master, but now another problem appears:
Whoops, I've accidentally been running your script on my branch for #28, which fixes this exact issue :) I didn't realize that wasn't merged. I'm going to kick off the CI again, and then merge it.
Hah, glad to read this! Thank you.
Hi! Are there any updates on this issue? I'm having the same problem, and unfortunately the PR did not get merged, as the CI pipeline failed.
I don't know if it works for you, but you might be interested in xgboost's own external memory API. I ended up searching hyperparameters with hyperopt and training on large data with the external memory API, reading the data from multiple CSV files without dask (currently, I use dask only for preprocessing).
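A minimal, standard-library-only sketch of the "multiple CSV files without dask" part. It assumes every file has a header row, numeric values, the same column order, and a label column named `label` (all assumptions); `iter_csv_batches` is a hypothetical helper. In recent xgboost releases, the external memory interface pulls data one batch at a time through a user-defined iterator (a subclass of `xgboost.DataIter`); wiring this generator into such an iterator is not shown, and the exact `DataIter` method signatures vary between xgboost versions.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# One batch: a list of feature rows plus the matching list of labels.
Batch = Tuple[List[List[float]], List[float]]


def iter_csv_batches(
    paths: Iterable[Path], batch_size: int, label_column: str = "label"
) -> Iterator[Batch]:
    """Yield (features, labels) batches of at most batch_size rows.

    Files are read one after another, row by row, so only the current
    batch is buffered. Assumes every file has a header row, numeric
    values, and the same column order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    features: List[List[float]] = []
    labels: List[float] = []
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                labels.append(float(row.pop(label_column)))
                # Remaining columns keep their header order.
                features.append([float(value) for value in row.values()])
                if len(labels) == batch_size:
                    yield features, labels
                    features, labels = [], []
    if labels:
        # Last, possibly smaller, batch.
        yield features, labels
```

Batch boundaries are independent of file boundaries, so a batch can span the end of one file and the start of the next; a few large files and many small ones yield batches of the same size.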
I was able to install the branch from #28 and it works for my use case. @TomAugspurger I would be interested in helping solve the CI problems, but I don't know where to begin (the error is in multiprocessing when using distributed.utils_test.cluster), so if you would welcome help and are willing to point me in the right direction, just ping me. No worries if that is more trouble than it is worth.
I spent another couple of hours on this with no luck... It's just hard to work around xgboost's behavior of basically doing FYI, the
Is there any update on this issue? I am also encountering the same problem.
Still open. You can apply #28. IIRC there are some issues with the CI / testing on master, but no one has had time to resolve them (LMK if you're interested in working on it).
Hello,
I'm working on a small proof of concept. I use dask in my project and would like to use the XGBClassifier. I also need a parameter search and, of course, cross-validation mechanisms.
Unfortunately, when fitting the dask_xgboost.XGBClassifier, I get the following error:
Although I call .fit() with two dask objects, the data somehow becomes a pandas.DataFrame later on.
Here's the code I'm using:
I use the packages in the following versions:
Note that I don't get this exception when using sklearn.ensemble.GradientBoostingClassifier.
Any help would be appreciated.
Mateusz