Description
I have a custom package that uses a custom data structure similar to a pandas DataFrame, but that also carries additional metadata. Let's call it `CustomFrame`.
I wrapped all the estimators that share the sklearn API, such as `StandardScaler`, `Lasso`, `XGBRegressor`, etc., so that they can handle this `CustomFrame`.
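To make the wrapping approach above concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and only for illustration: the `CustomFrame` layout (a `values` list plus a `metadata` dict), the `unwrap` helper, the `FrameAwareEstimator` wrapper, and the toy `MeanCenterer` that stands in for a real sklearn-style transformer. It is not the actual package code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CustomFrame:
    """Hypothetical stand-in: tabular values plus extra metadata."""
    values: list
    metadata: dict = field(default_factory=dict)


def unwrap(data: Any) -> Any:
    """Return the raw values of a CustomFrame; pass anything else through."""
    return data.values if isinstance(data, CustomFrame) else data


class FrameAwareEstimator:
    """Hypothetical wrapper around any object exposing fit/predict/transform."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None, **fit_params):
        self.estimator.fit(unwrap(X), unwrap(y), **fit_params)
        return self

    def predict(self, X):
        return self.estimator.predict(unwrap(X))

    def transform(self, X):
        result = self.estimator.transform(unwrap(X))
        # Re-attach the metadata so it survives the transformation.
        if isinstance(X, CustomFrame):
            return CustomFrame(result, X.metadata)
        return result


class MeanCenterer:
    """Toy transformer (assumption: stands in for a real sklearn-style one)."""

    def fit(self, X, y=None):
        n_rows = len(X)
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


frame = CustomFrame([[1.0], [3.0]], {"units": "m"})
scaled = FrameAwareEstimator(MeanCenterer()).fit(frame).transform(frame)
```

To be cloned by a search such as `GridSearchCV`, a wrapper like this would also need `get_params` and `set_params`, for example by subclassing `sklearn.base.BaseEstimator`; that part is left out of the sketch.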
This works well with dask's implementation of `GridSearchCV` or `RandomizedSearchCV`, as they just pass `X` and `y` to the underlying estimators which, as I said, can correctly handle my `CustomFrame`.
I wanted to do the same with `HyperbandSearchCV`, but I found this much more difficult, if not impossible, without rewriting most of it. The implementation is very different, as it relies on Future objects and async/await.
To be more precise, at the moment it doesn't even work with pandas DataFrames. If you don't specify `test_size`, the earliest failure is in `dask_ml.model_selection._incremental` at line 569, `test_size = min(0.2, 1 / X.npartitions)`, because a pandas `DataFrame` does not have the attribute `npartitions`.
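The failure can be illustrated without dask or pandas. In the sketch below, `InMemoryFrame` and `PartitionedFrame` are hypothetical stand-ins for an in-memory frame and a dask collection, and `default_test_size` is one possible fallback, not dask-ml's code. Treating an object without `npartitions` as a single partition is my assumption.

```python
class InMemoryFrame:
    """Hypothetical stand-in for an in-memory frame (no `npartitions`)."""

    def __init__(self, rows):
        self.rows = rows


class PartitionedFrame:
    """Hypothetical stand-in for a dask collection exposing `npartitions`."""

    def __init__(self, npartitions):
        self.npartitions = npartitions


def default_test_size(X, cap=0.2):
    # Assumption: an object without `npartitions` counts as one partition.
    npartitions = getattr(X, "npartitions", 1)
    return min(cap, 1 / npartitions)


X = InMemoryFrame([[1.0, 2.0], [3.0, 4.0]])

# The expression quoted above raises AttributeError on such an object.
try:
    min(0.2, 1 / X.npartitions)
    failed = False
except AttributeError:
    failed = True

in_memory_size = default_test_size(X)                       # single partition
partitioned_size = default_test_size(PartitionedFrame(10))  # ten partitions
```

A fallback like this would only get past the default `test_size` computation; the later steps that rely on Futures would still need to accept non-dask inputs.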
I would like to ask if you could consider adding the possibility to handle other `ArrayLike` objects as well, the same way that is already possible with `GridSearchCV` or `RandomizedSearchCV`.
Many thanks
Gio