Allowing HyperbandSearchCV to also work with non dask arrays #748

@gioxc88

I have a custom package that uses a custom data structure similar to a pandas DataFrame but that also carries additional metadata. Let's call it CustomFrame.

I wrapped all the estimators that share the sklearn API, such as StandardScaler, Lasso, XGBRegressor, etc., so that they can handle this CustomFrame.
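For illustration, a wrapper of this kind might look roughly like the sketch below. It is not the actual package code: CustomFrame, MeanRegressor and FrameWrapper are hypothetical stand-ins, and the only assumption is that the wrapped estimator exposes sklearn-style fit and predict methods.

```python
from dataclasses import dataclass, field


@dataclass
class CustomFrame:
    """Hypothetical stand-in: tabular values plus free-form metadata."""
    values: list
    metadata: dict = field(default_factory=dict)


class MeanRegressor:
    """Toy estimator with an sklearn-style fit/predict API."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


class FrameWrapper:
    """Hypothetical wrapper: unwrap CustomFrame inputs, re-attach metadata on output."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Hand the plain values to the underlying estimator.
        self.estimator.fit(X.values, y.values)
        return self

    def predict(self, X):
        preds = self.estimator.predict(X.values)
        # Carry the input metadata over to the prediction frame.
        return CustomFrame(preds, dict(X.metadata))


X = CustomFrame([[1.0], [2.0], [3.0]], {"source": "demo"})
y = CustomFrame([2.0, 4.0, 6.0])
model = FrameWrapper(MeanRegressor()).fit(X, y)
result = model.predict(X)
```

Because the wrapper itself keeps the fit/predict interface, a search object that simply forwards X and y never needs to know about CustomFrame.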

This works well with the dask implementations of GridSearchCV and RandomizedSearchCV, because they just pass X and y to the underlying estimators, which, as I said, can correctly handle my CustomFrame.

I wanted to do the same with HyperbandSearchCV, but I found this much more difficult, if not impossible without rewriting most of it, because its implementation is very different: it relies on Future objects and async/await.

To be more precise, at the moment it doesn't even work with pandas DataFrames. If you don't specify test_size, the first failure occurs in dask_ml.model_selection._incremental at line 569, test_size = min(0.2, 1 / X.npartitions), because a DataFrame does not have the attribute npartitions.
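One possible way around that specific line is sketched below. It is only a sketch, not dask-ml's code: default_test_size is a hypothetical helper, and treating any object without npartitions as a single partition is an assumption. The two Fake classes stand in for a dask collection and an in-memory frame so the example runs without dask or pandas.

```python
def default_test_size(X, max_fraction=0.2):
    """Hypothetical helper: pick a default test_size for any array-like."""
    # Dask collections expose `npartitions`; in-memory containers such as
    # pandas DataFrames or NumPy arrays do not, so treat them as one partition.
    npartitions = getattr(X, "npartitions", 1)
    return min(max_fraction, 1 / npartitions)


class FakeDaskArray:
    """Stand-in for a dask collection split into 10 partitions."""
    npartitions = 10


class FakeFrame:
    """Stand-in for an in-memory frame with no `npartitions` attribute."""


print(default_test_size(FakeDaskArray()))  # 0.1
print(default_test_size(FakeFrame()))      # 0.2
```

This only addresses the first failure point; later steps that rely on Future objects may need similar handling.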

Could you consider adding support for other ArrayLike objects as well, in the same way that is already possible with GridSearchCV and RandomizedSearchCV?

Many thanks
Gio
