Changelog

Version 2024.4.1

Fixed packaging (986)

Version 2023.3.24

Compatibility with Python 3.10
Dropped support for Python 3.7
Compatibility with scikit-learn 1.2.0 and newer

Version 2022.5.27

Compatibility with scikit-learn 1.1 and newer (910)

Version 2021.11.30

Fixed regression in meta inference for wrappers when the base estimator returned a scipy.sparse matrix (889)

Version 2021.11.16

Meta-estimators like wrappers.ParallelPostFit now work with cuDF and CuPy objects (862)
Fixed incompatibility with new Dask optimizations in wrappers.ParallelPostFit (878)

Version 2021.10.17

Added support for scikit-learn 1.0.0. scikit-learn 1.0.0 is now the minimum-supported version.

Version 1.9.0

LogisticRegression.predict_proba now correctly returns an (n, 2) array for binary classification (760)
Fixed multioutput behavior to be consistent with scikit-learn (820)
Added MAPE to regression metrics (822)
NumPy 1.20 compatibility (784)
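For reference, MAPE is the mean of |y_true - y_pred| / |y_true|, reported as a fraction rather than a percentage. The plain-Python sketch below only illustrates that formula. It is not dask-ml's implementation, which works on Dask arrays, and the epsilon guard against zero targets is an assumption modeled on scikit-learn's convention.

```python
import sys


def mean_absolute_percentage_error(y_true, y_pred):
    """Illustrative MAPE: mean of |y_true - y_pred| / max(|y_true|, eps)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Assumed guard: a tiny epsilon keeps a zero target from dividing by zero.
    eps = sys.float_info.epsilon
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += abs(t - p) / max(abs(t), eps)
    return total / len(y_true)


# Both predictions are off by 10%, so the result is 0.1 (not 10).
print(mean_absolute_percentage_error([100.0, 200.0], [110.0, 180.0]))
```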

Version 1.8.0

Compatibility with scikit-learn 0.24

Version 1.7.0

Improved documentation for working with PyTorch models, see pytorch (699)
Improved documentation for working with Keras / TensorFlow models, see keras (713)
Fixed handling of remote vocabularies in dask_ml.feature_extraction.text.HashingVectorizer (719)
Added dask_ml.metrics.regression.mean_squared_log_error (725)
Allow user-provided categories in dask_ml.preprocessing.OneHotEncoder (727)
Added dask_ml.linear_model.LogisticRegression.decision_function (728)
Added compute argument to dask_ml.decomposition.TruncatedSVD (743)
Fixed sign stability in incremental PCA (742)
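mean_squared_log_error compares log(1 + y) values, so it penalizes relative rather than absolute differences. The sketch below is a plain-Python illustration of the formula, not dask-ml's code. Rejecting negative inputs is assumed to match scikit-learn's behavior.

```python
import math


def mean_squared_log_error(y_true, y_pred):
    """Illustrative MSLE: mean of (log1p(y_true) - log1p(y_pred)) ** 2."""
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        # Assumed to mirror scikit-learn, which rejects negative values here.
        raise ValueError("mean_squared_log_error requires non-negative inputs")
    squared = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


# Under-predicting 99 as 9 costs the same as under-predicting 9999 as 999,
# because both differences in log1p space equal log(10).
print(mean_squared_log_error([99.0], [9.0]))
print(mean_squared_log_error([9999.0], [999.0]))
```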

Version 1.6.0

Improved documentation for RandomizedSearchCV
Improved logging in dask_ml.cluster.KMeans (688)
Added support for dask.dataframe objects in dask_ml.model_selection.HyperbandSearchCV (701)
Added squared=True option to dask_ml.metrics.mean_squared_error (707)
Added dask_ml.feature_extraction.text.CountVectorizer (705)
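Assuming the new option follows scikit-learn's semantics, squared=True (the default) returns the mean squared error and squared=False returns its square root, the root mean squared error. A plain-Python sketch of that switch, not dask-ml's implementation:

```python
import math


def mean_squared_error(y_true, y_pred, squared=True):
    """Illustrative MSE/RMSE switch, mirroring the squared= keyword."""
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(errors) / len(errors)
    # squared=False takes the square root, returning RMSE in the target's units.
    return mse if squared else math.sqrt(mse)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
print(mean_squared_error(y_true, y_pred))                 # 4.0 (MSE)
print(mean_squared_error(y_true, y_pred, squared=False))  # 2.0 (RMSE)
```

RMSE is expressed in the same units as the target, which often makes it easier to interpret than MSE.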

Version 1.5.0

Support for Python 3.8 (669)
Compatibility with Scikit-Learn 0.23.0 (669)
Scikit-Learn 0.23.0 or newer is now required (669)
Removed previously deprecated Partial classes. Use dask_ml.wrappers.Incremental instead (674)

Version 1.4.0

Added dask_ml.decomposition.IncrementalPCA for out-of-core / distributed incremental PCA (619)
Improved logging and monitoring in incremental model selection (528)
Added dask_ml.ensemble.BlockwiseVotingClassifier and dask_ml.ensemble.BlockwiseVotingRegressor for blockwise training and ensemble prediction (657)
Improved documentation for hyper-parameter-search (432)

Version 1.3.0

Added shuffle support to dask_ml.model_selection.train_test_split for DataFrame input (625)
Improved performance of dask_ml.model_selection.GridSearchCV by re-using cached tasks (622)
Added support for DataFrame input to dask_ml.model_selection.GridSearchCV (612)
Fixed dask_ml.linear_model.LinearRegression.score to use r2_score rather than mse (614)
Handle missing data in dask_ml.preprocessing.StandardScaler (608)
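LinearRegression.score now returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, which is what scikit-learn regressors return from score, instead of the mean squared error. Unlike MSE, higher is better and a perfect fit scores 1.0. The plain-Python sketch below illustrates the formula only; its handling of a constant y_true is a simplification.

```python
def r2_score(y_true, y_pred):
    """Illustrative R^2: 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        # Simplification: R^2 is undefined when y_true is constant.
        raise ValueError("r2_score is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r2_score(y_true, [1.0, 2.0, 3.0]))  # 1.0: perfect fit
print(r2_score(y_true, [2.0, 2.0, 2.0]))  # 0.0: no better than predicting the mean
```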

Version 1.2.0

Changed the name of the second positional argument in model_selection.IncrementalSearchCV from param_distribution to parameters to match the name of the base class.
Compatibility with scikit-learn 0.22.1.
Added dask_ml.preprocessing.BlockTransformer, an extension of scikit-learn's FunctionTransformer (366).
Added dask_ml.feature_extraction.FeatureHasher which is similar to scikit-learn's implementation.

Version 1.1.1

Fixed an issue with the 1.1.0 wheel (575)
Make svd_flip work even when arrays are read only (592)
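svd_flip resolves the sign ambiguity of the SVD: each singular-vector pair (column j of u, row j of v) can be negated without changing the reconstruction, so a convention picks one sign. The sketch below assumes scikit-learn's u-based convention (make the largest-magnitude entry in each column of u positive) and returns new lists instead of writing into its inputs, which is why it also works on read-only data. The tuple inputs stand in for read-only arrays; this is an illustration, not dask-ml's implementation.

```python
def svd_flip(u, v):
    """Illustrative sign flip for u (m x k rows) and v (k x n rows); returns new lists."""
    k = len(u[0])
    signs = []
    for j in range(k):
        column = [row[j] for row in u]
        # Entry of column j with the largest magnitude decides the sign.
        largest = max(column, key=abs)
        signs.append(-1.0 if largest < 0 else 1.0)
    # Build new containers rather than modifying u or v in place.
    new_u = [[row[j] * signs[j] for j in range(k)] for row in u]
    new_v = [[x * signs[j] for x in v[j]] for j in range(k)]
    return new_u, new_v


# Tuples are immutable, standing in for read-only arrays.
u = ((0.2, -0.9), (-0.5, 0.1))
v = ((1.0, 2.0), (3.0, 4.0))
new_u, new_v = svd_flip(u, v)
print(new_u)  # [[-0.2, 0.9], [0.5, -0.1]]
print(new_v)  # [[-1.0, -2.0], [-3.0, -4.0]]
```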

Version 1.1.0

Non-arrays (e.g. Dask Bags and DataFrames) are now allowed in dask_ml.wrappers.Incremental. This is useful for text classification pipelines (570)
The index is now preserved in dask_ml.preprocessing.PolynomialFeatures for DataFrame inputs (563)
dask_ml.decomposition.PCA now works with DataFrame inputs (543)
dask_ml.cluster.KMeans handles inputs where some blocks are length-0 (559)
Improved error reporting for mixed inputs to dask_ml.model_selection.train_test_split (552)
Removed deprecated dask_ml.joblib module. Use joblib.parallel_backend instead (545)
dask_ml.preprocessing.QuantileTransformer now handles DataFrame input (533)

Version 1.0.0

Added new meta-estimators for hyperparameter search on distributed datasets: dask_ml.model_selection.HyperbandSearchCV and dask_ml.model_selection.SuccessiveHalvingSearchCV
Dropped Python 2 support (500)

Version 0.13.0

Compatibility with scikit-learn 0.21.1
Cross-validation results in GridSearchCV and RandomizedSearchCV are now gathered as completed, in case a worker is lost (433)
Fixed bug in dask_ml.model_selection.train_test_split when only one of train / test size is provided (502)
Consistent random state for dask_ml.model_selection.IncrementalSearchCV
Fixed various issues with 32-bit Windows builds (487)

Note

dask-ml 0.13.0 will be the last release to support Python 2.

Version 0.12.0

API Breaking Changes

dask_ml.model_selection.IncrementalSearchCV now returns Dask objects for post-fit methods like .predict, etc (423).

Version 0.11.0

Note that this version of Dask-ML requires scikit-learn >= 0.20.0.

Enhancements

Added dask_ml.model_selection.IncrementalSearchCV, a meta-estimator for hyperparameter optimization on larger-than-memory datasets (356). See hyperparameter.incremental for more.
Added dask_ml.preprocessing.PolynomialTransformer, a drop-in replacement for the scikit-learn version (347).
Added auto-rechunking to Dask Arrays with more than one block along the features in dask_ml.wrappers.ParallelPostFit (376)
Added support for Dask DataFrame inputs to dask_ml.cluster.KMeans (390)
Added a compute keyword to dask_ml.wrappers.ParallelPostFit.score to support lazily evaluating a model's score (402)

Bug Fixes

Changed dask_ml.wrappers.ParallelPostFit to automatically rechunk input arrays to methods like predict when they have more than one block along the features (376).
Fixed a bug in dask_ml.impute.SimpleImputer where, for Dask DataFrames, the count of the most frequent item was filled in rather than the item itself (385).
Fixed a bug in dask_ml.model_selection.ShuffleSplit returning the same split when random_state was set (380).
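With strategy="most_frequent", the fill value should be the most common observed item, not the number of times it occurs. The plain-Python sketch below shows the intended result. Using None as the missing-value marker is an assumption for illustration (Dask DataFrames use NaN), fill_most_frequent is a hypothetical helper, and ties here resolve to the first value encountered, which may differ from dask-ml.

```python
from collections import Counter


def fill_most_frequent(values):
    """Replace None entries with the most common observed value (illustration)."""
    observed = [v for v in values if v is not None]
    item, _count = Counter(observed).most_common(1)[0]
    # Fill with `item`; the bug described above filled with the count instead.
    return [item if v is None else v for v in values]


print(fill_most_frequent(["a", "b", "a", None]))  # ['a', 'b', 'a', 'a']
```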

Version 0.10.0

Enhancements

Added support for dask.dataframe.DataFrame to dask_ml.model_selection.train_test_split (351)

Version 0.9.0

Enhancements

Added dask_ml.model_selection.ShuffleSplit (340)

Bug Fixes

Fixed handling of errors in the predict and score steps of dask_ml.model_selection.GridSearchCV and dask_ml.model_selection.RandomizedSearchCV (339)
Compatibility with Dask 0.18 for dask_ml.preprocessing.LabelEncoder (you'll also notice improved performance) (336).

Documentation Updates

Added a roadmap. Please open an issue if you'd like something to be included on the roadmap. (322)
Added many examples to the documentation and the dask examples binder.

Build Changes

We're now using Numba, a just-in-time compiler, for performance-sensitive parts of Dask-ML. Dask-ML is now a pure-Python project with no compiled extensions, so we can provide universal wheels.

Version 0.8.0

Enhancements

Automatically replace default scikit-learn scorers with dask-aware versions in Incremental (200)
Added the dask_ml.metrics.log_loss loss function and neg_log_loss scorer (318)
Fixed handling of array-like fit-parameters to GridSearchCV and BaseSearchCV (320)
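For binary labels, log loss is -mean(y * log(p) + (1 - y) * log(1 - p)), and the neg_log_loss scorer is its negation so that higher scores are better. The sketch below covers only the binary case in plain Python. The function name binary_log_loss is hypothetical and the clipping constant eps is an assumption, not taken from dask-ml.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Illustrative binary cross-entropy; p_pred is the probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def neg_log_loss(y_true, p_pred):
    # Scorers follow the higher-is-better convention, hence the sign flip.
    return -binary_log_loss(y_true, p_pred)


print(binary_log_loss([1, 0], [0.5, 0.5]))  # log(2), about 0.693
```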

Bug Fixes

Fixed dtype in LabelEncoder.fit_transform to be integer, rather than the dtype of the classes for dask arrays (311)
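LabelEncoder maps each class to its position among the sorted unique classes, so the encoded output should always be integers, even when the classes themselves are floats. A plain-Python sketch of that mapping; label_encode is a hypothetical helper, not dask-ml's API.

```python
def label_encode(values):
    """Return (classes, codes): sorted unique classes and integer codes."""
    classes = sorted(set(values))
    position = {c: i for i, c in enumerate(classes)}
    # Codes come from enumerate(), so they are ints regardless of the class dtype.
    codes = [position[v] for v in values]
    return classes, codes


classes, codes = label_encode([2.5, 0.5, 2.5, 1.5])
print(classes)  # [0.5, 1.5, 2.5]
print(codes)    # [2, 0, 2, 1]
```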

Version 0.7.0

Enhancements

Added sample_weight support for dask_ml.metrics.accuracy_score (217).
Improved performance of training on dask_ml.cluster.SpectralClustering (152)
Added dask_ml.preprocessing.LabelEncoder (226).
Fixed issue in model_selection meta-estimators not respecting the default Dask scheduler (260)
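With sample_weight, accuracy becomes the total weight of the correctly classified samples divided by the total weight of all samples. A plain-Python sketch of that definition, assumed to match scikit-learn's normalized accuracy_score rather than copied from dask-ml:

```python
def accuracy_score(y_true, y_pred, sample_weight=None):
    """Illustrative weighted accuracy: correct weight / total weight."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [0, 1, 1]
y_pred = [0, 1, 0]
print(accuracy_score(y_true, y_pred))                           # 2 of 3 correct
print(accuracy_score(y_true, y_pred, sample_weight=[1, 1, 2]))  # 0.5
```

Doubling the weight of the misclassified sample pulls the score from 2/3 down to 0.5.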

API Breaking Changes

Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it's no longer used (152)
Changed dask_ml.wrappers.Incremental.fit to clone the underlying estimator before training (258). This induces a few changes:
1. The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_.
2. State no longer leaks between successive fit calls. Note that Incremental.partial_fit is still available if you want state, like learned attributes or random seeds, to be re-used. This is useful if you're making multiple passes over the training data.
Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn:
```
# inc is a dask_ml.wrappers.Incremental; this sets alpha on the wrapped estimator
inc.set_params(estimator__alpha=10)
```
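Under this convention, a key such as estimator__alpha is split at the first double underscore: the prefix names the nested object and the remainder names its parameter. The sketch below shows only that splitting step in plain Python; split_nested_params is a hypothetical helper, not part of dask-ml or scikit-learn.

```python
def split_nested_params(params):
    """Separate a meta-estimator's own parameters from ones aimed at nested objects."""
    own, nested = {}, {}
    for key, value in params.items():
        prefix, sep, sub_key = key.partition("__")
        if sep:
            # "estimator__alpha" -> nested["estimator"]["alpha"]
            nested.setdefault(prefix, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


print(split_nested_params({"estimator__alpha": 10, "scoring": "accuracy"}))
# ({'scoring': 'accuracy'}, {'estimator': {'alpha': 10}})
```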

Reorganization

Dask-SearchCV is now being developed in the dask/dask-ml repository. Users who previously installed dask-searchcv should now just install dask-ml.

Bug Fixes

Fixed random seed generation on 32-bit platforms (230)

Version 0.6.0

API Breaking Changes

Removed the get keyword from the incremental learner fit methods (187).
Deprecated the various Partial* estimators in favor of the dask_ml.wrappers.Incremental meta-estimator (190)

Enhancements

Added a new meta-estimator dask_ml.wrappers.Incremental for wrapping any estimator with a partial_fit method. See incremental.blockwise-metaestimator for more. (190)
Added an R2-score metric dask_ml.metrics.r2_score.

Version 0.5.0

API Breaking Changes

The n_samples_seen_ attribute on dask_ml.preprocessing.StandardScaler is now consistently numpy.nan (157).
Changed the algorithm for dask_ml.datasets.make_blobs, dask_ml.datasets.make_regression and dask_ml.datasets.make_classification to reduce the single-machine peak memory usage (67)

Enhancements

Added dask_ml.model_selection.train_test_split and dask_ml.model_selection.ShuffleSplit (172)
Added dask_ml.metrics.classification_score, dask_ml.metrics.mean_absolute_error, and dask_ml.metrics.mean_squared_error.

Bug Fixes

dask_ml.preprocessing.StandardScaler now works on DataFrame inputs (157).

Version 0.4.1

This release added several new estimators.

Enhancements

Added `dask_ml.preprocessing.RobustScaler`

Scale features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (62).
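RobustScaler centers each feature on its median and scales it by the interquartile range (IQR), so a few extreme values barely affect the result. The plain-Python sketch below handles a single feature and assumes the 25th to 75th percentile range that scikit-learn uses by default; robust_scale is a hypothetical helper, not dask-ml's implementation.

```python
import statistics


def robust_scale(values):
    """Illustrative robust scaling of one feature: (x - median) / IQR."""
    median = statistics.median(values)
    # method="inclusive" interpolates linearly, like NumPy's default percentile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # assumption: leave a constant feature unscaled
    return [(x - median) / iqr for x in values]


# The outlier (100) does not shift the median or the IQR.
print(robust_scale([1, 2, 3, 4, 100]))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```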

Added `dask_ml.preprocessing.OrdinalEncoder`

Encodes categorical features as ordinal, in one ordered feature (119).

Added `dask_ml.wrappers.ParallelPostFit`

A meta-estimator for fitting with any scikit-learn estimator, but post-processing (predict, transform, etc.) in parallel on dask arrays. See parallel-meta-estimators for more (132).

Version 0.4.0

API Changes

Changed the arguments of the dask-glm based estimators in dask_glm.linear_model to match scikit-learn's API (94).
- To specify lamduh, use C = 1.0 / lamduh (the default of 1.0 is unchanged)
- The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.
This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.
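The migration described above can be sketched as a dictionary rewrite: the old regularization strength (lamduh) becomes C = 1.0 / lamduh, and the four solver-specific arguments move into solver_kwargs. migrate_glm_kwargs is a hypothetical helper written for illustration, not part of dask-ml.

```python
SOLVER_ONLY = {"rho", "over_relax", "abstol", "reltol"}


def migrate_glm_kwargs(old):
    """Rewrite old dask-glm style keyword arguments into the scikit-learn style."""
    new, solver_kwargs = {}, {}
    for key, value in old.items():
        if key == "lamduh":
            new["C"] = 1.0 / value  # regularization strength is now expressed as C
        elif key in SOLVER_ONLY:
            solver_kwargs[key] = value  # no longer accepted as top-level arguments
        else:
            new[key] = value
    if solver_kwargs:
        new["solver_kwargs"] = solver_kwargs
    return new


print(migrate_glm_kwargs({"lamduh": 0.5, "rho": 1.0, "max_iter": 100}))
# {'C': 2.0, 'max_iter': 100, 'solver_kwargs': {'rho': 1.0}}
```

The resulting dictionary is meant to be passed as keyword arguments to one of the affected estimators, for example dask_ml.linear_model.LogisticRegression(**kwargs); check the estimator's signature before relying on any particular key.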

Enhancements

Accept dask.dataframe for dask-glm based estimators (84).

Version 0.3.2

Enhancements

Added dask_ml.decomposition.TruncatedSVD and dask_ml.decomposition.PCA (78)

Version 0.3.0

Enhancements

Added KMeans.predict (83)

API Changes

Changed the fitted attributes on MinMaxScaler and StandardScaler to be concrete NumPy or pandas objects, rather than persisted dask objects (75).
