Changelog

Version 2024.4.1

  • Fixed packaging (986)

Version 2023.3.24

  • Compatibility with Python 3.10
  • Dropped support for Python 3.7
  • Compatibility with scikit-learn 1.2.0 and newer

Version 2022.5.27

  • Compatibility with scikit-learn 1.1 and newer (910)

Version 2021.11.30

  • Fixed regression in meta inference for wrappers when the base estimator returned a scipy.sparse matrix (889)

Version 2021.11.16

  • Meta-estimators like wrappers.ParallelPostFit now work with cuDF and CuPy objects. (862)
  • Fixed incompatibility with new Dask optimizations in wrappers.ParallelPostFit (878)

Version 2021.10.17

  • Added support for scikit-learn 1.0.0. scikit-learn 1.0.0 is now the minimum-supported version.

Version 1.9.0

  • LogisticRegression.predict_proba now correctly returns an (n, 2) array for binary classification (760)
  • Fixed multioutput behavior to be consistent with scikit-learn (820)
  • Added MAPE to regression metrics (822)
  • NumPy 1.20 compatibility (784)
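
To make the predict_proba fix above concrete, here is a minimal plain-Python sketch of what an (n, 2) result means for binary classification. It assumes scikit-learn's convention that the columns hold the probabilities of the negative and positive class, in that order; the input values are invented for illustration and no dask-ml code is involved.

```python
# Hypothetical positive-class probabilities for three samples.
positive_proba = [0.9, 0.2, 0.5]

# One row per sample, one column per class: [P(class 0), P(class 1)].
proba = [[1.0 - p, p] for p in positive_proba]

shape = (len(proba), len(proba[0]))
print(shape)
for row in proba:
    print(row)
```

Each row sums to one, so either column can be recovered from the other.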

Version 1.8.0

  • Compatibility with scikit-learn 0.24

Version 1.7.0

  • Improved documentation for working with PyTorch models, see pytorch (699)
  • Improved documentation for working with Keras / TensorFlow models, see keras (713)
  • Fixed handling of remote vocabularies in dask_ml.feature_extraction.text.HashingVectorizer (719)
  • Added dask_ml.metrics.regression.mean_squared_log_error (725)
  • Allow user-provided categories in dask_ml.preprocessing.OneHotEncoder (727)
  • Added dask_ml.linear_model.LogisticRegression.decision_function (728)
  • Added compute argument to dask_ml.decomposition.TruncatedSVD (743)
  • Fixed sign stability in incremental PCA (742)

Version 1.6.0

  • Improved documentation for RandomizedSearchCV
  • Improved logging in dask_ml.cluster.KMeans (688)
  • Added support for dask.dataframe objects in dask_ml.model_selection.HyperbandSearchCV (701)
  • Added squared=True option to dask_ml.metrics.mean_squared_error (707)
  • Added dask_ml.feature_extraction.text.CountVectorizer (705)
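
The squared option listed above can be illustrated with a plain-Python stand-in. This sketch assumes the keyword mirrors scikit-learn's keyword of the same name, where squared=True returns the mean squared error and squared=False returns its square root; the function below is a hypothetical re-implementation for illustration, not dask-ml's code.

```python
import math


def mean_squared_error(y_true, y_pred, squared=True):
    """Plain-Python stand-in used only to show what `squared` controls."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # squared=True keeps the MSE; squared=False takes the root (RMSE).
    return mse if squared else math.sqrt(mse)


y_true = [3.0, 5.0, 7.0]
y_pred = [1.0, 5.0, 9.0]

mse = mean_squared_error(y_true, y_pred)
rmse = mean_squared_error(y_true, y_pred, squared=False)
print(mse, rmse)
```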

Version 1.5.0

  • Support for Python 3.8 (669)
  • Compatibility with Scikit-Learn 0.23.0 (669)
  • Scikit-Learn 0.23.0 or newer is now required (669)
  • Removed previously deprecated Partial classes. Use dask_ml.wrappers.Incremental instead (674)

Version 1.4.0

  • Added dask_ml.decomposition.IncrementalPCA for out-of-core / distributed incremental PCA (619)
  • Improved logging and monitoring in incremental model selection (528)
  • Added dask_ml.ensemble.BlockwiseVotingClassifier and dask_ml.ensemble.BlockwiseVotingRegressor for blockwise training and ensemble prediction (657)
  • Improved documentation for hyper-parameter-search (432)

Version 1.3.0

  • Added shuffle support to dask_ml.model_selection.train_test_split for DataFrame input (625)
  • Improved performance of dask_ml.model_selection.GridSearchCV by re-using cached tasks (622)
  • Add support for DataFrame to dask_ml.model_selection.GridSearchCV (612)
  • Fixed dask_ml.linear_model.LinearRegression.score to use r2_score rather than mse (614)
  • Handle missing data in dask_ml.preprocessing.StandardScaler (608)
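
The LinearRegression.score change above matters because R² and mean squared error behave differently. The sketch below uses hypothetical plain-Python helpers and invented data, not dask-ml code: rescaling the targets leaves R² unchanged while the MSE grows with the square of the scale.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]

# The same data expressed in units ten times smaller.
y_true_scaled = [10.0 * v for v in y_true]
y_pred_scaled = [10.0 * v for v in y_pred]

r2 = r2_score(y_true, y_pred)
r2_scaled = r2_score(y_true_scaled, y_pred_scaled)
err = mse(y_true, y_pred)
err_scaled = mse(y_true_scaled, y_pred_scaled)
print(r2, r2_scaled, err, err_scaled)
```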

Version 1.2.0

  • Changed the name of the second positional argument in model_selection.IncrementalSearchCV from param_distribution to parameters to match the name of the base class.
  • Compatibility with scikit-learn 0.22.1.
  • Added dask_ml.preprocessing.BlockTransformer, an extension of scikit-learn's FunctionTransformer (366).
  • Added dask_ml.feature_extraction.FeatureHasher which is similar to scikit-learn's implementation.

Version 1.1.1

  • Fixed an issue with the 1.1.0 wheel (575)
  • Make svd_flip work even when arrays are read only (592)

Version 1.1.0

  • Non-arrays (e.g. Dask Bags and DataFrames) are now allowed in dask_ml.wrappers.Incremental. This is useful for text classification pipelines (570)
  • The index is now preserved in dask_ml.preprocessing.PolynomialFeatures for DataFrame inputs (563)
  • dask_ml.decomposition.PCA now works with DataFrame inputs (543)
  • dask_ml.cluster.KMeans handles inputs where some blocks are length-0 (559)
  • Improved error reporting for mixed inputs to dask_ml.model_selection.train_test_split (552)
  • Removed deprecated dask_ml.joblib module. Use joblib.parallel_backend instead (545)
  • dask_ml.preprocessing.QuantileTransformer now handles DataFrame input (533)

Version 1.0.0

  • Added new meta-estimators for hyperparameter search on distributed datasets: dask_ml.model_selection.HyperbandSearchCV and dask_ml.model_selection.SuccessiveHalvingSearchCV
  • Dropped Python 2 support (500)

Version 0.13.0

  • Compatibility with scikit-learn 0.21.1
  • Cross-validation results in GridSearchCV and RandomizedSearchCV are now gathered as completed, in case a worker is lost (433)
  • Fixed bug in dask_ml.model_selection.train_test_split when only one of train / test size is provided (502)
  • Consistent random state for dask_ml.model_selection.IncrementalSearchCV
  • Fixed various issues with 32-bit Windows builds (487)

Note

dask-ml 0.13.0 will be the last release to support Python 2.

Version 0.12.0

API Breaking Changes

  • dask_ml.model_selection.IncrementalSearchCV now returns Dask objects for post-fit methods like .predict, etc (423).

Version 0.11.0

Note that this version of Dask-ML requires scikit-learn >= 0.20.0.

Enhancements

  • Added dask_ml.model_selection.IncrementalSearchCV, a meta-estimator for hyperparameter optimization on larger-than-memory datasets (356). See hyperparameter.incremental for more.
  • Added dask_ml.preprocessing.PolynomialFeatures, a drop-in replacement for the scikit-learn version (347).
  • Added auto-rechunking to Dask Arrays with more than one block along the features in dask_ml.wrappers.ParallelPostFit (376)
  • Added support for Dask DataFrame inputs to dask_ml.cluster.KMeans (390)
  • Added a compute keyword to dask_ml.wrappers.ParallelPostFit.score to support lazily evaluating a model's score (402)

Bug Fixes

  • Changed dask_ml.wrappers.ParallelPostFit to automatically rechunk input arrays to methods like predict when they have more than one block along the features (376).
  • Bug in dask_ml.impute.SimpleImputer with Dask DataFrames filling the count of the most frequent item, rather than the item itself (385).
  • Bug in dask_ml.model_selection.ShuffleSplit returning the same split when the random_state was set (380).
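
The SimpleImputer fix above hinges on the difference between the most frequent item and how often it occurs. The following is a plain-Python sketch with invented data, not the dask-ml implementation: a most-frequent fill should insert the item, whereas the bug described above inserted its count.

```python
from collections import Counter

# A column with two missing entries (None).
values = ["a", "b", "b", None, "b", "a", None]

observed = [v for v in values if v is not None]
item, count = Counter(observed).most_common(1)[0]

# Intended behaviour: fill missing entries with the most frequent item.
filled = [item if v is None else v for v in values]

# The bug described above: filling with the count of that item instead.
buggy = [count if v is None else v for v in values]

print(item, count)
print(filled)
print(buggy)
```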

Version 0.10.0

Enhancements

  • Added support for dask.dataframe.DataFrame to dask_ml.model_selection.train_test_split (351)

Version 0.9.0

Enhancements

  • Added dask_ml.model_selection.ShuffleSplit (340)

Bug Fixes

  • Fixed handling of errors in the predict and score steps of dask_ml.model_selection.GridSearchCV and dask_ml.model_selection.RandomizedSearchCV (339)
  • Compatibility with Dask 0.18 for dask_ml.preprocessing.LabelEncoder, which also improves performance (336).

Documentation Updates

  • Added a roadmap. Please open an issue if you'd like something to be included on the roadmap. (322)
  • Added many examples to the documentation and the dask examples binder.

Build Changes

We're now using Numba for performance-sensitive parts of Dask-ML. Dask-ML is now a pure-Python project, so we can provide universal wheels.

Version 0.8.0

Enhancements

  • Automatically replace default scikit-learn scorers with dask-aware versions in Incremental (200)
  • Added the dask_ml.metrics.log_loss loss function and neg_log_loss scorer (318)
  • Fixed handling of array-like fit-parameters to GridSearchCV and BaseSearchCV (320)

Bug Fixes

  • Fixed dtype in LabelEncoder.fit_transform to be integer, rather than the dtype of the classes for dask arrays (311)

Version 0.7.0

Enhancements

  • Added sample_weight support for dask_ml.metrics.accuracy_score. (217)
  • Improved performance of training on dask_ml.cluster.SpectralClustering (152)
  • Added dask_ml.preprocessing.LabelEncoder. (226)
  • Fixed issue in model_selection meta-estimators not respecting the default Dask scheduler (260)

API Breaking Changes

  • Removed the basis_inds_ attribute from dask_ml.cluster.SpectralClustering as it's no longer used (152)
  • Changed dask_ml.wrappers.Incremental.fit to clone the underlying estimator before training (258). This induces a few changes:
    1. The underlying estimator no longer gives access to learned attributes like coef_. We recommend using Incremental.coef_.
    2. State no longer leaks between successive fit calls. Note that Incremental.partial_fit is still available if you want state, like learned attributes or random seeds, to be re-used. This is useful if you're making multiple passes over the training data.
  • Changed get_params and set_params for dask_ml.wrappers.Incremental to no longer magically get / set parameters for the underlying estimator (258). To specify parameters for the underlying estimator, use the double-underscore prefix convention established by scikit-learn:

    inc.set_params(estimator__alpha=10)
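
As a rough illustration of the double-underscore convention, the sketch below shows how a key such as estimator__alpha can be separated into the wrapped object's name and the parameter to set on it. The helper split_nested_params is hypothetical and the shuffle_blocks key is only an example of a top-level parameter; this is not the code used by dask-ml or scikit-learn.

```python
def split_nested_params(params):
    """Separate top-level keys from `name__sub_key` keys (hypothetical helper)."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"estimator__alpha": 10, "shuffle_blocks": False}
)
print(own)
print(nested)
```

Keys without a double underscore apply to the wrapper itself, while prefixed keys are forwarded to the named sub-estimator.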

Reorganization

Dask-SearchCV is now being developed in the dask/dask-ml repository. Users who previously installed dask-searchcv should now just install dask-ml.

Bug Fixes

  • Fixed random seed generation on 32-bit platforms (230)

Version 0.6.0

API Breaking Changes

  • Removed the get keyword from the incremental learner fit methods. (187)
  • Deprecated the various Partial* estimators in favor of the dask_ml.wrappers.Incremental meta-estimator (190)

Enhancements

  • Added a new meta-estimator dask_ml.wrappers.Incremental for wrapping any estimator with a partial_fit method. See incremental.blockwise-metaestimator for more. (190)
  • Added an R2-score metric dask_ml.metrics.r2_score.

Version 0.5.0

API Breaking Changes

  • The n_samples_seen_ attribute on dask_ml.preprocessing.StandardScaler is now consistently numpy.nan (157).
  • Changed the algorithm for dask_ml.datasets.make_blobs, dask_ml.datasets.make_regression and dask_ml.datasets.make_classification to reduce the single-machine peak memory usage (67)

Enhancements

  • Added dask_ml.model_selection.train_test_split and dask_ml.model_selection.ShuffleSplit (172)
  • Added dask_ml.metrics.classification_score, dask_ml.metrics.mean_absolute_error, and dask_ml.metrics.mean_squared_error.

Bug Fixes

  • dask_ml.preprocessing.StandardScaler now works on DataFrame inputs (157).

Version 0.4.1

This release added several new estimators.

Enhancements

Added dask_ml.preprocessing.RobustScaler

Scale features using statistics that are robust to outliers. This mirrors sklearn.preprocessing.RobustScaler (62).

Added dask_ml.preprocessing.OrdinalEncoder

Encodes categorical features as ordinal, in one ordered feature (119).

Added dask_ml.wrappers.ParallelPostFit

A meta-estimator for fitting with any scikit-learn estimator, but post-processing (predict, transform, etc.) in parallel on dask arrays. See parallel-meta-estimators for more (132).

Version 0.4.0

API Changes

  • Changed the arguments of the dask-glm based estimators in dask_glm.linear_model to match scikit-learn's API (94).

    • To specify lambduh, use C = 1.0 / lambduh (the default of 1.0 is unchanged)
    • The rho, over_relax, abstol and reltol arguments have been removed. Provide them in solver_kwargs instead.

    This affects the LinearRegression, LogisticRegression and PoissonRegression estimators.
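
The migration described above can be sketched as a small translation of keyword arguments. The helper lambduh_to_C and the example values are hypothetical; only the relationship C = 1.0 / lambduh and the move of solver options into solver_kwargs come from the notes above.

```python
def lambduh_to_C(lambduh):
    """Convert the old regularization strength to the new C (hypothetical helper)."""
    if lambduh <= 0:
        raise ValueError("lambduh must be positive")
    return 1.0 / lambduh


# Arguments as they might have been passed before this release (illustrative values).
old_kwargs = {"lambduh": 0.5, "rho": 1.0, "abstol": 1e-4}

# Equivalent arguments afterwards: C replaces lambduh, and the
# solver-specific options move into solver_kwargs.
new_kwargs = {
    "C": lambduh_to_C(old_kwargs["lambduh"]),
    "solver_kwargs": {k: v for k, v in old_kwargs.items() if k != "lambduh"},
}
print(new_kwargs)
```

Because the old default lambduh of 1.0 maps to C = 1.0, code that relied on the default needs no change.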

Enhancements

  • Accept dask.dataframe for dask-glm based estimators (84).

Version 0.3.2

Enhancements

  • Added dask_ml.preprocessing.TruncatedSVD and dask_ml.preprocessing.PCA (78)

Version 0.3.0

Enhancements

  • Added KMeans.predict (83)

API Changes

  • Changed the fitted attributes on MinMaxScaler and StandardScaler to be concrete NumPy or pandas objects, rather than persisted dask objects (75).