
Version 0.24.0

In Development

Put the changes in their relevant module.

Changed models

The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.

  • items
  • items

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)

Changelog

sklearn.calibration

  • Allow calibration.CalibratedClassifierCV to be used with a prefit pipeline.Pipeline whose input data X is not array-like, a sparse matrix or a dataframe at the start. 17546 by Lucy Liu <lucyleeow>.

sklearn.datasets

  • datasets.fetch_openml now allows argument as_frame to be 'auto', which tries to convert returned data to pandas DataFrame unless data is sparse. 17396 by Jiaxiang <fujiaxiang>.
  • datasets.fetch_openml now validates the MD5 checksum of ARFF files, whether downloaded or cached, to ensure data integrity. 14800 by Shashank Singh <shashanksingh28> and Joel Nothman.
  • datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object's data and frame members are pandas DataFrames, and the target member is a pandas Series. 17491 by Alex Liang <tianchuliang>.

sklearn.decomposition

  • Fixed a bug in decomposition.MiniBatchDictionaryLearning.partial_fit which should update the dictionary by iterating only once over a mini-batch. 17433 by Chiara Marmo <cmarmo>.
  • Fix decomposition.SparseCoder such that it follows the scikit-learn API and supports cloning. The attribute components_ is deprecated in 0.24 and will be removed in 0.26. This attribute was redundant with the dictionary attribute and constructor parameter. 17679 by Xavier Dupré <sdpython>.
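
As an illustration of the SparseCoder entry above, the following minimal sketch clones a coder built from a fixed dictionary and encodes a few samples. The random dictionary, the data and the choice of transform_algorithm are assumptions made for the example, not part of the changelog.

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import SparseCoder

rng = np.random.RandomState(0)

# Hypothetical fixed dictionary: 5 atoms over 8 features, rows normalized.
dictionary = rng.randn(5, 8)
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

coder = SparseCoder(
    dictionary=dictionary,
    transform_algorithm="lasso_lars",
    transform_alpha=0.1,
)

# Cloning relies on the constructor storing its parameters unchanged.
cloned = clone(coder)

X = rng.randn(3, 8)
# fit does not learn anything here; the dictionary is given up front.
codes = cloned.fit(X).transform(X)
print(codes.shape)
```

The dictionary is read back from the dictionary parameter rather than from the deprecated components_ attribute.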

sklearn.ensemble

  • ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier now support staged_predict, which allows monitoring of each stage. 16985 by Hao Chun Chang <haochunchang>.
  • Fixed a bug in ensemble.MultinomialDeviance where the average of logloss was incorrectly calculated as the sum of logloss. 17694 by Markus Rempfler <rempfler> and Tsutomu Kusanagi <t-kusanagi2>.
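
A minimal sketch of stage-by-stage monitoring with staged_predict, assuming a small synthetic regression problem. The guarded import reflects the assumption that the estimator is still experimental in this release and may not need (or have) that import in other releases.

```python
import numpy as np

try:
    # Assumed to be required while the estimator is experimental.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.randn(200)

model = HistGradientBoostingRegressor(max_iter=20, random_state=0).fit(X, y)

# One prediction array per boosting iteration.
staged = list(model.staged_predict(X))
staged_mse = [mean_squared_error(y, pred) for pred in staged]
print(len(staged), staged_mse[0], staged_mse[-1])
```

The last stage should match model.predict(X), so the per-stage errors can be used to pick a smaller max_iter.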

sklearn.feature_selection

  • A new parameter importance_getter was added to feature_selection.RFE, feature_selection.RFECV and feature_selection.SelectFromModel, allowing the user to specify an attribute name/path or a callable for extracting feature importance from the estimator. 15361 by Venkatachalam N <venkyyuvy>.
  • Added the option for n_features_to_select to be given as a float representing the percentage of features to select. 17090 by Lisa Schwetlick <lschwetlick> and Marija Vlajic Wheeler <marijavlajic>.
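
The two entries above can be combined in a short sketch. It assumes the float form of n_features_to_select applies to feature_selection.RFE and is expressed as a fraction between 0 and 1; the dataset and estimator are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

selector = RFE(
    LogisticRegression(max_iter=1000),
    n_features_to_select=0.5,   # keep half of the 10 features
    importance_getter="coef_",  # attribute name read from the fitted estimator
)
selector.fit(X, y)
print(selector.n_features_, selector.support_)
```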

sklearn.impute

  • The default values of the min_value and max_value parameters of impute.IterativeImputer are now -np.inf and np.inf, respectively, instead of None. The behaviour of the class does not change, since None already defaulted to these values. 16493 by Darshan N <DarshanGowda0>.
  • impute.SimpleImputer now supports a list of strings when strategy='most_frequent' or strategy='constant'. 17526 by Ayako YAGI <yagi-3> and Juan Carlos Alfaro Jiménez <alfaro96>.
  • impute.SimpleImputer now supports inverse_transform to revert imputed data to the original data when instantiated with add_indicator=True. 17612 by Srimukh Sripada <d3b0unce>.
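
A small round-trip sketch of SimpleImputer.inverse_transform. The toy array is an assumption for illustration; the missing-value indicator columns appended by add_indicator=True are what allow the original NaN positions to be restored.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])

imputer = SimpleImputer(strategy="mean", add_indicator=True)
# Output holds the imputed features followed by the indicator columns.
X_imputed = imputer.fit_transform(X)

# The indicator columns tell inverse_transform where to put NaN back.
X_restored = imputer.inverse_transform(X_imputed)
print(X_imputed.shape, X_restored.shape)
```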

sklearn.inspection

  • inspection.partial_dependence and inspection.plot_partial_dependence now support calculating and plotting Individual Conditional Expectation (ICE) curves controlled by the kind parameter. 16619 by Madhura Jayratne <madhuracj>.
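
A minimal sketch of requesting ICE curves from inspection.partial_dependence, assuming a linear model on synthetic data. With kind="both", the result is assumed to expose both the averaged curve and the per-sample curves.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
model = LinearRegression().fit(X, y)

result = partial_dependence(
    model, X, features=[0], kind="both", grid_resolution=10
)
ice_curves = result["individual"]  # one curve per sample
average = result["average"]        # classic partial dependence
print(ice_curves.shape, average.shape)
```

Averaging the ICE curves over the sample axis should recover the partial dependence curve.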

sklearn.isotonic

  • Expose fitted attributes X_thresholds_ and y_thresholds_ that hold the de-duplicated interpolation thresholds of an isotonic.IsotonicRegression instance for model inspection purpose. 16289 by Masashi Kishimoto <kishimoto-banana> and Olivier Grisel <ogrisel>.
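
The exposed thresholds can be inspected as in the sketch below, which uses made-up data. It assumes the default linear interpolation between thresholds, so interpolating them by hand should reproduce predict.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 3.0, 2.0, 4.0, 4.0, 6.0])

iso = IsotonicRegression().fit(X, y)

# De-duplicated knots of the fitted monotonic function.
print(iso.X_thresholds_)
print(iso.y_thresholds_)

# Manual interpolation over the thresholds, for comparison with predict.
manual = np.interp(X, iso.X_thresholds_, iso.y_thresholds_)
```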

sklearn.metrics

  • Added metrics.mean_absolute_percentage_error metric and the associated scorer for regression problems. 10708 fixed with the PR 15007 by Ashutosh Hathidara <ashutosh1919>. The scorer and some practical test cases were taken from PR 10711 by Mohamed Ali Jamaoui <mohamed-ali>.
  • Fixed a bug in metrics.mean_squared_error where the average of multiple RMSE values was incorrectly calculated as the root of the average of multiple MSE values. 17309 by Swier Heeres <swierh>.
  • Add sample_weight parameter to metrics.median_absolute_error. 17225 by Lucy Liu <lucyleeow>.
  • Add pos_label parameter in metrics.plot_precision_recall_curve in order to specify the positive class to be used when computing the precision and recall statistics. 17569 by Guillaume Lemaitre <glemaitre>.
  • metrics.plot_confusion_matrix now allows the colorbar in the matplotlib plot to be disabled by setting colorbar=False. 17192 by Avi Gupta <avigupta2612>.
  • Add pos_label parameter in metrics.plot_roc_curve in order to specify the positive class to be used when computing the roc auc statistics. 17651 by Clara Matos <claramatos>.
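
A short sketch of two of the metric additions above, on made-up values. Note that mean_absolute_percentage_error is assumed to return a fraction rather than a value scaled to 100.

```python
from sklearn.metrics import (
    mean_absolute_percentage_error,
    median_absolute_error,
)

y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

# Relative errors are 0.1, 0.1 and 0.0.
mape = mean_absolute_percentage_error(y_true, y_pred)

# Absolute errors are 10, 20 and 0; weighting the second sample
# heavily pulls the weighted median towards its error.
unweighted = median_absolute_error(y_true, y_pred)
weighted = median_absolute_error(y_true, y_pred, sample_weight=[1, 5, 1])
print(mape, unweighted, weighted)
```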

sklearn.model_selection

  • model_selection.TimeSeriesSplit has two new keyword arguments test_size and gap. test_size allows the out-of-sample time series length to be fixed for all folds. gap removes a fixed number of samples between the train and test set on each fold. 13204 by Kyle Kosic <kykosic>.
  • model_selection.RandomizedSearchCV and model_selection.GridSearchCV now have the method score_samples. 17478 by Teon Brooks <teonbrooks> and Mohamed Maskani <maskani-moh>.
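
The effect of test_size and gap in model_selection.TimeSeriesSplit can be seen in a minimal sketch; the 10-sample series and the parameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)

# Every test fold has 2 samples, and 1 sample is dropped between
# the end of the training set and the start of the test set.
splitter = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
folds = [(train.tolist(), test.tolist()) for train, test in splitter.split(X)]

for train, test in folds:
    print("train:", train, "test:", test)
```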

sklearn.multiclass

  • A fix to allow multiclass.OutputCodeClassifier to accept sparse input data in its fit and predict methods. The check for validity of the input is now delegated to the base estimator. 17233 by Zolisa Bleki <zoj613>.
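
A minimal sketch of fitting multiclass.OutputCodeClassifier on sparse input, assuming a base estimator (here LogisticRegression) that itself accepts sparse matrices, since input validation is delegated to it.

```python
from scipy.sparse import csr_matrix
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)
X_sparse = csr_matrix(X)

clf = OutputCodeClassifier(LogisticRegression(max_iter=1000), random_state=0)
clf.fit(X_sparse, y)

# Prediction also accepts the sparse representation.
predictions = clf.predict(X_sparse)
print(predictions.shape)
```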

sklearn.naive_bayes

  • The attributes coef_ and intercept_ are now deprecated in naive_bayes.MultinomialNB, naive_bayes.ComplementNB, naive_bayes.BernoulliNB and naive_bayes.CategoricalNB, and will be removed in v0.26. 17427 by Juan Carlos Alfaro Jiménez <alfaro96>.

sklearn.neighbors

  • Speed up the seuclidean, wminkowski, mahalanobis and haversine metrics in neighbors.DistanceMetric by validating data outside of loops and by avoiding unexpected GIL acquisition in Cython when setting n_jobs>1 in neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor and metrics.pairwise_distances. 17038 by Wenbo Zhao <webber26232>.
  • neighbors.NeighborsBase benefits from an improved algorithm = 'auto' heuristic. In addition to the previous set of rules, brute is now selected when the number of features exceeds 15, on the assumption that the intrinsic dimensionality of the data is too high for tree-based methods. 17148 by Geoffrey Bolmier <gbolmier>.

sklearn.neural_network

  • Neural net training and prediction are now a little faster. 17603, 17604, 17606, 17608, 17609, 17633, 17661 by Alex Henrie <alexhenrie>.
  • Avoid converting float32 input to float64 in neural_network.BernoulliRBM. 16352 by Arthur Imbert <Henley13>.
  • Support 32-bit computations in neural_network.MLPClassifier and neural_network.MLPRegressor. 17759 by Srimukh Sripada <d3b0unce>.
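
A sketch of what 32-bit support in the MLP estimators implies in practice, on synthetic data. It assumes that fitting on float32 input keeps the learned weights in float32 instead of upcasting them to float64.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4).astype(np.float32)
y = X.sum(axis=1).astype(np.float32)

# A short run is enough to inspect dtypes; it may emit a
# ConvergenceWarning, which is harmless here.
mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=50, random_state=0)
mlp.fit(X, y)

weight_dtypes = [coef.dtype for coef in mlp.coefs_]
print(weight_dtypes)
```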

sklearn.preprocessing

  • Verbose output of model_selection.GridSearchCV has been improved for readability. 16935 by Raghav Rajagopalan <raghavrv> and Chiara Marmo <cmarmo>.
  • Add unit_variance to preprocessing.RobustScaler, which scales output data such that normally distributed features have a variance of 1. 17193 by Lucy Liu <lucyleeow> and Mabel Villalba <mabelvj>.
  • Add dtype parameter to preprocessing.KBinsDiscretizer. 16335 by Arthur Imbert <Henley13>.
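
The effect of unit_variance in preprocessing.RobustScaler can be checked with a minimal sketch on a synthetic, normally distributed feature; the sample size, mean and scale are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.RandomState(0)
X = rng.normal(loc=10.0, scale=5.0, size=(10000, 1))

# Default: divide by the interquartile range, so a normal feature ends
# up with a standard deviation of roughly 0.74.
plain = RobustScaler().fit_transform(X)

# unit_variance=True rescales so that a normal feature has variance
# close to 1.
unit = RobustScaler(unit_variance=True).fit_transform(X)
print(plain.std(), unit.std())
```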

sklearn.svm

  • Invoke the SciPy BLAS API for the SVM kernel function in fit, predict and related methods of svm.SVC, svm.NuSVC, svm.SVR, svm.NuSVR and svm.OneClassSVM. 16530 by Shuhua Fan <jim0421>.

sklearn.tree

  • tree.plot_tree now uses colors from the matplotlib configuration settings. 17187 by Andreas Müller.
  • The parameter X_idx_sorted is now deprecated in tree.DecisionTreeClassifier.fit and tree.DecisionTreeRegressor.fit, and has no effect. 17614 by Juan Carlos Alfaro Jiménez <alfaro96>.

Code and Documentation Contributors

Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.20, including: