sklearn
In Development
The following estimators and functions, when fit with the same data and parameters, may produce different models from the previous version. This often occurs due to changes in the modelling logic (bug fixes or enhancements), or in random sampling procedures.
- items
- items
Details are listed in the changelog below.
(While we are trying to better inform users by providing this information, we cannot assure that this list is complete.)
- Allow calibration.CalibratedClassifierCV to be used with a prefit pipeline.Pipeline when the input X at the start of the pipeline is not an array-like, sparse matrix or dataframe. #17546 by Lucy Liu <lucyleeow>.
- datasets.fetch_openml now allows the argument as_frame to be 'auto', which tries to convert the returned data to a pandas DataFrame unless the data is sparse. #17396 by Jiaxiang <fujiaxiang>.
- datasets.fetch_openml now validates the MD5 checksum of downloaded or cached ARFF files to ensure data integrity. #14800 by Shashank Singh <shashanksingh28> and Joel Nothman.
- datasets.fetch_covtype now supports the optional argument as_frame; when it is set to True, the returned Bunch object's data and frame members are pandas DataFrames, and the target member is a pandas Series. #17491 by Alex Liang <tianchuliang>.
- Fixed a bug in decomposition.MiniBatchDictionaryLearning.partial_fit, which should update the dictionary by iterating only once over a mini-batch. #17433 by Chiara Marmo <cmarmo>.
- Fixed decomposition.SparseCoder so that it follows the scikit-learn API and supports cloning. The attribute components_ is deprecated in 0.24 and will be removed in 0.26; it was redundant with the dictionary attribute and constructor parameter. #17679 by Xavier Dupré <sdpython>.
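The cloning fix can be illustrated with a small sketch. It assumes an orthonormal identity dictionary and OMP with a single non-zero coefficient, both chosen only for illustration; `sklearn.base.clone` succeeds because SparseCoder now stores its constructor parameters unchanged.

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import SparseCoder

# Illustrative orthonormal dictionary: 3 atoms (rows) over 3 features.
dictionary = np.eye(3)
X = np.array([[1.0, 2.0, 3.0]])

coder = SparseCoder(
    dictionary=dictionary,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=1,  # keep only the best-matching atom
)

# clone() rebuilds the estimator from get_params(); this works because
# the dictionary is stored as given in __init__.
coder_copy = clone(coder)

# fit() is a no-op for SparseCoder; transform() encodes X on the dictionary.
code = coder_copy.fit(X).transform(X)
print(code)
```

With an identity dictionary, OMP picks the atom most correlated with the sample, so only the last coefficient is non-zero.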
- ensemble.HistGradientBoostingRegressor and ensemble.HistGradientBoostingClassifier now support staged_predict, which allows monitoring of each stage. #16985 by Hao Chun Chang <haochunchang>.
- Fixed a bug in ensemble.MultinomialDeviance where the average of the log loss was incorrectly calculated as the sum of the log loss. #17694 by Markus Rempfler <rempfler> and Tsutomu Kusanagi <t-kusanagi2>.
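A minimal sketch of monitoring training error stage by stage with staged_predict. The synthetic data and hyperparameters are illustrative assumptions; the guarded experimental import is included because these estimators still required it in 0.24, while later releases no longer do.

```python
import numpy as np
from sklearn.datasets import make_regression

# In 0.24 the HistGradientBoosting estimators were experimental and needed
# this enabling import; in later releases it is unnecessary.
try:
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Early stopping is disabled so the model runs exactly max_iter iterations.
model = HistGradientBoostingRegressor(max_iter=20, early_stopping=False, random_state=0)
model.fit(X, y)

# staged_predict yields one prediction array per boosting iteration.
stage_mse = [np.mean((y - y_stage) ** 2) for y_stage in model.staged_predict(X)]

print(len(stage_mse), model.n_iter_)
print(stage_mse[0], stage_mse[-1])
```

The last stage corresponds to the full model, so its predictions match predict().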
- A new parameter, importance_getter, was added to feature_selection.RFE, feature_selection.RFECV and feature_selection.SelectFromModel, allowing the user to specify an attribute name/path or a callable for extracting feature importance from the estimator. #15361 by Venkatachalam N <venkyyuvy>.
- Added the option to give n_features_to_select in feature_selection.RFE as a float representing the fraction of features to select. #17090 by Lisa Schwetlick <lschwetlick> and Marija Vlajic Wheeler <marijavlajic>.
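Both feature_selection additions can be combined in one sketch: RFE wraps a Pipeline, which has no coef_ of its own, so importance_getter points at the coefficients of its final step, and n_features_to_select is given as a fraction. The step names "scale" and "reg" and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=0)

# The pipeline's last step ("reg", an illustrative name) holds the coefficients.
pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])

selector = RFE(
    pipe,
    n_features_to_select=0.5,  # float: fraction of the 10 input features
    importance_getter="named_steps.reg.coef_",  # attribute path into the pipeline
)
selector.fit(X, y)

print(selector.n_features_)  # number of features kept
print(selector.support_)     # boolean mask of the selected features
```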
- The default values of the min_value and max_value parameters of impute.IterativeImputer are now -np.inf and np.inf, respectively, instead of None. The behaviour of the class does not change, since None already defaulted to these values. #16493 by Darshan N <DarshanGowda0>.
- impute.SimpleImputer now supports a list of strings when strategy='most_frequent' or strategy='constant'. #17526 by Ayako YAGI <yagi-3> and Juan Carlos Alfaro Jiménez <alfaro96>.
- impute.SimpleImputer now supports inverse_transform, which reverts imputed data to the original when the imputer is instantiated with add_indicator=True. #17612 by Srimukh Sripada <d3b0unce>.
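The two SimpleImputer changes can be sketched as follows. The marker string "missing" and the toy arrays are illustrative assumptions: the first part passes a plain Python list of strings, the second reverts imputation using the indicator columns added by add_indicator=True.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# 1) A plain list of strings; "missing" is an illustrative marker value.
X_str = [["a", "x"], ["missing", "x"], ["a", "missing"]]
str_imputer = SimpleImputer(strategy="most_frequent", missing_values="missing")
X_str_filled = str_imputer.fit_transform(X_str)
print(X_str_filled.tolist())

# 2) With add_indicator=True, extra columns record which entries were missing,
#    so inverse_transform can put the NaNs back.
X_num = np.array([[1.0, np.nan], [np.nan, 4.0], [7.0, 6.0]])
num_imputer = SimpleImputer(strategy="mean", add_indicator=True)
X_num_t = num_imputer.fit_transform(X_num)  # 2 imputed + 2 indicator columns
X_num_back = num_imputer.inverse_transform(X_num_t)
print(X_num_back)
```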
- inspection.partial_dependence and inspection.plot_partial_dependence now support calculating and plotting Individual Conditional Expectation (ICE) curves, controlled by the kind parameter. #16619 by Madhura Jayratne <madhuracj>.
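A minimal sketch of computing ICE curves with kind="individual", assuming a linear model on synthetic data purely for illustration. Only the "individual" entry of the returned Bunch is used here.

```python
from sklearn.datasets import make_regression
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=50, n_features=3, random_state=0)
model = LinearRegression().fit(X, y)

# kind="individual" returns one curve per sample (ICE) instead of their average.
result = partial_dependence(model, X, features=[0], kind="individual", grid_resolution=20)

ice = result["individual"]
print(ice.shape)  # (n_outputs, n_samples, n_grid_points)
```

For a linear model every ICE curve has the same slope, so the curves are parallel; for non-linear models, diverging curves reveal interactions that the averaged partial dependence hides.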
- Expose the fitted attributes X_thresholds_ and y_thresholds_, which hold the de-duplicated interpolation thresholds of an isotonic.IsotonicRegression instance, for model inspection purposes. #16289 by Masashi Kishimoto <kishimoto-banana> and Olivier Grisel <ogrisel>.
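A short sketch of what the new attributes describe: inside the training range, linear interpolation between X_thresholds_ and y_thresholds_ reproduces predict(). The toy data are illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 3.0, 2.0, 2.0, 4.0, 5.0])

ir = IsotonicRegression().fit(X, y)

# The fitted function is piecewise linear between these (de-duplicated) points.
print(ir.X_thresholds_)
print(ir.y_thresholds_)

# Within the training range, interpolating the thresholds matches predict().
X_new = np.array([1.5, 3.5, 5.5])
print(np.allclose(np.interp(X_new, ir.X_thresholds_, ir.y_thresholds_), ir.predict(X_new)))
```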
- Added the metrics.mean_absolute_percentage_error metric and the associated scorer for regression problems. #10708 fixed with PR #15007 by Ashutosh Hathidara <ashutosh1919>. The scorer and some practical test cases were taken from PR #10711 by Mohamed Ali Jamaoui <mohamed-ali>.
- Fixed a bug in metrics.mean_squared_error where the average of multiple RMSE values was incorrectly calculated as the root of the average of multiple MSE values. #17309 by Swier Heeres <swierh>.
- Add a sample_weight parameter to metrics.median_absolute_error. #17225 by Lucy Liu <lucyleeow>.
- Add a pos_label parameter to metrics.plot_precision_recall_curve to specify the positive class used when computing the precision and recall statistics. #17569 by Guillaume Lemaitre <glemaitre>.
- metrics.plot_confusion_matrix now supports making the colorbar optional in the matplotlib plot by setting colorbar=False. #17192 by Avi Gupta <avigupta2612>.
- Add a pos_label parameter to metrics.plot_roc_curve to specify the positive class used when computing the ROC AUC statistics. #17651 by Clara Matos <claramatos>.
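The metric changes can be checked on small hand-computable inputs. This sketch uses only multioutput="raw_values" to show the RMSE averaging fix by hand, rather than any RMSE-specific option, and all arrays are illustrative.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_percentage_error,
    mean_squared_error,
    median_absolute_error,
)

# MAPE is returned as a fraction of |y_true|: 0.1 means 10 %.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])
mape = mean_absolute_percentage_error(y_true, y_pred)  # (0.1 + 0.1 + 0) / 3

# Multi-output RMSE: averaging per-output RMSEs differs from the root of
# the averaged MSEs.
Y_true = np.zeros((2, 2))
Y_pred = np.array([[1.0, 3.0], [1.0, 3.0]])
mse_per_output = mean_squared_error(Y_true, Y_pred, multioutput="raw_values")  # [1, 9]
mean_of_rmse = np.mean(np.sqrt(mse_per_output))      # correct average: 2.0
root_of_mean_mse = np.sqrt(np.mean(mse_per_output))  # old result: sqrt(5)

# Weighted median absolute error: the heavily weighted third error dominates.
y_t = np.array([1.0, 2.0, 3.0])
y_p = np.array([2.0, 4.0, 6.0])  # absolute errors 1, 2, 3
weights = np.array([1.0, 1.0, 10.0])
medae = median_absolute_error(y_t, y_p)
medae_w = median_absolute_error(y_t, y_p, sample_weight=weights)

print(mape, mean_of_rmse, root_of_mean_mse, medae, medae_w)
```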
- model_selection.TimeSeriesSplit has two new keyword arguments, test_size and gap. test_size allows the out-of-sample time series length to be fixed for all folds; gap removes a fixed number of samples between the train and test set on each fold. #13204 by Kyle Kosic <kykosic>.
- model_selection.RandomizedSearchCV and model_selection.GridSearchCV now have a score_samples method. #17478 by Teon Brooks <teonbrooks> and Mohamed Maskani <maskani-moh>.
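A sketch of both model_selection additions. The 10-sample series, the KernelDensity estimator and the bandwidth grid are illustrative assumptions; KernelDensity is used because it implements score_samples, which the fitted search forwards to its refitted best estimator.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KernelDensity

# TimeSeriesSplit: test folds of fixed length 2, with 1 sample dropped
# between the end of each train set and the start of its test set.
X_ts = np.arange(10).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=1)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(X_ts)]
for train, test in folds:
    print("train:", train, "test:", test)

# score_samples on a fitted search delegates to best_estimator_.
rng = np.random.RandomState(0)
X_kde = rng.normal(size=(100, 1))
search = GridSearchCV(KernelDensity(), {"bandwidth": [0.1, 0.5, 1.0]}, cv=3)
search.fit(X_kde)
log_density = search.score_samples(X_kde[:5])  # per-sample log-density
print(search.best_params_, log_density.shape)
```

Here the three test folds are [4, 5], [6, 7] and [8, 9], and sample 3, 5 or 7 respectively is skipped as the gap.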
- multiclass.OutputCodeClassifier now accepts sparse input data in its fit and predict methods. The check for validity of the input is now delegated to the base estimator. #17233 by Zolisa Bleki <zoj613>.
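A minimal sketch of passing a SciPy sparse matrix to OutputCodeClassifier. It assumes LogisticRegression as the base estimator, since it accepts sparse input itself; the data and code_size are illustrative.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = make_classification(
    n_samples=150, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_sparse = sparse.csr_matrix(X)

# Sparse input goes to fit and predict; the base estimator handles it.
ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000), code_size=2, random_state=0)
ecoc.fit(X_sparse, y)
pred = ecoc.predict(X_sparse)
print(pred[:10])
```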
- The attributes coef_ and intercept_ are now deprecated in naive_bayes.MultinomialNB, naive_bayes.ComplementNB, naive_bayes.BernoulliNB and naive_bayes.CategoricalNB, and will be removed in v0.26. #17427 by Juan Carlos Alfaro Jiménez <alfaro96>.
- Speed up the seuclidean, wminkowski, mahalanobis and haversine metrics in neighbors.DistanceMetric by avoiding unexpected GIL acquisition in Cython when setting n_jobs>1 in neighbors.KNeighborsClassifier, neighbors.KNeighborsRegressor, neighbors.RadiusNeighborsClassifier, neighbors.RadiusNeighborsRegressor and metrics.pairwise_distances, and by moving data validation out of loops. #17038 by Wenbo Zhao <webber26232>.
- neighbors.NeighborsBase benefits from an improved algorithm='auto' heuristic. In addition to the previous set of rules, brute is now selected when the number of features exceeds 15, on the assumption that the intrinsic dimensionality of the data is too high for tree-based methods. #17148 by Geoffrey Bolmier <gbolmier>.
- Neural net training and prediction are now a little faster. #17603, #17604, #17606, #17608, #17609, #17633, #17661 by Alex Henrie <alexhenrie>.
- Avoid converting float32 input to float64 in neural_network.BernoulliRBM. #16352 by Arthur Imbert <Henley13>.
- Support 32-bit computations in neural_network.MLPClassifier and neural_network.MLPRegressor. #17759 by Srimukh Sripada <d3b0unce>.
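A sketch of the 32-bit support in the MLP estimators: when fitted on float32 data, the learned weight matrices in coefs_ are expected to stay float32 rather than being upcast. The network size and data are illustrative, and a ConvergenceWarning may appear with these settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X32 = X.astype(np.float32)

# Fitting on float32 input keeps the parameters in float32.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
clf.fit(X32, y)

print([w.dtype for w in clf.coefs_])
```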
- The verbose output of model_selection.GridSearchCV has been improved for readability. #16935 by Raghav Rajagopalan <raghavrv> and Chiara Marmo <cmarmo>.
- Add unit_variance to preprocessing.RobustScaler, which scales output data such that normally distributed features have a variance of 1. #17193 by Lucy Liu <lucyleeow> and Mabel Villalba <mabelvj>.
- Add a dtype parameter to preprocessing.KBinsDiscretizer. #16335 by Arthur Imbert <Henley13>.
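A sketch of the two preprocessing additions on a synthetic normal feature (location 3, scale 5, chosen only for illustration). With unit_variance=True, RobustScaler rescales the interquartile range so that a normally distributed feature ends up with a standard deviation close to 1; dtype fixes the output type of KBinsDiscretizer.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, RobustScaler

rng = np.random.RandomState(0)
X = rng.normal(loc=3.0, scale=5.0, size=(100_000, 1))

# Without unit_variance the std would be about 5 / 6.745 * 5, i.e. about 0.74.
X_scaled = RobustScaler(unit_variance=True).fit_transform(X)
print(X_scaled.std())

# dtype selects the data type of the discretized output.
est = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="uniform", dtype=np.float32)
X_binned = est.fit_transform(X)
print(X_binned.dtype)
```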
- Invoke the SciPy BLAS API for the SVM kernel function in fit, predict and related methods of svm.SVC, svm.NuSVC, svm.SVR, svm.NuSVR and svm.OneClassSVM. #16530 by Shuhua Fan <jim0421>.
- tree.plot_tree now uses colors from the matplotlib configuration settings. #17187 by Andreas Müller.
- The parameter X_idx_sorted is now deprecated in tree.DecisionTreeClassifier.fit and tree.DecisionTreeRegressor.fit, and has no effect. #17614 by Juan Carlos Alfaro Jiménez <alfaro96>.
Thanks to everyone who has contributed to the maintenance and improvement of the project since version 0.20, including: