Releases: alteryx/evalml

v0.24.0

05 May 15:14
96fed90

v0.24.0 May. 4, 2021

Enhancements

  • Added date_index as a required parameter for TimeSeries problems #2217
  • Have the OneHotEncoder return the transformed columns as booleans rather than floats #2170
  • Added Oversampler transformer component to EvalML #2079
  • Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio #2128
  • Updated prediction explanations functions to allow pipelines with XGBoost estimators #2162
  • Added partial dependence for datetime columns #2180
  • Update precision-recall curve with positive label index argument, and fix for 2d predicted probabilities #2090
  • Add pct_null_rows to HighlyNullDataCheck #2211
  • Added a standalone AutoML search method for convenience, which runs data checks and then runs automl #2152
  • Make the first batch of AutoML have a predefined order, with linear models first and complex models last #2223
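
The pct_null_rows addition to HighlyNullDataCheck concerns rows with a large share of missing values. A minimal pure-Python sketch of that idea follows. It is not EvalML's implementation: the helper name null_fraction_per_row, the sample data, and the 0.5 threshold are assumptions made for illustration, and every row is assumed to be non-empty.

```python
import math


def null_fraction_per_row(rows):
    """Hypothetical helper: fraction of missing values (None or NaN) per row."""
    def is_null(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [sum(is_null(v) for v in row) / len(row) for row in rows]


rows = [
    [1.0, None, 3.0, None],
    [1.0, 2.0, 3.0, 4.0],
    [None, None, None, 4.0],
]
fractions = null_fraction_per_row(rows)

threshold = 0.5  # assumed cutoff, chosen only for this example
highly_null_rows = [i for i, frac in enumerate(fractions) if frac > threshold]
```

Here only the third row exceeds the assumed threshold, so it is the only one flagged.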

Fixes

  • Fixed partial dependence not respecting grid resolution parameter for numerical features #2180
  • Enable prediction explanations for catboost for multiclass problems #2224

Changes

  • Deleted baseline pipeline classes #2202
  • Reverting user specified date feature PR #2155 until pmdarima installation fix is found #2214
  • Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. #2091
  • Removed all old datasplitters from EvalML #2193
  • Deleted make_pipeline_from_components #2218

Documentation Changes

  • Renamed dataset to clarify that it's gzipped but not a tarball #2183
  • Updated documentation to use pipeline instances instead of pipeline subclasses #2195
  • Updated contributing guide with a note about GitHub Actions permissions #2090
  • Updated automl and model understanding user guides #2090

Testing Changes

  • Use machineFL user token for dependency update bot, and add more reviewers #2189

Breaking Changes

  • All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted #2202
  • Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}). #2091
  • Removed all old datasplitters from EvalML #2193
  • Deleted utility method make_pipeline_from_components #2218

v0.23.0

21 Apr 04:23
d6bec07

v0.23.0 Apr. 21, 2021

Enhancements

  • Refactored EngineBase and SequentialEngine API. Added DaskEngine #1975
  • Added optional engine argument to AutoMLSearch #1975
  • Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch #2118
  • Added NaturalLanguageNaNDataCheck data check #2122
  • Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs #2120
  • Added standard deviation of cv scores to rankings table #2154
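
To illustrate what a rankings table with both a mean and a standard deviation of cross-validation scores might contain, here is a small standard-library sketch. It is not EvalML's code: the scores are made up, the column name standard_deviation_cv_score is hypothetical, and the use of the sample standard deviation and of "greater is better" ordering are assumptions.

```python
import statistics

# Made-up per-fold scores for two hypothetical pipelines.
cv_scores = {
    "Pipeline A": [0.80, 0.82, 0.84],
    "Pipeline B": [0.70, 0.90, 0.80],
}

rankings = [
    {
        "pipeline_name": name,
        "mean_cv_score": statistics.mean(scores),
        # Sample standard deviation (n - 1 denominator); an assumption here.
        "standard_deviation_cv_score": statistics.stdev(scores),
    }
    for name, scores in cv_scores.items()
]

# Assumes a greater-is-better objective, so the highest mean ranks first.
rankings.sort(key=lambda row: row["mean_cv_score"], reverse=True)
```

The two pipelines have similar means, but the spread across folds differs by a factor of five, which is the kind of information the extra column exposes.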

Fixes

  • Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
  • Fixed bug where two-way partial dependence plots with categorical variables were not working correctly #2117
  • Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components #2133
  • Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines #2133
  • Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components #2133

Changes

  • Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers #2113
  • Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS #2126
  • Modified one-way partial dependence plots of categorical features to display data with a bar plot #2117
  • Renamed score column for automl.rankings as mean_cv_score #2135

Documentation Changes

  • Fixed conf.py file #2112
  • Added a sentence to the automl user guide stating that our support for time series problems is still in beta. #2118
  • Fixed documentation demos #2139
  • Update test badge in README to use GitHub Actions #2150

Testing Changes

  • Fixed test_describe_pipeline for pandas v1.2.4 #2129
  • Added a GitHub Action for building the conda package #1870 #2148

Breaking Changes

  • Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler #2113
  • Deleted the "errors" key from automl results #1975
  • Deleted the raise_and_save_error_callback and the log_and_save_error_callback #1975
  • Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
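
The switch to a minority:majority ratio changes how a value such as 0.25 should be read: the minority class should end up at least a quarter the size of the majority class. The sketch below shows that arithmetic for a binary target. It is an illustration only, not EvalML's sampler; the helper name majority_samples_to_keep is hypothetical, and only the binary case is handled.

```python
from collections import Counter


def majority_samples_to_keep(labels, sampling_ratio):
    """Hypothetical helper for a binary target: number of majority-class
    samples to keep so that minority / majority >= sampling_ratio."""
    counts = Counter(labels)
    (_, n_major), (_, n_minor) = counts.most_common(2)
    if n_minor / n_major >= sampling_ratio:
        return n_major  # already balanced enough, keep every sample
    # Truncating keeps the resulting ratio at or above sampling_ratio.
    return int(n_minor / sampling_ratio)


labels = [0] * 900 + [1] * 100
keep = majority_samples_to_keep(labels, sampling_ratio=0.25)
```

With 100 minority and 900 majority samples, keeping 400 majority samples gives a minority:majority ratio of 100:400 = 0.25.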

v0.22.0

07 Apr 15:47
581a7fb

v0.22.0 Apr. 7, 2021

Enhancements

  • Added a GitHub Action for linux_unit_tests #2013
  • Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component #1989
  • Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning #2024
  • Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline #2033
  • Added sensitivity at low alert rates as an objective #2001
  • Added Undersampler transformer component #2030

Fixes

  • Updated Engine's train_batch to apply undersampling #2038
  • Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba #2040
  • Fixed data splitting errors if target is float for classification problems #2050
  • Pinned docutils to <0.17 to fix ReadTheDocs warning issues #2088

Changes

  • Removed lists as acceptable hyperparameter ranges in AutoMLSearch #2028
  • Renamed "details" to "metadata" for data check actions #2008

Documentation Changes

  • Catch and suppress warnings in documentation #1991 #2097
  • Change spacing in start.ipynb to provide clarity for AutoMLSearch #2078
  • Fixed start code on README #2108

v0.21.0

24 Mar 21:04
4e95739

v0.21.0 Mar. 24, 2021

Enhancements

  • Changed AutoMLSearch to default optimize_thresholds to True #1943
  • Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775
  • Added params to balanced classification data splitters for visibility #1966
  • Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns #1967
  • Updated ClassImbalanceDataCheck to better handle multiclass imbalances #1986
  • Added recommended actions for the output of data check's validate method #1968
  • Added error message for partial_dependence when features are mostly the same value #1994
  • Updated OneHotEncoder to drop one redundant feature by default for features with two categories #1997
  • Added a PolynomialDetrender component #1992
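
The OneHotEncoder change for two-category features rests on the fact that one of the two columns is fully determined by the other. The following standard-library sketch shows the idea. It is not the EvalML component: the function one_hot_encode and its drop_binary flag are hypothetical, and which of the two categories gets dropped is an arbitrary choice here.

```python
def one_hot_encode(values, drop_binary=True):
    """Hypothetical encoder: one 0/1 column per category, except that a
    two-category feature keeps a single column when drop_binary is True."""
    categories = sorted(set(values))
    if drop_binary and len(categories) == 2:
        # The dropped column is implied by the remaining one.
        categories = categories[1:]
    return {cat: [int(value == cat) for value in values] for cat in categories}


binary = one_hot_encode(["no", "yes", "no"])
three_way = one_hot_encode(["a", "b", "c"])
```

The binary feature produces a single "yes" column, while the three-category feature keeps all three columns.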

Fixes

  • Updated binary classification pipelines to use objective decision function during scoring of custom objectives #1934

Changes

  • Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch #1935
  • Deleted random_state argument #1985
  • Updated Woodwork version requirement to v0.0.11 #1996

Documentation Changes

Testing Changes

  • Removed build_docs CI job in favor of RTD GH builder #1974
  • Added tests to confirm support for Python 3.9 #1724
  • Changed build_conda_pkg job to use latest_release_changes branch in the feedstock. #1979

Breaking Changes

  • Changed AutoMLSearch to default optimize_thresholds to True #1943
  • Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. #1935
  • Deleted random_state argument #1985

v0.20.0

11 Mar 00:15
91775ff

v0.20.0 Mar. 10, 2021

Enhancements

  • Added a GitHub Action for Detecting dependency changes #1933
  • Create a separate CV split to train stacked ensembler on for AutoMLSearch #1814
  • Added a GitHub Action for Linux unit tests #1846
  • Added DataCheckAction class and DataCheckActionCode enum #1896
  • Updated Woodwork requirement to v0.0.10 #1900
  • Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch #1875
  • Update default classification data splitter to use downsampling for highly imbalanced data #1875
  • Updated describe_pipeline to return more information, including id of pipelines used for ensemble models #1909
  • Added utility method to create list of components from a list of DataCheckAction #1907
  • Updated validate method to include an action key in the returned dictionary for all DataCheck and DataChecks #1916
  • Aggregated the SHAP values for predictions that we know the provenance of, e.g. OHE, text, and date-time #1901
  • Improved error message when custom objective is passed as a string in pipeline.score #1941
  • Added score_pipelines and train_pipelines methods to AutoMLSearch #1913
  • Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine #1913

Fixes

  • Removed CI check for check_dependencies_updated_linux #1950
  • Added metaclass for time series pipelines and fix binary classification pipeline predict not using objective if it is passed as a named argument #1874
  • Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871
  • Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch #1932
  • Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes #1958

Changes

  • Reverted GitHub Action for Linux unit tests until a fix for report generation is found #1920
  • Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch #1891
  • Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios #1905
  • Deleted the explain_prediction function #1915
  • Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928

Documentation Changes

  • Updated model_understanding.ipynb to demo the two-way partial dependence capability #1919

Testing Changes

Breaking Changes

  • Deleted the explain_prediction function #1915
  • Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928
  • Added score_batch and train_batch abstract methods to EngineBase. These need to be implemented in Engine subclasses #1913

v0.19.0

24 Feb 19:14
3eafd9b

v0.19.0 Feb. 24, 2021

Enhancements

  • Added a GitHub Action for Python windows unit tests #1844
  • Added a GitHub Action for checking updated release notes #1849
  • Added a GitHub Action for Python lint checks #1837
  • Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle timeseries problems. #1818
  • Updated InvalidTargetDataCheck to check for mismatched indices in target and features #1816
  • Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user #1784
  • Updated estimators to keep track of input feature names during fit() #1794
  • Updated visualize_decision_tree to include feature names in output #1813
  • Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference #1809
  • Added full error traceback to AutoMLSearch logger file #1840
  • Changed TargetEncoder to preserve custom indices in the data #1836
  • Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained #1843
  • Added custom random undersampling sampler for classification #1857
  • Updated OutliersDataCheck implementation to calculate the probability of having no outliers #1855
  • Added Engines pipeline processing API #1838

Fixes

  • Changed EngineBase random_state arg to random_seed and same for user guide docs #1889

Changes

  • Modified calculate_percent_difference so that division by 0 is now inf rather than nan #1809
  • Removed text_columns parameter from LSA and TextFeaturizer components #1652
  • Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning #1798
  • Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised #1866
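
The notes on calculate_percent_difference describe two behaviors: objectives flagged as bounded like a percentage report an absolute difference instead of a relative one, and division by zero yields inf rather than nan. A simplified sketch of that logic follows. It is not EvalML's method: the function name percent_difference is hypothetical, greater scores are assumed to be better, the order of the checks is a guess, and sign handling for the zero-baseline case is omitted.

```python
import math


def percent_difference(score, baseline, bounded_like_percentage=False):
    """Hypothetical sketch, assuming greater scores are better."""
    if math.isclose(score, baseline, abs_tol=1e-10):
        return 0.0
    if bounded_like_percentage:
        # Absolute difference, expressed in percentage points.
        return 100 * (score - baseline)
    if baseline == 0:
        # Relative difference is undefined; report inf rather than nan.
        return math.inf
    return 100 * (score - baseline) / abs(baseline)


relative = percent_difference(0.9, 0.6)
absolute = percent_difference(0.9, 0.6, bounded_like_percentage=True)
undefined = percent_difference(0.5, 0.0)
```

For a score of 0.9 against a baseline of 0.6, the relative form reports roughly a 50% improvement while the absolute form reports roughly 30 percentage points.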

Testing Changes

  • Added back coverage for _get_feature_provenance in TextFeaturizer after text_columns was removed #1842
  • Pin graphviz version for windows builds #1847
  • Unpin graphviz version for windows builds #1851

Breaking Changes

  • Added a deprecation warning to explain_prediction. It will be deleted in the next release. #1860

v0.18.2

10 Feb 17:11
c3bd8d3

v0.18.2 Feb. 10, 2021

Enhancements

  • Added uniqueness score data check #1785
  • Added "dataframe" output format for prediction explanations #1781
  • Updated LightGBM estimators to handle pandas.MultiIndex #1770
  • Sped up permutation importance for some pipelines #1762
  • Added sparsity data check #1797
  • Confirmed support for threshold tuning for binary time series classification problems #1803

Fixes

Changes

Documentation Changes

  • Added section on conda to the contributing guide #1771
  • Updated release process to reflect freezing main before perf tests #1787
  • Moving some prs to the right section of the release notes #1789
  • Tweak README.md. #1800
  • Fixed back arrow on install page docs #1795

Testing Changes

v0.18.1

02 Feb 01:20
1f089b9

v0.18.1 Feb. 1, 2021

Enhancements

  • Added graph_t_sne as a visualization tool for high dimensional data #1731
  • Added the ability to see the linear coefficients of features in linear models #1738
  • Added support for scikit-learn v0.24.0 #1733
  • Added support for scipy v1.6.0 #1752
  • Added SVM Classifier and Regressor to estimators #1714 #1761

Fixes

  • Addressed bug with partial_dependence and categorical data with more categories than grid resolution #1748
  • Removed random_state arg from get_pipelines in AutoMLSearch #1719
  • Pinned pyzmq at less than 22.0.0 till we add support #1756
  • Remove ProphetRegressor from main as windows tests were flaky #1764

Changes

  • Updated components and pipelines to return Woodwork data structures #1668
  • Updated clone() for pipelines and components to copy over random state automatically #1753
  • Dropped support for Python version 3.6 #1751
  • Removed deprecated verbose flag from AutoMLSearch parameters #1772

Documentation Changes

  • Add Twitter and GitHub link to documentation toolbar #1754
  • Added Open Graph info to documentation #1758

Testing Changes

Breaking Changes

  • Components and pipelines return Woodwork data structures instead of pandas data structures #1668
  • Python 3.6 will not be actively supported due to discontinued support from EvalML dependencies.
  • Deprecated verbose flag is removed for AutoMLSearch #1772

v0.18.0

26 Jan 22:19
4630f26

v0.18.0 Jan. 26, 2021

Enhancements

  • Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check #1574
  • Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check #1665
  • Added time series support for make_pipeline #1566
  • Added target name for output of pipeline predict method #1578
  • Added multiclass check to InvalidTargetDataCheck for two examples per class #1596
  • Support graphviz 0.16 #1657
  • Enhanced time series pipelines to accept empty features #1651
  • Added KNN Classifier to estimators. #1650
  • Added support for list inputs for objectives #1663
  • Added support for AutoMLSearch to handle time series classification pipelines #1666
  • Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them #1691
  • Added 2-way dependence plots. #1690
  • Added ability to directly iterate through components within Pipelines #1583

Fixes

  • Fixed inconsistent attributes and added Exceptions to docs #1673
  • Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas' Pearson Correlation #1616
  • Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines #1622 #1626
  • Updated load_data to return Woodwork structures and update default parameter value for index to None #1610
  • Pinned scipy at < 1.6.0 while we work on adding support #1629
  • Fixed data check message formatting in AutoMLSearch #1633
  • Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV #1613
  • Fixed bug where Imputer reset the index on X #1590
  • Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective #1575
  • Fixed custom index bug for MAPE objective #1641
  • Fixed index bug for TextFeaturizer and LSA components #1644
  • Limited load_fraud dataset loaded into automl.ipynb #1646
  • add_to_rankings updates AutoMLSearch.best_pipeline when necessary #1647
  • Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch #1645
  • Fixed jupyter notebooks to help the RTD buildtime #1654
  • Added positive_only objectives to non_core_objectives #1661
  • Fixed stacking argument n_jobs for IterativeAlgorithm #1706
  • Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency #1701
  • Added ability to initialize pipeline parameters in AutoMLSearch constructor #1676
  • Make AutoMLSearch pipelines pickle-able #1721

Changes

  • Added labeling to graph_confusion_matrix #1632
  • Rerunning search for AutoMLSearch results in a message thrown rather than failing the search, and removed has_searched property #1647
  • Changed tuner class to allow and ignore single parameter values as input #1686
  • Capped LightGBM version limit to remove bug in docs #1711
  • Removed support for np.random.RandomState in EvalML #1727

Documentation Changes

  • Update Model Understanding in the user guide to include visualize_decision_tree #1678
  • Updated docs to include information about AutoMLSearch callback parameters and methods #1577
  • Updated docs to prompt users to install graphviz on Mac #1656
  • Added infer_feature_types to the start.ipynb guide #1700
  • Added multicollinearity data check to API reference and docs #1707

Testing Changes

Breaking Changes

  • Removed has_searched property from AutoMLSearch #1647
  • Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed #1727

v0.17.0

30 Dec 00:36
0e671b9

v0.17.0 Dec. 29, 2020

Enhancements

  • Added save_plot that allows for saving figures from different backends #1588
  • Added LightGBM Regressor to regression components #1459
  • Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output #1511
  • Added DFS Transformer component into transformer components #1454
  • Added MAPE to the standard metrics for time series problems and update objectives #1510
  • Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483
  • Added a ComponentGraph class that will support future pipelines as directed acyclic graphs #1415
  • Updated data checks to accept Woodwork data structures #1481
  • Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485
  • Added multicollinearity data check #1515
  • Added baseline pipeline and components for time series regression problems #1496
  • Added more information to users about ensembling behavior in AutoMLSearch #1527
  • Add woodwork support for more utility and graph methods #1544
  • Changed DateTimeFeaturizer to encode features as int #1479
  • Return trained pipelines from AutoMLSearch.best_pipeline #1547
  • Added utility method so that users can set feature types without having to learn about Woodwork directly #1555
  • Added Linear Discriminant Analysis transformer for dimensionality reduction #1331
  • Added multiclass support for partial_dependence and graph_partial_dependence #1554
  • Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes #1528
  • Added make_data_splitter method for easier automl data split customization #1568
  • Integrated ComponentGraph class into Pipelines for full non-linear pipeline support #1543
  • Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
  • Update split_data helper args #1597
  • Add problem type utils is_regression, is_classification, is_timeseries #1597
  • Rename AutoMLSearch data_split arg to data_splitter #1569

Fixes

  • Fix Windows CI jobs: install numba via conda, required for shap #1490
  • Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data #1494
  • Fix generate_pipeline_code to account for boolean and None differences between Python and JSON #1524 #1531
  • Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532
  • Undo version pinning for plotly #1533
  • Fix ReadTheDocs build by updating the version of setuptools #1561
  • Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579
  • Pin sklearn version while we work on adding support #1594
  • Pin pandas at <1.2.0 while we work on adding support #1609
  • Pin graphviz at < 0.16 while we work on adding support #1609

Changes

  • Reverting save_graph #1550 to resolve kaleido build issues #1585
  • Update circleci badge to apply to main #1489
  • Added script to generate github markdown for releases #1487
  • Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretool dependencies #1540
  • Made get_prediction_vs_actual_data() a public method #1553
  • Updated Woodwork version requirement to v0.0.7 #1560
  • Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
  • Rename "# Testing" in automl log output to "# Validation" #1597

Documentation Changes

  • Added partial dependence methods to API reference #1537
  • Updated documentation for confusion matrix methods #1611

Testing Changes

  • Set n_jobs=1 in most unit tests to reduce memory #1505

Breaking Changes

  • Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1
  • Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.
  • Pipeline component instances can no longer be iterated through using Pipeline.component_graph #1543
  • Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
  • Update split_data helper args #1597
  • Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
  • Rename AutoMLSearch data_split arg to data_splitter #1569