Released 05 May 15:14 by chukarsten, tag v0.24.0, commit 96fed90

v0.24.0 May. 4, 2021

Enhancements

Added date_index as a required parameter for TimeSeries problems #2217
Have the OneHotEncoder return the transformed columns as booleans rather than floats #2170
Added Oversampler transformer component to EvalML #2079
Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio #2128
Updated prediction explanations functions to allow pipelines with XGBoost estimators #2162
Added partial dependence for datetime columns #2180
Updated the precision-recall curve with a positive label index argument, and fixed handling of 2D predicted probabilities #2090
Add pct_null_rows to HighlyNullDataCheck #2211
Added a standalone AutoML search method for convenience, which runs data checks and then runs automl #2152
Make the first batch of AutoML have a predefined order, with linear models first and complex models last #2223
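
The pct_null_rows argument added to HighlyNullDataCheck (#2211) introduces a row-wise null threshold. The sketch below illustrates the idea in plain Python under stated assumptions. The function name highly_null_rows, the list-of-rows input, the 0.95 default, and the "at or above the threshold" comparison are all hypothetical. This is not EvalML's implementation.

```python
from typing import Any, List, Sequence


def highly_null_rows(rows: Sequence[Sequence[Any]], pct_null_rows: float = 0.95) -> List[int]:
    """Hypothetical sketch: return the indices of rows whose fraction of
    missing values (None or NaN) is at or above pct_null_rows."""
    flagged = []
    for i, row in enumerate(rows):
        # NaN is the only value that is not equal to itself
        n_null = sum(1 for v in row if v is None or v != v)
        if row and n_null / len(row) >= pct_null_rows:
            flagged.append(i)
    return flagged
```

For example, with a threshold of 0.75, a four-column row with three missing values is flagged.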

Fixes

Fixed partial dependence not respecting grid resolution parameter for numerical features #2180
Enabled prediction explanations for CatBoost on multiclass problems #2224

Changes

Deleted baseline pipeline classes #2202
Reverting user specified date feature PR #2155 until pmdarima installation fix is found #2214
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. #2091
Removed all old datasplitters from EvalML #2193
Deleted make_pipeline_from_components #2218

Documentation Changes

Renamed dataset to clarify that it's gzipped but not a tarball #2183
Updated documentation to use pipeline instances instead of pipeline subclasses #2195
Updated contributing guide with a note about GitHub Actions permissions #2090
Updated automl and model understanding user guides #2090

Testing Changes

Use machineFL user token for dependency update bot, and add more reviewers #2189

Breaking Changes

All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted #2202
Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}). #2091
Removed all old datasplitters from EvalML #2193
Deleted utility method make_pipeline_from_components #2218
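
The pipeline API change (#2091) moves the component graph from a class attribute to the first constructor argument, as in the BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}) example above. The classes below are hypothetical stand-ins that only mirror the shape of that change. They are not EvalML classes, and they omit fitting, scoring, and validation.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for illustration only; these are not EvalML classes.


class OldStylePipeline:
    """Pre-v0.24 style: the component graph is a class attribute, so every
    distinct graph needs its own subclass."""

    component_graph: List[str] = []

    def __init__(self, parameters: Optional[Dict[str, dict]] = None):
        self.parameters = parameters or {}


class MyRandomForestPipeline(OldStylePipeline):
    component_graph = ["Imputer", "Random Forest Classifier"]


class NewStylePipeline:
    """v0.24 style: the component graph is the first constructor argument,
    followed by optional arguments such as parameters and custom_name."""

    def __init__(
        self,
        component_graph: List[str],
        parameters: Optional[Dict[str, dict]] = None,
        custom_name: Optional[str] = None,
    ):
        self.component_graph = list(component_graph)
        self.parameters = parameters or {}
        self.custom_name = custom_name
```

With the instance-parameter style, you do not need to define a new subclass for every distinct component graph.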

Released 21 Apr 04:23 by chukarsten, tag v0.23.0, commit d6bec07

v0.23.0 Apr. 21, 2021

Enhancements

Refactored EngineBase and SequentialEngine API and added DaskEngine #1975
Added optional engine argument to AutoMLSearch #1975
Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch #2118
Added NaturalLanguageNaNDataCheck data check #2122
Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs #2120
Added standard deviation of cv scores to rankings table #2154
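
Adding the standard deviation of CV scores to the rankings table (#2154) lets you compare how stable each pipeline is across folds, not just how well it scores on average. The sketch below computes the two summary numbers. It assumes the sample standard deviation (n - 1 denominator). The key standard_deviation_cv_score is a hypothetical column name; mean_cv_score matches the rename in #2135.

```python
import statistics
from typing import Dict, Sequence


def summarize_cv_scores(cv_scores: Sequence[float]) -> Dict[str, float]:
    """Hypothetical sketch of the two per-pipeline numbers in a rankings row.

    statistics.stdev is the sample standard deviation (n - 1 denominator) and
    needs at least two folds; EvalML's exact choice of estimator is an assumption.
    """
    return {
        "mean_cv_score": statistics.mean(cv_scores),
        "standard_deviation_cv_score": statistics.stdev(cv_scores),
    }
```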

Fixes

Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
Fixed bug where two-way partial dependence plots with categorical variables were not working correctly #2117
Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components #2133
Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines #2133
Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components #2133

Changes

Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers #2113
Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS #2126
Modified one-way partial dependence plots of categorical features to display data with a bar plot #2117
Renamed score column for automl.rankings as mean_cv_score #2135

Documentation Changes

Fixed conf.py file #2112
Added a sentence to the automl user guide stating that our support for time series problems is still in beta. #2118
Fixed documentation demos #2139
Update test badge in README to use GitHub Actions #2150

Testing Changes

Fixed test_describe_pipeline for pandas v1.2.4 #2129
Added a GitHub Action for building the conda package #1870 #2148

Breaking Changes

Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler #2113
Deleted the "errors" key from automl results #1975
Deleted the raise_and_save_error_callback and the log_and_save_error_callback #1975
Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority #2077
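
The sampler changes in this release (#2077, #2113) define sampling_ratio as a minority:majority ratio. The sketch below shows random undersampling under that convention. Every class is capped at minority_count / sampling_ratio rows, so a sampling_ratio of 0.25 allows at most four majority rows per minority row. The function undersample_indices and its defaults are hypothetical. EvalML's samplers take additional parameters that are not shown here.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def undersample_indices(
    y: Sequence[Hashable], sampling_ratio: float = 0.25, seed: int = 0
) -> List[int]:
    """Hypothetical sketch of random undersampling.

    sampling_ratio is read as minority:majority, so each class keeps at most
    minority_count / sampling_ratio rows; the minority class is never dropped.
    """
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling_ratio must be in (0, 1]")
    counts = Counter(y)
    minority_count = min(counts.values())
    max_per_class = int(minority_count / sampling_ratio)
    rng = random.Random(seed)  # an int seed keeps the result reproducible
    keep: List[int] = []
    for label in counts:
        rows = [i for i, value in enumerate(y) if value == label]
        if len(rows) > max_per_class:
            rows = rng.sample(rows, max_per_class)
        keep.extend(rows)
    return sorted(keep)
```

For example, with 100 majority rows, 10 minority rows, and sampling_ratio=0.25, 40 majority rows and all 10 minority rows are kept.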

Released 07 Apr 15:47 by chukarsten, tag v0.22.0, commit 581a7fb

v0.22.0 Apr. 7, 2021

Enhancements

Added a GitHub Action for linux_unit_tests #2013
Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component #1989
Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning #2024
Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline #2033
Added sensitivity at low alert rates as an objective #2001
Added Undersampler transformer component #2030
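
The sensitivity at low alert rates objective (#2001) scores a model by how many true positives it catches when only a small fraction of cases can be flagged. The sketch below assumes the model alerts on the top alert_rate fraction of rows, ranked by predicted probability and rounded up to at least one row. The function name and the 0.01 default are hypothetical. EvalML may choose the cutoff differently, for example via a probability quantile.

```python
import math
from typing import Sequence


def sensitivity_at_alert_rate(
    y_true: Sequence[int], y_proba: Sequence[float], alert_rate: float = 0.01
) -> float:
    """Hypothetical sketch: flag the top `alert_rate` fraction of rows by
    predicted probability (at least one row), then return the share of all
    positive rows that were flagged."""
    n_alerts = max(1, math.ceil(alert_rate * len(y_true)))
    ranked = sorted(range(len(y_true)), key=lambda i: y_proba[i], reverse=True)
    alerted = set(ranked[:n_alerts])
    positives = [i for i, label in enumerate(y_true) if label == 1]
    if not positives:
        return math.nan  # sensitivity is undefined without positive rows
    return sum(1 for i in positives if i in alerted) / len(positives)
```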

Fixes

Updated Engine's train_batch to apply undersampling #2038
Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba #2040
Fixed data splitting errors if target is float for classification problems #2050
Pinned docutils to <0.17 to fix ReadTheDocs warning issues #2088

Changes

Removed lists as acceptable hyperparameter ranges in AutoMLSearch #2028
Renamed "details" to "metadata" for data check actions #2008

Documentation Changes

Catch and suppress warnings in documentation #1991 #2097
Change spacing in start.ipynb to provide clarity for AutoMLSearch #2078
Fixed start code on README #2108

Released 24 Mar 21:04 by dsherry, tag v0.21.0, commit 4e95739

v0.21.0 Mar. 24, 2021

Enhancements

Changed AutoMLSearch to default optimize_thresholds to True #1943
Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775
Added params to balanced classification data splitters for visibility #1966
Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns #1967
Updated ClassImbalanceDataCheck to better handle multiclass imbalances #1986
Added recommended actions for the output of data check's validate method #1968
Added error message for partial_dependence when features are mostly the same value #1994
Updated OneHotEncoder to drop one redundant feature by default for features with two categories #1997
Added a PolynomialDetrender component #1992
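
With #1997, the OneHotEncoder drops one redundant column for features with exactly two categories, because one indicator fully determines the other. The sketch below shows that rule for a single feature. It assumes the first category in sorted order is the one dropped, and it emits boolean columns, matching the later v0.24.0 change (#2170). The one_hot helper and the name_category column naming are hypothetical.

```python
from typing import Dict, List, Sequence


def one_hot(
    name: str, values: Sequence[str], drop_if_binary: bool = True
) -> Dict[str, List[bool]]:
    """Hypothetical sketch: one-hot encode a single categorical feature.

    For exactly two categories, one indicator column fully determines the
    other, so the first category (in sorted order) is dropped.
    """
    categories = sorted(set(values))
    if drop_if_binary and len(categories) == 2:
        categories = categories[1:]  # keep a single indicator column
    return {f"{name}_{cat}": [v == cat for v in values] for cat in categories}
```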

Fixes

Updated binary classification pipelines to use objective decision function during scoring of custom objectives #1934

Changes

Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch #1935
Deleted random_state argument #1985
Updated Woodwork version requirement to v0.0.11 #1996

Documentation Changes

Testing Changes

Removed build_docs CI job in favor of RTD GH builder #1974
Added tests to confirm support for Python 3.9 #1724
Changed build_conda_pkg job to use latest_release_changes branch in the feedstock. #1979

Breaking Changes

Changed AutoMLSearch to default optimize_thresholds to True #1943
Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. #1935
Deleted random_state argument #1985

Released 11 Mar 00:15 by dsherry, tag v0.20.0, commit 91775ff

v0.20.0 Mar. 10, 2021

Enhancements

Added a GitHub Action for detecting dependency changes #1933
Created a separate CV split on which to train the stacked ensembler in AutoMLSearch #1814
Added a GitHub Action for Linux unit tests #1846
Added DataCheckAction class and DataCheckActionCode enum #1896
Updated Woodwork requirement to v0.0.10 #1900
Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch #1875
Update default classification data splitter to use downsampling for highly imbalanced data #1875
Updated describe_pipeline to return more information, including id of pipelines used for ensemble models #1909
Added utility method to create list of components from a list of DataCheckAction #1907
Updated validate method to include an action key in the returned dictionary for all DataCheck and DataChecks #1916
Aggregated the SHAP values for predictions when the feature provenance is known, e.g. OHE, text, and datetime features #1901
Improved error message when custom objective is passed as a string in pipeline.score #1941
Added score_pipelines and train_pipelines methods to AutoMLSearch #1913
Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine #1913
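
Aggregating SHAP values by feature provenance (#1901) reports explanations in terms of the original input columns rather than the engineered ones, such as one-hot encoded columns. The sketch below assumes the provenance is given as a mapping from each original feature to its derived column names, and that aggregation is a plain sum. Because SHAP values are additive, summing keeps each row's total attribution unchanged. The function name is hypothetical.

```python
from typing import Dict, List


def aggregate_shap_by_provenance(
    shap_values: Dict[str, float], provenance: Dict[str, List[str]]
) -> Dict[str, float]:
    """Hypothetical sketch: fold the SHAP values of engineered columns back onto
    the original feature they were derived from, by summing them."""
    derived_to_original = {col: orig for orig, cols in provenance.items() for col in cols}
    aggregated: Dict[str, float] = {}
    for column, value in shap_values.items():
        # Columns with no recorded provenance map to themselves
        source = derived_to_original.get(column, column)
        aggregated[source] = aggregated.get(source, 0.0) + value
    return aggregated
```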

Fixes

Removed CI check for check_dependencies_updated_linux #1950
Added a metaclass for time series pipelines and fixed binary classification pipeline predict not using the objective when it is passed as a named argument #1874
Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871
Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch #1932
Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes #1958

Changes

Reversed GitHub Action for Linux unit tests until a fix for report generation is found #1920
Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch #1891
Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios #1905
Deleted the explain_prediction function #1915
Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928

Documentation Changes

Updated model_understanding.ipynb to demo the two-way partial dependence capability #1919

Testing Changes

Breaking Changes

Deleted the explain_prediction function #1915
Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead #1928
Added score_batch and train_batch abstract methods to EngineBase. These need to be implemented in Engine subclasses #1913

Released 24 Feb 19:14 by chukarsten, tag v0.19.0, commit 3eafd9b

v0.19.0 Feb. 24, 2021

Enhancements

Added a GitHub Action for Python windows unit tests #1844
Added a GitHub Action for checking updated release notes #1849
Added a GitHub Action for Python lint checks #1837
Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle timeseries problems. #1818
Updated InvalidTargetDataCheck to check for mismatched indices in target and features #1816
Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user #1784
Updated estimators to keep track of input feature names during fit() #1794
Updated visualize_decision_tree to include feature names in output #1813
Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference #1809
Added full error traceback to AutoMLSearch logger file #1840
Changed TargetEncoder to preserve custom indices in the data #1836
Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained #1843
Added custom random undersampling sampler for classification #1857
Updated OutliersDataCheck implementation to calculate the probability of having no outliers #1855
Added Engines pipeline processing API #1838

Fixes

Changed EngineBase random_state arg to random_seed and same for user guide docs #1889

Changes

Modified calculate_percent_difference so that division by 0 is now inf rather than nan #1809
Removed text_columns parameter from LSA and TextFeaturizer components #1652
Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning #1798
Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised #1866
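
Taken together, #1809 makes calculate_percent_difference report an absolute difference for objectives where is_bounded_like_percentage is true, and return inf instead of nan when the baseline is 0. The sketch below is a simplified, hypothetical re-implementation that illustrates those rules. It is not EvalML's code. The function name, the greater_is_better sign handling, and the 1e-10 tolerance are assumptions.

```python
import math


def percent_difference(
    score: float,
    baseline: float,
    greater_is_better: bool = True,
    bounded_like_percentage: bool = False,
) -> float:
    """Hypothetical sketch: percent improvement of score over baseline.

    Positive means better than the baseline. For objectives bounded like a
    percentage (e.g. scores in [0, 1]) the absolute difference is reported;
    otherwise the difference is relative to the baseline, and a zero baseline
    yields inf rather than nan.
    """
    if math.isnan(score) or math.isnan(baseline):
        return math.nan
    improvement = score - baseline if greater_is_better else baseline - score
    if math.isclose(improvement, 0.0, abs_tol=1e-10):
        return 0.0
    if bounded_like_percentage:
        return 100 * improvement  # absolute difference, in percentage points
    if math.isclose(baseline, 0.0, abs_tol=1e-10):
        return math.inf  # division by zero reported as inf, not nan
    return 100 * improvement / abs(baseline)
```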

Testing Changes

Added back coverage for _get_feature_provenance in TextFeaturizer after text_columns was removed #1842
Pin graphviz version for windows builds #1847
Unpin graphviz version for windows builds #1851

Breaking Changes

Added a deprecation warning to explain_prediction. It will be deleted in the next release. #1860

Released 10 Feb 17:11 by ParthivNaresh, tag v0.18.2, commit c3bd8d3

v0.18.2 Feb. 10, 2021

Enhancements

Added uniqueness score data check #1785
Added "dataframe" output format for prediction explanations #1781
Updated LightGBM estimators to handle pandas.MultiIndex #1770
Sped up permutation importance for some pipelines #1762
Added sparsity data check #1797
Confirmed support for threshold tuning for binary time series classification problems #1803
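
The uniqueness score data check (#1785) measures how spread out a column's values are. These notes do not give the formula. The sketch below assumes a Gini-impurity-style score: 1 minus the sum of squared value frequencies. That score is 0 for a constant column and approaches 1 as the values become all distinct. The function name is hypothetical, and the input must be non-empty.

```python
from collections import Counter
from typing import Hashable, Sequence


def uniqueness_score(values: Sequence[Hashable]) -> float:
    """Hypothetical sketch: 1 minus the sum of squared value frequencies.
    Near 0 when one value dominates; approaches 1 as values become distinct."""
    counts = Counter(values)
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```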

Fixes

Changes

Documentation Changes

Added section on conda to the contributing guide #1771
Updated release process to reflect freezing main before perf tests #1787
Moved some PRs to the correct section of the release notes #1789
Tweak README.md. #1800
Fixed back arrow on install page docs #1795

Testing Changes

Released 02 Feb 01:20 by chukarsten, tag v0.18.1, commit 1f089b9

v0.18.1 Feb. 1, 2021

Enhancements

Added graph_t_sne as a visualization tool for high dimensional data #1731
Added the ability to see the linear coefficients of features in linear models terms #1738
Added support for scikit-learn v0.24.0 #1733
Added support for scipy v1.6.0 #1752
Added SVM Classifier and Regressor to estimators #1714 #1761

Fixes

Addressed bug with partial_dependence and categorical data with more categories than grid resolution #1748
Removed random_state arg from get_pipelines in AutoMLSearch #1719
Pinned pyzmq at less than 22.0.0 until we add support #1756
Removed ProphetRegressor from main because Windows tests were flaky #1764

Changes

Updated components and pipelines to return Woodwork data structures #1668
Updated clone() for pipelines and components to copy over random state automatically #1753
Dropped support for Python version 3.6 #1751
Removed deprecated verbose flag from AutoMLSearch parameters #1772

Documentation Changes

Add Twitter and GitHub link to documentation toolbar #1754
Added Open Graph info to documentation #1758

Testing Changes

Breaking Changes

Components and pipelines return Woodwork data structures instead of pandas data structures #1668
Python 3.6 will not be actively supported due to discontinued support from EvalML dependencies.
Deprecated verbose flag is removed for AutoMLSearch #1772

Released 26 Jan 22:19 by bchen1116, tag v0.18.0, commit 4630f26

v0.18.0 Jan. 26, 2021

Enhancements

Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check #1574
Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check #1665
Added time series support for make_pipeline #1566
Added target name for output of pipeline predict method #1578
Added multiclass check to InvalidTargetDataCheck for two examples per class #1596
Support graphviz 0.16 #1657
Enhanced time series pipelines to accept empty features #1651
Added KNN Classifier to estimators. #1650
Added support for list inputs for objectives #1663
Added support for AutoMLSearch to handle time series classification pipelines #1666
Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them #1691
Added two-way partial dependence plots #1690
Added ability to directly iterate through components within Pipelines #1583
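
#1574 added RMSLE, MSLE, and MAPE, and made invalid_targets_data_check look for negative targets. The check matters because the log-based metrics are defined on log(1 + y) and assume non-negative values. The sketch below uses the standard textbook definitions. The helper names, the percentage scaling of MAPE, and raising ValueError are assumptions, not EvalML's API.

```python
import math
from typing import Sequence


def _require_non_negative(values: Sequence[float]) -> None:
    # Log-based errors assume non-negative targets and predictions.
    if any(v < 0 for v in values):
        raise ValueError("log-based metrics require non-negative values")


def msle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared logarithmic error: mean of (log(1 + y) - log(1 + y_hat)) ** 2."""
    _require_non_negative(y_true)
    _require_non_negative(y_pred)
    total = sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
    return total / len(y_true)


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error."""
    return math.sqrt(msle(y_true, y_pred))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed as a percentage."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a target value is 0")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100 * total / len(y_true)
```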

Fixes

Fixed inconsistent attributes and added Exceptions to docs #1673
Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas' Pearson Correlation #1616
Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines #1622 #1626
Updated load_data to return Woodwork structures and update default parameter value for index to None #1610
Pinned scipy at < 1.6.0 while we work on adding support #1629
Fixed data check message formatting in AutoMLSearch #1633
Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV #1613
Fixed bug where Imputer reset the index on X #1590
Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective #1575
Fixed custom index bug for MAPE objective #1641
Fixed index bug for TextFeaturizer and LSA components #1644
Limited load_fraud dataset loaded into automl.ipynb #1646
add_to_rankings updates AutoMLSearch.best_pipeline when necessary #1647
Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch #1645
Fixed jupyter notebooks to help the RTD buildtime #1654
Added positive_only objectives to non_core_objectives #1661
Fixed stacking argument n_jobs for IterativeAlgorithm #1706
Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency #1701
Added ability to initialize pipeline parameters in AutoMLSearch constructor #1676
Make AutoMLSearch pipelines pickle-able #1721

Changes

Added labeling to graph_confusion_matrix #1632
Rerunning search for AutoMLSearch results in a message thrown rather than failing the search, and removed has_searched property #1647
Changed tuner class to allow and ignore single parameter values as input #1686
Capped LightGBM version limit to remove bug in docs #1711
Removed support for np.random.RandomState in EvalML #1727

Documentation Changes

Update Model Understanding in the user guide to include visualize_decision_tree #1678
Updated docs to include information about AutoMLSearch callback parameters and methods #1577
Updated docs to prompt users to install graphviz on Mac #1656
Added infer_feature_types to the start.ipynb guide #1700
Added multicollinearity data check to API reference and docs #1707

Testing Changes

Breaking Changes

Removed has_searched property from AutoMLSearch #1647
Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed #1727

Released 30 Dec 00:36 by dsherry, tag v0.17.0, commit 0e671b9

v0.17.0 Dec. 29, 2020

Enhancements

Added save_plot that allows for saving figures from different backends #1588
Added LightGBM Regressor to regression components #1459
Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output #1511
Added DFS Transformer component into transformer components #1454
Added MAPE to the standard metrics for time series problems and update objectives #1510
Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483
Added a ComponentGraph class that will support future pipelines as directed acyclic graphs #1415
Updated data checks to accept Woodwork data structures #1481
Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485
Added multicollinearity data check #1515
Added baseline pipeline and components for time series regression problems #1496
Added more information to users about ensembling behavior in AutoMLSearch #1527
Added Woodwork support for more utility and graph methods #1544
Changed DateTimeFeaturizer to encode features as int #1479
Return trained pipelines from AutoMLSearch.best_pipeline #1547
Added utility method so that users can set feature types without having to learn about Woodwork directly #1555
Added Linear Discriminant Analysis transformer for dimensionality reduction #1331
Added multiclass support for partial_dependence and graph_partial_dependence #1554
Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes #1528
Added make_data_splitter method for easier automl data split customization #1568
Integrated ComponentGraph class into Pipelines for full non-linear pipeline support #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Add problem type utils is_regression, is_classification, is_timeseries #1597
Rename AutoMLSearch data_split arg to data_splitter #1569
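
The ComponentGraph class (#1415, #1543) lets a pipeline be a directed acyclic graph of components rather than a straight list. The core requirement is that each component runs after the components whose output it consumes, which is a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9+). The name-to-inputs dictionary format is a hypothetical simplification, not EvalML's component_graph syntax.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, List


def fit_order(graph: Dict[str, List[str]]) -> List[str]:
    """Hypothetical sketch: given a mapping of component name -> names of the
    components whose output it consumes, return an order in which each
    component runs only after all of its inputs.

    Raises graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    """
    return list(TopologicalSorter(graph).static_order())
```

A list-style pipeline is the special case where every component consumes only the one before it.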

Fixes

Fix Windows CI jobs: install numba via conda, required for shap #1490
Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data #1494
Fix generate_pipeline_code to account for boolean and None differences between Python and JSON #1524 #1531
Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532
Undo version pinning for plotly #1533
Fix ReadTheDocs build by updating the version of setuptools #1561
Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579
Pin sklearn version while we work on adding support #1594
Pin pandas at <1.2.0 while we work on adding support #1609
Pin graphviz at < 0.16 while we work on adding support #1609

Changes

Reverting save_graph #1550 to resolve kaleido build issues #1585
Update CircleCI badge to apply to main #1489
Added script to generate GitHub markdown for releases #1487
Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretools dependencies #1540
Made get_prediction_vs_actual_data() a public method #1553
Updated Woodwork version requirement to v0.0.7 #1560
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename "# Testing" in automl log output to "# Validation" #1597

Documentation Changes

Added partial dependence methods to API reference #1537
Updated documentation for confusion matrix methods #1611

Testing Changes

Set n_jobs=1 in most unit tests to reduce memory #1505

Breaking Changes

Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1
Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.
Pipeline component instances can no longer be iterated through using Pipeline.component_graph #1543
Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard #1597
Update split_data helper args #1597
Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters #1597
Rename AutoMLSearch data_split arg to data_splitter #1569

Releases: alteryx/evalml
