Release Notes

Future Releases
  • Enhancements
    • Add ProphetRegressor to AutoML 2619
    • Integrated DefaultAlgorithm into AutoMLSearch 2634
    • Removed SVM "linear" and "precomputed" kernel hyperparameter options, and improved default parameters 2651
    • Updated ComponentGraph initialization to raise ValueError when user attempts to use .y for a component that does not produce a tuple output 2662
    • Updated pipeline graph() to distinguish X and y edges 2654
    • Added DropRowsTransformer component 2692
    • Added DROP_ROWS to _make_component_list_from_actions and clean up metadata 2694
  • Fixes
    • Updated Oversampler logic to select best SMOTE based on component input instead of pipeline input 2695
  • Changes
    • Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component 2695
    • Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance 2660
  • Documentation Changes
    • Added docstring linting package pydocstyle and rule to make-lint command 2670
  • Testing Changes
    • Removed the process-level parallelism from the test_cancel_job test 2666
    • Installed numba 0.53 in windows CI to prevent problems installing version 0.54 2710

Warning

Breaking Changes
  • Renamed the current top level search method to search_iterative and defined a new search method for the DefaultAlgorithm 2634
  • Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component 2695
  • Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance 2660
v0.31.0 Aug. 19, 2021
  • Enhancements
    • Updated the high variance check in AutoMLSearch to be robust to a variety of objectives and cv scores 2622
    • Use Woodwork's outlier detection for the OutliersDataCheck 2637
    • Added ability to utilize instantiated components when creating a pipeline 2643
    • Sped up the all-NaN and unknown checks in infer_feature_types 2661
  • Fixes
  • Changes
    • Deleted _put_into_original_order helper function 2639
    • Refactored time series pipeline code using a time series pipeline base class 2649
    • Renamed dask_tests to parallel_tests 2657
    • Removed commented out code in pipeline_meta.py 2659
  • Documentation Changes
    • Add complete install command to README and Install section 2627
    • Cleaned up documentation for MulticollinearityDataCheck 2664
  • Testing Changes
    • Speed up CI by splitting Prophet tests into a separate workflow in GitHub 2644

Warning

Breaking Changes
  • TimeSeriesRegressionPipeline no longer inherits from RegressionPipeline 2649
v0.30.2 Aug. 16, 2021
  • Fixes
    • Updated changelog and version numbers to match the release. Release 0.30.1 was released erroneously without a change to the version numbers. 0.30.2 replaces it.
v0.30.1 Aug. 12, 2021
  • Enhancements
    • Added DatetimeFormatDataCheck for time series problems 2603
    • Added ProphetRegressor to estimators 2242
    • Updated ComponentGraph to handle not calling samplers' transform during predict, and updated samplers' transform methods s.t. fit_transform is equivalent to fit(X, y).transform(X, y) 2583
    • Updated ComponentGraph _validate_component_dict logic to be stricter about input values 2599
    • Patched bug in xgboost estimators where predicting on a feature matrix of only booleans would throw an exception. 2602
    • Updated ARIMARegressor to use relative forecasting to predict values 2613
    • Added support for creating pipelines without an estimator as the final component and added transform(X, y) method to pipelines and component graphs 2625
    • Updated to support Woodwork 0.5.1 2610
  • Fixes
    • Updated AutoMLSearch to drop ARIMARegressor from allowed_estimators if an incompatible frequency is detected 2632
    • Updated get_best_sampler_for_data to consider all non-numeric datatypes as categorical for SMOTE 2590
    • Fixed inconsistent test results from TargetDistributionDataCheck 2608
    • Adopted vectorized pd.NA checking for Woodwork 0.5.1 support 2626
    • Pinned upper version of astroid to 2.6.6 to keep ReadTheDocs working. 2638
  • Changes
    • Renamed SMOTE samplers to SMOTE oversampler 2595
    • Changed partial_dependence and graph_partial_dependence to raise a PartialDependenceError instead of ValueError. This is not a breaking change because PartialDependenceError is a subclass of ValueError 2604
    • Cleaned up code duplication in ComponentGraph 2612
    • Stored predict_proba results in .x for intermediate estimators in ComponentGraph 2629
  • Documentation Changes
    • To avoid local docs build error, only add warning disable and download headers on ReadTheDocs builds, not locally 2617
  • Testing Changes
    • Updated partial_dependence tests to change the element-wise comparison per the Plotly 5.2.1 upgrade 2638
    • Changed the lint CI job to only check against python 3.9 via the -t flag 2586
    • Installed Prophet in linux nightlies test and fixed test_all_components 2598
    • Refactored and fixed all make_pipeline tests to assert correct order and address new Woodwork Unknown type inference 2572
    • Removed component_graphs as a global variable in test_component_graphs.py 2609

Warning

Breaking Changes
  • Renamed SMOTE samplers to SMOTE oversampler. Please use SMOTEOversampler, SMOTENCOversampler, SMOTENOversampler instead of SMOTESampler, SMOTENCSampler, and SMOTENSampler 2595
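The note above that raising PartialDependenceError is not a breaking change follows from ordinary exception subclassing: code that already catches ValueError keeps working. A minimal sketch, in which the class and the failing function are hypothetical stand-ins rather than EvalML's own code (only the ValueError base class is taken from the notes):

```python
class PartialDependenceError(ValueError):
    """Hypothetical stand-in; only the ValueError base class comes from the notes."""


def partial_dependence_stub():
    # Stand-in for a partial dependence call that fails.
    raise PartialDependenceError("feature is mostly one value")


def legacy_caller():
    # Code written before the change: it only knows about ValueError.
    try:
        partial_dependence_stub()
    except ValueError as err:
        # The subclass instance is still caught here.
        return type(err).__name__
    return None


caught = legacy_caller()
print(caught)
```

Callers who want to handle partial dependence failures separately can catch the subclass first and leave a broader ValueError handler after it.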
v0.30.0 Aug. 3, 2021
  • Enhancements
    • Added LogTransformer and TargetDistributionDataCheck 2487
    • Issue a warning to users when a pipeline parameter passed in isn't used in the pipeline 2564
    • Added Gini coefficient as an objective 2544
    • Added repr to ComponentGraph 2565
    • Added components to extract features from URL and EmailAddress Logical Types 2550
    • Added support for NaN values in TextFeaturizer 2532
    • Added SelectByType transformer 2531
    • Added separate thresholds for percent null rows and columns in HighlyNullDataCheck 2562
    • Added support for NaN natural language values 2577
  • Fixes
    • Raised error message for types URL, NaturalLanguage, and EmailAddress in partial_dependence 2573
  • Changes
    • Updated PipelineBase implementation for creating pipelines from a list of components 2549
    • Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module 2546
    • Renamed ComponentGraph's get_parents to get_inputs 2540
    • Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list 2556
    • Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph 2563
    • Renamed existing ensembler implementation from StackedEnsemblers to SklearnStackedEnsemblers 2578
  • Documentation Changes
    • Added documentation for DaskEngine and CFEngine parallel engines 2560
    • Improved detail of TextFeaturizer docstring and tutorial 2568
  • Testing Changes
    • Added test that makes sure split_data does not shuffle for time series problems 2552

Warning

Breaking Changes
  • Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module 2546
  • Renamed ComponentGraph's get_parents to get_inputs 2540
  • Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list 2556
  • Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph 2563
v0.29.0 Jul. 21, 2021
  • Enhancements
    • Updated 1-way partial dependence support for datetime features 2454
    • Added details on how to fix error caused by broken ww schema 2466
    • Added ability to use built-in pickle for saving AutoMLSearch 2463
    • Updated our components and component graphs to use latest features of ww 0.4.1, e.g. concat_columns and drop in-place. 2465
    • Added new, concurrent.futures based engine for parallel AutoML 2506
    • Added support for new Woodwork Unknown type in AutoMLSearch 2477
    • Updated our components with an attribute that describes if they modify features or targets and can be used in list API for pipeline initialization 2504
    • Updated ComponentGraph to accept X and y as inputs 2507
    • Removed unused TARGET_BINARY_INVALID_VALUES from DataCheckMessageCode enum and fixed formatting of objective documentation 2520
    • Added EvalMLAlgorithm 2525
    • Added support for NaN values in TextFeaturizer 2532
  • Fixes
    • Fixed FraudCost objective and reverted threshold optimization method for binary classification to Golden 2450
    • Added custom exception message for partial dependence on features with scales that are too small 2455
    • Ensures the typing for Ordinal and Datetime ltypes are passed through _retain_custom_types_and_initalize_woodwork 2461
    • Updated to work with Pandas 1.3.0 2442
    • Updated to work with sktime 0.7.0 2499
  • Changes
    • Updated XGBoost dependency to >=1.4.2 2484, 2498
    • Added a DeprecationWarning about deprecating the list API for ComponentGraph 2488
    • Updated make_pipeline for AutoML to create dictionaries, not lists, to initialize pipelines 2504
    • No longer installing graphviz on windows in our CI pipelines because release 0.17 breaks windows 3.7 2516
  • Documentation Changes
    • Moved docstrings from __init__ to class pages, added missing docstrings for missing classes, and updated missing default values 2452
    • Build documentation with sphinx-autoapi 2458
    • Change autoapi_ignore to only ignore files in evalml/tests/* 2530
  • Testing Changes
    • Fixed flaky dask tests 2471
    • Removed shellcheck action from build_conda_pkg action 2514
    • Added a tmp_dir fixture that deletes its contents after tests run 2505
    • Added a test that makes sure all pipelines in AutoMLSearch get the same data splits 2513
    • Condensed warning output in test logs 2521

Warning

Breaking Changes
  • NaN values in the Natural Language type are no longer supported by the Imputer with the pandas upgrade. 2477
v0.28.0 Jul. 2, 2021
  • Enhancements
    • Added support for showing an Individual Conditional Expectations plot when graphing Partial Dependence 2386
    • Exposed thread_count for Catboost estimators as n_jobs parameter 2410
    • Updated Objectives API to allow for sample weighting 2433
  • Fixes
    • Deleted unreachable line from IterativeAlgorithm 2464
  • Changes
    • Pinned Woodwork version between 0.4.1 and 0.4.2 2460
    • Updated psutil minimum version in requirements 2438
    • Updated log_error_callback to not include filepath in logged message 2429
  • Documentation Changes
    • Sped up docs 2430
    • Removed mentions of DataTable and DataColumn from the docs 2445
  • Testing Changes
    • Added slack integration for nightlies tests 2436
    • Changed build_conda_pkg CI job to run only when dependencies are updated 2446
    • Updated workflows to store pytest runtimes as test artifacts 2448
    • Added AutoMLTestEnv test fixture for making it easy to mock automl tests 2406
v0.27.0 Jun. 22, 2021
  • Enhancements
    • Adds force plots for prediction explanations 2157
    • Removed self-reference from AutoMLSearch 2304
    • Added support for nonlinear pipelines for generate_pipeline_code 2332
    • Added inverse_transform method to pipelines 2256
    • Add optional automatic update checker 2350
    • Added search_order to AutoMLSearch's rankings and full_rankings tables 2345
    • Updated threshold optimization method for binary classification 2315
    • Updated demos to pull data from S3 instead of including demo data in package 2387
    • Upgrade woodwork version to v0.4.1 2379
  • Fixes
    • Preserve user-specified woodwork types throughout pipeline fit/predict 2297
    • Fixed ComponentGraph appending target to final_component_features if there is a component that returns both X and y 2358
    • Fixed partial dependence graph method failing on multiclass problems when the class labels are numeric 2372
    • Added thresholding_objective argument to AutoMLSearch for binary classification problems 2320
    • Added change for k_neighbors parameter in SMOTE Oversamplers to automatically handle small samples 2375
    • Changed naming for Logistic Regression Classifier file 2399
    • Pinned pytest-timeout to fix minimum dependence checker 2425
    • Replaced Elastic Net Classifier base class with Logistic Regression to avoid NaN outputs 2420
  • Changes
    • Cleaned up PipelineBase's component_graph and _component_graph attributes. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph 2332
    • Added and applied black linting package to the EvalML repo in place of autopep8 2306
    • Separated custom_hyperparameters from pipelines and added them as an argument to AutoMLSearch 2317
    • Replaced allowed_pipelines with allowed_component_graphs 2364
    • Removed private method _compute_features_during_fit from PipelineBase 2359
    • Updated compute_order in ComponentGraph to be a read-only property 2408
    • Unpinned PyZMQ version in requirements.txt 2389
    • Uncapping LightGBM version in requirements.txt 2405
    • Updated minimum version of plotly 2415
    • Removed SensitivityLowAlert objective from core objectives 2418
  • Documentation Changes
    • Fixed lead scoring weights in the demos documentation 2315
    • Fixed start page code and description dataset naming discrepancy 2370
  • Testing Changes
    • Update minimum unit tests to run on all pull requests 2314
    • Pass token to authorize uploading of codecov reports 2344
    • Add pytest-timeout. All tests that run longer than 6 minutes will fail. 2374
    • Separated the dask tests out into separate github action jobs to isolate dask failures. 2376
    • Refactored dask tests 2377
    • Added the combined dask/non-dask unit tests back and renamed the dask only unit tests. 2382
    • Sped up unit tests and split into separate jobs 2365
    • Change CI job names, run lint for python 3.9, run nightlies on python 3.8 at 3am EST 2395 2398
    • Set fail-fast to false for CI jobs that run for PRs 2402

Warning

Breaking Changes
  • AutoMLSearch will accept allowed_component_graphs instead of allowed_pipelines 2364
  • Removed PipelineBase's _component_graph attribute. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph 2332
  • pipeline_parameters will no longer accept skopt.space variables since hyperparameter ranges will now be specified through custom_hyperparameters 2317
v0.25.0 Jun. 01, 2021
  • Enhancements
    • Upgraded minimum woodwork to version 0.3.1. Previous versions will not be supported 2181
    • Added a new callback parameter for explain_predictions_best_worst 2308
  • Fixes
  • Changes
    • Deleted the return_pandas flag from our demo data loaders 2181
    • Moved default_parameters to ComponentGraph from PipelineBase 2307
  • Documentation Changes
    • Updated the release procedure documentation 2230
  • Testing Changes
    • Ignoring test_saving_png_file while building conda package 2323

Warning

Breaking Changes
  • Deleted the return_pandas flag from our demo data loaders 2181
  • Upgraded minimum woodwork to version 0.3.1. Previous versions will not be supported 2181
  • Due to the weak-ref in woodwork, set the result of infer_feature_types to a variable before accessing woodwork 2181
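The weak-reference note above can be illustrated with the standard library alone. This is a sketch under stated assumptions: `Frame`, `Accessor`, and `make_frame` are hypothetical stand-ins, not woodwork or EvalML classes, and the only idea borrowed from the notes is that an accessor holding a weak reference does not keep its data alive.

```python
import gc
import weakref


class Frame:
    """Hypothetical stand-in for a DataFrame-like object."""


class Accessor:
    """Hypothetical accessor that holds only a weak reference to its frame."""

    def __init__(self, frame):
        self._frame_ref = weakref.ref(frame)

    def frame(self):
        # Returns None once the referenced frame has been garbage collected.
        return self._frame_ref()


def make_frame():
    # Plays the role of a function that returns a fresh object.
    return Frame()


# The temporary result has no other owner, so it can be collected right away.
dangling = Accessor(make_frame())
gc.collect()

# Binding the result to a variable first keeps it alive for the accessor.
kept = make_frame()
safe = Accessor(kept)

print(dangling.frame() is None, safe.frame() is kept)
```

The same pattern is why the notes recommend assigning the result of infer_feature_types to a variable before using the accessor on it.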
v0.24.2 May. 24, 2021
  • Enhancements
    • Added oversamplers to AutoMLSearch 2213 2286
    • Added dictionary input functionality for Undersampler component 2271
    • Changed the default parameter values for Elastic Net Classifier and Elastic Net Regressor 2269
    • Added dictionary input functionality for the Oversampler components 2288
  • Fixes
    • Set default n_jobs to 1 for StackedEnsembleClassifier and StackedEnsembleRegressor until fix for text-based parallelism in sklearn stacking can be found 2295
  • Changes
    • Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter 2290
    • Refactored calculate_permutation_importance method and add per-column permutation importance method 2302
    • Updated logging information in AutoMLSearch.__init__ to clarify pipeline generation 2263
  • Documentation Changes
    • Minor changes to the release procedure 2230
  • Testing Changes
    • Use codecov action to update coverage reports 2238
    • Removed MarkupSafe dependency version pin from requirements.txt and moved instead into RTD docs build CI 2261

Warning

Breaking Changes
  • Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter 2290
  • Moved default_parameters to ComponentGraph from PipelineBase. A pipeline's default_parameters is now accessible via pipeline.component_graph.default_parameters 2307
v0.24.1 May. 16, 2021
  • Enhancements
    • Integrated ARIMARegressor into AutoML 2009
    • Updated HighlyNullDataCheck to also perform a null row check 2222
    • Set max_depth to 1 in calls to featuretools dfs 2231
  • Fixes
    • Removed data splitter sampler calls during training 2253
    • Set minimum required version for pyzmq, colorama, and docutils 2254
    • Changed BaseSampler to return None instead of y 2272
  • Changes
    • Removed ensemble split and indices in AutoMLSearch 2260
    • Updated pipeline repr() and generate_pipeline_code to return pipeline instances without generating custom pipeline class 2227
  • Documentation Changes
    • Capped Sphinx version under 4.0.0 2244
  • Testing Changes
    • Change number of cores for pytest from 4 to 2 2266
    • Add minimum dependency checker to generate minimum requirement files 2267
    • Add unit tests with minimum dependencies 2277
v0.24.0 May. 04, 2021
  • Enhancements
    • Added date_index as a required parameter for TimeSeries problems 2217
    • Have the OneHotEncoder return the transformed columns as booleans rather than floats 2170
    • Added Oversampler transformer component to EvalML 2079
    • Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio 2128
    • Updated prediction explanations functions to allow pipelines with XGBoost estimators 2162
    • Added partial dependence for datetime columns 2180
    • Update precision-recall curve with positive label index argument, and fix for 2d predicted probabilities 2090
    • Add pct_null_rows to HighlyNullDataCheck 2211
    • Added a standalone AutoML search method for convenience, which runs data checks and then runs automl 2152
    • Make the first batch of AutoML have a predefined order, with linear models first and complex models last 2223 2225
    • Added sampling dictionary support to BalancedClassificationSampler 2235
  • Fixes
    • Fixed partial dependence not respecting grid resolution parameter for numerical features 2180
    • Enable prediction explanations for catboost for multiclass problems 2224
  • Changes
    • Deleted baseline pipeline classes 2202
    • Reverting user specified date feature PR 2155 until pmdarima installation fix is found 2214
    • Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. 2091
    • Removed all old datasplitters from EvalML 2193
    • Deleted make_pipeline_from_components 2218
  • Documentation Changes
    • Renamed dataset to clarify that it's gzipped but not a tarball 2183
    • Updated documentation to use pipeline instances instead of pipeline subclasses 2195
    • Updated contributing guide with a note about GitHub Actions permissions 2090
    • Updated automl and model understanding user guides 2090
  • Testing Changes
    • Use machineFL user token for dependency update bot, and add more reviewers 2189

Warning

Breaking Changes
  • All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted 2202
  • Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}). 2091
  • Removed all old datasplitters from EvalML 2193
  • Deleted utility method make_pipeline_from_components 2218
v0.23.0 Apr. 20, 2021
  • Enhancements
    • Refactored EngineBase and SequentialEngine API. Added DaskEngine 1975
    • Added optional engine argument to AutoMLSearch 1975
    • Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch 2118
    • Added NaturalLanguageNaNDataCheck data check 2122
    • Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs 2120
    • Added standard deviation of cv scores to rankings table 2154
  • Fixes
    • Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority 2077
    • Fixed bug where two-way partial dependence plots with categorical variables were not working correctly 2117
    • Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components 2133
    • Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines 2133
    • Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components 2133
  • Changes
    • Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers 2113
    • Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS 2126
    • Modified one-way partial dependence plots of categorical features to display data with a bar plot 2117
    • Renamed score column for automl.rankings as mean_cv_score 2135
    • Remove 'warning' from docs tool output 2031
  • Documentation Changes
    • Fixed conf.py file 2112
    • Added a sentence to the automl user guide stating that our support for time series problems is still in beta. 2118
    • Fixed documentation demos 2139
    • Update test badge in README to use GitHub Actions 2150
  • Testing Changes
    • Fixed test_describe_pipeline for pandas v1.2.4 2129
    • Added a GitHub Action for building the conda package 1870 2148

Warning

Breaking Changes
  • Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler 2113
  • Deleted the "errors" key from automl results 1975
  • Deleted the raise_and_save_error_callback and the log_and_save_error_callback 1975
  • Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority 2077
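The switch to a minority:majority ratio is easiest to see with concrete counts. The sketch below is not EvalML's implementation. Both helper functions are hypothetical, and the way a target ratio is turned into a majority-class size is an assumption made for illustration.

```python
from collections import Counter


def minority_majority_ratio(labels):
    """Return minority count divided by majority count, a value in (0, 1]."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return min(counts.values()) / max(counts.values())


def majority_target_count(labels, sampling_ratio):
    """Largest majority size for which minority/majority >= sampling_ratio.

    Assumption: undersampling only ever removes majority-class rows.
    """
    counts = Counter(labels)
    minority = min(counts.values())
    majority = max(counts.values())
    return min(majority, int(minority / sampling_ratio))


labels = [0] * 900 + [1] * 100
ratio = minority_majority_ratio(labels)        # 100 / 900
target = majority_target_count(labels, 0.25)   # keep 400 majority rows
print(round(ratio, 3), target)
```

Under the older majority:majority-style reading the same data would be described as 9:1, which is why code that passes a ratio should be checked against this change.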
v0.22.0 Apr. 06, 2021
  • Enhancements
    • Added a GitHub Action for linux_unit_tests 2013
    • Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component 1989
    • Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning 2024
    • Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline 2033
    • Added sensitivity at low alert rates as an objective 2001
    • Added Undersampler transformer component 2030
  • Fixes
    • Updated Engine's train_batch to apply undersampling 2038
    • Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba 2040
    • Fixed data splitting errors if target is float for classification problems 2050
    • Pinned docutils to <0.17 to fix ReadtheDocs warning issues 2088
  • Changes
    • Removed lists as acceptable hyperparameter ranges in AutoMLSearch 2028
    • Renamed "details" to "metadata" for data check actions 2008
  • Documentation Changes
    • Catch and suppress warnings in documentation 1991 2097
    • Change spacing in start.ipynb to provide clarity for AutoMLSearch 2078
    • Fixed start code on README 2108
  • Testing Changes
v0.21.0 Mar. 24, 2021
  • Enhancements
    • Changed AutoMLSearch to default optimize_thresholds to True 1943
    • Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification 1775
    • Added params to balanced classification data splitters for visibility 1966
    • Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns 1967
    • Updated ClassImbalanceDataCheck to better handle multiclass imbalances 1986
    • Added recommended actions for the output of data check's validate method 1968
    • Added error message for partial_dependence when features are mostly the same value 1994
    • Updated OneHotEncoder to drop one redundant feature by default for features with two categories 1997
    • Added a PolynomialDetrender component 1992
    • Added DateTimeNaNDataCheck data check 2039
  • Fixes
    • Changed best pipeline to train on the entire dataset rather than just ensemble indices for ensemble problems 2037
    • Updated binary classification pipelines to use objective decision function during scoring of custom objectives 1934
  • Changes
    • Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch 1935
    • Deleted random_state argument 1985
    • Updated Woodwork version requirement to v0.0.11 1996
  • Documentation Changes
  • Testing Changes
    • Removed build_docs CI job in favor of RTD GH builder 1974
    • Added tests to confirm support for Python 3.9 1724
    • Added tests to support Dask AutoML/Engine 1990
    • Changed build_conda_pkg job to use latest_release_changes branch in the feedstock. 1979

Warning

Breaking Changes
  • Changed AutoMLSearch to default optimize_thresholds to True 1943
  • Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples. 1935
  • Deleted random_state argument 1985
v0.20.0 Mar. 10, 2021
  • Enhancements
    • Added a GitHub Action for Detecting dependency changes 1933
    • Create a separate CV split to train stacked ensembler on for AutoMLSearch 1814
    • Added a GitHub Action for Linux unit tests 1846
    • Added ARIMARegressor estimator 1894
    • Added DataCheckAction class and DataCheckActionCode enum 1896
    • Updated Woodwork requirement to v0.0.10 1900
    • Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch 1875
    • Update default classification data splitter to use downsampling for highly imbalanced data 1875
    • Updated describe_pipeline to return more information, including id of pipelines used for ensemble models 1909
    • Added utility method to create list of components from a list of DataCheckAction 1907
    • Updated validate method to include an action key in returned dictionary for all DataCheck and DataChecks 1916
    • Aggregating the shap values for predictions that we know the provenance of, e.g. OHE, text, and date-time. 1901
    • Improved error message when custom objective is passed as a string in pipeline.score 1941
    • Added score_pipelines and train_pipelines methods to AutoMLSearch 1913
    • Added support for pandas version 1.2.0 1708
    • Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine 1913
    • Added ability to handle index columns in AutoMLSearch and DataChecks 2138
  • Fixes
    • Removed CI check for check_dependencies_updated_linux 1950
    • Added metaclass for time series pipelines and fix binary classification pipeline predict not using objective if it is passed as a named argument 1874
    • Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names 1871
    • Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch 1932
    • Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes 1958
  • Changes
    • Reverted GitHub Action for Linux unit tests until a fix for report generation is found 1920
    • Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch 1891
    • Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios 1905
    • Deleted the explain_prediction function 1915
    • Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead 1928
    • Removed warning in InvalidTargetDataCheck returned when numeric binary classification targets are not (0, 1) 1959
  • Documentation Changes
    • Updated model_understanding.ipynb to demo the two-way partial dependence capability 1919
  • Testing Changes

Warning

Breaking Changes
  • Deleted the explain_prediction function 1915
  • Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead 1928
  • Added score_batch and train_batch abstract methods to EngineBase. These need to be implemented in Engine subclasses 1913
v0.19.0 Feb. 23, 2021
  • Enhancements
    • Added a GitHub Action for Python windows unit tests 1844
    • Added a GitHub Action for checking updated release notes 1849
    • Added a GitHub Action for Python lint checks 1837
    • Adjusted explain_prediction, explain_predictions and explain_predictions_best_worst to handle timeseries problems. 1818
    • Updated InvalidTargetDataCheck to check for mismatched indices in target and features 1816
    • Updated Woodwork structures returned from components to support Woodwork logical type overrides set by the user 1784
    • Updated estimators to keep track of input feature names during fit() 1794
    • Updated visualize_decision_tree to include feature names in output 1813
    • Added is_bounded_like_percentage property for objectives. If true, the calculate_percent_difference method will return the absolute difference rather than relative difference 1809
    • Added full error traceback to AutoMLSearch logger file 1840
    • Changed TargetEncoder to preserve custom indices in the data 1836
    • Refactored explain_predictions and explain_predictions_best_worst to only compute features once for all rows that need to be explained 1843
    • Added custom random undersampler data splitter for classification 1857
    • Updated OutliersDataCheck implementation to calculate the probability of having no outliers 1855
    • Added Engines pipeline processing API 1838
  • Fixes
    • Changed EngineBase random_state arg to random_seed and same for user guide docs 1889
  • Changes
    • Modified calculate_percent_difference so that division by 0 is now inf rather than nan 1809
    • Removed text_columns parameter from LSA and TextFeaturizer components 1652
    • Added random_seed as an argument to our automl/pipeline/component API. Using random_state will raise a warning 1798
    • Added DataCheckError message in InvalidTargetDataCheck if input target is None and removed exception raised 1866
  • Documentation Changes
  • Testing Changes
    • Added back coverage for _get_feature_provenance in TextFeaturizer after text_columns was removed 1842
    • Pin graphviz version for windows builds 1847
    • Unpin graphviz version for windows builds 1851

Warning

Breaking Changes
  • Added a deprecation warning to explain_prediction. It will be deleted in the next release. 1860
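The two calculate_percent_difference entries above (absolute difference for objectives with is_bounded_like_percentage, and inf rather than nan when the baseline is 0) can be illustrated with a small standalone sketch. This is not EvalML's implementation: the function name, its arguments, the tolerance, and the sign convention are assumptions made for illustration only.

```python
import math


def percent_difference_sketch(score, baseline, greater_is_better=True,
                              bounded_like_percentage=False):
    """Hypothetical stand-in for an objective's percent-difference logic."""
    if math.isnan(score) or math.isnan(baseline):
        return math.nan
    if math.isclose(score, baseline, abs_tol=1e-10):
        return 0.0
    if bounded_like_percentage:
        # Bounded objectives (e.g. scores in [0, 1]) report the absolute difference.
        change = abs(score - baseline)
    elif baseline == 0:
        # Division by zero is reported as inf rather than nan.
        return math.inf
    else:
        change = abs((score - baseline) / baseline)
    # Assumed sign convention: positive means the score beats the baseline.
    improved = (score > baseline) == greater_is_better
    return 100 * change if improved else -100 * change
```

With this sketch, a score of 0.9 against a baseline of 0.8 gives roughly 10 when bounded_like_percentage is true and roughly 12.5 otherwise.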
v0.18.2 Feb. 10, 2021
  • Enhancements
    • Added uniqueness score data check 1785
    • Added "dataframe" output format for prediction explanations 1781
    • Updated LightGBM estimators to handle pandas.MultiIndex 1770
    • Sped up permutation importance for some pipelines 1762
    • Added sparsity data check 1797
    • Confirmed support for threshold tuning for binary time series classification problems 1803
  • Fixes
  • Changes
  • Documentation Changes
    • Added section on conda to the contributing guide 1771
    • Updated release process to reflect freezing main before perf tests 1787
    • Moved some PRs to the correct section of the release notes 1789
    • Tweak README.md. 1800
    • Fixed back arrow on install page docs 1795
    • Fixed docstring for ClassImbalanceDataCheck.validate() 1817
  • Testing Changes
v0.18.1 Feb. 1, 2021
  • Enhancements
    • Added graph_t_sne as a visualization tool for high dimensional data 1731
    • Added the ability to see the linear coefficients of features in linear models 1738
    • Added support for scikit-learn v0.24.0 1733
    • Added support for scipy v1.6.0 1752
    • Added SVM Classifier and Regressor to estimators 1714 1761
  • Fixes
    • Addressed bug with partial_dependence and categorical data with more categories than grid resolution 1748
    • Removed random_state arg from get_pipelines in AutoMLSearch 1719
    • Pinned pyzmq at less than 22.0.0 till we add support 1756
  • Changes
    • Updated components and pipelines to return Woodwork data structures 1668
    • Updated clone() for pipelines and components to copy over random state automatically 1753
    • Dropped support for Python version 3.6 1751
    • Removed deprecated verbose flag from AutoMLSearch parameters 1772
  • Documentation Changes
    • Add Twitter and Github link to documentation toolbar 1754
    • Added Open Graph info to documentation 1758
  • Testing Changes

Warning

Breaking Changes
  • Components and pipelines return Woodwork data structures instead of pandas data structures 1668
  • Python 3.6 will not be actively supported due to discontinued support from EvalML dependencies.
  • Deprecated verbose flag is removed for AutoMLSearch 1772
v0.18.0 Jan. 26, 2021
  • Enhancements
    • Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in invalid_targets_data_check 1574
    • Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in invalid_targets_data_check 1665
    • Added time series support for make_pipeline 1566
    • Added target name for output of pipeline predict method 1578
    • Added multiclass check to InvalidTargetDataCheck for two examples per class 1596
    • Added support for graphviz v0.16 1657
    • Enhanced time series pipelines to accept empty features 1651
    • Added KNN Classifier to estimators. 1650
    • Added support for list inputs for objectives 1663
    • Added support for AutoMLSearch to handle time series classification pipelines 1666
    • Enhanced DelayedFeaturesTransformer to encode categorical features and targets before delaying them 1691
    • Added 2-way dependence plots. 1690
    • Added ability to directly iterate through components within Pipelines 1583
  • Fixes
    • Fixed inconsistent attributes and added Exceptions to docs 1673
    • Fixed TargetLeakageDataCheck to use Woodwork mutual_information rather than using Pandas' Pearson Correlation 1616
    • Fixed thresholding for pipelines in AutoMLSearch to only threshold binary classification pipelines 1622 1626
    • Updated load_data to return Woodwork structures and update default parameter value for index to None 1610
    • Pinned scipy at < 1.6.0 while we work on adding support 1629
    • Fixed data check message formatting in AutoMLSearch 1633
    • Addressed stacked ensemble component for scikit-learn v0.24 support by setting shuffle=True for default CV 1613
    • Fixed bug where Imputer reset the index on X 1590
    • Fixed AutoMLSearch stacktrace when a custom objective was passed in as a primary objective or additional objective 1575
    • Fixed custom index bug for MAPE objective 1641
    • Fixed index bug for TextFeaturizer and LSA components 1644
    • Limited load_fraud dataset loaded into automl.ipynb 1646
    • add_to_rankings updates AutoMLSearch.best_pipeline when necessary 1647
    • Fixed bug where time series baseline estimators were not receiving gap and max_delay in AutoMLSearch 1645
    • Fixed jupyter notebooks to help the RTD buildtime 1654
    • Added positive_only objectives to non_core_objectives 1661
    • Fixed stacking argument n_jobs for IterativeAlgorithm 1706
    • Updated CatBoost estimators to return self in .fit() rather than the underlying model for consistency 1701
    • Added ability to initialize pipeline parameters in AutoMLSearch constructor 1676
  • Changes
    • Added labeling to graph_confusion_matrix 1632
    • Rerunning search for AutoMLSearch results in a message thrown rather than failing the search, and removed has_searched property 1647
    • Changed tuner class to allow and ignore single parameter values as input 1686
    • Capped LightGBM version limit to remove bug in docs 1711
    • Removed support for np.random.RandomState in EvalML 1727
  • Documentation Changes
    • Update Model Understanding in the user guide to include visualize_decision_tree 1678
    • Updated docs to include information about AutoMLSearch callback parameters and methods 1577
    • Updated docs to prompt users to install graphviz on Mac 1656
    • Added infer_feature_types to the start.ipynb guide 1700
    • Added multicollinearity data check to API reference and docs 1707
  • Testing Changes

Warning

Breaking Changes
  • Removed has_searched property from AutoMLSearch 1647
  • Components and pipelines return Woodwork data structures instead of pandas data structures 1668
  • Removed support for np.random.RandomState in EvalML. Rather than passing np.random.RandomState as component and pipeline random_state values, we use int random_seed 1727
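One practical effect of switching from a generator object to an int random_seed is that any consumer can rebuild an identical generator from the seed. The sketch below shows that idea using only the standard library. The helper make_rng is hypothetical and is not part of EvalML.

```python
import random


def make_rng(random_seed=0):
    """Hypothetical helper: build a fresh generator from an int seed."""
    if not isinstance(random_seed, int):
        # Generator objects are rejected; only plain int seeds are accepted here.
        raise TypeError("random_seed must be an int")
    return random.Random(random_seed)


# Two components given the same seed draw the same sequence independently.
first = [make_rng(7).random() for _ in range(1)]
second = [make_rng(7).random() for _ in range(1)]
```

Because the seed is a plain value rather than shared mutable state, copying a component's seed is enough to reproduce its randomness.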
v0.17.0 Dec. 29, 2020
  • Enhancements
    • Added save_plot that allows for saving figures from different backends 1588
    • Added LightGBM Regressor to regression components 1459
    • Added visualize_decision_tree for tree visualization with decision_tree_data_from_estimator and decision_tree_data_from_pipeline to reformat tree structure output 1511
    • Added DFS Transformer component into transformer components 1454
    • Added MAPE to the standard metrics for time series problems and update objectives 1510
    • Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems 1483
    • Added a ComponentGraph class that will support future pipelines as directed acyclic graphs 1415
    • Updated data checks to accept Woodwork data structures 1481
    • Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values 1485
    • Added multicollinearity data check 1515
    • Added baseline pipeline and components for time series regression problems 1496
    • Added more information to users about ensembling behavior in AutoMLSearch 1527
    • Add woodwork support for more utility and graph methods 1544
    • Changed DateTimeFeaturizer to encode features as int 1479
    • Return trained pipelines from AutoMLSearch.best_pipeline 1547
    • Added utility method so that users can set feature types without having to learn about Woodwork directly 1555
    • Added Linear Discriminant Analysis transformer for dimensionality reduction 1331
    • Added multiclass support for partial_dependence and graph_partial_dependence 1554
    • Added TimeSeriesBinaryClassificationPipeline and TimeSeriesMulticlassClassificationPipeline classes 1528
    • Added make_data_splitter method for easier automl data split customization 1568
    • Integrated ComponentGraph class into Pipelines for full non-linear pipeline support 1543
    • Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard 1597
    • Update split_data helper args 1597
    • Add problem type utils is_regression, is_classification, is_timeseries 1597
    • Rename AutoMLSearch data_split arg to data_splitter 1569
  • Fixes
    • Fix AutoML not passing CV folds to DefaultDataChecks for usage by ClassImbalanceDataCheck 1619
    • Fix Windows CI jobs: install numba via conda, required for shap 1490
    • Added custom-index support for get_prediction_vs_actual_over_time_data 1494
    • Fix generate_pipeline_code to account for boolean and None differences between Python and JSON 1524 1531
    • Set max value for plotly and xgboost versions while we debug CI failures with newer versions 1532
    • Undo version pinning for plotly 1533
    • Fix ReadTheDocs build by updating the version of setuptools 1561
    • Set random_state of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits 1579
    • Pin sklearn version while we work on adding support 1594
    • Pin pandas at <1.2.0 while we work on adding support 1609
    • Pin graphviz at < 0.16 while we work on adding support 1609
  • Changes
    • Reverting save_graph 1550 to resolve kaleido build issues 1585
    • Update circleci badge to apply to main 1489
    • Added script to generate github markdown for releases 1487
    • Updated selection using pandas dtypes to selecting using Woodwork logical types 1551
    • Updated dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' error and to address Woodwork and Featuretools dependencies 1540
    • Made get_prediction_vs_actual_data() a public method 1553
    • Updated Woodwork version requirement to v0.0.7 1560
    • Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters 1597
    • Rename "# Testing" in automl log output to "# Validation" 1597
  • Documentation Changes
    • Added partial dependence methods to API reference 1537
    • Updated documentation for confusion matrix methods 1611
  • Testing Changes
    • Set n_jobs=1 in most unit tests to reduce memory 1505

Warning

Breaking Changes
  • Updated minimal dependencies: numpy>=1.19.1, pandas>=1.1.0, scikit-learn>=0.23.1, scikit-optimize>=0.8.1
  • Updated AutoMLSearch.best_pipeline to return a trained pipeline. Pass in train_best_pipeline=False to AutoMLSearch in order to return an untrained pipeline.
  • Pipeline component instances can no longer be iterated through using Pipeline.component_graph 1543
  • Update AutoMLSearch constructor to take training data instead of search and add_to_leaderboard 1597
  • Update split_data helper args 1597
  • Move data splitters from evalml.automl.data_splitters to evalml.preprocessing.data_splitters 1597
  • Rename AutoMLSearch data_split arg to data_splitter 1569
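The generate_pipeline_code fix listed under Fixes for this release concerns boolean and None values, which JSON and Python spell differently. A minimal standard-library sketch of that mismatch follows. The parameter dictionary is made up for illustration and does not reflect any real pipeline's parameters.

```python
import ast
import json

# Hypothetical parameters containing None and a boolean.
params = {"Imputer": {"fill_value": None}, "Estimator": {"warm_start": False}}

as_json = json.dumps(params)   # uses null / false
as_python = repr(params)       # uses None / False

# JSON text is not a valid Python literal because of null and false.
try:
    ast.literal_eval(as_json)
    json_is_valid_python = True
except (ValueError, SyntaxError):
    json_is_valid_python = False

# The repr() form round-trips back to the original dictionary.
round_tripped = ast.literal_eval(as_python)
```

Code generated from the JSON form would fail at import time, whereas the repr() form evaluates back to the same parameters.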
v0.16.1 Dec. 1, 2020
  • Enhancements
    • Pin woodwork version to v0.0.6 to avoid breaking changes 1484
    • Updated Woodwork to >=0.0.5 in core-requirements.txt 1473
    • Removed copy_dataframe parameter for Woodwork, updated Woodwork to >=0.0.6 in core-requirements.txt 1478
    • Updated detect_problem_type to use pandas.api.is_numeric_dtype 1476
  • Changes
    • Changed make clean to delete coverage reports as a convenience for developers 1464
    • Set n_jobs=-1 by default for stacked ensemble components 1472
  • Documentation Changes
    • Updated pipeline and component documentation and demos to use Woodwork 1466
  • Testing Changes
    • Update dependency update checker to use everything from core and optional dependencies 1480
v0.16.0 Nov. 24, 2020
  • Enhancements
    • Updated pipelines and make_pipeline to accept Woodwork inputs 1393
    • Updated components to accept Woodwork inputs 1423
    • Added ability to freeze hyperparameters for AutoMLSearch 1284
    • Added Target Encoder into transformer components 1401
    • Added callback for error handling in AutoMLSearch 1403
    • Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included 1365
    • The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. 1374
    • Added a problem type for time series regression 1386
    • Added an is_defined_for_problem_type method to ObjectiveBase 1386
    • Added a random_state parameter to make_pipeline_from_components function 1411
    • Added DelayedFeaturesTransformer 1396
    • Added a TimeSeriesRegressionPipeline class 1418
    • Removed core-requirements.txt from the package distribution 1429
    • Updated data check messages to include "code" and "details" fields 1451, 1462
    • Added a TimeSeriesSplit data splitter for time series problems 1441
    • Added a problem_configuration parameter to AutoMLSearch 1457
  • Fixes
    • Fixed IndexError raised in AutoMLSearch when ensembling = True but only one pipeline to iterate over 1397
    • Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch 1388
    • Updated enum classes to show possible enum values as attributes 1391
    • Updated calls to Woodwork's to_pandas() to to_series() and to_dataframe() 1428
    • Fixed bug in OHE where column names were not guaranteed to be unique 1349
    • Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target 1467
    • Fix SimpleImputer error which occurs when all features are bool type 1215
  • Changes
    • Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers 1377
    • Simplified and cleaned output for Code Generation 1371
    • Reverted changes from 1337 1409
    • Updated data checks to return dictionary of warnings and errors instead of a list 1448
    • Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) 1450
    • Update AutoMLSearch to default to max_batches=1 instead of max_iterations=5 1452
    • Updated _evaluate_pipelines to consolidate side effects 1410
  • Documentation Changes
    • Added description of CLA to contributing guide, updated description of draft PRs 1402
    • Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML 1412
    • Updated docstrings from np.array to np.ndarray 1417
    • Added section on stacking ensembles in AutoMLSearch documentation 1425
  • Testing Changes
    • Removed category_encoders from test-requirements.txt 1373
    • Tweak codecov.io settings again to avoid flakes 1413
    • Modified make lint to check notebook versions in the docs 1431
    • Modified make lint-fix to standardize notebook versions in the docs 1431
    • Use new version of pull request Github Action for dependency check 1443
    • Reduced number of workers for tests to 4 1447

Warning

Breaking Changes
  • The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features 1374
  • Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective 1319
  • Data checks now return a dictionary of warnings and errors instead of a list 1448
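The top_k change above (k features ranked by magnitude, rather than the k largest plus the k smallest values) can be shown with a short sketch. Both helper functions and the sample contribution values are hypothetical and are not taken from EvalML.

```python
def top_k_by_magnitude(contributions, k):
    """New behaviour (sketch): the k features with the largest absolute value."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]


def top_and_bottom_k(contributions, k):
    """Old behaviour (sketch): the k largest and the k smallest values, 2 * k rows."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k] + ranked[-k:]


# Made-up SHAP-style contributions for five features.
shap_like = {"age": 0.40, "income": -0.35, "tenure": 0.05,
             "clicks": -0.02, "visits": 0.10}

new_rows = top_k_by_magnitude(shap_like, 2)
old_rows = top_and_bottom_k(shap_like, 2)
```

In this example the magnitude ranking keeps only age and income, while the older selection returns four rows and includes the weak contributors visits and clicks.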
v0.15.0 Oct. 29, 2020
  • Enhancements
    • Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) 1134
    • Added stacked ensemble components to AutoMLSearch 1253
    • Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML 1255
    • Added graph_prediction_vs_actual in model_understanding for regression problems 1252
    • Added parameter to OneHotEncoder to select which features to encode 1249
    • Added percent-better-than-baseline for all objectives to automl.results 1244
    • Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch 1254
    • Added PCA Transformer component for dimensionality reduction 1270
    • Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance 1306
    • Updated AutoMLSearch to support Woodwork data structures 1299
    • Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks 1333
    • Make max_batches argument to AutoMLSearch.search public 1320
    • Added text support to automl search 1062
    • Added _pipelines_per_batch as a private argument to AutoMLSearch 1355
  • Fixes
    • Fixed ML performance issue with ordered datasets: always shuffle data in automl's default CV splits 1265
    • Fixed broken evalml info CLI command 1293
    • Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error 1302
    • Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError 1318
    • Added stacked ensemble estimators to evalml.pipelines.__init__ file 1326
    • Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column 1324
    • Fixed LightGBM warning messages during AutoMLSearch 1342
    • Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck 1346
    • Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index 1348
    • Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines 1321
  • Changes
    • Allow add_to_rankings to be called before AutoMLSearch is called 1250
    • Removed Graphviz from test-requirements to add to requirements.txt 1327
    • Removed max_pipelines parameter from AutoMLSearch 1264
    • Include editable installs in all install make targets 1335
    • Made pip dependencies featuretools and nlp_primitives core dependencies 1062
    • Removed PartOfSpeechCount from TextFeaturizer transform primitives 1062
    • Added warning for partial_dependency when the feature includes null values 1352
  • Documentation Changes
    • Fixed and updated code blocks in Release Notes 1243
    • Added DecisionTree estimators to API Reference 1246
    • Changed class inheritance display to flow vertically 1248
    • Updated cost-benefit tutorial to use a holdout/test set 1159
    • Added evalml info command to documentation 1293
    • Miscellaneous doc updates 1269
    • Removed conda pre-release testing from the release process document 1282
    • Updates to contributing guide 1310
    • Added Alteryx footer to docs with Twitter and Github link 1312
    • Added documentation for evalml installation for Python 3.6 1322
    • Added documentation changes to make the API Docs easier to understand 1323
    • Fixed documentation for feature_importance 1353
    • Added tutorial for running AutoML with text data 1357
    • Added documentation for woodwork integration with automl search 1361
  • Testing Changes
    • Added tests for jupyter_check to handle IPython 1256
    • Cleaned up make_pipeline tests to test for all estimators 1257
    • Added a test to check conda build after merge to main 1247
    • Removed unnecessary code in __main__.py that lacked codecov coverage 1293
    • Codecov: round coverage up instead of down 1334
    • Add DockerHub credentials to CI testing environment 1356
    • Add DockerHub credentials to conda testing environment 1363

Warning

Breaking Changes
  • Renamed LabelLeakageDataCheck to TargetLeakageDataCheck 1319
  • max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. 1264
  • AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (e.g. pandas or numpy) 1299
  • Make max_batches argument to AutoMLSearch.search public 1320
  • Removed unused argument feature_types from AutoMLSearch.search 1062
v0.14.1 Sep. 29, 2020
  • Enhancements
    • Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns 1150
    • Added get_feature_names on OneHotEncoder 1193
    • Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets 1194
    • Added LightGBM to AutoMLSearch 1199
    • Updated scikit-learn and scikit-optimize to use latest versions - 0.23.2 and 0.8.1 respectively 1141
    • Added __str__ and __repr__ for pipelines and components 1218
    • Included internal target check for both training and validation data in AutoMLSearch 1226
    • Added ProblemTypes.all_problem_types helper to get list of supported problem types 1219
    • Added DecisionTreeClassifier and DecisionTreeRegressor classes 1223
    • DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary 1167
    • Added first CV fold score as validation score in AutoMLSearch.rankings 1221
    • Updated flake8 configuration to enable linting on __init__.py files 1234
    • Refined make_pipeline_from_components implementation 1204
  • Fixes
    • Updated GitHub URL after migration to Alteryx GitHub org 1207
    • Changed Problem Type enum to be more similar to the string name 1208
    • Wrapped call to scikit-learn's partial dependence method in a try/finally block 1232
  • Changes
    • Added allow_writing_files as a named argument to CatBoost estimators. 1202
    • Added solver and multi_class as named arguments to LogisticRegressionClassifier 1202
    • Replaced pipeline's ._transform method to evaluate all the preprocessing steps of a pipeline with .compute_estimator_features 1231
    • Changed default large dataset train/test splitting behavior 1205
  • Documentation Changes
    • Included description of how to access the component instances and features for pipeline user guide 1163
    • Updated API docs to refer to target as "target" instead of "labels" for non-classification tasks and minor docs cleanup 1160
    • Added Class Imbalance Data Check to api_reference.rst 1190 1200
    • Added pipeline properties to API reference 1209
    • Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide 1222
    • Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition 1228
    • Added install documentation for libomp in order to use LightGBM on Mac 1233
    • Improved description of max_iterations in documentation 1212
    • Removed unused code from sphinx conf 1235
  • Testing Changes

Warning

Breaking Changes
  • DefaultDataChecks now accepts a problem_type parameter that must be specified 1167
  • Pipeline's ._transform method to evaluate all the preprocessing steps of a pipeline has been replaced with .compute_estimator_features 1231
  • get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances 1230
v0.13.2 Sep. 17, 2020
  • Enhancements
    • Added output_format field to explain predictions functions 1107
    • Modified get_objective and get_objectives to be able to return any objective in evalml.objectives 1132
    • Added a return_instance boolean parameter to get_objective 1132
    • Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold 1135
    • Added label encoder to LightGBM for binary classification 1152
    • Added labels for the row index of confusion matrix 1154
    • Added AutoMLSearch object as another parameter in search callbacks 1156
    • Added the corresponding probability threshold for each point displayed in graph_roc_curve 1161
    • Added __eq__ for ComponentBase and PipelineBase 1178
    • Added support for multiclass classification for roc_curve 1164
    • Added categories accessor to OneHotEncoder for listing the categories associated with a feature 1182
    • Added utility function to create pipeline instances from a list of component instances 1176
  • Fixes
    • Fixed XGBoost column names for partial dependence methods 1104
    • Removed dead code validating column type from TextFeaturizer 1122
    • Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column 1144
    • OneHotEncoder preserves the custom index in the input data 1146
    • Fixed representation for ModelFamily 1165
    • Removed duplicate nbsphinx dependency in dev-requirements.txt 1168
    • Users can now pass in any valid kwargs to all estimators 1157
    • Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class 1179
    • Removed LightGBM Estimator from AutoML models 1186
  • Changes
    • Pinned scikit-optimize version to 0.7.4 1136
    • Removed tqdm as a dependency 1177
    • Added lightgbm version 3.0.0 to latest_dependency_versions.txt 1185
    • Rename max_pipelines to max_iterations 1169
  • Documentation Changes
    • Fixed API docs for AutoMLSearch add_result_callback 1113
    • Added a step to our release process for pushing our latest version to conda-forge 1118
    • Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab 1145
    • Updated README.md example to load demo dataset 1151
    • Swapped mapping of breast cancer targets in model_understanding.ipynb 1170
  • Testing Changes
    • Added test confirming TextFeaturizer never outputs null values 1122
    • Changed Python version of Update Dependencies action to 3.8.x 1137
    • Fixed release notes check-in test for Update Dependencies actions 1172

Warning

Breaking Changes
  • get_objective will now return a class definition rather than an instance by default 1132
  • Deleted OPTIONS dictionary in evalml.objectives.utils.py 1132
  • If specifying an objective by string, the string must now match the objective's name field, case-insensitive 1132
  • Passing "Cost Benefit Matrix", "Fraud Cost", "Lead Scoring", "Mean Squared Log Error", "Recall", "Recall Macro", "Recall Micro", "Recall Weighted", or "Root Mean Squared Log Error" to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError 1132
  • Search callbacks start_iteration_callback and add_result_callback have changed to include a copy of the AutoMLSearch object as a third parameter 1156
  • Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines' input_feature_names 1179
  • Deleted empty base class CategoricalEncoder which OneHotEncoder component was inheriting from 1176
  • Results from roc_curve will now return as a list of dictionaries with each dictionary representing a class 1164
  • max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. 1169
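The get_objective changes above (a class is returned by default, an instance only on request, and string lookup is case-insensitive on the objective's name) can be sketched with a small registry. Everything in this block is a hypothetical stand-in, including the objective classes, the registry, the function name, and the exception type. It does not reproduce EvalML's code.

```python
class ObjectiveNotFoundSketch(Exception):
    """Hypothetical error for an unknown objective name."""


class Recall:
    name = "Recall"


class LogLossBinary:
    name = "Log Loss Binary"


# Keys are lower-cased so that lookup by name is case-insensitive.
_REGISTRY = {cls.name.lower(): cls for cls in (Recall, LogLossBinary)}


def get_objective_sketch(objective, return_instance=False):
    """Return the objective class by default, or an instance on request."""
    if not isinstance(objective, str):
        return objective  # already a class or instance: pass it through
    try:
        cls = _REGISTRY[objective.lower()]
    except KeyError:
        raise ObjectiveNotFoundSketch(objective) from None
    return cls() if return_instance else cls
```

For example, get_objective_sketch("log loss binary") returns the LogLossBinary class, and passing return_instance=True returns an instance of it.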
v0.13.1 Aug. 25, 2020
  • Enhancements
    • Added Cost-Benefit Matrix objective for binary classification 1038
    • Split fill_value into categorical_fill_value and numeric_fill_value for Imputer 1019
    • Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP 1016
    • Added new LSA component for text featurization 1022
    • Added guide on installing with conda 1041
    • Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds 1081
    • Standardized error when calling transform/predict before fit for pipelines 1048
    • Added percent_better_than_baseline to AutoML search rankings and full rankings table 1050
    • Added one-way partial dependence and partial dependence plots 1079
    • Added "Feature Value" column to prediction explanation reports. 1064
    • Added LightGBM classification estimator 1082, 1114
    • Added max_batches parameter to AutoMLSearch 1087
  • Fixes
    • Updated TextFeaturizer component to no longer require an internet connection to run 1022
    • Fixed non-deterministic element of TextFeaturizer transformations 1022
    • Added a StandardScaler to all ElasticNet pipelines 1065
    • Updated cost-benefit matrix to normalize score 1099
    • Fixed logic in calculate_percent_difference so that it can handle negative values 1100
  • Changes
    • Added needs_fitting property to ComponentBase 1044
    • Updated references to data types to use datatype lists defined in evalml.utils.gen_utils 1039
    • Remove maximum version limit for SciPy dependency 1051
    • Moved all_components and other component importers into runtime methods 1045
    • Consolidated graphing utility methods under evalml.utils.graph_utils 1060
    • Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA 1090
    • Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance 1097, 1103
  • Documentation Changes
    • Update setup.py URL to point to the github repo 1037
    • Added tutorial for using the cost-benefit matrix objective 1088
    • Updated model_understanding.ipynb to include documentation for using plotly on Jupyter Lab 1108
  • Testing Changes
    • Refactor CircleCI tests to use matrix jobs 1043
    • Added a test to check that all test directories are included in evalml package 1054

Warning

Breaking Changes
  • confusion_matrix and normalize_confusion_matrix have been moved to evalml.utils 1038
  • All graph utility methods previously under evalml.pipelines.graph_utils have been moved to evalml.utils.graph_utils 1060
v0.12.2 Aug. 6, 2020
  • Enhancements
    • Add save/load method to components 1023
    • Expose pickle protocol as optional arg to save/load 1023
    • Updated estimators used in AutoML to include ExtraTrees and ElasticNet estimators 1030
  • Fixes
  • Changes
    • Removed DeprecationWarning for SimpleImputer 1018
  • Documentation Changes
    • Add note about version numbers to release process docs 1034
  • Testing Changes
    • Test files are now included in the evalml package 1029
v0.12.0 Aug. 3, 2020
  • Enhancements
    • Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check 932
    • Added clear exception for regression pipelines if target datatype is string or categorical 960
    • Added target column names and class labels in predict and predict_proba output for pipelines 951
    • Added _compute_shap_values and normalize_values to pipelines/explanations module 958
    • Added explain_prediction feature which explains single predictions with SHAP 974
    • Added Imputer to allow different imputation strategies for numerical and categorical dtypes 991
    • Added support for configuring logfile path using env var, and don't create logger if there are filesystem errors 975
    • Updated catboost estimators' default parameters and automl hyperparameter ranges to speed up fit time 998
  • Fixes
    • Fixed ReadTheDocs warning failure regarding embedded gif 943
    • Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines 941
    • Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting 969, 994
    • Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional 976
    • Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns 991
    • Fixed UnboundLocalError for cv_pipeline when automl search errors 996
    • Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer 1009
  • Changes
    • Moved get_estimators to evalml.pipelines.components.utils 934
    • Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring 936
    • Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families 959
    • Renamed DateTimeFeaturization to DateTimeFeaturizer 977
    • Added check to stop search and raise an error if all pipelines in a batch return NaN scores 1015
  • Documentation Changes
    • Updated README.md 963
    • Reworded message when errors are returned from data checks in search 982
    • Added section on understanding model predictions with explain_prediction to User Guide 981
    • Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. 992
    • Added custom components section in user guide 993
    • Updated FAQ section formatting 997
    • Updated release process documentation 1003
  • Testing Changes
    • Moved predict_proba and predict tests regarding string / categorical targets to test_pipelines.py 972
    • Fixed dependency update bot by updating python version to 3.7 to avoid frequent github version updates 1002

Warning

Breaking Changes
  • get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) 934
  • Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. 936
  • evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families 959
  • TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component 976
  • Renamed DateTimeFeaturization to DateTimeFeaturizer 977
v0.11.2 July 16, 2020
  • Enhancements
    • Added NoVarianceDataCheck to DefaultDataChecks 893
    • Added text processing and featurization component TextFeaturizer 913, 924
    • Added additional checks to InvalidTargetDataCheck to handle invalid target data types 929
    • AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation 915
  • Fixes
    • Makes automl results a read-only property 919
  • Changes
    • Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() 904
    • Moved list_model_families to evalml.model_family.utils 903
    • Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements 898
    • Rename master branch to main 918
    • Add pypi release github action 923
    • Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar 921
    • Moved automl config checks previously in search() to init 933
  • Documentation Changes
    • Reorganized and rewrote documentation 937
    • Updated to use pydata sphinx theme 937
    • Updated docs to use release_notes instead of changelog 942
  • Testing Changes
    • Cleaned up fixture names and usages in tests 895

Warning

Breaking Changes
  • list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) 903
  • get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) 934
  • Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase 904
  • all_pipelines() and get_pipelines() utility methods have been removed 904
v0.11.0 June 30, 2020
  • Enhancements
    • Added multiclass support for ROC curve graphing 832
    • Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold 834
    • Added data check to check for problematic target labels 814
    • Added PerColumnImputer that allows imputation strategies per column 824
    • Added transformer to drop specific columns 827
    • Added support for categories, handle_error, and drop parameters in OneHotEncoder 830, 897
    • Added preprocessing component to handle DateTime columns featurization 838
    • Added ability to clone pipelines and components 842
    • Define getter method for component parameters 847
    • Added utility methods to calculate and graph permutation importances 860, 880
    • Added new utility functions necessary for generating dynamic preprocessing pipelines 852
    • Added kwargs to all components 863
    • Updated AutoSearchBase to use dynamically generated preprocessing pipelines 870
    • Added SelectColumns transformer 873
    • Added ability to evaluate additional pipelines for automl search 874
    • Added default_parameters class property to components and pipelines 879
    • Added better support for disabling data checks in automl search 892
    • Added ability to save and load AutoML objects to file 888
    • Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance 876
    • Saved learned binary classification thresholds in automl results cv data dict 876
  • Fixes
    • Fixed bug where SimpleImputer cannot handle dropped columns 846
    • Fixed bug where PerColumnImputer cannot handle dropped columns 855
    • Enforce requirement that builtin components save all inputted values in their parameters dict 847
    • Don't list base classes in all_components output 847
    • Standardize all components to output pandas data structures, and accept either pandas or numpy 853
    • Fixed rankings and full_rankings error when search has not been run 894
  • Changes
    • Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them 849
    • Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance 850
    • Refactor "blacklist"/"whitelist" to "allow"/"exclude" lists 854
    • Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch 871
    • Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) 883
    • Updated automl default data splitter to train/validation split for large datasets 877
    • Added open source license, update some repo metadata 887
    • Removed dead code in _get_preprocessing_components 896
  • Documentation Changes
    • Fix some typos and update the EvalML logo 872
  • Testing Changes
    • Update the changelog check job to expect the new branching pattern for the deps update bot 836
    • Check that all components output pandas datastructures, and can accept either pandas or numpy 853
    • Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch 871

Warning

Breaking Changes
  • Pipelines' static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances 850
  • Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances 850
  • Renamed automl's cv argument to data_split 877
  • Pipelines' and classifiers' feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance 883
  • Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. 892
  • Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. 870
  • Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold 876
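The component_graph change above (entries must be ComponentBase subclasses or str, not instances) can be pictured with the minimal sketch below. It is not EvalML's implementation: the classes are local stand-ins, and `resolve_component_class` and `_REGISTRY` are hypothetical names used only to show the idea of standardizing an entry to a class.

```python
# Hypothetical stand-ins; these are NOT imported from EvalML.
class ComponentBase:
    name = None


class Imputer(ComponentBase):
    name = "Imputer"


class Scaler(ComponentBase):
    name = "Standard Scaler"


# Assumed lookup table from component name to component class.
_REGISTRY = {cls.name: cls for cls in (Imputer, Scaler)}


def resolve_component_class(component):
    """Return a ComponentBase subclass for a str name or a subclass; reject instances."""
    if isinstance(component, str):
        if component not in _REGISTRY:
            raise ValueError(f"Unknown component name: {component!r}")
        return _REGISTRY[component]
    if isinstance(component, type) and issubclass(component, ComponentBase):
        return component
    raise ValueError("Expected a component name or a ComponentBase subclass, not an instance")


# A component graph in the new style: names and classes only.
component_graph = ["Imputer", Scaler]
resolved = [resolve_component_class(c) for c in component_graph]
```

Keeping classes rather than instances in the graph means nothing is constructed until the pipeline is instantiated with its own parameters.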
v0.10.0 May 29, 2020
  • Enhancements
    • Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML 746
    • Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes 745
    • Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists 779
    • Add Elastic Net as a pipeline option 812
    • Added new Pipeline option ExtraTrees 790
    • Added precision-recall curve metrics and plot for binary classification problems in evalml.pipelines.graph_utils 794
    • Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there 793
    • Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase 793
  • Fixes
    • Update pipeline score to return nan score for any objective which throws an exception during scoring 787
    • Fixed bug introduced in 787 where binary classification metrics requiring predicted probabilities error in scoring 798
    • CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 795
  • Changes
    • Cleanup pipeline score code, and cleanup codecov 711
    • Remove pass for abstract methods for codecov 730
    • Added __str__ for AutoSearch object 675
    • Add util methods to graph ROC and confusion matrix 720
    • Refactor AutoBase to AutoSearchBase 758
    • Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML 765
    • Updated our logger to use Python's logging utils 763
    • Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate 762
    • Port over all guardrails to use the new DataCheck API 789
    • Expanded import_or_raise to catch all exceptions 759
    • Adds RMSE, MSLE, RMSLE as standard metrics 788
    • Don't allow Recall to be used as an objective for AutoML 784
    • Removed feature selection from pipelines 819
    • Update default estimator parameters to make automl search faster and more accurate 793
  • Documentation Changes
    • Add instructions to freeze master on release.md 726
    • Update release instructions with more details 727, 733
    • Add objective base classes to API reference 736
    • Fix components API to match other modules 747
  • Testing Changes
    • Delete codecov yml, use codecov.io's default 732
    • Added unit tests for fraud cost, lead scoring, and standard metric objectives 741
    • Update codecov client 782
    • Updated AutoBase __str__ test to include no parameters case 783
    • Added unit tests for ExtraTrees pipeline 790
    • If codecov fails to upload, fail build 810
    • Updated Python version of dependency action 816
    • Update the dependency update bot to use a suffix when creating branches 817

Warning

Breaking Changes
  • The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter 765
  • Moved ROC and confusion matrix methods from evalml.pipelines.plot_utils to evalml.pipelines.graph_utils 720
  • Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition 779
  • Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists 779
  • PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list 779
  • All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks 789
  • Recall disallowed as an objective for AutoML 784
  • AutoSearchBase parameter tuner has been renamed to tuner_class 793
  • AutoSearchBase parameter possible_pipelines and possible_model_families have been renamed to allowed_pipelines and allowed_model_families 793
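As a rough illustration of the Tuner change above, the sketch below contrasts a flat parameter list with a pipeline parameters dict keyed by component. The component names, parameter names, `PARAMETER_ORDER`, and `to_pipeline_parameters` are all hypothetical and are not part of EvalML's API.

```python
# Hypothetical illustration only: none of these names come from EvalML.
from typing import Any, Dict, List, Tuple

# Which (component, parameter) each slot of the flat list refers to.
PARAMETER_ORDER: List[Tuple[str, str]] = [
    ("Imputer", "strategy"),
    ("Random Forest", "n_estimators"),
    ("Random Forest", "max_depth"),
]


def to_pipeline_parameters(flat_values: List[Any]) -> Dict[str, Dict[str, Any]]:
    """Group a flat list of values into a {component: {parameter: value}} dict."""
    if len(flat_values) != len(PARAMETER_ORDER):
        raise ValueError("expected exactly one value per known parameter")
    nested: Dict[str, Dict[str, Any]] = {}
    for (component, parameter), value in zip(PARAMETER_ORDER, flat_values):
        nested.setdefault(component, {})[parameter] = value
    return nested


flat = ["mean", 100, 6]                # old style: meaning depends on position
params = to_pipeline_parameters(flat)  # new style: meaning is explicit in the keys
```

The dict form makes it clear which component owns each value, which a flat list can only convey through ordering.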
v0.9.0 Apr. 27, 2020
  • Enhancements
    • Added Accuracy as a standard objective 624
    • Added verbose parameter to load_fraud 560
    • Added Balanced Accuracy metric for binary, multiclass 612, 661
    • Added XGBoost regressor and XGBoost regression pipeline 666
    • Added Accuracy metric for multiclass 672
    • Added objective name in AutoBase.describe_pipeline 686
    • Added DataCheck and DataChecks, Message classes and relevant subclasses 739
  • Fixes
    • Removed direct access to cls.component_graph 595
    • Add testing files to .gitignore 625
    • Remove circular dependencies from Makefile 637
    • Add error case for normalize_confusion_matrix() 640
    • Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < 659
    • Update make_pipeline_graph to not accidentally create empty file when testing if path is valid 649
    • Fix pip installation warning about docutils version, from boto dependency 664
    • Removed zero division warning for F1/precision/recall metrics 671
    • Fixed summary for pipelines without estimators 707
  • Changes
    • Updated default objective for binary/multiclass classification to log loss 613
    • Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes 405
    • Changed the output of score to return one dictionary 429
    • Created binary and multiclass objective subclasses 504
    • Updated objectives API 445
    • Removed call to get_plot_data from AutoML 615
    • Set raise_error to default to True for AutoML classes 638
    • Remove unnecessary "u" prefixes on some unicode strings 641
    • Changed one-hot encoder to return uint8 dtypes instead of ints 653
    • Pipeline _name field changed to custom_name 650
    • Removed graphs.py and moved methods into PipelineBase 657, 665
    • Remove s3fs as a dev dependency 664
    • Changed requirements-parser to be a core dependency 673
    • Replace supported_problem_types field on pipelines with problem_type attribute on base classes 678
    • Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all 682
    • Update ModelFamily values: don't list xgboost/catboost as classifiers now that we have regression pipelines for them 677
    • Changed AutoML's describe_pipeline to get problem type from pipeline instead 685
    • Standardize import_or_raise error messages 683
    • Updated argument order of objectives to align with sklearn's 698
    • Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances 700
    • Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils 704
    • Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme 715
  • Documentation Changes
    • Fixed some sphinx warnings 593
    • Fixed docstring for AutoClassificationSearch with correct command 599
    • Limit readthedocs formats to pdf, not htmlzip and epub 594, 600
    • Clean up objectives API documentation 605
    • Fixed function on Exploring search results page 604
    • Update release process doc 567
    • AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference 651
    • Fixed improperly formatted code in breaking changes for changelog 655
    • Added configuration to treat Sphinx warnings as errors 660
    • Removed separate plotting section for pipelines in API reference 657, 665
    • Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency 664
    • Categorized components in API reference and added descriptions for each category 663
    • Fixed Sphinx warnings about BalancedAccuracy objective 669
    • Updated API reference to include missing components and clean up pipeline docstrings 689
    • Reorganize API ref, and clarify pipeline sub-titles 688
    • Add and update preprocessing utils in API reference 687
    • Added inheritance diagrams to API reference 695
    • Documented which default objective AutoML optimizes for 699
    • Create separate install page 701
    • Include more utils in API ref, like import_or_raise 704
    • Add more color to pipeline documentation 705
  • Testing Changes
    • Matched install commands of check_latest_dependencies test and its GitHub action 578
    • Added GitHub app to auto assign PR author as assignee 477
    • Removed unneeded conda installation of xgboost in windows checkin tests 618
    • Update graph tests to always use tmpfile dir 649
    • Changelog checkin test workaround for release PRs: If 'future release' section is empty of PR refs, pass check 658
    • Add changelog checkin test exception for dep-update branch 723

Warning

Breaking Changes

  • Pipelines will now no longer take an objective parameter during instantiation, and will no longer have an objective attribute.
  • fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
  • score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline's objective was scored on regardless.
  • score() will now return one dictionary of all objective scores.
  • ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by 615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in 704
  • normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils 704
  • Pipelines' _name field changed to custom_name
  • Pipelines' supported_problem_types field is removed because it is no longer necessary 678
  • Updated argument order of objectives' objective_function to align with sklearn 698
  • pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in 700
  • Removed unsupported MSLE objective 704
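The score() behavior described above (a required list of objectives in, one dictionary of scores out) can be sketched in plain Python as follows. This is not EvalML code: `accuracy`, `recall`, and `score_all` are hypothetical helpers, and the only outside fact relied on is that scikit-learn metrics take their arguments in the order (y_true, y_pred).

```python
# Hypothetical sketch; these helpers are not EvalML functions.
from typing import Callable, Dict, Sequence


def accuracy(y_true: Sequence[int], y_predicted: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_predicted) if t == p)
    return matches / len(y_true)


def recall(y_true: Sequence[int], y_predicted: Sequence[int]) -> float:
    """Fraction of true positives (label 1) that were predicted as 1."""
    predictions_for_positives = [p for t, p in zip(y_true, y_predicted) if t == 1]
    return sum(1 for p in predictions_for_positives if p == 1) / len(predictions_for_positives)


def score_all(
    y_true: Sequence[int],
    y_predicted: Sequence[int],
    objectives: Dict[str, Callable[[Sequence[int], Sequence[int]], float]],
) -> Dict[str, float]:
    """Evaluate every requested objective and return a single name -> score dict."""
    return {name: fn(y_true, y_predicted) for name, fn in objectives.items()}


y_true = [1, 0, 1, 1]
y_predicted = [1, 0, 0, 1]
scores = score_all(y_true, y_predicted, {"accuracy": accuracy, "recall": recall})
```

Every objective here uses the same (y_true, y_predicted) argument order, so a scikit-learn metric could be dropped into the same dict without an adapter.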
v0.8.0 Apr. 1, 2020
  • Enhancements
    • Add normalization option and information to confusion matrix 484
    • Add util function to drop rows with NaN values 487
    • Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property 491
    • Added access to parameters in Pipelines with PipelineBase.parameters (previously returned by PipelineBase.describe) 501
    • Added fill_value parameter for SimpleImputer 509
    • Added functionality to override component hyperparameters and made pipelines take hyperparameters from components 516
    • Allow numpy.random.RandomState for random_state parameters 556
  • Fixes
    • Removed unused dependency matplotlib, and move category_encoders to test reqs 572
  • Changes
    • Undo version cap in XGBoost placed in 402 and allowed all releases of XGBoost 407
    • Support pandas 1.0.0 486
    • Made all references to the logger static 503
    • Refactored model_type parameter for components and pipelines to model_family 507
    • Refactored problem_types for pipelines and components into supported_problem_types 515
    • Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load 526
    • Limit number of categories encoded by OneHotEncoder 517
  • Documentation Changes
    • Updated API reference to remove PipelinePlot and add the relocated PipelineBase plotting methods 483
    • Add code style and GitHub issue guides 463, 512
    • Updated API reference to surface class variables for pipelines and components 537
    • Fixed README documentation link 535
    • Unhid PR references in changelog 656
  • Testing Changes
    • Added automated dependency check PR 482, 505
    • Updated automated dependency check comment 497
    • Have build_docs job use python executor, so that env vars are set properly 547
    • Added simple test to make sure OneHotEncoder's top_n works with large number of categories 552
    • Run windows unit tests on PRs 557

Warning

Breaking Changes

  • AutoClassificationSearch and AutoRegressionSearch's model_types parameter has been refactored into allowed_model_families
  • ModelTypes enum has been changed to ModelFamily
  • Components and Pipelines now have a model_family field instead of model_type
  • get_pipelines utility function now accepts model_families as an argument instead of model_types
  • PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
  • PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
  • pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load
v0.7.0 Mar. 9, 2020
  • Enhancements
    • Added emacs buffers to .gitignore 350
    • Add CatBoost (gradient-boosted trees) classification and regression components and pipelines 247
    • Added Tuner abstract base class 351
    • Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch 403
    • Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn's 426
    • Added PipelineBase .graph and .feature_importance_graph methods, moved from previous location 423
    • Added support for python 3.8 462
  • Fixes
    • Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives 276
    • Fixed ReadtheDocs FileNotFoundError exception for fraud dataset 439
  • Changes
    • Added n_estimators as a tunable parameter for XGBoost 307
    • Remove unused parameter ObjectiveBase.fit_needs_proba 320
    • Remove extraneous parameter component_type from all components 361
    • Remove unused rankings.csv file 397
    • Downloaded demo and test datasets so unit tests can run offline 408
    • Remove _needs_fitting attribute from Components 398
    • Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all 413
    • Refactored PipelineBase to take in parameter dictionary and moved pipeline metadata to class attribute 421
    • Dropped support for Python 3.5 438
    • Removed unused apply.py file 449
    • Clean up requirements.txt to remove unused deps 451
    • Support installation without all required dependencies 459
  • Documentation Changes
    • Update release.md with instructions to release to internal license key 354
  • Testing Changes
    • Added tests for utils (and moved current utils to gen_utils) 297
    • Moved XGBoost install into its own separate step on Windows using Conda 313
    • Rewind pandas version to before 1.0.0, to diagnose test failures for that version 325
    • Added dependency update checkin test 324
    • Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version 402
    • Update dependency check to use a whitelist 417
    • Update unit test jobs to not install dev deps 455

Warning

Breaking Changes

  • Python 3.5 will not be actively supported.
v0.6.0 Dec. 16, 2019
  • Enhancements
    • Added ability to create a plot of feature importances 133
    • Add early stopping to AutoML using patience and tolerance parameters 241
    • Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class 242
    • Enhanced AutoML results with search order 260
    • Added utility function to show system and environment information 300
  • Fixes
    • Lower botocore requirement 235
    • Fixed decision_function calculation for FraudCost objective 254
    • Fixed return value of Recall metrics 264
    • Components return self on fit 289
  • Changes
    • Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch 287
    • Updating demo datasets to retain column names 223
    • Moving pipeline visualization to PipelinePlot class 228
    • Standardizing inputs as pd.DataFrame / pd.Series 130
    • Enforcing that pipelines must have an estimator as last component 277
    • Added ipywidgets as a dependency in requirements.txt 278
    • Added Random and Grid Search Tuners 240
  • Documentation Changes
    • Adding class properties to API reference 244
    • Fix and filter FutureWarnings from scikit-learn 249, 257
    • Adding Linear Regression to API reference and cleaning up some Sphinx warnings 227
  • Testing Changes
    • Added support for testing on Windows with CircleCI 226
    • Added support for doctests 233

Warning

Breaking Changes

  • The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
  • AutoClassifier has been renamed to AutoClassificationSearch
  • AutoRegressor has been renamed to AutoRegressionSearch
  • AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results gives access to a dictionary identical to the old .results dictionary, while search_order returns a list of pipeline_id values in the order they were searched.
  • Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
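The new shape of .results can be pictured with an ordinary dictionary. The sketch below is hand-built for illustration: only the two top-level keys, pipeline_results and search_order, come from the note above, while the per-pipeline fields and scores are invented placeholders.

```python
# Hand-built stand-in for the .results structure; the inner fields are hypothetical.
results = {
    "pipeline_results": {
        0: {"pipeline_name": "hypothetical pipeline A", "score": 0.81},
        1: {"pipeline_name": "hypothetical pipeline B", "score": 0.86},
        2: {"pipeline_name": "hypothetical pipeline C", "score": 0.79},
    },
    # pipeline_id values in the order the search evaluated them
    "search_order": [0, 1, 2],
}

# Walk the per-pipeline entries in the order they were searched.
ordered = [results["pipeline_results"][pid] for pid in results["search_order"]]

# Code written against the old flat .results dict can keep working by
# reading results["pipeline_results"] instead.
old_style_results = results["pipeline_results"]
best_id = max(old_style_results, key=lambda pid: old_style_results[pid]["score"])
```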
v0.5.2 Nov. 18, 2019
  • Enhancements
    • Adding basic pipeline structure visualization 211
  • Documentation Changes
    • Added notebooks to build process 212
v0.5.1 Nov. 15, 2019
  • Enhancements
    • Added basic outlier detection guardrail 151
    • Added basic ID column guardrail 135
    • Added support for unlimited pipelines with a max_time limit 70
    • Updated .readthedocs.yaml to successfully build 188
  • Fixes
    • Removed MSLE from default additional objectives 203
    • Fixed random_state passed in pipelines 204
    • Fixed slowdown in RFRegressor 206
  • Changes
    • Pulled information for describe_pipeline from pipeline's new describe method 190
    • Refactored pipelines 108
    • Removed guardrails from Auto(*) 202, 208
  • Documentation Changes
    • Updated documentation to show max_time enhancements 189
    • Updated release instructions for RTD 193
    • Added notebooks to build process 212
    • Added contributing instructions 213
    • Added new content 222
v0.5.0 Oct. 29, 2019
  • Enhancements
    • Added basic one hot encoding 73
    • Use enums for model_type 110
    • Support for splitting regression datasets 112
    • Auto-infer multiclass classification 99
    • Added support for other units in max_time 125
    • Detect highly null columns 121
    • Added additional regression objectives 100
    • Show an interactive iteration vs. score plot when using fit() 134
  • Fixes
    • Reordered describe_pipeline 94
    • Added type check for model_type 109
    • Fixed s units when setting string max_time 132
    • Fix objectives not appearing in API documentation 150
  • Changes
    • Reorganized tests 93
    • Moved logging to its own module 119
    • Show progress bar history 111
    • Using cloudpickle instead of pickle to allow unloading of custom objectives 113
    • Removed render.py 154
  • Documentation Changes
    • Update release instructions 140
    • Include additional_objectives parameter 124
    • Added Changelog 136
  • Testing Changes
    • Code coverage 90
    • Added CircleCI tests for other Python versions 104
    • Added doc notebooks as tests 139
    • Test metadata for CircleCI and 2 core parallelism 137
v0.4.1 Sep. 16, 2019
  • Enhancements
    • Added AutoML for classification and regression using Autobase and Skopt 7, 9
    • Implemented standard classification and regression metrics 7
    • Added logistic regression, random forest, and XGBoost pipelines 7
    • Implemented support for custom objectives 15
    • Feature importance for pipelines 18
    • Serialization for pipelines 19
    • Allow fitting on objectives for optimal threshold 27
    • Added detect label leakage 31
    • Implemented callbacks 42
    • Allow for multiclass classification 21
    • Added support for additional objectives 79
  • Fixes
    • Fixed feature selection in pipelines 13
    • Made random_seed usage consistent 45
  • Documentation Changes
    • Added docstrings 6
    • Created notebooks for docs 6
    • Initialized readthedocs EvalML 6
    • Added favicon 38
  • Testing Changes
    • Added testing for loading data 39
v0.2.0 Aug. 13, 2019
  • Enhancements
    • Created fraud detection objective 4
v0.1.0 July 31, 2019
  • First Release
  • Enhancements
    • Added lead scoring objective 1
    • Added basic classifier 1
  • Documentation Changes
    • Initialized Sphinx for docs 1