Releases: alteryx/evalml

v0.12.0.dev1

06 Aug 13:42
Pre-release

Publishing a new package to TestPyPi that has unit tests included.

v0.12.0

03 Aug 18:58
21eaa34

v0.12.0 Aug. 3, 2020

Enhancements

  • Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
  • Added clear exception for regression pipelines if target datatype is string or categorical #960
  • Added target column names and class labels in predict and predict_proba output for pipelines #951
  • Added _compute_shap_values and normalize_values to pipelines/explanations module #958
  • Added explain_prediction feature which explains single predictions with SHAP #974
  • Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
  • Added support for configuring logfile path using env var, and don't create logger if there are filesystem errors #975
  • Updated catboost estimators' default parameters and automl hyperparameter ranges to speed up fit time #998
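
The logfile behavior from #975 can be pictured with a small standard-library sketch. It is not evalml's implementation: the environment variable name EXAMPLE_LOG_FILE and the function build_logger are hypothetical and exist only for this illustration.

```python
import logging
import os

# Hypothetical variable name, chosen for this illustration only.
LOG_PATH_ENV_VAR = "EXAMPLE_LOG_FILE"


def build_logger(name, default_path="debug.log"):
    """Return a logger; attach a file handler only if the log file can be opened."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # The environment variable, when set, overrides the default location.
    path = os.environ.get(LOG_PATH_ENV_VAR, default_path)
    try:
        handler = logging.FileHandler(path)
    except OSError as err:
        # Filesystem problem (missing directory, read-only location, ...):
        # carry on without a log file instead of failing.
        logger.warning("Could not open log file %s: %s", path, err)
        return logger
    logger.addHandler(handler)
    return logger
```

With the variable pointing at a writable path the logger writes there; with an unusable path the logger still works but has no file handler.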

Fixes

  • Fixed ReadtheDocs warning failure regarding embedded gif #943
  • Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
  • Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
  • Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
  • Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
  • Fixed UnboundLocalError for cv_pipeline when automl search errors #996
  • Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
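
One common way to get a uniform error for calls made before fitting (#969, #994) is a small decorator. The sketch below shows the pattern only; NotFittedError, requires_fit and MeanPredictor are hypothetical names, not evalml classes.

```python
import functools


class NotFittedError(Exception):
    """Hypothetical error type used by this sketch."""


def requires_fit(method):
    """Raise a uniform error if the method is called before fit()."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "_is_fitted", False):
            raise NotFittedError(
                f"Call fit() before calling {method.__name__}()."
            )
        return method(self, *args, **kwargs)
    return wrapper


class MeanPredictor:
    """Toy estimator that always predicts the training mean."""

    def fit(self, y):
        self._mean = sum(y) / len(y)
        self._is_fitted = True
        return self

    @requires_fit
    def predict(self, n_rows):
        return [self._mean] * n_rows
```

Calling MeanPredictor().predict(2) raises NotFittedError, while MeanPredictor().fit([1, 2, 3]).predict(2) returns the mean twice.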

Changes

  • Moved get_estimators to evalml.pipelines.components.utils #934
  • Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
  • Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
  • Renamed DateTimeFeaturization to DateTimeFeaturizer #977

Documentation Changes

  • Update README.md #963
  • Reworded message when errors are returned from data checks in search #982
  • Added section on understanding model predictions with explain_prediction to User Guide #981
  • Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
  • Added custom components section in user guide #993
  • Update FAQ section formatting #997
  • Update release process documentation #1003

Testing Changes

  • Moved predict_proba and predict tests regarding string / categorical targets to test_pipelines.py #972
  • Fix dependency update bot by updating python version to 3.7 to avoid frequent github version updates #1002

Breaking Changes

  • get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
  • Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
  • evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
  • TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
  • Renamed DateTimeFeaturization to DateTimeFeaturizer #977
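
Code that has to run against versions on both sides of a move such as #934 can try the new location first and fall back to the old one. The helper below is a hypothetical convenience, not part of evalml; only the two module paths come from the notes above.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    errors = []
    for module_path, attr in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr)
        except (ImportError, AttributeError) as err:
            errors.append(f"{module_path}.{attr}: {err}")
    raise ImportError(
        "none of the candidates could be imported:\n" + "\n".join(errors)
    )


# New location first, previous location second (paths as listed in #934).
GET_ESTIMATORS_LOCATIONS = [
    ("evalml.pipelines.components.utils", "get_estimators"),
    ("evalml.pipelines.utils", "get_estimators"),
]

# Usage, with evalml installed:
# get_estimators = import_first(GET_ESTIMATORS_LOCATIONS)
```

The same list-of-locations approach would apply to the list_model_families move in #959, with the paths adjusted accordingly.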

v0.11.2

16 Jul 20:50
7771cf2

v0.11.2 July 16, 2020

Enhancements

  • Added NoVarianceDataCheck to DefaultDataChecks #893
  • Added text processing and featurization component TextFeaturizer #913, #924
  • Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929

Fixes

  • Made automl results a read-only property #919

Changes

  • Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
  • Moved list_model_families to evalml.model_family.utils #903
  • Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
  • Rename master branch to main #918
  • Add pypi release github action #923
  • Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
  • Moved automl config checks previously in search() to init #933

Documentation Changes

  • Reorganized and rewrote documentation #937
  • Updated to use pydata sphinx theme #937

Testing Changes

  • Cleaned up fixture names and usages in tests #895

Breaking Changes

  • list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
  • Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
  • all_pipelines() and get_pipelines() utility methods have been removed #904

v0.11.dev1 July 10, 2020

10 Jul 22:25

A development release to check pypi github action deployment to test.pypi.org.

v0.11.0

30 Jun 19:46
f30a457

v0.11.0 June 30, 2020

Enhancements

  • Added multiclass support for ROC curve graphing #832
  • Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
  • Added data check to check for problematic target labels #814
  • Added PerColumnImputer that allows imputation strategies per column #824
  • Added transformer to drop specific columns #827
  • Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
  • Added preprocessing component to handle DateTime columns featurization #838
  • Added ability to clone pipelines and components #842
  • Define getter method for component parameters #847
  • Added utility methods to calculate and graph permutation importances #860, #880
  • Added new utility functions necessary for generating dynamic preprocessing pipelines #852
  • Added kwargs to all components #863
  • Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
  • Added SelectColumns transformer #873
  • Added ability to evaluate additional pipelines for automl search #874
  • Added default_parameters class property to components and pipelines #879
  • Added better support for disabling data checks in automl search #892
  • Added ability to save and load AutoML objects to file #888
  • Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
  • Saved learned binary classification thresholds in automl results cv data dict #876
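
The null-threshold component from #834 comes down to a per-column filter. A minimal sketch in plain Python follows, assuming columns are held in a dict of lists; the function names and the default threshold are illustrative rather than evalml's.

```python
import math


def null_fraction(values):
    """Fraction of entries that are None or NaN."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def drop_null_columns(columns, threshold=0.95):
    """Keep only columns whose null fraction does not exceed the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if null_fraction(values) <= threshold
    }
```

A column is dropped only when its null fraction is strictly greater than the threshold, matching the "exceeds" wording above.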

Fixes

  • Fixed bug where SimpleImputer cannot handle dropped columns #846
  • Fixed bug where PerColumnImputer cannot handle dropped columns #855
  • Enforce requirement that builtin components save all inputted values in their parameters dict #847
  • Don't list base classes in all_components output #847
  • Standardize all components to output pandas data structures, and accept either pandas or numpy #853
  • Fixed rankings and full_rankings error when search has not been run #894

Changes

  • Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
  • Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance #850
  • Refactor "blacklist"/"whitelist" to "allow"/"exclude" lists #854
  • Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
  • Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
  • Updated automl default data splitter to train/validation split for large datasets #877
  • Added open source license, update some repo metadata #887

Documentation Changes

  • Fix some typos and update the EvalML logo #872

Testing Changes

  • Update the changelog check job to expect the new branching pattern for the deps update bot #836
  • Check that all components output pandas datastructures, and can accept either pandas or numpy #853
  • Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871

Breaking Changes

  • Pipelines' static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
  • Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
  • Renamed automl's cv argument to data_split #877
  • Pipelines' and classifiers' feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
  • Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
  • Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
  • Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876

v0.10.0

29 May 20:44

v0.10.0 May 29, 2020

Enhancements

  • Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
  • Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
  • Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
  • Add Elastic Net as a pipeline option #812
  • Added new Pipeline option ExtraTrees #790
  • Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794

Fixes

  • Update pipeline score to return nan score for any objective which throws an exception during scoring #787
  • Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798
  • CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
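
The scoring behavior described in #787 can be sketched as a loop that records NaN for any objective that raises. The objective functions and the score_all helper below are hypothetical stand-ins, not evalml code.

```python
def score_all(objectives, y_true, y_pred):
    """Score every objective; record NaN for any objective that raises."""
    scores = {}
    for name, objective_fn in objectives.items():
        try:
            scores[name] = objective_fn(y_true, y_pred)
        except Exception:
            scores[name] = float("nan")
    return scores


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def always_fails(y_true, y_pred):
    """Stand-in for an objective that cannot handle the given input."""
    raise ValueError("objective not defined for this input")
```

One failing objective then no longer prevents the others from being reported; the caller has to check for NaN explicitly, for example with math.isnan.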

Changes

  • Cleanup pipeline score code, and cleanup codecov #711
  • Remove pass for abstract methods for codecov #730
  • Added __str__ for AutoSearch object #675
  • Add util methods to graph ROC and confusion matrix #720
  • Refactor AutoBase to AutoSearchBase #758
  • Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
  • Updated our logger to use Python's logging utils #763
  • Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
  • Port over all guardrails to use the new DataCheck API #789
  • Expanded import_or_raise to catch all exceptions #759
  • Adds RMSE, MSLE, RMSLE as standard metrics #788
  • Don't allow Recall to be used as an objective for AutoML #784
  • Removed feature selection from pipelines #819

Documentation Changes

  • Add instructions to freeze master on release.md #726
  • Update release instructions with more details #727 #733
  • Add objective base classes to API reference #736
  • Fix components API to match other modules #747

Testing Changes

  • Delete codecov yml, use codecov.io's default #732
  • Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
  • Update codecov client #782
  • Updated AutoBase __str__ test to include no parameters case #783
  • Added unit tests for ExtraTrees pipeline #790
  • If codecov fails to upload, fail build #810
  • Updated Python version of dependency action #816
  • Update the dependency update bot to use a suffix when creating branches #817

Breaking Changes

  • The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
  • Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
  • Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
  • Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
  • PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
  • All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
  • Recall disallowed as an objective for AutoML #784
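
The difference between a flat parameter list and a pipeline parameters dict (#779) is easiest to see side by side. The sketch below assumes a nested {component name: {parameter: value}} layout; the component and parameter names used with it are illustrative.

```python
def flatten_parameters(pipeline_parameters):
    """Nested {component: {param: value}} -> list of ((component, param), value)."""
    return [
        ((component, param), value)
        for component, params in pipeline_parameters.items()
        for param, value in params.items()
    ]


def unflatten_parameters(flat):
    """Inverse of flatten_parameters for components with at least one parameter."""
    nested = {}
    for (component, param), value in flat:
        nested.setdefault(component, {})[param] = value
    return nested


# Illustrative component and parameter names.
example_parameters = {
    "Simple Imputer": {"impute_strategy": "mean"},
    "Random Forest Classifier": {"n_estimators": 100, "max_depth": 6},
}
```

The nested form keeps each value attached to its component, which is what a tuner needs in order to hand a proposal straight back to a pipeline.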

v0.9.0

27 Apr 21:01
747df6a

v0.9.0 Apr. 27, 2020

Enhancements

  • Added accuracy as a standard objective #624
  • Added verbose parameter to load_fraud #560
  • Added Balanced Accuracy metric for binary, multiclass #612 #661
  • Added XGBoost regressor and XGBoost regression pipeline #666
  • Added Accuracy metric for multiclass #672
  • Added objective name in AutoBase.describe_pipeline #686

Fixes

  • Removed direct access to cls.component_graph #595
  • Add testing files to .gitignore #625
  • Remove circular dependencies from Makefile #637
  • Add error case for normalize_confusion_matrix() #640
  • Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
  • Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
  • Fix pip installation warning about docutils version, from boto dependency #664
  • Removed zero division warning for F1/precision/recall metrics #671
  • Fixed summary for pipelines without estimators #707

Changes

  • Updated default objective for binary/multiclass classification to log loss #613
  • Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
  • Changed the output of score to return one dictionary #429
  • Created binary and multiclass objective subclasses #504
  • Updated objectives API #445
  • Removed call to get_plot_data from AutoML #615
  • Set raise_error to default to True for AutoML classes #638
  • Remove unnecessary "u" prefixes on some unicode strings #641
  • Changed one-hot encoder to return uint8 dtypes instead of ints #653
  • Pipeline _name field changed to custom_name #650
  • Removed graphs.py and moved methods into PipelineBase #657, #665
  • Remove s3fs as a dev dependency #664
  • Changed requirements-parser to be a core dependency #673
  • Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
  • Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
  • Update ModelFamily values: don't list xgboost/catboost as classifiers now that we have regression pipelines for them #677
  • Changed AutoML's describe_pipeline to get problem type from pipeline instead #685
  • Standardize import_or_raise error messages #683
  • Updated argument order of objectives to align with sklearn's #698
  • Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
  • Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
  • Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
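
The split between rankings and full_rankings (#682) amounts to keeping the best row per pipeline template. A minimal sketch follows, assuming each row is a dict with hypothetical pipeline_name and score fields and that a higher score is better.

```python
def best_per_pipeline(full_rankings):
    """Keep the best-scoring row for each pipeline name, best first."""
    best = {}
    for row in full_rankings:
        current = best.get(row["pipeline_name"])
        # Assumption of this sketch: a higher score is better.
        if current is None or row["score"] > current["score"]:
            best[row["pipeline_name"]] = row
    return sorted(best.values(), key=lambda row: row["score"], reverse=True)
```

For objectives where lower is better, the comparison and the sort order would both need to be flipped.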

Documentation Changes

  • Fixed some sphinx warnings #593
  • Fixed docstring for AutoClassificationSearch with correct command #599
  • Limit readthedocs formats to pdf, not htmlzip and epub #594 #600
  • Clean up objectives API documentation #605
  • Fixed function on Exploring search results page #604
  • Update release process doc #567
  • AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
  • Fixed improperly formatted code in breaking changes for changelog #655
  • Added configuration to treat Sphinx warnings as errors #660
  • Removed separate plotting section for pipelines in API reference #657, #665
  • Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
  • Categorized components in API reference and added descriptions for each category #663
  • Fixed Sphinx warnings about BalancedAccuracy objective #669
  • Updated API reference to include missing components and clean up pipeline docstrings #689
  • Reorganize API ref, and clarify pipeline sub-titles #688
  • Add and update preprocessing utils in API reference #687
  • Added inheritance diagrams to API reference #695
  • Documented which default objective AutoML optimizes for #699
  • Create separate install page #701
  • Include more utils in API ref, like import_or_raise #704
  • Add more color to pipeline documentation #705

Testing Changes

  • Matched install commands of check_latest_dependencies test and its GitHub action #578
  • Added GitHub app to auto assign PR author as assignee #477
  • Removed unneeded conda installation of xgboost in windows checkin tests #618
  • Update graph tests to always use tmpfile dir #649
  • Changelog checkin test workaround for release PRs: If 'future release' section is empty of PR refs, pass check #658

Breaking Changes

  • Pipelines no longer take an objective parameter during instantiation, and no longer have an objective attribute.
  • fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
  • score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline's objective was scored on regardless.
  • score() will now return one dictionary of all objective scores.
  • ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
  • normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
  • Pipelines _name field changed to custom_name
  • Pipelines supported_problem_types field is removed because it is no longer necessary #678
  • Updated argument order of objectives' objective_function to align with sklearn #698
  • pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
  • Removed unsupported MSLE objective #704

v0.8.0

02 Apr 17:10
894f584

v0.8.0 Apr. 1, 2020

Enhancements

  • Add normalization option and information to confusion matrix #484
  • Add util function to drop rows with NaN values #487
  • Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
  • Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
  • Added fill_value parameter for SimpleImputer #509
  • Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
  • Allow numpy.random.RandomState for random_state parameters #556
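
Confusion matrix normalization (#484) can be illustrated on plain nested lists. The sketch below is a generic version with rows as true labels and columns as predictions; the `by` argument, its three options, and the ValueError on an empty row or column are choices made for this example, not a description of evalml's function.

```python
def normalize_confusion_matrix(matrix, by="true"):
    """Normalize a square confusion matrix given as a list of rows.

    by="true": each row sums to 1 (rows are true labels).
    by="pred": each column sums to 1 (columns are predicted labels).
    by="all":  the whole matrix sums to 1.
    """
    n = len(matrix)
    if by == "true":
        denominators = [[sum(row)] * n for row in matrix]
    elif by == "pred":
        column_sums = [sum(row[j] for row in matrix) for j in range(n)]
        denominators = [column_sums for _ in matrix]
    elif by == "all":
        total = sum(sum(row) for row in matrix)
        denominators = [[total] * n for _ in matrix]
    else:
        raise ValueError(f"unknown normalization option: {by!r}")
    if any(d == 0 for row in denominators for d in row):
        raise ValueError("cannot normalize: a row, column, or total sums to zero")
    return [
        [value / d for value, d in zip(row, d_row)]
        for row, d_row in zip(matrix, denominators)
    ]
```

For [[3, 1], [2, 4]], normalizing by "true" turns the first row into [0.75, 0.25], which is the per-class recall view of the same counts.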

Fixes

Changes

  • Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407
  • Support pandas 1.0.0 #486
  • Made all references to the logger static #503
  • Refactored model_type parameter for components and pipelines to model_family #507
  • Refactored problem_types for pipelines and components into supported_problem_types #515
  • Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
  • Limit number of categories encoded by OneHotEncoder #517
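
Limiting the number of encoded categories (#517) typically means keeping only the most frequent values. The sketch below shows that idea in plain Python; the function names, the top_n argument and its default are assumptions made for illustration.

```python
from collections import Counter


def top_categories(values, top_n=10):
    """Return the top_n most frequent categories, most frequent first."""
    counts = Counter(values)
    return [category for category, _ in counts.most_common(top_n)]


def one_hot(values, categories):
    """Encode each value as a 0/1 row; values outside `categories` become all zeros."""
    return [[1 if value == c else 0 for c in categories] for value in values]
```

Capping the category count keeps the encoded width bounded on high-cardinality columns, at the cost of mapping rare values to an all-zero row.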

Documentation Changes

  • Updated API reference to remove PipelinePlot and added moved PipelineBase plotting methods #483
  • Add code style and github issue guides #463, #512
  • Updated API reference to surface class variables for pipelines and components #537

Testing Changes

  • Added automated dependency check PR #482, #505
  • Updated automated dependency check comment #497
  • Have build_docs job use python executor, so that env vars are set properly #547
  • Run windows unit tests on PRs #557

Breaking Changes

  • AutoClassificationSearch and AutoRegressionSearch's model_types parameter has been refactored into allowed_model_families
  • ModelTypes enum has been changed to ModelFamily
  • Components and Pipelines now have a model_family field instead of model_type
  • get_pipelines utility function now accepts model_families as an argument instead of model_types
  • PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
  • PipelineBase.problem_types and Estimator.problem_types has been renamed to supported_problem_types
  • pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load

v0.7.0

10 Mar 00:36
0e9acab

v0.7.0 Mar. 9, 2020

Enhancements

  • Added emacs buffers to .gitignore #350
  • Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
  • Added Tuner abstract base class #351
  • Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403
  • Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn's #426
  • Added PipelineBase graph and feature_importance_graph methods, moved from previous location #423
  • Added support for python 3.8 #462

Fixes

  • Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives #276
  • Fixed ReadtheDocs FileNotFoundError exception for fraud dataset #439

Changes

  • Added n_estimators as a tunable parameter for XGBoost #307
  • Remove unused parameter ObjectiveBase.fit_needs_proba #320
  • Remove extraneous parameter component_type from all components #361
  • Remove unused rankings.csv file #397
  • Downloaded demo and test datasets so unit tests can run offline #408
  • Remove _needs_fitting attribute from Components #398
  • Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
  • Dropped support for Python 3.5 #438
  • Removed unused apply.py file #449
  • Clean up requirements.txt to remove unused deps #451

Documentation Changes

  • Update release.md with instructions to release to internal license key #354

Testing Changes

  • Added tests for utils (and moved current utils to gen_utils) #297
  • Moved XGBoost install into its own separate step on Windows using Conda #313
  • Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
  • Added dependency update checkin test #324
  • Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
  • Update dependency check to use a whitelist #417
  • Update unit test jobs to not install dev deps #455

Breaking Changes

  • Python 3.5 will not be actively supported.

v0.6.0

17 Dec 21:32
6970bfc

v0.6.0 (Dec. 16, 2019)

Enhancements

  • Added ability to create a plot of feature importances #133
  • Add early stopping to AutoML using patience and tolerance parameters #241
  • Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242
  • Enhanced AutoML results with search order #260
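
Early stopping with patience and tolerance (#241) can be sketched as a check over the scores seen so far. The version below assumes higher scores are better and treats tolerance as an absolute margin; evalml's exact rule may differ, so read it as an illustration of the general technique.

```python
def should_stop(scores, patience, tolerance=0.0):
    """Return True once `patience` consecutive scores fail to beat the best.

    scores: scores in search order, higher is better (assumption).
    tolerance: a new score must exceed the best by more than this
        absolute margin to count as an improvement (assumption).
    """
    if patience is None or not scores:
        return False
    best = scores[0]
    without_improvement = 0
    for score in scores[1:]:
        if score > best + tolerance:
            best = score
            without_improvement = 0
        else:
            without_improvement += 1
            if without_improvement >= patience:
                return True
    return False
```

A search loop would call this after each pipeline evaluation and break out as soon as it returns True.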

Fixes

  • Lower botocore requirement #235
  • Fixed decision_function calculation for FraudCost objective #254
  • Fixed return value of Recall metrics #264

Changes

  • Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287
  • Updating demo datasets to retain column names #223
  • Moving pipeline visualization to PipelinePlots class #228
  • Standardizing inputs as pd.DataFrame / pd.Series #130
  • Enforcing that pipelines must have an estimator as last component #277
  • Added ipywidgets as a dependency in requirements.txt #278

Documentation Changes

  • Adding class properties to API reference #244
  • Fix and filter FutureWarnings from scikit-learn #249, #257
  • Adding Linear Regression to API reference and cleaning up some Sphinx warnings #227

Testing Changes

  • Added support for testing on Windows with CircleCI #226
  • Added support for doctests #233

Breaking Changes

  • The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
  • AutoClassifier has been renamed to AutoClassificationSearch
  • AutoRegressor has been renamed to AutoRegressionSearch
  • AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results gives access to a dictionary identical to the old .results dictionary, whereas search_order returns a list of pipeline ids in the order they were searched.
  • Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning Pipelines without an estimator.
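
The new results layout can be pictured as follows. Only the two top-level keys, pipeline_results and search_order, come from the notes above; the per-pipeline fields and values are invented placeholders for this sketch.

```python
# Placeholder data: the inner fields are hypothetical.
results = {
    "pipeline_results": {
        0: {"id": 0, "pipeline_name": "Pipeline A", "score": 0.81},
        1: {"id": 1, "pipeline_name": "Pipeline B", "score": 0.78},
        2: {"id": 2, "pipeline_name": "Pipeline C", "score": 0.84},
    },
    "search_order": [0, 1, 2],
}


def in_search_order(results):
    """Return the per-pipeline entries in the order they were evaluated."""
    return [results["pipeline_results"][pid] for pid in results["search_order"]]


def migrate_old_access(results):
    """Code written against the old .results dict can read this sub-dictionary."""
    return results["pipeline_results"]
```

Code that previously iterated over .results directly would switch to results["pipeline_results"], and can use search_order when evaluation order matters.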