Releases: alteryx/evalml

v0.39.0

09 Dec 18:07
389d4f6

v0.39.0 Dec. 9, 2021

Enhancements

  • Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer and enhanced it to compute rolling features #3028
  • Added ability to impute only specific columns in PerColumnImputer #3123
  • Added TimeSeriesParametersDataCheck to verify the time series parameters are valid given the number of splits in cross validation #3111

Fixes

  • Default parameters for RFRegressorSelectFromModel and RFClassifierSelectFromModel have been fixed to avoid selecting all features #3110

Changes

  • Removed reliance on a datetime index for ARIMARegressor and ProphetRegressor #3104
  • Included target leakage check when fitting ARIMARegressor to account for the lack of TimeSeriesFeaturizer in ARIMARegressor based pipelines #3104
  • Cleaned up and refactored InvalidTargetDataCheck implementation and docstring #3122
  • Removed indices information from the output of HighlyNullDataCheck's validate() method #3092
  • Added ReplaceNullableTypes component to prepare for handling pandas nullable types #3090
  • Removed unused EnsembleMissingPipelinesError exception definition #3131

Documentation Changes

Testing Changes

  • Refactored tests to avoid using importorskip #3126
  • Added skip_during_conda test marker to skip tests that are not supposed to run during conda build #3127
  • Added skip_if_39 test marker to skip tests that are not supposed to run on Python 3.9 #3133

Breaking Changes

  • Renamed DelayedFeatureTransformer to TimeSeriesFeaturizer #3028
  • ProphetRegressor now requires a datetime column in X represented by the date_index parameter #3104
  • Renamed module evalml.data_checks.invalid_target_data_check to evalml.data_checks.invalid_targets_data_check #3122
  • Removed unused EnsembleMissingPipelinesError exception definition #3131
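
Code that has to run against releases on both sides of these renames could resolve the names defensively. The following is a minimal sketch, not an official migration path: `first_available` is a hypothetical helper (not part of evalml), and the import location `evalml.pipelines.components` is an assumption. It returns `None` when a name cannot be found, including when evalml is not installed.

```python
import importlib


def first_available(module_names, attribute_names):
    """Return the first attribute found across the candidate modules, or None.

    Hypothetical helper for illustration; not part of evalml.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is absent in this release, or the package is not installed.
            continue
        for attribute_name in attribute_names:
            found = getattr(module, attribute_name, None)
            if found is not None:
                return found
    return None


# Try the new class name first, then the pre-v0.39.0 name.
featurizer_cls = first_available(
    ["evalml.pipelines.components"],  # assumed import location
    ["TimeSeriesFeaturizer", "DelayedFeatureTransformer"],
)

# Try the renamed module first, then the previous module path.
target_check_cls = first_available(
    [
        "evalml.data_checks.invalid_targets_data_check",
        "evalml.data_checks.invalid_target_data_check",
    ],
    ["InvalidTargetDataCheck"],
)
```

Checking the result for `None` before use keeps the failure explicit instead of surfacing later as an `AttributeError`.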

v0.38.0

29 Nov 19:36
5de7049

v0.38.0 Nov. 29, 2021

Enhancements

  • Added data_check_name attribute to the data check action class #3034
  • Added NumWords and NumCharacters primitives to TextFeaturizer and renamed TextFeaturizer to NaturalLanguageFeaturizer #3030
  • Added support for scikit-learn > 1.0.0 #3051
  • Required the date_index parameter to be specified for time series problems in AutoMLSearch #3041
  • Allowed time series pipelines to predict on test datasets whose length is less than or equal to the forecast_horizon. Also allowed the test set index to start at 0. #3071
  • Enabled time series pipelines to predict on data with features that are not known in advance #3094
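
With date_index now required for time series problems (and forecast_horizon required since v0.32.1), a pre-flight check on the configuration can catch a missing setting before a search starts. This is only a sketch: the helper name and the idea of passing these settings in a dictionary called `problem_configuration` are assumptions for illustration, and evalml performs its own validation.

```python
# Keys named in these release notes as required for time series problems.
REQUIRED_TIME_SERIES_KEYS = ("date_index", "forecast_horizon")


def missing_time_series_settings(problem_configuration):
    """Return the required keys that are absent or set to None.

    Hypothetical helper for illustration; not part of evalml.
    """
    return [
        key
        for key in REQUIRED_TIME_SERIES_KEYS
        if problem_configuration.get(key) is None
    ]


complete = {"date_index": "date", "forecast_horizon": 7}
incomplete = {"date_index": None, "forecast_horizon": 7}

print(missing_time_series_settings(complete))    # []
print(missing_time_series_settings(incomplete))  # ['date_index']
```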

Fixes

  • Added an error message when fit and predict/predict_proba data types are different #3036
  • Fixed bug where ensembling components could not get converted to JSON format #3049
  • Fixed bug where components with tuned integer hyperparameters could not get converted to JSON format #3049
  • Included confusion matrix at the pipeline threshold for find_confusion_matrix_per_threshold #3080
  • Fixed bug where One Hot Encoder would error out if a non-categorical feature had a missing value #3083
  • Fixed bug where features created from categorical columns by Delayed Feature Transformer would be inferred as categorical #3083

Changes

  • Deleted predict_uses_y estimator attribute #3069
  • Changed DateTimeFeaturizer to use corresponding Featuretools primitives #3081
  • Updated TargetDistributionDataCheck to return metadata details as floats rather than strings #3085
  • Removed dependency on psutil package #3093

Documentation Changes

  • Updated docs to use data check action methods rather than manually cleaning data #3050

Testing Changes

  • Updated integration tests to use make_pipeline_from_actions instead of private method #3047

Breaking Changes

  • Added data_check_name attribute to the data check action class #3034
  • Renamed TextFeaturizer to NaturalLanguageFeaturizer #3030
  • Updated the Pipeline.graph_json function to return a dictionary of "from" and "to" edges instead of tuples #3049
  • Deleted predict_uses_y estimator attribute #3069
  • Changed time series problems in AutoMLSearch to need a not-None date_index #3041
  • Changed the DelayedFeatureTransformer to throw a ValueError during fit if the date_index is None #3041
  • Passing X=None to DelayedFeatureTransformer is deprecated #3041
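
The graph_json change above affects any code that unpacked edges as tuples. The sketch below only illustrates the shape change from tuples to "from"/"to" dictionaries; the component names and the helper are made up for the example, and the full layout of the graph_json output is not described in these notes.

```python
import json


def edges_as_dicts(edges):
    """Convert (source, target) tuples into {"from": ..., "to": ...} dictionaries.

    Hypothetical helper for illustration; not part of evalml.
    """
    return [{"from": source, "to": target} for source, target in edges]


# Illustrative edge list in the old tuple style.
old_style = [("Imputer", "One Hot Encoder"), ("One Hot Encoder", "Estimator")]
new_style = edges_as_dicts(old_style)

# Consumers now read edges by key rather than by position.
for edge in new_style:
    print(edge["from"], "->", edge["to"])

# The dictionary form serializes to self-describing JSON.
print(json.dumps(new_style))
```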

v0.37.0

10 Nov 17:30
25808fb

v0.37.0 Nov. 10, 2021

Enhancements

  • Added find_confusion_matrix_per_threshold to Model Understanding #2972
  • Limited computationally-intensive models during AutoMLSearch for certain multiclass problems, with opt-in available through the allow_long_running_models parameter #2982
  • Added support for stacked ensemble pipelines to prediction explanations module #2971
  • Added integration tests for data checks and data checks actions workflow #2883
  • Added a change in pipeline structure to handle categorical columns separately for pipelines in DefaultAlgorithm #2986
  • Added an algorithm to DelayedFeatureTransformer to select better lags #3005
  • Added AutoML function to access ensemble pipeline's input pipelines IDs #3011
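
For readers unfamiliar with the idea behind find_confusion_matrix_per_threshold, the sketch below computes binary confusion counts at several decision thresholds in plain Python. It illustrates the concept only and is not evalml's implementation: the function name, the output format, and the choice to treat a probability equal to the threshold as a positive prediction are all assumptions.

```python
def confusion_counts_per_threshold(y_true, y_proba, thresholds):
    """Return {threshold: {"tp", "fp", "tn", "fn"}} for binary labels 0/1.

    Illustrative sketch; a probability >= threshold counts as a positive prediction.
    """
    results = {}
    for threshold in thresholds:
        counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
        for actual, proba in zip(y_true, y_proba):
            predicted_positive = proba >= threshold
            if predicted_positive and actual == 1:
                counts["tp"] += 1
            elif predicted_positive and actual == 0:
                counts["fp"] += 1
            elif not predicted_positive and actual == 0:
                counts["tn"] += 1
            else:
                counts["fn"] += 1
        results[threshold] = counts
    return results


y_true = [0, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8]
per_threshold = confusion_counts_per_threshold(y_true, y_proba, [0.3, 0.5])
print(per_threshold[0.3])  # {'tp': 2, 'fp': 1, 'tn': 1, 'fn': 0}
print(per_threshold[0.5])  # {'tp': 1, 'fp': 0, 'tn': 2, 'fn': 1}
```

Lowering the threshold trades false negatives for false positives, which is the comparison such a per-threshold table makes visible.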

Fixes

  • Fixed bug where Oversampler didn't consider boolean columns to be categorical #2980
  • Fixed permutation importance failing when target is categorical #3017
  • Updated estimator and pipelines' predict, predict_proba, transform, inverse_transform methods to preserve input indices #2979
  • Updated demo dataset link for daily min temperatures #3023

Changes

  • Updated OutliersDataCheck and UniquenessDataCheck and allowed for the suspension of the Nullable types error #3018

Documentation Changes

  • Fixed cost benefit matrix demo formatting #2990
  • Updated ReadMe.md with new badge links and updated installation instructions for conda #2998
  • Added more comprehensive doctests #3002

v0.36.0

27 Oct 22:14
59b6664

v0.36.0 Oct. 27, 2021

Enhancements

  • Added LIME as an algorithm option for explain_predictions and explain_predictions_best_worst #2905
  • Standardized data check messages and added default "rows" and "columns" to data check message details dictionary #2869
  • Added rows_of_interest to pipeline utils #2908
  • Added support for woodwork version 0.8.2 #2909
  • Enhanced the DateTimeFeaturizer to handle NaNs in date features #2909
  • Added support for woodwork logical types PostalCode, SubRegionCode, and CountryCode in model understanding tools #2946
  • Added Vowpal Wabbit regressor and classifiers #2846

Fixes

  • Fixed bug where partial dependence was not respecting the ww schema #2929
  • Fixed calculate_permutation_importance for datetimes on StandardScaler #2938
  • Fixed SelectColumns to only select available features for feature selection in DefaultAlgorithm #2944
  • Fixed DropColumns component not receiving parameters in DefaultAlgorithm #2945
  • Fixed bug where trained binary thresholds were not being returned by get_pipeline or clone #2948
  • Fixed bug where Oversampler selected ww logical categorical instead of ww semantic category #2946

Changes

  • Changed make_pipeline function to place the DateTimeFeaturizer prior to the Imputer so that NaN dates can be imputed #2909
  • Refactored OutliersDataCheck and HighlyNullDataCheck to add more descriptive metadata #2907

Documentation Changes

  • Added back Future Release section to release notes #2927
  • Updated CI to run doctest (docstring tests) and apply necessary fixes to docstrings #2933
  • Added documentation for BinaryClassificationPipeline thresholding #2937

Testing Changes

  • Fixed dependency checker to catch full names of packages #2930
  • Refactored build_conda_pkg to work from a local recipe #2925

Breaking Changes

  • Standardized data check messages and added default "rows" and "columns" to data check message details dictionary. This may change the number of messages returned from a data check. #2869

v0.35.0

15 Oct 02:32
c4475d9

v0.35.0 Oct. 14, 2021

Enhancements

  • Added human-readable pipeline explanations to model understanding #2861
  • Updated to support Featuretools 1.0.0 and nlp-primitives 2.0.0 #2848

Fixes

  • Fixed bug where long mode for the top level search method was not respected #2875
  • Pinned cmdstan to 0.28.0 in cmdstan-builder to prevent future breaking of support for Prophet #2880
  • Added Jarque-Bera to the TargetDistributionDataCheck #2891

Changes

  • Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level #2821
  • Deleted scikit-learn ensembler #2819
  • Refactored pipeline building logic out of AutoMLSearch and into IterativeAlgorithm #2854
  • Refactored names for methods in ComponentGraph and PipelineBase #2902

Documentation Changes

  • Updated install.ipynb to reflect flexibility for cmdstan version installation #2880
  • Updated the conda section of our contributing guide #2899

Testing Changes

  • Updated test_all_estimators to account for Prophet being allowed for Python 3.9 #2892
  • Updated linux tests to use cmdstan-builder==0.0.8 #2880

Breaking Changes

  • Updated pipelines to use a label encoder component instead of doing encoding on the pipeline level. This means that pipelines will no longer automatically encode non-numerical targets. Please use a label encoder if working with classification problems and non-numeric targets. #2821
  • Deleted scikit-learn ensembler #2819
  • IterativeAlgorithm now takes X, y, and problem_type as required arguments, and sampler_name, allowed_model_families, allowed_component_graphs, max_batches, and verbose as optional arguments #2854
  • Changed method names of fit_features and compute_final_component_features to fit_and_transform_all_but_final and transform_all_but_final in ComponentGraph, and compute_estimator_features to transform_all_but_final in pipeline classes #2902
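
Because pipelines no longer encode non-numeric targets automatically, classification targets such as strings need an explicit encoding step. The plain-Python sketch below shows what that step amounts to; it is not evalml's label encoder component, and the function names are hypothetical.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer code, in sorted label order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


def encode_labels(labels, mapping):
    """Replace each label with its integer code."""
    return [mapping[label] for label in labels]


def decode_labels(codes, mapping):
    """Invert the mapping to recover the original labels from codes."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[code] for code in codes]


y = ["spam", "ham", "spam"]
mapping = fit_label_mapping(y)
encoded = encode_labels(y, mapping)

print(mapping)                          # {'ham': 0, 'spam': 1}
print(encoded)                          # [1, 0, 1]
print(decode_labels(encoded, mapping))  # ['spam', 'ham', 'spam']
```

Keeping the mapping around matters: predictions come back as codes and have to be decoded with the same mapping that was fitted on the training target.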

v0.34.1rc1

01 Oct 19:22
Pre-release

v0.34.1rc1 Oct. 1, 2021

Enhancements

  • Updated to support Featuretools 1.0.0 and nlp-primitives 2.0.0 #2848

v0.34.0

01 Oct 16:46
40ad6f5

v0.34.0 Oct. 1, 2021

Enhancements

  • Updated to work with Woodwork 0.8.1 #2783
  • Added validation that training_data and training_target are not None in prediction explanations #2787
  • Added support for training-only components in pipelines and component graphs #2776
  • Added default argument for the parameters value for ComponentGraph.instantiate #2796
  • Added TIME_SERIES_REGRESSION to LightGBMRegressor's supported problem types #2793
  • Added validation to holdout data passed to predict and predict_proba for time series #2804
  • Added information about which row indices are outliers in OutliersDataCheck #2818
  • Added verbose flag to top level search() method #2813
  • Added support for linting jupyter notebooks and clearing the executed cells and empty cells #2829 #2837
  • Added "DROP_ROWS" action to output of OutliersDataCheck.validate() #2820
  • Added the ability of AutoMLSearch to accept a SequentialEngine instance as engine input #2838
  • Added new label encoder component to EvalML #2853
  • Added our own partial dependence implementation #2834

Fixes

  • Fixed bug where calculate_permutation_importance was not calculating the right value for pipelines with target transformers #2782
  • Fixed bug where transformed target values were not used in fit for time series pipelines #2780
  • Fixed bug where score_pipelines method of AutoMLSearch would not work for time series problems #2786
  • Removed TargetTransformer class #2833
  • Added tests to verify ComponentGraph support by pipelines #2830
  • Fixed incorrect parameter for baseline regression pipeline in AutoMLSearch #2847

Changes

  • Changed woodwork initialization to use partial schemas #2774
  • Made Transformer.transform() an abstract method #2744
  • Deleted EmptyDataChecks class #2794
  • Removed data check for checking log distributions in make_pipeline #2806
  • Changed the minimum woodwork version to 0.8.0 #2783
  • Pinned woodwork version to 0.8.0 #2832
  • Removed model_family attribute from ComponentBase and transformers #2828
  • Limited scikit-learn until new features and errors can be addressed #2842
  • Added a DeprecationWarning when Sklearn Ensemblers are called #2859

Testing Changes

  • Updated matched assertion message regarding monotonic indices in polynomial detrender tests #2811
  • Added a test to make sure pip versions match conda versions #2851

Breaking Changes

  • Made Transformer.transform() an abstract method #2744
  • Deleted EmptyDataChecks class #2794
  • Removed data check for checking log distributions in make_pipeline #2806

v0.33.0

15 Sep 20:25
ce3fc7a

v0.33.0 Sep. 15, 2021

Enhancements

Fixes

  • Fixed bug where warnings during make_pipeline were not being raised to the user #2765

Changes

  • Refactored and removed SamplerBase class #2775

Documentation Changes

  • Added docstring linting packages pydocstyle and darglint to make-lint command #2670

v0.32.1

10 Sep 21:03
ca2bd17

v0.32.1 Sep. 10, 2021

Enhancements

  • Added verbose flag to AutoMLSearch to run search in silent mode by default #2645
  • Added label encoder to XGBoostClassifier to remove the warning #2701
  • Set eval_metric to logloss for XGBoostClassifier #2741
  • Added support for woodwork versions 0.7.0 and 0.7.1 #2743
  • Changed explain_predictions functions to display original feature values #2759
  • Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762
  • Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697
  • Added predict_in_sample and predict_proba_in_sample methods to time series pipelines to predict on data where the target is known, e.g. cross-validation #2697

Fixes

  • Fixed bug where _catch_warnings assumed all warnings were PipelineNotUsed #2753
  • Fixed bug where Imputer.transform would erase ww typing information prior to handing data to the SimpleImputer #2752
  • Fixed bug where Oversampler could not be copied #2755

Changes

  • Deleted drop_nan_target_rows utility method #2737
  • Removed default logging setup and debugging log file #2645
  • Changed the default n_jobs value for XGBoostClassifier and XGBoostRegressor to 12 #2757
  • Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeatureTransformer #2697
  • Added X_train and y_train as optional parameters to pipeline predict, predict_proba. Only used for time series pipelines #2697
  • Added training_data and training_target as optional parameters to explain_predictions and explain_predictions_best_worst to support time series pipelines #2697
  • Changed time series pipeline predictions to no longer output series/dataframes padded with NaNs. A prediction will be returned for every row in the X input #2697

Documentation Changes

  • Specified installation steps for Prophet #2713
  • Added documentation for data exploration on data check actions #2696
  • Added a user guide entry for time series modelling #2697

Testing Changes

  • Fixed flaky TargetDistributionDataCheck test for very_lognormal distribution #2748

Breaking Changes

  • Removed default logging setup and debugging log file #2645
  • Added X_train and y_train to graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data #2762
  • Added forecast_horizon as a required parameter to time series pipelines and AutoMLSearch #2697
  • Changed TimeSeriesBaselineEstimator to only work on a time series pipeline with a DelayedFeatureTransformer #2697
  • Added X_train and y_train as required parameters for predict and predict_proba in time series pipelines #2697
  • Added training_data and training_target as required parameters to explain_predictions and explain_predictions_best_worst for time series pipelines #2697

v0.32.0

02 Sep 00:42
9352922

v0.32.0 Sep. 1, 2021

Enhancements

  • Allowed a string for the engine parameter of AutoMLSearch #2667
  • Added ProphetRegressor to AutoML #2619
  • Integrated DefaultAlgorithm into AutoMLSearch #2634
  • Removed SVM "linear" and "precomputed" kernel hyperparameter options, and improved default parameters #2651
  • Updated ComponentGraph initialization to raise ValueError when user attempts to use .y for a component that does not produce a tuple output #2662
  • Updated to support Woodwork 0.6.0 #2690
  • Updated pipeline graph() to distinguish X and y edges #2654
  • Added DropRowsTransformer component #2692
  • Added DROP_ROWS to _make_component_list_from_actions and clean up metadata #2694

Fixes

  • Updated Oversampler logic to select best SMOTE based on component input instead of pipeline input #2695
  • Added ability to explicitly close DaskEngine resources to improve runtime and reduce Dask warnings #2667
  • Fixed partial dependence bug for ensemble pipelines #2714
  • Updated TargetLeakageDataCheck to maintain user-selected logical types #2711

Changes

  • Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component #2695
  • Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance #2660

Documentation Changes

  • Updated documentation to make parallelization of AutoML clearer #2667

Testing Changes

  • Removed the process-level parallelism from the test_cancel_job test #2666
  • Installed numba 0.53 in windows CI to prevent problems installing version 0.54 #2710

Breaking Changes

  • Renamed the current top level search method to search_iterative and defined a new search method for the DefaultAlgorithm #2634
  • Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component #2695
  • Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance #2660