Releases: alteryx/evalml

v0.16.1

01 Dec 23:26
5a8d844

v0.16.1 Dec. 1, 2020

Enhancements

  • Pin woodwork version to v0.0.6 to avoid breaking changes #1484

Fixes

  • Updated Woodwork to >=0.0.5 in core-requirements.txt #1473
  • Removed copy_dataframe parameter for Woodwork, updated Woodwork to >=0.0.6 in core-requirements.txt #1478
  • Updated detect_problem_type to use pandas.api.types.is_numeric_dtype #1476

Changes

  • Changed make clean to delete coverage reports as a convenience for developers #1464

Documentation Changes

Testing Changes

  • Update dependency update checker to use everything from core and optional dependencies #1480

v0.16.0

24 Nov 22:12
b8b594f

v0.16.0 Nov. 24, 2020

Enhancements

  • Updated pipelines and make_pipeline to accept Woodwork inputs #1393
  • Updated components to accept Woodwork inputs #1423
  • Added ability to freeze hyperparameters for AutoMLSearch #1284
  • Added Target Encoder into transformer components #1401
  • Added callback for error handling in AutoMLSearch #1403
  • Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
  • The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374
  • Added a problem type for time series regression #1386
  • Added an is_defined_for_problem_type method to ObjectiveBase #1386
  • Added a random_state parameter to make_pipeline_from_components function #1411
  • Added DelayedFeaturesTransformer #1396
  • Added a TimeSeriesRegressionPipeline class #1418
  • Removed core-requirements.txt from the package distribution #1429
  • Updated data check messages to include "code" and "details" fields #1451 #1462
  • Added a TimeSeriesSplit data splitter for time series problems #1441
  • Added a problem_configuration parameter to AutoMLSearch #1457
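
The time series additions above (#1386, #1418, #1441) depend on splitting data without looking into the future. The following is a minimal sketch of an expanding-window split in plain Python. It is not the TimeSeriesSplit implementation from evalml; the function name expanding_window_splits and its parameters are hypothetical and exist only for illustration.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Hypothetical helper: each test fold comes strictly after its training
    data, and the training window grows with every split.
    """
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    fold_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * fold_size
        # The last test fold absorbs any leftover samples.
        test_end = n_samples if i == n_splits else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train, test in splits:
    print(train, test)
```

With 10 samples and 3 splits, the first pair is train [0, 1] and test [2, 3]; every later split trains on all earlier rows.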

Fixes

  • Fixed IndexError raised in AutoMLSearch when ensembling=True but there is only one pipeline to iterate over #1397
  • Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388
  • Updated enum classes to show possible enum values as attributes #1391
  • Updated calls to Woodwork's to_pandas() to to_series() and to_dataframe() #1428
  • Fixed bug in OHE where column names were not guaranteed to be unique #1349
  • Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467

Changes

  • Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
  • Simplified and cleaned output for Code Generation #1371
  • Updated data checks to return dictionary of warnings and errors instead of a list #1448
  • Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
  • Update AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
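
For callers affected by the move from a list to a dictionary of warnings and errors (#1448), a small adapter can make the change easier to absorb. The sketch below assumes the dictionary uses the keys "warnings" and "errors", and that each message is a dictionary with a "code" and "details" field as described in #1451. The "message" key, the example values, and the function name split_data_check_results are made up for illustration and are not taken from the library.

```python
def split_data_check_results(results):
    """Return (warning_messages, error_messages) from a data check result.

    Assumption: the result is a dict with "warnings" and "errors" keys.
    Missing keys are treated as empty lists.
    """
    warning_messages = list(results.get("warnings", []))
    error_messages = list(results.get("errors", []))
    return warning_messages, error_messages


# Hypothetical result; keys inside each message and all values are invented.
example_results = {
    "warnings": [
        {
            "message": "Column 'a' is mostly null",
            "code": "EXAMPLE_CODE",
            "details": {"column": "a"},
        }
    ],
    "errors": [],
}

warning_messages, error_messages = split_data_check_results(example_results)
has_errors = bool(error_messages)
```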

Documentation Changes

  • Added description of CLA to contributing guide, updated description of draft PRs #1402
  • Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
  • Updated docstrings from np.array to np.ndarray #1417
  • Added section on stacking ensembles in AutoMLSearch documentation #1425

Testing Changes

  • Removed category_encoders from test-requirements.txt #1373
  • Tweak codecov.io settings again to avoid flakes #1413
  • Modified make lint to check notebook versions in the docs #1431
  • Modified make lint-fix to standardize notebook versions in the docs #1431
  • Use new version of pull request Github Action for dependency check #1443
  • Reduced number of workers for tests to 4 #1447

Breaking Changes

  • The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
  • Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
  • Data checks now return a dictionary of warnings and errors instead of a list #1448
  • 🦃 🚀
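
The top_k change in #1374 can be illustrated with a short sketch. It compares selecting k features by absolute SHAP value against the earlier behavior of taking the k largest and k smallest values. The function names and the sample values are hypothetical, and this is not the code used by the explain_predictions_* functions.

```python
def top_k_by_magnitude(shap_values, k):
    """Return the k (feature, value) pairs with the largest absolute value."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def top_and_bottom_k(shap_values, k):
    """Earlier behavior as described: k largest plus k smallest values.

    Returns 2 * k pairs when there are at least 2 * k features.
    """
    ranked = sorted(shap_values.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k] + ranked[-k:]


# Made-up SHAP values for a single prediction.
shap_values = {"age": 0.40, "income": -0.35, "tenure": 0.05, "visits": -0.02, "region": 0.10}

new_style = top_k_by_magnitude(shap_values, k=2)
old_style = top_and_bottom_k(shap_values, k=2)
```

Here new_style holds two features (age and income), while old_style holds four, which is the k versus 2 * k difference noted under Breaking Changes.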

v0.15.0

29 Oct 22:59
1ec2ee4

v0.15.0 Oct. 29, 2020

Enhancements

  • Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
  • Added stacked ensemble components to AutoMLSearch #1253
  • Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
  • Added graph_prediction_vs_actual in model_understanding for regression problems #1252
  • Added parameter to OneHotEncoder to enable filtering which features to encode #1249
  • Added percent-better-than-baseline for all objectives to automl.results #1244
  • Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
  • Added PCA Transformer component for dimensionality reduction #1270
  • Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
  • Updated AutoMLSearch to support Woodwork data structures #1299
  • Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
  • Make max_batches argument to AutoMLSearch.search public #1320
  • Added text support to automl search #1062
  • Added _pipelines_per_batch as a private argument to AutoMLSearch #1355

Fixes

  • Fixed ML performance issue with ordered datasets: always shuffle data in automl's default CV splits #1265
  • Fixed broken evalml info CLI command #1293
  • Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
  • Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
  • Added stacked ensemble estimators to the evalml.pipelines.__init__ file #1326
  • Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
  • Fixed LightGBM warning messages during AutoMLSearch #1342
  • Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
  • Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
  • Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
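
The ordered-dataset fix (#1265) addresses a general cross-validation pitfall: when rows are sorted by target, contiguous folds can contain a single class. The sketch below shows the effect in plain Python. The helper k_fold_indices is hypothetical and is not the splitter used by AutoMLSearch.

```python
import random


def k_fold_indices(n_samples, n_folds, shuffle=False, seed=0):
    """Split range(n_samples) into n_folds contiguous folds, optionally shuffled first."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    fold_size, remainder = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        # Spread any remainder over the first folds.
        stop = start + fold_size + (1 if fold < remainder else 0)
        folds.append(indices[start:stop])
        start = stop
    return folds


# Target sorted by class, as in an ordered dataset.
target = [0] * 8 + [1] * 4

ordered_folds = k_fold_indices(len(target), n_folds=3)
classes_per_fold = [sorted({target[i] for i in fold}) for fold in ordered_folds]

shuffled_folds = k_fold_indices(len(target), n_folds=3, shuffle=True, seed=0)
```

Without shuffling, classes_per_fold is [[0], [0], [1]]: the fold that holds every positive row is validated by a model trained only on negatives. Shuffling the indices first makes such degenerate folds much less likely.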

Changes

  • Allow add_to_rankings to be called before AutoMLSearch is called #1250
  • Removed Graphviz from test-requirements to add to requirements.txt #1327
  • Removed max_pipelines parameter from AutoMLSearch #1264
  • Include editable installs in all install make targets #1335
  • Made pip dependencies featuretools and nlp_primitives core dependencies #1062
  • Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
  • Added warning for partial_dependency when the feature includes null values #1352

Documentation Changes

  • Fixed and updated code blocks in Release Notes #1243
  • Added DecisionTree estimators to API Reference #1246
  • Changed class inheritance display to flow vertically #1248
  • Updated cost-benefit tutorial to use a holdout/test set #1159
  • Added evalml info command to documentation #1293
  • Miscellaneous doc updates #1269
  • Removed conda pre-release testing from the release process document #1282
  • Updates to contributing guide #1310
  • Added Alteryx footer to docs with Twitter and Github link #1312
  • Added documentation for evalml installation for Python 3.6 #1322
  • Added documentation changes to make the API Docs easier to understand #1323
  • Fixed documentation for feature_importance #1353
  • Added tutorial for running AutoML with text data #1357
  • Added documentation for woodwork integration with automl search #1361

Testing Changes

  • Added tests for jupyter_check to handle IPython #1256
  • Cleaned up make_pipeline tests to test for all estimators #1257
  • Added a test to check conda build after merge to main #1247
  • Removed unnecessary code in __main__.py that lacked codecov coverage #1293
  • Codecov: round coverage up instead of down #1334
  • Add DockerHub credentials to CI testing environment #1356
  • Add DockerHub credentials to conda testing environment #1363

Breaking Changes

  • Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
  • max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
  • AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (for example, pandas or numpy input) #1299
  • Make max_batches argument to AutoMLSearch.search public #1320
  • Removed unused argument feature_types from AutoMLSearch.search #1062

v0.14.1

29 Sep 20:44
7dcf640

v0.14.1 Sep. 29, 2020

Enhancements

  • Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
  • Added get_feature_names on OneHotEncoder #1193
  • Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
  • Added LightGBM to AutoMLSearch #1199
  • Updated scikit-learn and scikit-optimize to their latest versions, 0.23.2 and 0.8.1 respectively #1141
  • Added __str__ and __repr__ for pipelines and components #1218
  • Included internal target check for both training and validation data in AutoMLSearch #1226
  • Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
  • Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
  • DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
  • Added first CV fold score as validation score in AutoMLSearch.rankings #1221
  • Updated flake8 configuration to enable linting on __init__.py files #1234
  • Refined make_pipeline_from_components implementation #1204

Fixes

  • Updated GitHub URL after migration to Alteryx GitHub org #1207
  • Changed Problem Type enum to be more similar to the string name #1208
  • Wrapped call to scikit-learn's partial dependence method in a try/finally block #1232

Changes

  • Added allow_writing_files as a named argument to CatBoost estimators. #1202
  • Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
  • Replaced the pipeline's ._transform method, which evaluated all the preprocessing steps of a pipeline, with .compute_estimator_features #1231
  • Changed default large dataset train/test splitting behavior #1205

Documentation Changes

  • Included description of how to access the component instances and features for pipeline user guide #1163
  • Updated API docs to refer to target as "target" instead of "labels" for non-classification tasks and minor docs cleanup #1160
  • Added Class Imbalance Data Check to api_reference.rst #1190 #1200
  • Added pipeline properties to API reference #1209
  • Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
  • Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
  • Added install documentation for libomp in order to use LightGBM on Mac #1233
  • Improved description of max_iterations in documentation #1212
  • Removed unused code from sphinx conf #1235

Testing Changes

Breaking Changes

  • DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
  • The pipeline's ._transform method, which evaluated all the preprocessing steps of a pipeline, has been replaced with .compute_estimator_features #1231
  • get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230

v0.14.dev0

29 Sep 18:19
v0.14.dev0 Pre-release

Development release for testing purposes.

v0.13.2

17 Sep 20:47
f5c8bb2

v0.13.2 Sep. 17, 2020

Enhancements

  • Added output_format field to explain predictions functions #1107
  • Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
  • Added a return_instance boolean parameter to get_objective #1132
  • Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
  • Added label encoder to LightGBM for binary classification #1152
  • Added labels for the row index of confusion matrix #1154
  • Added AutoMLSearch object as another parameter in search callbacks #1156
  • Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
  • Added __eq__ for ComponentBase and PipelineBase #1178
  • Added support for multiclass classification for roc_curve #1164
  • Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
  • Added utility function to create pipeline instances from a list of component instances #1176
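
The idea behind ClassImbalanceDataCheck (#1135) is to flag classes whose share of the target falls below a threshold. This can be sketched with the standard library alone. The function classes_below_threshold and its default threshold of 0.1 are assumptions for illustration, not the library's interface or defaults.

```python
from collections import Counter


def classes_below_threshold(target, threshold=0.1):
    """Return the class labels whose relative frequency is below threshold."""
    counts = Counter(target)
    total = len(target)
    return sorted(label for label, count in counts.items() if count / total < threshold)


# Made-up target with a 95/5 split between classes.
target = ["no"] * 95 + ["yes"] * 5
flagged = classes_below_threshold(target, threshold=0.1)
```

In this example flagged is ["yes"], because that class makes up 5% of the rows.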

Fixes

  • Fixed XGBoost column names for partial dependence methods #1104
  • Removed dead code validating column type from TextFeaturizer #1122
  • Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
  • OneHotEncoder preserves the custom index in the input data #1146
  • Fixed representation for ModelFamily #1165
  • Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
  • Users can now pass in any valid kwargs to all estimators #1157
  • Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179
  • Removed LightGBM Estimator from AutoML models #1186

Changes

  • Pinned scikit-optimize version to 0.7.4 #1136
  • Removed tqdm as a dependency #1177
  • Added lightgbm version 3.0.0 to latest_dependency_versions.txt #1185

Documentation Changes

  • Fixed API docs for AutoMLSearch add_result_callback #1113
  • Added a step to our release process for pushing our latest version to conda-forge #1118
  • Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab #1145
  • Updated README.md example to load demo dataset #1151
  • Swapped mapping of breast cancer targets in model_understanding.ipynb #1170

Testing Changes

  • Added test confirming TextFeaturizer never outputs null values #1122
  • Changed Python version of Update Dependencies action to 3.8.x #1137
  • Fixed release notes check-in test for Update Dependencies actions #1172

v0.13.dev1

17 Sep 18:27
v0.13.dev1 Pre-release
Pre-release

Development release for testing purposes.

v0.13.1

25 Aug 21:06
8922638

v0.13.1 Aug. 25, 2020

Enhancements

  • Added Cost-Benefit Matrix objective for binary classification #1038
  • Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
  • Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
  • Added new LSA component for text featurization #1022
  • Added guide on installing with conda #1041
  • Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
  • Standardized error when calling transform/predict before fit for pipelines #1048
  • Added percent_better_than_baseline to Automl search rankings and full rankings table #1050
  • Added one-way partial dependence and partial dependence plots #1079
  • Added "Feature Value" column to prediction explanation reports. #1064
  • Added max_batches parameter to AutoMLSearch #1087

Fixes

  • Updated TextFeaturizer component to no longer require an internet connection to run #1022
  • Fixed non-deterministic element of TextFeaturizer transformations #1022
  • Added a StandardScaler to all ElasticNet pipelines #1065
  • Updated cost-benefit matrix to normalize score #1099
  • Fixed logic in calculate_percent_difference so that it can handle negative values #1100
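
The negative-value fix in #1100 reflects a general point about percent differences: dividing by a negative baseline flips the sign of the result. One common remedy, sketched below, is to divide by the absolute value of the baseline. The function percent_difference, its signature, and the NaN return for a zero baseline are assumptions for illustration and may differ from calculate_percent_difference in evalml.

```python
def percent_difference(score, baseline_score, greater_is_better=True):
    """Percent improvement of score over baseline_score.

    Positive means better than the baseline. Dividing by abs(baseline_score)
    keeps the sign meaningful when the baseline is negative.
    """
    if baseline_score == 0:
        return float("nan")  # Undefined relative change; an assumption of this sketch.
    change = (score - baseline_score) / abs(baseline_score) * 100
    return change if greater_is_better else -change


improved_negative = percent_difference(-0.5, -1.0)
improved_loss = percent_difference(0.5, 1.0, greater_is_better=False)
```

Both calls return 50.0: moving from -1.0 to -0.5 on a greater-is-better objective and from 1.0 to 0.5 on a lower-is-better objective each count as a 50% improvement.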

Changes

  • Added needs_fitting property to ComponentBase #1044
  • Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
  • Remove maximum version limit for SciPy dependency #1051
  • Moved all_components and other component importers into runtime methods #1045
  • Consolidated graphing utility methods under evalml.utils.graph_utils #1060
  • Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
  • Changed show_all_features parameter into feature_threshold, which allows for thresholding feature importance #1097

Documentation Changes

  • Update setup.py URL to point to the github repo #1037
  • Added tutorial for using the cost-benefit matrix objective #1088

Testing Changes

  • Refactor CircleCI tests to use matrix jobs #1043
  • Added a test to check that all test directories are included in evalml package #1054

Breaking Changes

  • confusion_matrix and normalize_confusion_matrix have been moved to evalml.utils #1038
  • All graph utility methods previously under evalml.pipelines.graph_utils have been moved to evalml.utils.graph_utils #1060

v0.13.0.dev0

25 Aug 18:33
v0.13.0.dev0 Pre-release
Pre-release

Development release for testing purposes.

v0.12.2

06 Aug 19:49
f7fb96a

v0.12.2 Aug. 6, 2020

Enhancements

  • Add save/load method to components #1023
  • Expose pickle protocol as optional arg to save/load #1023
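
A save/load pair with an optional pickle protocol (#1023) might look roughly like the sketch below, which uses only the standard pickle module. TinyComponent, its parameters attribute, and the pickle_protocol argument name are hypothetical. To stay independent of import paths, the sketch pickles only the parameters rather than the whole object, which may differ from how evalml components are serialized.

```python
import pickle
import tempfile
from pathlib import Path


class TinyComponent:
    """Hypothetical stand-in for a component with parameters."""

    def __init__(self, parameters=None):
        self.parameters = dict(parameters or {})

    def save(self, file_path, pickle_protocol=pickle.HIGHEST_PROTOCOL):
        # Only the parameters are written; a real component may persist more state.
        with open(file_path, "wb") as f:
            pickle.dump(self.parameters, f, protocol=pickle_protocol)

    @classmethod
    def load(cls, file_path):
        with open(file_path, "rb") as f:
            return cls(pickle.load(f))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "component.pkl"
    TinyComponent({"strategy": "mean"}).save(path, pickle_protocol=4)
    restored = TinyComponent.load(path)
```

Passing a lower protocol number is mainly useful when the file must be read by an older Python version.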

Fixes

Changes

  • Removed DeprecationWarning for SimpleImputer #1018

Documentation Changes

Testing Changes

  • Test files are now included in the evalml package #1029