01 Dec 23:26

dsherry

5a8d844

v0.16.1

v0.16.1 Dec. 1, 2020

Enhancements

Pinned Woodwork version to v0.0.6 to avoid breaking changes #1484

Fixes

Updated Woodwork to >=0.0.5 in core-requirements.txt #1473
Removed copy_dataframe parameter for Woodwork, updated Woodwork to >=0.0.6 in core-requirements.txt #1478
Updated detect_problem_type to use pandas.api.types.is_numeric_dtype #1476
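
The Woodwork entries in this release move from a minimum-version requirement to an exact pin. A rough sketch of the difference follows. The `parse` and `satisfies` helpers and the version 0.0.7 are made up for illustration; real installers follow the full PEP 440 rules.

```python
def parse(version):
    """Turn '0.0.6' into (0, 0, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(candidate, spec):
    """Check a candidate version against a '>=X' or '==X' requirement (simplified)."""
    if spec.startswith(">="):
        return parse(candidate) >= parse(spec[2:])
    if spec.startswith("=="):
        return parse(candidate) == parse(spec[2:])
    raise ValueError(f"unsupported specifier: {spec}")


# 0.0.7 is a hypothetical later release used only for this example.
candidates = ["0.0.5", "0.0.6", "0.0.7"]

floor_ok = [v for v in candidates if satisfies(v, ">=0.0.6")]
pin_ok = [v for v in candidates if satisfies(v, "==0.0.6")]

print(floor_ok)  # ['0.0.6', '0.0.7']
print(pin_ok)    # ['0.0.6']
```

A minimum-version requirement would accept any later release, including one with breaking changes; the pin accepts only the tested version.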

Changes

Changed make clean to delete coverage reports as a convenience for developers #1464

Documentation Changes

Testing Changes

Update dependency update checker to use everything from core and optional dependencies #1480

24 Nov 22:12

dsherry

v0.16.0

b8b594f

v0.16.0

v0.16.0 Nov. 24, 2020

Enhancements

Updated pipelines and make_pipeline to accept Woodwork inputs #1393
Updated components to accept Woodwork inputs #1423
Added ability to freeze hyperparameters for AutoMLSearch #1284
Added Target Encoder into transformer components #1401
Added callback for error handling in AutoMLSearch #1403
Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374
Added a problem type for time series regression #1386
Added a is_defined_for_problem_type method to ObjectiveBase #1386
Added a random_state parameter to make_pipeline_from_components function #1411
Added DelayedFeaturesTransformer #1396
Added a TimeSeriesRegressionPipeline class #1418
Removed core-requirements.txt from the package distribution #1429
Updated data check messages to include "code" and "details" fields #1451 #1462
Added a TimeSeriesSplit data splitter for time series problems #1441
Added a problem_configuration parameter to AutoMLSearch #1457
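
The top_k change for explain_predictions_* listed above can be illustrated with a small sketch. The function names and SHAP values here are invented for the example and are not evalml's implementation.

```python
def top_k_old(shap_values, k):
    """Earlier selection as described: the k largest plus the k smallest values."""
    ranked = sorted(shap_values, key=shap_values.get)  # ascending by signed value
    return ranked[-k:][::-1] + ranked[:k]


def top_k_new(shap_values, k):
    """New selection as described: the k values with the largest magnitude."""
    by_magnitude = sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)
    return by_magnitude[:k]


# Made-up SHAP values for a single prediction.
shap_values = {"age": 0.40, "income": -0.35, "tenure": 0.05, "region": -0.02, "visits": 0.30}

print(top_k_old(shap_values, k=2))  # ['age', 'visits', 'income', 'region']
print(top_k_new(shap_values, k=2))  # ['age', 'income']
```

In this example the old selection includes "region", whose contribution is close to zero, only because it is among the two smallest signed values. Ranking by magnitude returns k features and skips it.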

Fixes

Fixed IndexError raised in AutoMLSearch when ensembling = True but there is only one pipeline to iterate over #1397
Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388
Updated enum classes to show possible enum values as attributes #1391
Updated calls to Woodwork's to_pandas() to to_series() and to_dataframe() #1428
Fixed bug in OHE where column names were not guaranteed to be unique #1349
Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467

Changes

Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
Simplified and cleaned output for Code Generation #1371
Updated data checks to return dictionary of warnings and errors instead of a list #1448
Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
Updated AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
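
As a sketch of the data check change above (a dictionary of warnings and errors instead of a list), the snippet below groups a flat list of messages by severity. The "warnings" and "errors" keys, the "level" and "message" fields, and the code values are assumptions made for this example; only the "code" and "details" fields are named in these notes.

```python
def group_messages(messages):
    """Group a flat list of data check messages into warnings and errors."""
    results = {"warnings": [], "errors": []}
    for message in messages:
        key = "errors" if message["level"] == "error" else "warnings"
        results[key].append(message)
    return results


# Hypothetical messages; the codes and details are invented for illustration.
flat = [
    {"message": "Column 'id' looks like an ID column", "level": "warning",
     "code": "HAS_ID_COLUMN", "details": {"column": "id"}},
    {"message": "Target has only one unique value", "level": "error",
     "code": "NO_VARIANCE", "details": {"column": "target"}},
]

results = group_messages(flat)
print(len(results["warnings"]), len(results["errors"]))  # 1 1
```

Callers that previously iterated over a list would instead read the two keys separately, for example stopping early only when the errors list is non-empty.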

Documentation Changes

Added description of CLA to contributing guide, updated description of draft PRs #1402
Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
Updated docstrings from np.array to np.ndarray #1417
Added section on stacking ensembles in AutoMLSearch documentation #1425

Testing Changes

Removed category_encoders from test-requirements.txt #1373
Tweak codecov.io settings again to avoid flakes #1413
Modified make lint to check notebook versions in the docs #1431
Modified make lint-fix to standardize notebook versions in the docs #1431
Use new version of pull request Github Action for dependency check #1443
Reduced number of workers for tests to 4 #1447

Breaking Changes

The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
Data checks now return a dictionary of warnings and errors instead of a list #1448
🦃 🚀

29 Oct 22:59

dsherry

v0.15.0

1ec2ee4

v0.15.0

v0.15.0 Oct. 29, 2020

Enhancements

Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
Added stacked ensemble components to AutoMLSearch #1253
Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
Added graph_prediction_vs_actual in model_understanding for regression problems #1252
Added a parameter to OneHotEncoder to select which features to encode #1249
Added percent-better-than-baseline for all objectives to automl.results #1244
Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
Added PCA Transformer component for dimensionality reduction #1270
Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
Updated AutoMLSearch to support Woodwork data structures #1299
Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
Make max_batches argument to AutoMLSearch.search public #1320
Added text support to automl search #1062
Added _pipelines_per_batch as a private argument to AutoMLSearch #1355

Fixes

Fixed ML performance issue with ordered datasets: always shuffle data in automl's default CV splits #1265
Fixed broken evalml info CLI command #1293
Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
Added stacked ensemble estimators to the evalml.pipelines.__init__ file #1326
Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
Fixed LightGBM warning messages during AutoMLSearch #1342
Fixed warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
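
The ordered-dataset fix above (#1265) can be motivated with a short sketch. The splitter below is a simplified stand-in written for this example, not the splitter AutoMLSearch uses.

```python
import random


def k_fold_indices(n_rows, n_folds, shuffle, seed=0):
    """Split row indices into equal folds (leftover rows are dropped for simplicity)."""
    indices = list(range(n_rows))
    if shuffle:
        random.Random(seed).shuffle(indices)
    fold_size = n_rows // n_folds
    return [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]


# A target sorted by class: the first six rows are class 0, the last six are class 1.
y = [0] * 6 + [1] * 6

ordered = k_fold_indices(len(y), n_folds=2, shuffle=False)
classes_per_fold = [sorted({y[i] for i in fold}) for fold in ordered]
print(classes_per_fold)  # [[0], [1]]

# With shuffling, rows are assigned to folds in a seeded random order instead.
shuffled = k_fold_indices(len(y), n_folds=2, shuffle=True, seed=0)
```

Without shuffling, each fold contains a single class, so a model trained on one fold never sees the class it is validated on.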

Changes

Allow add_to_rankings to be called before AutoMLSearch is called #1250
Moved Graphviz from test-requirements to requirements.txt #1327
Removed max_pipelines parameter from AutoMLSearch #1264
Include editable installs in all install make targets #1335
Made pip dependencies featuretools and nlp_primitives core dependencies #1062
Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
Added warning for partial_dependency when the feature includes null values #1352

Documentation Changes

Fixed and updated code blocks in Release Notes #1243
Added DecisionTree estimators to API Reference #1246
Changed class inheritance display to flow vertically #1248
Updated cost-benefit tutorial to use a holdout/test set #1159
Added evalml info command to documentation #1293
Miscellaneous doc updates #1269
Removed conda pre-release testing from the release process document #1282
Updates to contributing guide #1310
Added Alteryx footer to docs with Twitter and Github link #1312
Added documentation for evalml installation for Python 3.6 #1322
Added documentation changes to make the API Docs easier to understand #1323
Fixed documentation for feature_importance #1353
Added tutorial for running AutoML with text data #1357
Added documentation for woodwork integration with automl search #1361

Testing Changes

Added tests for jupyter_check to handle IPython #1256
Cleaned up make_pipeline tests to test for all estimators #1257
Added a test to check conda build after merge to main #1247
Removed unnecessary code in __main__.py that lacked test coverage #1293
Codecov: round coverage up instead of down #1334
Add DockerHub credentials to CI testing environment #1356
Add DockerHub credentials to conda testing environment #1363

Breaking Changes

Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (for example, a pandas or numpy structure) #1299
Make max_batches argument to AutoMLSearch.search public #1320
Removed unused argument feature_types from AutoMLSearch.search #1062

29 Sep 20:44

angela97lin

v0.14.1

7dcf640

v0.14.1

v0.14.1 Sep. 29, 2020

Enhancements

Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
Added get_feature_names on OneHotEncoder #1193
Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
Added LightGBM to AutoMLSearch #1199
Updated scikit-learn and scikit-optimize to the latest versions, 0.23.2 and 0.8.1 respectively #1141
Added __str__ and __repr__ for pipelines and components #1218
Included internal target check for both training and validation data in AutoMLSearch #1226
Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
Added first CV fold score as validation score in AutoMLSearch.rankings #1221
Updated flake8 configuration to enable linting on __init__.py files #1234
Refined make_pipeline_from_components implementation #1204
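
To give a sense of what automatic problem type detection from a target (see detect_problem_type above) can look like, here is a minimal sketch. The function name, the returned strings, and the max_classes threshold are assumptions made for this example, not evalml's actual rule.

```python
def detect_problem_type_sketch(y, max_classes=10):
    """Guess a problem type from target values (illustrative heuristic only)."""
    values = [v for v in y if v is not None]
    unique = set(values)
    if len(unique) < 2:
        raise ValueError("target needs at least two distinct values")
    if len(unique) == 2:
        return "binary"
    # bool is a subclass of int, so exclude it explicitly.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric and len(unique) > max_classes:
        return "regression"
    return "multiclass"


print(detect_problem_type_sketch([0, 1, 1, 0]))                    # binary
print(detect_problem_type_sketch(["a", "b", "c", "a"]))            # multiclass
print(detect_problem_type_sketch([i * 0.5 for i in range(20)]))    # regression
```

The numeric check is the part that decides between many-valued numeric targets (regression) and everything else (multiclass).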

Fixes

Updated GitHub URL after migration to Alteryx GitHub org #1207
Changed Problem Type enum to be more similar to the string name #1208
Wrapped call to scikit-learn's partial dependence method in a try/finally block #1232
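
The note on try/finally above does not say what the block protects. One common reason for the pattern is restoring temporary state even when the wrapped call raises, sketched below. The Model class, its attribute, and the helper are hypothetical and stand in for a third-party call.

```python
class Model:
    """Hypothetical object with an attribute that a library helper inspects."""

    def __init__(self):
        self.estimator_type = None


def failing_helper(model):
    """Stand-in for a library call that may raise."""
    raise ValueError("unsupported input")


def call_with_temporary_attribute(model, helper):
    original = model.estimator_type
    model.estimator_type = "regressor"  # temporary value the helper expects
    try:
        return helper(model)
    finally:
        # Runs whether the helper returns or raises, so the model is never left modified.
        model.estimator_type = original


model = Model()
try:
    call_with_temporary_attribute(model, failing_helper)
except ValueError:
    pass

print(model.estimator_type)  # None
```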

Changes

Added allow_writing_files as a named argument to CatBoost estimators. #1202
Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
Replaced the pipeline's ._transform method, which evaluates all the preprocessing steps of a pipeline, with .compute_estimator_features #1231
Changed default large dataset train/test splitting behavior #1205

Documentation Changes

Included description of how to access the component instances and features for pipeline user guide #1163
Updated API docs to refer to target as "target" instead of "labels" for non-classification tasks and minor docs cleanup #1160
Added Class Imbalance Data Check to api_reference.rst #1190 #1200
Added pipeline properties to API reference #1209
Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
Added install documentation for libomp in order to use LightGBM on Mac #1233
Improved description of max_iterations in documentation #1212
Removed unused code from sphinx conf #1235

Testing Changes

Breaking Changes

DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
The pipeline's ._transform method, which evaluates all the preprocessing steps of a pipeline, has been replaced with .compute_estimator_features #1231
get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230

29 Sep 18:19

freddyaboulton

v0.14.dev0

298ba82

v0.14.dev0 Pre-release

Development release for testing purposes

17 Sep 20:47

freddyaboulton

v0.13.2

f5c8bb2

v0.13.2

v0.13.2 Sep. 17, 2020

Enhancements

Added output_format field to explain predictions functions #1107
Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
Added a return_instance boolean parameter to get_objective #1132
Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
Added label encoder to LightGBM for binary classification #1152
Added labels for the row index of confusion matrix #1154
Added AutoMLSearch object as another parameter in search callbacks #1156
Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
Added __eq__ for ComponentBase and PipelineBase #1178
Added support for multiclass classification for roc_curve #1164
Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
Added utility function to create pipeline instances from a list of component instances #1176
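
The ClassImbalanceDataCheck entry above describes flagging a target whose imbalance falls below a threshold. A minimal sketch of that idea follows; the function name, the default threshold, and the strict "less than" comparison are assumptions for this example rather than the data check's actual behavior.

```python
from collections import Counter


def imbalanced_classes(y, threshold=0.1):
    """Return the labels whose share of the target is below the threshold."""
    counts = Counter(y)
    total = len(y)
    return sorted(label for label, n in counts.items() if n / total < threshold)


# Made-up target: 95% "no", 5% "yes".
y = ["no"] * 95 + ["yes"] * 5

print(imbalanced_classes(y, threshold=0.1))   # ['yes']
print(imbalanced_classes(y, threshold=0.05))  # []
```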

Fixes

Fixed XGBoost column names for partial dependence methods #1104
Removed dead code validating column type from TextFeaturizer #1122
Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
OneHotEncoder preserves the custom index in the input data #1146
Fixed representation for ModelFamily #1165
Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
Users can now pass in any valid kwargs to all estimators #1157
Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179
Removed LightGBM Estimator from AutoML models #1186

Changes

Pinned scikit-optimize version to 0.7.4 #1136
Removed tqdm as a dependency #1177
Added lightgbm version 3.0.0 to latest_dependency_versions.txt #1185

Documentation Changes

Fixed API docs for AutoMLSearch add_result_callback #1113
Added a step to our release process for pushing our latest version to conda-forge #1118
Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab #1145
Updated README.md example to load demo dataset #1151
Swapped mapping of breast cancer targets in model_understanding.ipynb #1170

Testing Changes

Added test confirming TextFeaturizer never outputs null values #1122
Changed Python version of Update Dependencies action to 3.8.x #1137
Fixed release notes check-in test for Update Dependencies actions #1172

17 Sep 18:27

freddyaboulton

v0.13.dev1

f8eb8e6

v0.13.dev1 Pre-release

Development release for testing purposes.

25 Aug 21:06

dsherry

v0.13.1

8922638

v0.13.1

v0.13.1 Aug. 25, 2020

Enhancements

Added Cost-Benefit Matrix objective for binary classification #1038
Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
Added new LSA component for text featurization #1022
Added guide on installing with conda #1041
Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
Standardized error when calling transform/predict before fit for pipelines #1048
Added percent_better_than_baseline to Automl search rankings and full rankings table #1050
Added one-way partial dependence and partial dependence plots #1079
Added "Feature Value" column to prediction explanation reports. #1064
Added max_batches parameter to AutoMLSearch #1087
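
For readers unfamiliar with a cost-benefit matrix objective, the idea can be sketched as assigning a payoff to each of the four binary outcomes and scoring predictions by their average payoff. The function name, parameter names, payoff values, and the choice to average over rows are assumptions for this example, not evalml's implementation.

```python
def cost_benefit_score(y_true, y_pred, true_positive, true_negative,
                       false_positive, false_negative):
    """Average payoff per prediction for binary labels encoded as 0/1."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            total += true_positive
        elif actual == 0 and predicted == 0:
            total += true_negative
        elif actual == 0 and predicted == 1:
            total += false_positive
        else:  # actual == 1 and predicted == 0
            total += false_negative
    return total / len(y_true)


y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 1]

# One true positive, one true negative, one false negative, one false positive.
score = cost_benefit_score(y_true, y_pred, true_positive=10, true_negative=0,
                           false_positive=-5, false_negative=-20)
print(score)  # -3.75
```

Because the payoffs depend on the decision threshold used to produce y_pred, plotting this score against thresholds gives the kind of curve the "cost-benefit curve" utility describes.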

Fixes

Updated TextFeaturizer component to no longer require an internet connection to run #1022
Fixed non-deterministic element of TextFeaturizer transformations #1022
Added a StandardScaler to all ElasticNet pipelines #1065
Updated cost-benefit matrix to normalize score #1099
Fixed logic in calculate_percent_difference so that it can handle negative values #1100
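
The negative-value case mentioned for calculate_percent_difference can be sketched as follows. This is an illustrative function written for this example, not the library's code; the zero-baseline handling is an assumption.

```python
def percent_difference(score, baseline_score):
    """Percent change of score relative to a baseline, safe for negative baselines."""
    if baseline_score == 0:
        return float("nan")  # undefined relative change
    # Dividing by the absolute value keeps the sign of the improvement correct.
    return 100 * (score - baseline_score) / abs(baseline_score)


print(percent_difference(15, 10))    # 50.0
print(percent_difference(-5, -10))   # 50.0

# Dividing by the signed baseline would flip the sign for negative baselines:
naive = 100 * (-5 - (-10)) / (-10)
print(naive)  # -50.0
```

Moving from -10 to -5 is an increase, so the result should be positive; the signed denominator reports it as a decrease.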

Changes

Added needs_fitting property to ComponentBase #1044
Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
Remove maximum version limit for SciPy dependency #1051
Moved all_components and other component importers into runtime methods #1045
Consolidated graphing utility methods under evalml.utils.graph_utils #1060
Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
Changed show_all_features parameter into feature_threshold, which allows for thresholding feature importance #1097

Documentation Changes

Update setup.py URL to point to the github repo #1037
Added tutorial for using the cost-benefit matrix objective #1088

Testing Changes

Refactor CircleCI tests to use matrix jobs #1043
Added a test to check that all test directories are included in evalml package #1054

Breaking Changes

confusion_matrix and normalize_confusion_matrix have been moved to evalml.utils #1038
All graph utility methods previously under evalml.pipelines.graph_utils have been moved to evalml.utils.graph_utils #1060

25 Aug 18:33

freddyaboulton

v0.13.0.dev0

e821424

v0.13.0.dev0 Pre-release

Development release for testing purposes.

06 Aug 19:49

freddyaboulton

v0.12.2

f7fb96a

v0.12.2

v0.12.2 Aug. 6, 2020

Enhancements

Add save/load method to components #1023
Expose pickle protocol as optional arg to save/load #1023
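
A save/load pair with an optional pickle protocol could look roughly like the sketch below. ExampleComponent, its parameters, and the pickle_protocol argument name are hypothetical; the sketch pickles only the parameter dictionary to stay self-contained, which may differ from how evalml serializes components.

```python
import os
import pickle
import tempfile


class ExampleComponent:
    """Hypothetical stand-in for a component with parameters."""

    def __init__(self, parameters):
        self.parameters = parameters

    def save(self, file_path, pickle_protocol=pickle.HIGHEST_PROTOCOL):
        """Write the component state to disk using the given pickle protocol."""
        with open(file_path, "wb") as f:
            pickle.dump(self.parameters, f, protocol=pickle_protocol)

    @classmethod
    def load(cls, file_path):
        """Rebuild a component from a saved file; pickle detects the protocol itself."""
        with open(file_path, "rb") as f:
            return cls(pickle.load(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "component.pkl")
    ExampleComponent({"impute_strategy": "mean"}).save(path, pickle_protocol=4)
    restored = ExampleComponent.load(path)

print(restored.parameters)  # {'impute_strategy': 'mean'}
```

Exposing the protocol lets a file written on a newer Python be read on an older one by choosing a protocol both versions support.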

Fixes

Changes

Removed DeprecationWarning for SimpleImputer #1018

Documentation Changes

Testing Changes

Test files are now included in the evalml package #1029

Releases: alteryx/evalml
