- Future Releases
- Enhancements
- Added graph_prediction_vs_actual_over_time and get_prediction_vs_actual_over_time_data to the model understanding module for time series problems #1483
- Updated data checks to accept Woodwork data structures #1481
- Added parameter to InvalidTargetDataCheck to show only top unique values rather than all unique values #1485
- Fixes
- Fix Windows CI jobs: install numba via conda, required for shap #1490
- Changes
- Update circleci badge to apply to main #1489
- Added script to generate github markdown for releases #1487
- Documentation Changes
- Testing Changes
- v0.16.1 Dec. 1, 2020
- Enhancements
- Pin woodwork version to v0.0.6 to avoid breaking changes #1484
- Updated Woodwork to >=0.0.5 in core-requirements.txt #1473
- Removed copy_dataframe parameter for Woodwork, updated Woodwork to >=0.0.6 in core-requirements.txt #1478
- Updated detect_problem_type to use pandas.api.types.is_numeric_dtype #1476
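As an illustration of the detect_problem_type entry above, here is a minimal sketch of how a target could be classified with pandas' dtype helper, which lives at pandas.api.types.is_numeric_dtype. The helper name guess_problem_type, the returned strings, and the cutoff of 10 unique values are assumptions made for this example and are not taken from EvalML's implementation.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def guess_problem_type(y, max_classes=10):
    """Hypothetical helper (not EvalML's code): guess a problem type from a target."""
    y = pd.Series(y).dropna()
    n_unique = y.nunique()
    if n_unique < 2:
        raise ValueError("Target needs at least two unique values")
    if n_unique == 2:
        return "binary"
    # Assumption: numeric targets with many distinct values are regression targets.
    if is_numeric_dtype(y) and n_unique > max_classes:
        return "regression"
    return "multiclass"
```

Calling guess_problem_type on a list of string labels takes the non-numeric branch, so the numeric check only matters for targets with more than two distinct values.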
- Changes
- Changed make clean to delete coverage reports as a convenience for developers #1464
- Documentation Changes
- Testing Changes
- Update dependency update checker to use everything from core and optional dependencies #1480
- v0.16.0 Nov. 24, 2020
- Enhancements
- Updated pipelines and make_pipeline to accept Woodwork inputs #1393
- Updated components to accept Woodwork inputs #1423
- Added ability to freeze hyperparameters for AutoMLSearch #1284
- Added Target Encoder into transformer components #1401
- Added callback for error handling in AutoMLSearch #1403
- Added the index id to the explain_predictions_best_worst output to help users identify which rows in their data are included #1365
- The top_k features displayed in explain_predictions_* functions are now determined by the magnitude of shap values as opposed to the top_k largest and smallest shap values. #1374
- Added a problem type for time series regression #1386
- Added an is_defined_for_problem_type method to ObjectiveBase #1386
- Added a random_state parameter to make_pipeline_from_components function #1411
- Added DelayedFeaturesTransformer #1396
- Added a TimeSeriesRegressionPipeline class #1418
- Removed core-requirements.txt from the package distribution #1429
- Updated data check messages to include "code" and "details" fields #1451, #1462
- Added a TimeSeriesSplit data splitter for time series problems #1441
- Added a problem_configuration parameter to AutoMLSearch #1457
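The time series entries above mention a DelayedFeaturesTransformer. The general idea of delayed (lagged) features can be sketched with pandas alone. The function name add_delayed_features, the max_delay argument, and the column naming scheme below are assumptions for illustration and may differ from the actual component.

```python
import pandas as pd


def add_delayed_features(df, max_delay=2):
    """Hypothetical helper: append lagged copies of every column."""
    delayed = {}
    for col in df.columns:
        for lag in range(1, max_delay + 1):
            # shift(lag) moves values down, so row t sees the value from row t - lag.
            delayed[f"{col}_delay_{lag}"] = df[col].shift(lag)
    return df.assign(**delayed)
```

The first max_delay rows of each delayed column are NaN because no earlier observation exists for them.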
- Fixes
- Fixed IndexError raised in AutoMLSearch when ensembling = True but only one pipeline to iterate over #1397
- Fixed stacked ensemble input bug and LightGBM warning and bug in AutoMLSearch #1388
- Updated enum classes to show possible enum values as attributes #1391
- Updated calls to Woodwork's to_pandas() to to_series() and to_dataframe() #1428
- Fixed bug in OHE where column names were not guaranteed to be unique #1349
- Fixed bug with percent improvement of ExpVariance objective on data with highly skewed target #1467
- Fix SimpleImputer error which occurs when all features are bool type #1215
- Changes
- Changed OutliersDataCheck to return the list of columns, rather than rows, that contain outliers #1377
- Simplified and cleaned output for Code Generation #1371
- Reverted changes from #1337 #1409
- Updated data checks to return dictionary of warnings and errors instead of a list #1448
- Updated AutoMLSearch to pass Woodwork data structures to every pipeline (instead of pandas DataFrames) #1450
- Update AutoMLSearch to default to max_batches=1 instead of max_iterations=5 #1452
- Updated _evaluate_pipelines to consolidate side effects #1410
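The move from a list to a dictionary of warnings and errors can be pictured with the following sketch. The (level, text) tuple representation, the "warnings" and "errors" keys, and the function name group_messages are assumptions for this example, not a description of the library's data structures.

```python
def group_messages(messages):
    """Hypothetical helper: group a flat list of (level, text) pairs by level."""
    results = {"warnings": [], "errors": []}
    for level, text in messages:
        if level == "warning":
            results["warnings"].append(text)
        elif level == "error":
            results["errors"].append(text)
        else:
            raise ValueError(f"Unknown message level: {level}")
    return results
```

With a dictionary, callers can check for errors with a single key lookup instead of filtering the whole list.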
- Documentation Changes
- Added description of CLA to contributing guide, updated description of draft PRs #1402
- Updated documentation to include all data checks, DataChecks, and usage of data checks in AutoML #1412
- Updated docstrings from np.array to np.ndarray #1417
- Added section on stacking ensembles in AutoMLSearch documentation #1425
- Testing Changes
- Removed category_encoders from test-requirements.txt #1373
- Tweak codecov.io settings again to avoid flakes #1413
- Modified make lint to check notebook versions in the docs #1431
- Modified make lint-fix to standardize notebook versions in the docs #1431
- Use new version of pull request Github Action for dependency check (#1443)
- Reduced number of workers for tests to 4 #1447
Warning
- Breaking Changes
- The top_k and top_k_features parameters in explain_predictions_* functions now return k features as opposed to 2 * k features #1374
- Renamed problem_type to problem_types in RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective #1319
- Data checks now return a dictionary of warnings and errors instead of a list #1448
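The difference between the two top_k behaviours can be shown with a small self-contained sketch. Both function names and the dict-of-floats input are hypothetical. They only illustrate "k features by magnitude" versus "k largest plus k smallest" and are not the library's code.

```python
def top_k_by_magnitude(shap_values, k):
    """Return the k (feature, value) pairs with the largest absolute value."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def top_and_bottom_k(shap_values, k):
    """Earlier-style selection: the k largest plus the k smallest values (up to 2 * k)."""
    ranked = sorted(shap_values.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= 2 * k:
        return ranked
    return ranked[:k] + ranked[-k:]
```

Ranking by absolute value keeps strongly negative contributions in the running while returning exactly k rows.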
- v0.15.0 Oct. 29, 2020
- Enhancements
- Added stacked ensemble component classes (StackedEnsembleClassifier, StackedEnsembleRegressor) #1134
- Added stacked ensemble components to AutoMLSearch #1253
- Added DecisionTreeClassifier and DecisionTreeRegressor to AutoML #1255
- Added graph_prediction_vs_actual in model_understanding for regression problems #1252
- Added parameter to OneHotEncoder to enable filtering for features to encode for #1249
- Added percent-better-than-baseline for all objectives to automl.results #1244
- Added HighVarianceCVDataCheck and replaced synonymous warning in AutoMLSearch #1254
- Added PCA Transformer component for dimensionality reduction #1270
- Added generate_pipeline_code and generate_component_code to allow for code generation given a pipeline or component instance #1306
- Updated AutoMLSearch to support Woodwork data structures #1299
- Added cv_folds to ClassImbalanceDataCheck and added this check to DefaultDataChecks #1333
- Make max_batches argument to AutoMLSearch.search public #1320
- Added text support to automl search #1062
- Added _pipelines_per_batch as a private argument to AutoMLSearch #1355
- Fixes
- Fixed ML performance issue with ordered datasets: always shuffle data in automl's default CV splits #1265
- Fixed broken evalml info CLI command #1293
- Fixed boosting type='rf' for LightGBM Classifier, as well as num_leaves error #1302
- Fixed bug in explain_predictions_best_worst where a custom index in the target variable would cause a ValueError #1318
- Added stacked ensemble estimators to evalml.pipelines.__init__ file #1326
- Fixed bug in OHE where calls to transform were not deterministic if top_n was less than the number of categories in a column #1324
- Fixed LightGBM warning messages during AutoMLSearch #1342
- Fix warnings thrown during AutoMLSearch in HighVarianceCVDataCheck #1346
- Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
- Fixed bug where the AutoMLSearch random_state was not being passed to the created pipelines #1321
- Changes
- Allow add_to_rankings to be called before AutoMLSearch is called #1250
- Removed Graphviz from test-requirements to add to requirements.txt #1327
- Removed max_pipelines parameter from AutoMLSearch #1264
- Include editable installs in all install make targets #1335
- Made pip dependencies featuretools and nlp_primitives core dependencies #1062
- Removed PartOfSpeechCount from TextFeaturizer transform primitives #1062
- Added warning for partial_dependency when the feature includes null values #1352
- Documentation Changes
- Fixed and updated code blocks in Release Notes #1243
- Added DecisionTree estimators to API Reference #1246
- Changed class inheritance display to flow vertically #1248
- Updated cost-benefit tutorial to use a holdout/test set #1159
- Added evalml info command to documentation #1293
- Miscellaneous doc updates #1269
- Removed conda pre-release testing from the release process document #1282
- Updates to contributing guide #1310
- Added Alteryx footer to docs with Twitter and Github link #1312
- Added documentation for evalml installation for Python 3.6 #1322
- Added documentation changes to make the API Docs easier to understand #1323
- Fixed documentation for feature_importance #1353
- Added tutorial for running AutoML with text data #1357
- Added documentation for woodwork integration with automl search #1361
- Testing Changes
- Added tests for jupyter_check to handle IPython #1256
- Cleaned up make_pipeline tests to test for all estimators #1257
- Added a test to check conda build after merge to main #1247
- Removed unnecessary code for __main__.py that was lacking codecov #1293
- Codecov: round coverage up instead of down #1334
- Add DockerHub credentials to CI testing environment #1356
- Add DockerHub credentials to conda testing environment #1363
Warning
- Breaking Changes
- Renamed LabelLeakageDataCheck to TargetLeakageDataCheck #1319
- max_pipelines parameter has been removed from AutoMLSearch. Please use max_iterations instead. #1264
- AutoMLSearch.search() will now log a warning if the input is not a Woodwork data structure (pandas, numpy) #1299
- Make max_batches argument to AutoMLSearch.search public #1320
- Removed unused argument feature_types from AutoMLSearch.search #1062
- v0.14.1 Sep. 29, 2020
- Enhancements
- Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
- Added get_feature_names on OneHotEncoder #1193
- Added detect_problem_type to problem_type/utils.py to automatically detect the problem type given targets #1194
- Added LightGBM to AutoMLSearch #1199
- Updated scikit-learn and scikit-optimize to use latest versions - 0.23.2 and 0.8.1 respectively #1141
- Added __str__ and __repr__ for pipelines and components #1218
- Included internal target check for both training and validation data in AutoMLSearch #1226
- Added ProblemTypes.all_problem_types helper to get list of supported problem types #1219
- Added DecisionTreeClassifier and DecisionTreeRegressor classes #1223
- DataChecks can now be parametrized by passing a list of DataCheck classes and a parameter dictionary #1167
- Added first CV fold score as validation score in AutoMLSearch.rankings #1221
- Updated flake8 configuration to enable linting on __init__.py files #1234
- Refined make_pipeline_from_components implementation #1204
- Fixes
- Updated GitHub URL after migration to Alteryx GitHub org #1207
- Changed Problem Type enum to be more similar to the string name #1208
- Wrapped call to scikit-learn's partial dependence method in a try/finally block #1232
- Changes
- Added allow_writing_files as a named argument to CatBoost estimators. #1202
- Added solver and multi_class as named arguments to LogisticRegressionClassifier #1202
- Replaced pipeline's ._transform method to evaluate all the preprocessing steps of a pipeline with .compute_estimator_features #1231
- Changed default large dataset train/test splitting behavior #1205
- Documentation Changes
- Included description of how to access the component instances and features for pipeline user guide #1163
- Updated API docs to refer to target as "target" instead of "labels" for non-classification tasks and minor docs cleanup #1160
- Added Class Imbalance Data Check to api_reference.rst #1190 #1200
- Added pipeline properties to API reference #1209
- Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
- Updated API docs to include skopt.space.Categorical option for component hyperparameter range definition #1228
- Added install documentation for libomp in order to use LightGBM on Mac #1233
- Improved description of max_iterations in documentation #1212
- Removed unused code from sphinx conf #1235
- Testing Changes
Warning
- Breaking Changes
- DefaultDataChecks now accepts a problem_type parameter that must be specified #1167
- Pipeline's ._transform method to evaluate all the preprocessing steps of a pipeline has been replaced with .compute_estimator_features #1231
- get_objectives has been renamed to get_core_objectives. This function will now return a list of valid objective instances #1230
- v0.13.2 Sep. 17, 2020
- Enhancements
- Added output_format field to explain predictions functions #1107
- Modified get_objective and get_objectives to be able to return any objective in evalml.objectives #1132
- Added a return_instance boolean parameter to get_objective #1132
- Added ClassImbalanceDataCheck to determine whether target imbalance falls below a given threshold #1135
- Added label encoder to LightGBM for binary classification #1152
- Added labels for the row index of confusion matrix #1154
- Added AutoMLSearch object as another parameter in search callbacks #1156
- Added the corresponding probability threshold for each point displayed in graph_roc_curve #1161
- Added __eq__ for ComponentBase and PipelineBase #1178
- Added support for multiclass classification for roc_curve #1164
- Added categories accessor to OneHotEncoder for listing the categories associated with a feature #1182
- Added utility function to create pipeline instances from a list of component instances #1176
- Fixes
- Fixed XGBoost column names for partial dependence methods #1104
- Removed dead code validating column type from TextFeaturizer #1122
- Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
- OneHotEncoder preserves the custom index in the input data #1146
- Fixed representation for ModelFamily #1165
- Removed duplicate nbsphinx dependency in dev-requirements.txt #1168
- Users can now pass in any valid kwargs to all estimators #1157
- Remove broken accessor OneHotEncoder.get_feature_names and unneeded base class #1179
- Removed LightGBM Estimator from AutoML models #1186
- Changes
- Pinned scikit-optimize version to 0.7.4 #1136
- Removed tqdm as a dependency #1177
- Added lightgbm version 3.0.0 to latest_dependency_versions.txt #1185
- Rename max_pipelines to max_iterations #1169
- Documentation Changes
- Fixed API docs for AutoMLSearch add_result_callback #1113
- Added a step to our release process for pushing our latest version to conda-forge #1118
- Added warning for missing ipywidgets dependency for using PipelineSearchPlots on Jupyterlab #1145
- Updated README.md example to load demo dataset #1151
- Swapped mapping of breast cancer targets in model_understanding.ipynb #1170
- Testing Changes
- Added test confirming TextFeaturizer never outputs null values #1122
- Changed Python version of Update Dependencies action to 3.8.x #1137
- Fixed release notes check-in test for Update Dependencies actions #1172
Warning
- Breaking Changes
- get_objective will now return a class definition rather than an instance by default #1132
- Deleted OPTIONS dictionary in evalml.objectives.utils.py #1132
- If specifying an objective by string, the string must now match the objective's name field, case-insensitive #1132
- Passing "Cost Benefit Matrix", "Fraud Cost", "Lead Scoring", "Mean Squared Log Error", "Recall", "Recall Macro", "Recall Micro", "Recall Weighted", or "Root Mean Squared Log Error" to AutoMLSearch will now result in a ValueError rather than an ObjectiveNotFoundError #1132
- Search callbacks start_iteration_callback and add_results_callback have changed to include a copy of the AutoMLSearch object as a third parameter #1156
- Deleted OneHotEncoder.get_feature_names method which had been broken for a while, in favor of pipelines' input_feature_names #1179
- Deleted empty base class CategoricalEncoder which OneHotEncoder component was inheriting from #1176
- Results from roc_curve will now return as a list of dictionaries with each dictionary representing a class #1164
- max_pipelines now raises a DeprecationWarning and will be removed in the next release. max_iterations should be used instead. #1169
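Two of the changes above, case-insensitive matching on an objective's name field and returning a class rather than an instance by default, can be sketched without the library. The classes, the registry, and the lookup_objective function below are stand-ins invented for this example. Only the return_instance flag name is taken from the notes.

```python
class LogLossBinary:
    """Stand-in objective class for illustration only."""
    name = "Log Loss Binary"


class Recall:
    """Stand-in objective class for illustration only."""
    name = "Recall"


# Hypothetical registry keyed by the lower-cased name field.
_REGISTRY = {cls.name.lower(): cls for cls in (LogLossBinary, Recall)}


def lookup_objective(name, return_instance=False):
    """Match on the name field, ignoring case; return the class unless asked otherwise."""
    key = name.lower()
    if key not in _REGISTRY:
        raise KeyError(f"No objective named {name!r}")
    cls = _REGISTRY[key]
    return cls() if return_instance else cls
```

Under this sketch, lookup_objective("RECALL") yields the Recall class itself, and passing return_instance=True yields a fresh instance.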
- v0.13.1 Aug. 25, 2020
- Enhancements
- Added Cost-Benefit Matrix objective for binary classification #1038
- Split fill_value into categorical_fill_value and numeric_fill_value for Imputer #1019
- Added explain_predictions and explain_predictions_best_worst for explaining multiple predictions with SHAP #1016
- Added new LSA component for text featurization #1022
- Added guide on installing with conda #1041
- Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
- Standardized error when calling transform/predict before fit for pipelines #1048
- Added percent_better_than_baseline to AutoML search rankings and full rankings table #1050
- Added one-way partial dependence and partial dependence plots #1079
- Added "Feature Value" column to prediction explanation reports. #1064
- Added LightGBM classification estimator #1082, #1114
- Added max_batches parameter to AutoMLSearch #1087
- Fixes
- Updated TextFeaturizer component to no longer require an internet connection to run #1022
- Fixed non-deterministic element of TextFeaturizer transformations #1022
- Added a StandardScaler to all ElasticNet pipelines #1065
- Updated cost-benefit matrix to normalize score #1099
- Fixed logic in calculate_percent_difference so that it can handle negative values #1100
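To see why negative values need care in a percent-difference calculation, consider the sketch below. It is one common formulation, not necessarily the logic used by calculate_percent_difference. The function name, the "higher is better" orientation, and the NaN result for a zero baseline are all assumptions.

```python
def percent_difference(score, baseline):
    """Hypothetical sketch: percent change of score relative to baseline.

    Dividing by abs(baseline) keeps the sign of the result tied to the
    direction of the change, even when the baseline is negative.
    """
    if baseline == 0:
        return float("nan")  # assumption: undefined relative change
    return 100.0 * (score - baseline) / abs(baseline)
```

Dividing by the raw baseline instead would report a move from -10 to -5 as -50 percent, even though the score increased.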
- Changes
- Added needs_fitting property to ComponentBase #1044
- Updated references to data types to use datatype lists defined in evalml.utils.gen_utils #1039
- Remove maximum version limit for SciPy dependency #1051
- Moved all_components and other component importers into runtime methods #1045
- Consolidated graphing utility methods under evalml.utils.graph_utils #1060
- Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
- Changed show_all_features parameter into importance_threshold, which allows for thresholding feature importance #1097, #1103
- Documentation Changes
- Update setup.py URL to point to the github repo #1037
- Added tutorial for using the cost-benefit matrix objective #1088
- Updated model_understanding.ipynb to include documentation for using plotly on Jupyter Lab #1108
- Testing Changes
- Refactor CircleCI tests to use matrix jobs (#1043)
- Added a test to check that all test directories are included in evalml package #1054
Warning
- Breaking Changes
- confusion_matrix and normalize_confusion_matrix have been moved to evalml.utils #1038
- All graph utility methods previously under evalml.pipelines.graph_utils have been moved to evalml.utils.graph_utils #1060
- v0.12.2 Aug. 6, 2020
- Enhancements
- Add save/load method to components #1023
- Expose pickle protocol as optional arg to save/load #1023
- Updated estimators used in AutoML to include ExtraTrees and ElasticNet estimators #1030
- Fixes
- Changes
- Removed DeprecationWarning for SimpleImputer #1018
- Documentation Changes
- Add note about version numbers to release process docs #1034
- Testing Changes
- Test files are now included in the evalml package #1029
- v0.12.0 Aug. 3, 2020
- Enhancements
- Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
- Added clear exception for regression pipelines if target datatype is string or categorical #960
- Added target column names and class labels in predict and predict_proba output for pipelines #951
- Added _compute_shap_values and normalize_values to pipelines/explanations module #958
- Added explain_prediction feature which explains single predictions with SHAP #974
- Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
- Added support for configuring logfile path using env var, and don't create logger if there are filesystem errors #975
- Updated catboost estimators' default parameters and automl hyperparameter ranges to speed up fit time #998
- Fixes
- Fixed ReadtheDocs warning failure regarding embedded gif #943
- Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
- Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
- Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
- Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
- Fixed UnboundLocalError for cv_pipeline when automl search errors #996
- Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
- Changes
- Moved get_estimators to evalml.pipelines.components.utils #934
- Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
- Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
- Renamed DateTimeFeaturization to DateTimeFeaturizer #977
- Added check to stop search and raise an error if all pipelines in a batch return NaN scores #1015
- Documentation Changes
- Updated README.md #963
- Reworded message when errors are returned from data checks in search #982
- Added section on understanding model predictions with explain_prediction to User Guide #981
- Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
- Added custom components section in user guide #993
- Updated FAQ section formatting #997
- Updated release process documentation #1003
- Testing Changes
- Moved predict_proba and predict tests regarding string / categorical targets to test_pipelines.py #972
- Fixed dependency update bot by updating python version to 3.7 to avoid frequent github version updates #1002
Warning
- Breaking Changes
- get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
- Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
- evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
- TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
- Renamed DateTimeFeaturization to DateTimeFeaturizer #977
- v0.11.2 July 16, 2020
- Enhancements
- Added NoVarianceDataCheck to DefaultDataChecks #893
- Added text processing and featurization component TextFeaturizer #913, #924
- Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929
- AutoMLSearch will now handle KeyboardInterrupt and prompt user for confirmation #915
- Fixes
- Makes automl results a read-only property #919
- Changes
- Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
- Moved list_model_families to evalml.model_family.utils #903
- Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
- Rename master branch to main #918
- Add pypi release github action #923
- Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
- Moved automl config checks previously in search() to init #933
- Documentation Changes
- Reorganized and rewrote documentation #937
- Updated to use pydata sphinx theme #937
- Updated docs to use release_notes instead of changelog #942
- Testing Changes
- Cleaned up fixture names and usages in tests #895
Warning
- Breaking Changes
- list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
- get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
- Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
- all_pipelines() and get_pipelines() utility methods have been removed #904
- v0.11.0 June 30, 2020
- Enhancements
- Added multiclass support for ROC curve graphing #832
- Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
- Added data check to check for problematic target labels #814
- Added PerColumnImputer that allows imputation strategies per column #824
- Added transformer to drop specific columns #827
- Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
- Added preprocessing component to handle DateTime columns featurization #838
- Added ability to clone pipelines and components #842
- Define getter method for component parameters #847
- Added utility methods to calculate and graph permutation importances #860, #880
- Added new utility functions necessary for generating dynamic preprocessing pipelines #852
- Added kwargs to all components #863
- Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
- Added SelectColumns transformer #873
- Added ability to evaluate additional pipelines for automl search #874
- Added default_parameters class property to components and pipelines #879
- Added better support for disabling data checks in automl search #892
- Added ability to save and load AutoML objects to file #888
- Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
- Saved learned binary classification thresholds in automl results cv data dict #876
- Fixes
- Fixed bug where SimpleImputer cannot handle dropped columns #846
- Fixed bug where PerColumnImputer cannot handle dropped columns #855
- Enforce requirement that builtin components save all inputted values in their parameters dict #847
- Don't list base classes in all_components output #847
- Standardize all components to output pandas data structures, and accept either pandas or numpy #853
- Fixed rankings and full_rankings error when search has not been run #894
- Changes
- Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
- Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance #850
- Refactor "blacklist"/"whitelist" to "allow"/"exclude" lists #854
- Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
- Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
- Updated automl default data splitter to train/validation split for large datasets #877
- Added open source license, update some repo metadata #887
- Removed dead code in _get_preprocessing_components #896
- Documentation Changes
- Fix some typos and update the EvalML logo #872
- Testing Changes
- Update the changelog check job to expect the new branching pattern for the deps update bot #836
- Check that all components output pandas datastructures, and can accept either pandas or numpy #853
- Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
Warning
- Breaking Changes
- Pipelines' static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
- Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
- Renamed automl's cv argument to data_split #877
- Pipelines' and classifiers' feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
- Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
- Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
- Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
- v0.10.0 May 29, 2020
- Enhancements
- Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
- Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
- Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
- Add Elastic Net as a pipeline option #812
- Added new Pipeline option ExtraTrees #790
- Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
- Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there #793
- Added AutoMLAlgorithm class and IterativeAlgorithm impl, separated from AutoSearchBase #793
- Fixes
- Update pipeline score to return nan score for any objective which throws an exception during scoring #787
- Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798
- CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
- Changes
- Cleanup pipeline score code, and cleanup codecov #711
- Remove pass for abstract methods for codecov #730
- Added __str__ for AutoSearch object #675
- Add util methods to graph ROC and confusion matrix #720
- Refactor AutoBase to AutoSearchBase #758
- Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
- Updated our logger to use Python's logging utils #763
- Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
- Port over all guardrails to use the new DataCheck API #789
- Expanded import_or_raise to catch all exceptions #759
- Adds RMSE, MSLE, RMSLE as standard metrics #788
- Don't allow Recall to be used as an objective for AutoML #784
- Removed feature selection from pipelines #819
- Update default estimator parameters to make automl search faster and more accurate #793
- Documentation Changes
- Add instructions to freeze master on release.md #726
- Update release instructions with more details #727 #733
- Add objective base classes to API reference #736
- Fix components API to match other modules #747
- Testing Changes
- Delete codecov yml, use codecov.io's default
732
- Added unit tests for fraud cost, lead scoring, and standard metric objectives
741
- Update codecov client
782
- Updated AutoBase __str__ test to include no parameters case
783
- Added unit tests for `ExtraTrees` pipeline
790
- If codecov fails to upload, fail build
810
- Updated Python version of dependency action
816
- Update the dependency update bot to use a suffix when creating branches
817
Warning
- Breaking Changes
- The `detect_label_leakage` parameter for AutoML classes has been removed and replaced by a `data_checks` parameter
765
- Moved ROC and confusion matrix methods from `evalml.pipelines.plot_utils` to `evalml.pipelines.graph_utils`
720
- `Tuner` classes require a pipeline hyperparameter range dict as an init arg instead of a space definition
779
- `Tuner.propose` and `Tuner.add` work directly with pipeline parameters dicts instead of flat parameter lists
779
- `PipelineBase.hyperparameters` and `custom_hyperparameters` use pipeline parameters dict format instead of being represented as a flat list
779
- All guardrail functions previously under `evalml.guardrails.utils` will be removed and replaced by data checks
789
- `Recall` disallowed as an objective for AutoML
784
- `AutoSearchBase` parameter `tuner` has been renamed to `tuner_class`
793
- `AutoSearchBase` parameters `possible_pipelines` and `possible_model_families` have been renamed to `allowed_pipelines` and `allowed_model_families`
793
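One fix in this release (787) makes pipeline `score` return `nan` for any objective that throws an exception during scoring. The snippet below is only a minimal sketch of that idea in plain Python, not EvalML's implementation: `score_objectives`, `accuracy`, and `broken` are hypothetical names, and objectives are assumed to be plain callables taking `(y_true, y_pred)`.

```python
import math


def score_objectives(y_true, y_pred, objectives):
    """Score every objective; record nan for any objective that raises.

    `objectives` is assumed to be a dict of name -> callable (hypothetical shape).
    """
    scores = {}
    for name, fn in objectives.items():
        try:
            scores[name] = fn(y_true, y_pred)
        except Exception:
            # One failing objective should not abort scoring of the others.
            scores[name] = math.nan
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def broken(y_true, y_pred):
    """Stand-in for an objective that fails during scoring."""
    raise ValueError("objective cannot be computed")


scores = score_objectives(
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    {"accuracy": accuracy, "broken": broken},
)
```

Here `scores["accuracy"]` is a regular float while `scores["broken"]` is `nan`, so callers should compare with `math.isnan` rather than `==`.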
- v0.9.0 Apr. 27, 2020
- Enhancements
- Added `Accuracy` as a standard objective
624
- Added verbose parameter to load_fraud
560
- Added Balanced Accuracy metric for binary, multiclass
612
661
- Added XGBoost regressor and XGBoost regression pipeline
666
- Added `Accuracy` metric for multiclass
672
- Added objective name in `AutoBase.describe_pipeline`
686
- Added `DataCheck` and `DataChecks`, `Message` classes and relevant subclasses
739
- Fixes
- Removed direct access to `cls.component_graph`
595
- Add testing files to .gitignore
625
- Remove circular dependencies from `Makefile`
637
- Add error case for `normalize_confusion_matrix()`
640
- Fixed `XGBoostClassifier` and `XGBoostRegressor` bug with feature names that contain [, ], or <
659
- Update `make_pipeline_graph` to not accidentally create empty file when testing if path is valid
649
- Fix pip installation warning about docutils version, from boto dependency
664
- Removed zero division warning for F1/precision/recall metrics
671
- Fixed `summary` for pipelines without estimators
707
- Changes
- Updated default objective for binary/multiclass classification to log loss
613
- Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes
405
- Changed the output of `score` to return one dictionary
429
- Created binary and multiclass objective subclasses
504
- Updated objectives API
445
- Removed call to `get_plot_data` from AutoML
615
- Set `raise_error` to default to True for AutoML classes
638
- Remove unnecessary "u" prefixes on some unicode strings
641
- Changed one-hot encoder to return uint8 dtypes instead of ints
653
- Pipeline `_name` field changed to `custom_name`
650
- Removed `graphs.py` and moved methods into `PipelineBase`
657, 665
- Remove s3fs as a dev dependency
664
- Changed requirements-parser to be a core dependency
673
- Replace `supported_problem_types` field on pipelines with `problem_type` attribute on base classes
678
- Changed AutoML to only show best results for a given pipeline template in `rankings`, added `full_rankings` property to show all
682
- Update `ModelFamily` values: don't list xgboost/catboost as classifiers now that we have regression pipelines for them
677
- Changed AutoML's `describe_pipeline` to get problem type from pipeline instead
685
- Standardize `import_or_raise` error messages
683
- Updated argument order of objectives to align with sklearn's
698
- Renamed `pipeline.feature_importance_graph` to `pipeline.graph_feature_importances`
700
- Moved ROC and confusion matrix methods to `evalml.pipelines.plot_utils`
704
- Renamed `MultiClassificationObjective` to `MulticlassClassificationObjective`, to align with pipeline naming scheme
715
- Documentation Changes
- Fixed some sphinx warnings
593
- Fixed docstring for `AutoClassificationSearch` with correct command
599
- Limit readthedocs formats to pdf, not htmlzip and epub
594
600
- Clean up objectives API documentation
605
- Fixed function on Exploring search results page
604
- Update release process doc
567
- `AutoClassificationSearch` and `AutoRegressionSearch` show inherited methods in API reference
651
- Fixed improperly formatted code in breaking changes for changelog
655
- Added configuration to treat Sphinx warnings as errors
660
- Removed separate plotting section for pipelines in API reference
657, 665
- Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency
664
- Categorized components in API reference and added descriptions for each category
663
- Fixed Sphinx warnings about `BalancedAccuracy` objective
669
- Updated API reference to include missing components and clean up pipeline docstrings
689
- Reorganize API ref, and clarify pipeline sub-titles
688
- Add and update preprocessing utils in API reference
687
- Added inheritance diagrams to API reference
695
- Documented which default objective AutoML optimizes for
699
- Create separate install page
701
- Include more utils in API ref, like `import_or_raise`
704
- Add more color to pipeline documentation
705
- Testing Changes
- Matched install commands of `check_latest_dependencies` test and its GitHub action
578
- Added GitHub app to auto assign PR author as assignee
477
- Removed unneeded conda installation of xgboost in windows checkin tests
618
- Update graph tests to always use tmpfile dir
649
- Changelog checkin test workaround for release PRs: If 'future release' section is empty of PR refs, pass check
658
- Add changelog checkin test exception for `dep-update` branch
723
Warning
Breaking Changes
- Pipelines no longer take an objective parameter during instantiation and no longer have an objective attribute.
- `fit()` and `predict()` now use an optional `objective` parameter, which is only used in binary classification pipelines to fit for a specific objective.
- `score()` now uses a required `objectives` parameter that determines all the objectives to score on. This differs from the previous behavior, where the pipeline's objective was scored on regardless.
- `score()` now returns one dictionary of all objective scores.
- `ROC` and `ConfusionMatrix` plot methods via `Auto(*).plot` have been removed by PR 615 and are replaced by `roc_curve` and `confusion_matrix` in `evalml.pipelines.plot_utils` in PR 704
- `normalize_confusion_matrix` has been moved to `evalml.pipelines.plot_utils`
704
- Pipelines `_name` field changed to `custom_name`
- Pipelines `supported_problem_types` field is removed because it is no longer necessary
678
- Updated argument order of objectives' `objective_function` to align with sklearn
698
- `pipeline.feature_importance_graph` has been renamed to `pipeline.graph_feature_importances` in PR 700
- Removed unsupported `MSLE` objective
704
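The v0.9.0 fix for feature names containing `[`, `]`, or `<` (659) addresses characters that XGBoost rejects in feature names. The changelog does not say how the fix works, so the following is only a hedged sketch of one common workaround: replace the offending characters before fitting and keep a mapping back to the original names. `sanitize_feature_names` is a hypothetical helper, not an EvalML function.

```python
import re

# Characters named in the changelog entry as problematic in feature names.
_FORBIDDEN = re.compile(r"[\[\]<]")


def sanitize_feature_names(names):
    """Return a dict mapping each original name to a sanitized, unique name.

    Forbidden characters are replaced with "_"; a numeric suffix is added
    if the sanitized name collides with one that was already produced.
    """
    mapping = {}
    used = set()
    for name in names:
        clean = _FORBIDDEN.sub("_", name)
        candidate = clean
        suffix = 1
        while candidate in used:
            candidate = f"{clean}_{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


mapping = sanitize_feature_names(["age", "income[usd]", "score<10", "income_usd_"])
```

Keeping the mapping makes it possible to report feature importances under the original column names after training.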
- v0.8.0 Apr. 1, 2020
- Enhancements
- Add normalization option and information to confusion matrix
484
- Add util function to drop rows with NaN values
487
- Renamed `PipelineBase.name` as `PipelineBase.summary` and redefined `PipelineBase.name` as class property
491
- Added access to parameters in Pipelines with `PipelineBase.parameters` (used to be return of `PipelineBase.describe`)
501
- Added `fill_value` parameter for `SimpleImputer`
509
- Added functionality to override component hyperparameters and made pipelines take hyperparameters from components
516
- Allow `numpy.random.RandomState` for random_state parameters
556
- Fixes
- Removed unused dependency `matplotlib`, and move `category_encoders` to test reqs
572
- Changes
- Undo version cap in XGBoost placed in PR 402 and allowed all releases of XGBoost
407
- Support pandas 1.0.0
486
- Made all references to the logger static
503
- Refactored `model_type` parameter for components and pipelines to `model_family`
507
- Refactored `problem_types` for pipelines and components into `supported_problem_types`
515
- Moved `pipelines/utils.save_pipeline` and `pipelines/utils.load_pipeline` to `PipelineBase.save` and `PipelineBase.load`
526
- Limit number of categories encoded by `OneHotEncoder`
517
- Documentation Changes
- Updated API reference to remove `PipelinePlot` and added moved `PipelineBase` plotting methods
483
- Add code style and github issue guides
463
512
- Updated API reference to surface class variables for pipelines and components
537
- Fixed README documentation link
535
- Unhid PR references in changelog
656
- Testing Changes
- Added automated dependency check PR
482, 505
- Updated automated dependency check comment
497
- Have build_docs job use python executor, so that env vars are set properly
547
- Added simple test to make sure `OneHotEncoder`'s top_n works with large number of categories
552
- Run windows unit tests on PRs
557
Warning
Breaking Changes
- `AutoClassificationSearch` and `AutoRegressionSearch`'s `model_types` parameter has been refactored into `allowed_model_families`
- `ModelTypes` enum has been changed to `ModelFamily`
- Components and Pipelines now have a `model_family` field instead of `model_type`
- `get_pipelines` utility function now accepts `model_families` as an argument instead of `model_types`
- `PipelineBase.name` no longer returns structure of pipeline and has been replaced by `PipelineBase.summary`
- `PipelineBase.problem_types` and `Estimator.problem_types` have been renamed to `supported_problem_types`
- `pipelines/utils.save_pipeline` and `pipelines/utils.load_pipeline` moved to `PipelineBase.save` and `PipelineBase.load`
- v0.7.0 Mar. 9, 2020
- Enhancements
- Added emacs buffers to .gitignore
350
- Add CatBoost (gradient-boosted trees) classification and regression components and pipelines
247
- Added Tuner abstract base class
351
- Added `n_jobs` as parameter for `AutoClassificationSearch` and `AutoRegressionSearch`
403
- Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn's
426
- Added `PipelineBase` `.graph` and `.feature_importance_graph` methods, moved from previous location
423
- Added support for python 3.8
462
- Fixes
- Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives
276
- Fixed ReadtheDocs `FileNotFoundError` exception for fraud dataset
439
- Changes
- Added `n_estimators` as a tunable parameter for XGBoost
307
- Remove unused parameter `ObjectiveBase.fit_needs_proba`
320
- Remove extraneous parameter `component_type` from all components
361
- Remove unused `rankings.csv` file
397
- Downloaded demo and test datasets so unit tests can run offline
408
- Remove `_needs_fitting` attribute from Components
398
- Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all
413
- Refactored `PipelineBase` to take in parameter dictionary and moved pipeline metadata to class attribute
421
- Dropped support for Python 3.5
438
- Removed unused `apply.py` file
449
- Clean up `requirements.txt` to remove unused deps
451
- Support installation without all required dependencies
459
- Documentation Changes
- Update release.md with instructions to release to internal license key
354
- Testing Changes
- Added tests for utils (and moved current utils to gen_utils)
297
- Moved XGBoost install into its own separate step on Windows using Conda
313
- Rewind pandas version to before 1.0.0, to diagnose test failures for that version
325
- Added dependency update checkin test
324
- Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version
402
- Update dependency check to use a whitelist
417
- Update unit test jobs to not install dev deps
455
Warning
Breaking Changes
- Python 3.5 will not be actively supported.
- v0.6.0 Dec. 16, 2019
- Enhancements
- Added ability to create a plot of feature importances
133
- Add early stopping to AutoML using patience and tolerance parameters
241
- Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class
242
- Enhanced AutoML results with search order
260
- Added utility function to show system and environment information
300
- Fixes
- Lower botocore requirement
235
- Fixed decision_function calculation for `FraudCost` objective
254
- Fixed return value of `Recall` metrics
264
- Components return `self` on fit
289
- Changes
- Renamed automl classes to `AutoRegressionSearch` and `AutoClassificationSearch`
287
- Updating demo datasets to retain column names
223
- Moving pipeline visualization to `PipelinePlot` class
228
- Standardizing inputs as `pd.DataFrame` / `pd.Series`
130
- Enforcing that pipelines must have an estimator as last component
277
- Added `ipywidgets` as a dependency in `requirements.txt`
278
- Added Random and Grid Search Tuners
240
- Documentation Changes
- Adding class properties to API reference
244
- Fix and filter FutureWarnings from scikit-learn
249, 257
- Adding Linear Regression to API reference and cleaning up some Sphinx warnings
227
- Testing Changes
- Added support for testing on Windows with CircleCI
226
- Added support for doctests
233
Warning
Breaking Changes
- The `fit()` method for `AutoClassifier` and `AutoRegressor` has been renamed to `search()`.
- `AutoClassifier` has been renamed to `AutoClassificationSearch`
- `AutoRegressor` has been renamed to `AutoRegressionSearch`
- `AutoClassificationSearch.results` and `AutoRegressionSearch.results` are now dictionaries with `pipeline_results` and `search_order` keys. `pipeline_results` can be used to access a dictionary that is identical to the old `.results` dictionary, whereas `search_order` returns a list of the search order in terms of `pipeline_id`.
- Pipelines now require an estimator as the last component in `component_list`. Slicing pipelines now throws a `NotImplementedError` to avoid returning pipelines without an estimator.
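v0.6.0 added early stopping to AutoML using patience and tolerance parameters (241). The changelog does not define their exact semantics, so this sketch assumes a common interpretation: stop once `patience` consecutive scores fail to improve on the best score by more than a relative `tolerance`, with higher scores being better. `should_stop` is a hypothetical helper, not part of EvalML.

```python
def should_stop(scores, patience, tolerance):
    """Return True once `patience` consecutive scores fail to improve.

    Assumptions (not taken from the changelog): higher scores are better,
    and an improvement counts only if it exceeds `tolerance` relative to
    the best score seen so far.
    """
    best = None
    stale = 0  # consecutive iterations without a sufficient improvement
    for score in scores:
        if best is None:
            best = score
            continue
        # Relative improvement; fall back to absolute when best is zero.
        improvement = (score - best) / abs(best) if best != 0 else score - best
        if improvement > tolerance:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return True
    return False


plateaued = should_stop([0.70, 0.75, 0.751, 0.752, 0.7505], patience=3, tolerance=0.01)
improving = should_stop([0.70, 0.75, 0.80], patience=3, tolerance=0.01)
```

With these inputs the first sequence plateaus for three iterations after reaching 0.75, while the second keeps improving by more than 1% at every step.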
- v0.5.2 Nov. 18, 2019
- Enhancements
- Adding basic pipeline structure visualization
211
- Documentation Changes
- Added notebooks to build process
212
- v0.5.1 Nov. 15, 2019
- Enhancements
- Added basic outlier detection guardrail
151
- Added basic ID column guardrail
135
- Added support for unlimited pipelines with a `max_time` limit
70
- Updated .readthedocs.yaml to successfully build
188
- Fixes
- Removed MSLE from default additional objectives
203
- Fixed `random_state` passed in pipelines
204
- Fixed slow down in RFRegressor
206
- Changes
- Pulled information for describe_pipeline from pipeline's new describe method
190
- Refactored pipelines
108
- Removed guardrails from Auto(*)
202, 208
- Documentation Changes
- Updated documentation to show `max_time` enhancements
189
- Updated release instructions for RTD
193
- Added notebooks to build process
212
- Added contributing instructions
213
- Added new content
222
- v0.5.0 Oct. 29, 2019
- Enhancements
- Added basic one hot encoding
73
- Use enums for model_type
110
- Support for splitting regression datasets
112
- Auto-infer multiclass classification
99
- Added support for other units in `max_time`
125
- Detect highly null columns
121
- Added additional regression objectives
100
- Show an interactive iteration vs. score plot when using fit()
134
- Fixes
- Reordered `describe_pipeline`
94
- Added type check for `model_type`
109
- Fixed `s` units when setting string `max_time`
132
- Fix objectives not appearing in API documentation
150
- Changes
- Reorganized tests
93
- Moved logging to its own module
119
- Show progress bar history
111
- Using `cloudpickle` instead of pickle to allow unloading of custom objectives
113
- Removed render.py
154
- Documentation Changes
- Update release instructions
140
- Include additional_objectives parameter
124
- Added Changelog
136
- Testing Changes
- Code coverage
90
- Added CircleCI tests for other Python versions
104
- Added doc notebooks as tests
139
- Test metadata for CircleCI and 2 core parallelism
137
- v0.4.1 Sep. 16, 2019
- Enhancements
- Added AutoML for classification and regressor using Autobase and Skopt
7
9
- Implemented standard classification and regression metrics
7
- Added logistic regression, random forest, and XGBoost pipelines
7
- Implemented support for custom objectives
15
- Feature importance for pipelines
18
- Serialization for pipelines
19
- Allow fitting on objectives for optimal threshold
27
- Added detect label leakage
31
- Implemented callbacks
42
- Allow for multiclass classification
21
- Added support for additional objectives
79
- Fixes
- Fixed feature selection in pipelines
13
- Made `random_seed` usage consistent
45
- Documentation Changes
- Added docstrings
6
- Created notebooks for docs
6
- Initialized readthedocs EvalML
6
- Added favicon
38
- Testing Changes
- Added testing for loading data
39
- v0.2.0 Aug. 13, 2019
- Enhancements
- Created fraud detection objective
4
- v0.1.0 July 31, 2019
- First Release
- Enhancements
- Added lead scoring objective
1
- Added basic classifier
1
- Documentation Changes
- Initialized Sphinx for docs
1