- Future Releases
- Enhancements
- Add ProphetRegressor to AutoML (#2619)
- Integrated DefaultAlgorithm into AutoMLSearch (#2634)
- Removed SVM "linear" and "precomputed" kernel hyperparameter options, and improved default parameters (#2651)
- Updated ComponentGraph initialization to raise ValueError when user attempts to use .y for a component that does not produce a tuple output (#2662)
- Updated pipeline graph() to distinguish X and y edges (#2654)
- Added DropRowsTransformer component (#2692)
- Added DROP_ROWS to _make_component_list_from_actions and cleaned up metadata (#2694)
- Fixes
- Updated Oversampler logic to select best SMOTE based on component input instead of pipeline input (#2695)
- Changes
- Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component (#2695)
- Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance (#2660)
- Documentation Changes
- Added docstring linting package pydocstyle and rule to make-lint command (#2670)
- Testing Changes
- Removed the process-level parallelism from the test_cancel_job test (#2666)
- Installed numba 0.53 in Windows CI to prevent problems installing version 0.54 (#2710)
Warning
- Breaking Changes
- Renamed the current top level search method to search_iterative and defined a new search method for the DefaultAlgorithm (#2634)
- Replaced SMOTEOversampler, SMOTENOversampler and SMOTENCOversampler with consolidated Oversampler component (#2695)
- Removed LinearRegressor from the list of default AutoMLSearch estimators due to poor performance (#2660)
- v0.31.0 Aug. 19, 2021
- Enhancements
- Updated the high variance check in AutoMLSearch to be robust to a variety of objectives and cv scores (#2622)
- Use Woodwork's outlier detection for the OutliersDataCheck (#2637)
- Added ability to utilize instantiated components when creating a pipeline (#2643)
- Sped up the all-NaN and unknown check in infer_feature_types (#2661)
- Fixes
- Changes
- Deleted _put_into_original_order helper function (#2639)
- Refactored time series pipeline code using a time series pipeline base class (#2649)
- Renamed dask_tests to parallel_tests (#2657)
- Removed commented out code in pipeline_meta.py (#2659)
- Documentation Changes
- Add complete install command to README and Install section (#2627)
- Cleaned up documentation for MulticollinearityDataCheck (#2664)
- Testing Changes
- Speed up CI by splitting Prophet tests into a separate workflow in GitHub (#2644)
Warning
- Breaking Changes
- TimeSeriesRegressionPipeline no longer inherits from TimeSeriesRegressionPipeline (#2649)
- v0.30.2 Aug. 16, 2021
- Fixes
- Updated changelog and version numbers to match the release. Release 0.30.1 was released erroneously without a change to the version numbers. 0.30.2 replaces it.
- v0.30.1 Aug. 12, 2021
- Enhancements
- Added DatetimeFormatDataCheck for time series problems (#2603)
- Added ProphetRegressor to estimators (#2242)
- Updated ComponentGraph to handle not calling samplers' transform during predict, and updated samplers' transform methods so that fit_transform is equivalent to fit(X, y).transform(X, y) (#2583)
- Updated ComponentGraph _validate_component_dict logic to be stricter about input values (#2599)
- Patched bug in xgboost estimators where predicting on a feature matrix of only booleans would throw an exception (#2602)
- Updated ARIMARegressor to use relative forecasting to predict values (#2613)
- Added support for creating pipelines without an estimator as the final component and added transform(X, y) method to pipelines and component graphs (#2625)
- Updated to support Woodwork 0.5.1 (#2610)
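The sampler contract mentioned in the ComponentGraph item above (fit_transform giving the same result as fit(X, y).transform(X, y)) can be sketched with a toy class. This is only an illustration: ToyUndersampler and its max_per_class parameter are hypothetical names, not EvalML components.

```python
from collections import defaultdict


class ToyUndersampler:
    """Hypothetical sampler (not an EvalML class): keeps at most max_per_class rows per label."""

    def __init__(self, max_per_class=2):
        self.max_per_class = max_per_class
        self._fitted = False

    def fit(self, X, y):
        # Nothing to learn in this toy example; return self so calls can be chained.
        self._fitted = True
        return self

    def transform(self, X, y):
        if not self._fitted:
            raise RuntimeError("call fit before transform")
        seen = defaultdict(int)
        X_out, y_out = [], []
        for row, label in zip(X, y):
            if seen[label] < self.max_per_class:
                seen[label] += 1
                X_out.append(row)
                y_out.append(label)
        return X_out, y_out

    def fit_transform(self, X, y):
        # Defined in terms of fit and transform, so the two call styles cannot diverge.
        return self.fit(X, y).transform(X, y)


X = [[0], [1], [2], [3], [4]]
y = ["a", "a", "a", "b", "a"]
chained = ToyUndersampler().fit(X, y).transform(X, y)
combined = ToyUndersampler().fit_transform(X, y)
```

Delegating fit_transform to fit and transform is one simple way to guarantee the equivalence; a real sampler may need a more efficient path.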
- Fixes
- Updated AutoMLSearch to drop ARIMARegressor from allowed_estimators if an incompatible frequency is detected (#2632)
- Updated get_best_sampler_for_data to consider all non-numeric datatypes as categorical for SMOTE (#2590)
- Fixed inconsistent test results from TargetDistributionDataCheck (#2608)
- Adopted vectorized pd.NA checking for Woodwork 0.5.1 support (#2626)
- Pinned upper version of astroid to 2.6.6 to keep ReadTheDocs working (#2638)
- Changes
- Renamed SMOTE samplers to SMOTE oversampler (#2595)
- Changed partial_dependence and graph_partial_dependence to raise a PartialDependenceError instead of ValueError. This is not a breaking change because PartialDependenceError is a subclass of ValueError (#2604)
- Cleaned up code duplication in ComponentGraph (#2612)
- Stored predict_proba results in .x for intermediate estimators in ComponentGraph (#2629)
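Why raising a subclass is not a breaking change can be shown in a few lines. The sketch below defines its own stand-in PartialDependenceError and a hypothetical partial_dependence_stub function; neither is EvalML's actual code.

```python
class PartialDependenceError(ValueError):
    """Stand-in defined here for illustration; EvalML ships its own class of this name."""


def partial_dependence_stub(feature):
    # Hypothetical function that always fails, to show how the error propagates.
    raise PartialDependenceError(f"cannot compute partial dependence for {feature!r}")


def call_with_old_handler(feature):
    # Code written before the change caught ValueError; it still works unchanged.
    try:
        partial_dependence_stub(feature)
    except ValueError as err:
        return type(err).__name__
    return None


caught = call_with_old_handler("age")
```

Callers that want to treat partial dependence failures separately can catch the subclass first and fall back to ValueError for everything else.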
- Documentation Changes
- To avoid local docs build error, only add warning disable and download headers on ReadTheDocs builds, not locally (#2617)
- Testing Changes
- Updated partial_dependence tests to change the element-wise comparison per the Plotly 5.2.1 upgrade (#2638)
- Changed the lint CI job to only check against python 3.9 via the -t flag (#2586)
- Installed Prophet in linux nightlies test and fixed test_all_components (#2598)
- Refactored and fixed all make_pipeline tests to assert correct order and address new Woodwork Unknown type inference (#2572)
- Removed component_graphs as a global variable in test_component_graphs.py (#2609)
Warning
- Breaking Changes
- Renamed SMOTE samplers to SMOTE oversampler. Please use SMOTEOversampler, SMOTENCOversampler, SMOTENOversampler instead of SMOTESampler, SMOTENCSampler, and SMOTENSampler (#2595)
- v0.30.0 Aug. 3, 2021
- Enhancements
- Added LogTransformer and TargetDistributionDataCheck (#2487)
- Issue a warning to users when a pipeline parameter passed in isn't used in the pipeline (#2564)
- Added Gini coefficient as an objective (#2544)
- Added repr to ComponentGraph (#2565)
- Added components to extract features from URL and EmailAddress Logical Types (#2550)
- Added support for NaN values in TextFeaturizer (#2532)
- Added SelectByType transformer (#2531)
- Added separate thresholds for percent null rows and columns in HighlyNullDataCheck (#2562)
- Added support for NaN natural language values (#2577)
- Fixes
- Raised error message for types URL, NaturalLanguage, and EmailAddress in partial_dependence (#2573)
- Changes
- Updated PipelineBase implementation for creating pipelines from a list of components (#2549)
- Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module (#2546)
- Renamed ComponentGraph's get_parents to get_inputs (#2540)
- Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list (#2556)
- Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph (#2563)
- Renamed existing ensembler implementation from StackedEnsemblers to SklearnStackedEnsemblers (#2578)
- Documentation Changes
- Added documentation for DaskEngine and CFEngine parallel engines (#2560)
- Improved detail of TextFeaturizer docstring and tutorial (#2568)
- Testing Changes
- Added test that makes sure split_data does not shuffle for time series problems (#2552)
Warning
- Breaking Changes
- Moved get_hyperparameter_ranges to PipelineBase class from automl/utils module (#2546)
- Renamed ComponentGraph's get_parents to get_inputs (#2540)
- Removed ComponentGraph.linearized_component_graph and ComponentGraph.from_list (#2556)
- Updated ComponentGraph to enforce requiring .x and .y inputs for each component in the graph (#2563)
- v0.29.0 Jul. 21, 2021
- Enhancements
- Updated 1-way partial dependence support for datetime features (#2454)
- Added details on how to fix error caused by broken ww schema (#2466)
- Added ability to use built-in pickle for saving AutoMLSearch (#2463)
- Updated our components and component graphs to use latest features of ww 0.4.1, e.g. concat_columns and drop in-place (#2465)
- Added new, concurrent.futures based engine for parallel AutoML (#2506)
- Added support for new Woodwork Unknown type in AutoMLSearch (#2477)
- Updated our components with an attribute that describes if they modify features or targets and can be used in list API for pipeline initialization (#2504)
- Updated ComponentGraph to accept X and y as inputs (#2507)
- Removed unused TARGET_BINARY_INVALID_VALUES from DataCheckMessageCode enum and fixed formatting of objective documentation (#2520)
- Added EvalMLAlgorithm (#2525)
- Added support for NaN values in TextFeaturizer (#2532)
- Fixes
- Fixed FraudCost objective and reverted threshold optimization method for binary classification to Golden (#2450)
- Added custom exception message for partial dependence on features with scales that are too small (#2455)
- Ensured the typing for Ordinal and Datetime ltypes is passed through _retain_custom_types_and_initalize_woodwork (#2461)
- Updated to work with Pandas 1.3.0 (#2442)
- Updated to work with sktime 0.7.0 (#2499)
- Changes
- Updated XGBoost dependency to >=1.4.2 (#2484, #2498)
- Added a DeprecationWarning about deprecating the list API for ComponentGraph (#2488)
- Updated make_pipeline for AutoML to create dictionaries, not lists, to initialize pipelines (#2504)
- No longer installing graphviz on windows in our CI pipelines because release 0.17 breaks windows 3.7 (#2516)
- Documentation Changes
- Moved docstrings from __init__ to class pages, added missing docstrings for missing classes, and updated missing default values (#2452)
- Build documentation with sphinx-autoapi (#2458)
- Change autoapi_ignore to only ignore files in evalml/tests/* (#2530)
- Testing Changes
- Fixed flaky dask tests (#2471)
- Removed shellcheck action from build_conda_pkg action (#2514)
- Added a tmp_dir fixture that deletes its contents after tests run (#2505)
- Added a test that makes sure all pipelines in AutoMLSearch get the same data splits (#2513)
- Condensed warning output in test logs (#2521)
Warning
- Breaking Changes
- NaN values in the Natural Language type are no longer supported by the Imputer with the pandas upgrade (#2477)
- v0.28.0 Jul. 2, 2021
- Enhancements
- Added support for showing an Individual Conditional Expectations plot when graphing Partial Dependence (#2386)
- Exposed thread_count for Catboost estimators as n_jobs parameter (#2410)
- Updated Objectives API to allow for sample weighting (#2433)
- Fixes
- Deleted unreachable line from IterativeAlgorithm (#2464)
- Changes
- Pinned Woodwork version between 0.4.1 and 0.4.2 (#2460)
- Updated psutils minimum version in requirements (#2438)
- Updated log_error_callback to not include filepath in logged message (#2429)
- Documentation Changes
- Sped up docs (#2430)
- Removed mentions of DataTable and DataColumn from the docs (#2445)
- Testing Changes
- Added slack integration for nightlies tests (#2436)
- Changed build_conda_pkg CI job to run only when dependencies are updated (#2446)
- Updated workflows to store pytest runtimes as test artifacts (#2448)
- Added AutoMLTestEnv test fixture for making it easy to mock automl tests (#2406)
- v0.27.0 Jun. 22, 2021
- Enhancements
- Adds force plots for prediction explanations (#2157)
- Removed self-reference from AutoMLSearch (#2304)
- Added support for nonlinear pipelines for generate_pipeline_code (#2332)
- Added inverse_transform method to pipelines (#2256)
- Add optional automatic update checker (#2350)
- Added search_order to AutoMLSearch's rankings and full_rankings tables (#2345)
- Updated threshold optimization method for binary classification (#2315)
- Updated demos to pull data from S3 instead of including demo data in package (#2387)
- Upgrade woodwork version to v0.4.1 (#2379)
- Fixes
- Preserve user-specified woodwork types throughout pipeline fit/predict (#2297)
- Fixed ComponentGraph appending target to final_component_features if there is a component that returns both X and y (#2358)
- Fixed partial dependence graph method failing on multiclass problems when the class labels are numeric (#2372)
- Added thresholding_objective argument to AutoMLSearch for binary classification problems (#2320)
- Added change for k_neighbors parameter in SMOTE Oversamplers to automatically handle small samples (#2375)
- Changed naming for Logistic Regression Classifier file (#2399)
- Pinned pytest-timeout to fix minimum dependence checker (#2425)
- Replaced Elastic Net Classifier base class with Logistic Regression to avoid NaN outputs (#2420)
- Changes
- Cleaned up PipelineBase's component_graph and _component_graph attributes. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph (#2332)
- Added and applied black linting package to the EvalML repo in place of autopep8 (#2306)
- Separated custom_hyperparameters from pipelines and added them as an argument to AutoMLSearch (#2317)
- Replaced allowed_pipelines with allowed_component_graphs (#2364)
- Removed private method _compute_features_during_fit from PipelineBase (#2359)
- Updated compute_order in ComponentGraph to be a read-only property (#2408)
- Unpinned PyZMQ version in requirements.txt (#2389)
- Uncapping LightGBM version in requirements.txt (#2405)
- Updated minimum version of plotly (#2415)
- Removed SensitivityLowAlert objective from core objectives (#2418)
- Documentation Changes
- Fixed lead scoring weights in the demos documentation (#2315)
- Fixed start page code and description dataset naming discrepancy (#2370)
- Testing Changes
- Update minimum unit tests to run on all pull requests (#2314)
- Pass token to authorize uploading of codecov reports (#2344)
- Add pytest-timeout. All tests that run longer than 6 minutes will fail (#2374)
- Separated the dask tests out into separate github action jobs to isolate dask failures (#2376)
- Refactored dask tests (#2377)
- Added the combined dask/non-dask unit tests back and renamed the dask only unit tests (#2382)
- Sped up unit tests and split into separate jobs (#2365)
- Change CI job names, run lint for python 3.9, run nightlies on python 3.8 at 3am EST (#2395, #2398)
- Set fail-fast to false for CI jobs that run for PRs (#2402)
Warning
- Breaking Changes
- AutoMLSearch will accept allowed_component_graphs instead of allowed_pipelines (#2364)
- Removed PipelineBase's _component_graph attribute. Updated PipelineBase __repr__ and added __eq__ for ComponentGraph (#2332)
- pipeline_parameters will no longer accept skopt.space variables since hyperparameter ranges will now be specified through custom_hyperparameters (#2317)
- v0.25.0 Jun. 01, 2021
- Enhancements
- Upgraded minimum woodwork to version 0.3.1. Previous versions will not be supported (#2181)
- Added a new callback parameter for explain_predictions_best_worst (#2308)
- Fixes
- Changes
- Deleted the return_pandas flag from our demo data loaders (#2181)
- Moved default_parameters to ComponentGraph from PipelineBase (#2307)
- Documentation Changes
- Updated the release procedure documentation (#2230)
- Testing Changes
- Ignoring test_saving_png_file while building conda package (#2323)
Warning
- Breaking Changes
- Deleted the return_pandas flag from our demo data loaders (#2181)
- Upgraded minimum woodwork to version 0.3.1. Previous versions will not be supported (#2181)
- Due to the weak-ref in woodwork, set the result of infer_feature_types to a variable before accessing woodwork (#2181)
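The weak-reference caveat above can be reproduced with the standard library alone. The sketch below uses hypothetical Frame and Accessor classes (not Woodwork's implementation) and relies on CPython's reference counting to reclaim the temporary object immediately.

```python
import weakref


class Accessor:
    """Hypothetical accessor that only holds a weak reference to its owner."""

    def __init__(self, owner):
        self._owner = weakref.ref(owner)

    def owner_alive(self):
        return self._owner() is not None


class Frame:
    """Hypothetical data container exposing an accessor, loosely like df.ww."""

    @property
    def ww(self):
        return Accessor(self)


def make_frame():
    # Stands in for a function that returns a freshly built object.
    return Frame()


# Accessing the accessor on a temporary: nothing keeps the Frame alive afterwards.
dangling = make_frame().ww

# Binding the result to a variable first keeps the owner alive.
frame = make_frame()
safe = frame.ww
```

The same reasoning suggests writing X = infer_feature_types(X) on its own line and only then using X.ww, rather than chaining the two in one expression.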
- v0.24.2 May. 24, 2021
- Enhancements
- Added oversamplers to AutoMLSearch (#2213, #2286)
- Added dictionary input functionality for Undersampler component (#2271)
- Changed the default parameter values for Elastic Net Classifier and Elastic Net Regressor (#2269)
- Added dictionary input functionality for the Oversampler components (#2288)
- Fixes
- Set default n_jobs to 1 for StackedEnsembleClassifier and StackedEnsembleRegressor until fix for text-based parallelism in sklearn stacking can be found (#2295)
- Changes
- Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter (#2290)
- Refactored calculate_permutation_importance method and added per-column permutation importance method (#2302)
- Updated logging information in AutoMLSearch.__init__ to clarify pipeline generation (#2263)
- Documentation Changes
- Minor changes to the release procedure (#2230)
- Testing Changes
- Use codecov action to update coverage reports (#2238)
- Removed MarkupSafe dependency version pin from requirements.txt and moved instead into RTD docs build CI (#2261)
Warning
- Breaking Changes
- Updated start_iteration_callback to accept a pipeline instance instead of a pipeline class and no longer accept pipeline parameters as a parameter (#2290)
- Moved default_parameters to ComponentGraph from PipelineBase. A pipeline's default_parameters is now accessible via pipeline.component_graph.default_parameters (#2307)
- v0.24.1 May. 16, 2021
- Enhancements
- Integrated ARIMARegressor into AutoML (#2009)
- Updated HighlyNullDataCheck to also perform a null row check (#2222)
- Set max_depth to 1 in calls to featuretools dfs (#2231)
- Fixes
- Removed data splitter sampler calls during training (#2253)
- Set minimum required version for pyzmq, colorama, and docutils (#2254)
- Changed BaseSampler to return None instead of y (#2272)
- Changes
- Removed ensemble split and indices in AutoMLSearch (#2260)
- Updated pipeline repr() and generate_pipeline_code to return pipeline instances without generating custom pipeline class (#2227)
- Documentation Changes
- Capped Sphinx version under 4.0.0 (#2244)
- Testing Changes
- Change number of cores for pytest from 4 to 2 (#2266)
- Add minimum dependency checker to generate minimum requirement files (#2267)
- Add unit tests with minimum dependencies (#2277)
- v0.24.0 May. 04, 2021
- Enhancements
- Added date_index as a required parameter for TimeSeries problems (#2217)
- Have the OneHotEncoder return the transformed columns as booleans rather than floats (#2170)
- Added Oversampler transformer component to EvalML (#2079)
- Added Undersampler to AutoMLSearch, as well as arguments _sampler_method and sampler_balanced_ratio (#2128)
- Updated prediction explanations functions to allow pipelines with XGBoost estimators (#2162)
- Added partial dependence for datetime columns (#2180)
- Update precision-recall curve with positive label index argument, and fix for 2d predicted probabilities (#2090)
- Add pct_null_rows to HighlyNullDataCheck (#2211)
- Added a standalone AutoML search method for convenience, which runs data checks and then runs automl (#2152)
- Make the first batch of AutoML have a predefined order, with linear models first and complex models last (#2223, #2225)
- Added sampling dictionary support to BalancedClassificationSampler (#2235)
- Fixes
- Fixed partial dependence not respecting grid resolution parameter for numerical features (#2180)
- Enable prediction explanations for catboost for multiclass problems (#2224)
- Changes
- Deleted baseline pipeline classes (#2202)
- Reverting user specified date feature PR (#2155) until pmdarima installation fix is found (#2214)
- Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term (#2091)
- Removed all old datasplitters from EvalML (#2193)
- Deleted make_pipeline_from_components (#2218)
- Documentation Changes
- Renamed dataset to clarify that it's gzipped but not a tarball (#2183)
- Updated documentation to use pipeline instances instead of pipeline subclasses (#2195)
- Updated contributing guide with a note about GitHub Actions permissions (#2090)
- Updated automl and model understanding user guides (#2090)
- Testing Changes
- Use machineFL user token for dependency update bot, and add more reviewers (#2189)
Warning
- Breaking Changes
- All baseline pipeline classes (BaselineBinaryPipeline, BaselineMulticlassPipeline, BaselineRegressionPipeline, etc.) have been deleted (#2202)
- Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as custom_name, parameters, etc. For example, BinaryClassificationPipeline(["Random Forest Classifier"], parameters={}) (#2091)
- Removed all old datasplitters from EvalML (#2193)
- Deleted utility method make_pipeline_from_components (#2218)
- v0.23.0 Apr. 20, 2021
- Enhancements
- Refactored EngineBase and SequentialEngine api. Adding DaskEngine (#1975)
- Added optional engine argument to AutoMLSearch (#1975)
- Added a warning about how time series support is still in beta when a user passes in a time series problem to AutoMLSearch (#2118)
- Added NaturalLanguageNaNDataCheck data check (#2122)
- Added ValueError to partial_dependence to prevent users from computing partial dependence on columns with all NaNs (#2120)
- Added standard deviation of cv scores to rankings table (#2154)
- Fixes
- Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority (#2077)
- Fixed bug where two-way partial dependence plots with categorical variables were not working correctly (#2117)
- Fixed bug where hyperparameters were not displaying properly for pipelines with a list component_graph and duplicate components (#2133)
- Fixed bug where pipeline_parameters argument in AutoMLSearch was not applied to pipelines passed in as allowed_pipelines (#2133)
- Fixed bug where AutoMLSearch was not applying custom hyperparameters to pipelines with a list component_graph and duplicate components (#2133)
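To make the minority:majority convention from the first fix concrete, here is a minimal sketch of how such a ratio can be computed from class labels. The function names and the target_ratio parameter (with its arbitrary default) are hypothetical and do not reflect EvalML's implementation.

```python
from collections import Counter


def minority_majority_ratio(y):
    """Return (minority class count) / (majority class count) for a list of labels."""
    counts = Counter(y)
    return min(counts.values()) / max(counts.values())


def needs_undersampling(y, target_ratio=0.25):
    # Hypothetical rule for this sketch: resample only when the data is more
    # imbalanced than the requested minority:majority ratio.
    return minority_majority_ratio(y) < target_ratio


y = [0] * 90 + [1] * 10
ratio = minority_majority_ratio(y)
```

With this convention the ratio always lies in (0, 1], and a value of 1 means perfectly balanced classes.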
- Changes
- Removed hyperparameter_ranges from Undersampler and renamed balanced_ratio to sampling_ratio for samplers (#2113)
- Renamed TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS data check message code to TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS (#2126)
- Modified one-way partial dependence plots of categorical features to display data with a bar plot (#2117)
- Renamed score column for automl.rankings as mean_cv_score (#2135)
- Remove 'warning' from docs tool output (#2031)
- Documentation Changes
- Fixed conf.py file (#2112)
- Added a sentence to the automl user guide stating that our support for time series problems is still in beta (#2118)
- Fixed documentation demos (#2139)
- Update test badge in README to use GitHub Actions (#2150)
- Testing Changes
- Fixed test_describe_pipeline for pandas v1.2.4 (#2129)
- Added a GitHub Action for building the conda package (#1870, #2148)
Warning
- Breaking Changes
- Renamed balanced_ratio to sampling_ratio for the BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, BalancedClassificationSampler, and Undersampler (#2113)
- Deleted the "errors" key from automl results (#1975)
- Deleted the raise_and_save_error_callback and the log_and_save_error_callback (#1975)
- Fixed BalancedClassificationDataCVSplit, BalancedClassificationDataTVSplit, and BalancedClassificationSampler to use minority:majority ratio instead of majority:minority (#2077)
- v0.22.0 Apr. 06, 2021
- Enhancements
- Added a GitHub Action for linux_unit_tests (#2013)
- Added recommended actions for InvalidTargetDataCheck, updated _make_component_list_from_actions to address new action, and added TargetImputer component (#1989)
- Updated AutoMLSearch._check_for_high_variance to not emit RuntimeWarning (#2024)
- Added exception when pipeline passed to explain_predictions is a Stacked Ensemble pipeline (#2033)
- Added sensitivity at low alert rates as an objective (#2001)
- Added Undersampler transformer component (#2030)
- Fixes
- Updated Engine's train_batch to apply undersampling (#2038)
- Fixed bug where Time Series Classification pipelines were not encoding targets in predict and predict_proba (#2040)
- Fixed data splitting errors if target is float for classification problems (#2050)
- Pinned docutils to <0.17 to fix ReadtheDocs warning issues (#2088)
- Changes
- Removed lists as acceptable hyperparameter ranges in AutoMLSearch (#2028)
- Renamed "details" to "metadata" for data check actions (#2008)
- Documentation Changes
- Catch and suppress warnings in documentation (#1991, #2097)
- Change spacing in start.ipynb to provide clarity for AutoMLSearch (#2078)
- Fixed start code on README (#2108)
- Testing Changes
- v0.21.0 Mar. 24, 2021
- Enhancements
- Changed AutoMLSearch to default optimize_thresholds to True (#1943)
- Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification (#1775)
- Added params to balanced classification data splitters for visibility (#1966)
- Updated make_pipeline to not add Imputer if input data does not have numeric or categorical columns (#1967)
- Updated ClassImbalanceDataCheck to better handle multiclass imbalances (#1986)
- Added recommended actions for the output of data check's validate method (#1968)
- Added error message for partial_dependence when features are mostly the same value (#1994)
- Updated OneHotEncoder to drop one redundant feature by default for features with two categories (#1997)
- Added a PolynomialDetrender component (#1992)
- Added DateTimeNaNDataCheck data check (#2039)
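The OneHotEncoder item above rests on a simple observation: for a feature with exactly two categories, the second indicator column is always one minus the first. The following sketch shows the idea in plain Python; the function name, the drop_redundant_binary flag, the column naming, and the choice of which category to keep are all assumptions made for illustration.

```python
def one_hot_encode(values, drop_redundant_binary=True):
    """Illustrative one-hot encoding of a single categorical column."""
    categories = sorted(set(values))
    if drop_redundant_binary and len(categories) == 2:
        # With two categories the second indicator is just 1 - first, so keep one.
        categories = categories[:1]
    return {
        f"col_{cat}": [int(v == cat) for v in values]
        for cat in categories
    }


binary = one_hot_encode(["no", "yes", "no"])
ternary = one_hot_encode(["a", "b", "c"])
```

Features with three or more categories keep all of their indicator columns in this sketch, matching the "two categories" wording of the release note.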
- Fixes
- Changed best pipeline to train on the entire dataset rather than just ensemble indices for ensemble problems (#2037)
- Updated binary classification pipelines to use objective decision function during scoring of custom objectives (#1934)
- Changes
- Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch (#1935)
- Deleted random_state argument (#1985)
- Updated Woodwork version requirement to v0.0.11 (#1996)
- Documentation Changes
- Testing Changes
- Removed build_docs CI job in favor of RTD GH builder (#1974)
- Added tests to confirm support for Python 3.9 (#1724)
- Added tests to support Dask AutoML/Engine (#1990)
- Changed build_conda_pkg job to use latest_release_changes branch in the feedstock (#1979)
Warning
- Breaking Changes
- Changed AutoMLSearch to default optimize_thresholds to True (#1943)
- Removed data_checks parameter, data_check_results and data checks logic from AutoMLSearch. To run the data checks which were previously run by default in AutoMLSearch, please call DefaultDataChecks().validate(X_train, y_train) or take a look at our documentation for more examples (#1935)
- Deleted random_state argument (#1985)
- v0.20.0 Mar. 10, 2021
- Enhancements
- Added a GitHub Action for detecting dependency changes (#1933)
- Create a separate CV split to train stacked ensembler on for AutoMLSearch (#1814)
- Added a GitHub Action for Linux unit tests (#1846)
- Added ARIMARegressor estimator (#1894)
- Added DataCheckAction class and DataCheckActionCode enum (#1896)
- Updated Woodwork requirement to v0.0.10 (#1900)
- Added BalancedClassificationDataCVSplit and BalancedClassificationDataTVSplit to AutoMLSearch (#1875)
- Update default classification data splitter to use downsampling for highly imbalanced data (#1875)
- Updated describe_pipeline to return more information, including id of pipelines used for ensemble models (#1909)
- Added utility method to create list of components from a list of DataCheckAction (#1907)
- Updated validate method to include an action key in returned dictionary for all DataCheck and DataChecks (#1916)
- Aggregating the shap values for predictions that we know the provenance of, e.g. OHE, text, and date-time (#1901)
- Improved error message when custom objective is passed as a string in pipeline.score (#1941)
- Added score_pipelines and train_pipelines methods to AutoMLSearch (#1913)
- Added support for pandas version 1.2.0 (#1708)
- Added score_batch and train_batch abstract methods to EngineBase and implementations in SequentialEngine (#1913)
- Added ability to handle index columns in AutoMLSearch and DataChecks (#2138)
- Fixes
- Removed CI check for check_dependencies_updated_linux (#1950)
- Added metaclass for time series pipelines and fix binary classification pipeline predict not using objective if it is passed as a named argument (#1874)
- Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names (#1871)
- Fixed stack trace caused by passing pipelines with duplicate names to AutoMLSearch (#1932)
- Fixed AutoMLSearch.get_pipelines returning pipelines with the same attributes (#1958)
- Changes
- Reversed GitHub Action for Linux unit tests until a fix for report generation is found (#1920)
- Updated add_results in AutoMLAlgorithm to take in entire pipeline results dictionary from AutoMLSearch (#1891)
- Updated ClassImbalanceDataCheck to look for severe class imbalance scenarios (#1905)
- Deleted the explain_prediction function (#1915)
- Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead (#1928)
- Removed warning in InvalidTargetDataCheck returned when numeric binary classification targets are not (0, 1) (#1959)
- Documentation Changes
- Updated model_understanding.ipynb to demo the two-way partial dependence capability (#1919)
- Testing Changes
Warning
- Breaking Changes
- Deleted the explain_prediction function (#1915)
- Removed HighVarianceCVDataCheck and converted it to an AutoMLSearch method instead (#1928)
- Added score_batch and train_batch abstract methods to EngineBase. These need to be implemented in Engine subclasses (#1913)
- v0.19.0 Feb. 23, 2021
- Enhancements
- Added a GitHub Action for Python windows unit tests
1844
- Added a GitHub Action for checking updated release notes
1849
- Added a GitHub Action for Python lint checks
1837
- Adjusted
explain_prediction
,explain_predictions
andexplain_predictions_best_worst
to handle timeseries problems.1818
- Updated
InvalidTargetDataCheck
to check for mismatched indices in target and features1816
- Updated
Woodwork
structures returned from components to supportWoodwork
logical type overrides set by the user1784
- Updated estimators to keep track of input feature names during
fit()
1794
- Updated
visualize_decision_tree
to include feature names in output1813
- Added
is_bounded_like_percentage
property for objectives. If true, thecalculate_percent_difference
method will return the absolute difference rather than relative difference1809
- Added full error traceback to AutoMLSearch logger file
1840
- Changed
TargetEncoder
to preserve custom indices in the data1836
- Refactored
explain_predictions
andexplain_predictions_best_worst
to only compute features once for all rows that need to be explained1843
- Added custom random undersampler data splitter for classification
1857
- Updated
OutliersDataCheck
implementation to calculate the probability of having no outliers1855
- Added
Engines
pipeline processing API1838
- Fixes
- Changed EngineBase random_state arg to random_seed and same for user guide docs
1889
- Changes
- Modified
calculate_percent_difference
so that division by 0 is now inf rather than nan1809
- Removed
text_columns
parameter fromLSA
andTextFeaturizer
components1652
- Added
random_seed
as an argument to our automl/pipeline/component API. Usingrandom_state
will raise a warning1798
- Added
DataCheckError
message inInvalidTargetDataCheck
if input target is None and removed exception raised1866
- Documentation Changes
- Testing Changes
- Added back coverage for
_get_feature_provenance
inTextFeaturizer
aftertext_columns
was removed1842
- Pin graphviz version for windows builds
1847
- Unpin graphviz version for windows builds
1851
Warning
- Breaking Changes
- Added a deprecation warning to
explain_prediction
. It will be deleted in the next release.1860
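As a rough illustration of the calculate_percent_difference behavior noted in this release (absolute difference for objectives that are bounded like a percentage, and inf rather than nan when dividing by 0), the sketch below uses a hypothetical standalone function, percent_difference. The function name and its parameters (greater_is_better, bounded_like_percentage) are assumptions made for this example; it is not EvalML's implementation.

```python
import math


def percent_difference(score, baseline, greater_is_better=True,
                       bounded_like_percentage=False):
    """Hypothetical helper (not EvalML's code): percent change vs. a baseline."""
    if math.isnan(score) or math.isnan(baseline):
        return math.nan
    # A positive value means the score improved on the baseline.
    improvement = score - baseline if greater_is_better else baseline - score
    if bounded_like_percentage:
        # Scores that already live in [0, 1]: report the absolute gap.
        return 100 * improvement
    if baseline == 0:
        # Relative change against a zero baseline is undefined;
        # return a signed infinity instead of nan.
        return 0.0 if improvement == 0 else math.copysign(math.inf, improvement)
    return 100 * improvement / abs(baseline)


# Absolute mode (percentage points) versus relative mode for the same scores.
print(percent_difference(0.9, 0.8, bounded_like_percentage=True))
print(percent_difference(0.9, 0.8))
# Zero baseline in relative mode.
print(percent_difference(1.0, 0.0))
```

The absolute mode avoids inflating small gains on scores that are already close to their upper bound, which is presumably why the property exists.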
- v0.18.2 Feb. 10, 2021
- Enhancements
- Added uniqueness score data check
1785
- Added "dataframe" output format for prediction explanations
1781
- Updated LightGBM estimators to handle
pandas.MultiIndex
1770
- Sped up permutation importance for some pipelines
1762
- Added sparsity data check
1797
- Confirmed support for threshold tuning for binary time series classification problems
1803
- Fixes
- Changes
- Documentation Changes
- Added section on conda to the contributing guide
1771
- Updated release process to reflect freezing main before perf tests
1787
- Moved some PRs to the right section of the release notes
1789
- Tweak README.md.
1800
- Fixed back arrow on install page docs
1795
- Fixed docstring for ClassImbalanceDataCheck.validate()
1817
- Testing Changes
- v0.18.1 Feb. 1, 2021
- Enhancements
- Added
graph_t_sne
as a visualization tool for high dimensional data1731
- Added the ability to see the linear coefficients of features in linear models
1738
- Added support for
scikit-learn
v0.24.0
1733
- Added support for
scipy
v1.6.0
1752
- Added SVM Classifier and Regressor to estimators
1714
1761
- Fixes
- Addressed bug with
partial_dependence
and categorical data with more categories than grid resolution1748
- Removed
random_state
arg fromget_pipelines
inAutoMLSearch
1719
- Pinned pyzmq at less than 22.0.0 till we add support
1756
- Changes
- Updated components and pipelines to return
Woodwork
data structures1668
- Updated
clone()
for pipelines and components to copy over random state automatically1753
- Dropped support for Python version 3.6
1751
- Removed deprecated
verbose
flag fromAutoMLSearch
parameters1772
- Documentation Changes
- Add Twitter and Github link to documentation toolbar
1754
- Added Open Graph info to documentation
1758
- Testing Changes
Warning
- Breaking Changes
- Components and pipelines return
Woodwork
data structures instead of pandas
data structures1668
- Python 3.6 will not be actively supported due to discontinued support from EvalML dependencies.
- Deprecated
verbose
flag is removed for AutoMLSearch
1772
- v0.18.0 Jan. 26, 2021
- Enhancements
- Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in
invalid_targets_data_check
1574
- Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in
invalid_targets_data_check
1665
- Added time series support for
make_pipeline
1566
- Added target name for output of pipeline
predict
method1578
- Added multiclass check to
InvalidTargetDataCheck
for two examples per class1596
- Added support for
graphviz
v0.16
1657
- Enhanced time series pipelines to accept empty features
1651
- Added KNN Classifier to estimators.
1650
- Added support for list inputs for objectives
1663
- Added support for
AutoMLSearch
to handle time series classification pipelines1666
- Enhanced
DelayedFeaturesTransformer
to encode categorical features and targets before delaying them1691
- Added 2-way dependence plots.
1690
- Added ability to directly iterate through components within Pipelines
1583
- Fixes
- Fixed inconsistent attributes and added Exceptions to docs
1673
- Fixed
TargetLeakageDataCheck
to use Woodworkmutual_information
rather than using Pandas' Pearson Correlation1616
- Fixed thresholding for pipelines in
AutoMLSearch
to only threshold binary classification pipelines1622
1626
- Updated
load_data
to return Woodwork structures and update default parameter value forindex
toNone
1610
- Pinned scipy at < 1.6.0 while we work on adding support
1629
- Fixed data check message formatting in
AutoMLSearch
1633
- Addressed stacked ensemble component for
scikit-learn
v0.24 support by settingshuffle=True
for default CV1613
- Fixed bug where
Imputer
reset the index onX
1590
- Fixed
AutoMLSearch
stacktrace when a custom objective was passed in as a primary objective or additional objective1575
- Fixed custom index bug for
MAPE
objective1641
- Fixed index bug for
TextFeaturizer
andLSA
components1644
- Limited
load_fraud
dataset loaded intoautoml.ipynb
1646
- add_to_rankings
updates AutoMLSearch.best_pipeline
when necessary1647
- Fixed bug where time series baseline estimators were not receiving
gap
andmax_delay
inAutoMLSearch
1645
- Fixed jupyter notebooks to help the RTD buildtime
1654
- Added
positive_only
objectives tonon_core_objectives
1661
- Fixed stacking argument
n_jobs
for IterativeAlgorithm1706
- Updated CatBoost estimators to return self in
.fit()
rather than the underlying model for consistency1701
- Added ability to initialize pipeline parameters in
AutoMLSearch
constructor1676
- Changes
- Added labeling to
graph_confusion_matrix
1632
- Rerunning search for
AutoMLSearch
results in a message thrown rather than failing the search, and removedhas_searched
property1647
- Changed tuner class to allow and ignore single parameter values as input
1686
- Capped LightGBM version limit to remove bug in docs
1711
- Removed support for np.random.RandomState in EvalML
1727
- Documentation Changes
- Update Model Understanding in the user guide to include
visualize_decision_tree
1678
- Updated docs to include information about
AutoMLSearch
callback parameters and methods1577
- Updated docs to prompt users to install graphviz on Mac
1656
- Added
infer_feature_types
to thestart.ipynb
guide1700
- Added multicollinearity data check to API reference and docs
1707
- Testing Changes
Warning
- Breaking Changes
- Removed
has_searched
property from AutoMLSearch
1647
- Components and pipelines return
Woodwork
data structures instead of pandas
data structures1668
- Removed support for np.random.RandomState in EvalML. Rather than passing
np.random.RandomState
as component and pipeline random_state values, we use int random_seed1727
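The move from np.random.RandomState objects to an integer random_seed (with the random_state warning noted under v0.19.0) can be pictured with the following minimal sketch. The helper resolve_seed and the class ToyComponent are hypothetical names invented for this example, and the fallback seed of 0 is an assumption; none of this is EvalML's actual code.

```python
import random
import warnings


def resolve_seed(random_seed=None, random_state=None):
    """Hypothetical shim: prefer random_seed, accept random_state with a warning."""
    if random_state is not None:
        warnings.warn(
            "random_state is deprecated; use random_seed instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if random_seed is None:
            random_seed = random_state
    if random_seed is None:
        random_seed = 0  # assumed default for this sketch
    # Only plain integers are accepted; generator objects are rejected.
    if isinstance(random_seed, bool) or not isinstance(random_seed, int):
        raise TypeError("random_seed must be an int")
    return random_seed


class ToyComponent:
    """Stores an int seed and builds a fresh generator on every call."""

    def __init__(self, random_seed=None, random_state=None):
        self.random_seed = resolve_seed(random_seed, random_state)

    def sample(self, n=3):
        rng = random.Random(self.random_seed)
        return [rng.random() for _ in range(n)]


# Two components built from the same int seed produce the same draws.
print(ToyComponent(random_seed=7).sample() == ToyComponent(random_seed=7).sample())
```

Storing a plain integer rather than a stateful generator makes a component straightforward to copy or serialize, since the seed alone determines the draws.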
- v0.17.0 Dec. 29, 2020
- Enhancements
- Added
save_plot
that allows for saving figures from different backends1588
- Added
LightGBM Regressor
to regression components1459
- Added
visualize_decision_tree
for tree visualization withdecision_tree_data_from_estimator
anddecision_tree_data_from_pipeline
to reformat tree structure output1511
- Added DFS Transformer component into transformer components
1454
- Added
MAPE
to the standard metrics for time series problems and update objectives1510
- Added
graph_prediction_vs_actual_over_time
andget_prediction_vs_actual_over_time_data
to the model understanding module for time series problems1483
- Added a
ComponentGraph
class that will support future pipelines as directed acyclic graphs1415
- Updated data checks to accept
Woodwork
data structures1481
- Added parameter to
InvalidTargetDataCheck
to show only top unique values rather than all unique values1485
- Added multicollinearity data check
1515
- Added baseline pipeline and components for time series regression problems
1496
- Added more information to users about ensembling behavior in
AutoMLSearch
1527
- Add woodwork support for more utility and graph methods
1544
- Changed
DateTimeFeaturizer
to encode features as int1479
- Return trained pipelines from
AutoMLSearch.best_pipeline
1547
- Added utility method so that users can set feature types without having to learn about Woodwork directly
1555
- Added Linear Discriminant Analysis transformer for dimensionality reduction
1331
- Added multiclass support for
partial_dependence
andgraph_partial_dependence
1554
- Added
TimeSeriesBinaryClassificationPipeline
andTimeSeriesMulticlassClassificationPipeline
classes1528
- Added
make_data_splitter
method for easier automl data split customization1568
- Integrated
ComponentGraph
class into Pipelines for full non-linear pipeline support1543
- Update
AutoMLSearch
constructor to take training data instead ofsearch
andadd_to_leaderboard
1597
- Update
split_data
helper args1597
- Add problem type utils
is_regression
,is_classification
,is_timeseries
1597
- Rename
AutoMLSearch
data_split
arg todata_splitter
1569
- Fixes
- Fix AutoML not passing CV folds to
DefaultDataChecks
for usage byClassImbalanceDataCheck
1619
- Fix Windows CI jobs: install
numba
via conda, required forshap
1490
- Added custom-index support for reset-index-get_prediction_vs_actual_over_time_data
1494
- Fix
generate_pipeline_code
to account for boolean and None differences between Python and JSON1524
1531
- Set max value for plotly and xgboost versions while we debug CI failures with newer versions
1532
- Undo version pinning for plotly
1533
- Fix ReadTheDocs build by updating the version of
setuptools
1561
- Set
random_state
of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits1579
- Pin sklearn version while we work on adding support
1594
- Pin pandas at <1.2.0 while we work on adding support
1609
- Pin graphviz at < 0.16 while we work on adding support
1609
- Changes
- Reverting
save_graph
1550
to resolve kaleido build issues1585
- Update circleci badge to apply to
main
1489
- Added script to generate github markdown for releases
1487
- Updated selection using pandas
dtypes
to selecting using Woodwork logical types1551
- Updated dependencies to fix
ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes'
error and to address Woodwork and Featuretool dependencies1540
- Made
get_prediction_vs_actual_data()
a public method1553
- Updated
Woodwork
version requirement to v0.0.71560
- Move data splitters from
evalml.automl.data_splitters
toevalml.preprocessing.data_splitters
1597
- Rename "# Testing" in automl log output to "# Validation"
1597
- Documentation Changes
- Added partial dependence methods to API reference
1537
- Updated documentation for confusion matrix methods
1611
- Testing Changes
- Set
n_jobs=1
in most unit tests to reduce memory1505
Warning
- Breaking Changes
- Updated minimal dependencies:
numpy>=1.19.1
, pandas>=1.1.0
, scikit-learn>=0.23.1
, scikit-optimize>=0.8.1
- Updated
AutoMLSearch.best_pipeline
to return a trained pipeline. Pass in train_best_pipeline=False
to AutoMLSearch in order to return an untrained pipeline.
- Pipeline component instances can no longer be iterated through using
Pipeline.component_graph
1543
- Update
AutoMLSearch
constructor to take training data instead of search
and add_to_leaderboard
1597
- Update
split_data
helper args1597
- Move data splitters from
evalml.automl.data_splitters
to evalml.preprocessing.data_splitters
1597
- Rename
AutoMLSearch
data_split
arg to data_splitter
1569
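The generate_pipeline_code fix listed above concerns the different spellings of boolean and null literals in Python and JSON. The standard-library sketch below shows the mismatch in isolation; the parameter dictionary is made up for illustration and does not correspond to any real component's parameters.

```python
import ast
import json

# Hypothetical component parameters containing a boolean and a None.
parameters = {"Toy Component": {"flag": True, "fill_value": None}}

as_json = json.dumps(parameters)   # uses the JSON literals true / null
as_python = repr(parameters)       # uses the Python literals True / None

print(as_json)
print(as_python)

# The repr() form is a valid Python literal and round-trips cleanly.
round_tripped = ast.literal_eval(as_python)

# The JSON form is not a valid Python literal: true and null are unknown names.
try:
    ast.literal_eval(as_json)
    json_is_valid_python = True
except ValueError:
    json_is_valid_python = False
```

Code generators that emit Python source therefore need repr()-style output (or an explicit translation) rather than a raw JSON dump of the parameters.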
- v0.16.1 Dec. 1, 2020
- Enhancements
- Pin woodwork version to v0.0.6 to avoid breaking changes
1484
- Updated
Woodwork
to >=0.0.5 incore-requirements.txt
1473
- Removed
copy_dataframe
parameter forWoodwork
, updatedWoodwork
to >=0.0.6 incore-requirements.txt
1478
- Updated
detect_problem_type
to use pandas.api.types.is_numeric_dtype
1476
- Changes
- Changed
make clean
to delete coverage reports as a convenience for developers1464
- Set
n_jobs=-1
by default for stacked ensemble components1472
- Documentation Changes
- Updated pipeline and component documentation and demos to use
Woodwork
1466
- Testing Changes
- Update dependency update checker to use everything from core and optional dependencies
1480
- v0.16.0 Nov. 24, 2020
- Enhancements
- Updated pipelines and
make_pipeline
to acceptWoodwork
inputs1393
- Updated components to accept
Woodwork
inputs1423
- Added ability to freeze hyperparameters for
AutoMLSearch
1284
- Added
Target Encoder
into transformer components1401
- Added callback for error handling in
AutoMLSearch
1403
- Added the index id to the
explain_predictions_best_worst
output to help users identify which rows in their data are included1365
- The top_k features displayed in
explain_predictions_*
functions are now determined by the magnitude of shap values as opposed to thetop_k
largest and smallest shap values.1374
- Added a problem type for time series regression
1386
- Added a
is_defined_for_problem_type
method toObjectiveBase
1386
- Added a
random_state
parameter tomake_pipeline_from_components
function1411
- Added
DelayedFeaturesTransformer
1396
- Added a
TimeSeriesRegressionPipeline
class1418
- Removed
core-requirements.txt
from the package distribution1429
- Updated data check messages to include a "code" and "details" fields
1451
,1462
- Added a
TimeSeriesSplit
data splitter for time series problems1441
- Added a
problem_configuration
parameter to AutoMLSearch1457
- Fixes
- Fixed
IndexError
raised inAutoMLSearch
whenensembling = True
but only one pipeline to iterate over1397
- Fixed stacked ensemble input bug and LightGBM warning and bug in
AutoMLSearch
1388
- Updated enum classes to show possible enum values as attributes
1391
- Updated calls to
Woodwork
's to_pandas()
to to_series()
and to_dataframe()
1428
- Fixed bug in OHE where column names were not guaranteed to be unique
1349
- Fixed bug with percent improvement of
ExpVariance
objective on data with highly skewed target1467
- Fix SimpleImputer error which occurs when all features are bool type
1215
- Changes
- Changed
OutliersDataCheck
to return the list of columns, rather than rows, that contain outliers1377
- Simplified and cleaned output for Code Generation
1371
- Reverted changes from
1337
1409
- Updated data checks to return dictionary of warnings and errors instead of a list
1448
- Updated
AutoMLSearch
to passWoodwork
data structures to every pipeline (instead of pandas DataFrames)1450
- Update
AutoMLSearch
to default tomax_batches=1
instead ofmax_iterations=5
1452
- Updated _evaluate_pipelines to consolidate side effects
1410
- Documentation Changes
- Added description of CLA to contributing guide, updated description of draft PRs
1402
- Updated documentation to include all data checks,
DataChecks
, and usage of data checks in AutoML1412
- Updated docstrings from
np.array
to np.ndarray
1417
- Added section on stacking ensembles in AutoMLSearch documentation
1425
- Testing Changes
- Removed
category_encoders
from test-requirements.txt1373
- Tweak codecov.io settings again to avoid flakes
1413
- Modified
make lint
to check notebook versions in the docs1431
- Modified
make lint-fix
to standardize notebook versions in the docs1431
- Use new version of pull request Github Action for dependency check (
1443
)
- Reduced number of workers for tests to 4
1447
Warning
- Breaking Changes
- The
top_k
and top_k_features
parameters in explain_predictions_*
functions now return k
features as opposed to 2 * k
features1374
- Renamed
problem_type
to problem_types
in RegressionObjective
, BinaryClassificationObjective
, and MulticlassClassificationObjective
1319
- Data checks now return a dictionary of warnings and errors instead of a list
1448
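The top_k change in this release (k features ranked by SHAP magnitude, instead of the k largest plus the k smallest values) can be sketched as below. Both function names and the sample values are hypothetical and exist only to contrast the two selection rules; this is not the library's code.

```python
def top_k_by_magnitude(shap_values, k):
    """New rule in this sketch: the k features with the largest |value|."""
    ranked = sorted(shap_values, key=lambda name: abs(shap_values[name]), reverse=True)
    return ranked[:k]


def top_and_bottom_k(shap_values, k):
    """Old rule in this sketch: the k largest and k smallest signed values."""
    if k <= 0:
        return []
    ranked = sorted(shap_values, key=shap_values.get, reverse=True)
    if 2 * k >= len(ranked):
        return ranked
    return ranked[:k] + ranked[-k:]


# Made-up SHAP values for five features.
values = {"age": 0.40, "income": -0.35, "tenure": 0.05, "clicks": -0.02, "visits": 0.01}

print(top_k_by_magnitude(values, 2))  # ['age', 'income']
print(top_and_bottom_k(values, 2))    # ['age', 'tenure', 'clicks', 'income']
```

Under the old rule a weak feature such as tenure is reported only because it happens to be the second most positive value; ranking by magnitude keeps the report to the k most influential features regardless of sign.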
- v0.15.0 Oct. 29, 2020
- Enhancements
- Added stacked ensemble component classes (
StackedEnsembleClassifier
,StackedEnsembleRegressor
)1134
- Added stacked ensemble components to
AutoMLSearch
1253
- Added
DecisionTreeClassifier
andDecisionTreeRegressor
to AutoML1255
- Added
graph_prediction_vs_actual
inmodel_understanding
for regression problems1252
- Added parameter to
OneHotEncoder
to enable filtering for features to encode for1249
- Added percent-better-than-baseline for all objectives to automl.results
1244
- Added
HighVarianceCVDataCheck
and replaced synonymous warning inAutoMLSearch
1254
- Added PCA Transformer component for dimensionality reduction
1270
- Added
generate_pipeline_code
andgenerate_component_code
to allow for code generation given a pipeline or component instance1306
- Updated
AutoMLSearch
to supportWoodwork
data structures1299
- Added cv_folds to
ClassImbalanceDataCheck
and added this check toDefaultDataChecks
1333
- Make
max_batches
argument toAutoMLSearch.search
public1320
- Added text support to automl search
1062
- Added
_pipelines_per_batch
as a private argument toAutoMLSearch
1355
- Fixes
- Fixed ML performance issue with ordered datasets: always shuffle data in automl's default CV splits
1265
- Fixed broken
evalml info
CLI command1293
- Fixed
boosting type='rf'
for LightGBM Classifier, as well asnum_leaves
error1302
- Fixed bug in
explain_predictions_best_worst
where a custom index in the target variable would cause aValueError
1318
- Added stacked ensemble estimators to the
evalml.pipelines.__init__
file1326
- Fixed bug in OHE where calls to transform were not deterministic if
top_n
was less than the number of categories in a column1324
- Fixed LightGBM warning messages during AutoMLSearch
1342
- Fix warnings thrown during AutoMLSearch in
HighVarianceCVDataCheck
1346
- Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index
1348
- Fixed bug where the AutoMLSearch
random_state
was not being passed to the created pipelines1321
- Changes
- Allow
add_to_rankings
to be called before AutoMLSearch is called1250
- Removed Graphviz from test-requirements to add to requirements.txt
1327
- Removed
max_pipelines
parameter fromAutoMLSearch
1264
- Include editable installs in all install make targets
1335
- Made pip dependencies featuretools and nlp_primitives core dependencies
1062
- Removed PartOfSpeechCount from TextFeaturizer transform primitives
1062
- Added warning for
partial_dependence
when the feature includes null values1352
- Documentation Changes
- Fixed and updated code blocks in Release Notes
1243
- Added DecisionTree estimators to API Reference
1246
- Changed class inheritance display to flow vertically
1248
- Updated cost-benefit tutorial to use a holdout/test set
1159
- Added
evalml info
command to documentation1293
- Miscellaneous doc updates
1269
- Removed conda pre-release testing from the release process document
1282
- Updates to contributing guide
1310
- Added Alteryx footer to docs with Twitter and Github link
1312
- Added documentation for evalml installation for Python 3.6
1322
- Added documentation changes to make the API Docs easier to understand
1323
- Fixed documentation for
feature_importance
1353
- Added tutorial for running AutoML with text data
1357
- Added documentation for woodwork integration with automl search
1361
- Testing Changes
- Added tests for
jupyter_check
to handle IPython1256
- Cleaned up
make_pipeline
tests to test for all estimators1257
- Added a test to check conda build after merge to main
1247
- Removed code that was lacking codecov for
__main__.py
and unnecessary1293
- Codecov: round coverage up instead of down
1334
- Add DockerHub credentials to CI testing environment
1356
- Add DockerHub credentials to conda testing environment
1363
Warning
- Breaking Changes
- Renamed
LabelLeakageDataCheck
to TargetLeakageDataCheck
1319
- max_pipelines
parameter has been removed from AutoMLSearch
. Please use max_iterations
instead.1264
- AutoMLSearch.search()
will now log a warning if the input is not a Woodwork
data structure (pandas
,numpy
)1299
- Make
max_batches
argument toAutoMLSearch.search
public1320
- Removed unused argument feature_types from AutoMLSearch.search
1062
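The fix for ordered datasets in this release (always shuffling in the default CV splits) addresses a problem that is easy to reproduce with the standard library. In the sketch below, contiguous_folds and classes_per_fold are hypothetical helpers written for this example, not EvalML or scikit-learn functions, and the target is a made-up list sorted by class.

```python
import random


def contiguous_folds(n_samples, n_folds, shuffle=False, seed=0):
    """Split indices 0..n_samples-1 into n_folds contiguous validation folds."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    bounds = [round(i * n_samples / n_folds) for i in range(n_folds + 1)]
    return [indices[bounds[i]:bounds[i + 1]] for i in range(n_folds)]


def classes_per_fold(y, folds):
    """List the distinct target classes present in each fold."""
    return [sorted({y[i] for i in fold}) for fold in folds]


# A target sorted by class, as in an ordered dataset.
y = [0] * 6 + [1] * 6

ordered = classes_per_fold(y, contiguous_folds(len(y), 3))
shuffled = classes_per_fold(y, contiguous_folds(len(y), 3, shuffle=True))

print(ordered)   # the first and last folds each contain a single class
print(shuffled)  # shuffling makes mixed folds likely, though not guaranteed
```

Without shuffling, a model validated on the first fold is scored only on class 0, which makes the cross-validation scores misleading.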
- v0.14.1 Sep. 29, 2020
- Enhancements
- Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns
1150
- Added
get_feature_names
onOneHotEncoder
1193
- Added
detect_problem_type
toproblem_type/utils.py
to automatically detect the problem type given targets1194
- Added LightGBM to
AutoMLSearch
1199
- Updated
scikit-learn
andscikit-optimize
to use latest versions - 0.23.2 and 0.8.1 respectively1141
- Added
__str__
and__repr__
for pipelines and components1218
- Included internal target check for both training and validation data in
AutoMLSearch
1226
- Added
ProblemTypes.all_problem_types
helper to get list of supported problem types1219
- Added
DecisionTreeClassifier
andDecisionTreeRegressor
classes1223
- DataChecks
can now be parametrized by passing a list of DataCheck
classes and a parameter dictionary1167
- Added first CV fold score as validation score in
AutoMLSearch.rankings
1221
- Updated
flake8
configuration to enable linting on__init__.py
files1234
- Refined
make_pipeline_from_components
implementation1204
- Fixes
- Updated GitHub URL after migration to Alteryx GitHub org
1207
- Changed Problem Type enum to be more similar to the string name
1208
- Wrapped call to scikit-learn's partial dependence method in a
try
/finally
block1232
- Changes
- Added
allow_writing_files
as a named argument to CatBoost estimators.1202
- Added
solver
andmulti_class
as named arguments toLogisticRegressionClassifier
1202
- Replaced pipeline's
._transform
method to evaluate all the preprocessing steps of a pipeline with.compute_estimator_features
1231
- Changed default large dataset train/test splitting behavior
1205
- Documentation Changes
- Included description of how to access the component instances and features for pipeline user guide
1163
- Updated API docs to refer to target as "target" instead of "labels" for non-classification tasks and minor docs cleanup
1160
- Added Class Imbalance Data Check to
api_reference.rst
1190
1200
- Added pipeline properties to API reference
1209
- Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide
1222
- Updated API docs to include
skopt.space.Categorical
option for component hyperparameter range definition1228
- Added install documentation for
libomp
in order to use LightGBM on Mac1233
- Improved description of
max_iterations
in documentation1212
- Removed unused code from sphinx conf
1235
- Testing Changes
Warning
- Breaking Changes
- DefaultDataChecks
now accepts a problem_type
parameter that must be specified1167
- Pipeline's
._transform
method to evaluate all the preprocessing steps of a pipeline has been replaced with .compute_estimator_features
1231
- get_objectives
has been renamed to get_core_objectives
. This function will now return a list of valid objective instances1230
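The idea of parametrizing a collection of data checks from a list of classes plus a parameter dictionary, as described for DataChecks in this release, might look roughly like the sketch below. Every name here (NullFractionCheck, ConstantColumnCheck, build_checks, the threshold parameter) is hypothetical, and keying the dictionary by class name is an assumption of this example rather than a statement about EvalML's API.

```python
class NullFractionCheck:
    """Toy check: flag columns whose fraction of None values meets a threshold."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold

    def validate(self, columns):
        flagged = []
        for name, values in columns.items():
            if not values:
                continue
            null_fraction = sum(v is None for v in values) / len(values)
            if null_fraction >= self.threshold:
                flagged.append(name)
        return flagged


class ConstantColumnCheck:
    """Toy check: flag columns that contain a single distinct value."""

    def validate(self, columns):
        return [name for name, values in columns.items() if len(set(values)) <= 1]


def build_checks(check_classes, params=None):
    """Instantiate each class, passing it the parameters filed under its name."""
    params = params or {}
    unknown = set(params) - {cls.__name__ for cls in check_classes}
    if unknown:
        raise ValueError(f"Parameters given for unknown checks: {sorted(unknown)}")
    return [cls(**params.get(cls.__name__, {})) for cls in check_classes]


data = {"a": [None, None, 1, 2], "b": [3, 3, 3, 3]}
checks = build_checks(
    [NullFractionCheck, ConstantColumnCheck],
    {"NullFractionCheck": {"threshold": 0.5}},
)
results = {type(check).__name__: check.validate(data) for check in checks}
print(results)
```

Passing classes rather than instances lets the caller supply or omit parameters per check while the collection handles construction in one place.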
- v0.13.2 Sep. 17, 2020
- Enhancements
- Added
output_format
field to explain predictions functions1107
- Modified
get_objective
andget_objectives
to be able to return any objective inevalml.objectives
1132
- Added a
return_instance
boolean parameter toget_objective
1132
- Added
ClassImbalanceDataCheck
to determine whether target imbalance falls below a given threshold1135
- Added label encoder to LightGBM for binary classification
1152
- Added labels for the row index of confusion matrix
1154
- Added
AutoMLSearch
object as another parameter in search callbacks1156
- Added the corresponding probability threshold for each point displayed in
graph_roc_curve
1161
- Added
__eq__
forComponentBase
andPipelineBase
1178
- Added support for multiclass classification for
roc_curve
1164
- Added
categories
accessor toOneHotEncoder
for listing the categories associated with a feature1182
- Added utility function to create pipeline instances from a list of component instances
1176
- Fixes
- Fixed XGBoost column names for partial dependence methods
1104
- Removed dead code validating column type from
TextFeaturizer
1122
- Fixed issue where
Imputer
cannot fit when there is None in a categorical or boolean column1144
- OneHotEncoder
preserves the custom index in the input data1146
- Fixed representation for
ModelFamily
1165
- Removed duplicate
nbsphinx
dependency indev-requirements.txt
1168
- Users can now pass in any valid kwargs to all estimators
1157
- Remove broken accessor
OneHotEncoder.get_feature_names
and unneeded base class1179
- Removed LightGBM Estimator from AutoML models
1186
- Changes
- Pinned
scikit-optimize
version to 0.7.41136
- Removed
tqdm
as a dependency1177
- Added lightgbm version 3.0.0 to
latest_dependency_versions.txt
1185
- Rename
max_pipelines
tomax_iterations
1169
- Documentation Changes
- Fixed API docs for
AutoMLSearch
add_result_callback
1113
- Added a step to our release process for pushing our latest version to conda-forge
1118
- Added warning for missing ipywidgets dependency for using
PipelineSearchPlots
on Jupyterlab1145
- Updated
README.md
example to load demo dataset1151
- Swapped mapping of breast cancer targets in
model_understanding.ipynb
1170
- Testing Changes
- Added test confirming
TextFeaturizer
never outputs null values1122
- Changed Python version of
Update Dependencies
action to 3.8.x1137
- Fixed release notes check-in test for
Update Dependencies
actions1172
Warning
- Breaking Changes
- get_objective
will now return a class definition rather than an instance by default1132
- Deleted
OPTIONS
dictionary in evalml.objectives.utils.py
1132
- If specifying an objective by string, the string must now match the objective's name field, case-insensitive
1132
- Passing "Cost Benefit Matrix", "Fraud Cost", "Lead Scoring", "Mean Squared Log Error",
"Recall", "Recall Macro", "Recall Micro", "Recall Weighted", or "Root Mean Squared Log Error" to
AutoMLSearch
will now result in a ValueError
rather than an ObjectiveNotFoundError
1132
- Search callbacks
start_iteration_callback
and add_results_callback
have changed to include a copy of the AutoMLSearch object as a third parameter1156
- Deleted
OneHotEncoder.get_feature_names
method which had been broken for a while, in favor of pipelines' input_feature_names
1179
- Deleted empty base class
CategoricalEncoder
which OneHotEncoder
component was inheriting from1176
- Results from
roc_curve
will now return as a list of dictionaries with each dictionary representing a class1164
- max_pipelines
now raises a DeprecationWarning
and will be removed in the next release. max_iterations
should be used instead.1169
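The objective lookup rules listed in these breaking changes (case-insensitive match on the objective's name field, and a class returned by default with an instance available on request) can be sketched as follows. The registry, the two toy objective classes, the ObjectiveNotFound exception and the function lookup_objective are all hypothetical stand-ins, not EvalML's own definitions.

```python
class ObjectiveNotFound(KeyError):
    """Hypothetical error raised when no objective has the requested name."""


class LogLossBinary:
    name = "Log Loss Binary"


class RecallMacro:
    name = "Recall Macro"


# Registry keyed by the lower-cased name field of each objective class.
_OBJECTIVES = {cls.name.lower(): cls for cls in (LogLossBinary, RecallMacro)}


def lookup_objective(name, return_instance=False):
    """Return the objective class for `name`, or an instance if requested."""
    try:
        objective_class = _OBJECTIVES[name.lower()]
    except KeyError:
        raise ObjectiveNotFound(name) from None
    return objective_class() if return_instance else objective_class


print(lookup_objective("recall macro") is RecallMacro)
print(isinstance(lookup_objective("RECALL MACRO", return_instance=True), RecallMacro))
```

Returning the class by default leaves construction, including any constructor arguments, to the caller.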
- v0.13.1 Aug. 25, 2020
- Enhancements
- Added Cost-Benefit Matrix objective for binary classification
1038
- Split
fill_value
intocategorical_fill_value
andnumeric_fill_value
for Imputer1019
- Added
explain_predictions
andexplain_predictions_best_worst
for explaining multiple predictions with SHAP1016
- Added new LSA component for text featurization
1022
- Added guide on installing with conda
1041
- Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds
1081
- Standardized error when calling transform/predict before fit for pipelines
1048
- Added
percent_better_than_baseline
to AutoML search rankings and full rankings table1050
- Added one-way partial dependence and partial dependence plots
1079
- Added "Feature Value" column to prediction explanation reports.
1064
- Added LightGBM classification estimator
1082
,1114
- Added
max_batches
parameter toAutoMLSearch
1087
- Fixes
- Updated
TextFeaturizer
component to no longer require an internet connection to run1022
- Fixed non-deterministic element of
TextFeaturizer
transformations1022
- Added a StandardScaler to all ElasticNet pipelines
1065
- Updated cost-benefit matrix to normalize score
1099
- Fixed logic in
calculate_percent_difference
so that it can handle negative values1100
- Changes
- Added
needs_fitting
property toComponentBase
1044
- Updated references to data types to use datatype lists defined in
evalml.utils.gen_utils
1039
- Remove maximum version limit for SciPy dependency
1051
- Moved
all_components
and other component importers into runtime methods1045
- Consolidated graphing utility methods under
evalml.utils.graph_utils
1060
- Made slight tweaks to how
TextFeaturizer
usesfeaturetools
, and did some refactoring of that and of LSA1090
- Changed
show_all_features
parameter intoimportance_threshold
, which allows for thresholding feature importance1097
,1103
- Documentation Changes
- Update
setup.py
URL to point to the github repo1037
- Added tutorial for using the cost-benefit matrix objective
1088
- Updated
model_understanding.ipynb
to include documentation for using plotly on Jupyter Lab1108
- Testing Changes
- Refactor CircleCI tests to use matrix jobs (
1043
)
- Added a test to check that all test directories are included in evalml package
1054
Warning
- Breaking Changes
- confusion_matrix
and normalize_confusion_matrix
have been moved to evalml.utils
1038
- All graph utility methods previously under
evalml.pipelines.graph_utils
have been moved to evalml.utils.graph_utils
1060
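Two entries in this release, the standardized error for calling transform/predict before fit and the needs_fitting property, suggest a guard along the lines of the sketch below. The decorator requires_fit, the exception NotYetFittedError, the _is_fitted attribute and both toy components are assumptions made for illustration; EvalML's real classes and error types may differ.

```python
import functools


class NotYetFittedError(RuntimeError):
    """Hypothetical error for methods called before fit()."""


def requires_fit(method):
    """Raise a uniform error if a component that needs fitting is unfitted."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.needs_fitting and not self._is_fitted:
            raise NotYetFittedError(
                f"{type(self).__name__} must be fitted before calling {method.__name__}()."
            )
        return method(self, *args, **kwargs)

    return wrapper


class MeanImputer:
    needs_fitting = True

    def __init__(self):
        self._is_fitted = False
        self._mean = None

    def fit(self, values):
        observed = [v for v in values if v is not None]
        self._mean = sum(observed) / len(observed)
        self._is_fitted = True
        return self

    @requires_fit
    def transform(self, values):
        return [self._mean if v is None else v for v in values]


class DropNones:
    needs_fitting = False  # stateless, so transform() works without fit()
    _is_fitted = False

    @requires_fit
    def transform(self, values):
        return [v for v in values if v is not None]


print(MeanImputer().fit([1.0, None, 3.0]).transform([None, 5.0]))  # [2.0, 5.0]
print(DropNones().transform([1, None]))  # [1]
```

Centralizing the check in one decorator is what makes the error message and type consistent across components.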
- v0.12.2 Aug. 6, 2020
- Enhancements
- Add save/load method to components
1023
- Expose pickle
protocol
as optional arg to save/load1023
- Updated estimators used in AutoML to include ExtraTrees and ElasticNet estimators
1030
- Fixes
- Changes
- Removed
DeprecationWarning
forSimpleImputer
1018
- Documentation Changes
- Add note about version numbers to release process docs
1034
- Testing Changes
- Test files are now included in the evalml package
1029
- v0.12.0 Aug. 3, 2020
- Enhancements
- Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for
DetectLabelLeakage
data check932
- Added clear exception for regression pipelines if target datatype is string or categorical
960
- Added target column names and class labels in
predict
and predict_proba
output for pipelines951
- Added
_compute_shap_values
and normalize_values
to pipelines/explanations
module958
- Added
explain_prediction
feature which explains single predictions with SHAP974
- Added Imputer to allow different imputation strategies for numerical and categorical dtypes
991
- Added support for configuring logfile path using env var, and don't create logger if there are filesystem errors
975
- Updated catboost estimators' default parameters and automl hyperparameter ranges to speed up fit time
998
- Fixes
- Fixed ReadtheDocs warning failure regarding embedded gif
943
- Removed incorrect parameter passed to pipeline classes in
_add_baseline_pipelines
941
- Added universal error for calling
predict
,predict_proba
,transform
, and feature_importances
before fitting969
,994
- Made
TextFeaturizer
component and pip dependencies featuretools
and nlp_primitives
optional976
- Updated imputation strategy in automl to no longer limit impute strategy to
most_frequent
for all features if there are any categorical columns991
- Fixed
UnboundLocalError
for cv_pipeline
when automl search errors996
- Fixed
Imputer
to reset dataframe index to preserve behavior expected from SimpleImputer
1009
- Changes
- Moved
get_estimators
to evalml.pipelines.components.utils
934
- Modified Pipelines to raise
PipelineScoreError
when they encounter an error during scoring936
- Moved
evalml.model_families.list_model_families
to evalml.pipelines.components.allowed_model_families
959
- Renamed
DateTimeFeaturization
to DateTimeFeaturizer
977
- Added check to stop search and raise an error if all pipelines in a batch return NaN scores
1015
- Documentation Changes
- Updated
README.md
963
- Reworded message when errors are returned from data checks in search
982
- Added section on understanding model predictions with
explain_prediction
to User Guide981
- Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported.
992
- Added custom components section in user guide
993
- Updated FAQ section formatting
997
- Updated release process documentation
1003
- Testing Changes
- Moved
predict_proba
and predict
tests regarding string / categorical targets to test_pipelines.py
972
- Fixed dependency update bot by updating python version to 3.7 to avoid frequent github version updates
1002
Warning
- Breaking Changes
get_estimators
has been moved to evalml.pipelines.components.utils
(previously was under evalml.pipelines.utils
)934
- Removed the
raise_errors
flag in AutoML search. All errors during pipeline evaluation will be caught and logged.936
evalml.model_families.list_model_families
has been moved to evalml.pipelines.components.allowed_model_families
959
TextFeaturizer
: the featuretools
and nlp_primitives
packages must be installed after installing evalml in order to use this component976
- Renamed
DateTimeFeaturization
to DateTimeFeaturizer
977
- v0.11.2 July 16, 2020
- Enhancements
- Added
NoVarianceDataCheck
to DefaultDataChecks
893
- Added text processing and featurization component
TextFeaturizer
913
,924
- Added additional checks to
InvalidTargetDataCheck
to handle invalid target data types929
AutoMLSearch
will now handle KeyboardInterrupt
and prompt user for confirmation915
- Fixes
- Makes automl results a read-only property
919
- Changes
- Deleted static pipelines and refactored tests involving static pipelines, removed
all_pipelines()
and get_pipelines()
904
- Moved
list_model_families
to evalml.model_family.utils
903
- Updated
all_pipelines
,all_estimators
,all_components
to use the same mechanism for dynamically generating their elements898
- Rename
master
branch to main
918
- Add pypi release github action
923
- Updated
AutoMLSearch.search
stdout output and logging and removed tqdm progress bar921
- Moved automl config checks previously in
search()
to init933
- Documentation Changes
- Reorganized and rewrote documentation
937
- Updated to use pydata sphinx theme
937
- Updated docs to use
release_notes
instead of changelog
942
- Testing Changes
- Cleaned up fixture names and usages in tests
895
Warning
- Breaking Changes
list_model_families
has been moved to evalml.model_family.utils
(previously was under evalml.pipelines.utils
)903
get_estimators
has been moved to evalml.pipelines.components.utils
(previously was under evalml.pipelines.utils
)934
- Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of
PipelineBase
904
all_pipelines()
and get_pipelines()
utility methods have been removed904
- v0.11.0 June 30, 2020
- Enhancements
- Added multiclass support for ROC curve graphing
832
- Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold
834
- Added data check to check for problematic target labels
814
- Added PerColumnImputer that allows imputation strategies per column
824
- Added transformer to drop specific columns
827
- Added support for
categories
,handle_error
, and drop
parameters in OneHotEncoder
830
897
- Added preprocessing component to handle DateTime columns featurization
838
- Added ability to clone pipelines and components
842
- Define getter method for component
parameters
847
- Added utility methods to calculate and graph permutation importances
860
,880
- Added new utility functions necessary for generating dynamic preprocessing pipelines
852
- Added kwargs to all components
863
- Updated
AutoSearchBase
to use dynamically generated preprocessing pipelines870
- Added SelectColumns transformer
873
- Added ability to evaluate additional pipelines for automl search
874
- Added
default_parameters
class property to components and pipelines879
- Added better support for disabling data checks in automl search
892
- Added ability to save and load AutoML objects to file
888
- Updated
AutoSearchBase.get_pipelines
to return an untrained pipeline instance876
- Saved learned binary classification thresholds in automl results cv data dict
876
- Fixes
- Fixed bug where SimpleImputer cannot handle dropped columns
846
- Fixed bug where PerColumnImputer cannot handle dropped columns
855
- Enforce requirement that builtin components save all inputted values in their parameters dict
847
- Don't list base classes in
all_components
output847
- Standardize all components to output pandas data structures, and accept either pandas or numpy
853
- Fixed rankings and full_rankings error when search has not been run
894
- Changes
- Update
all_pipelines
and all_components
to try initializing pipelines/components, and on failure exclude them849
- Refactor
handle_components
to handle_components_class
, standardize to ComponentBase
subclass instead of instance850
- Refactor "blacklist"/"whitelist" to "allow"/"exclude" lists
854
- Replaced
AutoClassificationSearch
and AutoRegressionSearch
with AutoMLSearch
871
- Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance)
883
- Updated
automl
default data splitter to train/validation split for large datasets877
- Added open source license, update some repo metadata
887
- Removed dead code in
_get_preprocessing_components
896
- Documentation Changes
- Fix some typos and update the EvalML logo
872
- Testing Changes
- Update the changelog check job to expect the new branching pattern for the deps update bot
836
- Check that all components output pandas datastructures, and can accept either pandas or numpy
853
- Replaced
AutoClassificationSearch
and AutoRegressionSearch
with AutoMLSearch
871
Warning
- Breaking Changes
- Pipelines' static
component_graph
field must contain either ComponentBase
subclasses or str
, instead of ComponentBase
subclass instances850
- Rename
handle_component
to handle_component_class
. Now standardizes to ComponentBase
subclasses instead of ComponentBase
subclass instances850
- Renamed automl's
cv
argument to data_split
877
- Pipelines' and classifiers'
feature_importances
is renamed feature_importance
, graph_feature_importances
is renamed graph_feature_importance
883
- Passing
data_checks=None
to automl search will not perform any data checks as opposed to default checks.892
- Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes.
870
- Updated
AutoSearchBase.get_pipelines
to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold876
- v0.10.0 May 29, 2020
- Enhancements
- Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML
746
- Port over highly-null guardrail as a data check and define
DefaultDataChecks
and DisableDataChecks
classes745
- Update
Tuner
classes to work directly with pipeline parameters dicts instead of flat parameter lists779
- Add Elastic Net as a pipeline option
812
- Added new Pipeline option
ExtraTrees
790
- Added precision-recall curve metrics and plot for binary classification problems in
evalml.pipeline.graph_utils
794
- Update the default automl algorithm to search in batches, starting with default parameters for each pipeline and iterating from there
793
- Added
AutoMLAlgorithm
class and IterativeAlgorithm
impl, separated from AutoSearchBase
793
- Fixes
- Update pipeline
score
to return nan
score for any objective which throws an exception during scoring787
- Fixed bug introduced in
787
where binary classification metrics requiring predicted probabilities error in scoring798
- CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0
795
- Changes
- Cleanup pipeline
score
code, and cleanup codecov711
- Remove
pass
for abstract methods for codecov730
- Added __str__ for AutoSearch object
675
- Add util methods to graph ROC and confusion matrix
720
- Refactor
AutoBase
to AutoSearchBase
758
- Updated AutoBase with
data_checks
parameter, removed previous detect_label_leakage
parameter, and added functionality to run data checks before search in AutoML765
- Updated our logger to use Python's logging utils
763
- Refactor most of
AutoSearchBase._do_iteration
impl into AutoSearchBase._evaluate
762
- Port over all guardrails to use the new DataCheck API
789
- Expanded
import_or_raise
to catch all exceptions759
- Adds RMSE, MSLE, RMSLE as standard metrics
788
- Don't allow
Recall
to be used as an objective for AutoML784
- Removed feature selection from pipelines
819
- Update default estimator parameters to make automl search faster and more accurate
793
- Documentation Changes
- Add instructions to freeze
master
on release.md
726
- Update release instructions with more details
727
733
- Add objective base classes to API reference
736
- Fix components API to match other modules
747
- Testing Changes
- Delete codecov yml, use codecov.io's default
732
- Added unit tests for fraud cost, lead scoring, and standard metric objectives
741
- Update codecov client
782
- Updated AutoBase __str__ test to include no parameters case
783
- Added unit tests for
ExtraTrees
pipeline790
- If codecov fails to upload, fail build
810
- Updated Python version of dependency action
816
- Update the dependency update bot to use a suffix when creating branches
817
Warning
- Breaking Changes
- The
detect_label_leakage
parameter for AutoML classes has been removed and replaced by a data_checks
parameter765
- Moved ROC and confusion matrix methods from
evalml.pipeline.plot_utils
to evalml.pipeline.graph_utils
720
Tuner
classes require a pipeline hyperparameter range dict as an init arg instead of a space definition779
Tuner.propose
and Tuner.add
work directly with pipeline parameters dicts instead of flat parameter lists779
PipelineBase.hyperparameters
and custom_hyperparameters
use pipeline parameters dict format instead of being represented as a flat list779
- All guardrail functions previously under
evalml.guardrails.utils
will be removed and replaced by data checks789
Recall
disallowed as an objective for AutoML784
AutoSearchBase
parameter tuner
has been renamed to tuner_class
793
AutoSearchBase
parameter possible_pipelines
and possible_model_families
have been renamed to allowed_pipelines
and allowed_model_families
793
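For readers unfamiliar with the RMSE, MSLE, and RMSLE metrics added in this release, the following is a minimal standard-library sketch of their textbook definitions. The function names are hypothetical and chosen for illustration only; they are not EvalML's objective classes.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + x) on both inputs.

    Inputs must be greater than -1 for the logarithm to be defined.
    """
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error: the square root of MSLE."""
    return math.sqrt(msle(y_true, y_pred))


y_true = [3.0, 5.0, 2.5]
y_pred = [2.5, 5.0, 4.0]
print(rmse(y_true, y_pred), msle(y_true, y_pred), rmsle(y_true, y_pred))
```

Because the logarithmic variants compare ratios rather than absolute gaps, they penalize relative error and are only meaningful for targets above -1.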
- v0.9.0 Apr. 27, 2020
- Enhancements
- Added
Accuracy
as a standard objective624
- Added verbose parameter to load_fraud
560
- Added Balanced Accuracy metric for binary, multiclass
612
661
- Added XGBoost regressor and XGBoost regression pipeline
666
- Added
Accuracy
metric for multiclass672
- Added objective name in
AutoBase.describe_pipeline
686
- Added
DataCheck
and DataChecks
,Message
classes and relevant subclasses739
- Fixes
- Removed direct access to
cls.component_graph
595
- Add testing files to .gitignore
625
- Remove circular dependencies from
Makefile
637
- Add error case for
normalize_confusion_matrix()
640
- Fixed
XGBoostClassifier
and XGBoostRegressor
bug with feature names that contain [, ], or <659
- Update
make_pipeline_graph
to not accidentally create empty file when testing if path is valid649
- Fix pip installation warning about docutils version, from boto dependency
664
- Removed zero division warning for F1/precision/recall metrics
671
- Fixed
summary
for pipelines without estimators707
- Changes
- Updated default objective for binary/multiclass classification to log loss
613
- Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes
405
- Changed the output of
score
to return one dictionary429
- Created binary and multiclass objective subclasses
504
- Updated objectives API
445
- Removed call to
get_plot_data
from AutoML615
- Set
raise_error
to default to True for AutoML classes638
- Remove unnecessary "u" prefixes on some unicode strings
641
- Changed one-hot encoder to return uint8 dtypes instead of ints
653
- Pipeline
_name
field changed to custom_name
650
- Removed
graphs.py
and moved methods into PipelineBase
657
,665
- Remove s3fs as a dev dependency
664
- Changed requirements-parser to be a core dependency
673
- Replace
supported_problem_types
field on pipelines with problem_type
attribute on base classes678
- Changed AutoML to only show best results for a given pipeline template in
rankings
, added full_rankings
property to show all682
- Update
ModelFamily
values: don't list xgboost/catboost as classifiers now that we have regression pipelines for them677
- Changed AutoML's
describe_pipeline
to get problem type from pipeline instead685
- Standardize
import_or_raise
error messages683
- Updated argument order of objectives to align with sklearn's
698
- Renamed
pipeline.feature_importance_graph
to pipeline.graph_feature_importances
700
- Moved ROC and confusion matrix methods to
evalml.pipelines.plot_utils
704
- Renamed
MultiClassificationObjective
to MulticlassClassificationObjective
, to align with pipeline naming scheme715
- Documentation Changes
- Fixed some sphinx warnings
593
- Fixed docstring for
AutoClassificationSearch
with correct command599
- Limit readthedocs formats to pdf, not htmlzip and epub
594
600
- Clean up objectives API documentation
605
- Fixed function on Exploring search results page
604
- Update release process doc
567
AutoClassificationSearch
and AutoRegressionSearch
show inherited methods in API reference651
- Fixed improperly formatted code in breaking changes for changelog
655
- Added configuration to treat Sphinx warnings as errors
660
- Removed separate plotting section for pipelines in API reference
657
,665
- Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency
664
- Categorized components in API reference and added descriptions for each category
663
- Fixed Sphinx warnings about
BalancedAccuracy
objective669
- Updated API reference to include missing components and clean up pipeline docstrings
689
- Reorganize API ref, and clarify pipeline sub-titles
688
- Add and update preprocessing utils in API reference
687
- Added inheritance diagrams to API reference
695
- Documented which default objective AutoML optimizes for
699
- Create separate install page
701
- Include more utils in API ref, like
import_or_raise
704
- Add more color to pipeline documentation
705
- Testing Changes
- Matched install commands of
check_latest_dependencies
test and its GitHub action578
- Added Github app to auto assign PR author as assignee
477
- Removed unneeded conda installation of xgboost in windows checkin tests
618
- Update graph tests to always use tmpfile dir
649
- Changelog checkin test workaround for release PRs: If 'future release' section is empty of PR refs, pass check
658
- Add changelog checkin test exception for
dep-update
branch723
Warning
Breaking Changes
- Pipelines no longer take an objective parameter during instantiation, and no longer have an objective attribute.
- fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
- score() now uses a required objectives parameter that determines all the objectives to score on. This differs from the previous behavior, where the pipeline's objective was scored on regardless.
- score() now returns one dictionary of all objective scores.
- ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by PR 615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in PR 704
normalize_confusion_matrix
has been moved to evalml.pipelines.plot_utils
704
- Pipelines
_name
field changed to custom_name
- Pipelines
supported_problem_types
field is removed because it is no longer necessary678
- Updated argument order of objectives'
objective_function
to align with sklearn698
pipeline.feature_importance_graph
has been renamed to pipeline.graph_feature_importances
in700
- Removed unsupported
MSLE
objective704
- v0.8.0 Apr. 1, 2020
- Enhancements
- Add normalization option and information to confusion matrix
484
- Add util function to drop rows with NaN values
487
- Renamed
PipelineBase.name
as PipelineBase.summary
and redefined PipelineBase.name
as class property491
- Added access to parameters in Pipelines with
PipelineBase.parameters
(used to be return of PipelineBase.describe
)501
- Added
fill_value
parameter for SimpleImputer
509
- Added functionality to override component hyperparameters and made pipelines take hyperparameters from components
516
- Allow
numpy.random.RandomState
for random_state parameters556
- Fixes
- Removed unused dependency
matplotlib
, and move category_encoders
to test reqs572
- Changes
- Undo version cap in XGBoost placed in
402
and allowed all releases of XGBoost407
- Support pandas 1.0.0
486
- Made all references to the logger static
503
- Refactored
model_type
parameter for components and pipelines to model_family
507
- Refactored
problem_types
for pipelines and components into supported_problem_types
515
- Moved
pipelines/utils.save_pipeline
and pipelines/utils.load_pipeline
to PipelineBase.save
and PipelineBase.load
526
- Limit number of categories encoded by
OneHotEncoder
517
- Documentation Changes
- Updated API reference to remove
PipelinePlot
and added moved PipelineBase
plotting methods483
- Add code style and github issue guides
463
512
- Updated API reference to surface class variables for pipelines and components
537
- Fixed README documentation link
535
- Unhid PR references in changelog
656
- Testing Changes
- Added automated dependency check PR
482
,505
- Updated automated dependency check comment
497
- Have build_docs job use python executor, so that env vars are set properly
547
- Added simple test to make sure
OneHotEncoder
's top_n works with large number of categories552
- Run windows unit tests on PRs
557
Warning
Breaking Changes
AutoClassificationSearch
and AutoRegressionSearch
's model_types
parameter has been refactored into allowed_model_families
ModelTypes
enum has been changed to ModelFamily
- Components and Pipelines now have a
model_family
field instead of model_type
get_pipelines
utility function now accepts model_families
as an argument instead of model_types
PipelineBase.name
no longer returns structure of pipeline and has been replaced by PipelineBase.summary
PipelineBase.problem_types
and Estimator.problem_types
have been renamed to supported_problem_types
pipelines/utils.save_pipeline
and pipelines/utils.load_pipeline
moved to PipelineBase.save
and PipelineBase.load
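This release also added a normalization option for confusion matrices. As a rough sketch of what row-wise normalization means, the snippet below divides each row of raw counts by its row total, so every row reads as proportions of the true class. The helper name and the choice to raise on an all-zero row are assumptions for illustration, not the library's API.

```python
def normalize_rows(matrix):
    """Hypothetical helper: scale each row of a confusion matrix to sum to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # Assumed behavior: a class with no samples cannot be normalized.
            raise ValueError("Cannot normalize a row that sums to zero.")
        normalized.append([count / total for count in row])
    return normalized


# Rows are true classes, columns are predicted classes.
counts = [[8, 2], [1, 9]]
print(normalize_rows(counts))
```

Normalizing by column totals or by the grand total are equally valid conventions; which one is useful depends on whether recall-like or precision-like proportions are wanted.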
- v0.7.0 Mar. 9, 2020
- Enhancements
- Added emacs buffers to .gitignore
350
- Add CatBoost (gradient-boosted trees) classification and regression components and pipelines
247
- Added Tuner abstract base class
351
- Added
n_jobs
as parameter for AutoClassificationSearch
and AutoRegressionSearch
403
- Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn's
426
- Added
PipelineBase
.graph
and .feature_importance_graph
methods, moved from previous location423
- Added support for python 3.8
462
- Fixes
- Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives
276
- Fixed ReadtheDocs
FileNotFoundError
exception for fraud dataset439
- Changes
- Added
n_estimators
as a tunable parameter for XGBoost307
- Remove unused parameter
ObjectiveBase.fit_needs_proba
320
- Remove extraneous parameter
component_type
from all components361
- Remove unused
rankings.csv
file397
- Downloaded demo and test datasets so unit tests can run offline
408
- Remove
_needs_fitting
attribute from Components398
- Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all
413
- Refactored
PipelineBase
to take in parameter dictionary and moved pipeline metadata to class attribute421
- Dropped support for Python 3.5
438
- Removed unused
apply.py
file449
- Clean up
requirements.txt
to remove unused deps451
- Support installation without all required dependencies
459
- Documentation Changes
- Update release.md with instructions to release to internal license key
354
- Testing Changes
- Added tests for utils (and moved current utils to gen_utils)
297
- Moved XGBoost install into its own separate step on Windows using Conda
313
- Rewind pandas version to before 1.0.0, to diagnose test failures for that version
325
- Added dependency update checkin test
324
- Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version
402
- Update dependency check to use a whitelist
417
- Update unit test jobs to not install dev deps
455
Warning
Breaking Changes
- Python 3.5 will not be actively supported.
- v0.6.0 Dec. 16, 2019
- Enhancements
- Added ability to create a plot of feature importances
133
- Add early stopping to AutoML using patience and tolerance parameters
241
- Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class
242
- Enhanced AutoML results with search order
260
- Added utility function to show system and environment information
300
- Fixes
- Lower botocore requirement
235
- Fixed decision_function calculation for
FraudCost
objective254
- Fixed return value of
Recall
metrics264
- Components return
self
on fit289
- Changes
- Renamed automl classes to
AutoRegressionSearch
and AutoClassificationSearch
287
- Updating demo datasets to retain column names
223
- Moving pipeline visualization to
PipelinePlot
class228
- Standardizing inputs as
pd.DataFrame
/ pd.Series
130
- Enforcing that pipelines must have an estimator as last component
277
- Added
ipywidgets
as a dependency in requirements.txt
278
- Added Random and Grid Search Tuners
240
- Documentation Changes
- Adding class properties to API reference
244
- Fix and filter FutureWarnings from scikit-learn
249
,257
- Adding Linear Regression to API reference and cleaning up some Sphinx warnings
227
- Testing Changes
- Added support for testing on Windows with CircleCI
226
- Added support for doctests
233
Warning
Breaking Changes
- The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
- AutoClassifier has been renamed to AutoClassificationSearch
- AutoRegressor has been renamed to AutoRegressionSearch
- AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results can be used to access a dictionary that is identical to the old .results dictionary, whereas search_order returns a list of the search order in terms of pipeline_id.
- Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning pipelines without an estimator.
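The new shape of the results dictionary described in these breaking changes can be pictured with a small plain-Python sketch. Only the two top-level keys, pipeline_results and search_order, come from the notes; the per-pipeline fields and values shown here are made up for illustration.

```python
# Hypothetical contents: the two top-level keys come from the release note,
# while the per-pipeline fields and values are invented for illustration.
results = {
    "pipeline_results": {
        0: {"id": 0, "score": 0.91},
        1: {"id": 1, "score": 0.87},
    },
    "search_order": [0, 1],
}

# Walk the pipelines in the order they were searched, looking each one up
# by its pipeline_id in the pipeline_results mapping.
for pipeline_id in results["search_order"]:
    entry = results["pipeline_results"][pipeline_id]
    print(pipeline_id, entry["score"])
```

Code that previously indexed the old results dictionary directly would, under this layout, index results["pipeline_results"] instead.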
- v0.5.2 Nov. 18, 2019
- Enhancements
- Adding basic pipeline structure visualization
211
- Documentation Changes
- Added notebooks to build process
212
- v0.5.1 Nov. 15, 2019
- Enhancements
- Added basic outlier detection guardrail
151
- Added basic ID column guardrail
135
- Added support for unlimited pipelines with a
max_time
limit70
- Updated .readthedocs.yaml to successfully build
188
- Fixes
- Removed MSLE from default additional objectives
203
- Fixed
random_state
passed in pipelines204
- Fixed slow down in RFRegressor
206
- Changes
- Pulled information for describe_pipeline from pipeline's new describe method
190
- Refactored pipelines
108
- Removed guardrails from Auto(*)
202
,208
- Documentation Changes
- Updated documentation to show
max_time
enhancements189
- Updated release instructions for RTD
193
- Added notebooks to build process
212
- Added contributing instructions
213
- Added new content
222
- v0.5.0 Oct. 29, 2019
- Enhancements
- Added basic one hot encoding
73
- Use enums for model_type
110
- Support for splitting regression datasets
112
- Auto-infer multiclass classification
99
- Added support for other units in
max_time
125
- Detect highly null columns
121
- Added additional regression objectives
100
- Show an interactive iteration vs. score plot when using fit()
134
- Fixes
- Reordered
describe_pipeline
94
- Added type check for
model_type
109
- Fixed
s
units when setting string max_time
132
- Fix objectives not appearing in API documentation
150
- Changes
- Reorganized tests
93
- Moved logging to its own module
119
- Show progress bar history
111
- Using
cloudpickle
instead of pickle to allow unloading of custom objectives113
- Removed render.py
154
- Documentation Changes
- Update release instructions
140
- Include additional_objectives parameter
124
- Added Changelog
136
- Testing Changes
- Code coverage
90
- Added CircleCI tests for other Python versions
104
- Added doc notebooks as tests
139
- Test metadata for CircleCI and 2 core parallelism
137
- v0.4.1 Sep. 16, 2019
- Enhancements
- Added AutoML for classification and regressor using Autobase and Skopt
7
9
- Implemented standard classification and regression metrics
7
- Added logistic regression, random forest, and XGBoost pipelines
7
- Implemented support for custom objectives
15
- Feature importance for pipelines
18
- Serialization for pipelines
19
- Allow fitting on objectives for optimal threshold
27
- Added detect label leakage
31
- Implemented callbacks
42
- Allow for multiclass classification
21
- Added support for additional objectives
79
- Fixes
- Fixed feature selection in pipelines
13
- Made
random_seed
usage consistent45
- Documentation Changes
- Added docstrings
6
- Created notebooks for docs
6
- Initialized readthedocs EvalML
6
- Added favicon
38
- Testing Changes
- Added testing for loading data
39
- v0.2.0 Aug. 13, 2019
- Enhancements
- Created fraud detection objective
4
- v0.1.0 July. 31, 2019
- First Release
- Enhancements
- Added lead scoring objective
1
- Added basic classifier
1
- Documentation Changes
- Initialized Sphinx for docs
1