Releases: alteryx/evalml
v0.24.0 May. 4, 2021
Enhancements
- Added `date_index` as a required parameter for TimeSeries problems #2217
- Have the `OneHotEncoder` return the transformed columns as booleans rather than floats #2170
- Added Oversampler transformer component to EvalML #2079
- Added Undersampler to AutoMLSearch, as well as arguments `_sampler_method` and `sampler_balanced_ratio` #2128
- Updated prediction explanations functions to allow pipelines with XGBoost estimators #2162
- Added partial dependence for datetime columns #2180
- Update precision-recall curve with positive label index argument, and fix for 2d predicted probabilities #2090
- Add pct_null_rows to `HighlyNullDataCheck` #2211
- Added a standalone AutoML `search` method for convenience, which runs data checks and then runs automl #2152
- Make the first batch of AutoML have a predefined order, with linear models first and complex models last #2223
Fixes
- Fixed partial dependence not respecting grid resolution parameter for numerical features #2180
- Enable prediction explanations for catboost for multiclass problems #2224
Changes
- Deleted baseline pipeline classes #2202
- Reverting user specified date feature PR #2155 until `pmdarima` installation fix is found #2214
- Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. #2091
- Removed all old datasplitters from EvalML #2193
- Deleted `make_pipeline_from_components` #2218
Documentation Changes
- Renamed dataset to clarify that it's gzipped but not a tarball #2183
- Updated documentation to use pipeline instances instead of pipeline subclasses #2195
- Updated contributing guide with a note about GitHub Actions permissions #2090
- Updated automl and model understanding user guides #2090
Testing Changes
- Use machineFL user token for dependency update bot, and add more reviewers #2189
Breaking Changes
- All baseline pipeline classes (`BaselineBinaryPipeline`, `BaselineMulticlassPipeline`, `BaselineRegressionPipeline`, etc.) have been deleted #2202
- Updated pipeline API to accept component graph and other class attributes as instance parameters. Old pipeline API still works but will not be supported long-term. Pipelines can now be initialized by specifying the component graph as the first parameter, and then passing in optional arguments such as `custom_name`, `parameters`, etc. For example, `BinaryClassificationPipeline(["Random Forest Classifier"], parameters={})`. #2091
- Removed all old datasplitters from EvalML #2193
- Deleted utility method `make_pipeline_from_components` #2218
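The instance-style construction described in the pipeline API note can be pictured with a toy class. This is only a sketch of the calling convention: `ToyPipeline` and its `name` property are hypothetical stand-ins and do not reproduce EvalML's pipeline classes or their behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline class; not EvalML's implementation."""

    component_graph: List[str]  # component graph is the first parameter
    parameters: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    custom_name: Optional[str] = None

    @property
    def name(self) -> str:
        # Fall back to a name derived from the components when none is given.
        return self.custom_name or " + ".join(self.component_graph)


# Instance-style construction: no per-pipeline subclass is required.
pipeline = ToyPipeline(
    ["Imputer", "Random Forest Classifier"],
    parameters={"Random Forest Classifier": {"n_estimators": 50}},
)
named = ToyPipeline(["Random Forest Classifier"], custom_name="My RF")
```

The point of the design is that the component graph and other attributes travel with the instance, so two pipelines with different graphs can share one class.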
v0.23.0 Apr. 21, 2021
Enhancements
- Refactored `EngineBase` and `SequentialEngine` api. Adding `DaskEngine` #1975.
- Added optional `engine` argument to `AutoMLSearch` #1975
- Added a warning about how time series support is still in beta when a user passes in a time series problem to `AutoMLSearch` #2118
- Added `NaturalLanguageNaNDataCheck` data check #2122
- Added ValueError to `partial_dependence` to prevent users from computing partial dependence on columns with all NaNs #2120
- Added standard deviation of cv scores to rankings table #2154
Fixes
- Fixed `BalancedClassificationDataCVSplit`, `BalancedClassificationDataTVSplit`, and `BalancedClassificationSampler` to use `minority:majority` ratio instead of `majority:minority` #2077
- Fixed bug where two-way partial dependence plots with categorical variables were not working correctly #2117
- Fixed bug where `hyperparameters` were not displaying properly for pipelines with a list `component_graph` and duplicate components #2133
- Fixed bug where `pipeline_parameters` argument in `AutoMLSearch` was not applied to pipelines passed in as `allowed_pipelines` #2133
- Fixed bug where `AutoMLSearch` was not applying custom hyperparameters to pipelines with a list `component_graph` and duplicate components #2133
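To make the minority:majority convention from the first fix concrete, here is a minimal sketch of undersampling driven by such a ratio. The function `undersample_counts` is hypothetical and written from scratch; it is not the logic of the EvalML samplers, only one way a ratio defined as minority count divided by majority count could be applied.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def undersample_counts(y: Sequence[Hashable], sampling_ratio: float) -> Dict[Hashable, int]:
    """Return how many rows to keep per class so that
    minority_count / class_count >= sampling_ratio for every class."""
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling_ratio must be in (0, 1]")
    counts = Counter(y)
    minority = min(counts.values())
    # Largest class size that still satisfies the minority:majority ratio.
    max_allowed = int(minority / sampling_ratio)
    return {label: min(count, max_allowed) for label, count in counts.items()}


y = [0] * 1000 + [1] * 100
keep = undersample_counts(y, sampling_ratio=0.25)
```

With a minority:majority ratio, a value of 1 means fully balanced classes and smaller values tolerate more imbalance; under the reversed convention the same number would mean something quite different.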
Changes
- Removed `hyperparameter_ranges` from Undersampler and renamed `balanced_ratio` to `sampling_ratio` for samplers #2113
- Renamed `TARGET_BINARY_NOT_TWO_EXAMPLES_PER_CLASS` data check message code to `TARGET_MULTICLASS_NOT_TWO_EXAMPLES_PER_CLASS` #2126
- Modified one-way partial dependence plots of categorical features to display data with a bar plot #2117
- Renamed `score` column for `automl.rankings` as `mean_cv_score` #2135
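As an illustration of what a rankings table with a `mean_cv_score` column and a standard deviation of cross-validation scores might contain, the sketch below builds such rows by hand. The column name `std_cv_score`, the use of the sample standard deviation, and the scores themselves are assumptions made for the example, not details taken from EvalML.

```python
from statistics import mean, stdev

# Made-up cross-validation scores for two pipelines (higher is better here).
cv_scores = {
    "Pipeline A": [0.80, 0.82, 0.78],
    "Pipeline B": [0.90, 0.60, 0.75],
}

rows = [
    {
        "pipeline_name": name,
        "mean_cv_score": mean(scores),
        "std_cv_score": stdev(scores),  # hypothetical column name, sample std
    }
    for name, scores in cv_scores.items()
]

# Rank by mean score; the spread shows how stable each pipeline is across folds.
rankings = sorted(rows, key=lambda row: row["mean_cv_score"], reverse=True)
```

Reporting the spread alongside the mean helps distinguish a pipeline that is consistently good from one whose mean hides a weak fold.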
Documentation Changes
- Fixed `conf.py` file #2112
- Added a sentence to the automl user guide stating that our support for time series problems is still in beta. #2118
- Fixed documentation demos #2139
- Update test badge in README to use GitHub Actions #2150
Testing Changes
- Fixed `test_describe_pipeline` for `pandas` `v1.2.4` #2129
- Added a GitHub Action for building the conda package #1870 #2148
Breaking Changes
- Renamed `balanced_ratio` to `sampling_ratio` for the `BalancedClassificationDataCVSplit`, `BalancedClassificationDataTVSplit`, `BalancedClassificationSampler`, and Undersampler #2113
- Deleted the "errors" key from automl results #1975
- Deleted the `raise_and_save_error_callback` and the `log_and_save_error_callback` #1975
- Fixed `BalancedClassificationDataCVSplit`, `BalancedClassificationDataTVSplit`, and `BalancedClassificationSampler` to use minority:majority ratio instead of majority:minority #2077
v0.22.0 Apr. 7, 2021
Enhancements
- Added a GitHub Action for `linux_unit_tests` #2013
- Added recommended actions for `InvalidTargetDataCheck`, updated `_make_component_list_from_actions` to address new action, and added `TargetImputer` component #1989
- Updated `AutoMLSearch._check_for_high_variance` to not emit `RuntimeWarning` #2024
- Added exception when pipeline passed to `explain_predictions` is a `Stacked Ensemble` pipeline #2033
- Added sensitivity at low alert rates as an objective #2001
- Added `Undersampler` transformer component #2030
Fixes
- Updated Engine's `train_batch` to apply undersampling #2038
- Fixed bug where Time Series Classification pipelines were not encoding targets in `predict` and `predict_proba` #2040
- Fixed data splitting errors if target is float for classification problems #2050
- Pinned `docutils` to <0.17 to fix ReadtheDocs warning issues #2088
Changes
- Removed lists as acceptable hyperparameter ranges in `AutoMLSearch` #2028
- Renamed "details" to "metadata" for data check actions #2008
Documentation Changes
v0.21.0 Mar. 24, 2021
Enhancements
- Changed `AutoMLSearch` to default `optimize_thresholds` to True #1943
- Added multiple oversampling and undersampling sampling methods as data splitters for imbalanced classification #1775
- Added params to balanced classification data splitters for visibility #1966
- Updated `make_pipeline` to not add `Imputer` if input data does not have numeric or categorical columns #1967
- Updated `ClassImbalanceDataCheck` to better handle multiclass imbalances #1986
- Added recommended actions for the output of data check's `validate` method #1968
- Added error message for `partial_dependence` when features are mostly the same value #1994
- Updated `OneHotEncoder` to drop one redundant feature by default for features with two categories #1997
- Added a `PolynomialDetrender` component #1992
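The `OneHotEncoder` item above rests on a simple observation: for a feature with exactly two categories, one indicator column fully determines the other. The sketch below shows that idea in plain Python. The helper `one_hot`, the `col_` naming scheme, and the choice of which category to drop are all assumptions for illustration, not EvalML's encoder.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], drop_if_binary: bool = True) -> Dict[str, List[bool]]:
    """Encode one categorical column as boolean indicator columns."""
    categories = sorted(set(values))
    if drop_if_binary and len(categories) == 2:
        # With two categories, one indicator is the negation of the other,
        # so keeping both adds no information. Dropping the first is arbitrary.
        categories = categories[1:]
    return {f"col_{cat}": [v == cat for v in values] for cat in categories}


binary = one_hot(["no", "yes", "no"])
multi = one_hot(["a", "b", "c", "a"])
```

Features with three or more categories keep all of their indicator columns in this sketch; only the two-category case is collapsed.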
Fixes
- Updated binary classification pipelines to use objective decision function during scoring of custom objectives #1934
Changes
- Removed `data_checks` parameter, `data_check_results` and data checks logic from `AutoMLSearch` #1935
- Deleted `random_state` argument #1985
- Updated Woodwork version requirement to `v0.0.11` #1996
Documentation Changes
Testing Changes
- Removed `build_docs` CI job in favor of RTD GH builder #1974
- Added tests to confirm support for Python 3.9 #1724
- Changed `build_conda_pkg` job to use `latest_release_changes` branch in the feedstock. #1979
Breaking Changes
- Changed `AutoMLSearch` to default `optimize_thresholds` to True #1943
- Removed `data_checks` parameter, `data_check_results` and data checks logic from `AutoMLSearch`. To run the data checks which were previously run by default in `AutoMLSearch`, please call `DefaultDataChecks().validate(X_train, y_train)` or take a look at our documentation for more examples. #1935
- Deleted `random_state` argument #1985
v0.20.0 Mar. 10, 2021
Enhancements
- Added a GitHub Action for Detecting dependency changes #1933
- Create a separate CV split to train stacked ensembler on for AutoMLSearch #1814
- Added a GitHub Action for Linux unit tests #1846
- Added `DataCheckAction` class and `DataCheckActionCode` enum #1896
- Updated `Woodwork` requirement to `v0.0.10` #1900
- Added `BalancedClassificationDataCVSplit` and `BalancedClassificationDataTVSplit` to AutoMLSearch #1875
- Update default classification data splitter to use downsampling for highly imbalanced data #1875
- Updated `describe_pipeline` to return more information, including `id` of pipelines used for ensemble models #1909
- Added utility method to create list of components from a list of `DataCheckAction` #1907
- Updated `validate` method to include an `action` key in returned dictionary for all `DataCheck` and `DataChecks` #1916
- Aggregating the shap values for predictions that we know the provenance of, e.g. OHE, text, and date-time. #1901
- Improved error message when custom objective is passed as a string in `pipeline.score` #1941
- Added `score_pipelines` and `train_pipelines` methods to `AutoMLSearch` #1913
- Added `score_batch` and `train_batch` abstract methods to `EngineBase` and implementations in `SequentialEngine` #1913
Fixes
- Removed CI check for `check_dependencies_updated_linux` #1950
- Added metaclass for time series pipelines and fix binary classification pipeline `predict` not using objective if it is passed as a named argument #1874
- Fixed stack trace in prediction explanation functions caused by mixed string/numeric pandas column names #1871
- Fixed stack trace caused by passing pipelines with duplicate names to `AutoMLSearch` #1932
- Fixed `AutoMLSearch.get_pipelines` returning pipelines with the same attributes #1958
Changes
- Reversed GitHub Action for Linux unit tests until a fix for report generation is found #1920
- Updated `add_results` in `AutoMLAlgorithm` to take in entire pipeline results dictionary from `AutoMLSearch` #1891
- Updated `ClassImbalanceDataCheck` to look for severe class imbalance scenarios #1905
- Deleted the `explain_prediction` function #1915
- Removed `HighVarianceCVDataCheck` and converted it to an `AutoMLSearch` method instead #1928
Documentation Changes
- Updated `model_understanding.ipynb` to demo the two-way partial dependence capability #1919
Testing Changes
Breaking Changes
v0.19.0 Feb. 24, 2021
Enhancements
- Added a GitHub Action for Python windows unit tests #1844
- Added a GitHub Action for checking updated release notes #1849
- Added a GitHub Action for Python lint checks #1837
- Adjusted `explain_prediction`, `explain_predictions` and `explain_predictions_best_worst` to handle timeseries problems. #1818
- Updated `InvalidTargetDataCheck` to check for mismatched indices in target and features #1816
- Updated `Woodwork` structures returned from components to support `Woodwork` logical type overrides set by the user #1784
- Updated estimators to keep track of input feature names during `fit()` #1794
- Updated `visualize_decision_tree` to include feature names in output #1813
- Added `is_bounded_like_percentage` property for objectives. If true, the `calculate_percent_difference` method will return the absolute difference rather than relative difference #1809
- Added full error traceback to AutoMLSearch logger file #1840
- Changed `TargetEncoder` to preserve custom indices in the data #1836
- Refactored `explain_predictions` and `explain_predictions_best_worst` to only compute features once for all rows that need to be explained #1843
- Added custom random undersampling sampler for classification #1857
- Updated `OutliersDataCheck` implementation to calculate the probability of having no outliers #1855
- Added `Engines` pipeline processing API #1838
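The `is_bounded_like_percentage` item, together with the release's note that division by 0 now yields inf rather than nan, can be sketched as a small standalone function. The name `percent_difference`, its signature, and the exact scaling are assumptions for illustration; the real `calculate_percent_difference` method may differ in details such as sign conventions.

```python
import math


def percent_difference(score: float, baseline: float,
                       bounded_like_percentage: bool = False) -> float:
    """Hypothetical sketch of comparing a score against a baseline score."""
    if bounded_like_percentage:
        # The objective already lives on a bounded scale, so the absolute
        # gap is reported instead of a relative change.
        return score - baseline
    if baseline == 0:
        # A relative change against a zero baseline is reported as inf.
        return math.inf
    return 100 * (score - baseline) / abs(baseline)


relative = percent_difference(12.0, 10.0)
absolute = percent_difference(0.9, 0.6, bounded_like_percentage=True)
undefined = percent_difference(5.0, 0.0)
```

For a bounded objective, a move from 0.6 to 0.9 is reported as a gap of 0.3 rather than a 50% relative improvement, which is easier to interpret on a bounded scale.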
Fixes
- Changed EngineBase random_state arg to random_seed and same for user guide docs #1889
Changes
- Modified `calculate_percent_difference` so that division by 0 is now inf rather than nan #1809
- Removed `text_columns` parameter from `LSA` and `TextFeaturizer` components #1652
- Added `random_seed` as an argument to our automl/pipeline/component API. Using `random_state` will raise a warning #1798
- Added `DataCheckError` message in `InvalidTargetDataCheck` if input target is None and removed exception raised #1866
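One common way to introduce a renamed argument such as `random_seed` while still accepting the old `random_state` name is a small shim that warns on the old spelling. The helper below is a hypothetical sketch of that pattern: the function name, the default value, and the warning category are assumptions, not EvalML's code.

```python
import warnings
from typing import Optional


def resolve_random_seed(random_seed: int = 0, random_state: Optional[int] = None) -> int:
    """Hypothetical helper: prefer random_seed, accept the old name with a warning."""
    if random_state is not None:
        warnings.warn(
            "random_state is deprecated; use random_seed instead.",
            DeprecationWarning,  # assumed category for this sketch
            stacklevel=2,
        )
        return random_state
    return random_seed


seed = resolve_random_seed(random_seed=3)
```

Callers that still pass `random_state` keep working during the transition, and the warning points them at the new name.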
Testing Changes
- Added back coverage for `_get_feature_provenance` in `TextFeaturizer` after `text_columns` was removed #1842
- Pin graphviz version for windows builds #1847
- Unpin graphviz version for windows builds #1851
Breaking Changes
- Added a deprecation warning to `explain_prediction`. It will be deleted in the next release. #1860
v0.18.2 Feb. 10, 2021
Enhancements
- Added uniqueness score data check #1785
- Added "dataframe" output format for prediction explanations #1781
- Updated LightGBM estimators to handle `pandas.MultiIndex` #1770
- Sped up permutation importance for some pipelines #1762
- Added sparsity data check #1797
- Confirmed support for threshold tuning for binary time series classification problems #1803
Fixes
Changes
Documentation Changes
- Added section on conda to the contributing guide #1771
- Updated release process to reflect freezing `main` before perf tests #1787
- Moving some prs to the right section of the release notes #1789
- Tweak README.md. #1800
- Fixed back arrow on install page docs #1795
Testing Changes
v0.18.1 Feb. 1, 2021
Enhancements
- Added `graph_t_sne` as a visualization tool for high dimensional data #1731
- Added the ability to see the linear coefficients of features in linear models terms #1738
- Added support for `scikit-learn` `v0.24.0` #1733
- Added support for `scipy` `v1.6.0` #1752
- Added SVM Classifier and Regressor to estimators #1714 #1761
Fixes
- Addressed bug with `partial_dependence` and categorical data with more categories than grid resolution #1748
- Removed `random_state` arg from `get_pipelines` in `AutoMLSearch` #1719
- Pinned pyzmq at less than 22.0.0 till we add support #1756
- Remove `ProphetRegressor` from main as windows tests were flaky #1764
Changes
- Updated components and pipelines to return `Woodwork` data structures #1668
- Updated `clone()` for pipelines and components to copy over random state automatically #1753
- Dropped support for Python version 3.6 #1751
- Removed deprecated `verbose` flag from `AutoMLSearch` parameters #1772
Documentation Changes
- Add Twitter and Github link to documentation toolbar #1754
- Added Open Graph info to documentation #1758
Testing Changes
Breaking Changes
v0.18.0 Jan. 26, 2021
Enhancements
- Added RMSLE, MSLE, and MAPE to core objectives while checking for negative target values in `invalid_targets_data_check` #1574
- Added validation checks for binary problems with regression-like datasets and multiclass problems without true multiclass targets in `invalid_targets_data_check` #1665
- Added time series support for `make_pipeline` #1566
- Added target name for output of pipeline `predict` method #1578
- Added multiclass check to `InvalidTargetDataCheck` for two examples per class #1596
- Support graphviz 0.16 #1657
- Enhanced time series pipelines to accept empty features #1651
- Added KNN Classifier to estimators. #1650
- Added support for list inputs for objectives #1663
- Added support for `AutoMLSearch` to handle time series classification pipelines #1666
- Enhanced `DelayedFeaturesTransformer` to encode categorical features and targets before delaying them #1691
- Added 2-way dependence plots. #1690
- Added ability to directly iterate through components within Pipelines #1583
Fixes
- Fixed inconsistent attributes and added Exceptions to docs #1673
- Fixed `TargetLeakageDataCheck` to use Woodwork `mutual_information` rather than using Pandas' Pearson Correlation #1616
- Fixed thresholding for pipelines in `AutoMLSearch` to only threshold binary classification pipelines #1622 #1626
- Updated `load_data` to return Woodwork structures and update default parameter value for `index` to `None` #1610
- Pinned scipy at < 1.6.0 while we work on adding support #1629
- Fixed data check message formatting in `AutoMLSearch` #1633
- Addressed stacked ensemble component for `scikit-learn` v0.24 support by setting `shuffle=True` for default CV #1613
- Fixed bug where `Imputer` reset the index on `X` #1590
- Fixed `AutoMLSearch` stacktrace when a custom objective was passed in as a primary objective or additional objective #1575
- Fixed custom index bug for `MAPE` objective #1641
- Fixed index bug for `TextFeaturizer` and `LSA` components #1644
- Limited `load_fraud` dataset loaded into `automl.ipynb` #1646
- `add_to_rankings` updates `AutoMLSearch.best_pipeline` when necessary #1647
- Fixed bug where time series baseline estimators were not receiving `gap` and `max_delay` in `AutoMLSearch` #1645
- Fixed jupyter notebooks to help the RTD buildtime #1654
- Added `positive_only` objectives to `non_core_objectives` #1661
- Fixed stacking argument `n_jobs` for IterativeAlgorithm #1706
- Updated CatBoost estimators to return self in `.fit()` rather than the underlying model for consistency #1701
- Added ability to initialize pipeline parameters in `AutoMLSearch` constructor #1676
- Make AutoMLSearch pipelines pickle-able #1721
Changes
- Added labeling to `graph_confusion_matrix` #1632
- Rerunning search for `AutoMLSearch` results in a message thrown rather than failing the search, and removed `has_searched` property #1647
- Changed tuner class to allow and ignore single parameter values as input #1686
- Capped LightGBM version limit to remove bug in docs #1711
- Removed support for `np.random.RandomState` in EvalML #1727
Documentation Changes
- Update Model Understanding in the user guide to include `visualize_decision_tree` #1678
- Updated docs to include information about `AutoMLSearch` callback parameters and methods #1577
- Updated docs to prompt users to install graphviz on Mac #1656
- Added `infer_feature_types` to the `start.ipynb` guide #1700
- Added multicollinearity data check to API reference and docs #1707
Testing Changes
Breaking Changes
v0.17.0 Dec. 29, 2020
Enhancements
- Added `save_plot` that allows for saving figures from different backends #1588
- Added `LightGBM Regressor` to regression components #1459
- Added `visualize_decision_tree` for tree visualization with `decision_tree_data_from_estimator` and `decision_tree_data_from_pipeline` to reformat tree structure output #1511
- Added `DFS Transformer` component into transformer components #1454
- Added `MAPE` to the standard metrics for time series problems and update objectives #1510
- Added `graph_prediction_vs_actual_over_time` and `get_prediction_vs_actual_over_time_data` to the model understanding module for time series problems #1483
- Added a `ComponentGraph` class that will support future pipelines as directed acyclic graphs #1415
- Updated data checks to accept `Woodwork` data structures #1481
- Added parameter to `InvalidTargetDataCheck` to show only top unique values rather than all unique values #1485
- Added multicollinearity data check #1515
- Added baseline pipeline and components for time series regression problems #1496
- Added more information to users about ensembling behavior in `AutoMLSearch` #1527
- Add woodwork support for more utility and graph methods #1544
- Changed `DateTimeFeaturizer` to encode features as int #1479
- Return trained pipelines from `AutoMLSearch.best_pipeline` #1547
- Added utility method so that users can set feature types without having to learn about Woodwork directly #1555
- Added Linear Discriminant Analysis transformer for dimensionality reduction #1331
- Added multiclass support for `partial_dependence` and `graph_partial_dependence` #1554
- Added `TimeSeriesBinaryClassificationPipeline` and `TimeSeriesMulticlassClassificationPipeline` classes #1528
- Added `make_data_splitter` method for easier automl data split customization #1568
- Integrated `ComponentGraph` class into Pipelines for full non-linear pipeline support #1543
- Update `AutoMLSearch` constructor to take training data instead of `search` and `add_to_leaderboard` #1597
- Update `split_data` helper args #1597
- Add problem type utils `is_regression`, `is_classification`, `is_timeseries` #1597
- Rename `AutoMLSearch` `data_split` arg to `data_splitter` #1569
Fixes
- Fix Windows CI jobs: install `numba` via conda, required for `shap` #1490
- Added custom-index support for `reset-index-get_prediction_vs_actual_over_time_data` #1494
- Fix `generate_pipeline_code` to account for boolean and None differences between Python and JSON #1524 #1531
- Set max value for plotly and xgboost versions while we debug CI failures with newer versions #1532
- Undo version pinning for plotly #1533
- Fix ReadTheDocs build by updating the version of `setuptools` #1561
- Set `random_state` of data splitter in AutoMLSearch to take int to keep consistency in the resulting splits #1579
- Pin sklearn version while we work on adding support #1594
- Pin pandas at <1.2.0 while we work on adding support #1609
- Pin graphviz at < 0.16 while we work on adding support #1609
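The `generate_pipeline_code` fix above concerns a general pitfall: JSON spells booleans and null as `true`, `false`, and `null`, while Python source needs `True`, `False`, and `None`. The sketch below demonstrates the difference with the standard library only. The `parameters` dictionary is made up for the example, and nothing here reflects how `generate_pipeline_code` is actually implemented.

```python
import ast
import json

# Made-up parameters containing a boolean and a None value.
parameters = {"Imputer": {"impute_all": True, "fill_value": None}}

as_json = json.dumps(parameters)  # JSON spelling: true / null
as_python = repr(parameters)      # Python spelling: True / None

# The repr() form parses back as a Python literal.
restored = ast.literal_eval(as_python)

# The JSON form does not, because `true` and `null` are not Python literals.
try:
    ast.literal_eval(as_json)
    json_is_valid_python = True
except ValueError:
    json_is_valid_python = False
```

Code generators that emit Python source therefore need the Python spelling, even if the parameters were serialized as JSON at some earlier step.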
Changes
- Reverting `save_graph` #1550 to resolve kaleido build issues #1585
- Update circleci badge to apply to `main` #1489
- Added script to generate github markdown for releases #1487
- Updated dependencies to fix `ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes'` error and to address Woodwork and Featuretool dependencies #1540
- Made `get_prediction_vs_actual_data()` a public method #1553
- Updated `Woodwork` version requirement to v0.0.7 #1560
- Move data splitters from `evalml.automl.data_splitters` to `evalml.preprocessing.data_splitters` #1597
#1597 - Rename "# Testing" in automl log output to "# Validation" #1597
Documentation Changes
- Added partial dependence methods to API reference #1537
- Updated documentation for confusion matrix methods #1611
Testing Changes
- Set `n_jobs=1` in most unit tests to reduce memory #1505
Breaking Changes
- Updated minimal dependencies: `numpy>=1.19.1`, `pandas>=1.1.0`, `scikit-learn>=0.23.1`, `scikit-optimize>=0.8.1`
- Updated `AutoMLSearch.best_pipeline` to return a trained pipeline. Pass in `train_best_pipeline=False` to AutoMLSearch in order to return an untrained pipeline.
- Pipeline component instances can no longer be iterated through using `Pipeline.component_graph` #1543
- Update `AutoMLSearch` constructor to take training data instead of `search` and `add_to_leaderboard` #1597
- Update `split_data` helper args #1597
- Move data splitters from `evalml.automl.data_splitters` to `evalml.preprocessing.data_splitters` #1597
- Rename `AutoMLSearch` `data_split` arg to `data_splitter` #1569