Releases: alteryx/evalml
v0.16.1
v0.16.1 Dec. 1, 2020
Enhancements
- Pin woodwork version to v0.0.6 to avoid breaking changes #1484
Fixes
- Updated `Woodwork` to >=0.0.5 in `core-requirements.txt` #1473
- Removed `copy_dataframe` parameter for `Woodwork`, updated `Woodwork` to >=0.0.6 in `core-requirements.txt` #1478
- Updated `detect_problem_type` to use `pandas.api.types.is_numeric_dtype` #1476
Changes
- Changed `make clean` to delete coverage reports as a convenience for developers #1464
Documentation Changes
Testing Changes
- Update dependency update checker to use everything from core and optional dependencies #1480
v0.16.0
v0.16.0 Nov. 24, 2020
Enhancements
- Updated pipelines and `make_pipeline` to accept `Woodwork` inputs #1393
- Updated components to accept `Woodwork` inputs #1423
- Added ability to freeze hyperparameters for `AutoMLSearch` #1284
- Added `Target Encoder` into transformer components #1401
- Added callback for error handling in `AutoMLSearch` #1403
- Added the index id to the `explain_predictions_best_worst` output to help users identify which rows in their data are included #1365
- The `top_k` features displayed in `explain_predictions_*` functions are now determined by the magnitude of SHAP values as opposed to the `top_k` largest and smallest SHAP values #1374
- Added a problem type for time series regression #1386
- Added an `is_defined_for_problem_type` method to `ObjectiveBase` #1386
- Added a `random_state` parameter to the `make_pipeline_from_components` function #1411
- Added `DelayedFeaturesTransformer` #1396
- Added a `TimeSeriesRegressionPipeline` class #1418
- Removed `core-requirements.txt` from the package distribution #1429
- Updated data check messages to include `"code"` and `"details"` fields #1451 #1462
- Added a `TimeSeriesSplit` data splitter for time series problems #1441
- Added a `problem_configuration` parameter to `AutoMLSearch` #1457
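`DelayedFeaturesTransformer` (#1396) supports time series problems by exposing past values of a series as features. As a minimal sketch, assuming the transformer appends lagged copies of a column for delays 1 through some maximum, the plain-Python function below illustrates the idea. `make_delayed_features` and `max_delay` are hypothetical names chosen for illustration, not evalml's API.

```python
from typing import Dict, List, Optional


def make_delayed_features(
    values: List[float], name: str, max_delay: int
) -> Dict[str, List[Optional[float]]]:
    """Hypothetical helper: lagged copies of `values` for delays 1..max_delay.

    Row t of column "<name>_delay_<d>" holds values[t - d]; the first d rows
    have no history and are filled with None.
    """
    if max_delay < 1:
        raise ValueError("max_delay must be at least 1")
    delayed: Dict[str, List[Optional[float]]] = {}
    for d in range(1, max_delay + 1):
        # Shift the series forward by d steps, padding the start with None
        # so every lagged column has the same length as the input.
        padding: List[Optional[float]] = [None] * min(d, len(values))
        delayed[f"{name}_delay_{d}"] = padding + values[: max(len(values) - d, 0)]
    return delayed


if __name__ == "__main__":
    sales = [10.0, 12.0, 15.0, 11.0]
    for column, lagged in make_delayed_features(sales, "sales", max_delay=2).items():
        print(column, lagged)
```

Rows at the start of the series have no history for a given delay, so they are padded with `None`; a real pipeline would need to drop or impute them before fitting an estimator.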
Fixes
- Fixed `IndexError` raised in `AutoMLSearch` when `ensembling = True` but only one pipeline to iterate over #1397
- Fixed stacked ensemble input bug and LightGBM warning and bug in `AutoMLSearch` #1388
- Updated enum classes to show possible enum values as attributes #1391
- Updated calls to `Woodwork`'s `to_pandas()` to `to_series()` and `to_dataframe()` #1428
- Fixed bug in OHE where column names were not guaranteed to be unique #1349
- Fixed bug with percent improvement of `ExpVariance` objective on data with highly skewed target #1467
Changes
- Changed `OutliersDataCheck` to return the list of columns, rather than rows, that contain outliers #1377
- Simplified and cleaned output for Code Generation #1371
- Updated data checks to return dictionary of warnings and errors instead of a list #1448
- Updated `AutoMLSearch` to pass `Woodwork` data structures to every pipeline (instead of pandas DataFrames) #1450
- Updated `AutoMLSearch` to default to `max_batches=1` instead of `max_iterations=5` #1452
Documentation Changes
- Added description of CLA to contributing guide, updated description of draft PRs #1402
- Updated documentation to include all data checks, `DataChecks`, and usage of data checks in AutoML #1412
- Updated docstrings from `np.array` to `np.ndarray` #1417
- Added section on stacking ensembles in AutoMLSearch documentation #1425
Testing Changes
- Removed `category_encoders` from test-requirements.txt #1373
- Tweaked codecov.io settings again to avoid flakes #1413
- Modified `make lint` to check notebook versions in the docs #1431
- Modified `make lint-fix` to standardize notebook versions in the docs #1431
- Used new version of pull request GitHub Action for dependency check #1443
- Reduced number of workers for tests to 4 #1447
Breaking Changes
- The `top_k` and `top_k_features` parameters in `explain_predictions_*` functions now return `k` features as opposed to `2 * k` features #1374
- Renamed `problem_type` to `problem_types` in `RegressionObjective`, `BinaryClassificationObjective`, and `MulticlassClassificationObjective` #1319
- Data checks now return a dictionary of warnings and errors instead of a list #1448
- 🦃 🚀
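The `top_k` breaking change (#1374) is easier to see side by side. The sketch below assumes the SHAP values for one prediction arrive as a plain `{feature: value}` dictionary; `top_k_old` and `top_k_new` are hypothetical helpers that mimic the before and after behavior described in these notes, not evalml's implementation.

```python
from typing import Dict, List, Tuple


def top_k_old(shap_values: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Pre-#1374 behavior as described: k largest plus k smallest values (up to 2 * k)."""
    if k <= 0:
        return []
    ordered = sorted(shap_values.items(), key=lambda item: item[1], reverse=True)
    if 2 * k >= len(ordered):
        return ordered  # not enough features to split into two groups
    return ordered[:k] + ordered[-k:]


def top_k_new(shap_values: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Post-#1374 behavior as described: the k features with the largest |SHAP value|."""
    if k <= 0:
        return []
    return sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


if __name__ == "__main__":
    shap = {"age": 0.40, "income": -0.90, "tenure": 0.05, "region": -0.02, "plan": 0.30}
    print(top_k_old(shap, 2))
    print(top_k_new(shap, 2))
```

With `k=2`, the old rule returns four features and keeps `region` despite its near-zero contribution, while the new rule returns only the two features with the largest absolute impact, `income` and `age`.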
v0.15.0
v0.15.0 Oct. 29, 2020
Enhancements
- Added stacked ensemble component classes (`StackedEnsembleClassifier`, `StackedEnsembleRegressor`) #1134
- Added stacked ensemble components to `AutoMLSearch` #1253
- Added `DecisionTreeClassifier` and `DecisionTreeRegressor` to AutoML #1255
- Added `graph_prediction_vs_actual` in `model_understanding` for regression problems #1252
- Added parameter to `OneHotEncoder` to enable filtering which features to encode #1249
- Added percent-better-than-baseline for all objectives to automl.results #1244
- Added `HighVarianceCVDataCheck` and replaced synonymous warning in `AutoMLSearch` #1254
- Added `PCA Transformer` component for dimensionality reduction #1270
- Added `generate_pipeline_code` and `generate_component_code` to allow for code generation given a pipeline or component instance #1306
- Updated `AutoMLSearch` to support `Woodwork` data structures #1299
- Added `cv_folds` to `ClassImbalanceDataCheck` and added this check to `DefaultDataChecks` #1333
- Made `max_batches` argument to `AutoMLSearch.search` public #1320
- Added text support to automl search #1062
- Added `_pipelines_per_batch` as a private argument to `AutoMLSearch` #1355
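`HighVarianceCVDataCheck` (#1254) warns when scores vary a lot across cross-validation folds. One common way to quantify that is the coefficient of variation (standard deviation divided by the absolute mean). The sketch below assumes that measure and a hypothetical default threshold of 0.2; `is_high_variance_cv` is an illustrative name and should not be read as evalml's exact rule.

```python
import statistics
from typing import Sequence


def is_high_variance_cv(scores: Sequence[float], threshold: float = 0.2) -> bool:
    """Hypothetical check: True when the coefficient of variation of CV scores exceeds threshold."""
    if len(scores) < 2:
        return False  # the sample standard deviation needs at least two folds
    spread = statistics.stdev(scores)
    if spread == 0:
        return False  # identical scores on every fold
    mean = statistics.mean(scores)
    if mean == 0:
        return True  # nonzero spread around a zero mean; treated as high variance (assumption)
    return spread / abs(mean) > threshold


if __name__ == "__main__":
    print(is_high_variance_cv([0.80, 0.82, 0.81]))  # stable folds
    print(is_high_variance_cv([0.9, 0.5, 0.2]))  # unstable folds
```

Normalizing by the mean lets one threshold work for objectives on different scales, at the cost of becoming unstable when the mean score is close to zero.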
Fixes
- Fixed ML performance issue with ordered datasets: always shuffle data in automl's default CV splits #1265
- Fixed broken `evalml info` CLI command #1293
- Fixed `boosting type='rf'` for LightGBM Classifier, as well as `num_leaves` error #1302
- Fixed bug in `explain_predictions_best_worst` where a custom index in the target variable would cause a `ValueError` #1318
- Added stacked ensemble estimators to `evalml.pipelines.__init__` file #1326
- Fixed bug in OHE where calls to transform were not deterministic if `top_n` was less than the number of categories in a column #1324
- Fixed LightGBM warning messages during AutoMLSearch #1342
- Fixed warnings thrown during AutoMLSearch in `HighVarianceCVDataCheck` #1346
- Fixed bug where TrainingValidationSplit would return invalid location indices for dataframes with a custom index #1348
- Fixed bug where the AutoMLSearch `random_state` was not being passed to the created pipelines #1321
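The #1265 fix targets ordered datasets: if rows are sorted (for example by target), contiguous unshuffled folds can end up containing a single class. The plain-Python sketch below builds contiguous k-fold splits with and without a seeded shuffle to show the difference; `kfold_indices` is a hypothetical helper, not evalml's data splitter.

```python
import random
from typing import List, Optional, Tuple


def kfold_indices(
    n_rows: int, n_splits: int, shuffle: bool, seed: Optional[int] = None
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: (train, validation) row indices for contiguous k-fold splits."""
    if not 2 <= n_splits <= n_rows:
        raise ValueError("n_splits must be between 2 and n_rows")
    indices = list(range(n_rows))
    if shuffle:
        # A seeded generator keeps the splits reproducible across runs.
        random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_rows, n_splits)
    splits = []
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


if __name__ == "__main__":
    y = [0] * 6 + [1] * 6  # target sorted by class
    for shuffle in (False, True):
        folds = kfold_indices(len(y), 3, shuffle=shuffle, seed=0)
        print(shuffle, [[y[i] for i in val] for _, val in folds])
```

Without shuffling, the first validation fold of a target sorted as six 0s followed by six 1s contains only class 0, so a model is scored on data unlike what it was trained on. Seeding the shuffle keeps the splits reproducible.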
Changes
- Allowed `add_to_rankings` to be called before AutoMLSearch is called #1250
- Removed Graphviz from test-requirements to add to requirements.txt #1327
- Removed `max_pipelines` parameter from `AutoMLSearch` #1264
- Included editable installs in all install make targets #1335
- Made pip dependencies `featuretools` and `nlp_primitives` core dependencies #1062
- Removed `PartOfSpeechCount` from `TextFeaturizer` transform primitives #1062
- Added warning for `partial_dependency` when the feature includes null values #1352
Documentation Changes
- Fixed and updated code blocks in Release Notes #1243
- Added DecisionTree estimators to API Reference #1246
- Changed class inheritance display to flow vertically #1248
- Updated cost-benefit tutorial to use a holdout/test set #1159
- Added `evalml info` command to documentation #1293
- Miscellaneous doc updates #1269
- Removed conda pre-release testing from the release process document #1282
- Updates to contributing guide #1310
- Added Alteryx footer to docs with Twitter and Github link #1312
- Added documentation for evalml installation for Python 3.6 #1322
- Added documentation changes to make the API Docs easier to understand #1323
- Fixed documentation for `feature_importance` #1353
- Added tutorial for running `AutoML` with text data #1357
- Added documentation for woodwork integration with automl search #1361
Testing Changes
- Added tests for `jupyter_check` to handle IPython #1256
- Cleaned up `make_pipeline` tests to test for all estimators #1257
- Added a test to check conda build after merge to main #1247
- Removed unnecessary code in `__main__.py` that lacked codecov coverage #1293
- Codecov: round coverage up instead of down #1334
- Add DockerHub credentials to CI testing environment #1356
- Add DockerHub credentials to conda testing environment #1363
Breaking Changes
- Renamed `LabelLeakageDataCheck` to `TargetLeakageDataCheck` #1319
- `max_pipelines` parameter has been removed from `AutoMLSearch`. Please use `max_iterations` instead. #1264
- `AutoMLSearch.search()` will now log a warning if the input is not a `Woodwork` data structure (`pandas`, `numpy`) #1299
- Made `max_batches` argument to `AutoMLSearch.search` public #1320
- Removed unused argument `feature_types` from AutoMLSearch.search #1062
v0.14.1
v0.14.1 Sep. 29, 2020
Enhancements
- Updated partial dependence methods to support calculating numeric columns in a dataset with non-numeric columns #1150
- Added `get_feature_names` on `OneHotEncoder` #1193
- Added `detect_problem_type` to `problem_type/utils.py` to automatically detect the problem type given targets #1194
- Added LightGBM to AutoMLSearch #1199
- Updated scikit-learn and scikit-optimize to use latest versions, 0.23.2 and 0.8.1 respectively #1141
- Added `__str__` and `__repr__` for pipelines and components #1218
- Included internal target check for both training and validation data in AutoMLSearch #1226
- Added `ProblemTypes.all_problem_types` helper to get list of supported problem types #1219
- Added `DecisionTreeClassifier` and `DecisionTreeRegressor` classes #1223
- `DataChecks` can now be parametrized by passing a list of `DataCheck` classes and a parameter dictionary #1167
- Added first CV fold score as validation score in `AutoMLSearch.rankings` #1221
- Updated flake8 configuration to enable linting on `__init__.py` files #1234
- Refined `make_pipeline_from_components` implementation #1204
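`detect_problem_type` (#1194) infers the problem type from the target, but these notes do not spell out its rules. The sketch below assumes a simple heuristic: two distinct values means binary, non-numeric or a few integer-valued labels means multiclass, and anything else means regression. `guess_problem_type` and the cutoff of 10 classes are hypothetical, and the `numbers.Real` test is only a stdlib stand-in for a pandas numeric-dtype check.

```python
from numbers import Real
from typing import Hashable, Optional, Sequence


def guess_problem_type(target: Sequence[Optional[Hashable]], max_classes: int = 10) -> str:
    """Hypothetical heuristic: infer "binary", "multiclass" or "regression" from target values."""
    values = [v for v in target if v is not None]
    unique = set(values)
    if len(unique) < 2:
        raise ValueError("target needs at least two distinct non-null values")
    if len(unique) == 2:
        return "binary"
    # Stand-in for a numeric-dtype check such as pandas.api.types.is_numeric_dtype.
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if not numeric:
        return "multiclass"
    # A few integer-valued labels look like classes; anything else is treated as continuous.
    if all(float(v).is_integer() for v in unique) and len(unique) <= max_classes:
        return "multiclass"
    return "regression"


if __name__ == "__main__":
    print(guess_problem_type(["yes", "no", "yes"]))
    print(guess_problem_type([0, 1, 2, 1, 0]))
    print(guess_problem_type([1.5, 2.25, 3.0, 4.75]))
```

Any rule of this kind is a guess: an integer target with many distinct values (for example counts) is classified as regression here, which may or may not match the user's intent.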
Fixes
- Updated GitHub URL after migration to Alteryx GitHub org #1207
- Changed Problem Type enum to be more similar to the string name #1208
- Wrapped call to scikit-learn's partial dependence method in a `try`/`finally` block #1232
Changes
- Added `allow_writing_files` as a named argument to CatBoost estimators #1202
- Added `solver` and `multi_class` as named arguments to `LogisticRegressionClassifier` #1202
- Replaced pipeline's `._transform` method to evaluate all the preprocessing steps of a pipeline with `.compute_estimator_features` #1231
- Changed default large dataset train/test splitting behavior #1205
Documentation Changes
- Included description of how to access the component instances and features in the pipeline user guide #1163
- Updated API docs to refer to target as "target" instead of "labels" for non-classification tasks and minor docs cleanup #1160
- Added Class Imbalance Data Check to `api_reference.rst` #1190 #1200
- Added pipeline properties to API reference #1209
- Clarified what the objective parameter in AutoML is used for in AutoML API reference and AutoML user guide #1222
- Updated API docs to include `skopt.space.Categorical` option for component hyperparameter range definition #1228
- Added install documentation for `libomp` in order to use LightGBM on Mac #1233
- Improved description of `max_iterations` in documentation #1212
- Removed unused code from sphinx conf #1235
Testing Changes
Breaking Changes
- `DefaultDataChecks` now accepts a `problem_type` parameter that must be specified #1167
- Pipeline's `._transform` method to evaluate all the preprocessing steps of a pipeline has been replaced with `.compute_estimator_features` #1231
- `get_objectives` has been renamed to `get_core_objectives`. This function will now return a list of valid objective instances #1230
v0.14.dev0
Development release for testing purposes
v0.13.2
v0.13.2 Sep. 17, 2020
Enhancements
- Added `output_format` field to explain predictions functions #1107
- Modified `get_objective` and `get_objectives` to be able to return any objective in `evalml.objectives` #1132
- Added a `return_instance` boolean parameter to `get_objective` #1132
- Added `ClassImbalanceDataCheck` to determine whether target imbalance falls below a given threshold #1135
- Added label encoder to LightGBM for binary classification #1152
- Added labels for the row index of confusion matrix #1154
- Added AutoMLSearch object as another parameter in search callbacks #1156
- Added the corresponding probability threshold for each point displayed in `graph_roc_curve` #1161
- Added `__eq__` for `ComponentBase` and `PipelineBase` #1178
- Added support for multiclass classification for `roc_curve` #1164
- Added `categories` accessor to `OneHotEncoder` for listing the categories associated with a feature #1182
- Added utility function to create pipeline instances from a list of component instances #1176
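`ClassImbalanceDataCheck` (#1135) determines whether target imbalance falls below a given threshold. A minimal sketch, assuming the check compares each class's share of the rows against that threshold; `find_imbalanced_classes` and its default of 0.1 are hypothetical and not evalml's signature.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def find_imbalanced_classes(target: Sequence[Hashable], threshold: float = 0.1) -> List[Hashable]:
    """Hypothetical check: classes whose share of the rows is below threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    counts = Counter(target)
    total = sum(counts.values())
    flagged = [label for label, count in counts.items() if count / total < threshold]
    return sorted(flagged, key=str)  # key=str so mixed label types still sort


if __name__ == "__main__":
    y = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
    print(find_imbalanced_classes(y, threshold=0.1))
```

Here class `b` (8% of rows) and class `c` (2%) fall below a 10% threshold, while the majority class `a` does not.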
Fixes
- Fixed XGBoost column names for partial dependence methods #1104
- Removed dead code validating column type from `TextFeaturizer` #1122
- Fixed issue where Imputer cannot fit when there is None in a categorical or boolean column #1144
- `OneHotEncoder` now preserves the custom index in the input data #1146
- Fixed representation for `ModelFamily` #1165
- Removed duplicate `nbsphinx` dependency in `dev-requirements.txt` #1168
- Users can now pass in any valid kwargs to all estimators #1157
- Removed broken accessor `OneHotEncoder.get_feature_names` and unneeded base class #1179
- Removed LightGBM Estimator from AutoML models #1186
Changes
- Pinned scikit-optimize version to 0.7.4 #1136
- Removed tqdm as a dependency #1177
- Added lightgbm version 3.0.0 to latest_dependency_versions.txt #1185
Documentation Changes
- Fixed API docs for `AutoMLSearch` `add_result_callback` #1113
- Added a step to our release process for pushing our latest version to conda-forge #1118
- Added warning for missing ipywidgets dependency for using `PipelineSearchPlots` on JupyterLab #1145
- Updated README.md example to load demo dataset #1151
- Swapped mapping of breast cancer targets in `model_understanding.ipynb` #1170
Testing Changes
v0.13.dev1
Development release for testing purposes.
v0.13.1
v0.13.1 Aug. 25, 2020
Enhancements
- Added Cost-Benefit Matrix objective for binary classification #1038
- Split `fill_value` into `categorical_fill_value` and `numeric_fill_value` for Imputer #1019
- Added `explain_predictions` and `explain_predictions_best_worst` for explaining multiple predictions with SHAP #1016
- Added new LSA component for text featurization #1022
- Added guide on installing with conda #1041
- Added a “cost-benefit curve” util method to graph cost-benefit matrix scores vs. binary classification thresholds #1081
- Standardized error when calling transform/predict before fit for pipelines #1048
- Added `percent_better_than_baseline` to AutoML search rankings and full rankings table #1050
- Added one-way partial dependence and partial dependence plots #1079
- Added "Feature Value" column to prediction explanation reports. #1064
- Added `max_batches` parameter to AutoMLSearch #1087
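The Cost-Benefit Matrix objective (#1038) assigns a payoff to each cell of the binary confusion matrix. The sketch below assumes the score is the total payoff divided by the number of samples, which is one reading of "normalize"; `cost_benefit_score` and its argument names are hypothetical and not the evalml objective's signature.

```python
from typing import Sequence


def cost_benefit_score(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    true_positive: float,
    true_negative: float,
    false_positive: float,
    false_negative: float,
) -> float:
    """Hypothetical cost-benefit score: average payoff per sample for 0/1 labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    payoff = {
        (1, 1): true_positive,
        (0, 0): true_negative,
        (0, 1): false_positive,  # actual 0, predicted 1
        (1, 0): false_negative,  # actual 1, predicted 0
    }
    total = sum(payoff[(int(t), int(p))] for t, p in zip(y_true, y_pred))
    return total / len(y_true)


if __name__ == "__main__":
    print(cost_benefit_score([1, 0, 0, 1, 0], [1, 1, 0, 0, 0], 400, 0, -50, -400))
```

With payoffs of 400 for a true positive, 0 for a true negative, -50 for a false positive and -400 for a false negative, labels `[1, 0, 0, 1, 0]` predicted as `[1, 1, 0, 0, 0]` give (400 - 50 + 0 - 400 + 0) / 5 = -10.0 per sample. Averaging makes scores comparable across datasets of different sizes.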
Fixes
- Updated TextFeaturizer component to no longer require an internet connection to run #1022
- Fixed non-deterministic element of TextFeaturizer transformations #1022
- Added a StandardScaler to all ElasticNet pipelines #1065
- Updated cost-benefit matrix to normalize score #1099
- Fixed logic in `calculate_percent_difference` so that it can handle negative values #1100
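#1100 fixed `calculate_percent_difference` for negative values. A common pitfall is dividing by a negative baseline, which flips the sign of the change. The sketch below divides by the absolute baseline instead; this is an assumption about the shape of the fix, and `percent_difference` is a hypothetical name, not evalml's function.

```python
import math
from typing import Optional


def percent_difference(
    score: float, baseline: float, greater_is_better: bool = True
) -> Optional[float]:
    """Hypothetical helper: percent improvement of `score` over `baseline`.

    Positive means better than the baseline. Dividing by abs(baseline) keeps the
    sign meaningful when the baseline itself is negative (e.g. an R2 below zero).
    """
    if baseline == 0 or math.isnan(score) or math.isnan(baseline):
        return None  # percent change is undefined against a zero or missing baseline
    change = 100 * (score - baseline) / abs(baseline)
    # For objectives where lower is better, a decrease counts as an improvement.
    return change if greater_is_better else -change


if __name__ == "__main__":
    print(percent_difference(-0.5, -1.0))
    print(percent_difference(8.0, 10.0, greater_is_better=False))
```

With an R2 baseline of -1.0 and a score of -0.5, dividing by the raw baseline would report -50% even though the score improved; dividing by its absolute value reports +50%.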
Changes
- Added `needs_fitting` property to `ComponentBase` #1044
- Updated references to data types to use datatype lists defined in `evalml.utils.gen_utils` #1039
- Removed maximum version limit for SciPy dependency #1051
- Moved `all_components` and other component importers into runtime methods #1045
- Consolidated graphing utility methods under `evalml.utils.graph_utils` #1060
- Made slight tweaks to how TextFeaturizer uses featuretools, and did some refactoring of that and of LSA #1090
- Changed `show_all_features` parameter into `feature_threshold`, which allows for thresholding feature importance #1097
Documentation Changes
- Update setup.py URL to point to the github repo #1037
- Added tutorial for using the cost-benefit matrix objective #1088
Testing Changes
- Refactor CircleCI tests to use matrix jobs #1043
- Added a test to check that all test directories are included in evalml package #1054
Breaking Changes
v0.13.0.dev0
Development release for testing purposes.