EvalML is an open-source Python library from Alteryx, the team behind Featuretools, for automated machine learning (AutoML) and model understanding. It wraps multiple modelling libraries behind a simple, unified API for building machine learning models. EvalML supports supervised learning problems such as regression, binary classification, and multiclass classification.

The pipelines created by EvalML’s AutoMLSearch include preprocessing and feature engineering out of the box. The user identifies the target attribute, and AutoML runs a search algorithm to train and score several models for the problem type. The user can then select a model based on its scores and use it to generate predictions or for further analysis. EvalML also supports custom, problem-specific objective functions, which let users specify exactly what makes a model valuable for their use case.

Not only do these custom objectives help steer the AutoML search towards models with higher impact, but they are also used to tune the classification thresholds of binary classification models. You can find an example of a custom objective function created for the task of credit card fraud detection here. Additionally, EvalML has a collection of tools for model understanding. It currently supports feature importance, permutation importance, partial dependence, precision-recall curves, confusion matrices, ROC curves, prediction explanations, and binary classifier threshold optimization.
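The threshold tuning mentioned above can be illustrated with a small, library-free sketch. It is not EvalML's implementation: the `fraud_cost` objective, its 10:1 cost weights, and the `tune_threshold` helper are all hypothetical names and values chosen for illustration. The idea is simply to try candidate thresholds and keep the one that gives the best value of the custom objective.

```python
from typing import Callable, Optional, Sequence, Tuple


def fraud_cost(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Hypothetical business objective (lower is better).

    A missed positive (false negative) is assumed to cost 10 units and a
    false alarm (false positive) 1 unit. The weights are illustrative only.
    """
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return 10.0 * false_neg + 1.0 * false_pos


def tune_threshold(
    y_true: Sequence[int],
    proba: Sequence[float],
    objective: Callable[[Sequence[int], Sequence[int]], float],
    candidates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return the (threshold, score) pair that minimizes `objective`."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best_threshold, best_score = candidates[0], float("inf")
    for threshold in candidates:
        y_pred = [1 if p >= threshold else 0 for p in proba]
        score = objective(y_true, y_pred)
        if score < best_score:  # keep the first threshold with the lowest cost
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy labels and predicted probabilities of the positive class.
y_true = [0, 0, 0, 1, 0, 1, 1, 0]
proba = [0.05, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90, 0.10]

threshold, cost = tune_threshold(y_true, proba, fraud_cost)
default_cost = fraud_cost(y_true, [1 if p >= 0.5 else 0 for p in proba])
print(threshold, cost, default_cost)
```

On this toy data the default threshold of 0.5 misses one positive, while a lower threshold trades that miss for a single false alarm, which is cheaper under the assumed cost weights.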

Furthermore, EvalML provides data checks that can be used to catch common problems with data before modelling. This helps prevent model quality problems as well as hard-to-diagnose bugs and stack traces. Currently EvalML includes the following data checks:

Detecting target leakage, which occurs when the model is given information during training that won’t be available at prediction time
Detecting invalid datatypes
Checking for class imbalance
Looking for redundant features such as highly null columns, constant columns, and columns that are probably IDs and not useful for modelling
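To make these checks concrete, here is a minimal, library-free sketch of what such checks might look like. It does not use or reproduce EvalML's data check classes: the `run_data_checks` function, its thresholds, and the toy columns are hypothetical, and the leakage test (a feature identical to the target) is a deliberately crude stand-in for a real leakage check.

```python
from collections import Counter


def run_data_checks(columns, target, null_threshold=0.95, imbalance_threshold=0.10):
    """Return human-readable warnings for a column-oriented table.

    `columns` maps a column name to a list of values (None marks a missing
    value). The thresholds are illustrative defaults, not EvalML's.
    """
    warnings = []
    n_rows = len(target)
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        null_fraction = 1 - len(present) / n_rows
        if null_fraction >= null_threshold:
            warnings.append(f"{name}: highly null ({null_fraction:.0%} missing)")
        elif len(set(present)) == 1:
            warnings.append(f"{name}: constant column")
        elif len(set(present)) == n_rows and all(isinstance(v, int) for v in present):
            warnings.append(f"{name}: all-unique integers, possibly an ID")
        elif list(values) == list(target):
            warnings.append(f"{name}: identical to the target, possible leakage")
    # Class imbalance: flag any class that makes up a tiny share of the rows.
    for label, count in Counter(target).items():
        if count / n_rows < imbalance_threshold:
            warnings.append(f"target: class {label!r} is only {count / n_rows:.0%} of rows")
    return warnings


target = [0] * 19 + [1]
columns = {
    "row_id": list(range(20)),                   # unique integers
    "country": ["US"] * 20,                      # a single value
    "notes": [None] * 20,                        # entirely missing
    "leaky_flag": list(target),                  # copy of the target
    "amount": [float(i % 7) for i in range(20)]  # an ordinary feature
}

found = run_data_checks(columns, target)
for warning in found:
    print(warning)
```

Running checks like these before the search is cheap, and each warning points at a column that would otherwise waste search time or inflate scores.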

# *Install EvalML from PyPI.*

In [1]:
!pip install evalml

Collecting evalml
  Downloading evalml-0.23.0-py3-none-any.whl (6.2 MB)
Collecting woodwork==0.0.11
  Downloading woodwork-0.0.11-py3-none-any.whl (91 kB)
Collecting nlp-primitives>=1.1.0
  Downloading nlp_primitives-1.1.0-py3-none-any.whl (18.0 MB)
Collecting kaleido>=0.1.0
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
Collecting xgboost<1.3.0,>=0.82
  Downloading xgboost-1.2.1-py3-none-manylinux2010_x86_64.whl (148.9 MB)
Collecting lightgbm<3.1.0,>=2.3.1
  Downloading lightgbm-3.0.0-py2.py3-none-manylinux1_x86_64.whl (1.7 MB)
Collecting graphviz>=0.13
  Downloading grap

# Load the breast cancer dataset and split it.

In [2]:
import evalml
from evalml import AutoMLSearch
X, y = evalml.demos.load_breast_cancer()
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(X, y, problem_type='binary')
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((455, 30), (114, 30), (455,), (114,))
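The shapes above correspond to roughly an 80/20 split of the 569 rows. For a classification problem it is common to make such a split stratified, so that both parts keep the same class proportions. The sketch below shows that idea in plain Python on toy labels; it assumes stratification and a 20% test share, and the `stratified_split` helper is a hypothetical name, not EvalML's `split_data`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_indices, test_indices) that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # shuffle within each class
        n_test = round(len(indices) * test_size)  # same share from every class
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels: 80 rows of one class followed by 20 rows of the other.
labels = ["benign"] * 80 + ["malignant"] * 20
train_idx, test_idx = stratified_split(labels)
test_malignant = sum(1 for i in test_idx if labels[i] == "malignant")
print(len(train_idx), len(test_idx), test_malignant)
```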

# Run the search for the best classification model.

In [3]:
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='binary')
automl.search()

Using default limit of max_batches=1.

Generating pipelines to search over...



*****************************
* Beginning pipeline search *
*****************************

Optimizing for Log Loss Binary. 
Lower score is better.

Using SequentialEngine to train and score pipelines.
Searching up to 1 batches for a total of 9 pipelines. 
Allowed model families: random_forest, xgboost, decision_tree, linear_model, catboost, extra_trees, lightgbm



Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 12.904

*****************************
* Evaluating Batch Number 1 *
*****************************

Decision Tree Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 2.432
	High coefficient of variation (cv >= 0.2) within cross validation scores.
	Decision Tree Classifier w/ Imputer may not perform as estimated on unseen data.
Extra Trees Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.137
CatBoost Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.386
Random Forest Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean Log Loss Binary: 0.120
LightGBM Classifier w/ Imputer:
	Starting cross validation
	Finished

# Print model rankings and get the best pipeline.

In [4]:
automl.rankings

Unnamed: 0,id,pipeline_name,mean_cv_score,standard_deviation_cv_score,validation_score,percent_better_than_baseline,high_variance_cv,parameters
0,8,Logistic Regression Classifier w/ Imputer + St...,0.094015,0.033791,0.060529,99.271446,True,{'Imputer': {'categorical_impute_strategy': 'm...
1,6,XGBoost Classifier w/ Imputer,0.113098,0.038613,0.069048,99.123568,True,{'Imputer': {'categorical_impute_strategy': 'm...
2,4,Random Forest Classifier w/ Imputer,0.119972,0.019487,0.099614,99.070299,False,{'Imputer': {'categorical_impute_strategy': 'm...
3,5,LightGBM Classifier w/ Imputer,0.132722,0.024842,0.110679,98.971496,False,{'Imputer': {'categorical_impute_strategy': 'm...
4,2,Extra Trees Classifier w/ Imputer,0.136959,0.022862,0.111169,98.938661,False,{'Imputer': {'categorical_impute_strategy': 'm...
5,3,CatBoost Classifier w/ Imputer,0.386387,0.011583,0.374338,97.005774,False,{'Imputer': {'categorical_impute_strategy': 'm...
6,7,Elastic Net Classifier w/ Imputer + Standard S...,0.505862,0.008317,0.496767,96.079926,False,{'Imputer': {'categorical_impute_strategy': 'm...
7,1,Decision Tree Classifier w/ Imputer,2.431916,0.531935,2.726782,81.15435,True,{'Imputer': {'categorical_impute_strategy': 'm...
8,0,Mode Baseline Binary Classification Pipeline,12.904388,0.082537,12.952041,0.0,False,{'Baseline Classifier': {'strategy': 'mode'}}
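The `high_variance_cv` column lines up with the "High coefficient of variation (cv >= 0.2)" warning in the search log. A plausible reading, inferred from the table values rather than taken from EvalML's source, is that a pipeline is flagged when the standard deviation of its cross-validation scores is at least 20% of their mean. The `high_variance_cv` function below is a hypothetical helper that applies this rule to a few rows of the table.

```python
def high_variance_cv(mean_score, std_score, threshold=0.2):
    """Flag CV scores whose spread is large relative to their mean."""
    if mean_score == 0:
        return False  # the ratio is undefined; treat as not flagged
    return abs(std_score / mean_score) >= threshold


# (pipeline, mean_cv_score, standard_deviation_cv_score) copied from the table.
rankings = [
    ("Logistic Regression", 0.094015, 0.033791),
    ("XGBoost", 0.113098, 0.038613),
    ("Random Forest", 0.119972, 0.019487),
    ("Decision Tree", 2.431916, 0.531935),
]

flags = {name: high_variance_cv(mean, std) for name, mean, std in rankings}
for name, flagged in flags.items():
    print(f"{name}: {flagged}")
```

Under this reading the top-ranked Logistic Regression pipeline is flagged even though its scores are the best, because with a mean log loss this small a modest spread across folds is already large in relative terms.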


In [5]:
automl.describe_pipeline(automl.rankings.iloc[0]["id"])


***************************************************************
* Logistic Regression Classifier w/ Imputer + Standard Scaler *
***************************************************************

Problem Type: binary
Model Family: Linear

Pipeline Steps
1. Imputer
	 * categorical_impute_strategy : most_frequent
	 * numeric_impute_strategy : mean
	 * categorical_fill_value : None
	 * numeric_fill_value : None
2. Standard Scaler
3. Logistic Regression Classifier
	 * penalty : l2
	 * C : 1.0
	 * n_jobs : -1
	 * multi_class : auto
	 * solver : lbfgs

Training
Training for binary problems.
Total training time (including CV): 4.8 seconds

Cross Validation
----------------
             Log Loss Binary  MCC Binary   AUC  Precision    F1  Balanced Accuracy Binary  Accuracy Binary  Sensitivity at Low Alert Rates # Training # Validation
0                      0.061       0.958 0.997      0.966 0.974                     0.981            0.980                           0.412        303          152
1

# Logistic Regression is the best model for the binary log-loss objective. Let’s change the objective to AUC (the area under the ROC curve) and see how that affects the best model.
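The `'auc'` objective used in the next cell is the area under the ROC curve. One way to see what it measures: AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The following is a small pairwise sketch of that definition on toy data, not the routine EvalML calls internally.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
auc_model = roc_auc(y_true, [0.1, 0.4, 0.35, 0.8])    # one mis-ordered pair
auc_constant = roc_auc(y_true, [0.5, 0.5, 0.5, 0.5])  # same score for every row
print(auc_model, auc_constant)
```

A model that gives every row the same score gets an AUC of 0.5, which is why the mode baseline in the search output below scores 0.500.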

In [6]:
automl_auc = AutoMLSearch(X_train=X_train, y_train=y_train,
                           problem_type='binary',
                           objective='auc',
                           additional_objectives=['f1', 'precision'],                    
                           optimize_thresholds=True)
automl_auc.search() 

Using default limit of max_batches=1.

Generating pipelines to search over...

*****************************
* Beginning pipeline search *
*****************************

Optimizing for AUC. 
Greater score is better.

Using SequentialEngine to train and score pipelines.
Searching up to 1 batches for a total of 9 pipelines. 
Allowed model families: random_forest, xgboost, decision_tree, linear_model, catboost, extra_trees, lightgbm



Evaluating Baseline Pipeline: Mode Baseline Binary Classification Pipeline
Mode Baseline Binary Classification Pipeline:
	Starting cross validation
	Finished cross validation - mean AUC: 0.500

*****************************
* Evaluating Batch Number 1 *
*****************************

Decision Tree Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean AUC: 0.923
Extra Trees Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean AUC: 0.993
CatBoost Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean AUC: 0.991
Random Forest Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean AUC: 0.992
LightGBM Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean AUC: 0.991
XGBoost Classifier w/ Imputer:
	Starting cross validation
	Finished cross validation - mean AUC: 0.991
Elastic Net Classifier w/ Imputer + Standard Scaler:
	Starting cross validation

# Print model rankings and get the best pipeline.

In [7]:
automl_auc.rankings

Unnamed: 0,id,pipeline_name,mean_cv_score,standard_deviation_cv_score,validation_score,percent_better_than_baseline,high_variance_cv,parameters
0,2,Extra Trees Classifier w/ Imputer,0.992791,0.00392,0.995753,49.279119,False,{'Imputer': {'categorical_impute_strategy': 'm...
1,4,Random Forest Classifier w/ Imputer,0.992482,0.00384,0.994367,49.248175,False,{'Imputer': {'categorical_impute_strategy': 'm...
2,8,Logistic Regression Classifier w/ Imputer + St...,0.991342,0.006489,0.996676,49.134239,False,{'Imputer': {'categorical_impute_strategy': 'm...
3,3,CatBoost Classifier w/ Imputer,0.991305,0.003883,0.993906,49.130502,False,{'Imputer': {'categorical_impute_strategy': 'm...
4,6,XGBoost Classifier w/ Imputer,0.991265,0.004012,0.995568,49.126544,False,{'Imputer': {'categorical_impute_strategy': 'm...
5,5,LightGBM Classifier w/ Imputer,0.9907,0.001723,0.991505,49.070044,False,{'Imputer': {'categorical_impute_strategy': 'm...
6,7,Elastic Net Classifier w/ Imputer + Standard S...,0.984943,0.011105,0.996861,48.494262,False,{'Imputer': {'categorical_impute_strategy': 'm...
7,1,Decision Tree Classifier w/ Imputer,0.923371,0.018786,0.919298,42.337093,False,{'Imputer': {'categorical_impute_strategy': 'm...
8,0,Mode Baseline Binary Classification Pipeline,0.5,0.0,0.5,0.0,False,{'Baseline Classifier': {'strategy': 'mode'}}
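The `percent_better_than_baseline` column behaves differently in the two rankings tables. The numbers are consistent with a relative improvement for log loss and a plain difference in percentage points for AUC, which is already bounded between 0 and 1. This reading is inferred from the table values, not from EvalML's source, and `percent_better` is a hypothetical helper.

```python
def percent_better(score, baseline, greater_is_better, bounded_like_percentage):
    """Improvement over the baseline score, expressed as a percentage."""
    if bounded_like_percentage:
        # Score already lives on a 0..1 scale: report percentage points.
        difference = score - baseline
    else:
        if baseline == 0:
            return float("nan")  # relative change is undefined
        difference = (score - baseline) / abs(baseline)
    if not greater_is_better:
        difference = -difference  # a lower score is an improvement
    return 100.0 * difference


# Logistic Regression vs. the mode baseline under log loss (first table).
pct_logloss = percent_better(0.094015, 12.904388,
                             greater_is_better=False,
                             bounded_like_percentage=False)

# Extra Trees vs. the mode baseline under AUC (second table).
pct_auc = percent_better(0.992791, 0.5,
                         greater_is_better=True,
                         bounded_like_percentage=True)

print(round(pct_logloss, 3), round(pct_auc, 3))
```

This explains why an AUC of 0.993 shows up as about 49% better than a baseline of 0.5, while a comparable improvement in log loss shows up as about 99%.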


In [8]:
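# The id is read from the first search's rankings (automl), so this describes pipeline 8,
# Logistic Regression. Pass automl_auc.rankings.iloc[0]["id"] to describe the top AUC pipeline.
automl_auc.describe_pipeline(automl.rankings.iloc[0]["id"])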


***************************************************************
* Logistic Regression Classifier w/ Imputer + Standard Scaler *
***************************************************************

Problem Type: binary
Model Family: Linear

Pipeline Steps
1. Imputer
	 * categorical_impute_strategy : most_frequent
	 * numeric_impute_strategy : mean
	 * categorical_fill_value : None
	 * numeric_fill_value : None
2. Standard Scaler
3. Logistic Regression Classifier
	 * penalty : l2
	 * C : 1.0
	 * n_jobs : -1
	 * multi_class : auto
	 * solver : lbfgs

Training
Training for binary problems.
Total training time (including CV): 1.2 seconds

Cross Validation
----------------
              AUC    F1  Precision # Training # Validation
0           0.997 0.974      0.966        303          152
1           0.984 0.955      0.981        303          152
2           0.993 0.963      1.000        304          151
mean        0.991 0.964      0.982          -            -
std         0.006 0.010      0.0

# Under the AUC objective, the top-ranked model is now the Extra Trees classifier. This model can be used to make predictions on the validation/test data or saved for later use.

In [9]:
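best_model = automl_auc.best_pipeline
best_model.save("./model.pkl")
# Load the saved pipeline back through the pipeline class rather than the AutoMLSearch object
old_model = evalml.pipelines.PipelineBase.load("./model.pkl")
old_model.predict_proba(X_test).to_dataframe()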

Unnamed: 0,benign,malignant
0,0.961507,0.038493
1,0.782505,0.217495
2,0.765290,0.234710
3,0.962348,0.037652
4,0.929774,0.070226
...,...,...
109,0.965529,0.034471
110,0.812739,0.187261
111,0.982346,0.017654
112,0.009876,0.990124
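The save and load step is a serialization round trip: the fitted object is written to disk and later restored with the same parameters, so it produces the same predictions. The sketch below shows the idea with the standard library's `pickle` and a toy set of model parameters. The `model` dictionary and `predict` function are hypothetical stand-ins, not an EvalML pipeline, and the serializer EvalML uses may differ.

```python
import os
import pickle
import tempfile

# Toy stand-in for a fitted model: plain parameters only (hypothetical values).
model = {"weights": [0.8, -0.3], "bias": 0.1, "threshold": 0.5}


def predict(params, row):
    """Score a row with a linear rule and apply the stored threshold."""
    score = params["bias"] + sum(w * x for w, x in zip(params["weights"], row))
    return int(score >= params["threshold"])


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")
    with open(path, "wb") as handle:
        pickle.dump(model, handle)       # save
    with open(path, "rb") as handle:
        restored = pickle.load(handle)   # load

row = [1.0, 0.5]
print(restored == model, predict(model, row), predict(restored, row))
```

Pickle files can execute arbitrary code when loaded, so only load model files that come from a trusted source.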
