# EvalML

EvalML is an open-source Python library from Alteryx, the team behind Featuretools, for automated machine learning (AutoML) and model understanding. It wraps multiple modelling libraries behind a simple, unified API for building machine learning models. EvalML supports supervised learning problems such as regression, binary classification, and multiclass classification.

For more background, see [this article](https://analyticsindiamag.com/automate-your-ml-pipelines-with-evalml/).

# Using EvalML’s AutoML to search for the best Classification Algorithm

Install EvalML from PyPI.

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras --user -q

In [None]:
!python -m pip install evalml --user -q --no-warn-script-location

In [None]:
# Restart the kernel so that the freshly installed packages can be imported.
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

In [None]:
import evalml
from evalml import AutoMLSearch

# Load the demo breast cancer dataset and hold out a test split.
X, y = evalml.demos.load_breast_cancer()
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(X, y, problem_type='binary')

Run the search for the best classification model.

In [None]:
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='binary')
automl.search()

This uses the default objective function, binary log loss. 
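
For intuition about what this objective rewards, binary log loss can be computed by hand. The following is a minimal sketch in plain Python, not EvalML's own implementation; the function name `binary_log_loss` and the toy labels and probabilities are made up for illustration.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (hypothetical helper)."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(prob, eps), 1 - eps)
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(y_true)


# Toy labels and predicted probabilities of the positive class (illustrative only).
y_true = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.7]
hesitant = [0.6, 0.4, 0.6, 0.6]

loss_confident = binary_log_loss(y_true, confident)
loss_hesitant = binary_log_loss(y_true, hesitant)
print(loss_confident, loss_hesitant)
```

Lower is better: predictions that are both correct and confident get a smaller loss than correct but hesitant ones, which is why the rankings sort pipelines by ascending log loss.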

Print the model rankings and describe the best pipeline.

In [None]:
automl.rankings

In [None]:
automl.describe_pipeline(automl.rankings.iloc[0]["id"])

Logistic Regression is the best model for the binary log-loss objective. Let’s change the objective to the area under the ROC curve (AUC) and see how that affects the best model.

In [None]:
automl_auc = AutoMLSearch(X_train=X_train, y_train=y_train,
                          problem_type='binary',
                          objective='auc',
                          additional_objectives=['f1', 'precision'],                    
                          optimize_thresholds=True)
automl_auc.search() 
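
The `auc` objective passed above refers to the area under the ROC curve. One way to read that number: it is the fraction of (positive, negative) example pairs in which the positive example receives the higher score. The sketch below computes it that way in plain Python; it is an illustration under that definition, not EvalML's implementation, and the helper name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores (illustrative only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc)
```

Unlike log loss, AUC depends only on how the examples are ordered, not on the probability values themselves, so the two objectives can rank the same pipelines differently.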

Print the model rankings and describe the best pipeline.

In [None]:
automl_auc.rankings

In [None]:
automl_auc.describe_pipeline(automl_auc.rankings.iloc[0]["id"])

The optimal model has now changed to ExtraTreesClassifier. This model can be used to make predictions on the validation/test data or saved for use later.

In [None]:
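best_model = automl_auc.best_pipeline
# Save the best pipeline to disk, then load it back and predict on the held-out data.
best_model.save("model.pkl")
old_model = evalml.pipelines.PipelineBase.load("model.pkl")
old_model.predict_proba(X_test)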