# Train a binary classification model

In this tutorial, we walk through a simple binary classification problem using PyCaret.

## Install required packages

In [None]:
!pip install --upgrade pycaret scikit-plot

## Set up cloud tracking

[MLflow](https://github.com/mlflow/mlflow) is a great tool for tracking ML experiments locally. However, using it alone is like using Git without GitHub. You can use your Azure Machine Learning workspace as a remote tracking URI for MLflow:

In [None]:
import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

## Load a pandas.DataFrame

The PyCaret datasets module contains many sample datasets. Try replacing the sample below with your own data!

In [None]:
from pycaret.datasets import get_data

df = get_data("credit")
df

In [None]:
df.shape

## Split data

Split the data into a modeling set (95% of the rows) and a held-out set of unseen data (the remaining 5%) for final predictions:

In [None]:
data = df.sample(frac=0.95, random_state=42)
data_unseen = df.drop(data.index)
data.reset_index(inplace=True, drop=True)
data_unseen.reset_index(inplace=True, drop=True)
print(f"Data for modeling: {data.shape}")
print(f"Unseen data for predictions: {data_unseen.shape}")
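
The split above relies on one idea: sample a fraction of the row positions, then treat everything that was not sampled as unseen data. The following is a minimal standard-library sketch of that idea, not the pandas implementation. The helper name `holdout_split` and the toy `rows` list are hypothetical, and the rows selected will differ from those chosen by `DataFrame.sample`.

```python
import random


def holdout_split(rows, frac=0.95, seed=42):
    """Split rows into (modeling, unseen) with no overlap.

    Hypothetical helper for illustration only; it mirrors the idea of
    df.sample(frac=...) followed by df.drop(sampled.index).
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    n_modeling = round(len(rows) * frac)
    chosen = set(rng.sample(range(len(rows)), n_modeling))
    modeling = [row for i, row in enumerate(rows) if i in chosen]
    unseen = [row for i, row in enumerate(rows) if i not in chosen]
    return modeling, unseen


rows = list(range(100))  # toy stand-in for the DataFrame rows
modeling, unseen = holdout_split(rows)
print(f"Data for modeling: {len(modeling)}")
print(f"Unseen data for predictions: {len(unseen)}")
```

Because every row lands in exactly one of the two lists, the unseen rows never influence training and can serve as an honest final check.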

## Set up PyCaret

The `setup()` function initializes the PyCaret environment and creates the transformation pipeline that prepares the data for modeling and deployment.

`setup()` must be called before executing any other PyCaret function.

It takes two mandatory parameters: a `pandas.DataFrame` and the name of the target column. All other parameters are optional.

Refer to the [PyCaret documentation](https://pycaret.readthedocs.io/en/stable/) for details.

In [None]:
from pycaret.classification import *

exp = setup(
    data=data,
    target="default",
    log_experiment=True,
    experiment_name="automl-with-pycaret-tutorial",
    log_plots=True,
    log_profile=True,
    silent=True,  # set to False for interactively setting data types
)

In [None]:
models()

## Run AutoML

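Run a series of trials to find the best model. `compare_models()` trains each available estimator with cross-validation and returns the one with the best score (accuracy by default).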

In [None]:
%%time
best_model = compare_models()

In [None]:
print(best_model)
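
Conceptually, the comparison step averages each candidate's cross-validation scores and keeps the top-ranked one. The sketch below shows only that ranking step, using the standard library. The `cv_scores` values are made up for illustration, the helper `rank_models` is hypothetical, and the keys are used purely as labels; this is not PyCaret's internal code.

```python
from statistics import mean

# Hypothetical per-fold accuracy scores; real values come from training.
cv_scores = {
    "lr": [0.80, 0.82, 0.81],
    "rf": [0.83, 0.79, 0.84],
    "dt": [0.72, 0.74, 0.73],
}


def rank_models(cv_scores):
    """Return (name, mean_score) pairs sorted from best to worst."""
    averaged = {name: mean(scores) for name, scores in cv_scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


leaderboard = rank_models(cv_scores)
best_name, best_score = leaderboard[0]
print(f"Best candidate: {best_name} (mean accuracy {best_score:.2f})")
```

Averaging over folds rather than trusting a single split makes the ranking less sensitive to one lucky or unlucky partition of the data.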

## Evaluate model

Evaluate the best model. `evaluate_model()` displays an interactive widget for browsing its diagnostic plots.

In [None]:
evaluate_model(best_model)

## Test model

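Evaluate the best model on the unseen data. `predict_model()` returns the input rows with the predicted class appended in a `Label` column.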

In [None]:
unseen_predictions = predict_model(best_model, data=data_unseen)
unseen_predictions.head()

In [None]:
from pycaret.utils import check_metric

check_metric(
    unseen_predictions.default,
    unseen_predictions.Label.astype(int),
    "Accuracy",
)
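
Accuracy is the fraction of rows where the predicted label equals the true label. As a sanity check on what the metric means, here is a minimal standard-library sketch; the function `accuracy` and the toy label lists are hypothetical and are not part of PyCaret.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels: four of the five predictions are correct.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.8
```

Accuracy alone can be misleading when one class is much rarer than the other, so it is worth checking other metrics as well before drawing conclusions.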