# HOWTO: Use PyCaret to Train and Deploy a Model in PrimeHub

## Step 1: Install PyCaret

In [1]:
```python
# Install the PyCaret Python package.
!pip install pycaret==2.3
```
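The rest of this notebook assumes the PyCaret 2.3 API, so it can help to confirm which version the kernel actually sees after the install. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, not part of PyCaret:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# After the pip install above, this should report a 2.3.x version.
print("pycaret:", installed_version("pycaret"))
```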



## Step 2: Use pandas to read the bank marketing data

- Context:

Goal: find the best strategies for improving the bank's next marketing campaign. How can the financial institution make its future marketing campaigns more effective? To answer this, we analyze the bank's most recent marketing campaign and identify the patterns that can inform future strategies. The model in this notebook predicts the `deposit` column, which records whether the client subscribed to a term deposit.

- Dataset link: [Bank Marketing Dataset on Kaggle](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)

In [2]:
```python
import pandas as pd

df = pd.read_csv("./data/bank.csv")
df.columns
```

```
Index(['age', 'job', 'marital', 'education', 'default', 'balance', 'housing',
       'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays',
       'previous', 'poutcome', 'deposit'],
      dtype='object')
```
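Because `setup` is later called with `target = 'deposit'`, a quick check that the CSV contains the expected columns can catch a wrong or truncated file before any training starts. A minimal sketch; `EXPECTED_COLUMNS` is copied from the output above, and `missing_columns` is a hypothetical helper:

```python
from typing import Iterable, List, Sequence

# Column names as printed by df.columns above.
EXPECTED_COLUMNS = (
    "age", "job", "marital", "education", "default", "balance", "housing",
    "loan", "contact", "day", "month", "duration", "campaign", "pdays",
    "previous", "poutcome", "deposit",
)


def missing_columns(columns: Iterable[str], expected: Sequence[str] = EXPECTED_COLUMNS) -> List[str]:
    """Return the expected column names that are absent from `columns`, in expected order."""
    present = set(columns)
    return [name for name in expected if name not in present]


# In the notebook: assert not missing_columns(df.columns)
print(missing_columns(EXPECTED_COLUMNS))  # []
```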

## Step 3: Use PyCaret to analyze the data and train models automatically

- PyCaret introduction

PyCaret is an open-source, low-code machine learning library in Python that lets you go from preparing your data to deploying your model within minutes, in the notebook environment of your choice.

- Documentation: [PyCaret documentation](https://pycaret.org/guide/)

### Part 1: Import the PyCaret classification module

In [3]:
```python
# The star import exposes setup, compare_models, create_model, predict_model, etc.
from pycaret.classification import *
```

### Part 2: Initialize the setup

In [4]:
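```python
# target: the column to predict.
# log_experiment=True logs each trained model to MLflow under the 'bank_dataset' experiment.
# silent=True skips the interactive data-type confirmation prompt.
clf1 = setup(df, target = 'deposit', log_experiment = True, experiment_name = 'bank_dataset', silent = True)
```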

| | Description | Value |
|---|---|---|
| 0 | session_id | 7891 |
| 1 | Target | deposit |
| 2 | Target Type | Binary |
| 3 | Label Encoded | no: 0, yes: 1 |
| 4 | Original Data | (11162, 17) |
| 5 | Missing Values | False |
| 6 | Numeric Features | 7 |
| 7 | Categorical Features | 9 |
| 8 | Ordinal Features | False |
| 9 | High Cardinality Features | False |
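The `Label Encoded` row shows that the string target was mapped to integers as `no: 0, yes: 1`. The sketch below reproduces that mapping by assigning codes to the sorted unique class names, which is how scikit-learn's `LabelEncoder` orders classes; that PyCaret uses exactly this rule internally is an assumption here, and `encode_labels` is a hypothetical helper:

```python
from typing import Dict, Iterable, List, Tuple


def encode_labels(values: Iterable[str]) -> Tuple[Dict[str, int], List[int]]:
    """Map each class name to an integer code, with classes ordered alphabetically."""
    values = list(values)
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return mapping, [mapping[v] for v in values]


mapping, codes = encode_labels(["yes", "no", "no", "yes"])
print(mapping)  # {'no': 0, 'yes': 1}
print(codes)    # [1, 0, 0, 1]
```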


### Part 3: Compare models
Compare all available classification models (e.g. CatBoost, LightGBM, SVM) using cross-validation.

In [5]:
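```python
# Cross-validate every available model; the top-ranked one (by Accuracy by default) is returned.
best_model = compare_models()
```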

| ID | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lightgbm | Light Gradient Boosting Machine | 0.8601 | 0.9252 | 0.8857 | 0.8297 | 0.8567 | 0.7204 | 0.7221 | 0.236 |
| rf | Random Forest Classifier | 0.8523 | 0.9183 | 0.8744 | 0.8237 | 0.8482 | 0.7046 | 0.706 | 1.033 |
| gbc | Gradient Boosting Classifier | 0.846 | 0.9162 | 0.8551 | 0.8255 | 0.8398 | 0.6917 | 0.6925 | 0.871 |
| ada | Ada Boost Classifier | 0.8212 | 0.9006 | 0.7902 | 0.8241 | 0.8064 | 0.6405 | 0.6414 | 0.272 |
| lr | Logistic Regression | 0.8207 | 0.8986 | 0.79 | 0.8229 | 0.806 | 0.6394 | 0.64 | 1.156 |
| et | Extra Trees Classifier | 0.8199 | 0.8939 | 0.8014 | 0.8144 | 0.8075 | 0.6384 | 0.6388 | 1.043 |
| ridge | Ridge Classifier | 0.8104 | 0.0 | 0.7381 | 0.841 | 0.7859 | 0.6172 | 0.6217 | 0.033 |
| lda | Linear Discriminant Analysis | 0.8103 | 0.8993 | 0.7378 | 0.8409 | 0.7857 | 0.6169 | 0.6215 | 0.062 |
| dt | Decision Tree Classifier | 0.7786 | 0.7773 | 0.7544 | 0.7715 | 0.7625 | 0.5552 | 0.5557 | 0.07 |
| knn | K Neighbors Classifier | 0.7496 | 0.8062 | 0.7243 | 0.7396 | 0.7317 | 0.4971 | 0.4974 | 0.167 |
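`compare_models` ranks this table by `Accuracy` unless its `sort` parameter names another metric (for example, `compare_models(sort = 'Recall')`), and the top row is what is returned as `best_model`. The sketch below mimics that ranking on a few rows copied from the table above, showing that the winner can depend on the metric; `SCORES` and `rank_models` are hypothetical names, not part of PyCaret:

```python
from typing import Dict, List

# Cross-validated scores copied from the compare_models table above.
SCORES: Dict[str, Dict[str, float]] = {
    "lightgbm": {"Accuracy": 0.8601, "Recall": 0.8857, "Prec.": 0.8297},
    "rf": {"Accuracy": 0.8523, "Recall": 0.8744, "Prec.": 0.8237},
    "gbc": {"Accuracy": 0.8460, "Recall": 0.8551, "Prec.": 0.8255},
    "ridge": {"Accuracy": 0.8104, "Recall": 0.7381, "Prec.": 0.8410},
    "lda": {"Accuracy": 0.8103, "Recall": 0.7378, "Prec.": 0.8409},
}


def rank_models(scores: Dict[str, Dict[str, float]], metric: str) -> List[str]:
    """Return model IDs ordered from best to worst on `metric` (higher is better)."""
    return sorted(scores, key=lambda model_id: scores[model_id][metric], reverse=True)


print(rank_models(SCORES, "Accuracy"))  # ['lightgbm', 'rf', 'gbc', 'ridge', 'lda']
print(rank_models(SCORES, "Prec."))     # ['ridge', 'lda', 'lightgbm', 'gbc', 'rf']
```

The helper assumes higher is better, which holds for every metric column here except `TT (Sec)`.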


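### Part 4: Finalize the best model

`finalize_model` does not tune hyperparameters: it refits the selected model on the complete dataset, including the hold-out set that `setup` set aside.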

In [6]:
```python
# Refit best_model on the full dataset (training data plus hold-out set).
final_best = finalize_model(best_model)
```
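To see how much extra data finalizing adds, assume `setup` used its default `train_size = 0.7` and that the hold-out count is rounded up, as scikit-learn's `train_test_split` does for a fractional test size. Under those assumptions, the 11162 rows reported by `setup` split as follows; `split_sizes` is a hypothetical helper:

```python
import math
from typing import Tuple


def split_sizes(n_rows: int, train_size: float = 0.7) -> Tuple[int, int]:
    """Return (train_rows, holdout_rows), rounding the hold-out count up (an assumption)."""
    holdout = math.ceil(n_rows * (1 - train_size))
    return n_rows - holdout, holdout


train_rows, holdout_rows = split_sizes(11162)
print(train_rows, holdout_rows)  # 7813 3349
```

Under these assumptions, the cross-validated model saw 7813 rows, while `finalize_model` refits on all 11162.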

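### Part 5: Retrieve the best model with `automl`

`automl` returns the best model trained so far in the current session, ranked by the `optimize` metric (here, `Recall`). The output shows its hyperparameters.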

In [7]:
```python
best = automl(optimize = 'Recall')
best
```

```
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               importance_type='split', learning_rate=0.1, max_depth=-1,
               min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
               n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
               random_state=7891, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
```

## Step 4: Create a specific model

### Enter the model name

Use one of the IDs from the `compare_models` table, e.g. `lightgbm`.

In [8]:
```python
particular_model = input("Please insert the model name. (EX: lightgbm): ")
```

```
Please insert the model name. (EX: lightgbm):  lightgbm
```


### Create model

In [9]:
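```python
# Train the chosen model with 10-fold cross-validation (PyCaret's default) and show per-fold scores.
particular_model = create_model(particular_model)
```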

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.8708 | 0.9338 | 0.897 | 0.8401 | 0.8676 | 0.7418 | 0.7433 |
| 1 | 0.8581 | 0.9269 | 0.8672 | 0.8377 | 0.8522 | 0.7157 | 0.7161 |
| 2 | 0.8478 | 0.9067 | 0.8943 | 0.8049 | 0.8472 | 0.6965 | 0.7003 |
| 3 | 0.8656 | 0.9334 | 0.8859 | 0.838 | 0.8613 | 0.7311 | 0.7321 |
| 4 | 0.8361 | 0.9015 | 0.8641 | 0.803 | 0.8325 | 0.6725 | 0.6742 |
| 5 | 0.872 | 0.9417 | 0.8913 | 0.8454 | 0.8677 | 0.7438 | 0.7448 |
| 6 | 0.8464 | 0.9107 | 0.8913 | 0.8039 | 0.8454 | 0.6935 | 0.6971 |
| 7 | 0.8668 | 0.9317 | 0.8777 | 0.8455 | 0.8613 | 0.7333 | 0.7338 |
| 8 | 0.8784 | 0.9417 | 0.8889 | 0.8586 | 0.8735 | 0.7564 | 0.7569 |
| 9 | 0.8592 | 0.9237 | 0.8997 | 0.8198 | 0.8579 | 0.7189 | 0.7219 |
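The scores in the `compare_models` table are means over these cross-validation folds. Averaging the per-fold values above with the standard library reproduces the `lightgbm` entries in that table (Accuracy 0.8601, Recall 0.8857). The fold values are copied from the output above; `FOLDS` and `summarize` are hypothetical names:

```python
import statistics
from typing import Dict, List

# Per-fold scores copied from the create_model output above.
FOLDS: Dict[str, List[float]] = {
    "Accuracy": [0.8708, 0.8581, 0.8478, 0.8656, 0.8361, 0.8720, 0.8464, 0.8668, 0.8784, 0.8592],
    "Recall": [0.8970, 0.8672, 0.8943, 0.8859, 0.8641, 0.8913, 0.8913, 0.8777, 0.8889, 0.8997],
}


def summarize(folds: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Return the mean and population standard deviation of each metric, rounded to 4 places."""
    return {
        metric: {
            "Mean": round(statistics.mean(values), 4),
            "SD": round(statistics.pstdev(values), 4),
        }
        for metric, values in folds.items()
    }


summary = summarize(FOLDS)
print(summary["Accuracy"]["Mean"], summary["Recall"]["Mean"])  # 0.8601 0.8857
```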


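### Test: Predict on the hold-out set

Without a `data` argument, `predict_model` scores the hold-out set created by `setup`, prints its metrics, and returns the data with `Label` (predicted class) and `Score` (predicted probability) columns added.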

In [10]:
```python
prediction = predict_model(particular_model)
prediction.head()
```

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Light Gradient Boosting Machine | 0.8638 | 0.9282 | 0.8934 | 0.8341 | 0.8627 | 0.728 | 0.7297 |


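| | age | balance | day | duration | campaign | pdays | previous | job_admin. | job_blue-collar | job_entrepreneur | ... | month_nov | month_oct | month_sep | poutcome_failure | poutcome_other | poutcome_success | poutcome_unknown | deposit | Label | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81.0 | 1154.0 | 17.0 | 231.0 | 1.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | yes | yes | 0.8755 |
| 1 | 55.0 | 282.0 | 5.0 | 99.0 | 2.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | no | no | 0.9503 |
| 2 | 45.0 | 0.0 | 4.0 | 153.0 | 6.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | no | no | 0.9917 |
| 3 | 34.0 | 932.0 | 4.0 | 218.0 | 1.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | no | yes | 0.8255 |
| 4 | 23.0 | 9216.0 | 5.0 | 471.0 | 2.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | no | yes | 0.9265 |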
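In the `head()` output above, `deposit` holds the true class and `Label` the predicted one. Assuming `Score` is the probability of the predicted `Label` (taken here to match PyCaret 2.x's default output), the sketch below converts each row to the probability of `yes` and counts how many of the five displayed rows were classified correctly. The values are copied from the table; `ROWS` and `prob_yes` are hypothetical names:

```python
from typing import List, Tuple

# (deposit, Label, Score) for the five rows shown above.
ROWS: List[Tuple[str, str, float]] = [
    ("yes", "yes", 0.8755),
    ("no", "no", 0.9503),
    ("no", "no", 0.9917),
    ("no", "yes", 0.8255),
    ("no", "yes", 0.9265),
]


def prob_yes(label: str, score: float) -> float:
    """Convert the probability of the predicted label into the probability of 'yes'."""
    return score if label == "yes" else 1.0 - score


correct = sum(actual == predicted for actual, predicted, _ in ROWS)
print(f"{correct}/{len(ROWS)} rows predicted correctly")  # 3/5 rows predicted correctly
print([round(prob_yes(label, score), 4) for _, label, score in ROWS])
# [0.8755, 0.0497, 0.0083, 0.8255, 0.9265]
```

Five rows are far too few to judge the model; the hold-out metrics above (Accuracy 0.8638) are the meaningful summary.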


### Test: Evaluate model

In [11]:
```python
evaluate_model(particular_model)
```

(Output: an interactive widget with a `Plot Type` selector for plots such as `Hyperparameters`.)

## Step 5: Test the saved model with new data

### Enter the model path

#### How to find the model path
1. In the PrimeHub UI, click `Models` → `MLFlow UI` to open the MLflow platform.
2. Click the `bank_dataset` experiment → the latest run (Run name: Light Gradient Boosting Machine) → the `model` folder under Artifacts, and copy the full path shown for that folder.

In [12]:
```python
model_path = input("""
Please insert the model path. You can find it in MLFlow server artifact.
For Example: "/project/example/phapplications/mlflow-12345/mlruns/15/2dcf45b3143948f494e381698e4c5ba7/artifacts/model/" \n
Model path: """)
```


```
Please insert the model path. You can find it in MLFlow server artifact.
For Example: "/project/example/phapplications/mlflow-12345/mlruns/15/2dcf45b3143948f494e381698e4c5ba7/artifacts/model/" 

Model path:  /project/phusers/phapplications/mlflow-bqj9n/mlruns/1/879fa85be3e3461fb146b798606ab6d4/artifacts/model
```


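### Get the model run ID

In [13]:
```python
# The path ends in .../<run_id>/artifacts/model. Strip whitespace and any trailing "/"
# first; otherwise a path like the example above (ending in "model/") would yield "artifacts".
model_run_id = model_path.strip().rstrip("/").split("/")[-3]
```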
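Indexing with `[-3]` relies on the layout `.../mlruns/<experiment_id>/<run_id>/artifacts/model` of MLflow's file-based store, as in both example paths above. A slightly more defensive sketch uses `pathlib.PurePosixPath`, which ignores a trailing slash, and fails loudly if the path does not have that shape. `parse_artifact_path` is a hypothetical helper, and the expected layout is an assumption taken from the example paths:

```python
from pathlib import PurePosixPath
from typing import Dict


def parse_artifact_path(model_path: str) -> Dict[str, str]:
    """Split .../mlruns/<experiment_id>/<run_id>/artifacts/<name> into its parts."""
    # PurePosixPath drops a trailing slash, so ".../model/" and ".../model" parse the same.
    parts = PurePosixPath(model_path.strip()).parts
    if len(parts) < 5 or parts[-5] != "mlruns" or parts[-2] != "artifacts":
        raise ValueError(f"Unexpected MLflow artifact path: {model_path!r}")
    return {"experiment_id": parts[-4], "run_id": parts[-3], "artifact": parts[-1]}


info = parse_artifact_path(
    "/project/example/phapplications/mlflow-12345/mlruns/15/"
    "2dcf45b3143948f494e381698e4c5ba7/artifacts/model/"
)
print(info["experiment_id"], info["run_id"])  # 15 2dcf45b3143948f494e381698e4c5ba7
```

In the notebook this would read `model_run_id = parse_artifact_path(model_path)["run_id"]`.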

### Part 1: Load the saved model

In [14]:
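```python
import os

# load_model appends ".pkl" to the name, so this reads model.pkl inside the MLflow "model" artifact folder.
saved_model = load_model(os.path.join(model_path, "model"))
```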

```
Transformation Pipeline and Model Successfully Loaded
```
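Because PyCaret's `load_model` appends `.pkl` to the name it receives, the file actually opened above is `model.pkl` inside the artifact folder. If the path typed at the prompt might be wrong, checking for that file first gives a clearer error than a failed load. A small sketch; `pickle_path` is a hypothetical helper:

```python
import os


def pickle_path(model_path: str) -> str:
    """Return the file that load_model(os.path.join(model_path, "model")) will open."""
    return os.path.join(model_path, "model") + ".pkl"


# In the notebook, before calling load_model:
# if not os.path.isfile(pickle_path(model_path)):
#     raise FileNotFoundError(pickle_path(model_path))
print(pickle_path("/tmp/mlruns/1/abc/artifacts/model"))
# On a POSIX system: /tmp/mlruns/1/abc/artifacts/model/model.pkl
```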


### Part 2: Prepare the test data

In [15]:
```python
# One unseen client record, with values in the same column order as df.
# Numeric fields are given as numbers rather than strings.
test_data = [[25, 'admin.', 'married', 'secondary', 'no', 45, 'no', 'no', 'unknown', 5, 'may', 1467, 1, -1, 0, 'unknown', 'yes']]
data_unseen = pd.DataFrame(test_data, columns=df.columns)
data_unseen
```

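| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | admin. | married | secondary | no | 45 | no | no | unknown | 5 | may | 1467 | 1 | -1 | 0 | unknown | yes |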
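A positional list only works if all 17 values stay aligned with the 17 column names. An alternative sketch builds the record from a mapping keyed by column name and rejects missing or unexpected keys; `COLUMNS` and `make_record` are hypothetical names, and the result could then be wrapped as `pd.DataFrame([record], columns=df.columns)`:

```python
from typing import Any, Dict, Sequence

COLUMNS = (
    "age", "job", "marital", "education", "default", "balance", "housing",
    "loan", "contact", "day", "month", "duration", "campaign", "pdays",
    "previous", "poutcome", "deposit",
)


def make_record(values: Dict[str, Any], columns: Sequence[str] = COLUMNS) -> Dict[str, Any]:
    """Return `values` ordered by `columns`, raising if any key is missing or unexpected."""
    missing = [c for c in columns if c not in values]
    unknown = [k for k in values if k not in columns]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return {c: values[c] for c in columns}


record = make_record({
    "age": 25, "job": "admin.", "marital": "married", "education": "secondary",
    "default": "no", "balance": 45, "housing": "no", "loan": "no",
    "contact": "unknown", "day": 5, "month": "may", "duration": 1467,
    "campaign": 1, "pdays": -1, "previous": 0, "poutcome": "unknown",
    "deposit": "yes",
})
print(list(record)[:3])  # ['age', 'job', 'marital']
```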


### Part 3: Predict with the saved model on the test data

In [16]:
```python
prediction = predict_model(saved_model, data = data_unseen)
prediction
```

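| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit | Label | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | admin. | married | secondary | no | 45 | no | no | unknown | 5 | may | 1467 | 1 | -1 | 0 | unknown | yes | yes | 0.8809 |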


## Step 6: Register the model

Register the logged model as a new version in the MLflow Model Registry.

In [17]:
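```python
from mlflow.tracking import MlflowClient

registry_name = "bank-model-registry"

client = MlflowClient()
# Create the registered model entry. This raises an error if the name already exists
# (e.g. when re-running this cell); skip this line in that case.
client.create_registered_model(registry_name)
# Add a new version that points at the logged model artifact and its source run.
result = client.create_model_version(
    name=registry_name,
    source=model_path,
    run_id=model_run_id
)
```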

```
2021/09/28 02:34:32 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: bank-model-registry, version 1
```
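`create_model_version` returns a `ModelVersion` object, so `result.version` holds the number of the version just created (1 in the log above). MLflow refers to a registered version with a URI of the form `models:/<name>/<version>`, which loaders such as `mlflow.pyfunc.load_model` accept. The sketch below only formats that string; `registry_uri` is a hypothetical helper:

```python
from typing import Union


def registry_uri(name: str, version: Union[int, str]) -> str:
    """Build an MLflow Model Registry URI of the form models:/<name>/<version>."""
    return f"models:/{name}/{version}"


# In the notebook: mlflow.pyfunc.load_model(registry_uri(registry_name, result.version))
print(registry_uri("bank-model-registry", 1))  # models:/bank-model-registry/1
```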


## Next steps

Step 1: Build a PyCaret classification Seldon model server image, or use an existing image from Docker Hub.

Step 2: Use the PrimeHub `Deployments` feature to deploy the model.