# <b><span style='color:#F1A424'>AutoML - Binary Classification - Bank Churn Prediction </span> </b>

### Disclaimer
Please note, the Vantage Functions via SQLAlchemy feature is a preview/beta code release with limited functionality (the “Code”). As such, you acknowledge that the Code is experimental in nature and that the Code is provided “AS IS” and may not be functional on any machine or in any environment. TERADATA DISCLAIMS ALL WARRANTIES RELATING TO THE CODE, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES AGAINST INFRINGEMENT OF THIRD-PARTY RIGHTS, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

TERADATA SHALL NOT BE RESPONSIBLE OR LIABLE WITH RESPECT TO ANY SUBJECT MATTER OF THE CODE UNDER ANY CONTRACT, NEGLIGENCE, STRICT LIABILITY OR OTHER THEORY 
    (A) FOR LOSS OR INACCURACY OF DATA OR COST OF PROCUREMENT OF SUBSTITUTE GOODS, SERVICES OR TECHNOLOGY, OR 
    (B) FOR ANY INDIRECT, INCIDENTAL OR CONSEQUENTIAL DAMAGES INCLUDING, BUT NOT LIMITED TO LOSS OF REVENUES AND LOSS OF PROFITS. TERADATA SHALL NOT BE RESPONSIBLE FOR ANY MATTER BEYOND ITS REASONABLE CONTROL.

Notwithstanding anything to the contrary: 
    (a) Teradata will have no obligation of any kind with respect to any Code-related comments, suggestions, design changes or improvements that you elect to provide to Teradata in either verbal or written form (collectively, “Feedback”), and 
    (b) Teradata and its affiliates are hereby free to use any ideas, concepts, know-how or techniques, in whole or in part, contained in Feedback: 
        (i) for any purpose whatsoever, including developing, manufacturing, and/or marketing products and/or services incorporating Feedback in whole or in part, and 
        (ii) without any restrictions or limitations, including requiring the payment of any license fees, royalties, or other consideration. 

## <b> Problem overview:</b>
    

**Dataset used - Bank Churn Dataset**

**Features**:

- `customer_id`: Unique customer identifier.
- `credit_score`: Credit score of the customer.
- `country`: Country of the customer.
- `gender`: Gender of the customer.
- `age`: Age of the customer.
- `tenure`: Customer's tenure with the bank.
- `balance`: Bank account balance.
- `products_number`: Number of bank products the customer holds.
- `credit_card`: Whether the customer has a credit card.
- `active_member`: Whether the customer is an active member.
- `estimated_salary`: Estimated salary of the customer.

**Target Variable**:

- `churn`: 1 if the customer left the bank during the observation period, 0 otherwise.

        
**Objective**:

The primary objective is to build a model that accurately predicts customer churn for ABC Bank.

**Usecase**:

Here, we use the AutoML (Automated Machine Learning) functionality to automate the entire process of developing a predictive model. In a custom run, it performs feature exploration, feature engineering, data preparation, model training, and evaluation on the dataset, and at the end it produces a leaderboard containing the different models along with their performance metrics. Each model also has a rank, which indicates the best-performing model for the given data, followed by the other models.

As part of the custom AutoML run, we customize the following functionalities:
- Binning on the 'Age' feature:
    - The aim is to treat 'Age' as a categorical variable rather than a numerical one and check how different age groups affect the prediction.
- Target encoding on the 'gender' feature:
    - The aim is to encode the gender feature using the target column distribution rather than the default encoding, i.e., one-hot encoding.
- Deletion of the ID column 'customer_id' using antiselect.
- Feature scaling using the 'std' method.
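
To make these four customizations concrete, here is a minimal pure-Python sketch of what each transformation does conceptually. It is not teradataml's implementation and it does not show the AutoML custom-configuration format. The toy rows, the bin edges, the helper name `bin_age`, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Toy rows invented for illustration; they are NOT taken from bank_churn.
rows = [
    {"customer_id": 1, "gender": "Male",   "age": 25, "balance": 0.0,   "churn": 0},
    {"customer_id": 2, "gender": "Female", "age": 41, "balance": 100.0, "churn": 1},
    {"customer_id": 3, "gender": "Female", "age": 63, "balance": 200.0, "churn": 0},
    {"customer_id": 4, "gender": "Male",   "age": 38, "balance": 300.0, "churn": 0},
]


def bin_age(age):
    """Binning: map a numeric age to a labelled bucket (edges are arbitrary)."""
    if age < 30:
        return "young"
    if age < 50:
        return "middle"
    return "senior"


# Target encoding: replace each category with the mean target value of that category.
gender_means = {
    g: mean([r["churn"] for r in rows if r["gender"] == g])
    for g in {r["gender"] for r in rows}
}

# Standard ('std') scaling: subtract the mean, divide by the standard deviation.
# Assumption: population standard deviation; a library may use the sample version.
balances = [r["balance"] for r in rows]
bal_mean, bal_std = mean(balances), pstdev(balances)

# Build the prepared rows; leaving out 'customer_id' mimics the antiselect step.
prepared = [
    {
        "age_bin": bin_age(r["age"]),
        "gender_te": gender_means[r["gender"]],
        "balance_std": (r["balance"] - bal_mean) / bal_std,
        "churn": r["churn"],
    }
    for r in rows
]

for p in prepared:
    print(p)
```

In this sketch the target encoding is computed on the same rows it is applied to, which is acceptable for a toy example but would leak the target in a real training pipeline.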

In [1]:
# Importing AutoML from teradataml
from teradataml import AutoML, AutoClassifier

In [2]:
# Importing other important libraries
import getpass
from teradataml import create_context, remove_context
from teradataml import DataFrame
from teradataml import load_example_data
from teradataml import TrainTestSplit

In [3]:
# Create the connection.
host = getpass.getpass("Host: ")
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")

con = create_context(host=host, username=username, password=password)

Host:  ········
Username:  ········
Password:  ········


## <b><span style='color:#F1A424'>| 1.</span> Loading Deployed Models - 'churn_top_5_models' </b>

### <b><span style='color:#F1A424'>| 1.1.</span> Loading Model </b>

In [4]:
# Creating AutoML object

aml = AutoML()

In [5]:
# Loading models

models_1 = aml.load('churn_top_5_models')

In [6]:
# Display loaded models

models_1

Unnamed: 0,RANK,MODEL_ID,FEATURE_SELECTION,ACCURACY,MICRO-PRECISION,MICRO-RECALL,MICRO-F1,MACRO-PRECISION,MACRO-RECALL,MACRO-F1,WEIGHTED-PRECISION,WEIGHTED-RECALL,WEIGHTED-F1,DATA_TABLE
0,1,KNN_9,lasso,0.837731,0.837731,0.837731,0.837731,0.837897,0.837727,0.83771,0.837894,0.837731,0.837711,ml__churn_lasso_1723408493503047
1,2,KNN_0,lasso,0.834187,0.834187,0.834187,0.834187,0.834504,0.834181,0.834146,0.8345,0.834187,0.834147,ml__churn_lasso_1723408493503047
2,3,KNN_4,rfe,0.825916,0.825916,0.825916,0.825916,0.826509,0.825907,0.825834,0.826504,0.825916,0.825835,ml__churn_rfe_1723403665947275
3,4,DECISIONFOREST_3,lasso,0.820796,0.820796,0.820796,0.820796,0.821353,0.820804,0.82072,0.821359,0.820796,0.820719,ml__churn_lasso_1723408493503047
4,5,XGBOOST_2,pca,0.81864,0.81864,0.81864,0.81864,0.716248,0.634388,0.655946,0.795369,0.81864,0.797782,ml__churn_pca_1723409447845319


### <b><span style='color:#F1A424'>| 1.2.</span> Get Loaded Model Hyperparameters</b>

In [7]:
aml.model_hyperparameters(rank=1, use_loaded_models=True)

{'response_column': 'churn',
 'name': 'knn',
 'model_type': 'Classification',
 'k': 5,
 'id_column': 'id',
 'voting_weight': 1.0,
 'persist': False,
 'output_prob': True,
 'output_responses': ['1', '0']}

In [8]:
aml.model_hyperparameters(rank=5, use_loaded_models=True)

{'response_column': 'churn',
 'name': 'xgboost',
 'model_type': 'Classification',
 'column_sampling': 1,
 'min_impurity': 0.0,
 'lambda1': 0.01,
 'shrinkage_factor': 0.5,
 'max_depth': 5,
 'min_node_size': 1,
 'iter_num': 10,
 'seed': 42,
 'persist': False,
 'output_prob': True,
 'output_responses': ['1', '0']}

### <b><span style='color:#F1A424'>| 1.3.</span> Loading Dataset</b>

In [9]:
# Loading dataset for prediction

load_example_data('teradataml','bank_churn')
df = DataFrame('bank_churn')



In [10]:
# Display data

df

customer_id,credit_score,country,gender,age,tenure,balance,products_number,credit_card,active_member,estimated_salary,churn
15668775,757,France,Male,47,3,130747.1,1,1,0,143829.54,0
15688963,731,France,Female,52,10,0.0,1,1,1,24998.75,1
15706602,760,Spain,Female,33,1,118114.28,2,0,1,156660.21,0
15809826,728,France,Female,46,2,109705.52,1,1,0,20276.87,1
15614716,515,France,Female,37,0,196853.62,1,1,1,132770.11,0
15685476,658,France,Male,31,5,100082.14,1,0,1,49809.88,0
15609618,721,Germany,Male,28,9,154475.54,2,0,1,101300.94,1
15806808,834,Germany,Female,57,8,112281.6,3,1,0,140225.14,1
15603582,569,Spain,Female,34,3,0.0,1,1,0,133997.53,0
15618203,773,Germany,Male,51,8,116197.65,2,1,1,86701.4,0


### <b><span style='color:#F1A424'>| 1.4.</span> Generating Prediction & Performance Metrics</b>

In [11]:
# Generate predictions on the dataset using the rank-1 model

prediction = aml.predict(df, rank=1)

Generating prediction using:
Model Name: KNN
Feature Selection: lasso
Completed: ｜⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿｜ 100% - 15/15            

In [12]:
prediction

id,prediction,prob_1,prob_0,churn
14,1,0.2101751330259643,0.7898248669740356,1
12,0,0.9999996637276444,3.362723557563805e-07,0
29,0,0.999999910695992,8.930400806290185e-08,0
9,0,1.0,0.0,0
41,0,0.5614065204000482,0.4385934795999518,1
48,0,1.0,0.0,0
10,1,2.984765955691004e-07,0.9999997015234044,1
15,1,4.529889396752895e-07,0.9999995470110604,1
22,0,1.0,0.0,0
27,0,1.0,0.0,0


In [13]:
# Generate performance metrics

performance_metric = aml.evaluate(df, rank=1)

Generating performance metrics using:
Model Name: KNN
Feature Selection: lasso


In [14]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.955100
1       5     Macro-Precision     0.925652
2       6        Macro-Recall     0.938379
3       7            Macro-F1     0.931824
4       9     Weighted-Recall     0.955100
5      10         Weighted-F1     0.955431
6       8  Weighted-Precision     0.955949
7       4            Micro-F1     0.955100
8       2     Micro-Precision     0.955100
9       1            Accuracy     0.955100


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum                                                                              
0               0  CLASS_1     7697      183   0.976777  0.966596  0.971659     7963
1               1  CLASS_2      266     1854   0.874528  0.910162  0.891989     2037
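
The summary metrics above can be recomputed by hand from the confusion matrix in the `result Output` table. The sketch below assumes that rows are predicted labels and columns are actual labels, an interpretation that is consistent with the reported Precision, Recall, and Support values. The variable names and the helper `prf` are illustrative.

```python
# Confusion-matrix counts copied from the rank-1 KNN `result Output` above.
# Assumption: rows are predicted labels, columns are actual labels.
pred0_actual0, pred0_actual1 = 7697, 183
pred1_actual0, pred1_actual1 = 266, 1854

total = pred0_actual0 + pred0_actual1 + pred1_actual0 + pred1_actual1
accuracy = (pred0_actual0 + pred1_actual1) / total


def prf(tp, fp, fn):
    """Return (precision, recall, F1) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Class 0 (did not churn) and class 1 (churned), each treated as the positive class in turn.
p0, r0, f1_0 = prf(tp=pred0_actual0, fp=pred0_actual1, fn=pred1_actual0)
p1, r1, f1_1 = prf(tp=pred1_actual1, fp=pred1_actual0, fn=pred0_actual1)

# Support is the number of actual rows per class.
support0 = pred0_actual0 + pred1_actual0
support1 = pred0_actual1 + pred1_actual1

# Macro averages weight both classes equally; weighted averages use the support.
macro_f1 = (f1_0 + f1_1) / 2
weighted_f1 = (support0 * f1_0 + support1 * f1_1) / total

print(f"accuracy={accuracy:.6f} macro_f1={macro_f1:.6f} weighted_f1={weighted_f1:.6f}")
```

Because class 0 has roughly four times the support of class 1, the weighted averages stay close to the class-0 scores, while the macro averages are pulled down by the weaker churn class.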


In [15]:
# Generate predictions on the dataset using the rank-4 model

prediction = aml.predict(df, rank=4)

Generating prediction using:
Model Name: DECISIONFOREST
Feature Selection: lasso


In [16]:
prediction

id,prediction,prob_1,prob_0,churn
4968,0,0.2666666666666666,0.7333333333333333,0
3040,0,0.0666666666666666,0.9333333333333332,0
6936,0,0.5,0.5,0
4946,1,0.6,0.4,1
7978,1,0.7333333333333333,0.2666666666666666,0
2730,0,0.1666666666666666,0.8333333333333334,0
6858,1,0.9,0.1,1
8311,0,0.2,0.8,0
291,0,0.1666666666666666,0.8333333333333334,0
618,1,0.7666666666666667,0.2333333333333333,1


In [17]:
# Generate performance metrics

performance_metric = aml.evaluate(df, rank=4)

Generating performance metrics using:
Model Name: DECISIONFOREST
Feature Selection: lasso


In [18]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.775900
1       5     Macro-Precision     0.661100
2       6        Macro-Recall     0.671869
3       7            Macro-F1     0.665955
4       9     Weighted-Recall     0.775900
5      10         Weighted-F1     0.779522
6       8  Weighted-Precision     0.783722
7       4            Micro-F1     0.775900
8       2     Micro-Precision     0.775900
9       1            Accuracy     0.775900


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum                                                                              
0               0  CLASS_1     6748     1026   0.868022  0.847419  0.857597     7963
1               1  CLASS_2     1215     1011   0.454178  0.496318  0.474314     2037


## <b><span style='color:#F1A424'>| 2.</span> Loading Deployed Models - 'churn_mixed_models' </b>

### <b><span style='color:#F1A424'>| 2.1.</span> Loading Model </b>

In [19]:
# Loading models

models_2 = aml.load('churn_mixed_models')

In [20]:
models_2

Unnamed: 0,RANK,MODEL_ID,FEATURE_SELECTION,ACCURACY,MICRO-PRECISION,MICRO-RECALL,MICRO-F1,MACRO-PRECISION,MACRO-RECALL,MACRO-F1,WEIGHTED-PRECISION,WEIGHTED-RECALL,WEIGHTED-F1,DATA_TABLE
0,1,XGBOOST_2,pca,0.81864,0.81864,0.81864,0.81864,0.716248,0.634388,0.655946,0.795369,0.81864,0.797782,ml__churn_pca_1723402925700993
1,2,DECISIONFOREST_1,rfe,0.79362,0.79362,0.79362,0.79362,0.797885,0.793643,0.792887,0.797899,0.79362,0.792882,ml__churn_rfe_1723409188128962
2,3,XGBOOST_1,rfe,0.715242,0.715242,0.715242,0.715242,0.780147,0.715147,0.697677,0.780094,0.715242,0.697706,ml__churn_rfe_1723409188128962


### <b><span style='color:#F1A424'>| 2.2.</span> Generating Prediction & Performance Metrics</b>

In [21]:
# Generate predictions on the dataset; no rank is passed, so the top-ranked loaded model is used

prediction = aml.predict(df)

Generating prediction using:
Model Name: XGBOOST
Feature Selection: pca


In [22]:
prediction

id,Prediction,Prob_1,Prob_0,churn
7312,0,0.1577558535671116,0.8422441464328884,0
4968,0,0.1795910402161278,0.8204089597838722,0
291,0,0.1374877944322966,0.8625122055677034,0
787,0,0.0793114403069032,0.9206885596930968,0
3040,0,0.0803792003319216,0.9196207996680784,0
6136,0,0.1694259885532123,0.8305740114467877,0
618,1,0.6916263780065909,0.3083736219934092,1
8311,0,0.0577019818023685,0.9422980181976316,0
6858,1,0.6616466851120484,0.3383533148879515,1
7636,0,0.0605728620126906,0.9394271379873091,0


In [23]:
# Generate performance metrics

performance_metric = aml.evaluate(df)

Generating performance metrics using:
Model Name: XGBOOST
Feature Selection: pca


In [24]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.843700
1       5     Macro-Precision     0.781573
2       6        Macro-Recall     0.681195
3       7            Macro-F1     0.710797
4       9     Weighted-Recall     0.843700
5      10         Weighted-F1     0.826977
6       8  Weighted-Precision     0.829813
7       4            Micro-F1     0.843700
8       2     Micro-Precision     0.843700
9       1            Accuracy     0.843700


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum                                                                              
0               0  CLASS_1     7608     1208   0.862976  0.955419  0.906848     7963
1               1  CLASS_2      355      829   0.700169  0.406971  0.514747     2037


## <b><span style='color:#F1A424'>| 3.</span> Loading Deployed Models - 'churn_range_models' </b>

### <b><span style='color:#F1A424'>| 3.1.</span> Loading Model</b>

In [25]:
# Creating another AutoML object

obj = AutoML()

In [26]:
# Loading models

models_3 = obj.load('churn_range_models')

In [27]:
models_3

Unnamed: 0,RANK,MODEL_ID,FEATURE_SELECTION,ACCURACY,MICRO-PRECISION,MICRO-RECALL,MICRO-F1,MACRO-PRECISION,MACRO-RECALL,MACRO-F1,WEIGHTED-PRECISION,WEIGHTED-RECALL,WEIGHTED-F1,DATA_TABLE
0,1,KNN_8,pca,0.778338,0.778338,0.778338,0.778338,0.624802,0.587976,0.597757,0.749093,0.778338,0.759329,ml__churn_pca_1723403937001843
1,2,DECISIONFOREST_0,lasso,0.767625,0.767625,0.767625,0.767625,0.793952,0.767684,0.762329,0.793987,0.767625,0.762315,ml__churn_lasso_1723403639131079
2,3,XGBOOST_1,rfe,0.715242,0.715242,0.715242,0.715242,0.780147,0.715147,0.697677,0.780094,0.715242,0.697706,ml__churn_rfe_1723402723065074
3,4,XGBOOST_3,lasso,0.587633,0.587633,0.587633,0.587633,0.725722,0.587479,0.512991,0.725653,0.587633,0.513066,ml__churn_lasso_1723403639131079
4,5,XGBOOST_0,lasso,0.587633,0.587633,0.587633,0.587633,0.725722,0.587479,0.512991,0.725653,0.587633,0.513066,ml__churn_lasso_1723403639131079


### <b><span style='color:#F1A424'>| 3.2.</span> Generating Prediction & Performance Metrics</b>

In [28]:
# Generate predictions on the dataset using the rank-2 model

prediction = obj.predict(df, rank=2)

Generating prediction using:
Model Name: DECISIONFOREST
Feature Selection: lasso
Completed: ｜⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿⫿｜ 100% - 15/15            

In [29]:
prediction

id,prediction,prob_1,prob_0,churn
6370,1,1.0,0.0,0
2279,0,0.0,1.0,0
8776,0,0.0,1.0,0
9046,0,0.0,1.0,0
5431,1,1.0,0.0,0
6611,0,0.0,1.0,0
5067,0,0.0,1.0,0
9818,1,1.0,0.0,0
8459,0,0.0,1.0,0
5032,0,0.0,1.0,0


In [30]:
# Generate performance metrics

performance_metric = obj.evaluate(df, rank=2)

Generating performance metrics using:
Model Name: DECISIONFOREST
Feature Selection: lasso


In [31]:
performance_metric


############ output_data Output ############

   SeqNum              Metric  MetricValue
0       3        Micro-Recall     0.813600
1       5     Macro-Precision     0.708305
2       6        Macro-Recall     0.650056
3       7            Macro-F1     0.668807
4       9     Weighted-Recall     0.813600
5      10         Weighted-F1     0.798578
6       8  Weighted-Precision     0.793805
7       4            Micro-F1     0.813600
8       2     Micro-Precision     0.813600
9       1            Accuracy     0.813600


############ result Output ############

       Prediction  Mapping  CLASS_1  CLASS_2  Precision    Recall        F1  Support
SeqNum                                                                              
0               0  CLASS_1     7374     1275   0.852584  0.926033  0.887792     7963
1               1  CLASS_2      589      762   0.564027  0.374080  0.449823     2037
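
Collecting the accuracy and the churn-class (label 1) recall from the four `evaluate` outputs in this notebook gives a quick side-by-side comparison. The sketch below only re-sorts numbers already shown in the outputs above; the tuple layout and variable names are illustrative.

```python
# (deployed model set, model, accuracy, recall for churn class 1),
# copied from the four evaluate() outputs in this notebook.
results = [
    ("churn_top_5_models", "KNN (rank 1)",            0.9551, 0.910162),
    ("churn_top_5_models", "DECISIONFOREST (rank 4)", 0.7759, 0.496318),
    ("churn_mixed_models", "XGBOOST (rank 1)",        0.8437, 0.406971),
    ("churn_range_models", "DECISIONFOREST (rank 2)", 0.8136, 0.374080),
]

by_accuracy = sorted(results, key=lambda r: r[2], reverse=True)
by_churn_recall = sorted(results, key=lambda r: r[3], reverse=True)

print("Ordered by accuracy:")
for source, model, acc, rec in by_accuracy:
    print(f"  {source:20s} {model:26s} accuracy={acc:.4f}")

print("Ordered by churn-class recall:")
for source, model, acc, rec in by_churn_recall:
    print(f"  {source:20s} {model:26s} recall={rec:.4f}")
```

The two orderings differ below the top model, which illustrates why accuracy alone can hide weak detection of churned customers on an imbalanced dataset.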


In [32]:
remove_context()

True