<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Banking Customer Churn Analysis using AutoChurn in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>

<center><img src="images/churn.webp"/></center>

<p style = 'font-size:16px;font-family:Arial'>Source: <a href = 'https://medium.com/@islamhasabo/predicting-customer-churn-bc76f7760377'>Medium</a></p>

<p style = 'font-size:16px;font-family:Arial'>Customer churn is a critical metric in banking because it can directly impact a bank's revenue and profitability. When customers leave, banks lose the income they would have earned from those customers' transactions, investments, and account fees. Additionally, attracting new customers to replace those who have left can be expensive and time-consuming, so reducing customer churn is often more cost-effective than acquiring new customers.</p>

<p style = 'font-size:16px;font-family:Arial'>Customer churn can also be an indicator of customer satisfaction and loyalty. If customers leave at a high rate, they may be dissatisfied with the bank's products or services, customer service, or overall experience.</p>

<p style = 'font-size:16px;font-family:Arial'>Banks can use various strategies to reduce customer churn, such as improving customer service, offering more competitive rates and fees, providing personalized recommendations and offers, and enhancing digital channels and mobile apps. By tracking and analyzing customer churn rates, banks can identify areas for improvement and make strategic decisions to retain customers and improve overall customer satisfaction.</p>

<p style = 'font-size:16px;font-family:Arial'>In this demo, we show how the entire lifecycle of churn prediction can be implemented using Vantage technologies, specifically the combination of Bring Your Own Model (BYOM), Vantage Analytics Library (VAL), and the teradataml Python client library.</p>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>Importing libraries needed for execution.</b></p>

In [None]:
!pip install teradataml --upgrade

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><b>Note: </b><i>The AutoChurn functionality requires teradataml 20.0.0.9 or greater. Run the pip install above if your teradataml version is lower than 20.0.0.9. The kernel needs to be restarted after installing the required teradataml version. The simplest way to restart the kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>
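<p style = 'font-size:16px;font-family:Arial'>To decide whether the upgrade is needed, the installed version can be compared with the minimum stated in the note. The following is a minimal sketch using only the Python standard library. The helper names <code>parse_version</code> and <code>meets_minimum</code> are hypothetical, and the sketch assumes the version string is made of dot-separated integers such as 20.0.0.9.</p>

```python
from importlib import metadata

# Minimum version taken from the note above
MIN_TERADATAML = (20, 0, 0, 9)

def parse_version(text):
    """Turn '20.0.0.9' into (20, 0, 0, 9); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_minimum(installed, minimum=MIN_TERADATAML):
    """Compare versions as integer tuples, not as strings."""
    return parse_version(installed) >= minimum

try:
    installed = metadata.version("teradataml")
    if meets_minimum(installed):
        print(f"teradataml {installed} is recent enough")
    else:
        print(f"teradataml {installed} is too old; upgrade and restart the kernel")
except metadata.PackageNotFoundError:
    print("teradataml is not installed")
```

<p style = 'font-size:16px;font-family:Arial'>Comparing integer tuples avoids a pitfall of plain string comparison, where "20.0.0.10" would sort before "20.0.0.9".</p>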
<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import os
import warnings
warnings.filterwarnings('ignore')

from sklearn import tree
from xgboost import XGBClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from jdk4py import JAVA, JAVA_HOME, JAVA_VERSION

from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5
configure.val_install_location = 'val'
configure.byom_install_location = 'mldb'
os.environ['PATH'] = os.environ['PATH'] + os.pathsep + str(JAVA_HOME)
os.environ['PATH'] = os.environ['PATH'] + os.pathsep + str(JAVA)[:-5]

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=BankingChurn_AutoChurn.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:20px;font-family:Arial'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but consumes available storage. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker from one statement to the other.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_BankChurn_cloud');"        # Takes 30 seconds
%run -i ../run_procedure.py "call get_data('DEMO_BankChurn_local');"        # Takes 1 minute

<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage. Check the shape of the DataFrame and the datatype of each of its columns.</p>
<p style = 'font-size:16px;font-family:Arial'><b><i>*Please scroll down to the end of the notebook for detailed column descriptions of the dataset.</i></b></p>

In [None]:
bank_df = DataFrame(in_schema("DEMO_BankChurn", "customer_churn"))
print("Shape of the data: ", bank_df.shape)
bank_df

In [None]:
bank_df.dtypes

<p style = 'font-size:16px;font-family:Arial'>By looking at the datatypes and sample data, we classify the columns into ID column, target variable(y), numerical, categorical and binary. We skip using <i>RowNumber</i> and <i>Surname</i> columns as they are not helpful in the analysis.</p>
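<p style = 'font-size:16px;font-family:Arial'>The grouping described above can be written down as a plain dictionary. This is only an illustrative sketch: the column names come from the column descriptions at the end of the notebook, the assignment of each column to a group is our reading of those descriptions, and the ID column is left out because its name is not given there.</p>

```python
# Illustrative grouping of the columns; the assignments are an assumption
# based on the column descriptions, not output from Vantage.
column_groups = {
    "target": ["Exited"],
    "categorical": ["Geography", "Gender"],
    "binary": ["HasCrCard", "IsActiveMember"],
    "numerical": ["CreditScore", "Age", "Tenure", "Balance",
                  "NumOfProducts", "EstimatedSalary"],
    "skipped": ["RowNumber", "Surname"],
}

# Feature columns are everything except the target and the skipped columns
feature_columns = [
    col
    for group, cols in column_groups.items()
    if group not in ("target", "skipped")
    for col in cols
]

# Sanity check: no column is assigned to more than one group
all_columns = [col for cols in column_groups.values() for col in cols]
assert len(all_columns) == len(set(all_columns))

print(len(feature_columns), "feature columns:", feature_columns)
```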

<hr style="height:1px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>Train/Test Split</b></p>
<p style = 'font-size:16px;font-family:Arial'>Split the dataset into train and test datasets according to the split ratio, here 0.8 for training and 0.2 for testing.</p>
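<p style = 'font-size:16px;font-family:Arial'>The cells that follow tag each row with a <code>sampleid</code> (1 for the first fraction, 2 for the second) and then filter on that tag. The plain-Python sketch below is only an analogy for that idea, run on ten toy rows. The function <code>split_by_sampleid</code> is hypothetical and does not reproduce how sampling is carried out in the database.</p>

```python
import random

def split_by_sampleid(rows, fractions=(0.8, 0.2), seed=42):
    """Shuffle the rows and tag each one with a sample id (1, 2, ...)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    tagged = []
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        end = start + round(frac * len(shuffled))
        tagged.extend((row, sample_id) for row in shuffled[start:end])
        start = end
    return tagged

rows = list(range(1, 11))            # ten toy "customers"
tagged = split_by_sampleid(rows)

# Filter on the tag, mirroring the sampleid == 1 / sampleid == 2 filters
train = [row for row, sid in tagged if sid == 1]
test = [row for row, sid in tagged if sid == 2]

print(len(train), len(test))         # 8 2
```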

In [None]:
# Performing sampling to get 80% for training and 20% for testing
bank_df_sample = bank_df.sample(frac = [0.8, 0.2])

In [None]:
bank_df_sample.head()

In [None]:
# Fetching train and test data
bank_df_train = bank_df_sample[bank_df_sample['sampleid'] == 1].drop('sampleid', axis=1)
bank_df_test = bank_df_sample[bank_df_sample['sampleid'] == 2].drop('sampleid', axis=1)

In [None]:
# train data shape
bank_df_train.shape

In [None]:
# test data shape
bank_df_test.shape

In [None]:
# train dataset
bank_df_train.head()

In [None]:
# test dataset
bank_df_test.head()

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Fitting AutoChurn </b>

<p style = 'font-size:16px;font-family:Arial'>AutoChurn is a dedicated AutoML pipeline designed specifically for churn prediction tasks. It automates the process of building, training, and evaluating models tailored to identify customer churn, streamlining the workflow for churn prediction use cases.</p>
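<p style = 'font-size:16px;font-family:Arial'>The AutoChurn instance created below uses <code>'MACRO-F1'</code> as its stopping metric. As a reminder of what that metric measures, the sketch below computes macro-averaged F1 in plain Python using the standard textbook definition: the F1 score is computed per class and the per-class scores are averaged with equal weight. The function <code>macro_f1</code> and the toy labels are hypothetical, and we assume rather than confirm that AutoChurn computes the metric the same way.</p>

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy example: 0 = stayed, 1 = exited
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

print(macro_f1(y_true, y_pred))   # 0.625 (class 0: 0.75, class 1: 0.5)
```

<p style = 'font-size:16px;font-family:Arial'>Because each class counts equally, macro-F1 does not let a large majority of non-churners hide poor performance on the smaller churner class.</p>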

In [None]:
# Creating AutoChurn instance
# Using 'MACRO-F1' as the early stopping metric with a threshold of 0.70,
# an early stopping timer threshold of 100 sec,
# and verbose level 2 for detailed logging

aml = AutoChurn(stopping_metric='MACRO-F1', 
                stopping_tolerance=0.70,
                max_runtime_secs=100,
                verbose=2)

In [None]:
# Fitting train data
aml.fit(bank_df_train, bank_df_train.Exited)

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>4. Methods of AutoChurn : </b>
<p></p>
<b style = 'font-size:18px;font-family:Arial'>4.1 Leaderboard :</b>

In [None]:
# Fetching Leaderboard
aml.leaderboard()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.2 Best Performing Model : </b>

In [None]:
# Fetching best performing model for dataset
aml.leader()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.3 Get Hyperparameters for Trained Models : </b>

In [None]:
aml.model_hyperparameters(rank=1)

In [None]:
aml.model_hyperparameters(rank=5)

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>5. Generate Prediction and Performance Metrics : </b>

In [None]:
# Generating prediction on test data
prediction = aml.predict(bank_df_test, rank=6)

In [None]:
# Printing prediction
prediction.head()

In [None]:
# Fetching performance metrics on test data
performance_metrics = aml.evaluate(bank_df_test, rank=6)

In [None]:
performance_metrics

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>6. Deploy the models in Database</b>
<p></p>
<p style = 'font-size:16px;font-family:Arial'>The deploy function saves models to the specified table name. If 'ranks' is provided, only the models at the specified ranks are saved, and their ranks are reassigned based on the order of the leaderboard. Models that are not specified are ignored.</p>
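<p style = 'font-size:16px;font-family:Arial'>The rank reassignment described above can be pictured with a short plain-Python sketch. It encodes our reading of that description: the selected models keep their relative leaderboard order and are renumbered from 1. The helper <code>reassign_ranks</code> is hypothetical and is not part of teradataml.</p>

```python
def reassign_ranks(selected_ranks):
    """Map each selected leaderboard rank to its new rank in the saved table."""
    ordered = sorted(set(selected_ranks))
    return {old: new for new, old in enumerate(ordered, start=1)}

# An explicit list of ranks
print(reassign_ranks([2, 5, 6, 9]))     # {2: 1, 5: 2, 6: 3, 9: 4}

# A range: range(8, 11) covers ranks 8, 9 and 10 (the end value is excluded)
print(reassign_ranks(range(8, 11)))     # {8: 1, 9: 2, 10: 3}
```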

In [None]:
aml.deploy(table_name='top_10_models', top_n=10)

In [None]:
aml.deploy(table_name='mixed_models', ranks=[2,5,6,9])

In [None]:
aml.deploy(table_name='range_models', ranks=range(8,11))

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>7. Cleanup</b>

<p style = 'font-size:16px;font-family:Arial'>Since the next notebook uses these model tables, we will not clean up any tables. We only remove the context, so that the deployed models remain available later.</p>

In [None]:
remove_context()

<hr style="height:2px;border:none">
<b style = 'font-size:20px;font-family:Arial'>8. Future use of models deployed by using AutoChurn</b>

<p style = 'font-size:16px;font-family:Arial'>In the steps above, we used the In-Database AutoChurn functionality to find the best model for predicting customer churn. These models were then deployed in the database for future use. As a follow-up, the next notebook shows different ways of accessing and using these deployed models.</p>


<p style = 'font-size:16px;font-family:Arial'>Click the button below to see the next steps for using these deployed models.</p>

<a href="Bank_Customer_Churn_Prediction_Load_Models.ipynb" style="display: inline-flex; align-items: center; justify-content: center; background-color: #007373; color: #FFFFFF; font-family: Arial, sans-serif; font-size: 16px; font-weight: bold; text-decoration: none; padding: 12px 24px; border: none; border-radius: 8px; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1); cursor: pointer; transition: all 0.3s ease;">
  LAUNCH Notebook for loading and using models
  <img src="https://img.icons8.com/ios-filled/50/ffffff/external-link.png" alt="External Link Icon" style="margin-left: 8px; width: 20px; height: 20px;">
</a>


<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>

- `Surname`: Surname
- `CreditScore`: Credit score
- `Geography`: Country (Germany / France / Spain)
- `Gender`: Gender (Female / Male)
- `Age`: Age
- `Tenure`: Number of years the customer has been associated with the bank
- `Balance`: Balance
- `NumOfProducts`: Number of bank products used
- `HasCrCard`: Credit card status (0 = No, 1 = Yes)
- `IsActiveMember`: Active membership status (0 = No, 1 = Yes)
- `EstimatedSalary`: Estimated salary
- `Exited`: Abandoned or not? (0 = No, 1 = Yes)



<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2026. All Rights Reserved
        </div>
    </div>
</footer>