<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       IVSM Banking Customer Churn Embed BYOM
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<hr style="height:2px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import warnings
warnings.filterwarnings('ignore')

import os
import pandas as pd

import teradataml as tdml
import getpass
from teradataml import in_schema
from teradataml import DecisionForest, XGBoost, TrainTestSplit, DecisionForestPredict, XGBoostPredict, SentimentExtractor, ColumnTransformer, ScaleFit, OneHotEncodingFit
from teradataml import ColumnSummary, AutoML, AutoClassifier
from teradataml import RoundColumns, ClassificationEvaluator, ROC
from teradataml import (
    DataFrame,
    create_context
)
from xgboost import XGBClassifier
from sklearn.pipeline import Pipeline
from nyoka import xgboost_to_pmml
from teradataml import save_byom,list_byom,retrieve_byom,delete_byom,PMMLPredict

In [None]:
tdml.configure.val_install_location = "val"

<hr style="height:2px;border:none">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_BankChurnIVSM_cloud');"  
%run -i ../run_procedure.py "call get_data('DEMO_BankChurnIVSM_local');"

<p style = 'font-size:18px;font-family:Arial'><b>1.1 Confirmation for functions</b></p>
<p style = 'font-size:16px;font-family:Arial'>Before starting, let us confirm that the required functions are installed.</p>

In [None]:
from IPython.display import display, Markdown

df_check= DataFrame.from_query('''select count(*) as cnt from dbc.tablesV where databasename = 'ivsm';''')
if df_check.get_values()[0][0] >= 10:
    print('Functions are installed, please continue.')
else:
    print('Functions are not installed. Please run the Initialization notebook before proceeding further.')
    display(Markdown("[Initialization Notebook](./IVSM_Banking_Customer_Churn_Model_Install.ipynb)"))

In [None]:
df = tdml.DataFrame('semantic_search_results')
df[df['reference_txt'] == 'Negative or Abusive comment']

In [None]:
df[df['reference_txt'] == 'Positive and Upbeat comment']

<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage.</p>
<p style = 'font-size:16px;font-family:Arial'><b><i>*Please scroll down to the end of the notebook for detailed column descriptions of the dataset.</i></b></p>

In [None]:
customer_churn = DataFrame(in_schema('DEMO_BankChurnIVSM', 'Bank_Churn'))
customer_churn

In [None]:
new_df = customer_churn.merge(df[['target_id','reference_txt']], on='customerid = target_id', how='inner')
new_df

In [None]:
new_df = new_df.drop('target_id',axis=1)

<hr style="height:2px;border:none">
<b style = 'font-size:20px;font-family:Arial'>2. Data Transformation</b>

<p style = 'font-size:18px;font-family:Arial'> <b>Define Column Categories</b> </p>
<p style = 'font-size:16px;font-family:Arial'>Specifies the target variable and categorizes input columns into numeric, categorical, binary, and identifier groups for preprocessing and modeling.</p>

In [None]:
target_variable = "Exited"
numeric_columns = ["Age", "Balance", "CreditScore", "EstimatedSalary", "Tenure"]
categorical_columns = ["Gender", "Geography", "reference_txt", "NumOfProducts"]
binary_columns = ["HasCrCard", "IsActiveMember"]
id_column = ["CustomerId"]

<p style = 'font-size:16px;font-family:Arial'>The <b>ScaleFit()</b> function computes the statistics that the ScaleTransform() function needs to scale the specified input DataFrame columns.</p>

In [None]:
fit1 = ScaleFit(data=new_df,
                target_columns=numeric_columns,
                scale_method="USTD",
                miss_value="KEEP",
                global_scale=False,
                multiplier="1")
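
<p style = 'font-size:16px;font-family:Arial'>To make the scaling step concrete, here is a minimal plain-Python sketch of what the "USTD" method is assumed to compute: each value is centered on the column mean and divided by the sample (N - 1) standard deviation, and missing values are left as they are (mirroring miss_value="KEEP"). The helper name <code>ustd_scale</code> and the sample ages are hypothetical and only for illustration; the real computation runs inside Vantage.</p>

```python
from statistics import mean, stdev

def ustd_scale(values):
    """Scale values as (x - mean) / sample standard deviation, leaving None untouched."""
    present = [v for v in values if v is not None]
    location = mean(present)
    scale = stdev(present)  # sample standard deviation (N - 1 denominator)
    return [None if v is None else (v - location) / scale for v in values]

# Hypothetical ages, including one missing value that is kept as None.
ages = [25, 35, 45, None, 55]
scaled_ages = ustd_scale(ages)
print(scaled_ages)
```

<p style = 'font-size:16px;font-family:Arial'>After scaling, the non-missing values have mean 0 and sample standard deviation 1, which puts columns such as Age and Balance on a comparable scale.</p>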

<p style = 'font-size:16px;font-family:Arial'>The <b>OneHotEncodingFit</b> function outputs a table of attributes and their categorical values. This table is the input to OneHotEncodingTransform, which encodes the values as one-hot numeric vectors.</p>

In [None]:
fit2 = OneHotEncodingFit(data=new_df,
                         is_input_dense=True,
                         approach="auto",
                         target_column=categorical_columns[0:3],
                         category_counts=[2,3,2])
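
<p style = 'font-size:16px;font-family:Arial'>The sketch below shows the idea behind one-hot encoding in plain Python: a categorical column is replaced by one 0/1 indicator column per category, named <code>column_index</code> in the same style as the Gender_0, Geography_1 columns used later in this notebook. The helper <code>one_hot_encode</code>, the sample rows, and the order in which categories are numbered are assumptions; the database function may assign indexes differently.</p>

```python
def one_hot_encode(rows, column, categories):
    """Return copies of rows with `column` replaced by 0/1 indicator columns."""
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for index, category in enumerate(categories):
            new_row[f"{column}_{index}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Hypothetical sample rows.
customers = [
    {"CustomerId": 1, "Geography": "France"},
    {"CustomerId": 2, "Geography": "Spain"},
]
geography_values = ["France", "Germany", "Spain"]  # assumed category ordering
encoded_customers = one_hot_encode(customers, "Geography", geography_values)
print(encoded_customers[1])
```

<p style = 'font-size:16px;font-family:Arial'>The three categories of Geography produce three indicator columns, which is why the corresponding entry in category_counts is 3.</p>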

<p style = 'font-size:16px;font-family:Arial'>The <b>ColumnTransformer</b> function transforms the entire dataset in a single operation. You only need
to provide the FIT tables to the function, and the function runs all the transformations that you require in a
single operation. Running all the FIT table transformations together in one go gives approximately 30% better performance than running each transformation sequentially.</p>

In [None]:
new_table = ColumnTransformer(input_data=new_df,
                             onehotencoding_fit_data=fit2.result,
                             scale_fit_data=fit1.output).result

In [None]:
new_table=new_table[['CustomerId', 'Age', 'Balance', 'CreditScore', 'EstimatedSalary', 'Exited', 'HasCrCard', 'IsActiveMember',
                     'NumOfProducts', 'Tenure', 'Gender_0', 'Gender_1', 'Geography_0', 'Geography_1', 'Geography_2',
                     'reference_txt_0', 'reference_txt_1']]

<hr style="height:2px;border:none">

<p style = 'font-size:20px;font-family:Arial'><b>3. Train-Test Split</b></p>

<p style = 'font-size:16px;font-family:Arial'>The <b>TrainTestSplit()</b> function divides the dataset into train and test subsets, which are used to train machine learning models and to evaluate them.<br>
Here, 80% of the rows are used for training and 20% for testing.</p>

In [None]:
TrainTestSplit_out = TrainTestSplit(data = new_table,
                                    id_column='CustomerId',
                                    train_size=0.80,
                                    test_size=0.20,
                                    seed=3432)

In [None]:
TrainTestSplit_out.result.head()

In [None]:
df_train = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 1].drop(['TD_IsTrainRow'], axis = 1)
df_test = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 0].drop(['TD_IsTrainRow'], axis = 1)

print("Training Set = " + str(df_train.shape[0]) + ". Testing Set = " + str(df_test.shape[0]))
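
<p style = 'font-size:16px;font-family:Arial'>As a rough illustration of how a seeded split yields a reproducible train/test flag per row, the sketch below shuffles a list of IDs with a fixed seed and marks the first 80% as training rows. The helper <code>flag_train_rows</code> and the sample IDs are hypothetical, and the sampling algorithm used by the database function is not the same as Python's shuffle; only the idea of a seed-controlled 1/0 flag (like TD_IsTrainRow) carries over.</p>

```python
import random

def flag_train_rows(ids, train_size=0.80, seed=3432):
    """Map each id to 1 (train) or 0 (test) using a seeded shuffle."""
    shuffled = sorted(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_size)
    train_ids = set(shuffled[:n_train])
    return {i: (1 if i in train_ids else 0) for i in ids}

# Hypothetical customer IDs.
customer_ids = list(range(1, 101))
flags = flag_train_rows(customer_ids)
n_train_rows = sum(flags.values())
print(n_train_rows, "train rows,", len(flags) - n_train_rows, "test rows")
```

<p style = 'font-size:16px;font-family:Arial'>Calling the helper again with the same seed returns the same flags, which is the property that makes the split repeatable.</p>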

In [None]:
df_test.head()

In [None]:
tdml.copy_to_sql(df_train, table_name = 'clean_data_train', if_exists = 'replace')
tdml.copy_to_sql(df_test, table_name = 'clean_data_test', if_exists = 'replace')

In [None]:
df_train = tdml.DataFrame(in_schema('demo_user','clean_data_train'))

In [None]:
df_test = tdml.DataFrame(in_schema('demo_user','clean_data_test'))

<p style = 'font-size:18px;font-family:Arial'> <b>3.1 Split Features and Target</b> </p>
<p style = 'font-size:16px;font-family:Arial'>Separates feature columns and target labels for both training and test datasets, keeping CustomerId for reference and including encoded categorical and semantic features.</p>



In [None]:
df_train_features = df_train[['CustomerId', 'Age', 'Balance', 'CreditScore', 'EstimatedSalary', 
                              'HasCrCard', 'IsActiveMember', 'NumOfProducts','Tenure', 
                              'Gender_0', 'Gender_1', 'Geography_0', 'Geography_1', 
                              'Geography_2', 'reference_txt_0','reference_txt_1']]

df_train_target = df_train[['CustomerId', 'Exited']]
df_test_features = df_test[['CustomerId', 'Age', 'Balance', 'CreditScore', 'EstimatedSalary', 
                              'HasCrCard', 'IsActiveMember', 'NumOfProducts','Tenure', 
                              'Gender_0', 'Gender_1', 'Geography_0', 'Geography_1', 
                              'Geography_2', 'reference_txt_0','reference_txt_1']]

df_test_target = df_test[['CustomerId', 'Exited']]

In [None]:
tdml.copy_to_sql(df_train_features, table_name = 'xgb_train_features', if_exists = 'replace', schema_name = 'demo_user')
tdml.copy_to_sql(df_train_target, table_name = 'xgb_train_target', if_exists = 'replace', schema_name = 'demo_user')
tdml.copy_to_sql(df_test_features, table_name = 'xgb_test_features', if_exists = 'replace', schema_name = 'demo_user')
tdml.copy_to_sql(df_test_target, table_name = 'xgb_test_target', if_exists = 'replace', schema_name = 'demo_user')

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'> <b>4. Grant Access to ModelOps</b> </p>
<p style = 'font-size:16px;font-family:Arial'>Grants SELECT permissions on the training, test, and clean data tables to modelops, allowing model deployment processes to access the data.</p>

In [None]:
SQL = ['''grant select on demo_user.xgb_train_features to modelops with grant option;''',
       '''grant select on demo_user.xgb_train_target to modelops with grant option;''',
       '''grant select on demo_user.xgb_test_features to modelops with grant option;''',
       '''grant select on demo_user.xgb_test_target to modelops with grant option;''',
       '''grant select on demo_user.clean_data_train to modelops with grant option;''',
       '''grant select on demo_user.clean_data_test to modelops with grant option;'''       
      ]

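for statement in SQL:
    try:
        tdml.execute_sql(statement)
    except Exception as err:
        # Report the failure and continue, so one failed grant does not stop the notebook.
        print(f"Grant skipped: {err}")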

In [None]:
train_pdf = df_train.to_pandas(all_rows=True)

features = cols = ['Age', 'Balance', 'CreditScore', 'EstimatedSalary', 'HasCrCard', 'IsActiveMember', 'NumOfProducts',
                   'Tenure', 'Gender_0', 'Gender_1', 'Geography_0', 'Geography_1', 'Geography_2', 'reference_txt_0',
                   'reference_txt_1']
target = "Exited"

# split data into X and y
X_train = train_pdf[features]
y_train = train_pdf[target]

model = Pipeline([('xgb', XGBClassifier(n_estimators=5, max_depth=10))])

model.fit(X_train, y_train)
#database = 'modelops'

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>5. Convert the model to PMML</b></p>
<p style = 'font-size:16px;font-family:Arial'>You can use either the sklearn2pmml or the nyoka Python library to convert the model to PMML. nyoka is a Python-only package, so it is the preferred choice here.</p>

In [None]:
xgboost_to_pmml(
    pipeline=model, 
    col_names=cols, 
    target_name='Exited', 
    pmml_f_name="xgb_model.pmml")
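
<p style = 'font-size:16px;font-family:Arial'>Before saving a PMML file to Vantage, it can help to check which fields it declares, since these names need to match the columns of the table you score. The sketch below uses only the Python standard library to list the DataField names in a PMML document. The helper <code>pmml_field_names</code> is hypothetical, and the embedded sample is a heavily trimmed, made-up PMML fragment; a file written by nyoka contains much more.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PMML document used only for illustration.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary numberOfFields="3">
    <DataField name="Age" optype="continuous" dataType="double"/>
    <DataField name="Balance" optype="continuous" dataType="double"/>
    <DataField name="Exited" optype="categorical" dataType="integer"/>
  </DataDictionary>
</PMML>"""

def pmml_field_names(pmml_text):
    """Return the DataField names declared in a PMML document, ignoring the XML namespace."""
    root = ET.fromstring(pmml_text)
    return [
        element.get("name")
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "DataField"
    ]

field_names = pmml_field_names(SAMPLE_PMML)
print(field_names)
```

<p style = 'font-size:16px;font-family:Arial'>To inspect the file produced above, you could pass the text of xgb_model.pmml to the same helper instead of the sample string.</p>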

In [None]:
tdml.configure.byom_install_location = "mldb"

In [None]:
try:
    save_byom("xgb_model",
              "xgb_model.pmml",
              "byom_models",
              additional_columns={},
              schema_name='modelops'
             )
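except Exception:
    # Saving failed, most likely because the model ID already exists: delete it and save again.
    delete_byom(model_id="xgb_model", table_name="byom_models", schema_name = 'modelops')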
    save_byom("xgb_model",
              "xgb_model.pmml",
              "byom_models",
              additional_columns={},
              schema_name='modelops'
    )

In [None]:
list_byom(table_name="byom_models", schema_name="modelops")

In [None]:
result = PMMLPredict(
    modeldata = retrieve_byom("xgb_model", "byom_models", schema_name="modelops"),
    newdata = df_test,
    accumulate = ['CustomerId'],
    overwrite_cached_models = '*',
)

print(result.show_query())

result.result
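
<p style = 'font-size:16px;font-family:Arial'>The prediction output is expected to carry the model details as a JSON string per row. Assuming that string contains per-class probabilities, the sketch below shows one way to pull a single probability out of it with the standard library. The helper <code>churn_probability</code>, the sample string, and the key names (<code>probability_0</code>, <code>probability_1</code>, <code>predicted_Exited</code>) are all assumptions; inspect the actual output above for the real keys before relying on them.</p>

```python
import json

def churn_probability(json_report, key="probability_1"):
    """Extract one probability from a JSON report string; return None if the key is absent."""
    report = json.loads(json_report)
    value = report.get(key)
    return None if value is None else float(value)

# Hypothetical report string; check the real output for the actual key names.
sample_report = '{"predicted_Exited": 0, "probability_0": 0.82, "probability_1": 0.18}'
print(churn_probability(sample_report))
```

<p style = 'font-size:16px;font-family:Arial'>Returning None for a missing key, rather than raising an error, makes it easier to spot a wrong key name when applying the helper to many rows.</p>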

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'> <b>6. Clean up </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code removes the context, which closes the connection to Vantage.</p>

In [None]:
tdml.remove_context()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>

- `Unnamed`: Unnamed
- `CustomerId`: Customer ID
- `Surname`: Surname
- `CreditScore`: Credit score
- `Geography`: Country (Germany / France / Spain)
- `Gender`: Gender (Female / Male)
- `Age`: Age
- `Tenure`: Number of years the customer has been associated with the bank
- `Balance`: Balance
- `NumOfProducts`: Number of bank products used
- `HasCrCard`: Credit card status (0 = No, 1 = Yes)
- `IsActiveMember`: Active membership status (0 = No, 1 = Yes)
- `EstimatedSalary`: Estimated salary
- `Exited`: Whether the customer left the bank (0 = No, 1 = Yes)

<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>ScaleFit reference: <a href = 'https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/teradataml-Analytic-Database-17.20.xx-Analytic-Functions/FEATURE-ENGINEERING-TRANSFORM-functions/ScaleFit'>here</a></li>
    <li>OneHotEncodingFit reference: <a href = 'https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/teradataml-Analytic-Database-17.20.xx-Analytic-Functions/FEATURE-ENGINEERING-TRANSFORM-functions/OneHotEncodingFit'>here</a></li>
    <li>ColumnTransformer reference: <a href = 'https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/teradataml-Analytic-Database-17.20.xx-Analytic-Functions/FEATURE-ENGINEERING-TRANSFORM-functions/ColumnTransformer'>here</a></li>
    <li>TrainTestSplit reference: <a href = 'https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/teradataml-Analytic-Database-17.20.xx-Analytic-Functions/MODEL-EVALUATION-functions/TrainTestSplit'>here</a></li>
    <li>PMMLPredict reference: <a href = 'https://docs.teradata.com/r/Enterprise/Teradata-Package-for-Python-Function-Reference-17.20/teradataml-Bring-Your-Own-Analytics/PMMLPredict'>here</a></li>
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid ">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>