<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       ClassificationEvaluator function in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>The ClassificationEvaluator function computes metrics to evaluate and compare multiple classification models and to summarize how close predictions are to their expected values. It takes the actual and predicted values of the dependent variable and calculates the specified metrics. Apart from accuracy, the secondary output table returns micro-, macro-, and weighted-averaged precision, recall, and F1 scores.<br>In this notebook, we will see how to use the ClassificationEvaluator function available in Vantage.</p>
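
<p style = 'font-size:16px;font-family:Arial'>To make the three averaging modes concrete, here is a minimal plain-Python sketch that computes them from observed and predicted labels. It is only an illustration using the common textbook definitions; the helper name <code>averaged_metrics</code> and the sample labels are made up for this example, and the exact definitions used by Vantage should be checked in the function reference.</p>

```python
from collections import Counter


def averaged_metrics(actual, predicted, labels):
    """Accuracy plus micro, macro and weighted precision/recall/F1 (illustrative)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for a, p in zip(actual, predicted):
        if a == p:
            tp[a] += 1  # correct prediction for class a
        else:
            fp[p] += 1  # class p was predicted but not observed
            fn[a] += 1  # class a was observed but not predicted

    def prf(t, f_pos, f_neg):
        # Precision, recall and F1 from raw counts; 0.0 when undefined
        precision = t / (t + f_pos) if (t + f_pos) else 0.0
        recall = t / (t + f_neg) if (t + f_neg) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(actual)  # number of observed rows per class
    total = len(actual)

    # Micro: pool the counts of all classes, then compute the metrics once
    micro = prf(sum(tp[c] for c in labels),
                sum(fp[c] for c in labels),
                sum(fn[c] for c in labels))
    # Macro: unweighted mean of the per-class metrics
    macro = tuple(sum(per_class[c][i] for c in labels) / len(labels)
                  for i in range(3))
    # Weighted: mean of the per-class metrics weighted by class support
    weighted = tuple(sum(per_class[c][i] * support[c] for c in labels) / total
                     for i in range(3))
    return {
        "accuracy": sum(tp[c] for c in labels) / total,
        "micro": micro,
        "macro": macro,
        "weighted": weighted,
    }


# Made-up labels, stored as strings like the labels used later in this notebook
observed = ["0", "0", "0", "0", "1", "1"]
predicted = ["0", "0", "0", "1", "1", "0"]
metrics = averaged_metrics(observed, predicted, labels=["0", "1"])
print(metrics)
```

<p style = 'font-size:16px;font-family:Arial'>On imbalanced data such as fraud labels, the macro average gives the rare class the same weight as the common class, while the micro and weighted averages are dominated by the common class.</p>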

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>

<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5
configure.val_install_location = 'val'

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=PP_CLassificationEvaluator_Python.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<hr style='height:1px;border:none;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but requires available storage. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker to the other statement.</p>

In [None]:
# %run -i ../../UseCases/run_procedure.py "call get_data('DEMO_GLM_Fraud_cloud');"        # Takes 1 minute
%run -i ../../UseCases/run_procedure.py "call get_data('DEMO_GLM_Fraud_local');"        # Takes 2 minutes

<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage. Check the shape of the DataFrame as well as the data types of its columns.</p>

In [None]:
txn_data = DataFrame(in_schema('DEMO_GLM_Fraud', 'transaction_data'))
print(txn_data.shape)
txn_data

<p style = 'font-size:16px;font-family:Arial'>The steps below perform the data engineering needed before creating a model.</p>

In [None]:
txn_data = txn_data.drop(['nameDest', 'nameOrig', 'isFlaggedFraud'], axis = 1)
txn_data

<p style = 'font-size:16px;font-family:Arial'>One-hot encode the <code>type</code> column, then split the dataset into training and test datasets.</p>
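
<p style = 'font-size:16px;font-family:Arial'>As a rough illustration of what one-hot encoding does to a single <code>type</code> value, consider the plain-Python sketch below. The helper <code>one_hot</code> and the output column names of the form <code>&lt;value&gt;_type</code> are assumptions made for this example; the names actually produced in Vantage may differ.</p>

```python
# Category values taken from the OneHotEncoder call in this notebook
TYPE_VALUES = ["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"]


def one_hot(value, categories=TYPE_VALUES):
    """Return one 0/1 indicator per category (hypothetical column names)."""
    return {f"{c}_type": int(value == c) for c in categories}


row = one_hot("PAYMENT")
print(row)
```

<p style = 'font-size:16px;font-family:Arial'>Each row ends up with exactly one indicator set to 1, so the categorical column can be used as numeric model input.</p>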

In [None]:
txn_type_encoder = OneHotEncoder(
    values = ["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"],
    columns = "type"
)

retain = Retain(
    columns = ['step', 'amount','newbalanceOrig','oldbalanceDest','newbalanceDest','oldbalanceOrig', 'isFraud']
)

obj = valib.Transform(
    data = txn_data,
    one_hot_encode = txn_type_encoder,
    retain = retain,
    index_columns = 'txn_id'
)
txn_trans = obj.result
txn_trans

In [None]:
TrainTestSplit_out = TrainTestSplit(
    data = txn_trans,
    id_column = "txn_id",
    train_size = 0.80,
    test_size = 0.20,
    seed = 25
)

df_train = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 1].drop(['TD_IsTrainRow'], axis = 1)
df_test = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 0].drop(['TD_IsTrainRow'], axis = 1)
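
<p style = 'font-size:16px;font-family:Arial'>The flag-then-filter pattern used above can be sketched in plain Python as follows. This is not how Vantage performs the split internally; the helper <code>flag_train_rows</code> is hypothetical and only mimics the idea of a seeded, reproducible 80/20 split marked by a 0/1 flag such as <code>TD_IsTrainRow</code>.</p>

```python
import random


def flag_train_rows(ids, train_size=0.80, seed=25):
    """Map each id to 1 (train) or 0 (test) using a seeded shuffle."""
    ids = list(ids)
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    shuffled = sorted(ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_size)
    train_ids = set(shuffled[:n_train])
    return {i: int(i in train_ids) for i in ids}


flags = flag_train_rows(range(1, 11))
train_rows = [i for i, flag in flags.items() if flag == 1]  # like TD_IsTrainRow == 1
test_rows = [i for i, flag in flags.items() if flag == 0]   # like TD_IsTrainRow == 0
print(len(train_rows), len(test_rows))
```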


<p style = 'font-size:16px;font-family:Arial'>Create an XGBoost model from the training dataset.</p>

In [None]:
# Use every column except the row identifier, the time step, and the response as model input
cols = df_train.columns
cols.remove('txn_id')
cols.remove('step')
cols.remove('isFraud')

In [None]:
XGBoost_out = XGBoost(
    data=df_train,
    input_columns=cols,
    response_column = 'isFraud',
    lambda1 = 120.0,
    model_type='Classification',
    seed=42,
    shrinkage_factor=0.1,
    max_depth=7
)
XGBoost_out.output_data

In [None]:
XGBoostPredict_out = XGBoostPredict(
    newdata=df_test,
    object=XGBoost_out.result,
    model_type='Classification',
    id_column='txn_id',
    object_order_column=['task_index', 'tree_num',
                       'iter', 'tree_order'],
    accumulate='isFraud',
    output_prob=True,
    output_responses=['0', '1']
).result
XGBoostPredict_out

<p style = 'font-size:16px;font-family:Arial'>Let's evaluate the model.<br>
Detailed help is available by passing the function name to the built-in <code>help</code> function.</p>

In [None]:
help(ClassificationEvaluator)

In [None]:
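# Cast Prediction to an integer type first, then cast both columns to VARCHAR
# so that their values can be compared with the string labels '0' and '1'
out = XGBoostPredict_out.assign(Prediction = XGBoostPredict_out.Prediction.cast(type_ = BYTEINT))
out = out.assign(Prediction = out.Prediction.cast(type_ = VARCHAR(2)),
                isFraud = out.isFraud.cast(type_ = VARCHAR(2)))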


In [None]:
ClassificationEvaluator_obj = ClassificationEvaluator(
    data = out,
    observation_column = 'isFraud',
    prediction_column = 'Prediction',
    labels = ['0', '1']
)

In [None]:
ClassificationEvaluator_obj.output_data.head(10)

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call remove_data('DEMO_GLM_Fraud');"        # Takes 10 seconds

In [None]:
remove_context()

<hr style="height:1px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>Dataset:</b>

- `txn_id`: transaction id
- `step`: maps a unit of time in the real world. In this case, 1 step is 1 hour. There are 744 steps in total (a 31-day simulation).
- `type`: CASH_IN, CASH_OUT, DEBIT, PAYMENT and TRANSFER
- `amount`: amount of the transaction in local currency
- `nameOrig`: customer who started the transaction
- `oldbalanceOrig`: customer's balance before the transaction
- `newbalanceOrig`: customer's balance after the transaction
- `nameDest`: customer who is the recipient of the transaction
- `oldbalanceDest`: recipient's balance before the transaction
- `newbalanceDest`: recipient's balance after the transaction
- `isFraud`: identifies a fraudulent transaction (1) or a non-fraudulent transaction (0)
- `isFlaggedFraud`: flags illegal attempts to transfer more than 200,000 in a single transaction
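
The relationship between `step` and calendar time described above (744 hourly steps, 31 days) can be checked with a small sketch. The helper `step_to_day_hour` is hypothetical, and numbering steps from 1 is an assumption that the dataset description does not state.

```python
HOURS_PER_DAY = 24
TOTAL_STEPS = 744  # from the dataset description: 31 days of hourly steps


def step_to_day_hour(step):
    """Convert a step to (day, hour-of-day), assuming steps start at 1."""
    if not 1 <= step <= TOTAL_STEPS:
        raise ValueError(f"step must be between 1 and {TOTAL_STEPS}")
    day, hour = divmod(step - 1, HOURS_PER_DAY)
    return day + 1, hour


print(TOTAL_STEPS // HOURS_PER_DAY)  # number of simulated days
print(step_to_day_hour(1), step_to_day_hour(744))
```
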
<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>ClassificationEvaluator function reference: <a href = 'https://docs.teradata.com/search/all?query=ClassificationEvaluator&content-lang=en-US'>here</a></li>
    
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>