<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Financial Fraud Detection using AutoFraud
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial;'>
    In recent years, fraud attempts have increased sharply, making fraud detection imperative for banking and financial institutions. Despite extensive detection efforts and human supervision, hundreds of millions of dollars are lost to fraud. Fraud takes many forms, e.g., stolen credit cards, misleading accounting, and phishing emails. Because fraudulent cases make up only a small fraction of a very large number of transactions, detecting them has become more and more challenging.
    <br>
    <br>
    With ClearScape Analytics, data scientists can use their preferred language, tools and platform to develop models to identify this fraud. Even in large scale operations, users have the guarantee that Vantage can scale to their needs and reduce fraud.</p>
    
<p style = 'font-size:18px;font-family:Arial;'><b>Business Values</b></p>
<ul style = 'font-size:16px;font-family:Arial;'>
    <li>Identification of financial fraud in multiple accounts</li>
    <li>Pattern recognition of fraudulent versus normal transactions</li>
    <li>Reduction of money lost due to recovering fraudulent charges</li>
    <li>Improved customer satisfaction and reduction of customer churn</li>
</ul>

<p style = 'font-size:18px;font-family:Arial;'><b>Why Vantage?</b></p>
<p style = 'font-size:16px;font-family:Arial;'>To maximize the business value of advanced analytic techniques including Machine Learning and Artificial Intelligence, it is estimated that organizations must scale their model development and deployment pipelines to 100s or 1000s of times greater amounts of data, models, or both.
    <br>
    <br>
    ClearScape Analytics provides powerful, flexible end-to-end data connectivity, feature engineering, model training, evaluation, and operational functions that can be deployed at scale as enterprise data assets; treating the products of ML and AI as first-class analytic processes in the enterprise.<br><b>AutoFraud </b>is a dedicated AutoML pipeline designed specifically for fraud detection 
    tasks. It automates the process of building, training, and evaluating models 
    tailored to identify fraudulent activities, streamlining the workflow for 
    fraud detection use cases.</p>

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>1. Configuring the Environment</b>
<p style = 'font-size:16px;font-family:Arial;'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
# Standard Libraries
import os
import getpass
import warnings
warnings.filterwarnings("ignore")

# Teradata Libraries
from teradataml import *


<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>2. Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username = 'demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql("SET query_band='DEMO=Financial_Fraud_Detection_using_AutoFraud_Python.ipynb;' UPDATE FOR SESSION;")

<p style = 'font-size:16px;font-family:Arial;'>We begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:20px;font-family:Arial;'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;'>We have provided data for this demo on cloud storage. We have the option of either running the demo using foreign tables to access the data without using any storage on our environment or downloading the data to local storage, which may yield somewhat faster execution. However, we need to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_GLM_Fraud_cloud');"        # Takes 1 minute
%run -i ../run_procedure.py "call get_data('DEMO_GLM_Fraud_local');"        # Takes 2 minutes

<p style = 'font-size:16px;font-family:Arial;'>Optional step – We should execute the below step only if we want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>3. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial;'>We loaded the data from <a href = 'https://www.kaggle.com/code/georgepothur/4-financial-fraud-detection-xgboost/data'>https://www.kaggle.com/code/georgepothur/4-financial-fraud-detection-xgboost/data</a> into Vantage in a table named "transaction_data". In the next cell, we check the data size and print sample rows; the table has 63k rows and 12 columns.</p>
<p style = 'font-size:16px;font-family:Arial;'><b><i>*Please scroll down to the end of the notebook for detailed column descriptions of the dataset.</i></b></p>

In [None]:
fraud_df = DataFrame(in_schema('DEMO_GLM_Fraud', 'transaction_data'))

print(fraud_df.shape)
fraud_df

<p style = 'font-size:16px;font-family:Arial;'>In this simulated scenario, deceptive agents engage in transactions with the objective of taking control of customers' accounts, transferring funds to another account, and ultimately cashing out for profit.<br> The AutoFraud ML pipeline performs most of the tasks itself: we only have to provide the dataset, and it handles Feature Exploration, Feature Engineering, Data Preparation, Model Training, and Evaluation.</p>
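
<p style = 'font-size:16px;font-family:Arial;'>Before fitting, the cells that follow split the data 80/20 by calling <b>sample</b> with two fractions and then filtering on the resulting <b>sampleid</b> column (1 for train, 2 for test). As a rough illustration of that idea only, here is a plain-Python analogue. The function name <b>split_by_fraction</b> and the integer rows are hypothetical, and this is not how Vantage samples internally.</p>

```python
import random


def split_by_fraction(rows, fractions, seed=0):
    """Tag each row with a 1-based sample id, then group rows by that id.

    Each row lands in at most one group, so the groups never overlap.
    """
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    groups = {}
    start = 0
    for sample_id, frac in enumerate(fractions, start=1):
        size = round(frac * len(rows))
        groups[sample_id] = shuffled[start:start + size]
        start += size
    return groups


rows = list(range(1000))  # hypothetical stand-in for transaction rows
groups = split_by_fraction(rows, [0.8, 0.2])
train, test = groups[1], groups[2]
print(len(train), len(test))
```

<p style = 'font-size:16px;font-family:Arial;'>Keeping the two groups disjoint matters because any row shared between train and test would inflate the evaluation metrics.</p>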

In [None]:
# Copy the type column to payment_type because TYPE is a reserved word in Teradata (the original column is dropped in the next cell)
fraud_df = fraud_df.assign(payment_type = fraud_df.type)

In [None]:
#dropping extra columns
fraud_df = fraud_df.drop(['type','isFlaggedFraud','txn_id'], axis=1)

In [None]:
#creating train and test datasets
fraud_df_sample = fraud_df.sample(frac = [0.8, 0.2])

In [None]:
fraud_df_train= fraud_df_sample[fraud_df_sample['sampleid'] == 1].drop('sampleid', axis=1)
fraud_df_test = fraud_df_sample[fraud_df_sample['sampleid'] == 2].drop('sampleid', axis=1)

In [None]:
fraud_df_test.shape

In [None]:
fraud_df_train.shape

In [None]:
fraud_df_train.head()

In [None]:
fraud_df_test.head()

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>4. Fitting AutoFraud</b>
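
<p style = 'font-size:16px;font-family:Arial;'>The configuration below uses MACRO-F1 as the stopping metric. A small worked example may help show why a macro-averaged F1 is a reasonable choice when fraud cases are rare. This is a plain-Python sketch on hypothetical toy labels, not AutoFraud's internal computation: it compares accuracy with macro-F1 for a model that never predicts fraud.</p>

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over the true labels."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 98 legitimate (0) and 2 fraudulent (1) transactions.
y_true = [0] * 98 + [1] * 2
# A useless model that never predicts fraud.
always_legit = [0] * 100

acc = accuracy(y_true, always_legit)
mf1 = macro_f1(y_true, always_legit)
print(f"accuracy={acc:.3f}, macro-F1={mf1:.3f}")
```

<p style = 'font-size:16px;font-family:Arial;'>Accuracy looks excellent here because the majority class dominates, while macro-F1 is pulled down by the zero score on the fraud class, which is the class we care about.</p>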

In [None]:
#help(AutoFraud)

In [None]:
fd2 = AutoFraud(verbose=2,
                stopping_metric="MACRO-F1",
                stopping_tolerance=0.7,
                max_runtime_secs=100)

In [None]:
fd2.fit(fraud_df_train, 'isFraud')

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>5. Leaderboard</b>

In [None]:
fd2.leaderboard() 

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>6. Best Performing Model</b>

In [None]:
fd2.leader()

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>7. Get Hyperparameters for Trained Models</b>

In [None]:
fd2.model_hyperparameters(rank=1)

In [None]:
fd2.model_hyperparameters(rank=2)

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>8. Generate Prediction and Performance Metrics</b>

In [None]:
prediction = fd2.predict(fraud_df_test)

In [None]:
prediction.head()

In [None]:
performance_metrics = fd2.evaluate(fraud_df_test)

In [None]:
performance_metrics

In [None]:
prediction2 = fd2.predict(fraud_df_test, rank=9)

In [None]:
performance_metrics2 = fd2.evaluate(fraud_df_test, rank=9)

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>9. Deploy Models</b>
<p style = 'font-size:16px;font-family:Arial;'>The deploy function saves the top-ranked models (here, the top 5) to the specified table.</p>

In [None]:
fd2.deploy(table_name='top_5_models', top_n=5)

<p style = 'font-size:18px;font-family:Arial;'><b>Conclusion</b></p>

<p style = 'font-size:16px;font-family:Arial;'>In this demonstration, we presented a simplified yet end-to-end view of implementing a typical fraud detection machine learning workflow entirely within Teradata Vantage using the teradataml AutoFraud pipeline. By executing the full workflow in-database, we are able to fully leverage Vantage’s operational scale, performance, and enterprise-grade stability, enabling efficient and reliable fraud analytics without data movement.</p>

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial;'>10. Cleanup</b>

<p style = 'font-size:18px;font-family:Arial;'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;'>We need to clean up our work tables to prevent errors next time.</p>

In [None]:
tables = ['top_5_models']

# Loop through the list of tables and execute the drop table command for each table
for table in tables:
    try:
        db_drop_table(table_name = table)
    except Exception:
        # Ignore failures such as the table not existing
        pass

<p style = 'font-size:18px;font-family:Arial;'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;'>We will use the following code to clean up tables and databases created for this demonstration.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('demo_glm_fraud');"        # Takes 5 seconds

In [None]:
remove_context()

<hr style="height:2px;border:none;">

<b style = 'font-size:20px;font-family:Arial;'>Dataset:</b>

- `txn_id`: transaction id
- `step`: maps a unit of time in the real world. In this case, 1 step is 1 hour of time. The simulation covers 744 steps (31 days).
- `type`: CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER
- `amount`: amount of the transaction in local currency
- `nameOrig`: customer who started the transaction
- `oldbalanceOrig`: customer's balance before the transaction
- `newbalanceOrig`: customer's balance after the transaction
- `nameDest`: customer who is the recipient of the transaction
- `oldbalanceDest`: recipient's balance before the transaction
- `newbalanceDest`: recipient's balance after the transaction
- `isFraud`: identifies a fraudulent transaction (1) or a non-fraudulent one (0)
- `isFlaggedFraud`: flags illegal attempts to transfer more than 200,000 in a single transaction
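
<p style = 'font-size:16px;font-family:Arial;'>Two of the column descriptions above are simple rules that can be written out directly. The sketch below is illustrative only and rests on two assumptions not stated in the dataset description: that step 1 corresponds to hour 0 of day 1, and that the 200,000 threshold applies only to transactions of type TRANSFER. The helper names are hypothetical.</p>

```python
HOURS_PER_DAY = 24
FLAG_THRESHOLD = 200_000  # from the isFlaggedFraud description


def step_to_day_hour(step):
    """Convert a 1-based step (1 step = 1 hour) to (day, hour_of_day).

    Assumes step 1 is hour 0 of day 1; the source does not state the origin.
    """
    day, hour = divmod(step - 1, HOURS_PER_DAY)
    return day + 1, hour


def exceeds_flag_threshold(txn_type, amount):
    """Assumed reading of the rule: a TRANSFER above the threshold."""
    return txn_type == "TRANSFER" and amount > FLAG_THRESHOLD


print(step_to_day_hour(1))
print(step_to_day_hour(744))
print(exceeds_flag_threshold("TRANSFER", 250_000.0))
```

<p style = 'font-size:16px;font-family:Arial;'>Under these assumptions, the last step (744) falls in the final hour of day 31, which matches the 31-day simulation length.</p>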

<p style = 'font-size:18px;font-family:Arial;'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial;'>
    <li>Uses a dataset and feature discovery methods outlined here: <a href = 'https://www.kaggle.com/georgepothur/4-financial-fraud-detection-xgboost/notebook'>https://www.kaggle.com/georgepothur/4-financial-fraud-detection-xgboost/notebook</a></li>
    <li>Teradataml AutoFraud Python reference: <a href = 'https://docs.teradata.com/search/all?query=AutoFraud&content-lang=en-US'>here</a></li>
</ul>

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2026. All Rights Reserved
        </div>
    </div>
</footer>