<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       TargetEncodingFit and TargetEncodingTransform Functions in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>The TargetEncodingFit function encodes each category of a categorical
    variable with the likelihood or expected value of the target variable for that
    category. This technique works for both binary classification and regression.
    For multiclass classification, a similar technique encodes the categorical
    variable with k new variables, where k is the number of classes.<br>
    The TargetEncodingTransform function takes the input data and the fit output
    generated by the TargetEncodingFit function and uses it to encode the
    categorical values.
<br> This notebook shows how to use the TargetEncodingFit and TargetEncodingTransform functions available in Vantage.</p>
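
<p style = 'font-size:16px;font-family:Arial'>To make the idea concrete, the following is a minimal sketch of plain (unsmoothed) target encoding in standard Python. It runs outside Vantage and is not the computation that TargetEncodingFit performs. The function names and the toy data, including the <code>segment</code> label, are invented for illustration.</p>

```python
from collections import defaultdict


def mean_target_encoding(categories, targets):
    """Return {category: mean of the target values observed for that category}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def multiclass_target_encoding(categories, labels):
    """Return one mapping per class: {class: {category: share of rows with that class}}."""
    classes = sorted(set(labels))
    return {
        cls: mean_target_encoding(categories, [1 if label == cls else 0 for label in labels])
        for cls in classes
    }


# Toy data, invented for illustration (not rows from the demo table).
geography = ["France", "Spain", "France", "Germany", "Germany", "France"]
exited = [0, 0, 1, 1, 1, 0]

# Binary target: each country is replaced by its observed churn rate.
binary_encoding = mean_target_encoding(geography, exited)
encoded_geography = [binary_encoding[g] for g in geography]

# A three-class label gives k = 3 mappings, hence three encoded columns.
segment = ["low", "mid", "high", "high", "mid", "low"]
multiclass_encoding = multiclass_target_encoding(geography, segment)

print(binary_encoding)
print(sorted(multiclass_encoding))
```

<p style = 'font-size:16px;font-family:Arial'>In the binary case each category maps to a single number. In the multiclass case the same category maps to one number per class, which is why k new variables are produced.</p>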

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>

In [None]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=PP_TargetEncodingFitandTransform_Python.ipynb;' UPDATE FOR SESSION; ''')

<hr style='height:1px;border:none;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but consumes available storage. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker to the other statement.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call get_data('DEMO_BankChurn_cloud');"        # Takes 30 seconds
#%run -i ../../UseCases/run_procedure.py "call get_data('DEMO_BankChurn_local');" 

In [None]:
%run -i ../../UseCases/run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage. Check the shape of the DataFrame and the data type of each of its columns.</p>

In [None]:
tdf = DataFrame(in_schema("DEMO_BankChurn", "customer_churn"))
print("Shape of the data: ", tdf.shape)
tdf

In [None]:
tdf.tdtypes

<p style = 'font-size:16px;font-family:Arial'>A bank wants to analyze how customer demographics and financial behavior impact churn. Traditional encoding methods for categorical variables (like Geography and Gender) may not capture the true relationship between these features and churn. To address this, we apply Target Encoding, which replaces categorical values with statistically meaningful numerical representations based on the churn rate (Exited).
</p>

In [None]:
# Identify categorical columns that need encoding
categorical_columns = ["Geography", "Gender"]

In [None]:
# Generate categorical summary to find distinct values and counts
categorical_sum = CategoricalSummary(data=tdf, target_columns=categorical_columns)

# Display distinct values and counts
categorical_sum.result

In [None]:
# Extract required category count data
category_data = categorical_sum.result.groupby('ColumnName').count()
category_data = category_data.assign(
    drop_columns=True,
    ColumnName=category_data.ColumnName,
    CategoryCount=category_data.count_DistinctValue
)
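
<p style = 'font-size:16px;font-family:Arial'>The cell above reduces the categorical summary to one row per column, holding the number of distinct categories in that column. The following is a small sketch of the same bookkeeping in standard Python, using invented rows rather than the demo table. The names <code>toy_rows</code> and <code>toy_columns</code> are hypothetical.</p>

```python
# Toy rows, invented for illustration (not rows from the demo table).
toy_rows = [
    {"Geography": "France", "Gender": "Female"},
    {"Geography": "Spain", "Gender": "Male"},
    {"Geography": "Germany", "Gender": "Female"},
    {"Geography": "France", "Gender": "Male"},
]
toy_columns = ["Geography", "Gender"]

# For each categorical column, count how many distinct values it contains.
category_count = {
    column: len({row[column] for row in toy_rows})
    for column in toy_columns
}

for column, count in category_count.items():
    print(column, count)
```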

In [None]:
help(TargetEncodingFit)

In [None]:
# Generate target encoding mappings using 'Exited' column as response
fit_result = TargetEncodingFit(
    data=tdf,
    category_data=category_data,
    encoder_method='CBM_BETA',  # Choosing CBM_BETA method
    target_columns=categorical_columns,
    response_column='Exited',  # Churn indicator
    default_values=[-1, -2]  # Default encoding for unknown categories
)
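
<p style = 'font-size:16px;font-family:Arial'>The method name CBM_BETA is read here as a conjugate Bayesian model with a Beta prior. Under that assumption, the general idea can be sketched as the posterior mean of a churn rate, which pulls the raw rate of small categories toward the prior. This is a textbook illustration in standard Python, not the exact computation Vantage performs. The function name, the prior values, and the toy counts are all invented for this sketch.</p>

```python
def beta_posterior_mean(positives, total, alpha_prior=1.0, beta_prior=1.0):
    """Posterior mean of a Bernoulli rate under a Beta(alpha_prior, beta_prior) prior."""
    return (alpha_prior + positives) / (alpha_prior + beta_prior + total)


# Toy counts, invented for illustration: (churned customers, all customers) per category.
toy_counts = {"France": (1, 3), "Spain": (0, 1), "Germany": (2, 2)}

# Raw churn rate per category, with no smoothing.
raw_rate = {cat: pos / n for cat, (pos, n) in toy_counts.items()}

# Smoothed rate: the prior acts like extra pseudo-observations.
smoothed = {cat: beta_posterior_mean(pos, n) for cat, (pos, n) in toy_counts.items()}

for cat in toy_counts:
    print(cat, round(raw_rate[cat], 3), round(smoothed[cat], 3))
```

<p style = 'font-size:16px;font-family:Arial'>With a Beta(1, 1) prior, a category seen only once no longer encodes to exactly 0 or 1, which makes the encoding less sensitive to rare categories.</p>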

<p style = 'font-size:16px;font-family:Arial'>TargetEncodingTransform replaces categorical values with numerical target encoded values.
</p>
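
<p style = 'font-size:16px;font-family:Arial'>Conceptually, the transform step is a lookup: each category is replaced by its fitted value, and a category that was not seen during fitting receives a default. The sketch below shows that lookup in standard Python. It assumes that each entry of <code>default_values</code> pairs with the target column in the same position (so -1 for Geography), and the mapping values are invented rather than taken from the fit output.</p>

```python
def apply_encoding(values, mapping, default):
    """Replace each category by its encoded value, or by `default` if it is unknown."""
    return [mapping.get(value, default) for value in values]


# Hypothetical fitted mapping; the numbers are invented for illustration.
geography_mapping = {"France": 0.16, "Germany": 0.32, "Spain": 0.17}

# "Italy" does not appear in the mapping, so it receives the default value.
encoded = apply_encoding(["France", "Italy", "Spain"], geography_mapping, default=-1)
print(encoded)
```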

In [None]:
help(TargetEncodingTransform)

In [None]:
# Apply the encoding transformation
transformed_data = TargetEncodingTransform(
    data=tdf,
    object=fit_result,
    accumulate=["CustomerId", "CreditScore", "Age", "Balance", "Exited"]  # Keeping relevant columns
)

<p style = 'font-size:16px;font-family:Arial'>The output shows the transformed dataset, in which Geography and Gender are numerically encoded based on customer churn. These columns can now be used as numeric features in machine learning models.
</p>

In [None]:
# Display transformed dataset
transformed_data.result

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>Databases and Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call remove_data('DEMO_BankChurn');"        # Takes 10 seconds

In [None]:
remove_context()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>Dataset:</b>

- `RowNumber`: Row index
- `CustomerId`: Unique customer ID
- `Surname`: Customer's surname
- `CreditScore`: Credit score of the customer
- `Geography`: Country (Germany / France / Spain)
- `Gender`: Gender (Male / Female)
- `Age`: Age of the customer
- `Tenure`: Number of years the customer has been associated with the bank
- `Balance`: Account balance
- `NumOfProducts`: Number of bank products used
- `HasCrCard`: Credit card status (0 = No, 1 = Yes)
- `IsActiveMember`: Active membership status (0 = No, 1 = Yes)
- `EstimatedSalary`: Estimated salary of the customer
- `Exited`: Customer churn status (0 = No, 1 = Yes)

<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>TargetEncodingFit function reference: <a href = 'https://docs.teradata.com/search/all?query=TargetEncodingFit&value-filters=prodname~%2522Teradata+Package+for+Python%2522*vrm_release~%252220.00.00.03%2522&content-lang=en-US&_gl=1*3a7qi*_gcl_aw*R0NMLjE3MzMyMDc4MjguRUFJYUlRb2JDaE1JeVpYM3BQNktpZ01WSWpLREF4MmluUmowRUFBWUFTQUFFZ0tSRVBEX0J3RQ..*_gcl_au*MTM2MDk0NzQ4OS4xNzM3NTI3NTA5*_ga*NTU2MTUwNDQ1LjE2OTM4MDU3NjE.*_ga_7PE2TMW3FE*MTczOTE2Nzc1NS4xNTUuMS4xNzM5MTY3ODI1LjYwLjAuMA..'>here</a></li>
    <li>TargetEncodingTransform function reference: <a href = 'https://docs.teradata.com/search/all?query=TargetEncodingTransform&value-filters=prodname~%2522Teradata+Package+for+Python%2522*vrm_release~%252220.00.00.03%2522&content-lang=en-US&_gl=1*3a7qi*_gcl_aw*R0NMLjE3MzMyMDc4MjguRUFJYUlRb2JDaE1JeVpYM3BQNktpZ01WSWpLREF4MmluUmowRUFBWUFTQUFFZ0tSRVBEX0J3RQ..*_gcl_au*MTM2MDk0NzQ4OS4xNzM3NTI3NTA5*_ga*NTU2MTUwNDQ1LjE2OTM4MDU3NjE.*_ga_7PE2TMW3FE*MTczOTE2Nzc1NS4xNTUuMS4xNzM5MTY3ODI1LjYwLjAuMA..'>here</a></li>
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>