<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       ModelOps : In-Database DecisionForest using Git for Telco Churn
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style ='font-size:18px;font-family:Arial;color:#00233C'><b>Introduction</b></p>

<p style ='font-size:16px;font-family:Arial;color:#00233C'>This notebook is part of the Teradata End-to-End Telco Customer Churn use case and should be executed only after the Traditional Approach notebook has been executed.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>In this notebook we will go through the process of working with ClearScape Analytics in-database functions in ModelOps. With in-database analytics you can address scalability challenges by using Vantage to train and score your models. Whether you have a large volume of data or want to avoid moving data out of Vantage to train models elsewhere, you can use ModelOps to manage your catalog of models from multiple platforms, including in-database algorithms.<br>To learn more about in-database algorithms, review the official Teradata documentation.</p>
 
<p style='font-size:16px;font-family:Arial;color:#00233C'>This notebook covers the operationalization of the Telco Customer Churn use case with Python using the Teradata in-database DecisionForest model. <strong>DecisionForest</strong> is an ensemble algorithm that trains many decision trees on the data and combines their predictions, and it can be used for both classification and regression problems.</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<p><b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Configure the Environment</b></p>


<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.1 Set up Git repository</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We need to set up a Git repository in order to use the in-database functions in the ModelOps cycle.</p>

<div class="alert alert-block alert-warning">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>If the <b>Git repository</b> for your user is not set up, please refer to the notebook located at</i> <b>ModelOps/06_ModelOps_GIT_Project_Setup.ipynb</b> and set up the Git repository using the steps described there. This is required to execute the steps below, which create the ModelOps cycle for this End-to-End demo.</p>
    <a href="./06_ModelOps_GIT_Project_Setup.ipynb" style="font-size:16px; font-family:Arial; color:white; background-color:#017373; padding:10px 20px; border-radius:5px; text-decoration:none; display:inline-block;">
  06_ModelOps_GIT_Project_Setup.ipynb &gt;&gt;
    </a>
</div>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.2 Libraries installation</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>A restart of the kernel is needed for the changes to take effect</b>. We use the <code>-q</code> parameter to keep the installation log non-verbose; remove it if you want to see every step of the pip installation.</p>

In [1]:
#%pip install -q teradataml==17.20.0.6 teradatamodelops==7.0.3 matplotlib==3.8.2
%pip install -q teradataml==20.0.0.3 teradatasqlalchemy==20.0.0.3 teradatamodelops==7.0.6 matplotlib==3.8.2 scikit-learn==1.1.3 

Note: you may need to restart the kernel to use updated packages.


<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Hint: </b><i>An easy way to restart the kernel and load the newly installed packages into memory is to type zero zero (<b> 0 0 </b>).</i></p>
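
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>After the restart, it can be useful to confirm which package versions the kernel actually sees. The following is a minimal sketch using only the Python standard library; the helper name <code>installed_versions</code> is hypothetical, and the package names are taken from the pip command above.</p>

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this kernel
    return versions


# Distribution names as they appear in the pip install command of this notebook.
expected = ["teradataml", "teradatasqlalchemy", "teradatamodelops",
            "matplotlib", "scikit-learn"]
for name, version in installed_versions(expected).items():
    print(f"{name}: {version or 'not installed'}")
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>If a version printed here differs from the one requested in the pip command, the kernel has most likely not been restarted yet.</p>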

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.3 Libraries import</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [1]:
from teradataml import (
    create_context, 
    remove_context,
    get_context,
    get_connection,
    DataFrame,
    TrainTestSplit,
    copy_to_sql,
    db_drop_table,
    configure,
    execute_sql
)
import os
import getpass
import logging
import sys

<hr style="height:2px;border:none;background-color:#00233C;">
<p><b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Connect to Vantage</b></p>

<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, then use down arrow to go to next cell. Begin running steps with Shift + Enter keys.</p>

In [2]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

Performing setup ...
Setup complete



Enter password:  ········


... Logon successful
Connected as: xxxxxsql://demo_user:xxxxx@host.docker.internal/dbc
Engine(teradatasql://demo_user:***@host.docker.internal)


<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.1 Set up Install locations and model local path</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we configure the install locations for VAL and BYOM. We also create a local path (mirroring the Git path) for the code, which will be used later to commit the code to the Git repository.</p>

In [3]:
%%capture
execute_sql('''SET query_band='DEMO=PP_Telco_EndtoEnd_ModelOps_GIT_Python_indb_DF .ipynb;' UPDATE FOR SESSION; ''')

# configure byom/val installation
configure.val_install_location = "VAL"
configure.byom_install_location = "MLDB"

# set the path to the local project repository for this model demo
model_local_path = '~/modelops-demo-models/model_definitions/telco_python_indb_decisionForest'
res = os.system(f'mkdir -p {model_local_path}/model_modules')

In [4]:
print(model_local_path)

~/modelops-demo-models/model_definitions/telco_python_indb_decisionForest
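
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Note that the path above starts with <code>~</code>. The shell expands it inside the <code>mkdir</code> command, but Python file functions do not, which is why the notebook later calls <code>os.path.expanduser</code> before adding the folder to <code>sys.path</code>. The sketch below shows the same directory creation done purely in Python; <code>ensure_model_dirs</code> is a hypothetical helper, and the demonstration runs in a temporary folder rather than in the real project path.</p>

```python
import os
import tempfile
from pathlib import Path


def ensure_model_dirs(model_path):
    """Expand '~' and create <model_path>/model_modules; return the expanded root."""
    root = Path(os.path.expanduser(model_path))
    (root / "model_modules").mkdir(parents=True, exist_ok=True)
    return root


# Demonstration in a throwaway folder so the home directory is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    root = ensure_model_dirs(os.path.join(tmp, "telco_python_indb_decisionForest"))
    print((root / "model_modules").is_dir())
```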


<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.2 Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Since this is a continuation of the Traditional approach notebook, we will be using the data from the same table we created in the earlier notebook.</p>

In [5]:
transformed_data = DataFrame('Transformed_data')
transformed_data

CustomerID,PaymentMethod,Contract,TotalCharges,PhoneService,TechSupport,Churn,MultipleLines,Gender,Tenure,StreamingMovies,Dependents,SeniorCitizen,StreamingTV,InternetService,DeviceProtection,PaperlessBilling,OnlineSecurity,Partner,OnlineBackup,MonthlyCharges
4829-AUOAX,0,0,0.5065747052321298,1,0,1,1,0,46,1,0,0,1,1,0,0,0,0,0,0.7741293532338308
1658-BYGOY,2,0,0.2017950902726603,1,0,1,1,1,18,1,0,1,1,1,0,1,0,0,0,0.7681592039800995
2990-IAJSV,0,2,0.7637193717759765,1,1,0,1,1,72,1,0,0,1,0,1,0,1,0,1,0.7338308457711443
6330-JKLPC,0,0,0.103272383935151,1,0,1,1,1,11,0,0,0,0,1,0,1,0,1,1,0.6194029850746269
6685-GBWJZ,1,1,0.5122512896094327,1,1,0,0,1,63,0,0,0,1,0,0,0,1,1,1,0.5228855721393034
4973-MGTON,1,2,0.6873272844509949,1,1,0,0,0,71,1,0,0,1,0,1,1,1,1,1,0.6582089552238807
0654-PQKDW,0,1,0.4909094049373618,1,1,0,0,0,62,0,1,0,1,0,1,1,1,1,0,0.5223880597014925
2651-ZCBXV,1,2,0.6633025515843773,1,1,0,1,1,54,1,0,0,1,1,0,1,1,0,1,0.8930348258706468
4009-ALQFH,2,0,0.2727811809137804,1,0,1,0,0,25,1,0,0,1,1,1,1,0,0,1,0.8084577114427861
5736-YEJAX,1,2,0.6335839627855564,1,1,0,1,1,69,0,1,0,1,0,1,1,1,0,1,0.6089552238805971



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We use the TrainTestSplit function to divide the dataset into train and test datasets, which are then copied to Vantage for use in the ModelOps cycle.</p>

In [6]:
TrainTestSplit_out = TrainTestSplit(
                                    data = DataFrame('Transformed_data'),
                                    id_column = "CustomerID",
                                    train_size = 0.75,
                                    test_size = 0.25,
                                    seed = 21
)

In [7]:
# Split into 2 virtual dataframes
df_train = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 1].drop(['TD_IsTrainRow'], axis = 1)
df_test = TrainTestSplit_out.result[TrainTestSplit_out.result['TD_IsTrainRow'] == 0].drop(['TD_IsTrainRow'], axis = 1)

In [8]:
copy_to_sql(df_train, table_name='transform_data_train', if_exists='replace')
copy_to_sql(df_test, table_name='transform_data_test', if_exists='replace')
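
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The cells above follow a flag-and-filter pattern: every row receives a train/test flag (<code>TD_IsTrainRow</code>), the rows are separated by that flag, and the flag column is dropped. The following pure-Python sketch mirrors that pattern on made-up rows. It is only an illustration and makes no claim about how the in-database TrainTestSplit function assigns rows internally; <code>split_rows</code> and the <code>is_train</code> key are hypothetical names.</p>

```python
import random


def split_rows(rows, train_size=0.75, seed=21):
    """Flag each row as train (1) or test (0), then separate and drop the flag."""
    rng = random.Random(seed)            # a fixed seed makes the split repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cutoff = int(round(len(shuffled) * train_size))
    flagged = [dict(row, is_train=int(i < cutoff)) for i, row in enumerate(shuffled)]

    def without_flag(row):
        return {k: v for k, v in row.items() if k != "is_train"}

    train = [without_flag(r) for r in flagged if r["is_train"] == 1]
    test = [without_flag(r) for r in flagged if r["is_train"] == 0]
    return train, test


# Made-up rows for illustration only.
rows = [{"CustomerID": f"C{i:03d}", "Churn": i % 2} for i in range(20)]
train, test = split_rows(rows)
print(len(train), len(test))
```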

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.3 Creating predictions and model table</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will create a predictions table to hold the model predictions, a model table to which the trained model will be uploaded, and a table for collecting feature statistics.</p>

In [9]:
# %run -i ../run_procedure.py "call get_data('DEMO_ModelOps_local');"        # Takes 30 seconds

In [9]:
#ddl for Aoa_Byom_Models 
query = '''CREATE SET TABLE DEMO_USER.Aoa_Byom_Models 
     (
      model_version VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      model_id VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      model_type VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      project_id VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      deployed_at TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6),
      model BLOB(2097088000))
UNIQUE PRIMARY INDEX ( model_version );
'''
 
try:
    execute_sql(query)
except:
    execute_sql('DROP TABLE DEMO_USER.Aoa_Byom_Models;')
    execute_sql(query)

In [10]:
#ddl for Telco_Churn_Predictions
query = '''CREATE MULTISET TABLE Telco_Churn_Predictions 
     (
      job_id VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      CustomerID VARCHAR(10) CHARACTER SET LATIN,
      Churn BYTEINT,
      json_report CLOB(1048544000) CHARACTER SET LATIN)
PRIMARY INDEX ( CustomerID );
'''
try:
    execute_sql(query)
except:
    db_drop_table('Telco_Churn_Predictions')
    execute_sql(query) 


In [11]:
#ddl for collecting feature stats
query = '''CREATE SET TABLE aoa_stats 
     (
      column_name VARCHAR(1024) CHARACTER SET LATIN NOT CASESPECIFIC,
      column_type VARCHAR(1024) CHARACTER SET LATIN NOT CASESPECIFIC,
      stats VARCHAR(1024) CHARACTER SET LATIN NOT CASESPECIFIC,
      update_ts VARCHAR(1024) CHARACTER SET LATIN NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( column_name );
'''
try:
    execute_sql(query)
except:
    db_drop_table('aoa_stats')
    execute_sql(query) 
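
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The three cells above repeat the same pattern: try to create the table and, if that fails, drop it and create it again. The sketch below factors this into one helper. It is an assumption-laden illustration: <code>create_or_replace</code> is a hypothetical name, and an in-memory SQLite database stands in for Vantage so the example can run anywhere. In this notebook the <code>execute</code> argument would be <code>execute_sql</code>.</p>

```python
import sqlite3


def create_or_replace(execute, ddl, drop_stmt):
    """Run ddl; if it fails (for example, the table exists), drop and retry once."""
    try:
        execute(ddl)
    except Exception:
        execute(drop_stmt)
        execute(ddl)


# SQLite is used here only as a stand-in for Vantage.
conn = sqlite3.connect(":memory:")
ddl = "CREATE TABLE aoa_stats (column_name TEXT PRIMARY KEY, stats TEXT)"
drop = "DROP TABLE aoa_stats"

create_or_replace(conn.execute, ddl, drop)            # first call: plain create
conn.execute("INSERT INTO aoa_stats VALUES ('Tenure', '{}')")
conn.commit()
create_or_replace(conn.execute, ddl, drop)            # second call: drop, then create

remaining_rows = conn.execute("SELECT COUNT(*) FROM aoa_stats").fetchone()[0]
print(remaining_rows)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Be aware that this pattern discards any rows already in the table, which is acceptable for a demo but may not be what you want for a table that holds deployed models.</p>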


<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [14]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

You have:  #databases=9 #tables=169 #views=261  You have used 80.4 MB of 30,678.3 MB available - 0.3%  ... Space Usage OK
 
   Database Name                  #tables  #views     Avail MB      Used MB
   demo_user                          125     256  27,598.5 MB      66.2 MB 
   DEMO_GLM_Fraud                       0       1       0.0 MB       0.0 MB 
   DEMO_GLM_Fraud_db                    1       0     195.9 MB       7.3 MB 
   DEMO_ModelOps                        0       3       0.0 MB       0.0 MB 
   DEMO_ModelOps_db                     3       0      19.1 MB       0.6 MB 
   DEMO_Telco                           0       1       0.0 MB       0.0 MB 
   DEMO_Telco_db                        1       0       3.8 MB       0.8 MB 
   FinFraud                            13       0     953.7 MB       2.0 MB 
   FinRepo                             13       0     953.7 MB       1.6 MB 
   TelcoFS                             13       0     953.7 MB       2.0 MB 


<hr style="height:2px;border:none;background-color:#00233C;">
<p><b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Define Training, Evaluation and Scoring functions </b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We need to create the following three .py files for use in the ModelOps cycle.</p>

<li style = 'font-size:16px;font-family:Arial;color:#00233C'><code>training.py</code>: The code using In-Db functions to train the model.</li>

<li style = 'font-size:16px;font-family:Arial;color:#00233C'><code>evaluation.py</code>: The code using In-Db functions to evaluate the model.</li>

<li style = 'font-size:16px;font-family:Arial;color:#00233C'><code>scoring.py</code>: The code using In-Db functions for scoring new data.</li>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The steps below show how to create these three files and test the code before committing it to the repository. After testing the code, we set up the configuration files and then push the .py and configuration files to Git so they can be used from the ModelOps UI.</p> 
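
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Before committing, you may want to confirm that all three files exist in the <code>model_modules</code> folder. The following is a minimal sketch using only the standard library; <code>missing_modules</code> is a hypothetical helper, and the demonstration uses a temporary folder with a placeholder file instead of the real project path.</p>

```python
import tempfile
from pathlib import Path

REQUIRED_MODULES = ("training.py", "evaluation.py", "scoring.py")


def missing_modules(model_modules_dir):
    """Return the required file names that are not present in the folder."""
    folder = Path(model_modules_dir)
    return [name for name in REQUIRED_MODULES if not (folder / name).is_file()]


# Demonstration in a throwaway folder, so nothing in the real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "training.py").write_text("# placeholder\n")
    still_missing = missing_modules(tmp)
    print(still_missing)
```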
   

<hr style="height:1px;border:none;background-color:#00233C;">
<p><b style = 'font-size:18px;font-family:Arial;color:#00233C'>3.1 Define Training Function</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The training function takes the following shape </p>

```python
def train(context: ModelContext, **kwargs):
    aoa_create_context()
    
    # your training code using a teradataml in-DB function
    model = <InDB Function>(...)
    
    # save your model
    model.result.to_sql(f"model_${context.model_version}", if_exists="replace")  
    
    record_training_stats(...)
```
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You can execute this from the CLI or directly within the notebook as shown. The code below creates the training.py file in the local model path.</p>
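
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>One detail of the template is easy to misread: in a Python f-string only the <code>{...}</code> part is substituted, so the <code>$</code> in <code>f"model_${context.model_version}"</code> is kept as a literal character in the table name. The sketch below demonstrates this with a hypothetical <code>FakeContext</code> class standing in for <code>ModelContext</code>; only the <code>model_version</code> attribute is modeled.</p>

```python
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Hypothetical stand-in for aoa.ModelContext; only model_version is modeled."""
    model_version: str


def model_table_name(context):
    # Same expression as in the template: '$' is a literal character,
    # only the {...} placeholder is replaced.
    return f"model_${context.model_version}"


name = model_table_name(FakeContext(model_version="indb_df_v1"))
print(name)  # model_$indb_df_v1
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Because the training and evaluation code build the name with the same expression, they refer to the same table.</p>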

In [12]:
%%writefile $model_local_path/model_modules/training.py
from teradataml import (
    DataFrame,
    DecisionForest,
    ScaleFit,
    ScaleTransform,
    OrdinalEncodingFit,
    ColumnTransformer,
    Shap
)

from aoa import (
    record_training_stats,
    aoa_create_context,
    ModelContext
)

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import json
from collections import Counter


def compute_feature_importance(feat_df):
    df = feat_df.to_pandas()
    df = df.T.reset_index()
    df=df.rename(columns={'index': 'Feature', 0: 'Importance'})
    df['Feature'] = df['Feature'].str.replace('TD_', '')
    df['Feature'] = df['Feature'].str.replace('_SHAP', '')
    return df


def compute_feature_explain(explain_df):
    explain_df = explain_df.drop(['CustomerID','label','tree_num'],axis=1)
    shap_mean = explain_df.agg(['min', 'max'])
    df = shap_mean.to_pandas()
    df = df.T.reset_index()
    df=df.rename(columns={'index': 'Feature', 0: 'Importance'})
    # Take copies so the column assignments below do not modify a view of df
    # (this avoids pandas' SettingWithCopyWarning)
    mean_positive = df[df['Importance'] > 0].copy()
    mean_negative = df[df['Importance'] < 0].copy()
    # Strip the aggregate prefix and the SHAP suffix to recover the feature names
    mean_positive['Feature'] = mean_positive['Feature'].str.replace('max_TD_', '').str.replace('_SHAP', '')
    mean_negative['Feature'] = mean_negative['Feature'].str.replace('min_TD_', '').str.replace('_SHAP', '')
    return mean_positive,mean_negative
    

def plot_feature_importance(df, img_filename):
    df = df.sort_values(by="Importance", ascending=False)
    # Plot the bar graph
    plt.figure(figsize=(10, 8))
    sns.barplot(x="Importance",y="Feature",data=df, palette="viridis")
    plt.title("Feature Importance")
    plt.xlabel("SHAP Importance Value")
    plt.ylabel("Features")
    plt.tight_layout()
    fig = plt.gcf()
    fig.savefig(img_filename, dpi=500)
    plt.clf()
    
def plot_feature_explain(mean_positive,mean_negative, img_filename):
    fig, ax = plt.subplots(figsize=(10, 6))
    bar_width = 0.35

    ax.barh(mean_positive["Feature"], mean_positive["Importance"],color='salmon', label='-1 (positive)') 
    ax.barh(mean_negative["Feature"], mean_negative["Importance"],color='cyan', label='1 (negative)')
    ax.set_xlabel("mean(|SHAP value|)")
    ax.set_title("Mean shap for all samples")
    ax.legend(title="sign")
    plt.gca().invert_yaxis()
    plt.tight_layout()
    # plt.show()
    fig = plt.gcf()
    fig.savefig(img_filename, dpi=500)
    plt.clf()    
    
def train(context: ModelContext, **kwargs):
    aoa_create_context()
    
    # Extracting feature names, target name, and entity key from the context
    feature_names = context.dataset_info.feature_names
    target_name = context.dataset_info.target_names[0]
    entity_key = context.dataset_info.entity_key

    # Load the training data from Teradata
    train_df = DataFrame.from_query(context.dataset_info.sql)
    
    print("Starting training...")

    # Train the model using DecisionForest
    model = DecisionForest(data = train_df,
                input_columns = ["Tenure", "InternetService", "OnlineSecurity", "SeniorCitizen",
                                    "PaymentMethod", "OnlineBackup", "Dependents", "Partner", "MultipleLines", 
                                    "StreamingMovies", "Gender", "PhoneService", "TotalCharges", "Contract", 
                                    "MonthlyCharges", "DeviceProtection", "PaperlessBilling", "StreamingTV", 
                                    "TechSupport"],
                response_column = 'Churn',
                family = 'Binomial',
                min_impurity= 0.0,
                max_depth= 5,
                min_node_size= 1,
                num_trees= -1,
                seed= 42,
                tree_type = 'CLASSIFICATION')
    
    # XGBoost(
    #                data = train_df,
    #                input_columns = ["Tenure", "InternetService", "OnlineSecurity", "SeniorCitizen",
    #                                 "PaymentMethod", "OnlineBackup", "Dependents", "Partner", "MultipleLines", 
    #                                 "StreamingMovies", "Gender", "PhoneService", "TotalCharges", "Contract", 
    #                                 "MonthlyCharges", "DeviceProtection", "PaperlessBilling", "StreamingTV", 
    #                                 "TechSupport"],
    #                response_column = 'Churn',
    #                model_type = 'CLASSIFICATION',
    #                  )

    # Save the trained model to SQL
    model.result.to_sql(f"model_${context.model_version}", if_exists="replace")  
    print("Saved trained model")
    
    #Shap explainer 
    Shap_out = Shap(data=train_df, 
                object=model.result, 
                id_column='CustomerID',
                training_function="TD_DecisionForest", 
                model_type="Classification",
                input_columns=feature_names, 
                detailed=True)
    
    feat_df = Shap_out.output_data
    explain_df = Shap_out.result
    # print(explain_df)

 
    df = compute_feature_importance(feat_df)
    plot_feature_importance(df, f"{context.artifact_output_path}/feature_importance")
    pos_expl_df, neg_expl_df = compute_feature_explain(explain_df)
    # print(pos_expl_df)
    # print(neg_expl_df)
    plot_feature_explain(pos_expl_df,neg_expl_df, f"{context.artifact_output_path}/feature_explainability")

    # categorical=["Partner", "Dependents","Contract", "SeniorCitizen","PaymentMethod", "TechSupport", "StreamingTV", "OnlineSecurity", 
    #                                     "PhoneService", "OnlineBackup", "MultipleLines", "DeviceProtection", "Gender", 
    #                                     "PaperlessBilling", "StreamingMovies", "InternetService"]


    # record_training_stats(
    #     train_df,
    #     features=["Tenure",  "MonthlyCharges",  "TotalCharges"],
    #     targets=[target_name],
    #     categorical=categorical,
    #     # feature_importance=feature_importance,
    #     context=context
    # )

    categorical=["Partner", "Dependents","Contract", "SeniorCitizen","PaymentMethod", "TechSupport", "StreamingTV", "OnlineSecurity", 
                                        "PhoneService", "OnlineBackup", "MultipleLines", "DeviceProtection", "Gender", 
                                        "PaperlessBilling", "StreamingMovies", "InternetService", "Churn"]

    record_training_stats(
        train_df,
        features=feature_names,
        targets=[target_name],
        categorical=categorical,
        # feature_importance=feature_importance,
        context=context
    )
    
    print("All done!")

Writing /home/jovyan/modelops-demo-models/model_definitions/telco_python_indb_decisionForest/model_modules/training.py
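
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The <code>compute_feature_importance</code> helper in training.py turns a single wide row of SHAP columns into a long Feature/Importance table and cleans the column names. The pure-Python sketch below illustrates that reshaping. The <code>TD_&lt;feature&gt;_SHAP</code> naming is inferred from the replace calls in the helper, the numeric values are made up, and <code>clean_shap_name</code> and <code>to_importance_rows</code> are hypothetical names. Unlike <code>str.replace</code>, this version only strips the text at the start and end of the name.</p>

```python
def clean_shap_name(column, prefix="TD_", suffix="_SHAP"):
    """Strip the prefix and suffix, e.g. 'TD_Tenure_SHAP' -> 'Tenure'."""
    if column.startswith(prefix):
        column = column[len(prefix):]
    if column.endswith(suffix):
        column = column[:-len(suffix)]
    return column


def to_importance_rows(wide_row):
    """Reshape {column: value} into a list of Feature/Importance rows, largest first."""
    rows = [{"Feature": clean_shap_name(col), "Importance": value}
            for col, value in wide_row.items()]
    return sorted(rows, key=lambda r: r["Importance"], reverse=True)


# Made-up values for illustration only.
wide_row = {"TD_Tenure_SHAP": 0.12, "TD_Contract_SHAP": 0.30, "TD_MonthlyCharges_SHAP": 0.05}
importance_rows = to_importance_rows(wide_row)
for row in importance_rows:
    print(row["Feature"], row["Importance"])
```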


<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Test the train </b><code>train(context: ModelContext, **kwargs) </code> <b>function</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The code below is only for testing the train function and will not be committed to the Git repository.</p>

In [13]:
# Define the ModelContext to test with. The ModelContext is created and managed automatically by ModelOps 
# when it executes your code via CLI / UI. However, for testing in the notebook, you can define as follows

# define the training dataset 
sql = """
SELECT 
    * from demo_user.transform_data_train;
"""

feature_metadata =  {
    "database": "DEMO_USER",
    "table": "aoa_stats"
}
hyperparams = {
    "model_type": "Classification",
    "scale_method":"RANGE",
    "miss_value":"KEEP",
    "global_scale": "False",
    "multiplier":"1",
    "intercept":"0",
    "max_depth": 8,
    "num_boosted_trees": 100,
    "tree_size": 0.5,
    "lambda1" : 1.0
}

entity_key = "CustomerID"
target_names = ["Churn"]
feature_names = ["Tenure", "InternetService", "OnlineSecurity", "SeniorCitizen", "PaymentMethod", "OnlineBackup", "Dependents", 
"Partner", "MultipleLines", "StreamingMovies", "Gender", "PhoneService", "TotalCharges", "Contract", 
"MonthlyCharges", "DeviceProtection", "PaperlessBilling", "StreamingTV", "TechSupport"]
 
from aoa import ModelContext, DatasetInfo

dataset_info = DatasetInfo(sql=sql,
                           entity_key=entity_key,
                           feature_names=["Tenure",  "MonthlyCharges",  "TotalCharges"],
                           target_names=target_names,
                           categorical=["Partner", "Dependents","Contract", "SeniorCitizen","PaymentMethod", "TechSupport", "StreamingTV", "OnlineSecurity", 
                                        "PhoneService", "OnlineBackup", "MultipleLines", "DeviceProtection", "Gender", 
                                        "PaperlessBilling", "StreamingMovies", "InternetService"],
                           feature_metadata=feature_metadata)

ctx = ModelContext(hyperparams=hyperparams,
                   dataset_info=dataset_info,
                   artifact_output_path="./artifacts",
                   model_version="indb_df_v1",
                   model_table="model_indb_df_v1")

sys.path.append(os.path.expanduser(f"{model_local_path}/model_modules"))
import training
training.train(context=ctx)

Starting training...
Saved trained model


Feature tenure doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature totalcharges doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature monthlycharges doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature churn doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats


All done!


<Figure size 720x576 with 0 Axes>

<Figure size 720x432 with 0 Axes>

In [14]:
# Check the generated files
!ls -lh artifacts

total 464K
-rw-r--r-- 1 jovyan users 122K Apr  4 06:26 confusion_matrix.png
-rw-r--r-- 1 jovyan users 2.0K Apr  4 07:12 data_stats.json
-rw-r--r-- 1 jovyan users 148K Apr  4 07:12 feature_explainability.png
-rw-r--r-- 1 jovyan users 145K Apr  4 07:12 feature_importance.png
-rw-r--r-- 1 jovyan users  242 Apr  4 06:26 metrics.json
-rw-r--r-- 1 jovyan users  35K Apr  4 06:26 roc_curve.png


<hr style="height:1px;border:none;background-color:#00233C;">
<p><b style = 'font-size:18px;font-family:Arial;color:#00233C'>3.2 Define Evaluation Function</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The evaluation function takes the following shape</p>

```python
def evaluate(context: ModelContext, **kwargs):
    aoa_create_context()

    # read your model from Vantage
    model = DataFrame(f"model_${context.model_version}")
    
    # your evaluation logic
    
    record_evaluation_stats(...)
```
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You can execute this from the CLI or directly within the notebook as shown. The code below creates the evaluation.py file in the local model path.</p>

In [15]:
%%writefile $model_local_path/model_modules/evaluation.py
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from teradataml import(
    DataFrame, 
    copy_to_sql, 
    get_context, 
    get_connection, 
    OrdinalEncodingFit, 
    ScaleFit,
    ColumnTransformer,
    TDDecisionForestPredict, 
    ConvertTo, 
    ClassificationEvaluator,
    ROC,
    Shap
)
from aoa import (
    record_evaluation_stats,
    save_plot,
    aoa_create_context,
    ModelContext
)

import joblib
import json
import numpy as np
import pandas as pd
import os


# Define function to plot a confusion matrix from given data
def plot_confusion_matrix(cf, img_filename):
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(7.5, 7.5))
    ax.matshow(cf, cmap=plt.cm.Blues, alpha=0.3)
    for i in range(cf.shape[0]):
        for j in range(cf.shape[1]):
            ax.text(x=j, y=i,s=cf[i, j], va='center', ha='center', size='xx-large')
    ax.set_xlabel('Predicted labels')
    ax.set_ylabel('True labels')
    ax.set_title('Confusion Matrix')
    fig = plt.gcf()
    fig.savefig(img_filename, dpi=500)
    plt.clf()

    
def plot_roc_curve(roc_out, img_filename):
    from teradataml import Figure
    figure = Figure(width=500, height=400, heading="Receiver Operating Characteristic (ROC) Curve")
    auc = roc_out.result.get_values()[0][0]
    plot = roc_out.output_data.plot(
        x=roc_out.output_data.fpr,
        y=[roc_out.output_data.tpr, roc_out.output_data.fpr],
        xlabel='False Positive Rate',
        ylabel='True Positive Rate',
        color='carolina blue',
        figure=figure,
        legend=[f'DF AUC = {round(auc, 4)}', 'AUC Baseline'],
        legend_style='lower right',
        grid_linestyle='--',
        grid_linewidth=0.5
    )
    plot.save(img_filename)
    # plot.show()
    # fig = plt.gcf()
    # fig.savefig(img_filename, dpi=500)
    # plt.clf()    

def evaluate(context: ModelContext, **kwargs):

    aoa_create_context()

    # Load the trained model from SQL
    model = DataFrame(f"model_${context.model_version}")

    feature_names = context.dataset_info.feature_names
    target_name = context.dataset_info.target_names[0]
    entity_key = context.dataset_info.entity_key

    # Load the test data from Teradata
    test_df = DataFrame.from_query(context.dataset_info.sql)

    # Make predictions using the TDDecisionForestPredict function
    print("Evaluating ...........")
    predictions = TDDecisionForestPredict(object = model,
                                        newdata = test_df,
                                        id_column = "CustomerID",
                                        detailed = False,
                                        output_prob = True,
                                        output_responses = ['0','1'],
                                        accumulate="Churn")

    # Convert the predicted data into the specified format
    # print(predictions.result)
    predicted_data = ConvertTo(
        data = predictions.result,
        target_columns = [target_name,'prediction'],
        target_datatype = ["INTEGER"]
    )

    # Evaluate classification metrics using ClassificationEvaluator
    ClassificationEvaluator_obj = ClassificationEvaluator(
        data=predicted_data.result,
        observation_column=target_name,
        prediction_column='prediction',
        num_labels=2
    )

    # Extract and store evaluation metrics
    metrics_pd = ClassificationEvaluator_obj.output_data.to_pandas()

    evaluation = {
        'Accuracy': '{:.2f}'.format(metrics_pd.MetricValue[0]),
        'Micro-Precision': '{:.2f}'.format(metrics_pd.MetricValue[1]),
        'Micro-Recall': '{:.2f}'.format(metrics_pd.MetricValue[2]),
        'Micro-F1': '{:.2f}'.format(metrics_pd.MetricValue[3]),
        'Macro-Precision': '{:.2f}'.format(metrics_pd.MetricValue[4]),
        'Macro-Recall': '{:.2f}'.format(metrics_pd.MetricValue[5]),
        'Macro-F1': '{:.2f}'.format(metrics_pd.MetricValue[6]),
        'Weighted-Precision': '{:.2f}'.format(metrics_pd.MetricValue[7]),
        'Weighted-Recall': '{:.2f}'.format(metrics_pd.MetricValue[8]),
        'Weighted-F1': '{:.2f}'.format(metrics_pd.MetricValue[9]),
    }

    # Save evaluation metrics to a JSON file
    with open(f"{context.artifact_output_path}/metrics.json", "w+") as f:
        json.dump(evaluation, f)
        
    # Generate and save confusion matrix plot
    cm_df = ClassificationEvaluator_obj.result
    # print(cm_df)
    cm_df = cm_df.select(['CLASS_1','CLASS_2'])
    # print(cm_df.get_values())
    cm_df_t = cm_df.to_pandas().T
    # print(cm_df_t.values)
    cm = confusion_matrix(predicted_data.result.to_pandas()['Churn'], predicted_data.result.to_pandas()['prediction'])
    # print(cm)
    plot_confusion_matrix(cm_df_t.values, f"{context.artifact_output_path}/confusion_matrix")

    # Generate and save ROC curve plot
    roc_out = ROC(
        data=predictions.result,
        probability_column='prob_1',
        observation_column=target_name,
        positive_class='1',
        num_thresholds=1000
    )
    plot_roc_curve(roc_out, f"{context.artifact_output_path}/roc_curve")

    # Calculate feature importance and generate plot
    # try:
    #     model_pdf = model.result.to_pandas()['classification_tree']
    #     feature_importance = compute_feature_importance(model_pdf)
    #     feature_importance_df = pd.DataFrame(list(feature_importance.items()), columns=['Feature', 'Importance'])
    #     plot_feature_importance(feature_importance, f"{context.artifact_output_path}/feature_importance")
    # except:
    #     feature_importance = {}

    predictions_table = "Telco_Churn_Predictions"
    copy_to_sql(df=predicted_data.result, table_name=predictions_table, index=False, if_exists="replace", temporary=True)

    # calculate stats if training stats exist
    if os.path.exists(f"{context.artifact_input_path}/data_stats.json"):
        record_evaluation_stats(
            features_df=test_df,
            predicted_df=DataFrame.from_query(f"SELECT * FROM {predictions_table}"),
            # feature_importance=feature_importance,
            context=context
        )

    print("All done!")

Writing /home/jovyan/modelops-demo-models/model_definitions/telco_python_indb_decisionForest/model_modules/evaluation.py
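
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the evaluation artifacts easier to interpret, the sketch below computes a binary confusion matrix and a few of the derived metrics in plain Python. It is an illustration on made-up labels, not the in-database ClassificationEvaluator; <code>confusion_counts</code> and <code>binary_metrics</code> are hypothetical names. Rows are true labels and columns are predicted labels, matching the axis labels used in <code>plot_confusion_matrix</code>.</p>

```python
def confusion_counts(actual, predicted):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels (rows: true, columns: predicted)."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def binary_metrics(matrix):
    """Accuracy, precision and recall for the positive class (label 1)."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "Accuracy": (tp + tn) / total,
        "Precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "Recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only.
actual = [0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 1, 1, 0, 1, 0, 1, 0]

cm = confusion_counts(actual, predicted)
metrics = binary_metrics(cm)
print(cm)
print({name: f"{value:.2f}" for name, value in metrics.items()})
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The last line formats the values to two decimals, the same formatting the evaluation function applies before writing metrics.json.</p>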


<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Test the evaluation </b><code>evaluate(context: ModelContext, **kwargs) </code> <b>function</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The code below tests the evaluation function and is not committed to the Git repository.</p>

In [16]:
# Define the ModelContext to test with. The ModelContext is created and managed automatically by ModelOps 
# when it executes your code via CLI / UI. However, for testing in the notebook, you can define as follows

# define the evaluation dataset 
sql = """
SELECT 
   * from transform_data_test;
"""

dataset_info = DatasetInfo(sql=sql,
                           entity_key=entity_key,
                           feature_names=["Tenure",  "MonthlyCharges",  "TotalCharges"],
                           target_names=target_names,
                           categorical=["Partner", "Dependents","Contract", "SeniorCitizen","PaymentMethod", "TechSupport", "StreamingTV", "OnlineSecurity", 
                                        "PhoneService", "OnlineBackup", "MultipleLines", "DeviceProtection", "Gender", 
                                        "PaperlessBilling", "StreamingMovies", "InternetService"],
                           feature_metadata=feature_metadata)

ctx = ModelContext(hyperparams=hyperparams,
                   dataset_info=dataset_info,
                   artifact_output_path="./artifacts",
                   artifact_input_path="./artifacts",
                   model_version="indb_df_v1",
                   model_table="model_indb_df_v1")

import evaluation
evaluation.evaluate(context=ctx)

# view evaluation results
import json
with open(f"{ctx.artifact_output_path}/metrics.json") as f:
    print(json.load(f))

Evaluating ...........


Feature tenure doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature totalcharges doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature monthlycharges doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature churn doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats


All done!
{'Accuracy': '0.79', 'Micro-Precision': '0.79', 'Micro-Recall': '0.79', 'Micro-F1': '0.79', 'Macro-Precision': '0.74', 'Macro-Recall': '0.66', 'Macro-F1': '0.68', 'Weighted-Precision': '0.77', 'Weighted-Recall': '0.79', 'Weighted-F1': '0.77'}


<Figure size 540x540 with 0 Axes>
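<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Note that the values in <code>metrics.json</code> are stored as strings. As a minimal sketch, assuming a subset of the dictionary printed above, the values can be converted to floats before they are compared numerically. The helper name <code>metrics_to_float</code> is hypothetical and not part of ModelOps.</p>

```python
import json

# A subset of the metrics printed above; the values are strings, not numbers.
raw = '{"Accuracy": "0.79", "Macro-Precision": "0.74", "Macro-Recall": "0.66", "Macro-F1": "0.68"}'


def metrics_to_float(metrics: dict) -> dict:
    """Hypothetical helper: convert string metric values to floats."""
    return {name: float(value) for name, value in metrics.items()}


metrics = metrics_to_float(json.loads(raw))

# Find the metric with the lowest value among the ones loaded here.
weakest = min(metrics, key=metrics.get)
print(weakest, metrics[weakest])
```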

In [17]:
# Check the generated files
!ls -lh artifacts

total 464K
-rw-r--r-- 1 jovyan users 122K Apr  4 07:12 confusion_matrix.png
-rw-r--r-- 1 jovyan users 2.0K Apr  4 07:12 data_stats.json
-rw-r--r-- 1 jovyan users 148K Apr  4 07:12 feature_explainability.png
-rw-r--r-- 1 jovyan users 145K Apr  4 07:12 feature_importance.png
-rw-r--r-- 1 jovyan users  242 Apr  4 07:12 metrics.json
-rw-r--r-- 1 jovyan users  35K Apr  4 07:12 roc_curve.png


<hr style="height:1px;border:none;background-color:#00233C;">
<p><b style = 'font-size:18px;font-family:Arial;color:#00233C'>3.3 Define Scoring Function</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The scoring function takes the following shape</p>

```python
def score(context: ModelContext, **kwargs):
    aoa_create_context()

    # read your model
    model = DataFrame(f"model_${context.model_version}")
    
    # your scoring logic
    
    record_scoring_stats(...)
```
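<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A note on the table name used in the sketch above: in a Python f-string, <code>$</code> has no special meaning and is kept as a literal character, so the resulting table name contains a dollar sign. The snippet below is a minimal illustration. <code>SimpleNamespace</code> is only a stand-in for the <code>ModelContext</code> that ModelOps provides, and the version value is the one used for testing in this notebook.</p>

```python
from types import SimpleNamespace

# Stand-in for the ModelContext passed to score(); only model_version is needed here.
context = SimpleNamespace(model_version="indb_df_v1")

# "$" is not an interpolation marker in f-strings, so it stays in the result.
table_name = f"model_${context.model_version}"
print(table_name)  # model_$indb_df_v1
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Whatever pattern is used, the table name passed to <code>DataFrame(...)</code> must match the name under which the model was saved during training.</p>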

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You can execute the scoring function from the CLI or directly within the notebook as shown. The code below creates the scoring.py file in the local model path.</p>

In [18]:
%%writefile $model_local_path/model_modules/scoring.py
from teradataml import (
    copy_to_sql,
    DataFrame,
    TDDecisionForestPredict,
    OrdinalEncodingFit,
    ScaleFit,
    ColumnTransformer,
    ConvertTo,
    translate
)
from aoa import (
    record_scoring_stats,
    aoa_create_context,
    ModelContext
)
import pandas as pd
from teradatasqlalchemy import INTEGER


def score(context: ModelContext, **kwargs):
    
    aoa_create_context()

    # Load the trained model from SQL
    model = DataFrame(f"model_${context.model_version}")

    # Extract feature names, target name, and entity key from the context
    feature_names = context.dataset_info.feature_names
    target_name = context.dataset_info.target_names[0]
    entity_key = context.dataset_info.entity_key

    # Load the test dataset
    test_df = DataFrame.from_query(context.dataset_info.sql)
    features_tdf = DataFrame.from_query(context.dataset_info.sql)

    print("Scoring...")
    # Make predictions using the TDDecisionForestPredict function
    predictions = TDDecisionForestPredict(object = model,
                                        newdata = test_df,
                                        id_column = "CustomerID",
                                        detailed = False,
                                        output_prob = True,
                                        output_responses = ['0','1'])
    
    # Convert predictions to pandas DataFrame and process
    # predictions_pdf = predictions.result.to_pandas(all_rows=True).rename(columns={"Prediction": target_name}).astype(int)
    predictions_df = predictions.result
    # print(predictions_df)
    predictions_pdf = predictions_df.assign(drop_columns=True,
                                             job_id=translate(context.job_id),
                                             CustomerID=predictions_df.CustomerID,
                                             Churn=predictions_df.prediction.cast(type_=INTEGER),
                                             json_report=translate("  "))
                                             
    
    
    # converted_data = ConvertTo(data = predictions_pdf,
    #                            target_columns = ['job_id','PatientId', 'HasDiabetes','json_report'],
    #                            target_datatype = ["VARCHAR(charlen=255,charset=LATIN,casespecific=NO)"
    #                                               ,"integer","integer","VARCHAR(charlen=5000,charset=LATIN)"])
    # df=converted_data.result
    
    # print(predictions_pdf)
    print("Finished Scoring")
    # print(predictions_pdf)

    # store the predictions

#     # teradataml doesn't match column names on append.. and so to match / use same table schema as for byom predict
#     # example (see README.md), we must add empty json_report column and change column order manually (v17.0.0.4)
#     # CREATE MULTISET TABLE pima_patient_predictions
#     # (
#     #     job_id VARCHAR(255), -- comes from airflow on job execution
#     #     PatientId BIGINT,    -- entity key as it is in the source data
#     #     HasDiabetes BIGINT,   -- if model automatically extracts target
#     #     json_report CLOB(1048544000) CHARACTER SET UNICODE  -- output of
#     # )
#     # PRIMARY INDEX ( job_id );

    copy_to_sql(
        df=predictions_pdf,
        schema_name=context.dataset_info.predictions_database,
        table_name=context.dataset_info.predictions_table,
        index=False,
        if_exists="append"
    )
    
    print("Saved predictions in Teradata")

    # calculate stats
    predictions_df = DataFrame.from_query(f"""
        SELECT 
            * 
        FROM {context.dataset_info.get_predictions_metadata_fqtn()} 
            WHERE job_id = '{context.job_id}'
    """)

    record_scoring_stats(features_df=features_tdf, predicted_df=predictions_pdf, context=context)

    print("All done!")

Writing /home/jovyan/modelops-demo-models/model_definitions/telco_python_indb_decisionForest/model_modules/scoring.py
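<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the output layout of <code>score</code> easier to follow, here is a minimal sketch of the same column shaping done with pandas. It is an illustration only: the real code works on a teradataml DataFrame inside Vantage, the two sample rows are taken from the scoring output in this notebook, and the string type of the <code>prediction</code> column is an assumption.</p>

```python
import uuid

import pandas as pd

# Stand-in for predictions.result; the real object is a teradataml DataFrame.
predictions = pd.DataFrame(
    {"CustomerID": ["0096-FCPUF", "0067-DKWBL"], "prediction": ["0", "1"]}
)
job_id = str(uuid.uuid4())  # ModelOps supplies context.job_id at run time

# Same four columns as the assign(...) call in scoring.py.
output = pd.DataFrame(
    {
        "job_id": job_id,  # scalar, repeated for every row
        "CustomerID": predictions["CustomerID"],
        "Churn": predictions["prediction"].astype(int),  # cast to integer
        "json_report": "  ",  # placeholder text, as in scoring.py
    }
)
print(output)
```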


<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b> Test the scoring </b><code>score(context: ModelContext, **kwargs) </code> <b>function</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The code below tests the score function and is not committed to the Git repository.</p>

In [19]:
# Define the ModelContext to test with. The ModelContext is created and managed automatically by ModelOps 
# when it executes your code via CLI / UI. However, for testing in the notebook, you can define as follows

# define the scoring dataset 

sql = """
SELECT 
   * from transform_data_test;
"""

# where to store predictions
predictions = {
    "database": "demo_user",
    "table": "Telco_Churn_Predictions_tmp"
}

import uuid
job_id=str(uuid.uuid4())

dataset_info = DatasetInfo(sql=sql,
                           entity_key=entity_key,
                           feature_names=["Tenure",  "MonthlyCharges",  "TotalCharges"],
                           target_names=target_names,
                           categorical=["Partner", "Dependents","Contract", "SeniorCitizen","PaymentMethod", "TechSupport", "StreamingTV", "OnlineSecurity", 
                                        "PhoneService", "OnlineBackup", "MultipleLines", "DeviceProtection", "Gender", 
                                        "PaperlessBilling", "StreamingMovies", "InternetService"],
                           feature_metadata=feature_metadata,
                           predictions=predictions)

ctx = ModelContext(hyperparams=hyperparams,
                   dataset_info=dataset_info,
                   artifact_output_path="./artifacts",
                   artifact_input_path="./artifacts",
                   model_version="indb_df_v1",
                   model_table="model_indb_df_v1",
                   job_id=job_id)

import scoring
scoring.score(context=ctx)

Scoring...
Finished Scoring
Saved predictions in Teradata


Feature tenure doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature totalcharges doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature monthlycharges doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats
Feature churn doesn't have statistics metadata defined, and will not be monitored.
In order to enable monitoring for this feature, make sure that statistics metadata is availabe in DEMO_USER.aoa_stats


All done!


In [20]:
DataFrame.from_query(f"SELECT * FROM Telco_Churn_Predictions_tmp WHERE job_id='{job_id}'")

CustomerID,Churn,job_id,json_report
0096-FCPUF,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0030-FNXPP,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0067-DKWBL,1,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0019-GFNTW,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0118-JPNOY,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0013-SMEOE,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0027-KWYKW,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0078-XZMHT,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0074-HDKDG,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,
0013-MHZWF,0,af8bfb03-d698-450b-a7f5-b6c9c31c33b2,


In [None]:
# Clean up

os.system('rm -f artifacts/*')

# Drop the test tables; ignore the error if a table does not exist
for table in ["model_indb_df_v1", "Telco_Churn_predictions_tmp"]:
    try:
        get_context().execute(f"DROP TABLE {table}")
    except Exception:
        pass

<hr style="height:2px;border:none;background-color:#00233C;">
<p><b style = 'font-size:20px;font-family:Arial;color:#00233C'>4. Define Model Metadata</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Now let's create the configuration files.<br>The requirements file lists the libraries and packages, with pinned versions, that the code created above needs to run:</p>

In [21]:
%%writefile $model_local_path/model_modules/requirements.txt
pandas==2.1.4
matplotlib==3.8.2
PyYAML==5.4.1
scikit-learn==1.1.3
teradataml==20.0.0.3 
teradatasqlalchemy==20.0.0.3 
teradatamodelops==7.0.6
seaborn==0.12.2

Writing /home/jovyan/modelops-demo-models/model_definitions/telco_python_indb_decisionForest/model_modules/requirements.txt
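<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As an optional check, the pinned versions can be compared with what is installed in the current environment. The following is a minimal sketch using only the standard library. The helper name <code>parse_pins</code> is hypothetical, and the sketch assumes every line has the simple <code>name==version</code> form used above.</p>

```python
from importlib import metadata

# Pinned requirements as written to requirements.txt above.
requirements = """\
pandas==2.1.4
matplotlib==3.8.2
PyYAML==5.4.1
scikit-learn==1.1.3
teradataml==20.0.0.3
teradatasqlalchemy==20.0.0.3
teradatamodelops==7.0.6
seaborn==0.12.2
"""


def parse_pins(text: str) -> dict:
    """Hypothetical helper: map package name to pinned version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


pins = parse_pins(requirements)
for name, pinned in pins.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        installed = None  # package is not installed in this environment
    status = "ok" if installed == pinned else "differs"
    print(f"{name}: pinned {pinned}, installed {installed} ({status})")
```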


<p style = 'font-size:16px;font-family:Arial'>The config file contains the default hyperparameter values that will be used when training the model.</p>

In [22]:
%%writefile $model_local_path/config.json
{
   "hyperParameters": {
    "model_type": "Classification",
    "scale_method":"RANGE",
    "miss_value":"KEEP",
    "global_scale": "False",
    "multiplier":"1",
    "intercept":"0",
    "max_depth": 5,
    "num_trees": -1,
    "tree_size": 0.5,
    "lambda1" : 1.5
    }
}

Writing /home/jovyan/modelops-demo-models/model_definitions/telco_python_indb_decisionForest/config.json
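<p style = 'font-size:16px;font-family:Arial'>Several values in this file are stored as strings (<code>"False"</code>, <code>"1"</code>, <code>"0"</code>). If your own code needs native Python types, a conversion step is required. The following is a minimal sketch that parses the same JSON text and converts a few values. Whether the training script needs such conversions is not shown in this section, so treat it as an illustration only.</p>

```python
import json

# Same content as the config.json written above.
config_text = """
{
   "hyperParameters": {
    "model_type": "Classification",
    "scale_method": "RANGE",
    "miss_value": "KEEP",
    "global_scale": "False",
    "multiplier": "1",
    "intercept": "0",
    "max_depth": 5,
    "num_trees": -1,
    "tree_size": 0.5,
    "lambda1": 1.5
    }
}
"""
hyperparams = json.loads(config_text)["hyperParameters"]

# bool("False") is True in Python, so compare the text explicitly instead.
global_scale = str(hyperparams["global_scale"]).strip().lower() == "true"
multiplier = float(hyperparams["multiplier"])
intercept = float(hyperparams["intercept"])
max_depth = int(hyperparams["max_depth"])

print(global_scale, multiplier, intercept, max_depth)
```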


<p style = 'font-size:16px;font-family:Arial'>The model file contains the configuration details of the model.</p>

In [23]:
%%writefile $model_local_path/model.json
{
    "id": "e5ec5b19-ea2b-493f-ace3-d958194f91bb",
    "name": "In-database Telco Churn Prediction DF",
    "description": "In-database DF for Telco Customer Churn Prediction",
    "language": "python"
}

Writing /home/jovyan/modelops-demo-models/model_definitions/telco_python_indb_decisionForest/model.json
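<p style = 'font-size:16px;font-family:Arial'>The <code>id</code> field above is a UUID. As a minimal sketch, the file content can be checked for the four keys shown and for a well-formed id, and a fresh id can be generated for a new model definition. That each model definition should use its own unique id is an assumption here, not something stated in this notebook.</p>

```python
import json
import uuid

# Same content as the model.json written above.
model_text = """
{
    "id": "e5ec5b19-ea2b-493f-ace3-d958194f91bb",
    "name": "In-database Telco Churn Prediction DF",
    "description": "In-database DF for Telco Customer Churn Prediction",
    "language": "python"
}
"""
model = json.loads(model_text)

# Check that the four keys used in this notebook are present.
missing = [key for key in ("id", "name", "description", "language") if key not in model]

# uuid.UUID raises ValueError if the id is not a well-formed UUID string.
model_id = uuid.UUID(model["id"])
print(missing, model_id.version)

# A fresh random id for a new model definition (assumption: ids are unique per model).
new_id = str(uuid.uuid4())
print(new_id)
```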


<hr style="height:2px;border:none;background-color:#00233C;">
<p><b style = 'font-size:20px;font-family:Arial;color:#00233C'>5. Commit and Push to Git to let ModelOps manage</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following files, created by the code above, will be committed to the Git repository:</p>

<li style = 'font-size:16px;font-family:Arial;color:#00233C'>training.py</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>evaluation.py</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>scoring.py</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>requirements.txt</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>config.json</li>
<li style = 'font-size:16px;font-family:Arial;color:#00233C'>model.json</li>
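<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Before committing, it can be useful to confirm that all six files exist. The following is a minimal sketch. The helper name <code>missing_files</code> is hypothetical, the relative paths follow the <code>%%writefile</code> cells in this notebook, and the location of training.py next to the other modules is an assumption.</p>

```python
from pathlib import Path

# Paths relative to the model folder, as used by the %%writefile cells.
# training.py is assumed to sit next to evaluation.py and scoring.py.
EXPECTED_FILES = [
    "model_modules/training.py",
    "model_modules/evaluation.py",
    "model_modules/scoring.py",
    "model_modules/requirements.txt",
    "config.json",
    "model.json",
]


def missing_files(model_dir: str) -> list:
    """Hypothetical helper: return expected files that do not exist under model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example: a folder that does not exist reports every file as missing.
print(missing_files("./no_such_model_folder"))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this notebook the check would be called as <code>missing_files(model_local_path)</code>; an empty list means all files are in place.</p>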


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Run the command below to commit and push changes to our forked repository, so ModelOps can fetch the changes to the model.</p>

In [24]:
!cd $model_local_path/../.. && git add . && git commit -m "Added Telco DecisionForest in database demo model " && git push

[master cb05de7] Added Telco DecisionForest in database demo model
 1 file changed, 8 insertions(+), 8 deletions(-)
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 4 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 552 bytes | 552.00 KiB/s, done.
Total 6 (delta 5), reused 0 (delta 0)
remote: Resolving deltas: 100% (5/5), completed with 5 local objects.
To https://github.com/shilpa-nalkande/modelops-demo-models.git
   d641536..cb05de7  master -> master


<p style = 'font-size:16px;font-family:Arial'>Now that the changes are pushed, you can manage the model lifecycle in the <strong>ModelOps User Interface</strong>: plan new training, evaluation, and scoring jobs, compare models, and operationalize them into Production with automated monitoring and alerting capabilities.</p>

<hr style="height:2px;border:none;background-color:#00233C;">
<p><b style = 'font-size:20px;font-family:Arial;color:#00233C'>6. ModelOps full lifecycle till deployment</b></p>

<p style='font-size:16px;font-family:Arial'>Create a Project using your Git repository, which contains the code created in the steps above.</p>

<img src="images/ModelOps/ete_telco_01.png" alt="Create Project" />

<p style='font-size:16px;font-family:Arial'>Add a personal connection using the host, user, and password for the ClearScape environment.</p>

<img src="images/ModelOps/ete_telco_02.png" alt="Model Conn" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>Save the Project. The model created above now appears in the catalog of models and can be used in the following steps.</p>

<img src="images/ModelOps/ete_telco_03.png" alt="Model Catalog with inDB" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>The next step is to create the Dataset Template. Along with the Name, Description, etc., we have to specify the table that will contain the statistics for the features. This table can be created and the statistics generated using the options provided.</p>
<img src="images/ModelOps/ete_telco_04.png" alt="Create Dataset" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>The query that can be used to create the dataset template is:</p>
<p style='font-size:16px;font-family:Arial'><code>Select * from demo_user.Transformed_data</code></p>

<p style='font-size:16px;font-family:Arial'>Select the features to be used in model training and specify the table that will contain the statistics for the features. Deselect the target variable ("Churn" here), as it is not one of the features. There are options to validate the statistics and to generate or regenerate the statistics for all features.</p>

<div style="display: flex; justify-content: center; gap: 20px; align-items: center; font-family: Arial, sans-serif;">
  <div style="text-align: center;">
    <h3 style="color: #00233C;">Select Features</h3>
    <img src="images/ModelOps/ete_telco_05.png" alt="Select Features" width="500" height="500" style = "border: 1px solid #00233C; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);"/>
  </div>
  <div style="text-align: center;">
    <h3 style="color: #00233C;">Generate and Validate Statistics</h3>
    <img src="images/ModelOps/ete_telco_06.png" alt="Generate Stats" width="500" height="500" style = "border: 1px solid #00233C; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);"/>
  </div>
</div>

<p style='font-size:16px;font-family:Arial'>The query that can be used for Entity and Target is:</p>
<p style='font-size:16px;font-family:Arial'><code>Select CustomerID, Churn from demo_user.Transformed_data</code></p>

<p style='font-size:16px;font-family:Arial'>As with the features, statistics can be collected for the target variable. In the Predictions section, select the database and table to be used to save the predictions.</p>

<img src="images/ModelOps/ete_telco_31.png" alt="Model Catalog with inDB" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>Click Create and the Dataset template will get created.</p>

<p style='font-size:16px;font-family:Arial'>Similarly, we create the Train and Test Datasets. The queries that can be used to create them are shown below:</p>

<div style="display: flex; justify-content: center; gap: 20px; align-items: center; font-family: Arial, sans-serif;">
  <div style="text-align: center;">
    <h5 style="color: #00233C;"><code>Select * from demo_user.transform_data_train</code></h5>
    <img src="images/ModelOps/ete_telco_09.png" alt="Train Dataset" width="500" height="500" style = "border: 1px solid #00233C; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);"/>
  </div>
  <div style="text-align: center;">
    <h5 style="color: #00233C;"><code>Select * from demo_user.transform_data_test</code></h5>
    <img src="images/ModelOps/ete_telco_10.png" alt="Test Dataset" width="500" height="500" style = "border: 1px solid #00233C; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);"/>
  </div>
</div>

<p style='font-size:16px;font-family:Arial'>The Train and Test Datasets are now created.</p>
<img src="images/ModelOps/ete_telco_12.png" alt="Train Test"/>


<p style='font-size:16px;font-family:Arial'>Go to Models, select the model, and then click Train a new Model. Use the default hyperparameters. This will launch the training job with the training script we generated and pushed to Git.</p>

<div style="display: flex; justify-content: center; gap: 20px; align-items: center; font-family: Arial, sans-serif;">
  <div style="text-align: center;">
    <h4 style="color: #00233C;"><code>Train Parameters</code></h4>
    <img src="images/ModelOps/ete_telco_13.png" alt="Train Model" width="500" height="500" style = "border: 1px solid #00233C; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);"/>
  </div>
  <div style="text-align: center;">
    <h4 style="color: #00233C;"><code>Train logs</code></h4>
    <img src="images/ModelOps/ete_telco_14.png" alt="Train Logs" width="500" height="500" style = "border: 1px solid #00233C; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);"/>
  </div>
</div>

</p>

<p style='font-size:16px;font-family:Arial'>When the model is trained, a new Model ID is created and you can open the Model Lifecycle screen to review artifacts and other details. Now, let's evaluate the model: click the button and select the evaluation dataset. This will launch the evaluation job with the evaluation script we generated and pushed to Git.</p>

<img src="images/ModelOps/ete_telco_15.png" alt="Model lifecycle"/>

<p style='font-size:16px;font-family:Arial'>We can check the evaluation logs for the status of the evaluation job.</p>

<img src="images/ModelOps/ete_telco_16.png" alt="Evaluation" width="400" height="500"/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src="images/ModelOps/ete_telco_17.png" alt="Evaluation job" width="400" height="500"/>

<p style='font-size:16px;font-family:Arial'>When the evaluation job is finished, a Model Evaluation Report is generated with the metrics and charts that the evaluation script produces.</p>

<img src="images/ModelOps/ete_telco_26.png" alt="Model Report" />

<img src="images/ModelOps/ete_telco_27.png" alt="Evaluation" width="450" height="500"/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src="images/ModelOps/ete_telco_29.png" alt="Evaluation job" width="450" height="500"/>

<img src="images/ModelOps/ete_telco_28.png" alt="Model Report" />

<p style='font-size:16px;font-family:Arial'>Now, let's approve the model and provide an approval description</p>

<img src="images/ModelOps/ete_telco_18.png" alt="Approval" />

<img src="images/ModelOps/ete_telco_19.png" alt="Deploy" width="800" height="500"/>

<p style='font-size:16px;font-family:Arial'>The model is ready to be deployed. Let's deploy it using the Batch scheduling option and run it manually.</p>

<img src="images/ModelOps/ete_telco_20.png" alt="Deployment Engine" width="900" height="500"/>

<img src="images/ModelOps/ete_telco_21.png" alt="Deployment Publish" width="450" height="500"/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src="images/ModelOps/ete_telco_22.png" alt="Deployment Schedule" width="450" height="500"/>

<p style='font-size:16px;font-family:Arial'>Go and try this step yourself. Launch ModelOps using the button below:</p>
<a href="/modelops" style="display: inline-flex; align-items: center; justify-content: center; background-color: #007373; color: #FFFFFF; font-family: Arial, sans-serif; font-size: 16px; font-weight: bold; text-decoration: none; padding: 12px 24px; border: none; border-radius: 8px; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1); cursor: pointer; transition: all 0.3s ease;">
  LAUNCH MODELOPS
  <img src="https://img.icons8.com/ios-filled/50/ffffff/external-link.png" alt="External Link Icon" style="margin-left: 8px; width: 20px; height: 20px;">
</a>

<hr style="height:2px;border:none;background-color:#00233C;">
<p><b style = 'font-size:20px;font-family:Arial;color:#00233C'>7. Cleanup</b></p>

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>If you are done with the ModelOps use case, please uncomment and run the cleanup section below.</p>
</div>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up the work tables to prevent errors the next time the notebook is run.</p>

In [None]:
# db_drop_table(table_name = 'aoa_byom_models', schema_name = 'demo_user')
# db_drop_table(table_name = 'Telco_Churn_predictions', schema_name = 'demo_user')

In [19]:
remove_context()

True

[<< Back to Traditional Approach Notebook](./Telco_Customer_Churn_Traditional_Approach.ipynb) 

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>