<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       ModelOps demo: PIMA Predictions with teradataml OpenSourceML DecisionTree using Git
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

![image](images/git_meth.png) 

<p style ='font-size:18px;font-family:Arial'><b>Introduction</b></p>

<p style='font-size:16px;font-family:Arial'>This notebook shows how to use the ClearScape Analytics OpenSourceML functions in teradataml with ModelOps. With teradataml OpenSourceML functions you can address scalability challenges by using Vantage to train and score your models. Whether you have a large volume of data or want to avoid moving data out of Vantage to train models, you can use ModelOps to manage your catalog of models from multiple platforms, including teradataml OpenSourceML functions.<br>To learn more about teradataml OpenSourceML functions, see the official Teradata documentation.</p>
 
<p style='font-size:16px;font-family:Arial'>This notebook covers the operationalization of the PIMA diabetes use case with Python using the teradataml OpenSourceML DecisionTree model. The Teradata Package for Python introduces teradataml open-source machine learning functions, referred to as teradataml OpenSourceML, which expose most of the functionality of open-source packages such as scikit-learn. With these functions, you can run the open-source packages without pulling the data to your client. A <strong>decision tree</strong> is a flowchart-like tree structure in which each internal node represents a test on a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome.</p>
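
<p style='font-size:16px;font-family:Arial'>To make the flowchart description concrete, here is a minimal pure-Python sketch of how a prediction walks from the root of a tree to a leaf. The tree, its thresholds, and the <code>predict</code> helper are hypothetical and hand-built for illustration; they are not the model trained later in this notebook.</p>

```python
# Hypothetical hand-built tree; the thresholds are made up for illustration only.
tree = {
    "feature": "PlGlcConc", "threshold": 127.5,
    "left": {"leaf": 0},
    "right": {
        "feature": "BMI", "threshold": 29.9,
        "left": {"leaf": 0},
        "right": {"leaf": 1},
    },
}

def predict(node, row):
    """Walk from the root to a leaf, applying one decision rule per internal node."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

patient = {"PlGlcConc": 150, "BMI": 33.1}
print(predict(tree, patient))
```

<p style='font-size:16px;font-family:Arial'>Each internal node tests one feature, each branch encodes the rule's outcome, and the leaf holds the predicted class.</p>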

<p style = 'font-size:18px;font-family:Arial'><b>Steps in this Notebook</b></p>

<ol style = 'font-size:16px;font-family:Arial'>
    <li>Configure the Environment </li>
    <li>Connect to Vantage</li>
    <li>Define Training function </li>
    <li>Define Evaluate function </li>
    <li>Define Scoring function</li>
    <li>Define Model Metadata</li>
    <li>Commit and Push to Git to let ModelOps manage</li>
    <li>ModelOps full lifecycle till deployment</li>
    <li>ModelOps Monitoring</li>
</ol>

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>1. Configure the Environment</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

<p style = 'font-size:18px;font-family:Arial'><b>1.1 Libraries installation</b></p>

<p style = 'font-size:16px;font-family:Arial'><b>A kernel restart is needed for the changes to take effect</b>. The -q parameter makes the installation command's log non-verbose; remove it if you want to see every step of the pip installation.</p>

In [None]:
%pip install -q teradataml==20.0.0.2 teradatamodelops==7.0.6 matplotlib==3.8.2 scikit-learn==1.1.3 

In [None]:
%pip install shap 

<p style = 'font-size:16px;font-family:Arial'><b>Hint:</b> <i>An easy way to restart the kernel and load the newly installed packages into memory is to type zero zero (<b> 0 0 </b>).</i></p>
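
<p style = 'font-size:16px;font-family:Arial'>After the restart, it can be useful to confirm which versions are actually importable. The following is a small sketch using only the standard library; the <code>installed_versions</code> helper is a hypothetical name introduced here, not part of teradataml or ModelOps.</p>

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

# Package names taken from the pip commands above.
print(installed_versions(["teradataml", "teradatamodelops", "matplotlib", "scikit-learn", "shap"]))
```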

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>1.2 Libraries import</b></p>

In [None]:
from teradataml import (
    create_context, 
    remove_context,
    get_context,
    get_connection,
    DataFrame,
    configure,
    execute_sql,
    db_drop_table
)
import os
import getpass
import logging
import sys

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>2. Connect to Vantage</b></p>

<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, then use down arrow to go to next cell. Begin running steps with Shift + Enter keys.</p>

In [None]:
%run -i ../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=13_ModelOps_GIT_PIMA_Python_osml_DecisionTree.ipynb;' UPDATE FOR SESSION; ''')

# configure byom/val installation
configure.val_install_location = "VAL"
configure.byom_install_location = "MLDB"

# set the path to the local project repository for this model demo
model_local_path = '~/modelops-demo-models/model_definitions/python_pima_osml_dt'
res = os.system(f'mkdir -p {model_local_path}/model_modules')

<p style = 'font-size:18px;font-family:Arial'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but consumes available storage. The following cell contains two statements, one of which is commented out. To switch modes, change which statement is commented out.</p>

In [None]:
#%run -i ../UseCases/run_procedure.py "call get_data('DEMO_ModelOps_cloud');"        # Takes 10 seconds
%run -i ../UseCases/run_procedure.py "call get_data('DEMO_ModelOps_local');"        # Takes 30 seconds

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>Creating predictions and model table</b></p>
<p style = 'font-size:16px;font-family:Arial'>We will create a predictions table to hold the model predictions and a model table to hold the uploaded model.</p>

In [None]:
#ddl for Aoa_Byom_Models 
query = '''CREATE SET TABLE DEMO_USER.Aoa_Byom_Models 
     (
      model_version VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      model_id VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      model_type VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      project_id VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      deployed_at TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6),
      model BLOB(2097088000))
UNIQUE PRIMARY INDEX ( model_version );
'''
 
try:
    execute_sql(query)
except:
    execute_sql('DROP TABLE DEMO_USER.Aoa_Byom_Models;')
    execute_sql(query)

In [None]:
#ddl for PIMA_Predictions
query = '''CREATE MULTISET TABLE Pima_Patient_Predictions 
     (
      job_id VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
      PatientId BIGINT,
      HasDiabetes BIGINT,
      json_report CLOB(1048544000) CHARACTER SET LATIN)
PRIMARY INDEX ( job_id );
'''
 
try:
    execute_sql(query)
except:
    db_drop_table('Pima_Patient_Predictions')
    execute_sql(query)
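
<p style = 'font-size:16px;font-family:Arial'>Both cells above follow the same pattern: try to create the table and, if that fails, drop it and create it again. A minimal sketch of that pattern as a reusable helper is shown below. The <code>create_or_replace</code> helper and the stand-in executor are hypothetical, so the sketch runs without a database connection; it also catches <code>Exception</code> rather than using a bare <code>except</code>.</p>

```python
def create_or_replace(run_sql, create_ddl, drop_ddl):
    """Try to create a table; if that fails, drop it and create it again."""
    try:
        run_sql(create_ddl)
    except Exception:
        run_sql(drop_ddl)
        run_sql(create_ddl)

# Stand-in executor (hypothetical) that mimics a database where the table already exists.
existing = {"demo_table"}
log = []

def fake_run_sql(statement):
    log.append(statement)
    if statement.startswith("CREATE") and "demo_table" in existing:
        raise RuntimeError("table already exists")
    if statement.startswith("DROP"):
        existing.discard("demo_table")
    elif statement.startswith("CREATE"):
        existing.add("demo_table")

create_or_replace(fake_run_sql, "CREATE TABLE demo_table (id INTEGER)", "DROP TABLE demo_table")
print(log)
```

<p style = 'font-size:16px;font-family:Arial'>In the notebook, the same helper could presumably be called with <code>execute_sql</code> as the executor, the DDL string, and the matching DROP statement.</p>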

<p style = 'font-size:16px;font-family:Arial'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../UseCases/run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>3. Define Training Function</b></p>

<p style = 'font-size:16px;font-family:Arial'>The training function takes the following shape </p>

```python
def train(context: ModelContext, **kwargs):
    aoa_create_context()
    
    # your training code using a teradataml InDB function
    model = <InDB Function>(...)
    
    # save your model
    model.result.to_sql(f"model_${context.model_version}", if_exists="replace")  
    
    record_training_stats(...)
```
<p style = 'font-size:16px;font-family:Arial'>You can execute this from the CLI or directly within the notebook as shown.</p>

In [None]:
%%writefile $model_local_path/model_modules/training.py
from teradataml import (
    DataFrame,
    ScaleFit,
    ScaleTransform,
)
from teradataml import td_sklearn as osml

from aoa import (
    record_training_stats,
    aoa_create_context,
    ModelContext
)

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import json

from collections import Counter
import shap
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

# Compute feature importance based on tree traversal
def compute_feature_importance(model,X_train):
    # from sklearn.inspection import permutation_importance
    feat_dict= {}
    for col, val in sorted(zip(X_train.columns, model.tree_.compute_feature_importances()),key=lambda x:x[1],reverse=True):
        feat_dict[col]=val
    feat_df = pd.DataFrame({'Feature':feat_dict.keys(),'Importance':feat_dict.values()})
    
    return feat_df
    

def plot_feature_importance(fi, img_filename):
    feat_importances = fi.sort_values(['Importance'],ascending = False).head(10)
    feat_importances.plot(kind='barh').set_title('Feature Importance')
    fig = plt.gcf()
    fig.savefig(img_filename, dpi=500)
    plt.clf()
    
def train(context: ModelContext, **kwargs):
    aoa_create_context()
    
    # Extracting feature names, target name, and entity key from the context
    feature_names = context.dataset_info.feature_names
    target_name = context.dataset_info.target_names[0]
    entity_key = context.dataset_info.entity_key

    # Load the training data from Teradata
    train_df = DataFrame.from_query(context.dataset_info.sql)

    print ("Scaling using InDB Functions...")
    X_train = train_df.drop(['HasDiabetes','PatientId'], axis = 1)
    y_train = train_df.select(["HasDiabetes"])
    # Scale the training data using the ScaleFit and ScaleTransform functions
    scaler = ScaleFit(
        data=train_df,
        target_columns = feature_names,
        scale_method="STD",
        global_scale=False
    )

    scaled_train = ScaleTransform(
        data=train_df,
        object=scaler.output,
        accumulate = [target_name,entity_key]
    )
    
    scaler.output.to_sql(f"scaler_${context.model_version}", if_exists="replace")
    print("Saved scaler")
    
         
    print("Starting training using teradata osml...")

    DT_classifier = osml.DecisionTreeClassifier(random_state=int(context.hyperparams["random_state"])
                                                ,max_leaf_nodes=int(context.hyperparams["max_leaf_nodes"])
                                                ,max_features=context.hyperparams["max_features"]
                                                ,max_depth=int(context.hyperparams["max_depth"]))
    DT_classifier.fit(X_train, y_train)
    DT_classifier.deploy(model_name="DT_classifier", replace_if_exists=True)
        
    print("Complete osml training...")
    
    # Calculate feature importance and generate plot
    feature_importance = compute_feature_importance(DT_classifier.modelObj,X_train)
    plot_feature_importance(feature_importance, f"{context.artifact_output_path}/feature_importance")
    
    record_training_stats(
        train_df,
        features=feature_names,
        targets=[target_name],
        categorical=[target_name],
        feature_importance=feature_importance,
        context=context
    )
    
    print("All done!")

In [None]:
# Define the ModelContext to test with. The ModelContext is created and managed automatically by ModelOps 
# when it executes your code via CLI / UI. However, for testing in the notebook, you can define as follows

# define the training dataset 
sql = """
SELECT 
    F.*, D.hasdiabetes
FROM DEMO_ModelOps.PIMA_PATIENT_FEATURES F 
JOIN DEMO_ModelOps.PIMA_PATIENT_DIAGNOSES D
ON F.patientid = D.patientid
    WHERE D.patientid MOD 5 <> 0
"""

feature_metadata =  {
    "database": "DEMO_ModelOps",
    "table": "aoa_statistics_metadata"
}

hyperparams = {
    "random_state":32,
    "max_leaf_nodes":4,
    "max_features":'auto',
    "max_depth":4
}


entity_key = "PatientId"
target_names = ["HasDiabetes"]
feature_names = ["NumTimesPrg", "PlGlcConc", "BloodP", "SkinThick", "TwoHourSerIns", "BMI", "DiPedFunc", "Age"]

from aoa import ModelContext, DatasetInfo

dataset_info = DatasetInfo(sql=sql,
                           entity_key=entity_key,
                           feature_names=feature_names,
                           target_names=target_names,
                           feature_metadata=feature_metadata)

ctx = ModelContext(hyperparams=hyperparams,
                   dataset_info=dataset_info,
                   artifact_output_path="./artifacts",
                   model_version="osml_decisiontree_v1",
                   model_table="model_osml_decisiontree_v1")

sys.path.append(os.path.expanduser(f"{model_local_path}/model_modules"))
import training
training.train(context=ctx)

In [None]:
# Check the generated files
!ls -lh artifacts

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>4. Define Evaluation Function</b></p>

<p style = 'font-size:16px;font-family:Arial'>The evaluation function takes the following shape</p>

```python
def evaluate(context: ModelContext, **kwargs):
    aoa_create_context()

    # read your model from Vantage
    model = DataFrame(f"model_${context.model_version}")
    
    # your evaluation logic
    
    record_evaluation_stats(...)
```
<p style = 'font-size:16px;font-family:Arial'>You can execute this from the CLI or directly within the notebook as shown.</p>

In [None]:
%%writefile $model_local_path/model_modules/evaluation.py
from sklearn.metrics import confusion_matrix, roc_curve, auc
import matplotlib.pyplot as plt
from teradataml import td_sklearn as osml
# from lime.lime_tabular import LimeTabularExplainer
from teradataml import(
    DataFrame, 
    copy_to_sql, 
    get_context, 
    get_connection, 
    ScaleTransform, 
    ConvertTo, 
    ClassificationEvaluator,
    ROC
)
from aoa import (
    record_evaluation_stats,
    save_plot,
    aoa_create_context,
    ModelContext
)

import joblib
import json

import numpy as np
import pandas as pd
import shap
import os
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

        
# Compute feature importance based on tree traversal
def compute_feature_importance(model,X_train):
    feat_dict= {}
    for col, val in sorted(zip(X_train.columns, model.feature_importances_),key=lambda x:x[1],reverse=True):
        feat_dict[col]=val
    feat_df = pd.DataFrame({'Feature':feat_dict.keys(),'Importance':feat_dict.values()})
    # print(feat_df)
    return feat_df

def plot_feature_importance(fi, img_filename):
    feat_importances = fi.sort_values(['Importance'],ascending = False).head(10)
    feat_importances.plot(kind='barh').set_title('Feature Importance')
    fig = plt.gcf()
    fig.savefig(img_filename, dpi=500)
    plt.clf()


# Define function to plot a confusion matrix from given data
def plot_confusion_matrix(cf, img_filename):
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(7.5, 7.5))
    ax.matshow(cf, cmap=plt.cm.Blues, alpha=0.3)
    for i in range(cf.shape[0]):
        for j in range(cf.shape[1]):
            ax.text(x=j, y=i,s=cf[i, j], va='center', ha='center', size='xx-large')
    ax.set_xlabel('Predicted labels');
    ax.set_ylabel('True labels'); 
    ax.set_title('Confusion Matrix');
    fig = plt.gcf()
    fig.savefig(img_filename, dpi=500)
    plt.clf()


# Define function to plot ROC curve from ROC output data 
def plot_roc_curve(roc_out, img_filename):
    import matplotlib.pyplot as plt
    from sklearn import metrics
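    fpr, tpr, thresholds = metrics.roc_curve(roc_out['HasDiabetes'], roc_out['decisiontreeclassifier_predict_1'])
    # Compute the area under the curve; the bare name `auc` is the imported sklearn function, not a value
    roc_auc = metrics.auc(fpr, tpr)
    plt.plot(fpr,tpr,label=f"ROC curve AUC={roc_auc:.4f}", color='darkorange')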
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--') 
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend(loc="lower right")
    fig = plt.gcf()
    fig.savefig(img_filename, dpi=200)
    plt.clf()
    
    

def evaluate(context: ModelContext, **kwargs):

    aoa_create_context()

    
    feature_names = context.dataset_info.feature_names
    target_name = context.dataset_info.target_names[0]
    entity_key = context.dataset_info.entity_key

    # Load the test data from Teradata
    test_df = DataFrame.from_query(context.dataset_info.sql)
    X_test = test_df.drop(['HasDiabetes','PatientId'], axis = 1)
    y_test = test_df.select(["HasDiabetes"])
    # Scaling the test set
    print ("Loading scaler...")
    scaler = DataFrame(f"scaler_${context.model_version}")

    scaled_test = ScaleTransform(
        data=test_df,
        object=scaler,
        accumulate = [target_name,entity_key]
    )
    
    print("Evaluating osml...")
    DT_classifier = osml.load(model_name="DT_classifier")
    predict_df =DT_classifier.predict(X_test,y_test)
    accuracy_dt = DT_classifier.score(X_test, y_test)
    df = X_test.sample(n=1)
    df = df.drop(columns="sampleid")
   
    explainer_shap = shap.TreeExplainer(DT_classifier.modelObj)
    shap_values = explainer_shap.shap_values(X_test.to_pandas())
    
    shap.summary_plot(shap_values, X_test.to_pandas(),show=False, plot_size=(12, 8), plot_type='bar')
    save_plot('SHAP Feature Importance', context=context)
    
    
    explainer_ebm = shap.Explainer(DT_classifier.modelObj.predict, X_test.to_pandas())
    shap_values_ebm = explainer_ebm(X_test.to_pandas())
    
    shap.plots.beeswarm(shap_values_ebm,show=False, plot_size=(12,8))
    save_plot('SHAP Beeswarm Plot', context=context)

    # Evaluate classification metrics using ClassificationEvaluator
    ClassificationEvaluator_obj = ClassificationEvaluator(
        data=predict_df,
        observation_column=target_name,
        prediction_column='decisiontreeclassifier_predict_1',
        num_labels=2
    )

    # Extract and store evaluation metrics
    metrics_pd = ClassificationEvaluator_obj.output_data.to_pandas()
      
         
    evaluation = {
        'Accuracy': '{:.4f}'.format(metrics_pd.MetricValue[0]),
        'Micro-Precision': '{:.4f}'.format(metrics_pd.MetricValue[1]),
        'Micro-Recall': '{:.4f}'.format(metrics_pd.MetricValue[2]),
        'Micro-F1': '{:.4f}'.format(metrics_pd.MetricValue[3]),
        'Macro-Precision': '{:.4f}'.format(metrics_pd.MetricValue[4]),
        'Macro-Recall': '{:.4f}'.format(metrics_pd.MetricValue[5]),
        'Macro-F1': '{:.4f}'.format(metrics_pd.MetricValue[6]),
        'Weighted-Precision': '{:.4f}'.format(metrics_pd.MetricValue[7]),
        'Weighted-Recall': '{:.4f}'.format(metrics_pd.MetricValue[8]),
        'Weighted-F1': '{:.4f}'.format(metrics_pd.MetricValue[9]),
        # 'Accuracy-osml': '{:.2f}'.format(accuracy_osml.score[0]),
    }

     # Save evaluation metrics to a JSON file
    with open(f"{context.artifact_output_path}/metrics.json", "w+") as f:
        json.dump(evaluation, f)
        
    # Generate and save confusion matrix plot
    cm = confusion_matrix(predict_df.to_pandas()['HasDiabetes'], predict_df.to_pandas()['decisiontreeclassifier_predict_1'])
    plot_confusion_matrix(cm, f"{context.artifact_output_path}/confusion_matrix")
    
    # Generate and save ROC curve plot
    roc_out = ROC(
        data=predict_df,
        probability_column='decisiontreeclassifier_predict_1',
        observation_column=target_name,
        positive_class='1',
        num_thresholds=1000
    )
    
    plot_roc_curve(predict_df.to_pandas(), f"{context.artifact_output_path}/roc_curve")
    
    feature_importance = compute_feature_importance(DT_classifier.modelObj,X_test)
    plot_feature_importance(feature_importance, f"{context.artifact_output_path}/feature_importance")
    
    
    predictions_table = "predictions_tmp"
    copy_to_sql(df=predict_df, table_name=predictions_table, index=False, if_exists="replace", temporary=True)
    
    
    # calculate stats if training stats exist
    if os.path.exists(f"{context.artifact_input_path}/data_stats.json"):
        record_evaluation_stats(
            features_df=test_df,
            predicted_df=DataFrame.from_query(f"SELECT * FROM {predictions_table}"),
            feature_importance=feature_importance,
            context=context
        )

    print("All done!")

In [None]:
# Define the ModelContext to test with. The ModelContext is created and managed automatically by ModelOps 
# when it executes your code via CLI / UI. However, for testing in the notebook, you can define as follows

# define the evaluation dataset 
sql = """
SELECT 
    F.*, D.hasdiabetes 
FROM DEMO_ModelOps.PIMA_PATIENT_FEATURES F 
JOIN DEMO_ModelOps.PIMA_PATIENT_DIAGNOSES D
ON F.patientid = D.patientid
    WHERE D.patientid MOD 5 = 0
"""

dataset_info = DatasetInfo(sql=sql,
                           entity_key=entity_key,
                           feature_names=feature_names,
                           target_names=target_names,
                           feature_metadata=feature_metadata)

ctx = ModelContext(hyperparams=hyperparams,
                   dataset_info=dataset_info,
                   artifact_output_path="./artifacts",
                   artifact_input_path="./artifacts",
                   model_version="osml_decisiontree_v1",
                   model_table="model_osml_decisiontree_v1")

import evaluation
evaluation.evaluate(context=ctx)

# view evaluation results
import json
with open(f"{ctx.artifact_output_path}/metrics.json") as f:
    print(json.load(f))

In [None]:
# Check the generated files
!ls -lh artifacts

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>5. Define Scoring Function</b></p>

<p style = 'font-size:16px;font-family:Arial'>The scoring function takes the following shape</p>

```python
def score(context: ModelContext, **kwargs):
    aoa_create_context()

    # read your model
    model = DataFrame(f"model_${context.model_version}")
    
    # your scoring logic
    
    record_scoring_stats(...)
```

<p style = 'font-size:16px;font-family:Arial'>You can execute this from the CLI or directly within the notebook as shown.</p>

In [None]:
%%writefile $model_local_path/model_modules/scoring.py
from teradataml import td_sklearn as osml
from teradataml import (
    copy_to_sql,
    DataFrame,
    ScaleTransform
)
from aoa import (
    record_scoring_stats,
    aoa_create_context,
    ModelContext
)
import pandas as pd

import json
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter(action='ignore', category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=UserWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)

def score(context: ModelContext, **kwargs):
    
    aoa_create_context()

    
    # Extract feature names, target name, and entity key from the context
    feature_names = context.dataset_info.feature_names
    target_name = context.dataset_info.target_names[0]
    entity_key = context.dataset_info.entity_key
    
    # Load the test dataset
    test_df = DataFrame.from_query(context.dataset_info.sql)
    copy_to_sql(
        df=test_df,
        schema_name=context.dataset_info.predictions_database,
        table_name='test_df',
        index=False,
        if_exists="replace"
    )
    X_test = test_df.drop(['PatientId'], axis = 1)
    # y_test = test_df.select(["anomaly_int"])
    features_tdf = DataFrame.from_query(context.dataset_info.sql)
    features_pdf = features_tdf.to_pandas(all_rows=True)
    test_df.set_index("PatientId")

    print("Scoring using osml...")
    DT_classifier = osml.load(model_name="DT_classifier")
    predict_df =DT_classifier.predict(X_test)
    # Convert predictions to pandas DataFrame and process
    # predictions_pdf = predict_DT.to_pandas(all_rows=True)
    df_pred = predict_df.to_pandas(all_rows=True)
    
    predictions_pdf = predict_df.to_pandas(all_rows=True).rename(columns={"decisiontreeclassifier_predict_1": target_name})
    print("Finished Scoring")

    # store the predictions
   
    predictions_pdf = pd.DataFrame(predictions_pdf, columns=[target_name])
    # predictions_pdf[entity_key] = features_pdf.index.values
    predictions_pdf[entity_key] = test_df.select(["PatientId"]).get_values()
    # add job_id column so we know which execution this is from if appended to predictions table
    # print(predictions_pdf)
    predictions_pdf["job_id"] = context.job_id
    predictions_pdf["json_report"] = ""
    predictions_pdf = predictions_pdf[["job_id", entity_key, target_name, "json_report"]]
    
    copy_to_sql(
        df=predictions_pdf,
        schema_name=context.dataset_info.predictions_database,
        table_name=context.dataset_info.predictions_table,
        index=False,
        if_exists="replace"
    )
        
    print("Saved predictions in Teradata")

    # calculate stats
    predictions_df = DataFrame.from_query(f"""
        SELECT 
            * 
        FROM {context.dataset_info.get_predictions_metadata_fqtn()} 
            WHERE job_id = '{context.job_id}'
    """)
    
    record_scoring_stats(features_df=features_tdf, predicted_df=predictions_df, context=context)

    print("All done!")

In [None]:
# Define the ModelContext to test with. The ModelContext is created and managed automatically by ModelOps 
# when it executes your code via CLI / UI. However, for testing in the notebook, you can define as follows

# define the scoring dataset 

sql = """
SELECT 
    F.*
FROM DEMO_ModelOps.PIMA_PATIENT_FEATURES F 
    WHERE F.patientid MOD 5 = 0
"""

# where to store predictions
predictions = {
    "database": "demo_user",
    "table": "pima_patient_predictions_tmp"
}

import uuid
job_id=str(uuid.uuid4())

dataset_info = DatasetInfo(sql=sql,
                           entity_key=entity_key,
                           feature_names=feature_names,
                           target_names=target_names,
                           feature_metadata=feature_metadata,
                           predictions=predictions)

ctx = ModelContext(hyperparams=hyperparams,
                   dataset_info=dataset_info,
                   artifact_output_path="./artifacts",
                   artifact_input_path="./artifacts",
                   model_version="osml_decisiontree_v1",
                   model_table="model_osml_decisiontree_v1",
                   job_id=job_id)

import scoring
scoring.score(context=ctx)

In [None]:
DataFrame.from_query(f"SELECT * FROM pima_patient_predictions_tmp WHERE job_id='{job_id}'")

In [None]:
# Clean up

os.system('rm -f artifacts/*')

try:
    get_context().execute(f"DROP TABLE model_osml_decisiontree_v1")
except: 
    pass

try:
    get_context().execute(f"DROP TABLE pima_patient_predictions_tmp")
except: 
    pass

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>6. Define Model Metadata</b></p>

<p style = 'font-size:16px;font-family:Arial'>Now let's create the configuration files.<br>Requirements file with the dependencies and versions:</p>

In [None]:
%%writefile $model_local_path/model_modules/requirements.txt
teradataml==20.0.0.2
pandas==2.1.3
teradatamodelops==7.0.6
matplotlib==3.5.2
PyYAML==6.0
scikit-learn==1.1.3
shap==0.46.0

<p style = 'font-size:16px;font-family:Arial'>The hyper parameter configuration (default values):</p>

In [None]:
%%writefile $model_local_path/config.json
{
   "hyperParameters": {
    "random_state":32,
    "max_leaf_nodes":4,
    "max_features":"auto",
    "max_depth":4
    }
}
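
<p style = 'font-size:16px;font-family:Arial'>The training script wraps <code>random_state</code>, <code>max_leaf_nodes</code>, and <code>max_depth</code> in <code>int(...)</code> before passing them to the classifier. The sketch below illustrates that step on the same default values. The <code>coerce_hyperparams</code> helper is hypothetical, and the idea that values may arrive as strings is an assumption used to motivate the cast, not something stated in this notebook.</p>

```python
import json

# Same defaults as config.json, parsed from a string so the sketch is self-contained.
config_text = """
{
   "hyperParameters": {
    "random_state": 32,
    "max_leaf_nodes": 4,
    "max_features": "auto",
    "max_depth": 4
    }
}
"""

INT_PARAMS = ("random_state", "max_leaf_nodes", "max_depth")

def coerce_hyperparams(raw):
    """Cast the integer-valued settings with int(); leave max_features untouched."""
    params = dict(raw)
    for key in INT_PARAMS:
        params[key] = int(params[key])
    return params

demo_params = coerce_hyperparams(json.loads(config_text)["hyperParameters"])
print(demo_params)
```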

<p style = 'font-size:16px;font-family:Arial'>The model configuration:</p>

In [None]:
%%writefile $model_local_path/model.json
{
    "id": "7794bd1a-e932-450e-944b-8c6c3ed4db5e",
    "name": "Diabetes Prediction using teradataml OpenSource DecisionTreeClassifier",
    "description": "teradataml OpenSource DecisionTreeClassifier for Diabetes Predictions",
    "language": "python"
}
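
<p style = 'font-size:16px;font-family:Arial'>At this point the model folder should contain the files written by the <code>%%writefile</code> cells above. The following sketch checks for them with the standard library. The <code>missing_files</code> helper is a hypothetical name, and the demonstration runs against a throwaway directory rather than the real repository.</p>

```python
from pathlib import Path
import tempfile

# Relative paths written by the %%writefile cells in this notebook.
EXPECTED_FILES = [
    "model.json",
    "config.json",
    "model_modules/training.py",
    "model_modules/evaluation.py",
    "model_modules/scoring.py",
    "model_modules/requirements.txt",
]

def missing_files(model_dir):
    """Return the expected relative paths that do not exist under model_dir."""
    root = Path(model_dir).expanduser()
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

# Demonstration on a temporary directory that contains only model.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model_modules").mkdir()
    (Path(tmp) / "model.json").write_text("{}")
    demo_missing = missing_files(tmp)
print(demo_missing)
```

<p style = 'font-size:16px;font-family:Arial'>Calling <code>missing_files(model_local_path)</code> before committing should return an empty list if every cell above has been run.</p>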

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>7. Commit and Push to Git to let ModelOps manage</b></p>

<p style = 'font-size:16px;font-family:Arial'>Run the command below to commit and push changes to our forked repository, so ModelOps can fetch the changes to the model.</p>

In [None]:
!cd $model_local_path/../.. \
&& git add . \
&& git commit -m "Added Diabetes Prediction using osml demo model" \
&& git push

<p style = 'font-size:16px;font-family:Arial'>Now that the changes are pushed, you can manage the model lifecycle in the <strong>ModelOps User Interface</strong>: plan new trainings, evaluations, and scorings, compare models, and operationalize them into Production with automated monitoring and alerting capabilities.</p>

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>8. ModelOps full lifecycle till deployment</b></p>

<p style='font-size:16px;font-family:Arial'>Use or create a project that points to the Git repository containing the model code. The model should then already appear in the catalog.</p>

<img src="images/13_01.png" alt="Model Catalog with inDB"/>

<p style='font-size:16px;font-family:Arial'>Select the Model and then click Train a new Model. Use default hyper-parameters. This will launch the training job with the training script we generated and pushed to Git.</p>

<img src="images/13_02.png" alt="Train"/>

<img src="images/13_03.png" alt="Train job" width="500" height="500"/>

<img src="images/13_04.png" alt="Train finished" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>When the model is trained, a new Model Id is created and you can open the Model Lifecycle screen to review artifacts and other details.</p>

<img src="images/13_05.png" alt="Model lifecycle"/>

<p style='font-size:16px;font-family:Arial'>Now, let's evaluate the model: click the evaluate button and select the evaluation dataset. This will launch the evaluation job with the evaluation script we generated and pushed to Git.</p>

<img src="images/13_06.png" alt="Evaluation" width="500" height="500"/> <img src="images/13_07.png" alt="Evaluation job" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>When the evaluation job finishes, a model evaluation report is generated with the metrics and charts produced by the evaluation script. It also includes some explainability charts.</p>

<img src="images/13_08.png" alt="Model Report" />
<img src="images/13_23.png" alt="Model Report shap" width="600" height="500"/>
<img src="images/13_24.png" alt="Model Report features" width="600" height="500"/>
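
<p style='font-size:16px;font-family:Arial'>Several of the reported metrics (accuracy, macro precision, macro recall) can be derived directly from a confusion matrix. The sketch below shows the arithmetic in plain Python on a hypothetical 2x2 matrix; the counts are invented for illustration and are not results from this model.</p>

```python
# Hypothetical confusion matrix: rows are true labels, columns are predicted labels.
cm = [[80, 20],
      [15, 39]]

def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_class_precision(matrix):
    """Precision for class k: correct predictions of k over everything predicted as k."""
    n = len(matrix)
    return [matrix[k][k] / sum(matrix[i][k] for i in range(n)) for k in range(n)]

def per_class_recall(matrix):
    """Recall for class k: correct predictions of k over everything truly of class k."""
    return [matrix[k][k] / sum(matrix[k]) for k in range(len(matrix))]

# Macro averages weight every class equally, regardless of how many rows it has.
macro_precision = sum(per_class_precision(cm)) / len(cm)
macro_recall = sum(per_class_recall(cm)) / len(cm)

print(f"Accuracy: {accuracy(cm):.4f}")
print(f"Macro-Precision: {macro_precision:.4f}")
print(f"Macro-Recall: {macro_recall:.4f}")
```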

<p style='font-size:16px;font-family:Arial'>Now, let's approve the model and provide an approval description</p>

<img src="images/13_09.png" alt="Approval" />

<img src="images/13_10.png" alt="Approval description" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>The model is ready to be deployed. Let's deploy it using a Batch scheduling option and run it manually.</p>

<img src="images/13_11.png" alt="Deployment Engine" width="500" height="500"/>

<img src="images/13_12.png" alt="Deployment Publish" width="500" height="500"/>

<img src="images/13_13.png" alt="Deployment Schedule" width="500" height="500"/>

<p style='font-size:16px;font-family:Arial'>Try this step yourself. Launch ModelOps using the button below:</p>
<a href="/modelops"><img src="images/launchModelOps.png" alt="Launch ModelOps" /></a>

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>9. ModelOps Monitoring</b></p>

<p style = 'font-size:16px;font-family:Arial'>Now the model is deployed and a new Deployment appears in the deployment screen</p>


<img src="images/13_14.png" alt="Deployment" />


<p style = 'font-size:16px;font-family:Arial'>You can run jobs manually from here, review history of executions and view the predictions for a specific job</p>

<img src="images/13_15.png" alt="Deployment Run" width="500" height="500"/>

<img src="images/13_16.png" alt="Deployment Jobs" />

<img src="images/13_17.png" alt="Deployment view" width="500" height="500" />

<img src="images/13_16_new.png" alt="Deployment predictions" width="500" height="500"/>

<img src="images/13_17_new.png" alt="Deployment" width="500" height="500"/>
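
<p style = 'font-size:16px;font-family:Arial'>Viewing the predictions for a specific job works because the scoring script tags every row with a <code>job_id</code>. The sketch below mimics that with plain Python data. The <code>tag_predictions</code> helper and the sample rows are hypothetical; the column order mirrors the predictions table created earlier.</p>

```python
import uuid

def tag_predictions(rows, job_id):
    """Return prediction rows in the column order used by the predictions table."""
    return [
        {"job_id": job_id, "PatientId": pid, "HasDiabetes": label, "json_report": ""}
        for pid, label in rows
    ]

# Two hypothetical scoring runs appended to the same in-memory "table".
first_job, second_job = str(uuid.uuid4()), str(uuid.uuid4())
table = tag_predictions([(5, 0), (10, 1)], first_job) + tag_predictions([(15, 1)], second_job)

# Equivalent in spirit to: SELECT * FROM <predictions table> WHERE job_id = '<second_job>'
second_run = [row for row in table if row["job_id"] == second_job]
print(second_run)
```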


<p style = 'font-size:16px;font-family:Arial'>From the Feature Drift and Prediction Drift tabs, you can monitor data drift.</p>

<img src="images/13_18.png" alt="Feature Drift" />

<img src="images/13_19.png" alt="Prediction Drift" />

<img src="images/13_20.png" alt="Performance Monitoring" />
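
<p style = 'font-size:16px;font-family:Arial'>To give a feel for what a drift score measures, the sketch below computes the Population Stability Index (PSI) between two histograms over the same bins. This notebook does not state which statistic ModelOps computes, so treat PSI here as one common illustrative choice; the <code>psi</code> helper and the bin counts are hypothetical.</p>

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two histograms that share the same bins."""
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Clamp proportions away from zero so the logarithm stays defined.
        e_pct = max(e / exp_total, eps)
        a_pct = max(a / act_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

training_hist = [10, 20, 30]   # hypothetical bin counts at training time
scoring_hist = [30, 20, 10]    # hypothetical bin counts at scoring time

print(psi(training_hist, training_hist))
print(psi(training_hist, scoring_hist))
```

<p style = 'font-size:16px;font-family:Arial'>Identical distributions give a score of zero, and the score grows as the scoring data moves away from the training data.</p>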



<p style = 'font-size:16px;font-family:Arial'>From the Performance Drift tab, you can review multiple evaluations. Let's evaluate the model with a new dataset, created with this query:</p>
    
```sql
SELECT * FROM pima_patient_diagnoses F WHERE F.patientid MOD 8 <> 0  
```
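
<p style = 'font-size:16px;font-family:Arial'>The modulo filters used throughout this notebook decide which patients land in each dataset. The sketch below counts the rows each filter selects, under the assumption (made only for illustration) that patient ids run from 1 to 1000 with no gaps.</p>

```python
# Assumption for illustration: patient ids run from 1 to 1000 with no gaps.
patient_ids = range(1, 1001)

train_ids = {pid for pid in patient_ids if pid % 5 != 0}      # training query (MOD 5 <> 0)
eval_ids = {pid for pid in patient_ids if pid % 5 == 0}       # first evaluation query (MOD 5 = 0)
new_eval_ids = {pid for pid in patient_ids if pid % 8 != 0}   # query above (MOD 8 <> 0)

print(len(train_ids), len(eval_ids), len(new_eval_ids))
# Rows shared between the new evaluation dataset and the training dataset.
print(len(new_eval_ids & train_ids))
```

<p style = 'font-size:16px;font-family:Arial'>Under that assumption, the first two filters form a disjoint 80/20 split, while the new evaluation dataset shares many ids with the training set, which is worth keeping in mind when comparing metrics across evaluations.</p>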

<img src="images/13_21.png" alt="Evaluate" width="500" height="500" />

<p style = 'font-size:16px;font-family:Arial'>You can now see the evolution of the metrics:</p>

<img src="images/13_22.png" alt="Metrics monitoring" />

<p style = 'font-size:16px;font-family:Arial'>
With ModelOps you can close the cycle and decide when you need to replace your model in production. For example, you can receive alerts on Data Drift or Performance Drift, create multiple versions and compare them, select a champion, and deploy new versions that replace the existing ones in Production.</p>

<p style='font-size:16px;font-family:Arial'>Try this step yourself. Launch ModelOps using the button below:</p>
<a href="/modelops"><img src="images/launchModelOps.png" alt="Launch ModelOps" /></a>

<hr style="height:2px;border:none;">
<p><b style = 'font-size:20px;font-family:Arial'>10. Cleanup</b></p>

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'>If you are done with the ModelOps use case, please uncomment and run the cleanup section below.</p>
</div>

<p style = 'font-size:18px;font-family:Arial'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>Cleanup work tables to prevent errors next time.</p>

In [None]:
# db_drop_table(table_name = 'aoa_byom_models', schema_name = 'demo_user')
# db_drop_table(table_name = 'pima_patient_predictions', schema_name = 'demo_user')

<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
# %run -i ../UseCases/run_procedure.py "call remove_data('DEMO_ModelOps');"        # Takes 10 seconds

In [None]:
remove_context()

[<< Back to Git PIMA Python XGBoost](./11_ModelOps_GIT_PIMA_Python_indb_XGboost.ipynb) 

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">Copyright © Teradata Corporation - 2024. All Rights Reserved.</footer>