<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Deploy a Hyper-Segmented Model Scikit Learn Pipeline in Vantage with TDStone2
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:16px;font-family:Arial'>
This notebook demonstrates how to deploy a hyper-segmented model created using the Python Scikit-Learn module, and how to load and run it in Vantage using the TDStone2 module.</p>

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
%%capture
!pip install tdstone2 --upgrade

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><b>Note: </b><i>The kernel must be restarted to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>

In [None]:
import warnings
warnings.filterwarnings('ignore')

from teradataml import (
    create_context,
    execute_sql,
    copy_to_sql,
    DataFrame,
    in_schema,
    remove_context,
    db_drop_table,
    display   # teradataml display options, used for display.max_rows below
    )
import tdstone2


# Modify the following to match the specific client environment settings
display.max_rows = 5

<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

<p style = 'font-size:16px;font-family:Arial'>Setup for execution of notebook. Begin running steps with Shift + Enter keys.</p>

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=PP_Deploy_HyperSegmented_Model_Pipeline_Python.ipynb;' 
               UPDATE FOR SESSION; ''')

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Getting Data for This Demo</b>

<p style = 'font-size:16px;font-family:Arial'>We will generate the required data with the dataset generation helpers of tdstone2. We first count the AMPs of the system (hashamp() returns the highest AMP number, so we add 1), then build a query that generates a dataset with one partition per AMP, and finally store the result of that query in the table dataset_00 in Vantage, using Partition_ID as the primary index.</p>
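<p style = 'font-size:16px;font-family:Arial'>To get a feel for the shape of the table without a Vantage connection, the rows can be approximated locally. The following is a minimal sketch, not the tdstone2 generator: the column names (Partition_ID, ID, FOLD, X1 to X9, flag, Y1, Y2) are taken from the cells later in this notebook, while the value distributions, the 'test' fold label and the function name make_partitioned_rows are assumptions made for illustration only.</p>

```python
import random

def make_partitioned_rows(n_partitions, rows_per_partition, train_share=0.8, seed=0):
    """Build toy rows with one equally sized block of rows per partition (hypothetical helper)."""
    rng = random.Random(seed)
    rows = []
    row_id = 0
    for partition_id in range(n_partitions):
        for _ in range(rows_per_partition):
            # Nine numeric features X1..X9 (assumed standard normal here)
            features = {f"X{i}": rng.gauss(0.0, 1.0) for i in range(1, 10)}
            y1 = sum(features.values()) + rng.gauss(0.0, 0.1)
            rows.append({
                "Partition_ID": partition_id,
                "ID": row_id,
                "FOLD": "train" if rng.random() < train_share else "test",
                **features,
                "flag": rng.randint(0, 1),   # categorical feature
                "Y1": y1,                    # numeric target (regression)
                "Y2": int(y1 > 0),           # categorical target (classification)
            })
            row_id += 1
    return rows

rows = make_partitioned_rows(n_partitions=4, rows_per_partition=100)
print(len(rows), sorted({r["Partition_ID"] for r in rows}))
```

<p style = 'font-size:16px;font-family:Arial'>Every partition receives the same number of rows, which is the property the name GenerateEquallyDistributedDataSet suggests; the actual generator may differ in every other respect.</p>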

In [None]:
nb_amps = 1 + execute_sql('SEL hashamp()').fetchall()[0][0]
nb_amps

In [None]:
from tdstone2.dataset_generation import GenerateDataSet,GenerateEquallyDistributedDataSet

In [None]:
query, features = GenerateEquallyDistributedDataSet(n_partitions=nb_amps,n_rows=10000)

In [None]:
DataFrame.from_query(query).to_sql(
    schema_name= 'demo_user',
    table_name='dataset_00',
    if_exists='replace',
    primary_index='Partition_ID')

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Installing the files in Vantage</b>

<p style = 'font-size:16px;font-family:Arial'>We will install the tdstone2 framework in three steps: clean up the schema, set up the framework, and then install the required files in Vantage.</p>

In [None]:
from tdstone2.utils import cleanup
cleanup(schema='demo_user')

<p style = 'font-size:16px;font-family:Arial'>We will use the tdstone2 package to install the generic Python files that enable ML training, scoring, feature engineering and vector embedding computations. These files are installed once and enable user-friendly interactions with the platform.</p>

In [None]:
from tdstone2.tdstone import TDStone
sto = TDStone(schema_name = 'demo_user', SEARCHUIFDBPATH = 'demo_user')
sto.setup()

<p style = 'font-size:16px;font-family:Arial'>We will install the necessary files into the sto environment of Vantage. PushFile installs the Python files.</p>

In [None]:
sto.PushFile()

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>4. Hyper Model Dataset</b></p>
<p style = 'font-size:16px;font-family:Arial'>Let's have a look at the dataset.</p>

In [None]:
dataset = DataFrame(in_schema('demo_user','dataset_00'))
dataset

In [None]:
dataset.shape

In [None]:
summary = dataset[['Partition_ID','FOLD','ID']].groupby(['Partition_ID','FOLD']).count()
summary.sort(['Partition_ID','FOLD'])

In [None]:
summary.shape

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>4. Hyper-Segmented Model Deployment</b>

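<p style = 'font-size:16px;font-family:Arial'>In this section we define four Scikit-Learn pipelines (a classifier, a regressor, a linear model and a one-class SVM) and deploy each of them as a hyper-segmented model. Training and scoring of a deployed model are shown in section 5.</p>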
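<p style = 'font-size:16px;font-family:Arial'>The idea behind hyper-segmentation can be illustrated without Vantage: one model is fitted per partition, using only the rows of the training fold, and each row is later scored by the model of its own partition. The sketch below is a conceptual illustration only and makes no claim about how tdstone2 implements this; the "model" is simply the mean of the target, and the function names and the toy rows are made up for the example.</p>

```python
from collections import defaultdict
from statistics import mean

def train_per_partition(rows, target, fold_training="train"):
    """Fit one trivial model (the mean of the target) per partition."""
    by_partition = defaultdict(list)
    for row in rows:
        # Only rows of the training fold contribute to the model
        if row["FOLD"] == fold_training:
            by_partition[row["Partition_ID"]].append(row[target])
    return {pid: mean(values) for pid, values in by_partition.items()}

def score_per_partition(rows, models):
    """Score each row with the model that belongs to its own partition."""
    return [(row["ID"], models[row["Partition_ID"]]) for row in rows]

toy_rows = [
    {"ID": 1, "Partition_ID": 0, "FOLD": "train", "Y1": 1.0},
    {"ID": 2, "Partition_ID": 0, "FOLD": "train", "Y1": 3.0},
    {"ID": 3, "Partition_ID": 0, "FOLD": "test",  "Y1": 9.0},
    {"ID": 4, "Partition_ID": 1, "FOLD": "train", "Y1": 10.0},
    {"ID": 5, "Partition_ID": 1, "FOLD": "test",  "Y1": 0.0},
]

models = train_per_partition(toy_rows, target="Y1")
predictions = score_per_partition(toy_rows, models)
print(models)  # {0: 2.0, 1: 10.0}
```

<p style = 'font-size:16px;font-family:Arial'>The arguments id_partition, id_fold and fold_training passed to HyperModel in the cells below play the roles that Partition_ID, FOLD and 'train' play in this sketch.</p>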

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.1 Classifier Pipeline</b><br>
<b style = 'font-size:18px;font-family:Arial'>4.1.1 Engineering of the Scikit-Learn Classifier Pipeline </b>

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
# Pipeline steps: standardize the features, then fit a random forest classifier
steps_classifier = [
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(
        max_depth = 5,
        n_estimators = 95 
    ))]

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.1.2 Deployment of the Scikit-Learn Classifier Pipeline</b>
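<p style = 'font-size:16px;font-family:Arial'>Before deploying a list of steps, it can be worth checking its structure locally. Scikit-Learn pipelines expect (name, estimator) pairs with unique names, where every step except the last one is a transformer. The helper below is a hypothetical convenience, not part of tdstone2 or Scikit-Learn, and the two stand-in classes exist only so that the sketch runs without Scikit-Learn installed.</p>

```python
def check_pipeline_steps(steps):
    """Lightweight structural check of a list of (name, estimator) pairs."""
    if not steps:
        raise ValueError("the list of steps is empty")
    names = [name for name, _ in steps]
    if len(set(names)) != len(names):
        raise ValueError(f"step names must be unique, got {names}")
    for name, estimator in steps[:-1]:
        # Intermediate steps must be able to fit and transform the data
        if not (hasattr(estimator, "fit") and hasattr(estimator, "transform")):
            raise TypeError(f"intermediate step '{name}' needs fit and transform")
    last_name, last_estimator = steps[-1]
    if not hasattr(last_estimator, "fit"):
        raise TypeError(f"final step '{last_name}' needs fit")
    return names

class FakeScaler:
    """Stand-in for a transformer such as StandardScaler."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X

class FakeClassifier:
    """Stand-in for a final estimator such as RandomForestClassifier."""
    def fit(self, X, y=None):
        return self
    def predict(self, X):
        return [0 for _ in X]

checked = check_pipeline_steps([("scaler", FakeScaler()), ("classifier", FakeClassifier())])
print(checked)  # ['scaler', 'classifier']
```

<p style = 'font-size:16px;font-family:Arial'>In this notebook the same call could be made as check_pipeline_steps(steps_classifier) before the HyperModel is created.</p>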

In [None]:
from tdstone2.tdshypermodel import HyperModel
from tdstone2.tdstone import TDStone
sto = TDStone(schema_name = 'demo_user', SEARCHUIFDBPATH = 'demo_user')

In [None]:
model_cl_parameters = {
    "target": 'Y2',
    "column_categorical": ['flag','Y2'],
    "column_names_X": ['X1','X2','X3','X4','X5','X6','X7','X8','X9','flag']
}

In [None]:
model_cl = HyperModel(tdstone            = sto,
                   metadata           = {'project': 'test'},
                   skl_pipeline_steps = steps_classifier,
                   model_parameters   = model_cl_parameters,
                   dataset            = in_schema('demo_user','dataset_00'),
                   id_row             = 'ID',
                   id_partition       = 'Partition_ID',
                   id_fold            = 'FOLD',
                   fold_training      = 'train')

In [None]:
sto.list_hyper_models()

In [None]:
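# Retrieve the training mapper ID of the most recently created hyper model
hyper_models = sto.list_hyper_models()[['CREATION_DATE','ID_MAPPER_TRAINING']]
id_mapper = hyper_models.sort('CREATION_DATE',ascending=False).to_pandas()['ID_MAPPER_TRAINING'].values[0]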
id_mapper

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.2 Regressor Pipeline</b><br>
<b style = 'font-size:18px;font-family:Arial'>4.2.1 Engineering of the Scikit-Learn Regressor Pipeline </b>

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
# Pipeline steps: standardize the features, then fit a random forest regressor
steps_regressor = [
    ('scaler', StandardScaler()),
    ('regressor', RandomForestRegressor(
        max_depth = 5,
        n_estimators = 95 
    ))]

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.2.2 Deployment of the Scikit-Learn Regressor Pipeline</b>

In [None]:
model_rg_parameters = {
    "target": 'Y1',
    "column_categorical": ['flag'],
    "column_names_X": ['X1','X2','X3','X4','X5','X6','X7','X8','X9','flag']
}

In [None]:
model_rg = HyperModel(tdstone            = sto,
                   metadata           = {'project': 'test'},
                   skl_pipeline_steps = steps_regressor,
                   model_parameters   = model_rg_parameters,
                   dataset            = in_schema('demo_user','dataset_00'),
                   id_row             = 'ID',
                   id_partition       = 'Partition_ID',
                   id_fold            = 'FOLD',
                   fold_training      = 'train')

In [None]:
sto.list_hyper_models()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.3 Linear Model Pipeline</b><br>
<b style = 'font-size:18px;font-family:Arial'>4.3.1 Engineering of the Scikit-Learn Linear Model Pipeline </b>

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoLarsCV
from sklearn.pipeline import Pipeline

# Setting up the pipeline for regression with LassoLarsCV
steps_lasso_lars_cv = [
    ('scaler', StandardScaler()),
    ('lasso_lars_cv', LassoLarsCV(
        cv        = 5,     # Number of cross-validation folds. Adjust this based on your dataset size and requirements.
        max_iter  = 1000,  # Maximum number of iterations. Adjust as needed.
    ))
]

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.3.2 Deployment of the Scikit-Learn Linear Model Pipeline</b>

In [None]:
model_cv_parameters = {
    "target": 'Y1',
    "column_categorical": ['flag'],
    "column_names_X": ['X1','X2','X3','X4','X5','X6','X7','X8','X9','flag']
}

In [None]:
model_cv = HyperModel(tdstone            = sto,
                   metadata           = {'project': 'test'},
                   skl_pipeline_steps = steps_lasso_lars_cv,
                   model_parameters   = model_cv_parameters,
                   dataset            = in_schema('demo_user','dataset_00'),
                   id_row             = 'ID',
                   id_partition       = 'Partition_ID',
                   id_fold            = 'FOLD',
                   fold_training      = 'train')

In [None]:
sto.list_hyper_models()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.4 OneClassSVM Pipeline</b><br>
<b style = 'font-size:18px;font-family:Arial'>4.4.1 Engineering of the Scikit-Learn OneClassSVM Pipeline </b>

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM
from sklearn.pipeline import Pipeline

# Pipeline steps: standardize the features, then fit a one-class SVM for anomaly detection
steps_anomaly_detection = [
    ('scaler', StandardScaler()),
    ('one_class_svm', OneClassSVM(
        kernel='rbf',  # Radial Basis Function Kernel
        nu=0.05,       # An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors.
        gamma='auto'   # Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. If ‘auto’, 1/n_features will be used.
    ))
]

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>4.4.2 Deployment of the Scikit-Learn OneClassSVM Pipeline</b>

In [None]:
model_svm_parameters = {
    "column_categorical": ['flag'],
    "column_names_X": ['X1','X2','X3','X4','X5','X6','X7','X8','X9','flag']
}

In [None]:
model_svm = HyperModel(tdstone            = sto,
                   metadata           = {'project': 'test'},
                   skl_pipeline_steps = steps_anomaly_detection,
                   model_parameters   = model_svm_parameters,
                   dataset            = in_schema('demo_user','dataset_00'),
                   id_row             = 'ID',
                   id_partition       = 'Partition_ID',
                   id_fold            = 'FOLD',
                   fold_training      = 'train',
                   convert_to_onnx    = True
                  )

In [None]:
sto.list_hyper_models()

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>5. Execution of the Deployed Hypermodels</b>

<p style = 'font-size:16px;font-family:Arial'><i>* The commands in this section are run for a single model, the classifier. The same commands can be run for the other three models.</i></p>
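<p style = 'font-size:16px;font-family:Arial'>To run the same steps for several models, a small loop is enough. The sketch below assumes that every deployed model exposes the train() and score() methods used for the classifier in this section; the helper name train_and_score and the StubModel class are made up so that the loop can be tried without a Vantage connection.</p>

```python
def train_and_score(models):
    """Call train() and then score() on each model, in insertion order."""
    processed = []
    for name, model in models.items():
        model.train()
        model.score()
        processed.append(name)
    return processed

class StubModel:
    """Stand-in with the same method names, used for a dry run only."""
    def __init__(self):
        self.calls = []
    def train(self):
        self.calls.append("train")
    def score(self):
        self.calls.append("score")

stubs = {"regressor": StubModel(), "linear": StubModel(), "anomaly": StubModel()}
processed = train_and_score(stubs)
print(processed)  # ['regressor', 'linear', 'anomaly']
```

<p style = 'font-size:16px;font-family:Arial'>With the objects created in section 4, the call could look like train_and_score({'regressor': model_rg, 'linear': model_cv, 'anomaly': model_svm}).</p>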

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>5.1 Model Training</b>

<p style = 'font-size:16px;font-family:Arial'>Let's train the classifier model we have deployed.</p>

In [None]:
#classifier model
model_cl.train()

```python
# Inspect the trained-models table of the classifier (id_mapper was retrieved in section 4.1.2)
query = f"SELECT * FROM demo_user.TDS_TRAINED_MODELS_{id_mapper.replace('-','_')}"
print(query)
DataFrame.from_query(query)
```

In [None]:
model_cl.get_trained_models().groupby('TD_TIMECODE').count()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>5.2 Model Scoring</b>

In [None]:
model_cl.score()

In [None]:
model_cl.get_model_predictions()

In [None]:
model_cl.get_model_predictions().groupby('TD_TIMECODE').count()

In [None]:
model_cl.get_model_predictions(denormalized_view=False)

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>6. Model Lineage</b>

<p style = 'font-size:16px;font-family:Arial'>In this section we look at the commands used for administrator work.</p>

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>6.1 Access to the List of Deployed Codes</b>

In [None]:
sto.list_codes()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>6.2 List of Deployed Models (Code + Parameters)</b>

In [None]:
sto.list_models()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>6.3 List of Available Mappers (Mapping Between Partitions and Models or Trained Models)</b>

In [None]:
sto.list_mappers()

<hr style="height:1px;border:none;">
<b style = 'font-size:18px;font-family:Arial'>6.4 List of Hypermodels (Models and Mappers Mapping)</b>
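<p style = 'font-size:16px;font-family:Arial'>The cells below derive the name of a mapper table from a mapper ID by replacing hyphens with underscores, since hyphens are not allowed in unquoted Teradata object names. The same logic can be wrapped in two small helpers; the function names and the example identifier are made up for illustration, while the TDS_MAPPER_ prefix and the query text are the ones used in this notebook.</p>

```python
def mapper_table_name(mapper_id, prefix="TDS_MAPPER_"):
    """Build the mapper table name: prefix plus the ID with '-' replaced by '_'."""
    return prefix + str(mapper_id).replace("-", "_")

def mapper_query(mapper_id):
    """Build the query that reads the currently valid rows of a mapper table."""
    return f"CURRENT VALIDTIME SEL * FROM {mapper_table_name(mapper_id)}"

example_id = "123e4567-e89b-12d3-a456-426614174000"  # made-up identifier
print(mapper_table_name(example_id))
print(mapper_query(example_id))
```

<p style = 'font-size:16px;font-family:Arial'>With a live connection, the resulting string can be passed to DataFrame.from_query as in the cells that follow.</p>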

In [None]:
sto.list_hyper_models()

In [None]:
ID_MAPPER_TRAINING = sto.list_hyper_models().to_pandas().reset_index().sort_values('CREATION_DATE', ascending=False).ID_MAPPER_TRAINING.values[0]
ID_MAPPER_TRAINING

In [None]:
DataFrame.from_query(f'CURRENT VALIDTIME SEL * FROM TDS_MAPPER_{ID_MAPPER_TRAINING.replace("-","_")}')

In [None]:
ID_MAPPER_SCORING = sto.list_hyper_models().to_pandas().reset_index().sort_values('CREATION_DATE', ascending=False).ID_MAPPER_SCORING.values[0]
ID_MAPPER_SCORING

In [None]:
DataFrame.from_query(f'CURRENT VALIDTIME SEL * FROM TDS_MAPPER_{ID_MAPPER_SCORING.replace("-","_")}')

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial'><b>7. Cleanup</b></p>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Work Tables</b></p>

In [None]:
db_drop_table(table_name="dataset_00")

In [None]:
remove_context()

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>