#### Use Case Background 

##### This script uses the XGBoost package in Python to build a propensity model that identifies the agents/advisors most likely to sell products, based on past behavior, demographics, and market data. This type of model is common in the insurance and annuity business. Once the model is trained, it can be registered and scored in SAS Viya. The scored data can also be charted in SAS Visual Analytics (VA) to visualize, interpret, and support downstream decisions.

In [11]:
###############################################
###  Train & Register Python XGBoost Model  ###
###############################################

###################
### Credentials ###
###################

import os
import sys
from pathlib import Path

filepath = input("file path to credentials: ")
sys.path.append(filepath)
from credentials import hostname, session, protocol, output_dir, git_dir, token, token_pem, username

In [47]:
#############################
### Connect with SAS Viya ###
#############################

import swat

### read the access token and close the file handle afterwards
with open(token, "r") as token_file:
    access_token = token_file.read()
conn =  swat.CAS(hostname=hostname, username=None, password=access_token, ssl_ca_list=token_pem, protocol=protocol)
print(conn.serverstatus())

NOTE: Grid node action status report: 3 nodes, 9 total actions executed.
[About]

 {'CAS': 'Cloud Analytic Services',
  'CASCacheLocation': 'CAS Disk Cache',
  'CASHostAccountRequired': 'OPTIONAL',
  'Copyright': 'Copyright © 2014-2025 SAS Institute Inc. All Rights Reserved.',
  'GlobalReadOnlyMode': 'NO',
  'ServerTime': '2025-11-06T00:56:21Z',
  'System': {'Hostname': 'controller.sas-cas-server-default.viya.svc.cluster.local',
   'Linux Distribution': 'Red Hat Enterprise Linux release 8.10 (Ootpa)',
   'Model Number': 'x86_64',
   'OS Family': 'LIN X64',
   'OS Name': 'Linux',
   'OS Release': '5.15.0-1091-azure',
   'OS Version': '#100-Ubuntu SMP Tue May 27 21:41:06 UTC 2025'},
  'Transferred': 'NO',
  'Version': '4.00',
  'VersionLong': 'V.04.00M0P07072025',
  'Viya Release': '20250816.1755312373510',
  'Viya Version': 'Stable 2025.07',
  'license': {'expires': '06Mar2026:00:00:00',
   'gracePeriod': 0,
   'site': 'ENGAGE PLATFORM FINANCIAL CRIMES ANALYTICS PREMIER',
   'siteNum': 

In [None]:
###############################
### Upload Data to SAS Viya ###
###############################

### upload if not already imported
conn.upload('https://raw.githubusercontent.com/christopher-parrish/sas_viya/refs/heads/main/poc/1_data_management/annuity_advisors/annuity_advisors.csv', casOut={"caslib":"public", "name":"annuity_advisors", "promote":True})
conn.upload('https://raw.githubusercontent.com/christopher-parrish/sas_viya/refs/heads/main/poc/1_data_management/annuity_advisors/annuity_advisors_prep.csv', casOut={"caslib":"public", "name":"annuity_advisors_prep", "promote":True})
### also available: conn.read_csv('https://...csv')

# promote tables to global scope if removing the promotion above
#conn.table.promote(caslib="public", name="annuity_advisors", targetlib="public", target="annuity_advisors")

In [14]:
#############################
### Identify Table in CAS ###
#############################

### caslib and table to use in modeling
caslib = 'public'
in_mem_tbl = 'ANNUITY_ADVISORS_PREP'

### load table in-memory if it is not already loaded
if conn.table.tableExists(caslib=caslib, name=in_mem_tbl).exists<=0:
    conn.table.loadTable(caslib=caslib, path=str(in_mem_tbl+str('.sashdat')), 
                         casout={'name':in_mem_tbl, 'caslib':caslib, 'promote':True})

### show table to verify
conn.table.tableInfo(caslib=caslib, wildIgnore=False, name=in_mem_tbl)

NOTE: Cloud Analytic Services made the file ANNUITY_ADVISORS_PREP.sashdat available as table ANNUITY_ADVISORS_PREP in caslib public.


Unnamed: 0,Name,Rows,Columns,IndexedColumns,Encoding,CreateTimeFormatted,ModTimeFormatted,AccessTimeFormatted,JavaCharSet,CreateTime,View,MultiPart,SourceName,SourceCaslib,Compressed,Creator,Modifier,SourceModTimeFormatted,SourceModTime,TableRedistUpPolicy
0,ANNUITY_ADVISORS_PREP,15351,33,0,utf-8,2025-11-05T22:09:35+00:00,2025-11-05T22:09:35+00:00,2025-11-05T22:09:35+00:00,UTF8,2078000000.0,0,0,ANNUITY_ADVISORS_PREP.sashdat,Public,0,chris.parrish@sas.com,,2025-11-05T16:20:31+00:00,2077979000.0,Not Specified


In [15]:
########################
### Create Dataframe ###
########################

dm_inputdf =  conn.CASTable(in_mem_tbl, caslib=caslib).to_frame()

### print columns for review of model parameters
print(dm_inputdf.dtypes)

advisor                          float64
advisor_event_indicator          float64
sf_face_2_face                   float64
sf_call_outbound                 float64
sf_call_inbound                  float64
sf_email_inbound                 float64
channel_bank                     float64
channel_wirehouse                float64
channel_ria                      float64
primary_prod_sold_fixed          float64
primary_prod_sold_va             float64
sf_email_campaigns               float64
advisor_hh_children              float64
annuity_mkt_opp                  float64
advisor_advising_years           float64
advisor_aum                      float64
advisor_annuity_selling_years    float64
advisor_age                      float64
advisor_net_worth                float64
advisor_credit_hist_mos          float64
advisor_firm_changes             float64
advisor_credit_score             float64
wholesaler                       float64
region_ca                        float64
region_ny       

In [16]:
########################
### Model Parameters ###
########################

### import python libraries
import numpy as np
import xgboost as xgb
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.utils import shuffle
from pathlib import Path

xgb_params = {
             'base_score': 0.5, 
             'booster': 'gbtree', 
             'eval_metric': 'auc', 
             'colsample_bytree': 1, 
             'colsample_bylevel': 1, 
             'colsample_bynode': 1, 
             'gamma': 0, 
             'grow_policy': 'depthwise', 
             'learning_rate': 0.1, 
             'max_bin': 256, 
             'max_delta_step': 0, 
             'max_depth': 3, 
             'max_leaves': 0, 
             'min_child_weight': 1, 
             'nthread': None, 
             'num_parallel_tree': 1, 
             'objective': 'binary:logistic', 
             'predictor': 'auto',  # reported as unused by the XGBoost version in this run (see training warning)
             'process_type': 'default', 
             'refresh_leaf': 1, 
             'reg_alpha': 0, 
             'reg_lambda': 1, 
             'sampling_method': 'uniform', 
             'scale_pos_weight': 1, 
             'seed': None, 
             'seed_per_iteration': False, 
             'sketch_eps': 0.03,  # reported as unused by the XGBoost version in this run (see training warning)
             'subsample': 1, 
             'tree_method': 'auto'
             } 
print(xgb_params)

### XGBOOST CORE
xgb_params_train = {'num_boost_round': 100,
                    'early_stopping_rounds': 50}
cutoff = 0.1

### define macro variables for model
dm_dec_target = 'advisor_event_indicator'
dm_partitionvar = 'analytic_partition'
create_new_partition = 'no' # 'yes', 'no'
dm_key = 'advisor' 
dm_classtarget_level = ['0', '1']
dm_partition_validate_val, dm_partition_train_val, dm_partition_test_val = [0, 1, 2]
dm_partition_validate_perc, dm_partition_train_perc, dm_partition_test_perc = [0.3, 0.6, 0.1]

### create list of regressors
keep_predictors = [
    ]
rejected_predictors = [
    'channel_ria',
    'region_we',
    'primary_prod_sold_fixed'
    ] 

### mlflow
use_mlflow = 'no' # 'yes', 'no'
mlflow_run_to_use = 0
mlflow_class_labels =['TENSOR']
mlflow_predict_syntax = 'predict'

### var to consider in bias assessment
bias_vars = ['sf_face_2_face']

### var to consider in partial dependency
pd_var1 = ''
pd_var2 = ''

### create partition column, if not already in dataset
if create_new_partition == 'yes':
    dm_inputdf = shuffle(dm_inputdf)
    dm_inputdf.reset_index(inplace=True, drop=True)
    validate_rows = round(len(dm_inputdf)*dm_partition_validate_perc)
    train_rows = round(len(dm_inputdf)*dm_partition_train_perc) + validate_rows
    test_rows = len(dm_inputdf)-train_rows
    dm_inputdf.loc[0:validate_rows,dm_partitionvar] = dm_partition_validate_val
    dm_inputdf.loc[validate_rows:train_rows,dm_partitionvar] = dm_partition_train_val
    dm_inputdf.loc[train_rows:,dm_partitionvar] = dm_partition_test_val

{'base_score': 0.5, 'booster': 'gbtree', 'eval_metric': 'auc', 'colsample_bytree': 1, 'colsample_bylevel': 1, 'colsample_bynode': 1, 'gamma': 0, 'grow_policy': 'depthwise', 'learning_rate': 0.1, 'max_bin': 256, 'max_delta_step': 0, 'max_depth': 3, 'max_leaves': 0, 'min_child_weight': 1, 'nthread': None, 'num_parallel_tree': 1, 'objective': 'binary:logistic', 'predictor': 'auto', 'process_type': 'default', 'refresh_leaf': 1, 'reg_alpha': 0, 'reg_lambda': 1, 'sampling_method': 'uniform', 'scale_pos_weight': 1, 'seed': None, 'seed_per_iteration': False, 'sketch_eps': 0.03, 'subsample': 1, 'tree_method': 'auto'}
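
The partition step at the end of the cell above can also be written without label-based slices, which are inclusive at both ends in pandas and therefore overlap by one row at each boundary. The sketch below is one possible vectorized alternative, not the notebook's own code: it builds the partition codes 0 (validate), 1 (train), and 2 (test) by position and then shuffles them. The helper name `add_partition`, the `seed` argument, and the toy dataframe are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def add_partition(df, col="analytic_partition", perc=(0.3, 0.6, 0.1), seed=42):
    """Return a copy of df with a partition column: 0=validate, 1=train, 2=test."""
    n = len(df)
    n_valid = round(n * perc[0])
    n_train = round(n * perc[1])
    codes = np.full(n, 2, dtype=int)            # default every row to test
    codes[:n_valid] = 0                         # first block: validate
    codes[n_valid:n_valid + n_train] = 1        # second block: train
    rng = np.random.default_rng(seed)           # hypothetical fixed seed for repeatability
    out = df.copy()
    out[col] = rng.permutation(codes)           # shuffle codes instead of rows
    return out

# toy data standing in for dm_inputdf
toy = pd.DataFrame({"advisor": range(1000)})
toy = add_partition(toy)
print(toy["analytic_partition"].value_counts().sort_index())
```

Shuffling the codes rather than the rows keeps the original row order and index intact, which can make it easier to join the partition column back to other tables.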


In [17]:
##############################
### Final Modeling Columns ###
##############################

### create list of model variables
dm_input = list(dm_inputdf.columns.values)
macro_vars = (dm_dec_target + ' ' + dm_partitionvar + ' ' + dm_key).split()
#rejected_predictors = [i for i in dm_input if i not in keep_predictors]
### when rejected_predictors is listed explicitly (rather than derived from keep_predictors), also drop the macro variables
rejected_vars = rejected_predictors + macro_vars
for i in rejected_vars:
    dm_input.remove(i)
print(dm_input)

### create prediction variables
dm_predictionvar = [str('P_') + dm_dec_target + dm_classtarget_level[0], str('P_') + dm_dec_target + dm_classtarget_level[1]]
dm_classtarget_intovar = str('I_') + dm_dec_target

##################
### Data Split ###
##################

### create train, test, validate datasets using existing partition column
dm_traindf = dm_inputdf[dm_inputdf[dm_partitionvar] == dm_partition_train_val]
X_train = dm_traindf.loc[:, dm_input]
y_train = dm_traindf[dm_dec_target]
dm_testdf = dm_inputdf.loc[(dm_inputdf[dm_partitionvar] == dm_partition_test_val)]
X_test = dm_testdf.loc[:, dm_input]
y_test = dm_testdf[dm_dec_target]
dm_validdf = dm_inputdf.loc[(dm_inputdf[dm_partitionvar] == dm_partition_validate_val)]
X_valid = dm_validdf.loc[:, dm_input]
y_valid = dm_validdf[dm_dec_target]
fullX = dm_inputdf.loc[:, dm_input]
fully = dm_inputdf[dm_dec_target]

['sf_face_2_face', 'sf_call_outbound', 'sf_call_inbound', 'sf_email_inbound', 'channel_bank', 'channel_wirehouse', 'primary_prod_sold_va', 'sf_email_campaigns', 'advisor_hh_children', 'annuity_mkt_opp', 'advisor_advising_years', 'advisor_aum', 'advisor_annuity_selling_years', 'advisor_age', 'advisor_net_worth', 'advisor_credit_hist_mos', 'advisor_firm_changes', 'advisor_credit_score', 'wholesaler', 'region_ca', 'region_ny', 'region_fl', 'region_tx', 'region_ne', 'region_so', 'region_mw', 'sf_email_responses']


In [9]:
##########################
### Variable Selection ###
##########################

from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE, RFECV
from sklearn.model_selection import cross_val_score
from time import time

### Recursive Feature Elimination (RFE) with cross-validation (auto-selects the number of variables)
models_for_rfe = [DecisionTreeRegressor(), GradientBoostingRegressor(), RandomForestRegressor()]
start = time()
rfe_cols_cv = []
for i in models_for_rfe:
    rfe_cv = RFECV(estimator=i, step=1, cv=10, min_features_to_select=1)
    rfe_cv.fit(fullX,fully)
    rfe_cols_cv.append(list(rfe_cv.get_feature_names_out()))

finish = time()

time_to_complete = finish-start
print("Time to complete feature selection with Python:", time_to_complete)

Time to complete feature selection with Python: 2118.7103004455566


In [10]:
print ("Selected variables using Scikit-Learn Decision Tree:", rfe_cols_cv[0])
print ("Selected variables using Scikit-Learn Gradient Boosting:", rfe_cols_cv[1])
print ("Selected variables using Scikit-Learn Random Forest:", rfe_cols_cv[2])

Selected variables using Scikit-Learn Decision Tree: ['sf_face_2_face', 'sf_call_outbound', 'sf_call_inbound', 'sf_email_inbound', 'channel_bank', 'channel_wirehouse', 'primary_prod_sold_va', 'sf_email_campaigns', 'advisor_hh_children', 'annuity_mkt_opp', 'advisor_advising_years', 'advisor_aum', 'advisor_annuity_selling_years', 'advisor_age', 'advisor_net_worth', 'advisor_credit_hist_mos', 'advisor_firm_changes', 'advisor_credit_score', 'wholesaler', 'region_ca', 'region_ny', 'region_fl', 'region_tx', 'region_ne', 'region_so', 'region_mw', 'sf_email_responses']
Selected variables using Scikit-Learn Gradient Boosting: ['sf_face_2_face', 'sf_call_outbound', 'sf_call_inbound', 'sf_email_inbound', 'channel_wirehouse', 'primary_prod_sold_va', 'sf_email_campaigns', 'advisor_hh_children', 'annuity_mkt_opp', 'advisor_advising_years', 'advisor_aum', 'advisor_annuity_selling_years', 'advisor_age', 'advisor_net_worth', 'advisor_credit_hist_mos', 'advisor_firm_changes', 'advisor_credit_score', 're

#### Use the CAS action set dataSciencePilot to find features

In [None]:
conn.loadactionset(actionset="dataSciencePilot")

start = time()
conn.dataSciencePilot.selectFeatures(
     table     = dict(caslib='public', name='annuity_advisors_prep'),
     target    = dm_dec_target,
     selectionPolicy = dict(criterion="SU"),
     inputs    = dm_input,
     casOut    = dict(name='agent_advisors_features', replace=True)
 )

finish = time()

time_to_complete = finish-start
print("Time to complete feature selection with CAS action:", time_to_complete)

NOTE: Added action set 'dataSciencePilot'.
Time to complete feature selection with CAS action: 0.6820318698883057


In [None]:
results = conn.fetch(table = dict(name='agent_advisors_features'))

Unnamed: 0,Variable,Target,Rank,CritValue
0,sf_call_inbound,advisor_event_indicator,1.0,0.233597
1,advisor_hh_children,advisor_event_indicator,2.0,0.140637
2,advisor_net_worth,advisor_event_indicator,3.0,0.116597
3,sf_email_campaigns,advisor_event_indicator,4.0,0.105315
4,advisor_advising_years,advisor_event_indicator,5.0,0.072528
5,advisor_credit_score,advisor_event_indicator,6.0,0.071365
6,sf_face_2_face,advisor_event_indicator,7.0,0.067302
7,sf_email_responses,advisor_event_indicator,8.0,0.061159
8,advisor_firm_changes,advisor_event_indicator,9.0,0.046331
9,advisor_credit_hist_mos,advisor_event_indicator,10.0,0.046127
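
The `selectFeatures` call above ranks inputs with `criterion="SU"`, which is taken here to mean symmetric uncertainty: twice the mutual information between an input and the target, divided by the sum of their entropies. The sketch below computes that quantity for two discrete variables with the standard library only. It illustrates the statistic itself and is not a reproduction of the CAS action, which also decides how interval inputs are binned; the function names and toy data are assumptions.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), ranging from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))     # joint entropy
    mutual_info = h_x + h_y - h_xy
    if h_x + h_y == 0:                  # both variables are constant
        return 0.0
    return 2 * mutual_info / (h_x + h_y)

target = [0, 0, 1, 1, 0, 1, 0, 1]
copy_of_target = list(target)           # perfectly informative input
unrelated = [0, 1, 0, 1, 1, 0, 0, 1]    # balanced and independent of target

print(symmetric_uncertainty(copy_of_target, target))
print(symmetric_uncertainty(unrelated, target))
```

A value near 1 means the input almost determines the target, and a value near 0 means it carries little information, which is consistent with reading `CritValue` in the table as a ranking score.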


#### Choose the variables that are to be used in the training and test data sets

In [29]:
conn.simple.freq(
    table=dict(caslib=caslib, name=in_mem_tbl, vars='analytic_partition')
)

Unnamed: 0,Column,NumVar,FmtVar,Level,Frequency
0,analytic_partition,0.0,0,1,1535.0
1,analytic_partition,1.0,1,2,4605.0
2,analytic_partition,2.0,2,3,9211.0


In [22]:
### create train, test, validate datasets using existing partition column
### replacing full set of input vars with those chosen from variable selection
#dm_input = rfe_cols_cv[2]
dm_input = ['sf_face_2_face', 'sf_call_inbound', 'sf_email_campaigns', 'advisor_hh_children', 'annuity_mkt_opp', 'advisor_advising_years', 'advisor_aum', 'advisor_annuity_selling_years', 'advisor_age', 'advisor_net_worth', 'advisor_credit_hist_mos', 'advisor_firm_changes', 'advisor_credit_score', 'wholesaler', 'region_ca', 'region_ny', 'sf_email_responses']

dm_traindf = dm_inputdf[dm_inputdf[dm_partitionvar] == dm_partition_train_val]
X_train = dm_traindf.loc[:, dm_input]
y_train = dm_traindf[dm_dec_target]
dm_testdf = dm_inputdf.loc[(dm_inputdf[dm_partitionvar] == dm_partition_test_val)]
X_test = dm_testdf.loc[:, dm_input]
y_test = dm_testdf[dm_dec_target]
dm_validdf = dm_inputdf.loc[(dm_inputdf[dm_partitionvar] == dm_partition_validate_val)]
X_valid = dm_validdf.loc[:, dm_input]
y_valid = dm_validdf[dm_dec_target]
fullX = dm_inputdf.loc[:, dm_input]
fully = dm_inputdf[dm_dec_target]

In [19]:
####################
### XGBOOST CORE ###
####################

import xgboost as xgb
from sklearn.metrics import classification_report, confusion_matrix

### prediction parameter
predict_syntax = 'predict'

### convert data to matrices
xgb_train = xgb.DMatrix(X_train, y_train)
xgb_test = xgb.DMatrix(X_test, y_test)
xgb_valid = xgb.DMatrix(X_valid, y_valid)

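### estimate & fit model
### note: with early_stopping_rounds, XGBoost monitors the last entry in evals (here 'train');
### list the validation set last to stop on validation AUC instead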
eval_list = [(xgb_valid, 'valid'), (xgb_test, 'test'), (xgb_train, 'train')]
dm_model = xgb.train(dtrain=xgb_train, params=xgb_params, evals=eval_list, **xgb_params_train)


[0]	valid-auc:0.96382	test-auc:0.96722	train-auc:0.96733
[1]	valid-auc:0.96478	test-auc:0.96804	train-auc:0.96783
[2]	valid-auc:0.97692	test-auc:0.97824	train-auc:0.98012
[3]	valid-auc:0.97744	test-auc:0.97872	train-auc:0.98076
[4]	valid-auc:0.97736	test-auc:0.97868	train-auc:0.98078
[5]	valid-auc:0.97873	test-auc:0.98016	train-auc:0.98173
[6]	valid-auc:0.97956	test-auc:0.98112	train-auc:0.98251
[7]	valid-auc:0.98186	test-auc:0.98455	train-auc:0.98622
[8]	valid-auc:0.98897	test-auc:0.98937	train-auc:0.99068
[9]	valid-auc:0.99061	test-auc:0.99071	train-auc:0.99186
[10]	valid-auc:0.99067	test-auc:0.99081	train-auc:0.99210
[11]	valid-auc:0.99140	test-auc:0.99146	train-auc:0.99263
[12]	valid-auc:0.99198	test-auc:0.99208	train-auc:0.99319
[13]	valid-auc:0.99248	test-auc:0.99308	train-auc:0.99391
[14]	valid-auc:0.99338	test-auc:0.99350	train-auc:0.99430
[15]	valid-auc:0.99415	test-auc:0.99390	train-auc:0.99484
[16]	valid-auc:0.99451	test-auc:0.99427	train-auc:0.99529
[17]	valid-auc:0.99465	t

Parameters: { "predictor", "sketch_eps" } are not used.

  self.starting_round = model.num_boosted_rounds()


[52]	valid-auc:0.99746	test-auc:0.99742	train-auc:0.99890
[53]	valid-auc:0.99749	test-auc:0.99746	train-auc:0.99893
[54]	valid-auc:0.99754	test-auc:0.99748	train-auc:0.99899
[55]	valid-auc:0.99766	test-auc:0.99752	train-auc:0.99905
[56]	valid-auc:0.99768	test-auc:0.99754	train-auc:0.99907
[57]	valid-auc:0.99773	test-auc:0.99752	train-auc:0.99909
[58]	valid-auc:0.99772	test-auc:0.99754	train-auc:0.99911
[59]	valid-auc:0.99774	test-auc:0.99756	train-auc:0.99914
[60]	valid-auc:0.99775	test-auc:0.99758	train-auc:0.99915
[61]	valid-auc:0.99771	test-auc:0.99757	train-auc:0.99915
[62]	valid-auc:0.99780	test-auc:0.99758	train-auc:0.99917
[63]	valid-auc:0.99781	test-auc:0.99760	train-auc:0.99920
[64]	valid-auc:0.99782	test-auc:0.99762	train-auc:0.99921
[65]	valid-auc:0.99783	test-auc:0.99763	train-auc:0.99923
[66]	valid-auc:0.99785	test-auc:0.99765	train-auc:0.99924
[67]	valid-auc:0.99785	test-auc:0.99766	train-auc:0.99926
[68]	valid-auc:0.99789	test-auc:0.99764	train-auc:0.99928
[69]	valid-auc

In [30]:
#################################
### Score Data from the Model ###
#################################

import pandas as pd

### create dataframes by split with model probabilities and indicator level (p), bias variables (b), and dependent variable (y)

### for xgboost core model, the 'predict' function produces a probability, so a classification column needs to be created based on a cutoff value
def binary_col (row):
    if row[dm_predictionvar[1]] > cutoff:
        return 1
    else:
        return 0

def score_data(df_x, df_y):
    xgb_m = xgb.DMatrix(df_x)
    # p = pd.DataFrame(dm_model.predict(xgb_m), columns=[dm_predictionvar[1]])
    # p[dm_predictionvar[0]] = 1-p[dm_predictionvar[1]]
    p = pd.DataFrame(1-dm_model.predict(xgb_m), columns=[dm_predictionvar[0]])
    p[dm_predictionvar[1]] = dm_model.predict(xgb_m)
    p[dm_classtarget_intovar] = p.apply (lambda row: binary_col(row), axis=1)
    b = df_x[bias_vars].reset_index(drop=True)
    y = pd.DataFrame(df_y.reset_index(drop=True))
    scored_df = pd.concat([p, b, y], axis=1)
    return scored_df

full_score = score_data(fullX, fully)
train_score = score_data(X_train, y_train)
test_score = score_data(X_test, y_test)
valid_score = score_data(X_valid, y_valid)
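
Because the core `predict` call returns only the event probability, the class label has to be derived from the cutoff. A minimal vectorized sketch of that step follows; it matches `binary_col` in intent but uses `numpy.where` instead of a row-wise `apply`. The probabilities are the first five test-set values shown in this notebook's scoring output, the cutoff mirrors the value set in the parameters cell, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

cutoff = 0.1
# event probabilities, as a core Booster's predict() would return them
p_event = np.array([0.000577, 0.000661, 0.560151, 0.082214, 0.002537])

scored = pd.DataFrame({
    "P_advisor_event_indicator0": 1 - p_event,   # non-event probability
    "P_advisor_event_indicator1": p_event,       # event probability
})
# classify as 1 when the event probability exceeds the cutoff
scored["I_advisor_event_indicator"] = np.where(p_event > cutoff, 1, 0)
print(scored)
```

With a cutoff of 0.1, only the third row (0.560151) is labeled 1, and the fourth row (0.082214) stays at 0.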

In [31]:
### print model & results
description = 'XGBoost'
cols = X_train.columns
predictors = np.array(cols)
tn, fp, fn, tp = confusion_matrix(y_test, test_score[dm_classtarget_intovar]).ravel()
print(description)
print('model_parameters')
print(dm_model)
print(' ')
print('confusion_matrix test data:')
print('(tn, fp, fn, tp)')
print((tn, fp, fn, tp))
print('classification_report:')
print(classification_report(y_test, test_score[dm_classtarget_intovar]))

### print scoring columns
print(' ')
print('***** 5 rows from test scoring *****')
print(test_score.head(5))
print(' ')
print('***** scoring columns *****')
print((', '.join(dm_input)))
print(dm_input)
print(*dm_input)

XGBoost
model_parameters
<xgboost.core.Booster object at 0x0000016102F4F950>
 
confusion_matrix test data:
(tn, fp, fn, tp)
(np.int64(6092), np.int64(373), np.int64(7), np.int64(2739))
classification_report:
              precision    recall  f1-score   support

         0.0       1.00      0.94      0.97      6465
         1.0       0.88      1.00      0.94      2746

    accuracy                           0.96      9211
   macro avg       0.94      0.97      0.95      9211
weighted avg       0.96      0.96      0.96      9211

 
***** 5 rows from test scoring *****
   P_advisor_event_indicator0  P_advisor_event_indicator1  \
0                    0.999423                    0.000577   
1                    0.999339                    0.000661   
2                    0.439849                    0.560151   
3                    0.917786                    0.082214   
4                    0.997463                    0.002537   

   I_advisor_event_indicator  sf_face_2_face  advisor_event
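
The figures in the classification report can be checked directly from the confusion matrix counts printed above. The short calculation below uses only those four counts and is included as a worked check of how precision, recall, F1, and accuracy for the event class are derived.

```python
# counts from the test-data confusion matrix printed above
tn, fp, fn, tp = 6092, 373, 7, 2739

precision = tp / (tp + fp)                          # share of predicted events that are real
recall = tp / (tp + fn)                             # share of real events that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))
```

Rounded to two decimals these come to 0.88, 1.0, 0.94, and 0.96, which agrees with the `1.0` row and the accuracy line of the report. The low cutoff of 0.1 trades precision for recall: only 7 events are missed, at the cost of 373 false positives.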

In [32]:
test_score

Unnamed: 0,P_advisor_event_indicator0,P_advisor_event_indicator1,I_advisor_event_indicator,sf_face_2_face,advisor_event_indicator
0,0.999423,0.000577,0,0.0,0.0
1,0.999339,0.000661,0,0.0,0.0
2,0.439849,0.560151,1,0.0,0.0
3,0.917786,0.082214,0,1.0,0.0
4,0.997463,0.002537,0,1.0,0.0
...,...,...,...,...,...
9206,0.002291,0.997709,1,0.0,1.0
9207,0.017290,0.982710,1,0.0,1.0
9208,0.010264,0.989736,1,0.0,1.0
9209,0.002701,0.997299,1,0.0,1.0


In [33]:
#######################################
### Register Model in Model Manager ###
#######################################

from sasctl import Session
import sasctl.pzmm as pzmm
import shutil

### define parameters
metadata_output_dir = 'outputs'
model_name = 'agent_advisor_xgboost_python'
project_name = 'Agent Advisor Propensity to Sell'
model_type = 'xgboost'
predict_syntax = 'predict'
input_df = X_train
target_df = y_train
predictors = np.array(X_train.columns)
#prediction_labels = ['I_ml_indicator', 'P_ml_indicator0', 'P_ml_indicator1']
prediction_labels = ['EM_CLASSIFICATION', 'EM_EVENTPROBABILITY']
target_event = dm_predictionvar[1]
non_target_event = dm_predictionvar[0]
target_event_level = dm_classtarget_level[1]
non_target_event_level = dm_classtarget_level[0]
target_level = 'BINARY'
num_target_categories = len(dm_classtarget_level)
predict_method = str('{}.')+str(predict_syntax)+str('({})')
output_vars = pd.DataFrame(columns=prediction_labels, data=[['A', 0.5]])

In [34]:
### create directories for files
output_path = Path(output_dir) / metadata_output_dir / model_name
if output_path.exists() and output_path.is_dir():
    shutil.rmtree(output_path)

### create output path
os.makedirs(output_path)

In [35]:
### create model files and metadata
pzmm.PickleModel.pickle_trained_model(trained_model=dm_model, model_prefix=model_name, pickle_path=output_path)
pzmm.JSONFiles().write_var_json(input_data=input_df, is_input=True, json_path=output_path)
pzmm.JSONFiles().write_var_json(input_data=output_vars, is_input=False, json_path=output_path)
pzmm.JSONFiles().write_model_properties_json(
    model_name=model_name, 
    target_variable=dm_dec_target,
    target_values=[dm_classtarget_level[1], dm_classtarget_level[0]],
    json_path=output_path,
    model_desc=description,
    model_algorithm=model_type,
    #model_function=model_function,
    modeler=username,
    #train_table=in_mem_tbl,
    #properties=None
    )
pzmm.JSONFiles().write_file_metadata_json(model_prefix=model_name, json_path=output_path, is_h2o_model=False, is_tf_keras_model=False)

Model agent_advisor_xgboost_python was successfully pickled and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\agent_advisor_xgboost_python.pickle.
inputVar.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\inputVar.json
outputVar.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\outputVar.json
ModelProperties.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\ModelProperties.json
fileMetadata.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\fileMetadata.json


In [36]:
### create requirements file
import json
requirements_json = pzmm.JSONFiles().create_requirements_json(model_path=output_path)
print(json.dumps(requirements_json, sort_keys=True, indent=4))
for requirement in requirements_json:
    if 'sklearn' in requirement['step']:
        requirement['command'] = requirement["command"].replace('sklearn', 'scikit-learn')
        requirement['step'] = requirement['step'].replace('sklearn', 'scikit-learn')
print(json.dumps(requirements_json, sort_keys=True, indent=4))
with open(Path(output_path) / "requirements.json", "w") as req_file:
    req_file.write(json.dumps(requirements_json, indent=4))

[
    {
        "command": "pip install xgboost==3.0.0",
        "step": "install xgboost"
    }
]
[
    {
        "command": "pip install xgboost==3.0.0",
        "step": "install xgboost"
    }
]


In [49]:
### create sasctl session with SAS Viya (REST connection, separate from the CAS connection above)
sess = Session(hostname=session, token=access_token, client_secret='access_token')

In [38]:
### create model statistics

validData=valid_score[[dm_dec_target, dm_classtarget_intovar, dm_predictionvar[1]]]
trainData=train_score[[dm_dec_target, dm_classtarget_intovar, dm_predictionvar[1]]]
testData=test_score[[dm_dec_target, dm_classtarget_intovar, dm_predictionvar[1]]]

pzmm.JSONFiles().calculate_model_statistics(
    target_value=int(dm_classtarget_level[1]),
    validate_data=validData,
    train_data=trainData, 
    test_data=testData, 
    json_path=output_path,
    #target_type=model_function,
    #cutoff=None
    )

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data["predict_proba2"] = 1 - data["predict_proba"]


dmcas_fitstat.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\dmcas_fitstat.json
dmcas_roc.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\dmcas_roc.json
dmcas_lift.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\dmcas_lift.json


In [39]:
### create model bias measures
# scored_table requires a specific format and column order; default is scored "test" table
scored_table_keep = [target_event, non_target_event, dm_dec_target, bias_vars[0]]
scored_table = test_score.astype({dm_dec_target: int, bias_vars[0]: int, dm_classtarget_intovar: int})

pzmm.JSONFiles().assess_model_bias(
    score_table=scored_table,
    sensitive_values=bias_vars, 
    actual_values=dm_dec_target,
    #pred_values=None,
    prob_values=[target_event, non_target_event],
    levels=[target_event_level, non_target_event_level],
    json_path=output_path,
    #cutoff=0.5,
    #datarole="TEST",
    return_dataframes=True
    )

  pzmm.JSONFiles().assess_model_bias(


maxDifferences.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\maxDifferences.json
groupMetrics.json was successfully written and saved to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\groupMetrics.json


  json_files = cls.bias_dataframes_to_json(


{'maxDifferencesData':     BASE  COMPARE                      Metric  \
 0    0.0      1.0  P_advisor_event_indicator1   
 1    1.0      0.0  P_advisor_event_indicator0   
 2    0.0      1.0                         TPR   
 3    0.0      1.0                         FPR   
 4    1.0      0.0                         TNR   
 5    1.0      0.0                         FNR   
 6    1.0      0.0                         FDR   
 7    1.0      0.0                         ACC   
 8    0.0      1.0                           C   
 9    0.0      1.0                          F1   
 10   0.0      1.0                        GINI   
 11   0.0      1.0                   MISCEVENT   
 12   1.0      0.0                 MISCEVENTKS   
 13   0.0      1.0                         MCE   
 14   0.0      1.0                         ASE   
 15   0.0      1.0                        RASE   
 16   0.0      1.0                        MCLL   
 17   0.0      1.0                       maxKS   
 18   0.0      1.0          

In [40]:
print(X_train.columns)
print(X_train.dtypes)

Index(['sf_face_2_face', 'sf_call_inbound', 'sf_email_campaigns',
       'advisor_hh_children', 'annuity_mkt_opp', 'advisor_advising_years',
       'advisor_aum', 'advisor_annuity_selling_years', 'advisor_age',
       'advisor_net_worth', 'advisor_credit_hist_mos', 'advisor_firm_changes',
       'advisor_credit_score', 'wholesaler', 'region_ca', 'region_ny',
       'sf_email_responses'],
      dtype='object')
sf_face_2_face                   float64
sf_call_inbound                  float64
sf_email_campaigns               float64
advisor_hh_children              float64
annuity_mkt_opp                  float64
advisor_advising_years           float64
advisor_aum                      float64
advisor_annuity_selling_years    float64
advisor_age                      float64
advisor_net_worth                float64
advisor_credit_hist_mos          float64
advisor_firm_changes             float64
advisor_credit_score             float64
wholesaler                       float64
region_ca    

In [41]:
dm_inputdf_pd = pd.DataFrame(dm_inputdf)

In [None]:
### create model card information
pzmm.JSONFiles().generate_model_card(
        model_prefix=model_name,
        model_files=output_path,
        algorithm=description,
        train_data=dm_inputdf_pd,
        train_predictions=train_score[dm_classtarget_intovar],
        target_type="classification",
        target_value=int(target_event_level),
        interval_vars=['annuity_mkt_opp', 'advisor_advising_years', 'advisor_aum', 'advisor_annuity_selling_years', 'advisor_age',
                       'advisor_net_worth', 'advisor_credit_hist_mos', 'advisor_firm_changes', 'advisor_credit_score', 'wholesaler'],
        class_vars=['sf_face_2_face', 'sf_call_inbound', 'sf_email_campaigns', 'advisor_hh_children', 
                    'region_ca', 'region_ny', 'sf_email_responses']
        #selection_statistic="_KS_",
        # training_table_name="training_table",
        # server="cas-shared-default",
        # caslib=caslib
        )

In [None]:
### copy script to output path
### right click script and copy path (change to forward slash)
src = str(git_dir) + str('/python/logit_python/aml_bank/logit_python_amlbank.ipynb')
print(src)
dst = output_path
shutil.copy(src, dst)
output_path

C:/Users/chparr/OneDrive - SAS/git/sas_viya/python/logit_python/aml_bank/logit_python_amlbank.ipynb


WindowsPath('C:/Users/chparr/OneDrive - SAS/python/outputs/logit_python_amlbank')

##### Note: for XGBoost core models (trained with 'xgb.train' rather than the scikit-learn wrapper), the score code generated by sasctl needs to be changed. First, add 'import xgboost as xgb' alongside the other imports. Next, convert input_array (the dataframe with the input values) to a DMatrix by adding 'input_array = xgb.DMatrix(input_array)' immediately before the prediction. Then, because the core 'predict' method returns only a single probability, pr(1), change the references to prediction[1] to prediction[0]. Lastly, change the batch scoring output to reference only the first column, df[0], since 'predict' produces one value per row.

##### Register Python Model in Model Manager

In [None]:
import sasctl.pzmm as pzmm

pzmm.ImportModel().import_model(
    model_files=output_path, 
    model_prefix=model_name, 
    project=project_name, 
    input_data=input_df,
    predict_method=[dm_model.predict, [int, int]],
    score_metrics=prediction_labels,
    pickle_type='pickle',
    project_version='latest',
    missing_values=False,
    overwrite_model=False,
    mlflow_details=None,
    predict_threshold=None,
    target_values=dm_classtarget_level,
    overwrite_project_properties=False,
    target_index=1,
    model_file_name=model_name + str('.pickle'))

  warn(


Model score code was written successfully to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python\score_agent_advisor_xgboost_python.py and uploaded to SAS Model Manager.
All model files were zipped to C:\Users\chparr\OneDrive - SAS\python\outputs\agent_advisor_xgboost_python.


  warn(f"No project with the name or UUID {project} was found.")


A new project named Agent Advisor Propensity to Sell was created.
Model was successfully imported into SAS Model Manager as agent_advisor_xgboost_python with the following UUID: 49b6d2cf-d2e9-4080-9d60-0ea0684686e2.


(<class 'sasctl.core.RestObj'>(headers={'Date': 'Wed, 05 Nov 2025 22:32:46 GMT', 'Content-Type': 'application/vnd.sas.collection+json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Content-Security-Policy': "default-src 'self'; object-src 'none'; frame-ancestors 'self'; form-action 'self';", 'Expires': '0', 'Pragma': 'no-cache', 'Sas-Activity-Correlator-Id': '9b54d67c-3997-4f65-a556-ebd240a6b22f', 'Sas-Service-Response-Flag': 'true', 'Vary': 'Origin', 'X-Content-Type-Options': 'nosniff', 'X-Xss-Protection': '1; mode=block', 'Strict-Transport-Security': 'max-age=6.3072e+07; includeSubDomains'}, data={'creationTimeStamp': '2025-11-05T22:32:45.274Z', 'createdBy': 'chris.parrish@sas.com', 'modifiedTimeStamp': '2025-11-05T22:32:46.957Z', 'modifiedBy': 'chris.parrish@sas.com', 'id': '49b6d2cf-d2e9-4080-9d60-0ea0684686e2', 'name': 'agent_advisor_xgboost_python', 'description': 'XGBoost', 'role': 