# Train and Register models to SAS Viya Model Manager

Before starting, install the `sasctl` package and upgrade `scikit-learn` to the latest version (1.5.2 at the time of writing). To do this, open a Terminal in VSCode and run the following two commands:
- `pip install sasctl`
- `pip install -U scikit-learn`
<div style="text-align: center;">
    <img src='img/terminal_vscode.png' width=70%>
</div>
<div style="text-align: center;">
    <img src='img/installation_terminal.png' width=70%>
</div>

As a last step, restart the kernel after installing the packages so that the updated versions are picked up:
<div style="text-align: center;">
    <img src='img/restart_kernel.png' width=70%>
</div>
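
After the restart, it can be useful to confirm that the kernel actually sees the packages installed above. The sketch below uses only the standard library; the helper name `installed_version` is introduced here for illustration and is not part of `sasctl` or `scikit-learn`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the two packages this notebook depends on.
for name in ("sasctl", "scikit-learn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either line reports `not installed`, the `pip` commands most likely ran in a different Python environment than the one backing the notebook kernel.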

## 1. Importing Packages

In [1]:
import sklearn
print(sklearn.__version__)

1.5.2


In [2]:
import os
import json
import pickle
import requests
import warnings

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

import sasviya
from sasviya.ml.tree import ForestClassifier

from sasctl import Session, pzmm
from sasctl.services import model_repository as mr

## 2. `sasviya` Model

### 2.1 Model Building

In [3]:
target = 'Inspection'
df = pd.read_csv('/workspaces/c2/workshop/Viya-connection/Data/CUSTOMS.csv')
df.head()

| Unnamed: 0 | CertificateOfOrigin | EUCitizen | Perishable | Fragile | Volume | PreDeclared | MultiplePackage | Category | OnlineDeclaration | ExporterValidation | ... | LithiumBatteries | ExpressDelivery | EntryPoint | Origin | PaperlessBilling | PaymentMethod | Weight | Price | Inspection | packageID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No | 0 | Yes | No | 2 | No |  | Clothing | No | Yes | ... | No | No | Antwerp | China | Yes | Electronic check | 29.85 | 29.85 | No | 7590-VHVEG |
| 1 | Yes | 0 | No | No | 35 | Yes | No | Clothing | Yes | No | ... | No | No | Antwerp | US | No | Mailed check | 56.95 | 1889.5 | No | 5575-GNVDE |
| 2 | Yes | 0 | No | No | 3 | Yes | No | Clothing | Yes | Yes | ... | No | No | Antwerp | China | Yes | Mailed check | 53.85 | 108.15 | Yes | 3668-QPYBK |
| 3 | Yes | 0 | No | No | 46 | No |  | Clothing | Yes | No | ... | Yes | No | Antwerp | US | No | Bank transfer (automatic) | 42.3 | 1840.75 | No | 7795-CFOCW |
| 4 | No | 0 | No | No | 3 | Yes | No | Electronics | No | No | ... | No | No | Antwerp | China | Yes | Electronic check | 70.7 | 151.65 | Yes | 9237-HQITU |


In [4]:
X = df.drop([target, "packageID"], axis=1)
y = df[target]

In [5]:
nominal_cols = ['CertificateOfOrigin', 'EUCitizen', 
                'Perishable', 'Fragile', 'PreDeclared',
                'MultiplePackage', 'OnlineDeclaration',
                'ExporterValidation', 'SecuredDelivery', 'LithiumBatteries',
                'ExpressDelivery', 'PaperlessBilling', 'Category', 'EntryPoint',
                'Origin', 'PaymentMethod']

sasviya_rf = ForestClassifier()
sasviya_rf.fit(X, y, nominals=nominal_cols)

ForestClassifier()

### 2.2 Register the model to SAS Model Manager

The steps to follow are:

1. Obtaining the authorization code to the Viya instance of interest
2. Establishing a connection to the Viya server
3. Creating a project in SAS Model Manager from Workbench 
4. Specifying model parameters
5. Pushing the model into Model Manager

#### 2.2.1 Obtain the authorization code to the Viya instance of interest (`create.demo.sas.com` in our case)

[Follow this link to get the authorization code](https://create.demo.sas.com/SASLogon/oauth/authorize?client_id=sas.cli&response_type=code). If you are using a different Viya instance, change the URL to: `https://your-server-url.com/SASLogon/oauth/authorize?client_id=sas.cli&response_type=code`.
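
For a different Viya instance, the authorization URL can also be assembled in code instead of edited by hand. This is a minimal sketch based on the URL pattern shown above; the function name `authorize_url` is hypothetical.

```python
def authorize_url(server, client_id="sas.cli"):
    """Build the SASLogon URL that issues an authorization code."""
    base = server.rstrip("/")  # avoid a double slash when the server ends with "/"
    return f"{base}/SASLogon/oauth/authorize?client_id={client_id}&response_type=code"


print(authorize_url("https://create.demo.sas.com/"))
```

Opening the printed URL in a browser and logging in should display the code to paste into `auth_code`.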

Request a `.pem` file from your Viya Admin and set the `verification_file` variable to the path of this file, replacing `"your-pem-file-name"`. This step can be skipped by changing every `verify=verification_file` to `verify=False`, but doing so disables TLS certificate verification and is only advisable in a test environment:

In [8]:
# paste the authorization code here
auth_code = "Ijl2cxv1Wk5D09AecwhwRbFr1eRq4NtH"

server = "https://create.demo.sas.com"  # no trailing slash, so the URL below has no "//"

# URL to obtain the access token
url = f"{server}/SASLogon/oauth/token"

# Payload and headers for the request
auth_payload = f'grant_type=authorization_code&code={auth_code}'
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Authorization': 'Basic c2FzLmNsaTo='
}
verification_file = "your-pem-file-name"  # path to the .pem file provided by your Viya Admin

# Send the POST request to obtain the access and the refresh token
# response = requests.request("POST", url, headers=headers, data=auth_payload, verify=False)
response = requests.request("POST", url, headers=headers, data=auth_payload, verify=verification_file)
response.raise_for_status()  # stop here if the server rejected the request
response_json = json.loads(response.text)

# Extract the access and the refresh tokens from the response
access_token = response_json['access_token']
refresh_token = response_json['refresh_token']

# Save the refresh token to a .txt file:
with open('refresh_token.txt', 'w') as file:
    file.write(refresh_token)

In [7]:
# Establish a connection to SAS Viya using the access token
# st = Session(server, token=access_token, verify_ssl=False)
os.environ['CAS_CLIENT_SSL_CA_LIST'] = verification_file
st = Session(server, token=access_token)
st

<sasctl.core.Session at 0x7fe3d7759d90>
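
The `Authorization: Basic c2FzLmNsaTo=` header used in the token request above is standard HTTP Basic authentication for the OAuth client: the Base64 encoding of `client_id:client_secret`, here the `sas.cli` client with an empty secret. The sketch below reproduces that value; the helper name `basic_auth_header` is illustrative only.

```python
import base64


def basic_auth_header(client_id, client_secret=""):
    """Encode OAuth client credentials as an HTTP Basic Authorization value."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("sas.cli"))  # Basic c2FzLmNsaTo=
```

If your Viya Admin registers a different client for you, the same helper yields the matching header value.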

By default, the access token expires after about 1 hour (`response_json['expires_in']`, expressed in seconds). The **refresh token** can be used to obtain a new access token once the current one expires; it remains valid for about 14 days (`response_json['refresh_expires_in']`):

In [11]:
response_json['expires_in'], response_json['refresh_expires_in']

(3599, 1209599)
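
Both values are lifetimes in seconds. The sketch below converts them into readable durations and adds a small, hypothetical helper (`needs_refresh`) that decides whether an access token is close enough to expiry to be worth refreshing; the 60-second safety margin is an arbitrary assumption.

```python
import time
from datetime import timedelta

# Values reported by the token endpoint in the cell above (in seconds).
expires_in = 3599
refresh_expires_in = 1209599

print(timedelta(seconds=expires_in))          # 0:59:59
print(timedelta(seconds=refresh_expires_in))  # 13 days, 23:59:59


def needs_refresh(issued_at, lifetime_seconds, now=None, margin=60):
    """Return True when the token is within `margin` seconds of expiring."""
    now = time.time() if now is None else now
    return now >= issued_at + lifetime_seconds - margin
```

Recording `time.time()` right after the token request gives the `issued_at` value to pass to `needs_refresh`.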

A new access token can be obtained with the following procedure, without repeating the previous steps:

In [16]:
# get access token for viya env using refresh token.
server = "https://create.demo.sas.com"  # no trailing slash, so the URL below has no "//"
url = f"{server}/SASLogon/oauth/token"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Authorization': 'Basic c2FzLmNsaTo='
}

with open('refresh_token.txt', 'r') as token:
    refresh_token = token.read()

refresh_payload = f'grant_type=refresh_token&refresh_token={refresh_token}'

# response = requests.request("POST", url, headers=headers, data=refresh_payload, verify=False)
response = requests.request("POST", url, headers=headers, data=refresh_payload, verify=verification_file)
new_access_token = response.json()['access_token']

# Establish a connection to SAS Viya using the new access token
# st = Session(server, token=new_access_token, verify_ssl=False)
os.environ['CAS_CLIENT_SSL_CA_LIST'] = verification_file
st = Session(server, token=new_access_token)
st

<sasctl.core.Session at 0x7fa9fae4f750>
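
Because the refresh token is read back from a text file and interpolated into the form body, a stray newline or a reserved character could corrupt the request. A slightly more defensive way to build the payload, sketched here with the standard library (the function name `refresh_payload` is an assumption for illustration), is to strip whitespace and let `urlencode` handle escaping:

```python
from urllib.parse import urlencode


def refresh_payload(refresh_token):
    """Build the form-encoded body for a refresh_token grant."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token.strip(),  # drop stray whitespace/newlines
    })


print(refresh_payload("abc123\n"))  # grant_type=refresh_token&refresh_token=abc123
```

Alternatively, `requests` accepts a dictionary for its `data` argument and performs the same form encoding itself.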

#### 2.2.2 Create a project from Workbench and import the model to the project

In [8]:
project_name = "Workbench Models"
repository_name = "DMRepository"

repository = mr.get_repository(repository_name)

try:
    project = mr.create_project(project_name, repository)
except Exception:
    # Creation fails, for example, when the project already exists: reuse it
    project = mr.get_project(project_name)

In [9]:
model_params = {
    "name": "your-name_sasviyaModel",
    "projectId": project.id,
    "type": "ASTORE",
}

astore = mr.post(
    "/models",
    files={"files": ("model_export.astore", sasviya_rf.export())},
    data=model_params,
)

## 3. `sklearn` Model

### 3.1 Model Building

In [10]:
binary_cols = ['CertificateOfOrigin', 'Perishable', 'Fragile',
               'PreDeclared', 'PaperlessBilling']
ohe_cols = ['MultiplePackage', 'OnlineDeclaration',
            'ExporterValidation', 'SecuredDelivery', 'LithiumBatteries',
            'ExpressDelivery', 'Category', 'EntryPoint', 'Origin', 'PaymentMethod']
binary_mapping = [['No', 'Yes']]

# Define the preprocessing steps
preprocessor = ColumnTransformer(
    transformers=[
        ('binary', OrdinalEncoder(categories=binary_mapping * len(binary_cols)), binary_cols),
        ('ohe', OneHotEncoder(dtype='int64', handle_unknown='ignore', sparse_output=False), ohe_cols),
        ('impute', SimpleImputer(), ['Price'])
    ],
    remainder='passthrough',  # Keep the remaining columns as they are
    force_int_remainder_cols=False
)

# Define the pipeline
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=0))
])

In [11]:
pipeline.fit(X, y)

In [12]:
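# Make sure the assets folder exists before writing the pickle file
os.makedirs('sklearn_mm_assets', exist_ok=True)
with open('sklearn_mm_assets/sklearnPipeline.pkl', 'wb') as file:
    pickle.dump(pipeline, file)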

### 3.2 Register the model to SAS Model Manager

The steps to follow are:

1. Obtaining the authorization code to the Viya instance of interest ✅
2. Establishing a connection to the Viya server ✅
3. Creating a project in SAS Model Manager from Workbench ✅
4. Saving information about the model in JSON files 
5. Importing the JSON files into the project
6. Creating the scoring code and adding both the pickle file and the scoring code to the model in SAS Model Manager

#### 3.2.1 Specify some model properties in JSON files

In [13]:
target_df = pd.DataFrame(data=[[0.8, 0.2, "No"]], columns=['P_InspectionNo', 'P_InspectionYes', 'I_Inspection'])

pzmm.JSONFiles.write_var_json(X, is_input=True, json_path="sklearn_mm_assets/")
pzmm.JSONFiles.write_var_json(target_df, is_input=False, json_path="sklearn_mm_assets/")

inputVar.json was successfully written and saved to sklearn_mm_assets/inputVar.json
outputVar.json was successfully written and saved to sklearn_mm_assets/outputVar.json
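
The output variables above follow a naming pattern that can be read off this notebook: one `P_<target><label>` probability column per class, followed by an `I_<target>` column holding the predicted label. The sketch below derives those names from the target and its class labels; the helper `output_variable_names` is hypothetical and simply mirrors the columns of `target_df`.

```python
def output_variable_names(target, class_labels):
    """Return one probability column per class plus the predicted-label column."""
    probability_cols = [f"P_{target}{label}" for label in class_labels]
    return probability_cols + [f"I_{target}"]


print(output_variable_names("Inspection", ["No", "Yes"]))
# ['P_InspectionNo', 'P_InspectionYes', 'I_Inspection']
```

The order matters: it should match the order of the values returned by the scoring function defined later in this notebook.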


In [14]:
model_name = "your-name_sklearnPipeline"

pzmm.JSONFiles.write_model_properties_json(model_name=model_name,
                            model_desc='scikit-learn Random Forest Classification model',
                            target_variable='Inspection',
                            model_algorithm='sklearn.ensemble.RandomForestClassifier',
                            target_values=["No","Yes"],
                            json_path="sklearn_mm_assets/",
                            modeler='Mattia')

ModelProperties.json was successfully written and saved to sklearn_mm_assets/ModelProperties.json


#### 3.2.2 Import the model to the project

In [15]:
warnings.filterwarnings("ignore", message="The following arguments are required for the automatic generation of score code")
warnings.filterwarnings("ignore", message="This model's properties are different from the project's.")

import_model = pzmm.ImportModel.import_model(
    overwrite_model=True,
    model_files="sklearn_mm_assets/",
    model_prefix=model_name,
    project=project_name
)

All model files were zipped to sklearn_mm_assets.


#### 3.2.3 Add scoring code and .pkl file

In [16]:
%%writefile ./sklearn_mm_assets/sklearnPipelineScore.py
import settings
import pickle
import pandas as pd

with open(settings.pickle_path+'/sklearnPipeline.pkl', "rb") as f:
    pipeline = pickle.load(f)

def score_method(CertificateOfOrigin, EUCitizen, Perishable, Fragile, Volume,
                 PreDeclared, MultiplePackage, Category, OnlineDeclaration,
                 ExporterValidation, SecuredDelivery, LithiumBatteries,
                 ExpressDelivery, EntryPoint, Origin, PaperlessBilling,
                 PaymentMethod, Weight, Price):
    "Output: P_InspectionNo, P_InspectionYes, I_Inspection"

    df = pd.DataFrame([[CertificateOfOrigin, EUCitizen, Perishable, Fragile, Volume,
                            PreDeclared, MultiplePackage, Category, OnlineDeclaration,
                            ExporterValidation, SecuredDelivery, LithiumBatteries,
                            ExpressDelivery, EntryPoint, Origin, PaperlessBilling,
                            PaymentMethod, Weight, Price]],
                            columns=['CertificateOfOrigin', 'EUCitizen', 'Perishable', 'Fragile', 'Volume',
                                    'PreDeclared', 'MultiplePackage', 'Category', 'OnlineDeclaration',
                                    'ExporterValidation', 'SecuredDelivery', 'LithiumBatteries',
                                    'ExpressDelivery', 'EntryPoint', 'Origin', 'PaperlessBilling',
                                    'PaymentMethod', 'Weight', 'Price'])
    
    y_pred_prob = pipeline.predict_proba(df)
    y_pred = pipeline.predict(df)
       
    return float(y_pred_prob[0][0]), float(y_pred_prob[0][1]), str(y_pred[0])

Overwriting ./sklearn_mm_assets/sklearnPipelineScore.py
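
The scoring script imports a `settings` module and reads `settings.pickle_path`, which appears to be supplied by the scoring environment rather than by this notebook. To smoke-test the script locally before uploading it, one option is to register a stand-in module under that name first. This is only a sketch: the stub and the chosen path are assumptions, not part of `sasctl`.

```python
import sys
import types

# Assumption: the pickle lives in the local assets folder used in this notebook.
stub = types.ModuleType("settings")
stub.pickle_path = "sklearn_mm_assets"
sys.modules["settings"] = stub  # `import settings` now resolves to the stub

import settings

print(settings.pickle_path)  # sklearn_mm_assets
```

With the stub in place, importing `sklearnPipelineScore` from the `sklearn_mm_assets` folder and calling `score_method` on one row of `X` should return two probabilities and a label.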


In [17]:
model = mr.get_model(model_name)

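# Upload the scoring script; the context manager closes the file afterwards
with open('sklearn_mm_assets/sklearnPipelineScore.py', 'rb') as score_file:
    scorefile = mr.add_model_content(
        model,
        score_file,
        name='sklearnPipelineScore.py',
        role='score'
    )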

In [18]:
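pklFileName = 'sklearn_mm_assets/sklearnPipeline.pkl'

# Upload under the bare file name, since the scoring code loads 'sklearnPipeline.pkl'
with open(pklFileName, 'rb') as pkl_file:
    python_pickle = mr.add_model_content(
        model,
        pkl_file,
        name=os.path.basename(pklFileName),
        role='python pickle'
    )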