# Credit Scoring Business Application

## Overview

## Objective

This notebook provides an example of **how to register a TensorFlow model in SAS Model Manager using the sasctl library**.

The goal is to manage the end-to-end process, including model deployment on Red Hat OpenShift.

## Assumption

We assume the sasctl library and its functionality are available.

# 0. Import and Setup

## Libraries

In [1]:
# General
import os
import random
import shutil
import subprocess
import getpass
import yaml
import pprint
import zipfile
import uuid

# Data
import pandas as pd

# SAS Model Manager
import sasctl
from sasctl import Session
from sasctl.services import model_repository, model_management
import sasctl.pzmm as pzmm

#Settings
import warnings
warnings.filterwarnings('ignore')

## Define Helpers

In [2]:
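def read_yaml(filepath):
    '''
    Read a YAML file and return its content.
    :param filepath: path of the YAML file
    :return: conn_dict, the parsed content as a dictionary
    '''
    with open(filepath) as file:
        # safe_load is sufficient for a plain configuration file
        conn_dict = yaml.safe_load(file)
    return conn_dict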

def write_requirements(folder, filename):
    '''
    Write the output of `pip freeze` to a requirements file.
    :param folder: directory in which to create the file
    :param filename: name of the requirements file
    :return: None
    '''
    reqfile_path = os.path.join(folder, filename)
    with open(reqfile_path, "w") as f:
        # Discard stderr; an unread pipe could block the child process
        returncode = subprocess.call(["pip", "freeze"], stdout=f, stderr=subprocess.DEVNULL)
    if returncode == 0:
        print("Requirements file created under ", reqfile_path)
    else:
        print("pip freeze command failed!")

def get_output_variables(names, labels, eventprob):
    '''
    Given variable names, labels and event probability, 
    it creates dataframes for pzmm metadata generation
    :param names: 
    :param labels: 
    :param eventprob: 
    :return: outputVar
    '''
    outputVar = pd.DataFrame(columns=names)
    outputVar[names[0]] = [random.random(), random.random()]
    outputVar[names[1]] = [random.random(), random.random()]
    outputVar[names[2]] = labels
    outputVar[names[3]] = eventprob
    return outputVar

def zip_folder(folder_to_zip_path, rmtree=False):
    '''
    Create a zip archive of a folder, next to the folder itself.
    :param folder_to_zip_path: path of the folder to archive (no trailing separator)
    :param rmtree: if True, delete the original folder after archiving
    :return: zipath, the path of the created archive
    '''
    root_dir = os.path.dirname(folder_to_zip_path)    # parent of the folder to zip
    base_dir = os.path.basename(folder_to_zip_path)   # name of the folder to zip
    zipath = shutil.make_archive(
        folder_to_zip_path,         # archive path without the extension
        'zip',                      # the archive format - or tar, bztar, gztar
        root_dir=root_dir,          # directory the archive is rooted at
        base_dir=base_dir)          # folder to include, relative to root_dir
    if rmtree:
        shutil.rmtree(folder_to_zip_path) # remove the original folder, keeping only the archive
    return zipath
    
def run_model_tracking(project, model):
    '''
    Given project and model names,
    create a project and register the model in SAS Model Manager.
    Relies on the globals SERVER, USER, PASSWORD and ZIP_CHAMPION_FOLDER.
    :param project: name of the project to create
    :param model: name under which the model is registered
    :return: 0 on success
    '''

    with Session(hostname=SERVER, username=USER, password=PASSWORD, verify_ssl=False):
        #id = uuid.uuid4()
        #uuid_project = project + '_' + str(id)[:8]

        model_repository.create_project(project=project,
                                        repository='Public',
                                        function='classification'
                                        )

        # The context manager closes the archive even if the import raises;
        # the name zip_file avoids shadowing the imported zipfile module
        with open(ZIP_CHAMPION_FOLDER, 'rb') as zip_file:
            model_repository.import_model_from_zip(model,
                                                   project,
                                                   file=zip_file
                                                   )

    return 0

## Setup Variables

In [3]:
#Base
BASE_DIR_PATH = os.getcwd()
DATA_DIR_PATH = os.path.join(BASE_DIR_PATH, '../data')

# Data directories paths
TRAIN_DIR_PATH = os.path.join(DATA_DIR_PATH, 'train')

# Data file paths
TRAIN_DATA_PATH = os.path.join(TRAIN_DIR_PATH, 'train.csv')

# Models directory
MODELS_DIR = os.path.join(BASE_DIR_PATH, '../models')

# Deliverables directory
DELIVERS_DIR = os.path.join(BASE_DIR_PATH, '../deliverables')
CHAMPION_DIR_NAME = 'champion'

# Champion directory
WRK_DIR = os.path.join(DELIVERS_DIR, CHAMPION_DIR_NAME)

In [4]:
# Model Registry Connection

MODEL_REGISTRY_META = read_yaml('./model_registry_config.yaml')
SERVER = MODEL_REGISTRY_META['connection']['server']
USER= 'russasdemo'

print('Please provide User Password:')

PASSWORD = getpass.getpass()

PROJECT_NAME = MODEL_REGISTRY_META['modelrepository_meta']['project_name']
REPOSITORY = MODEL_REGISTRY_META['modelrepository_meta']['repository']

Please provide User Password:


 ········
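
The configuration file itself is not shown in this notebook. Judging only from the keys read in the cell above, `model_registry_config.yaml` would need a `connection` section with a `server` entry and a `modelrepository_meta` section with `project_name` and `repository` entries. The sketch below checks a parsed configuration dictionary for those keys before any connection is attempted. The names `REQUIRED_KEYS` and `missing_keys` and the sample values are hypothetical and not part of the original notebook.

```python
# Keys this notebook reads from the parsed YAML (section -> fields).
REQUIRED_KEYS = {
    "connection": ["server"],
    "modelrepository_meta": ["project_name", "repository"],
}

def missing_keys(conf, required=REQUIRED_KEYS):
    """Return a list of 'section.field' entries absent from conf."""
    missing = []
    for section, fields in required.items():
        section_dict = conf.get(section) or {}
        for field in fields:
            if field not in section_dict:
                missing.append(f"{section}.{field}")
    return missing

# Hypothetical parsed config, shaped like the output of read_yaml().
sample_conf = {
    "connection": {"server": "sas-viya.example.com"},
    "modelrepository_meta": {"project_name": "demo_project"},
}
print(missing_keys(sample_conf))
```

Running such a check right after `read_yaml` would surface a misspelled or missing key as a readable list instead of a `KeyError` in the middle of the setup cell.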


# 1. Model Governance with SAS Model Manager Registry

SAS Model Manager relies on a set of files to support model governance in the registry.

For example, for a pickled model, these are:

- Required

    1. requirement.json
    2. score.py
    3. model.pkl
    4. inputVar.json
    5. outputVar.json
    6. ModelProperties.json
    

- Optional

    7. train.py
    8. fileMetadata.json
    9. dmcas_fitstat.json
    10. dmcas_roc
    11. dmcas_lift

Because we're going to deploy on Red Hat OpenShift, we just need some of them for compliance.
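
Before zipping a model folder, it can help to confirm which of the listed files are actually present. The sketch below is one possible way to do that, using the required file names copied from the list above for the pickle-model case. `REQUIRED_FILES` and `missing_files` are hypothetical helpers, and for the OpenShift deployment a shorter list would be passed, since only a subset of the files applies.

```python
import os
import tempfile

# Required file names copied from the list above (pickle-model case).
REQUIRED_FILES = [
    "requirement.json", "score.py", "model.pkl",
    "inputVar.json", "outputVar.json", "ModelProperties.json",
]

def missing_files(folder, required=REQUIRED_FILES):
    """Return the required file names that are not present in folder."""
    present = set(os.listdir(folder))
    return [name for name in required if name not in present]

# Demo on a throwaway folder that contains only two of the files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("inputVar.json", "outputVar.json"):
        open(os.path.join(tmp, name), "w").close()
    demo_missing = missing_files(tmp)
print(demo_missing)
```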

## Create Model Folder with SAS pzmm

### Write requirements.txt

In [5]:
write_requirements(WRK_DIR, 'requirements.txt')

Requirements file created under  /home/jovyan/work/notebooks/../deliverables/champion/requirements.txt
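
A bare `pip` found on `PATH` may belong to a different environment than the notebook kernel. As a hedged alternative to the helper used above, the sketch below runs pip through `sys.executable`, so the frozen list describes the interpreter that is actually executing the notebook. `freeze_to_file` and its `cmd` parameter are hypothetical. The demo passes a stand-in command so that the example does not depend on pip being installed.

```python
import os
import subprocess
import sys
import tempfile

def freeze_to_file(path, cmd=None):
    """Write the stdout of cmd (default: pip freeze) to path; return True on success."""
    if cmd is None:
        # Run pip through the current interpreter so the list matches this kernel.
        cmd = [sys.executable, "-m", "pip", "freeze"]
    with open(path, "w") as f:
        result = subprocess.run(cmd, stdout=f, stderr=subprocess.DEVNULL)
    return result.returncode == 0

# Demo with a stand-in command instead of the real pip call.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "requirements.txt")
    stand_in = [sys.executable, "-c", "print('examplepkg==1.0')"]
    ok = freeze_to_file(demo_path, cmd=stand_in)
    with open(demo_path) as f:
        demo_text = f.read()
print(ok, demo_text.strip())
```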


### Write Metadata files

In [6]:
data_train = pd.read_csv(TRAIN_DATA_PATH, sep=',')

TARGET = 'BAD'
PREDICTORS = ['REASON', 'JOB', 'LOAN', 'MORTDUE', 'VALUE', 'YOJ', 'DEROG', 'DELINQ', 'CLAGE', 'NINQ', 'CLNO', 'DEBTINC']

In [7]:
JSONFiles = pzmm.JSONFiles()
# write inputVar.json
JSONFiles.writeVarJSON(data_train[PREDICTORS], isInput=True, jPath=WRK_DIR)

In [8]:
NAMES=['P_BAD0', 'P_BAD1', 'EM_CLASSIFICATION', 'EM_EVENTPROBABILITY']
LABELS=['0', '1']
EVENTPROB=0.5
outputVar = get_output_variables(NAMES, LABELS, EVENTPROB)

# write outputVar.json
JSONFiles.writeVarJSON(outputVar, isInput=False, jPath=WRK_DIR)

In [9]:
MODELNAME = 'Tensorflow_BoostedTreesClassifier'
# write ModelProperties.json
JSONFiles.writeModelPropertiesJSON(modelName=MODELNAME,
                                   modelDesc='A Classifier for Tensorflow Boosted Trees models',
                                   targetVariable=TARGET,
                                   modelType='Boosted Tree',
                                   modelPredictors=PREDICTORS,
                                   targetEvent=1,
                                   numTargetCategories=1,
                                   eventProbVar='EM_EVENTPROBABILITY',
                                   jPath=WRK_DIR,
                                   modeler='ivnard')

### Create zip files

In [10]:
# Locate the TF SavedModel directory (the first subdirectory found in WRK_DIR)
TF_SAVEDMODEL_NAME = [file for file in os.listdir(WRK_DIR) if os.path.isdir(os.path.join(WRK_DIR, file))][0]
TF_SAVEDMODEL_PATH = os.path.join(WRK_DIR, TF_SAVEDMODEL_NAME)
print(TF_SAVEDMODEL_PATH)

/home/jovyan/work/notebooks/../deliverables/champion/1603087114


In [11]:
# Zip TF SavedModel format
ZIP_TF_SAVEDMODEL_PATH = zip_folder(TF_SAVEDMODEL_PATH, rmtree=True)
print(ZIP_TF_SAVEDMODEL_PATH)

/home/jovyan/work/deliverables/champion/1603087114.zip


In [12]:
# Zip the entire folder
ZIP_CHAMPION_FOLDER = zip_folder(WRK_DIR)
print(ZIP_CHAMPION_FOLDER)

/home/jovyan/work/deliverables/champion.zip
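
The `root_dir` and `base_dir` arguments of `shutil.make_archive` decide how paths are stored inside the archive. The sketch below reproduces the same call pattern as `zip_folder` on a throwaway directory and lists the archive members, showing that entries are stored under the folder name rather than under the full path. The directory layout and the placeholder file are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/deliverables/champion/score.py
    champion = os.path.join(tmp, "deliverables", "champion")
    os.makedirs(champion)
    with open(os.path.join(champion, "score.py"), "w") as f:
        f.write("# placeholder\n")

    # Same call pattern as zip_folder: archive rooted at the parent directory.
    archive = shutil.make_archive(
        champion, "zip",
        root_dir=os.path.dirname(champion),
        base_dir=os.path.basename(champion),
    )
    archive_name = os.path.basename(archive)

    # List the members to inspect the stored paths.
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
print(archive_name, names)
```

Listing the members of `champion.zip` in the same way before registration is a cheap check that the metadata files and the zipped SavedModel ended up where expected.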


## Register the Model with SAS sasctl

In [13]:
status = run_model_tracking(PROJECT_NAME, MODELNAME)



In [14]:
if status == 0:
    print(f'{PROJECT_NAME} successfully created!')

sas_modelops_tensorflow_openshift successfully created!
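
`run_model_tracking` contains a commented-out idea: suffixing the project name with the first eight characters of a random UUID. That could be useful if re-running the notebook against a project name that already exists turns out to be a problem, which is an assumption not verified here. A minimal sketch of that naming scheme follows. `unique_project_name` is a hypothetical helper.

```python
import uuid

def unique_project_name(project, suffix_len=8):
    """Append the first suffix_len characters of a random UUID to project."""
    suffix = str(uuid.uuid4())[:suffix_len]
    return f"{project}_{suffix}"

name = unique_project_name("sas_modelops_tensorflow_openshift")
print(name)
```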
