## Create and Test Scoring Pipeline and Deploy R Shiny Dashboard App

### Introduction

Now that we have built the machine learning pipeline, stored and deployed it, we can use the pipeline to ingest new data, prep it and score it. 

In the first part of the notebook we will:

* Programmatically get the IDs for the deployment space and model deployment that were created in the **1-model_training** notebook.
* Create a deployable function which will take raw data for scoring, complete the initial prep, feed it to the pipeline and score it.
* Deploy the function.
* Create the required payload, invoke the deployed function and return predictions.

In the second part we will:
* Store Shiny assets in the same deployment space.
* Deploy Shiny assets as an app and view the dashboard.

**Sample Materials, provided under license. <br>
Licensed Materials - Property of IBM. <br>
© Copyright IBM Corp. 2020. All Rights Reserved. <br>
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. <br>**

In [2]:
import os
import pandas as pd
import datetime


from watson_machine_learning_client import WatsonMachineLearningAPIClient

token = os.environ['USER_ACCESS_TOKEN']

wml_credentials = {
   "token": token,
   "instance_id" : "openshift",
   "url": os.environ['RUNTIME_ENV_APSX_URL'],
   "version": "3.0.0"
}


client = WatsonMachineLearningAPIClient(wml_credentials)

# use this library for reading and saving data in CP4D
from project_lib import Project
project = Project()

### Set up Deployment Space, Deployments and Assets

The following code programmatically gets the details of the deployment space and the model deployment that were created in **1-model_training**. We use the space name and the default tags that were set when the deployments were created, as specified below. If multiple spaces with the same name exist, the code takes the most recently created one. Similarly, if multiple deployments within the selected space have the same tag, the most recently created deployment is used.


Alternatively, the user can manually enter the space and deployment GUIDs.

The code also promotes an asset into the deployment space. Before passing data to the pipeline, we completed one data preparation step: we aggregated some categories that had a low number of cases. This step must also be applied when scoring any new data. We saved the names of the aggregated categories in a JSON file, `metadata.json`, and we promote this asset into the deployment space so that the deployed function can access it.

In [3]:

space_name = 'Utilities Customer Attrition Space'
model_tag = 'utilities_attrition_pipeline_tag'
deployment_tag = 'utilities_attrition_deployment_tag'

Get the space we are working in, which is found using the name that was hardcoded in **1-model_training**. If there are multiple spaces with the same name, we take the one that was created most recently. 

If the user would like to use a different space, manually set `space_id`.

Set the space as the default space for working.

In [4]:
l_space_details = []
l_space_details_created_times = []
for space_details in client.spaces.get_details()['resources']:
    if space_details['entity']['name'] == space_name:
        l_space_details.append(space_details)
        l_space_details_created_times.append(datetime.datetime.strptime(space_details['metadata']['created_at'],  '%Y-%m-%dT%H:%M:%S.%fZ'))
        
# get the index of the latest created date from the list and use that to get the space_id
list_latest_index = l_space_details_created_times.index(max(l_space_details_created_times))
space_id = l_space_details[list_latest_index]['metadata']['guid']
# set this space as default space
client.set.default_space(space_id)

'SUCCESS'
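
The selection rule used above (match by name, then keep the most recently created entry) can be sketched without a live client. In the sketch below, `sample_spaces` is made-up data shaped like the `resources` list read in the cell above, and `latest_space_id` is a hypothetical helper name, not part of the client library. Unlike the cell above, which would fail on `max()` of an empty list, it raises a descriptive error when no space matches.

```python
import datetime

# Timestamp format used for 'created_at' in the notebook cells above.
TIMESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def latest_space_id(resources, name):
    """Return the guid of the most recently created space called `name`."""
    matches = [r for r in resources if r['entity']['name'] == name]
    if not matches:
        raise ValueError("No deployment space named {!r} was found".format(name))
    newest = max(
        matches,
        key=lambda r: datetime.datetime.strptime(
            r['metadata']['created_at'], TIMESTAMP_FORMAT),
    )
    return newest['metadata']['guid']


# Made-up records for illustration only (guids and dates are invented).
sample_spaces = [
    {'entity': {'name': 'Utilities Customer Attrition Space'},
     'metadata': {'guid': 'space-old', 'created_at': '2020-05-01T10:00:00.000Z'}},
    {'entity': {'name': 'Utilities Customer Attrition Space'},
     'metadata': {'guid': 'space-new', 'created_at': '2020-06-15T08:30:00.123Z'}},
    {'entity': {'name': 'Some Other Space'},
     'metadata': {'guid': 'space-other', 'created_at': '2020-07-01T00:00:00.000Z'}},
]

selected = latest_space_id(sample_spaces, 'Utilities Customer Attrition Space')
print(selected)
```

With a live client, the `resources` list from `client.spaces.get_details()` could be passed in place of `sample_spaces`.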

Get the deployment ID, again using the tag that was hardcoded. If there are multiple deployments with the same tag in the same space, we take the latest.

In [5]:
l_deployment_details = []
l_deployment_details_created_times = []
for deployment in client.deployments.get_details()['resources']:
    if 'tags' in deployment['entity']:
        if deployment['entity']['tags'][0]['value'] == deployment_tag:            
            l_deployment_details.append(deployment)
            l_deployment_details_created_times.append(datetime.datetime.strptime(deployment['metadata']['created_at'],  '%Y-%m-%dT%H:%M:%S.%fZ'))

# get the index of the latest created date from the list and use that to get the deployment_id
list_latest_index = l_deployment_details_created_times.index(max(l_deployment_details_created_times))
deployment_id = l_deployment_details[list_latest_index]['metadata']['guid']
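
The cell above compares only the first tag of each deployment (`tags[0]`). If a deployment could carry several tags, a variant that checks every tag may be more robust. The following is a minimal sketch under that assumption; `sample_deployments` is made-up data in the same shape as the records read above, and `has_tag` and `latest_deployment_id` are hypothetical helper names.

```python
import datetime

TIMESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def has_tag(deployment, tag_value):
    """True if any tag on the deployment has the given value."""
    tags = deployment['entity'].get('tags', [])
    return any(tag.get('value') == tag_value for tag in tags)


def latest_deployment_id(resources, tag_value):
    """Return the guid of the newest deployment with the tag, or None."""
    matches = [d for d in resources if has_tag(d, tag_value)]
    if not matches:
        return None
    newest = max(
        matches,
        key=lambda d: datetime.datetime.strptime(
            d['metadata']['created_at'], TIMESTAMP_FORMAT),
    )
    return newest['metadata']['guid']


# Made-up records for illustration only (guids, dates and 'extra_tag' are invented).
sample_deployments = [
    {'entity': {'tags': [{'value': 'utilities_attrition_deployment_tag'}]},
     'metadata': {'guid': 'dep-1', 'created_at': '2020-05-01T10:00:00.000Z'}},
    {'entity': {'tags': [{'value': 'extra_tag'},
                         {'value': 'utilities_attrition_deployment_tag'}]},
     'metadata': {'guid': 'dep-2', 'created_at': '2020-06-01T10:00:00.000Z'}},
    {'entity': {},  # a deployment without any tags
     'metadata': {'guid': 'dep-3', 'created_at': '2020-07-01T10:00:00.000Z'}},
]

found = latest_deployment_id(sample_deployments, 'utilities_attrition_deployment_tag')
print(found)
```

Returning `None` when nothing matches leaves it to the caller to decide whether to stop or to fall back to a manually entered deployment GUID.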

### Create the Deployable Function

Functions can be deployed in Watson Machine Learning in the same way that models can be deployed. The Python client or REST API can be used to send data to the deployed function. Using a deployed function allows us to prepare the data and pass it to the pipeline for scoring, all within the deployed function.

We start off by creating the dictionary of default parameters to be passed to the function. We get the ID of the asset that has been promoted into the deployment space. We also add the model deployment ID and space ID into the dictionary.

In [6]:
# After the client has been created from these credentials, instance_id gets updated to 999.
# Reset it to "openshift" before passing the credentials to the deployable function.
wml_credentials["instance_id"] = "openshift"

ai_parms = {'wml_credentials' : wml_credentials, 'space_id' : space_id, 'model_deployment_id' : deployment_id}

#### Scoring Pipeline Function

The function below takes a dictionary of raw data to be scored as a payload. Any aggregation on categorical columns that are required is completed before the data is passed to the deployed pipeline. The pipeline completes the remaining steps in prepping the data, passes the data to the model and returns the predicted class and probabilities for attrition.

In [6]:
def scoring_pipeline(parms=ai_parms):
    
    from watson_machine_learning_client import WatsonMachineLearningAPIClient
    client = WatsonMachineLearningAPIClient(parms["wml_credentials"])
    client.set.default_space(parms['space_id'])    

    def score(payload):
        import json
        import requests
        import pandas as pd
     
        extracted_payload = payload['input_data'][0]['values']
        
        # the data passed in from the R Shiny app will be in string format
        # convert it to JSON so we can read it into a dataframe
        if isinstance(extracted_payload, str):
            # we need to remove the \ from the string
            extracted_payload = extracted_payload.replace('\\', '')
            extracted_payload = json.loads(extracted_payload)
        
        # create the dataframe from the values and fields that have been passed in the payload
        df = pd.DataFrame(extracted_payload)
        
        l_customer_ids = df['CUSTOMER_ID'].tolist()
        
        metadata_dict = client.deployments.get_details(parms['model_deployment_id'])['entity']['custom']          
        
        grouping_dict = metadata_dict['grouping_cols']
        # loop through each key in the dictionary, which is the name of a column that needs some aggregation 
        for key, value_dict in grouping_dict.items():    
            # assign the result back; an in-place replace on a selected column may not update df in newer pandas
            df[key] = df[key].replace(value_dict)
            
        # all other prep steps are handled by the pipeline - columns not needed are removed, missing values are replaced
        # get the deployment and score the data      
        scoring_payload = {"input_data":  [{ "values" : df.values.tolist()}]}
        predictions = client.deployments.score(parms['model_deployment_id'], scoring_payload)
        
        # update the predicted class returned based on our threshold
        # by default the predicted class is based on 0.5 probability, we changed this based on ROC curve
        for idx, val in enumerate(predictions['predictions'][0]['values']):
            if predictions['predictions'][0]['values'][idx][1][1] >= metadata_dict['probability_threshold']:
                predictions['predictions'][0]['values'][idx][0] = 1
            else:
                predictions['predictions'][0]['values'][idx][0] = 0
            
            
        return {"predictions" : [{'values' : predictions, 'customer_ids' : l_customer_ids}]}
            
    return score
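
Two steps inside `score` can be illustrated in isolation: grouping rare categories before scoring, and relabelling the predicted class with a custom probability threshold. The sketch below uses plain Python lists and dictionaries instead of pandas so that it runs anywhere. The grouping metadata, the threshold value and the helper names `group_categories` and `apply_threshold` are all assumptions made for illustration; the real values come from the deployment's custom metadata.

```python
def group_categories(records, grouping_dict):
    """Replace rare category values using {column: {old_value: new_value}}."""
    grouped = []
    for record in records:
        updated = dict(record)  # copy so the caller's data is left untouched
        for column, mapping in grouping_dict.items():
            if column in updated:
                # values without a mapping are kept as they are
                updated[column] = mapping.get(updated[column], updated[column])
        grouped.append(updated)
    return grouped


def apply_threshold(values, threshold):
    """Relabel rows of [predicted_class, [p_class_0, p_class_1]]."""
    return [[1 if probs[1] >= threshold else 0, probs] for _label, probs in values]


# Made-up metadata and threshold, for illustration only.
example_grouping = {'CITY': {'Smallville': 'Other', 'Tinytown': 'Other'}}
example_threshold = 0.35

records = [
    {'CUSTOMER_ID': 1, 'CITY': 'Smallville'},
    {'CUSTOMER_ID': 2, 'CITY': 'Mountain View'},
]
grouped_records = group_categories(records, example_grouping)

# Labels as a model would return them with the default 0.5 cut-off.
raw_values = [[0, [0.81, 0.19]], [0, [0.60, 0.40]]]
relabelled = apply_threshold(raw_values, example_threshold)

print(grouped_records)
print(relabelled)
```

Lowering the threshold below 0.5 flags more customers as likely to leave, which is why a row with a class-1 probability under 0.5 can still be returned with a predicted class of 1.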

### Deploy the Function

The user can specify the name of the function and deployment in the code below. As we have previously seen, we use tags in the metadata to allow us to programmatically identify the deployed function.

In [7]:
# store the function and deploy it 
function_name = 'attrition_scoring_pipeline_function'
function_deployment_name = 'attrition_scoring_pipeline_function_deployment'

#### Get the ID of software specification to be used with the function

The Software Specification refers to the runtime used in the Notebook, WML training and WML deployment. It contains details about the runtime platform, framework versions, other packages used and any custom library used in the concerned runtime.

Our notebooks use the `default_py3.6` software specification. When we deploy our function we want it to have the same software specification as the notebooks. We get the ID of the notebook software specification and include it in the metadata when storing the function.

In [8]:
default_software_spec_id = client.software_specifications.get_uid_by_name("default_py3.6")

In [10]:
# add the metadata for the function and deployment    
meta_data = {
    client.repository.FunctionMetaNames.NAME : function_name,
    client.repository.FunctionMetaNames.TAGS : [{'value' : 'utilities_attrition_scoring_pipeline_function_tag'}],
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: default_software_spec_id,
    client.repository.FunctionMetaNames.SPACE_UID: space_id
}

function_details = client.repository.store_function(meta_props=meta_data, function=scoring_pipeline)

function_id = function_details["metadata"]["guid"]

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: function_deployment_name,
    client.deployments.ConfigurationMetaNames.TAGS : [{'value' : 'utilities_attrition_scoring_pipeline_function_deployment_tag'}],
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.SPACE_UID: space_id
}

# deploy the stored function
function_deployment_details = client.deployments.create(artifact_uid=function_id, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: '5d79f652-c2fe-4d31-a2cc-1ca2fcf733f2' started

#######################################################################################


initializing......
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='def8ed3e-fbcc-4cb1-9428-7438b1e44338'
------------------------------------------------------------------------------------------------




### Score New Data

To create the payload, we pass a dictionary with raw data as the function payload. For demonstration purposes we will use the same CSV file that was used in the **1-model_training** notebook as the raw data. We take 5 records and convert them into a list of dictionaries, one per record, to be passed in the payload.

We then get the GUID of the function deployment and use the Python client to score the data. The deployed function returns the classification prediction along with the probabilities.

In [11]:
# specify the name of the csv file with raw customer data that we would like to score for
dataset_name = 'Attrition View.csv'

my_file = project.get_file(dataset_name)
my_file.seek(0)
df_raw_data = pd.read_csv(my_file)

# remove the target variable so the data has the same inputs as training data
df_raw_data.drop('ATTRITION_STATUS', axis=1, inplace=True)

In [12]:
payload_input_dict = df_raw_data.head(5).to_dict(orient='records')

Looking at the payload, not all of these fields are used in the model. The transformers and pipeline take care of removing the columns that aren't used.

In [13]:
payload_input_dict[1]

{'CUSTOMER_ID': 2,
 'GENDER_ID': 1,
 'FIRST_NAME': 'Ima',
 'LAST_NAME': 'Labadie',
 'PHONE_1': '505-339-5197',
 'EMAIL': 'Ima.Labadie@allie.tv',
 'AGE': 34,
 'ENERGY_USAGE_PER_MONTH': 4970,
 'ENERGY_EFFICIENCY': 0.35600000000000004,
 'IS_REGISTERED_FOR_ALERTS': 0,
 'OWNS_HOME': 1,
 'COMPLAINTS': 1,
 'HAS_THERMOSTAT': 1,
 'HAS_HOME_AUTOMATION': 0,
 'PV_ZONING': 1,
 'WIND_ZONING': 0,
 'SMART_METER_COMMENTS': 'Negative',
 'IS_CAR_OWNER': 1,
 'HAS_EV': 0,
 'HAS_PV': 0,
 'HAS_WIND': 0,
 'TENURE': 11,
 'EBILL': 0,
 'IN_WARRANTY': 1,
 'CITY': 'Mountain View',
 'CURRENT_OFFER': 'Free Energy Audits',
 'CURRENT_CONTRACT': 'Dynamic Pricing 240 minute plan',
 'CURRENT_ISSUE': 'Billing Issue',
 'MARITAL_STATUS': 'U',
 'EDUCATION': "Bachelor's degree",
 'SEGMENT': 'GOLD',
 'EMPLOYMENT': 'Employed full-time',
 'STD_YRLY_USAGE_CUR_YEAR_MINUS_1': 52098,
 'STD_YRLY_USAGE_CUR_YEAR_MINUS_2': 40740,
 'STD_YRLY_USAGE_CUR_YEAR_MINUS_3': 26666,
 'STD_YRLY_USAGE_CUR_YEAR_MINUS_4': 26666,
 'STD_YRLY_USAGE_CUR_Y

In [14]:
scoring_deployment_id = client.deployments.get_uid(function_deployment_details)

payload = [{'values' : payload_input_dict}]

payload_metadata = {client.deployments.ScoringMetaNames.INPUT_DATA: payload}
# score
funct_output = client.deployments.score(scoring_deployment_id, payload_metadata)
funct_output

{'predictions': [{'values': {'predictions': [{'fields': ['prediction',
       'probability'],
      'values': [[0, [0.8085850664760637, 0.19141493352393651]],
       [1, [0.2273804809663913, 0.7726195190336088]],
       [0, [0.780502232264842, 0.21949776773515772]],
       [1, [0.3863743779099732, 0.6136256220900268]],
       [1, [0.604246298971217, 0.39575370102878316]]]}]},
   'customer_ids': [1, 2, 3, 4, 5]}]}
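
The nested output above can be awkward to read. As a minimal sketch, the helper below flattens it into one dictionary per customer. `flatten_predictions` is a hypothetical name, and `sample_output` copies the first two rows of the output shown above rather than calling the deployment.

```python
def flatten_predictions(funct_output):
    """Pair each customer ID with its predicted class and class-1 probability."""
    result = funct_output['predictions'][0]
    rows = result['values']['predictions'][0]['values']
    return [
        {
            'customer_id': customer_id,
            'prediction': row[0],
            'probability_class_1': row[1][1],
        }
        for customer_id, row in zip(result['customer_ids'], rows)
    ]


# First two rows of the output shown above, in the same nested structure.
sample_output = {
    'predictions': [{
        'values': {
            'predictions': [{
                'fields': ['prediction', 'probability'],
                'values': [
                    [0, [0.8085850664760637, 0.19141493352393651]],
                    [1, [0.2273804809663913, 0.7726195190336088]],
                ],
            }]
        },
        'customer_ids': [1, 2],
    }]
}

flat = flatten_predictions(sample_output)
for item in flat:
    print(item)
```

A list of flat dictionaries like this can be passed directly to `pd.DataFrame` if a tabular view is preferred.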

# Deploy Shiny App

In this section we will complete the steps to deploy a Shiny Dashboard in Cloud Pak for Data. The app can be deployed in a similar way to models and functions, using the watson_machine_learning_client package.

All of the files associated with the dashboard are contained in a zip file which is stored in data assets. If the user would like to make changes to the dashboard, they can download the zip from data assets and upload it in the RStudio IDE. 

In [7]:
r_shiny_deployment_name='Utilities_Customer_Attrition_Shiny_App'

### Store the App

Create the associated metadata and store the dashboard zip file in the deployment space. 

In [9]:
# Meta_props to store assets in space 
meta_props = {
    client.shiny.ConfigurationMetaNames.NAME: "Utilities_Customer_Attrition_Shiny_assets",
    client.shiny.ConfigurationMetaNames.DESCRIPTION: 'Store shiny assets in deployment space' # optional
}
app_details = client.shiny.store(meta_props, '/project_data/data_asset/utilities-customer-attrition-prediction-analytics-dashboard.zip')

Creating Shiny asset...
SUCCESS


### Deploy the App

Create the metadata for the Shiny deployment by providing a name, a description, R-Shiny options and a hardware specification. The R-Shiny configuration controls who the dashboard is shared with. The options are: 1) anyone with the link, 2) authenticated users, 3) collaborators in this deployment space.

In [10]:
# Deployment metadata.
deployment_meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: r_shiny_deployment_name,
    client.deployments.ConfigurationMetaNames.DESCRIPTION: 'Deploy Utilities Customer Attrition dashboard',
    client.deployments.ConfigurationMetaNames.R_SHINY: { 'authentication': 'anyone_with_url' },
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: { 'name': 'S', 'num_nodes': 1}
}

# Create the deployment.
app_uid = client.shiny.get_uid(app_details)
rshiny_deployment = client.deployments.create(app_uid, deployment_meta_props)



#######################################################################################

Synchronous deployment creation for uid: 'b50cc4b6-0540-4bb5-9b0e-20916f5f2fa1' started

#######################################################################################


initializing......
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='e50884be-3630-45b9-9be1-d8fc98d9fc1d'
------------------------------------------------------------------------------------------------




### Launch Shiny App
Now that the dashboard is deployed, it can be accessed through the web browser. The app URL can be found by navigating to the deployed app in the deployment space. 

Open the Navigation Menu, under ***Analytics*** select ***Analytics deployments -> Utilities Customer Attrition Space -> Deployments -> Utilities_Customer_Attrition_Shiny_App*** to find the dashboard URL.

Alternatively, the path for the app URL can be found from the deployment metadata created in the previous cell. This path should be appended to the user's Cloud Pak for Data hostname to get the complete app URL. To get the path, run the cell below:

In [11]:
print("{HOSTNAME}"+rshiny_deployment['metadata']['href'] + '/r_shiny')

{HOSTNAME}/v4/deployments/e50884be-3630-45b9-9be1-d8fc98d9fc1d/r_shiny
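
To turn the printed path into a complete URL, the `{HOSTNAME}` placeholder has to be replaced by the Cloud Pak for Data hostname. The sketch below shows one way to join the parts without doubling or dropping slashes. The hostname `https://cpd.example.com` and the short deployment path are placeholders, and `build_app_url` is a hypothetical helper name.

```python
def build_app_url(hostname, href):
    """Join the hostname, the deployment href and the r_shiny suffix."""
    return hostname.rstrip('/') + '/' + href.strip('/') + '/r_shiny'


# Placeholder values: substitute the real hostname and the href from
# rshiny_deployment['metadata']['href'].
example_hostname = 'https://cpd.example.com/'
example_href = '/v4/deployments/example-deployment-id'

app_url = build_app_url(example_hostname, example_href)
print(app_url)
```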
