# Create and Test Scoring Pipeline and Deploy R Shiny Dashboard App

### Introduction

Now that we have built, stored and deployed the machine learning models, we can use them to cluster new data.

In the first part of the notebook we will:

* Programmatically get the IDs of the deployment space and model deployments that were created in the **1-model_training** notebook.
* Promote assets required for clustering new data into the deployment space.
* Create a deployable function which will take raw data for clustering, prep it into the format required for the models and cluster it.
* Deploy the function.
* Create the required payload, invoke the deployed function and return clusters.

In the second part we will:

* Store Shiny assets into the same deployment space.
* Deploy the Shiny assets as an app and view the dashboard.

**Sample Materials, provided under license. <br>
Licensed Materials - Property of IBM. <br>
© Copyright IBM Corp. 2020. All Rights Reserved. <br>
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. <br>**

```python
import os
import datetime
import json

import pandas as pd
from ibm_watson_machine_learning import APIClient

token = os.environ['USER_ACCESS_TOKEN']

wml_credentials = {
    "token": token,
    "instance_id": "openshift",
    "url": os.environ['RUNTIME_ENV_APSX_URL'],
    "version": "3.5"
}

client = APIClient(wml_credentials)
```

### User Inputs

Enter the path to the csv file with raw data to be clustered.

```python
# specify the location of the csv file with the raw data that we would like to score
dataset_loc = '/project_data/data_asset/Customer Micro-Segmentation Input.csv'
dataset_name = os.path.basename(dataset_loc)
```

### Set up Deployment Space, Deployments and Assets

The following code programmatically gets the deployment space and the pipeline deployment details which were created in **1-model_training**. We look them up using the space name and the deployment names specified below, which were used when the space and deployments were created. If multiple deployments within the selected space have the same name, the most recently created deployment is used.

Alternatively, the user can manually enter the space and deployment GUIDs.

The code also promotes a data asset, the dataset of raw data for scoring, into the deployment space. Once promoted, the asset is available to the deployed function.

```python
space_name = 'Utilities Customer Micro-Segmentation Space'

deployment_details_dict = {
    'lifestlye': 'lifestyle_pipeline_deployment',
    'customer_engagement': 'customer_engagement_pipeline_deployment'
}
```

Get the space we are working in by matching the space name that was hardcoded in **1-model_training**.

To use a different space, set `space_id` manually.

Set the space as the default space for working.

```python
# find the space whose name matches space_name
space_id = None
for space_details in client.spaces.get_details()['resources']:
    if space_details['entity']['name'] == space_name:
        space_id = space_details['metadata']['id']

if space_id is None:
    raise ValueError(f"No deployment space named '{space_name}' was found")

# set this space as default space
client.set.default_space(space_id)
```

```
'SUCCESS'
```

Get the deployment id for each pipeline. If there are multiple deployments with the same name in the same space, we take the latest.

```python
pipeline_deployments_dict = {}
# retrieve the deployments in the default space once, rather than once per pipeline
all_deployments = client.deployments.get_details()['resources']

for model, deployment_name in deployment_details_dict.items():
    # collect every deployment with this name, together with its creation time
    l_deployment_details = []
    l_deployment_details_created_times = []
    for deployment in all_deployments:
        if deployment['entity']['name'] == deployment_name:
            l_deployment_details.append(deployment)
            l_deployment_details_created_times.append(
                datetime.datetime.strptime(deployment['metadata']['created_at'], '%Y-%m-%dT%H:%M:%S.%fZ'))

    if not l_deployment_details:
        raise ValueError(f"No deployment named '{deployment_name}' was found in the space")

    # if there are multiple deployments with the same name in the same space, take the latest
    list_latest_index = l_deployment_details_created_times.index(max(l_deployment_details_created_times))
    pipeline_deployments_dict[model] = l_deployment_details[list_latest_index]['metadata']['id']
```
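The "take the latest deployment with this name" rule can be sketched without the Watson Machine Learning client. This is a minimal sketch, assuming each resource is a dictionary shaped like the entries of `client.deployments.get_details()['resources']` used above; the sample resources and the `latest_id_by_name` helper are hypothetical. Using `max(..., key=...)` picks the newest match in a single pass instead of keeping two parallel lists.

```python
from datetime import datetime

# Hypothetical stand-in for client.deployments.get_details()['resources'];
# only the fields read by the selection logic are included.
resources = [
    {"metadata": {"id": "dep-old", "created_at": "2020-11-02T10:15:00.000Z"},
     "entity": {"name": "lifestyle_pipeline_deployment"}},
    {"metadata": {"id": "dep-new", "created_at": "2020-11-05T08:00:00.123Z"},
     "entity": {"name": "lifestyle_pipeline_deployment"}},
    {"metadata": {"id": "dep-other", "created_at": "2020-11-06T09:30:00.000Z"},
     "entity": {"name": "customer_engagement_pipeline_deployment"}},
]

# Same timestamp format as the notebook code
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def latest_id_by_name(resources, name):
    """Return the id of the most recently created resource called `name` (hypothetical helper)."""
    matches = [r for r in resources if r["entity"]["name"] == name]
    if not matches:
        raise ValueError(f"No resource named {name!r} was found")
    # compare parsed creation times and keep the newest
    latest = max(
        matches,
        key=lambda r: datetime.strptime(r["metadata"]["created_at"], TIMESTAMP_FORMAT),
    )
    return latest["metadata"]["id"]


print(latest_id_by_name(resources, "lifestyle_pipeline_deployment"))  # dep-new
```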

Promote the raw dataset for scoring into the deployment space.

```python
dataset_asset_details = client.data_assets.create(dataset_name, file_path=dataset_loc)
dataset_id = dataset_asset_details['metadata']['guid']
```

```
Creating data asset...
SUCCESS
```


## Create the Deployable Function

Functions can be deployed in Watson Machine Learning in the same way as models. The Python client or REST API can then be used to send data to the deployed function. Deploying a function lets us prepare the data and pass it to the models for scoring in a single call.

We start by creating the dictionary of default parameters to be passed to the function. It contains the ID and name of the dataset asset that was promoted into the deployment space, together with the credentials, the space ID and the pipeline deployment IDs.

```python
assets_dict = {'dataset_asset_id': dataset_id, 'dataset_name': dataset_name}
```

```python
# Creating the APIClient above updated the instance_id in wml_credentials to 999,
# so reset it to "openshift" before passing the credentials to the deployed function
wml_credentials["instance_id"] = "openshift"

ai_parms = {
    'wml_credentials': wml_credentials,
    'space_id': space_id,
    'assets': assets_dict,
    'pipeline_deployment_id': pipeline_deployments_dict
}
```
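One way to avoid resetting `instance_id` by hand is to give the client its own copy of the credentials, so the dictionary passed to the function is never touched. The sketch below is illustrative only: `fake_client` is a hypothetical stand-in that reproduces the in-place change described in the comment above, not the real `APIClient`, and the token and URL values are placeholders.

```python
import copy

wml_credentials = {
    "token": "<token>",                 # placeholder, not a real token
    "instance_id": "openshift",
    "url": "https://cpd.example.com",   # hypothetical host
    "version": "3.5",
}


def fake_client(credentials):
    """Hypothetical stand-in for APIClient that rewrites instance_id in place."""
    credentials["instance_id"] = "999"
    return credentials


# the client receives a deep copy, so the original dictionary keeps its values
client_credentials = copy.deepcopy(wml_credentials)
fake_client(client_credentials)

print(wml_credentials["instance_id"])     # openshift
print(client_credentials["instance_id"])  # 999
```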

### Scoring Pipeline Function

The function below takes a payload containing the IDs of the customers to be scored. It looks up those customers in the raw dataset, completes the remaining data preparation steps, passes the data to each pipeline deployment and returns the lifestyle segment and customer engagement segment of each customer.

The following rules are required to make a valid deployable function:

* The deployable function must include a nested function named `score`.
* The `score` function accepts the payload, a dictionary whose `input_data` key holds a list.
* Each element of that list must include an array named `values`.
* The `score` function must return a dictionary with a key named `predictions`, whose value is a list that in turn contains a dictionary with a key named `values`. Example: `{"predictions": [{"values": [...]}]}`
* We pass default parameters into the function: the credentials and space ID, details of the assets that were promoted into the space, and the pipeline deployment IDs.
* The dataset asset is downloaded from the deployment space into the function's runtime and read into a pandas DataFrame. The raw data to be scored is then prepared, and the function calls each pipeline deployment endpoint to score the data and return predictions.
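The contract above can be illustrated with a toy deployable function that runs locally. This is a minimal sketch: the `deployable_function` name, the `offset` parameter and the scoring logic are hypothetical; only the shape of the payload and of the response follow the rules listed above.

```python
def deployable_function(parms={"offset": 1}):
    # Outer function: receives the default parameters (in this notebook these are
    # the credentials, space ID, asset details and deployment IDs).
    # Here `offset` is a hypothetical toy parameter.
    offset = parms["offset"]

    def score(payload):
        # The payload arrives as {"input_data": [{"values": [...]}]}
        values = payload["input_data"][0]["values"]
        # Toy scoring logic: add the offset to each input value
        results = [v + offset for v in values]
        # The response must have the shape {"predictions": [{"values": ...}]}
        return {"predictions": [{"values": results}]}

    return score


score = deployable_function()
print(score({"input_data": [{"values": [1, 2, 3]}]}))
# {'predictions': [{'values': [2, 3, 4]}]}
```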

```python
def scoring_pipeline(parms=ai_parms):

    import pandas as pd
    from ibm_watson_machine_learning import APIClient

    client = APIClient(parms['wml_credentials'])
    client.set.default_space(parms['space_id'])

    # use the client to download the stored dataset asset and return the local path
    dataset_path = client.data_assets.download(parms['assets']['dataset_asset_id'], parms['assets']['dataset_name'])
    df_raw = pd.read_csv(dataset_path)

    def prep_data(cust_ids, user_inputs):
        customer_id_col = user_inputs['customer_id']

        # filter the data to only include the customers we are scoring
        df_prep = df_raw[df_raw[customer_id_col].isin(cust_ids)]

        # the customer ids may appear in the dataframe in a different order from the payload list,
        # so set the customer id column as the index, reindex on the payload list and reset the index
        # (set_index returns a new dataframe, so no in-place change is made to a slice of df_raw)
        df_prep = df_prep.set_index(customer_id_col).reindex(cust_ids).reset_index()

        # add the additional columns that were created as part of the data prep
        df_prep['NUMBER_OF_QUESTIONS_ANSWERED'] = df_prep[user_inputs['survey_cols_to_summarize']].sum(axis=1)
        df_prep['NUMBER_OF_QUESTIONS_ANSWERED_CAT'] = 'TWO OR THREE'
        df_prep.loc[df_prep['NUMBER_OF_QUESTIONS_ANSWERED'] <= 1, 'NUMBER_OF_QUESTIONS_ANSWERED_CAT'] = 'ONE OR LESS'
        df_prep.loc[df_prep['NUMBER_OF_QUESTIONS_ANSWERED'] >= 4, 'NUMBER_OF_QUESTIONS_ANSWERED_CAT'] = 'FOUR OR MORE'
        # percentage change from the first to the second energy usage column
        df_prep['ENERGY_SAVING'] = df_prep[user_inputs['energy_usage_cols']].apply(
            lambda row: (row.iloc[1] - row.iloc[0]) / row.iloc[0] * 100, axis=1)

        return df_prep

    def score(payload):

        cust_ids = payload['input_data'][0]['values']

        # deployment id for each pipeline
        pipeline_deployment_id_dict = parms['pipeline_deployment_id']
        # the user inputs were stored with each pipeline deployment and are the same in both,
        # so retrieve them from the first deployment
        first_deployment_id = next(iter(pipeline_deployment_id_dict.values()))
        user_inputs_dict = client.deployments.get_details(first_deployment_id)['entity']['custom']

        # prep the data; stop early if no customer data is left to score
        prepped_data = prep_data(cust_ids, user_inputs_dict)
        if prepped_data is None or prepped_data.shape[0] == 0:
            return {"predictions": [{'values': 'Data prep filtered out customer data. Unable to score. Check that the billing date is valid for the input data.'}]}

        # score with each pipeline and collect the cluster assignments
        results = {}
        for pipeline_name, deployment_id in pipeline_deployment_id_dict.items():
            scoring_payload = {"input_data": [{"values": prepped_data.values.tolist()}]}
            predictions = client.deployments.score(deployment_id, scoring_payload)
            # extract the cluster from each prediction; for customer engagement, where we used
            # k-means, add 1 so that cluster numbers start at 1 instead of 0
            l_cluster = []
            for cluster_num in predictions['predictions'][0]['values']:
                cluster = cluster_num[0]
                if pipeline_name == 'customer_engagement':
                    cluster = cluster + 1
                l_cluster.append(cluster)

            results[pipeline_name] = l_cluster

        # add the customer id list to the results
        results['cust_ids'] = list(prepped_data[user_inputs_dict['customer_id']])

        return {"predictions": [{'values': results}]}

    return score
```
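The row reordering and the two derived features in `prep_data` can be checked by hand with plain Python. This sketch mirrors the pandas logic above rather than replacing it; the helper names and the `CUSTOMER_ID` column name are hypothetical (the notebook reads the real column name from `user_inputs['customer_id']`).

```python
def reorder_by_ids(rows, cust_ids, id_key):
    """Return rows in the order of cust_ids; an ID with no matching row gives None,
    much as DataFrame.reindex gives a row of NaN (hypothetical helper)."""
    by_id = {row[id_key]: row for row in rows}
    return [by_id.get(cust_id) for cust_id in cust_ids]


def answered_category(n_answered):
    """Bucket the number of survey questions answered, mirroring NUMBER_OF_QUESTIONS_ANSWERED_CAT."""
    if n_answered <= 1:
        return "ONE OR LESS"
    if n_answered >= 4:
        return "FOUR OR MORE"
    return "TWO OR THREE"


def energy_saving(first_usage, second_usage):
    """Percentage change from the first to the second usage value, mirroring ENERGY_SAVING."""
    return (second_usage - first_usage) / first_usage * 100


# Hypothetical rows, stored in a different order from the requested IDs
rows = [{"CUSTOMER_ID": 23}, {"CUSTOMER_ID": 21}]
ordered = reorder_by_ids(rows, [21, 23, 99], "CUSTOMER_ID")
print([r["CUSTOMER_ID"] if r else None for r in ordered])  # [21, 23, None]
print(answered_category(1), answered_category(4))           # ONE OR LESS FOUR OR MORE
print(energy_saving(100, 90))                               # -10.0
```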

### Deploy the Function

The user can specify the name of the function and deployment in the code below. As we have previously seen, we use tags in the metadata to allow us to programmatically identify the deployed function.

```python
# store the function and deploy it
function_name = 'utilities_customer_micro_segmentation_scoring_pipeline_function'
function_deployment_name = 'utilities_customer_micro_segmentation_scoring_pipeline_function_deployment'
```


The software specification refers to the runtime used in the notebook, WML training and WML deployment. We store the function with the `default_py3.7` software specification: we get its ID and include it in the metadata when storing the function. The available software specifications can be listed with `client.software_specifications.list()`.


```python
software_spec_id = client.software_specifications.get_id_by_name("default_py3.7")
```

```python
# add the metadata for the function and deployment
meta_data = {
    client.repository.FunctionMetaNames.NAME: function_name,
    client.repository.FunctionMetaNames.TAGS: ['utilities_customer_micro_segmentation_scoring_pipeline_function_tag'],
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: software_spec_id
}

function_details = client.repository.store_function(meta_props=meta_data, function=scoring_pipeline)

function_id = function_details["metadata"]["id"]

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: function_deployment_name,
    client.deployments.ConfigurationMetaNames.TAGS: ['utilities_customer_micro_segmentation_scoring_pipeline_function_deployment_tag'],
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

# deploy the function
function_deployment_details = client.deployments.create(artifact_uid=function_id, meta_props=meta_props)
```

```
#######################################################################################

Synchronous deployment creation for uid: '496b26d9-583a-48e9-a09d-621f5dc13ec5' started

#######################################################################################


initializing.......
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='08c8a95e-d356-4333-8b8a-4dc8d35e019d'
------------------------------------------------------------------------------------------------
```




### Score New Data

Get the ID of the deployed function, create the payload and use the Python client to score the data. The deployed function returns the Lifestyle and Customer Engagement clusters for each customer.

The payload contains the IDs of the customers whose clusters we want to find.

```python
scoring_deployment_id = client.deployments.get_uid(function_deployment_details)

payload = [{"values": [21, 22, 23, 24, 25, 26]}]

payload_metadata = {client.deployments.ScoringMetaNames.INPUT_DATA: payload}
# score
funct_output = client.deployments.score(scoring_deployment_id, payload_metadata)
funct_output
```

```
{'predictions': [{'values': {'lifestlye': [3, 2, 2, 2, 2, 1],
    'customer_engagement': [1, 3, 2, 5, 4, 5],
    'cust_ids': [21, 22, 23, 24, 25, 26]}}]}
```
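The deployed function returns one list per pipeline, aligned by position with `cust_ids`. A minimal sketch of turning that output into one record per customer, using the output shown above (the key is spelled `lifestlye`, as in `deployment_details_dict`); the `per_customer` name and its inner keys are illustrative choices, not part of the function's output.

```python
# Output copied from the scoring cell above
funct_output = {'predictions': [{'values': {'lifestlye': [3, 2, 2, 2, 2, 1],
    'customer_engagement': [1, 3, 2, 5, 4, 5],
    'cust_ids': [21, 22, 23, 24, 25, 26]}}]}

results = funct_output["predictions"][0]["values"]

# zip the parallel lists so each customer id maps to its two cluster assignments
per_customer = {
    cust_id: {"lifestyle": lifestyle, "customer_engagement": engagement}
    for cust_id, lifestyle, engagement in zip(
        results["cust_ids"], results["lifestlye"], results["customer_engagement"]
    )
}

print(per_customer[21])  # {'lifestyle': 3, 'customer_engagement': 1}
```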

## Deploy Shiny App

In this section we will complete the steps to deploy a Shiny Dashboard in Cloud Pak for Data. The app can be deployed in a similar way to models and functions, using the [ibm-watson-machine-learning](http://ibm-wml-api-pyclient.mybluemix.net/) package.

All of the files associated with the dashboard are contained in a zip file that is stored in the project's data assets. To make changes to the dashboard, download the zip file from the data assets and upload it to the RStudio IDE.

```python
r_shiny_deployment_name = 'Utilities_Customer_Micro-Segmentation_Shiny_App'
```

### Store the App

Create the associated metadata and store the dashboard zip file in the deployment space. 

```python
# Meta_props to store assets in space
meta_props = {
    client.shiny.ConfigurationMetaNames.NAME: "Utilities_Customer_Micro-Segmentation_Shiny_assets",
    client.shiny.ConfigurationMetaNames.DESCRIPTION: 'Store shiny assets in deployment space'  # optional
}
app_details = client.shiny.store(meta_props, '/project_data/data_asset/Utilities-Customer-Micro-Segmentation-Analytics-Dashboard.zip')
```

```
Creating Shiny asset...
SUCCESS
```


### Deploy the App

Create the metadata for the Shiny deployment by providing a name, description, R Shiny options and hardware specification. The R Shiny configuration controls who the dashboard is shared with:

1. Anyone with the link
2. Authenticated users
3. Collaborators in this deployment space

This notebook shares the dashboard with anyone who has the link (`'authentication': 'anyone_with_url'`).

```python
# Deployment metadata.
deployment_meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: r_shiny_deployment_name,
    client.deployments.ConfigurationMetaNames.DESCRIPTION: 'Deploy Utilities Customer Micro-Segmentation dashboard',
    client.deployments.ConfigurationMetaNames.R_SHINY: {'authentication': 'anyone_with_url'},
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {'name': 'S', 'num_nodes': 1}
}

# Create the deployment.
app_uid = client.shiny.get_uid(app_details)
rshiny_deployment = client.deployments.create(app_uid, deployment_meta_props)
```

```
#######################################################################################

Synchronous deployment creation for uid: '0813cd18-e9b0-49aa-b8c7-1d792fdc7b77' started

#######################################################################################


initializing.......
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='3cc5c460-a57c-49b0-8582-beb2d018d218'
------------------------------------------------------------------------------------------------
```




### Launch Shiny App
Now that the dashboard is deployed, it can be accessed through the web browser. The app URL can be found by navigating to the deployed app in the deployment space. 

Open the Navigation Menu, select ***Deployments -> Spaces -> Utilities Customer Micro-Segmentation Space -> Deployments -> Utilities_Customer_Micro-Segmentation_Shiny_App*** to find the dashboard URL.

Alternatively, the path for the app URL can be found from the deployment metadata created in the previous cell. This path should be appended to the user's Cloud Pak for Data hostname to get the complete app URL. To get the path, run the cell below:

```python
print("{HOSTNAME}" + "/ml/v4/deployments/" + rshiny_deployment['metadata']['id'] + '/r_shiny')
```

```
{HOSTNAME}/ml/v4/deployments/3cc5c460-a57c-49b0-8582-beb2d018d218/r_shiny
```
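The full app URL can also be assembled in code. This is a small sketch, assuming a hypothetical hostname `https://cpd.example.com/`; substitute your own Cloud Pak for Data hostname. The `shiny_app_url` helper is hypothetical, and stripping a trailing slash from the hostname avoids a doubled `//` in the URL.

```python
# deployment id from the output above; the hostname is a hypothetical placeholder
deployment_id = "3cc5c460-a57c-49b0-8582-beb2d018d218"
hostname = "https://cpd.example.com/"


def shiny_app_url(hostname, deployment_id):
    """Join the host and the R Shiny deployment path without doubling the slash."""
    return f"{hostname.rstrip('/')}/ml/v4/deployments/{deployment_id}/r_shiny"


print(shiny_app_url(hostname, deployment_id))
# https://cpd.example.com/ml/v4/deployments/3cc5c460-a57c-49b0-8582-beb2d018d218/r_shiny
```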
