# Industry Accelerators - Utilities Customer Micro-Segmentation Prediction Models 
### Introduction

Now that we have built the machine learning models and then stored and deployed them using [ibm-watson-machine-learning](http://ibm-wml-api-pyclient.mybluemix.net), we can use the models to score new data.





Before executing this notebook on IBM Cloud, you need to:<br>
1) After importing this project into an IBM Cloud environment, insert a project access token as a code cell at the top of this notebook. <br>
If you do not see that cell above, click **More -> Insert project token** in the top-right menu section and run the inserted cell. <br>

![ws-project.mov](https://media.giphy.com/media/jSVxX2spqwWF9unYrs/giphy.gif)
2) Provide your IBM Cloud API key in the cell below.<br>
3) You can then step through the notebook cell by cell by pressing Shift-Enter, or execute the entire notebook by selecting **Cell -> Run All** from the menu.<br>


#### Insert IBM Cloud API key
You can generate your Cloud API key by going to the <a href="https://cloud.ibm.com/iam/apikeys" target="_blank" rel="noopener noreferrer">API Keys section of the Cloud console</a>. From that page, scroll down to the API Keys section, and click Create an IBM Cloud API key. Give your key a name and click Create, then copy the created key and paste it below. 

If you are running this notebook on Cloud Pak for Data on-prem, leave the `ibmcloud_api_key` field blank.

In [2]:
ibmcloud_api_key = ''

In [3]:
try:
    project
except NameError:
    # READING AND WRITING PROJECT ASSETS
    import project_lib
    project = project_lib.Project() 

## Create and Test Scoring Pipeline 
In the first part of the notebook we will:

* Programmatically get the IDs for the deployment space and model deployments that were created in the 1-model-training notebook
* Promote assets required for clustering new data into the deployment space
* Create a deployable function which will take raw data for clustering, prep it into the format required for the models and cluster it
* Deploy the function
* Create the required payload, invoke the deployed function and return clusters


In [4]:
import os
import pandas as pd
import datetime
import json
from ibm_watson_machine_learning import APIClient


if ibmcloud_api_key != '':
    wml_credentials = {
        "apikey": ibmcloud_api_key,
        "url": 'https://' + os.environ['RUNTIME_ENV_REGION'] + '.ml.cloud.ibm.com'
    }
else:
    token = os.environ['USER_ACCESS_TOKEN']
    wml_credentials = {
        "token": token,
        "instance_id" : "openshift",
        "url": os.environ['RUNTIME_ENV_APSX_URL'],
        "version": "3.5"
     }
client = APIClient(wml_credentials)
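
The choice between the two credential shapes can be checked without a live runtime. The following is a minimal sketch that factors the branch above into a helper; the helper name `build_wml_credentials` and its `env` argument are assumptions introduced for illustration, and the region value is only an example.

```python
def build_wml_credentials(api_key, env):
    """Return the credentials dictionary for IBM Cloud or Cloud Pak for Data.

    `env` is any mapping with the same keys as os.environ (hypothetical helper).
    """
    if api_key:
        # IBM Cloud: API key plus a region-specific endpoint
        return {
            "apikey": api_key,
            "url": "https://" + env["RUNTIME_ENV_REGION"] + ".ml.cloud.ibm.com",
        }
    # Cloud Pak for Data on-prem: user token plus the cluster URL
    return {
        "token": env["USER_ACCESS_TOKEN"],
        "instance_id": "openshift",
        "url": env["RUNTIME_ENV_APSX_URL"],
        "version": "3.5",
    }


# Example values only; in the notebook, os.environ would be passed instead
cloud_credentials = build_wml_credentials("my-api-key", {"RUNTIME_ENV_REGION": "eu-de"})
onprem_credentials = build_wml_credentials(
    "", {"USER_ACCESS_TOKEN": "token-value", "RUNTIME_ENV_APSX_URL": "https://cpd.example.com"}
)
print(cloud_credentials["url"])
```

Passing the environment in as an argument keeps the helper free of side effects, which makes the blank-key rule for on-prem runs easy to verify.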

### User Inputs

Enter the name of the CSV file that contains the raw data to be clustered.
The CSV file is stored in your IBM Cloud Object Storage. We use the `project_lib` library to find the file and download it.

In [5]:
file_name='Customer Micro-Segmentation Input.csv'

In [6]:
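# write the project file to local storage; the context manager closes the file even if the write fails
with open(file_name, 'w+b') as f:
    f.write(project.get_file(file_name).getbuffer())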

### Set up Deployment Space, Deployments and Assets

The following code gets the deployment space and pipeline deployment details that were created in **1-model-training**. 
 The space and the deployments are looked up by the names given when they were created, as specified below. If multiple deployments within the selected space have the same name, the most recently created deployment is used.

Alternatively, you can manually enter the space and deployment GUIDs.

The code also promotes a data asset into the deployment space, specifically, the dataset with raw data for scoring. Promoting this asset into the deployment space makes it available and accessible by the deployed function.

In [7]:
space_name = 'Utilities Customer Micro-Segmentation Space'

deployment_details_dict = {'lifestlye' : 'lifestyle_pipeline_deployment', 'customer_engagement' : 'customer_engagement_pipeline_deployment'}

Get the space we are working in. The code finds it by matching the space name that was set in **1-model-training**. 

If you would like to use a different space, manually set the `space_id`.

Set the space as the default space for working.

In [8]:

# look up the space by name
space_id = None
for space_details in client.spaces.get_details()['resources']:
    if space_details['entity']['name'] == space_name:
        space_id = space_details['metadata']['id']

# stop early with a clear message if the space was not found
if space_id is None:
    raise ValueError("No deployment space named '" + space_name + "' was found")

# set this space as default space

print("Setting the Default Space ... ")
client.set.default_space(space_id)

Setting the Default Space ... 


'SUCCESS'

Get the deployment id for each pipeline. If there are multiple deployments with the same name in the same space, we take the latest.

In [9]:
pipeline_deployments_dict = {}
for model, deployment_name in deployment_details_dict.items():
    # get the id of the deployments - 
    # if there are multiple deployments with the same name in the same space, we take the latest
    l_deployment_details = []
    l_deployment_details_created_times = []
    for deployment in client.deployments.get_details()['resources']:
        if deployment['entity']['name'] == deployment_name:
            l_deployment_details.append(deployment)
            l_deployment_details_created_times.append(datetime.datetime.strptime(deployment['metadata']['created_at'],  '%Y-%m-%dT%H:%M:%S.%fZ'))

    # fail with a clear message if no deployment with this name exists in the space
    if not l_deployment_details:
        raise ValueError("No deployment named '" + deployment_name + "' was found in the space")

    # get the index of the latest created date from the list and use that to get the deployment_id
    list_latest_index = l_deployment_details_created_times.index(max(l_deployment_details_created_times))
    deployment_id = l_deployment_details[list_latest_index]['metadata']['id']
    
    pipeline_deployments_dict[model] = deployment_id
print("Models and their deployment IDs\n",pipeline_deployments_dict)

Models and their deployment IDs
 {'lifestlye': '7cf9e0e3-4647-4498-9243-d995a676d329', 'customer_engagement': 'c8ef92fa-4495-4f7f-b4bc-d7159ebf76e7'}
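
The "take the latest deployment with a given name" rule can be tried out on its own. The sketch below uses hand-written records shaped like the entries of `client.deployments.get_details()['resources']`; the record IDs and the helper names `created_at` and `latest_deployment_id` are hypothetical and exist only for this illustration.

```python
import datetime

# Hypothetical records; only the fields used by the lookup are included
deployments = [
    {"entity": {"name": "lifestyle_pipeline_deployment"},
     "metadata": {"id": "old-id", "created_at": "2021-05-01T10:00:00.000Z"}},
    {"entity": {"name": "lifestyle_pipeline_deployment"},
     "metadata": {"id": "new-id", "created_at": "2021-05-28T16:22:26.924Z"}},
    {"entity": {"name": "customer_engagement_pipeline_deployment"},
     "metadata": {"id": "other-id", "created_at": "2021-05-30T09:00:00.000Z"}},
]


def created_at(deployment):
    """Parse the creation timestamp using the same format string as the notebook."""
    return datetime.datetime.strptime(
        deployment["metadata"]["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ"
    )


def latest_deployment_id(resources, name):
    """Return the ID of the most recently created deployment with this name."""
    matches = [d for d in resources if d["entity"]["name"] == name]
    if not matches:
        raise ValueError(f"No deployment named {name!r} found")
    return max(matches, key=created_at)["metadata"]["id"]


latest_id = latest_deployment_id(deployments, "lifestyle_pipeline_deployment")
print(latest_id)
```

Using `max` with a key function avoids keeping two parallel lists in sync, at the cost of parsing each timestamp during the comparison.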


Promote the asset that holds the raw data for scoring into the deployment space.

In [10]:

dataset_asset_details = client.data_assets.create(file_name, file_path=file_name)
dataset_id = dataset_asset_details['metadata']['guid']

Creating data asset...
SUCCESS


## Create the Deployable Function

Functions can be deployed in Watson Machine Learning in the same way as models. The Python client or REST API can be used to send data to the deployed function. A deployed function lets us prepare the data and pass it to the model for scoring in a single call.

We start off by creating the dictionary of default parameters to be passed to the function. We get the ID of the asset that has been promoted into the deployment space. We also add the deployment ID and space ID into the dictionary.

In [11]:
assets_dict = {'dataset_asset_id' : dataset_id, 'dataset_name' : 'Customer Micro-Segmentation Input.csv'}

In [12]:
# update wml_credentials. After already creating the client using the credentials, the instance_id gets updated to 999
# update the instance_id
#wml_credentials["instance_id"] = "openshift"

ai_parms = {'wml_credentials' : wml_credentials,'space_id' : space_id, 'assets' : assets_dict, 'pipeline_deployment_id' : pipeline_deployments_dict}

### Scoring Pipeline Function

The function below takes a dictionary of raw data to be scored as payload. The pipeline completes the remaining data preparation steps, passes the data to the models and returns the predicted lifestyle and customer engagement clusters for each customer.

The following rules are required to make a valid deployable function:

* The deployable function must include a nested function named `score`.
* The score function accepts a payload whose `input_data` entry is a list.
* The list must include an array with the name `values`.
* The score function must return an array with the name `predictions`, with a list as the value, which in turn contains an array with the name `values`. Example: `{"predictions" : [{'values' : [...]}]}`
* We pass the following into the function: default parameters, credentials and space detail, details of the assets that were promoted into the space, model deployment id.
* The assets are downloaded into the deployment space and imported as variables. Raw data to be scored is then prepared and the function calls the model deployment endpoint to score and return predictions.
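
The rules above can be seen in a stripped-down closure that runs locally without Watson Machine Learning. This is only a sketch of the shape: the function name `make_scorer`, the `offset` parameter and the arithmetic inside `score` are placeholders, not part of the scoring pipeline in this notebook.

```python
def make_scorer(parms={"offset": 1}):
    # Outer function: setup work goes here; default parameters are captured
    # in the same way the notebook passes `ai_parms`
    offset = parms["offset"]

    def score(payload):
        # Read the list of values from the first entry of `input_data`
        values = payload["input_data"][0]["values"]
        shifted = [value + offset for value in values]
        # Return the required structure: predictions -> list -> values
        return {"predictions": [{"values": shifted}]}

    return score


# Local test call with a payload shaped like the one sent to a deployment
score = make_scorer()
result = score({"input_data": [{"values": [0, 1, 2]}]})
print(result)
```

Calling the outer function locally before storing it is a cheap way to catch payload-shape mistakes before a deployment is created.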

In [13]:
def scoring_pipeline(parms=ai_parms):
     
    import pandas as pd
    import os

    
    from ibm_watson_machine_learning import APIClient
    client = APIClient(parms["wml_credentials"])
    client.set.default_space(parms['space_id'])

    # use the client to download the stored dataset asset and return the path
    dataset_path = client.data_assets.download(parms['assets']['dataset_asset_id'], parms['assets']['dataset_name'])
    df_raw = pd.read_csv(dataset_path)
    #df_prep_dict=parms['assets']
    #df_raw=pd.DataFrame.from_dict(df_prep_dict)
    
    def prep_data(cust_ids, user_inputs):  
        # filter data to only include customers we are scoring
        df_prep = df_raw[df_raw[user_inputs['customer_id']].isin(cust_ids)]

        # the order that the customer ids appear in the dataframe may not be in the same order as they were provided in the list in the payload
        # reorder the dataframe so that the records are in the order of customer ids passed in the payload
        # we do this by setting the customer id column as the index, then reindexing based on the list provided and resetting the index
        # assign the result instead of using inplace=True, which would modify a filtered slice of df_raw
        df_prep = df_prep.set_index(user_inputs['customer_id'])
        df_prep = df_prep.reindex(cust_ids).reset_index()
        
        # add additional columns that were created as part of the data prep
        df_prep['NUMBER_OF_QUESTIONS_ANSWERED']=df_prep[user_inputs['survey_cols_to_summarize']].sum(axis=1)
        df_prep['NUMBER_OF_QUESTIONS_ANSWERED_CAT'] = 'TWO OR THREE'
        df_prep.loc[df_prep['NUMBER_OF_QUESTIONS_ANSWERED']<=1, 'NUMBER_OF_QUESTIONS_ANSWERED_CAT'] = 'ONE OR LESS'
        df_prep.loc[df_prep['NUMBER_OF_QUESTIONS_ANSWERED']>=4, 'NUMBER_OF_QUESTIONS_ANSWERED_CAT'] = 'FOUR OR MORE'
        df_prep['ENERGY_SAVING']=df_prep[user_inputs['energy_usage_cols']].apply(lambda row: (row.iloc[1]-row.iloc[0])/row.iloc[0]*100, axis=1)
                
        return df_prep
    
    def score(payload):
        
        cust_ids = payload['input_data'][0]['values']
        
        # create variable for the deployment id dictionary
        pipeline_deployment_id_dict = parms['pipeline_deployment_id']
        # we stored the user inputs when we deployed each pipeline, as the dictionary is the same in both, just retrieve from first deployment
        user_inputs_dict = client.deployments.get_details(next(iter(pipeline_deployment_id_dict.values())))['entity']['custom']
        # call the function to prep the data
        prepped_data = prep_data(cust_ids, user_inputs_dict)

        # loop through each pipeline and return the cluster assignment
        results={}
        for pipeline_name, deployment_id in pipeline_deployment_id_dict.items():
            # nothing to score if data prep returned no rows
            if prepped_data is None or prepped_data.shape[0] == 0:
                return {"predictions" : [{'values' : 'Data prep filtered out customer data. Unable to score. Check that the billing date is valid for the input data.'}]}
            else:
                print(prepped_data)
                scoring_payload = {"input_data":  [{ "values" : prepped_data.values.tolist()}]}
                predictions = client.deployments.score(deployment_id, scoring_payload)
                # extract cluster from prediction
                # in case of customer engagement, where we used kmeans, we increment cluster number by 1 so clusters start at 1 instead of 0
                l_cluster = []
                for cluster_num in predictions['predictions'][0]['values']:
                    cluster = cluster_num[0]
                    if pipeline_name == 'customer_engagement':
                        cluster = cluster + 1
                        
                    l_cluster.append(cluster)
                
                results[pipeline_name] = l_cluster
                
        # add customer id list to the dictionary
        results['cust_ids'] = list(prepped_data[user_inputs_dict['customer_id']])
        
        return {"predictions" : [{'values' : results}]}
            
    return score
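
The derived columns created in `prep_data` follow simple rules. As an illustration, they are restated below in plain Python without pandas; the helper names `answered_category` and `energy_saving_pct` and the sample numbers are assumptions made for this sketch, and the two usage arguments stand for the first and second columns listed in `energy_usage_cols`.

```python
def answered_category(n_answered):
    """Bucket the number of answered survey questions, as in prep_data."""
    if n_answered <= 1:
        return "ONE OR LESS"
    if n_answered >= 4:
        return "FOUR OR MORE"
    return "TWO OR THREE"


def energy_saving_pct(first_usage, second_usage):
    """Percentage change from the first usage column to the second."""
    return (second_usage - first_usage) / first_usage * 100


# Categories for 0 to 5 answered questions
categories = [answered_category(n) for n in range(6)]
# Example values only: usage moving from 200.0 to 150.0
saving = energy_saving_pct(200.0, 150.0)
print(categories)
print(saving)
```

Note that the formula yields a negative value when the second usage figure is lower than the first.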

### Deploy the Function

The user can specify the name of the function and deployment in the code below. As we have previously seen, we use tags in the metadata to allow us to programmatically identify the deployed function.

In [14]:
# store the function and deploy it 
function_name = 'utilities_customer_micro_segmentation_scoring_pipeline_function'
function_deployment_name = 'utilities_customer_micro_segmentation_scoring_pipeline_function_deployment'


The Software Specification refers to the runtime used in the Notebook, WML training and WML deployment. We use the `default_py3.7` software specification to store the function. We get the ID of the software specification and include it in the metadata when storing the function. Available Software specifications can be retrieved using `client.software_specifications.list()`.


In [15]:

software_spec_id = client.software_specifications.get_id_by_name("default_py3.7")

In [16]:
# add the metadata for the function and deployment    
meta_data = {
    client.repository.FunctionMetaNames.NAME : function_name,
   # client.repository.FunctionMetaNames.TAGS : ['utilities_customer_micro_segmentation_scoring_pipeline_function_tag'],
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: software_spec_id

}

function_details = client.repository.store_function(meta_props=meta_data, function=scoring_pipeline)


In [17]:

function_id = function_details["metadata"]["id"]

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: function_deployment_name,
    client.deployments.ConfigurationMetaNames.TAGS : ['utilities_customer_micro_segmentation_scoring_pipeline_function_deployment_tag'],
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

# deploy the function
function_deployment_details = client.deployments.create(artifact_uid=function_id, meta_props=meta_props)
function_deployment_details



#######################################################################################

Synchronous deployment creation for uid: '4911e9a9-f693-4b62-90e5-14bab2da7a1b' started

#######################################################################################


initializing.....
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='df50df37-c84d-4460-a61c-339b5109978d'
------------------------------------------------------------------------------------------------




{'entity': {'asset': {'id': '4911e9a9-f693-4b62-90e5-14bab2da7a1b'},
  'custom': {},
  'deployed_asset_type': 'function',
  'hardware_spec': {'id': 'Not_Applicable', 'name': 'XS', 'num_nodes': 1},
  'name': 'utilities_customer_micro_segmentation_scoring_pipeline_function_deployment',
  'online': {},
  'space_id': '7b75747c-1b87-4fd0-8287-e90e8b29396d',
  'status': {'online_url': {'url': 'https://eu-de.ml.cloud.ibm.com/ml/v4/deployments/df50df37-c84d-4460-a61c-339b5109978d/predictions'},
   'state': 'ready'}},
 'metadata': {'created_at': '2021-05-28T16:22:26.924Z',
  'id': 'df50df37-c84d-4460-a61c-339b5109978d',
  'modified_at': '2021-05-28T16:22:26.924Z',
  'name': 'utilities_customer_micro_segmentation_scoring_pipeline_function_deployment',
  'owner': 'IBMid-550003B08R',
  'space_id': '7b75747c-1b87-4fd0-8287-e90e8b29396d',
  'tags': ['utilities_customer_micro_segmentation_scoring_pipeline_function_deployment_tag']}}

### Score New Data

Get the ID of the deployed function, create the payload and use the Python client to score the data. The deployed function returns the Lifestyle and Customer Engagement clusters for each customer.

The payload contains the IDs of the customers whose clusters we would like to find.

In [18]:
scoring_deployment_id = client.deployments.get_uid(function_deployment_details)

payload = [{"values" : [21, 22, 23, 24, 25, 26]}]

payload_metadata = {client.deployments.ScoringMetaNames.INPUT_DATA: payload}
# score
funct_output = client.deployments.score(scoring_deployment_id, payload_metadata)
funct_output

{'predictions': [{'values': {'lifestlye': [3, 2, 2, 2, 2, 1],
    'customer_engagement': [1, 3, 2, 5, 4, 5],
    'cust_ids': [21, 22, 23, 24, 25, 26]}}]}
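
The returned lists are aligned by position with `cust_ids`, so they can be regrouped per customer. The following sketch works on a copy of the output shown above; the variable name `per_customer` and the string check for the error case are assumptions added for illustration.

```python
# Copy of the output printed above, so the example runs without a deployment
funct_output = {"predictions": [{"values": {
    "lifestlye": [3, 2, 2, 2, 2, 1],
    "customer_engagement": [1, 3, 2, 5, 4, 5],
    "cust_ids": [21, 22, 23, 24, 25, 26],
}}]}

results = funct_output["predictions"][0]["values"]

# The scoring function returns a message string instead of a dictionary
# when data prep leaves nothing to score
if isinstance(results, str):
    raise RuntimeError(results)

# Every key other than the customer ID list holds one cluster per customer
cluster_keys = [key for key in results if key != "cust_ids"]
per_customer = {
    cust_id: {key: results[key][i] for key in cluster_keys}
    for i, cust_id in enumerate(results["cust_ids"])
}
print(per_customer[21])
```

A per-customer mapping like this is convenient when the clusters need to be joined back to other customer records.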

**Follow the instructions in the README to launch the R-Shiny Dashboard**

<hr>

Sample Materials, provided under <a href="https://github.com/IBM/Industry-Accelerators/blob/master/CPD%20SaaS/LICENSE" target="_blank" rel="noopener noreferrer">license.</a> <br>
Licensed Materials - Property of IBM. <br>
© Copyright IBM Corp. 2020, 2021. All Rights Reserved. <br>
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. <br>