# Recommendation Demo

This notebook shows how to prepare data for a recommendation model, upload it, start training, and run inference.

### Install pyjwt library if not already installed

In [55]:
!pip install pyjwt



In [56]:
import os
import pandas as pd
from sklearn.model_selection import train_test_split

%matplotlib inline
import matplotlib.pyplot as plt
import requests
import base64

### Load and prepare data

We have a small dataset of service request tickets covering complaints about financial products and services. We will build a recommendation model that suggests similar service request tickets when it receives a similar question (for example, one that is worded differently but has the same meaning).

The code block below loads the data from a CSV file.

In [57]:
df = pd.read_csv("../datasets/complaint_clean.csv")

### Let's see the data

In [58]:
df.head()

Unnamed: 0,Product,Sub-product,Issue,Sub-issue,Description,TicketID
0,"Credit reporting, credit repair services, or other personal consumer reports",Other personal consumer report,Improper use of your report,Reporting company used your report improperly,On XXXX XXXX XXXX 2015 Barclays Ba nk Delaware placed a hard Inquiry on my XXXX Credit Report. The report is two yea rs old and needs to be removed and furthermore I never provided authorization for the inquiry. I do not have any open accounts with this company so this inquiry needs to be removed.,8f634167-1466-41b6-903e-ae8406d95232
1,Debt collection,Credit card,Communication tactics,Frequent or repeated calls,I have been receiving numerous calls from Synchrony Bank from XXXX XXXX to XXXX XXXX. They are calling 20 plus times a day. I have repeatedly asked them to stop calling and have explained to them that I will pay them when I have money. I have asked for a cease and desist notification to be put on the account under my social security number and they stated that they will notate the account and then 5 minutes later they call back. I have advised them that the account should be notated and that they need to stop calling.,863efe9f-1452-431e-a5e7-f1756489ec69
2,Student loan,Federal student loan servicing,Dealing with my lender or servicer,Received bad information about my loan,"I have been on economic deferment since I left college and my student loans are through XXXX XXXX XXXX, XXXX and Salle Mae. All of my loans are currently in good standing except for Salle Mae. They sent my loans to a secondary company that I was never able to call or contact. Every number I was given from the collection agencies and from XXXX to reach the secondary agency just lead to a phone number that listed offers for XXXX gift cards and XXXX services and then you could never get to an agent to try and find out why the debt had been sent there. I filed a dispute with a lawyer through XXXX services, but was never contacted again about if anything had come from it. Now there are XXXX loans in collections and they took my last years tax returns. Any attempt to file anything with Salle Mae was always difficult and I filed my documentation through FAFSA. My biggest concern is that SallyeMae had a secondary company it used that was unreachable and then they just had XXXX send my loans to a collection agency although I had filed forms with FAFSA for economic deferment and now they have gone so far the companies will not pull them back and the collection agencies do n't recognize economic deferment. I get countless calls a day from autodialers that do not leave voice mails and my tax returns are going to be taken again next year. Prior to this they in 2008 I was attending school and they were calling me telling me I had to repay my student loans although I was in school and was exempt.",bf9b849b-c1cf-44b8-9f40-4a44a7c7cf86
3,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Improper use of your report,Credit inquiries on your report that you don't recognize,XXXX and Equifax will not remove unauthorized inquiries even when fraud alert was placed on report.,2d6a58b3-efc3-4ea8-9b2c-a3fc93364db0
4,Debt collection,Credit card,False statements or representation,Attempted to collect wrong amount,"XXXX XXXX XXXX XXXX XXXX ( serviced by XXXX XXXX XXXX ) purchased a debt that originated with Citibank South Dakota. Citibank South Dakota was merged with Citibank NA as of XXXX/XXXX/XXXX - coincidently the same time period that the debt shows as being charged off ( XXXX/XXXX/XXXX ). My XXXX report dated XXXX/XXXX/XXXX shows the account being reported with XXXX will fall off on XXXX and XXXX XXXX it shows it will be on my report until XXXX. This debt had a 1st delinquency date in early XXXX, with a charge off date with Citibank South Dakota much earlier than when reported by Citibank NA. I disagreed with the balance that XXXX is reporting, in addition to the dates provided on the reports. I have disputed this debt with all three credit bureaus XXXX, XXXX and XXXX XXXX, not once but twice, with the credit bureaus responding that the debt has been researched and no changes to the report. Then on XXXX/XXXX/XXXX, a new merged credit report was pulled showing all XXXX credit bureaus - the XXXX debt is now reflecting a last delinquent date of XXXX, a last payment date of XXXX, and a last activity as of XXXX. This is completely different information that was being previously reported and/or what is reflected on their letter they provided to me. It appears that they are trying to re-set the date - or at the very most change the time period of how long it will continue to be reported on my credit bureaus. I have attached copies of all documents mentioned in this complaint. I would like XXXX to report correct information to all XXXX credit bureaus, or remove the debt from immediately from all XXXX reports. Thank you for your assistance",58e5c75c-98b3-4876-9f0a-16cf7236e161


### Let's select the input and output mappings for training

The mapping describes which columns in the upload file are used as the sample input and which columns the model should return as the recommended output.

In [59]:
input_cols = ['Description']
output_cols = ['TicketID', 'Description']
all_cols = input_cols + output_cols
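
Before uploading, it can help to confirm that every mapped column exists and contains no missing values. The following is a minimal sketch on a small stand-in DataFrame. The helper name `check_mapping` is hypothetical and not part of the STI client, and whether the service tolerates empty values is not stated in this notebook.

```python
import pandas as pd


def check_mapping(frame, input_cols, output_cols):
    """Return a dict of problems found in the mapped columns (empty if none)."""
    problems = {}
    # Preserve order while removing duplicates (Description is in both lists).
    mapped = list(dict.fromkeys(input_cols + output_cols))
    missing = [col for col in mapped if col not in frame.columns]
    if missing:
        problems["missing_columns"] = missing
    for col in mapped:
        if col in frame.columns:
            null_count = int(frame[col].isna().sum())
            if null_count:
                problems.setdefault("null_counts", {})[col] = null_count
    return problems


# Stand-in data with the same column names as the complaint dataset.
sample = pd.DataFrame(
    {
        "Description": ["Card was charged twice", None],
        "TicketID": ["t-1", "t-2"],
    }
)
report = check_mapping(sample, ["Description"], ["TicketID", "Description"])
print(report)
```

With the notebook's own data, the same call would be `check_mapping(df, input_cols, output_cols)`.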

# STI REST Endpoints

The STI service can be accessed and controlled through REST endpoints.
Documentation is available at: https://help.sap.com/viewer/product/SERVICE_TICKET_INTELLIGENCE

## Subscription and Authentication

Now we are ready to train a model using the Service Ticket Intelligence API. This requires a valid subscription to the STI API.

Note: Update the values for `service url`, `uaa url`, `client id` and `client secret` in the config file `sti_config.ini`, which is located one directory above this notebook. These values are available in the `service_keys` of your STI instance in the Cloud Foundry cockpit.
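
A quick way to catch a half-filled config file is to check for blank values before connecting. The sketch below uses the standard-library `configparser` on an in-memory sample. The section name `sti` and the key names are assumptions made for illustration; the real layout of `sti_config.ini` may differ.

```python
import configparser

# Hypothetical layout: the real section and key names in sti_config.ini may differ.
SAMPLE_CONFIG = """
[sti]
service_url = https://example.invalid/sti
uaa_url = https://example.invalid/uaa
client_id = my-client-id
client_secret =
"""

REQUIRED_KEYS = ("service_url", "uaa_url", "client_id", "client_secret")


def find_empty_keys(config_text, section="sti"):
    """List required keys that are absent or blank in the given section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


print(find_empty_keys(SAMPLE_CONFIG))
```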

In [60]:
import configparser
from pathlib import Path
import sys

sys.path.append("..")
from sti_functions import STIFunctions, get_connection_object

In [61]:
STI_BASE_DIR = Path.cwd().parent
config_file_path = STI_BASE_DIR / 'sti_config.ini'

connection = get_connection_object(config_file=config_file_path)
sti = STIFunctions(connection)

## List models

Now let's call the list models function to view all the models in this account.

In [62]:
sti.list_models()

Returning token from cache
Response time: 1186.0159999999998 ms


{'results': [{'business_object': 'ticket',
   'language': 'en',
   'model_id': ['aada667d0e384a63bf0cad43d88f3b81',
    'f1cead0633a44e32a449cf285b542adf',
    '2eb7e7c1b6684c238083f8a93a140138',
    '522e3730af304c43a25d18b8905d89f9',
    '63689c91793145a780e482682aa525f2',
    'a3177cf70fec4705bfb6839bc0b30636',
    '416c46992bc74920832106eda95255a0',
    '8d1145c3d94a44ceb3d6cae8bdee14a3',
    '8b5a8ef24966450bb8b4c08d6622ef94',
    '3616e94d85d54fd890d2f1bd03ac61fe',
    'fe5b0ba4a99649668b0e9ba36e466044',
    'cc078a539d6a433a92f0ac0a2fb445d2',
    '8d62dbfd0e3043e5a16928190c8f965f',
    'e16b7677bb3d4e339716317854c74864'],
   'type': 'classification'},
  {'business_object': 'ticket',
   'language': 'en',
   'model_id': ['41932214b9a74aa3949824cc527a4fb2',
    '3c086206ddbb4a6081c171ac5ab447d3',
    'e0477c0ec4844fdab0848547e3079b28'],
   'type': 'recommendation'},
  {'business_object': 'ticket',
   'language': 'pt',
   'model_id': ['42cf1bf4074f461cba7831fa19c364c7'],
   'type': 

### Let's check if we need to delete any unused models
Based on the model list above, ensure that the number of models does not exceed 20. If it does, delete some unused models.

In [63]:
# sti.delete_model("b3e0da989393450ba4b66e637722034a")
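
Counting the IDs by eye is error-prone because the list response groups them by scenario. The sketch below totals the `model_id` entries across all groups, using an abbreviated stand-in shaped like the response above. The helper name `count_models` is hypothetical.

```python
MODEL_LIMIT = 20  # limit stated in this notebook


def count_models(list_response):
    """Count model IDs across every group in a list-models response."""
    return sum(
        len(group.get("model_id", []))
        for group in list_response.get("results", [])
    )


# Abbreviated stand-in shaped like the response above (IDs shortened).
sample_response = {
    "results": [
        {"business_object": "ticket", "language": "en",
         "model_id": ["aada", "f1ce", "2eb7"], "type": "classification"},
        {"business_object": "ticket", "language": "en",
         "model_id": ["4193", "3c08"], "type": "recommendation"},
    ]
}
total = count_models(sample_response)
print(f"{total} models, {MODEL_LIMIT - total} slots left")
```

In the notebook, `count_models(sti.list_models())` would apply the same count to the live response.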

## File upload

This process takes a few minutes to complete, depending on the file size. If the file upload is successful, the response contains a model ID, a UUID that we can use as a reference to the uploaded training file.

In [64]:
payload = {
    "scenario": {
        "desc": "Complaint data for recommendation",
        "type": "recommendation",
        "language": "en",
        "business_object": "ticket",
    },
    "mapping": {
        "input": input_cols,
        "output": output_cols
    },
    "training": {
        "file": "{}".format(
            base64.b64encode(df.to_csv(index=False).encode("utf-8")).decode("utf-8")
        )
    },
}
response = sti.file_upload(payload)
payload = {}
our_model_id = response.get("model_id")
response

Returning token from cache
Response time: 91987.321 ms


{'business_object': 'ticket',
 'desc': 'Complaint data for recommendation',
 'disabled_category_class': [],
 'enabled_category_class': [],
 'extracted_input': [{'field': 'Description', 'type': 'str'}],
 'extracted_output': [{'field': 'TicketID'}, {'field': 'Description'}],
 'file_upload_timestamp': '2020-06-09 08:15:37 UTC',
 'incremental': False,
 'incremental_possible': True,
 'language': 'en',
 'model_id': '96fc1022e1094bc696bdc19b6ecac97a',
 'model_status': 'NEW',
 'new_category_class': [],
 'records_count': 9833,
 'status': 0,
 'status_message': 'ok',
 'tabulations_count': 0,
 'type': 'recommendation'}
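
The `training.file` field in the payload above carries the whole CSV as base64 text. The following minimal sketch runs the same encoding chain on a tiny stand-in DataFrame and then reverses it, to show that the data survives the round trip. The sample rows are invented for illustration.

```python
import base64
import io

import pandas as pd

sample = pd.DataFrame(
    {"TicketID": ["t-1", "t-2"], "Description": ["Card charged twice", "Calls, daily"]}
)

# Encode: DataFrame -> CSV text -> UTF-8 bytes -> base64 text.
encoded = base64.b64encode(sample.to_csv(index=False).encode("utf-8")).decode("utf-8")

# Decode: reverse each step to confirm nothing was lost.
decoded_csv = base64.b64decode(encoded).decode("utf-8")
restored = pd.read_csv(io.StringIO(decoded_csv))

print(restored.to_dict("list") == sample.to_dict("list"))
print(f"CSV bytes: {len(decoded_csv.encode('utf-8'))}, base64 chars: {len(encoded)}")
```

Base64 output is roughly one third larger than the raw bytes, which adds to the upload size for large files.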

## Start training on uploaded file

Take the model ID from the file upload response and pass it when starting the model training.

In [65]:
# our_model_id = "e44816c7732944288d85b890373eb5f4"
sti.start_model_training(model_id=our_model_id)

Getting new token from https://sti-test.authentication.sap.hana.ondemand.com/oauth/token
New token expires at 2020-06-09 19:08:19
Response time: 1979.402 ms


{'language': 'en',
 'model_id': '96fc1022e1094bc696bdc19b6ecac97a',
 'status': 0,
 'status_message': 'ok'}

## Wait for training to succeed

After starting the model training, request the model status and check whether it is `READY`.

The model status transitions from `NEW` to `PENDING_TRAINING` once training is submitted, then to `IN_TRAINING`, and finally to `READY` when training succeeds.

In [68]:
status = sti.get_model_status(model_id=our_model_id)
print("Model status: {}".format(status.get('model_status')))

Returning token from cache
Response time: 1569.389 ms
Model status: READY


Wait for the model status to be `READY` before proceeding to the next step. This can take 10 to 20 minutes from the time training is submitted. Re-run the cell above to get the latest model status.
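
Instead of re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below is one possible approach, not part of the STI client: `wait_for_status` is a hypothetical helper that takes any zero-argument callable returning the current status. It is demonstrated with a stand-in that replays the transitions described above, and it does not handle failure states because this notebook does not list any.

```python
import time


def wait_for_status(get_status, target, poll_seconds=30.0, timeout_seconds=1800.0,
                    sleep=time.sleep):
    """Poll get_status() until it returns target; raise TimeoutError otherwise."""
    waited = 0.0
    while True:
        current = get_status()
        if current == target:
            return current
        if waited >= timeout_seconds:
            raise TimeoutError(f"status is {current!r}, expected {target!r}")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for the service: replays the transitions described above.
fake_states = iter(["PENDING_TRAINING", "IN_TRAINING", "READY"])
result = wait_for_status(lambda: next(fake_states), "READY", sleep=lambda _: None)
print(result)
```

With the notebook's client, `get_status` could be `lambda: sti.get_model_status(model_id=our_model_id).get("model_status")`. The same helper would also cover the later wait for `ACTIVE`.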

## Model recall rate

Because this is an unsupervised model, we retrieve the recall rate once training has completed and the model status is `READY`.

In [69]:
# our_model_id = "cc078a539d6a433a92f0ac0a2fb445d2"
status = sti.get_model_status(model_id=our_model_id)
status

Returning token from cache
Response time: 1079.6090000000002 ms


{'Recall': [{'language': 'en', 'recall_rate': 100.0},
  {'language': 'de', 'recall_rate': 100.0},
  {'language': 'pl', 'recall_rate': 100.0}],
 'Recall@10': 100.0,
 'business_object': 'ticket',
 'desc': 'Complaint data for recommendation',
 'disabled_category_class': [],
 'enabled_category_class': [],
 'extracted_input': [{'field': 'Description', 'type': 'str'}],
 'extracted_output': [{'field': 'TicketID'}, {'field': 'Description'}],
 'file_upload_timestamp': '2020-06-09 08:15:37 UTC',
 'incremental': False,
 'incremental_possible': True,
 'language': 'en',
 'language_mix': ['en', 'de', 'pl'],
 'model_id': '96fc1022e1094bc696bdc19b6ecac97a',
 'model_status': 'READY',
 'new_category_class': [],
 'records_count': 9833,
 'status': 0,
 'status_message': 'ok',
 'tabulations_count': 0,
 'training_finish_timestamp': '2020-06-09 10:42:28 UTC',
 'training_start_timestamp': '2020-06-09 10:40:35 UTC',
 'training_submit_timestamp': '2020-06-09 10:38:20 UTC',
 'type': 'recommendation'}
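
The response above reports a `Recall@10` value. This notebook does not describe how the service computes it, so the sketch below only illustrates a common definition of recall at k: the percentage of queries whose expected item appears among the first k results. The function name and the toy data are assumptions for illustration.

```python
def recall_at_k(ranked_results, expected_ids, k=10):
    """Percentage of queries whose expected ID appears in the first k results."""
    if not expected_ids:
        raise ValueError("expected_ids must not be empty")
    hits = sum(
        1
        for ranked, expected in zip(ranked_results, expected_ids)
        if expected in ranked[:k]
    )
    return 100.0 * hits / len(expected_ids)


# Toy data: three queries, each with a ranked list of returned ticket IDs.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
expected = ["a", "f", "z"]
print(recall_at_k(ranked, expected, k=3))
print(recall_at_k(ranked, expected, k=1))
```

Under this definition, a larger k can only keep the value the same or raise it, since more results are considered per query.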

## Activate the model

Once you are satisfied with the results, the model needs to be activated before inference can be run on it.

In [70]:
sti.activate_model(model_id=our_model_id)

Returning token from cache
Response time: 1163.089 ms


{'model_id': '96fc1022e1094bc696bdc19b6ecac97a',
 'status': 0,
 'status_message': 'ok'}

## Wait for activation

Wait until the model status transitions from `READY` to `LOADING` to `ACTIVE`.

In [71]:
status = sti.get_model_status(model_id=our_model_id)
print("Model status: {}".format(status.get('model_status')))

Returning token from cache
Response time: 1078.79 ms
Model status: ACTIVE


It can take 10 to 15 minutes for the model status to transition from `READY` -> `LOADING` -> `ACTIVE`.

## Let's send an inference request

Let's pick one record from the training dataset.

In [72]:
df.iloc[100]

Product        Student loan

Reword the question and send the inference request. `field` should match the name of the input column. The response should contain the closest questions, one of which might be our original question.

In [73]:
inference_payload = {
    "business_object": "ticket",
    "messages": [],
    "options": {
        "recommendation_top_k": 2
    }
}
inference_payload['messages'].append(
    {
        'id': 1,
        'contents': [
            {
                'field': 'Description',
                'value': 'Accuracy of student loans'
            }
        ]
    }
)
inference_response = sti.recommend(data_payload=inference_payload)
inference_response

Returning token from cache
Response time: 3184.2980000000002 ms


{'results': [{'detected_language': 'en',
   'id': 1,
   'recommendation': [{'score': 0.21,
     'solutions': [{'field': 'TicketID',
       'value': '142d369b-f6c2-4f2a-b218-f780796901a8'},
      {'field': 'Description',
       'value': 'I can not understand why I have to pay American Education Services {$17000.00} for  XXXX  {$3100.00} private student loans. I pay {$81.00} a month, and the balance keeps increasing. I am on  XXXX  and just received a notice from AES that the interest rate has changed, therefore the payment willl increase from {$81.00} per month to {$83.00} and that when the lo an ( s ) is  paid in full, I would have paid almost {$17000.00}. Is this a payday loan?'}]},
    {'score': 0.1,
     'solutions': [{'field': 'TicketID',
       'value': '836b1731-249a-4c7e-a3b5-07347770f9cf'},
      {'field': 'Description',
       'value': 'I received my first student loan more than 30 years ago, and my last disbursement was on XX/XX/1991. I have been paying on them off and on sin
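
The nested response is easier to inspect once each recommendation is flattened into a single record. The sketch below does this for an abbreviated stand-in shaped like the response above, with shortened values. The helper name `flatten_recommendations` is hypothetical and not part of the STI client.

```python
def flatten_recommendations(response):
    """Turn a recommend response into one flat dict per recommended ticket."""
    rows = []
    for result in response.get("results", []):
        for rec in result.get("recommendation", []):
            row = {"message_id": result.get("id"), "score": rec.get("score")}
            # Each solution is a {'field': ..., 'value': ...} pair.
            for solution in rec.get("solutions", []):
                row[solution["field"]] = solution["value"]
            rows.append(row)
    return rows


# Abbreviated stand-in shaped like the response above (values shortened).
sample_response = {
    "results": [
        {
            "detected_language": "en",
            "id": 1,
            "recommendation": [
                {"score": 0.21, "solutions": [
                    {"field": "TicketID", "value": "142d369b"},
                    {"field": "Description", "value": "I can not understand why..."},
                ]},
                {"score": 0.1, "solutions": [
                    {"field": "TicketID", "value": "836b1731"},
                    {"field": "Description", "value": "I received my first student loan..."},
                ]},
            ],
        }
    ]
}
rows = flatten_recommendations(sample_response)
for row in rows:
    print(row["message_id"], row["score"], row["TicketID"])
```

In the notebook, `pd.DataFrame(flatten_recommendations(inference_response))` would show the results as a table with one row per recommended ticket.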

## Deactivate model

We can deactivate any active models here.

In [20]:
# sti.deactivate_model(model_id="")

## Delete model

We can delete any unused models here.

In [21]:
# sti.delete_model(model_id="")