# Improve customer communication using generative AI and DataRobot

Author: abdul.jilani@datarobot.com

## Summary

A crucial part of the machine learning life cycle is the effective consumption of prediction results by end users. A good machine learning model provides not only the prediction, but also auxiliary information such as prediction explanations and the prediction threshold. This additional information is crucial for interpreting predictions and recommending subsequent actions to the user consuming them. However, this information is technical in nature, and an end user who is unfamiliar with it might not be able to use it to its full potential.

This notebook provides an example of how generative AI models like GPT-3 can be used to augment predictions and produce customer-friendly responses at the level of a subject matter expert. The example shows how a generative AI model can communicate positively about an adverse event predicted by the model, such as a loan rejection. The generative AI model provides expert advice tailored to each individual loan applicant based on the <a href='https://docs.datarobot.com/en/docs/modeling/analyze-models/understand/pred-explain/predex-overview.html'>prediction explanations</a> provided by DataRobot.

Positive and engaging customer communication is a key factor in customer success for organizations, and DataRobot combined with a large language model (LLM) can provide highly tailored, expert-level customer communication.

- The dataset used in this notebook is the Lending Club dataset. This dataset can be used to build machine learning models that ingest various features of a loan application and infer if the applicant will default on the loan if approved. 
- This notebook uses the prediction explanations from the loan default model together with generative AI to provide positive, domain-expert-level responses to loan applicants whose loan applications have been rejected by the model.
- The notebook assumes data is uploaded and available in DataRobot's AI Catalog.
- The notebook assumes that you have an API key for OpenAI systems. This method can be used for generative text AI models similar to GPT-3.

## Setup

### Import libraries

In [1]:
import pandas as pd
import datarobot as dr
import yaml

### Configure connections to DataRobot and OpenAI

In [2]:
with open("./settings.yaml", 'r') as stream:
    config = yaml.safe_load(stream)
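
The cells that follow read three keys from `settings.yaml`: `endpoint`, `token`, and `openai_key`. A small check like the one below can fail early with a clear message if any of them is missing. This is only a sketch: the helper name `validate_config` and the placeholder values are illustrative and are not part of the original notebook or of either library.

```python
REQUIRED_KEYS = ("endpoint", "token", "openai_key")  # keys read later in this notebook


def validate_config(config):
    """Return the config unchanged, or raise if a required key is missing or empty.

    `validate_config` is a hypothetical helper, not part of DataRobot or OpenAI.
    """
    if not isinstance(config, dict):
        raise TypeError("settings.yaml should contain a mapping of keys to values")
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError("settings.yaml is missing: " + ", ".join(missing))
    return config


# Placeholder values for illustration only; real credentials belong in settings.yaml
example = {
    "endpoint": "https://app.datarobot.com/api/v2",
    "token": "<datarobot-api-token>",
    "openai_key": "<openai-api-key>",
}
validate_config(example)
```

In the notebook, the same check could be applied to the `config` dictionary right after it is loaded.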

In [3]:
dr.Client(endpoint=config['endpoint'], 
          token=config['token'])

<datarobot.rest.RESTClientObject at 0x7fb8a3162520>

In [4]:
import openai
openai.api_key = config['openai_key']

## Retrieve a loan default project

A loan default model is already built and a deployment has been created for making predictions on the recommended DataRobot model. Please use this <a href='https://community.datarobot.com/t5/ai-accelerators/end-to-end-workflows-with-datarobot-and-aws/td-p/15985'>tutorial</a> to create the project and deployment. The dataset used for this project is available <a href='https://s3.amazonaws.com/datarobot_public_datasets/10K_Lending_Club_Loans.csv'>here</a>.

In [5]:
projectID = '64a63925cdbc0e8191b96bb0'
project = dr.Project.get(projectID)
project

Project(Project Big Query)

In [6]:
DEPLOYMENT_ID = '64a63eaaccaae422aae17bbf'
deployment = dr.Deployment.get(DEPLOYMENT_ID)
deployment

Deployment(is_bad Predictions)

In [7]:
df_inference_id = '64954b1d2ec1de1758d5bb07'
df_inference_dataset = dr.Dataset.get(df_inference_id)
df_inference_dataset

Dataset(name='gcp-demo-390701-Demo-10K_Lending_club-2023-06-23T07:34:52.472Z', id='64954b1d2ec1de1758d5bb07')

## Make predictions from inference data

The following cells illustrate the process of making predictions on inference data and filtering for the records predicted to default (is_bad_PREDICTION equal to 1). These are the loan rejections that have to be communicated to the applicants.

In [8]:
# Set the number of top prediction explanations to extract from DataRobot
# This will also be used as the number of bullet points in the prompt response from the LLM
n_explanations = 3

In [9]:
job = dr.BatchPredictionJob.score(
        deployment=DEPLOYMENT_ID,
        intake_settings={
            'type': 'dataset',
            'dataset': df_inference_dataset
        },
        output_settings={
            'type': 'localFile',
            'path': './prediction.csv',
        },
        max_explanations=n_explanations
    )

In [10]:
predictions = pd.read_csv('./prediction.csv')
predictions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 18 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   is_bad_1_PREDICTION                 10000 non-null  float64
 1   is_bad_0_PREDICTION                 10000 non-null  float64
 2   is_bad_PREDICTION                   10000 non-null  int64  
 3   THRESHOLD                           10000 non-null  float64
 4   POSITIVE_CLASS                      10000 non-null  int64  
 5   EXPLANATION_1_FEATURE_NAME          10000 non-null  object 
 6   EXPLANATION_1_STRENGTH              10000 non-null  float64
 7   EXPLANATION_1_ACTUAL_VALUE          9762 non-null   object 
 8   EXPLANATION_1_QUALITATIVE_STRENGTH  10000 non-null  object 
 9   EXPLANATION_2_FEATURE_NAME          10000 non-null  object 
 10  EXPLANATION_2_STRENGTH              10000 non-null  float64
 11  EXPLANATION_2_ACTUAL_VALUE          9680 n

In [11]:
rejections = predictions[predictions.is_bad_PREDICTION==1]
rejections.shape

(34, 18)
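
The `THRESHOLD` and `is_bad_1_PREDICTION` columns relate to the predicted label: a row is labeled 1 when the positive-class probability clears the threshold. The sketch below reproduces that relationship in plain Python, using probabilities from two of the sample rows shown in the Outcome section. The function name `label_from_probability` is illustrative, the third row is a made-up value, and treating a probability exactly equal to the threshold as positive is an assumption.

```python
def label_from_probability(positive_probability, threshold=0.5):
    """Return 1 (predicted default) when the probability reaches the threshold, else 0."""
    # Assumption: a probability equal to the threshold counts as positive
    return 1 if positive_probability >= threshold else 0


rows = [
    {"row_id": 3869, "is_bad_1_PREDICTION": 0.540126, "THRESHOLD": 0.5},
    {"row_id": 9918, "is_bad_1_PREDICTION": 0.502272, "THRESHOLD": 0.5},
    {"row_id": 0, "is_bad_1_PREDICTION": 0.31, "THRESHOLD": 0.5},  # illustrative value
]

# Keep only the rows that would be labeled as predicted defaults
rejected_ids = [
    row["row_id"]
    for row in rows
    if label_from_probability(row["is_bad_1_PREDICTION"], row["THRESHOLD"]) == 1
]
print(rejected_ids)  # [3869, 9918]
```

Row 9918 sits only slightly above the threshold, which is a reminder that some rejections are borderline cases.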

## Response generation

Once the negative outcome records are available, use generative AI models like GPT-3 to consume the prediction explanations and generate responses for communication. This demo uses OpenAI's ChatGPT (gpt-3.5-turbo), but the approach can be used with similar LLMs. The prompt structure and completion function are inspired by Andrew Ng's course on <a href='https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/'>Prompt Engineering</a>.

In [12]:
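# Token budget for the model; defined for reference and not used elsewhere in this notebook
max_token_size = 4097
def get_completion(prompt, model="gpt-3.5-turbo"):
    # Note: openai.ChatCompletion is the interface of openai package versions before 1.0
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # 0 keeps the output as deterministic as possible
    )
    return response.choices[0].message["content"]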
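
Calls to a hosted LLM can fail transiently, for example because of rate limits or timeouts. One common pattern is to retry with exponential backoff. The sketch below is generic Python and not part of the OpenAI library; `with_retries`, its parameters, and the stand-in `flaky_completion` function are hypothetical names used for illustration.

```python
import time


def with_retries(func, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Wait base_delay, then 2x, 4x, ... between attempts
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for get_completion that fails twice before succeeding
calls = {"count": 0}


def flaky_completion(prompt):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "reply to: " + prompt


delays = []  # record the requested delays instead of actually sleeping
result = with_retries(flaky_completion, "hello", sleep=delays.append)
print(result, delays)  # reply to: hello [1.0, 2.0]
```

In the notebook this could wrap the real call, for example `with_retries(get_completion, prompt)`. A production version would catch only the errors known to be transient rather than every exception.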

The function below takes the prediction explanations from DataRobot and applies domain knowledge to convert highly technical information into user communication. 

In [46]:
def provide_rejection_advice(sample_1,  n_explanations):
    # fillna returns a copy, so the caller's DataFrame is left untouched
    sample_1 = sample_1.fillna('not available')
    row = sample_1.iloc[0]
    # Build "<feature> is <value>, " for each of the top n_explanations
    explanation_string = ''.join(
        row['EXPLANATION_' + str(i) + '_FEATURE_NAME'] + ' is '
        + str(row['EXPLANATION_' + str(i) + '_ACTUAL_VALUE']) + ', '
        for i in range(1, n_explanations + 1)
    )
    # Replace raw feature names with customer-friendly descriptions
    explanation_string = explanation_string.replace('loan_amnt','loan amount')\
    .replace('emp_length', 'employment tenure')\
    .replace('inq_last_6mths', 'number of customer inquiries for loan in last 6 months')\
    .replace('emp_title', 'employee designation')
    
    prompt = 'You are a telephonic loan sales representative. Based on the model prediction of loan rejection for a customer due to the following reasons "'+\
    explanation_string+\
    '", please provide a positive sentiment reply to the customer with ' + str(n_explanations) \
    + ' of the most urgent steps to improve the chances of loan approval. Do not mention about any models or predictions in the response.'
    # Send the prompt to the LLM and return both the prompt and the reply
    response = get_completion(prompt)
    return prompt, response
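
The string replacement in `provide_rejection_advice` runs over the whole explanation string, so it would also rewrite a feature value that happened to contain a feature name. One variation is to map the feature names before joining them with their values. The sketch below does this on a plain dictionary holding the explanations of the row with index 9918 from the Outcome section. `FRIENDLY_NAMES` and `build_explanation_string` are illustrative names, the mapping covers only the four features renamed in the function above, and missing values are represented here as `None` or an empty string rather than pandas `NaN`.

```python
FRIENDLY_NAMES = {
    "loan_amnt": "loan amount",
    "emp_length": "employment tenure",
    "inq_last_6mths": "number of customer inquiries for loan in last 6 months",
    "emp_title": "employee designation",
}


def build_explanation_string(row, n_explanations):
    """Join the top explanations as '<feature> is <value>', separated by commas."""
    parts = []
    for i in range(1, n_explanations + 1):
        name = row[f"EXPLANATION_{i}_FEATURE_NAME"]
        value = row.get(f"EXPLANATION_{i}_ACTUAL_VALUE")
        if value is None or value == "":
            value = "not available"
        # Rename only the feature name; unknown names pass through unchanged
        parts.append(f"{FRIENDLY_NAMES.get(name, name)} is {value}")
    return ", ".join(parts)


row = {
    "EXPLANATION_1_FEATURE_NAME": "loan_amnt",
    "EXPLANATION_1_ACTUAL_VALUE": "1000",
    "EXPLANATION_2_FEATURE_NAME": "int_rate",
    "EXPLANATION_2_ACTUAL_VALUE": "0.1629",
    "EXPLANATION_3_FEATURE_NAME": "inq_last_6mths",
    "EXPLANATION_3_ACTUAL_VALUE": "3",
}
print(build_explanation_string(row, 3))
```

Unlike the notebook function, this version leaves no trailing comma, and passing a smaller `n_explanations` shortens the list of reasons as well as the number of requested steps.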

## Outcome

In the examples below, combining DataRobot's prediction explanations with LLMs like GPT-3 and GPT-4 turns technical model output into tailored, customer-friendly advice. This can also reduce the effort required from SMEs and domain experts in an organization and improve their productivity.

In [47]:
sample_1 = rejections.sample(1)
sample_1

Unnamed: 0,is_bad_1_PREDICTION,is_bad_0_PREDICTION,is_bad_PREDICTION,THRESHOLD,POSITIVE_CLASS,EXPLANATION_1_FEATURE_NAME,EXPLANATION_1_STRENGTH,EXPLANATION_1_ACTUAL_VALUE,EXPLANATION_1_QUALITATIVE_STRENGTH,EXPLANATION_2_FEATURE_NAME,EXPLANATION_2_STRENGTH,EXPLANATION_2_ACTUAL_VALUE,EXPLANATION_2_QUALITATIVE_STRENGTH,EXPLANATION_3_FEATURE_NAME,EXPLANATION_3_STRENGTH,EXPLANATION_3_ACTUAL_VALUE,EXPLANATION_3_QUALITATIVE_STRENGTH,DEPLOYMENT_APPROVAL_STATUS
3869,0.540126,0.459874,1,0.5,1,int_rate,1.300985,0.2248,+++,term,0.36881,60 months,++,sub_grade,0.272515,G2,++,APPROVED


In [48]:
# please replace n_explanations with a lower number if you want to reduce the amount of
# text in the response.
prompt, loan_rejection_advice = provide_rejection_advice(sample_1, n_explanations)
print(prompt)
print('=====================')
print(loan_rejection_advice)

You are a telephonic loan sales representative. Based on the model prediction of loan rejection for a customer due to the following reasons "int_rate is 0.2248, term is  60 months, sub_grade is G2, ", please provide a positive sentiment reply to the customer with 3 of the most urgent steps to improve the chances of loan approval. Do not mention about any models or predictions in the response.
Dear Customer,

Thank you for considering our loan services. We understand that you may have concerns about the loan approval process, and we appreciate the opportunity to address them. While we cannot guarantee approval, we can certainly provide you with some steps that may improve your chances:

1. Improve your credit score: Lenders often consider credit history as a crucial factor in loan approval. To enhance your chances, focus on paying your bills on time, reducing outstanding debts, and maintaining a low credit utilization ratio. This will demonstrate your financial responsibility and increa

In [17]:
sample_2 = rejections.sample(1)
sample_2

Unnamed: 0,is_bad_1_PREDICTION,is_bad_0_PREDICTION,is_bad_PREDICTION,THRESHOLD,POSITIVE_CLASS,EXPLANATION_1_FEATURE_NAME,EXPLANATION_1_STRENGTH,EXPLANATION_1_ACTUAL_VALUE,EXPLANATION_1_QUALITATIVE_STRENGTH,EXPLANATION_2_FEATURE_NAME,EXPLANATION_2_STRENGTH,EXPLANATION_2_ACTUAL_VALUE,EXPLANATION_2_QUALITATIVE_STRENGTH,EXPLANATION_3_FEATURE_NAME,EXPLANATION_3_STRENGTH,EXPLANATION_3_ACTUAL_VALUE,EXPLANATION_3_QUALITATIVE_STRENGTH,DEPLOYMENT_APPROVAL_STATUS
2105,0.543579,0.456421,1,0.5,1,int_rate,0.692471,0.1991,+++,emp_title,0.276402,,++,term,0.276129,60 months,++,APPROVED


In [49]:
prompt, loan_rejection_advice = provide_rejection_advice(sample_2, n_explanations)
print(prompt)
print('=====================')
print(loan_rejection_advice)

You are a telephonic loan sales representative. Based on the model prediction of loan rejection for a customer due to the following reasons "int_rate is 0.1991, employee designation is not available, term is  60 months, ", please provide a positive sentiment reply to the customer with 3 of the most urgent steps to improve the chances of loan approval. Do not mention about any models or predictions in the response.
Dear Customer,

Thank you for considering our loan services. We appreciate your interest in obtaining a loan from us. While we understand that your recent loan application was not approved, we would like to provide you with some steps that can help improve your chances of loan approval in the future.

1. Improve your credit score: Lenders often consider credit scores as an important factor in loan approval. Maintaining a good credit score by making timely payments, reducing outstanding debts, and avoiding new credit applications can significantly enhance your chances of loan 

In [39]:
sample_3 = rejections[rejections.index==9918].head()
sample_3

Unnamed: 0,is_bad_1_PREDICTION,is_bad_0_PREDICTION,is_bad_PREDICTION,THRESHOLD,POSITIVE_CLASS,EXPLANATION_1_FEATURE_NAME,EXPLANATION_1_STRENGTH,EXPLANATION_1_ACTUAL_VALUE,EXPLANATION_1_QUALITATIVE_STRENGTH,EXPLANATION_2_FEATURE_NAME,EXPLANATION_2_STRENGTH,EXPLANATION_2_ACTUAL_VALUE,EXPLANATION_2_QUALITATIVE_STRENGTH,EXPLANATION_3_FEATURE_NAME,EXPLANATION_3_STRENGTH,EXPLANATION_3_ACTUAL_VALUE,EXPLANATION_3_QUALITATIVE_STRENGTH,DEPLOYMENT_APPROVAL_STATUS
9918,0.502272,0.497728,1,0.5,1,loan_amnt,0.524048,1000,+++,int_rate,0.353008,0.1629,++,inq_last_6mths,0.262294,3,++,APPROVED


In [50]:
prompt, loan_rejection_advice = provide_rejection_advice(sample_3, n_explanations)
print(prompt)
print('=====================')
print(loan_rejection_advice)

You are a telephonic loan sales representative. Based on the model prediction of loan rejection for a customer due to the following reasons "loan amount is 1000, int_rate is 0.1629, number of customer inquiries for loan in last 6 months is 3, ", please provide a positive sentiment reply to the customer with 3 of the most urgent steps to improve the chances of loan approval. Do not mention about any models or predictions in the response.
Dear Customer,

Thank you for considering our loan services. We appreciate your interest in obtaining financial assistance. We understand that loan approval is important to you, and we are here to help you improve your chances. 

To increase the likelihood of loan approval, we recommend focusing on the following three steps:

1. Strengthen your credit history: Maintaining a good credit score is crucial for loan approval. We suggest reviewing your credit report and ensuring that all information is accurate. Paying bills on time, reducing credit card bala
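
Each prompt above asks the model not to mention models or predictions, but an instruction in a prompt is not a guarantee. A lightweight check before a reply is sent can flag responses for human review. The sketch below uses a simple word list; `FORBIDDEN_TERMS` and `needs_review` are hypothetical names, and the list itself is an assumption that would need tuning for real use.

```python
import re

# Assumed list of words that should not appear in a customer-facing reply
FORBIDDEN_TERMS = ("model", "models", "prediction", "predictions", "algorithm")
_PATTERN = re.compile(r"\b(" + "|".join(FORBIDDEN_TERMS) + r")\b", re.IGNORECASE)


def needs_review(response):
    """Return the sorted forbidden terms found in the response (empty list if none)."""
    return sorted({match.lower() for match in _PATTERN.findall(response)})


ok_reply = "Thank you for considering our loan services."
bad_reply = "Our Model made a prediction about your application."
print(needs_review(ok_reply))   # []
print(needs_review(bad_reply))  # ['model', 'prediction']
```

An empty list means the reply passed this particular check; it does not replace a review of tone or factual accuracy.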

## Conclusion
In this notebook, you saw how to use generative AI with DataRobot's prediction explanations to augment predictions and provide customer-friendly, subject-matter-expert-level communication.