# Exploring the benefits of Lead Scoring
This notebook explores the benefits of lead scoring by comparing the conversion rate and return on investment of calling 20% of the prospects on a contact list. Two approaches are compared:
* Unscored: Take a random selection of 20% of the contact list
* Scored: Score each prospect using a simple machine learning model and call the top 20%

The contact list is taken from a real-world banking example. The [data](https://archive.ics.uci.edu/ml/datasets/Bank+Marketing) contains 41k CRM records from the direct marketing campaigns of a Portuguese bank, collected between May 2008 and November 2010.

## Measuring the benefits of lead scoring
To measure the benefits of lead scoring, we define one function to calculate the conversion rate (CVR)

$CVR = \frac{sales}{calls}$

and another to calculate the return on investment (ROI)

$ROI = \frac{profit - cost}{cost}$

where $profit = unitProfit \cdot sales$ and $cost = unitCost \cdot calls$.

We have assumed a unit cost of \$5 per call and a unit profit of \$45 per sale. These are very rough estimates that have a big effect on ROI and need to be refined further.
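Substituting the definitions of $profit$ and $cost$ into the ROI formula shows that, with these unit values, ROI depends only on CVR:

$$
ROI = \frac{unitProfit \cdot sales - unitCost \cdot calls}{unitCost \cdot calls}
    = \frac{unitProfit}{unitCost} \cdot \frac{sales}{calls} - 1
    = \frac{unitProfit}{unitCost} \cdot CVR - 1
    = 9 \cdot CVR - 1
$$

So calling breaks even ($ROI = 0$) when $CVR = unitCost / unitProfit = 5/45 \approx 11.1\%$, and every extra percentage point of CVR adds 9 percentage points of ROI.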

In [1]:
def calc_call_roi(contactList, leadScore, percentToCall, cost=5.00, profit=45.00):
    # ROI = (unit profit * sales - unit cost * calls) / (unit cost * calls)
    sales, calls = calc_calls(contactList, leadScore, percentToCall)
    return (sales*profit - calls*cost) / float(calls*cost)

def calc_call_cvr(contactList, leadScore, percentToCall):
    # CVR = sales / calls
    sales, calls = calc_calls(contactList, leadScore, percentToCall)
    return sales / float(calls)

def calc_calls(contactList, leadScore, percentToCall):
    # Number of prospects we can afford to call
    calls = int(len(contactList)*percentToCall)

    # Attach the scores to a new SFrame holding only the outcome column 'y',
    # so the caller's SFrame is not modified between calls
    scored = contactList[['y']].add_column(leadScore, name='lead_score')

    # Call the prospects with the highest lead scores and count the sales
    callList = scored.topk('lead_score', k=calls)
    sales = len(callList[callList['y']=='yes'])

    return sales, calls
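For readers without GraphLab, the following is a minimal pure-Python sketch of what `calc_calls` does, assuming the outcomes and lead scores are plain, equal-length Python lists. The function name `top_k_sales` and the ten-prospect toy data are hypothetical, chosen only for illustration.

```python
def top_k_sales(outcomes, scores, percent_to_call):
    """Hypothetical pure-Python analogue of calc_calls.

    outcomes: list of 'yes'/'no' labels, one per prospect
    scores:   list of lead scores, parallel to outcomes
    Returns (sales, calls) for calling the top percent_to_call of prospects.
    """
    if len(outcomes) != len(scores):
        raise ValueError("outcomes and scores must have the same length")
    # Same truncation as calc_calls: int() rounds the call budget down
    calls = int(len(outcomes) * percent_to_call)
    # Rank prospect indices from highest to lowest score and keep the first `calls`
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    called = ranked[:calls]
    # A sale is a called prospect whose outcome is 'yes'
    sales = sum(1 for i in called if outcomes[i] == 'yes')
    return sales, calls


# Toy contact list: 10 prospects, 3 of whom buy (30% base rate)
outcomes = ['no', 'yes', 'no', 'no', 'yes', 'no', 'no', 'no', 'no', 'yes']
scores = [0.10, 0.90, 0.20, 0.05, 0.70, 0.80, 0.15, 0.25, 0.12, 0.40]

sales, calls = top_k_sales(outcomes, scores, 0.2)
print('sales = {0}, calls = {1}, CVR = {2:.0%}'.format(sales, calls, sales / float(calls)))
# sales = 1, calls = 2, CVR = 50%
```

With a 30% base rate in the toy data, calling the top two scored prospects yields a 50% CVR: the better the scores rank the `'yes'` rows near the top, the more sales fall inside the first `k` calls. Ties are broken by list order in this sketch, which may differ from how GraphLab's `topk` handles them.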

## Load Data

In [2]:
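import graphlab as gl

# The UCI file is semicolon-delimited
bank = gl.SFrame.read_csv('Data/bank-additional/bank-additional-full.csv', delimiter=';', verbose=False)
# 80/20 train/validation split; no seed is set, so the numbers below vary from run to run
train, validate = bank.random_split(0.8)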


For the first approach, we call a random 20% of the prospects. Note that ROI varies from about -10% to +9% across the four trials below, depending on the luck of the draw!

In [3]:
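import random

# Four trials: giving every prospect a random lead score and calling the top 20%
# is equivalent to calling a random 20% of the contact list
for trial in range(4):
    randLeadScores = gl.SArray([random.random() for _ in range(len(validate))])
    initROI = calc_call_roi(validate, randLeadScores, 0.2)
    initCVR = calc_call_cvr(validate, randLeadScores, 0.2)
    print 'Call random 20%: ROI = {0:.2%}, CVR =  {1:.2%}'.format(initROI, initCVR)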

Call random 20%: ROI = 8.81%, CVR =  12.09%
Call random 20%: ROI = -10.33%, CVR =  9.96%
Call random 20%: ROI = -3.77%, CVR =  10.69%
Call random 20%: ROI = 3.89%, CVR =  11.54%
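The swings in the random baseline can be estimated analytically. The sketch below treats a random call list as sampling without replacement (a hypergeometric draw) and computes the standard deviation of its CVR. The validation-set size of about 8,200 rows (20% of the ~41k records) and the base conversion rate of 11% (in line with the CVRs above) are assumptions, and `random_call_cvr_sd` is a hypothetical helper.

```python
import math


def random_call_cvr_sd(base_rate, calls, population):
    """Hypothetical helper: standard deviation of the CVR of a random call list.

    Drawing `calls` prospects without replacement from `population` prospects
    with conversion rate `base_rate` is a hypergeometric sample, so the
    binomial variance is scaled by the finite population correction.
    """
    fpc = (population - calls) / float(population - 1)
    return math.sqrt(base_rate * (1 - base_rate) / calls * fpc)


# Assumed values: ~8,200 validation rows, 20% of them called, ~11% base rate
population = 8200
calls = int(population * 0.2)
base_rate = 0.11
sd = random_call_cvr_sd(base_rate, calls, population)

# ROI = (profit*sales - cost*calls) / (cost*calls) = (profit/cost)*CVR - 1,
# so with a $5 cost and $45 profit the ROI spread is 9 times the CVR spread
unit_cost, unit_profit = 5.0, 45.0
expected_roi = unit_profit / unit_cost * base_rate - 1
roi_sd = unit_profit / unit_cost * sd
print('CVR sd = {0:.2%}, expected ROI = {1:.2%}, ROI sd = {2:.2%}'.format(sd, expected_roi, roi_sd))
# CVR sd = 0.69%, expected ROI = -1.00%, ROI sd = 6.22%
```

Under these assumptions the expected ROI of random calling is close to break-even, with a standard deviation of roughly 6 percentage points, so ROI values between about -10% and +9% are what chance alone would produce.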


## Create a machine learning model to score leads

In [4]:
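# 'y' is the target. 'duration' (length of the last call) is only known after the
# call has been made and strongly determines 'y', so it is excluded to avoid leakage
features = list(set(train.column_names()) - set(['duration', 'y']))

toolkit_model = gl.classifier.boosted_trees_classifier.create(train, features=features, target='y', verbose=False,
                                                              early_stopping_rounds=10, max_iterations=500)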
results = toolkit_model.evaluate(validate)
print "accuracy: %g, precision: %g, recall: %g" % (results['accuracy'], results['precision'], results['recall'])

accuracy: 0.895907, precision: 0.668639, recall: 0.23275


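An accuracy of ~90% looks impressive, but it is a weak signal here: only about 11% of contacts buy (roughly the CVR of the random call lists above), so a model that always predicted "no" would already be about 89% accurate. Precision is 67% (two thirds of the contacts the model flags as buyers actually convert) and recall is 23% (the model finds fewer than a quarter of the actual buyers). For lead scoring, however, the hard yes/no predictions matter less than how well the predicted probabilities rank prospects.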
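To see why accuracy alone is misleading on an imbalanced contact list, this sketch computes accuracy, precision and recall with their standard definitions, alongside the accuracy of always predicting the majority class. The helper `classification_summary` and the toy labels are hypothetical and not part of GraphLab.

```python
def classification_summary(actual, predicted, positive='yes'):
    """Hypothetical helper: accuracy, precision and recall (standard definitions)
    plus the accuracy of a model that always predicts the majority class."""
    pairs = list(zip(actual, predicted))
    n = len(pairs)
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    positives = tp + fn  # actual buyers
    return {
        'accuracy': correct / float(n),
        # Share of predicted buyers who actually buy
        'precision': tp / float(tp + fp) if tp + fp else 0.0,
        # Share of actual buyers the model finds
        'recall': tp / float(positives) if positives else 0.0,
        # Accuracy earned for free by always predicting the more common class
        'majority_baseline': max(positives, n - positives) / float(n),
    }


# Toy list: 10 contacts, 2 buyers; the model flags 2 contacts and gets 1 right
actual = ['no'] * 8 + ['yes', 'yes']
predicted = ['no'] * 7 + ['yes'] + ['yes', 'no']
summary = classification_summary(actual, predicted)
print('accuracy = {0:.0%}, baseline = {1:.0%}, precision = {2:.0%}, recall = {3:.0%}'.format(
    summary['accuracy'], summary['majority_baseline'], summary['precision'], summary['recall']))
# accuracy = 80%, baseline = 80%, precision = 50%, recall = 50%
```

In the toy example the model flags two contacts, gets one right and misses one actual buyer, so its 80% accuracy is exactly what always answering "no" would achieve.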

In [5]:
# Use the model's predicted probability of 'yes' as each prospect's lead score
toolkitLeadScore = toolkit_model.predict(validate, output_type='probability')
toolkitROI = calc_call_roi(validate, toolkitLeadScore, 0.2)
toolkitCVR = calc_call_cvr(validate, toolkitLeadScore, 0.2)

print 'Call top 20%: ROI = {0:.2%}, CVR =  {1:.2%}'.format(toolkitROI, toolkitCVR)

Call top 20%: ROI = 237.91%, CVR =  37.55%


Even so, the conversion rate on the top 20% (37.6%) is more than three times that of calling random contacts (10% to 12%), and ROI rises from between -10% and +9% to 238%. So even a simple model is well worth implementing.