# Market Campaign Prediction Notebook

## Summary
Market campaign prediction aims to predict the success of telemarketing.

## Description

### Use Case Description
A company, such as a bank, wants to predict the outcome of a marketing campaign. The bank collects customer demographic data, bank account information, and historical telemarketing activity records from various data sources. The task is to build a pipeline that automatically analyzes the bank marketing dataset to predict the success of telemarketing calls for selling long-term bank deposits. The aim is to provide market intelligence so that the bank can better target valuable customers and reduce marketing costs.

#### Use Case Data
The data used in this use case is the [BankMarket dataset](https://archive.ics.uci.edu/ml/datasets/Bank+Marketing), a publicly available dataset from the UCI Machine Learning Repository. The data contains 17 variables and 4521 rows.

We have shared the market data in the data folder. You can use this shared data to follow the steps in this template, or you can access the full dataset from the UCI website.

Each instance in the data set has 17 fields:

* 1 - age (numeric)
* 2 - job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student", "blue-collar","self-employed","retired","technician","services") 
* 3 - marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
* 4 - education (categorical: "unknown","secondary","primary","tertiary")
* 5 - default: has credit in default? (binary: "yes","no")
* 6 - balance: average yearly balance, in euros (numeric) 
* 7 - housing: has housing loan? (binary: "yes","no")
* 8 - loan: has personal loan? (binary: "yes","no")
* 9 - contact: contact communication type (categorical: "unknown","telephone","cellular") 
* 10 - day: last contact day of the month (numeric)
* 11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
* 12 - duration: last contact duration, in seconds (numeric)
* 13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
* 14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
* 15 - previous: number of contacts performed before this campaign and for this client (numeric)
* 16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")

Target variable:
* 17 - y: has the client subscribed to a term deposit? (binary: "yes","no")
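
As a quick check of the field types listed above, the following minimal sketch splits the 16 input fields into numeric and categorical columns. The two rows are copied from the reference frame used later in this notebook. The names `sample`, `numeric_cols`, and `categorical_cols` are ours and purely illustrative.

```python
import pandas as pd

# The 16 input fields, in the order given in the field list above
columns = ['age', 'job', 'marital', 'education', 'default', 'balance',
           'housing', 'loan', 'contact', 'day', 'month', 'duration',
           'campaign', 'pdays', 'previous', 'poutcome']

# Two rows copied from the reference frame defined later in this notebook
rows = [
    [30, 'admin.', 'divorced', 'unknown', 'yes', 1787, 'no', 'no',
     'telephone', 19, 'oct', 79, 1, -1, 0, 'unknown'],
    [33, 'blue-collar', 'married', 'secondary', 'no', 4789, 'yes', 'yes',
     'cellular', 11, 'may', 220, 1, 339, 4, 'success'],
]
sample = pd.DataFrame(rows, columns=columns)

# Numeric fields are used as-is; all other fields need encoding before scoring
numeric_cols = list(sample.select_dtypes(include='number').columns)
categorical_cols = list(sample.select_dtypes(exclude='number').columns)

print(numeric_cols)
print(categorical_cols)
```

With these rows, seven fields come out as numeric and nine as categorical, which matches the field list.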

### Market Campaign Operationalization

### Schema Generation

To deploy the model as a web service, we first need to define functions that generate the schema file for the service.

In [11]:
# This script generates the scoring and schema files
# necessary to operationalize the Market Campaign prediction sample
# using the init() and run() functions

from azureml.api.schema.dataTypes import DataTypes
from azureml.api.schema.sampleDefinition import SampleDefinition
from azureml.api.realtime.services import generate_schema
import pandas as pd

In [12]:
# Prepare the web service definition by authoring
# init() and run() functions. Test the functions
# before deploying the web service.

def init():
    from sklearn.externals import joblib

    # load the model file
    global model
    model = joblib.load('./code/marketcampaign/dt.pkl')

def run(input_df):
    import json
    # Append the request to the reference frame df1 so that every
    # categorical level is present when the columns are one-hot encoded
    df = pd.concat([df1, input_df], ignore_index=True)
    columns_to_encode = list(df.select_dtypes(include=['category','object']))
    for column_to_encode in columns_to_encode:
        dummies = pd.get_dummies(df[column_to_encode])
        # Prefix each dummy column with the name of its source column
        dummies.columns = [column_to_encode + '_' + col_name for col_name in dummies.columns]
        df = df.drop(column_to_encode, axis=1)
        df = df.join(dummies)
    pred = model.predict(df)
    # The request row follows the len(df1) reference rows
    return json.dumps(str(pred[len(df1)]))
print('executed')

executed


In [13]:
df1 = pd.DataFrame(data=[[30,'admin.','divorced','unknown','yes',1787,'no','no','telephone',19,'oct',79,1,-1,0,'unknown'],[33,'blue-collar','married','secondary','no',4789,'yes','yes','cellular',11,'may',220,1,339,4,'success'],[35,'entrepreneur','single','tertiary','no',1350,'yes','no','cellular',16,'apr',185,1,330,1,'failure'],[30,'housemaid','married','tertiary','no',1476,'yes','yes','unknown',3,'jun',199,4,-1,0,'unknown'],[59,'management','married','secondary','no',0,'yes','no','unknown',5,'jan',226,1,-1,0,'unknown'],[35,'retired','single','tertiary','no',747,'no','no','cellular',23,'feb',141,2,176,3,'failure'],[36,'self-employed','married','tertiary','no',307,'yes','no','cellular',14,'mar',341,1,330,2,'other'],[39,'services','married','secondary','no',147,'yes','no','cellular',6,'jul',151,2,-1,0,'unknown'],[41,'student','married','tertiary','no',221,'yes','no','unknown',14,'aug',57,2,-1,0,'unknown'],[43,'technician','married','primary','no',-88,'yes','yes','cellular',17,'sep',313,1,147,2,'failure'],[39,'unemployed','married','secondary','no',9374,'yes','no','unknown',20,'nov',273,1,-1,0,'unknown'],[43,'unknown','married','secondary','no',264,'yes','no','cellular',17,'dec',113,2,-1,0,'unknown']], columns=['age','job','marital','education','default','balance','housing','loan','contact','day','month','duration','campaign','pdays','previous','poutcome'])

df1.dtypes
df1

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome
0,30,admin.,divorced,unknown,yes,1787,no,no,telephone,19,oct,79,1,-1,0,unknown
1,33,blue-collar,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,success
2,35,entrepreneur,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure
3,30,housemaid,married,tertiary,no,1476,yes,yes,unknown,3,jun,199,4,-1,0,unknown
4,59,management,married,secondary,no,0,yes,no,unknown,5,jan,226,1,-1,0,unknown
5,35,retired,single,tertiary,no,747,no,no,cellular,23,feb,141,2,176,3,failure
6,36,self-employed,married,tertiary,no,307,yes,no,cellular,14,mar,341,1,330,2,other
7,39,services,married,secondary,no,147,yes,no,cellular,6,jul,151,2,-1,0,unknown
8,41,student,married,tertiary,no,221,yes,no,unknown,14,aug,57,2,-1,0,unknown
9,43,technician,married,primary,no,-88,yes,yes,cellular,17,sep,313,1,147,2,failure
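
The reference frame `df1` appears to serve one purpose: it contains every level of every categorical field, so appending a request to it before one-hot encoding yields the same set of dummy columns regardless of which levels the request happens to carry. The sketch below illustrates the effect on a reduced, hypothetical frame with a single categorical column; `reference`, `request`, and `encode` are illustrative names, and `pd.get_dummies(..., columns=...)` is used as a shorthand for the loop in `run()`.

```python
import pandas as pd

# Hypothetical, reduced reference frame: one row per level of `marital`
reference = pd.DataFrame({
    'age': [30, 33, 35],
    'marital': ['married', 'divorced', 'single'],
})

# A single scoring request carries only one level per categorical column
request = pd.DataFrame({'age': [41], 'marital': ['single']})

def encode(frame):
    """One-hot encode `marital`, prefixing each dummy with the column name."""
    return pd.get_dummies(frame, columns=['marital'], prefix_sep='_')

# Encoding the request alone produces a single dummy column
alone = encode(request)

# Encoding it together with the reference frame produces all three
combined = encode(pd.concat([reference, request], ignore_index=True))

# The request is the row that follows the reference rows
scored_row = combined.iloc[[len(reference)]]

print(list(alone.columns))
print(list(combined.columns))
```

Encoded on its own, the request has fewer columns than a model trained on the full set of levels would expect, which is why the request row is selected from the combined frame instead.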


In [14]:
df = pd.DataFrame([[30,'unemployed','married','primary','no',1787,'no','no','cellular',19,'oct',79,1,-1,0,'unknown']], columns=['age','job','marital','education','default','balance','housing','loan','contact','day','month','duration','campaign','pdays','previous','poutcome'])
df.dtypes
df

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome
0,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown


In [15]:
init()
input1 = pd.DataFrame([[30,'unemployed','married','primary','no',1787,'no','no','cellular',19,'oct',79,1,-1,0,'unknown']], columns=['age','job','marital','education','default','balance','housing','loan','contact','day','month','duration','campaign','pdays','previous','poutcome'])
input1.head()

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome
0,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown


In [16]:
run(input1)

'"0"'
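
Note that `run()` returns a JSON-encoded string rather than a number, which is why the output above carries two layers of quotes. A caller would have to decode it before use. The following is a minimal sketch of that decoding step; `decode_prediction` is a hypothetical helper that is not part of the notebook, and it assumes the model emits integer class labels, as the `"0"` above suggests.

```python
import json

def decode_prediction(payload):
    """Turn the JSON string returned by run() into an integer class label."""
    # json.loads strips the JSON quoting, int() converts the remaining string
    return int(json.loads(payload))

# Build a payload the same way run() does for a predicted label of 0
sample_payload = json.dumps(str(0))
label = decode_prediction(sample_payload)

print(sample_payload, label)
```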

In [17]:
inputs = {"input_df": SampleDefinition(DataTypes.PANDAS, df)}

# generate_schema writes the schema file (market_service_schema.json),
# which describes the inputs of run(), to the given file path.

generate_schema(run_func=run, inputs=inputs, filepath='market_service_schema.json')

{'input': {'input_df': {'internal': 'gANjYXp1cmVtbC5hcGkuc2NoZW1hLnBhbmRhc1V0aWwKUGFuZGFzU2NoZW1hCnEAKYFxAX1xAihYDAAAAGNvbHVtbl90eXBlc3EDXXEEKGNudW1weQpkdHlwZQpxBVgCAAAAaThxBksASwGHcQdScQgoSwNYAQAAADxxCU5OTkr/////Sv////9LAHRxCmJoBVgCAAAATzhxC0sASwGHcQxScQ0oSwNYAQAAAHxxDk5OTkr/////Sv////9LP3RxD2JoDWgNaA1oCGgNaA1oDWgIaA1oCGgIaAhoCGgNZVgMAAAAY29sdW1uX25hbWVzcRBdcREoWAMAAABhZ2VxElgDAAAAam9icRNYBwAAAG1hcml0YWxxFFgJAAAAZWR1Y2F0aW9ucRVYBwAAAGRlZmF1bHRxFlgHAAAAYmFsYW5jZXEXWAcAAABob3VzaW5ncRhYBAAAAGxvYW5xGVgHAAAAY29udGFjdHEaWAMAAABkYXlxG1gFAAAAbW9udGhxHFgIAAAAZHVyYXRpb25xHVgIAAAAY2FtcGFpZ25xHlgFAAAAcGRheXNxH1gIAAAAcHJldmlvdXNxIFgIAAAAcG91dGNvbWVxIWVYBQAAAHNoYXBlcSJLAUsQhnEjWAoAAABzY2hlbWFfbWFwcSR9cSUoaB1oCGgaaA1oE2gNaBhoDWgSaAhoGWgNaBVoDWgXaAhoIWgNaBxoDWgWaA1oH2gIaBRoDWgbaAhoIGgIaB5oCHV1Yi4=',
   'swagger': {'example': [{'age': 30,
      'balance': 1787,
      'campaign': 1,
      'contact': 'cellular',
      'day': 19,
      'default': 'no',
      'duration': 79,
      'education': 'primary',


### Scoring Function

Next, we define the scoring functions that score a new instance.

In [18]:
import pandas as pd

def init():

    from sklearn.externals import joblib
    # load the model file
    global model
    model = joblib.load('./code/marketcampaign/dt.pkl')

In [19]:
def run(input_df):
    import json
    # Append the request to the reference frame df1 so that every
    # categorical level is present when the columns are one-hot encoded
    df = pd.concat([df1, input_df], ignore_index=True)
    columns_to_encode = list(df.select_dtypes(include=['category','object']))
    for column_to_encode in columns_to_encode:
        dummies = pd.get_dummies(df[column_to_encode])
        # Prefix each dummy column with the name of its source column
        dummies.columns = [column_to_encode + '_' + col_name for col_name in dummies.columns]
        df = df.drop(column_to_encode, axis=1)
        df = df.join(dummies)
    pred = model.predict(df)
    # The request row follows the len(df1) reference rows
    return json.dumps(str(pred[len(df1)]))
print('executed')

df1 = pd.DataFrame(data=[[30,'admin.','divorced','unknown','yes',1787,'no','no','telephone',19,'oct',79,1,-1,0,'unknown'],[33,'blue-collar','married','secondary','no',4789,'yes','yes','cellular',11,'may',220,1,339,4,'success'],[35,'entrepreneur','single','tertiary','no',1350,'yes','no','cellular',16,'apr',185,1,330,1,'failure'],[30,'housemaid','married','tertiary','no',1476,'yes','yes','unknown',3,'jun',199,4,-1,0,'unknown'],[59,'management','married','secondary','no',0,'yes','no','unknown',5,'jan',226,1,-1,0,'unknown'],[35,'retired','single','tertiary','no',747,'no','no','cellular',23,'feb',141,2,176,3,'failure'],[36,'self-employed','married','tertiary','no',307,'yes','no','cellular',14,'mar',341,1,330,2,'other'],[39,'services','married','secondary','no',147,'yes','no','cellular',6,'jul',151,2,-1,0,'unknown'],[41,'student','married','tertiary','no',221,'yes','no','unknown',14,'aug',57,2,-1,0,'unknown'],[43,'technician','married','primary','no',-88,'yes','yes','cellular',17,'sep',313,1,147,2,'failure'],[39,'unemployed','married','secondary','no',9374,'yes','no','unknown',20,'nov',273,1,-1,0,'unknown'],[43,'unknown','married','secondary','no',264,'yes','no','cellular',17,'dec',113,2,-1,0,'unknown']], columns=['age','job','marital','education','default','balance','housing','loan','contact','day','month','duration','campaign','pdays','previous','poutcome'])

executed


In [20]:
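# Implement test code to run in IDE or Azure ML Workbench
if __name__ == '__main__':
    init()
    # Reuse the sample request built earlier; the name avoids
    # shadowing the built-in input()
    sample_input = input1
    print(run(sample_input))
    #sample_input = "{}"
    #run(sample_input)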

"0"
