# Direct Marketing Optimization Case Report

Steps:
1. Created an analytical dataset (both training and targeting sets)
2. Three models have been developed for (consumer loan, credit card, mutual fund) using training set
3. Optimized targeting clients with the direct marketing offer to maximize the revenue by listing the top 100  clients to be contacted for offer

In [1]:
import pandas as pd
from sklearn import preprocessing, cross_validation, svm
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.feature_selection import chi2, SelectKBest, RFE
from sklearn import metrics
import numpy as np



## Data Preprocessing

Creating dataset by extracting all the required information required for the task given. The steps including Data Preprocessing,Data cleansing and preparation will be done in this step. Transforming data into meaningful arrays will improve the model performance and help understand the insights of the data.

In [2]:

#reading the customer data file into the pandas
data = pd.ExcelFile("./data.xlsx")
#parse all the datasheets 
page2=data.parse(sheetname=1)
page3=data.parse(sheetname=2)
page4=data.parse(sheetname=3)

#list columns of page2
column_name=list(page2.columns)
page2_sorted=page2.sort_index(by=column_name, ascending=[True,False,False,False])
page2_numeric=page2_sorted.replace(to_replace=dict(F=0, M=1, NAN=2), inplace=False)
#combining data page 2 and 3
page2plus3=page3
page2plus3[page2_sorted.columns[1:4]]=page2_sorted[page2_sorted.columns[1:4]]
page4_sorted=page4.sort_values(by=list(page4.columns), ascending=list(page4.columns=='Clients'))
page3plus4 = pd.merge(page2plus3, page4_sorted, how='inner', on=['Client'])

f = pd.DataFrame(index=range(28), columns=list(page4.columns) )

nodata_client =[]
for client in range(1,1615):
    if client not in list(page4_sorted.Client):
        nodata_client.append(client)
f.Client=nodata_client
page4cols=list(page4.columns)
page4extended= page4
page4extended =page4extended.append(f)

page4extended_sorted=page4extended.sort_values(by=page4cols, ascending=list(page4.columns=='Client') )
page4extended_sorted.head(50)
page2plus34 = pd.merge(page2plus3, page4extended_sorted, how='inner', on=['Client'])

page5=data.parse(sheetname=4)

        

nodata_client_page5 =[]
for client in range(1,1615):
    if client not in list(page5.Client):
        nodata_client_page5.append(client)



f5 = pd.DataFrame(index=range(len(nodata_client_page5)), columns=['Client', 'Sale_MF', 'Sale_CC', 'Sale_CL', 'Revenue_MF', 'Revenue_CC',
       'Revenue_CL'])

f5.Client=nodata_client_page5

page5extended= page5
page5extended =page5extended.append(f5)

page5extended_sorted=page5extended.sort_values(by=list(page5.columns), ascending= list(page5.columns == 'Client'))
page5extended_sorted.head(50)
page2plus345 = pd.merge(page2plus34, page5extended_sorted, how='inner', on=['Client'])
page2plus345

#sorting presence of clients by page 
page2plus345['Page3client']=[i in list(page3.Client) for i in list(page2plus345.Client)]
page2plus345['Page4client']=[i in list(page4.Client) for i in list(page2plus345.Client)]
page2plus345['Page5client']=[i in list(page5.Client) for i in list(page2plus345.Client)]

#final dataset
#sum(list(page2plus345.Page5client))



In [3]:
# developing training set : Fetching information related to the CLients present on the datasheet 5
list(page4.columns)
trainingset=page2plus345.loc[(page2plus345.Page5client==True),['Client', 'Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Sex', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Sale_MF', 'Sale_CC', 'Sale_CL',
       'Revenue_MF', 'Revenue_CC', 'Revenue_CL', 'Page3client', 'Page4client',
       'Page5client']]

predictingset =page2plus345.loc[(page2plus345.Page5client==False),['Client', 'Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Sex', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Sale_MF', 'Sale_CC', 'Sale_CL',
       'Revenue_MF', 'Revenue_CC', 'Revenue_CL', 'Page3client', 'Page4client',
       'Page5client']]

#### Logistic regression model 
The model is implemented for the task of predicting customers most likely to buy credit card mutual fund and loan. Here, the features are the descriptive attributes, and the label is what we are attempting to predict or forecast. As per standard with machine learning in code to define X (capital x), as the features, and y (lowercase y) as the label that corresponds to the features.

In [4]:
#  list of feature names to be evaluated for analysis 
feature_cols=['ActBal_CA', 'ActBal_SA','Age','Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Sale_MF',  'Sale_CL',
       'Revenue_MF', 'Revenue_CC', 'Revenue_CL']
data =trainingset.loc[:,['Client', 'Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Sale_MF', 'Sale_CC', 'Sale_CL',
       'Revenue_MF', 'Revenue_CC', 'Revenue_CL', 'Page3client', 'Page4client',
       'Page5client']]

# use the list to select a subset of the original DataFrame
data=data.fillna(0)
X = data[feature_cols]

# checking the type and shape of X
#print(type(X))
#print(X.shape)
# selecting a Series from the DataFrame for labels
y=data['Sale_CC']


Below is the training and testing phase. The 75% of data is used to train the classifier. Then the remaining 25% is used to test the classifier. For accuracy and reliability, the best way is using the build in cross_validation.

In [5]:
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# instantiate a model
clf=LogisticRegression(C=1e5)
#clf=LinearRegression(fit_intercept=True, normalize=True)
# fit the model to the training data (learn the coefficients)
clf.fit(X_train, y_train)
#Classification accuracy: percentage of correct predictions
accuracy=clf.score(X_test,y_test)
#print(clf.intercept_)
print(accuracy)
y_pred = clf.predict(X_test)


0.942386831276


In [6]:
#predicting set for credit card problem.
cc_target=page2plus345[page2plus345.Page5client !=1]
#predicting set for mutual  fund 
MF_target= page2plus345[page2plus345.Page5client != 1]
#predicting set for consumer loan
CL_target= page2plus345[page2plus345.Page5client != 1]

## Model for Credit Card 


In [7]:

#feature_cols = ['TransactionsDebCashless_Card','ActBal_CA','ActBal_SA','ActBal_MF','Age']
feature_cols=['Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Revenue_MF', 'Revenue_CL',]

X = data[feature_cols]
# select a Series from the DataFrame
data.fillna(0,inplace=True)
#sale_cc=training_set.Sale_CC.fillna(0)
y=trainingset.Sale_CC.fillna(0)
# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# fit the model to the training data (learn the coefficients)
clf.fit(X_train, y_train)

# make predictions on the testing set
y_pred = clf.predict(X_test)

accuracy=clf.score(X_test,y_test)
print('credit card model accuracy : ',accuracy)

X_predict=cc_target[feature_cols].fillna(0)
y_predict = clf.predict(X_predict)

CC=list(cc_target[y_predict==1].Client)
#print(len(CC))


credit card model accuracy :  0.744855967078


### Clients have higher propensity to buy Credit Card

In [8]:
print('Customers to be targetted for the credit card offer', CC )

Customers to be targetted for the credit card offer [5, 19, 145, 151, 153, 161, 197, 206, 352, 359, 373, 382, 389, 532, 535, 587, 592, 633, 851, 886, 931, 951, 978, 996, 1076, 1077, 1241, 1249, 1278, 1289, 1349, 1365, 1410, 1414, 1419, 1455, 1487, 1491, 1588]


## Model for Mutual Fund


In [9]:

#feature_cols = ['TransactionsDebCashless_Card','ActBal_CA','ActBal_SA','ActBal_MF','Age']
feature_cols=['Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Sale_MF', 'Sale_CC', 'Sale_CL', 'Revenue_CC',
              'Revenue_CL']

#data=selected_trainingset2
X = data[feature_cols]
# select a Series from the DataFrame
data.fillna(0,inplace=True)
#sale_cc=training_set.Sale_CC.fillna(0)
y=trainingset.Sale_MF.fillna(0)
# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# fit the model to the training data (learn the coefficients)
clf.fit(X_train, y_train)

# make predictions on the testing set
y_pred = clf.predict(X_test)

accuracy=clf.score(X_test,y_test)
print('Mutual fund model accuracy : ',accuracy)

X_predict=MF_target[feature_cols].fillna(0)
y_predict = clf.predict(X_predict)

MF= list(MF_target[y_predict==1].Client)
#print(len(MF))
#print('Customers to be targetted for the Mutualfund offer', MF)

Mutual fund model accuracy :  0.864197530864


### Clients have higher propensity to buy Mutual Fund


In [10]:
print('Customers to be targetted for the Mutualfund offer', MF)

Customers to be targetted for the Mutualfund offer [30, 39, 64, 196, 506, 583, 766, 785, 878, 940, 1007, 1008, 1119, 1226, 1416, 1435, 1480, 1508, 1516, 1569]


## Model for consumer loan

In [11]:
feature_cols=['Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_MF', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Revenue_CC',
              'Revenue_MF']

#data=selected_trainingset2
X = data[feature_cols]
# select a Series from the DataFrame
data.fillna(0,inplace=True)
#sale_cc=training_set.Sale_CC.fillna(0)
y=trainingset.Sale_CL.fillna(0)
# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)


# fit the model to the training data (learn the coefficients)
clf.fit(X_train, y_train)


# make predictions on the testing set
y_pred = clf.predict(X_test)
accuracy=clf.score(X_test,y_test)
print('Consumer Loan model accuracy : ', accuracy)

X_predict=CL_target[feature_cols].fillna(0)
y_predict = clf.predict(X_predict)

CL=list(CL_target[y_predict==1].Client)


#print(len(CL))

Consumer Loan model accuracy :  0.711934156379


### Clients have higher propensity to buy Consumer Loan

In [12]:
print('Customers to be targeted for the loan offer', CL)

Customers to be targeted for the loan offer [5, 89, 153, 161, 164, 217, 239, 314, 353, 401, 496, 532, 543, 1051, 1077, 1200, 1218, 1237, 1289, 1365, 1443, 1455, 1569]


## Revenues forcasting for Credit Card

In [15]:
feature_cols=[ 'Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Revenue_MF','Revenue_CL']

X = data[feature_cols]
# select a Series from the DataFrame
data.fillna(0,inplace=True)
y=(trainingset.Revenue_CC.fillna(0)).apply(np.int64)
# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# fit the model to the training data (learn the coefficients)
clf.fit(X_train, y_train)

accuracy=clf.score(X_test,y_test)
print('credit card revenue model accuracy : ', accuracy)

# make predictions on the testing set
#y_pred = clf.predict(X_test)
      
X_predict=cc_target[feature_cols].fillna(0)
y_predict = clf.predict(X_predict)

#print('number of customer with revenue', sum(y!=0))

CC_revenue=list(cc_target[y_predict!=0].Client)
#print('Total expected revenue from predicted credit card customers:',sum(y_predict))
#print('targeted customers :', CC_revenue)
CC_client_pred =y_predict[y_predict!=0]
CC_zip=list(zip(CC_revenue,CC_client_pred))

#Revenue_forcast = pd.DataFrame(columns=columns,index=index)
#print('Customer and their targeted revenue:',sorted(CC_zip,key=lambda x: x[1],reverse=True))

# Create another list of those who are suppose to be in the predicted set of target CC customers
sum_CC_revenue =0
sale_CC_revenue_list =[]
for client in list(CC_revenue):
    if client in list(CC):
        sale_CC_revenue_list.append((client,CC_client_pred[CC_revenue.index(client)]))
        sum_CC_revenue = sum_CC_revenue + CC_client_pred[CC_revenue.index(client)]


print ('List of (Client id, revenue forecasted) tuples:', sale_CC_revenue_list)
print ('sum of revenue CC:', sum_CC_revenue)
y_predict

credit card revenue model accuracy :  0.716049382716
List of (Client id, revenue forecasted) tuples: [(19, 42), (153, 14), (161, 2), (352, 42), (359, 14), (373, 15), (382, 15), (532, 7), (535, 15), (587, 15), (633, 3), (851, 6), (886, 1), (951, 5), (1076, 15), (1077, 8), (1289, 13), (1349, 15), (1410, 4), (1414, 18), (1455, 25)]
sum of revenue CC: 294


array([  0,   0,   0,   0,  14,   0,   0,   0,   0,  76,  42,   0,   0,
         0,   0,   0,   0,  10,   0,   0,   0,  15,   0,   0,   0,   0,
        76,   0,   0,   0,   9,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   8,   0,   0,   0,   0,  10,   0,   0,  14,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  14,   0,
         0,   2,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,  37,   0,   0,   0,   0,   0,  12,  36, 102,   0,  36,   0,
         0,   0,  36,   0,   0,   0,   0,   0,   0,   0,   0,   0,   4,
         0,   0,   0,   0,  14,   0,   0,  15,   0,   0,   0,   0,   0,
         0,   0,   0,   0,  15,   0,  36,  76,   0,   0,   0,   0,   0,
        39,   0,   0,   0,  28,   0,  15,   0,   0,   0, 102,   0,  42,
         0,   0,  42,   0,  42,   0,   0,   0,  15,   0,  39,  14,   0,
         0,   0,   0,   0,  15,   0,   0,  42,  15,   0,   0,   0,   0,
         0,   0,   0,  76,   0,   0,   0,   0,   0,   0,   0,   

## Revenue forcasting for Mutual funds

In [567]:
feature_cols=[ 'Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Revenue_CC','Revenue_CL']
X = data[feature_cols]
# select a Series from the DataFrame
data.fillna(0,inplace=True)
y=(trainingset.Revenue_MF.fillna(0)).apply(np.int64)
# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# fit the model to the training data (learn the coefficients)
clf.fit(X_train, y_train)

X_predict=MF_target[feature_cols].fillna(0)
y_predict = clf.predict(X_predict)

accuracy=clf.score(X_test,y_test)
print('Mutual fund revenue accuracy :', accuracy)
#print('number of customer with mutual fund revenue', sum(y!=0))
#predicted customer revenue

MF_revenue=list(MF_target[y_predict!=0].Client)
#print('Total expected revenue from predicted credit card customers:',sum(y_predict))
#print('targeted customers :', MF_revenue)
MF_client_pred = y_predict[y_predict!=0]
MF_zip=list(zip(MF_revenue,MF_client_pred))

#print('customer and their targeted revenue:',sorted(MF_zip,key=lambda x: x[1],reverse=True))


targeted_MF_client =[]
for client in MF_revenue:
    if client in list (MF):
        targeted_MF_client.append(client)
# only those customer who are targeted for MF sale
targeted_MF_client  

# Create another list of those who are suppose to be in the predicted set of target MF customers
sum_MF_revenue =0
sale_MF_revenue_list =[]
for client in list(MF_revenue):
    if client in list(MF):
        sale_MF_revenue_list.append((client,MF_client_pred[MF_revenue.index(client)]))
        sum_MF_revenue = sum_MF_revenue + MF_client_pred[MF_revenue.index(client)]
        #print(sum_MF_revenue,MF_client_pred[MF_revenue.index(client)])

print('sum of revenue MF:', sum_MF_revenue)
print('List of (Client id, revenue forecasted) tuples:', sale_MF_revenue_list)

Mutual fund revenue accuracy : 0.80658436214
sum MF revenue: 387
List of (Client id, revenue forecasted) tuples: [(30, 6), (196, 73), (506, 6), (583, 9), (766, 54), (785, 1), (940, 18), (1007, 54), (1008, 1), (1119, 73), (1226, 9), (1416, 6), (1435, 9), (1508, 34), (1516, 34)]


## Revenues forcasting for Consumer Loan

In [569]:
feature_cols=[ 'Count_CA', 'Count_SA', 'Count_MF', 'Count_OVD', 'Count_CC',
       'Count_CL', 'ActBal_CA', 'ActBal_SA', 'ActBal_MF', 'ActBal_OVD',
       'ActBal_CC', 'ActBal_CL', 'Age', 'Tenure', 'VolumeCred',
       'VolumeCred_CA', 'TransactionsCred', 'TransactionsCred_CA', 'VolumeDeb',
       'VolumeDeb_CA', 'VolumeDebCash_Card', 'VolumeDebCashless_Card',
       'VolumeDeb_PaymentOrder', 'TransactionsDeb', 'TransactionsDeb_CA',
       'TransactionsDebCash_Card', 'TransactionsDebCashless_Card',
       'TransactionsDeb_PaymentOrder', 'Revenue_MF','Revenue_CC']
X = data[feature_cols]
# select a Series from the DataFrame
data.fillna(0,inplace=True)
y=(trainingset.Revenue_CL.fillna(0)).apply(np.int64)
# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# fit the model to the training data (learn the coefficients)
clf.fit(X_train, y_train)
# make predictions on the testing set
#y_pred = clf.predict(X_test)

accuracy=clf.score(X_test,y_test)
print('Consumer Loan revenue accuracy :', accuracy)
#print('number of customer with revenue', sum(y!=0))
#predicted customer revenue
X_predict=CL_target[feature_cols].fillna(0)
y_predict = clf.predict(X_predict)

CL_revenue=list(CL_target[y_predict!=0].Client)
#print('Total expected revenue from predicted credit card customers:',sum(y_predict))
#print('targeted customers :', CL_revenue)
CL_client_pred = y_predict[y_predict!=0]
CL_zip=list(zip(CL_revenue,CL_client_pred))

#print('customer and their targeted revenue(sorted in descending order of revenue):',sorted(CL_zip,key=lambda x: x[1],reverse=True))

targeted_CL_client =[]
for client in CL_revenue:
    if client in list (CL):
        targeted_CL_client.append(client)
# only those customer who are targeted for MF sale
targeted_CL_client  

# Create another list of those who are suppose to be in the predicted set of target CL customers
sale_CL_revenue_list =[]
sum_CL_revenue = 0
for client in list(CL_revenue):
    if client in list(CL):
        sale_CL_revenue_list.append((client,CL_client_pred[CL_revenue.index(client)]))
        sum_CL_revenue = sum_CL_revenue + CL_client_pred[CL_revenue.index(client)]



print ("Sum of Revenue CL ", sum_CL_revenue)
print ('List of (Client id, revenue forecasted) tuples:',sale_CL_revenue_list)

Consumer Loan revenue accuracy : 0.641975308642
Revenue CL  249
List of (Client id, revenue forecasted) tuples: [(153, 19), (161, 6), (217, 20), (314, 17), (496, 30), (532, 4), (543, 27), (1051, 22), (1077, 22), (1218, 31), (1289, 1), (1443, 20), (1455, 30)]


### Accuracy is too low in this case. 
One possibility is to reduce the feature set  by selecting only few parameters and reduce the probability of over fitting/mispredicting using recursive feature elimination or by using principal component analysis. I skip it for now due to limited time. 

#### Code snippet
$$ model = LogisticRegression()$$
$$ rfe = RFE(model, 3)$$ 
$$ fit = rfe.fit(X, Y)$$ 
$$ print("Num Features: ", fit.n\_features\_)$$ 
$$ print("Selected Features: ",  fit.support\_)$$ 
$$ print("Feature Ranking:", fit.ranking\_)$$ 

%%latex
model = LogisticRegression()
rfe = RFE(model, 3)
fit = rfe.fit(X, Y)
print("Num Features: %d") % fit.nfeatures
print("Selected Features: %s") % fit.support_
print("Feature Ranking: %s") % fit.ranking_



## Forecasting the Expected Revenue
Creating a list of clinets who according the model developed highly likely to be ineterested in buying the bank products ; credit cards, loans and the mutual funds. There are few customers who would be interested in all the three products. By summing the revenues generated per client based  on the products the revenue sum is generated. 

### Which clients are to be targeted with which offer?

The table of probable customers contains three columns which indicates their respective interest in buying credit card, mutual fund and consumer loan and the projected revenues. 

In [570]:
data=page2plus345
columns = ['Client', 'Revenue_CC', 'Revenue_CL','Revenue_MF','Sum_Revenues']
index = list(range(1,len(data.Client)))

Revenue_forcast = pd.DataFrame(columns=columns,index=index)
Revenue_forcast.Client= list(range(1,len(data.Client)))
#fetching revenue from three sets cc, cl and mf

Revenue_forcast.fillna(0)

for client in range(1,len(data.Client)):
    for keys, values in sale_MF_revenue_list:
        if client == keys:
            Revenue_forcast.set_value(client, 'Revenue_MF', values)
    for keys, values in sale_CC_revenue_list:
        if client == keys:
            Revenue_forcast.set_value(client, 'Revenue_CC', values)
    for keys, values in sale_CL_revenue_list:
        if client == keys:
            Revenue_forcast.set_value(client, 'Revenue_CL', values)

Revenue_forcast=Revenue_forcast.fillna(0)
Revenue_forcast.Sum_Revenues =Revenue_forcast.Revenue_CC+Revenue_forcast.Revenue_CL +Revenue_forcast.Revenue_MF
target_call_cust=Revenue_forcast.sort_values(by='Sum_Revenues',ascending=False)
print('Individual target revenue :\n')
target_call_cust.iloc[0:43,1:5]



Individual target revenue :



Unnamed: 0,Revenue_CC,Revenue_CL,Revenue_MF,Sum_Revenues
196,0,0,73,73
1119,0,0,73,73
1455,25,30,0,55
1007,0,0,54,54
766,0,0,54,54
19,42,0,0,42
352,42,0,0,42
1516,0,0,34,34
1508,0,0,34,34
153,14,19,0,33


### What would be the expected revenue based on your strategy?

A list of first 43 clients is generated which are not present in training set i.e. excel sheet (sales_revenues). The forcasted revenue by the addition of new customers in the pool is expected to be:



In [571]:
print('Total targeted revenue: ', sum(list(target_call_cust.iloc[0:43,:].Sum_Revenues)))

Total targeted revenue:  930
