# Introduction

In the [previous notebook](http://localhost:8890/notebooks/BuildingPredictiveModel.ipynb), we built a predictive model, based on the company's existing customers, to predict the average sales amount for future customers.

In this notebook, we apply that model to predict the average sales amount for the new customers on the company's mailing list.

In [1]:
# Import the relevant libraries 
import pandas as pd 
import numpy as np 
import statsmodels.api as sm 
import matplotlib.pyplot as plt 
from sklearn.linear_model import LinearRegression 
import seaborn as sns 
sns.set()

import pickle

In [2]:
df_test = pd.read_excel('p1-mailinglist.xlsx')

In [3]:
df_test.head()

Unnamed: 0,Name,Customer_Segment,Customer_ID,Address,City,State,ZIP,Store_Number,Avg_Num_Products_Purchased,#_Years_as_Customer,Score_No,Score_Yes
0,A Giametti,Loyalty Club Only,2213,5326 S Lisbon Way,Centennial,CO,80015,105,3,0.2,0.694964,0.305036
1,Abby Pierson,Loyalty Club and Credit Card,2785,4344 W Roanoke Pl,Denver,CO,80236,101,6,0.6,0.527275,0.472725
2,Adele Hallman,Loyalty Club Only,2931,5219 S Delaware St,Englewood,CO,80110,101,7,0.9,0.421118,0.578882
3,Alejandra Baird,Loyalty Club Only,2231,2301 Lawrence St,Denver,CO,80205,103,2,0.6,0.694862,0.305138
4,Alice Dewitt,Loyalty Club Only,2530,5549 S Hannibal Way,Centennial,CO,80015,104,4,0.5,0.612294,0.387706


Compared with the training dataset from the [previous notebook](http://localhost:8890/notebooks/BuildingPredictiveModel.ipynb), this dataset has two additional features:

- `Score_No`: the probability that the customer will not make a purchase from the catalog.
- `Score_Yes`: the probability that the customer will make a purchase from the catalog, equal to `1 - Score_No`.
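
Before relying on `Score_Yes`, it may be worth confirming that the two scores really are complementary. The following is a minimal sketch using only the five rows shown above; the variable names `scores` and `sums_to_one` and the tolerance of `1e-6` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# The five rows shown by df_test.head(), score columns only
scores = pd.DataFrame({
    'Score_No':  [0.694964, 0.527275, 0.421118, 0.694862, 0.612294],
    'Score_Yes': [0.305036, 0.472725, 0.578882, 0.305138, 0.387706],
})

# The two probabilities should sum to 1 for every customer (up to rounding)
sums_to_one = np.isclose(scores['Score_No'] + scores['Score_Yes'], 1.0, atol=1e-6)
print(sums_to_one.all())
```

The same check could be run on the full mailing list by replacing `scores` with `df_test`.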

In [4]:
# Load the pickled predictive model built in the previous notebook
with open('reg_model.pickle', 'rb') as f:
    reg = pickle.load(f)

In [5]:
# Drop the columns that the model does not use
df = df_test.drop(['Name', 'Customer_ID', 'Address', 'State', 'City', 'ZIP', 'Store_Number', '#_Years_as_Customer'], axis=1)
df.head()

Unnamed: 0,Customer_Segment,Avg_Num_Products_Purchased,Score_No,Score_Yes
0,Loyalty Club Only,3,0.694964,0.305036
1,Loyalty Club and Credit Card,6,0.527275,0.472725
2,Loyalty Club Only,7,0.421118,0.578882
3,Loyalty Club Only,2,0.694862,0.305138
4,Loyalty Club Only,4,0.612294,0.387706


In [6]:
# Dummy-encode Customer_Segment; drop_first=True drops the first category, which becomes the baseline
df = pd.get_dummies(df, drop_first=True)

In [7]:
df.head()

Unnamed: 0,Avg_Num_Products_Purchased,Score_No,Score_Yes,Customer_Segment_Loyalty Club Only,Customer_Segment_Loyalty Club and Credit Card,Customer_Segment_Store Mailing List
0,3,0.694964,0.305036,1,0,0
1,6,0.527275,0.472725,0,1,0
2,7,0.421118,0.578882,1,0,0
3,2,0.694862,0.305138,1,0,0
4,4,0.612294,0.387706,1,0,0
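
One thing to keep in mind with dummy encoding at prediction time: `drop_first=True` drops the first category level present in the data, so a dataset that happens to lack a segment could end up with different columns than the model was trained on. The sketch below shows one possible safeguard, which is to encode without `drop_first` and then align to the expected feature columns. This is an alternative under stated assumptions, not what the notebook does; the two-row `sample` frame and the names `model_columns`, `encoded`, and `aligned` are hypothetical.

```python
import pandas as pd

# Feature columns the notebook passes to the model, in that order
model_columns = [
    'Avg_Num_Products_Purchased',
    'Customer_Segment_Loyalty Club Only',
    'Customer_Segment_Loyalty Club and Credit Card',
    'Customer_Segment_Store Mailing List',
]

# Hypothetical two-row sample that happens to contain only two segments
sample = pd.DataFrame({
    'Customer_Segment': ['Loyalty Club Only', 'Store Mailing List'],
    'Avg_Num_Products_Purchased': [3, 5],
})

# Encode every category present, then align to the model's columns:
# missing dummy columns are filled with 0 and unexpected columns are dropped
encoded = pd.get_dummies(sample)
aligned = encoded.reindex(columns=model_columns, fill_value=0).astype(int)
print(aligned)
```

The aligned frame always has the same columns in the same order, whichever segments appear in the input.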


In [8]:
col = ['Avg_Num_Products_Purchased', 'Customer_Segment_Loyalty Club Only',
       'Customer_Segment_Loyalty Club and Credit Card',
       'Customer_Segment_Store Mailing List',
       'Score_Yes']

In [9]:
df_model = df[['Avg_Num_Products_Purchased','Customer_Segment_Loyalty Club Only', 
                        'Customer_Segment_Loyalty Club and Credit Card', 'Customer_Segment_Store Mailing List']]
df_model.head()

Unnamed: 0,Avg_Num_Products_Purchased,Customer_Segment_Loyalty Club Only,Customer_Segment_Loyalty Club and Credit Card,Customer_Segment_Store Mailing List
0,3,1,0,0
1,6,0,1,0
2,7,1,0,0
3,2,1,0,0
4,4,1,0,0


In [10]:
# Predict the average sales amount for each customer on the mailing list
df['Predicted_Revenue'] = reg.predict(df_model)

In [11]:
# Calculate the expected revenue for each customer:
# predicted revenue * probability that the customer makes a purchase from the catalog
df['Expected_Revenue'] = df['Predicted_Revenue'] * df['Score_Yes']

In [12]:
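# Total expected revenue across all customers
total_exp_rev = df['Expected_Revenue'].sum()
# Expected profit: average gross margin of 0.5, minus the cost of sending 250 catalogs at USD 6.50 each
total_exp_profit = total_exp_rev * 0.5 - (6.5 * 250)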

print('Total Expected Revenue: USD', total_exp_rev.round(2) )
print('Total Expected Profit: USD', total_exp_profit.round(2) )

Total Expected Revenue: USD 47224.87
Total Expected Profit: USD 21987.44


Findings: If this year’s catalog is sent to these 250 customers, the expected profit from the new catalog is USD 21,987.44.
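
The calculation in cells 10 to 12 can also be collected into a single function. The following is a minimal sketch: the function name `expected_profit`, its parameters, and the two-customer input values are hypothetical and chosen only for illustration. The 0.5 gross margin and the USD 6.50 cost per catalog are the figures used above.

```python
def expected_profit(predicted_revenue, score_yes, gross_margin=0.5, cost_per_catalog=6.5):
    """Expected profit from mailing one catalog to every listed customer."""
    if len(predicted_revenue) != len(score_yes):
        raise ValueError("inputs must have one entry per customer")
    # Expected revenue per customer = predicted revenue * probability of buying
    expected_revenue = [rev * p for rev, p in zip(predicted_revenue, score_yes)]
    total_expected_revenue = sum(expected_revenue)
    # Keep only the margin, then subtract the mailing cost for all customers
    return total_expected_revenue * gross_margin - cost_per_catalog * len(predicted_revenue)


# Hypothetical inputs for two customers (not taken from the mailing list)
toy_profit = expected_profit([100.0, 200.0], [0.5, 0.25])
print(toy_profit)  # 37.0
```

In the notebook, the equivalent call would pass `df['Predicted_Revenue']` and `df['Score_Yes']`.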

# Summary/Conclusion

In the [previous notebook](http://localhost:8890/notebooks/BuildingPredictiveModel.ipynb) and in this one:

- We explored our training dataset (which has information about existing customers) to find features that would help us predict the average sales amount for future customers.

- Then we used those features to build a model that predicts the average sales amount for future customers.

- In this notebook, we used that model to predict the average sales amount for the 250 customers in the mailing list dataset.

- Then we used this predicted average sales amount to calculate the expected revenue and profit:

 - `expected revenue per customer = predicted sales * score_yes`

 - `total expected revenue = sum(expected revenue per customer) = 47,224.87`

 - `expected profit = total expected revenue * average gross margin (0.5) - cost of sending the catalog to 250 customers (6.5 * 250) = USD 21,987.44`
 
Therefore, if the catalog is sent to the 250 customers, the company should expect a profit of USD 21,987.44, which is higher than the cutoff value of USD 10,000 suggested by management.
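
To see how much headroom this leaves, the profit formula can be rearranged to give the minimum total expected revenue needed to reach the cutoff: `(cutoff + mailing cost) / gross margin`. The sketch below uses only the figures reported in this notebook; the variable names are illustrative assumptions.

```python
gross_margin = 0.5          # average gross margin used above
cost_per_catalog = 6.5      # USD per catalog
n_customers = 250
profit_cutoff = 10_000      # USD, cutoff suggested by management
total_exp_rev = 47224.87    # USD, total expected revenue reported above

# Revenue at which expected profit equals the cutoff exactly
mailing_cost = cost_per_catalog * n_customers
break_even_revenue = (profit_cutoff + mailing_cost) / gross_margin
print(break_even_revenue)  # 23250.0
print(total_exp_rev > break_even_revenue)  # True
```

Under these figures, the total expected revenue is roughly twice the amount needed to reach the cutoff.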