# Apprentice Chef Analysis

Apprentice Chef, Inc. is a home-prep food delivery company developed for busy professionals with limited kitchen skills, offering a wide range of routinely prepared gourmet meals delivered directly to the customer's doorstep.
This report proposes actionable insights by predicting how much revenue to expect from each customer within their first year of orders and which customers will join a cross-selling promotion, "Halfway There".

The revenue prediction model achieves a coefficient of determination (R-squared) of 0.778. Surprisingly, the data show that an additional meal kit order does not necessarily increase per-customer revenue: the number of meal kits sold is negatively associated with per-customer revenue. Unless Apprentice Chef's strategy is to expand its market presence at all costs, the company should stop offering promotional discounts on meal kits or consider raising the meal kit price. It should instead focus on sales of unique meal kits to maximize revenue per customer.
According to the promotion-success model built on the data provided, the success of the company's sales promotion is most strongly associated with whether the promotional email is classified as junk. This finding is based on a model that separates successful from unsuccessful promotions with an AUC of 0.781. Therefore, I suggest that the company negotiate with major email service providers to make sure that its email domain is not classified as junk and that its emails arrive in recipients' inboxes.

•	R-Square  :  0.778 <br>
•	AUC score : 0.781
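
As a reading aid for the two scores above, the sketch below shows how R-squared and AUC are defined, using plain Python and made-up numbers. The function names `r_squared` and `auc_score` and all sample values are illustrative assumptions, not part of the analysis; the appendix itself imports `roc_auc_score` from scikit-learn.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def auc_score(labels, scores):
    """Share of (positive, negative) pairs in which the positive case
    receives the higher score; ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up values for illustration only (NOT from the Apprentice Chef data)
y_true = [3.0, 3.2, 3.5, 3.1]
y_pred = [3.1, 3.2, 3.4, 3.0]
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]

r2  = r_squared(y_true, y_pred)
auc = auc_score(labels, scores)
print(round(r2, 3), round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so AUC describes how well a classifier separates the two classes rather than a share of variance explained.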

<br><br>Appendix : Code

In [None]:
## Basic packages ##
import random            as rand                     # random number gen
import numpy as np                                   # mathematical essentials
import matplotlib.pyplot as plt                      # data visualization
import seaborn           as sns                      # enhanced data visualization
import pandas            as pd                       # data science essentials

## Data preprocessing package ##
from sklearn.preprocessing import StandardScaler     # standard scaler

## Data split package ##
from sklearn.model_selection import train_test_split # train-test split

## Regression Model packages ##
import statsmodels.formula.api as smf # regression modeling
import sklearn.linear_model           # linear models
from sklearn.linear_model import LinearRegression # linear regression (scikit-learn)

## Classification Model packages ##
from sklearn.ensemble import RandomForestClassifier     # random forest

# CART model packages
from sklearn.tree import DecisionTreeClassifier      # classification trees
from sklearn.tree import export_graphviz             # exports graphics
from io import StringIO                              # saves objects in memory
from IPython.display import Image                    # displays on frontend
import pydotplus                                     # interprets dot objects

## Model test hypothesis packages ##
from sklearn.metrics import confusion_matrix         # confusion matrix
from sklearn.metrics import roc_auc_score            # auc score

## Optimization packages ##
from sklearn.model_selection import RandomizedSearchCV     # hyperparameter tuning
from sklearn.metrics import make_scorer                    # customizable scorer

In [None]:
# Setting pandas print options
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Specifying file name
file = './Apprentice_Chef_Dataset.xlsx'

# Reading the file into Python
chef_df = pd.read_excel(io = file)

# Checking the first 5 rows of the dataset
#chef_df.head(n = 5)

In [None]:
#########################
# text_split_feature
#########################
def text_split_feature(col, df, sep=' ', new_col_name='number_of_names'):
    """
Splits values in a string Series (as part of a DataFrame) and sums the number
of resulting items. Automatically appends summed column to original DataFrame.

PARAMETERS
----------
col          : column to split
df           : DataFrame where column is located
sep          : string sequence to split by, default ' '
new_col_name : name of new column after summing split, default
               'number_of_names'
"""
    
    df[new_col_name] = 0
    
    
    for index, val in df.iterrows():
        df.loc[index, new_col_name] = len(df.loc[index, col].split(sep = sep))

In [None]:
# Calling text_split_feature function
text_split_feature(col = 'NAME', df  = chef_df)

In [None]:
# Splitting emails and concatenating with original DataFrame
chef_df['email_domain'] = chef_df.EMAIL.str.split('@').str[-1]

# Aggregating the email domains into domain groups
# Creating 3 types of email domain
professional_email_domains = ['@mmm.com', '@amex.com', '@apple.com',
                              '@boeing.com', '@caterpillar.com', '@chevron.com',
                              '@cisco.com', '@cocacola.com', '@disney.com', 
                              '@dupont.com', '@exxon.com', '@ge.org', '@goldmansacs.com',
                              '@homedepot.com', '@ibm.com', '@intel.com', '@jnj.com',
                              '@jpmorgan.com', '@mcdonalds.com', '@merck.com', 
                              '@microsoft.com', '@nike.com', '@pfizer.com', 
                              '@pg.com', '@travelers.com', '@unitedtech.com',
                              '@unitedhealth.com', '@verizon.com', '@visa.com', 
                              '@walmart.com']
personal_email_domains = ['@gmail.com', '@yahoo.com', '@protonmail.com']
junk_email_domains = ['@me.com', '@aol.com', '@hotmail.com', '@live.com', 
                      '@msn.com', '@passport.com']

# Creating a placeholder list
placeholder_lst = []

# Looping to group observations by domain types
for domain in chef_df['email_domain']:
        if '@' + domain in professional_email_domains:
            placeholder_lst.append('email_professional')
            
        elif '@' + domain in personal_email_domains:
            placeholder_lst.append('email_personal')
            
        elif '@' + domain in junk_email_domains:
            placeholder_lst.append('email_junk')
            
        else:
            placeholder_lst.append('unknown')

# Concatenating with original DataFrame
chef_df['domain_group'] = pd.Series(placeholder_lst)

# Checking results
chef_df['domain_group'].value_counts()
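
The grouping loop above can also be expressed as a dictionary lookup. The following is a minimal sketch in plain Python, assuming abbreviated domain lists; `DOMAIN_GROUPS`, `DOMAIN_TO_GROUP`, `classify_email`, and the sample addresses are hypothetical names and values introduced only for illustration.

```python
# Abbreviated, illustrative domain lists (the full lists are defined above)
DOMAIN_GROUPS = {
    'email_professional': ['mmm.com', 'amex.com', 'apple.com'],
    'email_personal':     ['gmail.com', 'yahoo.com', 'protonmail.com'],
    'email_junk':         ['me.com', 'aol.com', 'hotmail.com'],
}

# Inverting the mapping so that each domain points to its group
DOMAIN_TO_GROUP = {domain: group
                   for group, domains in DOMAIN_GROUPS.items()
                   for domain in domains}


def classify_email(email):
    """Returns the domain group of an email address, or 'unknown'."""
    domain = email.split('@')[-1]
    return DOMAIN_TO_GROUP.get(domain, 'unknown')


# Hypothetical addresses for illustration
emails = ['a@gmail.com', 'b@apple.com', 'c@aol.com', 'd@example.org']
groups = [classify_email(e) for e in emails]
print(groups)
```

A lookup table keeps the fallback to 'unknown' in one place and avoids prepending '@' to every domain before comparing.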

In [None]:
# One hot encoding categorical variables
one_hot_domain = pd.get_dummies(chef_df['domain_group'])

# Dropping categorical variables after they've been encoded
chef_df = chef_df.drop('domain_group', axis = 1)

# Joining codings together
chef_df = chef_df.join([one_hot_domain])

In [None]:
gender_lst = ['unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'mostly_male',
 'female',
 'unknown',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'female',
 'female',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'female',
 'unknown',
 'andy',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'mostly_male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'mostly_male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'mostly_male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'unknown',
 'andy',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'andy',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'female',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'female',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'mostly_female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'female',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'mostly_female',
 'unknown',
 'male',
 'unknown',
 'female',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'female',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'male',
 'male',
 'mostly_male',
 'male',
 'male',
 'male',
 'male',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'andy',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'female',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'female',
 'male',
 'male',
 'female',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown', 
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'andy',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'andy',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'male',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'male',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'mostly_female',
 'male',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'female',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'mostly_male',
 'mostly_male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'female',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'mostly_male',
 'unknown',
 'unknown',
 'male',
 'andy',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'female',
 'male',
 'female',
 'mostly_female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'mostly_female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'female',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'male',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'mostly_female',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'mostly_female',
 'female',
 'female',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'andy',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'male',
 'male',
 'unknown',
 'male',
 'unknown',
 'mostly_male',
 'female',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'mostly_male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'mostly_female',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'andy',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'mostly_male',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'male',
 'male',
 'male',
 'male',
 'andy',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'female',
 'female',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'unknown',
 'male',
 'unknown',
 'unknown',
 'female',
 'unknown',
 'unknown']

In [None]:
# Creating 3 types of gender
male = ['male', 'mostly_male']
female = ['female', 'mostly_female']
unknown = ['unknown', 'andy']

# Creating a placeholder list
adjusted_gender_lst = []

# Looping to group observations by gender types
for gender in gender_lst:
        if gender in male:
            adjusted_gender_lst.append('male')
            
        elif gender in female:
            adjusted_gender_lst.append('female')
            
        elif gender in unknown:
            adjusted_gender_lst.append('gender_unknown')
            

# Concatenating with original DataFrame
chef_df['gender_test'] = pd.Series(adjusted_gender_lst)

In [None]:
# One hot encoding categorical variables
one_hot_gender = pd.get_dummies(chef_df['gender_test'])

# Dropping categorical variables after they've been encoded
chef_df = chef_df.drop('gender_test', axis = 1)

# Joining codings together
chef_df = chef_df.join([one_hot_gender])

In [None]:
# Dropping unnecessary features
chef_df = chef_df.drop(labels = ['FAMILY_NAME','NAME','EMAIL','email_domain','FIRST_NAME'], axis = 1)

In [None]:
# Dropping unnecessary features
chef_df = chef_df.drop(labels = ['email_personal','gender_unknown'], axis = 1)

### Final Regression Model -- OLS Regression (standard linear regression)

In [None]:
# Log transforming interval data, saving new variables to the chef_df dataset
chef_df['log_REVENUE'] = np.log10(chef_df['REVENUE'])

chef_df['log_AVG_PREP_VID_TIME'] = np.log10(chef_df['AVG_PREP_VID_TIME'])

chef_df['log_TOTAL_MEALS_ORDERED'] = np.log10(chef_df['TOTAL_MEALS_ORDERED'])

chef_df['log_UNIQUE_MEALS_PURCH'] = np.log10(chef_df['UNIQUE_MEALS_PURCH'])
chef_df['log_CONTACTS_W_CUSTOMER_SERVICE'] = np.log10(chef_df['CONTACTS_W_CUSTOMER_SERVICE'])
chef_df['log_LARGEST_ORDER_SIZE'] = np.log10(chef_df['LARGEST_ORDER_SIZE'])
chef_df['log_MEDIAN_MEAL_RATING'] = np.log10(chef_df['MEDIAN_MEAL_RATING'])
chef_df['log_AVG_CLICKS_PER_VISIT'] = np.log10(chef_df['AVG_CLICKS_PER_VISIT'])
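
Because the regression in this appendix uses `log_REVENUE` (base 10) as its response, a coefficient on an untransformed feature acts multiplicatively on revenue: a one-unit increase multiplies predicted revenue by 10 raised to the coefficient. The sketch below works through that arithmetic; `pct_change_in_revenue` is a hypothetical helper and the coefficient is an assumed value, not one taken from the fitted model.

```python
def pct_change_in_revenue(coef, delta=1):
    """Percent change in predicted revenue when a feature rises by `delta`,
    given its coefficient `coef` in a model of log10(revenue)."""
    return (10 ** (coef * delta) - 1) * 100


# Hypothetical coefficient, NOT taken from the fitted model
example_coef = -0.01
change = pct_change_in_revenue(example_coef)
print(round(change, 2))
```

Note that `np.log10` returns `-inf` for zero values, so the transformations above assume each transformed column is strictly positive.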

In [None]:
# Creating a new variable and saving new variables to the chef_df dataset
chef_df['Total_cancellations'] = chef_df['CANCELLATIONS_BEFORE_NOON'] + chef_df['CANCELLATIONS_AFTER_NOON']

In [None]:
# Adding dummy variable
chef_df['has_TOTAL_PHOTOS_VIEWED'] = 0
chef_df['has_WEEKLY_PLAN'] = 0
chef_df['has_CANCELLATIONS'] = 0
chef_df['has_MASTER_CLASSES_ATTENDED'] = 0

# Iterating over each original column to change values in the new feature columns
for index, value in chef_df.iterrows():
    
    # TOTAL_PHOTOS_VIEWED
    if chef_df.loc[index, 'TOTAL_PHOTOS_VIEWED'] > 0:
        chef_df.loc[index, 'has_TOTAL_PHOTOS_VIEWED'] = 1 

    # WEEKLY_PLAN
    if chef_df.loc[index, 'WEEKLY_PLAN'] > 0:
        chef_df.loc[index, 'has_WEEKLY_PLAN'] = 1
        
    # Total_cancellations
    if chef_df.loc[index, 'Total_cancellations'] > 0:
        chef_df.loc[index, 'has_CANCELLATIONS'] = 1
        
    # MASTER_CLASSES_ATTENDED
    if chef_df.loc[index, 'MASTER_CLASSES_ATTENDED'] > 0:
        chef_df.loc[index, 'has_MASTER_CLASSES_ATTENDED'] = 1
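
The row-by-row loop above can also be written as a vectorized comparison, which tends to be faster on larger DataFrames. This is a minimal sketch on a hypothetical toy DataFrame `toy_df` (an assumption; the notebook's own code uses the loop):

```python
import pandas as pd

# Hypothetical toy data standing in for chef_df (assumption)
toy_df = pd.DataFrame({'TOTAL_PHOTOS_VIEWED': [0, 5, 0],
                       'WEEKLY_PLAN': [3, 0, 0]})

# For each source column, flag rows whose value is greater than zero
for col in ['TOTAL_PHOTOS_VIEWED', 'WEEKLY_PLAN']:
    toy_df['has_' + col] = (toy_df[col] > 0).astype(int)

print(toy_df)
```

The comparison yields a boolean Series and `.astype(int)` converts it to the same 0/1 coding the loop produces.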

In [None]:
# Preparing explanatory variable data
x_chef_df = chef_df.drop(['REVENUE', 'log_REVENUE'], axis = 1)

# Preparing response variables
log_chef_df_target = chef_df.loc[ : , 'log_REVENUE']

# Preparing training and testing sets with log_REVENUE(Y-variable)
X_train, X_test, y_train, y_test = train_test_split(
            x_chef_df,
            log_chef_df_target,
            test_size = 0.25, random_state = 219)

In [None]:
# Merging X_train and y_train so that they can be used in statsmodels
df_train = pd.concat([X_train, y_train], axis = 1)

# Step 1: Building a model
lm_best = smf.ols(formula = """ log_REVENUE ~ CROSS_SELL_SUCCESS +
                                            TOTAL_MEALS_ORDERED +
                                            UNIQUE_MEALS_PURCH +
                                            CONTACTS_W_CUSTOMER_SERVICE +
                                            PRODUCT_CATEGORIES_VIEWED +
                                            AVG_TIME_PER_SITE_VISIT +
                                            MOBILE_NUMBER +
                                            TASTES_AND_PREFERENCES +
                                            PC_LOGINS +
                                            MOBILE_LOGINS +
                                            WEEKLY_PLAN +
                                            EARLY_DELIVERIES +
                                            LATE_DELIVERIES +
                                            PACKAGE_LOCKER +
                                            REFRIGERATED_LOCKER +
                                            AVG_PREP_VID_TIME +
                                            LARGEST_ORDER_SIZE +
                                            MASTER_CLASSES_ATTENDED +
                                            MEDIAN_MEAL_RATING +
                                            AVG_CLICKS_PER_VISIT +
                                            TOTAL_PHOTOS_VIEWED +
                                            email_junk +
                                            email_professional +
                                            female +
                                            male +
                                            log_AVG_PREP_VID_TIME +
                                            log_TOTAL_MEALS_ORDERED +
                                            log_UNIQUE_MEALS_PURCH +
                                            log_CONTACTS_W_CUSTOMER_SERVICE +
                                            log_LARGEST_ORDER_SIZE +
                                            log_MEDIAN_MEAL_RATING +
                                            log_AVG_CLICKS_PER_VISIT +
                                            Total_cancellations +
                                            has_TOTAL_PHOTOS_VIEWED +
                                            has_WEEKLY_PLAN +
                                            has_CANCELLATIONS +
                                            has_MASTER_CLASSES_ATTENDED""", data = df_train)

# Step 2: Fitting the model based on the data
results = lm_best.fit()

# Step 3: Analyze the summary output
print(results.summary())

In [None]:
# INSTANTIATING a model object
lr = LinearRegression()

# FITTING to the training data
lr_fit = lr.fit(X_train, y_train)

# PREDICTING on new data
lr_pred = lr_fit.predict(X_test)

# SCORING the results
print('OLS Training Score :', lr.score(X_train, y_train).round(4)) # using R-square
print('OLS Testing Score  :',  lr.score(X_test, y_test).round(4))  # using R-square

lr_train_score = lr.score(X_train, y_train).round(4)
lr_test_score  = lr.score(X_test, y_test).round(4)

# Displaying and saving the gap between training and testing
print('OLS Train-Test Gap :', abs(lr_train_score - lr_test_score).round(4))
lr_test_gap = abs(lr_train_score - lr_test_score).round(4)


# Zipping each feature name to its coefficient
lr_model_values = zip(x_chef_df.columns,lr_fit.coef_.round(decimals = 3))

# Setting up a placeholder list to store model features
lr_model_lst = [('intercept', lr_fit.intercept_.round(decimals = 3))]

# Appending each feature-coefficient pair to the placeholder list
for val in lr_model_values:
    lr_model_lst.append(val)
    
# Checking the results
print("\n\n\ncoefficients:")
for pair in lr_model_lst:
    print(pair)
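
Because the response is `log10(REVENUE)`, a coefficient `b` on an untransformed feature means that a one-unit increase multiplies predicted revenue by `10 ** b`, holding the other features fixed. The sketch below converts a coefficient into an approximate percent change; the helper name `pct_change_per_unit` and the example coefficient are hypothetical and are not taken from the model output:

```python
def pct_change_per_unit(coef):
    """Percent change in the original-scale response per one-unit
    increase in a feature, for a model fit on a log10 response."""
    return (10 ** coef - 1) * 100


# Hypothetical coefficient, for illustration only
example_coef = -0.01
print(round(pct_change_per_unit(example_coef), 2), '% change per unit')
```

This reading applies to features entered on their original scale; for features that are themselves log10-transformed, the coefficient is read as an elasticity instead.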

### Final Classification Model -- Gradient Boosted Machines

In [None]:
# Dropping unnecessary features
chef_df = chef_df.drop(labels = ['log_REVENUE','log_AVG_PREP_VID_TIME','log_TOTAL_MEALS_ORDERED','log_UNIQUE_MEALS_PURCH','log_CONTACTS_W_CUSTOMER_SERVICE','log_LARGEST_ORDER_SIZE','log_MEDIAN_MEAL_RATING','log_AVG_CLICKS_PER_VISIT','Total_cancellations','has_TOTAL_PHOTOS_VIEWED','has_WEEKLY_PLAN','has_CANCELLATIONS','has_MASTER_CLASSES_ATTENDED'], axis = 1)

In [None]:
# Developing a correlation matrix
df_corr = chef_df.corr(method = "pearson").round(decimals = 2)

# Filtering the results to only show correlations with CROSS_SELL_SUCCESS
df_corr['CROSS_SELL_SUCCESS'].sort_values(ascending = False)

In [None]:
# Declaring explanatory variables
chef_df_data = chef_df.drop('CROSS_SELL_SUCCESS' , axis = 1)

# Declaring response variable
chef_df_response = chef_df.loc[ :  ,'CROSS_SELL_SUCCESS']

In [None]:
# Train-Test split with stratification
x_train, x_test, y_train, y_test = train_test_split(
            chef_df_data,
            chef_df_response,
            test_size    = 0.25,
            random_state = 219,
            stratify     = chef_df_response)


# Merging training data for statsmodels
chef_df_train = pd.concat([x_train, y_train], axis = 1)

In [None]:
# Creating a dictionary to store candidate models
candidate_dict = {
    'logit_good'   : [ 'TOTAL_MEALS_ORDERED',
                   'CONTACTS_W_CUSTOMER_SERVICE', 'PRODUCT_CATEGORIES_VIEWED', 
                   'AVG_TIME_PER_SITE_VISIT', 'MOBILE_NUMBER', 'CANCELLATIONS_BEFORE_NOON',
                   'CANCELLATIONS_AFTER_NOON', 'TASTES_AND_PREFERENCES', 'PC_LOGINS', 
                   'MOBILE_LOGINS', 'WEEKLY_PLAN', 'EARLY_DELIVERIES', 'LATE_DELIVERIES',
                   'PACKAGE_LOCKER', 'REFRIGERATED_LOCKER', 'AVG_PREP_VID_TIME',
                   'LARGEST_ORDER_SIZE', 'MASTER_CLASSES_ATTENDED',
                   'MEDIAN_MEAL_RATING', 'AVG_CLICKS_PER_VISIT','TOTAL_PHOTOS_VIEWED', 
                   'number_of_names', 'email_junk', 'email_professional', 'female']    
}

# train/test split using only the features in the candidate model
chef_df_data   =  chef_df.loc[ : , candidate_dict['logit_good']]
chef_df_target =  chef_df.loc[ : , 'CROSS_SELL_SUCCESS']

x_train, x_test, y_train, y_test = train_test_split(
            chef_df_data,
            chef_df_target,
            random_state = 219,
            test_size    = 0.25,
            stratify     = chef_df_target)

In [None]:
# building a model based on hyperparameter tuning results

# copy/pasting in the best_estimator_ results to avoid running another RandomizedSearch
forest_tuned = RandomForestClassifier(bootstrap=False, min_samples_leaf=11, random_state=219,
                       warm_start=True)


# FITTING the model object on the training data only, so the testing set stays unseen
forest_tuned_fit = forest_tuned.fit(x_train, y_train)


# PREDICTING based on the testing set
forest_tuned_pred = forest_tuned_fit.predict(x_test)


# SCORING the results
print('Forest Tuned Training ACCURACY:', forest_tuned.score(x_train, y_train).round(4))
print('Forest Tuned Testing  ACCURACY:', forest_tuned.score(x_test, y_test).round(4))
print('Forest Tuned AUC Score        :', roc_auc_score(y_true  = y_test,
                                                       y_score = forest_tuned_pred).round(decimals = 3))


# saving scoring data for future use
forest_tuned_train_score = forest_tuned.score(x_train, y_train).round(4) # accuracy
forest_tuned_test_score  = forest_tuned.score(x_test, y_test).round(4)   # accuracy


# saving the AUC score
forest_tuned_auc = roc_auc_score(y_true  = y_test,
                                 y_score = forest_tuned_pred).round(decimals = 3) # auc
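
`roc_auc_score` accepts either hard class labels or continuous scores as `y_score`. Passing 0/1 predictions, as in the cell above, evaluates a single classification threshold, whereas passing the positive-class probabilities from `predict_proba` evaluates the model's full ranking of observations. The sketch below compares the two on synthetic data from `make_classification` (an assumption; this is not the Apprentice Chef data, so the numbers say nothing about the reported AUC):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption, for illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=219)

x_tr, x_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=219, stratify=y)

# Fitting on the training split only
rf = RandomForestClassifier(min_samples_leaf=11, random_state=219)
rf.fit(x_tr, y_tr)

# AUC from hard 0/1 labels (one threshold)
auc_labels = roc_auc_score(y_te, rf.predict(x_te))

# AUC from positive-class probabilities (all thresholds)
auc_proba = roc_auc_score(y_te, rf.predict_proba(x_te)[:, 1])

print('AUC from labels       :', round(auc_labels, 3))
print('AUC from probabilities:', round(auc_proba, 3))
```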

In [None]:
# unpacking the confusion matrix
tuned_rf_tn, \
tuned_rf_fp, \
tuned_rf_fn, \
tuned_rf_tp = confusion_matrix(y_true = y_test, y_pred = forest_tuned_pred).ravel()


# printing each result one-by-one
print(f"""
True Negatives : {tuned_rf_tn}
False Positives: {tuned_rf_fp}
False Negatives: {tuned_rf_fn}
True Positives : {tuned_rf_tp}
""")