# Management Report
The dataset was split into training and testing sets to ensure the best fit quality.
 
## Revenue 
The factors in the figure below are statistically related to revenue, and together they explain 71% of the variance in revenue. A 1% change in a factor results in a corresponding percentage change in revenue. The K-Nearest Neighbors (KNN) algorithm has the best test score, 0.780. It looks at customers with similar factors and uses their revenue to make its estimate.

![revenue](https://static.wixstatic.com/media/3fe52d_93bf69a7a0844dbfb390f10cacb03a2d~mv2.png/v1/fill/w_1200,h_562,al_c,q_90,usm_0.66_1.00_0.01/3fe52d_93bf69a7a0844dbfb390f10cacb03a2d~mv2.webp)
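
One way to read the 1% statement, assuming revenue and the factors were both log-transformed before fitting (the notebook creates `_log` columns, but the model formula is not shown in this report): in a log-log model, the coefficient of a factor is roughly the percentage change in revenue for a 1% change in that factor. The symbols $x$, $\beta_0$, and $\beta_1$ are illustrative and not taken from the fitted model.

$$
\log_{10}(\text{revenue}) = \beta_0 + \beta_1 \log_{10}(x)
\quad\Longrightarrow\quad
\frac{\text{revenue}_{\text{new}}}{\text{revenue}_{\text{old}}} = 1.01^{\beta_1} \approx 1 + 0.01\,\beta_1
\quad \text{when } x \to 1.01\,x
$$

For example, a hypothetical coefficient of $\beta_1 = 0.5$ would mean that a 1% increase in the factor goes with about a 0.5% increase in revenue.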

### Insights

1. Promotional deals such as discounts, limited-time offers, and free delivery above a minimum spending amount can push more meals per order, as these factors influence revenue the most.
2. A high ranking (satisfaction) influences spending behavior. Reaching out to customers who gave a low ranking, for example with personalized coupons or discounts, can maximize retention. It is also important to discover why the ranking is low so that the service and offerings can be improved.
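
To make the KNN result in this section more concrete, here is a minimal pure-Python sketch of nearest-neighbor regression together with an R²-style score. It is not the notebook's model: the notebook imports scikit-learn's `KNeighborsRegressor`, while the helper names `knn_predict` and `r_squared` and all numbers below are made up for illustration. Whether the reported 0.780 is exactly this kind of score is an assumption.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def knn_predict(train_X, train_y, x, k=3):
    """Predict a target as the mean target of the k nearest training rows."""
    # Sort training rows by their distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    return sum(target for _, target in nearest) / k


def r_squared(y_true, y_pred):
    """Share of the variance in y_true that y_pred explains (1.0 is perfect)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (made up): [total_meals_ordered, avg_clicks_per_visit] -> revenue
train_X = [[10, 12], [12, 11], [50, 14], [55, 15], [100, 9], [110, 8]]
train_y = [500.0, 550.0, 1500.0, 1600.0, 3000.0, 3200.0]
test_X = [[11, 12], [52, 14], [105, 9]]
test_y = [520.0, 1550.0, 3100.0]

# Score the model only on rows it has not seen during "training".
test_pred = [knn_predict(train_X, train_y, row, k=2) for row in test_X]
test_score = r_squared(test_y, test_pred)
print(test_pred, round(test_score, 3))
```

Because the prediction depends on distances, features on very different scales would normally be standardized first; the notebook imports `StandardScaler`, which is commonly used for that.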

## Cross-Sell Success Customers
Different models were used to calculate the probability of a characteristic occurring. Based on these probabilities and on feature importance, the characteristics in the figure below were chosen. The Gradient Boosted Machines (GBM) algorithm has an AUC score of 0.826, showing that it can distinguish customers who are likely to use the Halfway promotion from those who are not.

![cross_sell_success](https://static.wixstatic.com/media/3fe52d_8a3bd385e0514ff582b3415a810ff568~mv2.png/v1/fill/w_1200,h_728,al_c,q_90,usm_0.66_1.00_0.01/3fe52d_8a3bd385e0514ff582b3415a810ff568~mv2.webp)
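
An AUC of 0.826 can be read as a ranking statement: given one randomly chosen customer who used the promotion and one who did not, the model gives the first a higher probability about 83% of the time. The notebook imports scikit-learn's `roc_auc_score`; the function `auc_score` below is a hypothetical pure-Python stand-in that computes the same quantity by brute force, on made-up labels and probabilities.

```python
def auc_score(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs that are ranked correctly.

    Ties in predicted probability count as half a correct pair.
    """
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up labels (1 = used the promotion) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.5]

auc = auc_score(y_true, y_prob)
print(round(auc, 3))
```

A score of 0.5 would correspond to random ranking and 1.0 to a perfect separation of the two groups.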

### Insights
These customers are female, use their personal email, and give 3–4-star rankings. They received early deliveries, have lockers, and, on average, order more meals, which could indicate that they are open to receiving offers. They view a fair amount of content before making decisions (clicks and photos viewed). They have also attended master classes previously.

1. It is important to create email promotions that are oriented towards women, with great visual content and details, specifying a relaxed process due to early wine deliveries, stored in their locker on Wednesdays. 
2. Inform master class participants about this promotion. Showing them the benefits and details, together with a sign-up option, creates a cross-sell opportunity.

In [None]:
# Dataset 

### PACKAGES, FILE, CHANGES AND SHOWING DATA 

#1. IMPORTANT PACKAGES
import pandas as pd                                    # essential datascience
import matplotlib.pyplot as plt                        # data visualization
import numpy as np                                     # mathematics
import seaborn as sns                                  # enhanced graphics
import sklearn.linear_model                            # Different models
import statsmodels.formula.api as smf                  # statsmodels
from sklearn.model_selection import train_test_split   # train/test split
from sklearn.preprocessing import StandardScaler       # standard scaler
from sklearn.metrics import confusion_matrix           # confusion matrix
from sklearn.metrics import roc_auc_score              # auc score
from sklearn.neighbors import KNeighborsClassifier     # KNN Classification
from sklearn.neighbors import KNeighborsRegressor      # KNN regressions
from sklearn.tree import DecisionTreeClassifier        # classification trees
from sklearn.tree import export_graphviz               # exports graphics
from io import StringIO                                # objects in memory
from IPython.display import Image                      # displays on frontend
import pydotplus                                       # Graphviz’s Interface
from sklearn.linear_model import LogisticRegression    # logistic regression
from sklearn.model_selection import RandomizedSearchCV # hyperparameter tuning
from sklearn.metrics import make_scorer                # customizable scorer
from sklearn.ensemble import RandomForestClassifier    # random forest
from sklearn.ensemble import GradientBoostingClassifier # gbm


# pip install gender_guesser (remove # if you need to install it)
import gender_guesser.detector as gender # guess gender based on (given) name

# setting pandas print options (rows, columns, and display width)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', None)   # None shows all columns
pd.set_option('display.width', 1000)


## 2. IMPORTING THE DATASET
# Specifying the path and file name
file = './datasets/Apprentice_Chef_Dataset.xlsx'

# Reading the file into Python
ap_customers = pd.read_excel(io=file)

## 3. CHANGING MISLABELED COLUMN NAME AND COLUMN PRESENTATION
# Changing the name of largest_order_size to average_meals_ordered
ap_customers.rename(columns={'LARGEST_ORDER_SIZE':'AVERAGE_MEALS_ORDERED'}, 
                    inplace=True)

# Changing the capitalized columns to lowercase (personal preference)
ap_customers.columns = ap_customers.columns.str.lower()

## 4. SHOWING THE DATAFRAME
# Checking if the data was imported and changed correctly 
#ap_customers.head(n = 5)

##############################################################################
## PLACEHOLDERS

# Creating a placeholder for delivery variables
delivery_variables = ['early_deliveries', 'late_deliveries']

# Creating a placeholder for cancellation variables
cancellation_variables = ['cancellations_before_noon',
                          'cancellations_after_noon']

# Creating a placeholder for behavior variables
behavior_variables = ['pc_logins', 'mobile_logins', 'avg_clicks_per_visit',
                      'product_categories_viewed', 'total_photos_viewed',
                      'median_meal_rating']

# Creating a placeholder for purchase variables
purchase_variables = ['unique_meals_purch', 'average_meals_ordered',
                      'weekly_plan', 'total_meals_ordered']

# Creating a placeholder for service variables
service_variables = ['contacts_w_customer_service', 'master_classes_attended']

# Creating a placeholder for categorical variables
categorical_variables = ['name','email', 'first_name','family_name','mobile_number','cross_sell_success','tastes_and_preferences','package_locker', 'refrigerated_locker']
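
The feature-engineering cell below derives its columns row by row with loops. As a rough sketch of the same ideas in vectorized pandas, the block here works on a made-up three-row frame. The frame `toy`, the shortened lookup `domain_to_occasion`, and the sample emails are hypothetical; only the column names are borrowed from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows that reuse the notebook's column names
toy = pd.DataFrame({
    'email': ['ann@gmail.com', 'bob@ibm.com', 'cy@unknown.org'],
    'early_deliveries': [0, 2, 5],
    'avg_time_per_site_visit': [45.0, 60.0, 120.5],
})

# Shortened, illustrative domain lookup (the notebook lists many more domains)
domain_to_occasion = {'gmail.com': 'personal', 'ibm.com': 'work'}

# Everything after the '@' is the domain; unmapped domains become 'other'
domain = toy['email'].str.split('@').str[1]
toy['occasion'] = domain.map(domain_to_occasion).fillna('other')

# Binary flags from comparisons instead of row-by-row loops
toy['has_early_deliveries'] = (toy['early_deliveries'] > 0).astype(int)
toy['length_time_spent_website'] = (
    toy['avg_time_per_site_visit'] > 60).astype(int)

# log10 is undefined at 0, so zeros are replaced by 0.01 before transforming
early_as_float = toy['early_deliveries'].astype(float).replace(0.0, 0.01)
toy['early_deliveries_log'] = np.log10(early_as_float)

print(toy)
```

The comparisons produce the same 0/1 flags as the loops, and replacing 0 with 0.01 mirrors the notebook's choice for making the log transformation defined.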


In [None]:
# K-Nearest Neighbors (KNN) Regression (revenue)
##############################################################################
## COPY DATASET

# Make a copy
ap_customer_2 = ap_customers.copy()

##############################################################################
## Count the number of 0 values in each variable
early_deliveries_no   = ap_customers['early_deliveries'].isin([0]).sum() 
late_deliveries_no = ap_customers['late_deliveries'].isin([0]).sum()
cancellations_before_noon_no = ap_customers['cancellations_before_noon'].isin([0]).sum() 
cancellations_after_noon_no = ap_customers['cancellations_after_noon'].isin([0]).sum()
total_photos_viewed_no = ap_customers['total_photos_viewed'].isin([0]).sum() 
product_categories_viewed_no = ap_customers['product_categories_viewed'].isin([0]).sum()
pc_logins_no = ap_customers['pc_logins'].isin([0]).sum()
mobile_logins_no = ap_customers['mobile_logins'].isin([0]).sum()
avg_clicks_per_visit_no = ap_customers['avg_clicks_per_visit'].isin([0]).sum()
total_meals_ordered_no = ap_customers['total_meals_ordered'].isin([0]).sum()
average_meals_ordered_no = ap_customers['average_meals_ordered'].isin([0]).sum()
weekly_plan_no = ap_customers['weekly_plan'].isin([0]).sum()
unique_meals_purch_no = ap_customers['unique_meals_purch'].isin([0]).sum()
master_classes_attended_no = ap_customers['master_classes_attended'].isin([0]).sum()
contacts_w_customer_service_no = ap_customers['contacts_w_customer_service'].isin([0]).sum()
median_meal_rating_no = ap_customers['median_meal_rating'].isin([0]).sum()


##############################################################################
# FEATURE ENGINEERING

##########################################
#Variable: total cancellations
ap_customer_2['total_cancellations'] = ap_customer_2['cancellations_before_noon'] + ap_customer_2['cancellations_after_noon']

##########################################
# Variable: Occasion
# STEP 1: splitting email addresses into user name and domain

# placeholder list
placeholder_lst = []

# looping over each email address
for email in ap_customer_2['email']:

    # splitting the email address at '@' into [user name, domain]
    split_email = email.split(sep = '@')

    # appending placeholder_lst with the results
    placeholder_lst.append(split_email)


# converting placeholder_lst into a DataFrame
# (column 0: user name, column 1: domain)
email_df = pd.DataFrame(placeholder_lst)

# Creating a new list
placeholder_lst2 = []

# defining which email domains belong to professional
professional = ['mmm.com','amex.com','apple.com','boeing.com','caterpillar.com',\
                'chevron.com','cisco.com','cocacola.com','disney.com','dupont.com',\
                'exxon.com','ge.org','goldmansacs.com','homedepot.com','ibm.com',\
                'intel.com','jnj.com','jpmorgan.com','mcdonalds.com','merck.com',\
                'microsoft.com','nike.com','pfizer.com','pg.com','travelers.com',\
                'unitedtech.com','unitedhealth.com','verizon.com','visa.com',\
                'walmart.com']

#defining which emails belong to personal
personal = ['gmail.com','yahoo.com','protonmail.com']

# loop over each variable to determine which category it is and put it in
# a list
for row in email_df[1]:
    if row in professional: 
        placeholder_lst2.append('work')
    elif row in personal:
        placeholder_lst2.append('personal')
    else: 
        placeholder_lst2.append('other')

# Adding a new column to the dataframe 
ap_customer_2['occasion'] = placeholder_lst2  

# Setting a placeholder
occasion = ['occasion']

# Getting the dummies for each of the occasions. 
occasion_dummies = pd.get_dummies(ap_customer_2['occasion'])

# Dropping the original column
ap_customer_2 = ap_customer_2.drop(columns= occasion)

# Adding the dummies to the feature engineering dataset
ap_customer_2 = ap_customer_2.join([occasion_dummies])

##########################################
# Variable: total_orders (total meals ordered divided by average meals per order)
ap_customer_2['total_orders'] = round(ap_customer_2['total_meals_ordered'] / ap_customer_2['average_meals_ordered'],2)

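##########################################
# Variable: gender (guessed from first names)
# The array below appears to hold stored gender_guesser output, one entry per row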
gender_guess = np.array(['unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'mostly_male', 'female', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'female', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'female', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'andy', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'mostly_male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'mostly_male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 
'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'mostly_male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'mostly_male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'andy', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 
'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'andy', 'male', 'unknown', 'unknown', 'male', 'male', 'female', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'female', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'mostly_female', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 
'female', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'mostly_female', 'unknown', 'male', 'unknown', 'female', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'female', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 
'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'male', 'mostly_male', 'male', 'male', 'male', 'male', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'andy', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 
'female', 'unknown', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'female', 'male', 'male', 'female', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'andy', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'andy', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 
'unknown', 'unknown', 'mostly_male', 'male', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'mostly_male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'male', 'unknown', 'mostly_female', 'male', 'unknown', 'unknown', 'female', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'mostly_male', 'mostly_male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'female', 'male', 'male', 'unknown', 
'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'mostly_male', 'unknown', 'unknown', 'male', 'andy', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'female', 'male', 'female', 'mostly_female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'mostly_female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'female', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 
'unknown', 'unknown', 'mostly_female', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'mostly_female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'mostly_female', 'female', 'female', 'male', 'male', 'male', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'unknown', 'andy', 'unknown', 'unknown', 'male', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'mostly_male', 'male', 'male', 'unknown', 'male', 'unknown', 'mostly_male', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 
'male', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'andy', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'mostly_male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'andy', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 
'female', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown'])
 
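# Collapsing the guesses into three groups: 'mostly_*' joins its main group and
# 'andy' (gender_guesser's label for androgynous names) is treated as unknown
gender_guess[gender_guess == 'mostly_male'] = 'male'
gender_guess[gender_guess == 'mostly_female'] = 'female'
gender_guess[gender_guess == 'andy'] = 'unknown'

# Adding the guesses as a new column (pd.Series aligns on the row index)
ap_customer_2['gender_guess'] = pd.Series(gender_guess)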

# Getting the dummies for each of the genders 
gender_dummies = pd.get_dummies(ap_customer_2['gender_guess'])

# Dropping the original column
ap_customer_2 = ap_customer_2.drop(columns= ['gender_guess'])

# Adding the dummies to the feature engineering dataset
ap_customer_2 = ap_customer_2.join([gender_dummies])

##########################################
# Creating a dataframe for all log values
log_variables = ap_customer_2.copy()

# Specifying which variables data types need to be changed to a float
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

# loop over datatypes to change the datatype
for c in log_variables.columns: 
    if log_variables[c].dtype in numerics:
        log_variables[c] = log_variables[c].astype(float)

# Dropping all the variables that do not need changing
log_variables = log_variables.drop(columns = categorical_variables) 

# Specifying which columns have 0 in the dataset.
value_change_columns = ["cancellations_before_noon","cancellations_after_noon","mobile_logins","weekly_plan","total_photos_viewed", "late_deliveries", "early_deliveries", "master_classes_attended", "total_cancellations"]

# Changing the 0 to 0.01 for a proper log transformation
log_variables[value_change_columns] = log_variables[value_change_columns].replace({0.0:0.01})

# Looping over all columns to apply a base-10 log transformation
for column in log_variables.columns:
    try:
        log_variables[column] = np.log10(log_variables[column])
    except (ValueError, AttributeError):
        pass

# Adding a '_log' suffix to each column to make clear which are transformations
log_variables.columns = [str(col) + '_log' for col in log_variables.columns]

# Adding these columns to the feature engineering dataframe
ap_customer_2 = pd.concat([ap_customer_2, log_variables],axis = 1)

##########################################
# dummy variable for spending time on the website:
# 1 if avg_time_per_site_visit is above 60, otherwise 0
ap_customer_2['length_time_spent_website'] = (
    ap_customer_2['avg_time_per_site_visit'] > 60).astype(int)
        
##########################################
#Delivery variables 

# dummy variables for having at least one early or late delivery
ap_customer_2['has_early_deliveries'] = (ap_customer_2['early_deliveries'] > 0).astype(int)
ap_customer_2['has_late_deliveries']  = (ap_customer_2['late_deliveries'] > 0).astype(int)

##########################################
#Cancellations variable

# Creating a dummy variable: 1 if the customer has at least one cancellation
ap_customer_2['has_cancellations'] = (ap_customer_2['total_cancellations'] > 0).astype(int)
        
##########################################
# Behavior Variables 

# Creating dummy variables: 1 if the customer has at least one mobile login
# or has viewed at least one photo
ap_customer_2['has_mobile_logins']       = (ap_customer_2['mobile_logins'] > 0).astype(int)
ap_customer_2['has_total_photos_viewed'] = (ap_customer_2['total_photos_viewed'] > 0).astype(int)

##########################################
# Purchase Variables 

# Creating a dummy variable: 1 if the customer has subscribed to a weekly plan
ap_customer_2['has_weekly_plan'] = (ap_customer_2['weekly_plan'] > 0).astype(int)
        

##########################################
# Service Variables 

# Creating a dummy variable: 1 if the customer attended at least one master class
ap_customer_2['has_master_classes_attended'] = (ap_customer_2['master_classes_attended'] > 0).astype(int)
        
##########################################     
# Splitting the rating into separate variables because of high correlation

# Using pd.get_dummies to get the different ratings
ratings = pd.get_dummies(ap_customer_2['median_meal_rating'])

# Naming the columns for each rating (assumes all five ratings occur in the data)
ratings.columns = ['one_star_rank', 'two_star_rank', 'three_star_rank', 'four_star_rank','five_star_rank'] 

# Dropping the original column
ap_customer_2 = ap_customer_2.drop(columns=['median_meal_rating'])

# Adding the newly created columns to the feature engineering dataset
ap_customer_2 = ap_customer_2.join([ratings])

##########################################
# dropping categorical variables after they've been encoded
categorical_variables2 = ['name', 'first_name','email','family_name']

# Dropping the columns that are not needed
ap_customer_2 = ap_customer_2.drop(columns= categorical_variables2)

# Listing the remaining columns that need to be dropped
drop_final = ['other_log','personal_log','work_log','five_star_rank']

# Dropping the log transformations of dummy variables and the baseline rating
ap_customer_2 = ap_customer_2.drop(columns=drop_final)

##############################################################################
# KNN on standardized data. Response variable: revenue_log

# SUBSETTING the feature engineering dataset
model_1_data = ap_customer_2[['avg_prep_vid_time', 'average_meals_ordered',
                              'total_orders', 'total_meals_ordered_log',
                              'unique_meals_purch_log', 'contacts_w_customer_service',
                              'master_classes_attended_log', 'total_photos_viewed_log',
                              'length_time_spent_website', 'one_star_rank',
                              'two_star_rank', 'four_star_rank']]
target1 = ap_customer_2.loc[ : , 'revenue_log']

# INSTANTIATING a StandardScaler() object
scaler = StandardScaler()

# FITTING the scaler with model_1_data
scaler.fit(model_1_data)


# TRANSFORMING our data after fit
X_scaled = scaler.transform(model_1_data)

# converting scaled data into a DataFrame
X_scaled_df = pd.DataFrame(X_scaled)

#New training data
X_train_STAND, X_test_STAND, y_train_STAND, y_test_STAND = train_test_split(
            X_scaled_df,
            target1,
            test_size = 0.25,
            random_state = 219)

# INSTANTIATING a model with the optimal number of neighbors
knn_stand = KNeighborsRegressor(algorithm = 'auto',
                   n_neighbors = 23)


# FITTING the model based on the training data
knn_stand_fit = knn_stand.fit(X_train_STAND, y_train_STAND)


# PREDICTING on new data
knn_stand_pred = knn_stand_fit.predict(X_test_STAND)


# SCORING the results and saving them for future use
# (built-in round() works whether score() returns a Python or a NumPy float)
knn_stand_score_train = round(knn_stand.score(X_train_STAND, y_train_STAND), 4)
knn_stand_score_test  = round(knn_stand.score(X_test_STAND, y_test_STAND), 4)

print('KNN Training Score:', knn_stand_score_train)
print('KNN Testing Score :', knn_stand_score_test)


# displaying and saving the gap between training and testing
knn_stand_test_gap = round(abs(knn_stand_score_train - knn_stand_score_test), 4)
print('KNN Train-Test Gap:', knn_stand_test_gap)
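
The KNN cell fits the scaler on all rows before the train/test split, so the test rows influence the scaling statistics. A minimal sketch of one alternative, assuming a scikit-learn `Pipeline` is acceptable here: the scaler is fitted on the training split only. The data below is synthetic and hypothetical (`X_demo`, `y_demo`), not the customer dataset, so the scores are not comparable to the ones reported in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical), NOT the customer dataset
rng = np.random.default_rng(219)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([0.5, -1.0, 0.25, 2.0]) + rng.normal(scale=0.1, size=200)

# Splitting first, so the test rows stay unseen during scaling
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=219)

# The pipeline fits StandardScaler on the training rows only and
# reuses those statistics when transforming the test rows
knn_pipeline = make_pipeline(StandardScaler(),
                             KNeighborsRegressor(n_neighbors=23))
knn_pipeline.fit(X_train, y_train)

# R-squared on each split
train_score = round(knn_pipeline.score(X_train, y_train), 4)
test_score = round(knn_pipeline.score(X_test, y_test), 4)

print('Pipeline Training Score:', train_score)
print('Pipeline Testing Score :', test_score)
```

With this layout, `knn_pipeline.predict` can be called on raw, unscaled feature rows, since the scaling step travels with the model.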


In [None]:
# Gradient Boosted Machines (Cross sell success) 

##############################################################################
## COPY DATASET

# Make a copy
ap_customer_2 = ap_customers.copy()

##############################################################################
## FEATURE ENGINEERING: 

#####################################
# OCCASION

# STEP 1: splitting personal emails

# placeholder list
placeholder_lst = []

# looping over each email address
for index, col in ap_customer_2.iterrows():
    
    # splitting email domain at '@'
    split_email = ap_customer_2.loc[index, 'email'].split(sep = '@')
    
    # appending placeholder_lst with the results
    placeholder_lst.append(split_email)
    

# converting placeholder_lst into a DataFrame to convert the email
email_df = pd.DataFrame(placeholder_lst)

# Creating a new list
placeholder_lst2 = []

# defining which email domains count as professional
# ('intel.com' and 'jnj.com' are separate domains)
professional = ['mmm.com','amex.com','apple.com','boeing.com','caterpillar.com',
                'chevron.com','cisco.com','cocacola.com','disney.com','dupont.com',
                'exxon.com','ge.org','goldmansacs.com','homedepot.com','ibm.com',
                'intel.com','jnj.com','jpmorgan.com','mcdonalds.com','merck.com',
                'microsoft.com','nike.com','pfizer.com','pg.com','travelers.com',
                'unitedtech.com','unitedhealth.com','verizon.com','visa.com',
                'walmart.com']

#defining which emails belong to personal
personal = ['gmail.com','yahoo.com','protonmail.com']

# loop over each variable to determine which category it is and put it in
# a list
for row in email_df[1]:
    if row in professional: 
        placeholder_lst2.append('work')
    elif row in personal:
        placeholder_lst2.append('personal')
    else: 
        placeholder_lst2.append('other')

# Adding a new column to the dataframe 
ap_customer_2['occasion'] = placeholder_lst2  

# Setting a placeholder
occasion = ['occasion']

# Getting the dummies for each of the occasions. 
occasion_dummies = pd.get_dummies(ap_customer_2['occasion'])


# Dropping the original column
ap_customer_2 = ap_customer_2.drop(columns=['occasion'])

# Adding the dummies to the feature engineering dataset
ap_customer_2['other']= occasion_dummies.loc[:,'other']
ap_customer_2['work']= occasion_dummies.loc[:,'work']
ap_customer_2['personal']= occasion_dummies.loc[:,'personal']


#####################################
#GENDER
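# guessing gender based on (given) name
def gender_guesser():
    # Instantiating the detector once instead of once per name
    detector = gender.Detector()
    # Setting a placeholder for the list
    placeholder_lst = []
    # looping to guess gender
    for name in ap_customer_2['first_name']:
        guess = detector.get_gender(name)
        print(guess)
        placeholder_lst.append(guess)
    # returning the guesses so they can be reused
    return placeholder_lst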

# Calling genderguesser to generate variables (remove # to run)
# gender_guesser()

# All variables copied from Gender Guesser
gender_guess = np.array(['unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'mostly_male', 'female', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'female', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'female', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'andy', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'mostly_male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'mostly_male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 
'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'mostly_male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'mostly_male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'andy', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 
'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'andy', 'male', 'unknown', 'unknown', 'male', 'male', 'female', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'female', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'mostly_female', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 
'female', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'mostly_female', 'unknown', 'male', 'unknown', 'female', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'female', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 
'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'male', 'mostly_male', 'male', 'male', 'male', 'male', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'andy', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 
'female', 'unknown', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'female', 'male', 'male', 'female', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'male', 'andy', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'andy', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 
'unknown', 'unknown', 'mostly_male', 'male', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'mostly_male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'male', 'unknown', 'mostly_female', 'male', 'unknown', 'unknown', 'female', 'male', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'female', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'mostly_male', 'mostly_male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'female', 'male', 'male', 'unknown', 
'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'mostly_male', 'unknown', 'unknown', 'male', 'andy', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'female', 'male', 'female', 'mostly_female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'mostly_female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'female', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 
'unknown', 'unknown', 'mostly_female', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'mostly_female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'mostly_female', 'female', 'female', 'male', 'male', 'male', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'female', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'unknown', 'andy', 'unknown', 'unknown', 'male', 'male', 'male', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'mostly_male', 'male', 'male', 'unknown', 'male', 'unknown', 'mostly_male', 'female', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 
'male', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'unknown', 'female', 'female', 'unknown', 'unknown', 'unknown', 'mostly_male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'male', 'male', 'unknown', 'female', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'mostly_female', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'andy', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'male', 'mostly_male', 'unknown', 'male', 'male', 'unknown', 'unknown', 'male', 'male', 'male', 'male', 'andy', 'unknown', 'unknown', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 
'female', 'female', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'unknown', 'male', 'unknown', 'unknown', 'female', 'unknown', 'unknown'])

# Changing some of the variables 
gender_guess[gender_guess == 'mostly_male'] = 'male'
gender_guess[gender_guess == 'mostly_female'] = 'female'
gender_guess[gender_guess == 'andy'] = 'unknown'

# Adding the genders to the dataset
ap_customer_2['gender_guess'] = pd.Series(gender_guess)

# Getting the dummies for each of the genders 
gender_dummies = pd.get_dummies(ap_customer_2['gender_guess'])

# Dropping the original column
ap_customer_2 = ap_customer_2.drop(columns= ['gender_guess'])

# Adding the dummies to the feature engineering dataset
ap_customer_2 = ap_customer_2.join([gender_dummies])

#####################################
# Creating a new variable: total order with a calculation of other variables
ap_customer_2['total_orders'] = round(ap_customer_2['total_meals_ordered'] / ap_customer_2['average_meals_ordered'],2)

#####################################
# Creating a new variable: total logins with a calculation of other variables
ap_customer_2['total_logins'] = (ap_customer_2['pc_logins'] + ap_customer_2['mobile_logins'])

#####################################
# AVERAGE TIME ON THE WEBSITE

# Creating a column: 1 if avg_time_per_site_visit is above 60, otherwise 0
ap_customer_2['length_time_spent_website'] = (
    ap_customer_2['avg_time_per_site_visit'] > 60).astype(int)
        
# Getting the dummies for the avg time spent on the website
length_website = pd.get_dummies(ap_customer_2['length_time_spent_website'])

# Changing the column names 
length_website.columns = ['below_60_min_website_time', 'above_60_min_website_time']

# adding the data to the dataset
ap_customer_2['below_60_min_website_time'] = length_website.loc[:, "below_60_min_website_time"]
ap_customer_2['above_60_min_website_time'] = length_website.loc[:, "above_60_min_website_time"]

# dropping original column (the result must be assigned back to take effect)
ap_customer_2 = ap_customer_2.drop(columns = ['length_time_spent_website'])

#####################################
# Delivery variables

# Flagging customers with at least one early delivery
ap_customer_2['has_early_deliveries'] = (ap_customer_2['early_deliveries'] > 0).astype(int)

# Flagging customers with at least one late delivery
ap_customer_2['has_late_deliveries']  = (ap_customer_2['late_deliveries'] > 0).astype(int)
        
#####################################
# Locker Variables
# Total lockers, plus a flag for customers with at least one locker
ap_customer_2['total_lockers'] = ap_customer_2['package_locker'] + ap_customer_2['refrigerated_locker']
ap_customer_2['locker']        = (ap_customer_2['total_lockers'] > 0).astype(int)
        
#####################################  
# CANCELLATIONS
# Total number of cancellations
ap_customer_2['total_cancellations'] = ap_customer_2['cancellations_before_noon'] + ap_customer_2['cancellations_after_noon']

# Flagging customers with at least one cancellation
ap_customer_2['has_cancellations'] = (ap_customer_2['total_cancellations'] > 0).astype(int)

#####################################
# Clicks: who averages more than 10 clicks per visit?

# Flagging customers whose average clicks per visit exceed 10
ap_customer_2['click_above_avg_10'] = (ap_customer_2['avg_clicks_per_visit'] > 10).astype(int)

##################################### 
# Median Ranking

# Flagging customers whose median meal rating is above 3 (1 = high, 0 = low)
ap_customer_2['high_low_ranking'] = (ap_customer_2['median_meal_rating'] > 3).astype(int)
        
#####################################         
# Video prep time above 60 min

# Flagging customers whose average prep video time exceeds 60
ap_customer_2['60_min_prep_time'] = (ap_customer_2['avg_prep_vid_time'] > 60).astype(int)
        
#####################################
# Has attended Masterclasses

# Flagging customers who attended at least one master class
ap_customer_2['has_master_class_attended'] = (ap_customer_2['master_classes_attended'] > 0).astype(int)

#####################################
# has early deliveries
# (has_early_deliveries was already created in the delivery section above,
#  so it does not need to be recomputed here)

#####################################
# Behavior Variables 

# Flagging customers with at least one mobile login
ap_customer_2['has_mobile_logins'] = (ap_customer_2['mobile_logins'] > 0).astype(int)

# Flagging customers who viewed at least one photo
ap_customer_2['has_total_photos_viewed'] = (ap_customer_2['total_photos_viewed'] > 0).astype(int)
        
#####################################
# Purchase Variables 

# Flagging customers who subscribed to the weekly plan at least once
ap_customer_2['has_weekly_plan'] = (ap_customer_2['weekly_plan'] > 0).astype(int)
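
#####################################
# ILLUSTRATIVE SKETCH (not part of the original analysis)
# The binary flags above all follow one pattern: 1 when a source column
# exceeds a threshold, otherwise 0. The hypothetical helper below
# (add_threshold_flag is an assumed name) shows one way to express that
# pattern once, without looping over rows.
import pandas as pd


def add_threshold_flag(df, source, target, threshold = 0):
    """
    Returns a copy of df with an integer column `target` that is 1 where
    df[source] > threshold and 0 otherwise (missing values become 0).
    """
    flagged = df.copy()
    flagged[target] = (flagged[source] > threshold).astype(int)
    return flagged

# Example usage (assumes the column exists in ap_customer_2):
# ap_customer_2 = add_threshold_flag(ap_customer_2, 'weekly_plan', 'has_weekly_plan')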
        
#####################################
# dropping categorical variables after they've been encoded
categorical_variables2 = ['name', 'first_name','email','family_name']

# Dropping the columns that are not needed
ap_customer_2 = ap_customer_2.drop(columns= categorical_variables2)

##############################################################################
# VARIABLE OPTIONS

options = {

 'option1'   : ['mobile_number','cancellations_before_noon','pc_logins', 'mobile_logins','refrigerated_locker','work','personal','other','female','male','unknown','tastes_and_preferences','contacts_w_customer_service','product_categories_viewed','has_early_deliveries'],

 'option2'   : ['contacts_w_customer_service','mobile_number', 'tastes_and_preferences','pc_logins','early_deliveries','total_cancellations','locker','60_min_prep_time','has_master_class_attended'], 

 'option3'   : ['mobile_number','cancellations_before_noon','total_logins','refrigerated_locker','personal','other','tastes_and_preferences','contacts_w_customer_service','product_categories_viewed','has_early_deliveries','female','male','total_orders','has_master_class_attended','60_min_prep_time','total_meals_ordered'],

 'option4'   : ['revenue', 'total_meals_ordered', 'unique_meals_purch', 'contacts_w_customer_service', 'product_categories_viewed', 'avg_time_per_site_visit', 'mobile_number', 'tastes_and_preferences', 'pc_logins', 'mobile_logins', 'weekly_plan', 'late_deliveries', 'average_meals_ordered','total_photos_viewed', 'personal', 'female', 'has_cancellations', 'locker', 'click_above_avg_10', 'high_low_ranking', '60_min_prep_time', 'has_master_class_attended', 'has_early_deliveries']  
}
##############################################################################
# CREATING PERFORMANCE OVERVIEW

# Converting the variable options into a list for display
values = options.values()
values_list = list(values)

# Creating an empty DataFrame with the result columns
model_performance = pd.DataFrame(columns = ['Model Name',
                                            'AUC Score',
                                            'Training Accuracy',
                                            'Testing Accuracy',
                                            'Confusion Matrix',
                                            'Variables'])

##############################################################################
# FUNCTION: DISPLAY TREE

# display_tree
########################################
def display_tree(tree, feature_df, height = 500, width = 800):
    """
    PARAMETERS
    ----------
    tree       : fitted tree model object
        fitted CART model to visualize
    feature_df : DataFrame
        DataFrame of explanatory features (used to generate labels)
    height     : int, default 500
        height in pixels to which to constrain image in html
    width      : int, default 800
        width in pixels to which to constrain image in html
    """

    # visualizing the tree
    dot_data = StringIO()

    
    # exporting tree to graphviz
    export_graphviz(decision_tree      = tree,
                    out_file           = dot_data,
                    filled             = True,
                    rounded            = True,
                    special_characters = True,
                    feature_names      = feature_df.columns)


    # declaring a graph object
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())


    # creating image
    img = Image(graph.create_png(),
                height = height,
                width  = width)
    
    return img

########################################
# plot_feature_importances
########################################
def plot_feature_importances(model, train, export = False):
    """
    Plots the importance of features from a CART model.
    
    PARAMETERS
    ----------
    model  : CART model
    train  : explanatory variable training data
    export : whether or not to export as a .png image, default False
    """
    
    # declaring the number of features to plot
    n_features = train.shape[1]
    
    # setting plot window
    fig, ax = plt.subplots(figsize=(12,9))
    
    # sorting the importances and reordering the labels to match,
    # so that each bar is labeled with its own feature
    sorted_idx   = np.argsort(model.feature_importances_)
    model_sorted = model.feature_importances_[sorted_idx]
    
    plt.barh(range(n_features), model_sorted, align='center')
    plt.yticks(np.arange(n_features), train.columns[sorted_idx])
    plt.xlabel("Feature importance")
    plt.ylabel("Feature")
    
    if export:
        plt.savefig('./analysis_images/Feature_Importance.png')
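
########################################
# ILLUSTRATIVE SKETCH (not part of the original analysis)
# A text alternative to the plot above: pair each feature name with its
# importance and sort from most to least important.
# rank_feature_importances is a hypothetical helper; it only assumes the
# model exposes feature_importances_, as fitted tree-based scikit-learn
# models do.
########################################
def rank_feature_importances(model, feature_names):
    """
    Returns a list of (feature, importance) tuples, highest importance first.
    """
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key = lambda pair: pair[1], reverse = True)

# Example usage (after the model has been fitted):
# rank_feature_importances(gbm_tuned, x_train.columns)[:5]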

##############################################################################
# TUNED GRADIENT BOOSTED MACHINE MODEL

#Setting variables
gbm_data =  ap_customer_2.loc[ : , options['option4']]
target = ap_customer_2.loc[ :,"cross_sell_success"]

# train-test split with stratification
x_train, x_test, y_train, y_test = train_test_split(
            gbm_data,
            target,
            test_size    = 0.25,
            random_state = 219,
            stratify     = target)

# declaring a hyperparameter space
# (np.arange is used directly; the pd.np alias was removed in pandas 2.0)
learn_space        = np.arange(0.1, 1.0, 0.2)
estimator_space    = np.arange(100, 300, 25)
depth_space        = np.arange(1, 8, 1)
warm_start_space   = [True, False]

# creating a hyperparameter grid
param_grid = {'learning_rate' : learn_space,
              'max_depth'     : depth_space,
              'n_estimators'  : estimator_space,
              'warm_start'     : warm_start_space}
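
# ILLUSTRATIVE SKETCH (not part of the original analysis)
# Counting the combinations in the grid shows how much of it a randomized
# search with a fixed number of iterations can cover.
# grid_size is a hypothetical helper name.
from math import prod


def grid_size(grid):
    """Number of distinct hyperparameter combinations in a grid dictionary."""
    return prod(len(values) for values in grid.values())

# Example usage:
# grid_size(param_grid)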


# INSTANTIATING the model object without hyperparameters
full_gbm_grid = GradientBoostingClassifier(random_state = 219)


# RandomizedSearchCV object (samples n_iter combinations from the grid)
# AUC is scored on hard class predictions, which is the make_scorer default
full_gbm_cv = RandomizedSearchCV(estimator     = full_gbm_grid,
                           param_distributions = param_grid,
                           cv                  = 3,
                           n_iter              = 100,
                           random_state        = 219,
                           n_jobs              = -1,
                           scoring             = make_scorer(roc_auc_score))


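# FITTING to the FULL DATASET (cross-validation handles the internal splits)
# NOTE: the best estimator is refit on all of this data, including x_test,
# so the test-set metrics computed below are optimistic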
full_gbm_cv.fit(gbm_data,target)


# PREDICT step is not needed


# printing the optimal parameters and best score
print("Tuned best estimators:")
print("Tuned Parameters  :", full_gbm_cv.best_params_)
print("Tuned Training AUC:", full_gbm_cv.best_score_.round(4))


# INSTANTIATING with best_estimator
gbm_tuned = full_gbm_cv.best_estimator_


# FIT step not needed


# PREDICTING based on the testing set
gbm_tuned_pred = gbm_tuned.predict(x_test)


# scoring the tuned model
# (built-in round() works for both Python floats and NumPy floats)
gbm_tuned_training_score = round(gbm_tuned.score(x_train, y_train), 4)
gbm_tuned_testing_score  = round(gbm_tuned.score(x_test, y_test), 4)
gbm_tuned_auc_score      = round(roc_auc_score(y_true = y_test, y_score = gbm_tuned_pred), 4)

# unpacking the confusion matrix
gbm_tuned_tn, \
gbm_tuned_fp, \
gbm_tuned_fn, \
gbm_tuned_tp = confusion_matrix(y_true = y_test, y_pred = gbm_tuned_pred).ravel()
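
# ILLUSTRATIVE SKETCH (not part of the original analysis)
# The four confusion-matrix counts can be turned into rates that are easier
# to compare across models. classification_rates is a hypothetical helper.
def classification_rates(tn, fp, fn, tp):
    """
    Returns precision, recall (sensitivity), and specificity as a dictionary.
    A rate is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {'precision'   : ratio(tp, tp + fp),
            'recall'      : ratio(tp, tp + fn),
            'specificity' : ratio(tn, tn + fp)}

# Example usage:
# classification_rates(gbm_tuned_tn, gbm_tuned_fp, gbm_tuned_fn, gbm_tuned_tp)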


##############################################################################
# DataFrame.append was removed in pandas 2.0, so the new row is added with pd.concat
gbm_tuned_row = pd.DataFrame([{'Model Name'         : 'Tuned GBM FINAL',
                               'Training Accuracy'  : gbm_tuned_training_score,
                               'Testing Accuracy'   : gbm_tuned_testing_score,
                               'AUC Score'          : gbm_tuned_auc_score,
                               'Confusion Matrix'   : (gbm_tuned_tn,
                                                       gbm_tuned_fp,
                                                       gbm_tuned_fn,
                                                       gbm_tuned_tp),
                               'Variables'          : values_list[3]}])

model_performance = pd.concat([model_performance, gbm_tuned_row],
                              ignore_index = True)



##############################################################################
#PLOT IMPORTANT FEATURES (remove # to show)
#plot_feature_importances(gbm_tuned,train  = x_train, export = False)

##############################################################################
# SHOW MODEL PERFORMANCE
print(f""" 
Model Performance:""")
model_performance