# Assignment 8: Automated Machine Learning (Part 2)
## Objective:

As we learned in class, the high demand for machine learning has produced a large number of data scientists with expertise in tools and algorithms. The features in the data directly influence a model's results. However, designing and selecting features manually is tedious and does not scale, especially without domain knowledge. AutoML techniques can therefore save data scientists significant labour and time. 
After completing this assignment, you should be able to answer the following questions:

1. Why do we need AutoML?
2. How does auto feature generation work?
3. How do you use the featuretools library to automatically generate features?
4. How do you find useful features in a large feature space?

Imagine you are a data scientist at an online retailer, for example Amazon. Your task is to provide recommendations to customers based on their historical purchase records.

In this assignment, we predict whether a customer will buy **Banana** in the next 4 weeks. It is a classification problem. To simplify the problem, we have already generated some features and report the performance of a model trained on them (a Random Forest). Your task is to generate **10** useful features and beat our model's performance (AUC = 0.61, see below). 

For example, <br>
`MODE(orders.MODE(order_products.product_name)) = Bag of Organic Bananas` means whether the most frequent purchase of the customer is Bag of Organic Bananas. 

```
1: Feature: MODE(orders.MODE(order_products.product_name)) = Bag of Organic Bananas
2: Feature: MODE(order_products.aisle_id) is unknown
3: Feature: SUM(orders.NUM_UNIQUE(order_products.product_name))
4: Feature: MODE(orders.MODE(order_products.product_name)) = Boneless Skinless Chicken Breasts
5: Feature: MODE(order_products.product_name) = Boneless Skinless Chicken Breasts
6: Feature: STD(orders.NUM_UNIQUE(order_products.aisle_id))
7: Feature: MODE(order_products.aisle_id) = 83
8: Feature: MEDIAN(orders.MINUTE(order_time))
9: Feature: MODE(orders.DAY(order_time)) = 23
10: Feature: MODE(orders.MODE(order_products.department)) = produce

AUC 0.61
```
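
To make the nested notation concrete, here is a minimal pure-Python sketch of how a feature such as `MODE(orders.MODE(order_products.product_name))` could be computed by hand. The toy purchase records and the helper names are made up for illustration; featuretools computes this internally and may break ties differently.

```python
from collections import Counter

# Toy data (hypothetical): user -> list of orders, each order a list of product names.
purchases = {
    "u1": [["Banana", "Banana", "Milk"], ["Bread", "Bread", "Banana"], ["Banana", "Banana"]],
    "u2": [["Milk", "Milk", "Eggs"], ["Milk", "Milk"]],
}

def mode(values):
    """Return the most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]

def mode_of_order_modes(orders):
    # Inner step: MODE(order_products.product_name), one value per order.
    per_order = [mode(products) for products in orders]
    # Outer step: MODE(orders.<inner feature>), one value per user.
    return mode(per_order)

features = {user: mode_of_order_modes(orders) for user, orders in purchases.items()}
# The "= Banana" suffix in a feature name is a comparison against that category.
is_banana = {user: value == "Banana" for user, value in features.items()}
print(features)
print(is_banana)
```

Here the inner aggregation runs once per order and the outer aggregation once per customer, which is the stacking that the nested parentheses describe.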


## Preliminary
If you have never used featuretools before, you will need to learn the basics first. 
These are some good resources: 
* [featuretools documentation](https://docs.featuretools.com/en/stable/)
* [Tutorial: Automated Feature Engineering in Python](https://towardsdatascience.com/automated-feature-engineering-in-python-99baf11cc219)

The data can be downloaded from [A8-2-data.zip](A8-2-data.zip). 

## 0. Preparation
Import relevant libraries and load the dataset: <br>
users: <br>
* user_id: customer identifier
* label:  1 if the customer will buy a banana in the next 4 weeks, 0 otherwise

orders: <br>
* order_id: order identifier
* user_id: customer identifier
* order_time: date on which the order was placed

order_products: <br>
* order_id: order identifier
* order_product_id: identifier of the order-product row (used as the table index in Task 1.1)
* reordered:  1 if this product has been ordered by this user in the past, 0 otherwise
* product_name: name of the product
* aisle_id: aisle identifier
* department: the name of the department
* order_time: date on which the order was placed

In [114]:
import pandas as pd

import featuretools as ft
from woodwork.logical_types import Categorical, Boolean
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import os
ft.__version__

# list all rows and columns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

In [115]:
orders = pd.read_csv("orders.csv")
order_products = pd.read_csv("order_products.csv")
users = pd.read_csv("users.csv")

# could drop columns here, if needed
# Drop the first column from each dataframe
orders = orders.drop(orders.columns[0], axis=1)
order_products = order_products.drop(order_products.columns[0], axis=1)
users = users.drop(users.columns[0], axis=1)

print(users["label"].value_counts())
print(orders.shape, order_products.shape)
users.head()

label
False    628
True     139
Name: count, dtype: int64
(5997, 3) (57780, 7)


|   | user_id | label |
|---|---------|-------|
| 0 | 1 | False |
| 1 | 2 | True |
| 2 | 3 | False |
| 3 | 7 | False |
| 4 | 10 | False |


## Task 1. Feature Generation
In this task, you need to use featuretools to generate candidate features by using the above three tables.

### 1.1 Representing Data with EntitySet

Define entities and their relationships (see [https://docs.featuretools.com/en/stable/generated/featuretools.EntitySet.html](https://docs.featuretools.com/en/stable/generated/featuretools.EntitySet.html))

In [116]:
# Get the relationship between entities
def load_entityset(orders, order_products, users):
    # --- Write your code below ---
    
    # Create an EntitySet
    entitySet = ft.EntitySet(id = 'customers')

    # Create an entity from the users dataframe
    entitySet = entitySet.add_dataframe(dataframe_name = 'users',
                                        dataframe = users, 
                                        index = 'user_id',
                                        logical_types = {'label': Boolean})

    # Create an entity from the orders dataframe
    entitySet = entitySet.add_dataframe(dataframe_name = 'orders',
                                        dataframe = orders,
                                        index = 'order_id',
                                        time_index = 'order_time')

    # Create an entity from the order_products dataframe
    entitySet = entitySet.add_dataframe(dataframe_name = 'order_products',
                                        dataframe = order_products,
                                        index = 'order_product_id',
                                        time_index = 'order_time',
                                        logical_types = {'reordered': Categorical})
    
    # Create a relationship between users (parent) and orders (child) on 'user_id'
    entitySet = entitySet.add_relationship('users', 'user_id', 'orders', 'user_id')

    # Create a relationship between orders (parent) and order_products (child) on 'order_id'
    entitySet = entitySet.add_relationship('orders', 'order_id', 'order_products', 'order_id')

    # return the EntitySet object
    return entitySet

In [117]:
es = load_entityset(orders, order_products, users)
es



Entityset: customers
  DataFrames:
    users [Rows: 767, Columns: 2]
    orders [Rows: 5997, Columns: 3]
    order_products [Rows: 57780, Columns: 7]
  Relationships:
    orders.user_id -> users.user_id
    order_products.order_id -> orders.order_id
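
One sanity check worth running before relying on these relationships is that every foreign key in a child table points at an existing row in its parent table. The following is a minimal pure-Python sketch on made-up rows; the helper `orphaned_keys` and the toy tables are hypothetical and not part of featuretools.

```python
def orphaned_keys(parent_rows, parent_key, child_rows, child_key):
    """Return child foreign-key values that have no matching parent row."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return sorted({row[child_key] for row in child_rows} - parent_ids)

# Toy rows (hypothetical), mirroring the users -> orders -> order_products chain.
toy_users = [{"user_id": 1}, {"user_id": 2}]
toy_orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 2},
    {"order_id": 12, "user_id": 3},  # user 3 does not exist in toy_users
]
toy_order_products = [
    {"order_product_id": 100, "order_id": 10},
    {"order_product_id": 101, "order_id": 12},
]

missing_users = orphaned_keys(toy_users, "user_id", toy_orders, "user_id")
missing_orders = orphaned_keys(toy_orders, "order_id", toy_order_products, "order_id")
print(missing_users)   # orders that reference unknown users
print(missing_orders)  # order_products that reference unknown orders
```

An empty list means every child row can be attached to a parent, so aggregations over that relationship will not silently miss rows.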

### 1.2 Deep Feature Synthesis

In [118]:
# Automatically generate features
es = load_entityset(orders, order_products, users)

# use ft.dfs to perform feature engineering
# --- Write your code below ---
feature_matrix, feature_defs = ft.dfs(entityset = es, target_dataframe_name='users')



In [119]:
# output what features you generate
feature_matrix

Unnamed: 0_level_0,label,COUNT(orders),COUNT(order_products),MAX(order_products.aisle_id),MEAN(order_products.aisle_id),MIN(order_products.aisle_id),MODE(order_products.department),MODE(order_products.product_name),MODE(order_products.reordered),NUM_UNIQUE(order_products.department),NUM_UNIQUE(order_products.product_name),NUM_UNIQUE(order_products.reordered),SKEW(order_products.aisle_id),STD(order_products.aisle_id),SUM(order_products.aisle_id),MAX(orders.COUNT(order_products)),MAX(orders.MEAN(order_products.aisle_id)),MAX(orders.MIN(order_products.aisle_id)),MAX(orders.NUM_UNIQUE(order_products.department)),MAX(orders.NUM_UNIQUE(order_products.product_name)),MAX(orders.NUM_UNIQUE(order_products.reordered)),MAX(orders.SKEW(order_products.aisle_id)),MAX(orders.STD(order_products.aisle_id)),MAX(orders.SUM(order_products.aisle_id)),MEAN(orders.COUNT(order_products)),MEAN(orders.MAX(order_products.aisle_id)),MEAN(orders.MEAN(order_products.aisle_id)),MEAN(orders.MIN(order_products.aisle_id)),MEAN(orders.NUM_UNIQUE(order_products.department)),MEAN(orders.NUM_UNIQUE(order_products.product_name)),MEAN(orders.NUM_UNIQUE(order_products.reordered)),MEAN(orders.SKEW(order_products.aisle_id)),MEAN(orders.STD(order_products.aisle_id)),MEAN(orders.SUM(order_products.aisle_id)),MIN(orders.COUNT(order_products)),MIN(orders.MAX(order_products.aisle_id)),MIN(orders.MEAN(order_products.aisle_id)),MIN(orders.NUM_UNIQUE(order_products.department)),MIN(orders.NUM_UNIQUE(order_products.product_name)),MIN(orders.NUM_UNIQUE(order_products.reordered)),MIN(orders.SKEW(order_products.aisle_id)),MIN(orders.STD(order_products.aisle_id)),MIN(orders.SUM(order_products.aisle_id)),MODE(orders.DAY(order_time)),MODE(orders.MODE(order_products.department)),MODE(orders.MODE(order_products.product_name)),MODE(orders.MODE(order_products.reordered)),MODE(orders.MONTH(order_time)),MODE(orders.WEEKDAY(order_time)),MODE(orders.YEAR(order_time)),NUM_UNIQUE(orders.DAY(order_time)),NUM_UNIQUE(orders.MODE(order_p
roducts.department)),NUM_UNIQUE(orders.MODE(order_products.product_name)),NUM_UNIQUE(orders.MODE(order_products.reordered)),NUM_UNIQUE(orders.MONTH(order_time)),NUM_UNIQUE(orders.WEEKDAY(order_time)),NUM_UNIQUE(orders.YEAR(order_time)),SKEW(orders.COUNT(order_products)),SKEW(orders.MAX(order_products.aisle_id)),SKEW(orders.MEAN(order_products.aisle_id)),SKEW(orders.MIN(order_products.aisle_id)),SKEW(orders.NUM_UNIQUE(order_products.department)),SKEW(orders.NUM_UNIQUE(order_products.product_name)),SKEW(orders.NUM_UNIQUE(order_products.reordered)),SKEW(orders.STD(order_products.aisle_id)),SKEW(orders.SUM(order_products.aisle_id)),STD(orders.COUNT(order_products)),STD(orders.MAX(order_products.aisle_id)),STD(orders.MEAN(order_products.aisle_id)),STD(orders.MIN(order_products.aisle_id)),STD(orders.NUM_UNIQUE(order_products.department)),STD(orders.NUM_UNIQUE(order_products.product_name)),STD(orders.NUM_UNIQUE(order_products.reordered)),STD(orders.SKEW(order_products.aisle_id)),STD(orders.SUM(order_products.aisle_id)),SUM(orders.MAX(order_products.aisle_id)),SUM(orders.MEAN(order_products.aisle_id)),SUM(orders.MIN(order_products.aisle_id)),SUM(orders.NUM_UNIQUE(order_products.department)),SUM(orders.NUM_UNIQUE(order_products.product_name)),SUM(orders.NUM_UNIQUE(order_products.reordered)),SUM(orders.SKEW(order_products.aisle_id)),SUM(orders.STD(order_products.aisle_id))
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1
1,False,4,21,121.0,60.523810,21.0,snacks,Original Beef Jerky,1,7,10,2,0.339280,38.070486,1271.0,6.0,65.200000,23.0,4.0,6.0,2.0,0.713454,47.305038,385.0,5.250000,111.500000,60.341667,22.000000,4.000000,5.250000,1.500000,0.284635,40.106604,317.750000,5.0,91.0,53.600000,4.0,5.0,1.0,-0.050477,30.899838,268.0,1,snacks,Aged White Cheddar Popcorn,0,1,3,2015,4,1,3,2,3,4,1,2.000000,-1.885094,-0.606096,0.000000,0.000000,2.000000,0.000000,-0.832693,0.838651,0.500000,13.796135,5.399203,1.154701,0.000000,0.500000,0.577350,0.333166,50.756773,446.0,241.366667,88.0,16.0,21.0,6.0,1.138542,160.426417
2,True,7,85,123.0,63.752941,1.0,produce,Chipotle Beef & Pork Realstick,0,9,56,2,0.141339,39.596448,5419.0,21.0,81.095238,23.0,6.0,21.0,2.0,1.837844,47.337712,1703.0,12.142857,112.142857,58.698064,16.714286,5.000000,12.142857,1.857143,0.385028,37.321202,774.142857,5.0,96.0,33.000000,4.0,5.0,1.0,-0.441865,24.533598,165.0,1,produce,Baked Organic Sea Salt Crunchy Pea Snack,0,1,6,2015,5,3,6,2,3,4,1,0.161303,-0.630334,-0.306556,-1.222061,0.000000,0.161303,-2.645751,-0.674331,0.896376,5.367450,12.294017,15.911470,10.094789,0.816497,5.367450,0.377964,0.764827,501.812856,785.0,410.886447,117.0,35.0,85.0,13.0,2.695193,261.248415
3,False,5,41,123.0,69.048780,13.0,produce,Organic Baby Spinach,0,8,27,2,-0.029453,40.594305,2831.0,11.0,81.222222,24.0,5.0,11.0,2.0,0.627002,46.546393,731.0,8.200000,121.800000,69.434141,19.000000,4.000000,8.200000,1.800000,-0.018559,41.749900,566.200000,5.0,117.0,59.181818,3.0,5.0,1.0,-0.636786,38.266173,318.0,1,produce,100% Recycled Paper Towels,0,1,6,2015,5,3,5,2,3,4,1,-0.363269,-2.236068,0.329403,-0.410083,0.000000,-0.363269,-2.236068,0.762518,-0.800806,2.588436,2.683282,8.846253,5.612486,0.707107,2.588436,0.447214,0.569761,174.767560,609.0,347.170707,95.0,20.0,41.0,9.0,-0.092793,208.749499
7,False,4,73,123.0,65.493151,21.0,beverages,Original No Pulp 100% Florida Orange Juice,0,11,37,2,0.172950,35.379342,4781.0,24.0,71.761905,24.0,9.0,24.0,2.0,0.813504,39.097683,1523.0,18.250000,122.500000,65.596726,22.500000,7.750000,18.250000,1.750000,0.205753,35.747877,1195.250000,12.0,122.0,56.250000,6.0,12.0,1.0,-0.257210,32.945409,851.0,1,beverages,85% Lean Ground Beef,0,3,1,2015,3,1,2,2,3,4,1,-0.198134,0.000000,-0.768154,0.000000,-0.370370,-0.198134,-2.000000,0.480352,-0.013561,5.315073,0.577350,7.262810,1.732051,1.500000,5.315073,0.500000,0.456450,369.814706,490.0,262.386905,90.0,31.0,73.0,7.0,0.823014,142.991508
10,False,4,114,123.0,67.342105,5.0,produce,Asparagus,0,9,94,2,-0.142492,37.683787,7677.0,46.0,78.347826,16.0,9.0,46.0,2.0,0.565263,42.732528,3604.0,28.500000,113.000000,61.564229,12.750000,5.250000,28.500000,1.750000,0.064432,36.672337,1919.250000,5.0,83.0,46.000000,1.0,5.0,1.0,-0.743617,32.259515,230.0,1,produce,85% Lean Ground Beef,0,2,5,2015,3,1,4,1,2,3,1,-0.996725,-2.000000,0.273441,-1.949557,-0.436662,-0.996725,-2.000000,0.778005,-0.009675,17.136705,20.000000,13.247643,5.188127,3.304038,17.136705,0.500000,0.564522,1380.999970,452.0,246.256917,51.0,21.0,114.0,7.0,0.257729,146.689346
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
996,False,4,45,128.0,52.711111,3.0,snacks,Carob Chip,0,9,27,2,0.220281,44.964947,2372.0,14.0,62.307692,3.0,7.0,14.0,2.0,0.439234,58.759650,810.0,11.250000,114.250000,52.534959,3.000000,5.250000,11.250000,1.750000,0.107311,45.034157,593.000000,8.0,83.0,34.100000,4.0,8.0,1.0,-0.166578,31.575131,341.0,1,snacks,All Natural Maple Almond Butter,0,1,1,2015,4,2,3,2,2,4,1,-0.323231,-1.922522,-1.308125,0.000000,1.129338,-0.323231,-2.000000,0.071544,-0.295879,2.753785,20.966243,13.204164,0.000000,1.258306,2.753785,0.500000,0.249733,214.387500,457.0,210.139835,12.0,21.0,45.0,7.0,0.429243,180.136629
997,False,4,33,129.0,64.272727,9.0,produce,Banana,0,8,26,2,0.290248,40.314601,2121.0,13.0,75.600000,24.0,5.0,13.0,2.0,1.381983,43.575732,903.0,8.250000,106.500000,60.465385,19.500000,4.500000,8.250000,1.500000,0.365300,34.604820,530.250000,5.0,52.0,25.600000,4.0,5.0,1.0,-0.143472,15.978110,128.0,1,produce,85% Lean Ground Beef,0,1,3,2015,4,3,3,1,3,3,1,0.475483,-1.956203,-1.925845,-1.779179,0.000000,0.475483,0.000000,-1.850923,-0.175757,3.947573,36.464595,23.386708,7.141428,0.577350,3.947573,0.577350,0.707046,344.944802,426.0,241.861538,78.0,18.0,33.0,6.0,1.461201,138.419278
998,False,7,60,116.0,64.966667,4.0,dairy eggs,The Original Multi Grain Bread,1,9,19,2,-0.068441,37.087147,3898.0,14.0,104.000000,96.0,7.0,14.0,2.0,0.393527,42.547211,880.0,8.571429,113.714286,70.327221,22.000000,5.142857,8.571429,1.428571,-0.112493,34.158449,556.857143,2.0,112.0,53.111111,2.0,2.0,1.0,-0.576697,11.313708,208.0,6,dairy eggs,Bag of Organic Bananas,1,1,1,2015,6,2,3,2,3,3,1,-0.373967,0.374166,1.660520,2.348977,-0.841531,-0.373967,0.374166,-2.243009,0.033124,4.197505,2.138090,16.530226,33.600595,1.951800,4.197505,0.534522,0.338923,239.615267,796.0,492.290548,154.0,36.0,60.0,10.0,-0.674960,239.109143
999,True,12,300,131.0,73.936667,3.0,produce,Banana,1,14,118,2,-0.263885,40.019134,22181.0,38.0,85.956522,24.0,12.0,38.0,2.0,0.134978,45.905879,2786.0,25.000000,116.416667,70.794016,15.416667,7.500000,25.000000,1.750000,-0.304160,36.397978,1848.416667,2.0,24.0,24.000000,1.0,2.0,1.0,-0.715217,0.000000,48.0,1,produce,100% Raw Coconut Water,1,1,2,2015,11,2,3,2,3,6,1,-1.252784,-3.396875,-2.439152,-0.581739,-1.042593,-1.252784,-1.326650,-3.071364,-1.737998,9.302981,29.258902,16.234721,8.414904,2.645751,9.302981,0.452267,0.290169,677.683817,1397.0,849.528198,185.0,90.0,300.0,21.0,-3.345759,436.775733
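
The column names above follow a pattern: a primitive is applied to a column of a child table, and the result can itself be aggregated one level up. The sketch below enumerates such names in pure Python to show how quickly the candidate space grows with stacking depth. The primitive and column lists are a small illustrative subset, and the set of features featuretools actually generates also depends on column types and its own pruning rules, so some enumerated names will not appear in the real matrix.

```python
from itertools import product

agg_primitives = ["MAX", "MEAN", "MIN", "STD", "SUM"]  # illustrative subset
numeric_columns = ["aisle_id"]                          # numeric column of order_products

# Depth 1 (seen from orders): aggregate order_products columns per order.
depth1 = [f"{p}(order_products.{c})" for p, c in product(agg_primitives, numeric_columns)]

# Depth 2 (seen from users): aggregate each per-order feature per user.
depth2 = [f"{p}(orders.{inner})" for p, inner in product(agg_primitives, depth1)]

print(len(depth1), len(depth2))
print(depth2[:3])
```

With five primitives and a single column, one extra level of stacking already multiplies the number of candidate names by five, which is why a selection step is needed afterwards.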


## Task 2. Feature Selection
In this task, you will select 10 useful features and train a *Random Forest* model. The goal is to beat the performance shown earlier. Note that you must use the Random Forest and the hyperparameters we provide in Section 2.2. In other words, your job is to achieve an AUC higher than 0.61 through feature generation and selection rather than through hyperparameter tuning or model selection. 
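
Since the target metric is AUC, it helps to recall what it measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. An AUC of 0.5 corresponds to a random ranking. The following is a minimal pure-Python sketch on made-up labels and scores; in the notebook itself a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score_pos > score_neg), with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted scores (hypothetical).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.35, 0.1]
print(auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but not how library implementations compute it.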

### 2.1 Select top features

The first thing I will do is use featuretools' built-in functions to remove features with no variance and features that are highly correlated with each other.

In [120]:
import scipy.stats as stats
# --- Write your code below ---

# The first thing I will do is use featuretools' built-in functions to remove
# features with no variance and features that are highly correlated.

# Remove features without variance
new_fm, new_features = ft.selection.remove_single_value_features(feature_matrix, features=feature_defs)

# Remove features with high correlation
new_fm, new_features = ft.selection.remove_highly_correlated_features(new_fm, features=new_features, pct_corr_threshold=0.90)
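
For intuition, the two filters can be sketched in pure Python on a toy table: drop columns that take a single value, then greedily drop any column whose absolute Pearson correlation with an already kept column exceeds the threshold. This is an illustration only; the helpers and the toy table are hypothetical, and featuretools may choose differently which column of a correlated pair to drop.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def filter_columns(table, threshold=0.90):
    # Step 1: remove columns with only one distinct value.
    varied = {name: col for name, col in table.items() if len(set(col)) > 1}
    # Step 2: keep a column only if it is not too correlated with any kept column.
    kept = {}
    for name, col in varied.items():
        if all(abs(pearson(col, other)) <= threshold for other in kept.values()):
            kept[name] = col
    return list(kept)

toy_table = {
    "n_orders": [1, 2, 3, 4, 5],
    "n_items": [2, 4, 6, 8, 11],        # nearly proportional to n_orders
    "year": [2015] * 5,                 # single value, carries no information
    "mean_aisle": [60, 52, 71, 49, 66],
}
selected = filter_columns(toy_table)
print(selected)
```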



Next, I will decide whether to use Spearman's or Pearson's correlation.

In [121]:
# Decide whether to use spearman's or pearson's correlation
gaussian = 0
non_gaussian = 0

for column in new_fm.select_dtypes(include='number').columns:
    print(f"Analyzing {column}")

    # Statistical Test: Shapiro-Wilk Test
    stat, p_value = stats.shapiro(new_fm[column])
    print(f'Shapiro-Wilk Test for {column} - Statistic: {stat:.4f}, P-Value: {p_value:.4g}')
    
    # Interpretation
    alpha = 0.05
    if p_value > alpha:
        gaussian += 1
        print('Sample looks Gaussian (fail to reject H0)\n')
    else:
        non_gaussian += 1
        print('Sample does not look Gaussian (reject H0)\n')

print(f'Gaussian: {gaussian}, Non-Gaussian: {non_gaussian}')
print(f'Proportions: {gaussian/(gaussian+non_gaussian):.2f}, {non_gaussian/(gaussian+non_gaussian):.2f}')

Analyzing COUNT(orders)
Shapiro-Wilk Test for COUNT(orders) - Statistic: 0.7567, P-Value: 4.139e-32
Sample does not look Gaussian (reject H0)

Analyzing COUNT(order_products)
Shapiro-Wilk Test for COUNT(order_products) - Statistic: 0.8123, P-Value: 6.144e-29
Sample does not look Gaussian (reject H0)

Analyzing MAX(order_products.aisle_id)
Shapiro-Wilk Test for MAX(order_products.aisle_id) - Statistic: 0.5592, P-Value: 4.844e-40
Sample does not look Gaussian (reject H0)

Analyzing MEAN(order_products.aisle_id)
Shapiro-Wilk Test for MEAN(order_products.aisle_id) - Statistic: 0.9572, P-Value: 3.746e-14
Sample does not look Gaussian (reject H0)

Analyzing MIN(order_products.aisle_id)
Shapiro-Wilk Test for MIN(order_products.aisle_id) - Statistic: 0.6748, P-Value: 7.239e-36
Sample does not look Gaussian (reject H0)

Analyzing NUM_UNIQUE(order_products.department)
Shapiro-Wilk Test for NUM_UNIQUE(order_products.department) - Statistic: 0.9839, P-Value: 1.886e-07
Sample does not look Gaussian

From the above we can see that nearly all of the features are non-Gaussian. On inspection, the few that do appear Gaussian have a p-value of 1, which suggests a problem with the execution of the test itself rather than genuine normality. These results indicate that Pearson's correlation is not appropriate here, so I will use Spearman's rank correlation instead.
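
Spearman's rank correlation is Pearson's correlation computed on ranks, so it captures monotonic relationships without assuming normally distributed inputs. The following is a minimal pure-Python sketch on made-up values; the helper names are hypothetical, and in practice `scipy.stats.spearmanr` does the same job.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]  # monotonic but non-linear in xs
print(spearman(xs, ys))
print(pearson(xs, ys))
```

On this toy data the ranks of `xs` and `ys` agree exactly, so Spearman reports a perfect monotonic relationship while Pearson is pulled below 1 by the non-linearity.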

## Convert categorical variables via one-hot encoding ##
The first step I will take, before computing correlations, is to one-hot encode all of the categorical variables.

In [122]:
# View all non-numerical features
non_numerical_features = new_fm.select_dtypes(exclude='number').columns
non_numerical_features

# One-hot encode the non-numerical features
one_hot_fm = pd.get_dummies(new_fm, columns=non_numerical_features)
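
One-hot encoding replaces a categorical column with one indicator column per distinct value, so a high-cardinality column such as the most frequent product name expands into many sparse columns. The following is a minimal pure-Python sketch on made-up values; the helper `one_hot` is hypothetical, and its `column_value` naming mirrors the column names that `pd.get_dummies` produces.

```python
def one_hot(column_name, values):
    """Map each category to a list of 0/1 indicators, keyed as '<column>_<category>'."""
    categories = sorted(set(values))
    return {
        f"{column_name}_{cat}": [int(v == cat) for v in values]
        for cat in categories
    }

# Toy column (hypothetical): most frequent department for four customers.
departments = ["produce", "snacks", "produce", "beverages"]
encoded = one_hot("MODE(order_products.department)", departments)
for name, indicator in encoded.items():
    print(name, indicator)
```

Each row has exactly one indicator set to 1, and the number of new columns equals the number of distinct categories.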

In [123]:
one_hot_fm.head()

Unnamed: 0_level_0,COUNT(orders),COUNT(order_products),MAX(order_products.aisle_id),MEAN(order_products.aisle_id),MIN(order_products.aisle_id),NUM_UNIQUE(order_products.department),NUM_UNIQUE(order_products.product_name),NUM_UNIQUE(order_products.reordered),SKEW(order_products.aisle_id),STD(order_products.aisle_id),MAX(orders.COUNT(order_products)),MAX(orders.MEAN(order_products.aisle_id)),MAX(orders.MIN(order_products.aisle_id)),MAX(orders.NUM_UNIQUE(order_products.department)),MAX(orders.NUM_UNIQUE(order_products.reordered)),MAX(orders.SKEW(order_products.aisle_id)),MAX(orders.STD(order_products.aisle_id)),MEAN(orders.COUNT(order_products)),MEAN(orders.MAX(order_products.aisle_id)),MEAN(orders.MIN(order_products.aisle_id)),MEAN(orders.NUM_UNIQUE(order_products.reordered)),MEAN(orders.SKEW(order_products.aisle_id)),MEAN(orders.STD(order_products.aisle_id)),MIN(orders.COUNT(order_products)),MIN(orders.MAX(order_products.aisle_id)),MIN(orders.MEAN(order_products.aisle_id)),MIN(orders.NUM_UNIQUE(order_products.department)),MIN(orders.SKEW(order_products.aisle_id)),MIN(orders.STD(order_products.aisle_id)),NUM_UNIQUE(orders.MODE(order_products.department)),NUM_UNIQUE(orders.MODE(order_products.product_name)),NUM_UNIQUE(orders.MODE(order_products.reordered)),NUM_UNIQUE(orders.MONTH(order_time)),NUM_UNIQUE(orders.WEEKDAY(order_time)),SKEW(orders.COUNT(order_products)),SKEW(orders.MAX(order_products.aisle_id)),SKEW(orders.MEAN(order_products.aisle_id)),SKEW(orders.MIN(order_products.aisle_id)),SKEW(orders.NUM_UNIQUE(order_products.department)),SKEW(orders.NUM_UNIQUE(order_products.reordered)),SKEW(orders.STD(order_products.aisle_id)),SKEW(orders.SUM(order_products.aisle_id)),STD(orders.COUNT(order_products)),STD(orders.MAX(order_products.aisle_id)),STD(orders.MEAN(order_products.aisle_id)),STD(orders.MIN(order_products.aisle_id)),STD(orders.NUM_UNIQUE(order_products.department)),STD(orders.NUM_UNIQUE(order_products.reordered)),STD(orders.SKEW(order_products.aisle_id)),SUM(
orders.MIN(order_products.aisle_id)),SUM(orders.SKEW(order_products.aisle_id)),label_False,label_True,MODE(order_products.department)_alcohol,MODE(order_products.department)_babies,MODE(order_products.department)_bakery,MODE(order_products.department)_beverages,MODE(order_products.department)_breakfast,MODE(order_products.department)_canned goods,MODE(order_products.department)_dairy eggs,MODE(order_products.department)_deli,MODE(order_products.department)_frozen,MODE(order_products.department)_household,MODE(order_products.department)_international,MODE(order_products.department)_meat seafood,MODE(order_products.department)_pantry,MODE(order_products.department)_personal care,MODE(order_products.department)_pets,MODE(order_products.department)_produce,MODE(order_products.department)_snacks,MODE(order_products.product_name)_0% Fat Free Organic Milk,MODE(order_products.product_name)_0% Greek Strained Yogurt,MODE(order_products.product_name)_1 Ply Paper Towels,MODE(order_products.product_name)_1% Low Fat Milk,MODE(order_products.product_name)_1% Lowfat Milk,MODE(order_products.product_name)_1-Step 1-Minute Paneer Makhani,MODE(order_products.product_name)_100% Apple Juice,MODE(order_products.product_name)_100% Juice,"MODE(order_products.product_name)_100% Juice, Variety Pack",MODE(order_products.product_name)_100% Lactose Free Reduced Fat Milk,MODE(order_products.product_name)_100% Oatnut Bread,MODE(order_products.product_name)_100% Pineapple Juice,MODE(order_products.product_name)_100% Raw Coconut Water,MODE(order_products.product_name)_100% Recycled Bathroom Tissue,MODE(order_products.product_name)_100% Whole Wheat Bagels,MODE(order_products.product_name)_100% Whole Wheat Bread,MODE(order_products.product_name)_100% Whole Wheat English Muffin 6 Ct,MODE(order_products.product_name)_1500 Pale Ale,MODE(order_products.product_name)_2% Reduced Fat DHA Omega-3 Reduced Fat Milk,MODE(order_products.product_name)_2% Reduced Fat Milk,MODE(order_products.product_name)_2% 
Reduced Fat Organic Milk,MODE(order_products.product_name)_24 Hr Acid Reducer,MODE(order_products.product_name)_3 Color Deli Coleslaw,MODE(order_products.product_name)_5-Lettuce Mix,MODE(order_products.product_name)_70% Dark Chocolate With Orange Bar,MODE(order_products.product_name)_80 Vodka Holiday Edition,MODE(order_products.product_name)_93% Lean Ground Turkey,MODE(order_products.product_name)_Activia Vanilla Low Fat Yogurt,MODE(order_products.product_name)_Adult Grain-Free & Poultry-Free Dog Food,MODE(order_products.product_name)_Air Chilled Organic Boneless Skinless Chicken Breasts,MODE(order_products.product_name)_All Natural Classic Guacamole Snack,"MODE(order_products.product_name)_All Purpose Cleaner, Lemon Breeze Scent",MODE(order_products.product_name)_All-In-1,MODE(order_products.product_name)_Almond Breeze Original Almond Milk,MODE(order_products.product_name)_Almond Milk Vanilla Yogurt,MODE(order_products.product_name)_Alpine Spring Water,MODE(order_products.product_name)_Andouille Chicken & Turkey Sausage,MODE(order_products.product_name)_Anejo Tequila,MODE(order_products.product_name)_Ant & Roach Killer Fragrance Free Insecticide,MODE(order_products.product_name)_Apple Apple Applesauce On The Go Pouches,MODE(order_products.product_name)_Apple Honeycrisp Organic,MODE(order_products.product_name)_Apple Puffs Finger Foods,MODE(order_products.product_name)_Apple Sauce,MODE(order_products.product_name)_Apple Scent Foaming Hand Soap,MODE(order_products.product_name)_Applewood Smoked Bacon,MODE(order_products.product_name)_Applewood Smoked Turkey Breast,MODE(order_products.product_name)_Arancita Rossa,MODE(order_products.product_name)_Artesian Sparkling Water,MODE(order_products.product_name)_Artisan Sausage Pineapple Uncured Bacon Hardwood Smoked with Vermont Maple Syrup,MODE(order_products.product_name)_Asparagus,MODE(order_products.product_name)_Authentic French Brioche,MODE(order_products.product_name)_Authentic French Brioche Hamburger 
Buns,MODE(order_products.product_name)_Avocado Roll,MODE(order_products.product_name)_BBQ Recipe Beef Frozen Sandwiches,MODE(order_products.product_name)_Baby Cucumbers,MODE(order_products.product_name)_Baby Spinach,MODE(order_products.product_name)_Baby Sugar Snap Peas,MODE(order_products.product_name)_Bag of Organic Bananas,MODE(order_products.product_name)_Baked Beans,MODE(order_products.product_name)_Baker's Pure Cane Ultrafine Sugar,MODE(order_products.product_name)_Banana,MODE(order_products.product_name)_Bars Peanut Butter,MODE(order_products.product_name)_Bartlett Pears,MODE(order_products.product_name)_Basmati Ready Rice,MODE(order_products.product_name)_Bathroom Tissue,"MODE(order_products.product_name)_Bathroom Tissue Softness & Strength, Double Rolls",MODE(order_products.product_name)_Bean & Cheese Burrito,MODE(order_products.product_name)_Beef Dinner Franks,MODE(order_products.product_name)_Beef Steak with Cranberry & Sriracha,MODE(order_products.product_name)_Beer Brats Sausage,MODE(order_products.product_name)_Berry Medley,MODE(order_products.product_name)_Bistro Bowl Chicken Caesar Salad,MODE(order_products.product_name)_Black Cherry Barbecue Pork Jerky,MODE(order_products.product_name)_Black Eyed Peas,MODE(order_products.product_name)_Blackberries,MODE(order_products.product_name)_Blackberry Yogurt,MODE(order_products.product_name)_Blood Orange Italian Soda,MODE(order_products.product_name)_Blueberries,MODE(order_products.product_name)_Blueberry Blaze Trail Mix,MODE(order_products.product_name)_Blueberry Bliss Luna Bar,MODE(order_products.product_name)_Blueberry Pint,MODE(order_products.product_name)_Blueberry Yoghurt,MODE(order_products.product_name)_Boneless And Skinless Chicken Breast,MODE(order_products.product_name)_Boneless Skinless Chicken Breast,MODE(order_products.product_name)_Boneless Skinless Chicken Breasts,MODE(order_products.product_name)_Boneless Skinless Chicken Thighs,MODE(order_products.product_name)_BoomChocoBoom Dark Chocolate 
Bar,MODE(order_products.product_name)_Brazilian Cheese Bread Original Cheddar and Parmesan,"MODE(order_products.product_name)_Bread, Gluten-Free, Super Seeded Multi-Grain",MODE(order_products.product_name)_Broccoli & Cauliflower,MODE(order_products.product_name)_Broccoli Crown,MODE(order_products.product_name)_Broccoli Floret,MODE(order_products.product_name)_Broccoli Florettes,MODE(order_products.product_name)_Broccoli Rabe,MODE(order_products.product_name)_Buffalo Wings,MODE(order_products.product_name)_Bunched Cilantro,MODE(order_products.product_name)_Bunny Pasta with Yummy Cheese Macaroni & Cheese,MODE(order_products.product_name)_Burrata Mozzarela Cheese,MODE(order_products.product_name)_Caesar Salad Kit,MODE(order_products.product_name)_Cafe Latte Pure Lightly Sweetened Iced Coffee With Almond Milk,MODE(order_products.product_name)_Cage Free Real Egg Product,MODE(order_products.product_name)_Cage-Free Grade AA Large Eggs,MODE(order_products.product_name)_Calcium Magnesium Citrate Plus Vitamin D3 Liquid,MODE(order_products.product_name)_California Lemonade,MODE(order_products.product_name)_Caramel Cookie Crunch Gelato,MODE(order_products.product_name)_Caramel with Almond Milk Iced Coffee,MODE(order_products.product_name)_Carob Chip,MODE(order_products.product_name)_Carrot Raisin Flax Muffins,MODE(order_products.product_name)_Carrots,MODE(order_products.product_name)_Carving Board Oven Roasted Turkey,MODE(order_products.product_name)_Cauliflower Crumbles,MODE(order_products.product_name)_Celery Hearts,MODE(order_products.product_name)_Cereal,MODE(order_products.product_name)_Challah Bread,MODE(order_products.product_name)_Cheddar Puffs,MODE(order_products.product_name)_Cheese Cracker Sandwiches,MODE(order_products.product_name)_Cheese Party Pizza,"MODE(order_products.product_name)_Cherries and Chocolate, Organic 70% Cocoa",MODE(order_products.product_name)_Cherrios Honey Nut,MODE(order_products.product_name)_Cherry 
Chocolate,MODE(order_products.product_name)_Chicken Apple Sausage,MODE(order_products.product_name)_Chicken Broth,MODE(order_products.product_name)_Chicken Culinary Stock,MODE(order_products.product_name)_Chipotle Beef & Pork Realstick,MODE(order_products.product_name)_Chocolate Chip Cookie Dough,MODE(order_products.product_name)_Chocolate Chip Cookies,MODE(order_products.product_name)_Chocolate Chip Muffins,MODE(order_products.product_name)_Chocolate Chip Snackimals,MODE(order_products.product_name)_Chocolate Lowfat Milk,MODE(order_products.product_name)_Chocolate Miracle Tart,MODE(order_products.product_name)_Chocolate Peanut Butter,MODE(order_products.product_name)_Chocolate Peppermint Stick Bar,MODE(order_products.product_name)_Chocolate Protein Soy & Dairy Protein Shake,MODE(order_products.product_name)_Chopped Walnuts,MODE(order_products.product_name)_Chorizo Seitan,MODE(order_products.product_name)_Chubby Hubby Ice Cream,MODE(order_products.product_name)_Classic Wheat Bread,MODE(order_products.product_name)_Clementines,"MODE(order_products.product_name)_Clementines, Bag","MODE(order_products.product_name)_Clotted Cream, English Luxury",MODE(order_products.product_name)_Coconut Almond Creamer Blend,MODE(order_products.product_name)_Coconut Water,MODE(order_products.product_name)_Coke Classic,MODE(order_products.product_name)_Cold Brew Coffee,"MODE(order_products.product_name)_Cold-pressed, Deliciously Hydrating Watermelon Water",MODE(order_products.product_name)_Cookie Tray,MODE(order_products.product_name)_Corn Chips,"MODE(order_products.product_name)_Country Stand Juice, Medium Pulp",MODE(order_products.product_name)_Cracked Wheat Sourdough Loaf,MODE(order_products.product_name)_Crackers Harvest Whole Wheat,MODE(order_products.product_name)_Cran Raspberry Sparkling Water,MODE(order_products.product_name)_Cream Cheese,MODE(order_products.product_name)_Cream Top Smooth & Creamy Vanilla Yogurt,MODE(order_products.product_name)_Creamy Almond 
Butter,MODE(order_products.product_name)_Creamy Peanut Butter,MODE(order_products.product_name)_Crema Mexicana,MODE(order_products.product_name)_Crunch Chocolate Peanut Butter Granola Bar,MODE(order_products.product_name)_Crunchy Oats 'n Honey Granola Bars,MODE(order_products.product_name)_Crunchy Peanut Butter Energy Bar,MODE(order_products.product_name)_Cucumber Kirby,MODE(order_products.product_name)_Cucumbers Facial Wipes Soothing - 30 CT,MODE(order_products.product_name)_Cut Green Beans,MODE(order_products.product_name)_Dairy Free Coconut Milk Blueberry Yogurt Alternative,MODE(order_products.product_name)_Dairy Free Cream Cheese Style Spread Chive & Onion,MODE(order_products.product_name)_Dairy Free Soy Strawberry Alternative,MODE(order_products.product_name)_Dairy Free Unsweetened Vanilla Coconut Milk,MODE(order_products.product_name)_DanActive Vanilla Probiotic Dairy Drink,MODE(order_products.product_name)_Danimals Drinkable Yogurt Smoothie Strawberry Explosion & Strikin' Strawberry Kiwi,MODE(order_products.product_name)_Deli Fresh Rotisserie Seasoned Chicken Breast,MODE(order_products.product_name)_Deli Fresh Smoked Turkey Breast,MODE(order_products.product_name)_Dha Omega 3 Reduced Fat 2% Milk,MODE(order_products.product_name)_Dha Omega 3 Vitamin D Milk,MODE(order_products.product_name)_Diet Cherry Coke,MODE(order_products.product_name)_Diet Coke,MODE(order_products.product_name)_Diet Cola,MODE(order_products.product_name)_Diet Ginger Ale All Natural Soda,MODE(order_products.product_name)_Diet Orange Soda,MODE(order_products.product_name)_Diet Tonic Water,MODE(order_products.product_name)_Distilled Water,MODE(order_products.product_name)_Dragon Fruit,...,MODE(orders.MODE(order_products.product_name))_Arborio White Rice,MODE(orders.MODE(order_products.product_name))_Arrowroot Starch/Flour,MODE(orders.MODE(order_products.product_name))_Artesian Sparkling Water,"MODE(orders.MODE(order_products.product_name))_Artichoke Hearts, 
Quartered",MODE(orders.MODE(order_products.product_name))_Artichokes,MODE(orders.MODE(order_products.product_name))_Artisan Lettuce,MODE(orders.MODE(order_products.product_name))_Artisan Sausage Pineapple Uncured Bacon Hardwood Smoked with Vermont Maple Syrup,MODE(orders.MODE(order_products.product_name))_Asian Chopped Salad with Dressing,MODE(orders.MODE(order_products.product_name))_Asparagus,MODE(orders.MODE(order_products.product_name))_Asparation/Broccolini/Baby Broccoli,MODE(orders.MODE(order_products.product_name))_Assortment Bittersweet Chocolate Box,MODE(orders.MODE(order_products.product_name))_Ataulfo Mango,MODE(orders.MODE(order_products.product_name))_Authentic French Brioche,MODE(orders.MODE(order_products.product_name))_Authentic French Brioche Hamburger Buns,MODE(orders.MODE(order_products.product_name))_Avocado,MODE(orders.MODE(order_products.product_name))_Avocado Oil Canyon Cut Kettle Cooked Potato Chips Sea Salt,MODE(orders.MODE(order_products.product_name))_Avocado Roll,MODE(orders.MODE(order_products.product_name))_Awaken Aloe + Wheatgrass Drink,MODE(orders.MODE(order_products.product_name))_BBQ Chopped Salad,MODE(orders.MODE(order_products.product_name))_BBQ Recipe Beef Frozen Sandwiches,MODE(orders.MODE(order_products.product_name))_Baba Ghannouge Eggplant Dip,MODE(orders.MODE(order_products.product_name))_Baby Arugula,MODE(orders.MODE(order_products.product_name))_Baby Carrots,MODE(orders.MODE(order_products.product_name))_Baby Cucumbers,MODE(orders.MODE(order_products.product_name))_Baby Eggplant,MODE(orders.MODE(order_products.product_name))_Baby Nasal Relief Simply Saline Nasal Mist,MODE(orders.MODE(order_products.product_name))_Baby Portabella Mushrooms,MODE(orders.MODE(order_products.product_name))_Baby Seedless Cucumbers,MODE(orders.MODE(order_products.product_name))_Baby Spinach,MODE(orders.MODE(order_products.product_name))_Baby Spring Mix,MODE(orders.MODE(order_products.product_name))_Baby Sugar Snap 
Peas,MODE(orders.MODE(order_products.product_name))_Baby Swiss Slices Cheese,MODE(orders.MODE(order_products.product_name))_Baby Wipes,MODE(orders.MODE(order_products.product_name))_Bag Of Organic Lemons,MODE(orders.MODE(order_products.product_name))_Bag of Large Lemons,MODE(orders.MODE(order_products.product_name))_Bag of Lemons,MODE(orders.MODE(order_products.product_name))_Bag of Oranges,MODE(orders.MODE(order_products.product_name))_Bag of Organic Bananas,MODE(orders.MODE(order_products.product_name))_Bag of Organic Fuji Apples,MODE(orders.MODE(order_products.product_name))_Baked Aged White Cheddar Rice and Corn Puffs,MODE(orders.MODE(order_products.product_name))_Baked Beans,MODE(orders.MODE(order_products.product_name))_Baked Organic Sea Salt Crunchy Pea Snack,MODE(orders.MODE(order_products.product_name))_Baked Sea Salt & Vinegar Potato Chips,MODE(orders.MODE(order_products.product_name))_Baked Snack Crackers Original,MODE(orders.MODE(order_products.product_name))_Baked Whole Grain Wheat Original Crackers Thin Crisps,MODE(orders.MODE(order_products.product_name))_Baker's Pure Cane Ultrafine Sugar,MODE(orders.MODE(order_products.product_name))_Baking Soda,MODE(orders.MODE(order_products.product_name))_Balsamic Vinegar of Modena,MODE(orders.MODE(order_products.product_name))_Banana,MODE(orders.MODE(order_products.product_name))_Bananas,MODE(orders.MODE(order_products.product_name))_Barbecue Potato Chips,MODE(orders.MODE(order_products.product_name))_Bars Peanut Butter,MODE(orders.MODE(order_products.product_name))_Bartlett Pears,MODE(orders.MODE(order_products.product_name))_Basil Pesto,MODE(orders.MODE(order_products.product_name))_Basmati Ready Rice,MODE(orders.MODE(order_products.product_name))_Bean & Cheese Burrito,MODE(orders.MODE(order_products.product_name))_Beans,MODE(orders.MODE(order_products.product_name))_Beef Dinner Franks,MODE(orders.MODE(order_products.product_name))_Beef Steak with Cranberry & 
Sriracha,MODE(orders.MODE(order_products.product_name))_Beef Summer Sausage,MODE(orders.MODE(order_products.product_name))_Beef Tenderloin Steak,MODE(orders.MODE(order_products.product_name))_Beet Apple Carrot Lemon Ginger Organic Cold Pressed Juice Beverage,MODE(orders.MODE(order_products.product_name))_Berry Colossal Crunch Cereal,MODE(orders.MODE(order_products.product_name))_Berry Medley,MODE(orders.MODE(order_products.product_name))_Berry Veggie Juice Smoothie,MODE(orders.MODE(order_products.product_name))_Bing Cherries,MODE(orders.MODE(order_products.product_name))_Bistro Style Hot Link Sausage,MODE(orders.MODE(order_products.product_name))_Black Bean Vegetables Burrito,MODE(orders.MODE(order_products.product_name))_Black Beans,MODE(orders.MODE(order_products.product_name))_Black Cherry Barbecue Pork Jerky,MODE(orders.MODE(order_products.product_name))_Blackberries,MODE(orders.MODE(order_products.product_name))_Blackberry Yogurt,MODE(orders.MODE(order_products.product_name))_Blood Orange Italian Soda,MODE(orders.MODE(order_products.product_name))_Blood Oranges,MODE(orders.MODE(order_products.product_name))_Blue Machine Boosted 100% Juice Smoothie,MODE(orders.MODE(order_products.product_name))_Blueberries,MODE(orders.MODE(order_products.product_name))_Blueberry Almond Breakfast Bars,MODE(orders.MODE(order_products.product_name))_Blueberry Blaze Trail Mix,MODE(orders.MODE(order_products.product_name))_Blueberry on the Bottom Nonfat Greek Yogurt,MODE(orders.MODE(order_products.product_name))_Blush Wine Vinaigrette Dressing,MODE(orders.MODE(order_products.product_name))_Bok Choy,MODE(orders.MODE(order_products.product_name))_Boneless And Skinless Chicken Breast,MODE(orders.MODE(order_products.product_name))_Boneless Skinless Chicken Breast,MODE(orders.MODE(order_products.product_name))_Boneless Skinless Chicken Breasts,MODE(orders.MODE(order_products.product_name))_Boneless Skinless Chicken Thighs,MODE(orders.MODE(order_products.product_name))_BoomChocoBoom Dark 
Chocolate Bar,"MODE(orders.MODE(order_products.product_name))_Bread, Gluten-Free, Super Seeded Multi-Grain",MODE(orders.MODE(order_products.product_name))_Broccoli & Cauliflower,MODE(orders.MODE(order_products.product_name))_Broccoli & Cheddar Bake Meal Bowl,MODE(orders.MODE(order_products.product_name))_Broccoli Crown,MODE(orders.MODE(order_products.product_name))_Broccoli Florettes,MODE(orders.MODE(order_products.product_name))_Brown Eggs,MODE(orders.MODE(order_products.product_name))_Brownie Crunch High Protein Bar,MODE(orders.MODE(order_products.product_name))_Bunched Cilantro,MODE(orders.MODE(order_products.product_name))_Bunny Pasta with Yummy Cheese Macaroni & Cheese,MODE(orders.MODE(order_products.product_name))_Burrata,MODE(orders.MODE(order_products.product_name))_Burrata Mozzarela Cheese,MODE(orders.MODE(order_products.product_name))_Butter Pecan Ice Cream,MODE(orders.MODE(order_products.product_name))_Butternut Squash,MODE(orders.MODE(order_products.product_name))_Caesar Salad Kit,MODE(orders.MODE(order_products.product_name))_Cafe Latte Pure Lightly Sweetened Iced Coffee With Almond Milk,MODE(orders.MODE(order_products.product_name))_Cage-Free Grade AA Large Eggs,MODE(orders.MODE(order_products.product_name))_California Roll Sushi,MODE(orders.MODE(order_products.product_name))_Calm Chamomile Herbal Tea,MODE(orders.MODE(order_products.product_name))_Carrots,MODE(orders.MODE(order_products.product_name))_Casalingo Salami,MODE(orders.MODE(order_products.product_name))_Celery Sticks,MODE(orders.MODE(order_products.product_name))_Cereal,"MODE(orders.MODE(order_products.product_name))_Cereal, Puffed Rice","MODE(orders.MODE(order_products.product_name))_Cereal, Sprouted Oat, Honey O's",MODE(orders.MODE(order_products.product_name))_Chardonnay,MODE(orders.MODE(order_products.product_name))_Cheddar Bunnies Snack Crackers,MODE(orders.MODE(order_products.product_name))_Cheddar Cheezy Mac,MODE(orders.MODE(order_products.product_name))_Cheddar 
Puffs,"MODE(orders.MODE(order_products.product_name))_Cherries and Chocolate, Organic 70% Cocoa",MODE(orders.MODE(order_products.product_name))_Cherrios Honey Nut,MODE(orders.MODE(order_products.product_name))_Cherry Chocolate,MODE(orders.MODE(order_products.product_name))_Chicken & Vegetable Snack Sticks for Dogs,MODE(orders.MODE(order_products.product_name))_Chicken Broth,MODE(orders.MODE(order_products.product_name))_Chicken Culinary Stock,MODE(orders.MODE(order_products.product_name))_Chicken Stew Natural Food for Dogs,MODE(orders.MODE(order_products.product_name))_Chocolate Chip Cookie Dough,MODE(orders.MODE(order_products.product_name))_Chocolate Chip Cookies,MODE(orders.MODE(order_products.product_name))_Chocolate Chip Muffins,MODE(orders.MODE(order_products.product_name))_Chocolate Chip Snackimals,MODE(orders.MODE(order_products.product_name))_Chocolate Miracle Tart,MODE(orders.MODE(order_products.product_name))_Chocolate Peanut Butter Builder's Bar,MODE(orders.MODE(order_products.product_name))_Chocolate Peppermint Stick Bar,MODE(orders.MODE(order_products.product_name))_Chocolate Protein Nutritional Shake,MODE(orders.MODE(order_products.product_name))_Chocolate Protein Soy & Dairy Protein Shake,MODE(orders.MODE(order_products.product_name))_Chopped Curly Kale,MODE(orders.MODE(order_products.product_name))_Chopped Spinach,MODE(orders.MODE(order_products.product_name))_Classic Original Scent Deodorant,MODE(orders.MODE(order_products.product_name))_Clean Care Mega Rolls Toilet Paper,MODE(orders.MODE(order_products.product_name))_Clementines,"MODE(orders.MODE(order_products.product_name))_Clementines, Bag","MODE(orders.MODE(order_products.product_name))_Clotted Cream, English Luxury",MODE(orders.MODE(order_products.product_name))_Coco Crunch Sprouted Granola,MODE(orders.MODE(order_products.product_name))_Coconut Water,MODE(orders.MODE(order_products.product_name))_Coke 
Classic,MODE(orders.MODE(order_products.product_name))_Cola,MODE(orders.MODE(order_products.product_name))_Cold Brew Coffee,MODE(orders.MODE(order_products.product_name))_Cookie Dipped Drumsticks,MODE(orders.MODE(order_products.product_name))_Cookie Tray,MODE(orders.MODE(order_products.product_name))_Cooking Beef Stock,MODE(orders.MODE(order_products.product_name))_Country French Bread,"MODE(orders.MODE(order_products.product_name))_Country Stand Juice, Medium Pulp",MODE(orders.MODE(order_products.product_name))_Cracked Wheat Sourdough Loaf,MODE(orders.MODE(order_products.product_name))_Crackers Harvest Whole Wheat,MODE(orders.MODE(order_products.product_name))_Creamy Almond Butter,MODE(orders.MODE(order_products.product_name))_Creamy Peanut Butter,MODE(orders.MODE(order_products.product_name))_Crisp Hard Cider Crisp Apple,MODE(orders.MODE(order_products.product_name))_Crispy Cheddar Crackers,MODE(orders.MODE(order_products.product_name))_Crunchy Almond Butter,MODE(orders.MODE(order_products.product_name))_Crunchy Oats 'n Honey Granola Bars,MODE(orders.MODE(order_products.product_name))_Crunchy Peanut Butter Energy Bar,MODE(orders.MODE(order_products.product_name))_Cucumber Kirby,MODE(orders.MODE(order_products.product_name))_Curry Tiger Burrito,MODE(orders.MODE(order_products.product_name))_Dairy Free Coconut Milk Blueberry Yogurt Alternative,MODE(orders.MODE(order_products.product_name))_Dairy Free Mozarella Style Shreds,MODE(orders.MODE(order_products.product_name))_Dairy Free Plain Cultured Coconut Milk,MODE(orders.MODE(order_products.product_name))_Danimals Drinkable Yogurt Smoothie Strawberry Explosion & Strikin' Strawberry Kiwi,MODE(orders.MODE(order_products.product_name))_Dark Red Kidney Beans Reduced Sodium,MODE(orders.MODE(order_products.product_name))_Distilled Water,MODE(orders.MODE(order_products.product_name))_Dried Mango,MODE(orders.MODE(order_products.product_name))_Dried Mangos,"MODE(orders.MODE(order_products.product_name))_Energy Drink, Organic, 
Pomegranate Acai",MODE(orders.MODE(order_products.product_name))_Epsom Salt,MODE(orders.MODE(order_products.product_name))_Everything Bagels,MODE(orders.MODE(order_products.product_name))_Everything Deli Style Pretzel Crisps Crackers,MODE(orders.MODE(order_products.product_name))_Extra Beer Bottles,MODE(orders.MODE(order_products.product_name))_Extra Fancy Unsalted Mixed Nuts,MODE(orders.MODE(order_products.product_name))_Ezekiel 4:9 Bread Organic Sprouted Whole Grain,MODE(orders.MODE(order_products.product_name))_Free & Clear Unscented Baby Wipes,"MODE(orders.MODE(order_products.product_name))_French Roast, Dark Roast K-Cups",MODE(orders.MODE(order_products.product_name))_French Vanilla Creamer,MODE(orders.MODE(order_products.product_name))_French Vanilla Ice Cream,MODE(orders.MODE(order_products.product_name))_Fresh Air High Efficiency Laundry Detergent,MODE(orders.MODE(order_products.product_name))_Fresh Cauliflower,MODE(orders.MODE(order_products.product_name))_Frozen Organic Blueberries,MODE(orders.MODE(order_products.product_name))_Frozen Whole Strawberries,MODE(orders.MODE(order_products.product_name))_Glass Cleaner,MODE(orders.MODE(order_products.product_name))_Glazed Chicken,MODE(orders.MODE(order_products.product_name))_Gluten Free Classic Hotdog Buns,MODE(orders.MODE(order_products.product_name))_Granny Smith Apples,MODE(orders.MODE(order_products.product_name))_Grape White/Green Seedless,MODE(orders.MODE(order_products.product_name))_Greek Extra Virgin Olive Oil,MODE(orders.MODE(order_products.product_name))_Ground Coffee,MODE(orders.MODE(order_products.product_name))_Half & Half,MODE(orders.MODE(order_products.product_name))_Hass Avocados,MODE(orders.MODE(order_products.product_name))_Hemp Milk Non Dairy Beverage Original,MODE(orders.MODE(order_products.product_name))_Junmai Sake,MODE(orders.MODE(order_products.product_name))_Large Alfresco Eggs,MODE(orders.MODE(order_products.product_name))_Lemon Verbena Laundry 
Detergent,MODE(orders.MODE(order_products.product_name))_Lime Sparkling Water,MODE(orders.MODE(order_products.product_name))_Low Fat 1% Milk,MODE(orders.MODE(order_products.product_name))_Mandarin Oranges,MODE(orders.MODE(order_products.product_name))_Mango Lemonade,MODE(orders.MODE(order_products.product_name))_Mango Peach Salsa,MODE(orders.MODE(order_products.product_name))_Medium Square Containers & Lids,"MODE(orders.MODE(order_products.product_name))_Milk, Reduced Fat, 2% Milkfat","MODE(orders.MODE(order_products.product_name))_Milk, Vitamin D",MODE(orders.MODE(order_products.product_name))_Mineral Water,MODE(orders.MODE(order_products.product_name))_Mountain Spring Water,MODE(orders.MODE(order_products.product_name))_Natural Artesian Water,MODE(orders.MODE(order_products.product_name))_Natural Spring Water,MODE(orders.MODE(order_products.product_name))_New Orleans Iced Coffee,MODE(orders.MODE(order_products.product_name))_Non Fat Greek Yogurt,MODE(orders.MODE(order_products.product_name))_Omeprazole Acid Reducer Tablets,MODE(orders.MODE(order_products.product_name))_Organic 4% Milk Fat Whole Milk Cottage Cheese,MODE(orders.MODE(order_products.product_name))_Organic Baby Spinach,MODE(orders.MODE(order_products.product_name))_Organic Blueberries,MODE(orders.MODE(order_products.product_name))_Organic Extra Large Grade AA Brown Eggs,MODE(orders.MODE(order_products.product_name))_Organic Extra Virgin Olive Oil,MODE(orders.MODE(order_products.product_name))_Organic Garlic,MODE(orders.MODE(order_products.product_name))_Organic Ground Turkey,MODE(orders.MODE(order_products.product_name))_Organic Half & Half,MODE(orders.MODE(order_products.product_name))_Organic Hass Avocado,MODE(orders.MODE(order_products.product_name))_Organic Low Fat Milk,MODE(orders.MODE(order_products.product_name))_Organic Reduced Fat Omega-3 Milk,MODE(orders.MODE(order_products.product_name))_Organic Tortilla Chips,MODE(orders.MODE(order_products.product_name))_Original Beef 
Jerky,MODE(orders.MODE(order_products.product_name))_Red Plastic Cups,MODE(orders.MODE(order_products.product_name))_Roasted Pine Nut Hummus,MODE(orders.MODE(order_products.product_name))_Roasted Seaweed Snacks,MODE(orders.MODE(order_products.product_name))_Smartwater,"MODE(orders.MODE(order_products.product_name))_Sparkling Water, Bottles",MODE(orders.MODE(order_products.product_name))_Spring Water,MODE(orders.MODE(order_products.product_name))_smartwater® Electrolyte Enhanced Water,MODE(orders.MODE(order_products.reordered))_0,MODE(orders.MODE(order_products.reordered))_1,MODE(orders.MONTH(order_time))_1,MODE(orders.MONTH(order_time))_2,MODE(orders.MONTH(order_time))_3,MODE(orders.MONTH(order_time))_4,MODE(orders.MONTH(order_time))_5,MODE(orders.MONTH(order_time))_6,MODE(orders.MONTH(order_time))_7,MODE(orders.MONTH(order_time))_8,MODE(orders.MONTH(order_time))_9,MODE(orders.MONTH(order_time))_10,MODE(orders.MONTH(order_time))_11,MODE(orders.MONTH(order_time))_12,MODE(orders.WEEKDAY(order_time))_0,MODE(orders.WEEKDAY(order_time))_1,MODE(orders.WEEKDAY(order_time))_2,MODE(orders.WEEKDAY(order_time))_3,MODE(orders.WEEKDAY(order_time))_4,MODE(orders.WEEKDAY(order_time))_5,MODE(orders.WEEKDAY(order_time))_6
user_id
1,4,21,121.0,60.52381,21.0,7,10,2,0.33928,38.070486,6.0,65.2,23.0,4.0,2.0,0.713454,47.305038,5.25,111.5,22.0,1.5,0.284635,40.106604,5.0,91.0,53.6,4.0,-0.050477,30.899838,1,3,2,3,4,2.0,-1.885094,-0.606096,0.0,0.0,0.0,-0.832693,0.838651,0.5,13.796135,5.399203,1.154701,0.0,0.57735,0.333166,88.0,1.138542,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,Fals
e,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False
2,7,85,123.0,63.752941,1.0,9,56,2,0.141339,39.596448,21.0,81.095238,23.0,6.0,2.0,1.837844,47.337712,12.142857,112.142857,16.714286,1.857143,0.385028,37.321202,5.0,96.0,33.0,4.0,-0.441865,24.533598,3,6,2,3,4,0.161303,-0.630334,-0.306556,-1.222061,0.0,-2.645751,-0.674331,0.896376,5.36745,12.294017,15.91147,10.094789,0.816497,0.377964,0.764827,117.0,2.695193,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,Fals
e,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
3,5,41,123.0,69.04878,13.0,8,27,2,-0.029453,40.594305,11.0,81.222222,24.0,5.0,2.0,0.627002,46.546393,8.2,121.8,19.0,1.8,-0.018559,41.7499,5.0,117.0,59.181818,3.0,-0.636786,38.266173,3,5,2,3,4,-0.363269,-2.236068,0.329403,-0.410083,0.0,-2.236068,0.762518,-0.800806,2.588436,2.683282,8.846253,5.612486,0.707107,0.447214,0.569761,95.0,-0.092793,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,
False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
7,4,73,123.0,65.493151,21.0,11,37,2,0.17295,35.379342,24.0,71.761905,24.0,9.0,2.0,0.813504,39.097683,18.25,122.5,22.5,1.75,0.205753,35.747877,12.0,122.0,56.25,6.0,-0.25721,32.945409,1,2,2,3,4,-0.198134,0.0,-0.768154,0.0,-0.37037,-2.0,0.480352,-0.013561,5.315073,0.57735,7.26281,1.732051,1.5,0.5,0.45645,90.0,0.823014,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,F
alse,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False
10,4,114,123.0,67.342105,5.0,9,94,2,-0.142492,37.683787,46.0,78.347826,16.0,9.0,2.0,0.565263,42.732528,28.5,113.0,12.75,1.75,0.064432,36.672337,5.0,83.0,46.0,1.0,-0.743617,32.259515,1,4,1,2,3,-0.996725,-2.0,0.273441,-1.949557,-0.436662,-2.0,0.778005,-0.009675,17.136705,20.0,13.247643,5.188127,3.304038,0.5,0.564522,51.0,0.257729,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,F
alse,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False


Now we can compute the correlation between each feature and the label using Spearman's rank correlation.

In [124]:
# Compute the correlations between all features and the target
corrs = one_hot_fm.corrwith(feature_matrix['label'], axis=0, method='spearman')

# Drop NaN from corrs
corrs = corrs.dropna()

# Drop the label_True and label_False columns
corrs = corrs.drop(['label_True', 'label_False'])

# Sort the correlations by magnitude
corrs = corrs.abs().sort_values(ascending=False)
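
As a rough illustration of what the Spearman ranking measures, the sketch below computes it by hand as a Pearson correlation on ranks. The toy frame, the column names, and the target are made up for this example and are not part of the assignment data.

```python
import pandas as pd

# Toy data (hypothetical): one monotone-but-nonlinear feature, one noisy feature.
toy = pd.DataFrame({
    "monotone": [1, 2, 3, 4, 5, 6],
    "noisy": [3, 1, 4, 1, 5, 9],
})
target = pd.Series([1, 8, 27, 64, 125, 216])  # cube of "monotone"

# Spearman's coefficient is the Pearson correlation of the rank-transformed values.
spearman = toy.rank().corrwith(target.rank(), method="pearson")

# Pearson on the raw values, for comparison.
pearson = toy.corrwith(target, method="pearson")

# Rank features by absolute correlation, as in the cell above.
ranking = spearman.abs().sort_values(ascending=False)
print(ranking)
```

Because Spearman only looks at ranks, the cubic relationship scores as a perfect monotone association, while Pearson on the raw values falls below 1.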



Here we have found quite a few features whose correlation with the label is high in magnitude. Let's choose the top 10 and see how they perform in the model.

In [125]:
# Select the top-10 features and build X, y (expected X.shape == (767, 10))
X = one_hot_fm[corrs.index[:10]]
y = feature_matrix['label']
X.shape

(767, 10)
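
The selected columns can also be listed by name. A minimal sketch, assuming a correlation Series like `corrs`; the helper `list_top_features` and the example feature names are hypothetical and only illustrate the idea.

```python
import pandas as pd

def list_top_features(corrs, k=10):
    """Return the k feature names with the largest absolute correlation."""
    ranked = corrs.abs().sort_values(ascending=False)
    return list(ranked.index[:k])

# Hypothetical correlations, for illustration only.
example_corrs = pd.Series({
    "SUM(orders.total)": -0.42,
    "COUNT(orders)": 0.10,
    "MODE(orders.day) = 3": 0.35,
})

top = list_top_features(example_corrs, k=2)
for position, name in enumerate(top, start=1):
    print(f"{position}: Feature: {name}")
```

In the notebook itself, the same loop could be run over `corrs.index[:10]`.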

### 2.2 Get accuracy and list features

In [126]:
clf = RandomForestClassifier(n_estimators=400, n_jobs=-1)
scores = cross_val_score(estimator=clf, X=X, y=y, cv=3,
                         scoring="roc_auc", verbose=True)

print("AUC %.2f" % (scores.mean()))

# Print top-10 features

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


AUC 0.78


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    1.1s finished
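
One caveat worth checking: the top-10 features were chosen using the labels of all 767 rows before cross-validation, so each held-out fold has already influenced the selection and the reported AUC may be optimistic. Below is a minimal sketch of running the selection inside each fold instead. It uses synthetic data, and `SelectKBest` with `f_classif` is swapped in for the Spearman ranking purely for brevity; all variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the feature matrix: 150 rows, 40 features,
# where only the first feature carries signal about the label.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(150, 40))
y_toy = (X_toy[:, 0] + 0.5 * rng.normal(size=150) > 0).astype(int)

# Putting the selector in the pipeline means it is refit on the training
# part of every fold, so the held-out part never influences the selection.
pipeline = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

cv_scores = cross_val_score(pipeline, X_toy, y_toy, cv=3, scoring="roc_auc")
print("AUC %.2f" % cv_scores.mean())
```

If the AUC obtained this way is close to the one from selecting on the full data, the original estimate is probably fine; a large gap would suggest selection leakage.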


## Task3. Writing Questions

1. Please list three advantages and disadvantages of featuretools. 
2. For the disadvantages you listed above, do you have any ideas for improving them? 

--- Write your answer here---
1. Please list three advantages and disadvantages of featuretools.
    - Advantages of featuretools.
        1. With very little domain knowledge, a data scientist can compute a large number of new features from raw data, which saves time and reduces the manual effort required for feature engineering.
        2. It automatically combines transformation and aggregation operations, which may lead to the discovery of highly predictive features that even a data scientist with strong domain knowledge could miss.
        3. It appears to be a scalable solution that can handle both large volumes of data and complex schemas.

    - Disadvantages of featuretools.
        1. It can require a large amount of computation and memory, since generating all of these features can be very resource intensive.
        2. It can produce more work for the data scientist, who still has to decide which features to use. The number of features after Deep Feature Synthesis can be an order of magnitude higher than in the original feature set, so determining which features are best can take considerable effort.
        3. It creates meaningless features. While some generated features will be very meaningful, the process is bound to produce a large number of irrelevant ones.
        4. (I know we were only asked for 3, but I just thought of this one...) It creates features that are hard to interpret. Because transformations and aggregations are stacked across joined tables, we may end up with a feature that is highly predictive but hard to understand.

2. For the disadvantages you listed above, do you have any ideas for improving them?
    - One thing they have already done that is quite useful is the implementation of `remove_single_value_features`, `remove_highly_correlated_features`, `remove_low_information_features`, and `remove_highly_null_features`. This has the effect of automating away some of the task of deciding what not to use.
    - To increase the interpretability of features that are created we could use techniques like SHAP or LIME to help interpret the impact of complex features.
    - To avoid creating meaningless features, we could define custom primitives (transformations and aggregations) that are tailored to the specific domain.
    - After feature generation, a data scientist could use statistical tests or do EDA to filter out features with low variance or low relevance to the target.
    - To deal with the potentially high computation and memory requirements, a data scientist could use cloud computing resources with distributed frameworks to scale the feature generation process across multiple nodes.
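
The filtering idea from the list above can be sketched in plain pandas. This is only an approximation of what selection helpers such as `remove_single_value_features` and `remove_highly_correlated_features` aim for, not their implementation; the helper `drop_uninformative`, the 0.95 threshold, and the toy columns are assumptions made for this example.

```python
import numpy as np
import pandas as pd

def drop_uninformative(df, corr_threshold=0.95):
    """Drop constant columns, then one column of each highly correlated pair."""
    # Step 1: a column with a single unique value carries no information.
    keep = [col for col in df.columns if df[col].nunique(dropna=False) > 1]
    df = df[keep]

    # Step 2: look only at the upper triangle of the absolute correlation
    # matrix, so that the later column of each correlated pair is dropped.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    return df.drop(columns=to_drop)

# Hypothetical feature matrix for illustration.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "a_times_two": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with "a"
    "constant": [7.0, 7.0, 7.0, 7.0],     # single value
    "b": [4.0, 1.0, 3.0, 2.0],
})

reduced = drop_uninformative(toy)
print(list(reduced.columns))
```

A filter like this shrinks the feature space before any model-based selection, which addresses both the extra-work and the meaningless-feature concerns to some degree.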

## Submission
Complete the code in this notebook, and submit it to the CourSys activity Assignment 8.