# Table of Contents
### Step 1: Create new notebook and import libraries and df
### Step 2: Find aggregated mean of order_number grouped by department_id
### Step 3: Analyze result
### Step 4: Create loyalty flag for existing customers
### Step 5: Determine if prices of products purchased by loyal customers differ from those of regular or new customers
### Step 6: Create spending flag for each user
### Step 7: Create order frequency flag
### Step 8: Ensure notebook is clean and structured and code is commented
### Step 9: Export df

### Step 1: Create new notebook and import relevant libraries and dataframe

In [1]:
# import libraries 
import pandas as pd
import numpy as np
import os

In [2]:
# store the project folder path as a string
path = r'/Users/davesmac/Desktop/04-2022- Instacart Basket Analysis'

In [3]:
# import the merged orders and products dataframe (ords_prods_merge) from a pickle file
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'orders_products_merged.pkl'))

### Step 2: Find aggregated mean of order_number grouped by department_id

In [5]:
# mean order_number for each department
ords_prods_merge.groupby('department_id')['order_number'].mean()

department_id
1     15.457838
2     17.277920
3     17.170395
4     17.811403
5     15.215751
6     16.439806
7     17.225802
8     15.340650
9     15.895474
10    20.197148
11    16.170638
12    15.887671
13    16.583536
14    16.773669
15    16.165037
16    17.665606
17    15.694469
18    19.310397
19    17.177343
20    16.473447
21    22.902379
Name: order_number, dtype: float64

### Step 3: Analyze Result

#### The full dataframe contains more department IDs than the subset. The means for the overlapping department IDs also differ slightly, most likely because the full dataframe contains many more rows than the subset.
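A minimal sketch of how this comparison could be made explicit, assuming two hypothetical dataframes `full` and `subset` that share the `department_id` and `order_number` columns (toy values, not the Instacart data). It lists the departments missing from the subset and the difference in means for the departments both contain.

```python
import pandas as pd

# Hypothetical toy data standing in for the full dataframe and a subset of it
full = pd.DataFrame({
    "department_id": [1, 1, 2, 2, 3, 3],
    "order_number": [1, 5, 2, 4, 10, 20],
})
subset = full.iloc[[0, 2, 3]]  # this subset only covers departments 1 and 2

full_means = full.groupby("department_id")["order_number"].mean()
subset_means = subset.groupby("department_id")["order_number"].mean()

# Departments present in the full data but missing from the subset
missing = full_means.index.difference(subset_means.index)

# Align both results on department_id and keep only departments in both
comparison = pd.DataFrame({"full": full_means, "subset": subset_means}).dropna()
comparison["difference"] = comparison["full"] - comparison["subset"]

print(missing.tolist())
print(comparison)
```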

### Step 4: Create loyalty flag for existing customers

In [26]:
# create new column 'max_order' with each user's highest order_number (i.e. how many orders they placed)
ords_prods_merge['max_order'] = ords_prods_merge.groupby(['user_id'])['order_number'].transform('max')

In [27]:
# create 'loyal customer' label for customers with more than 40 orders
ords_prods_merge.loc[ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'loyal customer'

In [28]:
# create 'regular customer' label for customers with more than 10 but no more than 40 orders
ords_prods_merge.loc[(ords_prods_merge['max_order'] <= 40) & (ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'regular customer'

In [29]:
# create 'new customer' label for customers with 10 or fewer orders
ords_prods_merge.loc[ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'new customer'
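The three `.loc` cells above could also be expressed in one step with `np.select`, which picks the first matching condition for each row. The sketch below is an alternative, not the notebook's method: it uses a small hypothetical dataframe in place of `ords_prods_merge`, keeps the same thresholds, and shows that `transform('max')` broadcasts each user's maximum back to every one of that user's rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per product ordered, as in ords_prods_merge
df = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 3],
    "order_number": [5, 10, 11, 40, 41],
})

# transform('max') returns one value per row: the user's highest order_number
df["max_order"] = df.groupby("user_id")["order_number"].transform("max")

# Same thresholds as the notebook: > 40 loyal, 11-40 regular, <= 10 new
conditions = [
    df["max_order"] > 40,
    df["max_order"] > 10,
]
choices = ["loyal customer", "regular customer"]
df["loyalty_flag"] = np.select(conditions, choices, default="new customer")

print(df)
```

Because conditions are checked in order, the second condition only needs `> 10`: rows above 40 have already matched the first one.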

In [30]:
ords_prods_merge.head()

| Unnamed: 0 | order_id | user_id | order_number | orders_day_of_week | order_hour_of_day | days_since_prior_order | first_order | product_id | add_to_cart_order | reordered | ... | department_id | prices | _merge | price_label | Busiest_days | busiest_period_of_day | max_order | loyalty_flag | avg_price_per_order | spender_flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539329 | 1 | 1 | 2 | 8 |  | True | 196 | 1 | 0 | ... | 7 | 9.0 | both | Mid-range product | Regular busy | Average orders | 10 | new customer | 6.367797 | Low Spender |
| 1 | 2398795 | 1 | 2 | 3 | 7 | 15.0 | False | 196 | 1 | 1 | ... | 7 | 9.0 | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender |
| 2 | 473747 | 1 | 3 | 3 | 12 | 21.0 | False | 196 | 1 | 1 | ... | 7 | 9.0 | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender |
| 3 | 2254736 | 1 | 4 | 4 | 7 | 29.0 | False | 196 | 1 | 1 | ... | 7 | 9.0 | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender |
| 4 | 431534 | 1 | 5 | 4 | 15 | 28.0 | False | 196 | 1 | 1 | ... | 7 | 9.0 | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender |


### Step 5: Determine whether prices of products purchased by loyal customers differ from those of regular or new customers

In [31]:
# group by loyalty flag and show the mean, min and max product price for each group
ords_prods_merge.groupby('loyalty_flag').agg({'prices': ['mean', 'min', 'max']})

| loyalty_flag | prices (mean) | prices (min) | prices (max) |
|---|---|---|---|
| loyal customer | 10.386336 | 1.0 | 99999.0 |
| new customer | 13.29467 | 1.0 | 99999.0 |
| regular customer | 12.495717 | 1.0 | 99999.0 |


### The mean price of products bought by loyal customers (about 10.39) is lower than that for regular (about 12.50) or new customers (about 13.29)
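Every group in the price table has a maximum of 99999.0, which looks implausible for a grocery item and would pull the means upward. A sketch of one way to check how much this matters, assuming a hypothetical cutoff of 100 above which a price is treated as invalid and set to NaN before aggregating; the toy data and the cutoff are illustrative, not taken from the Instacart data.

```python
import pandas as pd

# Hypothetical toy data with one implausible price
df = pd.DataFrame({
    "loyalty_flag": ["loyal customer", "loyal customer", "new customer", "new customer"],
    "prices": [5.0, 99999.0, 8.0, 12.0],
})

PRICE_CUTOFF = 100  # hypothetical threshold for an invalid grocery price

# where() keeps values that satisfy the condition and sets the rest to NaN,
# and mean/min/max skip NaN by default
cleaned = df["prices"].where(df["prices"] <= PRICE_CUTOFF)

summary = (
    df.assign(prices=cleaned)
      .groupby("loyalty_flag")
      .agg(
          mean_price=("prices", "mean"),
          min_price=("prices", "min"),
          max_price=("prices", "max"),
      )
)
print(summary)
```

Named aggregation (`mean_price=("prices", "mean")`) also gives flat column names, which avoids the two-level header seen in the notebook output.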

### Step 6: Create spending flag for each user

In [32]:
# create new column 'avg_price_per_order' with the mean price of all products each user bought
# (despite the name, this is a mean over product rows, not over orders)
ords_prods_merge['avg_price_per_order'] = ords_prods_merge.groupby(['user_id'])['prices'].transform('mean')

In [34]:
# create 'Low Spender' label for customers whose mean product price is below 10
ords_prods_merge.loc[ords_prods_merge['avg_price_per_order'] < 10, 'spender_flag'] = 'Low Spender'

In [35]:
# create 'High Spender' label for customers whose mean product price is 10 or more
ords_prods_merge.loc[ords_prods_merge['avg_price_per_order'] >= 10, 'spender_flag'] = 'High Spender'
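With only two categories, the pair of `.loc` cells could be replaced by a single `np.where` call. The sketch below is an alternative on hypothetical toy data, not the notebook's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per product bought
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "prices":  [4.0, 8.0, 9.0, 13.0],
})

# Mean price of every product the user bought, broadcast back to each row
df["avg_price_per_order"] = df.groupby("user_id")["prices"].transform("mean")

# Two-way flag in one step: 10 or more is a high spender, anything else low
df["spender_flag"] = np.where(df["avg_price_per_order"] >= 10, "High Spender", "Low Spender")

print(df)
```

One behavioral difference: `np.where` labels a row whose average is NaN as "Low Spender" (because `NaN >= 10` is False), whereas the `.loc` approach leaves such rows unlabeled.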

In [53]:
ords_prods_merge.head()

| Unnamed: 0 | order_id | user_id | order_number | orders_day_of_week | order_hour_of_day | days_since_prior_order | first_order | product_id | add_to_cart_order | reordered | ... | _merge | price_label | Busiest_days | busiest_period_of_day | max_order | loyalty_flag | avg_price_per_order | spender_flag | order_frequency | order_frequency_flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539329 | 1 | 1 | 2 | 8 |  | True | 196 | 1 | 0 | ... | both | Mid-range product | Regular busy | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 1 | 2398795 | 1 | 2 | 3 | 7 | 15.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 2 | 473747 | 1 | 3 | 3 | 12 | 21.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 3 | 2254736 | 1 | 4 | 4 | 7 | 29.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 4 | 431534 | 1 | 5 | 4 | 15 | 28.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |


### Step 7: Create order frequency flag

In [37]:
# create new column 'order_frequency' with each user's mean days_since_prior_order, used to separate frequent from non-frequent customers
ords_prods_merge['order_frequency'] = ords_prods_merge.groupby(['user_id'])['days_since_prior_order'].transform('mean')

In [38]:
# create 'Non-frequent customer' label for customers whose mean days_since_prior_order is greater than 20
ords_prods_merge.loc[ords_prods_merge['order_frequency'] > 20, 'order_frequency_flag'] = 'Non-frequent customer'

In [40]:
# create 'Regular customer' label for customers whose mean days_since_prior_order is greater than 10 and at most 20
ords_prods_merge.loc[(ords_prods_merge['order_frequency'] <= 20) & (ords_prods_merge['order_frequency'] > 10), 'order_frequency_flag'] = 'Regular customer'

In [42]:
# create 'Frequent customer' label for customers whose mean days_since_prior_order is 10 or less
ords_prods_merge.loc[ords_prods_merge['order_frequency'] <= 10, 'order_frequency_flag'] = 'Frequent customer'
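The cells above compute `order_frequency` as the mean of `days_since_prior_order`; a median is a common alternative because it is less sensitive to a single unusually long gap. The sketch below uses hypothetical toy data, computes both statistics, and bins them with `pd.cut` using the notebook's thresholds. Intervals are closed on the right, so exactly 10 falls in "Frequent customer" and exactly 20 in "Regular customer", matching the `.loc` cells.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; each user's first order has no prior gap (NaN)
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "days_since_prior_order": [np.nan, 21.0, 29.0, 2.0, 3.0, 40.0],
})

grouped = df.groupby("user_id")["days_since_prior_order"]
df["order_frequency"] = grouped.transform("mean")     # statistic used in the notebook
df["median_frequency"] = grouped.transform("median")  # alternative statistic

# Bins (-inf, 10], (10, 20], (20, inf) reproduce the three .loc conditions
bins = [-np.inf, 10, 20, np.inf]
labels = ["Frequent customer", "Regular customer", "Non-frequent customer"]
df["order_frequency_flag"] = pd.cut(df["order_frequency"], bins=bins, labels=labels)
df["median_frequency_flag"] = pd.cut(df["median_frequency"], bins=bins, labels=labels)

print(df)
```

On this toy data, user 2 (gaps of 2, 3 and 40 days) is a "Regular customer" by mean (15 days) but a "Frequent customer" by median (3 days), so the choice of statistic can change the flag.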

In [45]:
ords_prods_merge.head()

| Unnamed: 0 | order_id | user_id | order_number | orders_day_of_week | order_hour_of_day | days_since_prior_order | first_order | product_id | add_to_cart_order | reordered | ... | _merge | price_label | Busiest_days | busiest_period_of_day | max_order | loyalty_flag | avg_price_per_order | spender_flag | order_frequency | order_frequency_flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539329 | 1 | 1 | 2 | 8 |  | True | 196 | 1 | 0 | ... | both | Mid-range product | Regular busy | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 1 | 2398795 | 1 | 2 | 3 | 7 | 15.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 2 | 473747 | 1 | 3 | 3 | 12 | 21.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 3 | 2254736 | 1 | 4 | 4 | 7 | 29.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |
| 4 | 431534 | 1 | 5 | 4 | 15 | 28.0 | False | 196 | 1 | 1 | ... | both | Mid-range product | Slowest days | Average orders | 10 | new customer | 6.367797 | Low Spender | 20.259259 | Non-frequent customer |


### Step 8: Ensure notebook is clean and structured and code is commented

#### Notebook is clean, structured and code is well commented.

### Step 9: Export dataframe

In [50]:
# export df as a pickle file (this overwrites the orders_products_merged.pkl file imported in Step 1)
ords_prods_merge.to_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'orders_products_merged.pkl'))
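A sketch of a round-trip check for a pickle export, using toy data, a temporary directory, and a hypothetical output name `orders_products_flagged.pkl`. Writing to a separate file rather than the one that was imported keeps the original input intact.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy dataframe standing in for ords_prods_merge
df = pd.DataFrame({
    "user_id": [1, 2],
    "loyalty_flag": ["new customer", "loyal customer"],
})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output name, kept separate from the input pickle
    out_path = os.path.join(tmp, "orders_products_flagged.pkl")
    df.to_pickle(out_path)
    reloaded = pd.read_pickle(out_path)

# Raises AssertionError if the round trip changed values, dtypes or index
pd.testing.assert_frame_equal(df, reloaded)
```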