# TOC
1. Prep and Import
2. Create loyalty marker  
3. Find average orders per department  
4. Create average price column  
5. Set low & high spender flags   
6. Create frequency flag   
7. Export pickle   

## Prep

In [1]:
# Importing libraries
import pandas as pd
import numpy as np
import os

In [2]:
# Importing data
path = r'C:\Users\Ryzen RGB Madness!!!\Instacart Basket Analysis'

In [3]:
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Cleaned', 'ords_prods_merge_grouped.pkl'))

In [4]:
ords_prods_merge.head(15)

Unnamed: 0,order_id,user_id,order_number,order_day_of_week,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,reordered,_merge,product_name,aisle_id,department_id,prices,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag
0,2539329,1,1,2,8,,196,1,0,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer
1,2398795,1,2,3,7,15.0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer
2,473747,1,3,3,12,21.0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer
3,2254736,1,4,4,7,29.0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer
4,431534,1,5,4,15,28.0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer
5,3367565,1,6,2,7,19.0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer
6,550135,1,7,1,9,20.0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Busiest day,Average orders,10,New customer
7,3108588,1,8,1,14,14.0,196,2,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Busiest day,Most orders,10,New customer
8,2295261,1,9,1,16,0.0,196,4,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Busiest day,Average orders,10,New customer
9,2550362,1,10,4,8,30.0,196,1,1,both,Soda,77,7,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer


In [5]:
ords_prods_merge.shape

(32404859, 20)

In [6]:
ords_prods_merge['loyalty_flag'].value_counts(dropna = False)

loyalty_flag
Regular customer    15876776
Loyal customer      10284093
New customer         6243990
Name: count, dtype: int64

## Task

#### 4.8.2 - Finding average orders for each department for full dataframe

In [7]:
# Calculating average orders for each department
ords_prods_merge.groupby('department_id').order_number.mean()

department_id
1     15.457838
2     17.277920
3     17.170395
4     17.811403
5     15.215751
6     16.439806
7     17.225802
8     15.340650
9     15.895474
10    20.197148
11    16.170638
12    15.887671
13    16.583536
14    16.773669
15    16.165037
16    17.665606
17    15.694469
18    19.310397
19    17.177343
20    16.473447
21    22.902379
Name: order_number, dtype: float64

#### 4.8.3 - Observations of dataframe vs subset

The full dataframe contains every department, which changes which department has the lowest average order number. In the subset it was department 17; in the full dataframe it's department 5 (15.22).

It also changes which department has the highest average order number: in the subset it was department 16, here it's department 21 (22.90). Note that this is the mean of `order_number`, not a count of orders or a sales total.
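
Rather than reading the lowest and highest departments off the printed output, they can be pulled out directly. This is a minimal sketch on a made-up toy dataframe (the values are not from the Instacart data); `toy`, `dept_means`, `lowest_dept` and `highest_dept` are illustrative names only.

```python
import pandas as pd

# Toy stand-in for ords_prods_merge (values are made up for illustration)
toy = pd.DataFrame({
    'department_id': [4, 4, 5, 5, 21, 21],
    'order_number':  [10, 20, 3, 5, 30, 40],
})

# Mean order_number per department, same pattern as the cell above
dept_means = toy.groupby('department_id')['order_number'].mean()

# idxmin/idxmax return the index label (department_id) of the extreme values
lowest_dept = dept_means.idxmin()
highest_dept = dept_means.idxmax()
print(lowest_dept, highest_dept)
```

The same two calls should work on the full groupby result, assuming the column names match.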

#### 4.8.5 - Investigating spending habits of customer types

In [8]:
# Pulling basic stats for each loyalty category
ords_prods_merge.groupby('loyalty_flag').agg({'prices': ['mean', 'min', 'max']})

loyalty_flag,prices (mean),prices (min),prices (max)
Loyal customer,10.386336,1.0,99999.0
New customer,13.29467,1.0,99999.0
Regular customer,12.495717,1.0,99999.0


I expected the opposite result here. Looking at the results and thinking about use cases, one possible explanation is that new customers are purchasing higher-priced items they need in an 'emergency' situation, whereas loyal customers are likely purchasing across more departments and categories, which evens out their average product price. Every group also has a maximum price of 99999.0, so a few extreme values may be pulling all three means upward.
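
The output above shows a maximum price of 99999.0 in every loyalty group, and the notebook does not say whether those values are genuine. As a rough illustration of why that matters, the sketch below uses a made-up toy dataframe to compare mean and median when one group contains a single extreme price. The data and the name `stats` are assumptions for illustration only.

```python
import pandas as pd

# Toy data: both groups buy similar items, but one row has an extreme price
toy = pd.DataFrame({
    'loyalty_flag': ['New customer'] * 4 + ['Loyal customer'] * 4,
    'prices': [5.0, 7.0, 9.0, 99999.0, 5.0, 7.0, 9.0, 11.0],
})

# Mean is pulled up by the extreme value; median is not
stats = toy.groupby('loyalty_flag')['prices'].agg(['mean', 'median', 'max'])
print(stats)
```

Adding `'median'` to the aggregation list in the cell above would be one way to check whether the ordering of the three groups holds once extreme prices carry less weight.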

#### 4.8.6 - Creating a spending flag

In [9]:
# Creating average price column (mean price per user, repeated on every row for that user)
ords_prods_merge['average_price'] = ords_prods_merge.groupby(['user_id'])['prices'].transform('mean')

In [10]:
# Checking new column
ords_prods_merge.head(20)

Unnamed: 0,order_id,user_id,order_number,order_day_of_week,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,reordered,_merge,...,aisle_id,department_id,prices,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag,average_price
0,2539329,1,1,2,8,,196,1,0,both,...,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797
1,2398795,1,2,3,7,15.0,196,1,1,both,...,77,7,9.0,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797
2,473747,1,3,3,12,21.0,196,1,1,both,...,77,7,9.0,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797
3,2254736,1,4,4,7,29.0,196,1,1,both,...,77,7,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer,6.367797
4,431534,1,5,4,15,28.0,196,1,1,both,...,77,7,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer,6.367797
5,3367565,1,6,2,7,19.0,196,1,1,both,...,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797
6,550135,1,7,1,9,20.0,196,1,1,both,...,77,7,9.0,Mid-range product,Regularly busy,Busiest day,Average orders,10,New customer,6.367797
7,3108588,1,8,1,14,14.0,196,2,1,both,...,77,7,9.0,Mid-range product,Regularly busy,Busiest day,Most orders,10,New customer,6.367797
8,2295261,1,9,1,16,0.0,196,4,1,both,...,77,7,9.0,Mid-range product,Regularly busy,Busiest day,Average orders,10,New customer,6.367797
9,2550362,1,10,4,8,30.0,196,1,1,both,...,77,7,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer,6.367797
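
Every row for user 1 shows the same `average_price` because `transform` returns one value per original row rather than one per group. A small sketch on made-up data (the `toy` dataframe is an assumption, not part of the project data):

```python
import pandas as pd

# Toy data: two users with different numbers of purchased items
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'prices':  [4.0, 6.0, 8.0, 10.0, 20.0],
})

# transform keeps the original row count and repeats each user's mean
toy['average_price'] = toy.groupby('user_id')['prices'].transform('mean')
print(toy)
```

By contrast, `.groupby('user_id')['prices'].mean()` would return one row per user, which could not be assigned back as a column without a merge.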


In [11]:
# Setting low spender flag
ords_prods_merge.loc[ords_prods_merge['average_price'] < 10, 'spender_type'] = 'Low spender'

In [12]:
# Setting high spender flag
ords_prods_merge.loc[ords_prods_merge['average_price'] >= 10, 'spender_type'] = 'High spender'

In [13]:
# Verification
ords_prods_merge.shape

(32404859, 22)

In [14]:
ords_prods_merge['spender_type'].value_counts(dropna = False)

spender_type
Low spender     31770614
High spender      634245
Name: count, dtype: int64

The high spender and low spender counts sum to 32,404,859 (31,770,614 + 634,245), matching the number of rows in the dataframe.

The question then becomes how much more the high spenders actually spend, and whether it's a better use of dollars to get the high spenders to spend more or to coax more out of the low spenders. (By analogy: sure, the Kia Rio is a very popular car, but how much more profit does a car company make every time it sells a BMW? I drive an '05 Civic, so the car analogy wasn't my best choice.)
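
One way to start answering that question is to compare total spend and spend per customer across the two groups. The sketch below is only an illustration on made-up data; it assumes each row is one purchased item so that summing `prices` approximates spend, and the column names `customers`, `total_spend` and `spend_per_customer` are hypothetical.

```python
import pandas as pd

# Toy data: one row per purchased item (an assumption about the row meaning)
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3],
    'prices':  [5.0, 7.0, 6.0, 20.0, 30.0, 8.0],
    'spender_type': ['Low spender'] * 3 + ['High spender'] * 2 + ['Low spender'],
})

# Distinct customers and summed prices per spender group
summary = toy.groupby('spender_type').agg(
    customers=('user_id', 'nunique'),
    total_spend=('prices', 'sum'),
)

# Average spend per customer within each group
summary['spend_per_customer'] = summary['total_spend'] / summary['customers']
print(summary)
```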

#### 4.8.7 - Frequent vs Non-Frequent Customers

In [15]:
# Creating order regularity column (median days between orders per user)
ords_prods_merge['frequency'] = ords_prods_merge.groupby(['user_id'])['days_since_prior_order'].transform('median')

In [16]:
# Checking new column
ords_prods_merge.head(20)

Unnamed: 0,order_id,user_id,order_number,order_day_of_week,order_hour_of_day,days_since_prior_order,product_id,add_to_cart_order,reordered,_merge,...,prices,price_range_loc,busiest_day,busiest_days,busiest_period_of_day,max_order,loyalty_flag,average_price,spender_type,frequency
0,2539329,1,1,2,8,,196,1,0,both,...,9.0,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,Low spender,20.5
1,2398795,1,2,3,7,15.0,196,1,1,both,...,9.0,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,Low spender,20.5
2,473747,1,3,3,12,21.0,196,1,1,both,...,9.0,Mid-range product,Regularly busy,Least busy,Average orders,10,New customer,6.367797,Low spender,20.5
3,2254736,1,4,4,7,29.0,196,1,1,both,...,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer,6.367797,Low spender,20.5
4,431534,1,5,4,15,28.0,196,1,1,both,...,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer,6.367797,Low spender,20.5
5,3367565,1,6,2,7,19.0,196,1,1,both,...,9.0,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,Low spender,20.5
6,550135,1,7,1,9,20.0,196,1,1,both,...,9.0,Mid-range product,Regularly busy,Busiest day,Average orders,10,New customer,6.367797,Low spender,20.5
7,3108588,1,8,1,14,14.0,196,2,1,both,...,9.0,Mid-range product,Regularly busy,Busiest day,Most orders,10,New customer,6.367797,Low spender,20.5
8,2295261,1,9,1,16,0.0,196,4,1,both,...,9.0,Mid-range product,Regularly busy,Busiest day,Average orders,10,New customer,6.367797,Low spender,20.5
9,2550362,1,10,4,8,30.0,196,1,1,both,...,9.0,Mid-range product,Least busy,Least busy,Average orders,10,New customer,6.367797,Low spender,20.5


In [17]:
# Setting non-frequent buyer flag
ords_prods_merge.loc[ords_prods_merge['frequency'] > 20, 'freq_flag'] = 'Non-frequent customer'

In [18]:
# Setting regular buyer flag (frequency above 10 and up to 20)
ords_prods_merge.loc[(ords_prods_merge['frequency'] > 10) & (ords_prods_merge['frequency'] <= 20), 'freq_flag'] = 'Regular customer'

In [19]:
# Setting frequent buyer flag
ords_prods_merge.loc[ords_prods_merge['frequency'] <= 10, 'freq_flag'] = 'Frequent customer'
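
The three flag cells can also be written as a single binning step. This is a sketch of an alternative, not what the notebook runs: it uses `pd.cut` on a made-up `frequency` column, with bin edges chosen to match the thresholds above (up to 10, above 10 up to 20, above 20).

```python
import numpy as np
import pandas as pd

# Toy frequency values, including the boundary cases and a missing value
toy = pd.DataFrame({'frequency': [5.0, 10.0, 10.5, 20.0, 20.5, np.nan]})

# Bins are right-inclusive by default: (-inf, 10], (10, 20], (20, inf)
toy['freq_flag'] = pd.cut(
    toy['frequency'],
    bins=[-np.inf, 10, 20, np.inf],
    labels=['Frequent customer', 'Regular customer', 'Non-frequent customer'],
)
print(toy)
```

Note that `pd.cut` returns a categorical column rather than plain strings, and missing frequencies stay missing.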

In [20]:
# Verification
ords_prods_merge.shape

(32404859, 24)

In [21]:
ords_prods_merge['freq_flag'].value_counts(dropna = False)

freq_flag
Frequent customer        21559853
Regular customer          7208564
Non-frequent customer     3636437
NaN                             5
Name: count, dtype: int64
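
The counts above include 5 rows with no flag. The notebook doesn't investigate them; one possible explanation (an assumption, not confirmed here) is a user whose `days_since_prior_order` values are all missing, so the median is NaN and neither `> 20` nor `<= 20` is true. A sketch on made-up data showing that mechanism:

```python
import numpy as np
import pandas as pd

# Toy data: user 2 has only a first order, so days_since_prior_order is NaN
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3],
    'days_since_prior_order': [np.nan, 12.0, np.nan, np.nan, 25.0],
})

# Median skips NaN, but a group with only NaN values yields NaN
toy['frequency'] = toy.groupby('user_id')['days_since_prior_order'].transform('median')

# Comparisons against NaN are always False, so these rows match no condition
matches_any = (toy['frequency'] > 20) | (toy['frequency'] <= 20)
unflagged = toy[~matches_any]
print(unflagged)
```

Filtering the real dataframe with `ords_prods_merge['freq_flag'].isna()` would show whether the 5 rows fit this pattern.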

#### 4.8.9 - Exporting Data

In [24]:
ords_prods_merge.to_pickle(os.path.join(path, '02 Data', 'Cleaned', 'ords_prods_merge_mktg.pkl'))
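
As an optional sanity check on the export, a pickle can be read back and compared with the dataframe that was written. The sketch below does this for a tiny made-up dataframe in a temporary folder; the file name `toy_export.pkl` is hypothetical, and on the full 32-million-row dataframe the round trip would take noticeably longer.

```python
import os
import tempfile

import pandas as pd

# Toy dataframe standing in for the exported data
toy = pd.DataFrame({
    'user_id': [1, 2],
    'spender_type': ['Low spender', 'High spender'],
})

# Write to a temporary folder, read back, and compare
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'toy_export.pkl')
    toy.to_pickle(out_path)
    reloaded = pd.read_pickle(out_path)

round_trip_ok = reloaded.equals(toy)
print(round_trip_ok)
```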