In this Notebook:
1. Find the order mean for each department
2. Create a loyal customer flag
3. Check the spending habits of different customers based on their loyalty category
4. Create a spending flag for each customer
5. Create a frequency flag for each customer

In [1]:
#Importing Libraries
import pandas as pd
import numpy as np
import os

In [2]:
#Creating path to Instacart Folder
path = r"/Users/katerinapilota/Desktop/Desktop - Pilot's Mac mini/dataimmersion/python/ 02:03:21 Instacart Basket Analysis"

In [3]:
#importing data
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'final_combined_2'))

1. Finding the aggregated order mean for each department

In [4]:
#using agg() to find the mean by each department (groupby)
ords_prods_merge.groupby('department_id').agg({'order_number': ['mean']})

department_id,order_number (mean)
1,15.457838
2,17.27792
3,17.170395
4,17.811403
5,15.215751
6,16.439806
7,17.225802
8,15.34065
9,15.895474
10,20.197148


Answer to question 3 from the task: The most immediate difference is that applying this function to the subset returned only a portion of the departments, because the subset did not contain rows for all of them.
The means also differ somewhat, since they are now calculated from the full dataset rather than a smaller sample.
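
To illustrate why a subset can return fewer departments, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical and not taken from the Instacart data. It only shows that groupby() produces a row for each group that is actually present in the rows it is given.

```python
import pandas as pd

# Hypothetical toy data standing in for ords_prods_merge
toy = pd.DataFrame({
    'department_id': [1, 1, 2, 2, 2],
    'order_number': [2, 4, 1, 3, 8],
})

# Same call shape as the cell above: one mean per department
full_means = toy.groupby('department_id').agg({'order_number': ['mean']})

# The first two rows only contain department 1, so only that group appears
subset_means = toy.head(2).groupby('department_id').agg({'order_number': ['mean']})

print(full_means)
print(subset_means)
```

The column labels of the result have two levels, ('order_number', 'mean'), which is why the output above shows a two-row header.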

2. Create a loyal customer flag using transform() and loc[]

In [6]:
#using transform() to add each user's highest order number as a new column
ords_prods_merge['max_order'] = ords_prods_merge.groupby(['user_id'])['order_number'].transform('max')

In [7]:
#using loc to assign the loyalty flag based on max_order (loyal: more than 40 orders)
ords_prods_merge.loc[ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [8]:
ords_prods_merge.loc[(ords_prods_merge['max_order'] <= 40) & (ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [9]:
ords_prods_merge.loc[ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [10]:
#checking frequency of loyalty_flag column
ords_prods_merge['loyalty_flag'].value_counts(dropna = False)

Regular customer    15876776
Loyal customer      10284093
New customer         6243990
Name: loyalty_flag, dtype: int64
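
As a possible alternative to the three loc assignments, the same thresholds could be expressed in one step with pd.cut(). This is only a sketch on hypothetical toy data (the `toy` frame is made up), not the method used in this notebook. It assumes order numbers start at 1, so a lower bin edge of 0 covers every value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three users with different order histories
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'order_number': [5, 10, 11, 40, 41],
})

# Broadcast each user's maximum order number to all of that user's rows
toy['max_order'] = toy.groupby('user_id')['order_number'].transform('max')

# Bins are right-inclusive: (0, 10] new, (10, 40] regular, (40, inf) loyal
toy['loyalty_flag'] = pd.cut(
    toy['max_order'],
    bins=[0, 10, 40, np.inf],
    labels=['New customer', 'Regular customer', 'Loyal customer'],
).astype(str)

print(toy)
```

The bin edges mirror the conditions above: 10 or fewer orders is a new customer, more than 10 and up to 40 is a regular customer, and more than 40 is a loyal customer.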

3. Check the spending habits of different customers based on their loyalty category

In [11]:
#previewing the dataframe to confirm the new max_order and loyalty_flag columns
ords_prods_merge.head()

Unnamed: 0,Unnamed: 0_x,Unnamed: 0.1,order_id,user_id,order_number,orders_days_of_the_week,time_of_order_24hr_time,days_since_prior_order,product_id,add_to_cart_order,...,aisle_id,department_id,prices,True,price_range_loc,Busiest day,Two busiest days,busiest_period_day,max_order,loyalty_flag
0,0,0,2539329,1,1,2,8,0.0,196,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer
1,1,1,2398795,1,2,3,7,15.0,196,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Two least busy days,Average orders,10,New customer
2,2,2,473747,1,3,3,12,21.0,196,1,...,77,7,9.0,both,Mid-range product,Regularly busy,Two least busy days,Most orders,10,New customer
3,3,3,2254736,1,4,4,7,29.0,196,1,...,77,7,9.0,both,Mid-range product,Least busy day,Two least busy days,Average orders,10,New customer
4,4,4,431534,1,5,4,15,28.0,196,1,...,77,7,9.0,both,Mid-range product,Least busy day,Two least busy days,Most orders,10,New customer


In [12]:
#checking the price statistics for each loyalty category
ords_prods_merge.groupby('loyalty_flag').agg({'prices': ['mean', 'max', 'min']})

loyalty_flag,prices (mean),prices (max),prices (min)
Loyal customer,10.386336,99999.0,1.0
New customer,13.29467,99999.0,1.0
Regular customer,12.495717,99999.0,1.0


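From this we can see that new customers have the highest mean product price, followed by regular customers and then loyal customers. However, the maximum of 99999.0 in every category points to extreme price values in the data, so these means are probably inflated and should be read with caution.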
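
The maximum of 99999.0 in every row of the table above suggests that a few extreme prices may be pulling the means upward. The sketch below uses hypothetical toy data (the `toy` frame and its values are made up) to show how a single such value distorts a group mean while the median stays stable. Whether the 99999.0 entries in the real data are errors is an assumption that would need checking.

```python
import pandas as pd

# Hypothetical toy data: one extreme price in the 'New customer' group
toy = pd.DataFrame({
    'loyalty_flag': ['New customer'] * 3 + ['Loyal customer'] * 3,
    'prices': [5.0, 7.0, 99999.0, 6.0, 8.0, 10.0],
})

# Compare mean and median per loyalty category
stats = toy.groupby('loyalty_flag').agg({'prices': ['mean', 'median']})

print(stats)
```

In this toy example the mean for new customers is in the tens of thousands while the median is 7.0, so adding 'median' to the agg() call is one way to sanity-check the comparison.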

4. Create a spending flag for each customer

In [14]:
#create a new 'mean_spend' column holding each user's mean product price
ords_prods_merge['mean_spend'] = ords_prods_merge.groupby(['user_id'])['prices'].transform('mean')

In [15]:
#create a spending flag for low spenders
ords_prods_merge.loc[ords_prods_merge['mean_spend'] < 10, 'spend_flag'] = 'Low spender'

In [17]:
#create a spending flag for high spenders
ords_prods_merge.loc[ords_prods_merge['mean_spend'] >= 10, 'spend_flag'] = 'High spender'

In [23]:
#checking the new spending flag
ords_prods_merge['spend_flag'].value_counts()

Low spender     31770742
High spender      634117
Name: spend_flag, dtype: int64
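
Because the spending flag has only two categories, it could also be written as a single np.where() call. This is a sketch on hypothetical toy data (the `toy` frame is made up), not the approach used above. One difference to keep in mind: a missing mean_spend would be labelled 'High spender' here, since a comparison with NaN is False, whereas the two loc assignments would leave the flag empty.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two users with different average prices
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'prices': [4.0, 6.0, 9.0, 15.0],
})

# Broadcast each user's mean price to all of that user's rows
toy['mean_spend'] = toy.groupby('user_id')['prices'].transform('mean')

# Below 10 is a low spender, 10 or more is a high spender
toy['spend_flag'] = np.where(toy['mean_spend'] < 10, 'Low spender', 'High spender')

print(toy)
```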

In [24]:
ords_prods_merge.head()

Unnamed: 0,Unnamed: 0_x,Unnamed: 0.1,order_id,user_id,order_number,orders_days_of_the_week,time_of_order_24hr_time,days_since_prior_order,product_id,add_to_cart_order,...,prices,True,price_range_loc,Busiest day,Two busiest days,busiest_period_day,max_order,loyalty_flag,mean_spend,spend_flag
0,0,0,2539329,1,1,2,8,0.0,196,1,...,9.0,both,Mid-range product,Regularly busy,Regularly busy,Average orders,10,New customer,6.367797,Low spender
1,1,1,2398795,1,2,3,7,15.0,196,1,...,9.0,both,Mid-range product,Regularly busy,Two least busy days,Average orders,10,New customer,6.367797,Low spender
2,2,2,473747,1,3,3,12,21.0,196,1,...,9.0,both,Mid-range product,Regularly busy,Two least busy days,Most orders,10,New customer,6.367797,Low spender
3,3,3,2254736,1,4,4,7,29.0,196,1,...,9.0,both,Mid-range product,Least busy day,Two least busy days,Average orders,10,New customer,6.367797,Low spender
4,4,4,431534,1,5,4,15,28.0,196,1,...,9.0,both,Mid-range product,Least busy day,Two least busy days,Most orders,10,New customer,6.367797,Low spender


5. Create a frequency flag for each customer

In [26]:
#create a new column with each user's median 'days_since_prior_order'
ords_prods_merge['median_ordering'] = ords_prods_merge.groupby(['user_id'])['days_since_prior_order'].transform('median')

In [27]:
#create the flag for infrequent customers
ords_prods_merge.loc[ords_prods_merge['median_ordering'] > 20, 'freq_flag'] = 'Non-frequent customer'

In [28]:
#create a flag for regular customers
ords_prods_merge.loc[(ords_prods_merge['median_ordering'] <= 20) & (ords_prods_merge['median_ordering'] > 10), 'freq_flag'] = 'Regular customer'

In [30]:
#create the flag for frequent customers
ords_prods_merge.loc[ords_prods_merge['median_ordering'] <= 10, 'freq_flag'] = 'Frequent customer'
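
The frequency flag is not checked with value_counts() in the cells above, so a quick check could look like the sketch below. It runs on hypothetical toy data (the `toy` frame is made up) and shows one case worth looking for: if a user has no non-missing 'days_since_prior_order' value, the median is NaN, none of the three conditions match, and the flag stays empty. Whether such users exist in the real data is not known from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: user 3 has no prior-order gap recorded at all
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3],
    'days_since_prior_order': [np.nan, 7.0, 9.0, np.nan, 30.0, np.nan],
})

# The median skips missing values; an all-missing group yields NaN
toy['median_ordering'] = (
    toy.groupby('user_id')['days_since_prior_order'].transform('median')
)

# Same three conditions as the cells above
toy.loc[toy['median_ordering'] > 20, 'freq_flag'] = 'Non-frequent customer'
toy.loc[(toy['median_ordering'] <= 20) & (toy['median_ordering'] > 10), 'freq_flag'] = 'Regular customer'
toy.loc[toy['median_ordering'] <= 10, 'freq_flag'] = 'Frequent customer'

# dropna=False makes unflagged rows visible in the counts
print(toy['freq_flag'].value_counts(dropna=False))
```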

Export Data

In [31]:
#exporting the dataframe with the new flag columns as a pickle file
ords_prods_merge.to_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'final_combined_3'))