1) Create a new notebook for this task. Be sure to import the relevant libraries, along with your ords_prods_merge dataframe, which should include your newly derived columns from the previous Exercise.

2) In this Exercise, you learned how to find the aggregated mean of the “order_number” column grouped by “department_id” for a subset of your dataframe. Now, repeat this process for the entire dataframe.

3) Analyze the result. How do the results for the entire dataframe differ from those of the subset? Include your comments in a markdown cell below the executed code.

4) Follow the instructions in the Exercise for creating a loyalty flag for existing customers using the transform() and loc() functions.

5) The marketing team at Instacart wants to know whether there’s a difference between the spending habits of the three types of customers you identified. Use the loyalty flag you created and check the basic statistics of the product prices for each loyalty category (Loyal Customer, Regular Customer, and New Customer). What you’re trying to determine is whether the prices of products purchased by loyal customers differ from those purchased by regular or new customers.

6) The team now wants to target different types of spenders in their marketing campaigns. This can be achieved by looking at the prices of the items people are buying. Create a spending flag for each user based on the average price across all their orders using the following criteria:
If the mean of the prices of products purchased by a user is lower than 10, then flag them as a “Low spender.”
If the mean of the prices of products purchased by a user is higher than or equal to 10, then flag them as a “High spender.”

7) In order to send relevant notifications to users within the app (for instance, asking users if they want to buy the same item again), the Instacart team wants you to determine frequent versus non-frequent customers. Create an order frequency flag that marks the regularity of a user’s ordering behavior according to the median in the “days_since_prior_order” column. The criteria for the flag should be as follows:
If the median of “days_since_prior_order” is higher than 20, then the customer should be labeled a “Non-frequent customer.”
If the median is higher than 10 and lower than or equal to 20, then the customer should be labeled a “Regular customer.”
If the median is lower than or equal to 10, then the customer should be labeled a “Frequent customer.”

8) Ensure your notebook is clean and structured and that your code is well commented.

9) Export your dataframe as a pickle file and store it correctly in your “Prepared Data” folder.


# --------Task-------- 

## Importing Libraries

In [3]:
# Import Libraries
import pandas as pd
import numpy as np
import os

### Creating a path to the .pkl file

In [4]:
# Path to the project folder
path = r'C:\Users\mmoss\20-12-2021 Instacart Basket Analysis'

### Importing ords_prods_merge

In [6]:
# Importing the 4.7_ords_products_merged_derived dataframe
ords_prods_merge = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', '4.7_ords_products_merged_derived.pkl'))

### 2. Finding the aggregated mean of order_number column grouped by department_id  

In [5]:
# Finding the average order_number grouped by department for the entire dataframe
ords_prods_merge.groupby('department_id').agg({'order_number': ['mean']})


department_id,order_number (mean)
1,15.457838
2,17.27792
3,17.170395
4,17.811403
5,15.215751
6,16.439806
7,17.225802
8,15.34065
9,15.895474
10,20.197148


### 3. Result of analysis

The subset only contains mean order numbers for departments 4, 7, 13, 14, 16, 17, 19, and 20. Its highest mean comes from the dairy eggs department (mean = 19.46) and its lowest from the household department (mean = 11.29).

The entire dataframe has means for all departments (1-21). The highest mean belongs to the department labeled "missing" (mean = 22.91), and the lowest belongs to the alcohol department (mean = 15.21).
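
The comparison between a subset and the full dataframe can be sketched on a small scale. This is only an illustration: the `toy` dataframe and its values are made up, and the subset is assumed to be a simple row slice.

```python
import pandas as pd

# Hypothetical toy data standing in for ords_prods_merge
toy = pd.DataFrame({
    'department_id': [4, 4, 7, 7, 16, 16],
    'order_number':  [1, 3, 2, 6, 10, 20],
})

# Mean order_number per department for a subset (first 3 rows) and for the full frame
subset_means = toy[:3].groupby('department_id')['order_number'].mean()
full_means = toy.groupby('department_id')['order_number'].mean()

# Side-by-side comparison; departments absent from the subset show NaN
comparison = pd.concat([subset_means, full_means], axis=1, keys=['subset', 'full'])
print(comparison)
```

Departments that never appear in the subset get NaN in the `subset` column, which makes the missing departments easy to spot.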

### 4. Loyalty Flag

In [8]:
# Creating the max_order column
ords_prods_merge['max_order'] = ords_prods_merge.groupby(['user_id'])['order_number'].transform(np.max)

In [9]:
# Testing it
ords_prods_merge.head(15)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,_merge,product_name,aisle_id,department_id,prices,price_range_loc,busiest_day,busiest_day_2,busiest_period_of_day,max_order
0,2539329,1,1,2,8,,True,196,1,0,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10
1,2398795,1,2,3,7,15.0,False,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Slowest days,Fewest orders,10
2,473747,1,3,3,12,21.0,False,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Slowest days,Fewest orders,10
3,2254736,1,4,4,7,29.0,False,196,1,1,both,Soda,77,7,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10
4,431534,1,5,4,15,28.0,False,196,1,1,both,Soda,77,7,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10
5,3367565,1,6,2,7,19.0,False,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10
6,550135,1,7,1,9,20.0,False,196,1,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10
7,3108588,1,8,1,14,14.0,False,196,2,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10
8,2295261,1,9,1,16,0.0,False,196,4,1,both,Soda,77,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10
9,2550362,1,10,4,8,30.0,False,196,1,1,both,Soda,77,7,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10


There's the column!

In [10]:
# Code for the Loyal Customer
ords_prods_merge.loc[ords_prods_merge['max_order'] > 40, 'loyalty_flag'] = 'Loyal customer'

In [11]:
# Code for Regular Customer
ords_prods_merge.loc[(ords_prods_merge['max_order'] <= 40) & (ords_prods_merge['max_order'] > 10), 'loyalty_flag'] = 'Regular customer'

In [12]:
# Code for New Customer
ords_prods_merge.loc[ords_prods_merge['max_order'] <= 10, 'loyalty_flag'] = 'New customer'

In [14]:
# Checking frequencies of all the customer types
ords_prods_merge['loyalty_flag'].value_counts(dropna = False)

Regular customer    15874128
Loyal customer      10282763
New customer         6242841
Name: loyalty_flag, dtype: int64

In [15]:
# Test it out
ords_prods_merge[['user_id', 'loyalty_flag', 'order_number']].head(60)

Unnamed: 0,user_id,loyalty_flag,order_number
0,1,New customer,1
1,1,New customer,2
2,1,New customer,3
3,1,New customer,4
4,1,New customer,5
5,1,New customer,6
6,1,New customer,7
7,1,New customer,8
8,1,New customer,9
9,1,New customer,10


Success!
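
As an alternative to three separate loc statements, the same thresholds could probably be expressed in one step with `pd.cut`. This is a sketch on made-up data, not the method from the Exercise; the `toy` dataframe is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: three users with different order histories
toy = pd.DataFrame({
    'user_id':      [1, 1, 2, 2, 3],
    'order_number': [1, 10, 5, 11, 41],
})

# Broadcast each user's maximum order number back onto every row
toy['max_order'] = toy.groupby('user_id')['order_number'].transform('max')

# Bins are right-inclusive: (0, 10], (10, 40], (40, inf]
toy['loyalty_flag'] = pd.cut(
    toy['max_order'],
    bins=[0, 10, 40, float('inf')],
    labels=['New customer', 'Regular customer', 'Loyal customer'],
)
print(toy)
```

Note that `pd.cut` returns a categorical column rather than plain strings, which may or may not be what later steps expect.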

### 5. Which customer group pays the highest average price?

In [47]:
# Finding the average price grouped by loyalty flag for the entire dataframe
ords_prods_merge.groupby('loyalty_flag').agg({'prices': ['mean']})

loyalty_flag,prices (mean)
Loyal customer,7.773575
New customer,7.801206
Regular customer,7.798262


New customers pay the highest average price, but only by a narrow margin: the gap between new customers (7.80) and loyal customers (7.77) is about three cents.
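
Since the task asks for basic statistics rather than the mean alone, the aggregation could be extended with the minimum and maximum. A minimal sketch on hypothetical data (the `toy` values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data with two loyalty categories
toy = pd.DataFrame({
    'loyalty_flag': ['Loyal customer', 'Loyal customer', 'New customer', 'New customer'],
    'prices': [5.0, 9.0, 4.0, 12.0],
})

# Mean, minimum and maximum price per loyalty category
price_stats = toy.groupby('loyalty_flag').agg({'prices': ['mean', 'min', 'max']})
print(price_stats)
```

On the real dataframe, the minimum and maximum would also reveal any implausible price values that might be distorting the means.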

### 6. Spending Flag

In [8]:
# Creating the average_price column: the mean price of all products purchased by each user
ords_prods_merge['average_price'] = ords_prods_merge.groupby(['user_id'])['prices'].transform(np.average)

In [49]:
# Testing it
ords_prods_merge.head(100)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,...,price_range_loc,busiest_day,busiest_day_2,busiest_period_of_day,max_order,loyalty_flag,average_price,spending_flag,median_days_since_last_order,frequency_flag
0,2539329,1,1,2,8,,True,196,1,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
1,2398795,1,2,3,7,15.0,False,196,1,1,...,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
2,473747,1,3,3,12,21.0,False,196,1,1,...,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
3,2254736,1,4,4,7,29.0,False,196,1,1,...,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
4,431534,1,5,4,15,28.0,False,196,1,1,...,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,3226575,360,1,5,12,,True,196,1,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,3,New customer,10.006250,High Spender,4.0,Frequent Customer
96,1469869,377,3,5,17,3.0,False,196,9,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,3,New customer,8.496552,High Spender,16.5,Regular customer
97,1927023,387,2,4,10,22.0,False,196,3,0,...,Mid-range product,Least busy,Regularly busy,Most orders,8,New customer,7.396610,High Spender,8.0,Frequent Customer
98,858092,420,4,1,19,30.0,False,196,2,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,22,Regular customer,7.387805,High Spender,7.0,Frequent Customer


In [9]:
# Code for Low Spender
ords_prods_merge.loc[ords_prods_merge['average_price'] < 10, 'spending_flag'] = 'Low Spender'

In [10]:
# Code for High Spender
ords_prods_merge.loc[ords_prods_merge['average_price'] >= 10, 'spending_flag'] = 'High Spender'

In [33]:
# Testing it
ords_prods_merge.head(20)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,...,department_id,prices,price_range_loc,busiest_day,busiest_day_2,busiest_period_of_day,max_order,loyalty_flag,average_price,spending_flag
0,2539329,1,1,2,8,,True,196,1,0,...,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender
1,2398795,1,2,3,7,15.0,False,196,1,1,...,7,9.0,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,14.0,High Spender
2,473747,1,3,3,12,21.0,False,196,1,1,...,7,9.0,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,14.0,High Spender
3,2254736,1,4,4,7,29.0,False,196,1,1,...,7,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender
4,431534,1,5,4,15,28.0,False,196,1,1,...,7,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender
5,3367565,1,6,2,7,19.0,False,196,1,1,...,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender
6,550135,1,7,1,9,20.0,False,196,1,1,...,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender
7,3108588,1,8,1,14,14.0,False,196,2,1,...,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender
8,2295261,1,9,1,16,0.0,False,196,4,1,...,7,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender
9,2550362,1,10,4,8,30.0,False,196,1,1,...,7,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,14.0,High Spender


In [15]:
# Value Counts
ords_prods_merge['spending_flag'].value_counts(dropna = False)

Low Spender     32280045
High Spender      119687
Name: spending_flag, dtype: int64

There are far more low spenders (32,280,045 rows) than high spenders (119,687 rows).

In [11]:
# Subsetting the two spending columns to check the flag against the average price
spendingtest = ords_prods_merge[['average_price', 'spending_flag']]

In [14]:
# Calling it
spendingtest

Unnamed: 0,average_price,spending_flag
0,6.367797,Low Spender
1,6.367797,Low Spender
2,6.367797,Low Spender
3,6.367797,Low Spender
4,6.367797,Low Spender
...,...,...
32404854,6.905655,Low Spender
32404855,6.905655,Low Spender
32404856,7.631579,Low Spender
32404857,7.631579,Low Spender
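
Because the spending flag has only two outcomes, it could also be sketched with a single `np.where` call. The data below is hypothetical; the labels match the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two users, two purchases each
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'prices':  [4.0, 8.0, 10.0, 14.0],
})

# Mean price per user, broadcast back to every row
toy['average_price'] = toy.groupby('user_id')['prices'].transform('mean')

# Below 10 is a low spender; everything else falls through to high spender
toy['spending_flag'] = np.where(toy['average_price'] < 10, 'Low Spender', 'High Spender')
print(toy)
```

One caveat with this shortcut: a user whose average price is NaN would fail the `< 10` test and be labeled a high spender, so the two loc statements are the safer choice when missing prices are possible.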


### 7. Frequent / Non-frequent Customer

In [39]:
# Creating the median_days_since_last_order column for each user
ords_prods_merge['median_days_since_last_order'] = ords_prods_merge.groupby(['user_id'])['days_since_prior_order'].transform(np.median)

In [41]:
# Testing it
ords_prods_merge.head(25)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,...,prices,price_range_loc,busiest_day,busiest_day_2,busiest_period_of_day,max_order,loyalty_flag,average_price,spending_flag,median_days_since_last_order
0,2539329,1,1,2,8,,True,196,1,0,...,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5
1,2398795,1,2,3,7,15.0,False,196,1,1,...,9.0,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,6.367797,High Spender,20.5
2,473747,1,3,3,12,21.0,False,196,1,1,...,9.0,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,6.367797,High Spender,20.5
3,2254736,1,4,4,7,29.0,False,196,1,1,...,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5
4,431534,1,5,4,15,28.0,False,196,1,1,...,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5
5,3367565,1,6,2,7,19.0,False,196,1,1,...,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5
6,550135,1,7,1,9,20.0,False,196,1,1,...,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5
7,3108588,1,8,1,14,14.0,False,196,2,1,...,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5
8,2295261,1,9,1,16,0.0,False,196,4,1,...,9.0,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5
9,2550362,1,10,4,8,30.0,False,196,1,1,...,9.0,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5


In [42]:
# Code for Non-Frequent Customer
ords_prods_merge.loc[ords_prods_merge['median_days_since_last_order'] > 20, 'frequency_flag'] = 'Non-Frequent Customer'

In [43]:
# Code for Regular Customer
ords_prods_merge.loc[(ords_prods_merge['median_days_since_last_order'] > 10) & (ords_prods_merge['median_days_since_last_order'] <= 20), 'frequency_flag'] = 'Regular customer'

In [44]:
# Code for Frequent Customer
ords_prods_merge.loc[ords_prods_merge['median_days_since_last_order'] <= 10, 'frequency_flag'] = 'Frequent Customer'

In [46]:
# Testing it
ords_prods_merge.head(100)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order,first_order,product_id,add_to_cart_order,reordered,...,price_range_loc,busiest_day,busiest_day_2,busiest_period_of_day,max_order,loyalty_flag,average_price,spending_flag,median_days_since_last_order,frequency_flag
0,2539329,1,1,2,8,,True,196,1,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
1,2398795,1,2,3,7,15.0,False,196,1,1,...,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
2,473747,1,3,3,12,21.0,False,196,1,1,...,Mid-range product,Regularly busy,Slowest days,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
3,2254736,1,4,4,7,29.0,False,196,1,1,...,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
4,431534,1,5,4,15,28.0,False,196,1,1,...,Mid-range product,Least busy,Regularly busy,Fewest orders,10,New customer,6.367797,High Spender,20.5,Non-Frequent Customer
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,3226575,360,1,5,12,,True,196,1,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,3,New customer,10.006250,High Spender,4.0,Frequent Customer
96,1469869,377,3,5,17,3.0,False,196,9,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,3,New customer,8.496552,High Spender,16.5,Regular customer
97,1927023,387,2,4,10,22.0,False,196,3,0,...,Mid-range product,Least busy,Regularly busy,Most orders,8,New customer,7.396610,High Spender,8.0,Frequent Customer
98,858092,420,4,1,19,30.0,False,196,2,0,...,Mid-range product,Regularly busy,Regularly busy,Fewest orders,22,Regular customer,7.387805,High Spender,7.0,Frequent Customer


Success!
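
The three frequency conditions can also be written as one `np.select` call, which makes it explicit what happens to users with no median (for example, users whose only order is their first). This is a sketch on made-up data; the `'Unknown'` fallback label and the `median_days` column name are assumptions for illustration and are not part of the task criteria.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a first order with no prior order
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3],
    'days_since_prior_order': [np.nan, 30.0, 25.0, np.nan, 5.0, np.nan],
})

# Median per user; NaN values are skipped, an all-NaN user stays NaN
toy['median_days'] = toy.groupby('user_id')['days_since_prior_order'].transform('median')

conditions = [
    toy['median_days'] > 20,
    (toy['median_days'] > 10) & (toy['median_days'] <= 20),
    toy['median_days'] <= 10,
]
labels = ['Non-Frequent Customer', 'Regular customer', 'Frequent Customer']

# Rows matching no condition (NaN median) receive the hypothetical fallback label
toy['frequency_flag'] = np.select(conditions, labels, default='Unknown')
print(toy)
```

With the loc approach, such users would be left with NaN in the flag column, so running `value_counts(dropna = False)` on the frequency flag is a quick way to check for them.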

### 9. Exporting the .pkl file

In [16]:
# Exporting the dataframe with the new flag columns as a pickle file
ords_prods_merge.to_pickle(os.path.join(path, '02 Data','Prepared Data', '4.8_ords_products_merged_derived_grouped.pkl'))
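
A quick way to confirm an export worked is to read the file back and compare it with the original. The sketch below uses a temporary folder and a hypothetical file name (`toy_export.pkl`) rather than the project path.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy dataframe standing in for ords_prods_merge
toy = pd.DataFrame({
    'user_id': [1, 2],
    'loyalty_flag': ['New customer', 'Loyal customer'],
})

# Write to a temporary folder, then read the pickle back in
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'toy_export.pkl')
    toy.to_pickle(out_path)
    reloaded = pd.read_pickle(out_path)

# True when values, dtypes and shape all survived the round trip
round_trip_ok = reloaded.equals(toy)
print(round_trip_ok)
```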