## Instacart Analysis – Identify Key Dept Shoppers
1.	Import libraries, set directory paths & import data
2.	Check data frame dimensions, columns and datatypes
3.	Calculate key_dept score per customer
4.	Assign Key Shopper status based on dept_score
5.	Check value counts, export to clipboard
6.	Export to new data set file.


### import libraries

In [1]:
import pandas as pd
import numpy as np
import os

### set data set directory path

In [2]:
datasetpath = r'D:\My Documents\! Omnicompetent Ltd\Courses\Career Foundry - Data Analytics\Data Analytics Course\Instacart Basket Analysis\02 Data Sets'
datasetpath

'D:\\My Documents\\! Omnicompetent Ltd\\Courses\\Career Foundry - Data Analytics\\Data Analytics Course\\Instacart Basket Analysis\\02 Data Sets'

### import product and customer reviewed dataset

In [3]:
df_testing = pd.read_pickle(os.path.join(datasetpath,'testing_sample_prodcust.pkl'))

### review dimensions, columns & datatypes

In [4]:
df_testing.shape

(9268148, 33)

In [5]:
df_testing.dtypes

order_id                    int64
user_id                     int64
number_of_orders            int64
order_day_of_week           int64
order_hour_of_day           int64
days_since_prior_order    float64
product_id                  int64
reordered                   int64
product_name               object
department_id               int64
price                     float64
gender                     object
state                      object
age                         int64
n_dependants                int64
marital_status             object
income                      int64
region                     object
max_order                   int64
prod_price_range           object
sum_product_order           int64
top_order                 float64
product_revenue           float64
big_revenue               float64
key_dept                  float64
avg_order_days            float64
shop_freq                  object
avg_spend                 float64
spend_level                object
loyalty_flag  

### calculate customer key_dept score
    a) group by user_id
    b) sum key_dept values

#### The key_dept score represents how often a customer shops in a key department and therefore contributes to popular order placement as well as revenue.

In [6]:
df_testing['dept_score'] = df_testing.groupby('user_id')['key_dept'].transform('sum')

In [7]:
df_testing[['department_id','key_dept','dept_score']].head(10)

   department_id  key_dept  dept_score
0              7       1.0         8.0
1              7       1.0         8.0
2              7       1.0         8.0
3              7       1.0         8.0
4             16       1.0         8.0
5             19       0.0         8.0
6             19       0.0         8.0
7             19       0.0         8.0
8             19       0.0         8.0
9             19       0.0         8.0
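
To make the behaviour of the grouped sum concrete, here is a minimal sketch on a toy data frame. The values are invented for illustration, and it assumes `key_dept` is a 0/1 flag per ordered product, as the sample output suggests.

```python
import pandas as pd

# Toy data (hypothetical values): one row per ordered product,
# with key_dept assumed to be a 0/1 flag
toy = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2],
    'key_dept': [1.0, 0.0, 1.0, 0.0, 0.0],
})

# transform('sum') returns one value per original row, so every row
# belonging to a customer carries that customer's total
toy['dept_score'] = toy.groupby('user_id')['key_dept'].transform('sum')

print(toy)
```

Because `transform` keeps the original row count, the score repeats on every row of the same customer, which is why all ten rows in the sample share one `dept_score`.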


### review dept_score descriptive statistics

In [8]:
df_testing['dept_score'].describe()

count    9.268148e+06
mean     9.350593e+01
std      8.172111e+01
min      0.000000e+00
25%      3.300000e+01
50%      6.900000e+01
75%      1.300000e+02
max      7.050000e+02
Name: dept_score, dtype: float64

### assign flag for key shopping customers
    Basic Shopper:  dept_score < 94
    Key Shopper:    dept_score >= 94
    *where 94 is the mean dept_score (93.5) rounded up

In [9]:
df_testing.loc[(df_testing['dept_score'] <94), 'key_shopper'] = 'Basic Shopper'

In [10]:
df_testing.loc[(df_testing['dept_score'] >=94), 'key_shopper'] = 'Key Shopper'
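
The two assignments can also be written as a single step. The sketch below uses `np.where` on toy data with invented values; the variable `threshold` is a name introduced here for illustration. One difference worth noting: `np.where` labels a missing `dept_score` as 'Basic Shopper', whereas the two `.loc` statements would leave it as NaN.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    'user_id':    [1, 1, 2, 3, 3, 3],
    'dept_score': [120.0, 120.0, 10.0, 94.0, 94.0, 94.0],
})

threshold = 94  # cut-off used in this notebook (mean of 93.5, rounded up)

# Rows at or above the threshold are flagged as Key Shopper,
# everything else (including NaN) as Basic Shopper
toy['key_shopper'] = np.where(
    toy['dept_score'] >= threshold, 'Key Shopper', 'Basic Shopper'
)

print(toy)
```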

### review value counts for key_shopper & copy to clipboard

In [11]:
df_testing['key_shopper'].value_counts(dropna=False)

Basic Shopper    5740536
Key Shopper      3527612
Name: key_shopper, dtype: int64

In [12]:
key_shopper = df_testing['key_shopper'].value_counts()
key_shopper.to_clipboard()
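
Note that `value_counts` on this data frame counts rows (ordered products), not customers. If a per-customer count is wanted, one possible approach is to count distinct `user_id` values per flag. This is a sketch on toy data with invented values, not a step from the original analysis.

```python
import pandas as pd

# Toy data (hypothetical values): customer 1 has three rows,
# customer 2 has one row, customer 3 has two rows
toy = pd.DataFrame({
    'user_id':     [1, 1, 1, 2, 3, 3],
    'key_shopper': ['Key Shopper'] * 3 + ['Basic Shopper'] * 3,
})

# Row-level counts, as in the cell above
row_counts = toy['key_shopper'].value_counts()

# Customer-level counts: distinct user_id values per flag
customer_counts = toy.groupby('key_shopper')['user_id'].nunique()

print(row_counts)
print(customer_counts)
```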

### review columns & dimensions

In [13]:
df_testing.dtypes

order_id                    int64
user_id                     int64
number_of_orders            int64
order_day_of_week           int64
order_hour_of_day           int64
days_since_prior_order    float64
product_id                  int64
reordered                   int64
product_name               object
department_id               int64
price                     float64
gender                     object
state                      object
age                         int64
n_dependants                int64
marital_status             object
income                      int64
region                     object
max_order                   int64
prod_price_range           object
sum_product_order           int64
top_order                 float64
product_revenue           float64
big_revenue               float64
key_dept                  float64
avg_order_days            float64
shop_freq                  object
avg_spend                 float64
spend_level                object
loyalty_flag  

In [14]:
df_testing.shape

(9268148, 35)

### export to pickle for use in creating visualisations

In [15]:
df_testing.to_pickle(os.path.join(datasetpath,'testing_sample_prodcustkey.pkl'))
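
A quick way to confirm that an export of this kind round-trips cleanly is to read the file back and compare it with the original. The sketch below does this on a toy data frame in a temporary directory; the file name `toy_prodcustkey.pkl` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    'user_id':     [1, 2],
    'key_shopper': ['Key Shopper', 'Basic Shopper'],
})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'toy_prodcustkey.pkl')  # hypothetical file name
    toy.to_pickle(path)              # write the data frame to disk
    restored = pd.read_pickle(path)  # read it back

# equals() checks that values, dtypes and index all match
same = restored.equals(toy)
print(same)
```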