# Single Customer View

**The Concept of a Single Customer View**

- RFM

    - R: Recency (e.g. last visit)
    - F: Frequency (e.g. total visits)
    - M: Monetary (e.g. total spend)

For more information: https://medium.com/@thanachart.rit/building-customer-single-view-customer-360-3539c971092c
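As a minimal sketch of how the three RFM measures map onto transaction data, the example below builds all of them in a single `groupby` using pandas named aggregation. The toy rows are made up for illustration; only the column names (`CUST_CODE`, `BASKET_ID`, `SHOP_DATE`, `SPEND`) follow the dataset used in this notebook.

```python
import pandas as pd

# Toy transactions (hypothetical values); column names mirror the notebook's dataset.
toy = pd.DataFrame({
    "CUST_CODE": ["A", "A", "A", "B", "B"],
    "BASKET_ID": [1, 1, 2, 3, 4],
    "SHOP_DATE": pd.to_datetime(
        ["2008-01-05", "2008-01-05", "2008-03-10", "2007-12-01", "2008-02-14"]
    ),
    "SPEND": [2.0, 3.5, 4.0, 1.0, 6.0],
})

# One row per customer:
#   R = most recent shopping date, F = number of distinct baskets, M = total spend.
rfm = toy.groupby("CUST_CODE").agg(
    LAST_VISIT=("SHOP_DATE", "max"),
    TOTAL_VISIT=("BASKET_ID", "nunique"),
    TOTAL_SPEND=("SPEND", "sum"),
)
print(rfm)
```

The cells that follow build the same three columns one step at a time, which is easier to inspect while learning.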

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 500)

In [2]:
df = pd.read_csv('supermarket_data.csv')

In [3]:
df.head()

,SHOP_DATE,SHOP_HOUR,BASKET_ID,CUST_CODE,STORE_CODE,PROD_CODE,QUANTITY,SPEND
0,20071006,21,994107800268406,CUST0000153531,STORE00001,PRD0901391,1,0.37
1,20070201,15,994104300305853,CUST0000219191,STORE00002,PRD0901915,1,5.08
2,20071103,13,994108200514137,CUST0000526979,STORE00003,PRD0903379,1,2.36
3,20070206,18,994104400743650,CUST0000913709,STORE00004,PRD0903305,1,0.2
4,20071015,19,994108000780959,CUST0000961285,STORE00001,PRD0903387,1,1.65


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 671914 entries, 0 to 671913
Data columns (total 8 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   SHOP_DATE   671914 non-null  int64  
 1   SHOP_HOUR   671914 non-null  int64  
 2   BASKET_ID   671914 non-null  int64  
 3   CUST_CODE   671914 non-null  object 
 4   STORE_CODE  671914 non-null  object 
 5   PROD_CODE   671914 non-null  object 
 6   QUANTITY    671914 non-null  int64  
 7   SPEND       671914 non-null  float64
dtypes: float64(1), int64(4), object(3)
memory usage: 41.0+ MB


In [5]:
df.describe()

,SHOP_DATE,SHOP_HOUR,BASKET_ID,QUANTITY,SPEND
count,671914.0,671914.0,671914.0,671914.0,671914.0
mean,20073910.0,14.745869,994107800000000.0,1.514344,1.852796
std,4594.496,3.551738,2286042000.0,1.668037,2.589564
min,20070100.0,8.0,994103900000000.0,1.0,0.01
25%,20070520.0,12.0,994105800000000.0,1.0,0.75
50%,20070930.0,15.0,994107700000000.0,1.0,1.21
75%,20080220.0,17.0,994109800000000.0,1.0,2.04
max,20080710.0,21.0,994111700000000.0,73.0,189.63


In [6]:
df['SHOP_DATE'] = pd.to_datetime(df['SHOP_DATE'], format='%Y%m%d')

In [7]:
df.head()

,SHOP_DATE,SHOP_HOUR,BASKET_ID,CUST_CODE,STORE_CODE,PROD_CODE,QUANTITY,SPEND
0,2007-10-06,21,994107800268406,CUST0000153531,STORE00001,PRD0901391,1,0.37
1,2007-02-01,15,994104300305853,CUST0000219191,STORE00002,PRD0901915,1,5.08
2,2007-11-03,13,994108200514137,CUST0000526979,STORE00003,PRD0903379,1,2.36
3,2007-02-06,18,994104400743650,CUST0000913709,STORE00004,PRD0903305,1,0.2
4,2007-10-15,19,994108000780959,CUST0000961285,STORE00001,PRD0903387,1,1.65


In [8]:
df2 = df.groupby('CUST_CODE').SHOP_DATE.max()

In [9]:
df2 

CUST_CODE
CUST0000000107   2008-03-25
CUST0000000369   2008-07-05
CUST0000001388   2008-04-11
CUST0000002302   2008-07-03
CUST0000002637   2008-05-30
                    ...    
CUST0000999439   2008-07-05
CUST0000999544   2007-03-23
CUST0000999593   2008-04-02
CUST0000999935   2008-06-05
CUST0000999936   2008-04-17
Name: SHOP_DATE, Length: 4891, dtype: datetime64[ns]

In [10]:
type(df2)

pandas.core.series.Series

In [11]:
df3 = pd.DataFrame(df2).rename(columns={"SHOP_DATE": "LAST_VISIT"})

In [12]:
df3.head()

CUST_CODE,LAST_VISIT
CUST0000000107,2008-03-25
CUST0000000369,2008-07-05
CUST0000001388,2008-04-11
CUST0000002302,2008-07-03
CUST0000002637,2008-05-30
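`LAST_VISIT` is a date, while recency is often expressed as a number of days. The sketch below converts the first three dates shown above into days since the last visit. The choice of reference date is an assumption made here (the latest date in the sample); a real analysis might use the extraction date of the data instead.

```python
import pandas as pd

# The first three LAST_VISIT values from the output above.
last_visit = pd.Series(
    pd.to_datetime(["2008-03-25", "2008-07-05", "2008-04-11"]),
    index=pd.Index(
        ["CUST0000000107", "CUST0000000369", "CUST0000001388"], name="CUST_CODE"
    ),
    name="LAST_VISIT",
)

# Assumption: treat the latest date in the sample as "today".
reference_date = last_visit.max()

# Subtracting dates gives timedeltas; .dt.days extracts whole days.
recency_days = (reference_date - last_visit).dt.days
print(recency_days)
```

A smaller value means a more recent visit, so the customer with 0 days is the most recently active one in this sample.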


In [13]:
df3["TOTAL_VISIT"] = df.groupby('CUST_CODE').BASKET_ID.nunique()

In [14]:
df3.head()

CUST_CODE,LAST_VISIT,TOTAL_VISIT
CUST0000000107,2008-03-25,1
CUST0000000369,2008-07-05,126
CUST0000001388,2008-04-11,4
CUST0000002302,2008-07-03,71
CUST0000002637,2008-05-30,7


In [15]:
df3["TOTAL_SPEND"] = df.groupby('CUST_CODE').SPEND.sum()

In [16]:
df3.head()

CUST_CODE,LAST_VISIT,TOTAL_VISIT,TOTAL_SPEND
CUST0000000107,2008-03-25,1,0.8
CUST0000000369,2008-07-05,126,545.17
CUST0000001388,2008-04-11,4,21.9
CUST0000002302,2008-07-03,71,492.84
CUST0000002637,2008-05-30,7,48.5


## Exercise

1. Create a new column for average monthly visits
      - average monthly visits = total visits / no. of active months


2. Create a new column for average monthly spend
      - average monthly spend = total spend / no. of active months


3. Create a new column for average basket size
      - average basket size = total spend / no. of baskets


4. If you have time, create your own columns
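For item 4, one possible direction is sketched below on made-up data. The column names `FIRST_VISIT`, `TENURE_DAYS`, `TOTAL_QUANTITY`, and `AVG_ITEMS_PER_BASKET` are hypothetical choices for illustration, not part of the original exercise.

```python
import pandas as pd

# Toy transactions (hypothetical values); column names mirror the notebook's dataset.
toy = pd.DataFrame({
    "CUST_CODE": ["A", "A", "A", "B"],
    "BASKET_ID": [1, 1, 2, 3],
    "SHOP_DATE": pd.to_datetime(
        ["2008-01-05", "2008-01-05", "2008-03-10", "2008-02-14"]
    ),
    "QUANTITY": [1, 2, 3, 4],
})

grouped = toy.groupby("CUST_CODE")
extra = pd.DataFrame({
    "FIRST_VISIT": grouped["SHOP_DATE"].min(),
    "LAST_VISIT": grouped["SHOP_DATE"].max(),
    "TOTAL_QUANTITY": grouped["QUANTITY"].sum(),
    "TOTAL_VISIT": grouped["BASKET_ID"].nunique(),
})

# Days between the first and last visit (0 for one-time customers).
extra["TENURE_DAYS"] = (extra["LAST_VISIT"] - extra["FIRST_VISIT"]).dt.days

# Average number of items per basket.
extra["AVG_ITEMS_PER_BASKET"] = extra["TOTAL_QUANTITY"] / extra["TOTAL_VISIT"]
print(extra)
```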

In [17]:
df['YEAR'] = df['SHOP_DATE'].dt.year

In [18]:
df.head()

,SHOP_DATE,SHOP_HOUR,BASKET_ID,CUST_CODE,STORE_CODE,PROD_CODE,QUANTITY,SPEND,YEAR
0,2007-10-06,21,994107800268406,CUST0000153531,STORE00001,PRD0901391,1,0.37,2007
1,2007-02-01,15,994104300305853,CUST0000219191,STORE00002,PRD0901915,1,5.08,2007
2,2007-11-03,13,994108200514137,CUST0000526979,STORE00003,PRD0903379,1,2.36,2007
3,2007-02-06,18,994104400743650,CUST0000913709,STORE00004,PRD0903305,1,0.2,2007
4,2007-10-15,19,994108000780959,CUST0000961285,STORE00001,PRD0903387,1,1.65,2007


In [19]:
df['MONTH'] = df['SHOP_DATE'].dt.month

In [20]:
df.head()

,SHOP_DATE,SHOP_HOUR,BASKET_ID,CUST_CODE,STORE_CODE,PROD_CODE,QUANTITY,SPEND,YEAR,MONTH
0,2007-10-06,21,994107800268406,CUST0000153531,STORE00001,PRD0901391,1,0.37,2007,10
1,2007-02-01,15,994104300305853,CUST0000219191,STORE00002,PRD0901915,1,5.08,2007,2
2,2007-11-03,13,994108200514137,CUST0000526979,STORE00003,PRD0903379,1,2.36,2007,11
3,2007-02-06,18,994104400743650,CUST0000913709,STORE00004,PRD0903305,1,0.2,2007,2
4,2007-10-15,19,994108000780959,CUST0000961285,STORE00001,PRD0903387,1,1.65,2007,10


In [21]:
df['YEAR_MONTH'] = df['SHOP_DATE'].dt.strftime('%Y-%m')

In [22]:
df.head()

,SHOP_DATE,SHOP_HOUR,BASKET_ID,CUST_CODE,STORE_CODE,PROD_CODE,QUANTITY,SPEND,YEAR,MONTH,YEAR_MONTH
0,2007-10-06,21,994107800268406,CUST0000153531,STORE00001,PRD0901391,1,0.37,2007,10,2007-10
1,2007-02-01,15,994104300305853,CUST0000219191,STORE00002,PRD0901915,1,5.08,2007,2,2007-02
2,2007-11-03,13,994108200514137,CUST0000526979,STORE00003,PRD0903379,1,2.36,2007,11,2007-11
3,2007-02-06,18,994104400743650,CUST0000913709,STORE00004,PRD0903305,1,0.2,2007,2,2007-02
4,2007-10-15,19,994108000780959,CUST0000961285,STORE00001,PRD0903387,1,1.65,2007,10,2007-10


In [23]:
df3["ACTIVE_MONTH"] = df.groupby('CUST_CODE').YEAR_MONTH.nunique()

In [24]:
df3.head()

CUST_CODE,LAST_VISIT,TOTAL_VISIT,TOTAL_SPEND,ACTIVE_MONTH
CUST0000000107,2008-03-25,1,0.8,1
CUST0000000369,2008-07-05,126,545.17,19
CUST0000001388,2008-04-11,4,21.9,3
CUST0000002302,2008-07-03,71,492.84,13
CUST0000002637,2008-05-30,7,48.5,6
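The active-month count can also be derived without a string column. The sketch below, on made-up dates, uses `Series.dt.to_period('M')` to turn each date into a monthly period and then counts the distinct periods per customer; it is an alternative shown for comparison, not the approach used in the cells above.

```python
import pandas as pd

# Toy visits (hypothetical values) spanning a year boundary.
toy = pd.DataFrame({
    "CUST_CODE": ["A", "A", "A", "B"],
    "SHOP_DATE": pd.to_datetime(
        ["2007-12-30", "2008-01-02", "2008-01-20", "2008-02-14"]
    ),
})

# Convert each date to a monthly period such as 2007-12.
toy["YEAR_MONTH"] = toy["SHOP_DATE"].dt.to_period("M")

# Count the distinct months in which each customer shopped.
active_month = toy.groupby("CUST_CODE")["YEAR_MONTH"].nunique()
print(active_month)
```

Because a period carries both year and month, December 2007 and January 2008 count as two separate months, just as they do with the `'%Y-%m'` string.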


**1. Average Monthly Visit**

In [25]:
df3['AVG_MONTHLY_VISIT'] = df3['TOTAL_VISIT'] / df3['ACTIVE_MONTH']

In [26]:
df3.head()

CUST_CODE,LAST_VISIT,TOTAL_VISIT,TOTAL_SPEND,ACTIVE_MONTH,AVG_MONTHLY_VISIT
CUST0000000107,2008-03-25,1,0.8,1,1.0
CUST0000000369,2008-07-05,126,545.17,19,6.631579
CUST0000001388,2008-04-11,4,21.9,3,1.333333
CUST0000002302,2008-07-03,71,492.84,13,5.461538
CUST0000002637,2008-05-30,7,48.5,6,1.166667


**2. Average Monthly Spend**

In [27]:
df3['AVG_MONTHLY_SPEND'] = df3['TOTAL_SPEND'] / df3['ACTIVE_MONTH']

In [28]:
df3.head()

CUST_CODE,LAST_VISIT,TOTAL_VISIT,TOTAL_SPEND,ACTIVE_MONTH,AVG_MONTHLY_VISIT,AVG_MONTHLY_SPEND
CUST0000000107,2008-03-25,1,0.8,1,1.0,0.8
CUST0000000369,2008-07-05,126,545.17,19,6.631579,28.693158
CUST0000001388,2008-04-11,4,21.9,3,1.333333,7.3
CUST0000002302,2008-07-03,71,492.84,13,5.461538,37.910769
CUST0000002637,2008-05-30,7,48.5,6,1.166667,8.083333


**3. Average Basket Size**

In [29]:
df3['AVG_BASKET_SIZE'] = df3['TOTAL_SPEND'] / df3['TOTAL_VISIT']

In [30]:
df3.head()

CUST_CODE,LAST_VISIT,TOTAL_VISIT,TOTAL_SPEND,ACTIVE_MONTH,AVG_MONTHLY_VISIT,AVG_MONTHLY_SPEND,AVG_BASKET_SIZE
CUST0000000107,2008-03-25,1,0.8,1,1.0,0.8,0.8
CUST0000000369,2008-07-05,126,545.17,19,6.631579,28.693158,4.326746
CUST0000001388,2008-04-11,4,21.9,3,1.333333,7.3,5.475
CUST0000002302,2008-07-03,71,492.84,13,5.461538,37.910769,6.941408
CUST0000002637,2008-05-30,7,48.5,6,1.166667,8.083333,6.928571


## More Questions
1. How many members are there?

2. What are the total sales of each store?

3. Which item was the most popular by quantity?

4. Which member had the highest total spend?

5. How many members spent more than 100 in 2007?

6. What is the average spend in each hour?

7. On how many days did the total spend exceed 3000?

8. On which dates did the total spend exceed 4000, and what was the total spend on each of those dates?

In [31]:
# 1
df['CUST_CODE'].nunique()

4891

In [32]:
# 2
df.loc[:, ['STORE_CODE', 'SPEND']].groupby('STORE_CODE').sum()

STORE_CODE,SPEND
STORE00001,399724.15
STORE00002,217743.62
STORE00003,415341.33
STORE00004,212110.27


In [33]:
# 3
df.loc[:, ['PROD_CODE', 'QUANTITY']].groupby('PROD_CODE').sum().sort_values(by = 'QUANTITY', ascending = False)

PROD_CODE,QUANTITY
PRD0903678,92694
PRD0904358,15738
PRD0903052,15587
PRD0900121,12861
PRD0903269,7234
...,...
PRD0904259,1
PRD0900395,1
PRD0904423,1
PRD0903544,1


In [34]:
# 4
df.loc[:, ['CUST_CODE', 'SPEND']].groupby('CUST_CODE').sum().sort_values(by = 'SPEND', ascending = False)

CUST_CODE,SPEND
CUST0000123240,10149.73
CUST0000783041,8045.70
CUST0000455778,7164.68
CUST0000942162,5528.45
CUST0000372422,5156.49
...,...
CUST0000902099,0.01
CUST0000478503,0.01
CUST0000179503,0.01
CUST0000309335,0.01


In [35]:
# 5
cust_2007 = df.loc[(df['YEAR'] == 2007), ['CUST_CODE', 'SPEND']].groupby('CUST_CODE').sum()
cust_2007[cust_2007['SPEND'] > 100].shape

(1271, 1)

In [36]:
# 6
df.loc[:, ['SHOP_HOUR', 'SPEND']].groupby('SHOP_HOUR').mean()

SHOP_HOUR,SPEND
8,1.859958
9,1.821021
10,1.801998
11,1.80777
12,1.81827
13,1.83176
14,1.851879
15,1.839337
16,1.860795
17,1.856006
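Question 6 can be read in more than one way. The cell above averages `SPEND` over individual rows, where each row is one product line in a basket. If the intended measure is the average basket value per hour, the basket totals need to be computed first. The sketch below contrasts the two readings on made-up data; which one is wanted is an assumption the reader has to settle.

```python
import pandas as pd

# Toy rows (hypothetical values): basket 1 has two product lines.
toy = pd.DataFrame({
    "SHOP_HOUR": [9, 9, 9, 10],
    "BASKET_ID": [1, 1, 2, 3],
    "SPEND": [1.0, 2.0, 5.0, 4.0],
})

# Reading 1: mean spend per product line within each hour.
per_item = toy.groupby("SHOP_HOUR")["SPEND"].mean()

# Reading 2: total each basket first, then average the basket totals per hour.
basket_totals = toy.groupby(["SHOP_HOUR", "BASKET_ID"])["SPEND"].sum()
per_basket = basket_totals.groupby(level="SHOP_HOUR").mean()

print(per_item)
print(per_basket)
```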


In [37]:
# 7
shop_date_df = df.loc[:, ['SHOP_DATE', 'SPEND']].groupby('SHOP_DATE').sum()
shop_date_df[shop_date_df['SPEND'] > 3000]

SHOP_DATE,SPEND
2007-03-13,3161.41
2007-06-15,3157.59
2007-07-10,3023.3
2007-12-17,3226.08
2007-12-19,3402.47
2007-12-20,4833.46
2007-12-21,3380.47
2007-12-22,3000.13
2007-12-23,3271.74
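Question 7 asks for a count, whereas the cell above lists the qualifying days. A minimal sketch of getting the number directly, shown on made-up data, is to sum a boolean mask over the daily totals:

```python
import pandas as pd

# Toy rows (hypothetical values): two rows fall on the same day.
toy = pd.DataFrame({
    "SHOP_DATE": pd.to_datetime(
        ["2007-12-19", "2007-12-19", "2007-12-20", "2007-12-21"]
    ),
    "SPEND": [2000.0, 1500.0, 2500.0, 3100.0],
})

# Total spend per day.
daily_spend = toy.groupby("SHOP_DATE")["SPEND"].sum()

# True counts as 1, so summing the mask gives the number of days above the threshold.
n_days = int((daily_spend > 3000).sum())
print(n_days)
```

Applied to the notebook's data, the same idea would be `(shop_date_df['SPEND'] > 3000).sum()`.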


In [38]:
# 8
shop_date_df = df.loc[:, ['SHOP_DATE', 'SPEND']].groupby('SHOP_DATE').sum()
shop_date_df[shop_date_df['SPEND'] > 4000]

SHOP_DATE,SPEND
2007-12-20,4833.46
